It has been a while… something you might have read before on a personal tech blog. My DevOps professional life at the moment can be summarized as follows: get rid of virtual machines and old platforms and replace them with Kubernetes.

Platform design

Anno 2024 I prefer to build small, self-contained platforms with internal (hyperconverged) storage over platforms that have interdependencies and a traditional central storage solution.

The benefit should be clear: when shit really hits the fan (e.g. a fatal storage crash), I expect (hope) to lose a smaller set of data and spend a few weeks less on data recovery from backups.

Nowadays there are plenty of software-defined storage solutions available, and building highly available platforms has never been easier thanks to Kubernetes.

So no surprise: when possible I rely on Kubernetes on bare metal. We have plenty of older servers lying around that are still performant once they get stuffed with a few of those fancy NVMe SSDs. All you need are a few free PCIe slots (4 lanes each), cheap adapter cards, and the slightly more expensive disks (M.2 or U.2), of course.

kVirt: Virtualization - Kubevirt - k3s

Who needs VMs nowadays?

A lot of services such as GitLab, Nexus, a container registry, … can run just as well in containers, so our current office oVirt platform with SAN storage is finally on the brink of becoming obsolete.

There are many other reasons why VMs can be scrapped. Infra is just getting simpler and needs fewer moving parts (or rather, the complexity shifts into k8s). For example, we used to host quite a few highly available reverse proxy servers, mainly for TLS offloading using FreeIPA certificates. Recent FreeIPA versions support the ACME protocol, so the Kubernetes cert-manager can happily hand out company certificates at will. Bye bye, fleet of nginx servers.
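On the cluster side that boils down to little more than a cert-manager ClusterIssuer pointing at the FreeIPA ACME directory. A minimal sketch, with the hostname and email as placeholders for your own IPA realm:

```yaml
# Sketch of a cert-manager ClusterIssuer using FreeIPA's ACME endpoint.
# ipa-ca.example.com and the email address are placeholders.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: freeipa-acme
spec:
  acme:
    # FreeIPA (Dogtag) publishes its ACME directory under the ipa-ca alias
    server: https://ipa-ca.example.com/acme/directory
    email: sysadmin@example.com
    privateKeySecretRef:
      name: freeipa-acme-account-key
    solvers:
      - http01:
          ingress:
            class: nginx
```

Any Ingress annotated with cert-manager.io/cluster-issuer: freeipa-acme then gets its certificate signed by the IPA CA, renewals included.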

Some remaining core services such as FreeIPA still run more comfortably in a VM. Furthermore, we have a lot of secondary VMs that run some service the company needs and are hard to containerize, or that run an exotic operating system such as Windows.

So we still need them! And, unfortunately, the Ansible glue to manage them (albeit that code base is shrinking).

KubeVirt as the solution

We established that virtualization is still a requirement. KubeVirt is the new kid on the block, and it is essentially a bunch of Kubernetes operators, so me like.

OKD can bear most of the load here, but there are some services that OKD depends on to boot: Nexus, a container registry, FreeIPA, … so we need a separate platform with virtualization capability.

There are a lot of interesting projects in the SUSE/Rancher space, and Harvester is their KubeVirt solution. But since it is still somewhat early days, and you don't learn much by just running someone else's k8s YAML, I decided to do this myself.

The building blocks are k3s, the Multus networking CNI and Longhorn. So far it is working out fine and I intend to use this in production soon. KubeVirt still lacks some features you might have grown used to in your favorite virtualization platform, such as hot-adding CPU and memory, but that is just a matter of time.
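What you get in return is that a guest is just another custom resource. A minimal sketch (the container disk image is only an example; in practice the root disk would live on a Longhorn-backed PVC and a Multus network attachment would be added for bridged networking):

```yaml
# Minimal KubeVirt VirtualMachine sketch, modelled on the upstream examples.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: testvm
spec:
  running: true
  template:
    spec:
      domain:
        cpu:
          cores: 2
        resources:
          requests:
            memory: 2Gi
        devices:
          disks:
            - name: rootdisk
              disk:
                bus: virtio
      volumes:
        - name: rootdisk
          containerDisk:
            image: quay.io/containerdisks/fedora:latest
```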

Somewhat critical features such as live migration and snapshotting are already there and work perfectly; I tested both on Ceph and Longhorn. Backup solutions are also easier to achieve than with, for example, oVirt, at least for me.
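Live migration, for example, is just another custom resource (it is what virtctl migrate creates under the hood), and snapshots work similarly through the VirtualMachineSnapshot CRD. A rough sketch, referring to the example VM above:

```yaml
# Declaratively trigger a live migration of the VMI named "testvm".
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstanceMigration
metadata:
  name: migrate-testvm
spec:
  vmiName: testvm
```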

kStack: Openstack helm

https://wiki.openstack.org/wiki/Openstack-helm

This is a great project, essentially a collection of cleverly built Helm charts, for building a modern OpenStack platform. It needs some tinkering with values and scripts, but if you regularly touch k8s and Helm charts you will manage just fine.
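To give an idea of what that tinkering amounts to, here is a hypothetical per-chart override file; the exact key layout differs per chart and lives in each chart's values.yaml:

```yaml
# Hypothetical values override for an openstack-helm service chart,
# only to show the pattern; consult the chart's own values.yaml for real keys.
pod:
  replicas:
    api: 3          # scale out the API pods
conf:
  keystone:
    DEFAULT:
      debug: false  # conf.* trees are rendered into the oslo config files
```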

We use our OpenStack cluster mainly for CI/CD testing of freshly built Linux images, or when a colleague needs a test environment. We could partly replace this with KubeVirt, but all the tooling against the OpenStack API is not rewritten in a day. OpenStack is also still superior for the automated testing use case, in my opinion. The integration with Ceph storage, for example, is seamless (snapshots, backups, …).

Storage

In short: a lot of Ceph (operated by Rook) and a first experience with Longhorn.
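Both end up exposed to workloads the same way, as a StorageClass. For Longhorn a minimal one looks roughly like this (Ceph RBD is similar, with the rook-ceph.rbd.csi.ceph.com provisioner and pool/cluster parameters instead):

```yaml
# Sketch of a Longhorn StorageClass keeping 3 replicas per volume.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-3r
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"
```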

OKD

What about the OKD platforms, you might wonder… well, they are still around of course. I consider them the core platforms of our infrastructure. They are finally getting a bit more boring… in the good sense: finally stable.

Since version 4.12 or so, the annoying CRI-O bugs seem to be resolved. We had some instability, but that was due to hardware and software issues, or OKD configuration; OKD itself is not really to blame here.

For example, I learned the hard way that Ceph needs hard memory limits for its OSD services, otherwise it just chews through all your server memory until the OOM killer wreaks havoc.
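In the Rook CephCluster spec that roughly looks like the excerpt below; the sizes are only examples and depend on your OSD count and drive sizes, and Rook can derive osd_memory_target from these values:

```yaml
# Excerpt of a Rook CephCluster spec pinning OSD memory (sizes are examples).
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  resources:
    osd:
      requests:
        cpu: "2"
        memory: 4Gi
      limits:
        memory: 6Gi
```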

The biggest risk here remains the very fresh kernels that Fedora CoreOS ships with its releases. One of these days I should move to the SCOS variant, where the kernel versions are a lot more conservative. Another risk is where the project will go in the future. At the moment there seems to be a standstill due to some internal transitions, temporarily I hope. It is open source and we are of course grateful for the great work; I try to contribute a little bit in the issues and on Slack, but I should do more.

Our platforms will get some fresh hardware updates: I prefer to run masters on bare metal so OpenShift does not depend on an external virtualization platform (and its storage). Thanks to MetalLB and something like kube-vip, the platforms don't require external load balancer(s) to achieve high availability on the control and data planes.
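Roughly: kube-vip floats the API VIP for the control plane, while MetalLB hands out addresses for LoadBalancer services. A minimal layer-2 MetalLB configuration looks like this (the address range is a placeholder):

```yaml
# MetalLB layer-2 sketch: an address pool plus its advertisement.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: office-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.0.2.240-192.0.2.250
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: office-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - office-pool
```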

So again, the OKD platforms are self-contained. They obviously still rely on some core services, such as FreeIPA, Nexus and a container registry.