It is a crazy world we live in. Yesterday we used to test Kubernetes on our trusty virtualization platform. Today we are busy containerizing workloads, getting rid of virtualization and old infrastructure in the process.

Tomorrow, will we control all infrastructure through the k8s API? I certainly hope so: we push a change to the infra repository and lean back while the k8s reconciliation loop configures the infrastructure accordingly. I'd choose this over traditional config management any day!

KubeVirt and OKD Virtualization allow you to host the occasional virtual machine you can’t get rid of.

If you have no idea what KubeVirt is, please read my introduction to the project first: https://maarten.gent/posts/kubernetes/running-virtual-machines-in-kubernetes/

Check this image out and try to wrap your head around what is going on:

picture 1

In case you are not so familiar with VMware and/or OpenShift: this is the ESXi (VMware's core virtualization product) interface. If it still doesn't click: the image shows we can create VMs, in a VM, in an OpenShift pod.

How about running a k8s cluster on our brand new virtualization platform? I am joking, but it is mind-boggling, right? Real Inception-style stuff going on here.

To be clear, we do have a genuinely useful use case: consolidating an ancient machine that exists only to occasionally build or test something against ESXi.

Turns out this works perfectly. No trickery was involved in creating the above image. Let's see how we got there in this post!

It makes sense

Nested virtualization has probably been around for as long as virtualization has been a thing. I have always known it as a somewhat hidden feature of any virtualization technology.

I can think of some use cases:

  • You want to use KubeVirt on the public cloud or in your virtualized k8s cluster.
  • You have a development VM and want to run the popular Vagrant/VirtualBox combo.
  • Cost-effective testing of virtualization infra (my use case).
  • Nerding out, Inception style, because it is possible and educational.

Since a container is basically a process with some special kernel features, and a pod is a collection of containers, there is no reason why we could not run VMs this way. By extension, nested virtualization on OpenShift makes sense too.

How?

Three things matter: your host needs to be configured to support nested virtualization, your VM needs the right configuration, and ESXi itself is picky about which virtualized hardware it supports.

Host config

Nested virtualization must be enabled on the (bare metal) host system for it to work. Turns out OKD does this out of the box:

cat /sys/module/kvm_intel/parameters/nested
Y

So nothing to do here, assuming Red Hat does not change this behavior. Note that nested virtualization is not officially supported by Red Hat.
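If your host does not enable it out of the box, you can usually turn it on yourself. A sketch for Intel CPUs (the file name is a placeholder of my choosing; on AMD the module is kvm_amd):

```
# /etc/modprobe.d/kvm-nested.conf (hypothetical file name)
# Enable nested virtualization for the kvm_intel module
options kvm_intel nested=1
```

Reload the module (or reboot) and re-check the parameter file shown above.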

KubeVirt

CPU passthrough

I was stuck for quite some time: nested virtualization was enabled, ESXi was installed, but VMs crashed with a kernel panic early in the boot process.

Turns out you need to enable CPU ‘passthrough’ mode: https://kubevirt.io/user-guide/virtual_machines/virtual_hardware/#cpu-model-special-cases
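Concretely, this is the relevant part of the VirtualMachine spec, following the KubeVirt user guide linked above (a minimal sketch, all other fields omitted):

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
spec:
  template:
    spec:
      domain:
        cpu:
          # Expose the host CPU directly so the guest sees the
          # virtualization extensions (VMX/SVM) and can run its own VMs.
          model: host-passthrough
```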

This is similar to what you have to do in oVirt. Once this hurdle is cleared, nothing stops you from starting VMs in your VM.

Prepare VM

I am not very experienced with KubeVirt, so my procedure to install ESXi might be a little unorthodox, but it works and is easy. I used the user-friendly OKD web interface wizard.

Start from a Linux template, I chose ‘CentOS Stream 9’.

Boot source

I uploaded the ESXi ISO file to an internal web server and filled out the ISO URL in the 'Boot source' step. Don't forget to check the 'This is a CD-ROM boot source' box. The size can be small, 1G for example.

You can download an ISO from VMware and use it for free evaluation after registration.
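Under the hood, a boot source like this is an import done by CDI. A hand-written equivalent might look like the following sketch (the name and URL are placeholders for illustration):

```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: esxi-install-cd          # placeholder name
spec:
  source:
    http:
      # Placeholder URL: the internal web server hosting the ISO
      url: "http://webserver.internal/esxi.iso"
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 1Gi             # small is fine for an ISO
```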

The next step can mostly be skipped; just fill out a sensible name. It is important that you customize the VM first, so make sure you don't start it right away.

Be sure to select sufficient CPU and RAM resources. If the existing flavours are too limited, it is also possible to update the VM yaml manually.

Network

Here I assume OKD is configured to use different internal networks (VLANs) in OpenShift. How to do this is a topic for another time; it involves NMState and NetworkAttachmentDefinition resources if you can't wait.

It is important here that you do NOT enable the 'MAC spoofing check' in the Multus configuration: our nested VMs will emit network traffic using a different MAC address than the 'physical' VM.
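For reference, a minimal NetworkAttachmentDefinition sketch with the MAC spoof check left off, using the cnv-bridge CNI plugin (the names and the bridge br1 are placeholders for whatever your OKD network setup uses):

```yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: lab-network              # placeholder name
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "lab-network",
      "type": "cnv-bridge",
      "bridge": "br1",
      "macspoofchk": false
    }
```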

I opted to add a second network interface next to the standard 'k8s internal' interface. Select the defined NetworkAttachmentDefinition to attach the VM to a physical network.

It is important that you change the network ‘model’ from virtio to e1000e. You need to do this for both network interfaces. I also had to set networkInterfaceMultiqueue: false in the VM yaml source.
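Put together, the devices section of the VM spec ends up looking roughly like this (interface names and binding types are illustrative; only the model and multiqueue settings are the essential part):

```yaml
spec:
  template:
    spec:
      domain:
        devices:
          # ESXi ships no virtio drivers, so emulate an Intel e1000e
          # NIC for both interfaces and disable multiqueue.
          networkInterfaceMultiqueue: false
          interfaces:
            - name: default              # k8s internal interface
              masquerade: {}
              model: e1000e
            - name: lab-network          # placeholder: your NetworkAttachmentDefinition
              bridge: {}
              model: e1000e
```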

Disks

Next to the CD-ROM disk, we need two extra disks:

  • Root disk: 4G is sufficient
  • Datastore disk: storage for your VMs, e.g. 100G

Here you have to make sure the disk interface is sata instead of virtio.
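In the VM yaml, the disks then look something like this sketch (disk names are placeholders; the buses are the part that matters):

```yaml
spec:
  template:
    spec:
      domain:
        devices:
          disks:
            # ESXi has no virtio drivers, so use sata everywhere.
            - name: rootdisk             # placeholder: 4G installation target
              disk:
                bus: sata
            - name: datastore            # placeholder: e.g. 100G VM storage
              disk:
                bus: sata
            - name: installcd            # placeholder: the ESXi ISO
              cdrom:
                bus: sata
```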

You can delete the cloud-init disk since we don't need it; it is only there because of the template.

ESXi

As you noticed, we had to select specific 'hardware' for the disks and networking: ESXi is very limited in the hardware it supports.

The ESXi setup is very easy: just start the VM and the ESXi installer will pop up after the boot process. Follow the wizard to set things such as the root password, and select the small disk as the root disk.

Finally, start your ESXi VM. After about a minute, the console will show the URL you can visit to manage your brand new virtualization platform. Log in with the root credentials you provided earlier in the setup wizard.

Datastore

You can easily create a new datastore in ESXi from your large disk (e.g. 100G) to hold your VMs.

vSwitch

I disabled security checks in the default vSwitch while debugging but this is probably not necessary. Unless you do nesting one level deeper of course…

Conclusion

Again, I was pleasantly surprised how little configuration was required to do crazy stuff like this. We can finally retire yet another bare metal server and replace it with a k8s pod.

There are a few bumps that can stop you along the way, but I believe I have listed them all here.

To be clear, you obviously sacrifice some performance in a setup like this, but it is still surprisingly fast.

Here is a concluding shot of an Ubuntu VM running in the ESXi web interface:

picture 2