This has been a week of thinking I wanted to do one thing, not being sure how to do it, and finding out that there was a better way. I'll write it up in the order it happened.
Last week, I described wanting to implement a virtio proxy to allow a kernel in an application VM to use a virtual device in another VM. I was wondering how to manage virtio buffers, and thought that I probably wanted an allocator to manage throwing buffers of different sizes around.
This turned out to be a case of the XY problem. I couldn't find a good solution, but it turned out that an allocator wasn't what I wanted anyway. edef pointed out that I could just make the shared memory I allocated as big as necessary to hold buffers of the maximum size I wanted to support. The kernel will only actually allocate pages as they are written to, and I could use fallocate with FALLOC_FL_PUNCH_HOLE to tell the kernel it can drop pages when I'm done with them. This would mean that an unusually large buffer would only take up lots of memory while it was in use, and as soon as it was done with, the kernel would be able to take back the memory. So exactly what I wanted from an allocator, but with no need for an allocator at all!
This made the implementation much simpler, and by Friday I was able to get the proxy into a state where it could pass unit tests that transported messages in both directions through it.
And then it was suggested to me that maybe a virtio proxy is not what I want after all.
The main disadvantage to a virtio proxy is that it requires context switching to the host to send data between VMs. This is a trade-off I was aware of, but a virtio proxy is pretty straightforward to write as inter-VM communication systems go, and I was not aware of anything else that would be up to the job. As it turns out, there is something.
vhost-user is a mechanism for connecting, say, a virtio device to a userspace network stack in a performant way. I was aware of this, but what I was not aware of was virtio-vhost-user. virtio-vhost-user is a proposed mechanism to allow a VMM to forward a vhost-user backend to a VM. This means that two VMs could directly share virtqueues, with no host copy step. This would mean there would be no opportunity for the host to mediate communication between two guests, but that wasn't really on the cards anyway -- if it's ever required, a virtio proxy would probably be the way to go. For all the other cases, virtio-vhost-user would be a faster, cleaner way of sharing network devices between VMs.
The main problem with virtio-vhost-user is that it's still in its infancy. There's a patchset implementing it for QEMU that's a couple of years old, but that has not been accepted upstream. The main blocker for this seems to be first standardising it in the Virtio spec. The good news here is that the standardisation process seems to be progressing actively at the moment. It's being discussed on the virtio-dev mailing list basically right now, with the most recent emails dated Friday (unfortunately, I don't know of a good web archive with virtio-dev, but you can find the thread on Gmane if you're interested but not subscribed to the list).
The other good news is that virtio-vhost-user mostly works by composing things that already exist. There's no kernel work required, because devices are just exposed by the VMM as regular virtio devices. The frontend VM (i.e. the one that uses the virtual device, as opposed to the one that provides it) doesn't need any special virtio-vhost-user support, because its VMM just needs to speak normal vhost-user. Only the backend VM needs support for virtio-vhost-user, because its VMM needs to expose the vhost-user backend from the host to that VM.
This means that provisionally using virtio-vhost-user in Spectrum actually looks very feasible, with a couple of compromises. For evaluation purposes, it's not worth writing a virtio-vhost-user device for crosvm. But the VMs that need that device are the very specialised ones -- VMs that manage networking or block devices or similar. So for these VMs, for now, we could use QEMU with the virtio-vhost-user patch. I investigated what it would take to port it to the most recent QEMU version, and the answer appears to be "not much at all". Obviously having two VMMs in the Trusted Computing Base (TCB) isn't something we'd want in the long term, but it would be fine for, say, reaching the next funding milestone. If we decide that virtio-vhost-user is the way to go after all, support in crosvm can be added then -- in general, adding a new virtio device to crosvm isn't a huge undertaking.
Earlier, I said that the application side of the communication doesn't need anything special, because as far as it's concerned, it's just regular vhost-user. This is true, but I glossed over the fact that crosvm doesn't actually implement vhost-user. Implementing vhost-user in crosvm would probably be a big deal at this stage, and not something I feel would be a good use of my time. BUT! Remember, crosvm has two children: Amazon's Firecracker, aimed at so-called "serverless" computing; and Intel's Cloud Hypervisor, which aims at traditional, full system server virtualisation. Both of these children inherited the crosvm device model from their parent, and Cloud Hypervisor implements vhost-user. So I _think_ it should be possible to pretty much lift the vhost-user implementation from Cloud Hypervisor, and use it in crosvm. Pretty neat!
So, the setup I'd like to evaluate is QEMU with the virtio-vhost-user patch on one side, and crosvm with Cloud Hypervisor's vhost-user implementation on the other.
It might well be that there are complications here. If there are, I'll probably just finish the proxy and move on for now, because I want to keep up the pace. I do think that virtio-vhost-user is probably the way to do interguest networking in the long term, though.
Another thing that I've realised is that I don't need to worry about pulling bits out of crosvm to run in other VMs. I focused a lot on that towards the beginning of the year, mostly motivated by Wayland, because the virtio wayland implementation in crosvm is the only one there is. Now that that works in a different way, though, there's no need to continue down this path, because things like networking can be done in more normal ways through virtio and the device VM kernel.
https://en.wikipedia.org/wiki/XY_problem
https://man7.org/linux/man-pages/man2/fallocate.2.html
https://wiki.qemu.org/Features/VirtioVhostUser
https://github.com/stefanha/qemu/compare/master...virtio-vhost-user
https://lists.nongnu.org/archive/html/qemu-devel/2019-04/msg03082.html
https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.h...
https://firecracker-microvm.github.io/
https://github.com/cloud-hypervisor/cloud-hypervisor
https://github.com/cloud-hypervisor/cloud-hypervisor/blob/b4d04bdff6a7e2c3da...
Overall, it's been frustrating for me to try things, and discover they're not going to work, or not going to work as well as some other thing, and make a call on whether to keep going on something I know is the worse option or switch to the better thing. I have to keep reminding myself that Spectrum is a research project, and there are always going to be false starts like this. Lots of what we're doing is either very unusual (virtio-vhost-user) or brand new (interguest Wayland), after all.