This Week in Spectrum, 2020-W29

20 Jul 2020

This has been a week of thinking I wanted to do one thing, not being
sure how to do it, and finding out that there was a better way.  I'll
write it up in the order it happened.

crosvm
------

Last week, I wrote about wanting to implement a virtio proxy, so that a
kernel in an application VM could use a virtual device in another VM.
I was wondering how to manage virtio buffers, and thought that I
probably wanted an allocator to handle buffers of different sizes.

This turned out to be a case of the XY problem[1].  I couldn't find a
good solution, but it turned out that an allocator wasn't what I wanted
anyway.  edef pointed out that I could just make the shared memory I
allocated as big as necessary to hold buffers of the maximum size I
wanted to support.  The kernel will only actually allocate pages as they
are written to, and I could use fallocate[2] with FALLOC_FL_PUNCH_HOLE
to tell the kernel it can drop pages when I'm done with them.  This
would mean that an unusually large buffer would only take up lots of
memory while it was in use, and as soon as it was done with, the kernel
would be able to take back the memory.  So exactly what I wanted from an
allocator, but with no need for an allocator at all!

This made the implementation much simpler, and by Friday I was able to
get the proxy into a state where it could pass unit tests that
transported messages in both directions through it.

And then it was suggested to me that maybe a virtio proxy is not what I
want after all.

The main disadvantage to a virtio proxy is that it requires context
switching to the host to send data between VMs.  This is a trade-off I
was aware of, but a virtio proxy is pretty straightforward to write as
inter-VM communication systems go, and I was not aware of anything else
that would be up to the job.  As it turns out, there is something.

vhost-user is a mechanism for connecting, say, a virtio device to a
userspace network stack in a performant way.  I was aware of this, but
what I was not aware of was virtio-vhost-user[3].  virtio-vhost-user is
a proposed mechanism to allow a VMM to forward a vhost-user backend to a
VM.  This means that two VMs could directly share virtqueues, with no
host copy step.  The flip side is that the host would have no
opportunity to mediate communication between the two guests, but that
wasn't really on the cards anyway -- if it's ever required, a virtio
proxy would probably be the way to go.  For all the other cases,
virtio-vhost-user would be a faster, cleaner way of sharing network
devices between VMs.

The main problem with virtio-vhost-user is that it's still in its
infancy.  There's a patchset[4] implementing it for QEMU that's a couple
of years old, but that has not been accepted upstream.  The main blocker
for this seems to be first standardising it in the Virtio spec[5][6].  The
good news here is that the standardisation process seems to be
progressing actively at the moment.  It's being discussed on the
virtio-dev mailing list basically right now, with the most recent emails
dated Friday (unfortunately, I don't know of a good web archive with
virtio-dev, but you can find the thread on Gmane if you're interested
but not subscribed to the list).

Better still, virtio-vhost-user mostly works by composing things that
already exist.  There's no kernel work required, because devices
are just exposed by the VMM as regular virtio devices.  The frontend VM
(i.e. the one that uses the virtual device, as opposed to the one that
provides it) doesn't need any special virtio-vhost-user support, because
it just needs to speak normal vhost-user.  Only the backend VM needs
support for virtio-vhost-user, because its VMM needs to expose the
vhost-user backend from the host to that VM.

This means that provisionally using virtio-vhost-user in Spectrum
actually looks very feasible, with a couple of compromises.  For
evaluation purposes, it's not worth writing a virtio-vhost-user device
for crosvm.  But the VMs that need that device are the ones that are
very specialised -- VMs that manage networking or block devices or
similar.  So for these VMs, for now, we could use QEMU, with the
virtio-vhost-user patch.  I investigated what it would take to port it
to the most recent QEMU version, and the answer appears to be "not much
at all".  Obviously having two VMMs in the Trusted Computing Base (TCB)
isn't something we'd want in the long term, but it would be fine for,
say, reaching the next funding milestone.  If we decide that
virtio-vhost-user is the way to go after all, support in crosvm can be
added then -- in general, adding a new virtio device to crosvm isn't a
huge undertaking.

Earlier, I said that the application side of the communication doesn't
need anything special, because to that VM it's just regular vhost-user.
This is true, but I glossed over the fact that crosvm doesn't actually
implement vhost-user.  Implementing vhost-user in crosvm would probably
be a big deal at this stage, and not something I feel would be a good
use of my time.  BUT!  Remember, crosvm has two children: Amazon's
Firecracker[7], which is aimed at so-called "serverless" computing; and
Intel's Cloud Hypervisor[8], which aims at traditional, full system
server virtualisation.  Both of these children inherited the crosvm
device model from their parent, and Cloud Hypervisor implements
vhost-user[9].
So I _think_ it should be possible to pretty much lift the vhost-user
implementation from Cloud Hypervisor, and use it in crosvm.  Pretty
neat!

So, the setup I'd like to evaluate is QEMU with the virtio-vhost-user
patch on one side, and crosvm with Cloud Hypervisor's vhost-user
implementation on the other.

It might well be that there are complications here.  If there are, I'll
probably just finish the proxy and move on for now, because I want to
keep up the pace.  I do think that virtio-vhost-user is probably the
way to do interguest networking in the long term, though.

Another thing that I've realised is that I don't need to worry about
pulling bits out of crosvm to run in other VMs.  I focused a lot on that
towards the beginning of the year, mostly motivated by Wayland, because
the virtio wayland implementation in crosvm is the only one there is.
Now that that works in a different way, though, there's no need to
continue down this path, because things like networking can be done in
more normal ways through virtio and the device VM kernel.

[1]: https://en.wikipedia.org/wiki/XY_problem
[2]: https://man7.org/linux/man-pages/man2/fallocate.2.html
[3]: https://wiki.qemu.org/Features/VirtioVhostUser
[4]: https://github.com/stefanha/qemu/compare/master...virtio-vhost-user
[5]: https://lists.nongnu.org/archive/html/qemu-devel/2019-04/msg03082.html
[6]: https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.h...
[7]: https://firecracker-microvm.github.io/
[8]: https://github.com/cloud-hypervisor/cloud-hypervisor
[9]: https://github.com/cloud-hypervisor/cloud-hypervisor/blob/b4d04bdff6a7e2c3da...

Overall, it's been frustrating for me to try things, and discover
they're not going to work, or not going to work as well as some other
thing, and make a call on whether to keep going on something I know is
the worse option or switch to the better thing.  I have to keep
reminding myself that Spectrum is a research project, and there are
always going to be false starts like this.  Lots of what we're doing is
either very unusual (virtio-vhost-user) or brand new (interguest
Wayland), after all.

Alyssa Ross