Last week I wasn't feeling well, so there was no This Week in Spectrum.
Where we left off, I had been attempting to port vhost-user-net support from cloud-hypervisor to crosvm. I'd been trying to port the first incarnation of the code in cloud-hypervisor to the version of crosvm that was current when that code was added, thinking that would be easier because the two codebases would have been closer together at that point. But I ran into the problem that this earliest incarnation of the vhost-user-net code from cloud-hypervisor didn't actually work (at least with the backend I was attempting to test it with). I'd been attempting to figure out exactly which changes were required to make it work, but hadn't been successful with that yet, and I thought I'd probably need to start the port over, from the latest cloud-hypervisor and crosvm code.
The next day, I decided to give my previous strategy one more try, though, and an hour or two later, I found the required cloud-hypervisor change, applied it to crosvm, and it worked! So I now have a crosvm tree capable of vhost-user-net.
This means that it's looking good for my plans for inter-guest networking and network hardware isolation. With that in place, I decided to start thinking about other kinds of hardware isolation and inter-VM communication, and that's what I did for most of the last two weeks. Let's go through them:
Files will be shared between VMs using virtio-fs. This has the unique feature of (soon) being able to bypass guest page caches, and keep only a single shared cache between VMs. That brings a performance improvement, and as I understand it, it should also reduce memory consumption, because each VM won't have to maintain its own copy of a disk-backed page. Of course, this feature (DAX) is also a big side channel, so it won't be appropriate for all use cases. But I think for some of the things people want to do with Spectrum, this will be very important.
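To make that concrete, here's a rough sketch of what sharing a directory over virtio-fs looks like with today's tooling. The socket path, source directory, tag, and mount point are all made up for illustration, and the dax mount option depends on the still-in-progress DAX support being present in both the VMM and the guest kernel:

```shell
# Host side: serve a directory over vhost-user with virtiofsd
# (socket path and source directory are illustrative).
virtiofsd --socket-path=/tmp/vhost-fs.sock -o source=/srv/shared

# Guest side: mount the shared filesystem by its tag. The dax option
# asks for mappings straight from the single host-side cache, instead
# of duplicate copies in each guest's page cache.
mount -t virtiofs myfs /mnt -o dax
```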
The problem with this is that, because it uses the host kernel's page cache, the host has to know about the filesystem being shared -- there's no running virtiofsd in a VM if we want DAX. But I'd really like it if a (non-boot) block device could be used as a filesystem without the host having to actually talk to the device. I was stuck here, but edef pointed out to me that we could use the kernel's 9P support to attach the block device to a VM, and then mount the filesystem in the host over 9P, either over a network connection or (ideally) vsock. It looks like the kernel should be able to handle 9P over vsock, but I haven't tested that yet.

We can use existing virtiofsd and 9P software (there are promising Rust implementations of each), and harden them against potential vulnerabilities like directory traversal using kernel features like RESOLVE_BENEATH and RESOLVE_NO_XDEV. For the boot device, maybe there's no reason not to just mount it using the host kernel, or maybe there's something to be gained by reading a small bootstrap payload into memory from the start of the disk once, and then making all future communication go via a VM. I'm not really sure yet. But the important thing is that we'll have mechanisms for all of this in place. Maybe we'll decide that non-boot devices should just go over inter-VM 9P, but in any case, we'll still need all these pieces.
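For the 9P side, the network-connection variant is already mountable with the kernel's in-tree client; a sketch (the server address is made up, and the vsock route is the part I haven't tested, so it isn't shown):

```shell
# Mount a filesystem exported over 9P from another VM, using the
# kernel 9P client's TCP transport. 564 is the conventional 9P port;
# 9p2000.L is the Linux-oriented protocol dialect.
mount -t 9p -o trans=tcp,port=564,version=9p2000.L 192.168.100.2 /mnt
```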
GPU isolation should be possible by forwarding the GPU to a VM, but there are a few problems here. The first is that rendered surfaces would have to be copied via shared memory to the VM with the GPU before being sent to the GPU. Additionally, sharing the GPU between VMs for rendering at all would require significantly more work. The result is that graphics performance using an isolated GPU will probably be poor, at least for now. The final problem is that passthrough of integrated GPUs seems to be very difficult to get right. I will probably need to acquire some hardware that I've seen a report of this working on, so I can figure out what I've been doing wrong on the two computers I've tried it on so far. I suspect that I will get GPU isolation working, but I'm not sure how reliable or performant it will be.
For generic USB devices, I expect to be able to take an approach similar to Qubes, having a VM to handle interactions with the hardware USB controller, and exposing individual USB devices over USB/IP to other VMs. It would be nice if I could use vsock for this too.
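Roughly, with the standard usbip tools, I'd expect the flow between the USB controller VM and a client VM to look something like this (the bus ID and address are illustrative, and this uses the TCP transport rather than the vsock transport I'd prefer):

```shell
# In the USB controller VM: export a device by its bus ID.
usbip bind -b 1-1.2
usbipd          # serve exported devices (TCP port 3240 by default)

# In the client VM: attach the remote device, which then appears
# as a local USB device.
usbip attach -r 192.168.100.1 -b 1-1.2
```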
Philipp registered a Matrix room and bridged it to the #spectrum IRC channel. I'm told that this should make it easier for Matrix users to join the room, since some bug in Matrix's IRC bridge prevents people from joining from Matrix the usual way. Philipp also sent a patch to improve the instructions for Matrix users joining the channel on the website. Thanks Philipp!
I sent the previously requested patch to resolve ambiguities in the vhost-user spec. No response yet, though. I'll probably resend it some time soon.
I'm finding it hard to keep going at the moment. The stuff I'm doing now is probably the hardest part of implementing Spectrum, and it's frustrating to realise that not everything I want to do is going to be possible. So much of the KVM ecosystem assumes that things will be host<->guest, and there's not always an easy solution. But, whatever we end up with, it's going to be a lot better than what I'm using today, and what lots of other people are using today. I think I'm going to be able to deliver a good experience with a fairly high degree of protection against malicious hardware. But it's not going to be perfect.
I'm pushing quite hard to make it over the line with my hardware isolation funding milestone. I'm so close, and I'm about to need the money. But once I've hit that, I think I'm going to need a break. This stuff is gruelling.