Last week I wasn't feeling well, so there was no This Week in Spectrum.
Where we left off, I had been attempting to port vhost-user-net support from cloud-hypervisor to crosvm. I'd been trying to port the first incarnation of the code in cloud-hypervisor to the version of crosvm that was current when that code was added, thinking that would be easier because the two codebases would have been closer together at that point. But I ran into the problem that this earliest incarnation of the vhost-user-net code from cloud-hypervisor didn't actually work (at least with the backend I was attempting to test it with). I'd been attempting to figure out exactly which changes were required to make it work, but hadn't been successful with that yet, and I thought I'd probably need to start the port over, from the latest cloud-hypervisor and crosvm code.
The next day, I decided to give my previous strategy one more try, though, and an hour or two later, I found the required cloud-hypervisor change, applied it to crosvm, and it worked! So I now have a crosvm tree capable of vhost-user-net.
This means that it's looking good for my plans for inter-guest networking and network hardware isolation. With that in place, I decided to start thinking about other kinds of hardware isolation and inter-VM communication, and that's what I did for most of the last two weeks. Let's go through them:
Files will be shared between VMs using virtio-fs. This has the unique feature of (soon) being able to bypass guest page caches, and keep only a single shared cache between VMs. That brings a performance improvement, and as I understand it, it should also reduce memory consumption, because each VM won't have to maintain its own copy of a disk-backed page. Of course, this feature (DAX) is also a big side channel, so it won't be appropriate for all use cases. But I think for some of the things people want to do with Spectrum, this will be very important.
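To make that concrete, here's a rough sketch of what sharing a directory over virtio-fs looks like with today's tooling. The socket path, source directory, tag, and mount point are all made up for illustration, and the dax mount option depends on the still-in-progress DAX support being present in both the VMM and the guest kernel:

```shell
# Host side: serve a directory over vhost-user with virtiofsd
# (socket path and source directory are illustrative).
virtiofsd --socket-path=/tmp/vhost-fs.sock -o source=/srv/shared

# Guest side: mount the shared filesystem by its tag. The dax option
# asks for mappings straight from the single host-side cache, instead
# of duplicate copies in each guest's page cache.
mount -t virtiofs myfs /mnt -o dax
```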
The problem with this is that, because it uses the host kernel's page cache, the host has to know about the filesystem being shared -- there's no running virtiofsd in a VM if we want DAX. But I'd really like it if a (non-boot) block device could be used as a filesystem without the host having to actually talk to the device. I was stuck here, but edef pointed out to me that we could use the kernel's 9P support to attach the block device to a VM, and then mount the filesystem in the host over 9P, either over a network connection or (ideally) vsock. It looks like the kernel should be able to handle 9P over vsock, but I haven't tested that yet.

We can use existing virtiofsd and 9P software (there are promising Rust implementations of each), and harden them against potential vulnerabilities like directory traversal using kernel features like RESOLVE_BENEATH and RESOLVE_NO_XDEV. For the boot device, maybe there's no reason not to just mount it using the host kernel, or maybe there's something to be gained by reading a small bootstrap payload into memory from the start of the disk once, and then making all future communication go via a VM. I'm not really sure yet. But the important thing is that we'll have mechanisms for all of this in place. Maybe we'll decide that non-boot devices should just go over inter-VM 9P, but in any case, we'll still need all these pieces.
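For the 9P side, the network-connection variant is already mountable with the kernel's in-tree client; a sketch (the server address is made up, and the vsock route is the part I haven't tested, so it isn't shown):

```shell
# Mount a filesystem exported over 9P from another VM, using the
# kernel 9P client's TCP transport. 564 is the conventional 9P port;
# 9p2000.L is the Linux-oriented protocol dialect.
mount -t 9p -o trans=tcp,port=564,version=9p2000.L 192.168.100.2 /mnt
```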
GPU isolation should be possible by forwarding the GPU to a VM, but there are a few problems here. The first is that rendered surfaces would have to be copied via shared memory to the VM with the GPU before being sent to the GPU. Additionally, sharing the GPU between VMs for rendering at all would require significantly more work. The result is that graphics performance using an isolated GPU will probably be poor, at least for now. The final problem is that passthrough of integrated GPUs seems to be very difficult to get right. I will probably need to acquire some hardware that I've seen a report of this working on, so I can figure out what I've been doing wrong on the two computers I've tried it on so far. I suspect that I will get GPU isolation working, but I'm not sure how reliable or performant it will be.
For generic USB devices, I expect to be able to take an approach similar to Qubes, having a VM to handle interactions with the hardware USB controller, and exposing individual USB devices over USB/IP to other VMs. It would be nice if I could use vsock for this too.
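Roughly, with the standard usbip tools, I'd expect the flow between the USB controller VM and a client VM to look something like this (the bus ID and address are illustrative, and this uses the TCP transport rather than the vsock transport I'd prefer):

```shell
# In the USB controller VM: export a device by its bus ID.
usbip bind -b 1-1.2
usbipd          # serve exported devices (TCP port 3240 by default)

# In the client VM: attach the remote device, which then appears
# as a local USB device.
usbip attach -r 192.168.100.1 -b 1-1.2
```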
Philipp registered a Matrix room and bridged it to the #spectrum IRC channel. I'm told that this should make it easier for Matrix users to join the room, since some bug in Matrix's IRC bridge prevents people from joining from Matrix the usual way. Philipp also sent a patch to improve the instructions for Matrix users joining the channel on the website. Thanks Philipp!
I sent the previously requested patch to resolve ambiguities in the vhost-user spec. No response yet, though. I'll probably resend it some time soon.
I'm finding it hard to keep going at the moment. The stuff I'm doing now is probably the hardest part of implementing Spectrum, and it's frustrating to realise that not everything I want to do is going to be possible. So much of the KVM ecosystem assumes that things will be host<->guest, and there's not always an easy solution. But, whatever we end up with, it's going to be a lot better than what I'm using today, and what lots of other people are using today. I think I'm going to be able to deliver a good experience with a fairly high degree of protection against malicious hardware. But it's not going to be perfect.
I'm pushing quite hard to make it over the line with my hardware isolation funding milestone. I'm so close, and I'm about to need the money. But once I've hit that, I think I'm going to need a break. This stuff is gruelling.