I sent the last This Week in Spectrum in August. Since then, it's been pretty quiet. What happened?
Towards the end of last year, I was aggressively pushing to achieve a funding milestone, but was starting to struggle to keep myself going. After that last TWiS, I decided to stop posting them for a while to fully concentrate on the push. I continued like that for two more months, before getting to the point where I was so burned out I just couldn't keep going, with the milestone still not done, the work I had done so far on it unpublished (and not in a good state to be published), and time running out.
Fortunately, I was able to talk to NLnet, and they agreed to extend my funding deadlines and renegotiate the milestones. With that sorted out, in late November I began a hiatus from Spectrum to allow myself to recover from the burnout. I'm very grateful to my GitHub Sponsors, thanks to whom I was able to take this time to recover.
I tried to come back at the end of January, but after a single day of Spectrum work I was feeling much the same as I had been in November.
This week, though, something changed! I've been working on Spectrum again for the last week, and I've been feeling great about it!
Now, there were three months of work between the last TWiS in August and the start of my hiatus. That's a lot of work I haven't written about. But I think the only way this is really going to work is if I mostly leave it unwritten about for now, because TWiS tends to be thousands of words long even when it's covering a single week of work. So I'm just going to talk about what I've done in the last week, explaining context from previous work as necessary.
With all that in mind, here we go!
"Qubes-lite with KVM and Wayland"
------------------------------------
First up is something somebody else did, which is always nice! Thomas has been doing his own great work replacing his Qubes system with a system built on NixOS, using KVM and Wayland.
Thomas got in touch on the Spectrum discuss mailing list in January asking some questions about how some things were implemented, and we went back and forth a bit then (although not as much as I'd have liked, due to my burnout :/). Since then, he's come up with some awesome stuff, like an OCaml implementation of a virtio-wayland proxy (equivalent to Sommelier in Chromium OS).
In the conversation that followed Thomas's article, he raised another interesting idea: what if we had a Sommelier-like Wayland proxy that ran in the same security domain as the compositor? That way, we could filter the protocol to allow only the extensions we want, and implement our own permissions where the compositor doesn't. The proxy, rather than the compositor, would then be responsible for securely handling Wayland messages from an untrusted guest. That would be beneficial because it would be a lot less code to audit than a whole compositor, and because it could be written in a memory-safe language. It would also avoid the need to write or find a secure Wayland compositor; in fact, it would let us use virtually any compositor without having to worry nearly as much about its security credentials.
I'm very excited by that idea, and I've been asking myself for the past few days why I didn't think of it before! I've thought several times "it's quite nice that Sommelier gives us a known subset of Wayland, but it's a shame we can't rely on it since it runs inside an application guest". But for some reason it just never occurred to me to follow that idea one step further, to "what if we had a similar program that ran in the compositor's security domain".
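To make the filtering idea a little more concrete, here's a minimal Python sketch of one small piece such a proxy might contain: hiding Wayland globals the guest shouldn't see, and refusing binds to them. It's purely illustrative, not code from Thomas or from Spectrum. It assumes each buffer holds whole messages, a little-endian host, and that the guest's wl_registry object id is already known; the allowlist and all class and function names are hypothetical.

```python
import struct

# Hypothetical policy: the only Wayland globals this proxy exposes to a guest.
ALLOWED_INTERFACES = {"wl_compositor", "wl_shm", "xdg_wm_base"}

# Message header: object id, then (size << 16) | opcode. Wayland uses host
# byte order; this sketch assumes a little-endian host.
HEADER = struct.Struct("<II")
WL_REGISTRY_GLOBAL = 0  # event opcode of wl_registry.global
WL_REGISTRY_BIND = 0    # request opcode of wl_registry.bind


def split_messages(buf):
    """Yield (object_id, opcode, message_bytes) for each complete message in buf."""
    offset = 0
    while offset < len(buf):
        if len(buf) - offset < HEADER.size:
            raise ValueError("truncated header")
        obj, word = HEADER.unpack_from(buf, offset)
        size, opcode = word >> 16, word & 0xFFFF
        if size < HEADER.size or size % 4 or offset + size > len(buf):
            raise ValueError("invalid message size")
        yield obj, opcode, buf[offset:offset + size]
        offset += size


def read_uint(msg, pos):
    """Read a 32-bit unsigned argument, refusing to run past the message."""
    if pos + 4 > len(msg):
        raise ValueError("truncated argument")
    return struct.unpack_from("<I", msg, pos)[0]


def read_string(msg, pos):
    """Decode a string argument: u32 length (including the NUL), then the bytes."""
    length = read_uint(msg, pos)
    if length == 0 or pos + 4 + length > len(msg):
        raise ValueError("invalid string")
    return msg[pos + 4:pos + 3 + length].decode()


def encode_global(registry_id, name, interface, version):
    """Build a wl_registry.global event (handy for exercising the filter)."""
    raw = interface.encode() + b"\0"
    body = struct.pack("<II", name, len(raw)) + raw + b"\0" * (-len(raw) % 4)
    body += struct.pack("<I", version)
    word = ((HEADER.size + len(body)) << 16) | WL_REGISTRY_GLOBAL
    return HEADER.pack(registry_id, word) + body


class RegistryFilter:
    """Per-guest filter for wl_registry traffic (hypothetical design)."""

    def __init__(self, registry_id):
        self.registry_id = registry_id
        self.advertised = set()  # global names the guest has been told about

    def filter_events(self, buf):
        """Compositor -> guest: drop wl_registry.global events for disallowed interfaces."""
        out = bytearray()
        for obj, opcode, msg in split_messages(buf):
            if obj == self.registry_id and opcode == WL_REGISTRY_GLOBAL:
                name = read_uint(msg, 8)
                if read_string(msg, 12) not in ALLOWED_INTERFACES:
                    continue  # the guest never learns this global exists
                self.advertised.add(name)
            out += msg
        return bytes(out)

    def check_requests(self, buf):
        """Guest -> compositor: False if any wl_registry.bind targets a hidden global."""
        for obj, opcode, msg in split_messages(buf):
            if obj == self.registry_id and opcode == WL_REGISTRY_BIND:
                if read_uint(msg, 8) not in self.advertised:
                    return False
        return True
```

The bind check deliberately uses the global's numeric name, which the compositor assigned, rather than the interface string in the bind request, which comes from the untrusted guest. A real proxy would also need to track object ids, cope with partial reads, and pass file descriptors through, none of which is shown here.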
: https://roscidus.com/blog/blog/2021/03/07/qubes-lite-with-kvm-and-wayland/
: https://spectrum-os.org/lists/archives/spectrum-discuss/CAG4opy_hz_ESEpY0TqJ...
: https://spectrum-os.org/lists/archives/spectrum-discuss/CAG4opy_JztAH3tD+ZUq...
ucspi-vsock is a program I started writing in September. VSOCK is a special Linux socket type for communication between VMs and their host, and I'm using it to let services running in VMs (a USBIP daemon, for example) notify the host, and by extension other VMs, when they become ready to accept connections. UCSPI is a standard command interface that makes socket communication possible using standard IO primitives, so programs don't have to be taught how to make connections for every kind of socket. There are several UCSPI implementations for common socket types like Internet and Unix domain sockets, but until now there wasn't one for VSOCK.
So, for example, in a VM we can do:
vsockclient 2 $port sh -c 'echo >&7'
Which will connect to the host over VSOCK (in VSOCK, the host is always address 2), write a newline, and close the connection. UCSPI lets us implement this with standard shell tools. There's an equivalent "vsockserver" program for the other end, which runs a command whenever a connection is received.
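A UCSPI client hands the connection to the program it runs as file descriptor 6 (for reading) and 7 (for writing), which is why the example writes to `>&7`. Here's a rough Python sketch of that handoff. It is not ucspi-vsock's code: the function names are made up, and it relies on Python's `socket.AF_VSOCK` support, which is Linux-only.

```python
import os
import socket

HOST_CID = 2  # in VSOCK, the host is always address (CID) 2


def exec_ucspi_client(sock, argv):
    """Replace this process with argv, UCSPI-client style.

    The connection ends up on descriptor 6 (read) and 7 (write).
    Never returns on success.
    """
    fd = sock.fileno()
    os.dup2(fd, 6)
    os.dup2(fd, 7)
    # Make sure both survive exec, even if fd already happened to be 6 or 7.
    os.set_inheritable(6, True)
    os.set_inheritable(7, True)
    if fd not in (6, 7):
        sock.close()  # 6 and 7 still refer to the same connection
    os.execvp(argv[0], argv)


def vsock_client(cid, port, argv):
    """Connect to (cid, port) over AF_VSOCK, then exec argv with the connection."""
    sock = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    sock.connect((cid, port))
    exec_ucspi_client(sock, argv)
```

Calling `vsock_client(HOST_CID, port, ["sh", "-c", "echo >&7"])` would mirror the command line above, with `port` standing in for `$port`. The UCSPI spec also describes environment variables (such as `PROTO`) that a client should set; this sketch leaves them out.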
Anyway, this week I went back to ucspi-vsock to fix some bugs. As usual (and I'm happy to realise this is still the case after all these months!), Cole was kind enough to review my patches.
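On the host side, waiting for one of these readiness notifications boils down to accepting a connection and reading until the newline arrives. Here's a minimal Python sketch, assuming the newline convention from the `vsockclient` example; in practice `vsockserver` would run a command per connection instead, and the function names here are hypothetical.

```python
import socket


def wait_for_ready(listener, timeout=None):
    """Accept one connection and report whether the peer signalled readiness.

    Readiness is a newline; a peer that closes without sending one counts
    as failure. Raises socket.timeout if nobody connects within timeout.
    """
    listener.settimeout(timeout)
    conn, _ = listener.accept()
    with conn:
        data = b""
        while b"\n" not in data:
            chunk = conn.recv(64)
            if not chunk:
                return False  # closed before signalling readiness
            data += chunk
    return True


def vsock_listener(port):
    """Listen for VSOCK connections from any VM on the given port (Linux only)."""
    sock = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    sock.bind((socket.VMADDR_CID_ANY, port))
    sock.listen()
    return sock
```

Keeping `wait_for_ready` independent of the address family means the same logic can be exercised over ordinary TCP or Unix sockets, with only `vsock_listener` tied to VSOCK.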
: https://spectrum-os.org/git/ucspi-vsock/
: https://man7.org/linux/man-pages/man7/vsock.7.html
: https://cr.yp.to/proto/ucspi.txt
: https://spectrum-os.org/lists/archives/spectrum-devel/20210309154048.14474-1...
: https://spectrum-os.org/lists/archives/spectrum-devel/20210309171816.8589-1-...
: https://spectrum-os.org/lists/archives/spectrum-devel/20210310204516.20041-1...
: https://spectrum-os.org/lists/archives/spectrum-devel/20210310204555.20725-1...
Interguest networking
---------------------
The bulk of my work this week has been in Spectrum's Nixpkgs, which is where (for now, at least) all the VM definitions and stuff live. Mostly, I've been trying to clean up months of frantic work into something I can actually publish as a series of patches, so it's out there instead of just on my computer!
My focus at the moment is on doing this with the interguest networking implementation I wrote last year. The plan for this remains the same as last year: have a VM that manages all network hardware access and acts as a router for other VMs that need to talk to the outside world. Eventually, this will hopefully use virtio-vhost-user, which will allow us to avoid going through a networking stack on the host at all (or even having one), but until virtio-vhost-user is further along, we'll use bridge devices on the host to connect client VMs to the router. A very basic version of the latter is implemented and working for me locally: I can run an application VM, and have it connect to another VM which runs the drivers for my Ethernet port using VFIO (PCI passthrough).
This code was a big ball of mud, and I'd also subtly broken it when I moved on from interguest networking last year to try to get a proof of concept going for another form of interguest communication. So I spent this week trying to page all the context I'd lost in the last few months back into my brain, making lots of partial git commits, writing commit messages, and fixing it up so that it works again.
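For anyone unfamiliar with the host-bridge arrangement described above, here's a hypothetical Python sketch that generates the iproute2 commands to attach each VM's tap device to one shared bridge, putting the client VMs and the router VM on the same Ethernet segment. It's an illustration, not Spectrum's implementation: the bridge and tap names are made up, it assumes the tap devices already exist (created by the VMM), and by default it only prints the commands.

```python
import shlex
import subprocess


def bridge_commands(bridge, taps):
    """Return iproute2 argv lists that create `bridge` and attach each tap to it."""
    cmds = [["ip", "link", "add", "name", bridge, "type", "bridge"]]
    for tap in taps:
        # Attach the VM's tap device to the bridge, then bring it up.
        cmds.append(["ip", "link", "set", "dev", tap, "master", bridge])
        cmds.append(["ip", "link", "set", "dev", tap, "up"])
    cmds.append(["ip", "link", "set", "dev", bridge, "up"])
    return cmds


def run_commands(cmds, dry_run=True):
    """Print the commands (default) or run them; running needs CAP_NET_ADMIN."""
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical names: one tap for the router VM, one for an application VM.
    run_commands(bridge_commands("br-vmnet", ["tap-router", "tap-app"]))
```

The bridge only joins the VMs at layer 2; routing to the outside world is still the router VM's job, which is why the host doesn't need an IP address on the bridge in this arrangement.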
I think what I have is about ready to publish -- hopefully you'll see some patches next week. But what exists at the moment is still very limited -- the biggest limitation is that the router VM doesn't support hotplugging, so a VM that wants to connect to it actually has to be started first, before the router VM. Not good for something that should be run as a system service and likely started on boot! To overcome this limitation, I'll probably have to add support in crosvm for adding network devices at runtime using the control socket. I think this should be easy enough, and I haven't looked at what's happened in crosvm + rust-vmm since I've been away, so it's even possible somebody else will have done this already.
So, the todo list for next week is getting the initial interguest networking PoC patchset posted. That'll come with a nice little demo other people will be able to try out if they want to. And then, I'll work on improving it further to the point where it's actually practical. I don't want to spend /too/ much time on this, since ultimately I do want to be using a virtio-vhost-user stack, which is quite different, but it's important to have an implementation of this so that further development that doesn't care about the implementation details has something to be tested against.
My biggest worry at the moment is burning out again. Working in moderation isn't easy for me (I tend to get very sucked into things), but it's important for the project that I find a way to keep my work sustainable, so we don't lose time like this again.
I'm still a little hesitant to say "this is it, I'm back for real". But the signs are looking good.
Thank you for sticking with me through this.