Some time in the next week, the #spectrum:libera.chat Matrix room is
going to stop working, as a result of a dispute between Libera.Chat and
matrix.org[1].
So, unfortunately, if you want to keep using Matrix to participate in
the Spectrum chat, you need to take some action:
1. Leave #spectrum:libera.chat
2. Join #spectrum:fairydust.space
It's very important to do it in that order, because otherwise the bridge
gets confused and bad things happen.
If something goes wrong, rejoin either channel, ask for help, and we'll
figure it out.
On the positive side, the new Matrix room is controlled by me, rather
than by the administrators of the Matrix-Libera bridge. This means we
can have nice things like room history, and if something like this
happens again (needing to use a different bridge, moving IRC network,
etc.), I should be able to use my room admin powers to change the setup
transparently, without any action being required from individual Matrix
users. So this is hopefully the last time I'm going to have to ask
Matrix users to join a new room.
[1]: https://libera.chat/news/matrix-deportalling
Hi all, I thought I'd try a different format of update. It's
difficult to find the time for the big This Week in Spectrum updates
I've tried to do before, but I'd like to provide some sort of account
of what I've been doing.
So, here's an overview of what I did in March. I'm happy to expand on
any of it that sounds interesting — just hit Reply All and tell me
what you'd like to hear more about!
I'm also interested to hear what you think of this status update
format. I'd like to get better about communicating what I'm working
on, in a sustainable way. Let me know if you have any suggestions!
Miscellaneous
-------------
• Edited and published demo video[1]
• Set up an IRC bot to post incoming mailing list messages
• Switched from Busybox's modprobe to kmod
• Removed unused dependencies
• Various other cleanups and fixes
• Started work towards CI for Spectrum
• Prototyped a shared base image for application VMs
[1]: https://diode.zone/w/dWAWHR38Zu3feRtDKjVEJb
virtiofs investigation
----------------------
• Prototyped virtiofs VM filesystem access
• Reported a bug: "Can't run unprivileged any more due to setgroups"[2]
• Participated in discussion and testing of Musl port[3][4][5]
[2]: https://gitlab.com/virtio-fs/virtiofsd/-/issues/36
[3]: https://github.com/slp/capng/pull/2#issuecomment-1059976861
[4]: https://github.com/slp/capng/pull/3
[5]: https://github.com/rust-lang/libc/pull/2713
Spectrum-related upstream Nixpkgs commits
-----------------------------------------
• lvm2: don't use targetPlatform (05a6c124e65)
• coreutils: add debug output (e30f0f31e8d)
• pkgsMusl.systemd: fix build for 250.4 (39eee39fd92)
• nghttp2: only run tests on GNU (8685cea963b)
• python3.pkgs.importlib-metadata: fix cross (3c7b77e638b)
• spidermonkey: use the same LLVM as rustc (3ff5f0eb764)
• pkgsStatic.stdenv.cc.cc: put static libs in $lib (12c37aec377)
• Revert "gcc: Always pass `--enable-shared` by default" (c6dd11ca39a)
• libudev-zero: 1.0.0 -> 1.0.1 (c7b7ad77985)
• linux_latest: 5.16.14 -> 5.17 (58ae11758e8)
• crosvm: 81.12871.0.0-rc1 -> 99.14468.0.0-rc1 (6aefdafbed9)
• shadow: 4.8.1 -> 4.8.11 (8d35d7e2bf1)
• pkgsMusl.libnetfilter_conntrack: fix build (2cc5ec86571)
• pkgsMusl.systemdMinimal: fix build (b8734c50e29)
• linux.configfile: fix alts containing "/m" (fb079c3110d)
• cloud-hypervisor: 21.0 -> 22.0 (36a211e1ee3)
• edk2: 202108 -> 202202 (9222b68380e)
• kmod: add dev and lib outputs (dc1303185f8)
• systemd: update patchShebangs comment (a0bfc8e7c1f)
• systemd: fix a whole bunch of typos (479b1cb510b)
Pending Spectrum related Nixpkgs PRs
------------------------------------
• crosvm: add support for virgl_renderer{,_next} (#165128)
• qemu: 6.2.0 -> 7.0.0 (#165291)
Spectrum infra related upstream Nixpkgs commits
-----------------------------------------------
• irccat: init at 0.4.8 (ce8cbe3c01f)
• git: enable debug info (4345b27dedf)
• cgit-pink: init at 1.3.0 (deab83e1167)
• mailman-web: fix django version check removal (3512f5b7075)
Demo video related upstream Nixpkgs commits
-------------------------------------------
• ccsymbols: init at 2020-04-19 (cf7556eea5a)
Hi all,
For the last couple of weeks I've been working on a video demo of some
things I've been working on in Spectrum:
https://diode.zone/w/dWAWHR38Zu3feRtDKjVEJb
Happy to answer any questions, either here, in the comments, or on IRC.
Alyssa
Hi everyone, especially people who're new to the project after seeing it
on Hacker News recently.
Those of you who've been around for a while will remember that the end
of last year was a busy time for Spectrum because of funding cycles, and
that's going to be the case again this year.
As a result of this, I'm going to have my head down until the end of the
year, and am not planning on doing more This Week in Spectrum this year.
I love TWiS, but it takes hours to write every week I manage to get it
done, and I need to spend those hours elsewhere at the moment. It'll be
back next year.
If you do still want to keep up with development (and I hope you do!),
I'm trying to talk in more detail about what I'm working on in the
#spectrum IRC channel. There's been a good reaction to this so far, and
the conversation has been extremely lively for the last few days. There
are consistently over 120 people in the channel now, which is amazing!
You can find information about how to join the channel through Matrix or
IRC on the website[1].
Hope to see you there!
[1]: https://spectrum-os.org/participating.html#irc
In reference to your upcoming vaccination ...
Most people don't experience side effects, but
some do; young adults are the most often affected.
Apparently the spike proteins synthesized and
released into the bloodstream like to attach to
epithelial cells, particularly in ovaries and
artery walls, attracting unwelcome attention of
the immune system to those loci.
A paper published last week covered results of a
randomized controlled trial administering arginine
supplements along with vaccinations, that proved
to substantially reduce incidence of unwelcome
side effects.
The good news is you can buy arginine right off
the shelf, and cheaply. You might not find out
if you *would have* got those side effects, which
may be disappointing, but that is much better
than actually having them. If in fact you would
not have, the excess arginine is at absolute worst
totally harmless.
If you don't feel like buying arginine tablets,
you can fill up on walnuts, almonds, or even
peanuts, which all deliver a nice surplus of
arginine.
(This will be my only mention of vaccines here.
I hope the mention was not unwelcome.)
Hi! I've had a busy day today and am pretty tired, so I'm not sure how
coherent my writing is at the moment. But I'd rather get this out on
time, especially since tomorrow is also busy.
As I said last week[1], I took some time off this week as a preventative
measure against burnout.
[1]: https://spectrum-os.org/lists/archives/spectrum-devel/87lf51ybwk.fsf@alyssa…
rust-vmm
--------
One of my patches[2] was accepted, but another[3] is still waiting. As
I said last week, I have more fixes for rust-vmm planned, but want to
let them catch up with the changes I've already sent upstream first so I
know what base I'm using for the next stuff. I expect the one that's
still waiting to be accepted next week.
[2]: https://github.com/rust-vmm/vmm-sys-util/pull/135
[3]: https://github.com/rust-vmm/vhost/pull/69
spectrum-live
-------------
Last week, I'd just integrated dm-verity into the Spectrum live image
I've been working on. When it came time to work on the actual root
filesystem, instead of the initramfs, I hit a bit of a brick wall. I
realised that trying to generate a whole operating system image using
Nix was giving me real writer's block. There was too much in between me
and how the files ended up on disk, and that meant there was too much
overhead to keep in mind when I was thinking about how things should be
designed and laid out. It might feel like making a Linux root
filesystem should be a solved problem, but Spectrum has a bunch of
special requirements. You might want to do something like start a VM
for each hardware device of a specific type, and that's something that
isn't really addressed by most standard stuff. All this stuff is
definitely solvable, but it requires some experimentation to get right,
and Nix was getting a bit in the way of that.
So I created a new directory, and I wrote a Makefile that builds an
ext4 image, and I just started putting files in an etc/ directory. This
made reasoning about the system way easier, and I was immediately making
progress.
Nix is great for building known targets, and for making customisable
systems (there's no way my Makefile-based system would allow the amount
of customisation I'd easily be able to provide with Nix), but for
experimentation, it's a lot nicer to be closer to the end product. So
once I know how all this should look I'll make it Nix-aware.
Currently, I have a root filesystem with a service manager that can
respond to hardware appearing and disappearing. The next (more exciting)
step will be to have it start some VMs, and assign hardware to them
appropriately. I'm looking forward to getting to that next week. An
interesting challenge I'll have to solve will be figuring out simple
categories (e.g. "ethernet device") from the huge amount of very
specific information the kernel provides. I think I might be able to
abuse the modules.alias file from the kernel, which defines the mappings
from PCI etc. information to default drivers. Then all I'll have to do
will be to write mappings from default drivers to whatever categories I
come up with, or what VM I want to assign them to, or whatever.
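As a rough illustration of the modules.alias idea, here is a minimal Python sketch. It assumes each relevant line has the form "alias <pattern> <module>" and that patterns can be matched shell-glob style. The sample lines and the DRIVER_CATEGORIES table are made up for illustration; the real categories haven't been decided.

```python
from fnmatch import fnmatchcase

# Hypothetical driver-to-category table; the real categories are undecided.
DRIVER_CATEGORIES = {
    "e1000e": "ethernet",
    "xhci_pci": "usb-controller",
}

# Illustrative sample in the style of a modules.alias file.
SAMPLE_ALIASES = """\
# Aliases extracted from modules themselves.
alias pci:v00008086d00001502sv*sd*bc*sc*i* e1000e
alias pci:v*d*sv*sd*bc0Csc03i30* xhci_pci
"""


def parse_modules_alias(text):
    """Return a list of (pattern, module) pairs from modules.alias content."""
    aliases = []
    for line in text.splitlines():
        fields = line.split()
        # Keep only "alias <pattern> <module>" lines; skip comments and blanks.
        if len(fields) == 3 and fields[0] == "alias":
            aliases.append((fields[1], fields[2]))
    return aliases


def categorize(modalias, aliases, categories=DRIVER_CATEGORIES):
    """Map a device's modalias string to a category via its default driver.

    Returns None if no pattern matches, or if the first matching driver
    has no category assigned in the table.
    """
    for pattern, module in aliases:
        if fnmatchcase(modalias, pattern):
            return categories.get(module)
    return None
```

A device's modalias string (for example, what sysfs reports for a PCI device) would be passed to categorize() together with the parsed alias list.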
One neat thing I'm using for the first time here is tar2ext4[4], a
utility program that's part of a larger Microsoft open source project I
don't entirely understand the purpose of. It's really useful for me because
it builds ext4 images entirely in userspace, which will be great for
using in Nix derivations where it's not possible to just mount an ext4
image and write directly to it. For previous Spectrum experiments, I'd
always used SquashFS[5], entirely because I already knew of a tar2sqfs
program[6] that made creating filesystem images really easy. I've added
tar2ext4 to Nixpkgs[7], which will hopefully help other people who have
similar problems discover it.
[4]: https://github.com/microsoft/hcsshim/blob/master/cmd/tar2ext4/tar2ext4.go
[5]: https://en.wikipedia.org/wiki/SquashFS
[6]: https://github.com/AgentD/squashfs-tools-ng
[7]: https://github.com/NixOS/nixpkgs/pull/134650
musl
----
While I was working on the root filesystem, I noticed that mount -a,
which mounts all filesystems described in fstab(5), wasn't working.
This turned out to be because of a bug in Musl's implementation of
getmntent(3), a libc function for parsing fstab files.
So I wrote some tests and a fix and sent them to Musl[8].
[8]: https://inbox.vuxu.org/musl/20210821085420.474615-1-hi@alyssa.is/
Hope that all made sense.
Next week, I'll continue working on Spectrum live, and maybe fix the
next rust-vmm issue I have on my todo list if my final outstanding PR
is merged.
It's a short update this week, because most of what I did was a
continuation of stuff from last week.
rust-vmm
--------
Last week, I mentioned I'd identified some Rust safety issues in
rust-vmm. Most of the patches for these are now up[1][2][3]. The first
has been accepted already, and I expect another to be accepted later
today. There's still a UB issue I'm aware of and haven't sent a fix for
yet, because there are a number of ways to fix it and I wanted to get my
other patches in first before I decided how to fix that one.
I deliberately haven't made any progress on using cloud-hypervisor's
vhost-user-net backend with crosvm, which is what got me looking at this
code in the first place. I want to make sure I can work on
rust-vmm-adjacent things at a pace where I don't get overwhelmed with
having to keep track of loads of patches and whether I've got them
upstream yet. So I'll be putting that work on hold until the current
round of patches are upstreamed.
[1]: https://github.com/rust-vmm/vhost/pull/68
[2]: https://github.com/rust-vmm/vmm-sys-util/pull/135
[3]: https://github.com/rust-vmm/vhost/pull/69
spectrum-live
-------------
For the past little while, in the time when I wasn't writing regular
updates, I've been working on a live system for testing Spectrum. This
will be especially useful for testing things like GPU support, because I
can just build a live image with everything I might need, plug it into
all the computers I want to test, and have everything be automatic from
there. It will also probably evolve directly into what becomes the
Spectrum base system that we'll hopefully all be running as the host
system on our machines at some point.
I shifted my focus back to this this week because of wanting to not get
ahead of myself with rust-vmm. (I have a funding milestone for GPU
support, so getting that checked off soon would be good.) The main
thing I did this week was integrate dm-verity[4], which I did mostly for
fun and to satisfy my curiosity.
dm-verity is a Linux mechanism to efficiently ensure that a read-only
filesystem hasn't been tampered with, by constructing a Merkle tree out
of filesystem block hashes, and providing the root hash to the kernel
when the filesystem is mounted. dm-verity is a _great_ fit for Nix,
because we can generate the hashes at the same time as creating the
filesystem image, and then embed the hash into the initramfs we're also
building. Getting this all working took less than a day. The idea is
that (long) in the future, we'll also implement Secure Boot, which will
make sure the kernel and initramfs haven't been tampered with, and
dm-verity will extend that integrity guarantee to the host system's root
filesystem. I recommend reading "Producing a trustworthy x86-based
Linux appliance"[5] by Matthew Garrett for an overview of how this all
comes together.
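To make the Merkle tree idea concrete, here is a simplified Python sketch. It is not dm-verity's actual on-disk format (there is no salt and no superblock, and the 4096-byte block size and SHA-256 are assumptions). It only shows how per-block hashes get packed into hash blocks, level by level, until a single root hash remains.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this sketch


def _hash_block(block, block_size):
    # Zero-pad a short final block so every hashed block has the same size.
    return hashlib.sha256(block.ljust(block_size, b"\0")).digest()


def merkle_root(image, block_size=BLOCK_SIZE):
    """Return a root hash covering every block of a filesystem image."""
    if not image:
        raise ValueError("image must not be empty")
    # Level 0: one digest per data block.
    level = [
        _hash_block(image[i:i + block_size], block_size)
        for i in range(0, len(image), block_size)
    ]
    digest_size = hashlib.sha256().digest_size
    # Number of digest bytes that fit in one hash block.
    step = (block_size // digest_size) * digest_size
    while True:
        # Pack the digests into hash blocks and hash those in turn.
        packed = b"".join(level)
        level = [
            _hash_block(packed[i:i + step], block_size)
            for i in range(0, len(packed), step)
        ]
        if len(level) == 1:
            return level[0]
```

Changing any byte of the image changes the hash of its block, which changes every hash above it, so only the root hash needs to be trusted (here, by embedding it in the initramfs).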
dm-verity is something that's particularly exciting to me, because it's
very useful to us, but it's something that's generally used to frustrate
end user attempts to control computers they own. In Spectrum, it's
instead a tool that protects the end user against malicious filesystem
changes, while being almost completely transparent to the user if they
do want to modify their own system.
Protecting against root filesystem tampering (which would require a VM
escape or physical device access) is hardly the biggest priority for
Spectrum, but integrating dm-verity was fun, interesting, and provided
good motivation for working on the live image, which is one of the
highest priority bits of the system. (Because I'm tired of having to
say "you can't" when people ask me how they can try out Spectrum.)
[4]: https://lwn.net/Articles/459420/
[5]: https://mjg59.dreamwidth.org/57199.html
This week, I'm going to take a bit of time off as an anti-burnout
defense, but probably not the whole week. I'll still keep an eye on the
rust-vmm patches throughout this time as well, to make sure they're not
delayed in getting accepted upstream.
What a week. Progress has felt a bit slow, but the work has been
consistently interesting, and there have been some exciting developments
in the ecosystem.
Nixpkgs
-------
Last week I'd posted the patches required to get all of Spectrum working
with a more up-to-date Nixpkgs. After last week's email, Cole reviewed
the patch set, so I applied it, and merged the nixpkgs-update branch
into master, so now Spectrum is using a reasonably up-to-date Nixpkgs
for the first time in a long time.
I also said
> Once that's done, it's time for another chromiumOSPackages upgrade, but
> that should be pretty easy this time because we're only one version behind.
Alas, it's never that simple.
Firstly, the chromiumOSPackages update script broke, because the
information about the currently released Chromium OS build seems to have
gone missing. Google is apparently[1] serving build number 13982, but
their published build metadata[2] includes builds 13981 and 13983, but
not 13982. This means it's not possible for me to know what Git
revisions are used in the currently released Chromium OS. Assuming this
is just a one time thing, I hacked the update script to just look at the
previous build, but we should keep an eye on this. If it ever happens
again we should probably implement some sort of mitigation in the update
script.
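A mitigation along those lines could be as small as the following Python sketch. The function name and the idea of passing build numbers as plain integers are assumptions for illustration, not how the real update script is structured.

```python
def latest_known_build(released, known_builds):
    """Pick the newest build with published metadata that is not newer
    than the build currently being served."""
    candidates = [build for build in known_builds if build <= released]
    if not candidates:
        raise LookupError(f"no published metadata at or before build {released}")
    return max(candidates)
```

With the numbers above, latest_known_build(13982, [13981, 13983]) falls back to 13981.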
Once I had new versions of the Chromium OS packages it was time to get
them to build, which was straightforward enough for everything except
crosvm. For Spectrum, we have a patch[3] for crosvm to make it support
VIRTIO_NET_F_MAC[4], which is a mechanism by which the host system can
indicate to the guest kernel what it should set as the MAC address of a
virtual network device. After the update, this patch no longer applied,
because all of a sudden crosvm has two different virtual network device
implementations.
This turns out to be because crosvm has implemented vhost-user[5], a
protocol to allow virtual devices to be implemented outside the VMM
program! This is great news, but it's also surprising. virtiofs[6] was
designed to be implemented with vhost-user, but when crosvm implemented
it a while ago, they became the only implementer to do so in-VMM. It's
great to see them moving in the vhost-user direction, because it makes
it much easier to mix and match virtual device implementations and VMMs.
Most excitingly, I saw a reference to vhost-user-wl, meaning a
standalone implementation of Virtio Wayland. This would allow us to use
Virtio Wayland with other, non-crosvm VMMs, which is great because I
think cloud-hypervisor[7] is probably going to end up being a better fit
for Spectrum, but Virtio Wayland was crosvm's killer app. I'd even
thought about trying to port crosvm's virtio wayland implementation to
vhost-user myself, so it's great to know that when the time comes,
that'll already have been done for me.
While I could have fixed my crosvm patch for all this new virtual
network device code, the introduction of vhost-user support means we
should be able to drop the patch altogether. cloud-hypervisor provides
a vhost-user-net implementation[8] that already supports
VIRTIO_NET_F_MAC, so if we can just get crosvm to talk to
cloud-hypervisor's vhost-user-net implementation we shouldn't have to
carry any patch for this any more.
[1]: https://cros-updates-serving.appspot.com/
[2]: https://chromium.googlesource.com/chromiumos/manifest-versions
[3]: https://spectrum-os.org/git/nixpkgs/tree/pkgs/os-specific/linux/chromium-os…
[4]: https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.html
[5]: https://qemu.readthedocs.io/en/latest/interop/vhost-user.html
[6]: https://virtio-fs.gitlab.io/index.html#overview
[7]: https://github.com/cloud-hypervisor/cloud-hypervisor
[8]: https://github.com/cloud-hypervisor/cloud-hypervisor/tree/db2159b5638eb4ccf…
cloud-hypervisor / rust-vmm
---------------------------
So I started looking at cloud-hypervisor to try to hook this all up, but
it looks like cloud-hypervisor still has some issues where it doesn't
quite follow the vhost-user specification. As I was trying to debug
this, I noticed some UB in rust-vmm[9] (the shared utility code project
for crosvm and its derivatives like cloud-hypervisor and
Firecracker[10]).
Being a good citizen, I started working on a fix for this, and then I
encountered some more issues with functions that should have been marked
as unsafe but weren't. So that's going to need to be fixed too.
But the rust-vmm code is otherwise very high quality and easy to work
with, and they're very responsive, so it shouldn't take long to get
these issues fixed. The affected code is fortunately all to do with
communication between vhost-user backends and the VMM, so it's very very
unlikely that it's anything that could be exploited by a guest.
These sorts of problems are exactly the sort of thing that Rust is
supposed to prevent, so it was disappointing to discover these issues.
But on the other hand, Rust made them stand out like sore thumbs to me,
so even though these issues managed to sneak in, there's definitely
still a huge benefit to using Rust for these sorts of programs.
[9]: https://github.com/rust-vmm
[10]: https://firecracker-microvm.github.io/
In the next week, I'm hoping to get all the rust-vmm issues I've
discovered fixed, and maybe get cloud-hypervisor's backends to speak
proper vhost-user. I'm getting my second vaccination next week as well
though, so we'll see. I might just end up being sick.
Hi, it's been a long time since I've done one of these, but so much
interesting stuff happened this week, and there's been a bit of renewed
interest in the project, that it _really_ needed an update. So here it
is. As always when I'm not keeping up with TWiS, this update is limited
to only things that happened in the last week, because otherwise it
would just be far too long.
Wayland
-------
A problem I've been thinking about for a long time is Wayland access
control. The Linux desktop ecosystem is moving towards access controls
for most functionality through xdg-desktop-portal[1] (which sadly
doesn't seem to have a website). But it doesn't cover stuff that's part
of the core Wayland protocol, like access to the clipboard. And some
compositors (wlroots-based ones) provide extra Wayland protocols for
things like screenshots. And up to now, Wayland compositors haven't
really done any authorization for these potentially dangerous APIs.
(All of this is only really meaningful for sandboxed applications --
running in Flatpak or a Spectrum VM or something. Linux isn't
really set up to do separation between processes running as the same
user without namespaces.)
So earlier this week, I posted[2] to the Wayland mailing list with an
idea I'd been thinking about for a while[3], which was to place a proxy
program in front of the Wayland compositor, that would intercept
client->compositor requests and handle access control. I was quickly
convinced that a proxy wasn't a good idea, but there was a lot of
discussion, and it was really helpful to me figuring out what the right
way to do it might be.
There are really two problems to be solved here, one of which I hadn't
even thought much about. The first is securely identifying a Wayland
client. A compositor needs to be able to form the question "Should
client X be able to do this?", and to do that it needs to be able to
identify a client as client X, and know if it tries to interact with
client X later, that it'll be talking to the same client. In a
non-virtualized system, the obvious way to do this would be getting the
pid of the client from the connection socket and then looking it up in
proc(5) to find out executable path, but this approach is fundamentally
insecure[4].
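For reference, the pid-based approach looks roughly like this in Python on Linux, using the SO_PEERCRED socket option. This is a sketch of the approach being rejected, not a recommendation: between reading the pid and reading the /proc entry, the process may have exited (and its pid been reused) or exec'd a different program.

```python
import os
import socket
import struct


def peer_pid(sock):
    """Return the pid recorded for the peer of a connected Unix socket."""
    # struct ucred is three ints: pid, uid, gid.
    creds = sock.getsockopt(
        socket.SOL_SOCKET, socket.SO_PEERCRED, struct.calcsize("3i")
    )
    pid, _uid, _gid = struct.unpack("3i", creds)
    return pid


def peer_executable(sock):
    """Look up the peer's executable path via /proc (inherently racy)."""
    return os.readlink(f"/proc/{peer_pid(sock)}/exe")
```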
The way forward here (and one that would work for Spectrum) appears to
be the proposed Security Contexts protocol[5], which would allow a
sandbox implementation to provide a security context identifier for a
client before handing off the Wayland connection to the client. Once
the security context had been set up, it wouldn't be allowed to be
modified, so once the sandboxed client was given the connection, it
wouldn't be able to change the security context identifier.
In Spectrum, the security context identifier here would likely be a
unique, user-provided name for the VM, and the security context setup
would be done by the virtio wayland implementation in the VMM.
The second part of this puzzle is how the compositor should decide
whether a client should be allowed to perform a particular operation,
like a paste or going fullscreen (which is risky because it might spoof
a desktop).
There was actually a previous attempt at this a long time ago.
libwsm[7] (short for Wayland Security Module) was a library that
compositors would have integrated to make authorization decisions. But
it wasn't adopted by compositors, and it did some things we now know[8] to
be bad ideas, like setting policy based on the executable path. It also
made compositors responsible for any sort of authorization UI. In my
opinion, it's better to have that done by the external piece, so that
compositors have as little work to do as possible and therefore
authorization is implemented as widely as possible.
The compositor could implement an authorization system entirely on its
own, but this would be a lot of code for each compositor to write, and
it would limit the user to whatever permissions system the compositor
came up with, which might not be able to accommodate their needs. (An
example of this would be a Spectrum user that wanted to allow pasting
between two applications, but not allow pasting between either of those
and any other application.)
It could also be implemented by having the compositors integrate with
something like polkit[6], but compositor authors are reluctant to
integrate directly with a single system like that, and even polkit might
not support everything that would be desirable in an authorization
system. (For example, it might be nice to implement libwsm's concept of
a soft allow, where an action is permitted, but a notification is shown
so the user is aware it's happened.)
So a third solution, suggested by ifreund, a Wayland compositor author,
in the Spectrum IRC channel, would be to have a privileged Wayland
client that the compositor could ask authorization questions to, with a
new protocol. The compositor would know that this particular client
should be privileged because it either wouldn't be in a security
context, or would be given a special security context identifier known
to the compositor ahead of time. Then, implementations of this protocol
could do authorization however they wanted, with the only limitation
being the questions the compositor was programmed to ask them. I think
this is a good way forward, but it'd be important to discuss with more
compositor authors before getting too excited about it.
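To give a feel for what an external authorizer could decide, here is a purely hypothetical Python sketch of the paste example from earlier. No such protocol exists yet. The Decision values, the PASTE_RULES table, and the VM names are all invented, and the identifiers stand in for security context identifiers.

```python
from enum import Enum


class Decision(Enum):
    DENY = "deny"
    ALLOW = "allow"
    # Permitted, but the user is notified (libwsm's "soft allow" idea).
    SOFT_ALLOW = "soft-allow"


# Hypothetical policy: groups of security contexts that may paste to
# each other, and the decision that applies within each group.
PASTE_RULES = [
    (frozenset({"mail", "editor"}), Decision.ALLOW),
    (frozenset({"browser", "notes"}), Decision.SOFT_ALLOW),
]


def authorize_paste(source, target, rules=PASTE_RULES):
    """Answer "may `target` paste data copied in `source`?"."""
    if source == target:
        return Decision.ALLOW
    for contexts, decision in rules:
        if source in contexts and target in contexts:
            return decision
    return Decision.DENY
```

The compositor would only need to know how to ask the question; everything in this sketch would live in the privileged client.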
Next steps from here with Wayland are:
* Figure out what needs to happen to move the security contexts
proposal forward. If it needs an experimental implementation, maybe
we could help with that?
* Inquire about the authorization protocol idea, and see how other
compositor authors would feel about it. If there's a generally
positive reaction, figure out how to move forward with it.
[1]: https://github.com/flatpak/xdg-desktop-portal/
[2]: https://lists.freedesktop.org/archives/wayland-devel/2021-July/041896.html
[3]: https://spectrum-os.org/lists/archives/spectrum-discuss/8735ueudel.fsf@alys…
[4]: https://gitlab.freedesktop.org/wayland/weston/-/issues/206#note_176699
[5]: https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/68
[6]: https://www.freedesktop.org/software/polkit/docs/latest/polkit.8.html
[7]: https://github.com/mupuf/libwsm
[8]: https://lore.kernel.org/lkml/87r1xplsku.fsf@x220.int.ebiederm.org/
Nixpkgs
-------
I created a branch in Spectrum's nixpkgs repository[9] for the
long-overdue merge with upstream, did the merge, and posted some
patches[10] to spectrum-devel that fix all the builds that broke as a
result. I've been letting them sit for a few days hoping for a review.
Once that's done, it's time for another chromiumOSPackages upgrade, but
that should be pretty easy this time because we're only one version behind.
[9]: https://spectrum-os.org/git/nixpkgs
[10]: https://spectrum-os.org/lists/archives/spectrum-devel/20210729100928.196534…
Thanks for keeping up with Spectrum. :)
Surprise!
I sent the last This Week in Spectrum in August. Since then, it's
been pretty quiet. What happened?
Towards the end of last year, I was aggressively pushing to achieve a
funding milestone, but was starting to struggle to keep myself going.
After that last TWiS, I decided to stop posting them for a while to
fully concentrate on the push. I continued like that for two more
months, before getting to the point where I was so burned out I just
couldn't keep going, with the milestone still not done, the work I had
done so far on it unpublished (and not in a good state to be
published), and time running out.
Fortunately, I was able to talk to NLnet, and they agreed to extend my
funding deadlines and renegotiate the milestones. With that sorted
out, in late November I began a hiatus from Spectrum to allow myself
to recover from the burnout. I'm very grateful to my GitHub Sponsors,
thanks to whom I was able to take this time to recover.
I tried to come back at the end of January, but after a single day of
Spectrum work I was feeling much the same as I had been in November.
This week, though, something changed! I've been working on Spectrum
again for the last week, and I've been feeling great about it!
Now, there were three months of work between the last TWiS in August,
and the start of my hiatus. That's a lot of work I haven't written
about. But I think the only way this is really going to work is if I
leave that largely unwritten about for now, because TWiS tends to be
thousands of words long even when it's covering a single week of
work. So I'm just going to talk about what I've done in the last
week, explaining context from previous work as necessary.
With all that in mind, here we go!
"Qubes-lite with KVM and Wayland"[1]
------------------------------------
First up is something somebody else did, which is always nice! Thomas
has been doing his own great work replacing his Qubes system with a
system built on NixOS, using KVM and Wayland.
Thomas got in touch on the Spectrum discuss mailing list in January[2]
asking some questions about how some things were implemented, and we
went back and forth a bit then (although not as much as I'd have
liked, due to my burnout :/). Since then, he's come up with some
awesome stuff, like an OCaml implementation of a virtio-wayland proxy
(equivalent to Sommelier in Chromium OS).
In our further conversation prompted by Thomas's article, Thomas
raised a further interesting idea[3] -- what if we had a
Sommelier-like Wayland proxy, but that ran in the same security domain
as the compositor? That way, we could filter to allow only the
protocol extensions we want to. We could implement our own
permissions when the compositor doesn't. This code would be
responsible for securely handling Wayland messages from an untrusted
guest, which would be beneficial because it would be a lot less code
to audit, and because it could be written in a memory-safe language.
This would avoid the need to write or find a secure Wayland
compositor, and would in fact allow us to use virtually any compositor
without having to worry nearly as much about its security credentials.
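The filtering part of that idea can be sketched in a few lines of Python. This is only an illustration: the allowlist contents are an assumption, and a real proxy would also have to parse and validate the Wayland wire protocol, which is where the hard work is.

```python
# Hypothetical allowlist of Wayland interfaces a guest may see.
ALLOWED_INTERFACES = frozenset({
    "wl_compositor",
    "wl_shm",
    "wl_seat",
    "xdg_wm_base",
})


def filter_globals(advertised, allowed=ALLOWED_INTERFACES):
    """Drop compositor globals the guest should never learn about.

    `advertised` is a list of (name, interface, version) tuples, as the
    compositor would announce them through wl_registry.
    """
    return [
        (name, interface, version)
        for name, interface, version in advertised
        if interface in allowed
    ]
```

A guest that never sees, say, a screenshot protocol in the registry has no way to bind to it.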
I'm very excited by that idea, and I've been asking myself for the
past few days why I didn't think of it before! I've thought several
times "it's quite nice that Sommelier gives us a known subset of
Wayland, but it's a shame we can't rely on it since it runs inside an
application guest". But for some reason it just never occurred to me
to follow that idea one step further, to "what if we had a similar
program that ran in the compositor's security domain".
[1]: https://roscidus.com/blog/blog/2021/03/07/qubes-lite-with-kvm-and-wayland/
[2]: https://spectrum-os.org/lists/archives/spectrum-discuss/CAG4opy_hz_ESEpY0Tq…
[3]: https://spectrum-os.org/lists/archives/spectrum-discuss/CAG4opy_JztAH3tD+ZU…
ucspi-vsock
-----------
ucspi-vsock[4] is a program I started writing in September. VSOCK[5]
is a special Linux socket type that allows for communication between
VMs, and I'm using it to allow services running in VMs (a USBIP
daemon, for example), to notify the host (and by extension other VMs),
when the service in the VM becomes ready to accept connections.
UCSPI[6] is a standard command interface for making it easy to do
socket communication using standard IO primitives, without having to
teach programs how to do socket connections for every kind of socket.
There are several UCSPI implementations for common socket types like
Internet and Unix domain sockets, but until now there was not one for
VSOCK.
So, for example, in a VM we can do:
    vsockclient 2 $port sh -c 'echo >&7'
Which will connect to the host over VSOCK (in VSOCK the host is always
address 2), write a newline, and close the connection. UCSPI lets us
implement this with standard shell tools. There's an equivalent
"vsockserver" program for the other end, that will run a command
whenever a connection is received.
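The descriptor convention behind that one-liner can be tried without
VSOCK at all. Under UCSPI, a client tool runs its program with
descriptor 6 reading from the connection and descriptor 7 writing to
it. In the minimal sketch below, fake_ucspi_client is a made-up
stand-in (it is not part of ucspi-vsock) that opens a plain file on
descriptor 7 in place of a socket:

```shell
# Hypothetical stand-in for a UCSPI client like vsockclient. Instead of
# connecting a socket, it opens a plain file for writing on descriptor 7
# (and /dev/null for reading on descriptor 6), then runs the program.
fake_ucspi_client() {
    dest=$1
    shift
    "$@" 6</dev/null 7>"$dest"
}

# Same program as in the vsockclient example: write a newline to fd 7.
sink=$(mktemp)
fake_ucspi_client "$sink" sh -c 'echo >&7'

# Count the bytes that arrived: just the single newline.
wc -c <"$sink"
```

The program being run only ever sees file descriptors, which is why
the same shell snippet would work unchanged with any UCSPI client,
whatever the underlying socket type.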
Anyway, this week I went back to ucspi-vsock to fix some
bugs[7][8][9][10]. As usual (although I'm happy to realise this is
still the case after all these months!), Cole was kind enough to review
my patches.
[4]: https://spectrum-os.org/git/ucspi-vsock/
[5]: https://man7.org/linux/man-pages/man7/vsock.7.html
[6]: https://cr.yp.to/proto/ucspi.txt
[7]: https://spectrum-os.org/lists/archives/spectrum-devel/20210309154048.14474-…
[8]: https://spectrum-os.org/lists/archives/spectrum-devel/20210309171816.8589-1…
[9]: https://spectrum-os.org/lists/archives/spectrum-devel/20210310204516.20041-…
[10]: https://spectrum-os.org/lists/archives/spectrum-devel/20210310204555.20725-…
Interguest networking
---------------------
The bulk of my work this week has been in Spectrum's Nixpkgs, which is
where (for now, at least) all the VM definitions and stuff live.
Mostly, I've been trying to clean up months of frantic work into
something I can actually publish as a series of patches, so it's out
there instead of just on my computer!
My focus at the moment is on doing this with the interguest networking
implementation I wrote last year. The plan for this remains the same
as last year: have a VM that manages all network hardware access and
acts as a router for other VMs that need to talk to the outside world.
Eventually, this will hopefully use virtio-vhost-user[11], which will
allow us to avoid going through a networking stack on the host at all
(or even having one), but until virtio-vhost-user is further along,
we'll use bridge devices on the host to connect client VMs to the
router. A very basic version of the latter is implemented and working
for me locally -- I can run an application VM, and have it connect to
another VM which runs the drivers for my ethernet port using VFIO (PCI
passthrough).
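A rough sketch of what host-side bridge plumbing of this kind could
look like with iproute2. This is not the actual Spectrum setup: the
interface names are hypothetical, and it assumes each VM's virtual
network device is backed by a TAP device on the host. The function
only prints the commands, since running them needs root:

```shell
# Print the iproute2 commands that would create a bridge and attach
# the given TAP devices to it. All interface names are hypothetical.
# Pipe the output to `sh` as root to actually apply it.
bridge_cmds() {
    bridge=$1
    shift
    echo "ip link add name $bridge type bridge"
    echo "ip link set dev $bridge up"
    for tap in "$@"; do
        echo "ip tuntap add dev $tap mode tap"
        echo "ip link set dev $tap master $bridge"
        echo "ip link set dev $tap up"
    done
}

# One TAP device for the router VM, one for an application VM.
bridge_cmds br-spectrum tap-router tap-app
```

With both TAP devices enslaved to the same bridge, frames sent by the
application VM reach the router VM through the host's bridge code,
which is exactly the host involvement that virtio-vhost-user would
later remove.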
This code was a big ball of mud, and I'd also subtly broken it when I
moved on from interguest networking last year to try to get a proof of
concept going for another form of interguest communication. So I
spent this week trying to page all the context I'd lost in the last
few months back into my brain, making lots of partial git commits,
writing commit messages, and fixing it up so that it works again.
I think what I have is about ready to publish -- hopefully you'll see
some patches next week. But what exists at the moment is still very
limited -- the biggest limitation is that the router VM doesn't
support hotplugging, so a VM that wants to connect to it actually has
to be started first, before the router VM. Not good for something
that should be run as a system service and likely started on boot!
To overcome this limitation, I'll probably have to add support in
crosvm for adding network devices at runtime using the control
socket. I think this should be easy enough, and I haven't looked at
what's happened in crosvm + rust-vmm since I've been away, so it's
even possible somebody else will have done this already.
[11]: https://wiki.qemu.org/Features/VirtioVhostUser
So, the todo list for next week is getting the initial interguest
networking PoC patchset posted. That'll come with a nice little demo
other people will be able to try out if they want to. And then, I'll
work on improving it further to the point where it's actually
practical. I don't want to spend /too/ much time on this, since
ultimately I do want to be using a virtio-vhost-user stack, which is
quite different, but it's important to have an implementation of this
so that further development that doesn't care about the implementation
details has something to be tested against.
My biggest worry at the moment is burning out again. Working in
moderation isn't easy for me (I tend to get very sucked into things),
but it's important for the project that I find a way to keep my work
sustainable, so we don't lose time like this again.
I'm still a little hesitant to say "this is it, I'm back for real".
But the signs are looking good.
Thank you for sticking with me through this.