Earlier I was sent a link to a paper called "FlexOS: Towards Flexible OS
Isolation". I thought the paper was great and subscribers to this
list might be interested in it and what I thought about it.
Like Spectrum, FlexOS is an operating system focused on
compartmentalization, but it takes a very different approach. It's a
library operating system (LibOS, also known as a unikernel). A library
OS is linked with a single program, and produces a system image that
runs only that program. This can make for
very efficient VMs. As a result, FlexOS isn't about isolation between
programs, like Spectrum is, but instead tries to introduce isolation
within a single program, generally at library boundaries. Therefore,
Spectrum and FlexOS are highly complementary.
FlexOS can use several different isolation strategies, including running
different libraries in different VMs. Naturally there's a big
performance overhead to this, but amazingly, it seems like it can be on
par with running the program normally on Linux, because the baseline
performance is so much faster running on a unikernel than on Linux (not
even in a VM AFAICT). Of a SQLite benchmark, the paper says: "Somewhat
surprisingly, FlexOS with EPT2 [libraries in different VMs] performs
almost identically to Linux. This is because the syscall latency is
almost identical to the EPT2 gate latency on this system".
Another inspired part of this paper is "Exploration with Partial Safety
Ordering". I don't want to go into too much detail about this because
the paper does such a good job of explaining it, but essentially you can
give FlexOS a performance budget and a benchmark, and it will identify
for you five or so of the best security configurations that meet your
performance needs, across different combinations of isolation primitive,
software hardening, etc.
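As a toy illustration of the selection step (all names and numbers below are hypothetical, not taken from the paper, and this collapses safety to a single score, which the paper's partial ordering deliberately avoids):

```python
# Toy model of budget-constrained configuration search (hypothetical
# data, not from the FlexOS paper). Each configuration pairs an
# isolation/hardening choice with a measured slowdown and a rough
# "safety score"; keep the safest configurations within the budget.
configs = [
    {"name": "no-isolation",   "slowdown": 1.00, "safety": 0},
    {"name": "mpk",            "slowdown": 1.05, "safety": 2},
    {"name": "mpk+hardening",  "slowdown": 1.15, "safety": 3},
    {"name": "ept2",           "slowdown": 1.20, "safety": 4},
    {"name": "ept2+hardening", "slowdown": 1.40, "safety": 5},
]

budget = 1.25  # tolerate at most a 25% slowdown on the benchmark
feasible = [c for c in configs if c["slowdown"] <= budget]
best = sorted(feasible, key=lambda c: c["safety"], reverse=True)[:5]
print([c["name"] for c in best])
```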
The biggest limitation I can see to FlexOS is that it is based on C
source code transformations. Programs need to be specifically ported to
it (although it seems like porting an individual program isn't too hard
as long as it has good test coverage), and I don't know how easy it
would be to integrate with another language. But it's definitely
something to keep an eye on that could be useful if we ever find
ourselves with a big C program doing important work in Spectrum. The
use cases presented in the paper are quite compelling, especially the
one about isolating exploitable libraries. And unikernels more
generally are definitely worth keeping in mind if the performance
results in this paper are anything to go by, although A Linux in
Unikernel Clothing is an interesting counterpoint.
Finally, reminder that I collect interesting papers, blog posts, talks
etc. in the Spectrum bibliography, where I've just added a link to
this paper. If you come across interesting stuff like this that might
be relevant to my work, please send it my way!
I've been running Qubes for a few years now and I'd like to give
Spectrum a try, as I've been having some hardware and performance
problems with Qubes. Is there some up-to-date guide I can follow? I
found https://alyssa.is/using-virtio-wl/#demo and was able to see the
weston terminal. I also tried updating to the latest commit and was
able to get a nested wayfire window with:
nix-build . -A spectrumPackages && ./result-3/bin/spectrum-vm
(I'm fairly new to Nix, so not sure if this is the right way to do things)
I managed to change the keyboard layout, mount a tmpfs for home, and
increase the memory enough to start firefox, but I haven't managed to
get much further. Things I tried so far:
- I tried replacing wayfire with weston-terminal, to avoid the nested
session. But sommelier segfaults when I do that.
- I tried adding `--shared-dir /tmp/ff:ff:type=9p` to share a host
directory. Then `mount -t 9p -o trans=virtio,version=9p2000.L ff /tmp`
in the VM seemed to work, but `ls /tmp` crashed the VM.
- I tried using `-d /dev/mapper/disk` to share an LVM partition, but
`mount -t ext4 /dev/vdb /tmp` refused to mount it.
- I tried enabling networking with `--host_ip 10.0.0.1`, etc, but it
said it couldn't create a tap device. I guess it needs more privileges.
Ideally, I'd like to run a VM with each of my old Qubes filesystems,
to get back to where I was with my Qubes setup, before investigating
new spectrum stuff (e.g. one app per VM). Do you have any advice on
this? I see these lists are a bit quiet - I hope someone is still
working on this because it sounds great :-)
talex5 (GitHub/Twitter) http://roscidus.com/blog/
GPG: 5DD5 8D70 899C 454A 966D 6A51 7513 3C8F 94F6 E0CC
Table of Contents
1. Display controllers and renderers
2. Overview of GPU virtualization technologies
.. 1. Direct passthrough
.. 2. Hardware virtualization and paravirtualization
..... 1. Intel
..... 2. NVIDIA
..... 3. AMD
..... 4. ARM
.. 3. Virgil 3D
3. Seamless windowing
.. 1. virtio-gpu 2d
.. 2. virtio-wl
.. 3. virtio-gpu context types
4. Putting it together — GPUs in Spectrum VMs
This report documents the current state of the GPU virtualization
ecosystem as it relates to Spectrum.
GPU virtualization is a very controversial topic in the world of
compartmentalized operating systems. Qubes doesn't support GPU
acceleration of anything but the GUI VM (the equivalent of Spectrum's
compositor VM), and (I think as a result of this) a way to provide GPU
acceleration to applications is *the* most requested Spectrum feature
by a wide margin.
But there are serious security concerns with GPU virtualization. GPUs
have a frankly awful track record when it comes to isolation. This
might improve going forward, especially with SR-IOV-based solutions,
because business demand for secure GPU virtualization is increasing and
GPU vendors are trying to meet that demand. For example, AMD says of
their GPU virtualization implementation:
The hardware-enforced memory isolation logic provides strong
data security among the VFs, which helps prevent one VM from
being able to access another VM’s data.
With security being a bare minimum requirement for any
virtualization solution, AMD’s hardware-based virtualized
GPU solution offers a strong deterrent to unauthorized users
who traverse the software or application layers seeking
means to extract or corrupt GPU user data from the virtual
machines. Although a VF can access full GPU capabilities at
its own GPU partition, it does not have access to the
dedicated local memory of its sibling VFs.
But it remains to be seen whether these claims will stand up in
practice, especially because SR-IOV is not yet widely available. The
alternatives (software multiplexing and GPU emulation) have both seen
exploitable security issues.
Because of this, my current plan is that access to the GPU will be
highly restricted in a default Spectrum system. I do think, though,
that there needs to be a way for users to opt in to using whatever
virtualization features their hardware provides, because I've heard so
many accounts from people who find themselves unable to use
compartmentalized systems /at all/ because of certain tasks where they
need graphics acceleration. My primary goal with Spectrum is to bring
compartmentalized computing to people who are not currently using it.
For people currently using mainstream systems, where any application
they run might be stealing their private SSH keys or ransomwaring all
their files, being potentially vulnerable to zero-days in GPU drivers
or hardware is the least of their problems.
1 Display controllers and renderers
Standalone GPUs, and Intel integrated GPUs, tend to be display
controllers and renderers all in a single package. The distinction
between them will be important, though, so briefly:
A *display controller* is the hardware that takes care of making a
pixel grid show up on a screen.
A *renderer* does computation (3D rendering, etc.) to create an
image that can be given to a display controller to display.
(Naturally it can also be used for other computations.)
2 Overview of GPU virtualization technologies
2.1 Direct passthrough
Direct passthrough is the simplest and least interesting option for
accelerated graphics in a VM. The host simply gives the VM control
over a whole GPU. Since we're aiming for one VM per application
instance in Spectrum, direct passthrough might be useful if you have a
dual GPU system, and want to accelerate one particular application,
like a game. Of course, direct passthrough is also the most secure
sort of GPU virtualization — data leaks are far less likely to happen
when each GPU user has their own dedicated hardware! We can
definitely support this.
2.2 Hardware virtualization and paravirtualization
By *hardware virtualization*, I mean that the virtualization is
implemented by the GPU itself, and by *paravirtualization* I mean that
the host kernel implements virtual GPUs by multiplexing.
The standard way of doing hardware virtualization is called SR-IOV,
and it's widely used today with server network cards, and to a lesser
extent NVMe drives. It's not widely available on GPUs, especially not
consumer ones, but it seems like that's about to change.
When using this sort of virtualization, all that's being virtualized
is the renderer. The display controller stays attached to the host.
I'm not aware of any implementation for hardware that is both a
display controller and a renderer that supports separately passing
through a display controller to a VM.
2.2.1 Intel
Intel's GPU paravirtualization technology is called GVT-g. Unlike
solutions from other GPU companies, GVT-g is very widely available —
it's supported for integrated graphics starting from Broadwell
(launched 2014). GVT-g is definitely something we can support in
Spectrum, but one limitation I found is that on my laptop (a Google
Pixelbook), I can create at most two vGPUs. So don't expect every
application to get accelerated graphics through GVT-g, even if you're
okay with the security implications.
The most recent Intel GPUs no longer support GVT-g — they have
hardware SR-IOV implementations instead. I don't believe this is
supported in Linux yet, but I imagine it will be at some point.
2.2.2 NVIDIA
NVIDIA officially supports GPU virtualization for most of their
datacentre and professional GPUs, through their proprietary driver.
Because of NVIDIA's general allergy to publishing code or
documentation, I don't know how this GPU virtualization is
implemented, but my guess is that it's mostly in hardware, because
their product briefs claim their cards support SR-IOV. There also
exists third-party software that can modify NVIDIA's software to
support some consumer cards.
The GPUs that support it tend to support between 16 and 32 virtual
GPUs.
Obviously, running a proprietary driver in the Spectrum host kernel
would have /horrible/ security implications, and I will always
strongly advise against it. But, we probably could make it work if we
wanted to.
2.2.3 AMD
There are five AMD GPUs that implement hardware (SR-IOV)
virtualization. The Radeon Pro V520 is "only available as a public
cloud offering" and so isn't a product you can actually buy. The
Radeon Pro V340 is only supported by VMware, not AMD's open source
Linux driver. The AMD S7100X, S7150, and S7150 x2 are all supported
with KVM, but none of them have been produced since 2016. It would
certainly be possible to support those last three in Spectrum, but due
to just how obscure AMD GPU virtualization is, I don't see myself
prioritizing it any time soon.
2.2.4 ARM
There's a single ARM GPU that supports hardware virtualization, the
Mali-G78AE. It's intended for "automotive and industrial"
applications, so I doubt it's going to show up in any hardware anybody
wants to run Spectrum on.
2.3 Virgil 3D
Virgil 3D takes a different approach, by doing entirely software-based
GPU virtualization. It implements OpenGL in software, in a way that's
still backed by the host system's hardware GPU. This is naturally
less performant than hardware virtualization or paravirtualization due
to the extra overhead, but the advantages are that you can get a lot
of virtual GPUs out of it, since every VM is just another application
using the GPU on the host, and that this approach works even when
there isn't special support for virtualizing a particular GPU.
3 Seamless windowing
One of the earliest features that Spectrum committed to was a seamless
windowing experience, where applications running in different VMs
would appear as native windows in the same Wayland compositor.
3.1 virtio-gpu 2d
The simplest way to do seamless windowing is to run a Wayland
compositor in each VM, and have it render windows to a virtual 2D GPU,
where they can be displayed by a host compositor. To the host
compositor, windows are basically just opaque rectangles. This is
effectively how WSLg, the Windows Subsystem for Linux GUI, works.
This works okay, but you lose the ability to have any integration
between the application windows and the window manager they're being
displayed on. It's not actually seamless. That's why, in screenshots
of WSLg, you see Weston window chrome instead of Windows chrome. The
WSLg developers can either choose to draw Windows window decorations
around every application (which will look bad for GNOME programs that
draw their own decorations), or none of them (which means they have to
use Weston's decorations, or there'd be windows with no decorations),
because they can't directly speak the client-side decoration
negotiation protocol to the applications. This particular use case
doesn't matter for Spectrum, because CSD is incompatible with
unspoofable window decorations, but the point is that if you're not
letting applications and window managers speak Wayland to each other,
you need to come up with ad-hoc protocols every time you /do/ want
them to be able to cooperate.
Fortunately, there's another approach to seamless windowing, that
preserves Wayland as the method of communication between applications
and window manager.
3.2 virtio-wl
The original plan for seamless windowing in Spectrum was to use
virtio-wl, a technology from Chromium OS. virtio-wl (plus Sommelier,
a component that runs in guest userspace) presents a socket-like
interface, over which Wayland clients running in VMs can send and
receive arbitrary data, as well as file descriptors, to and from a
Wayland compositor running on the host.
Because implementing support for sending arbitrary file descriptors
out of a guest would be impossible, only certain types needed by the
Wayland protocol are supported — pipes and host-allocated shared
memory. This worked fine for Wayland, but because of how
Wayland-specific it is, virtio-wl was never considered suitable for
upstreaming into the virtio specification.
This was a big problem for Spectrum specifically, because in virtio-wl
there is no way for guests to share memory with the host. In
virtio-wl's intended use case, the compositor runs on the host,
clients run in guests, and the compositor handles all memory
allocation, so memory only needs to ever be shared in one direction,
host → guest. But in Spectrum, we'd like to have the compositor in
its own VM, so we'd need some extra mechanism to let the compositor VM
allocate host memory.
3.3 virtio-gpu context types
Google's second attempt at seamless VM Wayland windowing is a new
mechanism called virtio-gpu context types. Despite the "gpu" in the
name, there's not (necessarily) any GPU involved — it's just
convenient to use virtio-gpu as a transport since it already has
primitives for sharing memory between guest and host.
Context types allow new protocols, like Wayland or Vulkan, to be sent
over virtio-gpu, in addition to the 2D and Virgil 3D protocols it was
designed for. Context types should be available in Linux from v5.15
(the next release at the time of writing). Support for Wayland over
virtio-gpu has already been implemented in crosvm and Sommelier.
Even more excitingly, virtio-gpu has a mechanism by which guest
userspace can do a special memory allocation that can be shared with
the host! So we won't need to add any special memory allocation
mechanism just for Spectrum — we'll just need to make sure the
compositor VM userspace knows how to allocate memory and send it to
the host in this way.
4 Putting it together — GPUs in Spectrum VMs
After extensively researching all this, my current plan for graphics
in Spectrum is as follows: each VM that runs a graphical application
will get a virtio-gpu device, over which it will be expected to speak
Wayland to the compositor. VMs with accelerated graphics (which will
be opt-in per-VM) will additionally get another GPU device to use for
rendering — this could be, for example, a GVT-g vGPU, or it could be a
Virgil 3D virtio-gpu. They can use that accelerator (or software
rendering if they don't have one) to render window contents into a
buffer that can then be shared with the host (and by extension the
compositor VM) by way of Wayland over virtio-gpu. Rendering on a GPU
while using Wayland over virtio-gpu isn't currently supported by
Sommelier, but Google has it planned.
A further problem we need to solve is how the compositor renders to
the display controller. If we can pass the display controller through
to the compositor VM (this might be a possibility on some ARM
systems), all's good. But otherwise, we'd need to render to a virtual
2D GPU, and then have the host display that on the actual display
controller. This is something that could probably be implemented
fairly easily in crosvm's virtio-gpu stack — it already supports
showing output as Wayland or X11 windows. And in the meantime, we
could cheat by running the world's simplest Wayland compositor on the
host, which would just show the crosvm window and take care of talking
to the display controller.
Graphics and windowing is definitely something that we'll be iterating
on for a very long time. We can get started with the happy path
Google already has working, with the compositor on the host. From
there, we can move the compositor into a guest, with direct GPU
passthrough. And then we can start looking at GPU virtualization.
This field is evolving /extremely/ quickly, so by the time we get
there things will undoubtedly have moved on. That means it's very
important not to work too much on it too early, because that work has
a very high chance of being obsoleted shortly afterwards. The purpose
of this report is to demonstrate to the community that GPU accelerated
applications in Spectrum will be possible by presenting how they would
be implemented if I were to do so today. But in reality, it makes
much more sense to focus on almost everything else first, because no
other area of virtualization is moving as fast as this one.
 (This would be direct passthrough, because if you have a graphics
card with one DisplayPort output, there's physically not much you can
do to share that output between multiple VMs.)
 I actually implemented this, before Wayland-over-virtio-gpu
emerged as a viable alternative that didn't need an extra mechanism.
Hi everyone, especially people who're new to the project after seeing it
on Hacker News recently.
Those of you who've been around for a while will remember that the end
of last year was a busy time for Spectrum because of funding cycles, and
that's going to be the case again this year.
As a result of this, I'm going to have my head down until the end of the
year, and am not planning on doing more This Week in Spectrum this year.
I love TWiS, but it takes hours to write every week I manage to get it
done, and I need to spend those hours elsewhere at the moment. It'll be
back next year.
If you do still want to keep up with development (and I hope you do!),
I'm trying to talk in more detail about what I'm working on in the
#spectrum IRC channel. There's been a good reaction to this so far, and
the conversation has been extremely lively for the last few days. There
are consistently over 120 people in the channel now, which is amazing!
You can find information about how to join the channel through Matrix or
IRC on the website.
Hope to see you there!
What a week. Progress has felt a bit slow, but the work has been
consistently interesting, and there have been some exciting developments
in the ecosystem.
Last week I'd posted the patches required to get all of Spectrum working
with a more up-to-date Nixpkgs. After last week's email, Cole reviewed
the patch set, so I applied it, and merged the nixpkgs-update branch
into master, so now Spectrum is using a reasonably up-to-date Nixpkgs
for the first time in a long time.
I also said
> Once that's done, it's time for another chromiumOSPackages upgrade, but
> that should be pretty easy this time because we're only one version behind.
Alas, it's never that simple.
Firstly, the chromiumOSPackages update script broke, because the
information about the currently released Chromium OS build seems to have
gone missing. Google is apparently serving build number 13982, but
their published build metadata includes builds 13981 and 13983, not
13982. This means it's not possible for me to know what Git
revisions are used in the currently released Chromium OS. Assuming this
is just a one time thing, I hacked the update script to just look at the
previous build, but we should keep an eye on this. If it ever happens
again we should probably implement some sort of mitigation in the update
script.
Once I had new versions of the Chromium OS packages it was time to get
them to build, which was straightforward enough for everything except
crosvm. For Spectrum, we have a patch for crosvm to make it support
VIRTIO_NET_F_MAC, which is a mechanism by which the host system can
indicate to the guest kernel what it should set as the MAC address of a
virtual network device. After the update, this patch no longer applied,
because all of a sudden crosvm has two different virtual network device
implementations.
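For context, virtio features are negotiated as a bitmask between device and driver, and VIRTIO_NET_F_MAC is bit 5 in the virtio specification. A minimal sketch of the negotiation logic (simplified; real negotiation goes through the virtio transport, and the MAC address below is made up):

```python
# Sketch of virtio feature-bit negotiation (simplified model).
VIRTIO_NET_F_MAC = 1 << 5  # bit 5 in the virtio spec: device has a MAC

device_features = VIRTIO_NET_F_MAC  # host offers a fixed MAC address
driver_features = VIRTIO_NET_F_MAC  # guest driver understands the bit
negotiated = device_features & driver_features

if negotiated & VIRTIO_NET_F_MAC:
    # The guest then reads the MAC out of the device's config space.
    mac = bytes([0x02, 0x00, 0x00, 0x00, 0x00, 0x01])  # made-up address
    print(":".join(f"{b:02x}" for b in mac))
```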
This turns out to be because crosvm has implemented vhost-user, a
protocol to allow virtual devices to be implemented outside the VMM
program! This is great news, but it's also surprising. virtiofs was
designed to be implemented with vhost-user, but when crosvm implemented
it a while ago, they became the only implementer to do so in-VMM. It's
great to see them moving in the vhost-user direction, because it makes
it much easier to mix and match virtual device implementation and VMMs.
Most excitingly, I saw a reference to vhost-user-wl, meaning a
standalone implementation of Virtio Wayland. This would allow us to use
Virtio Wayland with other, non-crosvm VMMs, which is great because I
think cloud-hypervisor is probably going to end up being a better fit
for Spectrum, but Virtio Wayland was crosvm's killer app. I'd even
thought about trying to port crosvm's virtio wayland implementation to
vhost-user myself, so it's great to know that when the time comes,
that'll already have been done for me.
While I could have fixed my crosvm patch for all this new virtual
network device code, the introduction of vhost-user support means we
should be able to drop the patch altogether. cloud-hypervisor provides
a vhost-user-net implementation that already supports
VIRTIO_NET_F_MAC, so if we can just get crosvm to talk to
cloud-hypervisor's vhost-user-net implementation we shouldn't have to
carry any patch for this any more.
cloud-hypervisor / rust-vmm
So I started looking at cloud-hypervisor to try to hook this all up, but
it looks like cloud-hypervisor still has some issues where it doesn't
quite follow the vhost-user specification. As I was trying to debug
this, I noticed some UB in rust-vmm (the shared utility code project
for crosvm and its derivatives like cloud-hypervisor and Firecracker).
Being a good citizen, I started working on a fix for this, and then I
encountered some more issues with functions that should have been marked
as unsafe but weren't. So that's going to need to be fixed too.
But the rust-vmm code is otherwise very high quality and easy to work
with, and they're very responsive, so it shouldn't take long to get
these issues fixed. The affected code is fortunately all to do with
communication between vhost-user backends and the VMM, so it's very very
unlikely that it's anything that could be exploited by a guest.
These sorts of problems are exactly the sort of thing that Rust is
supposed to prevent, so it was disappointing to discover these issues.
But on the other hand, Rust made them stand out like sore thumbs to me,
so even though these issues managed to sneak in, there's definitely
still a huge benefit to using Rust for these sorts of programs.
In the next week, I'm hoping to get all the rust-vmm issues I've
discovered fixed, and maybe get cloud-hypervisor's backends to speak
proper vhost-user. I'm getting my second vaccination next week as well
though, so we'll see. I might just end up being sick.
Hi! I've had a busy day today and am pretty tired, so I'm not sure how
coherent my writing is at the moment. But I'd rather get this out on
time, especially since tomorrow is also a busy day.
As I said last week, I took some time off this week as a preventative
measure against burnout.
One of my patches was accepted, but another is still waiting. As
I said last week, I have more fixes for rust-vmm planned, but want to
let them catch up with the changes I've already sent upstream first so I
know what base I'm using for the next stuff. I expect the one that's
still waiting to be accepted next week.
Last week, I'd just integrated dm-verity into the Spectrum live image
I've been working on. When it came time to work on the actual root
filesystem, instead of the initramfs, I hit a bit of a brick wall. I
realised that trying to generate a whole operating system image using
Nix was giving me real writer's block. There was too much in between me
and how the files ended up on disk, and that meant there was too much
overhead to keep in mind when I was thinking about how things should be
designed and laid out. It might feel like making a Linux root
filesystem should be a solved problem, but Spectrum has a bunch of
special requirements. You might want to do something like start a VM
for each hardware device of a specific type, and that's something that
isn't really addressed by most standard stuff. All this stuff is
definitely solvable, but it requires some experimentation to get right,
and Nix was getting a bit in the way of that.
So I created a new directory, and I wrote a Makefile that builds an
ext4 image, and I just started putting files in an etc/ directory. This
made reasoning about the system way easier, and I was immediately making
progress.
Nix is great for building known targets, and for making customisable
systems (there's no way my Makefile-based system would allow the amount
of customisation I'd easily be able to provide with Nix), but for
experimentation, it's a lot nicer to be closer to the end product. So
once I know how all this should look I'll make it Nix-aware.
Currently, I have a root filesystem with a service manager that can
respond to hardware appearing and disappearing. The next (more exciting)
step will be to have it start some VMs, and assign hardware to them
appropriately. I'm looking forward to getting to that next week. An
interesting challenge I'll have to solve will be figuring out simple
categories (e.g. "ethernet device") from the huge amount of very
specific information the kernel provides. I think I might be able to
abuse the modules.alias file from the kernel, which defines the mappings
from PCI etc. information to default drivers. Then all I'll have to do
will be to write mappings from default drivers to whatever categories I
come up with, or what VM I want to assign them to, or whatever.
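The modules.alias idea could be prototyped along these lines. The alias entries below follow the real file format, but the specific lines and the modalias string are illustrative examples; real files vary by kernel build:

```python
import fnmatch

# modules.alias lines have the form "alias <pattern> <module>", where
# the pattern uses shell-style wildcards over a device's modalias.
SAMPLE = """\
alias pci:v00008086d00001533sv*sd*bc*sc*i* igb
alias pci:v000010ECd00008168sv*sd*bc*sc*i* r8169
"""

def parse_modules_alias(text):
    mapping = []
    for line in text.splitlines():
        if line.startswith("alias "):
            _, pattern, module = line.split(None, 2)
            mapping.append((pattern, module))
    return mapping

def driver_for(modalias, mapping):
    # Return the first module whose alias pattern matches the modalias.
    for pattern, module in mapping:
        if fnmatch.fnmatchcase(modalias, pattern):
            return module
    return None

mapping = parse_modules_alias(SAMPLE)
# An example modalias for an Intel I210 ethernet controller:
print(driver_for("pci:v00008086d00001533sv00000000sd00000000bc02sc00i00",
                 mapping))
```

From there, a second table could map module names ("igb") onto coarse categories ("ethernet device") or target VMs.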
One neat thing I'm using for the first time here is tar2ext4, a
utility program that's part of a larger Microsoft open source project I
don't entirely understand the purpose of. It's really useful for me
because it builds ext4 images entirely in userspace, which will be
great for
using in Nix derivations where it's not possible to just mount an ext4
image and write directly to it. For previous Spectrum experiments, I'd
always used SquashFS, entirely because I already knew of a tar2sqfs
program that made creating filesystem images really easy. I've added
tar2ext4 to Nixpkgs, which will hopefully help other people who have
similar problems discover it.
While I was working on the root filesystem, I noticed that mount -a,
which mounts all filesystems described in fstab(5), wasn't working.
This turned out to be because of a bug in Musl's implementation of
getmntent(3), a libc function for parsing fstab files.
So I wrote some tests and a fix and sent them to Musl.
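For reference, the fstab format that getmntent(3) parses looks like this. Here's a rough Python model of the parsing rules (not Musl's actual code, and not a claim about what the specific bug was):

```python
def parse_fstab(text):
    """Rough model of fstab(5) parsing: skip blank lines and comments,
    split fields on whitespace, and default the optional dump/pass
    fields to 0 when omitted (one of the cases a getmntent
    implementation has to handle)."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        dev, mnt, fstype, opts = fields[:4]
        freq = int(fields[4]) if len(fields) > 4 else 0
        passno = int(fields[5]) if len(fields) > 5 else 0
        entries.append((dev, mnt, fstype, opts, freq, passno))
    return entries

FSTAB = """\
# device   mountpoint  type  options
/dev/vda   /           ext4  defaults  1 1
tmpfs      /tmp        tmpfs defaults
"""
print(parse_fstab(FSTAB))
```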
Hope that all made sense.
Next week, I'll continue working on Spectrum live, and maybe fix the
next rust-vmm issue I have on my todo list if my final outstanding PR
is accepted.
It's a short update this week, because most of what I did was a
continuation of stuff from last week.
Last week, I mentioned I'd identified some Rust safety issues in
rust-vmm. Most of the patches for these are now up. The first
has been accepted already, and I expect another to be accepted later
today. There's still a UB issue I'm aware of and haven't sent a fix for
yet, because there are a number of ways to fix it and I wanted to get my
other patches in first before I decided how to fix that one.
I deliberately haven't made any progress on using cloud-hypervisor's
vhost-user-net backend with crosvm, which is what got me looking at this
code in the first place. I want to make sure I can work on
rust-vmm-adjacent things at a pace where I don't get overwhelmed with
having to keep track of loads of patches and whether I've got them
upstream yet. So I'll be putting that work on hold until the current
round of patches are upstreamed.
For the past little while, in the time when I wasn't writing regular
updates, I've been working on a live system for testing Spectrum. This
will be especially useful for testing things like GPU support, because I
can just build a live image with everything I might need, plug it into
all the computers I want to test, and have everything be automatic from
there. It will also probably evolve directly into what becomes the
Spectrum base system that we'll hopefully all be running as the host
system on our machines at some point.
I shifted my focus back to this this week because of wanting to not get
ahead of myself with rust-vmm. (I have a funding milestone for GPU
support, so getting that checked off soon would be good.) The main
thing I did this week was integrate dm-verity, which I did mostly for
fun and to satisfy my curiosity.
dm-verity is a Linux mechanism to efficiently ensure that a read-only
filesystem hasn't been tampered with, by constructing a Merkle tree out
of filesystem block hashes, and providing the root hash to the kernel
when the filesystem is mounted. dm-verity is a _great_ fit for Nix,
because we can generate the hashes at the same time as creating the
filesystem image, and then embed the hash into the initramfs we're also
building. Getting this all working took less than a day. The idea is
that (long) in the future, we'll also implement Secure Boot, which will
make sure the kernel and initramfs haven't been tampered with, and
dm-verity will extend that integrity guarantee to the host system's root
filesystem. I recommend reading "Producing a trustworthy x86-based
Linux appliance" by Matthew Garrett for an overview of how this all
works.
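To illustrate the Merkle tree idea (a toy sketch of the principle, not dm-verity's actual on-disk hash format or parameters):

```python
import hashlib

def merkle_root(blocks, arity=2):
    """Hash every block, then repeatedly hash groups of child hashes
    until a single root hash remains. dm-verity does essentially this
    over the filesystem's data blocks (with its own layout); only the
    root hash needs to be trusted at mount time."""
    level = [hashlib.sha256(b).digest() for b in blocks]
    while len(level) > 1:
        level = [hashlib.sha256(b"".join(level[i:i + arity])).digest()
                 for i in range(0, len(level), arity)]
    return level[0].hex()

blocks = [bytes([i]) * 4096 for i in range(8)]  # eight fake 4 KiB blocks
root = merkle_root(blocks)

# Tampering with any single block changes the root hash:
tampered = list(blocks)
tampered[3] = b"\xff" * 4096
print(merkle_root(tampered) != root)
```

Verifying reads only requires hashing the block being read plus a logarithmic number of interior hashes, which is why the scheme is efficient enough to use on a live root filesystem.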
dm-verity is something that's particularly exciting to me, because it's
very useful to us, but it's something that's generally used to frustrate
end user attempts to control computers they own. In Spectrum, it's
instead a tool that protects the end user against malicious filesystem
changes, while being almost completely transparent to the user if they
do want to modify their own system.
Protecting against root filesystem tampering (which would require a VM
escape or physical device access) is hardly the biggest priority for
Spectrum, but integrating dm-verity was fun, interesting, and provided
good motivation for working on the live image, which is one of the
highest priority bits of the system. (Because I'm tired of having to
say "you can't" when people ask me how they can try out Spectrum.)
This week, I'm going to take a bit of time off as an anti-burnout
defense, but probably not the whole week. I'll still keep an eye on the
rust-vmm patches throughout this time as well, to make sure they're not
delayed in getting accepted upstream.
Hi, it's been a long time since I've done one of these, but so much
interesting stuff happened this week, and there's been enough renewed
interest in the project, that it _really_ needed an update. So here it
is. As always when I'm not keeping up with TWiS, this update is limited
to only things that happened in the last week, because otherwise it
would just be far too long.
A problem I've been thinking about for a long time is Wayland access
control. The Linux desktop ecosystem is moving towards access controls
for most functionality through xdg-desktop-portal (which sadly
doesn't seem to have a website). But it doesn't cover stuff that's part
of the core Wayland protocol, like access to the clipboard. And some
compositors (wlroots-based ones) provide extra Wayland protocols for
things like screenshots. And up to now, Wayland compositors haven't
really done any authorization for these potentially dangerous APIs.
(All of this is only really meaningful for sandboxed applications --
running in Flatpak or a Spectrum VM or something. Linux isn't
really set up to do separation between processes running as the same
user without namespaces.)
So earlier this week, I posted to the Wayland mailing list with an
idea I'd been thinking about for a while, which was to place a proxy
program in front of the Wayland compositor, that would intercept
client->compositor requests and handle access control. I was quickly
convinced that a proxy wasn't a good idea, but there was a lot of
discussion, and it was really helpful to me in figuring out what the
right way to do it might be.
There are really two problems to be solved here, one of which I hadn't
even thought much about. The first is securely identifying a Wayland
client. A compositor needs to be able to form the question "Should
client X be able to do this?", and to do that it needs to be able to
identify a client as client X, and know if it tries to interact with
client X later, that it'll be talking to the same client. In a
non-virtualized system, the obvious way to do this would be getting the
pid of the client from the connection socket and then looking it up in
proc(5) to find out its executable path, but this approach is
fundamentally racy: the pid could be recycled by another process before
it's looked up.
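For illustration, here's roughly what that pid-based lookup looks like
on Linux (a sketch, not something a compositor should rely on --
demonstrated on a socketpair, where the "peer" is the same process):

```python
import os
import socket
import struct

def peer_pid(conn: socket.socket) -> int:
    """Read the peer's pid from a Unix socket via SO_PEERCRED.
    struct ucred is three native ints: pid, uid, gid."""
    creds = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                            struct.calcsize("3i"))
    pid, uid, gid = struct.unpack("3i", creds)
    return pid

# Demonstrate on a socketpair: the peer is this same process.
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
pid = peer_pid(a)
assert pid == os.getpid()

# The racy part: by the time we consult proc(5), the pid may
# already belong to an entirely different process.
exe = os.readlink(f"/proc/{pid}/exe")
a.close()
b.close()
```

And of course, from inside a VM, the pid a compositor sees belongs to
the VMM, not to the real client, so this tells Spectrum nothing useful.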
The way forward here (and one that would work for Spectrum) appears to
be the proposed Security Contexts protocol, which would allow a
sandbox implementation to provide a security context identifier for a
client before handing off the Wayland connection to the client. Once
the security context had been set up, it wouldn't be allowed to be
modified, so once the sandboxed client was given the connection, it
wouldn't be able to change the security context identifier.
In Spectrum, the security context identifier here would likely be a
unique, user-provided name for the VM, and the security context setup
would be done by the virtio wayland implementation in the VMM.
The second part of this puzzle is how the compositor should decide
whether a client should be allowed to perform a particular operation,
like a paste or going fullscreen (which is risky because a fullscreen
window might spoof the whole system UI, like a lock screen).
There was actually a previous attempt at this a long time ago.
libwsm (short for Wayland Security Module) was a library that
compositors would have integrated to make authorization decisions. But
it wasn't adopted by compositors, and it did some things we now know to
be bad ideas, like setting policy based on the executable path. It also
made compositors responsible for any sort of authorization UI. In my
opinion, it's better to have that done by the external piece, so that
compositors have as little work to do as possible and therefore
authorization is implemented as widely as possible.
The compositor could implement an authorization system entirely on its
own, but this would be a lot of code for each compositor to write, and
it would limit the user to whatever permissions system the compositor
came up with, which might not be able to accommodate their needs. (An
example of this would be a Spectrum user that wanted to allow pasting
between two applications, but not allow pasting between either of those
and any other application.)
It could also be implemented by having the compositors integrate with
something like polkit, but compositor authors are reluctant to
integrate directly with a single system like that, and even polkit might
not support everything that would be desirable in an authorization
system. (For example, it might be nice to implement libwsm's concept of
a soft allow, where an action is permitted, but a notification is shown
so the user is aware it's happened.)
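As a sketch of what a three-valued verdict could look like, here's a
hypothetical policy lookup in Python. None of these names or the policy
table come from libwsm or any real protocol; the point is just that
"soft allow" means the answer can't be a plain boolean:

```python
from enum import Enum

class Verdict(Enum):
    ALLOW = "allow"            # silently permit
    SOFT_ALLOW = "soft_allow"  # permit, but notify the user
    DENY = "deny"

# Hypothetical policy table keyed by (security context, operation).
POLICY = {
    ("vm:mail", "clipboard-paste"): Verdict.DENY,
    ("vm:dev", "screenshot"): Verdict.SOFT_ALLOW,
}

def decide(context: str, operation: str) -> Verdict:
    """Look up a verdict; default-deny unknown combinations."""
    return POLICY.get((context, operation), Verdict.DENY)

assert decide("vm:dev", "screenshot") is Verdict.SOFT_ALLOW
assert decide("vm:web", "screenshot") is Verdict.DENY
```

An external authorization client could implement something like this
however it liked, which is exactly why keeping the policy out of the
compositor is attractive.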
So a third solution, suggested by ifreund, a Wayland compositor author,
in the Spectrum IRC channel, would be to have a privileged Wayland
client that the compositor could ask authorization questions to, with a
new protocol. The compositor would know that this particular client
should be privileged because it either wouldn't be in a security
context, or would be given a special security context identifier known
to the compositor ahead of time. Then, implementations of this protocol
could do authorization however they wanted, with the only limitation
being the questions the compositor was programmed to ask them. I think
this is a good way forward, but it'd be important to discuss with more
compositor authors before getting too excited about it.
Next steps from here with Wayland are:
* Figure out what needs to happen to move the security contexts
proposal forward. If it needs an experimental implementation, maybe
we could help with that?
* Inquire about the authorization protocol idea, and see how other
compositor authors would feel about it. If there's a generally
positive reaction, figure out how to move forward with it.
I created a branch in Spectrum's nixpkgs repository for the
long-overdue merge with upstream, did the merge, and posted some
patches to spectrum-devel that fix all the builds that broke as a
result. I've been letting them sit for a few days hoping for a review.
Once that's done, it's time for another chromiumOSPackages upgrade, but
that should be pretty easy this time because we're only one version behind.
Thanks for keeping up with Spectrum. :)
The Spectrum IRC channel is now #spectrum on irc.libera.chat.
It can also be joined through the Matrix bridge as
#spectrum:libera.chat. I understand there are some teething problems
with the Matrix bridge, but hopefully they'll be resolved soon.
Apologies for the inconvenience.