So I tried to look over what happens in my system, and then apply some
hardening to it. The following is what I got.
Design sketch of a SpectrumOS prototype
User starts an application with some choice of VM base
image/capabilities, store paths available inside the VM, the store paths
to add to environment, state directories readable/writeable in VM,
external capabilities connected to the VM, internal devices visible to
the VM.
A single-use state directory may be designated to provide a VM with
fully-owned, secondary-storage-backed, tmp-like space.
Layout of the state directories visible to the VM can be adjusted.
As a special case, there is support for generating VM-specific
configuration files such as /etc/passwd (these should not be prebuilt
due to per-launch UID generation).
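
To illustrate why such files have to be generated at launch time, here is
a minimal Python sketch that renders an /etc/passwd for one launch. The
function names, the "app" user name, and the home/shell defaults are
assumptions made for the example; nothing here is part of the prototype.

```python
def passwd_entry(name, uid, gid, home="/", shell="/bin/sh"):
    """Format one passwd(5) line: name:password:UID:GID:GECOS:directory:shell."""
    for field in (name, home, shell):
        # A colon or newline would corrupt the colon-separated format.
        if ":" in field or "\n" in field:
            raise ValueError(f"invalid character in passwd field: {field!r}")
    return f"{name}:x:{uid}:{gid}::{home}:{shell}"


def render_passwd(vm_uid, vm_gid, user="app"):
    """Render a minimal /etc/passwd for one VM launch.

    vm_uid is the UID assigned to this launch, which is why the file
    cannot be baked into the prebuilt image. The "app" user name is a
    hypothetical default.
    """
    lines = [
        passwd_entry("root", 0, 0, "/root"),
        passwd_entry(user, vm_uid, vm_gid, "/home/" + user),
    ]
    return "\n".join(lines) + "\n"
```

The rendered text would then be written into the per-launch temporary
directory and made visible at /etc/passwd inside the VM.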
A unique UID is assigned to the VM. This UID is added to the ACLs so it
can read the state directories assigned and all their subdirectories
(files are world-readable by default). It also gets access to write to
the writeable directories and possibly to some of the files in these
directories.
UIDs/ACLs are tracked. The user can assign a UID to an alias for further
reuse (this reduces the need for updating ACLs all the time). One-time
UIDs are reused as rarely as possible. When a VM with a single-use UID
quits, its UID is removed from the ACLs of the granted files. There is
also a periodic enumeration of files in order to clean up long-obsolete
ACLs in case of unclean shutdown or similar events.
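
The UID bookkeeping described above could look roughly like the following
Python sketch. The class name, its methods, and the FIFO reuse policy are
assumptions chosen for illustration (FIFO is one simple way to reuse
one-time UIDs as rarely as possible); actual ACL updates are left to the
caller.

```python
from collections import deque


class UidPool:
    """Tracks per-VM UIDs; hypothetical helper, not the prototype's code."""

    def __init__(self, first, last):
        # FIFO queue: a released UID goes to the back, so it is handed
        # out again only after every other free UID has been used.
        self._free = deque(range(first, last + 1))
        self._aliases = {}   # alias name -> persistent UID
        self._in_use = set()

    def acquire(self, alias=None):
        """Return the UID bound to alias, or a fresh one-time UID."""
        if alias is not None and alias in self._aliases:
            uid = self._aliases[alias]
        else:
            if not self._free:
                raise RuntimeError("UID range exhausted")
            uid = self._free.popleft()
            if alias is not None:
                self._aliases[alias] = uid
        self._in_use.add(uid)
        return uid

    def release(self, uid):
        """Mark uid as no longer running.

        Returns True if the caller should now strip uid from file ACLs
        (single-use UID), False if it is aliased and its ACLs stay.
        """
        if uid not in self._in_use:
            raise KeyError(uid)
        self._in_use.remove(uid)
        if uid in self._aliases.values():
            return False
        self._free.append(uid)
        return True
```

When release() returns True, the launcher would remove that UID from the
ACLs of the granted files; the periodic enumeration covers the cases where
release() never ran.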
A temporary directory is created. Everything that should be visible
inside the VM is bind-mounted there. This includes devices. An isolated
environment is set up for the VM to run.
The VM is started, with the state directories and store paths served to
the inside via virtfs/virtio-fs.
Inside the VM, the extra device nodes are removed from /dev/. Requested store
paths are added to PATH and possibly to other variables like
LD_LIBRARY_PATH or QT_PLUGIN_PATH as requested. Read-only state
directories are bind-mounted over themselves read-only. The specified
program is started normally under numerically the same UID as assigned
to the VM (to avoid the need to use UID mapping).
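
As a rough illustration of the environment step, the Python sketch below
prepends the requested store paths to PATH and to any other variables
asked for. The function name, the variable-to-subdirectory mapping, and
the example store path are assumptions; which subdirectory belongs to
which variable would be part of the launch request.

```python
def extend_env(env, store_paths, extra_vars=None):
    """Return a copy of env with store paths prepended to search variables.

    extra_vars maps a variable name to the subdirectory of each store
    path that should be added, e.g. {"LD_LIBRARY_PATH": "lib"}.
    PATH always gets the "bin" subdirectory.
    """
    variables = {"PATH": "bin"}
    variables.update(extra_vars or {})
    result = dict(env)
    for var, subdir in variables.items():
        new = [f"{path}/{subdir}" for path in store_paths]
        # Keep any existing entries, dropping empty ones.
        old = [entry for entry in result.get(var, "").split(":") if entry]
        result[var] = ":".join(new + old)
    return result


# Example with a made-up store path:
env = extend_env(
    {"PATH": "/usr/bin"},
    ["/nix/store/aaa-hello"],
    {"LD_LIBRARY_PATH": "lib"},
)
```

The resulting dictionary would be passed as the environment of the program
started under the VM's UID.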
There are some predefined wrappers to launch a program requiring, for
example, a D-Bus session in a VM-local D-Bus session.
A special type of wrapper is restricted-environment setup, to reduce
the ability of a program to attack the surrounding VM.
(The VM might eventually run a different OS kernel type than the host.)
Special kinds of resources:
Expected to be handled by the VM: DRI passthrough. Sound passthrough
(wrapper to launch full isolated PulseAudio).
GUI is handled via virtio-wayland.
Local sockets (such as ssh-agent): hopefully can be handled by the file
Network: the VM is connected by a socket to a virtual switch leading to a
firewall VM; the VM itself runs in a no-network environment. Cross-VM network
links do not correspond to any visible files inside the VM-surrounding
isolated environment. A firewall VM per ruleset (can be shared for
multiple Firefox instances with the same security profile). Firewall VMs
connect via a virtual switch to the network-handling VM; it may have
addressless filtering bridge VMs on both sides.
External storage: considered a special case of state storage; might not
support ACLs (so access control is purely by ro/rw bind mounts outside
and inside the VM).
Nontrivial external hardware: USB/PCI passthrough?
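
For the network item above, the "one firewall VM per ruleset" idea amounts
to keying firewall VMs by their rules, so that application VMs with the
same security profile share one. A minimal Python sketch follows; the class
name, the naming scheme, and the rule strings are assumptions made for the
example.

```python
class FirewallRegistry:
    """Maps each distinct ruleset to one firewall VM name (hypothetical)."""

    def __init__(self):
        self._by_ruleset = {}

    def firewall_for(self, rules):
        """Return the firewall VM for rules, allocating one if needed.

        Rule order is kept as part of the key, since order usually
        matters in firewall rulesets.
        """
        key = tuple(rules)
        if key not in self._by_ruleset:
            self._by_ruleset[key] = f"firewall-{len(self._by_ruleset)}"
        return self._by_ruleset[key]


registry = FirewallRegistry()
web = ["allow tcp 443", "deny all"]
first_browser = registry.firewall_for(web)
second_browser = registry.firewall_for(list(web))
mail_client = registry.firewall_for(["allow tcp 993", "deny all"])
```

Here both browser instances end up behind the same firewall VM, while the
mail client gets its own.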
Conversation on IRC has convinced me that this is the right thing to
do after all:
Jean-Phillipe, I'd be curious to hear your thoughts on the above
discussion, since you recommended block devices to me when we talked.
design.html | 29 ++++++++++++++---------------
1 file changed, 14 insertions(+), 15 deletions(-)
diff --git a/design.html b/design.html
index 4b96a41..dc14cfe 100644
@@ -43,23 +43,22 @@ one per application.
Each virtual machine will be generated by
a <a href="https://nixos.org/nix/">Nix</a> derivation, and will have a
completely immutable root file system. Persistent storage will be
-provided by virtual block devices, that arbitrary paths on the system
-can be mapped to from the host. There may be other writable mount
-points inside the virtual machine, but these will not persist between
-reboots of the VM. Using Nix to generate virtual machines allows them
-to be reproducibly built, rolled back, edited, and migrated as source
-code, rather than large, opaque virtual machine images.
+provided by mounting subdirectories of the global state directory into
+virtual machines. There may be other writable mount points inside the
+virtual machine, but these will not persist between reboots of the VM.
+Using Nix to generate virtual machines allows them to be reproducibly
+built, rolled back, edited, and migrated as source code, rather than
+large, opaque virtual machine images.
-Virtual block devices will also be defined in Nix, and block devices
-and applications will be <var>m</var>:<var>n</var>. Some virtual
-machines may have no persistent storage, or even write access to a
-disk, at all. In other cases, it might be desirable for multiple
-applications to be able to access the same device, such as a local
-mail store being shared by two mail clients. Other resources and
-permissions, such as network cards and USB controllers, will similarly
-be defined in Nix. There are three logical sections for the Nix
-configuration -- applications, which are just packages, resources
+State directories and applications will be <var>m</var>:<var>n</var>.
+Some virtual machines may have no persistent storage, or even write
+access to a disk, at all. In other cases, it might be desirable for
+multiple applications to be able to access the same device, such as a
+local mail store being shared by two mail clients. Other resources
+and permissions, such as network cards and USB controllers, will
+similarly be defined in Nix. There are three logical sections for the
+Nix configuration -- applications, which are just packages, resources
(virtual or physical devices), and <i>application instances</i>, which
are mappings between applications and accessible resources. This
structure allows users to have multiple instances of the same
This changes the URI for the application too, but that was never
published anywhere so I think it's fine.
Anything I've missed?
.../application.html | 0
nlnet-pet-2019-03/bom.txt | 8 ++++++++
2 files changed, 8 insertions(+)
rename nlnet-pet-2019-03.html => nlnet-pet-2019-03/application.html (100%)
create mode 100644 nlnet-pet-2019-03/bom.txt
diff --git a/nlnet-pet-2019-03.html b/nlnet-pet-2019-03/application.html
similarity index 100%
rename from nlnet-pet-2019-03.html
rename to nlnet-pet-2019-03/application.html
diff --git a/nlnet-pet-2019-03/bom.txt b/nlnet-pet-2019-03/bom.txt
new file mode 100644
@@ -0,0 +1,8 @@
+Spectrum -- Software Bill of Materials
+- Linux (notably KVM): <https://kernel.org/>
+- Nix: <https://nixos.org/nix/>
+- Nixpkgs: <https://nixos.org/nixpkgs/>
+- crosvm: <https://chromium.googlesource.com/chromiumos/platform/crosvm/>
+- A Wayland compositor (undecided which one)