Hi Thomas!
Thanks for keeping us updated. It's really great to have all this
written up to read, even though I'm only getting to it a month and a
half later.
On Wed, Jan 27, 2021 at 05:31:08PM +0000, Thomas Leonard wrote:
> I've made a bit of progress this week:
>
> It turns out that weston-terminal crashes sommelier if started when
> the clipboard is empty, due to trying to dereference NULL. I've
> patched it to fix that, and now I can run it directly under sommelier,
> without wayfire. I made a few other changes to sommelier too:
>
> - I switched to the latest version, which provides meson instead of
> common-mk for building. Also, they removed the demos and got rid of
> some bogus dependencies. That simplified the build a lot!
> - They switched to the stable XDG protocols, but then reverted it
> again. I unreverted it to get things going again. Not sure if I did it
> right (they migrated from C to C++ so the patch didn't apply
> directly).
This is great to know -- it sounds like maybe they're trying to make
Sommelier more widely usable? Will probably be a while before I get to
updating but this is very exciting.
> - I added xwayland to the VM and sommelier command, allowing X
> applications to run in the VM.
> - By default sommelier runs the program with an already-open socket,
> which doesn't work if the program (or its children) want to open
> multiple connections.
> I was able to fix that by using `--parent` mode, and getting rid of
> PEER_CMD_PREFIX (which just adds some chromium paths preventing it
> from working).
> - Note: in `--parent` mode it waits for the process to exit before
> processing events on the socket, so if you just run an application
> directly it will hang. I used `bash -c 'firefox &'` as the command as
> a work-around.
> - Some programs (e.g. firefox) refused to start because the protocol
> versions offered by sommelier were too old. I increased the version
> numbers and that's working now. It needs doing properly, though. e.g.
> I implemented the new "sl_host_surface_damage_buffer" by simply
> calling the old damage function, which is obviously not correct but is
> working for me so far!
> - Annoyingly, using `--parent` disables xwayland support. Maybe we
> should run xwayland manually, or use a second sommelier instance?
>
> In general, sommelier seems quite buggy and annoying. I guess it will
> need updating constantly to proxy every new wayland protocol. Yet it
> can't add any useful security because it runs inside the VM, and is
> therefore untrusted.
Yeah...
> Some other changes that I found useful:
>
> - I added the generated kernel modules directory to rootfs, which
> allows using all the normal features of Linux (e.g. ext4) in the VM.
Ah, yes, that would remove a lot of gotchas. I have avoided that so far
because I'm hoping to eventually build custom kernels that don't need
many modules, to reduce code size in each VM. But it would probably
make sense to do for now.
> - I switched from `bash` to `bashInteractive` as the VM shell, which
> gets cursor keys working.
Good catch! I'll make that change in Spectrum as well.
> - I wrote a Nix package to generate one script for each of my old
> qubes. So e.g. I can now run `qvm-start-shopping` to start my crosvm
> shopping instance, with its own /home LVM partition and IP address. It
> passes the network configuration using some new kernel parameters
> (alongside spectrumcmd).
> - I put each VM on its own point-to-point virtual network. These
> networks are set up by /etc/nixos/configuration.nix. That works well
> for my qubes-like VMs, though I guess spectrum will need something
> more dynamic.
> - I enabled the shared filesystem (VIRTIO_FS), which works nicely. I
> use it to provide a (separate) shared directory to each VM that I can
> access from the host.
> One problem is that the crosvm driver runs in a minijail with a
> uidmap that makes every file appear to be owned by root, so only root
> can write things in the VM.
> Possibly a newer kernel would help; later versions of the kernel
> docs say you can include any normal FUSE flags here, so mounting with
> `uid=1000` might work.
I've only looked into virtio-fs a little bit -- I remember having to
make a change to crosvm to make the sandboxing work. Glad to hear it's
working well. I'll find out if a later kernel works when I get to
updating Nixpkgs (or somebody else does -- someone on IRC was actually
offering to try doing this the other day).
> - Finally, I added a `vm-halt` command that just calls `reboot`, as I
> don't want to develop the habit of typing `reboot` without thinking
> ;-)
I don't want to think about how many times I've made this mistake, lol.
> If any of this sounds useful for spectrum let me know. I can try and
> tidy it up; it's all a huge mess at the moment!
I think it might well be -- the stuff you have going on with networking
and filesystems sounds great, in particular -- but I'll have to get a
bit more back into the project again to know exactly what.
Right now I'm focusing on slowly bringing myself back up to speed and
remembering what state things are in.
> Once this is working more smoothly, I guess the next issues will be
> setting up some kind of secure window manager on the host (e.g.
> labelling windows with the VM they come from, not allowing
> screenshots, etc). Would also be good to get sound forwarding working
> somehow (Qubes routes pulseaudio to all the VMs and gives you a mixer
> to control the levels for each, but I don't know how that worked). It
> also needs some kind of VM manager to keep track of which VMs are
> running. And some kind of IPC system like qrexec would be useful. Do
> you have thoughts or plans about how to do any of this?
The window manager is a part of this whole thing that makes me very
nervous. A secure window manager is very important for Wayland, and I'm
not sure how much I trust any of the existing ones to get it right. But
with Wayfire I'm hoping it'll at least be easy enough to implement stuff
like tagged/coloured windows for the proof of concept (since the
plugin API and stuff is Wayfire's niche), and I'm hoping at some point
somebody comes up with a security-focused Wayland window manager we can
switch to -- I'd love a Rust one, and there's work going on in that
area[1].
Not sure about IPC yet, but I recently read an article about PipeWire[2],
and that's been making me think a bit about audio. With PipeWire, they
seem to have cared about security from the start:
> To avoid the PulseAudio sandboxing limitations, security was
> baked-in: a per-client permissions bitfield is attached to every
> PipeWire node — where one or more SPA nodes are wrapped. This
> security-aware design allowed easy and safe integration with Flatpak
> portals; the sandboxed-application permissions interface now promoted
> to a freedesktop XDG standard.
And it gets better! In particular, this sounds very promising:
> a native fully asynchronous protocol that was inspired by Wayland —
> without the XML serialization part — was implemented over Unix-domain
> sockets. Taymans wanted a protocol that is simple and hard-realtime
> safe.
It goes on to say they use this for sending file descriptors and stuff.
The similarity to Wayland is very exciting, because it means we might
just be able to run PipeWire over the existing virtio_wl infrastructure
very efficiently.
It'll be a while before I get to look at audio in depth, but this all
sounds very good -- maybe most of the work will have been done for us!
In general I'm feeling very optimistic about a lot of the stuff going on
in the ecosystem to try to make Flatpak and co secure -- I don't trust
Flatpak itself to provide meaningful security, but it means we're
getting standard mechanisms for permissions for standard applications
(xdg-desktop-portal is another that comes to mind), and if this goes
well it means that all we have to do is provide implementations of those
standard interfaces that cross VM boundaries, and applications designed
to work in Flatpak etc. should already understand how to interact with
them.
[1]: https://smithay.github.io/pages/about.html
[2]: https://lwn.net/SubscriberLink/847412/f5595d3e8875ce5d/
I've made a bit of progress this week:
It turns out that weston-terminal crashes sommelier if started when
the clipboard is empty, due to trying to dereference NULL. I've
patched it to fix that, and now I can run it directly under sommelier,
without wayfire. I made a few other changes to sommelier too:
- I switched to the latest version, which provides meson instead of
common-mk for building. Also, they removed the demos and got rid of
some bogus dependencies. That simplified the build a lot!
- They switched to the stable XDG protocols, but then reverted it
again. I unreverted it to get things going again. Not sure if I did it
right (they migrated from C to C++ so the patch didn't apply
directly).
- I added xwayland to the VM and sommelier command, allowing X
applications to run in the VM.
- By default sommelier runs the program with an already-open socket,
which doesn't work if the program (or its children) want to open
multiple connections.
I was able to fix that by using `--parent` mode, and getting rid of
PEER_CMD_PREFIX (which just adds some chromium paths preventing it
from working).
- Note: in `--parent` mode it waits for the process to exit before
processing events on the socket, so if you just run an application
directly it will hang. I used `bash -c 'firefox &'` as the command as
a work-around.
- Some programs (e.g. firefox) refused to start because the protocol
versions offered by sommelier were too old. I increased the version
numbers and that's working now. It needs doing properly, though. e.g.
I implemented the new "sl_host_surface_damage_buffer" by simply
calling the old damage function, which is obviously not correct but is
working for me so far!
- Annoyingly, using `--parent` disables xwayland support. Maybe we
should run xwayland manually, or use a second sommelier instance?
In general, sommelier seems quite buggy and annoying. I guess it will
need updating constantly to proxy every new wayland protocol. Yet it
can't add any useful security because it runs inside the VM, and is
therefore untrusted.
Some other changes that I found useful:
- I added the generated kernel modules directory to rootfs, which
allows using all the normal features of Linux (e.g. ext4) in the VM.
- I switched from `bash` to `bashInteractive` as the VM shell, which
gets cursor keys working.
- I wrote a Nix package to generate one script for each of my old
qubes. So e.g. I can now run `qvm-start-shopping` to start my crosvm
shopping instance, with its own /home LVM partition and IP address. It
passes the network configuration using some new kernel parameters
(alongside spectrumcmd).
- I put each VM on its own point-to-point virtual network. These
networks are set up by /etc/nixos/configuration.nix. That works well
for my qubes-like VMs, though I guess spectrum will need something
more dynamic.
- I enabled the shared filesystem (VIRTIO_FS), which works nicely. I
use it to provide a (separate) shared directory to each VM that I can
access from the host.
One problem is that the crosvm driver runs in a minijail with a
uidmap that makes every file appear to be owned by root, so only root
can write things in the VM.
Possibly a newer kernel would help; later versions of the kernel
docs say you can include any normal FUSE flags here, so mounting with
`uid=1000` might work.
- Finally, I added a `vm-halt` command that just calls `reboot`, as I
don't want to develop the habit of typing `reboot` without thinking
;-)
If any of this sounds useful for spectrum let me know. I can try and
tidy it up; it's all a huge mess at the moment!
Once this is working more smoothly, I guess the next issues will be
setting up some kind of secure window manager on the host (e.g.
labelling windows with the VM they come from, not allowing
screenshots, etc). Would also be good to get sound forwarding working
somehow (Qubes routes pulseaudio to all the VMs and gives you a mixer
to control the levels for each, but I don't know how that worked). It
also needs some kind of VM manager to keep track of which VMs are
running. And some kind of IPC system like qrexec would be useful. Do
you have thoughts or plans about how to do any of this?
On Wed, 20 Jan 2021 at 13:04, Thomas Leonard <talex5(a)gmail.com> wrote:
>
> On Thu, 14 Jan 2021 at 12:51, Alyssa Ross <hi(a)alyssa.is> wrote:
> [...]
> > Oh, whoops, I missed your reply about having worked this out already!
>
> Yeah, disk and networking is OK now.
>
> I also managed to fix the fonts, by using `export FONTCONFIG_FILE
> /etc/fonts/fonts.conf`. By default, it didn't have a monospace font
> available, which was pretty hard to read in the terminal.
>
> I want to get wayland forwarding working next. For now, I'm using `ssh
> -Y` to my VM to forward X. It works, but it's a little slow.
>
> I set `export WAYLAND_DEBUG 1`, and tried weston-terminal again. That produced:
>
> [...]
> [446067.157] -> wl_region@21.destroy()
> [446067.481] -> wl_surface@16.set_input_region(wl_region@22)
> [446068.036] -> wl_region@22.destroy()
> [446068.412] -> wl_surface@16.attach(wl_buffer@24, 0, 0)
> [446069.190] -> wl_surface@16.damage(0, 0, 806, 539)
> [446070.141] -> wl_surface@16.commit()
> [446070.531] wl_keyboard@20.keymap(1, fd 8, 48869)
> [ 1.796076] sommelier[88]: segfault at 30 ip 00007fa5376062c0 sp
> 00007ffe128592c8 error 4 in
> libwayland-client.so.0.3.0[7fa537604000+6000]
> [ 1.798026] Code: ff ff ff 5d 41 5c c3 0f 1f 00 48 8d b7 d0 00 00
> 00 e9 e4 df ff ff 0f 1f 40 00 48 89 77 30 c3 66 66 2e 0f 1f 84 00 00
> 00 00 00 <48> 8b 47 30 c3 66 66 2e 0f 1f 84 00 00 00 00 00 8b 47 40 c3
> 66 66
--
talex5 (GitHub/Twitter) http://roscidus.com/blog/
GPG: 5DD5 8D70 899C 454A 966D 6A51 7513 3C8F 94F6 E0CC
>to handle 9P over vsock, but I haven't tested yet. We can use existing
>virtiofsd and 9P software (there are promising Rust implementations of
>each), and harden them against potential vulnerabilities like directory
>traversals using kernel features like RESOLVE_BENEATH and
>RESOLVE_NO_XDEV. For the boot device, maybe there's no reason not to
Also, if the server is in a namespace seeing only a bind mount to the
necessary part of the FS, in a VM that only sees that one FS, the cheap
attacks just become moot. You can probably talk it into traversal, but
it doesn't see more than allowed anyway; talking it into attacking the
VM kernel is hopefully harder (and still has limited impact).
>just mount it using the host kernel, or maybe there's something to be
>gained by just reading a small bootstrap payload into memory from the
>start of the disk once, and then making all future communication go via
>a VM. I'm not really sure yet. But the important thing is we'll have
>mechanisms for all this in place. Maybe we'll decide that non-boot
>devices should just go over inter-VM 9P, but in any case, we'll still
>need all these pieces.
Can virtiofs eventually be backed by a VM-wrapped vhost-user?
Although we probably do want host-side page cache, as a VM's requests to
the host are way more transparent for the scheduler than inter-VM requests.
>computers I've tried it on so far. I suspect that I will get GPU
>isolation working, but I'm not sure how reliable or performant it will
>be.
Hmm. It's also a good question what the timeslice for inter-VM
communication is. Does it make sense to have two VMs alternate for
slices of ten milliseconds? That's just about what would be needed for
25fps video playback...
>I'm pushing quite hard to make it over the line with my hardware
>isolation funding milestone. I'm so close, and I'm about to need the
>money. But once I've hit that, I think I'm going to need a break. This
>stuff is gruelling.
I wish you strength for this push!
Last week I wasn't feeling well, so there was no This Week in Spectrum.
crosvm
------
Where we left off, I had been attempting to port vhost-user-net support
from cloud-hypervisor to crosvm. I'd been trying to port the first
incarnation of the code in cloud-hypervisor to the contemporary version
of crosvm from when it was added, thinking that that would be easier
because the two codebases would be closer together. But I ran into the
problem that
this earliest incarnation of the vhost-user-net code from
cloud-hypervisor didn't actually work (at least with the backend I was
attempting to test it with). I'd been attempting to figure out exactly
which changes were required to make it work, but hadn't been successful
with that yet, and I thought I'd probably need to start the port over,
from the latest cloud-hypervisor and crosvm code.
The next day, I decided to give my previous strategy one more try,
though, and an hour or two later, I found the required cloud-hypervisor
change, applied it to crosvm, and it worked! So I now have a crosvm
tree capable of vhost-user-net[1].
This means that it's looking good for my plans for inter-guest
networking, and network hardware isolation. With that in place, I
decided to start thinking about other kinds of hardware isolation and
inter-VM communication, and that's what I did for most of the last two
weeks. Let's go through them:
Files will be shared between VMs using virtio-fs. This has the
unique feature of (soon) being able to bypass guest page caches, and
have only a single shared cache between VMs. This brings a performance
improvement, but as I understand it, should also reduce memory
consumption because each VM won't have to maintain its own copy of a
disk-backed page. Of course, this feature (DAX) is also a big side
channel, so it won't be appropriate for all use cases. But I think for
some things people want to do with Spectrum, this will be very
important.
The problem with this is that, because it uses the page cache of the
host kernel, the host has to know about the filesystem that's being
shared -- there's no running virtiofsd in a VM if we want DAX. But I'd
really like it if a (non-boot) block device could be used as a
filesystem without the host having to actually talk to the device. I
was stuck here, but edef pointed out to me that we could use the
kernel's 9P support to attach the block device to a VM, and then
mount the filesystem on the host over 9P, either over a network
connection or (ideally) vsock. It looks like the kernel should be able
to handle 9P over vsock, but I haven't tested yet. We can use existing
virtiofsd and 9P software (there are promising Rust implementations of
each), and harden them against potential vulnerabilities like directory
traversals using kernel features like RESOLVE_BENEATH and
RESOLVE_NO_XDEV. For the boot device, maybe there's no reason not to
just mount it using the host kernel, or maybe there's something to be
gained by just reading a small bootstrap payload into memory from the
start of the disk once, and then making all future communication go via
a VM. I'm not really sure yet. But the important thing is we'll have
mechanisms for all this in place. Maybe we'll decide that non-boot
devices should just go over inter-VM 9P, but in any case, we'll still
need all these pieces.
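As an illustration of the kind of hardening I mean, here's a minimal
sketch of opening a path with openat2(2) so that lookups can't escape
the shared directory or cross onto another filesystem. It assumes the
Rust libc crate; the struct layout and RESOLVE_* values are copied from
linux/openat2.h, and none of this is code from virtiofsd or any 9P
server:
    // Sketch: open `path` relative to `dirfd`, refusing ".." escapes,
    // absolute symlinks, and mount-point crossings.  The struct and the
    // RESOLVE_* values mirror <linux/openat2.h>; error handling is minimal.
    use std::ffi::CString;
    use std::io;
    use std::os::unix::io::RawFd;

    #[repr(C)]
    struct OpenHow {
        flags: u64,
        mode: u64,
        resolve: u64,
    }

    const RESOLVE_NO_XDEV: u64 = 0x01; // don't cross mount boundaries
    const RESOLVE_BENEATH: u64 = 0x08; // don't escape the dirfd subtree

    fn open_beneath(dirfd: RawFd, path: &str) -> io::Result<RawFd> {
        let path = CString::new(path)?;
        let how = OpenHow {
            flags: libc::O_RDONLY as u64,
            mode: 0,
            resolve: RESOLVE_BENEATH | RESOLVE_NO_XDEV,
        };
        let ret = unsafe {
            libc::syscall(
                libc::SYS_openat2,
                dirfd,
                path.as_ptr(),
                &how as *const OpenHow,
                std::mem::size_of::<OpenHow>(),
            )
        };
        if ret < 0 {
            Err(io::Error::last_os_error())
        } else {
            Ok(ret as RawFd)
        }
    }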
GPU isolation should be possible by forwarding the GPU to a VM, but
there are a few problems here. The first is that it would mean rendered
surfaces have to be copied via shared memory to the VM with the GPU,
before being sent to the GPU. Additionally, sharing the GPU between VMs
for rendering at all would require significantly more work. The result
of this is that graphics performance using an isolated GPU will probably
be poor, at least for now. The final problem is that passthrough of
integrated GPUs seems to be very difficult to get right. I will
probably need to acquire some hardware that I've seen a report of this
working on, so I can figure out what I've been doing wrong on the two
computers I've tried it on so far. I suspect that I will get GPU
isolation working, but I'm not sure how reliable or performant it will
be.
For generic USB devices, I expect to be able to take an approach similar
to Qubes[2], having a VM to handle interactions with the hardware USB
controller, and exposing individual USB devices over USB/IP to other
VMs. It would be nice if I could use vsock for this too.
[1]: https://spectrum-os.org/git/crosvm/?h=vhost-user-net
[2]: https://www.qubes-os.org/doc/usb-devices/
spectrum-os.org
---------------
Philipp registered a Matrix room and bridged it to the #spectrum IRC
channel. I'm told that this should make it easier for Matrix users to
join the room, since some bug in Matrix's IRC bridge prevents people
from joining from Matrix the usual way. Philipp also sent a patch[3] to
improve the instructions for Matrix users joining the channel on the
website. Thanks Philipp!
[3]: https://spectrum-os.org/lists/archives/spectrum-devel/87wo247zu7.fsf@alyssa…
QEMU
----
I sent the previously requested patch[4] to resolve ambiguities in the
vhost-user spec. No response yet, though. I'll probably resend it some
time soon.
[4]: https://lore.kernel.org/qemu-devel/20200813094847.4288-1-hi@alyssa.is/
I'm finding it hard to keep going at the moment. The stuff I'm doing
now is probably the hardest part of implementing Spectrum, and it's
frustrating to realise that not everything I want to do is going to be
possible. So much of the KVM ecosystem assumes that things will be
host<->guest, and there's not always an easy solution. But, whatever we
end up with, it's going to be a lot better than what I'm using today,
and what lots of other people are using today. I think I'm going to be
able to deliver a good experience with a fairly high degree of
protection against malicious hardware. But it's not going to be
perfect.
I'm pushing quite hard to make it over the line with my hardware
isolation funding milestone. I'm so close, and I'm about to need the
money. But once I've hit that, I think I'm going to need a break. This
stuff is gruelling.
Last week, I'd just finished getting the cloud-hypervisor vhost-user-net
frontend code to build as part of crosvm, and the next step was testing
it.
crosvm
------
I wrote some hacky code that replaced the virtio-net device creation in
crosvm with an instance of the ported vhost-user-net code. When I
booted crosvm, there were some of the expected simple oversights of mine
that needed to be addressed, but once those were taken care of, it still
didn't quite work. The VM boots, sees a network interface, and even
communicates with the vhost-user-net backend! But the vhost-user-net
code never realises/gets told that it has traffic, and so the traffic is
never processed. Unsure of what to do about this,
I decided to turn to cloud-hypervisor and look at how the code ran
there.
cloud-hypervisor
----------------
I wanted to try running the cloud-hypervisor v-u-n backend I was using
for testing (because it's much simpler than DPDK -- it just sends
traffic to a TAP device) with QEMU as the frontend, because QEMU is a
VMM I'm much more familiar with than cloud-hypervisor, and I thought it
would be useful to have a working frontend/backend combination to
compare to.
I had some problems, though, because apparently nobody had ever wanted
to use QEMU with the cloud-hypervisor vhost-user-net backend before --
or if they had, they hadn't wanted to enough to make it work. The
cloud-hypervisor backend didn't implement the vhost-user spec correctly
in a few subtle ways that made it incompatible with QEMU. I won't
explain every subtle issue, but I ended up writing a few patches[1][2]
for cloud-hypervisor and the "vhost" crate it depends on (that is in the
process of being moved under the rust-vmm umbrella).
One interesting issue I will go into a little detail of was that the
wording in the spec was a little unclear, and QEMU interpreted it one
way, and cloud-hypervisor the other. I ended up sending an email[3] to
the author of the spec asking for clarification. He answered my
question, and we discussed how the wording could be improved. He liked
my second attempt at improving the wording, and asked me to send a patch,
but preferably not right now, because QEMU is currently gearing up for a
release, scheduled for next week if everything goes well.
Since I wrote these cloud-hypervisor patches, and had to test them, I
ended up having to learn how to use cloud-hypervisor anyway to make sure
I hadn't broken it in fixing the backend up to work with QEMU. Oh well.
Once this was done, I could use both QEMU and cloud-hypervisor with the
backend, but not crosvm. But it was a little more complex than that.
When I ported the v-u-n code to crosvm, I ported the first version of it
that was added to the cloud-hypervisor tree, rather than the latest
version. The theory here was that the earlier version would be closer
to crosvm, because cloud-hypervisor would have had less time to diverge.
Then, once I had that working, I could add on the later changes
gradually. What I didn't account for here is that the initial version
of the v-u-n frontend in cloud-hypervisor didn't really work properly,
and needed some time to bake before it did. So having now had this
experience I think it might be better to try to port the latest version,
and accept that porting might be a bit harder, but the end result is
more likely to work.
[1]: https://github.com/cloud-hypervisor/vhost/pull/22
[2]: https://github.com/cloud-hypervisor/cloud-hypervisor/pull/1565
[3]: https://lore.kernel.org/qemu-devel/87sgd1ktx9.fsf@alyssa.is/
libgit2
-------
While bisecting cloud-hypervisor to see if I could figure out when the
v-u-n frontend started working properly, I encountered a large section
of commits that I couldn't build any more, because Cargo couldn't
resolve a git dependency. The dependency was locked to a commit that
was no longer in the branch it had been in when the cloud-hypervisor
commit was from. Despite knowing the exact commit it needed, Cargo
fetched the branch the commit used to be on. This is because it is
generally not possible to fetch arbitrary commits with git. Some
servers, like GitHub, do however allow this, and I wondered why Cargo
wouldn't at least fall back to trying that.
As it turns out, it actually couldn't do that, though! Cargo uses
libgit2, and libgit2 doesn't support fetching arbitrary commits. So I
wrote a quick patch to libgit2 to support this[4]. It's only a partial
implementation, though, because I don't find libgit2 to be a
particularly easy codebase to work in (although it's better than git!).
So I'm hoping somebody who knows more about it than me will help me
figure out how to finish it.
[4]: https://github.com/libgit2/libgit2/pull/5603
Next week, I'm hoping that I'll be able to get vhost-user-net in
crosvm working. I think this will probably mean porting the code again,
using the latest version. Which is a bit of a shame, but at least I
have an idea of what to do next.
I am, overall, feeling pretty optimistic, though. I'm pretty confident
that we can get some sort of decent but imperfect network hardware
isolation even though virtio-vhost-user might not be ready yet, which
was something I was worried about before. I don't want to really go
into detail in that now though because this is already a long email and
it's already a day late because I was tired yesterday, but essentially,
we could forward the network device to a VM that would run the driver,
and forward traffic back to the host over virtio-net. The host could
handle this either in kernelspace or userspace with DPDK, but the
important thing is that the only network driver it would need to support
would be virtio-net. No talking to hundreds of different Wi-Fi cards
and hoping that none of the drivers have a vulnerability. So, not
perfect compared to proper guest<->guest networking, but a step in the
right direction, and one that should be as simple as possible to upgrade
to virtio-vhost-user once that becomes possible.
DPDK
----
Last week, I'd just figured out how to do a normal vhost-user setup with
a QEMU VM connected to DPDK. This week, I wanted to try to move DPDK
into another VM using the experimental virtio-vhost-user driver, taking
the host system out of the networking equation altogether.
In theory this should have been a very simple change, but I couldn't get
it to work. DPDK claimed to be forwarding packets to the ethernet
device I'd attached to the backend VM (the one running DPDK), but
networking in the frontend VM (what you might think of as the
application VM) didn't work at all. It tried and failed to do DHCP, and
so couldn't progress beyond that.
A breakthrough came when I thought to look at the logs of my local DHCP
server. I saw that it was actually receiving requests from the VM, and
assigning it an IP address. Once I realised this, I hypothesised that
outgoing traffic was working, but not incoming.
Finally having something to look for, I had a look through the DPDK
virtio-vhost-user driver[1], and my suspicion was confirmed in an
unexpected way. It looks like incoming traffic (from the perspective of
the virtio-vhost-user frontend) is not actually implemented at all!
But with outbound traffic working, I'm confident enough that I
understand virtio-vhost-user to be able to leave this here for now.
From Spectrum's side, I can now be pretty sure that everything
should be workable, so we can just wait a bit for virtio-vhost-user to
get a bit further along and then revisit it. And since the frontend
has no idea it's talking to virtio-vhost-user instead of normal
vhost-user, we can use normal (host-based) vhost-user for now, and drop
virtio-vhost-user in down the line.
A couple of outstanding questions I still don't know the answer to about
DPDK are:
- How will routing work if I have multiple frontend VMs with multiple
virtio-vhost-user connections all wanting to use the same network
device? Will I want to use something like Open vSwitch[2] for that?
- DPDK by default uses a busyloop to check for data to process, for
efficiency. This is obviously not appropriate for a
workstation-focused operating system. There is an interrupt-based
mode, though; I don't know how to use it yet.
Since I consider the concept proven, though, I'm going to punt on these
for now. The longer I leave these questions, the more likely it is that
a kernel driver for virtio-vhost-user will emerge and we can use that
instead. That's not to say I want to leave inter-guest networking
hanging forever, but I have other inter-guest networking bits I can
switch focus to for now, and once those are done I can revisit the
virtio-vhost-user backend situation.
[1]: https://github.com/ndragazis/dpdk-next-virtio/blob/2d60e63/drivers/virtio_v…
[2]: https://www.openvswitch.org/
crosvm
------
I started integrating the vhost-user-net code from Cloud Hypervisor into
crosvm. I'm at the point where I can get all the copied Cloud
Hypervisor code to compile in crosvm, which is pretty good! I have not
yet written the code to actually start one of these devices, though, so
I haven't been able to test it.
It's been interesting to look at Cloud Hypervisor because it's a
codebase that is heavily based on crosvm (even more so than Firecracker
is), but that has also evolved and diverged from it. It's especially
interesting to see stuff where parallel evolution occurred between the
crosvm and Cloud Hypervisor codebases, or when Cloud Hypervisor changed
how some crosvm code worked, and then later changed it back again.
The codebases were still similar enough that I could have the
cloud-hypervisor device integrated into the crosvm codebase in a day,
although there's lots of code duplication that will have to be dealt
with -- I copied over a bunch of supporting code rather than trying to
integrate it into the crosvm equivalents to get the code running for the
first time in an environment as similar as possible to the one it was
designed for. I expect that when I test the device in crosvm it'll
probably work fairly quickly if not first try. The more complicated
part will be a bit of a change to how crosvm does guest memory that
isn't strictly necessary but is important for security.
crosvm allocates all guest memory in a single memfd. This means that,
to share guest memory with another process, like when using vhost-user,
the only option is to share all of guest memory. This would sort of
defeat the purpose of hardware isolation in Spectrum! But from what I
could tell -- I'm not 100% on this -- the guest memory abstraction in
cloud-hypervisor is more advanced, and I think it might support multiple
memfds backing guest memory for this sort of thing. I'll have to adapt
crosvm to that model to be able to use vhost-user securely.
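To make the idea concrete, here's a rough sketch (not crosvm's actual
guest memory code, and assuming the Rust libc crate) of backing each
guest memory region with its own memfd, so that a vhost-user device
process could be handed only the fd for the region it actually needs:
    // Sketch: one memfd per guest memory region.  Only the fd for the
    // region a vhost-user backend needs would be sent over the vhost-user
    // socket (with VHOST_USER_SET_MEM_TABLE); the rest stay private.
    // Error handling is simplified.
    use std::ffi::CString;
    use std::io;
    use std::os::unix::io::RawFd;

    struct Region {
        fd: RawFd,
        addr: *mut libc::c_void,
        size: usize,
    }

    fn create_region(name: &str, size: usize) -> io::Result<Region> {
        let name = CString::new(name)?;
        unsafe {
            let fd = libc::memfd_create(name.as_ptr(), libc::MFD_CLOEXEC);
            if fd < 0 {
                return Err(io::Error::last_os_error());
            }
            if libc::ftruncate(fd, size as libc::off_t) < 0 {
                return Err(io::Error::last_os_error());
            }
            let addr = libc::mmap(
                std::ptr::null_mut(),
                size,
                libc::PROT_READ | libc::PROT_WRITE,
                libc::MAP_SHARED,
                fd,
                0,
            );
            if addr == libc::MAP_FAILED {
                return Err(io::Error::last_os_error());
            }
            Ok(Region { fd, addr, size })
        }
    }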
website
-------
The new "Bibliography" page is up[3]! Lots of links to relevant resources
about concepts important to Spectrum. :)
[3]: https://spectrum-os.org/bibliography.html
It's a bit of a relief to have returned from the uncertain world of DPDK
to the familiar territory of crosvm. I'm confident that the next bit of
work here (vhost-user in crosvm) won't be that much of a big deal.
Hopefully, we'll have interim networking working to a reasonable degree
fairly soon. After that, I plan to look at file sharing, possibly with
vhost-user-fs (virtio-fs over vhost-user), which I noticed
cloud-hypervisor implements today. That should be pretty similar to the
networking stuff, although I don't think any virtio-fs virtio-vhost-user
code exists at the moment.
Feels like there's a pretty clear path forward. Nice feeling. :)
QEMU
----
Last week, as I wrote TWiS, I had just discovered virtio-vhost-user,
which looked like a very promising mechanism for getting a VM to take
care of networking for other VMs. This week, I've been researching it
further, and trying to test and evaluate it.
The first thing I tried to do, naturally, was to build the patched QEMU
tree and boot a VM with a virtio-vhost-user device attached. This was
not as easy as I'd hoped, because adding the virtio-vhost-user device to
my QEMU command line made the VM kernel panic at boot, with an error
message about an invalid memory access. I spent most of the week trying
to figure this out -- I wasn't doing anything different to the
example[1] on the QEMU wiki, so it should have worked, and it felt like
if I could just get past whatever was going wrong here, it would be
worth it, because virtio-vhost-user otherwise seems so suited for what
we need here. I emailed the patch author[2], but he didn't know what
was up either.
An early breakthrough came when I got frustrated with kernel builds
taking hours on my 8-year-old laptop, and so decided to work on a more
powerful computer instead. Once I got everything set up on that
computer, I started up the VM, and it worked. Perhaps in setting it up
over here I'd done something different? I copied over the exact VM
disk/kernel/initrd/command line that I was running on my laptop, and the
other computer booted it just fine. I had -cpu host in the QEMU command
line, so I thought maybe the different kind of virtual CPU was causing
it. Tried setting it to a specific value on both machines, and still
the laptop VM panicked and the other didn't. So it sounded like whether
it worked or not depended on the host hardware.
I put together a Nix derivation that would automatically build the
custom QEMU and output a script that would run a VM, and then asked
people in #spectrum to test it out on various computers. After getting
some further data, a pattern started to emerge, where Intel processors
Ivy Bridge and older would fail, and Skylake and newer would succeed (I
didn't encounter any AMD processors that failed, nor did I have data at
the time for generations between Ivy Bridge and Skylake). This theory
had a convenient explanation for why nobody else had seen this problem
-- I doubt people at Red Hat are working on 7-year-old hardware.
This was a good clue, but still didn't put me much closer to having a
working system. I do have a more recent laptop around, but for reasons
that are out of scope here it would be very inconvenient to decide to
just move over to it. I could see that the kernel was panicking the
first time it tried to access the PCI BARs of the virtio-vhost-user device,
which led me to believe that the problem was probably in how that memory
was being set up. I found the function that did that[3], and stared at
it for a long time. I tried to read the rest of the QEMU code, but it
became clear that my domain knowledge here isn't good enough to be able
to keep track of what's meant to be happening. I added some debug
prints, which were vaguely helpful in making that understanding a little
better.
I was hoping to find the guest address each PCI BAR was mapped to so
that I could check the kernel was trying to write to the right location,
but didn't manage to do that. While attempting to, though, I did add a
debug print that printed the size of each PCI BAR as it was allocated.
I noticed that most were small -- 16 MiB at most, but one was huge, at
64 GiB! The code that allocated this BAR was part of the function I'd
been staring at. As far as I could tell, the choice of size was pretty
arbitrary -- this big memory region was used as backing memory for all
sorts of small objects on the fly. On a whim, I tried changing the BAR
size from 1ULL << 36 to 1ULL << 26, and recompiled QEMU. The VM booted.
The comment above the bar_size definition that I'd been looking at for
so long said:
/* TODO If the BAR is too large the guest won't have address space to map
* it!
*/
I don't know if that's exactly what went wrong here, though. I suspect
it's more like the host architecture doesn't have enough address space?
The affected machines all reported 36 bit physical address size, and 48
bit virtual address size -- and 1ULL << 36 is 64 GiB, which is exactly
the whole of a 36-bit physical address space. So maybe what's happening
is that the processor interprets PCI addresses in the hardware-assisted
VM as physical addresses, and therefore runs out of space because all of
it is taken up by this one PCI BAR? I'm not really sure. Lowering the
BAR size to 2^35 or 2^34 (has to be a power of two) depending on the
QEMU version made the problem go away, and that's good enough for now.
I'm not very enthusiastic about this up-front allocation of a huge
amount of memory that might not even fit in the available address space.
I don't know if there's a better way of doing it in this case, but I
certainly hope so. In general I think this perhaps demonstrates why
this code is not considered suitable for "production" yet. The bet I'm
taking here is that by the time Spectrum is further along, things will
have moved on for virtio-vhost-user too. As I said, at some point we
will want to implement it in crosvm to avoid having QEMU in the TCB, but
it would be a bad idea to do that now while virtio-vhost-user is still
going through the back-and-forth of making its way into the Virtio spec.
[1]: https://wiki.qemu.org/Features/VirtioVhostUser
[2]: https://lore.kernel.org/qemu-devel/87h7u1s5k1.fsf@alyssa.is/T/#u
[3]: https://github.com/ndragazis/qemu/blob/f9ab08c0c8/hw/virtio/virtio-vhost-us…
DPDK
----
Once I was able to boot a VM with the virtio-vhost-user device, I tried
to connect another QEMU VM to it through vhost-user -- I'll want to have
this working first as a reference before I start porting Cloud
Hypervisor's vhost-user implementation to crosvm. But the "frontend"
(vhost-user) QEMU process hung waiting for a reply on the vhost-user
socket from the backend one. Not really knowing what to do about this,
I decided that maybe I'd been a bit too ambitious in going straight for
vhost-user <-> virtio-vhost-user when I'd never actually used vhost-user
before, so maybe I should try a more conventional vhost-user setup
first.
As far as I can tell, vhost-user is usually used for connecting a VM to
a userspace networking stack. And usually, this networking stack is
DPDK, the "Data Plane Development Kit"[4]. DPDK was also used in the
virtio-vhost-user examples, so I figured my next step would be to try
it there as well, and therefore it was worth the time to learn how to
do a very basic setup with it.
Quick-start-style documentation for this was pretty lacking, but I did
eventually manage to make this work. Here's what I did, for my own
future reference as much as anything else:
(1) Make some hugepages available. 1GiB for DPDK and 1GiB for QEMU:
echo 1024 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
(2) Take my ethernet interface offline so it could be used with DPDK:
nmcli d disconnect enp0s25
(3) Load the vfio-pci module, which allows PCI devices to be exported to
userspace rather than managed by the kernel:
modprobe vfio-pci
(4) Export the ethernet interface:
usertools/dpdk-devbind.py -b vfio-pci enp0s25
(5) Run testpmd, a program that comes with DPDK mostly used for
debugging and tracing it seems, but that with no special arguments
acts as a simple packet forwarder. Here I create a vhost-user
socket, and forward traffic between vhost-user and my ethernet
interface:
build/app/dpdk-testpmd -l 0,1 -w 00:19.0 \
--vdev net_vhost0,iface=/run/vhost-user0.sock
The -w value is the PCI address of the ethernet interface. Note how
"00:19" corresponds to "p0s25". (19 in hex is 25 is decimal.)
(6) Start a VM. The relevant QEMU flags appear to be:
-chardev socket,id=char0,path=/run/vhost-user0.sock \
-netdev type=vhost-user,id=net0,chardev=char0,vhostforce \
-device virtio-net-pci,netdev=net0 \
-object memory-backend-file,id=mem0,size=1024M,mem-path=/dev/hugepages,share=on \
-numa node,memdev=mem0 \
-mem-prealloc
I figured this all out mostly from a guide for a DPDK benchmark[5]. I
have not yet experimented with variations on the QEMU flags. I'm
not sure if all the memory flags are required -- -mem-prealloc might
just be there because it was important for a benchmark, for example.
So this is the point I'm at with this exploration. Next up, I'll be
trying with DPDK inside a VM with a virtio-vhost-user device. I think
that maybe, despite the virtio-vhost-user device showing up as an
ethernet device inside the VM, it needs some special support which is
available for DPDK as a patchset, but that has not been written for the
kernel yet. I was a bit worried about this, because unlike the kernel,
DPDK isn't going to have drivers for all sorts of different Wi-Fi
hardware, and so using DPDK instead of the kernel network stack would be
a problem. But then I learned that DPDK has a component called the
Kernel NIC Interface (KNI) which allows it to use network
interfaces from the kernel, so a hybrid approach would be possible, and
is what I think we'll end up using for now. Then, once
virtio-vhost-user is a bit more mature, a kernel driver will probably
show up, and we can use that instead and drop DPDK.
[4]: https://dpdk.org/
[5]: https://doc.dpdk.org/guides/howto/pvp_reference_benchmark.html?highlight=pvp
Website
-------
I was having a conversation about Spectrum yesterday, and I found myself
sending over a bunch of links to articles and papers that I often find
myself referring to when talking to somebody about Spectrum. This made
me think that maybe there should be some place where we keep all these
relevant articles. So I mined the IRC logs, the TWiS archive, and my
blog, and added whatever I could pull from my brain, and wrote a
Spectrum bibliography, containing 27 links to interesting articles and
papers that are particularly relevant to Spectrum.
This isn't on the website quite yet, but I did send this as a patch[6]
to the mailing list, if you want an early look.
I also posted a patch to fix a minor issue where I'd mistakenly used
".." instead of "." as href values, to no user-visible effect[7].
[6]: https://spectrum-os.org/lists/archives/spectrum-devel/20200726045701.32259-…
[7]: https://spectrum-os.org/lists/archives/spectrum-devel/20200726055410.20641-…
Documentation
-------------
On Monday, I had a call with the Free Software Foundation Europe.
They're a part of NGI Zero (where my funding comes from), and they are
promoting their new "REUSE" specification[8] for license information in
free software projects to NGI Zero projects. It basically covers
standardised per-file license and copyright annotations, and a standard
way of including license texts.
I think this is really cool! It's something I've been unsure of how to
handle because it's all vague conventions that are different in
different circles, and it's nice to see something formalised about it.
They also have an automated tool[9] for checking compliance and
semi-automatically adding license information, which is great!
So I'm enthusiastically adopting the REUSE specification. I decided
that our smaller, first-party repositories (the documentation, the
website, etc.) would be a good place to get started, and so I posted a
patch[10] that makes the documentation repository REUSE-compliant.
[8]: https://reuse.software/
[9]: https://git.fsfe.org/reuse/tool
[10]: https://spectrum-os.org/lists/archives/spectrum-devel/20200726105527.27432-…
mktuntap
--------
I posted a patch[11] to make mktuntap REUSE-compliant.
[11]: https://spectrum-os.org/lists/archives/spectrum-devel/20200726110123.30159-…
The thing that's most on my mind this week is the extent to which I'm
learning about and working on software like QEMU and DPDK that I don't
see having a place in Spectrum in the long run. It's counterintuitive,
but this is definitely worth it. There's no point writing a kernel
driver for virtio-vhost-user (should such a thing be required) right
now, because if I use DPDK for now instead, at some point either
virtio-vhost-user will end up not being the thing that gets adopted by
the ecosystem and we'll have to move to something else, or (more likely)
it gets widely adopted and somebody else writes a kernel driver.
Similarly, using QEMU for network VMs is the smart choice even though
I don't want it to end up in the TCB, because even though I'm probably
going to end up implementing virtio-vhost-user in crosvm later, swapping
out QEMU is going to be so easy later that it would be a very bad idea
to implement that now in case virtio-vhost-user doesn't take off. But
it still /feels/ weird to be using QEMU for this stuff, you know?
This has been a week of thinking I wanted to do one thing, not being
sure how to do it, and finding out that there was a better way. I'll
write it up in the order it happened.
crosvm
------
Last week, I described that I wanted to implement a virtio proxy to be
able to allow a kernel in an application VM to use a virtual device in
another VM. I was wondering how to manage virtio buffers, and thought
that I probably wanted an allocator to be able to manage throwing
buffers of different sizes around.
This turned out to be a case of the XY problem[1]. I couldn't find a
good solution, but it turned out that an allocator wasn't what I wanted
anyway. edef pointed out that I could just make the shared memory I
allocated as big as necessary to hold buffers of the maximum size I
wanted to support. The kernel will only actually allocate pages as they
are written to, and I could use fallocate[2] with FALLOC_FL_PUNCH_HOLE
to tell the kernel it can drop pages when I'm done with them. This
would mean that an unusually large buffer would only take up lots of
memory while it was in use, and as soon as it was done with, the kernel
would be able to take back the memory. So exactly what I wanted from an
allocator, but with no need for an allocator at all!
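For illustration, here's a minimal sketch of the hole-punching part
(assuming the Rust libc crate; in practice the offset and length would
be page-aligned):
    // Sketch: give the pages backing a finished buffer slot back to the
    // kernel, without shrinking the file or disturbing the mapping.
    use std::io;
    use std::os::unix::io::RawFd;

    fn release_buffer(memfd: RawFd, offset: u64, len: u64) -> io::Result<()> {
        let ret = unsafe {
            libc::fallocate(
                memfd,
                libc::FALLOC_FL_PUNCH_HOLE | libc::FALLOC_FL_KEEP_SIZE,
                offset as libc::off_t,
                len as libc::off_t,
            )
        };
        if ret < 0 {
            Err(io::Error::last_os_error())
        } else {
            Ok(())
        }
    }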
This made the implementation much simpler, and by Friday I was able to
get the proxy into a state where it could pass unit tests that
transported messages in both directions through it.
And then it was suggested to me that maybe a virtio proxy is not what I
want after all.
The main disadvantage to a virtio proxy is that it requires context
switching to the host to send data between VMs. This is a trade-off I
was aware of, but a virtio proxy is pretty straightforward to write as
inter-VM communication systems go, and I was not aware of anything else
that would be up to the job. As it turns out, there is something.
vhost-user is a mechanism for connecting, say, a virtio device to a
userspace network stack in a performant way. I was aware of this, but
what I was not aware of was virtio-vhost-user[3]. virtio-vhost-user is
a proposed mechanism to allow a VMM to forward a vhost-user backend to a
VM. This means that two VMs could directly share virtqueues, with no
host copy step. This would mean there would be no opportunity for the
host to mediate communication between two guests, but that wasn't really
on the cards anyway -- if it's ever required, a virtio proxy would
probably be the way to go. For all the other cases, virtio-vhost-user
would be a faster, cleaner way of sharing network devices between VMs.
The main problem with virtio-vhost-user is that it's still in its
infancy. There's a patchset[4] implementing it for QEMU that's a couple
of years old, but that has not been accepted upstream. The main blocker
for this seems to be first standardising it in the Virtio spec[5][6]. The
good news here is that the standardisation process seems to be
progressing actively at the moment. It's being discussed on the
virtio-dev mailing list basically right now, with the most recent emails
dated Friday (unfortunately, I don't know of a good web archive with
virtio-dev, but you can find the thread on Gmane if you're interested
but not subscribed to the list).
The good news is that virtio-vhost-user mostly works by composing things
that already exist. There's no kernel work required, because devices
are just exposed by the VMM as regular virtio devices. The frontend VM
(i.e. the one that uses the virtual device, as opposed to the one that
provides it) doesn't need any special virtio-vhost-user support, because
it just needs to speak normal vhost-user. Only the backend VM needs
support for virtio-vhost-user, because its VMM needs to expose the
vhost-user backend from the host to that VM.
This means that provisionally using virtio-vhost-user in Spectrum
actually looks very feasible, with a couple of compromises. For
evaluation purposes, it's not worth writing a virtio-vhost-user device
for crosvm. But, the VMs that need that device are the ones that are
very specialised -- VMs that manage networking or block devices or
similar. So for these VMs, for now, we could use QEMU, with the
virtio-vhost-user patch. I investigated what it would take to port it
to the most recent QEMU version, and the answer appears to be "not much
at all". Obviously having two VMMs in the Trusted Computing Base (TCB)
isn't something we'd want in the long term, but it would be fine for,
say, reaching the next funding milestone. If we decide that
virtio-vhost-user is the way to go after all, support in crosvm can be
added then -- in general, adding a new virtio device to crosvm isn't a
huge undertaking.
Earlier, I said that the application side of the communication doesn't
need anything special, because to that it's just regular vhost-user.
This is true, but I glossed over there that crosvm doesn't actually
implement vhost-user. Implementing vhost-user in crosvm would probably
be a big deal at this stage, and not something I feel would be a good
use of my time. BUT! Remember, crosvm has two children: Amazon's
Firecracker[7], which aims at so-called "serverless" computing; and
Intel's Cloud Hypervisor[8], which aims at traditional, full system
server virtualisation. Both of these children inherited the crosvm
device model from their parent, and Cloud Hypervisor implements
vhost-user[9].
So I _think_ it should be possible to pretty much lift the vhost-user
implementation from Cloud Hypervisor, and use it in crosvm. Pretty
neat!
So, the setup I'd like to evaluate is QEMU with the virtio-vhost-user
patch on one side, and crosvm with Cloud Hypervisor's vhost-user
implementation on the other.
It might well be that there are complications here. If there are, I'll
probably just finish the proxy and move on for now, because I want to
keep up the pace. I do think that virtio-vhost-user is probably the
way to do interguest networking in the long-term, though.
Another thing that I've realised is that I don't need to worry about
pulling bits out of crosvm to run in other VMs. I focused a lot on that
towards the beginning of the year, mostly motivated by Wayland, because
the virtio wayland implementation in crosvm is the only one there is.
Now that that works in a different way, though, there's no need to
continue down this path, because things like networking can be done in
more normal ways through virtio and the device VM kernel.
[1]: https://en.wikipedia.org/wiki/XY_problem
[2]: https://man7.org/linux/man-pages/man2/fallocate.2.html
[3]: https://wiki.qemu.org/Features/VirtioVhostUser
[4]: https://github.com/stefanha/qemu/compare/master...virtio-vhost-user
[5]: https://lists.nongnu.org/archive/html/qemu-devel/2019-04/msg03082.html
[6]: https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.…
[7]: https://firecracker-microvm.github.io/
[8]: https://github.com/cloud-hypervisor/cloud-hypervisor
[9]: https://github.com/cloud-hypervisor/cloud-hypervisor/blob/b4d04bdff6a7e2c3d…
Overall, it's been frustrating for me to try things, and discover
they're not going to work, or not going to work as well as some other
thing, and make a call on whether to keep going on something I know is
the worse option or switch to the better thing. I have to keep
reminding myself that Spectrum is a research project, and there are
always going to be false starts like this. Lots of what we're doing is
either very unusual (virtio-vhost-user) or brand new (interguest
Wayland), after all.
After I got an isolated Wayland compositor working last week, I wasn't
really sure what to do next -- this was a big piece of work that I'd
been very focused on for a while. The funding milestone I'm closest to
is to do with implementing hardware isolation, which the Wayland work
was a part of, so I decided to keep going with that, and explore other
types of isolation. More on that in a bit.
Wayland
-------
Posted my patch for virtio_wl display socket support in
libwayland-server[1]. This is what allows it to run in a VM, and
receive connections from clients in other VMs. The patch description is
very extensive, so I recommend reading it for more detail if you're
interested.
It introduces a libvirtio_wl, which should also be useful for porting
other programs that we might want to communicate with across a VM
boundary, if they are written with normal Unix sockets in mind
(including transferring file descriptors). This is the evolution of
code I previously had put in wlroots, moved to Wayland for convenience.
If it ever acquires another user (or maybe even if it doesn't) it might
make sense to make it its own package, since virtio_wl is useful even if
Wayland isn't involved.
[1]: https://spectrum-os.org/lists/archives/spectrum-devel/SJ0PR03MB5581479F3388…
crosvm
------
I pushed all my crosvm changes to get the isolated compositor working to
the work-in-progress "interguest" branch[2]. Remember, I only got it
working last week right before I needed to start writing the TWiS email,
so I hadn't even done that yet! I also posted some patches[3] to the list
to fix a bug in my previous crosvm deadlock fix, and to improve some
related documentation. As usual, these were kindly reviewed by Cole.
Next, I turned my attention to other forms of hardware isolation.
Wayland was a bit special, because despite crosvm including a virtual
"Wayland device", it's not really hardware, and so it required an
approach to isolation that will be quite different to other crosvm
virtual devices. My hope is that other virtual devices should all be
substantially similar to each other.
The basic idea for actual hardware isolation is that rather than having
drivers in the host kernel for USB, network devices, etc. those will be
exposed to dedicated VMs as virtual PCI devices. This should
substantially reduce host kernel attack surface. crosvm virtual devices
will be run in these device VMs, and communicate over virtio with
application VMs as normal. This will require implementing in crosvm a
virtio proxy device, that allows the crosvm running an application
VM to forward virtio communication to the virtual device running in
userspace in the driver VM.
(The reason devices aren't attached to application VMs directly but run
in separate device VMs is that hardware is probably not going to be very
happy if multiple kernels are trying to talk to it at the same time.
Additionally, this indirection means that application VMs only have to
use the one virtio driver for that device category, rather than any of
the hundreds of drivers for different hardware in that category. If one
of those drivers had a vulnerability, this should help to contain it to
the device VM.)
So I started writing this virtio proxy. The basic idea is to copy
virtio buffers from application VM guest memory into memory that can be
shared with the userspace virtual device in the device VM. I can't find
any prior art on this (which is not unusual -- not many systems isolate
drivers in this way), so this has required a lot of looking back at the
virtio paper[4] and spec[5] to make sure I understand what to do here.
As I write this, the next problem to solve is integrating some sort of
memory allocator that can manage buffer allocations in the shared memory
that the virtual device looks at. This is a new area for me that I'd
appreciate advice on if anybody can give it -- think of it like this: I
have a memfd, mmapped into my process, and I would like to dynamically
allocate and release buffers of varying sizes within that region. I'm
sure there's a library I'll be able to plug in for this.
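To make the question concrete, the setup looks roughly like the sketch
below. It's Linux-specific, all the names are mine, and the bump
allocator is only a placeholder for whatever real allocator ends up
managing the region -- it never reuses freed space.

    /* Rough sketch of the situation: a shared memfd, mapped into the
     * process, out of which copies of virtio buffers need to be
     * allocated and released.  The bump allocator is a placeholder. */
    #define _GNU_SOURCE
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define REGION_SIZE (16 * 1024 * 1024)

    static uint8_t *region;
    static size_t next_free;

    static void *buffer_alloc(size_t len)
    {
        len = (len + 63) & ~(size_t)63;   /* keep buffers 64-byte aligned */
        if (next_free + len > REGION_SIZE)
            return NULL;   /* a real allocator would reuse freed space */
        void *p = region + next_free;
        next_free += len;
        return p;
    }

    int main(void)
    {
        int fd = memfd_create("virtio-proxy-buffers", MFD_CLOEXEC);
        if (fd < 0 || ftruncate(fd, REGION_SIZE) < 0)
            return 1;

        /* The same fd would also be shared with (and mapped by) the
         * crosvm running the device VM. */
        region = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
        if (region == MAP_FAILED)
            return 1;

        void *buf = buffer_alloc(4096);   /* room for one copied buffer */
        printf("buffer at offset %zu\n", (size_t)((uint8_t *)buf - region));
        return 0;
    }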
[2]: https://spectrum-os.org/git/crosvm/?h=interguest
[3]: https://spectrum-os.org/lists/archives/spectrum-devel/SJ0PR03MB55819DE7E13B…
[4]: https://www.ozlabs.org/~rusty/virtio-spec/virtio-paper.pdf
[5]: https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.…
As usual, big thank you to Cole for reviewing patches, and for finding
room for improvement even in languages/areas he isn't familiar with.
It feels nice to have done some thinking about the project at a slightly
higher level than I have been recently, and to know where I am on the
way to the next milestone. Having taken a lot of time away from the
milestone list this year to work on fundamentals, it's good to feel like
I'm getting back on track.
I really didn't want this to be another week where I posted about how I
was still trying to patch Wayland to do virtio_wl, and I am delighted to
have just discovered it's not going to be!
crosvm
------
I realised that emulating accept(2) for the Wayland compositor socket in
the way I'd planned would require some crosvm rework. I want to have a
host proxy program that accepts the connection, then passes the
connection socket to crosvm. I had made it possible to dynamically add
sockets to the crosvm Wl device through the control socket, but this
turned out not to be enough, because crosvm would store virtio_wl sockets
in a BTreeMap<String, PathBuf>, and then use connect(2) to connect to
the socket when asked to by the guest kernel. This works fine for
e.g. connecting to a host Wayland compositor, which is what crosvm was
designed for, but it wouldn't work for opening a connection socket from
accept(2), because you can only connect to a listening socket.
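(To make that constraint concrete: connect(2) against a Unix socket
that isn't listening fails with ECONNREFUSED. Here's a tiny standalone
illustration, with a made-up path:)

    /* connect(2) only succeeds against a socket that is listen(2)ing;
     * a socket that is merely bound (or already accepted) refuses the
     * connection.  The path here is made up. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    int main(void)
    {
        struct sockaddr_un addr = { .sun_family = AF_UNIX };
        strcpy(addr.sun_path, "/tmp/not-listening.sock");
        unlink(addr.sun_path);

        int server = socket(AF_UNIX, SOCK_STREAM, 0);
        if (bind(server, (struct sockaddr *)&addr, sizeof addr) < 0)
            return 1;
        /* Note: no listen(2) call here. */

        int client = socket(AF_UNIX, SOCK_STREAM, 0);
        if (connect(client, (struct sockaddr *)&addr, sizeof addr) < 0)
            perror("connect");   /* ECONNREFUSED */
        return 0;
    }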
So instead, I modified the `crosvm wl add' command to take a file
descriptor pointing to the connection socket. I made crosvm store
sockets as an enum that looks like this:
    enum WaylandSocket {
        // A path to connect(2) to on demand; can be reused repeatedly.
        Listening(PathBuf),
        // An already-connected socket passed in by fd; consumed by one use.
        NonListening(UnixStream),
    }
This way, when it gets asked by the VM to connect to a socket, it can
either connect to a listening socket at its path using connect(2), or
just use the existing file descriptor if it's a non-listening socket. A
NonListening socket will be consumed by a connection, so when the VM
close(2)s it, it'll go away, and on the host side the connection will
finish as expected. Listening sockets can be connected to repeatedly,
as before.
I also added support to `crosvm wl add' for dynamic socket names, so
it's possible to do `crosvm wl add wl-conn-%d', and connections will be
added with names like `wl-conn-0', `wl-conn-1', etc., making it easy to
get unique names for connection sockets. The chosen name is printed by
the command, so the caller knows what name to tell the VM to connect to.
I also found and fixed a bug with the previous crosvm deadlock fix[1].
I had assumed that device_sock.recv(&mut []) would drop a message from
the (SOCK_SEQPACKET) socket, without having to read any of it. But
UnixSeqpacket::recv calls libc::read, and read(2) tells us that:
> In the absence of any errors, or if read() does not check for errors,
> a read() with a count of 0 returns zero and has no other effects.
So this was in fact doing nothing at all. I don't know why crosvm's
UnixSeqpacket::recv calls read() instead of recv(), but it's always been
like that and I'm guessing this sort of thing (from recv(2)) might have
something to do with it:
> The only difference between recv() and read(2) is the presence of
> flags. With a zero flags argument, recv() is generally equivalent to
> read(2) (but see NOTES).
So probably read() just looked like a nicer way to recv() when no flags
were needed.
But, unfortunately, zero-byte reads are when the aforementioned NOTES
section becomes relevant:
> If a zero-length datagram is pending, read(2) and recv() with a flags
> argument of zero provide different behavior. In this circumstance,
> read(2) has no effect (the datagram remains pending), while recv()
> consumes the pending datagram.
So, my assumption that UnixSeqpacket::recv(&mut []) would consume a
message turned out to be quite reasonable -- the surprising thing was
that a method called `recv' would call read() rather than recv(). I
think the best fix here will be to just make it call recv() instead,
rather than modifying my code to do UnixSeqpacket::recv(&mut [0]) or
something, to prevent further nasty surprises with this in future.
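If you want to see the difference for yourself, here's a standalone
illustration (not crosvm code) of the two zero-length calls on a
SOCK_SEQPACKET pair:

    /* A zero-length read() leaves the pending datagram queued, while a
     * zero-length recv() dequeues it, discarding its contents. */
    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int fds[2];
        char buf[16];

        if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, fds) < 0)
            return 1;

        /* Queue two one-byte datagrams. */
        send(fds[0], "a", 1, 0);
        send(fds[0], "b", 1, 0);

        read(fds[1], buf, 0);      /* no effect: "a" is still pending */
        recv(fds[1], buf, 0, 0);   /* dequeues "a", discarding its byte */

        ssize_t n = recv(fds[1], buf, sizeof buf, 0);
        printf("next datagram: %.*s\n", (int)n, buf);   /* prints "b" */
        return 0;
    }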
[1]: https://spectrum-os.org/lists/archives/spectrum-devel/20200614114344.22642-…
Wayland
-------
I created API-compatible implementations of the libc sendmsg(2) and
recvmsg(2) functions for virtio_wl sockets. This was quite an
achievement, because the API (which allows you to send and receive data
and file descriptors, as well as other things I don't intend to support)
is rather arcane (see the example in cmsg(3) if you're not familiar with
them). I wrote unit tests for them, and it took a long time before they
worked reliably. Once I had these, though, I could find the places
where Wayland called sendmsg() and recvmsg() and fall back to the
virtio_wl-based implementations if the standard functions failed with
ENOTSOCK. I stubbed out some stuff that isn't going to work over
virtio_wl, like looking up the pid of the Wayland client through
getsockopt(2).
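For anyone who hasn't run into this API before, here's roughly the
shape of the thing the replacements have to reproduce -- sending a
single file descriptor alongside a byte of data, following the recipe
in cmsg(3). The function name is mine:

    /* Send one file descriptor (plus one byte of ordinary data, since
     * some systems won't deliver ancillary data on its own) over a Unix
     * socket, per the pattern in cmsg(3). */
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <sys/uio.h>

    ssize_t send_fd(int sock, int fd_to_send)
    {
        char data = 0;
        struct iovec iov = { .iov_base = &data, .iov_len = 1 };

        union {   /* buffer correctly aligned for a cmsghdr */
            char buf[CMSG_SPACE(sizeof(int))];
            struct cmsghdr align;
        } u;

        struct msghdr msg = {0};
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = u.buf;
        msg.msg_controllen = sizeof u.buf;

        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;   /* "this message carries fds" */
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

        return sendmsg(sock, &msg, 0);
    }

The receiving side has to do the mirror-image dance with recvmsg() and
CMSG_FIRSTHDR() to pull the fd back out.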
I also had to resort to a few hacks, like faking support for
MSG_DONTWAIT by using fcntl(2) to set O_NONBLOCK on the socket,
recv()ing from it, and then removing O_NONBLOCK again, or faking
mremap(2) by munmap()-ing and mmap()-ing. We will want to clean these
up later by implementing the required missing functionality in the
virtio_wl kernel module. In the first case, at least, this should be
pretty straightforward, because the module already supports non-blocking
operations when the socket is O_NONBLOCK -- it just needs to accept a
MSG_DONTWAIT flag as well. The VIRTWL_IOCTL_{SEND,RECV} ioctls don't
currently take a flags argument, so one will need to be added.
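The MSG_DONTWAIT workaround looks something like this -- a sketch, not
the actual patch, and virtio_wl_recvmsg() is just a stand-in name for
the virtio_wl-based recvmsg() replacement:

    /* Fake MSG_DONTWAIT by toggling O_NONBLOCK around the receive.
     * virtio_wl_recvmsg() is a hypothetical stand-in for the
     * virtio_wl-based recvmsg() replacement described above. */
    #include <errno.h>
    #include <fcntl.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    ssize_t virtio_wl_recvmsg(int fd, struct msghdr *msg, int flags);

    ssize_t recvmsg_dontwait(int fd, struct msghdr *msg)
    {
        int flags = fcntl(fd, F_GETFL);
        if (flags < 0)
            return -1;
        if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
            return -1;

        ssize_t n = virtio_wl_recvmsg(fd, msg, 0);   /* may fail with EAGAIN */
        int saved_errno = errno;

        fcntl(fd, F_SETFL, flags);   /* put the original flags back */
        errno = saved_errno;
        return n;
    }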
I implemented this bit by bit, at every step trying to run Alacritty on
my host system, connected to the virtio_wl Wayland server socket through
the accept() proxy, and using strace and some printf()-debugging to see
where the Wayland compositor in the VM would get stuck. About an hour
ago, it finally worked! For the first time, a Wayland compositor
running in a VM can display an application running outside of it.
(Obviously we'll want the application to be running in another VM rather
than on the host, but that's similar enough that it probably works
already -- I just haven't tested it yet.) This feels like a huge
achievement. I've been working towards it for so long.
Next week, I'll be cleaning up this code and posting patches for all of
it. Then I'll probably move on to other sorts of device virtualization,
like running a virtual network device in a VM. I'm feeling so much more
positive about the direction of the project than I was before. It's
been difficult to make myself keep going while making so little
progress over the last couple of weeks, so it's great to have picked
things up again. I hope that the level of detail in this email is enough
to make up for the brevity of last week's! I'm sending late again, too,
but only by a couple of minutes -- I didn't expect this email to take
over an hour to write, but there we go.
Thanks for reading! I hope you're looking forward to seeing where
things go from here as much as I am.