Feels like there's a pretty clear path forward. Nice feeling. :)
QEMU
----
Last week, as I wrote TWiS, I had just discovered virtio-vhost-user,
which looked like a very promising mechanism for getting a VM to take
care of networking for other VMs. This week, I've been researching it
further, and trying to test and evaluate it.
The first thing I tried to do, naturally, was to build the patched QEMU
tree and boot a VM with a virtio-vhost-user device attached. This was
not as easy as I'd hoped, because adding the virtio-vhost-user device to
my QEMU command line made the VM kernel panic at boot, with an error
message about an invalid memory access. I spent most of the week trying
to figure this out -- I wasn't doing anything different to the
example[1] on the QEMU wiki, so it should have worked, and it felt like
if I could just get past whatever was going wrong here, it would be
worth it, because virtio-vhost-user otherwise seems so suited for what
we need here. I emailed the patch author[2], but he didn't know what
was up either.
An early breakthrough came when I got frustrated with kernel builds
taking hours on my 8-year-old laptop, and so decided to work on a more
powerful computer instead. Once I got everything set up on that
computer, I started up the VM, and it worked. Perhaps in setting it up
over here I'd done something different? I copied over the exact VM
disk/kernel/initrd/command line that I was running on my laptop, and the
other computer booted it just fine. I had -cpu host in the QEMU command
line, so I thought maybe the different kind of virtual CPU was causing
it. Tried setting it to a specific value on both machines, and still
the laptop VM panicked and the other didn't. So it sounded like whether
it worked or not depended on the host hardware.
I put together a Nix derivation that would automatically build the
custom QEMU and output a script that would run a VM, and then asked
people in #spectrum to test it out on various computers. After getting
some further data, a pattern started to emerge, where Intel processors
Ivy Bridge and older would fail, and Skylake and newer would succeed (I
didn't encounter any AMD processors that failed, nor did I have data at
the time for generations between Ivy Bridge and Skylake). This theory
had a convenient explanation for why nobody else had seen this problem
-- I doubt people at Red Hat are working on 7-year-old hardware.
This was a good clue, but still didn't put me much closer to having a
working system. I do have a more recent laptop around, but for reasons
that are out of scope here it would be very inconvenient to decide to
just move over to it. I could see that the kernel was panicking the
first time it tried to access the PCI BARs of the virtio-vhost-user device,
which led me to believe that the problem was probably in how that memory
was being set up. I found the function that did that[3], and stared at
it for a long time. I tried to read the rest of the QEMU code, but it
became clear that my domain knowledge here isn't good enough to be able
to keep track of what's meant to be happening. I added some debug
prints, which were vaguely helpful in making that understanding a little
better.
I was hoping to find the guest address each PCI BAR was mapped to so
that I could check the kernel was trying to write to the right location,
but didn't manage to do that. While attempting to, though, I did add a
debug print that printed the size of each PCI BAR as it was allocated.
I noticed that most were small -- 16 MiB at most, but one was huge, at
64 GiB! The code that allocated this BAR was part of the function I'd
been staring at. As far as I could tell, the choice of size was pretty
arbitrary -- this big memory region was used as backing memory for all
sorts of small objects on the fly. On a whim, I tried changing the BAR
size from 1ULL << 36 to 1ULL << 26, and recompiled QEMU. The VM booted.
The comment above the bar_size definition that I'd been looking at for
so long said:
/* TODO If the BAR is too large the guest won't have address space to map
 * it!
 */
I don't know if that's exactly what went wrong here, though. I suspect
it's more like the host architecture doesn't have enough address space?
The affected machines all reported 36 bit physical address size, and 48
bit virtual address size. So maybe what's happening is that the
processor interprets PCI addresses in the hardware-assisted VM as
physical addresses, and therefore runs out of space because all of it is
taken up by this one PCI BAR? I'm not really sure. Lowering the BAR
size to 2^35 or 2^34 (has to be a power of two) depending on the QEMU
version made the problem go away, and that's good enough for now.
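To make the arithmetic concrete, here's a tiny Python sketch (my own
illustration, not QEMU code) of why a 2^36-byte BAR is a problem on a
machine reporting a 36-bit physical address size:

```python
GiB = 2**30

def bar_size_fits(bar_bits, phys_addr_bits):
    """Crude model: does a single BAR of 2**bar_bits bytes leave any
    address space free on a machine whose physical addresses are
    phys_addr_bits wide?"""
    return (1 << bar_bits) < (1 << phys_addr_bits)

# The original patch allocated a 1ULL << 36 BAR: that's 64 GiB, the
# *entire* physical address space of a CPU with 36 address bits.
assert (1 << 36) == 64 * GiB
assert not bar_size_fits(36, 36)

# Lowering it to 1ULL << 26 (64 MiB), or to 2^35/2^34 as described
# above, leaves room for everything else that needs mapping.
assert bar_size_fits(26, 36)
assert bar_size_fits(35, 36)
```

This is consistent with the symptom: hosts with 39-bit or larger
physical address sizes never hit the panic.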
I'm not very enthusiastic about this up-front allocation of a huge
amount of memory that might not even fit in the available address space.
I don't know if there's a better way of doing it in this case, but I
certainly hope so. In general I think this perhaps demonstrates why
this code is not considered suitable for "production" yet. The bet I'm
taking here is that by the time Spectrum is further along, things will
have moved on for virtio-vhost-user too. As I said, at some point we
will want to implement it in crosvm to avoid having QEMU in the TCB, but
it would be a bad idea to do that now while virtio-vhost-user is still
going through the back-and-forth of making its way into the Virtio spec.
[1]: https://wiki.qemu.org/Features/VirtioVhostUser
[2]: https://lore.kernel.org/qemu-devel/87h7u1s5k1.fsf@alyssa.is/T/#u
[3]: https://github.com/ndragazis/qemu/blob/f9ab08c0c8/hw/virtio/virtio-vhost-us…
DPDK
----
Once I was able to boot a VM with the virtio-vhost-user device, I tried
to connect another QEMU VM to it through vhost-user -- I'll want to have
this working first as a reference before I start porting Cloud
Hypervisor's vhost-user implementation to crosvm. But the "frontend"
(vhost-user) QEMU process hung waiting for a reply on the vhost-user
socket from the backend one. Not really knowing what to do about this,
I decided that maybe I'd been a bit too ambitious in going straight for
vhost-user <-> virtio-vhost-user when I'd never actually used vhost-user
before, so maybe I should try a more conventional vhost-user setup
first.
As far as I can tell, vhost-user is usually used for connecting a VM to
a userspace networking stack. And usually, this networking stack is
DPDK, the "Data Plane Development Kit"[4]. DPDK was also used in the
virtio-vhost-user examples, so I figured my next step would be to try
it there as well, and that it was therefore worth taking the time to
learn how to do a very basic setup with it.
Quick-start-style documentation for this was pretty lacking, but I did
eventually manage to make this work. Here's what I did, for my own
future reference as much as anything else:
(1) Make some hugepages available. 1GiB for DPDK and 1GiB for QEMU:
echo 1024 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
(2) Take my ethernet interface offline so it could be used with DPDK:
nmcli d disconnect enp0s25
(3) Load the vfio-pci module, which allows PCI devices to be exported to
userspace rather than managed by the kernel:
modprobe vfio-pci
(4) Export the ethernet interface:
usertools/dpdk-devbind.py -b vfio-pci enp0s25
(5) Run testpmd, a program that comes with DPDK, used mostly for
debugging and tracing it seems, but which with no special arguments
acts as a simple packet forwarder. Here I create a vhost-user
socket, and forward traffic between vhost-user and my ethernet
interface:
build/app/dpdk-testpmd -l 0,1 -w 00:19.0 \
--vdev net_vhost0,iface=/run/vhost-user0.sock
The -w value is the PCI address of the ethernet interface. Note how
"00:19" corresponds to "p0s25". (19 in hex is 25 is decimal.)
(6) Start a VM. The relevant QEMU flags appear to be:
-chardev socket,id=char0,path=/run/vhost-user0.sock \
-netdev type=vhost-user,id=net0,chardev=char0,vhostforce \
-device virtio-net-pci,netdev=net0 \
-object memory-backend-file,id=mem0,size=1024M,mem-path=/dev/hugepages,share=on \
-numa node,memdev=mem0 \
-mem-prealloc
I figured this all out mostly from a guide for a DPDK benchmark[5]. I
have not yet experimented with variations on the QEMU flags. I'm
not sure if all the memory flags are required -- -mem-prealloc might
just be there because it was important for a benchmark, for example.
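The two bits of arithmetic above can be sketched in a few lines of
Python. The pci_from_name helper is my own illustration of the naming
correspondence (it assumes function 0, and is not how dpdk-devbind.py
actually works):

```python
import re

# Step (1): 1024 hugepages of 2 MiB each is 2 GiB -- 1 GiB for DPDK
# plus 1 GiB for QEMU.
def hugepage_count(total_bytes, page_bytes=2048 * 1024):
    return total_bytes // page_bytes

# Step (5): recover a PCI bus:slot.function address from a predictable
# interface name like enp0s25 (bus 0, slot 25 decimal -> 00:19.0).
def pci_from_name(name):
    m = re.fullmatch(r"enp(\d+)s(\d+)", name)
    bus, slot = int(m.group(1)), int(m.group(2))
    return f"{bus:02x}:{slot:02x}.0"

assert hugepage_count(2 * 2**30) == 1024      # matches the echo above
assert pci_from_name("enp0s25") == "00:19.0"  # 25 decimal is 0x19
```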
So this is the point I'm at with this exploration. Next up, I'll be
trying with DPDK inside a VM with a virtio-vhost-user device. I think
that maybe, despite the virtio-vhost-user device showing up as an
ethernet device inside the VM, it needs some special support which is
available for DPDK as a patchset, but that has not been written for the
kernel yet. I was a bit worried about this, because unlike the kernel,
DPDK isn't going to have things like Wi-Fi drivers for all sorts of
different hardware, and so using DPDK instead of the kernel network
stack would be a problem. But then I learned that DPDK has a component
called the Kernel NIC Interface (KNI), which allows it to use network
interfaces from the kernel, so a hybrid approach would be possible, and
is what I think we'll end up using for now. Then, once
virtio-vhost-user is a bit more mature, a kernel driver will probably
show up, and we can use that instead and drop DPDK.
[4]: https://dpdk.org/
[5]: https://doc.dpdk.org/guides/howto/pvp_reference_benchmark.html?highlight=pvp
Website
-------
I was having a conversation about Spectrum yesterday, and I found myself
sending over a bunch of links to articles and papers that I often find
myself referring to when talking to somebody about Spectrum. This made
me think that maybe there should be some place where we keep all these
relevant articles. So I mined the IRC logs, the TWiS archive, and my
blog, and added whatever I could pull from my brain, and wrote a
Spectrum bibliography, containing 27 links to interesting articles and
papers that are particularly relevant to Spectrum.
This isn't on the website quite yet, but I did send this as a patch[6]
to the mailing list, if you want an early look.
I also posted a patch to fix a minor issue where I'd mistakenly used
".." instead of "." as href values, to no user-visible effect[7].
[6]: https://spectrum-os.org/lists/archives/spectrum-devel/20200726045701.32259-…
[7]: https://spectrum-os.org/lists/archives/spectrum-devel/20200726055410.20641-…
Documentation
-------------
On Monday, I had a call with the Free Software Foundation Europe.
They're a part of NGI Zero (where my funding comes from), and they are
promoting their new "REUSE" specification[8] for license information in
free software projects to NGI Zero projects. It basically covers
standardised per-file license and copyright annotations, and a standard
way of including license texts.
I think this is really cool! It's something I've been unsure of how to
handle because it's all vague conventions that are different in
different circles, and it's nice to see something formalised about it.
They also have an automated tool[9] for checking compliance and
semi-automatically adding license information, which is great!
So I'm enthusiastically adopting the REUSE specification. I decided
that our smaller, first-party repositories (the documentation, the
website, etc.) would be a good place to get started, and so I posted a
patch[10] that makes the documentation repository REUSE-compliant.
[8]: https://reuse.software/
[9]: https://git.fsfe.org/reuse/tool
[10]: https://spectrum-os.org/lists/archives/spectrum-devel/20200726105527.27432-…
mktuntap
--------
I posted a patch[11] to make mktuntap REUSE-compliant.
[11]: https://spectrum-os.org/lists/archives/spectrum-devel/20200726110123.30159-…
The thing that's most on my mind this week is the extent to which I'm
learning about and working on software like QEMU and DPDK that I don't
see having a place in Spectrum in the long run. It's counterintuitive,
but this is definitely worth it. There's no point writing a kernel
driver for virtio-vhost-user (should such a thing be required) right
now, because if I use DPDK for now instead, at some point either
virtio-vhost-user will end up not being the thing that gets adopted by
the ecosystem and we'll have to move to something else, or (more likely)
it gets widely adopted and somebody else writes a kernel driver.
Similarly, using QEMU for network VMs is the smart choice even though
I don't want it to end up in the TCB: I'm probably going to end up
implementing virtio-vhost-user in crosvm later, but swapping QEMU out
at that point will be easy enough that it would be a very bad idea to
do that work now, in case virtio-vhost-user doesn't take off. But
it still /feels/ weird to be using QEMU for this stuff, you know?
This has been a week of thinking I wanted to do one thing, not being
sure how to do it, and finding out that there was a better way. I'll
write it up in the order it happened.
crosvm
------
Last week, I described that I wanted to implement a virtio proxy to
allow a kernel in an application VM to use a virtual device in
another VM. I was wondering how to manage virtio buffers, and thought
that I probably wanted an allocator to be able to manage throwing
buffers of different sizes around.
This turned out to be a case of the XY problem[1]. I couldn't find a
good solution, but it turned out that an allocator wasn't what I wanted
anyway. edef pointed out that I could just make the shared memory I
allocated as big as necessary to hold buffers of the maximum size I
wanted to support. The kernel will only actually allocate pages as they
are written to, and I could use fallocate[2] with FALLOC_FL_PUNCH_HOLE
to tell the kernel it can drop pages when I'm done with them. This
would mean that an unusually large buffer would only take up lots of
memory while it was in use, and as soon as it was done with, the kernel
would be able to take back the memory. So exactly what I wanted from an
allocator, but with no need for an allocator at all!
This made the implementation much simpler, and by Friday I was able to
get the proxy into a state where it could pass unit tests that
transported messages in both directions through it.
And then it was suggested to me that maybe a virtio proxy is not what I
want after all.
The main disadvantage to a virtio proxy is that it requires context
switching to the host to send data between VMs. This is a trade-off I
was aware of, but a virtio proxy is pretty straightforward to write as
inter-VM communication systems go, and I was not aware of anything else
that would be up to the job. As it turns out, there is something.
vhost-user is a mechanism for connecting, say, a virtio device to a
userspace network stack in a performant way. I was aware of this, but
what I was not aware of was virtio-vhost-user[3]. virtio-vhost-user is
a proposed mechanism to allow a VMM to forward a vhost-user backend to a
VM. This means that two VMs could directly share virtqueues, with no
host copy step. This would mean there would be no opportunity for the
host to mediate communication between two guests, but that wasn't really
on the cards anyway -- if it's ever required, a virtio proxy would
probably be the way to go. For all the other cases, virtio-vhost-user
would be a faster, cleaner way of sharing network devices between VMs.
The main problem with virtio-vhost-user is that it's still in its
infancy. There's a patchset[4] implementing it for QEMU that's a couple
of years old, but that has not been accepted upstream. The main blocker
for this seems to be first standardising it in the Virtio spec[5][6]. The
good news here is that the standardisation process seems to be
progressing actively at the moment. It's being discussed on the
virtio-dev mailing list basically right now, with the most recent emails
dated Friday (unfortunately, I don't know of a good web archive of
virtio-dev, but you can find the thread on Gmane if you're interested
but not subscribed to the list).
The good news is that virtio-vhost-user mostly works by composing things
that already exist. There's no kernel work required, because devices
are just exposed by the VMM as regular virtio devices. The frontend VM
(i.e. the one that uses the virtual device, as opposed to the one that
provides it) doesn't need any special virtio-vhost-user support, because
it just needs to speak normal vhost-user. Only the backend VM needs
support for virtio-vhost-user, because its VMM needs to expose the
vhost-user backend from the host to that VM.
This means that provisionally using virtio-vhost-user in Spectrum
actually looks very feasible, with a couple of compromises. For
evaluation purposes, it's not worth writing a virtio-vhost-user device
for crosvm. But, the VMs that need that device are the ones that are
very specialised -- VMs that manage networking or block devices or
similar. So for these VMs, for now, we could use QEMU, with the
virtio-vhost-user patch. I investigated what it would take to port it
to the most recent QEMU version, and the answer appears to be "not much
at all". Obviously having two VMMs in the Trusted Computing Base (TCB)
isn't something we'd want in the long term, but it would be fine for,
say, reaching the next funding milestone. If we decide that
virtio-vhost-user is the way to go after all, support in crosvm can be
added then -- in general, adding a new virtio device to crosvm isn't a
huge undertaking.
Earlier, I said that the application side of the communication doesn't
need anything special, because to it this is just regular vhost-user.
This is true, but I glossed over there that crosvm doesn't actually
implement vhost-user. Implementing vhost-user in crosvm would probably
be a big deal at this stage, and not something I feel would be a good
use of my time. BUT! Remember, crosvm has two children: Amazon's
Firecracker[7], which aims at so-called "serverless" computing; and
Intel's Cloud Hypervisor[8], which aims at traditional, full system
server virtualisation. Both of these children inherited the crosvm
device model from their parent, and Cloud Hypervisor implements
vhost-user[9].
So I _think_ it should be possible to pretty much lift the vhost-user
implementation from Cloud Hypervisor, and use it in crosvm. Pretty
neat!
So, the setup I'd like to evaluate is QEMU with the virtio-vhost-user
patch on one side, and crosvm with Cloud Hypervisor's vhost-user
implementation on the other.
It might well be that there are complications here. If there are, I'll
probably just finish the proxy and move on for now, because I want to
keep up the pace. I do think that virtio-vhost-user is probably the
way to do interguest networking in the long-term, though.
Another thing that I've realised is that I don't need to worry about
pulling bits out of crosvm to run in other VMs. I focused a lot on that
towards the beginning of the year, mostly motivated by Wayland, because
the virtio wayland implementation in crosvm is the only one there is.
Now that that works in a different way, though, there's no need to
continue down this path, because things like networking can be done in
more normal ways through virtio and the device VM kernel.
[1]: https://en.wikipedia.org/wiki/XY_problem
[2]: https://man7.org/linux/man-pages/man2/fallocate.2.html
[3]: https://wiki.qemu.org/Features/VirtioVhostUser
[4]: https://github.com/stefanha/qemu/compare/master...virtio-vhost-user
[5]: https://lists.nongnu.org/archive/html/qemu-devel/2019-04/msg03082.html
[6]: https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.…
[7]: https://firecracker-microvm.github.io/
[8]: https://github.com/cloud-hypervisor/cloud-hypervisor
[9]: https://github.com/cloud-hypervisor/cloud-hypervisor/blob/b4d04bdff6a7e2c3d…
Overall, it's been frustrating for me to try things, and discover
they're not going to work, or not going to work as well as some other
thing, and make a call on whether to keep going on something I know is
the worse option or switch to the better thing. I have to keep
reminding myself that Spectrum is a research project, and there are
always going to be false starts like this. Lots of what we're doing is
either very unusual (virtio-vhost-user) or brand new (interguest
Wayland), after all.
After I got an isolated Wayland compositor working last week, I wasn't
really sure what to do next -- this was a big piece of work that I'd
been very focused on for a while. The funding milestone I'm closest to
is to do with implementing hardware isolation, which the Wayland work
was a part of, so I decided to keep going with that, and explore other
types of isolation. More on that in a bit.
Wayland
-------
Posted my patch for virtio_wl display socket support in
libwayland-server[1]. This is what allows it to run in a VM, and
receive connections from clients in other VMs. The patch description is
very extensive, so I recommend reading it for more detail if you're
interested.
It introduces a libvirtio_wl, which should also be useful for porting
other programs that we might want to communicate with across a VM
boundary, if they are written with normal Unix sockets in mind
(including transferring file descriptors). This is the evolution of
code I previously had put in wlroots, moved to Wayland for convenience.
If it ever acquires another user (or maybe even if it doesn't) it might
make sense to make it its own package, since virtio_wl is useful even if
Wayland isn't involved.
[1]: https://spectrum-os.org/lists/archives/spectrum-devel/SJ0PR03MB5581479F3388…
crosvm
------
I pushed all my crosvm changes to get the isolated compositor working to
the work-in-progress "interguest" branch[2]. Remember, I only got it
working last week right before I needed to start writing the TWiS email,
so I hadn't even done that yet! I also posted some patches[3] to the list
to fix a bug in my previous crosvm deadlock fix, and to improve some
related documentation. As usual, these were kindly reviewed by Cole.
Next, I turned my attention to other forms of hardware isolation.
Wayland was a bit special, because despite crosvm including a virtual
"Wayland device", it's not really hardware, and so it required an
approach to isolation quite different from that for other crosvm
virtual devices. My hope is that other virtual devices should all be
substantially similar to each other.
The basic idea for actual hardware isolation is that rather than having
drivers in the host kernel for USB, network devices, etc., those devices
will be exposed to dedicated VMs as virtual PCI devices. This should
substantially reduce host kernel attack surface. crosvm virtual devices
will be run in these device VMs, and communicate over virtio with
application VMs as normal. This will require implementing in crosvm a
virtio proxy device that allows the crosvm running an application VM
to forward virtio communication to the virtual device running in
userspace in the driver VM.
(The reason devices aren't attached to application VMs directly but run
in separate device VMs is that hardware is probably not going to be very
happy if multiple kernels are trying to talk to it at the same time.
Additionally, this indirection means that application VMs only have to
use the one virtio driver for that device category, rather than any of
the hundreds of drivers for different hardware in that category. If one
of those drivers had a vulnerability, this should help to contain it to
the device VM.)
So I started writing this virtio proxy. The basic idea is to copy
virtio buffers from application VM guest memory into memory that can be
shared with the userspace virtual device in the device VM. I can't find
any prior art on this (which is not unusual -- not many systems isolate
drivers in this way), so this has required a lot of looking back at the
virtio paper[4] and spec[5] to make sure I understand what to do here.
As I write this, the next problem to solve is integrating some sort of
memory allocator that can manage buffer allocations in the shared memory
that the virtual device looks at. This is a new area for me that I'd
appreciate advice on if anybody can give it -- think of it like, I have a
memfd, mmaped into my process, and I would like to dynamically allocate
and release memory buffers of dynamic sizes in that region. I'm sure
there's a library I'll be able to plug in for this.
[2]: https://spectrum-os.org/git/crosvm/?h=interguest
[3]: https://spectrum-os.org/lists/archives/spectrum-devel/SJ0PR03MB55819DE7E13B…
[4]: https://www.ozlabs.org/~rusty/virtio-spec/virtio-paper.pdf
[5]: https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.…
As usual, big thank you to Cole for reviewing patches, and for finding
room for improvement even in languages/areas he isn't familiar with.
It feels nice to have done some thinking about the project at a slightly
higher level than I have been recently, and to know where I am on the
way to the next milestone. Having taken a lot of time away from the
milestone list this year to work on fundamentals, it's good to feel like
I'm getting back on track.
I really didn't want this to be another week where I posted about how I
was still trying to patch Wayland to do virtio_wl, and I am delighted to
have just discovered it's not going to be!
crosvm
------
I realised that emulating accept(2) for the Wayland compositor socket in
the way I'd planned would require some crosvm rework. I want to have a
host proxy program that accepts the connection, then connects the
connection socket to crosvm. I had made it possible to dynamically add
sockets to the crosvm Wl device through the control socket, but this
turned out not to be enough, because crosvm would store virtio_wl sockets
in a BTreeMap<String, PathBuf>, and then use connect(2) to connect to
the socket when asked to by the guest kernel. This works fine for
e.g. connecting to a host Wayland compositor, which is what crosvm was
designed for, but it wouldn't work for opening a connection socket from
accept(2), because you can only connect to a listening socket.
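That constraint is easy to demonstrate; a minimal sketch in Python
rather than crosvm's Rust, using Unix stream sockets:

```python
import os, socket, tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "wl.sock")

    # Bound but not listening -- like a connection socket returned by
    # accept(2), this is something you cannot connect(2) to.
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)

    try:
        c = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        c.connect(path)
        refused = False
    except ConnectionRefusedError:
        refused = True
    assert refused                   # ECONNREFUSED: nobody listening

    # Once the socket is listening, connecting works as usual.
    server.listen(1)
    c2 = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    c2.connect(path)
```

So a connection socket obtained from accept(2) has to be handed over
as a file descriptor; there is no path crosvm could connect(2) to.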
So instead, I modified the `crosvm wl add' command to take a file
descriptor pointing to the connection socket. I made crosvm store
sockets as an enum that looks like this:
enum WaylandSocket {
    Listening(PathBuf),
    NonListening(UnixStream),
}
This way, when it gets asked by the VM to connect to a socket, it can
either connect to a listening socket at its path using connect(2), or
just use the existing file descriptor if it's a non-listening socket. A
NonListening socket will be consumed by a connection, so when the VM
close(2)s it, it'll go away, and on the host side the connection will
finish as expected. Listening sockets can be connected to repeatedly,
as before.
I also added support to `crosvm wl add' for dynamic socket names. So
it's possible to do `crosvm wl add wl-conn-%d', and connections will be
added with names like `wl-conn-0', `wl-conn-1', etc. So it's easy to
get unique names for connection sockets. The chosen name is printed by
the command, so the caller knows what name to tell the VM to connect to.
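The naming scheme behaves like this sketch (an illustration of the
behaviour described above, not crosvm's actual code):

```python
import itertools

def socket_names(template):
    """Substitute an incrementing counter for %d, yielding unique
    socket names: wl-conn-%d -> wl-conn-0, wl-conn-1, ..."""
    for n in itertools.count():
        yield template.replace("%d", str(n), 1)

names = socket_names("wl-conn-%d")
assert next(names) == "wl-conn-0"
assert next(names) == "wl-conn-1"
```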
I also found and fixed a bug with the previous crosvm deadlock fix[1].
I had assumed that device_sock.recv(&mut []) would drop a message from
the (SOCK_SEQPACKET) socket, without having to read any of it. But
UnixSeqpacket::recv calls libc::read, and read(2) tells us that:
> In the absence of any errors, or if read() does not check for errors,
> a read() with a count of 0 returns zero and has no other effects.
So this was in fact doing nothing at all. I don't know why crosvm's
UnixSeqpacket::recv calls read() instead of recv(), but it's always been
like that and I'm guessing this sort of thing (from recv(2)) might have
something to do with it:
> The only difference between recv() and read(2) is the presence of
> flags. With a zero flags argument, recv() is generally equivalent to
> read(2) (but see NOTES).
So probably read() just looked like a nicer way to recv() when no flags
were needed.
But, unfortunately, zero-byte reads are when the aforementioned NOTES
section becomes relevant:
> If a zero-length datagram is pending, read(2) and recv() with a flags
> argument of zero provide different behavior. In this circumstance,
> read(2) has no effect (the datagram remains pending), while recv()
> consumes the pending datagram.
So, my assumption that UnixSeqpacket::recv(&mut []) would consume a
message turned out to be quite reasonable -- the surprising thing was
that a method called `recv' would call read() rather than recv(). I
think the best fix here will be to just make it call recv() instead,
rather than modifying my code to do UnixSeqpacket::recv(&mut [0]) or
something, to prevent further nasty surprises with this in future.
[1]: https://spectrum-os.org/lists/archives/spectrum-devel/20200614114344.22642-…
Wayland
-------
I created API-compatible implementations of the libc sendmsg(2) and
recvmsg(2) functions for virtio_wl sockets. This was quite an
achievement, because the API (which allows you to send and receive data
and file descriptors, as well as other things I don't intend to support)
is rather arcane (see the example in cmsg(3) if you're not familiar with
them). I wrote unit tests for them, and it took a long time before they
worked reliably. Once I had these, though, I could find the places
where Wayland called sendmsg() and recvmsg() and fall back to the
virtio_wl-based implementations if the standard functions failed with
ENOTSOCK. I stubbed out some stuff that isn't going to work over
virtio_wl, like looking up the pid of the Wayland client through
getsockopt(2).
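For a flavour of what those functions have to handle, here's the file
descriptor passing half of the API in Python (Python's socket module
wraps the same sendmsg(2)/recvmsg(2)/SCM_RIGHTS machinery that the
virtio_wl replacements have to emulate):

```python
import array, os, socket

def send_fd(sock, fd):
    """Send one file descriptor in an SCM_RIGHTS control message."""
    sock.sendmsg([b"\0"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
                            array.array("i", [fd]))])

def recv_fd(sock):
    """Receive one file descriptor from an SCM_RIGHTS message."""
    fds = array.array("i")
    msg, ancdata, flags, addr = sock.recvmsg(
        1, socket.CMSG_LEN(fds.itemsize))
    level, ctype, data = ancdata[0]
    assert (level, ctype) == (socket.SOL_SOCKET, socket.SCM_RIGHTS)
    fds.frombytes(data)
    return fds[0]

a, b = socket.socketpair()
r, w = os.pipe()
send_fd(a, w)
w2 = recv_fd(b)              # a fresh descriptor for the same pipe
os.write(w2, b"hi")
assert os.read(r, 2) == b"hi"
```

In C this is all manual cmsg(3) buffer arithmetic, which is where the
arcane part comes in.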
I also had to resort to a few hacks, like faking support for
MSG_DONTWAIT by using fcntl(2) to set O_NONBLOCK on the socket,
recv()ing from it, and then removing O_NONBLOCK again, or faking
mremap(2) by munmap()-ing and mmap()-ing. We will want to clean these
up later by implementing the required missing functionality in the
virtio_wl kernel module. In the first case, at least, this should be
pretty straightforward, because it supports non-blocking operations if
the socket is O_NONBLOCK -- it just needs to accept a MSG_DONTWAIT
option as well. The VIRTWL_IOCTL_{SEND,RECV} ioctls don't currently
have a flags argument, so that'll need to be added.
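The MSG_DONTWAIT fake looks roughly like this (a Python sketch of the
hack described above, not the actual libwayland patch):

```python
import fcntl, os, socket

def recv_dontwait(sock, bufsize):
    """Fake MSG_DONTWAIT: set O_NONBLOCK with fcntl(2), recv, then
    restore the old flags.  Returns None instead of blocking when
    nothing is pending."""
    fd = sock.fileno()
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)
    try:
        return sock.recv(bufsize)
    except BlockingIOError:
        return None
    finally:
        fcntl.fcntl(fd, fcntl.F_SETFL, flags)

a, b = socket.socketpair()
assert recv_dontwait(b, 4) is None   # empty socket: no block, no data
a.send(b"ping")
assert recv_dontwait(b, 4) == b"ping"
```

The obvious wart is that the flag flipping isn't atomic with respect
to other threads using the same socket, which is one more reason to do
it properly in the kernel module later.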
I implemented this bit by bit, at every step trying to run Alacritty on
my host system, connected to the virtio_wl Wayland server socket through
the accept() proxy, and using strace and some printf()-debugging to see
where the Wayland compositor in the VM would get stuck, and about an
hour ago, it finally worked! For the first time, a Wayland compositor
running in a VM can display an application running outside of it.
(Obviously we'll want the application to be running in another VM rather
than on the host, but that's similar enough that it probably works
already -- I just haven't tested it yet.) This feels like a huge
achievement. I've been working towards it for so long.
Next week, I'll be cleaning up this code and posting patches for all of
it. Then I'll probably move on to other sorts of device virtualization,
like running a virtual network device in a VM. I'm feeling so much more
positive about the direction of the project than I was before. It's
been difficult to make myself keep going while making little progress for the
last couple of weeks, and it's great that I've managed to pick things up
again so much. I hope that the level of detail in this email is enough
to make up for the brevity of last week's! I'm sending late again, too,
but only by a couple of minutes -- I didn't expect this email to take
over an hour to write, but there we go.
Thanks for reading! I hope you're looking forward to seeing where
things go from here as much as I am.