general high-level discussion about spectrum
 help / color / Atom feed
From: Alyssa Ross <>
Subject: This Week in Spectrum, 2020-W27
Date: Mon, 06 Jul 2020 00:02:57 +0000
Message-ID: <> (raw)

[-- Attachment #1: Type: text/plain, Size: 6715 bytes --]

I really didn't want this to be another week where I posted about how I
was still trying to patch Wayland to do virtio_wl, and I am delighted to
have just discovered it's not going to be!


I realised that emulating accept(2) for the Wayland compositor socket in
the way I'd planned would require some crosvm rework.  I want to have a
host proxy program that accepts the connection, then connects the
connection socket to crosvm.  I had made it possible to dynamically add
sockets to the crosvm Wl device through the control socket, but this
turned out not be enough, because crosvm would store virtio_wl sockets
in a BTreeMap<String, PathBuf>, and then use connect(2) to connect to
the socket when asked to by the guest kernel.  This works fine for
e.g. connecting to a host Wayland compositor, which is what crosvm was
designed for, but it wouldn't work for opening a connection socket from
accept(2), because you can only connect to a listening socket.

So instead, I modified the `crosvm wl add' command to take a file
descriptor pointing to the connection socket.  I made crosvm store
sockets as an enum that looks like this:

enum WaylandSocket {

This way, when it gets asked by the VM to connect to a socket, it can
either connect to a listening socket at its path using connect(2), or
just use the existing file descriptor if it's a non-listening socket.  A
NonListening socket will be consumed by a connection, so when the VM
close(2)s it, it'll go away, and on the host side the connection will
finish as expected.  Listening sockets can be connected to repeatedly,
as before.

I also added support to `crosvm add wl' for dynamic socket names.  So
it's possible to do `crosvm add wl wl-conn-%d', and connections will be
added with names like `wl-conn-0', `wl-conn-1', etc.  So it's easy to
get unique names for connection sockets.  The chosen name is printed by
the command, so the caller knows what name to tell the VM to connect to.

I also found and fixed a bug with the previous crosvm deadlock fix[1].
I had assumed that device_sock.recv(&mut []) would drop a message from
the (SOCK_SEQPACKET) socket, without having to read any of it.  But
UnixSeqpacket::recv calls libc::read, and read(2) tells us that:

> In the absence of any errors, or if read() does not check for errors,
> a read() with a count of 0 returns zero and has no other effects.

So this was in fact doing nothing at all.  I don't know why crosvm's
UnixSeqpacket::recv calls read() instead of recv(), but it's always been
like that and I'm guessing this sort of thing (from recv(2)) might have
something to do with it:

> The only difference between recv() and read(2) is the presence of
> flags.  With a zero flags argument, recv() is generally equivalent to
> read(2) (but see NOTES).

So probably read() just looked like a nicer way to recv() when no flags
were needed.

But, unfortunately, zero-byte reads are when the aforementioned NOTES
section becomes relevant:

> If a zero-length datagram is pending, read(2) and recv() with a flags
> argument of zero provide different behavior.  In this circumstance,
> read(2) has no effect (the datagram remains pending), while recv()
> consumes the pending datagram.

So, my assumption that UnixSeqpacket::recv(&mut []) would consume a
message turned out to be quite reasonable -- the surprising thing was
that a method called `recv' would call read() rather than recv().  I
think the best fix here will be to just make it call recv() instead,
rather than modifying my code to do UnixSeqpacket::recv(&mut [0]) or
something, to prevent further nasty surprises with this in future.



I created API-compatible implementations of the libc sendmsg(2) and
recvmsg(2) functions for virtio_wl sockets.  This was quite an
achievement, because the API (which allows you to send and receive data
and file descriptors, as well as other things I don't intend to support)
is rather arcane (see the example in cmsg(3) if you're not familiar with
them).  I wrote unit tests for them, and it took a long time before they
worked reliably.  Once I had these, though, I could find the places
where Wayland called sendmsg() and recvmsg() and fall back to the
virtio_wl-based implementations if the standard functions failed with
ENOTSOCK.  I stubbed out some stuff that isn't going to work over
virtio_wl, like looking up the pid of the Wayland client through

I also had to resort to a few hacks, like faking support for
MSG_DONTWAIT by using fcntl(2) to set O_NONBLOCK on the socket,
recv()ing from it, and then removing O_NONBLOCK again, or faking
mremap(2) by munmap()-ing and mmap()-ing.  We will want to clean these
up later by implementing the required missing functionality in the
virtio_wl kernel module.  In the first case, at least, this should be
pretty straightforward, because it supports non-blocking operations if
the socket is O_NONBLOCK -- it just needs to accept a MSG_DONTWAIT
option as well.  The VIRTWL_IOCTL_{SEND,RECV} syscalls don't currently
have a flags argument, so that'll need to be added.

I implemented this bit by bit, at every step trying to run Alacritty on
my host system, connected to the virtio_wl Wayland server socket through
the accept() proxy, and using strace and some printf()-debugging to see
where the Wayland compositor in the VM would get stuck, and about an
hour ago, it finally worked!  For the first time, a Wayland compositor
running in a VM can display an application running outside of it.
(Obviously we'll want the application to be running in another VM rather
than on the host, but that's similar enough that it probably works
already -- I just haven't tested it yet.)  This feels like a huge
achievement.  I've been working towards it for so long.

Next week, I'll be cleaning up this code and posting patches for all of
it.  Then I'll probably move on to other sorts of device virtualization,
like running a virtual network device in a VM.  I'm feeling so much more
positive about the direction of the project than I was before.  It's
been difficult to make myself keep going making little progress for the
last couple of weeks, and it's great that I've managed to pick things up
again so much.  I hope that the level of detail in this email is enough
to make up for the brevity of last week's!  I'm sending late again, too,
but only by a couple of minutes -- I didn't expect this email to take
over an hour to write, but there we go.

Thanks for reading!  I hope you're looking forward to seeing where
things go from here as much as I am.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

                 reply index

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

general high-level discussion about spectrum

Archives are clonable:
	git clone --mirror spectrum-discuss/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 spectrum-discuss spectrum-discuss/ \
	public-inbox-index spectrum-discuss

Example config snippet for mirrors

Newsgroups are available over NNTP:

AGPL code for this site: