On Tue, 22 Mar 2022 at 20:05, Thomas Leonard
On Tue, 22 Mar 2022 at 11:16, Alyssa Ross wrote:
On Tue, Mar 22, 2022 at 11:08:15AM +0000, Thomas Leonard wrote:
On Mon, 21 Mar 2022 at 16:05, Alyssa Ross wrote:
On Mon, Mar 21, 2022 at 12:10:43PM +0000, Thomas Leonard wrote:
Looking at the Linux virtio_gpu driver, it seems that using contexts requires virgl:
    static int virtio_gpu_context_init_ioctl(struct drm_device *dev,
                                             void *data, struct drm_file *file)
    {
            ...
            if (!vgdev->has_context_init || !vgdev->has_virgl_3d)
                    return -EINVAL;
https://github.com/torvalds/linux/blob/f443e374ae131c168a065ea1748feac6b2e76...
I suspect that crosvm is compiled without the "virgl_renderer" feature (it's not in the default set), and that this causes the crash, because virgl is also "self.default_component". I don't know how to compile crosvm with virgl enabled, though.
It wasn't easy, but I got it to build[1]. I hope that helps. It adds both virgl_renderer and virgl_renderer_next. I think virgl_renderer is on by default with --gpu, and virgl_renderer_next is used with the --gpu-render-server argument. Hopefully at least one of those does the right thing — let me know!
Thanks, that is very helpful!
I gave it a try, and it got a little further. But now, doing `modprobe virtio_gpu` in the VM crashes crosvm with:
Stack trace of thread 2:
#0  0x00007fa5fd0915f6 abort (libc.so.6 + 0x265f6)
#1  0x00007fa5fcfc6bfd get_dlopen_handle.part.0 (libepoxy.so.0 + 0xc7bfd)
#2  0x00007fa5fcfc7366 epoxy_egl_dlsym (libepoxy.so.0 + 0xc8366)
[...]
It looks like it should be printing a message to stderr before calling abort, but I don't see it (https://github.com/anholt/libepoxy/blob/1.5.9/src/dispatch_common.c#L315).
Did you try --disable-sandbox, like I suggested in my other mail? The sandbox blocks writing error messages, and is something I frequently trip over when trying to use crosvm.
It's not very easy because --disable-sandbox seems to conflict with --shared-dir, which I use for lots of things.
I got around this by changing `create_gpu_device` to use `let jail = None;`, so only the GPU device isn't jailed. I suspect the minijail config needs updating for NixOS (e.g. https://github.com/google/crosvm/blob/main/src/linux/gpu.rs#L82).

I tried, but failed, to figure out the protocol. I did manage to get a test application showing a little animation, but it crashes after a few seconds.

The basic idea seems to be:

1. You allocate a page of memory shared with the crosvm on the host.
2. You tell crosvm to read messages from the host compositor and write them to this page.
3. After doing this, crosvm signals the guest, which reads the data.

The shared page is referred to as a "ring", but it's not used as a ring buffer. The host always writes to the start of it.

Separately, to allocate an image buffer:

1. You tell crosvm the width and height, etc.
2. It assigns a blob_id and writes it to the shared page.
3. You wait for the operation to complete, then use the blob_id to create the buffer.

The problem is that both these operations write to the same page, and they race! So sometimes the image information overwrites the Wayland data, or the Wayland data overwrites the image information, and then it crashes.

This image_query function shows the problem:

https://chromium.googlesource.com/chromiumos/platform2/+/refs/heads/main/vm_...

It asks for the image information to be written to "ring_addr_" and then reads it from there. But at the moment when the function is called, ring_addr_ may contain Wayland protocol data that hasn't been read yet.

I didn't test it with Sommelier, but that's the problem I had in my code and I don't see how Sommelier's code can work in general. Sommelier caches the results, so it might not hit this case too often. I didn't use a cache, and also added a small sleep to my code to make the problem easier to reproduce.

Anyone have any ideas how this is supposed to work?

--
talex5 (GitHub/Twitter)
http://roscidus.com/blog/
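
To make the race described above concrete, here is a minimal sketch that models the shared page as a plain byte buffer. It is not crosvm's or Sommelier's code: the `SharedPage` class, the `run` function and the message contents are all hypothetical, and the only behaviour taken from the description is that the host always writes to the start of the page. The interleaving is forced rather than timing-dependent, so the result is the same on every run.

```python
# Hypothetical model of the shared "ring" page; names and payloads are made up.
PAGE_SIZE = 4096


class SharedPage:
    """A page shared by host and guest; the host always writes at offset 0."""

    def __init__(self) -> None:
        self.buf = bytearray(PAGE_SIZE)

    def host_write(self, payload: bytes) -> None:
        # Not a ring buffer: every write starts at the beginning of the page.
        self.buf[:len(payload)] = payload

    def guest_read(self, length: int) -> bytes:
        return bytes(self.buf[:length])


WAYLAND_MSG = b"WAYLAND:wl_surface.frame"  # stand-in for compositor data
IMAGE_INFO = b"IMAGE:blob_id=7"            # stand-in for an image query reply


def run(interleaved: bool) -> bytes:
    """Return what the guest sees when it tries to read the Wayland data."""
    page = SharedPage()
    # Host copies compositor data into the page and signals the guest.
    page.host_write(WAYLAND_MSG)
    if interleaved:
        # Before the guest has read that data, it issues an image query,
        # and the reply is written to the start of the same page.
        page.host_write(IMAGE_INFO)
    return page.guest_read(len(WAYLAND_MSG))
```

With `run(False)` the guest reads back the Wayland message intact. With `run(True)` the start of the message has been replaced by the image reply, which is the corruption I think I'm seeing.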