From: Alyssa Ross <hi@alyssa.is>
To: discuss@spectrum-os.org, devel@spectrum-os.org
Subject: [DEMO] virtio-vhost-user between QEMU and crosvm
Date: Thu, 13 May 2021 12:41:03 +0000
Message-ID: <878s4i3ixs.fsf@alyssa.is>
Introduction
------------
Virtio-vhost-user[1] is a promising virtualisation technology that allows
virtual devices that are exposed to VMs to themselves be implemented
in VMs.
Let's break down its name a bit to understand how it works:
* Virtio[2] is a standard driver interface for virtualisation.
Interfaces are defined for many kinds of virtual device,
e.g. virtio-net, virtio-blk, and virtio-scsi. Typically, virtio
devices are implemented by a virtual machine monitor (VMM).
* Vhost[3] is a kernelspace implementation of virtio virtual devices,
created for its performance benefits. Instead of implementing the
virtual devices itself, the VMM talks to the kernel implementation
of them using a special ioctl protocol.
* Vhost-user[4] allows another process to implement the vhost
protocol, instead of the kernel, by using a UNIX socket instead of
ioctls on a special character device. This doesn't provide the raw
performance of vhost, but it serves a different purpose -- it
allows virtual devices to be implemented by external programs, in a
standardised way so they're portable between VMMs.
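As a concrete illustration, this is roughly how QEMU is pointed at a vhost-user backend for a network device. (The socket path and IDs here are hypothetical; vhost-user also requires guest memory to be shareable with the backend process, hence the memfd memory backend.)

```shell
# Hypothetical sketch: attach a vhost-user network backend to QEMU.
# A backend process (e.g. a DPDK-based switch) must already be
# listening on /tmp/vhost-user0.sock.
qemu-system-x86_64 -enable-kvm -m 1G \
  -object memory-backend-memfd,id=mem0,size=1G,share=on \
  -numa node,memdev=mem0 \
  -chardev socket,id=char0,path=/tmp/vhost-user0.sock \
  -netdev type=vhost-user,id=net0,chardev=char0 \
  -device virtio-net-pci,netdev=net0
```

The VMM handles the guest-facing virtio transport, while every data-path decision about the device is delegated to whatever process owns the other end of the socket.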
Virtio-vhost-user allows the program implementing the virtual device
to run in a VM of its own, by having the VMM for that VM create the
vhost-user socket, and transferring messages over it to its guest
using virtio. This is exciting for Spectrum, because it would mean
that the host system doesn't have to interact with physical hardware
directly beyond the PCI level, and can instead pass it through to a
VM, which is responsible for implementing the virtual device backed by
that physical hardware, which can be exposed to other VMs.
Last year I spent a while looking into virtio-vhost-user[5][6][7].
It's a long way from being ready to use, and it seems to be maturing
very slowly. It might be useful to us eventually for driver
isolation, or something else might come along. My conclusion from my
research was that we should decide later, once the ecosystem has had a
chance to develop. But I wanted something to come out of the research
I did anyway, and so I've prepared a demonstration.
[1]: https://wiki.qemu.org/Features/VirtioVhostUser
[2]: https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.html
[3]: https://blog.vmsplice.net/2011/09/qemu-internals-vhost-architecture.html
[4]: https://qemu.readthedocs.io/en/latest/interop/vhost-user.html
[5]: https://spectrum-os.org/lists/archives/spectrum-devel/87pn8rezqn.fsf@alyssa.is/
[6]: https://spectrum-os.org/lists/archives/spectrum-devel/87blk1pwph.fsf@alyssa.is/
[7]: https://spectrum-os.org/lists/archives/spectrum-devel/87wo2glkg0.fsf@alyssa.is/
What the demo does
------------------
Any non-host-based virtual device implementation has to start by
taking the device implementation out of the VMM that runs the
application VM, and vhost-user is the clear solution to this.
Vhost-user isn't supported by crosvm -- its focus is on doing all the
virtualisation required by Chromium OS, so there's no need for it to
allow other programs to provide virtual device implementations. So
another part of the research I did was to try to
port the vhost-user implementation from cloud-hypervisor to crosvm,
which I was able to do successfully[8]. This has implications beyond
vhost-user, and even beyond crosvm, because it demonstrates that it's
practical to port features between rust-vmm[9] VMMs, which means we
don't have to worry about finding one that provides every feature we
need (which is just as well, because there isn't one).
So here I demonstrate not just a "standard" virtio-vhost-user setup
(to the extent that such a thing can exist at this early stage), but
also that my patched crosvm with vhost-user support is capable of
interoperating with the experimental virtio-vhost-user implementation
for QEMU, because both are speaking the standardised vhost-user
protocol.
The demo sets up two VMs. One is run with my patched crosvm, and
expects a virtual ethernet device to be provided by a vhost-user
socket. When it boots, it brings up its network interface, tries to
run a DHCP client, and then exits. The other VM is run with Nikos
Dragazis and Stefan Hajnoczi's experimental virtio-vhost-user
implementation in QEMU[10]. It gets a standard virtual ethernet
device (backed by a TAP device on the host), and the virtio-vhost-user
device hooked up to the socket that crosvm will be connecting to.
Inside the VM, a userspace networking stack (DPDK, again modified to
support virtio-vhost-user by Nikos Dragazis[11]) implements the device
side of virtio-vhost-user, and forwards packets sent by crosvm's guest
to the virtual ethernet device backed by the host TAP.
+-----------------------------------------------------------------------------+
| |
| +------------------------+ +------------------------+ |
| | | | | |
| | +----------------+ | | +----------------+ | |
| | | | | | | | | |
| +-----+ | | +--------+ | | | | | | |
| | TAP +------+---+---+ DPDK +---+---+------+---+ | | |
| +-----+ | | +--------+ | | | | | | |
| | | | | | | | | |
| | | Linux | | | | Linux | | |
| | +----------------+ | | +----------------+ | |
| | | | | |
| | QEMU | | crosvm | |
| +------------------------+ +------------------------+ |
| |
| Linux |
+-----------------------------------------------------------------------------+
A complicating factor is that the virtio-vhost-user implementation for
DPDK only supports outgoing traffic[12]. So packets coming from
crosvm will be relayed to the TAP, but not the other way around. This
means that we can't just ping from inside the crosvm VM to verify that
the connection is working. Instead, we have to run tcpdump on the
host and verify that the packets sent by the DHCP client inside the
crosvm VM are arriving on the TAP.
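For example (the interface name matches the demo below; the filter just narrows the capture to the DHCP traffic we expect to see):

```shell
# Capture only DHCP traffic on the TAP device used by the demo.
# BOOTP/DHCP uses UDP ports 67 (server) and 68 (client).
tcpdump -ni qemutap 'udp and (port 67 or port 68)'
```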
For this to be useful for our intended purpose of isolating drivers
for physical devices, we'd pass through the device here rather than
using a TAP. It would otherwise work exactly the same, but it's more
difficult to test that it's working correctly. (I have tested it,
though -- for the first version of this I got working last year, I
verified it worked by checking the logs of my local network's DHCP
server.)
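For reference, passing a physical NIC through to the QEMU VM instead of using a TAP would look something like the following on the host. (The PCI address is hypothetical; vfio-pci also requires the device's IOMMU group to be viable, or the unsafe no-IOMMU mode used inside the demo guest.)

```shell
# Hypothetical sketch: detach a physical NIC from its host driver and
# bind it to vfio-pci so it can be passed through to the QEMU VM.
echo 0000:01:00.0 > /sys/bus/pci/devices/0000:01:00.0/driver/unbind
echo vfio-pci > /sys/bus/pci/devices/0000:01:00.0/driver_override
echo 0000:01:00.0 > /sys/bus/pci/drivers_probe

# Then, instead of the TAP-backed virtio-net device, QEMU would get:
#   -device vfio-pci,host=0000:01:00.0
```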
[8]: https://spectrum-os.org/lists/archives/spectrum-devel/20210512170812.192540-1-hi@alyssa.is/
[9]: https://github.com/rust-vmm
[10]: https://github.com/ndragazis/qemu/tree/virtio-vhost-user
[11]: https://github.com/ndragazis/dpdk-next-virtio/tree/virtio-vhost-user
[12]: https://github.com/ndragazis/dpdk-next-virtio/blob/2d60e63/drivers/virtio_vhost_user/trans_virtio_vhost_user.c#L379
Running the demo
----------------
First, create a TAP device for QEMU to use:
# ip tuntap add qemutap mode tap
# ip link set qemutap up
Start tcpdump, so we can see if packets arrive on the TAP:
# tcpdump -i qemutap
Start the QEMU VM:
$ $(nix-build -A qemuVm /path/to/demo.nix)
When you see "Press enter to exit", DPDK is ready to receive a
virtio-vhost-user connection.
Start the crosvm VM:
$ $(nix-build -A crosvmVm /path/to/demo.nix)
Once that VM boots, you should see some "BOOTP/DHCP" lines in the
tcpdump output. This demonstrates that traffic from the crosvm guest
has been relayed over virtio-vhost-user to DPDK, and then to the TAP
on the host over virtio-net.
You'll want to press enter to shut down the QEMU VM now, because DPDK
busy-polls and pegs a CPU core (for reasons unrelated to
virtio-vhost-user that are out of scope here).
Then you can remove the TAP device:
# ip link delete qemutap
Nix expression for the demo
---------------------------
# SPDX-License-Identifier: MIT OR Apache-2.0
# SPDX-FileCopyrightText: 2021 Alyssa Ross <hi@alyssa.is>
let
pinned = builtins.fetchTarball {
url = "https://github.com/NixOS/nixpkgs/tarball/b14062b75c4e8ef4dd4110282f7105be87f681d7";
sha256 = "1hzs0w6pcwwbzl2gkqyk46yrzizzm03mph4kggws02a6vlwphsib";
};
in
{ pkgs ? import pinned {} }: with pkgs;
rec {
linux = pkgs.linux.override {
structuredExtraConfig = with lib.kernel; {
"9P_FS" = yes;
NET_9P = yes;
NET_9P_VIRTIO = yes;
PACKET = yes;
VFIO = yes;
VFIO_NOIOMMU = yes;
VFIO_PCI = yes;
VIRTIO_NET = yes;
VIRTIO_PCI = yes;
};
};
dpdk = stdenv.mkDerivation {
name = "dpdk-virtio-vhost-user";
src = fetchFromGitHub {
owner = "ndragazis";
repo = "dpdk-next-virtio";
rev = "0a46582dc1d02c0dc5069347ffff1a64239385f2";
sha256 = "169cxdps9k764jj420q44262x3291h2jcqsbrh7038hqjczjkgif";
};
buildInputs = [ numactl ];
configurePhase = ''
runHook preConfigure
make $makeFlags defconfig
runHook postConfigure
'';
enableParallelBuilding = true;
RTE_KERNELDIR = "${linux.dev}/lib/modules/${linux.modDirVersion}/build";
NIX_CFLAGS_COMPILE = [
"-Wno-error=implicit-fallthrough"
"-Wno-error=incompatible-pointer-types"
];
makeFlags = [
"RTE_OUTPUT=$(out)/lib"
"kerneldir=$(out)/lib/modules/${linux.modDirVersion}/build"
"prefix=$(out)"
];
inherit (pkgs.dpdk) meta;
};
# DPDK is huge! We just need one program from it.
testpmd = runCommandNoCC "testpmd" {} ''
mkdir -p $out/bin
cp ${dpdk}/bin/testpmd $out/bin
'';
# qemu has changed build system since the virtio-vhost-user branch
# was last updated, so it's simpler to just make a new derivation
# and inherit the bits that are the same than to override the
# existing one.
qemu = stdenv.mkDerivation {
name = "qemu-virtio-vhost-user";
src = fetchFromGitHub {
owner = "ndragazis";
repo = "qemu";
rev = "f9ab08c0c8cfc58036ed95b895f9780397448071";
sha256 = "0p6v4i7gj70d6x7s28x3i3x9z8vlswcbbqdwfbhlx87bbnxjrn3b";
fetchSubmodules = true;
};
enableParallelBuilding = true;
nativeBuildInputs =
lib.subtractLists [ ninja meson ] qemu_kvm.nativeBuildInputs;
postPatch = ''
sed -i '/$(INSTALL_DIR) "$(DESTDIR)$(qemu_localstatedir)/d' Makefile
# The virtio-vhost-user implementation tries to allocate a huge
# PCI bar, that's bigger than some CPUs can support! If you see
# a kernel panic in vp_reset(), lower this further.
substituteInPlace hw/virtio/virtio-vhost-user-pci.c \
--replace '1ULL << 36' '1ULL << 34'
'';
inherit (qemu_kvm) buildInputs configureFlags meta;
};
qemuInitramfs = makeInitrd {
contents = [
{
symlink = "/init";
object = writeScript "init" ''
#!${busybox}/bin/sh -eux
export PATH=${busybox}/bin
mkdir -p /nix/store /run /var
mount -t sysfs none /sys
mount -t proc none /proc
mount -t tmpfs none /run
mount -t devtmpfs none /dev
mkdir /dev/hugepages
mount -t hugetlbfs none /dev/hugepages
ln -s /run /var
# Unbind the virtio-net (host TAP) and virtio-vhost-user devices
# from their default drivers, since we'll be passing them
# through to DPDK.
echo 0000:00:04.0 > /sys/bus/pci/devices/0000:00:04.0/driver/unbind
echo 0000:00:05.0 > /sys/bus/pci/devices/0000:00:05.0/driver/unbind
# Tell the vfio-pci driver it can support virtio-net and
# virtio-vhost-user devices. Since our devices are not
# bound to any driver at the moment, doing this will bind
# them to vfio-pci automatically.
echo 1af4 1000 > /sys/bus/pci/drivers/vfio-pci/new_id
echo 1af4 1017 > /sys/bus/pci/drivers/vfio-pci/new_id
echo 256 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
${testpmd}/bin/testpmd \
-l 0-1 \
-w 0000:00:05.0 \
--vdev net_vhost0,iface=0000:00:05.0,virtio-transport=1 \
-w 0000:00:04.0
poweroff -f
'';
}
];
};
qemuVm = writeShellScript "qemu-vm" ''
exec ${qemu}/bin/qemu-system-x86_64 -enable-kvm -cpu host -smp 2 -m 1G \
-M q35,kernel-irqchip=split \
-initrd ${qemuInitramfs}/initrd \
-netdev tap,id=net0,ifname=qemutap,script=no,downscript=no \
-device virtio-net-pci,netdev=net0,addr=04 \
-chardev socket,id=chardev0,path="$XDG_RUNTIME_DIR/vhost-user0.sock",server,nowait \
-device virtio-vhost-user-pci,addr=05,chardev=chardev0 \
-kernel ${linux}/${stdenv.hostPlatform.linux-kernel.target} \
-append "console=ttyS0 vfio.enable_unsafe_noiommu_mode=1" \
-nographic
'';
# Can't use overrideAttrs because of cargoSha256.
crosvm = rustPlatform.buildRustPackage rec {
name = "crosvm-virtio-vhost-user";
src = fetchFromGitiles {
url = "https://chromium.googlesource.com/chromiumos/platform/crosvm";
rev = "8a7e4e902a4950b060ea23b40c0dfce7bfa1b2cb";
sha256 = "1lm6psp0xakb66nhgmmh94valc4wzbb967chk80msk8bcvsfpdn4";
};
unpackPhase =
let origSrc = pkgs.crosvm.passthru.src; in
builtins.replaceStrings [ "${origSrc}" origSrc.name ] [ "$src" src.name ]
pkgs.crosvm.unpackPhase;
cargoPatches = [
(fetchpatch {
url = "https://spectrum-os.org/lists/archives/spectrum-devel/20210512170812.192540-2-hi@alyssa.is/raw";
sha256 = "0yzqrpgq35s9wxvbf9s3dgs5cpyxgdc5hr14hsdjr0gd18a6camg";
})
];
patches = pkgs.crosvm.patches ++ [
(fetchpatch {
url = "https://spectrum-os.org/lists/archives/spectrum-devel/20210512170812.192540-3-hi@alyssa.is/raw";
sha256 = "0g2rvqqa4lvq7bjq0s1ynsjx7lmrxql7lsdv8wyzb7d2z9j6mj13";
})
(fetchpatch {
url = "https://spectrum-os.org/lists/archives/spectrum-devel/20210512170812.192540-4-hi@alyssa.is/raw";
sha256 = "051sz87i8kzc5sbygk2bpiqp4g32y9fxswg2yax1nd3lg4rxh43r";
})
(fetchpatch {
url = "https://spectrum-os.org/lists/archives/spectrum-devel/20210512170812.192540-5-hi@alyssa.is/raw";
sha256 = "1jpas65masn2xg9jxha16vi0y7scarzhl221y9wxh4chi4aa4m3f";
})
];
cargoSha256 = "07yizbhs64jrb05fq5g7sx812xbz2989bsficacq5l19ziax5164";
passthru = pkgs.crosvm.passthru // { inherit src; };
inherit (pkgs.crosvm) sourceRoot postPatch nativeBuildInputs buildInputs
preBuild postInstall CROSVM_CARGO_TEST_KERNEL_BINARY meta;
};
crosvmInitramfs = makeInitrd {
contents = [
{
symlink = "/init";
object = writeScript "init" ''
#!${busybox}/bin/sh -eux
export PATH=${busybox}/bin
mount -t sysfs none /sys
mount -t proc none /proc
ip link set eth0 up
udhcpc -n || :
reboot -f
'';
}
];
};
crosvmVm = writeShellScript "crosvm-vm" ''
# In our patched crosvm, supplying --mac without --host_ip or
# --netmask will put it into vhost-user mode.
exec ${crosvm}/bin/crosvm run \
--mac 0A:B3:EC:FF:FF:FF \
-i ${crosvmInitramfs}/initrd \
${linux}/${stdenv.hostPlatform.linux-kernel.target}
'';
}