This (and Last) Week in Spectrum, 2020-W34 & 2020-W35

From: Alyssa Ross @ 2020-08-24 22:03 UTC
  To: discuss, devel; +Cc: edef, Philipp Steinpaß

Last week I wasn't feeling well, so there was no This Week in Spectrum.


crosvm
------

Where we left off, I had been attempting to port vhost-user-net support
from cloud-hypervisor to crosvm.  I'd been trying to port the first
incarnation of the code in cloud-hypervisor to the version of crosvm
that was current when it was added, thinking that would be easier
because the two codebases were still close together at that point.  But
I ran into the problem that this earliest incarnation of the
vhost-user-net code from cloud-hypervisor didn't actually work (at
least with the backend I was attempting to test it with).  I'd been
trying to figure out exactly which changes were required to make it
work, but hadn't been successful with that yet, and I thought I'd
probably need to start the port over, from the latest cloud-hypervisor
and crosvm code.

The next day, I decided to give my previous strategy one more try,
though, and an hour or two later, I found the required cloud-hypervisor
change, applied it to crosvm, and it worked!  So I now have a crosvm
tree capable of vhost-user-net[1].

This means that it's looking good for my plans for inter-guest
networking, and network hardware isolation.  With that in place, I
decided to start thinking about other kinds of hardware isolation and
inter-VM communication, and that's what I did for most of the last two
weeks.  Let's go through them:

Files will be shared between VMs using virtio-fs.  This has the unique
feature of (soon) being able to bypass guest page caches, and keep only
a single shared cache between VMs.  This brings a performance
improvement, and as I understand it, it should also reduce memory
consumption, because each VM won't have to maintain its own copy of a
disk-backed page.  Of course, this feature (DAX) is also a big side
channel, so it won't be appropriate for all use cases.  But I think for
some things people want to do with Spectrum, this will be very
important.
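
For illustration (a sketch, not Spectrum code): from inside a guest, a
virtio-fs share is just a mount(2) call with the "virtiofs" filesystem
type.  The "dax" option is the part that still depends on patches that
aren't all upstream yet, and the tag and mount point here are made up.
In Rust, using the libc crate:

    // Hypothetical guest-side mount of a virtio-fs share tagged
    // "shared", requesting DAX.
    use std::ffi::CString;

    fn main() -> std::io::Result<()> {
        let source = CString::new("shared").unwrap(); // virtiofsd tag
        let target = CString::new("/mnt").unwrap();
        let fstype = CString::new("virtiofs").unwrap();
        let data = CString::new("dax").unwrap();

        let ret = unsafe {
            libc::mount(
                source.as_ptr(),
                target.as_ptr(),
                fstype.as_ptr(),
                0,
                data.as_ptr() as *const libc::c_void,
            )
        };
        if ret < 0 {
            return Err(std::io::Error::last_os_error());
        }
        Ok(())
    }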

The problem with this is that, because it uses the host kernel's page
cache, the host has to know about the filesystem that's being shared --
there's no running virtiofsd in a VM if we want DAX.  But I'd really
like it if a (non-boot) block device could be used as a filesystem
without the host having to actually talk to the device.  I was stuck
here, but edef pointed out to me that we could use the kernel's 9P
support: attach the block device to a VM, and then mount the filesystem
in the host over 9P, either over a network connection or (ideally)
vsock.  It looks like the kernel should be able to handle 9P over
vsock, but I haven't tested yet.
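
To make that concrete, here's an untested sketch (in Rust, using the
libc crate) of what the host side might look like: connect a vsock
stream to the VM that exports the filesystem, then hand the connected
socket to the kernel's 9P client with trans=fd.  The CID, port, and
mount point are all placeholders:

    // Untested: mount a 9P export from a VM over vsock.
    use std::ffi::CString;
    use std::mem;

    fn main() -> std::io::Result<()> {
        // Connect to the 9P server in the VM with guest CID 3.
        let fd =
            unsafe { libc::socket(libc::AF_VSOCK, libc::SOCK_STREAM, 0) };
        if fd < 0 {
            return Err(std::io::Error::last_os_error());
        }
        let mut addr: libc::sockaddr_vm = unsafe { mem::zeroed() };
        addr.svm_family = libc::AF_VSOCK as libc::sa_family_t;
        addr.svm_cid = 3;    // guest CID (placeholder)
        addr.svm_port = 564; // conventional 9P port (placeholder)
        let ret = unsafe {
            libc::connect(
                fd,
                &addr as *const _ as *const libc::sockaddr,
                mem::size_of::<libc::sockaddr_vm>() as libc::socklen_t,
            )
        };
        if ret < 0 {
            return Err(std::io::Error::last_os_error());
        }

        // Hand the connected socket to the kernel's 9P client.
        let data =
            CString::new(format!("trans=fd,rfdno={0},wfdno={0}", fd))
                .unwrap();
        let source = CString::new("none").unwrap();
        let target = CString::new("/mnt").unwrap();
        let fstype = CString::new("9p").unwrap();
        let ret = unsafe {
            libc::mount(
                source.as_ptr(),
                target.as_ptr(),
                fstype.as_ptr(),
                0,
                data.as_ptr() as *const libc::c_void,
            )
        };
        if ret < 0 {
            return Err(std::io::Error::last_os_error());
        }
        Ok(())
    }
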
We can use existing virtiofsd and 9P software (there are promising
Rust implementations of each), and harden them against potential
vulnerabilities like directory traversals using kernel features like
RESOLVE_BENEATH and RESOLVE_NO_XDEV.  For the boot device, maybe
there's no reason not to
just mount it using the host kernel, or maybe there's something to be
gained by just reading a small bootstrap payload into memory from the
start of the disk once, and then making all future communication go via
a VM.  I'm not really sure yet.  But the important thing is we'll have
mechanisms for all this in place.  Maybe we'll decide that non-boot
devices should just go over inter-VM 9P, but in any case, we'll still
need all these pieces.
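
To make the hardening idea concrete, here's an illustrative sketch (in
Rust, not code from either server) of opening a path relative to an
export root with openat2(2), available since Linux 5.6, so the kernel
itself refuses ".." escapes and filesystem crossings.  The constants
are from linux/openat2.h, and 437 is the x86_64 syscall number:

    use std::ffi::CString;
    use std::os::unix::io::RawFd;

    const SYS_OPENAT2: libc::c_long = 437; // x86_64
    const RESOLVE_NO_XDEV: u64 = 0x01; // stay on one filesystem
    const RESOLVE_BENEATH: u64 = 0x08; // no escaping the root dirfd

    #[repr(C)]
    struct OpenHow {
        flags: u64,
        mode: u64,
        resolve: u64,
    }

    /// Open `path` relative to `root`, letting the kernel reject
    /// directory traversal out of `root` (including via symlinks).
    fn open_beneath(root: RawFd, path: &str) -> std::io::Result<RawFd> {
        let path = CString::new(path).unwrap();
        let how = OpenHow {
            flags: (libc::O_RDONLY | libc::O_CLOEXEC) as u64,
            mode: 0,
            resolve: RESOLVE_BENEATH | RESOLVE_NO_XDEV,
        };
        let fd = unsafe {
            libc::syscall(
                SYS_OPENAT2,
                root,
                path.as_ptr(),
                &how as *const OpenHow,
                std::mem::size_of::<OpenHow>(),
            )
        };
        if fd < 0 {
            return Err(std::io::Error::last_os_error());
        }
        Ok(fd as RawFd)
    }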

GPU isolation should be possible by forwarding the GPU to a VM, but
there are a few problems here.  The first is that it would mean rendered
surfaces have to be copied via shared memory to the VM with the GPU,
before being sent to the GPU.  Additionally, sharing the GPU between VMs
for rendering at all would require significantly more work.  The result
of this is that graphics performance using an isolated GPU will probably
be poor, at least for now.  The final problem is that passthrough of
integrated GPUs seems to be very difficult to get right.  I will
probably need to acquire some hardware that I've seen a report of this
working on, so I can figure out what I've been doing wrong on the two
computers I've tried it on so far.  I suspect that I will get GPU
isolation working, but I'm not sure how reliable or performant it will
be.
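
For context, the first step in forwarding a PCI GPU to a VM is
rebinding it from its host driver to vfio-pci, and that part at least
is well trodden.  A sketch (the PCI address is a placeholder):

    // Rebind a PCI device (here an imaginary GPU at 0000:00:02.0)
    // from its host driver to vfio-pci, so a VMM can take it over.
    // Assumes the vfio-pci driver is available and we're root.
    use std::fs;

    fn main() -> std::io::Result<()> {
        let addr = "0000:00:02.0"; // placeholder PCI address
        let dev = format!("/sys/bus/pci/devices/{}", addr);

        // Make the PCI core match this device to vfio-pci only...
        fs::write(format!("{}/driver_override", dev), "vfio-pci")?;
        // ...release it from whatever driver has it (e.g. i915)...
        if fs::metadata(format!("{}/driver", dev)).is_ok() {
            fs::write(format!("{}/driver/unbind", dev), addr)?;
        }
        // ...and reprobe so the override takes effect.
        fs::write("/sys/bus/pci/drivers_probe", addr)?;
        Ok(())
    }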

For generic USB devices, I expect to be able to take an approach similar
to Qubes[2], having a VM to handle interactions with the hardware USB
controller, and exposing individual USB devices over USB/IP to other
VMs.  It would be nice if I could use vsock for this too.
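
I haven't tested any of this, but the kernel interface involved is
small: vhci-hcd (the USB/IP client side) is handed an
already-connected socket through sysfs, so if it turns out to accept a
vsock socket, attaching a device would look something like this (all
numbers are placeholders):

    use std::fs;

    /// Ask vhci-hcd to attach a remote USB device.  `sockfd` is a
    /// connected socket to the VM owning the USB controller; the
    /// sysfs format is "port sockfd devid speed".
    fn attach(port: u32, sockfd: i32, devid: u32, speed: u32)
        -> std::io::Result<()>
    {
        fs::write(
            "/sys/devices/platform/vhci_hcd.0/attach",
            format!("{} {} {} {}", port, sockfd, devid, speed),
        )
    }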

[1]: https://spectrum-os.org/git/crosvm/?h=vhost-user-net
[2]: https://www.qubes-os.org/doc/usb-devices/


spectrum-os.org
---------------

Philipp registered a Matrix room and bridged it to the #spectrum IRC
channel.  I'm told that this should make it easier for Matrix users to
join the room, since some bug in Matrix's IRC bridge prevents people
from joining from Matrix the usual way.  Philipp also sent a patch[3] to
improve the instructions for Matrix users joining the channel on the
website.  Thanks Philipp!

[3]: https://spectrum-os.org/lists/archives/spectrum-devel/87wo247zu7.fsf@alyssa.is/T/#t


QEMU
----

I sent the previously requested patch[4] to resolve ambiguities in the
vhost-user spec.  No response yet, though.  I'll probably resend it some
time soon.

[4]: https://lore.kernel.org/qemu-devel/20200813094847.4288-1-hi@alyssa.is/


I'm finding it hard to keep going at the moment.  The stuff I'm doing
now is probably the hardest part of implementing Spectrum, and it's
frustrating to realise that not everything I want to do is going to be
possible.  So much of the KVM ecosystem assumes that things will be
host<->guest, and there's not always an easy solution.  But, whatever we
end up with, it's going to be a lot better than what I'm using today,
and what lots of other people are using today.  I think I'm going to be
able to deliver a good experience with a fairly high degree of
protection against malicious hardware.  But it's not going to be
perfect.

I'm pushing quite hard to make it over the line with my hardware
isolation funding milestone.  I'm so close, and I'm about to need the
money.  But once I've hit that, I think I'm going to need a break.  This
stuff is gruelling.


This (and Last) Week in Spectrum, 2020-W34 & 2020-W35

From: Michael Raskin @ 2020-08-26 13:31 UTC
  To: hi, discuss, devel; +Cc: edef, philipp


>to handle 9P over vsock, but I haven't tested yet.  We can use existing
>virtiofsd and 9P software (there are promising Rust implementations of
>each), and harden them against potential vulnerabilities like directory
>traversals using kernel features like RESOLVE_BENEATH and
>RESOLVE_NO_XDEV.  For the boot device, maybe there's no reason not to

Also, if the server is in a namespace seeing only a bind mount of the
necessary part of the FS, in a VM that only sees that one FS, the cheap
attacks just become moot.  You can probably talk it into traversal, but
it doesn't see more than allowed anyway; talking it into attacking the
VM kernel is hopefully harder (and still has limited impact).
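
To illustrate (example paths; pivot_root(2) would be better than
chroot(2) in a real version):

    use std::ffi::CString;

    // Confine the file server to a mount namespace in which the
    // exported subtree is all that's visible.
    fn confine() -> std::io::Result<()> {
        let check = |ret: libc::c_int| {
            if ret < 0 {
                Err(std::io::Error::last_os_error())
            } else {
                Ok(())
            }
        };
        let root = CString::new("/").unwrap();
        let export = CString::new("/srv/export").unwrap(); // example
        let jail = CString::new("/tmp/jail").unwrap();     // example
        let none: *const libc::c_char = std::ptr::null();

        unsafe {
            // New mount namespace (needs CAP_SYS_ADMIN, or pair it
            // with CLONE_NEWUSER).
            check(libc::unshare(libc::CLONE_NEWNS))?;
            // Don't leak mount changes back to the parent namespace.
            check(libc::mount(none, root.as_ptr(), none,
                              libc::MS_REC | libc::MS_PRIVATE,
                              std::ptr::null()))?;
            // Bind the export over an empty directory, then make it
            // the server's root so nothing else is visible.
            check(libc::mount(export.as_ptr(), jail.as_ptr(), none,
                              libc::MS_BIND, std::ptr::null()))?;
            check(libc::chroot(jail.as_ptr()))?;
            check(libc::chdir(root.as_ptr()))?;
        }
        Ok(()) // exec the 9P/virtiofs server here
    }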

>just mount it using the host kernel, or maybe there's something to be
>gained by just reading a small bootstrap payload into memory from the
>start of the disk once, and then making all future communication go via
>a VM.  I'm not really sure yet.  But the important thing is we'll have
>mechanisms for all this in place.  Maybe we'll decide that non-boot
>devices should just go over inter-VM 9P, but in any case, we'll still
>need all these pieces.

Can virtiofs eventually be backed by a VM-wrapped vhost-user?

Although we probably do want the host-side page cache, as a VM's
requests to the host are much more transparent to the scheduler than
inter-VM requests.

>computers I've tried it on so far.  I suspect that I will get GPU
>isolation working, but I'm not sure how reliable or performant it will
>be.

Hmm.  Also a good question: what is the timeslice for inter-VM
communication?  Does it make sense to have two VMs alternate in slices
of ten milliseconds?  That's about what would be needed for 25fps video
playback.

>I'm pushing quite hard to make it over the line with my hardware
>isolation funding milestone.  I'm so close, and I'm about to need the
>money.  But once I've hit that, I think I'm going to need a break.  This
>stuff is gruelling.

I wish you strength for this push!



