* [PATCH crosvm 0/2] Fix deadlock on early VmRequest @ 2020-06-14 11:43 Alyssa Ross 2020-06-14 11:43 ` [PATCH crosvm 1/2] msg_socket: introduce UnixSeqpacketExt Alyssa Ross 2020-06-14 11:43 ` [PATCH crosvm 2/2] crosvm: fix deadlock on early VmRequest Alyssa Ross 0 siblings, 2 replies; 11+ messages in thread From: Alyssa Ross @ 2020-06-14 11:43 UTC (permalink / raw) To: devel This is the problem you might have seen me complaining about on IRC a couple of days ago. A detailed explanation of the issue and the fix can be found in the second patch message. The fix consists of two patches. The first just adds an API that's needed by the refactored, fixed code. The actual bug fix is in the second patch. Alyssa Ross (2): msg_socket: introduce UnixSeqpacketExt crosvm: fix deadlock on early VmRequest devices/src/virtio/block.rs | 5 + msg_socket/src/lib.rs | 52 ++++++---- src/linux.rs | 201 ++++++++++++++++++++++++++++++------ vm_control/src/lib.rs | 57 +++++++--- 4 files changed, 245 insertions(+), 70 deletions(-) -- 2.26.2 ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH crosvm 1/2] msg_socket: introduce UnixSeqpacketExt 2020-06-14 11:43 [PATCH crosvm 0/2] Fix deadlock on early VmRequest Alyssa Ross @ 2020-06-14 11:43 ` Alyssa Ross 2020-06-16 0:17 ` Cole Helbling 2020-06-14 11:43 ` [PATCH crosvm 2/2] crosvm: fix deadlock on early VmRequest Alyssa Ross 1 sibling, 1 reply; 11+ messages in thread From: Alyssa Ross @ 2020-06-14 11:43 UTC (permalink / raw) To: devel Occasionally, it is useful to be able to use UnixSeqpacket as a type that can represent any kind of MsgSocket. For example, to keep some MsgSockets of different types in a Vec. In this case, it may be known what type of messages should be sent over a socket, even though that may not be represantable in the type system. To accomodate this situation, this patch introduces send_msg_on_socket and recv_msg_on_socket methods on UnixSeqpacket, that can be used to send or receive any kind of MsgOnSocket. The caller is obviously responsible for ensuring that the messages being sent are of the type expected by the socket. This lack of type safety for message types is not ideal, and so MsgSender and MsgReceiver should still be preferred wherever possible. --- msg_socket/src/lib.rs | 52 ++++++++++++++++++++++++++----------------- 1 file changed, 32 insertions(+), 20 deletions(-) diff --git a/msg_socket/src/lib.rs b/msg_socket/src/lib.rs index 540d49d7..be860701 100644 --- a/msg_socket/src/lib.rs +++ b/msg_socket/src/lib.rs @@ -123,42 +123,38 @@ impl<M: MsgOnSocket> AsRawFd for Receiver<M> { } } -/// Types that could send a message. -pub trait MsgSender: AsRef<UnixSeqpacket> { - type M: MsgOnSocket; - fn send(&self, msg: &Self::M) -> MsgResult<()> { +pub trait UnixSeqpacketExt { + fn send_msg_on_socket<M: MsgOnSocket>(&self, msg: &M) -> MsgResult<()>; + fn recv_msg_on_socket<M: MsgOnSocket>(&self) -> MsgResult<M>; +} + +impl UnixSeqpacketExt for UnixSeqpacket { + fn send_msg_on_socket<M: MsgOnSocket>(&self, msg: &M) -> MsgResult<()> { let msg_size = msg.msg_size(); let fd_size = msg.fd_count(); let mut msg_buffer: Vec<u8> = vec![0; msg_size]; let mut fd_buffer: Vec<RawFd> = vec![0; fd_size]; let fd_size = msg.write_to_buffer(&mut msg_buffer, &mut fd_buffer)?; - let sock: &UnixSeqpacket = self.as_ref(); if fd_size == 0 { - handle_eintr!(sock.send(&msg_buffer)) + handle_eintr!(self.send(&msg_buffer)) .map_err(|e| MsgError::Send(SysError::new(e.raw_os_error().unwrap_or(0))))?; } else { let ioslice = IoSlice::new(&msg_buffer[..]); - sock.send_with_fds(&[ioslice], &fd_buffer[0..fd_size]) + self.send_with_fds(&[ioslice], &fd_buffer[0..fd_size]) .map_err(MsgError::Send)?; } Ok(()) } -} - -/// Types that could receive a message. -pub trait MsgReceiver: AsRef<UnixSeqpacket> { - type M: MsgOnSocket; - fn recv(&self) -> MsgResult<Self::M> { - let sock: &UnixSeqpacket = self.as_ref(); + fn recv_msg_on_socket<M: MsgOnSocket>(&self) -> MsgResult<M> { let (msg_buffer, fd_buffer) = { - if Self::M::uses_fd() { - sock.recv_as_vec_with_fds() + if M::uses_fd() { + self.recv_as_vec_with_fds() .map_err(|e| MsgError::Recv(SysError::new(e.raw_os_error().unwrap_or(0))))? } else { ( - sock.recv_as_vec().map_err(|e| { + self.recv_as_vec().map_err(|e| { MsgError::Recv(SysError::new(e.raw_os_error().unwrap_or(0))) })?, vec![], @@ -166,11 +162,11 @@ pub trait MsgReceiver: AsRef<UnixSeqpacket> { } }; - if msg_buffer.len() == 0 && Self::M::fixed_size() != Some(0) { + if msg_buffer.len() == 0 && M::fixed_size() != Some(0) { return Err(MsgError::RecvZero); } - if let Some(fixed_size) = Self::M::fixed_size() { + if let Some(fixed_size) = M::fixed_size() { if fixed_size != msg_buffer.len() { return Err(MsgError::BadRecvSize { expected: fixed_size, @@ -180,7 +176,7 @@ pub trait MsgReceiver: AsRef<UnixSeqpacket> { } // Safe because fd buffer is read from socket. - let (v, read_fd_size) = unsafe { Self::M::read_from_buffer(&msg_buffer, &fd_buffer)? }; + let (v, read_fd_size) = unsafe { M::read_from_buffer(&msg_buffer, &fd_buffer)? }; if fd_buffer.len() != read_fd_size { return Err(MsgError::NotExpectFd); } @@ -188,6 +184,22 @@ pub trait MsgReceiver: AsRef<UnixSeqpacket> { } } +/// Types that could send a message. +pub trait MsgSender: AsRef<UnixSeqpacket> { + type M: MsgOnSocket; + fn send(&self, msg: &Self::M) -> MsgResult<()> { + self.as_ref().send_msg_on_socket(msg) + } +} + +/// Types that could receive a message. +pub trait MsgReceiver: AsRef<UnixSeqpacket> { + type M: MsgOnSocket; + fn recv(&self) -> MsgResult<Self::M> { + self.as_ref().recv_msg_on_socket() + } +} + impl<I: MsgOnSocket, O: MsgOnSocket> MsgSender for MsgSocket<I, O> { type M = I; } -- 2.26.2 ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH crosvm 1/2] msg_socket: introduce UnixSeqpacketExt 2020-06-14 11:43 ` [PATCH crosvm 1/2] msg_socket: introduce UnixSeqpacketExt Alyssa Ross @ 2020-06-16 0:17 ` Cole Helbling 2020-06-16 9:32 ` Alyssa Ross 0 siblings, 1 reply; 11+ messages in thread From: Cole Helbling @ 2020-06-16 0:17 UTC (permalink / raw) To: Alyssa Ross, devel On Sun Jun 14, 2020 at 11:43 AM, Alyssa Ross wrote: > Occasionally, it is useful to be able to use UnixSeqpacket as a type > that can represent any kind of MsgSocket. For example, to keep some > MsgSockets of different types in a Vec. In this case, it may be known > what type of messages should be sent over a socket, even though that > may not be represantable in the type system. > > To accomodate this situation, this patch introduces send_msg_on_socket > and recv_msg_on_socket methods on UnixSeqpacket, that can be used to > send or receive any kind of MsgOnSocket. The caller is obviously > responsible for ensuring that the messages being sent are of the type > expected by the socket. > > This lack of type safety for message types is not ideal, and so > MsgSender and MsgReceiver should still be preferred wherever possible. Is there anyway this can or will be rectified? > --- > msg_socket/src/lib.rs | 52 ++++++++++++++++++++++++++----------------- > 1 file changed, 32 insertions(+), 20 deletions(-) Otherwise, I don't see anything obviously wrong, but I'm also not very familiar with sockets. Reviewed-by: Cole Helbling <cole.e.helbling@outlook.com> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH crosvm 1/2] msg_socket: introduce UnixSeqpacketExt 2020-06-16 0:17 ` Cole Helbling @ 2020-06-16 9:32 ` Alyssa Ross 2020-06-22 22:06 ` Cole Helbling 2020-06-25 1:54 ` impaqt 0 siblings, 2 replies; 11+ messages in thread From: Alyssa Ross @ 2020-06-16 9:32 UTC (permalink / raw) To: Cole Helbling; +Cc: devel [-- Attachment #1: Type: text/plain, Size: 6361 bytes --] >> Occasionally, it is useful to be able to use UnixSeqpacket as a type >> that can represent any kind of MsgSocket. For example, to keep some >> MsgSockets of different types in a Vec. In this case, it may be known >> what type of messages should be sent over a socket, even though that >> may not be represantable in the type system. >> >> To accomodate this situation, this patch introduces send_msg_on_socket >> and recv_msg_on_socket methods on UnixSeqpacket, that can be used to >> send or receive any kind of MsgOnSocket. The caller is obviously >> responsible for ensuring that the messages being sent are of the type >> expected by the socket. >> >> This lack of type safety for message types is not ideal, and so >> MsgSender and MsgReceiver should still be preferred wherever possible. > > Is there anyway this can or will be rectified? I started out writing about why I didn't think this was something that necessarily needed to be rectified, but in the course of writing that up came up with a couple of potential solutions, so maybe it should be after all. :P Did you see the way it was used in the second patch? We put all the sockets into a Vec<&UnixSeqpacket>, ask a function which is the right socket to send the message to, and then send the message to that socket. It's not really possible to do that sort of thing while preserving type information, because you can't really have a collection containing members of different types. The reason this is done at all is to avoid duplicating a match statement between two different functions (because I think that would be likely to go out of sync). This does mean that it's the responsibility of the caller of VmRequest::execute to pass in the right socket, though, which is not ideal. So here are some options I came up with: (1) Since VmRequest::socket is only ever really going to be used to inspect the socket value before then passing it along to VmRequest::execute, we could drop VmRequest::socket, move the match statement back into VmRequest::execute, but have it take a callback parameter, which would be called with the chosen socket as a &UnixSeqpacket. That would mean we wouldn't have to assume the correct type of socket in VmRequest::execute, but the caller could still see which was chosen. The problem is that in run_control, we use the chosen socket to decide whether to proceed with the request at all, or to do something else (queue it for later). So then the closure would have to return a bool or something indicating whether to actually do the request or not. All feasible, but starts to sound a little awkward. We might have to change the name of do_request considering that it now might or might not do a request depending on what the callback told it. (2) Keep VmRequest::socket, but have it return a type like this: pub struct ChosenSocket<'a> { request: &'a VmRequest, socket: &'a UnixSeqpacket, } impl<'a> AsRef<UnixSeqpacket> for ChosenSocket<'a> where T: AsRef<UnixSeqpacket<T>, { fn as_ref(&self) -> &UnixSeqpacket { self.socket } } Importantly, there would be no public constructor for this type. That way, when you called do_request, the ChosenSocket value would be compile-time proof that you had previously called VmRequest::socket. It could then do a runtime check that the request that the ChosenSocket was contructed with matches the request that it is now being called with. The advantage to this approach would be no annoying callback, but the disadvantage would be that it would be possible to pass a ChosenSocket for a different request, and only have that be checked at runtime. It would still be better than the current situation of no checking, though, and passing in the wrong ChosenSocket would probably be a difficult thing to do accidentally. (3) Make each each VmRequest case a newtype, like so: #[derive(MsgOnSocket, Debug)] pub enum VmRequest { Exit(Exit), Suspend(Suspend), ... } This would basically be a workaround for the fact that enum variants in Rust aren't proper types. If they were, though (which we can emulate through this approach), we could have some trait with an associated socket type for executing each type of request. Both 1 and 3 would provide full compile-time type safety, at the expense of being a bit more horrible than 2. I think I'm leaning towards 1, but I don't have strong feelings about any of them. Having thought about this now though, in response to your question (thanks!), I definitely think that we should do something here better than what I presented in these patches. I'd be very curious to hear your thoughts. You might also be interested in implementing one of them. Would be a good starter crosvm exercise. It's probably the type of thing where you could be pretty confident you'd got it right if it compiled, but I could offer some guidance on testing it as well, if you wanted. If not, though, please give me your thoughts on approach anyway, and I'll implement whichever one we think is best myself at some point. :) > Otherwise, I don't see anything obviously wrong, but I'm also not very > familiar with sockets. > > Reviewed-by: Cole Helbling <cole.e.helbling@outlook.com> Thank you so much for the review! I wasn't expecting a review for this patch (haven't really got one for crosvm stuff before), so I didn't wait very long before I pushed it. The review is still very very much appreciated, though -- as I hope my wall of text above demonstrates, your review is definitely going to lead to better quality code. But, because I wasn't expecting a review and pushed already, it's too late to include your review in the commit message. :( I'm going to try to figure out how to use git notes (git-notes(1)) this morning, which I think should let me annotate the commit with your review after the fact, such that it shows up in the log message and everything. I'll definitely wait longer to push crosvm patches in future, now that I know somebody might be looking at them! [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH crosvm 1/2] msg_socket: introduce UnixSeqpacketExt 2020-06-16 9:32 ` Alyssa Ross @ 2020-06-22 22:06 ` Cole Helbling 2020-06-23 2:32 ` Alyssa Ross 2020-06-25 1:54 ` impaqt 1 sibling, 1 reply; 11+ messages in thread From: Cole Helbling @ 2020-06-22 22:06 UTC (permalink / raw) To: Alyssa Ross, Cole Helbling; +Cc: devel Alyssa Ross <hi@alyssa.is> writes: > Did you see the way it was used in the second patch? We put all the > sockets into a Vec<&UnixSeqpacket>, ask a function which is the right > socket to send the message to, and then send the message to that > socket. It's not really possible to do that sort of thing while > preserving type information, because you can't really have a collection > containing members of different types. Ah, yeah. That's pretty obvious now that you've explained it... > --->8--- > > Both 1 and 3 would provide full compile-time type safety, at the expense > of being a bit more horrible than 2. I think I'm leaning towards 1, but > I don't have strong feelings about any of them. Having thought about > this now though, in response to your question (thanks!), I definitely > think that we should do something here better than what I presented in > these patches. > > I'd be very curious to hear your thoughts. I think that options 1 or 3 are the best choices. My only qualm with option 1 is that it uses a callback. I can't even really explain *why* I don't like callbacks (maybe because I had a bad time trying to use them within Rust before, when I was lacking experience), but it just feels somehow wrong to me. (Maybe it's also because I stereotype callbacks to C -- which isn't necessarily a bad thing -- but it would certainly make me happier to find a more Rust-y way to approach this.) Thinking about it for a little bit longer, I'm leaning a bit more towards option 3, just because I think it would be a better idea to associate a trait and a socket type. My gut feeling (keeping in mind my very brief knowledge of traits, and even less in that of callbacks) is that this would end up being nicer to implement, though maybe not as simple. > You might also be interested in implementing one of them. Would be a > good starter crosvm exercise. It's probably the type of thing where you > could be pretty confident you'd got it right if it compiled, but I could > offer some guidance on testing it as well, if you wanted. If not, > though, please give me your thoughts on approach anyway, and I'll > implement whichever one we think is best myself at some point. :) Although I would be interested in implementing one of them (#3, specifically), time is not on my side at the moment. I've recently started my first 40-hour-per-week job, so I'd like to adjust to that schedule before I really start doing things again. > I'll definitely wait longer to push crosvm patches in future, now that I > know somebody might be looking at them! Don't worry about it; it's because I took so long to respond. :P Cole P.S. I've switched to composing emails using notmuch inside Emacs. There should be no more whitespace weirdness (though we'll find out when the next patch comes through for sure ;^) ). P.P.S. Thanks for going into such detail in all your responses (and apologies for not doing the same)! ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH crosvm 1/2] msg_socket: introduce UnixSeqpacketExt 2020-06-22 22:06 ` Cole Helbling @ 2020-06-23 2:32 ` Alyssa Ross 0 siblings, 0 replies; 11+ messages in thread From: Alyssa Ross @ 2020-06-23 2:32 UTC (permalink / raw) To: Cole Helbling; +Cc: devel [-- Attachment #1: Type: text/plain, Size: 3256 bytes --] >> Both 1 and 3 would provide full compile-time type safety, at the expense >> of being a bit more horrible than 2. I think I'm leaning towards 1, but >> I don't have strong feelings about any of them. Having thought about >> this now though, in response to your question (thanks!), I definitely >> think that we should do something here better than what I presented in >> these patches. >> >> I'd be very curious to hear your thoughts. > > I think that options 1 or 3 are the best choices. My only qualm with > option 1 is that it uses a callback. I can't even really explain *why* I > don't like callbacks (maybe because I had a bad time trying to use them > within Rust before, when I was lacking experience), but it just feels > somehow wrong to me. (Maybe it's also because I stereotype callbacks to > C -- which isn't necessarily a bad thing -- but it would certainly make > me happier to find a more Rust-y way to approach this.) > > Thinking about it for a little bit longer, I'm leaning a bit more > towards option 3, just because I think it would be a better idea to > associate a trait and a socket type. My gut feeling (keeping in mind my > very brief knowledge of traits, and even less in that of callbacks) is > that this would end up being nicer to implement, though maybe not as > simple. In the time since I last posted, I've been leaning towards option 1. But I don't really like callbacks either. I think the main reason for me is that then it's not really obvious what the function _does_. It some cases (if the callback returned false) it wouldn't actually do anything at all, which feels a bit weird. What would a good name for such a function be? It becomes very difficult. On the other hand, making every enum variant a struct of its own feels like a hack around a limitation of Rust, and it's a lot of boilerplate. The size of the change puts me off it. I think I'll probably have to try both out and see what feels best. >> You might also be interested in implementing one of them. Would be a >> good starter crosvm exercise. It's probably the type of thing where you >> could be pretty confident you'd got it right if it compiled, but I could >> offer some guidance on testing it as well, if you wanted. If not, >> though, please give me your thoughts on approach anyway, and I'll >> implement whichever one we think is best myself at some point. :) > > Although I would be interested in implementing one of them (#3, > specifically), time is not on my side at the moment. I've recently > started my first 40-hour-per-week job, so I'd like to adjust to that > schedule before I really start doing things again. Oh, congratulations! Definitely let yourself adjust to the job as a priority! >> I'll definitely wait longer to push crosvm patches in future, now that I >> know somebody might be looking at them! > > Don't worry about it; it's because I took so long to respond. :P Didn't you respond, like, the next day? :P > P.P.S. Thanks for going into such detail in all your responses (and > apologies for not doing the same)! No worries! It was really useful to have somebody on the other end to go into detail to, because otherwise I wouldn't have thought about this so much and come to the decision I did. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH crosvm 1/2] msg_socket: introduce UnixSeqpacketExt 2020-06-16 9:32 ` Alyssa Ross 2020-06-22 22:06 ` Cole Helbling @ 2020-06-25 1:54 ` impaqt 2020-07-09 13:24 ` Alyssa Ross 1 sibling, 1 reply; 11+ messages in thread From: impaqt @ 2020-06-25 1:54 UTC (permalink / raw) To: Alyssa Ross, Cole Helbling; +Cc: devel > (3) Make each each VmRequest case a newtype, like so: > > #[derive(MsgOnSocket, Debug)] > pub enum VmRequest { > Exit(Exit), > Suspend(Suspend), > ... > } > > This would basically be a workaround for the fact that enum variants in > Rust aren't proper types. If they were, though (which we can emulate > through this approach), we could have some trait with an associated > socket type for executing each type of request. The machine crate might make this option a bit less annoying: https://github.com/rust-bakery/machine ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH crosvm 1/2] msg_socket: introduce UnixSeqpacketExt 2020-06-25 1:54 ` impaqt @ 2020-07-09 13:24 ` Alyssa Ross 0 siblings, 0 replies; 11+ messages in thread From: Alyssa Ross @ 2020-07-09 13:24 UTC (permalink / raw) To: impaqt; +Cc: Cole Helbling, devel [-- Attachment #1: Type: text/plain, Size: 979 bytes --] "impaqt" <impaqt@vendek.net> writes: >> (3) Make each each VmRequest case a newtype, like so: >> >> #[derive(MsgOnSocket, Debug)] >> pub enum VmRequest { >> Exit(Exit), >> Suspend(Suspend), >> ... >> } >> >> This would basically be a workaround for the fact that enum variants in >> Rust aren't proper types. If they were, though (which we can emulate >> through this approach), we could have some trait with an associated >> socket type for executing each type of request. > > The machine crate might make this option a bit less annoying: > https://github.com/rust-bakery/machine That does look good. Feels like a bit of a shame the machine!() macro isn't its own crate with a more generic name, because I'm sure there are all sorts of situations in which that's useful. As it is, I'd feel a bit concerned about depending on a crate about state machines to define some enums, because that could be very confusing for somebody reading the code. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH crosvm 2/2] crosvm: fix deadlock on early VmRequest 2020-06-14 11:43 [PATCH crosvm 0/2] Fix deadlock on early VmRequest Alyssa Ross 2020-06-14 11:43 ` [PATCH crosvm 1/2] msg_socket: introduce UnixSeqpacketExt Alyssa Ross @ 2020-06-14 11:43 ` Alyssa Ross 2020-06-16 1:08 ` Cole Helbling 1 sibling, 1 reply; 11+ messages in thread From: Alyssa Ross @ 2020-06-14 11:43 UTC (permalink / raw) To: devel If a DiskCommand was received on the crosvm socket before the virtio-block device was activated, the Token::VmRequest case in the main event loop would forward the request to the block device socket, and then wait syncronously for a response. That response would never come because the device hadn't been activated, and it would never be activated because the event loop would never continue, and therefore never be able to respond to the event that causes the device to be activated. crosvm would therefore just hang forever, waiting for a response that would never come. This patch fixes this deadlock by keeping track of whether devices that send a response in this way have been activated yet. If they have already been activated, messages are sent and responses are received as normal. If they have not been activated, messages are instead put into a per-device queue. Once the device is activated, queued messages are processed all at once, and then the device is marked as ready, and the queue is dropped. Future messages are processed immediately as they come in, with no further queueing. A device indicates that it is ready by sending a message on its socket. The main crosvm event loop can then poll the socket, to be notified when the device is ready. This poll event will only trigger once -- once it has been received, it is removed from the poll context. Currently, the only device type that responds to external control messages AND needs to be activated by an event is the block device. The balloon device does not respond to messages, and the xhci controller device is activated up front. The code is nevertheless structured so that it should be very easy to drop another kind of device in to the queuing system, should that be required. --- devices/src/virtio/block.rs | 5 + src/linux.rs | 201 ++++++++++++++++++++++++++++++------ vm_control/src/lib.rs | 57 +++++++--- 3 files changed, 213 insertions(+), 50 deletions(-) diff --git a/devices/src/virtio/block.rs b/devices/src/virtio/block.rs index 80d51030..5a8972a7 100644 --- a/devices/src/virtio/block.rs +++ b/devices/src/virtio/block.rs @@ -390,6 +390,11 @@ impl Worker { } }; + if let Err(e) = self.control_socket.send(&DiskControlResult::Ready) { + error!("control socket failed to notify readiness: {}", e); + return; + } + 'poll: loop { let events = match poll_ctx.wait() { Ok(v) => v, diff --git a/src/linux.rs b/src/linux.rs index ec2067cf..ccd7d88c 100644 --- a/src/linux.rs +++ b/src/linux.rs @@ -3,6 +3,7 @@ // found in the LICENSE file. use std::cmp::max; +use std::collections::{BTreeMap, VecDeque}; use std::convert::TryFrom; use std::error::Error as StdError; use std::ffi::CStr; @@ -32,12 +33,12 @@ use acpi_tables::sdt::SDT; use devices::virtio::EventDevice; use devices::virtio::{self, Console, VirtioDevice}; use devices::{ - self, Ac97Backend, Ac97Dev, HostBackendDeviceProvider, PciDevice, VfioContainer, VfioDevice, - VfioPciDevice, VirtioPciDevice, XhciController, + self, Ac97Backend, Ac97Dev, Bus, HostBackendDeviceProvider, PciDevice, VfioContainer, + VfioDevice, VfioPciDevice, VirtioPciDevice, XhciController, }; use io_jail::{self, Minijail}; use kvm::*; -use msg_socket::{MsgError, MsgReceiver, MsgSender, MsgSocket}; +use msg_socket::{MsgError, MsgReceiver, MsgResult, MsgSender, MsgSocket}; use net_util::{Error as NetError, MacAddress, Tap}; use remain::sorted; use resources::{Alloc, MmioType, SystemAllocator}; @@ -45,7 +46,7 @@ use sync::{Condvar, Mutex}; use sys_util::net::{UnixSeqpacket, UnixSeqpacketListener, UnlinkUnixSeqpacketListener}; use sys_util::{ - self, block_signal, clear_signal, drop_capabilities, error, flock, get_blocked_signals, + self, block_signal, clear_signal, debug, drop_capabilities, error, flock, get_blocked_signals, get_group_id, get_user_id, getegid, geteuid, info, register_rt_signal_handler, set_cpu_affinity, validate_raw_fd, warn, EventFd, FlockOperation, GuestAddress, GuestMemory, Killable, MemoryMappingArena, PollContext, PollToken, Protection, ScopedEvent, SignalFd, @@ -57,7 +58,7 @@ use vm_control::{ DiskControlResult, UsbControlSocket, VmControlResponseSocket, VmIrqRequest, VmIrqResponse, VmIrqResponseSocket, VmMemoryControlRequestSocket, VmMemoryControlResponseSocket, VmMemoryRequest, VmMemoryResponse, VmMsyncRequest, VmMsyncRequestSocket, VmMsyncResponse, - VmMsyncResponseSocket, VmRunMode, + VmMsyncResponseSocket, VmRequest, VmRunMode, }; use crate::{Config, DiskOption, Executable, SharedDir, SharedDirKind, TouchDeviceOption}; @@ -1667,6 +1668,45 @@ fn file_to_i64<P: AsRef<Path>>(path: P) -> io::Result<i64> { .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "empty file")) } +/// Returns a boolean indicating whether the VM should be exited. +fn do_vm_request( + request: VmRequest, + device_socket: Option<&UnixSeqpacket>, + control_socket: &VmControlResponseSocket, + run_mode_arc: &Arc<VcpuRunMode>, + vcpu_handles: &mut Vec<JoinHandle<()>>, + io_bus: &mut Bus, +) -> MsgResult<bool> { + let mut run_mode_opt = None; + let response = request.execute(&mut run_mode_opt, device_socket); + control_socket.send(&response)?; + if let Some(run_mode) = run_mode_opt { + info!("control socket changed run mode to {}", run_mode); + match run_mode { + VmRunMode::Exiting => Ok(true), + VmRunMode::Running => { + if let VmRunMode::Suspending = *run_mode_arc.mtx.lock() { + io_bus.notify_resume(); + } + run_mode_arc.set_and_notify(VmRunMode::Running); + for handle in vcpu_handles { + let _ = handle.kill(SIGRTMIN() + 0); + } + Ok(false) + } + other => { + run_mode_arc.set_and_notify(other); + for handle in vcpu_handles { + let _ = handle.kill(SIGRTMIN() + 0); + } + Ok(false) + } + } + } else { + Ok(false) + } +} + pub fn run_config(cfg: Config) -> Result<()> { if cfg.sandbox { // Printing something to the syslog before entering minijail so that libc's syslogger has a @@ -1818,7 +1858,7 @@ fn run_control( ) -> Result<()> { const LOWMEM_AVAILABLE: &str = "/sys/kernel/mm/chromeos-low_mem/available"; - #[derive(PollToken)] + #[derive(Debug, PollToken)] enum Token { Exit, Suspend, @@ -1828,6 +1868,7 @@ fn run_control( BalloonResult, VmControlServer, VmControl { index: usize }, + DeviceReady { sock_fd: RawFd }, } stdin() @@ -1912,6 +1953,38 @@ fn run_control( } vcpu_thread_barrier.wait(); + struct QueuedDeviceReq { + request: VmRequest, + control_sock_index: usize, + } + + /// A device can either be waiting, or ready. + /// + /// If it's waiting, we queue up events to send to it once it's ready. We can't just send the + /// requests straight away and let them sit on the socket until the device comes up, because we + /// want to syncronously wait for a response after sending. If the device hasn't been activated + /// when we do this, we'd sit waiting for a response forever and never activate the device. + enum DeviceStatus<'a> { + Waiting(&'a UnixSeqpacket, VecDeque<QueuedDeviceReq>), + Ready, + } + + let mut queued_device_reqs = BTreeMap::<RawFd, DeviceStatus>::new(); + + // Each socket will send a ready message when it's ready to receive requests. Add the sockets + // to the event loop, so that when they become ready, we can process their queues. After the + // initial queue is processed, the device becomes ready, and the socket is removed from the + // event loop. + for socket in disk_host_sockets.iter().map(AsRef::as_ref) { + let token = Token::DeviceReady { + sock_fd: socket.as_raw_fd(), + }; + poll_ctx.add(socket, token).map_err(Error::PollContextAdd)?; + + let status = DeviceStatus::Waiting(socket, VecDeque::new()); + queued_device_reqs.insert(socket.as_raw_fd(), status); + } + let mut ioapic_delayed = Vec::<usize>::default(); 'poll: loop { let events = { @@ -2110,40 +2183,44 @@ fn run_control( match socket { TaggedControlSocket::Vm(socket) => match socket.recv() { Ok(request) => { - let mut run_mode_opt = None; - let response = request.execute( - &mut run_mode_opt, + let device_socket = request.socket( &balloon_host_socket, - disk_host_sockets, + &disk_host_sockets, &usb_control_socket, ); - if let Err(e) = socket.send(&response) { - error!("failed to send VmResponse: {}", e); - } - if let Some(run_mode) = run_mode_opt { - info!("control socket changed run mode to {}", run_mode); - match run_mode { - VmRunMode::Exiting => { - break 'poll; - } - VmRunMode::Running => { - if let VmRunMode::Suspending = - *run_mode_arc.mtx.lock() - { - linux.io_bus.notify_resume(); - } - run_mode_arc.set_and_notify(VmRunMode::Running); - for handle in &vcpu_handles { - let _ = handle.kill(SIGRTMIN() + 0); - } - } - other => { - run_mode_arc.set_and_notify(other); - for handle in &vcpu_handles { - let _ = handle.kill(SIGRTMIN() + 0); + + match device_socket.map(|s| { + queued_device_reqs.get_mut(&s.as_raw_fd()).unwrap() + }) { + None | Some(DeviceStatus::Ready) => { + match do_vm_request( + request, + device_socket, + socket, + &run_mode_arc, + &mut vcpu_handles, + &mut linux.io_bus, + ) { + Ok(true) => break 'poll, + Ok(false) => {} + Err(e) => { + error!("failed to handle VmRequest: {}", e); + continue 'poll; } } } + + Some(DeviceStatus::Waiting(_, queue)) => { + debug!( + "Device {} not yet ready. Queueing request.", + device_socket.unwrap().as_raw_fd() + ); + + queue.push_back(QueuedDeviceReq { + request, + control_sock_index: index, + }); + } } } Err(e) => { @@ -2204,6 +2281,61 @@ fn run_control( } } } + Token::DeviceReady { sock_fd } => { + if let Some(DeviceStatus::Waiting(device_sock, mut queue)) = + queued_device_reqs.remove(&sock_fd) + { + debug!( + "Device {} is ready. Processing its queue of {} item(s).", + sock_fd, + queue.len() + ); + + // Deal with the message sent to indicate readiness. Its contents don't matter. + if let Err(e) = device_sock.recv(&mut []) { + error!( + "Failed to recv ready notification from device socket: {}", + e + ); + continue 'poll; + } + + while let Some(QueuedDeviceReq { + request, + control_sock_index, + }) = queue.pop_front() + { + let control_sock = match control_sockets.get(control_sock_index) { + Some(TaggedControlSocket::Vm(s)) => s, + _ => unreachable!(), + }; + + match do_vm_request( + request, + Some(device_sock), + control_sock, + &run_mode_arc, + &mut vcpu_handles, + &mut linux.io_bus, + ) { + Ok(true) => break 'poll, + Ok(false) => {} + Err(e) => { + error!("failed to handle VmRequest: {}", e); + continue 'poll; + } + } + } + + if let Err(e) = poll_ctx.delete(device_sock) { + error!("failed to delete poll token: {}", e); + } + } else { + error!("Received ready notification for a device that isn't waiting"); + } + + queued_device_reqs.insert(sock_fd, DeviceStatus::Ready); + } } } @@ -2229,6 +2361,7 @@ fn run_control( _ => {} } } + Token::DeviceReady { .. } => {} } } diff --git a/vm_control/src/lib.rs b/vm_control/src/lib.rs index c63fbfe6..d4922ab9 100644 --- a/vm_control/src/lib.rs +++ b/vm_control/src/lib.rs @@ -19,8 +19,9 @@ use std::os::unix::io::{AsRawFd, FromRawFd, RawFd}; use libc::{EINVAL, EIO, ENODEV}; use kvm::{IrqRoute, IrqSource, Vm}; -use msg_socket::{MsgError, MsgOnSocket, MsgReceiver, MsgResult, MsgSender, MsgSocket}; +use msg_socket::{MsgError, MsgOnSocket, MsgResult, MsgSocket, UnixSeqpacketExt}; use resources::{Alloc, GpuMemoryDesc, MmioType, SystemAllocator}; +use sys_util::net::UnixSeqpacket; use sys_util::{error, Error as SysError, EventFd, GuestAddress, MemoryMapping, MmapError, Result}; /// A data structure that either owns or borrows a file descriptor. @@ -189,6 +190,7 @@ impl Display for DiskControlCommand { #[derive(MsgOnSocket, Debug)] pub enum DiskControlResult { + Ready, Ok, Err(SysError), } @@ -563,17 +565,34 @@ fn register_memory( } impl VmRequest { + pub fn socket<'s>( + &self, + balloon_host_socket: &'s BalloonControlRequestSocket, + disk_host_sockets: &'s [DiskControlRequestSocket], + usb_control_socket: &'s UsbControlSocket, + ) -> Option<&'s UnixSeqpacket> { + use VmRequest::*; + match *self { + Exit => None, + Suspend => None, + Resume => None, + BalloonCommand(_) => Some(balloon_host_socket.as_ref()), + DiskCommand { disk_index, .. } => disk_host_sockets.get(disk_index).map(AsRef::as_ref), + UsbCommand(_) => Some(usb_control_socket.as_ref()), + } + } + /// Executes this request on the given Vm and other mutable state. /// /// This does not return a result, instead encapsulating the success or failure in a /// `VmResponse` with the intended purpose of sending the response back over the socket that /// received this `VmRequest`. + /// + /// The `socket` parameter must be the value that was obtained by calling `VmRequest::socket`. pub fn execute( &self, run_mode: &mut Option<VmRunMode>, - balloon_host_socket: &BalloonControlRequestSocket, - disk_host_sockets: &[DiskControlRequestSocket], - usb_control_socket: &UsbControlSocket, + socket: Option<&UnixSeqpacket>, ) -> VmResponse { match *self { VmRequest::Exit => { @@ -589,14 +608,18 @@ impl VmRequest { VmResponse::Ok } VmRequest::BalloonCommand(BalloonControlCommand::Adjust { num_bytes }) => { - match balloon_host_socket.send(&BalloonControlCommand::Adjust { num_bytes }) { + match socket + .unwrap() + .send_msg_on_socket(&BalloonControlCommand::Adjust { num_bytes }) + { Ok(_) => VmResponse::Ok, Err(_) => VmResponse::Err(SysError::last()), } } VmRequest::BalloonCommand(BalloonControlCommand::Stats) => { - match balloon_host_socket.send(&BalloonControlCommand::Stats {}) { - Ok(_) => match balloon_host_socket.recv() { + let socket = socket.unwrap(); + match socket.send_msg_on_socket(&BalloonControlCommand::Stats {}) { + Ok(_) => match socket.recv_msg_on_socket() { Ok(BalloonControlResult::Stats { stats, balloon_actual, @@ -612,19 +635,20 @@ impl VmRequest { Err(_) => VmResponse::Err(SysError::last()), } } - VmRequest::DiskCommand { - disk_index, - ref command, - } => { + VmRequest::DiskCommand { ref command, .. } => { // Forward the request to the block device process via its control socket. - if let Some(sock) = disk_host_sockets.get(disk_index) { - if let Err(e) = sock.send(command) { + if let Some(sock) = socket { + if let Err(e) = sock.send_msg_on_socket(command) { error!("disk socket send failed: {}", e); VmResponse::Err(SysError::new(EINVAL)) } else { - match sock.recv() { + match sock.recv_msg_on_socket() { Ok(DiskControlResult::Ok) => VmResponse::Ok, Ok(DiskControlResult::Err(e)) => VmResponse::Err(e), + Ok(resp) => { + error!("unexpected disk socket result: {:?}", resp); + VmResponse::Err(SysError::new(EINVAL)) + } Err(e) => { error!("disk socket recv failed: {}", e); VmResponse::Err(SysError::new(EINVAL)) @@ -636,12 +660,13 @@ impl VmRequest { } } VmRequest::UsbCommand(ref cmd) => { - let res = usb_control_socket.send(cmd); + let socket = socket.unwrap(); + let res = socket.send_msg_on_socket(cmd); if let Err(e) = res { error!("fail to send command to usb control socket: {}", e); return VmResponse::Err(SysError::new(EIO)); } - match usb_control_socket.recv() { + match socket.recv_msg_on_socket() { Ok(response) => VmResponse::UsbResponse(response), Err(e) => { error!("fail to recv command from usb control socket: {}", e); -- 2.26.2 ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH crosvm 2/2] crosvm: fix deadlock on early VmRequest 2020-06-14 11:43 ` [PATCH crosvm 2/2] crosvm: fix deadlock on early VmRequest Alyssa Ross @ 2020-06-16 1:08 ` Cole Helbling 2020-06-16 9:39 ` Alyssa Ross 0 siblings, 1 reply; 11+ messages in thread From: Cole Helbling @ 2020-06-16 1:08 UTC (permalink / raw) To: Alyssa Ross, devel On Sun Jun 14, 2020 at 11:43 AM, Alyssa Ross wrote: > If a DiskCommand was received on the crosvm socket before the > virtio-block device was activated, the Token::VmRequest case in the > main event loop would forward the request to the block device socket, > and then wait syncronously for a response. That response would never > come because the device hadn't been activated, and it would never be > activated because the event loop would never continue, and therefore > never be able to respond to the event that causes the device to be > activated. crosvm would therefore just hang forever, waiting for a > response that would never come. > > This patch fixes this deadlock by keeping track of whether devices > that send a response in this way have been activated yet. If they > have already been activated, messages are sent and responses are > received as normal. If they have not been activated, messages are > instead put into a per-device queue. Once the device is activated, > queued messages are processed all at once, and then the device is > marked as ready, and the queue is dropped. Future messages are > processed immediately as they come in, with no further queueing. Sounds like quite the "chicken or the egg"-type problem. > A device indicates that it is ready by sending a message on its > socket. The main crosvm event loop can then poll the socket, to be > notified when the device is ready. This poll event will only trigger > once -- once it has been received, it is removed from the poll > context. > > Currently, the only device type that responds to external control > messages AND needs to be activated by an event is the block device. > The balloon device does not respond to messages, and the xhci > controller device is activated up front. The code is nevertheless > structured so that it should be very easy to drop another kind of > device in to the queuing system, should that be required. +1 for modularity. > --- > devices/src/virtio/block.rs | 5 + > src/linux.rs | 201 ++++++++++++++++++++++++++++++------ > vm_control/src/lib.rs | 57 +++++++--- > 3 files changed, 213 insertions(+), 50 deletions(-) > > diff --git a/src/linux.rs b/src/linux.rs > index ec2067cf..ccd7d88c 100644 > --- a/src/linux.rs > +++ b/src/linux.rs > > ----->8----- > > @@ -1667,6 +1668,45 @@ fn file_to_i64<P: AsRef<Path>>(path: P) -> > io::Result<i64> { > .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "empty file")) > } > > +/// Returns a boolean indicating whether the VM should be exited. > +fn do_vm_request( > + request: VmRequest, > + device_socket: Option<&UnixSeqpacket>, > + control_socket: &VmControlResponseSocket, > + run_mode_arc: &Arc<VcpuRunMode>, > + vcpu_handles: &mut Vec<JoinHandle<()>>, > + io_bus: &mut Bus, > +) -> MsgResult<bool> { > + let mut run_mode_opt = None; > + let response = request.execute(&mut run_mode_opt, device_socket); > + control_socket.send(&response)?; > + if let Some(run_mode) = run_mode_opt { > + info!("control socket changed run mode to {}", run_mode); > + match run_mode { > + VmRunMode::Exiting => Ok(true), > + VmRunMode::Running => { > + if let VmRunMode::Suspending = *run_mode_arc.mtx.lock() { > + io_bus.notify_resume(); > + } > + run_mode_arc.set_and_notify(VmRunMode::Running); > + for handle in vcpu_handles { > + let _ = handle.kill(SIGRTMIN() + 0); I know this is essentially just moved (and probably isn't even something you wrote), but do you know why this is `+ 0`? Does this somehow coerce to the desired type or something? Maybe I'm overlooking something obvious here. > ----->8----- Either way, the above is just a nit. Good work on tracking this particular issue down! Reviewed-by: Cole Helbling <cole.e.helbling@outlook.com> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH crosvm 2/2] crosvm: fix deadlock on early VmRequest 2020-06-16 1:08 ` Cole Helbling @ 2020-06-16 9:39 ` Alyssa Ross 0 siblings, 0 replies; 11+ messages in thread From: Alyssa Ross @ 2020-06-16 9:39 UTC (permalink / raw) To: Cole Helbling; +Cc: devel [-- Attachment #1: Type: text/plain, Size: 2236 bytes --] >> @@ -1667,6 +1668,45 @@ fn file_to_i64<P: AsRef<Path>>(path: P) -> >> io::Result<i64> { >> .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "empty file")) >> } >> >> +/// Returns a boolean indicating whether the VM should be exited. >> +fn do_vm_request( >> + request: VmRequest, >> + device_socket: Option<&UnixSeqpacket>, >> + control_socket: &VmControlResponseSocket, >> + run_mode_arc: &Arc<VcpuRunMode>, >> + vcpu_handles: &mut Vec<JoinHandle<()>>, >> + io_bus: &mut Bus, >> +) -> MsgResult<bool> { >> + let mut run_mode_opt = None; >> + let response = request.execute(&mut run_mode_opt, device_socket); >> + control_socket.send(&response)?; >> + if let Some(run_mode) = run_mode_opt { >> + info!("control socket changed run mode to {}", run_mode); >> + match run_mode { >> + VmRunMode::Exiting => Ok(true), >> + VmRunMode::Running => { >> + if let VmRunMode::Suspending = *run_mode_arc.mtx.lock() { >> + io_bus.notify_resume(); >> + } >> + run_mode_arc.set_and_notify(VmRunMode::Running); >> + for handle in vcpu_handles { >> + let _ = handle.kill(SIGRTMIN() + 0); > > I know this is essentially just moved (and probably isn't even something > you wrote), but do you know why this is `+ 0`? Does this somehow coerce > to the desired type or something? Maybe I'm overlooking something > obvious here. I have no idea! Had a look through the history but couldn't see anything obvious. I remvoed a random + 0 I found, and nothing looked like it went wrong. As an aside, it looks like your MUA has stripped repeated whitespace in the quote above. Makes the code rather difficult to read. :P You might want to look into that. > Either way, the above is just a nit. Good work on tracking this > particular issue down! > > Reviewed-by: Cole Helbling <cole.e.helbling@outlook.com> Thanks for the review! Again, much appreciated even though I pushed already, and I'll try to add it retrospectively as a git note. It may interest you to know that this patch actually had a bug in it, which I didn't notice either until I managed to crash crosvm because of it. Here's the fix: https://spectrum-os.org/git/crosvm/commit/?id=ca5bdd2ac3e473e9b082c44c2870f446b96323a2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2020-07-09 13:25 UTC | newest] Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-06-14 11:43 [PATCH crosvm 0/2] Fix deadlock on early VmRequest Alyssa Ross 2020-06-14 11:43 ` [PATCH crosvm 1/2] msg_socket: introduce UnixSeqpacketExt Alyssa Ross 2020-06-16 0:17 ` Cole Helbling 2020-06-16 9:32 ` Alyssa Ross 2020-06-22 22:06 ` Cole Helbling 2020-06-23 2:32 ` Alyssa Ross 2020-06-25 1:54 ` impaqt 2020-07-09 13:24 ` Alyssa Ross 2020-06-14 11:43 ` [PATCH crosvm 2/2] crosvm: fix deadlock on early VmRequest Alyssa Ross 2020-06-16 1:08 ` Cole Helbling 2020-06-16 9:39 ` Alyssa Ross
Code repositories for project(s) associated with this public inbox https://spectrum-os.org/git/crosvm https://spectrum-os.org/git/doc https://spectrum-os.org/git/mktuntap https://spectrum-os.org/git/nixpkgs https://spectrum-os.org/git/spectrum https://spectrum-os.org/git/ucspi-vsock https://spectrum-os.org/git/www This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).