From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.3 (2019-12-06) on atuin.qyliss.net X-Spam-Level: X-Spam-Status: No, score=-1.2 required=3.0 tests=DKIM_INVALID,DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H4,RCVD_IN_MSPIKE_WL,SPF_HELO_PASS autolearn=unavailable autolearn_force=no version=3.4.3 Received: by atuin.qyliss.net (Postfix, from userid 496) id 14DA37DB2; Sun, 14 Jun 2020 11:44:26 +0000 (UTC) Received: from [127.0.0.1] (localhost [IPv6:::1]) by atuin.qyliss.net (Postfix) with ESMTP id E49A37D44; Sun, 14 Jun 2020 11:44:06 +0000 (UTC) Received: by atuin.qyliss.net (Postfix, from userid 496) id D309F7D36; Sun, 14 Jun 2020 11:44:01 +0000 (UTC) Received: from out3-smtp.messagingengine.com (out3-smtp.messagingengine.com [66.111.4.27]) by atuin.qyliss.net (Postfix) with ESMTPS id 0CB137CBD for ; Sun, 14 Jun 2020 11:43:56 +0000 (UTC) Received: from compute7.internal (compute7.nyi.internal [10.202.2.47]) by mailout.nyi.internal (Postfix) with ESMTP id 170325C00F4 for ; Sun, 14 Jun 2020 07:43:55 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute7.internal (MEProxy); Sun, 14 Jun 2020 07:43:55 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=alyssa.is; h= from:to:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; s=fm3; bh=B3yzParMtbw30 XyCu6Q9/VSCCqRNaBb79tNVUZw9UYc=; b=wcIhJXre6bf2GChQgVcIvNiSvosLH UfYXS06lY5zHMCLPucyaoXd74Vt9LUwweyjsWqjhdksc6fhT6v61bKPCmZ+j/633 mvDdH752FdSwOBntJyCh3IaJs+DhlFHqXA+NMksirLE+fR4m/BLH18lHTooEWeTe IOWI7td3vaRY4vw5PDTVnFduEz+tY9dMshLNcz5b0LHR0M2HCHYFM/BICf9smd4M yyk5xxSn/tYuap8j6AjbpblhGvjoNi4Vgo/c9J/TWR7QuUVlXFJQeXWs51CdRJE6 Plmj42EFG1JKBn94E/zjbP0XqeABeyBB1Bsk5jeno4zH+kvzNP7iS9A6w== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=content-transfer-encoding:date:from :in-reply-to:message-id:mime-version:references:subject:to :x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm3; bh=B3yzParMtbw30XyCu6Q9/VSCCqRNaBb79tNVUZw9UYc=; b=q7V1erXH FoheMHp75G+RMxtxecp/xbL1u+FuNGrHCRjrNiZYYh6u8SyX+ZHAx2SXuBEhmfQW T83UcMh1aHcY4r4F791/bHNedkLnxBKpGL9PY65uLNRczbpZxgUFaXMYdm+o1bkd VsL4OYLGw0Irkgw0QtG2+aT0DU4wDXlZG6mKgPi2F34LFAD0HzZn5rT8FenIUCJm 9iknb/gWmaV8qV0+qfLR0rx+RTGGTSV+5cJNUEd6cBUM/vSqZuXrIYnnelh66vE9 t5uCqa6w2rA6hYLpSUOBKTjnVa988V42cv6csl8hYZHgRNMhCJrJhICU67bL4I4a c2SgYmbsxJzsvg== X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduhedrudeiiedggeeiucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpefhvffufffkofgjfhgggfestdekre dtredttdenucfhrhhomheptehlhihsshgrucftohhsshcuoehhihesrghlhihsshgrrdhi sheqnecuggftrfgrthhtvghrnhepieffleeuueeulefhhfefffeghfetleetjedvudfgje ekkeelvdetkeegudeijeelnecuffhomhgrihhnpehithgvrhdrmhgrphdpuhhnfihrrghp rdgrshenucfkphepgeeirdektddruddvkedrudehvdenucevlhhushhtvghrufhiiigvpe dtnecurfgrrhgrmhepmhgrihhlfhhrohhmpehqhihlihhsshesgidvvddtrdhqhihlihhs shdrnhgvth X-ME-Proxy: Received: from x220.qyliss.net (p2e508098.dip0.t-ipconnect.de [46.80.128.152]) by mail.messagingengine.com (Postfix) with ESMTPA id 6B11330614FA for ; Sun, 14 Jun 2020 07:43:54 -0400 (EDT) Received: by x220.qyliss.net (Postfix, from userid 1000) id 2DFD66A3; Sun, 14 Jun 2020 11:43:53 +0000 (UTC) From: Alyssa Ross To: devel@spectrum-os.org Subject: [PATCH crosvm 2/2] crosvm: fix deadlock on early VmRequest Date: Sun, 14 Jun 2020 11:43:44 +0000 Message-Id: <20200614114344.22642-3-hi@alyssa.is> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200614114344.22642-1-hi@alyssa.is> References: <20200614114344.22642-1-hi@alyssa.is> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Message-ID-Hash: 5JBY6OSL26FETG4SBCJAY3G2HRHLBW4D X-Message-ID-Hash: 5JBY6OSL26FETG4SBCJAY3G2HRHLBW4D X-MailFrom: qyliss@x220.qyliss.net X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; header-match-config-1; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header X-Mailman-Version: 3.3.1 Precedence: list List-Id: Patches and low-level development discussion Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: If a DiskCommand was received on the crosvm socket before the virtio-block device was activated, the Token::VmRequest case in the main event loop would forward the request to the block device socket, and then wait syncronously for a response. That response would never come because the device hadn't been activated, and it would never be activated because the event loop would never continue, and therefore never be able to respond to the event that causes the device to be activated. crosvm would therefore just hang forever, waiting for a response that would never come. This patch fixes this deadlock by keeping track of whether devices that send a response in this way have been activated yet. If they have already been activated, messages are sent and responses are received as normal. If they have not been activated, messages are instead put into a per-device queue. Once the device is activated, queued messages are processed all at once, and then the device is marked as ready, and the queue is dropped. Future messages are processed immediately as they come in, with no further queueing. A device indicates that it is ready by sending a message on its socket. The main crosvm event loop can then poll the socket, to be notified when the device is ready. This poll event will only trigger once -- once it has been received, it is removed from the poll context. Currently, the only device type that responds to external control messages AND needs to be activated by an event is the block device. The balloon device does not respond to messages, and the xhci controller device is activated up front. The code is nevertheless structured so that it should be very easy to drop another kind of device in to the queuing system, should that be required. --- devices/src/virtio/block.rs | 5 + src/linux.rs | 201 ++++++++++++++++++++++++++++++------ vm_control/src/lib.rs | 57 +++++++--- 3 files changed, 213 insertions(+), 50 deletions(-) diff --git a/devices/src/virtio/block.rs b/devices/src/virtio/block.rs index 80d51030..5a8972a7 100644 --- a/devices/src/virtio/block.rs +++ b/devices/src/virtio/block.rs @@ -390,6 +390,11 @@ impl Worker { } }; =20 + if let Err(e) =3D self.control_socket.send(&DiskControlResult::R= eady) { + error!("control socket failed to notify readiness: {}", e); + return; + } + 'poll: loop { let events =3D match poll_ctx.wait() { Ok(v) =3D> v, diff --git a/src/linux.rs b/src/linux.rs index ec2067cf..ccd7d88c 100644 --- a/src/linux.rs +++ b/src/linux.rs @@ -3,6 +3,7 @@ // found in the LICENSE file. =20 use std::cmp::max; +use std::collections::{BTreeMap, VecDeque}; use std::convert::TryFrom; use std::error::Error as StdError; use std::ffi::CStr; @@ -32,12 +33,12 @@ use acpi_tables::sdt::SDT; use devices::virtio::EventDevice; use devices::virtio::{self, Console, VirtioDevice}; use devices::{ - self, Ac97Backend, Ac97Dev, HostBackendDeviceProvider, PciDevice, Vf= ioContainer, VfioDevice, - VfioPciDevice, VirtioPciDevice, XhciController, + self, Ac97Backend, Ac97Dev, Bus, HostBackendDeviceProvider, PciDevic= e, VfioContainer, + VfioDevice, VfioPciDevice, VirtioPciDevice, XhciController, }; use io_jail::{self, Minijail}; use kvm::*; -use msg_socket::{MsgError, MsgReceiver, MsgSender, MsgSocket}; +use msg_socket::{MsgError, MsgReceiver, MsgResult, MsgSender, MsgSocket}= ; use net_util::{Error as NetError, MacAddress, Tap}; use remain::sorted; use resources::{Alloc, MmioType, SystemAllocator}; @@ -45,7 +46,7 @@ use sync::{Condvar, Mutex}; use sys_util::net::{UnixSeqpacket, UnixSeqpacketListener, UnlinkUnixSeqp= acketListener}; =20 use sys_util::{ - self, block_signal, clear_signal, drop_capabilities, error, flock, g= et_blocked_signals, + self, block_signal, clear_signal, debug, drop_capabilities, error, f= lock, get_blocked_signals, get_group_id, get_user_id, getegid, geteuid, info, register_rt_signa= l_handler, set_cpu_affinity, validate_raw_fd, warn, EventFd, FlockOperation, Gu= estAddress, GuestMemory, Killable, MemoryMappingArena, PollContext, PollToken, Protection, Sc= opedEvent, SignalFd, @@ -57,7 +58,7 @@ use vm_control::{ DiskControlResult, UsbControlSocket, VmControlResponseSocket, VmIrqR= equest, VmIrqResponse, VmIrqResponseSocket, VmMemoryControlRequestSocket, VmMemoryControlRe= sponseSocket, VmMemoryRequest, VmMemoryResponse, VmMsyncRequest, VmMsyncRequestSoc= ket, VmMsyncResponse, - VmMsyncResponseSocket, VmRunMode, + VmMsyncResponseSocket, VmRequest, VmRunMode, }; =20 use crate::{Config, DiskOption, Executable, SharedDir, SharedDirKind, To= uchDeviceOption}; @@ -1667,6 +1668,45 @@ fn file_to_i64>(path: P) -> io::Res= ult { .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "empty= file")) } =20 +/// Returns a boolean indicating whether the VM should be exited. +fn do_vm_request( + request: VmRequest, + device_socket: Option<&UnixSeqpacket>, + control_socket: &VmControlResponseSocket, + run_mode_arc: &Arc, + vcpu_handles: &mut Vec>, + io_bus: &mut Bus, +) -> MsgResult { + let mut run_mode_opt =3D None; + let response =3D request.execute(&mut run_mode_opt, device_socket); + control_socket.send(&response)?; + if let Some(run_mode) =3D run_mode_opt { + info!("control socket changed run mode to {}", run_mode); + match run_mode { + VmRunMode::Exiting =3D> Ok(true), + VmRunMode::Running =3D> { + if let VmRunMode::Suspending =3D *run_mode_arc.mtx.lock(= ) { + io_bus.notify_resume(); + } + run_mode_arc.set_and_notify(VmRunMode::Running); + for handle in vcpu_handles { + let _ =3D handle.kill(SIGRTMIN() + 0); + } + Ok(false) + } + other =3D> { + run_mode_arc.set_and_notify(other); + for handle in vcpu_handles { + let _ =3D handle.kill(SIGRTMIN() + 0); + } + Ok(false) + } + } + } else { + Ok(false) + } +} + pub fn run_config(cfg: Config) -> Result<()> { if cfg.sandbox { // Printing something to the syslog before entering minijail so = that libc's syslogger has a @@ -1818,7 +1858,7 @@ fn run_control( ) -> Result<()> { const LOWMEM_AVAILABLE: &str =3D "/sys/kernel/mm/chromeos-low_mem/av= ailable"; =20 - #[derive(PollToken)] + #[derive(Debug, PollToken)] enum Token { Exit, Suspend, @@ -1828,6 +1868,7 @@ fn run_control( BalloonResult, VmControlServer, VmControl { index: usize }, + DeviceReady { sock_fd: RawFd }, } =20 stdin() @@ -1912,6 +1953,38 @@ fn run_control( } vcpu_thread_barrier.wait(); =20 + struct QueuedDeviceReq { + request: VmRequest, + control_sock_index: usize, + } + + /// A device can either be waiting, or ready. + /// + /// If it's waiting, we queue up events to send to it once it's read= y. We can't just send the + /// requests straight away and let them sit on the socket until the = device comes up, because we + /// want to syncronously wait for a response after sending. If the = device hasn't been activated + /// when we do this, we'd sit waiting for a response forever and nev= er activate the device. + enum DeviceStatus<'a> { + Waiting(&'a UnixSeqpacket, VecDeque), + Ready, + } + + let mut queued_device_reqs =3D BTreeMap::::new(= ); + + // Each socket will send a ready message when it's ready to receive = requests. Add the sockets + // to the event loop, so that when they become ready, we can process= their queues. After the + // initial queue is processed, the device becomes ready, and the soc= ket is removed from the + // event loop. + for socket in disk_host_sockets.iter().map(AsRef::as_ref) { + let token =3D Token::DeviceReady { + sock_fd: socket.as_raw_fd(), + }; + poll_ctx.add(socket, token).map_err(Error::PollContextAdd)?; + + let status =3D DeviceStatus::Waiting(socket, VecDeque::new()); + queued_device_reqs.insert(socket.as_raw_fd(), status); + } + let mut ioapic_delayed =3D Vec::::default(); 'poll: loop { let events =3D { @@ -2110,40 +2183,44 @@ fn run_control( match socket { TaggedControlSocket::Vm(socket) =3D> match s= ocket.recv() { Ok(request) =3D> { - let mut run_mode_opt =3D None; - let response =3D request.execute( - &mut run_mode_opt, + let device_socket =3D request.socket= ( &balloon_host_socket, - disk_host_sockets, + &disk_host_sockets, &usb_control_socket, ); - if let Err(e) =3D socket.send(&respo= nse) { - error!("failed to send VmRespons= e: {}", e); - } - if let Some(run_mode) =3D run_mode_o= pt { - info!("control socket changed ru= n mode to {}", run_mode); - match run_mode { - VmRunMode::Exiting =3D> { - break 'poll; - } - VmRunMode::Running =3D> { - if let VmRunMode::Suspen= ding =3D - *run_mode_arc.mtx.lo= ck() - { - linux.io_bus.notify_= resume(); - } - run_mode_arc.set_and_not= ify(VmRunMode::Running); - for handle in &vcpu_hand= les { - let _ =3D handle.kil= l(SIGRTMIN() + 0); - } - } - other =3D> { - run_mode_arc.set_and_not= ify(other); - for handle in &vcpu_hand= les { - let _ =3D handle.kil= l(SIGRTMIN() + 0); + + match device_socket.map(|s| { + queued_device_reqs.get_mut(&s.as= _raw_fd()).unwrap() + }) { + None | Some(DeviceStatus::Ready)= =3D> { + match do_vm_request( + request, + device_socket, + socket, + &run_mode_arc, + &mut vcpu_handles, + &mut linux.io_bus, + ) { + Ok(true) =3D> break 'pol= l, + Ok(false) =3D> {} + Err(e) =3D> { + error!("failed to ha= ndle VmRequest: {}", e); + continue 'poll; } } } + + Some(DeviceStatus::Waiting(_, qu= eue)) =3D> { + debug!( + "Device {} not yet ready= . Queueing request.", + device_socket.unwrap().a= s_raw_fd() + ); + + queue.push_back(QueuedDevice= Req { + request, + control_sock_index: inde= x, + }); + } } } Err(e) =3D> { @@ -2204,6 +2281,61 @@ fn run_control( } } } + Token::DeviceReady { sock_fd } =3D> { + if let Some(DeviceStatus::Waiting(device_sock, mut q= ueue)) =3D + queued_device_reqs.remove(&sock_fd) + { + debug!( + "Device {} is ready. Processing its queue of= {} item(s).", + sock_fd, + queue.len() + ); + + // Deal with the message sent to indicate readin= ess. Its contents don't matter. + if let Err(e) =3D device_sock.recv(&mut []) { + error!( + "Failed to recv ready notification from = device socket: {}", + e + ); + continue 'poll; + } + + while let Some(QueuedDeviceReq { + request, + control_sock_index, + }) =3D queue.pop_front() + { + let control_sock =3D match control_sockets.g= et(control_sock_index) { + Some(TaggedControlSocket::Vm(s)) =3D> s, + _ =3D> unreachable!(), + }; + + match do_vm_request( + request, + Some(device_sock), + control_sock, + &run_mode_arc, + &mut vcpu_handles, + &mut linux.io_bus, + ) { + Ok(true) =3D> break 'poll, + Ok(false) =3D> {} + Err(e) =3D> { + error!("failed to handle VmRequest: = {}", e); + continue 'poll; + } + } + } + + if let Err(e) =3D poll_ctx.delete(device_sock) { + error!("failed to delete poll token: {}", e)= ; + } + } else { + error!("Received ready notification for a device= that isn't waiting"); + } + + queued_device_reqs.insert(sock_fd, DeviceStatus::Rea= dy); + } } } =20 @@ -2229,6 +2361,7 @@ fn run_control( _ =3D> {} } } + Token::DeviceReady { .. } =3D> {} } } =20 diff --git a/vm_control/src/lib.rs b/vm_control/src/lib.rs index c63fbfe6..d4922ab9 100644 --- a/vm_control/src/lib.rs +++ b/vm_control/src/lib.rs @@ -19,8 +19,9 @@ use std::os::unix::io::{AsRawFd, FromRawFd, RawFd}; use libc::{EINVAL, EIO, ENODEV}; =20 use kvm::{IrqRoute, IrqSource, Vm}; -use msg_socket::{MsgError, MsgOnSocket, MsgReceiver, MsgResult, MsgSende= r, MsgSocket}; +use msg_socket::{MsgError, MsgOnSocket, MsgResult, MsgSocket, UnixSeqpac= ketExt}; use resources::{Alloc, GpuMemoryDesc, MmioType, SystemAllocator}; +use sys_util::net::UnixSeqpacket; use sys_util::{error, Error as SysError, EventFd, GuestAddress, MemoryMa= pping, MmapError, Result}; =20 /// A data structure that either owns or borrows a file descriptor. @@ -189,6 +190,7 @@ impl Display for DiskControlCommand { =20 #[derive(MsgOnSocket, Debug)] pub enum DiskControlResult { + Ready, Ok, Err(SysError), } @@ -563,17 +565,34 @@ fn register_memory( } =20 impl VmRequest { + pub fn socket<'s>( + &self, + balloon_host_socket: &'s BalloonControlRequestSocket, + disk_host_sockets: &'s [DiskControlRequestSocket], + usb_control_socket: &'s UsbControlSocket, + ) -> Option<&'s UnixSeqpacket> { + use VmRequest::*; + match *self { + Exit =3D> None, + Suspend =3D> None, + Resume =3D> None, + BalloonCommand(_) =3D> Some(balloon_host_socket.as_ref()), + DiskCommand { disk_index, .. } =3D> disk_host_sockets.get(di= sk_index).map(AsRef::as_ref), + UsbCommand(_) =3D> Some(usb_control_socket.as_ref()), + } + } + /// Executes this request on the given Vm and other mutable state. /// /// This does not return a result, instead encapsulating the success= or failure in a /// `VmResponse` with the intended purpose of sending the response b= ack over the socket that /// received this `VmRequest`. + /// + /// The `socket` parameter must be the value that was obtained by ca= lling `VmRequest::socket`. pub fn execute( &self, run_mode: &mut Option, - balloon_host_socket: &BalloonControlRequestSocket, - disk_host_sockets: &[DiskControlRequestSocket], - usb_control_socket: &UsbControlSocket, + socket: Option<&UnixSeqpacket>, ) -> VmResponse { match *self { VmRequest::Exit =3D> { @@ -589,14 +608,18 @@ impl VmRequest { VmResponse::Ok } VmRequest::BalloonCommand(BalloonControlCommand::Adjust { nu= m_bytes }) =3D> { - match balloon_host_socket.send(&BalloonControlCommand::A= djust { num_bytes }) { + match socket + .unwrap() + .send_msg_on_socket(&BalloonControlCommand::Adjust {= num_bytes }) + { Ok(_) =3D> VmResponse::Ok, Err(_) =3D> VmResponse::Err(SysError::last()), } } VmRequest::BalloonCommand(BalloonControlCommand::Stats) =3D>= { - match balloon_host_socket.send(&BalloonControlCommand::S= tats {}) { - Ok(_) =3D> match balloon_host_socket.recv() { + let socket =3D socket.unwrap(); + match socket.send_msg_on_socket(&BalloonControlCommand::= Stats {}) { + Ok(_) =3D> match socket.recv_msg_on_socket() { Ok(BalloonControlResult::Stats { stats, balloon_actual, @@ -612,19 +635,20 @@ impl VmRequest { Err(_) =3D> VmResponse::Err(SysError::last()), } } - VmRequest::DiskCommand { - disk_index, - ref command, - } =3D> { + VmRequest::DiskCommand { ref command, .. } =3D> { // Forward the request to the block device process via i= ts control socket. - if let Some(sock) =3D disk_host_sockets.get(disk_index) = { - if let Err(e) =3D sock.send(command) { + if let Some(sock) =3D socket { + if let Err(e) =3D sock.send_msg_on_socket(command) { error!("disk socket send failed: {}", e); VmResponse::Err(SysError::new(EINVAL)) } else { - match sock.recv() { + match sock.recv_msg_on_socket() { Ok(DiskControlResult::Ok) =3D> VmResponse::O= k, Ok(DiskControlResult::Err(e)) =3D> VmRespons= e::Err(e), + Ok(resp) =3D> { + error!("unexpected disk socket result: {= :?}", resp); + VmResponse::Err(SysError::new(EINVAL)) + } Err(e) =3D> { error!("disk socket recv failed: {}", e)= ; VmResponse::Err(SysError::new(EINVAL)) @@ -636,12 +660,13 @@ impl VmRequest { } } VmRequest::UsbCommand(ref cmd) =3D> { - let res =3D usb_control_socket.send(cmd); + let socket =3D socket.unwrap(); + let res =3D socket.send_msg_on_socket(cmd); if let Err(e) =3D res { error!("fail to send command to usb control socket: = {}", e); return VmResponse::Err(SysError::new(EIO)); } - match usb_control_socket.recv() { + match socket.recv_msg_on_socket() { Ok(response) =3D> VmResponse::UsbResponse(response), Err(e) =3D> { error!("fail to recv command from usb control so= cket: {}", e); --=20 2.26.2