From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.3 (2019-12-06) on atuin.qyliss.net X-Spam-Level: X-Spam-Status: No, score=0.4 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FORGED_SPF_HELO,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,MSGID_FROM_MTA_HEADER, RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_PASS autolearn=no autolearn_force=no version=3.4.3 Received: by atuin.qyliss.net (Postfix, from userid 496) id 732C2993E; Tue, 16 Jun 2020 01:29:19 +0000 (UTC) Received: from [127.0.0.1] (localhost [IPv6:::1]) by atuin.qyliss.net (Postfix) with ESMTP id 455C59856; Tue, 16 Jun 2020 01:29:14 +0000 (UTC) Received: by atuin.qyliss.net (Postfix, from userid 496) id 220239853; Tue, 16 Jun 2020 01:29:13 +0000 (UTC) Received: from NAM02-SN1-obe.outbound.protection.outlook.com (mail-oln040092005042.outbound.protection.outlook.com [40.92.5.42]) by atuin.qyliss.net (Postfix) with ESMTPS id 2C1559915 for ; Tue, 16 Jun 2020 01:29:09 +0000 (UTC) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=IsqB8mT44/QmRFMfVhVSDmTKDcLf8qjPepZZ4KrOi9gXii8FdWRXD4gMI3m7WW1gLe+Dzw79IhQJaiZ/NPuRk4GAdJboNGhK4b4e2g1drsCZuXSAiesC5HOWpem/sfXGps7HGltYsaLRpYmgmRBTQkpc/tYPKR6VUcQui4IlRWiJdj0fQ+tgjscxe9U4FKg2IAxEIMdhXqnyfeRYo5k0jQlfAeiKus5JMwv2ntiYcPhM3fVrld7rZhz33slGhr3RlJbq9uEOWLjk235mHc6+DNsIpxpn18fdNpVQotZ/UUh2SDJJuomjZ3EjOTMT0qupL4eQusQY3XNsK7WViGkYNQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=+CNsAQjnni3KIvRx6PUwDmiXnjn3hu/Z4u2gpjS+qIA=; b=VWnrhyD5fkbfr4n6BwhQlEXBEwIU3DTZ5Onl14ViGgtRGf7Cs2W/hm9dd4Uwbxg6GUem/kxcrRqMYPJhv87TT2N7TKLPkayMc9SoO3gGIDRfsFPHIZ0cIl8GNgWm3Teu0o+8rq89Tf1Apoi2ljuC3ODVNzb2RK/pSj9Fchq3VPgljJjW5Qn0hLHYznhdQqwLS6AMk5cAI0FJuQ+fw4tc0pFrf2T1uk63heKNzkQGoTp4msUCHvAxQlQ3z8Yw32GBO5EWgGKoaA8+Sx9qtHKJYgfN2kql2MbPkGpVKPstjCcLI9CXuBbkfW6G8AnqQbmEyJ1u/QjNoUWsKcIs5hLihQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=none; dmarc=none; dkim=none; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=outlook.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=+CNsAQjnni3KIvRx6PUwDmiXnjn3hu/Z4u2gpjS+qIA=; b=brhZponKU2ja0GuNmHfhJGt8acQf4heD70icDDqwgtXslLNf7p+58eRwmTELyCCeY3WvbHAx9HHDzLGG1HdSq2+NB/9glu1wMk4CA5xnWdWoiYUXUiSvJCpakK0GfSPhQ2Ox6D9cBdIDKBPNbrETlPuLTTPXiVU9bYzc4nKvfVvziodXXO3kc1JqgAshjhkf2UDRKwIl+SiHYfVZLIBb0GeCzx6eYSCUnCTRAijPRb1A6HFy1cGERZa2GQjWc8xBRD80U7DYHxN3buC9CA6IDylaHd2W/OK38uQoHjP/Wnyn7GjfnIfzcEOmh4RCcA6cfXHXmvVM+djZflBOaWEgUw== Received: from BL2NAM02FT008.eop-nam02.prod.protection.outlook.com (10.152.76.56) by BL2NAM02HT021.eop-nam02.prod.protection.outlook.com (10.152.76.114) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3088.18; Tue, 16 Jun 2020 01:29:06 +0000 Received: from CH2PR14MB3579.namprd14.prod.outlook.com (2a01:111:e400:7e46::50) by BL2NAM02FT008.mail.protection.outlook.com (2a01:111:e400:7e46::162) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3088.18 via Frontend Transport; Tue, 16 Jun 2020 01:29:06 +0000 X-IncomingTopHeaderMarker: OriginalChecksum:EC3AD1F78D60DE1CDD87856F92AB3084FEB7CF28E6DE47A6EF82BC3F28710806;UpperCasedChecksum:0EE14AE77CAE33ACA72BD897F30782F603ACB388EE8881ACC0E307B0B4C054A0;SizeAsReceived:8521;Count:45 Received: from CH2PR14MB3579.namprd14.prod.outlook.com ([fe80::2948:142c:3047:102c]) by CH2PR14MB3579.namprd14.prod.outlook.com ([fe80::2948:142c:3047:102c%6]) with mapi id 15.20.3088.028; Tue, 16 Jun 2020 01:29:05 +0000 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=UTF-8 In-Reply-To: <20200614114344.22642-3-hi@alyssa.is> Date: Mon, 15 Jun 2020 18:08:48 -0700 Subject: Re: [PATCH crosvm 2/2] crosvm: fix deadlock on early VmRequest From: "Cole Helbling" To: "Alyssa Ross" , Message-ID: X-ClientProxiedBy: BYAPR02CA0031.namprd02.prod.outlook.com (2603:10b6:a02:ee::44) To CH2PR14MB3579.namprd14.prod.outlook.com (2603:10b6:610:62::18) X-Microsoft-Original-Message-ID: MIME-Version: 1.0 X-MS-Exchange-MessageSentRepresentingType: 1 Received: from localhost (67.187.170.40) by BYAPR02CA0031.namprd02.prod.outlook.com (2603:10b6:a02:ee::44) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3088.21 via Frontend Transport; Tue, 16 Jun 2020 01:29:05 +0000 X-Microsoft-Original-Message-ID: X-TMN: [UKXE/9XmH76gwGH4RCiplK4TIpbMs/Bw] X-MS-PublicTrafficType: Email X-IncomingHeaderCount: 45 X-EOPAttributedMessage: 0 X-MS-Office365-Filtering-Correlation-Id: 261b73f3-2ce1-4627-4669-08d81194a8b2 X-MS-TrafficTypeDiagnostic: BL2NAM02HT021: X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: Jy8IvzKGWH2Yxuy0OjdbS5jDa+23vGb8vjRhBQhtBPcArfCgMwLdEIJAmmHG5JEhgtve6onnK6kjpUGJJiBnLERiMc+HYXUOEO7Ha0AnC/p3DKLywXqyWbcWOUF0/Ggxl19dXCPJ1l8+H+JsHw9+m1A+FzNXKSRC/qSPIMa+6tcpLML1yzIhR5IF5qWIgVEZFDVQDq4EUrFlF4nEtDrIWw== X-Forefront-Antispam-Report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:0;SRV:;IPV:NLI;SFV:NSPM;H:CH2PR14MB3579.namprd14.prod.outlook.com;PTR:;CAT:NONE;SFTY:;SFS:;DIR:OUT;SFP:1901; X-MS-Exchange-AntiSpam-MessageData: puXZX2x+IB18UpO2fMoGA6BNFoVDg1PjMGRIxcv7xUzQiDL6di6FKqJHISU4TMAqphLQKOqtdsHYCG8p5yulcko3qpb0sb/+zfnDzkl14koS6TCzqAy27sLzbUF5GhZjpHXJORSewsXMbxvKa+7oLw== X-OriginatorOrg: outlook.com X-MS-Exchange-CrossTenant-Network-Message-Id: 261b73f3-2ce1-4627-4669-08d81194a8b2 X-MS-Exchange-CrossTenant-OriginalArrivalTime: 16 Jun 2020 01:29:05.7994 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 84df9e7f-e9f6-40af-b435-aaaaaaaaaaaa X-MS-Exchange-CrossTenant-FromEntityHeader: Internet X-MS-Exchange-CrossTenant-RMS-PersistedConsumerOrg: 00000000-0000-0000-0000-000000000000 X-MS-Exchange-Transport-CrossTenantHeadersStamped: BL2NAM02HT021 Message-ID-Hash: 43RGWSMALCFFUDYOFOVJBVP7SJI3HOKX X-Message-ID-Hash: 43RGWSMALCFFUDYOFOVJBVP7SJI3HOKX X-MailFrom: cole.e.helbling@outlook.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; header-match-config-1; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header X-Mailman-Version: 3.3.1 Precedence: list List-Id: Patches and low-level development discussion Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: On Sun Jun 14, 2020 at 11:43 AM, Alyssa Ross wrote: > If a DiskCommand was received on the crosvm socket before the > virtio-block device was activated, the Token::VmRequest case in the > main event loop would forward the request to the block device socket, > and then wait syncronously for a response. That response would never > come because the device hadn't been activated, and it would never be > activated because the event loop would never continue, and therefore > never be able to respond to the event that causes the device to be > activated. crosvm would therefore just hang forever, waiting for a > response that would never come. > > This patch fixes this deadlock by keeping track of whether devices > that send a response in this way have been activated yet. If they > have already been activated, messages are sent and responses are > received as normal. If they have not been activated, messages are > instead put into a per-device queue. Once the device is activated, > queued messages are processed all at once, and then the device is > marked as ready, and the queue is dropped. Future messages are > processed immediately as they come in, with no further queueing. Sounds like quite the "chicken or the egg"-type problem. > A device indicates that it is ready by sending a message on its > socket. The main crosvm event loop can then poll the socket, to be > notified when the device is ready. This poll event will only trigger > once -- once it has been received, it is removed from the poll > context. > > Currently, the only device type that responds to external control > messages AND needs to be activated by an event is the block device. > The balloon device does not respond to messages, and the xhci > controller device is activated up front. The code is nevertheless > structured so that it should be very easy to drop another kind of > device in to the queuing system, should that be required. +1 for modularity. > --- > devices/src/virtio/block.rs | 5 + > src/linux.rs | 201 ++++++++++++++++++++++++++++++------ > vm_control/src/lib.rs | 57 +++++++--- > 3 files changed, 213 insertions(+), 50 deletions(-) > > diff --git a/src/linux.rs b/src/linux.rs > index ec2067cf..ccd7d88c 100644 > --- a/src/linux.rs > +++ b/src/linux.rs > > ----->8----- > > @@ -1667,6 +1668,45 @@ fn file_to_i64>(path: P) -> > io::Result { > .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "empty file")) > } > =20 > +/// Returns a boolean indicating whether the VM should be exited. > +fn do_vm_request( > + request: VmRequest, > + device_socket: Option<&UnixSeqpacket>, > + control_socket: &VmControlResponseSocket, > + run_mode_arc: &Arc, > + vcpu_handles: &mut Vec>, > + io_bus: &mut Bus, > +) -> MsgResult { > + let mut run_mode_opt =3D None; > + let response =3D request.execute(&mut run_mode_opt, device_socket); > + control_socket.send(&response)?; > + if let Some(run_mode) =3D run_mode_opt { > + info!("control socket changed run mode to {}", run_mode); > + match run_mode { > + VmRunMode::Exiting =3D> Ok(true), > + VmRunMode::Running =3D> { > + if let VmRunMode::Suspending =3D *run_mode_arc.mtx.lock() { > + io_bus.notify_resume(); > + } > + run_mode_arc.set_and_notify(VmRunMode::Running); > + for handle in vcpu_handles { > + let _ =3D handle.kill(SIGRTMIN() + 0); I know this is essentially just moved (and probably isn't even something you wrote), but do you know why this is `+ 0`? Does this somehow coerce to the desired type or something? Maybe I'm overlooking something obvious here. > ----->8----- Either way, the above is just a nit. Good work on tracking this particular issue down! Reviewed-by: Cole Helbling