From 8958407dcb3ba19cddf65579932ccad3d99bb9d5 Mon Sep 17 00:00:00 2001 From: Zide Chen Date: Thu, 14 Nov 2019 10:33:51 -0800 Subject: main: remove EPOLLHUP epoll item from host kernel synchronously control_sockets.swap_remove() could cause host kernel to invoke ep_remove() to remove the epoll item. But it's called from the task work, and it could be deferred after next poll_ctx.wait() which could unexpectedly pick up epoll events from the already closed fd. BUG=chromium:1019986 TEST=launch Crosvm guest from heavy loaded Linux host Change-Id: I474a7a47a484e3acfae4383d61601e1553bd674f Signed-off-by: Zide Chen Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/1917495 Reviewed-by: Zach Reizner Reviewed-by: Daniel Verkamp Tested-by: kokoro --- src/linux.rs | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) (limited to 'src/linux.rs') diff --git a/src/linux.rs b/src/linux.rs index 24ef5ec..1177377 100644 --- a/src/linux.rs +++ b/src/linux.rs @@ -1845,10 +1845,26 @@ fn run_control( } // Sort in reverse so the highest indexes are removed first. This removal algorithm - // preserved correct indexes as each element is removed. + // preserves correct indexes as each element is removed. vm_control_indices_to_remove.sort_unstable_by(|a, b| b.cmp(a)); vm_control_indices_to_remove.dedup(); for index in vm_control_indices_to_remove { + // Delete the socket from the `poll_ctx` synchronously. Otherwise, the kernel will do + // this automatically when the FD inserted into the `poll_ctx` is closed after this + // if-block, but this removal can be deferred unpredictably. In some instances where the + // system is under heavy load, we can even get events returned by `poll_ctx` for an FD + // that has already been closed. Because the token associated with that spurious event + // now belongs to a different socket, the control loop will start to interact with + // sockets that might not be ready to use. This can cause incorrect hangup detection or + // blocking on a socket that will never be ready. See also: crbug.com/1019986 + if let Some(socket) = control_sockets.get(index) { + poll_ctx.delete(socket).map_err(Error::PollContextDelete)?; + } + + // This line implicitly drops the socket at `index` when it gets returned by + // `swap_remove`. After this line, the socket at `index` is not the one from + // `vm_control_indices_to_remove`. Because of this socket's change in index, we need to + // use `poll_ctx.modify` to change the associated index in its `Token::VmControl`. control_sockets.swap_remove(index); if let Some(socket) = control_sockets.get(index) { poll_ctx -- cgit 1.4.1