Re: Proxying Wayland for untrusted clients

general high-level discussion about spectrum

From: Jean-Philippe Ouellet <jpo@vt.edu>
To: Alyssa Ross <hi@alyssa.is>
Cc: discuss@spectrum-os.org, Aaron Janse <aaron@ajanse.me>,
	Puck Meerburg <puck@puckipedia.com>,
	Thomas Leonard <talex5@gmail.com>
Subject: Re: Proxying Wayland for untrusted clients
Date: Sat, 22 May 2021 13:13:38 -0400
Message-ID: <CABQWM_BsEesd2pdEPEJpLicrzozN3-s1Wm1d+FKOmX222r=GwQ@mail.gmail.com>
In-Reply-To: <8735ueudel.fsf@alyssa.is>

Glad to see this taking form!

First of all, in case the points of technical feedback further down
might be interpreted to suggest otherwise: I *am* in favor of
exploring this idea. I do believe it is worthwhile to implement; it
is just perhaps neither the ultimate nor the sole solution one might
want, at least in the form currently conceived.

On Sat, May 22, 2021 at 9:06 AM Alyssa Ross <hi@alyssa.is> wrote:
> [...] I propose a proxy program that sits between
> Wayland clients and the compositor, in the same privilege domain as the
> compositor.  The proxy would decode and re-encode every Wayland request
> (client->compositor message), and would discard any request it didn't
> understand.
>
> This would mitigate the problem of a large, privileged
> program written in a memory-unsafe language being exposed to untrusted
> inputs.

As attractive as this approach sounds at first, I do not believe it
mitigates the problem in a manner which I would be comfortable placing
a large degree of trust in.
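To make the quoted proposal concrete before picking at it, here is a
minimal Python sketch of the decode/allowlist/re-encode loop. It
assumes a little-endian host (the Wayland wire format uses host byte
order) and a hypothetical `interfaces` map from object ID to interface
name, which a real proxy would have to maintain itself. Each Wayland
message starts with a 32-bit object ID, then a 32-bit word whose upper
16 bits are the total message size in bytes and whose lower 16 bits
are the opcode. Arguments are passed through as opaque bytes here,
which a real proxy decoding per-interface signatures would not do.

```python
import struct
from typing import Dict, Iterator, Set, Tuple

# Assumption: little-endian host; Wayland's wire format uses host byte order.
HEADER = struct.Struct("<II")


def split_messages(buf: bytes) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (object_id, opcode, payload) for each message in a complete buffer."""
    off = 0
    while off + HEADER.size <= len(buf):
        obj_id, word = HEADER.unpack_from(buf, off)
        size, opcode = word >> 16, word & 0xFFFF  # size includes the 8-byte header
        # Strict framing (a choice of this sketch): reject short, unaligned,
        # or buffer-overrunning sizes.
        if size < HEADER.size or size % 4 or off + size > len(buf):
            raise ValueError(f"malformed message at offset {off}")
        yield obj_id, opcode, buf[off + HEADER.size : off + size]
        off += size
    if off != len(buf):
        # A real proxy would buffer partial reads; this sketch treats them as errors.
        raise ValueError("trailing partial message")


def encode(obj_id: int, opcode: int, payload: bytes) -> bytes:
    """Re-encode one message with a freshly computed header."""
    size = HEADER.size + len(payload)
    return HEADER.pack(obj_id, (size << 16) | opcode) + payload


def filter_requests(
    buf: bytes,
    interfaces: Dict[int, str],  # hypothetical object-ID -> interface map
    allowed: Set[Tuple[str, int]],  # (interface, opcode) pairs the proxy understands
) -> bytes:
    out = []
    for obj_id, opcode, payload in split_messages(buf):
        if (interfaces.get(obj_id), opcode) in allowed:
            out.append(encode(obj_id, opcode, payload))
        # Everything else is discarded: "discard any request it didn't understand".
    return b"".join(out)


# Usage: only wl_surface.commit (opcode 6) on object 3 is let through.
commit = encode(3, 6, b"")
attach = encode(3, 1, b"\x00" * 12)
print(filter_requests(commit + attach, {3: "wl_surface"}, {("wl_surface", 6)}) == commit)  # True
```

Note that everything below applies even to a filter this simple: the
`interfaces` table is already shadow state.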

I see two fundamental weaknesses:

1. Opening Pandora's box of distributed systems.

This proposal turns the overall system of policy enforcement from what
is currently a nice single centralized component (the compositor) with
global visibility, total control, and the usual non-distributed
guarantees on message ordering, delivery, etc., into a decentralized,
distributed system without those guarantees, subject to all manner of
race conditions. This makes it much harder to guarantee that the
intended policies are actually enforced as expected.
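As a deliberately tiny, hypothetical illustration (not modeled on any
real Wayland or Qubes code), here is a policy check, "only the focused
client may read the clipboard", moved out of the compositor into a
per-client proxy that learns about focus changes asynchronously. In
the window between the compositor changing focus and the proxy
processing the notification, the proxy makes decisions the compositor
would not.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List


@dataclass
class Proxy:
    """Hypothetical per-client proxy holding its own copy of the focus state."""
    client: str
    believed_focus: str
    inbox: Deque[str] = field(default_factory=deque)  # notifications not yet applied

    def deliver_one(self) -> None:
        if self.inbox:
            self.believed_focus = self.inbox.popleft()

    def allow_clipboard_read(self) -> bool:
        # The decision uses the proxy's local copy of focus, not the compositor's.
        return self.believed_focus == self.client


class Compositor:
    def __init__(self, proxies: List[Proxy], focus: str) -> None:
        self.proxies, self.focus = proxies, focus

    def set_focus(self, client: str) -> None:
        self.focus = client
        for p in self.proxies:
            p.inbox.append(client)  # asynchronous: applied whenever the proxy gets to it


proxy_a = Proxy("A", believed_focus="A")
compositor = Compositor([proxy_a], focus="A")
compositor.set_focus("B")
print(compositor.focus, proxy_a.allow_clipboard_read())  # B True  (the race window)
proxy_a.deliver_one()
print(proxy_a.allow_clipboard_read())  # False
```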

There are endless examples from the distributed systems literature of
how this can go wrong, but as a very relevant example from the same
application domain, I would like to bring attention to subtle race
conditions in the implementation of the Qubes clipboard-handling logic
which manifested in QSB #013 [1]. Refer to the linked advisory for
background and details. To summarize with additional context and
considerations as I see them applying here: The way the Qubes GUI
stack works is somewhat similar to what you describe, with two parts,
a "vmside" "agent" [2] (in the untrusted VM whose contents are to be
displayed) and an "xside" "daemon" [3] (in the semi-trusted GUI VM
running the window manager, traditionally dom0). The "vmside"
(gui-agent) implements a minimal X window manager, analogous to
sommelier [4], whereas the "xside" (so-called gui-daemon) implements
what is effectively analogous to the proxy you propose here living on
the compositor side of the VM trust boundary. For each VM, a separate
vm-side gui-agent and x-side gui-daemon pair is created. A (possibly
outdated) version of the overall scheme and protocol is documented
here [5]. Features like safe clipboard handling are implemented via
these X11 proxies.

Where things get a little... difficult... is that we do not have a
single trusted arbiter with global visibility and synchronous ordering
guarantees, but rather a collection of these GUI daemon processes
(which I posit should be viewed as a distributed system) that must
somehow coordinate and synchronize successfully to enforce security
guarantees. This is hard, and in the case of QSB #013, corner cases
were overlooked wherein an adversary could drive the system into
states which violate user expectations in unsafe ways. In hindsight,
that something was overlooked should not be surprising, given the
inherent difficulty of the problem and that no formal methods had been
applied (at the time of the advisory) to verify the implementation.

[1]: https://github.com/QubesOS/qubes-secpack/blob/master/QSBs/qsb-013-2015.txt
[2]: https://github.com/QubesOS/qubes-gui-agent-linux/blob/master/gui-agent/vmside.c
[3]: https://github.com/QubesOS/qubes-gui-daemon/blob/master/gui-daemon/xside.c
[4]: https://chromium.googlesource.com/chromiumos/platform2/+/HEAD/vm_tools/sommelier/README.md

I see a couple ways to address weakness 1:

a) exhaustively enumerate what interactions between these proxies
carry security concerns, formalize the security properties of the
protocols involved, and attempt to somehow verify that the
implementations correctly follow the protocols (I propose this mostly
as a straw man to illustrate the comparative apparent simplicity of
the following approach, though, again, I do not wish to discourage
anyone who might wish to try anyway -- for example I very much
appreciate [5], but we'd still need a bunch more work like it to have
meaningful coverage of the higher-level protocols like the clipboard
handling (which exists on top of the gui protocol, which might also
need a parallel model of the window manager for certain properties),
and then once completed initially, it still requires uncommon skill
sets to maintain in sync as the software it models evolves)

[5]: https://roscidus.com/blog/blog/2019/01/01/using-tla-plus-to-understand-xen-vchan/

b) avoid the class of distributed systems issues altogether, by
avoiding any need to coordinate among proxies, instead embedding the
relevant decision logic at a single point which already has total
ordering and global visibility, such as within the logical boundary of
the compositor (note "logical boundary" -- which needn't necessarily
mean "within a single memory-unsafe monolith with guest-reachable
attack surface" -- there's likely a middle ground of compositor
disaggregation and privilege separation which does not also turn
authorization logic into a distributed system)
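Options a) and b) can be contrasted on a toy scale. The sketch below
is a hypothetical three-step model (the compositor moves focus from A
to B, A's proxy learns about it asynchronously, client A asks to read
the clipboard). It brute-forces every interleaving, in the style of an
explicit-state model checker, and counts runs where access is granted
to an unfocused client. It is nowhere near a TLA+ model of a real
protocol; it only shows the same check decided from a proxy's copy of
state versus the compositor's own. Even three steps give six
interleavings, and real protocols blow up far faster, which is part of
why a) is costly.

```python
from typing import Callable, Dict, Iterator, List

State = Dict[str, object]
Step = Callable[[State], None]


def interleavings(procs: List[List[Step]]) -> Iterator[List[Step]]:
    """Yield every ordering of all steps that preserves each process's own order."""
    if all(not p for p in procs):
        yield []
        return
    for i, p in enumerate(procs):
        if p:
            rest = procs[:i] + [p[1:]] + procs[i + 1:]
            for tail in interleavings(rest):
                yield [p[0]] + tail


def set_focus_b(s: State) -> None:  # compositor moves focus from A to B
    s["focus"] = "B"
    s["queue"].append("B")  # notification for A's proxy, delivered later


def deliver(s: State) -> None:  # A's proxy applies a pending notification, if any
    if s["queue"]:
        s["belief_a"] = s["queue"].pop(0)


def make_read(decide: Callable[[State], bool]) -> Step:
    def read(s: State) -> None:  # client A asks to read the clipboard
        if decide(s) and s["focus"] != "A":
            s["violation"] = True  # granted although A is not focused
    return read


def count_violations(decide: Callable[[State], bool]) -> int:
    procs = [[set_focus_b], [deliver], [make_read(decide)]]
    bad = 0
    for order in interleavings(procs):
        s: State = {"focus": "A", "belief_a": "A", "queue": [], "violation": False}
        for step in order:
            step(s)
        bad += int(s["violation"])
    return bad


def distributed(s: State) -> bool:
    return s["belief_a"] == "A"  # proxy decides from its shadow copy of focus


def centralized(s: State) -> bool:
    return s["focus"] == "A"  # compositor decides from its own authoritative state


print(count_violations(distributed), count_violations(centralized))  # 2 0
```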

This point alone is not an argument against having a safer parser in
front, but it does suggest that a proxy is unlikely to be the best
place for authorization logic, especially when proxies need to
coordinate, and that such logic is better suited to being embedded
within the compositor.

2. State-synchronization issues, protocol interpretation differences,
"the WAF problem".

Introducing a proxy adds another implementation of a parser, a data
model, and any applicable retained state, each of which has various
opportunities to become desynchronized from the compositor's.

I suspect matters may be made more complicated by messages potentially
being only contextually legal. Being structurally well-formed may not
be sufficient for a message to be safe; rather, it may only be legal
depending on some state retained by the compositor, which must now
somehow be duplicated in the proxy. (For example, does this target
have focus? Are we in some particular mode?)
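For a concrete flavour of contextual legality: several Wayland
requests (clipboard selection among them) carry the serial of the
input event that triggered them, and whether a serial is acceptable
depends on what the compositor has recently sent. The hypothetical
single-argument request below is well-formed either way; only a
validator that also knows the issued serials can reject a forged one,
and in a proxy design that set has to be reconstructed by also
decoding compositor-to-client events.

```python
import struct
from typing import Set


def structurally_valid(payload: bytes) -> bool:
    # Hypothetical request whose only argument is a uint32 input-event serial.
    return len(payload) == 4


def contextually_valid(payload: bytes, issued_serials: Set[int]) -> bool:
    # The extra check needs state that only exists on the compositor side:
    # the serials it recently handed to this client in input events.
    if not structurally_valid(payload):
        return False
    (serial,) = struct.unpack("<I", payload)  # assumption: little-endian host
    return serial in issued_serials


forged = struct.pack("<I", 999)
print(structurally_valid(forged), contextually_valid(forged, {41, 42}))  # True False
```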

The proxy and the compositor could become desynchronized at the
parsing layer, for example by interpreting malformed messages
differently, possibly leading them to disagree on framing boundaries
and to interpret subsequent data in the stream as entirely different
messages.
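A small, hypothetical demonstration of such a framing differential:
two parsers that differ only in how they treat a size field that is
not a multiple of 4 (one takes it literally, the other rounds up to
32-bit alignment). Fed the same bytes, for instance because a proxy
forwards a message it accepted without canonicalizing its framing,
they disagree on where the next message starts, and one of them ends
up seeing a message (object 7, opcode 3) the other never inspected.

```python
import struct
from typing import List, Tuple


def frames(buf: bytes, round_up: bool) -> List[Tuple[int, int]]:
    """Return (object_id, opcode) for each message a parser would see."""
    out, off = [], 0
    while off + 8 <= len(buf):
        obj_id, word = struct.unpack_from("<II", buf, off)  # assumption: little-endian
        size, opcode = word >> 16, word & 0xFFFF
        if size < 8:
            break  # both parsers give up on an impossible size
        out.append((obj_id, opcode))
        off += ((size + 3) & ~3) if round_up else size


    return out


# A 10-byte first message (size not 4-aligned), 2 bytes of filler, then a
# well-formed message for object 7, opcode 3.
buf = (struct.pack("<II", 1, (10 << 16) | 0) + b"\x00\x00"
       + b"\x00\x00" + struct.pack("<II", 7, (8 << 16) | 3))
print(frames(buf, round_up=False))  # [(1, 0)]
print(frames(buf, round_up=True))   # [(1, 0), (7, 3)]
```

With literal sizes, the second header is read from offset 10 and
decodes to an impossible size, so that parser stops; with rounding it
is read from offset 12 and is perfectly valid.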

They might become desynchronized by having different data models: a
field being added, or the set of possible values changing.

For decisions which require acting on state, the proxy must somehow
shadow the state of the compositor, and this state must be constructed
and mutated in the same way, lest they potentially become
desynchronized.
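A hypothetical sketch of how shadow state can diverge: Wayland object
IDs can be recycled once an object is destroyed, so a proxy that keys
its allowlist on an object-ID-to-interface table must track every
creation and destruction exactly as the compositor does. If it misses
even one (here, by assumption, the destroy and re-creation of ID 5),
it approves an opcode for one interface that the compositor will
dispatch on another. `capture_session` is a made-up interface name.

```python
from typing import Dict, Optional, Tuple

Msg = Tuple[str, int, Optional[str]]  # (kind, object_id, interface)


def apply(view: Dict[int, str], msg: Msg) -> None:
    """Update an object-ID -> interface table from one state-changing message."""
    kind, obj_id, iface = msg
    if kind == "create" and iface is not None:
        view[obj_id] = iface
    elif kind == "destroy":
        view.pop(obj_id, None)


stream = [
    ("create", 5, "wl_surface"),
    ("destroy", 5, None),
    ("create", 5, "capture_session"),  # hypothetical interface; ID 5 is reused
]

compositor: Dict[int, str] = {}
proxy: Dict[int, str] = {}
for m in stream:
    apply(compositor, m)
apply(proxy, stream[0])  # suppose the proxy's model missed the last two messages

ALLOWED = {("wl_surface", 6)}  # e.g. wl_surface.commit
proxy_says_ok = (proxy.get(5), 6) in ALLOWED
print(proxy_says_ok, compositor[5])  # True capture_session
```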

If desynchronization happens at any layer, a message which the proxy
determines to be safe, and interprets to mean something safe, may
nevertheless have a different and potentially unsafe effect on the
compositor.

This is rather similar to the fundamental problems plaguing all
so-called "web application firewalls", which have the perilous goal of
trying to determine what input might be safe or unsafe for some "too
big to be safe" giant ball of complexity behind them. To know for sure
would take so much state replication, or introspection into how the
machine behind the proxy would interpret the data, that the analysis
in front would become more complex than the thing it's trying to
protect, at which point... what's the point? The best one can hope for
is stopping lowest-common-denominator attacks using heuristics, which
is surely not what one should aim for in this case.

From a theoretical perspective, this has analogues in something like
the halting problem. From a langsec perspective, Wayland is a weird
machine, and a guest's messages to it can be thought of as the "input
program". The proxy's objective is then to compute a partial function
over the Wayland input program to answer "is this sequence of Wayland
protocol messages safe?" (rephrased: "does this Wayland-lang program
terminate?"), which, depending on the power (complexity class,
effective grammar) of the "safe" subset of the instructions exposed by
the Wayland weird machine, may be undecidable (though perhaps not in
this specific case).
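At the decidable end of that spectrum: if the "safe" subset could be
pinned down as a regular language over message types (a strong
assumption, and one that ignores arguments and compositor state), the
proxy's question reduces to running a finite automaton, which always
halts after one step per message. The policy below is purely
hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Transitions = Dict[Tuple[str, str], str]


def accepts(delta: Transitions, start: str, accepting: Set[str], msgs: Iterable[str]) -> bool:
    """Run a deterministic finite automaton over a message sequence.

    Always terminates after one step per message, so membership is decidable.
    A missing transition means the sequence has left the "safe" language.
    """
    state = start
    for m in msgs:
        nxt = delta.get((state, m))
        if nxt is None:
            return False
        state = nxt
    return state in accepting


# Hypothetical policy: "capture" is only allowed directly after "confirm".
POLICY: Transitions = {
    ("idle", "draw"): "idle",
    ("idle", "confirm"): "armed",
    ("armed", "capture"): "idle",
    ("armed", "draw"): "idle",  # confirmation expires on any other message
}
print(accepts(POLICY, "idle", {"idle", "armed"}, ["draw", "confirm", "capture"]))  # True
print(accepts(POLICY, "idle", {"idle", "armed"}, ["draw", "capture"]))  # False
```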

> Additionally, the proxy would support a plugin interface,
> through which the user of the proxy (or their distributor) could
> configure custom behaviour.  This could be used to prompt the user for
> confirmation before allowing a screen capture request, or even to
> implement a similar thing for e.g. clipboard access, for which there is
> no support in the Wayland protocol.  It could even be used to modify
> surfaces, to implement things like Qubes-style unspoofable coloured
> window borders.
>
> This approach would allow permissions systems and other custom Wayland
> behaviour to be implemented in a compositor-independent manner.
> Distributions which support several compositors could implement
> customisations in a single place, and users of compositors which lack
> security features and the assurances memory-safety can provide against
> untrusted input would gain access to those things.

For the reasons above, I wonder if a preferable approach might be a
library, to be used by compositors, providing hardened (memory-safe,
etc.) Wayland protocol parsers and additional hooks for authorization
(eBPF-like?), possibly also leveraging some internal privilege
separation for the more complex components (something like [6] comes
to mind).

[6]: https://github.com/google/sandboxed-api
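A rough sketch of that library shape, with all names hypothetical:
the compositor calls authorization hooks in-process, passing its own
authoritative state, after its (hardened) parser has produced a typed
request. The parser itself and anything eBPF-like are omitted; the
point is only that the decision is made where ordering and visibility
are already global, so there is no shadow state to keep in sync.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Request:
    client: str
    interface: str
    opcode: int


@dataclass
class CompositorState:
    focused_client: str


Hook = Callable[[Request, CompositorState], bool]


class Authorizer:
    """Hypothetical in-compositor hook registry; every hook must approve."""

    def __init__(self) -> None:
        self.hooks: List[Hook] = []

    def register(self, hook: Hook) -> Hook:
        self.hooks.append(hook)
        return hook

    def allowed(self, req: Request, state: CompositorState) -> bool:
        # Hooks see the compositor's own live state at decision time:
        # nothing is shadowed, so there is no window for it to go stale.
        return all(hook(req, state) for hook in self.hooks)


auth = Authorizer()


@auth.register
def clipboard_needs_focus(req: Request, state: CompositorState) -> bool:
    # "clipboard" is a hypothetical interface name for this sketch.
    return req.interface != "clipboard" or req.client == state.focused_client


state = CompositorState(focused_client="B")
print(auth.allowed(Request("A", "clipboard", 0), state))  # False
print(auth.allowed(Request("B", "clipboard", 0), state))  # True
```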

> I'd like to hear feedback here, but I think early in the life of this
> idea we should also reach out to the broader Wayland community.  I think
> there's a lot of potential for this idea beyond Spectrum, and it would
> be great if it could be something developed with input from a big
> breadth of Wayland users.

Completely agree.

Others in the broader secure desktops community who may be interested
that come to mind:
1. Qubes OS (Wayland has been on the to-do list for far too long)
2. OpenXT (similar boat to qubes)
3. Flatpak (separating out permissions besides "whatever your
compositor lets you do" seems a logical evolution of their permissions
model)
4. Subgraph OS (less dead than it once seemed?)
5. Chromium OS (possibly? idk)
6. Genode (having a general interest in secure GUI architectures --
[7] and much since)
7. wlroots folks
8. mutter/gnome folks (possibly? if the embedding-into-compositor
route is taken)

[7]: https://genode-labs.com/publications/nitpicker-secure-gui-2005.pdf

Cheers,
Jean-Philippe
