I've been thinking a lot about this for a while, thanks to conversations with Thomas and Puck, CCed here. I think it's time to put it into words properly and start working towards making it happen.

One of the benefits that Wayland is supposed to have over X11 is security. A Wayland application isn't supposed to be able to record the screen without user permission, for example. But in most compositors, it can, with no restrictions. Existing Wayland compositors are monolithic, and each one would have to implement its own access controls. (Mutter already does this to some extent, at least for screen sharing, I believe.) The popular Wayland compositors are largely focused on being feature-complete reimplementations of their X11 equivalents, and so taking advantage of the security features and access controls the Wayland protocol makes possible hasn't been a priority for them.

Additionally, every popular Wayland compositor is written in a memory-unsafe language. Combined with the complexity of the Wayland protocol and all the extensions involved, this is a serious concern for applications of Wayland that involve untrusted clients.

To solve these problems, I propose a proxy program that sits between Wayland clients and the compositor, in the same privilege domain as the compositor. The proxy would decode and re-encode every Wayland request (client->compositor message), and would discard any request it didn't understand. This would mitigate the problem of a large, privileged program written in a memory-unsafe language being exposed to untrusted inputs.

Additionally, the proxy would support a plugin interface, through which the user of the proxy (or their distributor) could configure custom behaviour. This could be used to prompt the user for confirmation before allowing a screen capture request, or even to implement a similar thing for e.g. clipboard access, for which there is no support in the Wayland protocol. It could even be used to modify surfaces, to implement things like Qubes-style unspoofable coloured window borders.

This approach would allow permissions systems and other custom Wayland behaviour to be implemented in a compositor-independent manner. Distributions which support several compositors could implement customisations in a single place, and users of compositors which lack security features and the assurances memory-safety can provide against untrusted input would gain access to those things.

I'd like to hear feedback here, but I think early in the life of this idea we should also reach out to the broader Wayland community. I think there's a lot of potential for this idea beyond Spectrum, and it would be great if it could be something developed with input from a big breadth of Wayland users. If we can do that, it might be sensible for it to live at freedesktop.org? I'm not sure how that works.

Let me know what you all think. :)
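To make the "decode and re-encode, discard what isn't understood" idea in the proposal concrete, here is a minimal sketch. It relies only on the Wayland wire header layout (a 32-bit object ID, then a 32-bit word with the message size in the upper 16 bits and the opcode in the lower 16 bits, in host byte order). The `ALLOWED` table and the function names are hypothetical; a real proxy would also have to track which interface each object ID is bound to, pass file descriptors along, and decide what to do when dropping a request leaves the client and the compositor with different views of the session.

```python
import struct

# Wayland wire header: 32-bit object ID, then a 32-bit word holding the
# message size in bytes (upper 16 bits) and the opcode (lower 16 bits),
# both in the host's byte order.
HEADER = struct.Struct("=II")


def split_messages(buf):
    """Yield (object_id, opcode, payload) for each message in buf."""
    offset = 0
    while offset < len(buf):
        if len(buf) - offset < HEADER.size:
            raise ValueError("truncated header")
        object_id, word = HEADER.unpack_from(buf, offset)
        size, opcode = word >> 16, word & 0xFFFF
        # The size covers the header and must keep 32-bit alignment.
        if size < HEADER.size or size % 4 or offset + size > len(buf):
            raise ValueError("bad message size")
        yield object_id, opcode, buf[offset + HEADER.size:offset + size]
        offset += size


def encode(object_id, opcode, payload):
    """Re-encode one message from its decoded parts."""
    size = HEADER.size + len(payload)
    return HEADER.pack(object_id, (size << 16) | opcode) + payload


def filter_requests(buf, allowed):
    """Re-emit only requests whose (object, opcode) pair is allowed."""
    out = bytearray()
    for object_id, opcode, payload in split_messages(buf):
        if opcode in allowed.get(object_id, ()):
            out += encode(object_id, opcode, payload)
    return bytes(out)


# Hypothetical policy: object 1 is wl_display; opcodes 0 (sync) and
# 1 (get_registry) are its two requests.
ALLOWED = {1: {0, 1}}
```

Anything that fails framing checks raises instead of being forwarded, so the compositor never sees bytes the proxy could not account for.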
>One of the benefits that Wayland is supposed to have over X11 is
>security. A Wayland application isn't supposed to be able to record the
>screen without user permission, for example. But in most compositors,
>it can, with no restrictions. Existing Wayland compositors are

… and theoretically, an X server could feed empty capture to a client it
does not like …

(of course with literal decades of actual backwards compatibility, X11
protocol has accumulated enough extensions that assigning permissions
to all of them might be somewhat painful)

>monolithic, and each one would have to implement its own access
>controls. (Mutter already does this to some extent, at least for screen
>sharing, I believe.) The popular Wayland compositors are largely
>focused on being feature-complete reimplementations of their X11
>equivalents, and so taking advantage of the security features and access
>controls the Wayland protocol makes possible hasn't been a priority for

… unsurprisingly, as these are typically WM teams who are now deprived
of what Xorg server did for everyone.

>To solve these problems, I propose a proxy program that sits between
>Wayland clients and the compositor, in the same privilege domain as the
>compositor. The proxy would decode and re-encode every Wayland request
>(client->compositor message), and would discard any request it didn't
>understand. This would mitigate the problem of a large, privileged
>program written in a memory-unsafe language being exposed to untrusted

Presumably, also validating that the shared memory buffers passed around
have the same size and protection as promised?

>inputs. Additionally, the proxy would support a plugin interface,
>through which the user of the proxy (or their distributor) could
>configure custom behaviour. This could be used to prompt the user for
>confirmation before allowing a screen capture request, or even to
>implement a similar thing for e.g. clipboard access, for which there is
>no support in the Wayland protocol. It could even be used to modify
>surfaces, to implement things like Qubes-style unspoofable coloured
>window borders.

I am tempted to ask how close it will be to providing a socket for WM
and window decorator implementation (with some suitably limited
compositor as the backend behind the proxy).

(So basically, defining a scope will be hard, and defining a scope in
a usefully extensible way might be even harder)

>This approach would allow permissions systems and other custom Wayland
>behaviour to be implemented in a compositor-independent manner.
>Distributions which support several compositors could implement
>customisations in a single place, and users of compositors which lack
>security features and the assurances memory-safety can provide against
>untrusted input would gain access to those things.

That surely sounds good…
Michael Raskin <7c6f434c@mail.ru> writes:

>>One of the benefits that Wayland is supposed to have over X11 is
>>security. A Wayland application isn't supposed to be able to record the
>>screen without user permission, for example. But in most compositors,
>>it can, with no restrictions. Existing Wayland compositors are
>
> … and theoretically, an X server could feed empty capture to a client it
> does not like …
>
> (of course with literal decades of actual backwards compatibility, X11
> protocol has accumulated enough extensions that assigning permissions
> to all of them might be somewhat painful)

Considering that the world has converged on a single X server
implementation, and it's apparently pretty horrible to maintain, I'm not
sure I feel very positive about the idea of a custom one!

And yeah, with Wayland, clients are already expecting to have to ask for
permission. (I've just learned that this is actually done over DBus, so
the proxy would have to implement that as well, ugh.) I suspect X11
clients wouldn't be very happy if it took them seconds to get the result
of their attempted screen capture, and users wouldn't be very happy if
screen sharing was just a blank box by default, rather than a permission
request. (I'm not sure which of those would be possible with X11.)

>>monolithic, and each one would have to implement its own access
>>controls. (Mutter already does this to some extent, at least for screen
>>sharing, I believe.) The popular Wayland compositors are largely
>>focused on being feature-complete reimplementations of their X11
>>equivalents, and so taking advantage of the security features and access
>>controls the Wayland protocol makes possible hasn't been a priority for
>
> … unsurprisingly, as these are typically WM teams who are now deprived
> of what Xorg server did for everyone.

Only the ones that don't use wlroots, I think. wlroots has its own
problems, but in large part those are the problems we're trying to
mitigate here.

>>To solve these problems, I propose a proxy program that sits between
>>Wayland clients and the compositor, in the same privilege domain as the
>>compositor. The proxy would decode and re-encode every Wayland request
>>(client->compositor message), and would discard any request it didn't
>>understand. This would mitigate the problem of a large, privileged
>>program written in a memory-unsafe language being exposed to untrusted
>
> Presumably, also validating that the shared memory buffers passed around
> have the same size and protection as promised?

Aren't shared memory buffers usually handed out by the compositor, to
the client? IIRC this was the reason virtio wayland can work when it
only supports shared memory that was allocated by the host.

>>inputs. Additionally, the proxy would support a plugin interface,
>>through which the user of the proxy (or their distributor) could
>>configure custom behaviour. This could be used to prompt the user for
>>confirmation before allowing a screen capture request, or even to
>>implement a similar thing for e.g. clipboard access, for which there is
>>no support in the Wayland protocol. It could even be used to modify
>>surfaces, to implement things like Qubes-style unspoofable coloured
>>window borders.
>
> I am tempted to ask how close it will be to providing a socket for WM
> and window decorator implementation (with some suitably limited
> compositor as the backend behind the proxy).
>
> (So basically, defining a scope will be hard, and defining a scope in
> a usefully extensible way might be even harder)

I don't understand this point. Can you rephrase / expand?

>>This approach would allow permissions systems and other custom Wayland
>>behaviour to be implemented in a compositor-independent manner.
>>Distributions which support several compositors could implement
>>customisations in a single place, and users of compositors which lack
>>security features and the assurances memory-safety can provide against
>>untrusted input would gain access to those things.
>
> That surely sounds good…
>>>One of the benefits that Wayland is supposed to have over X11 is
>>>security. A Wayland application isn't supposed to be able to record the
>>>screen without user permission, for example. But in most compositors,
>>>it can, with no restrictions. Existing Wayland compositors are
>>
>> … and theoretically, an X server could feed empty capture to a client it
>> does not like …
>>
>> (of course with literal decades of actual backwards compatibility, X11
>> protocol has accumulated enough extensions that assigning permissions
>> to all of them might be somewhat painful)
>
>Considering that the world has converged on a single X server
>implementation, and it's apparently pretty horrible to maintain, I'm not
>sure I feel very positive about the idea of a custom one!

Considering how much works fine in Xvnc which kind of lacks half the
extensions, and considering that all the bad hardware stuff now needs to
be handled in each compositor, it is unclear how much worse it would
actually be.

So OpenGL-first design sounds like the real reason for all the mess.

>And yeah, with Wayland, clients are already expecting to have to ask for
>permission. (I've just learned that this is actually done over DBus, so
>the proxy would have to implement that as well, ugh.) I suspect X11
>clients wouldn't be very happy if it took them seconds to get the result
>of their attempted screen capture, and users wouldn't be very happy if
>screen sharing was just a blank box by default, rather than a permission
>request. (I'm not sure which of those would be possible with X11.)

Assuming the client does not use the protocol extensions to control the
request, presumably a black rectangle immediately and a permission dialog
from the WM to the user, then the capture suddenly getting real data.

Single-window capture depends on whether the client can handle capture
resize. If yes, no problem (capture the generic «window» in the client,
get a black box while the user picks the true target). Otherwise there
might be some juggling indeed.

>> … unsurprisingly, as these are typically WM teams who are now deprived
>> of what Xorg server did for everyone.
>
>Only the ones that don't use wlroots, I think. wlroots has its own
>problems, but in large part those are the problems we're trying to
>mitigate here.

Yes, but it seems memory-unsafe even by the «careful C» standards, and
it seems to crash noticeably often with reasonably well-behaved clients,
so just a protocol compliance filter would not be enough.

>>>To solve these problems, I propose a proxy program that sits between
>>>Wayland clients and the compositor, in the same privilege domain as the
>>>compositor. The proxy would decode and re-encode every Wayland request
>>>(client->compositor message), and would discard any request it didn't
>>>understand. This would mitigate the problem of a large, privileged
>>>program written in a memory-unsafe language being exposed to untrusted
>>
>> Presumably, also validating that the shared memory buffers passed around
>> have the same size and protection as promised?
>
>Aren't shared memory buffers usually handed out by the compositor, to
>the client? IIRC this was the reason virtio wayland can work when it
>only supports shared memory that was allocated by the host.

Looks like for wl_shm the client creates an shm object and sends the FD
to the server, then both sides mmap from there. Given that one could
cook an FD with an approximately arbitrary combination of properties,
I guess some care should be taken about this, too.

>>>no support in the Wayland protocol. It could even be used to modify
>>>surfaces, to implement things like Qubes-style unspoofable coloured
>>>window borders.
>>
>> I am tempted to ask how close it will be to providing a socket for WM
>> and window decorator implementation (with some suitably limited
>> compositor as the backend behind the proxy).
>>
>> (So basically, defining a scope will be hard, and defining a scope in
>> a usefully extensible way might be even harder)
>
>I don't understand this point. Can you rephrase / expand?

Well, you start describing a proxy that is basically «only valid
messages pass» and that's it, then add that it could also implement some
more functionality. Then you say plugins for policy-heavy functionality.
(I guess at that point the natural next step is a socket so that
safety-handling and user logic could be in different processes)

This sounds like a recipe for scope creep, although it might also be
good as a single well-vetted Thing being safety-critical and a lot of
policy decisions being pushed to restartable and
functionality-restricted separate processes is actually better than the
current recommended approach to Wayland compositors.

If you want to try pushing the project to freedesktop, I suspect it is
a good idea to define how much scope you want to include into the pitch.
Although apparently they don't have any objections to hosting software
with heavy scope creep, when reaching for a high profile, it is a good
idea to set internal expectations before having to manage external ones.
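The wl_shm concern raised in this message (a client-supplied FD whose real properties may not match what the client claims) can be illustrated with a minimal sketch. It assumes the proxy sees the arguments of `wl_shm.create_pool` (an fd and a size) and `wl_shm_pool.create_buffer` (offset, width, height, stride); the function names and the fixed 4 bytes per pixel are assumptions for illustration, not anything specified in the thread.

```python
import os

# Assumption: only 32-bit pixel formats are accepted in this sketch.
BYTES_PER_PIXEL = 4


def check_pool(fd, claimed_size):
    """Reject a pool whose claimed size exceeds the file behind the fd."""
    actual_size = os.fstat(fd).st_size
    if claimed_size <= 0 or claimed_size > actual_size:
        raise ValueError("pool size does not match the file descriptor")


def check_buffer(pool_size, offset, width, height, stride):
    """Reject a buffer that would reach outside its pool."""
    if offset < 0 or width <= 0 or height <= 0 or stride <= 0:
        raise ValueError("non-positive buffer geometry")
    if stride < width * BYTES_PER_PIXEL:
        raise ValueError("stride too small for the row width")
    if offset + stride * height > pool_size:
        raise ValueError("buffer extends past the end of the pool")
```

A size check at creation time does not stop a client from shrinking the file afterwards, so a real implementation would need something more (for example, requiring sealed memfds), which this sketch does not attempt.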
Glad to see this taking form!

First of all, in case the points of technical feedback further down might be interpreted to suggest otherwise, I *am* in favor of exploration of this idea. I do believe it to be worthwhile to implement; it is just perhaps neither the ultimate nor the sole solution one might want, at least in the form currently conceived.

On Sat, May 22, 2021 at 9:06 AM Alyssa Ross <hi@alyssa.is> wrote:
> [...] I propose a proxy program that sits between
> Wayland clients and the compositor, in the same privilege domain as the
> compositor. The proxy would decode and re-encode every Wayland request
> (client->compositor message), and would discard any request it didn't
> understand.
>
> This would mitigate the problem of a large, privileged
> program written in a memory-unsafe language being exposed to untrusted
> inputs.

As attractive as this approach sounds at first, I do not believe it mitigates the problem in a manner which I would be comfortable placing a large degree of trust in. I see two fundamental weaknesses:

1. Opening Pandora's box of distributed systems.

This proposal turns the overall system of policy enforcement from what is currently a nice single centralized component (the compositor) with global observation, total visibility, total control, and nice non-distributed-system guarantees on message ordering, delivery, etc. into a decentralized and distributed system without those guarantees, subject to all manner of race conditions, etc. This makes it much harder to guarantee that intended policies are indeed enforced in the manner expected.

There are endless examples from the distributed systems literature of how this can go wrong, but as a very relevant example from the same application domain, I would like to bring attention to subtle race conditions in the implementation of the Qubes clipboard-handling logic which manifested in QSB #013 [1]. Refer to the linked advisory for background and details.
To summarize with additional context and considerations as I see them applying here:

The way the Qubes GUI stack works is somewhat similar to what you describe, with two parts, a "vmside" "agent" [2] (in the untrusted VM whose contents are to be displayed) and an "xside" "daemon" [3] (in the semi-trusted GUI VM running the window manager, traditionally dom0). The "vmside" (gui-agent) implements a minimal X window manager, analogous to sommelier [4], whereas the "xside" (so-called gui-daemon) implements what is effectively analogous to the proxy you propose here living on the compositor-side of the VM trust boundary. For each VM, a separate vm-side gui-agent and x-side gui-daemon pair is created. A (possibly outdated) version of the overall scheme and protocol is documented here [5]. Things like safe clipboard handling are implemented via these X11 proxies.

Where things get a little... difficult... is the fact that we do not have a single trusted arbiter with global visibility and synchronous ordering guarantees, but rather a bunch of these GUI daemon processes (which I posit should be viewed as a distributed system) which must somehow successfully coordinate and synchronize to enforce security guarantees. This is hard, and in the case of QSB #013, corner cases were overlooked wherein an adversary could cause states to be reached which violate user expectations in unsafe ways.

In hindsight, the fact that something was overlooked should not be a surprising result, given the difficulty of the problem, and that no formal methods had been applied (at the time of the advisory) to attempt to verify the implementation.
[1]: https://github.com/QubesOS/qubes-secpack/blob/master/QSBs/qsb-013-2015.txt
[2]: https://github.com/QubesOS/qubes-gui-agent-linux/blob/master/gui-agent/vmside.c
[3]: https://github.com/QubesOS/qubes-gui-daemon/blob/master/gui-daemon/xside.c
[4]: https://chromium.googlesource.com/chromiumos/platform2/+/HEAD/vm_tools/sommelier/README.md

I see a couple ways to address weakness 1:

a) exhaustively enumerate what interactions between these proxies carry security concerns, formalize the security properties of the protocols involved, and attempt to somehow verify that the implementations correctly follow the protocols (I propose this mostly as a straw man to illustrate the comparative apparent simplicity of the following approach, though, again, I do not wish to discourage anyone who might wish to try anyway -- for example I very much appreciate [5], but we'd still need a bunch more work like it to have meaningful coverage of the higher-level protocols like the clipboard handling (which exists on top of the gui protocol, which might also need a parallel model of the window manager for certain properties), and then once completed initially, it still requires uncommon skill sets to maintain in sync as the software it models evolves)

[5]: https://roscidus.com/blog/blog/2019/01/01/using-tla-plus-to-understand-xen-vchan/

b) avoid the class of distributed systems issues altogether, by avoiding any need to coordinate among proxies, by instead embedding the relevant decision logic at a single point which already has total ordering and global visibility, such as within the logical boundary of the compositor (note "logical boundary" -- which needn't necessarily mean "within a single memory-unsafe monolith with guest-reachable attack surface" -- there's likely a middle ground of compositor disaggregation and privilege separation which does not also turn authorization logic into a distributed system)

This point alone is not an argument against having a safer parser in front,
but rather suggests that a proxy seems unlikely to be the best place for authorization logic, especially when said proxies need to coordinate, and rather that such logic seems better suited for being embedded within the compositor. 2. State-synchronization issues, protocol interpretation differences, "the WAF problem". Introducing a proxy introduces another implementation of a parser, data model, and any applicable retained state. All of this has various opportunities to become desynchronized at various layers. I suspect matters may be made more complicated by messages potentially being only contextually legal. It might not be a sufficient condition for safety for a message to be merely structurally well-formed, but rather, it may only be legal depending on some state retained by the compositor which now must be somehow duplicated in the proxy. (For example, does this target have focus? Are we in some particular mode?) The parser and the compositor could become desynchronized at the parsing layer, for example (e.g. malformed messages interpreted differently, possibly leading to different interpretations of framing boundaries, and each interpreting subsequent data in the stream as different messages entirely). They might become desynchronized by having different data models -- a field being added, or possible values changing. For decisions which require acting on state, the proxy must somehow shadow the state of the compositor, and this state must be constructed and mutated in the same way, lest they potentially become desynchronized. If any desynchronization happens at any layer, a message which is determined by the proxy to be safe, may be interpreted by the proxy to mean something which is safe, yet have a different and potentially unsafe effect on the compositor. 
This is rather similar to the fundamental problems plaguing all so-called "web application firewalls", which have the perilous goal of trying to determine what input might be safe / unsafe to some "too big to be safe" giant ball of complexity behind it. To really know for sure with accuracy would mean a crazy amount of state replication or introspection into how the machine behind the proxy would interpret the data, to the point that the complexity of the analysis in front would exceed the thing it's trying to protect, at which point... what's the point. The best one can hope for is stopping the least-common-denominator attacks using heuristics, which is surely not what one should aim for in this case. From a theoretical perspective, this has analogues in something like the halting problem. From a langsec perspective, Wayland is a weird machine, and a guest's messages to it can be thought of as the "input program". The proxy's objective is then to compute a partial function over the wayland input program to resolve "is this sequence of wayland protocol messages safe?" (rephrased: "does this wayland-lang program terminate?"), which may be undecidable, (though perhaps not in the specific case) depending on the power (complexity class, effective grammar) of the "safe" subset of the instructions exposed by the wayland weird-machine. > Additionally, the proxy would support a plugin interface, > through which the user of the proxy (or their distributor) could > configure custom behaviour. This could be used to prompt the user for > confirmation before allowing a screen capture request, or even to > implement a similar thing for e.g. clipboard access, for which there is > no support in the Wayland protocol. It could even be used to modify > surfaces, to implement things like Qubes-style unspoofable coloured > window borders. > > This approach would allow permissions systems and other custom Wayland > behaviour to be implemented in a compositor-independent manner. 
> Distributions which support several compositors could implement
> customisations in a single place, and users of compositors which lack
> security features and the assurances memory-safety can provide against
> untrusted input would gain access to those things.

For the reasons above, I wonder if implementing this as a library providing hardened (memory-safe, etc.) wayland protocol parsers and additional hooks for authorization (ebpf-like?), to be used by compositors (possibly also leveraging some internal privilege separation for the more complex components -- something like [6] comes to mind) might be a preferable approach.

[6]: https://github.com/google/sandboxed-api

> I'd like to hear feedback here, but I think early in the life of this
> idea we should also reach out to the broader Wayland community. I think
> there's a lot of potential for this idea beyond Spectrum, and it would
> be great if it could be something developed with input from a big
> breadth of Wayland users.

Completely agree. Others in the broader secure desktops community who may be interested that come to mind:

1. Qubes OS (wayland has been on the to-do list for far too long)
2. OpenXT (similar boat to qubes)
3. Flatpak (separating out permissions besides "whatever your compositor lets you do" seems a logical evolution of their permissions model)
4. Subgraph OS (less dead than it once seemed?)
5. Chromium OS (possibly? idk)
6. Genode (having a general interest in secure GUI architectures -- [7] and much since)
7. wlroots folks
8. mutter/gnome folks (possibly? if the embedding-into-compositor route is taken)

[7]: https://genode-labs.com/publications/nitpicker-secure-gui-2005.pdf

Cheers,
Jean-Philippe
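As a rough illustration of the "library with authorization hooks" alternative suggested in this message, the sketch below keeps the decision logic in one place, inside whatever process already sees every request in order. All names here (`Request`, `Authorizer`, `deny_untrusted_capture`) are hypothetical; the interface and request strings are only examples of what a compositor might pass in, taken from the wlr-screencopy protocol.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass(frozen=True)
class Request:
    client_id: int   # which connection sent the request
    interface: str   # e.g. "zwlr_screencopy_manager_v1"
    name: str        # e.g. "capture_output"


Hook = Callable[[Request], bool]


class Authorizer:
    """Runs every registered hook; a request passes only if all agree."""

    def __init__(self) -> None:
        self._hooks: List[Hook] = []

    def register(self, hook: Hook) -> None:
        self._hooks.append(hook)

    def allows(self, request: Request) -> bool:
        return all(hook(request) for hook in self._hooks)


def deny_untrusted_capture(trusted_clients: Set[int]) -> Hook:
    """Hypothetical policy: only trusted clients may capture outputs."""
    def hook(request: Request) -> bool:
        if request.interface == "zwlr_screencopy_manager_v1":
            return request.client_id in trusted_clients
        return True
    return hook
```

Because the hooks run synchronously in the component that owns the protocol state, there is no second copy of that state to keep in sync, which is the property the message argues a separate proxy would lose.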
Michael Raskin <7c6f434c@mail.ru> writes:

>>>>One of the benefits that Wayland is supposed to have over X11 is
>>>>security. A Wayland application isn't supposed to be able to record the
>>>>screen without user permission, for example. But in most compositors,
>>>>it can, with no restrictions. Existing Wayland compositors are
>>>
>>> … and theoretically, an X server could feed empty capture to a client it
>>> does not like …
>>>
>>> (of course with literal decades of actual backwards compatibility, X11
>>> protocol has accumulated enough extensions that assigning permissions
>>> to all of them might be somewhat painful)
>>
>>Considering that the world has converged on a single X server
>>implementation, and it's apparently pretty horrible to maintain, I'm not
>>sure I feel very positive about the idea of a custom one!
>
> Considering how much works fine in Xvnc which kind of lacks half the
> extensions, and considering that all the bad hardware stuff now needs to
> be handled in each compositor, it is unclear how much worse it would
> actually be.
>
> So OpenGL-first design sounds like the real reason for all the mess.

What do you mean by bad hardware stuff? Most hardware should be handled
by KMS and libinput, shouldn't it?

>>> … unsurprisingly, as these are typically WM teams who are now deprived
>>> of what Xorg server did for everyone.
>>
>>Only the ones that don't use wlroots, I think. wlroots has its own
>>problems, but in large part those are the problems we're trying to
>>mitigate here.
>
> Yes, but it seems memory-unsafe even by the «careful C» standards, and
> it seems to crash noticeably often with reasonably well-behaved clients,
> so just a protocol compliance filter would not be enough.

My experience with wlroots has been that when it crashes, it's because
of quirks with monitor state and stuff, not anything from the Wayland
protocol. Have you had a different experience?

Another thing the proxy would be really cool for that I didn't mention
before is debugging. Potentially you could even record and replay
Wayland messages, which would help with any crash that was from Wayland.

>>>>To solve these problems, I propose a proxy program that sits between
>>>>Wayland clients and the compositor, in the same privilege domain as the
>>>>compositor. The proxy would decode and re-encode every Wayland request
>>>>(client->compositor message), and would discard any request it didn't
>>>>understand. This would mitigate the problem of a large, privileged
>>>>program written in a memory-unsafe language being exposed to untrusted
>>>
>>> Presumably, also validating that the shared memory buffers passed around
>>> have the same size and protection as promised?
>>
>>Aren't shared memory buffers usually handed out by the compositor, to
>>the client? IIRC this was the reason virtio wayland can work when it
>>only supports shared memory that was allocated by the host.
>
> Looks like for wl_shm the client creates an shm object and sends FD to
> the server, then both sides mmap from there. Given that one could cook
> an FD with approximately arbitrary combination of properties, I guess
> some care should be taken about this, too.

Hmm, yes, looks like you're right. I wonder how virtio wayland can work
the way it does, then...

But yes, that is something we could validate.

>>>>no support in the Wayland protocol. It could even be used to modify
>>>>surfaces, to implement things like Qubes-style unspoofable coloured
>>>>window borders.
>>>
>>> I am tempted to ask how close it will be to providing a socket for WM
>>> and window decorator implementation (with some suitably limited
>>> compositor as the backend behind the proxy).
>>>
>>> (So basically, defining a scope will be hard, and defining a scope in
>>> a usefully extensible way might be even harder)
>>
>>I don't understand this point. Can you rephrase / expand?
>
> Well, you start describing a proxy that is basically «only valid
> messages pass» and that's it, then add that it could also implement some
> more functionality. Then you say plugins for policy-heavy functionality.
> (I guess at that point the natural next step is a socket so that
> safety-handling and user logic could be in different processes)
>
> This sounds like a recipe for scope creep, although it might also be
> good as a single well-vetted Thing being safety-critical and a lot of
> policy decisions being pushed to restartable and
> functionality-restricted separate processes is actually better than the
> current recommended approach to Wayland compositors.
>
> If you want to try pushing the project to freedesktop, I suspect it is
> a good idea to define how much scope you want to include into the pitch.
> Although apparently they don't have any objections to hosting software
> with heavy scope creep, when reaching for a high profile, it is a good
> idea to set internal expectations before having to manage external ones.

I actually think the scope is fairly limited?

1. Receive request
2. Validate request
3. Asynchronously run request through plugins (you're right that
   external processes are probably a better idea, although I worry
   about complexity and performance).
4. Forward request to compositor
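The four steps listed at the end of this message (receive, validate, run through plugins asynchronously, forward) could be sketched roughly as follows. Everything here is an assumption made for illustration: `validate` is a placeholder framing check, plugins are modelled as coroutines returning allow/deny, and `forward` stands in for writing to the compositor's socket.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, Sequence

Plugin = Callable[[bytes], Awaitable[bool]]
Forward = Callable[[bytes], Awaitable[None]]


def validate(request: bytes) -> bool:
    """Placeholder check: an 8-byte header and 32-bit alignment."""
    return len(request) >= 8 and len(request) % 4 == 0


async def handle(request: bytes, plugins: Sequence[Plugin],
                 forward: Forward) -> bool:
    """Steps 2-4 for one request; returns True if it was forwarded."""
    if not validate(request):
        return False
    # Plugins run concurrently, but all must answer before forwarding.
    verdicts = await asyncio.gather(*(plugin(request) for plugin in plugins))
    if not all(verdicts):
        return False
    await forward(request)
    return True


async def pipeline(requests: Iterable[bytes], plugins: Sequence[Plugin],
                   forward: Forward) -> None:
    """Step 1: take requests one at a time so their order is preserved."""
    for request in requests:
        await handle(request, plugins, forward)
```

Handling one request at a time per client keeps ordering simple, at the cost that a slow plugin (say, one waiting on a user prompt) stalls everything queued behind it for that client.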
On May 22, 2021, at 8:05 AM, Alyssa Ross <hi@alyssa.is> wrote:
>
> One of the benefits that Wayland is supposed to have over X11 is
> security. A Wayland application isn't supposed to be able to record the
> screen without user permission, for example. But in most compositors,
> it can, with no restrictions.
<snip>
>
> To solve these problems, I propose a proxy program that sits between
> Wayland clients and the compositor, in the same privilege domain as the
> compositor.
<snip>
> If we can do that, it might be sensible for
> it to live at freedesktop.org? I'm not sure how that works.

I am curious, if you have time, to hear more on why the approach of a proxy vs picking a compositor and implementing security there. If the problem is that the Wayland community so far has not considered security a priority, it seems that a security proxy may suffer from those same forces. Basically, will it be easier to attract developers or gain widespread adoption of a proxy as opposed to getting buy-in to do security directly in a compositor?

You mention writing in a memory safe language and having a compositor neutral solution as technical advantages. Do you think a proxy is a good choice primarily because it can achieve a better technical result, or is the choice of a new component more a matter of difficulty getting community buy-in from a popular compositor and doing security there? How would you weigh the upsides of a new project against the difficulties of getting a new thing off the ground and adopted?

(This is really just curiosity on my part and my $0.02 from the outside. You may have already had a lot of discussions about that, or even already tried talking to compositor folk and not gotten traction. Seems worth some explicit consideration.)
> This proposal turns the overall system of policy enforcement from what > is currently a nice single centralized component (the compositor) with > global observation, total visibility, total control, and nice > non-distributed-system guarantees on message ordering, delivery, etc. > into a decentralized and distributed system without those guarantees, > subject to all manner of race conditions, etc. This makes it much > harder to guarantee that intended policies are indeed enforced in the manner expected. How would the new system be distributed? My understanding is that it would just be a single process alongside the compositor. > I actually think the scope is fairly limited? Agreed, and I don't see a way to implement Spectrum without something like this. At a bare minimum we'd need a proxy for window decorations, unless we feel comfortable forking an existing window compositor. I'm on mobile so I can't quote easily (I also have trouble reading hard-wrapped emails here), but we do have to place trust somewhere, and writing our own compositor would be too difficult, so it makes sense to have a simple parse/encode system if only to prevent special extensions that we don't recognize. - Aaron On Sat, May 22, 2021, at 10:22 AM, Alyssa Ross wrote: > Michael Raskin <7c6f434c@mail.ru> writes: > > >>>>One of the benefits that Wayland is supposed to have over X11 is > >>>>security. A Wayland application isn't supposed to be able to record the > >>>>screen without user permission, for example. But in most compositors, > >>>>it can, with no restrictions. 
Existing Wayland compositors are > >>> > >>> … and theoretically, an X server could feed empty capture to a client it > >>> does not like … > >>> > >>> (of course with literal decades of actual backwards compatibility, X11 > >>> protocol has accumulated enough extensions that assigning permissions > >>> to all of them might be somewhat painful) > >> > >>Considering that the world has convered on a single X server > >>implementation, and it's apparently pretty horrible to maintain, I'm not > >>sure I feel very positive about the idea of a custom one! > > > > Considering how much works fine in Xvnc which kind of lacks half the > > extensions, and considering that all the bad hardware stuff now needs to > > be handled in each compositor, it is unclear how much worse it would > > actually be. > > > > So OpenGL-first design sounds like the real reason for all the mess. > > What do you mean by bad hardware stuff? Most hardware should be handled > by KMS and libinput, shouldn't it? > > >>> … unsurprisingly, as these are typically WM teams who are now deprived > >>> of what Xorg server did for everyone. > >> > >>Only the ones that don't use wlroots, I think. wlroots has its own > >>problems, but in large part those are the problems we're trying to > >>mitigate here. > > > > Yes, but it seems memory-unsafe even by the «careful C» standards, and > > it seems to crash noticeably often with reasonably well-behaved clients, > > so just a protocol compliance filter would not be enough. > > My experience with wlroots has been that when it crashes, it's because > of quirks with monitor state and stuff, not anything from the Wayland > protocol. Have you had a different experience? > > Another thing the proxy would be really cool for that I didn't mention > before is debugging. Potentially you could even record and replay > Wayland messages, which would help with any crash that was from Wayland. 
> > >>>>To solve these problems, I propose a proxy program that sits between > >>>>Wayland clients and the compositor, in the same privelege domain as the > >>>>compositor. The proxy would decode and re-encode every Wayland request > >>>>(client->compositor message), and would discard any request it didn't > >>>>understand. This would mitigate the problem of a large, privileged > >>>>program written in a memory-unsafe language being exposed to untrusted > >>> > >>> Presumably, also validating that the shared memory buffers passed around > >>> have the same size and protection as promised? > >> > >>Aren't shared memory buffers usually handed out by the compositor, to > >>the client? IIRC this was the reason virtio wayland can work when it > >>only supports shared memory that was allocated by the host. > > > > Looks like for wl_shm the client creates and shm object and sends FD to > > the server, then both sides mmap from there. Given that one could cook > > an FD with approximately arbitrary combination of properties, I guess > > some care should be taken about this, too. > > Hmm, yes, looks like you're right. I wonder how virtio wayland can work > the way it does, then... But yes, that is something we could validate. > > >>>>no support in the Wayland protocol. It could even be used to modify > >>>>surfaces, to implement things like Qubes-style unspoofable coloured > >>>>window borders. > >>> > >>> I am tempted to ask how close it will be to providing a socket for WM > >>> and window decorator implementation (with some suitably limited > >>> compositor as the backend behind the proxy). > >>> > >>> (So basically, defining a scope will be hard, and defining a scope in > >>> a usefully extensible way might be even harder) > >> > >>I don't understand this point. Can you rephrase / expand? > > > > Well, you start describing a proxy that is basically «only valid > > messages pass» and that's it, then add that it could also implement some > > more functionality. 
Then you say plugins for policy-heavy functionality. > > (I guess at that point the natural next step is a socket so that > > safety-handling and user logic could be in different processes) > > > > This sounds like a recipe for scope creep, although it might also be > > good as a single well-vetted Thing being safety-critical and a lot of > > policy decisions being pushed to restartable and > > functionality-restricted separate processes is actually better than the > > current recommended approach to Wayland compositors. > > > > If you want to try pushing the project to freedesktop, I suspect it is > > a good idea to define how much scope you want to include into the pitch. > > Although apparently they don't have any objections to hosting software > > with heavy scope creep, when reaching for a high profile, it is a good > > idea to set internal expectations before having to manage external ones. > > I actually think the scope is fairly limited? > > 1. Receive request > 2. Validate request > 3. Asynchronously run request through plugins (you're right that > external processes are probably a better idea, although I worry about > complexity and performance). > 4. Forward request to compositor > > Attachments: > * signature.asc
>> Considering how much works fine in Xvnc which kind of lacks half the >> extensions, and considering that all the bad hardware stuff now needs to >> be handled in each compositor, it is unclear how much worse it would >> actually be. >> >> So OpenGL-first design sounds like the real reason for all the mess. > >What do you mean by bad hardware stuff? Most hardware should be handled >by KMS and libinput, shouldn't it? I think there is somehow still some buffer-management code which is not fully indifferent to the GPU used, and apparently that goes beyond proprietary drivers. Has it been completely cleaned up for nouveau? >>>> … unsurprisingly, as these are typically WM teams who are now deprived >>>> of what Xorg server did for everyone. >>> >>>Only the ones that don't use wlroots, I think. wlroots has its own >>>problems, but in large part those are the problems we're trying to >>>mitigate here. >> >> Yes, but it seems memory-unsafe even by the «careful C» standards, and >> it seems to crash noticeably often with reasonably well-behaved clients, >> so just a protocol compliance filter would not be enough. > >My experience with wlroots has been that when it crashes, it's because >of quirks with monitor state and stuff, not anything from the Wayland >protocol. Have you had a different experience? I still avoid Wayland because it would be too much of a drop in features compared to my X11 setup. I have just checked segmentation fault bugs in the tracker… Of course you do not get many crashes due to Wayland protocol quirks, if only because your applications are not going out of their way to crash wl_roots. >Another thing the proxy would be really cool for that I didn't mention >before is debugging. Potentially you could even record and replay >Wayland messages, which would help with any crash that was from Wayland. Indeed. Then if we have a good proxy codebase, yet another proxy might even help some («good, clean») applications shrug off a compositor crash. 
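As a rough idea of what record and replay could look like, here is a small Python sketch that appends each message to a log with a length prefix and reads them back in order. The log format and the names `record`, `replay` and `roundtrip` are invented for illustration. One known limitation: Wayland passes file descriptors as ancillary data on the socket, so a byte log like this cannot capture them, and a real tool would need some other way to handle that.

```python
import io
import struct
from typing import BinaryIO, Iterable, Iterator, List

# Hypothetical log format: 4-byte little-endian length, then the raw message.
LEN = struct.Struct("<I")


def record(log: BinaryIO, message: bytes) -> None:
    """Append one message to the log."""
    log.write(LEN.pack(len(message)))
    log.write(message)


def replay(log: BinaryIO) -> Iterator[bytes]:
    """Yield recorded messages in order; raise if the log is truncated."""
    while True:
        prefix = log.read(LEN.size)
        if not prefix:
            return  # clean end of log
        if len(prefix) < LEN.size:
            raise ValueError("truncated length prefix")
        (length,) = LEN.unpack(prefix)
        message = log.read(length)
        if len(message) != length:
            raise ValueError("truncated record")
        yield message


def roundtrip(messages: Iterable[bytes]) -> List[bytes]:
    """Record messages into an in-memory log and replay them."""
    log = io.BytesIO()
    for message in messages:
        record(log, message)
    log.seek(0)
    return list(replay(log))
```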
>>>Aren't shared memory buffers usually handed out by the compositor, to >>>the client? IIRC this was the reason virtio wayland can work when it >>>only supports shared memory that was allocated by the host. >> >> Looks like for wl_shm the client creates and shm object and sends FD to >> the server, then both sides mmap from there. Given that one could cook >> an FD with approximately arbitrary combination of properties, I guess >> some care should be taken about this, too. > >Hmm, yes, looks like you're right. I wonder how virtio wayland can work >the way it does, then... But yes, that is something we could validate. Wait, I remember you explaining a pretty complicated dance to make some buffer host-allocated from the VM point of view but client-managed from the Wayland point of view. Am I misremembering? >>>>>no support in the Wayland protocol. It could even be used to modify >>>>>surfaces, to implement things like Qubes-style unspoofable coloured >>>>>window borders. >>>> >>>> I am tempted to ask how close it will be to providing a socket for WM >>>> and window decorator implementation (with some suitably limited >>>> compositor as the backend behind the proxy). >>>> >>>> (So basically, defining a scope will be hard, and defining a scope in >>>> a usefully extensible way might be even harder) >>> >>>I don't understand this point. Can you rephrase / expand? >> >> Well, you start describing a proxy that is basically «only valid >> messages pass» and that's it, then add that it could also implement some >> more functionality. Then you say plugins for policy-heavy functionality. 
>> (I guess at that point the natural next step is a socket so that >> safety-handling and user logic could be in different processes) >> >> This sounds like a recipe for scope creep, although it might also be >> good as a single well-vetted Thing being safety-critical and a lot of >> policy decisions being pushed to restartable and >> functionality-restricted separate processes is actually better than the >> current recommended approach to Wayland compositors. >> >> If you want to try pushing the project to freedesktop, I suspect it is >> a good idea to define how much scope you want to include into the pitch. >> Although apparently they don't have any objections to hosting software >> with heavy scope creep, when reaching for a high profile, it is a good >> idea to set internal expectations before having to manage external ones. > >I actually think the scope is fairly limited? > >1. Receive request >2. Validate request This alone could become interesting really quickly. >3. Asynchronously run request through plugins (you're right that > external processes are probably a better idea, although I worry about > complexity and performance). And then it all depends on what we want the plugins and external processes to do, as this functionality needs interfaces. (Re: performance — I guess «simple but high-volume» and «complicated but with small request and response» need to be separated, policy vs. mechanism, and all that) >4. Forward request to compositor Re-serialise more than forward, I really hope.
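One concrete case where validation gets interesting is the wl_shm file descriptor raised earlier in this message. A minimal Python sketch of the checks a proxy might make follows. The function names are made up, the default of 4 bytes per pixel assumes a 32-bit format such as ARGB8888 or XRGB8888, and requiring a regular file is an assumption about how pools are usually backed. The argument names follow `wl_shm.create_pool` (fd, size) and `wl_shm_pool.create_buffer` (offset, width, height, stride, format).

```python
import os
import stat
import tempfile


def pool_fd_ok(fd: int, claimed_size: int) -> bool:
    """Check the fd from create_pool against the size the client claims."""
    if claimed_size <= 0:
        return False
    st = os.fstat(fd)
    # Assumption: pools are backed by regular files (e.g. shm or memfd objects).
    return stat.S_ISREG(st.st_mode) and st.st_size >= claimed_size


def buffer_fits(pool_size: int, offset: int, width: int, height: int,
                stride: int, bytes_per_pixel: int = 4) -> bool:
    """Check that a create_buffer request stays inside the pool."""
    if width <= 0 or height <= 0 or offset < 0:
        return False
    if stride < width * bytes_per_pixel:
        return False  # rows would overlap
    return offset + stride * height <= pool_size


def demo() -> bool:
    """Exercise both checks against a 4096-byte temporary file."""
    with tempfile.TemporaryFile() as f:
        f.truncate(4096)
        return pool_fd_ok(f.fileno(), 4096) and buffer_fits(4096, 0, 32, 32, 128)
```

Note that the size check is only a snapshot: the client may shrink the file afterwards, so this alone would not be sufficient.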
>On May 22, 2021, at 8:05 AM, Alyssa Ross <hi@alyssa.is> wrote:
>>
>> One of the benefits that Wayland is supposed to have over X11 is
>> security. A Wayland application isn't supposed to be able to record the
>> screen without user permission, for example. But in most compositors,
>> it can, with no restrictions.
><snip>
>>
>> To solve these problems, I propose a proxy program that sits between
>> Wayland clients and the compositor, in the same privelege domain as the
>> compositor.
><snip>
>> If we can do that, it might be sensible for
>> it to live at freedesktop.org? I'm not sure how that works.
>
>I am curious, if you have time, to hear more on why the approach of a proxy vs picking a compositor and implementing security there.
>
>If the problem is that the Wayland community so far has not considered security a priority, it seems that a security proxy may suffer from those same forces. Basically, will it be easier to attract developers or gain widespread adoption of a proxy as opposed to getting buy-in to do security directly in a compositor? You mention writing in a memory safe language and having a compositor neutral solution as technical advantages.
>
>Do you think a proxy is a good choice primarily because it can achieve a better technical result, or is the choice of a new component more a matter of difficulty getting community buy-in from a popular compositor and doing security there? How would you weigh the upsides of a new project against the difficulties of getting a new thing off the ground and adopted?
>
>(This is really just curiosity on my part and my $0.02 from the outside. You may have already had a lot of discussions about that, or even already tried talking to compositor folk and not gotten traction. Seems worth some explicit consideration.)
Most programs do zero things right, especially popular ones. With an effort, you could get one thing right. Getting two things right in the same program (like handling graphics hot-reconfiguration and complicated policy filtering) requires either heroic effort, or huge resources, or something like that.
Or, from a less jaded and more technical point of view: hijacking a compositor means you need to make sure that changes forced from the driver side do not break the security side, and people could forget. An «I am just a client» proxy could have the nice property that breaking compatibility with it usually comes together with breaking compatibility with Firefox (on the server side) or Plasma (on the client side); and breaking the safety properties it expects also increases the risk of crashes in mainstream usage.
[-- Attachment #1: Type: text/plain, Size: 13290 bytes --] On Sat, May 22, 2021 at 01:13:38PM -0400, Jean-Philippe Ouellet wrote: > Glad to see this taking form! > > First of all, in case the points of technical feedback further down > might be interpreted to suggest otherwise, I *am* in favor of > exploration of this idea. I do believe it to be worthwhile to > implement, just is perhaps not the ultimate nor sole solution one > might want, at least in the form currently conceived. > > On Sat, May 22, 2021 at 9:06 AM Alyssa Ross <hi@alyssa.is> wrote: > > [...] I propose a proxy program that sits between > > Wayland clients and the compositor, in the same privelege domain as the > > compositor. The proxy would decode and re-encode every Wayland request > > (client->compositor message), and would discard any request it didn't > > understand. > > > > This would mitigate the problem of a large, privileged > > program written in a memory-unsafe language being exposed to untrusted > > inputs. > > As attractive as this approach sounds at first, I do not believe it > mitigates the problem in a manner which I would be comfortable placing > a large degree of trust in. > > I see two fundamental weaknesses: > > 1. Opening pandora's box of distributed systems. > > This proposal turns the overall system of policy enforcement from what > is currently a nice single centralized component (the compositor) with > global observation, total visibility, total control, and nice > non-distributed-system guarantees on message ordering, delivery, etc. > into a decentralized and distributed system without those guarantees, > subject to all manner of race conditions, etc. This makes it much > harder to guarantee that intended policies are indeed enforced in the > manner expected. 
> > There are endless examples from the distributed systems literature of > how this can go wrong, but as a very relevant example from the same > application domain, I would like to bring attention to subtle race > conditions in the implementation of the Qubes clipboard-handling logic > which manifested in QSB #013 [1]. Refer to the linked advisory for > background and details. To summarize with additional context and > considerations as I see them applying here: The way the Qubes GUI > stack works is somewhat similar to what you describe, with two parts, > a "vmside" "agent" [2] (in the untrusted VM whose contents are to be > displayed) and an "xside" "daemon" [3] (in the semi-trusted GUI VM > running the window manager, traditionally dom0). The "vmside" > (gui-agent) implements a minimal X window manager, analogous to > sommelier [4], whereas the "xside" (so-called gui-daemon) implements > what is effectively analogous to the proxy you propose here living on > the compositor-side of the VM trust boundary. For each VM, a separate > vm-side gui-agent and x-side gui-daemon pair is created. A (possibly > outdated) version of the overall scheme and protocol is documented > here [5]. Things like safe clipboard handling is implemented via these > X11 proxies. Where things get a little... difficult... is the fact > that we do not have a single trusted arbiter with global visibility > and synchronous ordering guarantees, but rather a bunch of these GUI > daemon processes (which I posit should be viewed as a distributed > system) which must somehow successfully coordinate and synchronize to > enforce security guarantees. This is hard, and in the case of QSB > #013, corner cases were overlooked wherein an adversary could cause > states to be reached which violate user expectations in unsafe ways. 
> In hindsight, the fact that something was overlooked should not be a > surprising result, given the difficulty of the nature of the problem, > and that no formal methods had been applied (at the time of the > advisory) to attempt to verify the implementation. > > [1]: https://github.com/QubesOS/qubes-secpack/blob/master/QSBs/qsb-013-2015.txt > [2]: https://github.com/QubesOS/qubes-gui-agent-linux/blob/master/gui-agent/vmside.c > [3]: https://github.com/QubesOS/qubes-gui-daemon/blob/master/gui-daemon/xside.c > [4]: https://chromium.googlesource.com/chromiumos/platform2/+/HEAD/vm_tools/sommelier/README.md > > I see a couple ways to address weakness 1: > > a) exhaustively enumerate what interactions between these proxies > carry security concerns, formalize the security properties of the > protocols involved, and attempt to somehow verify that the > implementations correctly follow the protocols (I propose this mostly > as a straw man to illustrate the comparative apparent simplicity of > the following approach, though, again, I do not wish to discourage > anyone who might wish to try anyway -- for example I very much > appreciate [5], but we'd still need a bunch more work like it to have > meaningful coverage of the higher-level protocols like the clipboard > handling (which exists on top of the gui protocol, which might also > need a parallel model of the window manager for certain properties), > and then once completed initially, it still requires uncommon skill > sets to maintain in sync as the software it models evolves) > > [5]: https://roscidus.com/blog/blog/2019/01/01/using-tla-plus-to-understand-xen-vchan/ > > b) avoid the class of distributed systems issues altogether, by > avoiding ay need to coordinate among proxies, by instead embedding the > relevant decision logic at a single point which already has total > ordering and global visibility, such as within the logical boundary of > the compositor (note "logical boundary" -- which needn't necessarily > 
mean "within the a single memory-unsafe monolith with guest-reachable > attack surface" -- there's likely a middle ground of compositor > disaggregation and privilege separation which does not also turn > authorization logic into a distributed system) > > This point alone is not an argument against having a safer parser in > front, but rather suggests that a proxy seems unlikely to be the best > place for authorization logic, especially when said proxies need to > coordinate, and rather that such logic seems better suited for being > embedded within the compositor. Did you interpret my proposal as being about one proxy per client? Just to clarify: I imagine a single proxy, running in front of a single Wayland compositor, that all clients connect to (via Sommelier). I envisage the proxy having a complete view of the world, and presenting that view to the compositor (your next point notwithstanding). One change that would probably make sense to reinforce this is limiting an instance of the proxy to using a single plugin, so that that plugin doesn't have to worry about state that may be tracked anywhere else (except for the compositor). This would still allow users/distributors to provide custom policies without having to patch the proxy or compositor, but would mitigate the issues you've described here. > 2. State-synchronization issues, protocol interpretation differences, > "the WAF problem". > > Introducing a proxy introduces another implementation of a parser, > data model, and any applicable retained state. All of this has various > opportunities to become desynchronized at various layers. > > I suspect matters may be made more complicated by messages potentially > being only contextually legal. It might not be a sufficient condition > for safety for a message to be merely structurally well-formed, but > rather, it may only be legal depending on some state retained by the > compositor which now must be somehow duplicated in the proxy. 
(For > example, does this target have focus? Are we in some particular mode?) > > The parser and the compositor could become desynchronized at the > parsing layer, for example (e.g. malformed messages interpreted > differently, possibly leading to different interpretations of framing > boundaries, and each interpreting subsequent data in the stream as > different messages entirely). > > They might become desynchronized by having different data models -- a > field being added, or possible values changing. > > For decisions which require acting on state, the proxy must somehow > shadow the state of the compositor, and this state must be constructed > and mutated in the same way, lest they potentially become > desynchronized. > > If any desynchronization happens at any layer, a message which is > determined by the proxy to be safe, may be interpreted by the proxy to > mean something which is safe, yet have a different and potentially > unsafe effect on the compositor. > > This is rather similar to the fundamental problems plaguing all > so-called "web application firewalls", which have the perilous goal of > trying to determine what input might be safe / unsafe to some "too big > to be safe" giant ball of complexity behind it. To really know for > sure with accuracy would mean a crazy amount of state replication or > introspection into how the machine behind the proxy would interpret > the data, to the point that the complexity of the analysis in front > would exceed the thing it's trying to protect, at which point... > what's the point. The best one can hope for is stopping the > least-common-denominator attacks using heuristics, which is surely not > what one should aim for in this case. > > From a theoretical perspective, this has analogues in something like > the halting problem. From a langsec perspective, Wayland is a weird > machine, and a guest's messages to it can be thought of as the "input > program". 
The proxy's objective is then to compute a partial function > over the wayland input program to resolve "is this sequence of wayland > protocol messages safe?" (rephrased: "does this wayland-lang program > terminate?"), which may be undecidable, (though perhaps not in the > specific case) depending on the power (complexity class, effective > grammar) of the "safe" subset of the instructions exposed by the > wayland weird-machine. > > > Additionally, the proxy would support a plugin interface, > > through which the user of the proxy (or their distributor) could > > configure custom behaviour. This could be used to prompt the user for > > confirmation before allowing a screen capture request, or even to > > implement a similar thing for e.g. clipboard access, for which there is > > no support in the Wayland protocol. It could even be used to modify > > surfaces, to implement things like Qubes-style unspoofable coloured > > window borders. > > > > This approach would allow permissions systems and other custom Wayland > > behaviour to be implemented in a compositor-independent manner. > > Distributions which suppor tseveral compositors could implement > > customisations in a single place, and users of compositors which lack > > security features and the assurances memory-safety can provide against > > untrusted input would gain access to those things. > > For the reasons above, I wonder if implementing this as a library > providing hardened (memory-safe, etc.) wayland protocol parsers and > additional hooks for authorization (ebpf-like?), as a library to be > used by compositors (possibly also leveraging some internal privilege > separation for the more complex components -- something like [6] comes > to mind) might be a preferable approach. > > [6]: https://github.com/google/sandboxed-api I think pushing for a solution that has to be adopted by compositors isn't really something that Spectrum is in a place to do. 
The proxy idea was exciting because it required no special support from compositors, and that meant that we wouldn't have to pick a compositor, which I'm not willing to do at this point in Spectrum's life, because of how quickly the Wayland compositor ecosystem is moving. If the proxy won't get us the required level of security (and your WAF parallel here is very convincing) then the right thing to do is probably just wait until things have progressed a bit more, pick a compositor already written in a memory-safe language (anvil[1]?) and implement security features we want from there. An additional downside here is that I worry about the reaction from users when they can't choose to bring the compositor they already like with them to Spectrum. [1]: https://github.com/Smithay/smithay#anvil > > I'd like to hear feedback here, but I think early in the life of this > > idea we should also reach out to the broader Wayland community. I think > > there's a lot of potential for this idea beyond Spectrum, and it would > > be great if it could be something developed with input from a big > > breadth of Wayland users. > > Completely agree. > > Others in the broader secure desktops community who may be interested > that come to mind: > 1. Qubes OS (wayland has been on the to-do list for far too long) > 2. OpenXT (similar boat to qubes) > 3. Flatpak (separating out permissions besides "whatever your > compositor lets you do" seems a logical evolution of their permissions > model) > 4. Subgraph OS (less dead than it once seemed?) > 5. Chromium OS (possibly? idk) > 6. Genode (having a general interest in secure GUI architectures -- > [7] and much since) > 7. wlroots folks > 8. mutter/gnome folks (possibly? if the embedding-into-compositor > route is taken) > > [7]: https://genode-labs.com/publications/nitpicker-secure-gui-2005.pdf > > Cheers, > Jean-Philippe [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --]