Sane Session-Switching

In a previous article I talked about the history of VT switching. Created as a simple way to switch between text-mode sessions, it has grown into a fragile API to protect one testosterone monster (also called XServer) from another. XServers used to poke at PCI BARs, modify MMIO registers and mess around with DMA controllers. With XServers hauling out such big guns, it was almost absurd to believe that a simple signal-flinging VT could ever negotiate between them successfully. Fortunately, today’s XServer is a repentant sinner. With common desktop hardware, all direct I/O is done in the kernel (thanks KMS!) and the chances of screwing up your GPUs are rather minimal. This finally allows us to implement proper device-handover during session-switches.

If we look at the session as a whole, the XServer isn’t special at all. Today, many session-daemons run that provide some service to the session: pulseaudio, dbus, systemd --user, colord, polkit, ssh-keychain, and a lot more. None of these daemons needs any special synchronization during a session-switch. So why does the XServer require it?

Any graphics-server like the XServer is responsible for providing a session with access to input and graphics devices. When a session is activated, the graphics-server needs to re-initialize the devices. Before a session is deactivated, it needs to clean up the devices so the to-be-activated session can access them. It has to do this because there is no infrastructure to revoke its access. If a session did not clean up its graphics devices, the kernel would prevent any new session from accessing them. For input devices it is even worse: if a session doesn’t close the devices during deactivation, it continues reading input events while the new session is active. So while you type in your password, the background session might send these keystrokes to your IRC client (which is exactly what XMir did). What we need is a kernel feature to forcibly revoke access to a graphics or input device. Unfortunately, it is not as easy as it sounds. We need to find someone who is privileged and trusted enough to do this. You don’t want your background session to revoke your foreground session’s graphics access, do you? This is where systemd-logind enters the stage.

systemd-logind already manages sessions on a system. It keeps track of which session is active and sets ACLs in /dev to give the foreground session access to device nodes. To implement device-handover, we extend the existing logind-API with a new function: RequestDevice(deviceNode). A graphics-server passes the file-system path of a device-node in /dev to systemd-logind, which checks permissions, opens the node and returns a file-descriptor to the caller. But systemd-logind retains a copy of the file-descriptor. This allows logind to disable it as long as the session is inactive. During a session-switch, logind can now disable all devices of the old session, re-enable the devices of the new session and notify both sessions of the switch. We now have a clean handover from one session to the other. With this technology in place, we can start looking at real scenarios.
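The handover described above can be modelled in a few lines. The following is a minimal Python sketch, not logind's actual implementation. The class names, the request_device()/switch_to() methods and the use of PermissionError with EACCES for a disabled descriptor are all assumptions made for illustration. The point it shows is that logind keeps its own reference to every descriptor it hands out. That lets it disable the caller's access without the caller's cooperation.

```python
import errno
from typing import Dict, List, Optional, Tuple


class DeviceFD:
    """Hypothetical stand-in for an open device node shared by logind and a session."""

    def __init__(self, path: str, enabled: bool) -> None:
        self.path = path
        self.enabled = enabled

    def read(self) -> str:
        # A disabled descriptor fails instead of leaking events to an inactive session.
        if not self.enabled:
            raise PermissionError(errno.EACCES, "device disabled by logind", self.path)
        return f"event from {self.path}"


class Logind:
    """Toy model of logind device-handover (not the real logind code)."""

    def __init__(self) -> None:
        self.active: Optional[str] = None
        self.retained: Dict[str, List[DeviceFD]] = {}  # session -> fds logind kept
        self.signals: List[Tuple[str, str]] = []       # notifications sent to sessions

    def request_device(self, session: str, path: str) -> DeviceFD:
        # Permission checks are omitted here; the real daemon checks the caller first.
        fd = DeviceFD(path, enabled=(session == self.active))
        self.retained.setdefault(session, []).append(fd)  # logind keeps a reference
        return fd

    def switch_to(self, new: str) -> None:
        old = self.active
        if old is not None:
            for fd in self.retained.get(old, []):
                fd.enabled = False              # disable every device of the old session
            self.signals.append((old, "inactive"))
        for fd in self.retained.get(new, []):
            fd.enabled = True                   # re-enable the new session's devices
        self.active = new
        self.signals.append((new, "active"))


if __name__ == "__main__":
    logind = Logind()
    logind.switch_to("A")
    kbd = logind.request_device("A", "/dev/input/event0")
    print(kbd.read())                           # works while A is in the foreground
    logind.switch_to("B")
    print("A still has access:", kbd.enabled)   # False: logind disabled it
```

Session A never has to close anything. Because logind holds the same descriptor, the switch alone cuts A off.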

1) Session-management with VTs

[Figure: Session management with VTs and logind (VTs for foreground control, logind for device management)]

Based on the graphs for VT-switching, I drew a new one that includes logind. VTs are still used to switch between sessions, but sessions no longer open hardware devices directly. Instead, they ask logind as described above. The big advantage is that VT-switches are no longer fragile. The session on the active VT can be sure that it has exclusive hardware-access. And if a session is dead-locked, we can force a VT-switch and revoke its device-access. This lets us recover from a hung XServer without SSH’ing in from a remote machine or using SysRq.

2) Session-management without VTs

[Figure: Pure logind session management (session management based solely on logind)]

While sane VT-switching is a nice feature, the biggest win is that we can implement proper multi-session support for seats without VTs. Previously, only a single session could run on such seats. With logind device-management, we can now support session-switching on any seat.

Instead of using VTs to notify sessions when they are activated or deactivated, we use the logind-dbus-API. A graphics-server requests input and graphics devices via the logind RequestDevice API and uses them while active. Once a session-switch occurs, logind disables the device file-descriptors and switches sessions. A dbus signal is sent asynchronously to both the old and the new session. The old session can stop rendering while inactive to save power.

3) Asynchronous events and backwards-compatibility

One thing changes almost unnoticed when using RequestDevice. An active graphics-server might be just about to display an image on screen when a session-switch occurs. logind revokes access to the graphics devices and sends an asynchronous event saying the session is now inactive. However, the graphics-server might not have received this event yet. Instead, it tries to invoke a system-call to update the screen. This fails with EACCES or EPERM, because the graphics-server no longer has access to the device. Currently, most graphics-servers treat this as a fatal error. They do not interpret EACCES as “this device is now paused”; they ignore the specific error code and abort. We could fix all graphics-servers, but to simplify the transition, we introduced negotiated session-switches.
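A graphics-server that tolerates this race treats EACCES and EPERM from a device call as “paused” rather than as fatal. The sketch below assumes a hypothetical page_flip() that stands in for whatever system-call the server issues to update the screen. The rest is plain Python and only illustrates the error-handling pattern.

```python
import errno


def page_flip(device_enabled: bool) -> None:
    """Hypothetical stand-in for the system-call a server uses to update the screen."""
    if not device_enabled:
        # Access was revoked by logind before the server saw the signal.
        raise PermissionError(errno.EACCES, "Permission denied")


class GraphicsServer:
    def __init__(self) -> None:
        self.paused = False
        self.frames_shown = 0

    def render(self, device_enabled: bool) -> None:
        if self.paused:
            return  # inactive: skip rendering to save power
        try:
            page_flip(device_enabled)
            self.frames_shown += 1
        except OSError as err:
            if err.errno in (errno.EACCES, errno.EPERM):
                # Revoked before the "inactive" signal arrived: pause, don't abort.
                self.paused = True
            else:
                raise  # any other error is still a real failure

    def on_session_active(self) -> None:
        """Called when the asynchronous "session active" signal is received."""
        self.paused = False
```

The server stays paused after the EACCES, even if the device would work again, until it actually receives the activation signal.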

Whenever logind is asked to perform a session-switch, it first sends a PauseDevice signal for every open device to the foreground graphics-server. The graphics-server must respond with a PauseDeviceComplete call to logind for each device. Once all devices are paused, the session-switch is performed. If the foreground session does not respond in a timely manner, logind forcibly revokes device access and performs the session-switch anyway.
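The negotiation follows a simple pattern. logind sends PauseDevice for every device and waits for PauseDeviceComplete until a deadline. It then revokes whatever is left and switches anyway. In the sketch below, the signal is modelled by a callback and the completion calls by a queue.Queue. These stand in for the dbus traffic, and the function name and its parameters are hypothetical.

```python
import queue
import time
from typing import Callable, Iterable, List


def negotiated_switch(
    devices: Iterable[str],
    send_pause: Callable[[str], None],   # stands in for the PauseDevice signal
    completions: "queue.Queue[str]",     # stands in for PauseDeviceComplete calls
    timeout: float,
    revoke: Callable[[str], None],
    switch: Callable[[], None],
) -> List[str]:
    """Pause all devices, wait for acknowledgements, then switch sessions.

    Returns the devices that had to be revoked forcibly.
    """
    devices = list(devices)
    for dev in devices:
        send_pause(dev)
    pending = set(devices)
    deadline = time.monotonic() + timeout
    while pending:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            pending.discard(completions.get(timeout=remaining))
        except queue.Empty:
            break                        # the graphics-server did not answer in time
    forced = sorted(pending)
    for dev in forced:
        revoke(dev)                      # forcible revocation of unacknowledged devices
    switch()                             # the switch happens either way
    return forced
```

A cooperative server acknowledges every device, and the switch proceeds without any forced revocation. A hung server gets all of its devices revoked once the timeout expires.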

Note that negotiated session-switches are only meant for compatibility. Every graphics-server is highly encouraged to handle EACCES gracefully!

All my local tests have run fine so far, but all this is still under development. The systemd patches can be found on GitHub (frequently rebased!). Most of my tests rely on an experimental novt library, also available on GitHub (I will push it next week; it is only for testing!). Feedback is welcome! The RFC can be found on systemd-devel. Now I need a day off...

Happy Switching!

27 thoughts on “Sane Session-Switching”

  1. Aigars Mahinovs (@aigarius)

    At first glance it looks fragile that there is a different layering for seat0 versus all other seats. It is different enough that it is bound to create errors that go unnoticed for a long time, just because almost everyone uses seat0.
    I would suggest breaking that.
    If I were you, I would promote a default configuration in which seat0 is reserved for situations where systemd has failed or is not started at all. If systemd is started and working, then you cannot get to seat0; all devices that were on seat0 get automatically reassigned to seat1 and proceed as normal from there.
    With the above in mind, I would recommend the defaults of Ctrl+Alt+F1 as the login shell (greeter), Ctrl+Alt+F2 as the first session, Ctrl+Alt+F3 as the second session and so on.
    Optionally there might be a consideration of seat0 access from any other seat via something like Ctrl+Alt+F9-F12, but this would be funky if, for example, seat4 user connects to seat0 tty1 and then seat3 user tries to do the same.
    But on the other hand, that could be an opportunity to think about seat-sharing: it would be an interesting function to allow two or more seats to share a session. This also brings up the question of remote sessions and sharing between a local and a remote session.

    1. David Herrmann Post author

      First of all, if you disable VTs, seat0 will be the same as all others. That’s the ultimate goal. As long as there are VTs, I *really* think we should be backwards-compatible. We want legacy applications like the XServer to work alongside logind-sessions. We cannot forcibly disable seat0 if booted via systemd. We currently _need_ VTs!
      Also, seats are really bound to physical interfaces. We should never assign two seats to the same interface. We cannot *switch* between them. All seats are always active. You also cannot “share” seats. Really. Same for sessions. What you _can_ do is share objects or devices across session or seat boundaries. And this is what I think should be worked on (if someone needs that).

      Apart from that, I like your default-rules for ctrl+alt+Fx. Currently, each active session is responsible for capturing these. Without VTs, we don’t have the standard Fx-VTx 1-to-1 mapping, so an application needs to make sense of these keys itself. I haven’t looked into it yet, as it’s a rather trivial problem. But I’d propose just forwarding the “Fx”-number to logind, which then maps it accordingly. We could then implement any default-mappings and users could adjust them to their needs.
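To make the proposal concrete, here is a hypothetical mapping that logind could apply to a forwarded Fx-number. It uses the defaults suggested above (F1 for the greeter, F2 for the first session, F3 for the second, and so on) plus per-user overrides. Neither the function nor the override table exists in logind; both are illustrative assumptions.

```python
from typing import Dict, List, Optional


def map_fkey(
    fnum: int,
    sessions: List[str],
    overrides: Optional[Dict[int, str]] = None,
) -> Optional[str]:
    """Hypothetical logind-side mapping of a forwarded Ctrl+Alt+F<fnum> key.

    Returns the name of the session to activate, or None if nothing is mapped.
    """
    if overrides and fnum in overrides:
        return overrides[fnum]          # user-adjusted mapping wins
    if fnum == 1:
        return "greeter"                # default: F1 is the login screen
    idx = fnum - 2                      # default: F2 is the first session, F3 the second...
    if 0 <= idx < len(sessions):
        return sessions[idx]
    return None
```

With this design, compositors only forward a number and never need to know the seat's session list.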

    1. David Herrmann Post author

      Regarding that bug: Why would you assign two sessions to the same VT? That doesn’t make sense. Either disable VTs or correctly assign them to different VTs. logind *must* follow the VT-rules on seat0 for backwards-compatibility reasons.

      1. Christopher James Halse Rogers

        Because both sessions are on a VT, namely VT 7? They’re both running under the system-compositor, which is on VT 7.

        Disabling VTs is what we’d like to do – eventually – but does somewhat burn our debugging bridges :)

      2. David Herrmann Post author

        Yes, I understand what you are trying to achieve. It’s just that this cannot work with VTs, which is why I asked why you would do that. But on second thought, maybe we can teach logind to allow seamless switching between sessions on the same VT via the logind-API. However, these sessions would be bound to the logind-dbus-API and cannot use VTs. They would get pretty confused when waiting for VT signals.

        The downside of all this is: how does the system-compositor get notified? It doesn’t run in a session, so strictly speaking it shouldn’t access VTs. Ehh, long story short: system-compositors+VTs are broken. If you run a system-compositor, it has to be *always* active. No debug hooks, no VT switching; it must be the only process accessing graphics and input devices.

  2. Christopher James Halse Rogers

    Obviously the sessions under the system-compositor wouldn’t touch the VTs. Indeed, the system-compositor itself handles¹ a significant fraction of what logind sessions need to track. Sessions under the system-compositor get their input and graphics stack from the system-compositor, so we really only need the ACL and user bits of session switching.

    It’s basically a two-layer session system – there’s the raw VT sessions + the system compositor on a VT and the sessions under the system-compositor. You need to do different things for different layers.

    ¹: In the correct case – at the moment we’re in the marvellous no-man’s land of semi-VTs and unimplemented cooperative resource relinquishment.

    1. David Herrmann Post author

      (sorry, the WordPress interface only allows 2-layered replies..)

      Yes, in fact the two-layer session system is what breaks here. You want VTs as the master layer (with a linux-console, system-compositor or legacy system running on each) and inside of a compositor you want another additional session-layout. logind wasn’t designed for that. So either you implement logind itself in the compositor or you bind logind session-management to the compositor and leave other VTs out in the cold (they’re only for debugging..).
      At least that’s what I think. Maybe you can come up with a fancy workaround?

  3. Drago

    Hi,
    How are these ideas received in the community? I mean, can we expect that some time in the future this will be the way sessions/seats are managed in the distributions? Along with convincing Linus to kill VTs :)

    1. David Herrmann Post author

      This idea is driven by wayland development. We basically already have something similar with the weston-launch, mutter-launch, .. helpers (but unreliable!). Furthermore, systemd maintainers support it, too. So yeah, it is quite likely that we will see this soon.

  4. greatemerald1

    This whole transition is very reminiscent of the whole Real mode/Protected mode mess way back then. The first idea when creating infrastructure most often seems to be “Let’s have the applications handle that, I’m sure the authors are smart enough to make sure everything works correctly”. Cue horrible software. And only then it’s “On the other hand, maybe we should make sure that no matter how horrible the software is, it wouldn’t screw everything up!” :)

    1. David Herrmann Post author

      For DRM-devices we already have DRM_IOCTL_SET_MASTER and DRM_IOCTL_DROP_MASTER. For input-evdev devices, there are pending patches for EVIOCMUTE or EVIOCREVOKE. These aren’t upstream yet.
      Obviously, for every device-type we want to support, we have to introduce something similar.

  5. Anduchs

    Coming back to the SET_MASTER and REVOKE/MUTE things regarding security and robustness (in terms of being robust against XMir-like behaviour)…

    From what I know, it seems that a malicious display-server could just unmute the input devices after a session switch and eavesdrop on another session…
    Or did I miss some privilege-differentiation between logind and the session-manager?

    Maybe this current transition would be the right point in time to actually fix this (security-related) issue as well?

    Some proposals:
    - Allow EVIOCMUTE/REVOKE only from the PID that created the FD?
    - Dup the FD and drop the MUTE/REVOKE rights on the new FD (the one that is passed on)?
    - Allow Unmute only from the PID that called Mute?
    - Move the ioctl to sysfs or some other interface?
    - Restrict MUTE/REVOKE to root / a logind-capability, since the session-manager is non-root?

    (Actually, thinking of it, is the last proposal already implemented? I guess for SET_MASTER it is, but also for EVIOCMUTE?)

    Great work anyways !

    Cheers,
    Andreas

    1. David Herrmann Post author

      Your analysis is correct. For DRM_MASTER we already have a CAP_SYS_ADMIN restriction. We cannot modify it for historical reasons.

      For evdev, we are still working on a solution. The current idea is to provide EVIOCREVOKE(). This ioctl is called on an fd and basically disables it (actually the whole open-file context so all dup()ed fds are affected). You cannot revert this, though. So once an evdev fd is revoked, it’s almost useless (but not invalid!).

      Whenever a session becomes active again, logind opens a new fd and passes it to the session. So all we need to do is make sure logind can access /dev/input/eventX but normal sessions cannot. This is *way* more flexible than requiring CAP_SYS_ADMIN. Even though we need to run logind as root (includes CAP_SYS_ADMIN) on current systems, we still should make sure any new APIs don’t require this.
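The semantics described here can be modelled directly. A revoke affects the whole open-file context, including dup()ed descriptors, and cannot be undone, so reactivation requires a freshly opened descriptor. The following is a Python model with hypothetical class names, not the kernel interface. In particular, RevokedError is a placeholder, because the exact way a revoked descriptor reports failure was not settled at the time.

```python
class RevokedError(Exception):
    """Placeholder for however a revoked descriptor reports failure."""


class OpenFileContext:
    """Models the per-open() state that all dup()ed descriptors share."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.revoked = False


class Descriptor:
    def __init__(self, ctx: OpenFileContext) -> None:
        self.ctx = ctx

    def dup(self) -> "Descriptor":
        return Descriptor(self.ctx)  # same context, like dup(2)

    def revoke(self) -> None:
        self.ctx.revoked = True      # one-way: there is deliberately no un-revoke

    def read(self) -> str:
        if self.ctx.revoked:
            raise RevokedError(self.ctx.path)  # useless, but still a valid object
        return f"event from {self.ctx.path}"


def open_device(path: str) -> Descriptor:
    """Each call models a fresh open(2) by logind and therefore a fresh context."""
    return Descriptor(OpenFileContext(path))
```

When the session becomes active again, logind would call open_device() once more and pass the new descriptor on. The old, revoked one never comes back to life.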

      1. Anduchs

        Thinking back to the wiimote-drivers and stuff, I guess REVOKE would be quite a pain to use there…

        With a MUTE call, it would be way more convenient for the application, since it would work seamlessly…
        Regarding MUTE, who will do the UNMUTE and what capabilities are needed for it?
        How about restricting UNMUTE to CAP_SYS_ADMIN as well?

        btw, could this be a new CAP? CAP_SYS_ADMIN is used quite a lot and I probably wouldn’t want it for my logind to begin with… maybe CAP_SESSION_MGMT or similar could make sense?

        Also, how is REVOKED signalled? EACCES?

      2. David Herrmann Post author

        No-one except logind is supposed to use MUTE or REVOKE. So if an application needs raw device access (e.g., to access wiimotes), it would simply call RequestDevice() on logind and get a file-descriptor. This is *not* exclusive to a compositor; any process in a session can do that!
        For security reasons, a compositor _might_ lock access to RequestDevice(), but then you simply ask the compositor for a device and it forwards the descriptor to you. You will then get the Pause/Resume events from the compositor.

        But maybe I don’t understand the use-case exactly?

        MUTE will not get implemented exactly for the capability reasons. We don’t want that. It’s static and ugly. Regarding REVOKE, hmm, patches are not merged but I guess it’ll send a SYN_REVOKED event via the input stream (thanks for pointing that out, I haven’t thought of it!).

  6. Anduchs

    (in reply to #comment-576)

    The use case comes down to protecting a “good” session from a compromised session…
    I thought the MUTE/UNMUTE approach would be an option and in that case having the session “not be supposed to use UNMUTE” would not be sufficient…

    I guess the only abuse I could still see is a session program (e.g. a malicious firefox) asking logind for an input FD and then calling REVOKE on it. That would lead to a denial-of-service of the input device for the complete session until the next session-switch. (Note: malicious could also mean dysfunctional.)

    I guess that we already have a huge increase in security and stability with this scheme…
    … as long as logind will only answer to requests from session-manager and not subsequent programs. Otherwise, any program can intercept and/or REVOKE an input device any time…

    The other part was just about programming convenience for user-session drivers… A MUTE/UNMUTE scheme would be less complex for the user-session driver than having to switch-case on SYN_REVOKED (or whatever it’ll be called) and re-request an FD for the input device…
    Also sounds rather time-consuming with all the context-switching…

    Cheers, Andreas

    1. David Herrmann Post author

      If two different applications call RequestDevice() on logind, they will end up with _two_ different contexts. So if one gets revoked, the other is still valid! But logind correctly revokes all during session-switch. So your firefox example doesn’t work this way, fortunately.

      Also note that there is nothing like a master-session process. All processes in a session normally run as the same user and have thus the same capabilities. However, *if* you start a compositor, the compositor wants some guarantees that it runs exclusively on this session. So besides RequestDevice() I also introduced session-controllers with scopes. That means, the first process in a session that tries to become a graphics+input controller will get it. Only this process is allowed to call RequestDevice() on input and graphics devices. Once it’s done, it can release the control again so another compositor can start.
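The session-controller rule boils down to a small state machine. The first process to ask gets control, only the controller may request input and graphics devices, and control can be released for the next compositor. The method names below (take_control(), release_control(), request_device()) and the error type are hypothetical, chosen for this sketch.

```python
from typing import Optional


class NotController(Exception):
    """Raised when a process that is not the session-controller asks for control or devices."""


class Session:
    def __init__(self) -> None:
        self.controller: Optional[int] = None  # PID of the current controller

    def take_control(self, pid: int) -> None:
        # First come, first served; re-taking by the same PID is harmless.
        if self.controller is not None and self.controller != pid:
            raise NotController(f"session already controlled by {self.controller}")
        self.controller = pid

    def release_control(self, pid: int) -> None:
        if self.controller == pid:
            self.controller = None  # another compositor may now take over

    def request_device(self, pid: int, path: str) -> str:
        # This sketch only models input and graphics device nodes.
        if pid != self.controller:
            raise NotController(f"pid {pid} is not the session-controller")
        return path  # stand-in for the descriptor logind would return
```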

      We do this for security reasons. It isn’t strictly necessary, because if a user runs a process in a session, they normally trust it. It’s other sessions you don’t trust, and this is what RequestDevice protects you from. However, we also want to protect against misbehaving session-processes. That’s what the session-controllers protect you from. If your compositor has control and shows a password prompt, it can be sure that no other session-process can currently access the input devices. But really, this isn’t strictly necessary! If a malicious process runs in the session, it could simply kill the compositor and take over. Inside a session, all processes are equal!

      1. Anduchs

        OK, the different contexts rule out that issue, that is great.
        And I really like the session-controller scope. I think it will be of great value for hardened UI systems…

        That’s also why I wouldn’t dismiss the security aspect of the implemented feature. Once you drop some capabilities or confine processes with AppArmor/Yama/Tomoyo/…, they may no longer be able to kill or interfere with other (same-user-and-session) processes. They could, however, still revoke some device for the session if granted input-access. Since you have the session-controller protection and multiple FD-contexts, it’s of course not a problem, but worth mentioning… :-)

    2. David Herrmann Post author

      Regarding the MUTE/UNMUTE: your example doesn’t work. Imagine we limit UNMUTE to CAP_SESSION_MGMT and run the compositor _with_ this capability. This would allow the compositor to mute and unmute the FDs that it passes to user-session-drivers. Obviously something useful. However, it would also allow to call it on the FDs it gets from logind! Which is *bad* and circumvents this whole system.

      The problematic concept is stacking. With capabilities you have *always* only a 2-layer stack: logind->session
      With REVOKE (which doesn’t depend on capabilities) you can have an endless stack: logind->session->userDriver
      With REVOKE, a compositor can simply request a device twice from logind: once for itself and once for the userDriver. Now if the compositor wants to take the device away from the user-driver, it calls REVOKE on the user-driver’s copy. On a session-switch, logind calls REVOKE on behalf of both.
      This sounds like overhead (logind, compositor *and* driver need the same fd). But note that whenever you want multi-layer access-management, you *somewhere* need this. You can hide it in the kernel, but I doubt it makes it somehow better.
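The stacking argument can be illustrated with a model in which every request to logind yields a separate context. The compositor requests the device twice and hands one context to the user-driver. Later it can revoke only that one, while logind still revokes both on a session-switch. The same separation keeps one process's REVOKE from affecting another process's descriptor. All class and method names here are hypothetical.

```python
from typing import List


class Context:
    """One open-file context; revoking it does not touch any other context."""

    def __init__(self, owner: str) -> None:
        self.owner = owner
        self.revoked = False

    def revoke(self) -> None:
        self.revoked = True


class Logind:
    def __init__(self) -> None:
        self.handed_out: List[Context] = []

    def request_device(self, owner: str) -> Context:
        ctx = Context(owner)          # every request is a separate open()
        self.handed_out.append(ctx)   # logind remembers all of them
        return ctx

    def session_switch(self) -> None:
        for ctx in self.handed_out:   # revoke on behalf of every layer
            ctx.revoke()


class Compositor:
    def __init__(self, logind: Logind) -> None:
        self.own = logind.request_device("compositor")
        self.for_driver = logind.request_device("userDriver")  # passed on to the driver

    def drop_driver_access(self) -> None:
        self.for_driver.revoke()      # compositor-level revocation; its own copy survives
```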

      1. Anduchs

        Actually, my thought was on granting this CAP_SESSION_MGMT only to logind and have it be the single instance to mute/unmute… However, the argument for the cases of “more than 2 layer”-stacks is of course a good one. Had not thought of this…
        So it comes down to simplicity (2 layer stack with simpler drivers) vs flexibility (3 and more layer stack with SYN_REVOKED handlers). I guess, I’d go with the more flexible one as well (since all code must be touched anyways), now that I know of the additional things this allows you to do… :-)

        Thanks for the patient explanations…

        Cheers,
        Andreas

    1. David Herrmann Post author

      Yeah, I could write a lot more. But I want to hold back on the technical details until the discussion on dri-devel has concluded. Once we have agreed on an implementation, I will try to write about it again.
