Wayland versus X - a quick summary

Categories: Linux

A new display server has been developed to compete with the traditional X Window System - Wayland [1]. There appears to be a lot of confusion about what Wayland does, and how it compares to X.

Overview

A display server acts as an intermediary between applications and physical input or output devices, allowing multiple applications to share use of common mouse/keyboard and graphics cards. Typically, sharing of input devices means that ‘the application with focus’ receives input events, and typically sharing of graphics is done by allocating to each application one or more ‘windows’ (graphics areas which are subsets of the actual screen).
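
To make the "application with focus receives input events" idea concrete, here is a purely illustrative sketch in C - the window and input_event types and the send_event_to_client stub are hypothetical, invented for illustration, and not any real display server's API:

    /* Purely illustrative: hypothetical types, not a real display-server API. */
    #include <stdio.h>

    struct window { int id; };                    /* stands in for one client's window */
    struct input_event { int type, code, value; };

    static struct window *focused;                /* updated whenever focus changes */

    /* Stub standing in for "write the event onto that client's connection". */
    static void send_event_to_client(struct window *w, const struct input_event *ev) {
        printf("deliver event type=%d to window %d\n", ev->type, w->id);
    }

    /* The server owns the real devices; clients only ever see forwarded events. */
    static void dispatch_input(const struct input_event *ev) {
        if (focused)
            send_event_to_client(focused, ev);
    }

    int main(void) {
        struct window w1 = { 1 }, w2 = { 2 };
        struct input_event key = { 1, 30, 1 };    /* a key-press, say */
        focused = &w1; dispatch_input(&key);      /* delivered to window 1 */
        focused = &w2; dispatch_input(&key);      /* focus moved: delivered to window 2 */
        return 0;
    }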

Wayland and X both provide an API that applications can use to receive input events and tell the server what output to display. However, most desktop applications don’t interact with a display server directly; instead they use widgets from a toolkit such as GTK or Qt. The implementation of these widgets (ie toolkit code) interacts with the display server, while apps just configure the widgets to generate the desired output and receive input via ‘events’ from the widgets. The applications therefore interact with the toolkit and not the display server, and don’t actually care which display server is in use. Unfortunately there are some less common operations that toolkits don’t cover, requiring the app to interact directly with the display server; examples are screensavers and screenshot-capturing apps - ie the kind of apps provided as part of a Desktop Environment (DE) such as Gnome or KDE.

Wayland really is a system that focuses exclusively on device handling, while X does that and a lot more. It is therefore reasonable to think of Wayland as an extraction (factoring out) of the lowest layers of X. In fact, the Wayland project also provides an application named XWayland which is basically the “high level” part of X that remains after the low-level stuff has been factored out - and that can then run on top of Wayland to provide exactly the same functionality that the full X implementation previously did.

So what is the point of splitting X into two and then rewriting the lower layer? There are many benefits including the following:

  • The rewrite really is significant, ie Wayland’s handling of input and output devices is much improved over the original X implementation.
  • Systems (particularly embedded ones) can choose to just run Wayland without XWayland for reduced code-size and memory usage.
  • Graphics libraries such as Qt and GTK can optionally connect directly to Wayland rather than use the X11 protocol; this can provide better performance, smoother window dragging, etc. The disadvantage is that apps running against a toolkit configured in this mode cannot use the traditional X remoting protocols to send the output to some other server.
  • Handling devices requires elevated system privileges, so at least parts of X must run with these privileges, and bugs were regularly uncovered in X that allowed supposedly non-privileged X client applications to misuse X to gain those higher privileges. Wayland is much smaller, so fewer security problems are expected, and because XWayland runs as a layer on top of Wayland rather than being integrated into it, privilege-escalation bugs are much less likely to occur.

More Background

An X Server (the server component of the overall X Window System) is an application that runs on a computer with a graphics card. A graphical application can then use X client libraries to send drawing commands over the network to an X server, eg “draw a line”, “draw a box”, “display this bitmap”, “display this string in font zzz”. Note that the concepts of “client” and “server” are the reverse of perhaps more familiar examples such as database client/database server - the X server runs on your desktop, while the client can run somewhere in a datacenter. Think of an app that processes a large dataset and then generates graphical output: it makes sense for that “client” (the data analyser) to run on the larger computer. And as always, multiple clients (apps) can connect to a single server (the system with a screen attached).
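
As a rough sketch of what “sending drawing commands” looks like in practice, here is a minimal Xlib client (assuming libX11 is installed; compile with cc file.c -lX11). Xlib serialises these requests and sends them over a socket to whichever X server the DISPLAY environment variable points at - which may well be a different machine.

    /* Minimal traditional X client: a sketch, assuming libX11 is available. */
    #include <X11/Xlib.h>
    #include <stdio.h>

    int main(void) {
        Display *dpy = XOpenDisplay(NULL);            /* connect to $DISPLAY */
        if (!dpy) { fprintf(stderr, "cannot open display\n"); return 1; }

        int screen = DefaultScreen(dpy);
        Window win = XCreateSimpleWindow(dpy, RootWindow(dpy, screen),
                                         10, 10, 200, 100, 1,
                                         BlackPixel(dpy, screen),
                                         WhitePixel(dpy, screen));
        XSelectInput(dpy, win, ExposureMask | KeyPressMask);
        XMapWindow(dpy, win);

        GC gc = DefaultGC(dpy, screen);
        for (;;) {
            XEvent ev;
            XNextEvent(dpy, &ev);                     /* input events arrive over the same socket */
            if (ev.type == Expose) {
                /* "display this string" / "draw a line" style requests */
                XDrawString(dpy, win, gc, 10, 30, "hello", 5);
                XDrawLine(dpy, win, gc, 10, 50, 190, 50);
            }
            if (ev.type == KeyPress)
                break;
        }
        XCloseDisplay(dpy);
        return 0;
    }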

The problem with X is that the whole design no longer matches what client apps want to do - eg interact with 3d-capable GPUs, or use exactly the fonts they want (rather than asking the X server to use the font with a specific name, and hoping the server has that font available). And even when the X client and server run on the same host, communication still goes over a socket; some optimisations are made (a local Unix-domain socket rather than TCP), but this nevertheless adds latency. And the set of commands that X supports is now so large that the server is huge - making it buggy, full of security holes, and difficult to maintain.

To resolve some of the limitations of X with regards to hardware graphics acceleration (using 3d-capable GPUs etc), an X extension was created which allows a client application to ask X to allocate a “window buffer”, directly tell the GPU to render graphics into this buffer, and then ask X to display that buffer’s content on the screen. While very popular, this extension does not work well across networks - it really requires the X client application to be running on the same computer that the screen is attached to - which bypasses one of the greatest advantages X has over simpler protocols.
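
One concrete instance of this pattern, as seen from the application side, is GLX with a “direct” rendering context. The following is a sketch only, assuming libGL and libX11 are available (compile with -lGL -lX11); the buffer-allocation extension itself stays hidden behind the GLX calls. The client renders on the GPU itself and only asks X to present the finished buffer.

    /* Sketch of direct GPU rendering under X via GLX. */
    #include <GL/glx.h>
    #include <X11/Xlib.h>

    int main(void) {
        Display *dpy = XOpenDisplay(NULL);
        if (!dpy) return 1;

        static int attrs[] = { GLX_RGBA, GLX_DOUBLEBUFFER, None };
        XVisualInfo *vi = glXChooseVisual(dpy, DefaultScreen(dpy), attrs);
        if (!vi) return 1;

        /* Create a window whose visual matches what GLX chose. */
        Colormap cmap = XCreateColormap(dpy, DefaultRootWindow(dpy), vi->visual, AllocNone);
        XSetWindowAttributes swa = { .colormap = cmap, .event_mask = ExposureMask };
        Window win = XCreateWindow(dpy, DefaultRootWindow(dpy), 0, 0, 300, 300, 0,
                                   vi->depth, InputOutput, vi->visual,
                                   CWColormap | CWEventMask, &swa);
        XMapWindow(dpy, win);

        /* A "direct" context talks to the GPU without round-tripping drawing
         * commands through the X server - which is also why this path only
         * works when client and server are on the same machine. */
        GLXContext ctx = glXCreateContext(dpy, vi, NULL, True);
        glXMakeCurrent(dpy, win, ctx);

        glClearColor(0.0f, 0.3f, 0.6f, 1.0f);
        glClear(GL_COLOR_BUFFER_BIT);     /* GPU renders into the window's buffer */
        glXSwapBuffers(dpy, win);         /* ask X to display that buffer */
        XSync(dpy, False);

        /* ... a real program would now run an event loop ... */
        glXDestroyContext(dpy, ctx);
        XCloseDisplay(dpy);
        return 0;
    }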

Wayland defines only a very simple API for clients: it accepts bitmaps only, no “draw a line” stuff, ie very similar to the way X can perform local rendering. And Wayland provides no network support - clients are local only. Client apps can then code directly against the Wayland APIs (ie pass bitmaps, often generated by interacting directly with a GPU to render 3d graphics into a buffer). Fast, simple. Or clients can code against the original X API, and then communicate in the traditional way with the XWayland application (or equivalent), which executes the commands and passes the resulting buffer to the local Wayland server.
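
Below is a minimal sketch of the native “pass bitmaps” path (the first option above), assuming libwayland-client and a Linux kernel with memfd_create (compile with -lwayland-client). The client fills a shared-memory buffer with pixels itself and simply hands it to the compositor; making the surface a visible top-level window additionally needs a shell protocol (eg xdg_shell), omitted here for brevity.

    /* Sketch of a Wayland client handing a ready-made bitmap to the compositor. */
    #define _GNU_SOURCE
    #include <wayland-client.h>
    #include <sys/mman.h>
    #include <string.h>
    #include <unistd.h>
    #include <stdint.h>

    static struct wl_compositor *compositor;
    static struct wl_shm *shm;

    static void on_global(void *data, struct wl_registry *reg, uint32_t name,
                          const char *iface, uint32_t version) {
        if (strcmp(iface, "wl_compositor") == 0)
            compositor = wl_registry_bind(reg, name, &wl_compositor_interface, 1);
        else if (strcmp(iface, "wl_shm") == 0)
            shm = wl_registry_bind(reg, name, &wl_shm_interface, 1);
    }
    static void on_global_remove(void *d, struct wl_registry *r, uint32_t n) {}
    static const struct wl_registry_listener listener = { on_global, on_global_remove };

    int main(void) {
        struct wl_display *dpy = wl_display_connect(NULL);
        if (!dpy) return 1;
        struct wl_registry *reg = wl_display_get_registry(dpy);
        wl_registry_add_listener(reg, &listener, NULL);
        wl_display_roundtrip(dpy);               /* discover compositor + shm globals */

        /* The client renders the pixels itself, into shared memory. */
        const int w = 200, h = 100, stride = w * 4, size = stride * h;
        int fd = memfd_create("pixels", 0);
        ftruncate(fd, size);
        uint32_t *px = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        for (int i = 0; i < w * h; i++) px[i] = 0xff336699;    /* plain ARGB fill */

        struct wl_shm_pool *pool = wl_shm_create_pool(shm, fd, size);
        struct wl_buffer *buf = wl_shm_pool_create_buffer(pool, 0, w, h, stride,
                                                          WL_SHM_FORMAT_ARGB8888);

        /* No "draw a line" requests: the finished bitmap is simply attached. */
        struct wl_surface *surf = wl_compositor_create_surface(compositor);
        wl_surface_attach(surf, buf, 0, 0);
        wl_surface_commit(surf);
        wl_display_roundtrip(dpy);

        /* A real client would also use a shell protocol (eg xdg_shell) to make
         * this surface a visible toplevel window, then run an event loop. */
        wl_display_disconnect(dpy);
        return 0;
    }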

In practice of course, most apps will code to the GTK or Qt APIs, and it is GTK/Qt which is responsible for interacting with Wayland or X.
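
For example, the following GTK program (a sketch, assuming GTK 3) contains no X- or Wayland-specific code at all; whether it talks X11 or native Wayland is decided by how GTK itself was built and, at run time, by the GDK_BACKEND environment variable (eg GDK_BACKEND=wayland or GDK_BACKEND=x11).

    /* Toolkit-level app: identical source regardless of the display server.
     * Build (assuming GTK 3): cc app.c $(pkg-config --cflags --libs gtk+-3.0) */
    #include <gtk/gtk.h>

    int main(int argc, char *argv[]) {
        gtk_init(&argc, &argv);                       /* GDK picks X11 or Wayland here */

        GtkWidget *win = gtk_window_new(GTK_WINDOW_TOPLEVEL);
        gtk_window_set_title(GTK_WINDOW(win), "hello");
        g_signal_connect(win, "destroy", G_CALLBACK(gtk_main_quit), NULL);

        GtkWidget *label = gtk_label_new("Same code under X or Wayland");
        gtk_container_add(GTK_CONTAINER(win), label);

        gtk_widget_show_all(win);
        gtk_main();                                   /* widget events delivered to the app */
        return 0;
    }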

Note that there is significant overhead imposed by the X client->network->server separation that many people never need. Wayland turns this around - it assumes client and server are on the same host, and supports over-the-network communication as an extra layer on top, by having some “proxy client” handle network traffic and then act as a normal local client to the Wayland server. XWayland is just one possible ‘proxy’ application.

Remoting with Wayland

Nowadays most desktop users only run apps locally on the desktop, ie the client/server are on the same machine. But I’m old enough to remember the “thin client” wave, where the latest coolest thing for businesses was to have a low-powered desktop system that was just screen/keyboard/operating-system/X11, and all the apps were run on servers. The networking ability of X made this possible. And even now, sysadmins often appreciate the ability to run some admin-type apps remotely.

The easiest way to run client applications on a different host from the one the screen is attached to is simply to use the traditional X protocol on the client, and an X implementation running on top of Wayland on the server (eg XWayland). Or more likely, just ensure that an app that uses GTK or Qt is running with the GTK or Qt library configured to make calls to the X client libraries - and thus generate X-format output. This mode will be supported for a long time.

The alternative is to run an app against a GTK or Qt library which is configured to use native Wayland calls as its back end, and then provide a “proxying” Wayland library which exposes the standard Wayland API (ie appears to the app to be a normal Wayland server). Rather than displaying anything itself, this proxy compresses the bitmaps it receives from the application and sends them over the network to a corresponding proxy on the target host, which uncompresses the bitmaps and passes them to a real Wayland display server for output. Events generated by input devices are passed back in the equivalent manner. The elegance of this solution is that (a) the client app uses the same API whether talking to a local Wayland display server or a local proxy, and (b) no special hooks are needed in the Wayland display server implementation to support this proxying; the network protocol is cleanly layered on top. Work on the first implementation of this approach is currently in progress.

There is some debate about how efficient this “wayland proxying” approach is going to be. Some people say that transferring images is inherently inefficient compared to the X approach of transferring drawing commands. Others say that it will actually be more efficient. What is certain is that existing tools such as Citrix and RDP are very popular in the Windows world as ways to achieve exactly what X builds in (remote graphical applications) - and use this approach of sending images over the network rather than graphics commands.

Note that having a Wayland proxy transfer images is very much a per-window remoting approach, rather than a full “remote desktop” solution.

Footnotes

  1. Actually, Wayland is just an abstract API that can have multiple implementations. The term ‘Wayland’ is used above to mean either the API or some compatible implementation depending on context. For the purposes of this article, the distinction isn’t particularly important.