Python - Asynchronous Programming with Coroutines

Categories: Programming

Overview

The goal of asynchronous programming is not to run Python code in multiple threads, but instead to make the best possible use of a single thread to run a set of semi-independent data processing tasks.

This article gives an overview of how asynchronous programming is done in Python, and is an extension to another more general article on Python.

This article concentrates more on the “how” (ie underlying principles) and “why” (what are the tradeoffs compared to other parallel-programming techniques) than on the “what” (how to write asynchronous programs). I find that understanding the underlying principles makes using something far easier.

Dave Peticolas’ article on asynchronous programming with Twisted starts with an excellent introduction to asynchronous programming in general - highly recommended.

The official documentation on the standard asyncio module provides great detail, but not much context, and covers both older (obsolete) and newer approaches - ie it is IMO somewhat confusing.

WARNING: I am an experienced developer, but new to Python. There may therefore be errors/misunderstandings in this article - if so, feedback is very welcome!

DOUBLE WARNING: I have only theoretical knowledge of asynchronous programming in Python, ie have read various sources and considered the implications - but have not actually implemented anything except trivial test-cases.

Why Asynchronous Programming

The goal of asynchronous programming is not to run Python code in multiple threads, but instead to make the best possible use of a single thread. Because multiple os-level threads are not involved, the CPython GIL is not a limiting factor when using asynchronous programming.

Async programming is quite useful for a number of common problems, including:

  • implementing an http server which needs to handle multiple concurrent http requests
  • implementing a message-broker server which needs to handle multiple concurrent client requests
  • handling incoming requests which trigger calls to a remote database

As noted previously, asynchronous code is still limited to 1 CPU - it just makes better use of that CPU than blocking-style programming.

Asynchronous programming is particularly effective for programs that concurrently perform many tasks that are IO-bound. When concurrent tasks spend 99% of their time waiting for IO and 1% of their time actually executing on the CPU, then 100 such tasks fit on a single CPU…

Alternatively, when a system has N cpus, then spawning N instances of the Python app and using asynchronous programming within each instance is an effective way to get good performance.

From PEP-342 (generator-based coroutines), the motivation was to:

… be able to support performing asynchronous operations without needing to write the entire application as a series of callbacks, and without requiring the use of resource-intensive threads for programs that need hundreds or even thousands of co-operatively multitasking pseudothreads

Threads are sometimes not the optimal solution for such problems, due to:

  • needing a preallocated stack for each thread (ie significant memory-usage overhead)
  • requiring os-level context-switches (ie significant cpu-usage overhead)
  • having significant setup time
  • requiring care (synchronization) when sharing data between threads, and
  • competing for the CPython GIL (global interpreter lock)

Asynchronous programming (all kinds) has none of the above problems.

Different Kinds of Asynchronous Programming

Asynchronous programming requires code to be broken up into chunks of non-blocking user code, joined together with operations that either block or need to be repeatedly retried until they succeed (eg non-blocking IO operations). A scheduler (aka event-loop or reactor) then runs each chunk; blocking operations are handled by delegating to a pool of background threads. The thread running the scheduler also runs the user code chunks, repeatedly selecting chunks which are “runnable” (not waiting for input from a different chunk). When a chunk completes (produces a value), other chunks that are waiting for that value become available for the scheduler to run.

Support for asynchronous programming in Python has gone through four major revisions:

  • using callback functions (not coroutines)
  • using coroutines based on generators with generator.send
  • improved coroutines based on generators with yield from
  • new implementation of coroutines based on new keywords async/await (since Python 3.5)

These approaches are all used in various third-party libraries. Eventually, a new module asyncio was added to the Python standard libraries, based on experiences from external libraries. The first version of asyncio (Python 3.4) was based on generator-based coroutines with yield-from; since version 3.5 there is also support for async-based coroutines.

Hopefully the most recent approach based on keyword async (PEP 492) will be a long-term solution; the chances are good as the async approach has been copied from other languages where it has been successful (eg in node.js).

Several Python-based external frameworks use asynchronous programming techniques; Twisted (discussed below) is the best-known example.

Asynchronous programming techniques are of course also used in languages other than Python; eg:

  • nginx webserver (implemented in C) - a “worker” is effectively an async event-loop
  • node.js (Javascript)
  • Erlang processes (Erlang)
  • Ktor (Kotlin) - asynchronous programming with coroutines on the Java Virtual Machine

Asynchronous programming is similar to reactive programming (because of the scheduler aka event-loop).

Asynchronous Programming with Callbacks

Python frameworks (eg Twisted) have supported asynchronous programming without coroutines for a long time. Frameworks in other languages (eg node.js) also have successfully done this without coroutines.

However it does require adopting a somewhat odd programming style based on callbacks. Because code can never invoke a blocking operation in the middle of a function, it is necessary to break larger functions up into pieces that contain no blocking function-calls, and then pass references to these chunks of code (callbacks) around.

This article won’t address callback-based asynchronous programming any further - there are lots of examples on the internet if you are interested.

Module asyncio

The module “asyncio” has been in the Python standard library since version 3.4. It supports asynchronous programming based on multiple approaches:

  • callbacks
  • coroutines based on Generators with yield from
  • coroutines based on ‘async’ (which did not become part of Python until v3.5)

The last approach is highly recommended for new code.

AIUI, supporting all these different solutions in one library is done by representing them as instances of asyncio.Future, ie this Future type is the “unifying concept” that isolates the underlying implementations. Note that class concurrent.futures.Future has a similar concept, but the types are not related.

Note that many of the concepts of asyncio were first developed in third-party libraries before being standardised as part of the Python library.

The original PEP for asyncio is interesting, and describes how all of the above is supported. PEP 492 describes the changes to asyncio for async-based coroutines.

Module asyncio is strongly influenced by the Twisted framework, adding not only an event-loop but also protocol and transport types which are basically equivalent to those from Twisted.

Some example code for module asyncio starts processing with method asyncio.run(..) - a function added in Python 3.7. This is indeed the most elegant way to start a main event loop, but is only a thin wrapper around the event-loop functions that existed in earlier versions of Python - ie there is no need to wait for Python 3.7 to implement asynchronous programming with async-based coroutines.

Coroutine Common Principles

Asyncio’s support for both types of coroutine has a lot of similarity at the conceptual level, even when some of the APIs are different.

In both cases, the primary thread of the Python interpreter runs an event-loop. The event-loop has:

  • a set of runnable tasks
  • a set of tasks that are suspended, waiting for other tasks to complete
  • a pool of tasks to run in a threadpool
  • a pool of background threads for running “blocking” operations such as file IO or network IO

The loop basically picks a runnable task and executes it using the event-loop thread. When that task waits for output from some other task, control is returned to the event-loop, which moves the task to the “suspended set”. When a task produces a value, it is suspended and any task that is waiting for its output gets moved from the suspended set to the runnable set.

Code is forbidden from directly calling any blocking function; that would block the main application thread. Instead, a function should “queue a task” to perform the needed operation, and suspend itself until the output of that task is available - with the event loop (ie task scheduler) receiving control in order to decide what to execute next. This ability to “suspend” a function, and resume it later when its needed input is available, is what a coroutine provides.

Generator-based coroutines do this “suspend, and resume later” by calling “yield” or “yield from”. The async-based coroutines do this by calling “await”. In either case, the asyncio module (or other framework) provides a nice selection of functions for creating “tasks” that actually do blocking-like operations such as file or network IO.

This ability to “suspend and give control back” means that a single thread can run both the event-loop (scheduler) and the user code (coroutines). Blocking-like operations might be implemented as components that need to be called multiple times (eg code that uses operating-system-level nonblocking IO apis), or might be implemented by handing the operation off to a small pool of real threads which actually do block - while the scheduler thread continues to run. Code scheduled to run in the thread-pool should not contain much (or any) Python code, as that causes competition for the GIL - instead, such tasks should just perform native IO or run other native code that does not need the GIL.

The asyncio module provides a bunch of wrapper methods for common blocking operations such as fileIO, networkIO, sleep, etc.

Actually, the component that runs blocking code is defined in module asyncio using an Executor abstraction; a threadpool is one possibility (and the usual one).

Asynchronous programming with Coroutines based on Generators (pre-3.5)

The “yield from” approach works by having logic implemented as generators that use the “yield” operator to return values, and “yield from” to wait for other generator-based coroutines.

A task that can be run from the “main event loop” is always a generator whose code basically looks like:

  • do some processing
  • suspend until data is available from coroutine A (via yield-from)
  • do some processing
  • suspend until data is available from coroutine B (via yield-from)
  • etc
  • produce a value (by returning from the end of the function)

The “suspend until” steps are implemented by invoking “yield from {other coroutine}”. This in fact suspends the current generator and returns {other coroutine} to the caller - which is the event loop. The event loop then marks the coroutine that it just invoked as “suspended, depending on {other coroutine}” and adds {other coroutine} to its set of runnable tasks. At some time, that task will be chosen by the scheduler and executed; it may in turn use yield from itself, thus suspending for a while. However eventually that {other coroutine} will return a value (in the normal way) and terminate. Control returns to the event loop, which sees that a real value has been emitted, and then marks the generator that was “suspended depending on ..” as runnable, with dependent-value being whatever that {other coroutine} returned. And then at some time the event-loop will select that original task for execution; the “send” function will be used to pass the dependent-value in to that generator and it will resume. In effect, that “get data from coroutine A” step resumes with the desired data element.

Here is the official documentation for using “older” asyncio eventloops: https://docs.python.org/3.6/library/asyncio-dev.html#asyncio-dev

For more information on generators, and yield from, see my main article on Python.

Asynchronous Programming with Coroutines based on Async and Await

The technical details are specified in PEP 492 - Coroutines with async and await.

Here’s an example of a trivial async-based program:

import asyncio

async def returnit(value):
    print(f"returning {value}")

    # complete this coroutine with a return value
    return value

coro_a1 = returnit("a1-val")
coro_a2 = returnit("a2-val")

async def waitforit():
    print("running waitforit..")

    print("awaiting nested1..")
    # add the object returned by returnit to set of runnable tasks, and suspend this coroutine until it has completed
    await returnit("nested1")

    print("awaiting nested2..")
    # add the object returned by returnit to set of runnable tasks, and suspend this coroutine until it has completed
    await returnit("nested2")

    # and complete this coroutine with no return value

async def runit():
    print("running..")

    print("awaiting a1..")
    # add coro_a1 to set of runnable tasks, and suspend this coroutine until coro_a1 has completed
    v1 = await coro_a1  

    print("awaiting a2..")
    # add coro_a2 to set of runnable tasks, and suspend this coroutine until coro_a2 has completed
    v2 = await coro_a2

    print("awaiting waitforit..")
    # add the object returned by waitforit to set of runnable tasks, and suspend this coroutine until it has completed
    v4 = await waitforit()

    print("done")
    # and complete this coroutine with no return value

print("starting")
coro_main = runit()
asyncio.run(coro_main)  # python 3.7 or later (though similar functionality is available on python 3.5+)
print("stopping")

And here is the output:

starting
running..
awaiting a1..
returning a1-val
awaiting a2..
returning a2-val
awaiting waitforit..
running waitforit..
awaiting nested1..
returning nested1
awaiting nested2..
returning nested2
done
stopping

The call to asyncio.run starts the (new-style) event-loop, with one asynchronous object in the “runnable set”. The event-loop selects that single runnable, and invokes it. This results in execution of function runit until the line await coro_a1 at which point the coroutine is suspended (its local-variables and its current-bytecode-offset are saved in the async wrapper object) and control returns to the event-loop. The event-loop sees that coro_a1 is being waited on, so adds it to the set of runnables. The event-loop then chooses an element from the runnables set (there is only one) and executes it - resulting in a call to function returnit with local-variables containing value=”a1-val”. When that function returns, control returns to the event-loop which moves coro_main into the runnable set again. The event-loop then chooses a runnable from the set (just one is available), and calls it, etc.

Replacing the first few lines of runit with the following will ensure that multiple elements are added to the runnable set:

async def runit():
    # note: a plain list is not async-iterable, so an ordinary for-loop is used here
    for coro in [coro_a1, coro_a2]:
        val = await coro
        print(val)
    print("done")

or

async def runit():
    t1 = asyncio.create_task(coro_a1) # add to the set of runnable tasks
    t2 = asyncio.create_task(coro_a2) # add to the set of runnable tasks

    await t1 # suspend until t1 completes
    await t2 # suspend until t2 completes
    print("done")

Explicitly creating a “task” also causes it to be added to the runnable set; the await later returns immediately if the object has already completed.

The async/await keywords, together with async for and async with, were introduced in Python 3.5 (PEP 492); asynchronous generators and comprehensions followed in Python 3.6.

The event-loop in more detail

An event loop maintains:

  • a set of tasks (callable objects) that are in runnable state
  • a set of tasks (callable objects) that are waiting for other tasks to complete first
  • a pool of background threads for performing blocking operations

When a task completes (returns a value) then any tasks that are waiting for it to complete are moved from the waiting set to the runnable set.

On each pass through the event-loop, one task is taken from the runnable set and invoked. It either completes, or suspends itself with a dependency on another task.

Other Related Frameworks

Module asyncore

Python’s standard library includes module asyncore which has primitive asynchronous-programming support. However it really is basic: sockets can be registered with it, along with callbacks to invoke when data is available on those sockets. An event-loop then uses operating-system-level “asynchronous IO” apis to detect when data is available on any of the sockets, and calls the associated callbacks one after another.

If an invoked callback needs to perform a blocking operation - well, there is no support offered for that. Module asyncore therefore really does not qualify as an asynchronous programming framework.

Zope

The well-known Zope CMS (content management system) uses threads rather than async programming.

Twisted

The well-known Twisted framework is based on an event loop that it calls a reactor.

There is an excellent tutorial on Twisted (from 2010) that walks through the Twisted framework in great detail. Eventually, it explains that Twisted was originally callback-based, but also supports generator-based coroutines.

As mentioned in the Twisted tutorial at part 8:

Deferreds are a solution (a particular one invented by the Twisted developers) to the problem of managing callbacks. They are neither a way of avoiding callbacks nor a way to turn blocking callbacks into non-blocking callbacks.

Data can be sent via a call to self.transport.write(data). This would seem to be a potentially blocking call - but actually, this just adds a standard Twisted operation (send from a buffer) to a list of things to be done. Twisted can later execute that send-operation piece-by-piece using non-blocking-io (from the main event loop). Similarly, the call to close the connection is simply a standard Twisted operation that is dependent on the previous operation completing - ie is also something that can be added to a list of tasks to be done by the reactor. In other words, this particular code example is a special case (even if a common one): a series of operations are being done which do not include any user code (send, and close) and none of the operations require any data from a previous step.

Dealing with invocation of operations that might block, and where you also need the result in order to continue processing, is addressed in part 13 of the tutorial. And there, it is explained that Twisted takes the usual approach here: the developer must separate the code that runs before the blocking operation, and the code that runs after the blocking operation, into separate callback objects. This is doable, but ugly. Using coroutines makes such code look much more readable.

And then in part 17, it explains that generator-based coroutines are also supported. In fact, newer versions of Twisted (since the tutorial was written) support async-based coroutines too.

We can think of … the generator as a series of callbacks separated by yield statements, with the interesting fact that all the callbacks share the same local variable namespace, and the namespace persists from one callback to the next.

Other Notes

Async-for and Async-with are described here.

If an app creates an async-based coroutine object, but its destructor is called without it ever having been “awaited on”, then a warning is logged - there is presumably a bug somewhere.

An async-based coroutine object can never be awaited-on twice.

References