4: Implementing simple generator coroutines in C++20
Coroutines: Basics, Py->C++20,
Benchmarking
- Introduction
- What are coroutines?
- What coroutines can look like in C++
- Benchmark generators: C++03, C++20 & Python
This is a mini-series leading up to implementing coroutines in C++20, (“what I learnt on my holiday”). It also has comparison with other implementations, as part of my ModernCPP explorations. (first post)
In a previous post I showed some sample generators in C++ and noted that you could not know what they really did without knowing about the implementations. In the last post, I discussed the implementation in C++03, so naturally this post discusses the C++20 implementation.
Implementation
The implementation discussed here is in my “blog-extras” repo on github, and specifically the base implementation can be found here:
Basic usage of this can be found here:
And here:
Implementation Overview
This is more complex than C++03 or python generators, so I’ll break in down in parts:
- Templated to allow changing the return type
- An external API for using the generator
- An internal API for the compiler’s generated code to call when managing/running the coroutine
- An external API for use in for loops.
In more detail:
- Templated to allow changing the return type
- Provides an external API for things that are
simple_generator
s to use:- constructors/destructors (normal, move)
start()
- to start the generatortry_next()
- to give the generator some CPU timetake()
- to get the last returned valuerunning()
- to confirm it’s running/runnable
- An internal API for the compiler to call when certain things happen
in the coroutine body, including:
- How to construct it
- What to do when the coroutine is started.
- What to do when a
co_yield
statement is run. - What to do if you drop off the bottom of the coroutine
- What to do if
co_return
is called.
- An external API designed to be used by the ranges format of
for
, to allow for the Modern C++ style of iteration through a thing.
The fact that these 3 APIs are a bit intermingled can be confusing upfront - most tutorials I’ve seen don’t seem to look at it from this perspective.
Out of these:
The external API you provide is up to you. If you want a particularly wacky style of API, you can do this. The fact there isn’t a standard user API (like python’s) I feel is problematic. There’s one coming in C++23 apparently. I’ve not looked into details yet though because compiling C++20 can be challenging depending on the features you need, let alone C++23.
The internal API is well defined and fixed. You have to implement this, and it has a predetermined name -
promise_type
. This is incredibly badly named. A better name would be “coroutine_frame_state_machine” or similar.The external API support for ranges/iterator protocol is well defined and mercifully brief.
High Level C++20 Skeleton
The high level skeleton for our code looks like this:
template<typename T>
class simple_generator {
public:
class promise_type; // Forward decl - to be defined lower inside this class
using handle_type = std::coroutine_handle<promise_type>;
private:
handle_type mCoro;
public:
// External API implementation
class promise_type { } // Internal coroutine API
class iterator { } // Actual iterator impl
// iterator external API
// TBD - see below
}
Of note:
We must have an internal structure called (or findable as)
promise_type
. If you don’t the compiler complains, exits and has an identity crisis.You have to use that
handle_type
in lots of places. You can call it what you like, but something shorter than it’s full name is a blessing. Most people tend to use something likehandle_type
as far as I can see.When your generator is created, via a bounce, you are given a handle to the frame for your coroutine. This is of type
handle_type
, and you interact with your coroutine using this. So you need to have a place to store one. Here I call itmCoro
.
Constructors / etc - Basics needed to function
You don’t really have many options here, so we’ll cover them here.
explicit simple_generator(handle_type h) : mCoro(h) {}
As noted, when someone uses your coroutine, your coroutine gets
created. Because the compiler detects the co_yield
,
co_await
or co_return
keywords, it knows it
needs to construct a co-routine handle (which uses your promise type)
and uses that to initialise your coroutine. As a result, this is why you
end up with this odd looking initialiser. (odd, because you
never trigger this call this directly)
The next two are the move constructor and move assignment constructor. These are pretty standard fare at this point:
(simple_generator &&other_sg) noexcept : mCoro(other_sg.mCoro) {
simple_generator.mCoro = nullptr;
other_sg}
&operator=(simple_generator &&other) noexcept {
simple_generator if (this != other) {
= other.mCoro;
mCoro .mCoro = nullptr;
otherreturn *this;
}
}
If you don’t understand what’s happening here, please go off and read up on move constructors and move assignment constructors. (We’ll be here when you’re back.) This stuff is necessary if you want to do something like move coroutines into a vector or queue for example. (think: run queue)
We want to remove copy constructors, because things will not work the way you want if you leave them there:
(const simple_generator &) = delete;
simple_generator&operator=(const simple_generator &) = delete; simple_generator
Lastly we can’t just use the default destructor, since we need some explicit shutdown type code:
~simple_generator() { if (mCoro) { mCoro.destroy(); } }
So far so good/normal.
External API - used by people using these generators
So let’s start with the external API that we want to provide. The decisions we make here will affect everything else. Bear in mind, I didn’t develop this API from this approach, but derived it from experimentation, fiddling, thinking and refactoring.
First off some design choices. When reviewing the C++20 coroutine
interface, I initially started doing so thinking like a python
programmer. (hence why my “old” coroutine interface copies python and
uses a StopIteration
exception). When I stopped and
rethought about it, from a C++ perspective it made more sense. There are
some platforms (not many) where exceptions are problematic.
So where, in python, you might write this…
try:
= next(f)
x except StopIteration:
print("Done")
… because this matches python’s norm of “Better to Ask forgiveness”, you would normally try to avoid forcing exceptions onto your user in C++20 - even if exception handling is the recommended approach in Modern C++. The reason for this, because in C++ the default is normally “look before you leap”. The equivalent approach is something like this:
if (f.running() ) {
.try_next();
f= f.take();
x } else {
std::cout << "Done\n");
}
So this gives us our basic external API:
void start()
- Function just to make it clearer we’re starting the generator. Actually callstry_next()
void try_next()
- resumesmCoro
- which is the actual thread of control. We also rethrow any exception raised within the coroutine to avoid masking errors.bool running()
- returns the opposite ofmCoro.done
- which gets set by the compiler generated code when the coroutine can’t be resumed.T take()
- Returns/moves the last value yielded back to the caller.
The implementation of this external API is relatively brief.
// Implementation of the external API called by the user to actually use the generator
void start() { try_next(); }
bool running() { return not mCoro.done(); }
void try_next() {
.resume();
mCoroif (mCoro.promise().m_latest_exception)
std::rethrow_exception(mCoro.promise().m_latest_exception);
}
() { return std::move(mCoro.promise().m_current_value); } T take
We don’t really need a start() method, but it does make things clearer.
Internal
API to manage the coroutine frame - promise_type
The questions here are:
- What is
mCoro
- how is it created? - What’s with this
promise()
method? - Where do the attributes
m_latest_exception
andm_current_value
get set?
Well, these are all handled by the compiler creating code that calls something you create called a promise_type object. This is badly named, but really implements functions within a simple state machine to control activities during a coroutine. The reason for this is because C++20 coroutines leave it up to you how they operate. This is both useful, but also annoying if you just want to use them.
The promise_type
handles
Rather than dive into the implementation of
promise_type
, let’s step back to our fibonacci and see what
needs handling. I’ll annotate those points here:
<int> // NOTE: Need to create this return object
simple_generator(int max) // NOTE: Do we run immediately or wait?
fibs{
int a{1}, b{1}, n{0};
for(int i=0; i< max; i++) {
co_yield a; // NOTE: How do we handle the yielded value?
= a + b;
n = b;
a = n;
b }
// NOTE: What if an exception was thrown and not caught?
// NOTE: What if they just did co_return?
// NOTE: What if we reached the end normally?
}
Rather than provide you with default behaviours for these, you’re expected to pick an implementation and use that. So let’s provide an implementation.
Note, you can’t just ignore this - there are no defaults. If you try,
the compiler whinges at you that you’ve missed off bits. (I tested this
by starting with an empty promise_type
and adding pieces as
necessary)
The promise_type
skeleton
Let’s taking this API a piece at a time. First of all the skeleton of
the promise_type
itself. This sits within our
simple_generator class:
// Implementation of the internal API called when co_yield/etc are triggered inside the coroutine
class promise_type {
m_current_value;
T std::exception_ptr m_latest_exception;
friend simple_generator;
public:
// TBD - Note - all these methods need to be public
...
}
promise_type
and simple_generator
initialisation
First, let’s look at the how the mCoro object gets created, as part of the initialisation process.
Specifically compiler looks for
WHATEVER::promise_type::get_return_object
in order to
create a promise_type
object to manage the coroutine stack
frame. In our case it looks like this:
// NOTE: Need to create this return object
auto get_return_object() { // Inside promise_type
{ return simple_generator{ handle_type::from_promise(*this) } ; }
This essentially calls the simple_generator
constructor:
explicit simple_generator(handle_type h) : mCoro(h) {}
This is where our mCoro
object comes from. The type of
this is handle_type
. That’s an alias for
std::coroutine_handle<promise_type>
. This allows the
external API to interact with the coroutine hande.
promise_type
state machine management
Now let’s step through our simple_generator the
promise_type
methods needed and what they relate to. All
these methods sit within promise_type
.
The functionality to actually drive the coroutine is relatively short, so we’ll use the comments we added to the generator and place them next to the implementation. Note that next to each I’ve also annotated reponses to the notes from the generator.
// NOTE: Need to create this return object
auto get_return_object() {
{ return simple_generator{ handle_type::from_promise(*this) } ; }
// NOTE: Do we run immediately or wait?
// -> We wait for a resume
auto initial_suspend() { return std::suspend_always{}; }
// NOTE: How do we handle the yielded value?
// -> We get passed a value, so we capture it
// -> We then wait to resume
auto yield_value(T some_value)
m_current_value = some_value;
return std::suspend_always{};
// NOTE: What if an exception was thrown and not caught?
// -> Catch it to allow rethrowing to the caller
auto unhandled_exception() { m_latest_exception = std::current_exception(); }
// NOTE: What if they just did co_return?
// -> Should not resume after co_return
auto return_void() { return std::suspend_never{}; }
// NOTE: What if we reached the end normally?
// -> If we suspend_never, this promise gets destroyed here
// *and* in our simple_generator definition (segfault)
auto final_suspend() noexcept { return std::suspend_always{}; }
}
As a result, ultimately these methods control the statemachine. Some key points:
- The compiler generates code that calls these callbacks as noted.
- The generated code calls our
yield_value
coroutine whenco_yield value
is called, and provides us with a value we can capture in the promise object. - We can do the same to capture any exception thrown from within the coroutine.
- The promise_type object is accessible via mCoro.promise(), which gives the external API access to this.
Control flow example
simple_generator<int>
fib = fibbonaci() Is called, but that actually causessimple_generator::promise_type::get_return_object()
is to be called- That creates an object of type
std::coroutine_handle<simple_generator::promise_type>
- And that uses
simple_generator::simple_generator(handle h)
to actually create thesimple_generator<int>
Additionally,
promise_type::initial_suspend is called
- which controls what happens immediately after construction. In our generators we suspend immediately.The user calls
fib.start()
. This actually does this:- Calls
try_next
- That calls
mCoro.resume()
to give it some CPU. (InternallymCoro.done()
may get toggled) - That may raise an exception which wasn’t handled. If there was it gets rethrown to the caller.
- Calls
Inside the resumption, inside the coroutine, the code may
co_yield a
to return a value to the generator consumer.- That triggers
*yield_value(some_value)*
to be called - Inside there, we store
some_value
inide thepromise_type
object as m_current_value, - If an exception was raise we store that inside the
promise_type
asm_latest_exception
(let’s assume here that doesn’t happen)
- That triggers
The user then calls
fib.running()
to see what to do next- This returns the logical inverse of
mCoro.done()
- Let’s assume initially this returns true.
- This returns the logical inverse of
The user calls
fib.try_next()
- That calls
mCoro.resume()
to give it some CPU. (InternallymCoro.done()
may get toggled) - That may raise an exception which wasn’t handled. If there was it gets rethrown to the caller.
- That calls
The user then calls
fib.running()
to see what to do next- This returns the logical inverse of
mCoro.done()
- Let’s assume initially this returns true. If it does, we can loop back to 4.
- If it doesn’t we carry on.
- This returns the logical inverse of
The reason we got
true
frommCoro.done()
is because either:- The coroutine exited at the end of the body
- An exception was raised
co_return
was called. But either way, the coroutine has finished.
If the co_routine finishes by
co_return
ing a value,promise_type::return_void
gets called, which renders the coroutine unrestartable.If the co_routine finishes by an
unhandled_exception**, then
promise_type::unhandled_exception` gets called, which in our case doesn’t destroy the promiseSimlar for normal exit -
final_suspend
is calledLastly, as
fib()
drops out of scope, the destructor is called, which destroys the handle if it is (still) valid.
Interlude
It’s worth noting, we’re not done yet!!! There’s still the ranges/iterator support API to write. When you consider this, there’s a remarkably large amount to deal with, from hidden compiler transforms to a whole slew of other details.
Get one wrong and the house of cards falls down.
Supporting range/iterator in C++20 Coroutine based generator
This consists of two halves:
- An internal implementation
- An external API
We’ll start with the latter and work backwards.
External simple_generator iterator API
The API for this is short, and sits directly withing simple_generator:
() { return iterator{*this, false}; }
iterator begin() { return iterator{*this, true}; } iterator end
This creates begin()
and end
endpoints
which are used to the ranges API to determine whether it’s done or not.
And that’s literally the since the prototype for iterator is:
the simple_generator
being iterated and a flag to signify
done or not.
For out iterator to function, it needs
- to capture the owner of the iterator - the
the simple_generator
being iterated - a done flag
- a means of driving the generator forward.
These look like this:
class iterator {
<T> &owner;
simple_generatorbool done;
void iter_next() {
.try_next();
owner= not owner.running();
done }
// NOTE: TBD
}
Once those are in place, we then need to implement the standard iterator operations: - operator != - to see if we’re done or not - operator * - to return the current value - operator ++ - to advance to the next value - A constructor
These look like this:
public:
bool operator != (const iterator &r) const { return done != r.done; }
auto operator * () const { return owner.take(); }
&operator ++ () { iter_next(); return *this; }
iterator (simple_generator<T> &o, bool d) : owner(o), done(d) { if ( not done ) iter_next(); } iterator
Once these are in place the simple_generator can be used in a “ranges-style” for loop.
Where’s the code?
Update: 11/4/2023
Want to read and runcode, not the blog? I know the feeling. I recently created a github repo specifically for putting code from blog posts in. Though recognising that it might also have non-code stuff added at some point, I’ve called it blog-extras.
The blog-extras repo can be found here:
The code specific to this series of short posts can be found here:
That repo will get fleshed out with code/examples from other posts in this blog too.
Brief Discussion
That was an awful lot to go through, of seemingly escalating complexity. There’s a longer discussion/set of thoughts on the next/last post in this short series.
However, I think it’s useful to reflect on what this escalating complexity. has bought us. It’s a lot.
Let’s take another look at the C++03 fibonnaci generator:
class Fibonnaci: public Generator {
int a, b, s;
public:
() : a(1), b(1), s(0) { }
Fibonnaciint next() {
GENERATOR_CODE_STARTwhile ( true ) {
(a);
YIELD= a + b;
s = b;
a = s;
b };
GENERATOR_CODE_END};
};
… and also the C++20 fibonnaci generator:
<int> fibs(int max) {
simple_generatorint a{1}, b{1}, n{0};
for(int i=0; i< max; i++) {
co_yield a;
= a + b;
n = b;
a = n;
b }
}
… versus the python original:
def fibonnaci():
= 1,1
a, b while True:
yield a
= b, a+b a,b
At the end of the day, the C++20 generator looks a lot simpler and
cleaner than the C++03 version. But the effort to get there was quite
substantional. That said, once the simple_generator<>
class is defined, it’s simple to use - and unlike the C++03 version,
you’re not having to remember to build the scaffolding each time.
But is that it?
Not really, no.
NEXT POST: I got curious about performance…
Updated: 2023/09/15 21:12:34