Implementing Actors
Part 1
Introduction
CAF attaches many labels to actors: event-based, blocking, statically typed, dynamically typed, and so on. When delving into the depths of the API, users may get lost in technical details and subtleties without grasping the big picture first.
In this guide, we first take a high-level look at the basic ideas and concepts behind the API before we discuss how the pieces fit together and delve into implementation details. We aim to provide a solid foundation for understanding the API and its design choices.
The Actor Model
Before we discuss API details of CAF, here is a quick refresher on actors in general. The original formulation of the Actor Model of computation by Hewitt, Bishop, and Steiger ties three things to an actor:
- Processing: CPU cycles and internal control flow.
- Communications: Sending and receiving messages.
- Storage: Member variables and other state.
Conceptually, we can imagine an actor as depicted in this diagram:
As the name suggests, an actor is an active software entity. It shields its inside from the outside world. Only messages may cross the boundary between inside and outside. No internal variable may be accessed from the outside; the only entry point is the mailbox.
When holding a handle to an actor, you may do one of two things: send a message to the actor by enqueueing it into its mailbox, or observe the lifetime of the actor by monitoring or linking to it.
Handle Types: The Outside View
This conceptual view now allows us to categorize some of the building blocks in the CAF API. Remember, there is an inside and an outside. When standing outside of an actor, it appears as a black box with only a mailbox sticking out for interacting with it. What kind of mailbox, though? That depends on the handle type:
| | strong | weak |
|---|---|---|
| dynamically typed | actor | |
| statically typed | typed_actor<...> | |
| untyped | strong_actor_ptr | actor_addr |
There are four different handle types in CAF in total. We ignore the strong/weak categorization for now. The most important difference between the handle types is: what kind of mailbox do you see on the outside?
The handle type actor gives you a dynamically typed mailbox on the outside. This means you can put any message into the mailbox. Whether the actor actually understands this input is only decided once the actor processes the message. Consequently, type errors only surface at run time.
The handle type typed_actor<...> gives you a statically typed mailbox on the outside. This means there is a list of allowed message types and you may not enqueue anything else. The compiler catches type errors for you. This comes at the cost of more boilerplate code.
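The difference between the two mailbox kinds can be illustrated with a toy model in plain C++17 that does not use CAF at all. The hypothetical dynamic_mailbox accepts any value (via std::any) and can only detect unsupported input when processing it, whereas the hypothetical static_mailbox accepts exactly two message types and rejects everything else at compile time. Both type names are assumptions made for illustration; they are not CAF classes.

```cpp
#include <any>
#include <cassert>
#include <cstdint>
#include <deque>
#include <string>
#include <typeinfo>
#include <utility>
#include <variant>

// Hypothetical toy types for illustration only; these are NOT CAF classes.

// Dynamically typed: any value fits into the mailbox. Whether the
// receiver understands it is only known when the message is processed.
struct dynamic_mailbox {
  std::deque<std::any> queue;

  void enqueue(std::any msg) { queue.push_back(std::move(msg)); }

  // "Processes" all messages and returns how many were not understood.
  int process_all() {
    int errors = 0;
    for (const auto& msg : queue)
      if (msg.type() != typeid(int32_t) && msg.type() != typeid(std::string))
        ++errors; // unexpected input, detected only at run time
    queue.clear();
    return errors;
  }
};

// Statically typed: only int32_t and std::string are accepted.
struct static_mailbox {
  std::deque<std::variant<int32_t, std::string>> queue;

  void enqueue(int32_t x) { queue.emplace_back(x); }
  void enqueue(std::string x) { queue.emplace_back(std::move(x)); }

  // Any other argument type picks this deleted overload, so a call such
  // as enqueue(3.14) is rejected by the compiler.
  template <class T>
  void enqueue(T) = delete;
};
```

With the dynamic mailbox, enqueueing 3.14 compiles and only shows up as an error during processing; with the static mailbox, the same call does not compile.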
Lastly, there are two untyped handle types. Here, you see no mailbox at all! The best way to think of these two is as a type-erased pointer to an actor. Usually, you need to restore type information via actor_cast before you can do anything useful with these handles.
Typing aside, CAF also distinguishes between strong and weak references to an actor. Actors in CAF are reference counted in order to enable the run-time system to detect and dispose of unreachable actors.
If they do not allow sending messages, what use case do the untyped handles have? Let us start with actor_addr, because it has a very specific use case: exit messages. When two actors are linked, CAF sends an exit_msg to the other if one of them terminates. CAF has no type information about the terminated actor. Further, exit messages cannot include a strong reference to the terminated actor, because this actor is probably already destroyed and we may no longer access its mailbox. The most useful thing actor_addr has to offer is its operator==. Once an actor receives an exit message, it may compare the received actor_addr to some list or map of known actors. For example, a supervisor may monitor its workers in order to re-spawn them on error. After receiving a down message, the supervisor iterates its list of workers to replace the matching entry with a newly spawned worker. More generally, the handle type actor_addr can be used to store a reference to an actor without keeping it alive.
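The supervisor pattern relies only on equality comparison. The following plain C++17 sketch models it without CAF: a hypothetical toy_worker carries a unique id that plays the role of an address, and the supervisor replaces the worker whose id matches the one reported in a down notification. The id scheme and all names are assumptions for illustration; CAF's actor_addr is a different, opaque type.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Toy model (not CAF): a worker with a unique id. The id stands in for
// an address that identifies an actor but grants no access to it.
struct toy_worker {
  std::uint64_t id; // never reused, so stale ids cannot match new workers
};

struct supervisor {
  std::uint64_t next_id = 1;
  std::vector<std::shared_ptr<toy_worker>> workers;

  std::shared_ptr<toy_worker> spawn_worker() {
    return std::make_shared<toy_worker>(toy_worker{next_id++});
  }

  // Called with the "address" from a down notification. Only operator==
  // is needed to find the matching entry and replace it.
  bool on_down(std::uint64_t addr) {
    for (auto& w : workers) {
      if (w->id == addr) {
        w = spawn_worker(); // re-spawn a fresh worker in the same slot
        return true;
      }
    }
    return false; // unknown address: not one of our workers
  }
};
```

Using monotonically increasing ids instead of raw pointer values avoids accidentally matching a new object that happens to reuse a freed memory address.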
Our second untyped handle is strong_actor_ptr. Unlike its weak counterpart, this handle type keeps actors from becoming unreachable. The most notable use case for strong_actor_ptr is storing the sender of a message. CAF has no knowledge of whether a message came from a dynamically or statically typed actor. A receiver can reasonably assume that the sender is going to provide a handler for the response message (if any), but the receiver has no further knowledge regarding the type of the sender. Inside a message handler, actors can access the source of a message by calling self->current_sender(). If CAF used a weak handle for storing sender information, then an actor could become unreachable immediately after sending a message and before receiving the response. Aside from sender information, there remain only a few places in CAF where users may encounter a strong_actor_ptr. Notable examples include the actor registry and communication with the middleman actor.
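Why the sender must be stored strongly can be demonstrated with std::shared_ptr and std::weak_ptr as stand-ins for strong and weak handles. In this plain C++17 sketch (no CAF involved; all names hypothetical), the sender drops its own handle right after sending, and only the strongly stored sender is still around when the reply is produced.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Toy model (not CAF): a sender stored either strongly (shared_ptr) or
// weakly (weak_ptr) inside the message envelope.
struct toy_sender {
  std::string last_reply;
};

struct strong_envelope {
  std::shared_ptr<toy_sender> sender; // keeps the sender alive
};

struct weak_envelope {
  std::weak_ptr<toy_sender> sender; // does not keep the sender alive
};

bool reply(const strong_envelope& env, const std::string& text) {
  env.sender->last_reply = text; // always valid
  return true;
}

bool reply(const weak_envelope& env, const std::string& text) {
  if (auto ptr = env.sender.lock()) {
    ptr->last_reply = text;
    return true;
  }
  return false; // sender already destroyed: the response is lost
}

// Sends a message, drops the sender's own handle, then replies.
template <class Envelope>
bool send_then_reply() {
  auto sender = std::make_shared<toy_sender>();
  Envelope env{sender};
  sender.reset(); // the only handle outside the envelope goes away
  return reply(env, "3");
}
```

Calling send_then_reply<strong_envelope>() succeeds, while send_then_reply<weak_envelope>() reports a lost response.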
Implementation Types: The Inside View
At first glance, there seem to be many actor types in CAF. However, we can prune some choices right away by following this simple rule: do not implement blocking actors! If you ever find yourself in need of a blocking way to interact with other actors, then use scoped_actor. If you need an actor to have its own thread of execution, then spawn it using the detached flag. Consider blocking_actor an implementation detail.
When reading the Actors section of the official manual without a solid grasp of the terminology and concepts, the various actor types in the class hierarchy may be confusing.
Luckily, the picture simplifies drastically if we just focus on the choices we need to make when implementing an actor. We really have only one choice to make: what type do we need to assign to self? The concept of self is analogous to the this pointer in object-oriented programming and allows actors to refer back to themselves. Furthermore, CAF also looks at our choice for self to pick the appropriate handle type for the actor, i.e., actor or typed_actor<...>.
So ultimately, we have to distinguish between dynamically and statically typed actors. For dynamically typed actors, self is of type event_based_actor*. For statically typed actors, self is of type typed_actor<...>::pointer_view.
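The idea that the type of self determines the handle type can be sketched with ordinary template metaprogramming. This plain C++17 toy (no CAF; every name is hypothetical) extracts the first parameter type of an implementation function and maps it to a handle type. It only illustrates the principle; CAF's actual deduction logic is more involved and is not reproduced here.

```cpp
#include <cassert>
#include <type_traits>

// Toy model with hypothetical names (not CAF): the type of a function's
// first parameter ("self") selects the handle type.
struct toy_dynamic_self {};
struct toy_static_self {};
struct toy_dynamic_handle {};
struct toy_static_handle {};

// Maps a self pointer type to a handle type.
template <class Self>
struct handle_for;

template <>
struct handle_for<toy_dynamic_self*> {
  using type = toy_dynamic_handle;
};

template <>
struct handle_for<toy_static_self*> {
  using type = toy_static_handle;
};

// Extracts the first parameter type from a function pointer type.
template <class F>
struct first_param;

template <class R, class Self, class... Ts>
struct first_param<R (*)(Self, Ts...)> {
  using type = Self;
};

template <class F>
using handle_type_t = typename handle_for<typename first_param<F>::type>::type;

// Two hypothetical actor implementations.
void dynamic_impl(toy_dynamic_self*) {}
void static_impl(toy_static_self*, int) {}
```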
In part 1, we only look at dynamically typed actors, i.e., self is of type event_based_actor*. In part 2, we will discuss statically typed actors.
To break things down further, we can step back and remember the three things an actor encapsulates: Processing, Communications, and Storage.
Processing
The Processing part is the control flow of an actor. Actors in CAF are event-based, meaning they follow a simple state machine:
1. Wait for an event (usually a message).
2. Process the event.
3. Terminate when done, otherwise go to step 1.
With this in mind, we can look at our first example.
Source Code
void my_actor_impl(caf::event_based_actor* self) {
  // Handles the implicit init event; no behavior is set afterwards.
  self->println("initialize my_actor");
}

void caf_main(caf::actor_system& sys) {
  sys.spawn(my_actor_impl);
}
Output
initialize my_actor
The simplest way to write an actor is to provide a free function. It optionally takes a self argument as first parameter. When omitting the self parameter, CAF will assign the actor a dynamically typed handle.
The function body of my_actor_impl is called to handle the first event for the actor: the implicit init event. Here, we print a message to the console and do nothing else. Hence, the actor terminates immediately after the first event.
So how can we tell CAF what to do with the next event? Any event other than the init event is a message. We can tell an actor how to process a message by providing a behavior, in other words, by defining what to do in step 2 of the state machine. Actors may set a behavior by calling self->become(...), as shown in our next example.
Source Code
void my_actor_impl(caf::event_based_actor* self) {
  self->println("initialize my_actor");
  // Sets the behavior for processing all subsequent messages.
  self->become(
    [self](const std::string& str) {
      self->println("my_actor received: {}", str);
    }
  );
}

void caf_main(caf::actor_system& sys) {
  auto my_actor = sys.spawn(my_actor_impl);
  caf::anon_mail("hello!").send(my_actor);
}
Output
initialize my_actor
my_actor received: hello!
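The three-step state machine and the role of the init event can be mimicked with a toy event loop in plain C++17. This sketch is not how CAF's scheduler works: the hypothetical toy_actor never waits, so an empty mailbox simply ends the loop. It does show the key point from the two examples above: without a behavior, the actor has nothing to do after init, and after calling become, the behavior handles every further message.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy event loop (not CAF's scheduler). The actor first handles the
// implicit init event and then processes messages with its current
// behavior. Unlike a real actor, an empty mailbox ends the loop here.
struct toy_actor {
  using behavior = std::function<void(const std::string&)>;

  behavior current; // empty until become() is called
  std::deque<std::string> mailbox;
  std::vector<std::string> log;

  void become(behavior bhvr) { current = std::move(bhvr); }

  void run(const std::function<void(toy_actor&)>& init) {
    init(*this); // the first event is always "init"
    // 1. wait for an event, 2. process it, 3. terminate when done.
    // Without a behavior, there is nothing to do after init.
    while (current && !mailbox.empty()) {
      auto msg = std::move(mailbox.front());
      mailbox.pop_front();
      current(msg);
    }
  }
};
```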
Because actors almost always set a behavior, CAF offers a shortcut for defining the initial behavior. Instead of calling self->become(...) in the function body, we can simply return a caf::behavior, as shown below.
Source Code
caf::behavior my_actor_impl(caf::event_based_actor* self) {
  self->println("initialize my_actor");
  // The returned behavior becomes the initial behavior of the actor.
  return {
    [self](const std::string& str) {
      self->println("my_actor received: {}", str);
    },
  };
}

void caf_main(caf::actor_system& sys) {
  auto my_actor = sys.spawn(my_actor_impl);
  caf::anon_mail("hello!").send(my_actor);
}
Output
initialize my_actor
my_actor received: hello!
These two examples are equivalent, but the second one is more idiomatic. This time, we use the init event to tell the actor how to proceed with the next events, i.e., messages. The actor uses its behavior to process any incoming messages until it terminates. However, we did not tell it to terminate in our example.

So why does the application terminate? This goes back to our observation that there are strong and weak references to actors. The actor system automatically terminates any actor once its strong reference count drops to zero. In our example, the variable my_actor holds a strong reference to the actor. Once my_actor goes out of scope, the reference count decreases by one. Sending a message to the actor also creates a strong reference: as long as there is at least one message in the mailbox, the scheduler in CAF holds a strong reference to the actor. Once the mailbox is empty and the actor is no longer running, CAF decreases the reference count by one. So in our example, the actor is cleaned up as unreachable after caf_main returns and the message has been processed.
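The lifetime rule can be modeled with std::shared_ptr, where each copy is one strong reference. In this plain C++17 sketch (not CAF internals; toy_actor and toy_scheduler are hypothetical), the toy scheduler takes a strong reference when the first message arrives and drops it after processing, so the actor outlives the caller's handle exactly until its mailbox has been processed.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <string>
#include <utility>

// Toy model (not CAF internals): strong references are shared_ptr copies.
struct toy_actor {
  explicit toy_actor(bool* flag) : destroyed(flag) {}
  ~toy_actor() { *destroyed = true; } // runs when the last strong ref is gone
  std::deque<std::string> mailbox;
  bool* destroyed;
};

struct toy_scheduler {
  std::deque<std::shared_ptr<toy_actor>> ready; // actors with pending mail

  void send(const std::shared_ptr<toy_actor>& dest, std::string msg) {
    if (dest->mailbox.empty())
      ready.push_back(dest); // first pending message: take a strong reference
    dest->mailbox.push_back(std::move(msg));
  }

  void run() {
    while (!ready.empty()) {
      auto ptr = std::move(ready.front());
      ready.pop_front();
      ptr->mailbox.clear(); // "process" all pending messages
      // ptr goes out of scope here, dropping the scheduler's reference
    }
  }
};
```

In a usage sequence like the one in caf_main, the actor survives the end of the caller's scope and is destroyed only once run() has emptied its mailbox.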
We can also explicitly tell an actor to terminate by calling self->quit(). The actor then discards any further messages and terminates after processing the current one, as shown in the next example.
Source Code
caf::behavior my_actor_impl(caf::event_based_actor* self) {
  self->println("initialize my_actor");
  return {
    [self](const std::string& str) {
      self->println("my_actor received: {}", str);
      // Terminate after this message; remaining messages are discarded.
      self->quit();
    },
  };
}

void caf_main(caf::actor_system& sys) {
  auto my_actor = sys.spawn(my_actor_impl);
  caf::anon_mail("message 1").send(my_actor);
  caf::anon_mail("message 2").send(my_actor);
}
Output
initialize my_actor
my_actor received: message 1
Although we are sending two messages to the actor, it will only process the first one and then terminate. The second message will be discarded.
One note on self->quit(): the actor terminates with exit reason normal. The function also takes an optional argument to specify a different exit reason in case the actor terminates due to an error. As part of the termination process, CAF sends the exit reason to all linked actors and monitors (to learn more about that topic, please read our guide on Monitoring and Linking).
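The quit semantics described above can be summarized in a toy model in plain C++17 (not CAF; all names, including toy_exit_reason, are hypothetical). Calling quit() lets the current message finish, discards the rest of the mailbox, and reports the exit reason to every registered observer, which stands in for linked actors and monitors.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <vector>

// Toy model (not CAF): quit() ends message processing, remaining
// messages are discarded, and every observer learns the exit reason.
enum class toy_exit_reason { normal, user_error };

struct toy_actor {
  std::deque<std::string> mailbox;
  std::vector<std::string> processed;
  std::vector<std::function<void(toy_exit_reason)>> observers;
  bool terminated = false;
  toy_exit_reason reason = toy_exit_reason::normal;

  // Mirrors the default described in the text: no argument means normal.
  void quit(toy_exit_reason r = toy_exit_reason::normal) {
    terminated = true;
    reason = r;
  }

  void run(const std::function<void(toy_actor&, const std::string&)>& handler) {
    while (!terminated && !mailbox.empty()) {
      auto msg = mailbox.front();
      mailbox.pop_front();
      handler(*this, msg); // the current message is still fully processed
    }
    if (terminated) {
      mailbox.clear(); // further messages are discarded
      for (const auto& notify : observers)
        notify(reason);
    }
  }
};
```

Feeding this toy the same two messages as the CAF example yields only "message 1" as processed, and the observer sees toy_exit_reason::normal.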
Communications
We have seen how to define the control flow of an actor. The second part of the actor model is about Communications. In CAF, actors communicate by sending messages. We have already seen how to send a message to an actor by using anon_mail, which sends a message without a sender. Inside an actor, however, we usually call self->mail(...) to initiate a message. There are generally two styles of messages: fire-and-forget and request-response.
A fire-and-forget message is sent without expecting a response. The sender does not wait for a response and any message that the receiver sends back is processed by the regular actor behavior. In contrast, a request-response message has a timeout and an explicit response handler.
Our next example shows how to send both styles of messages and how actors process responses.
Source Code
using namespace std::literals; // enables the 1s duration literal

caf::behavior adder_impl() {
  return {
    // The return value is sent back to the sender automatically.
    [](int32_t x, int32_t y) {
      return x + y;
    },
  };
}

caf::behavior client1_impl(caf::event_based_actor* self, caf::actor adder) {
  // Fire-and-forget: the result arrives as a regular message.
  self->mail(int32_t{1}, int32_t{2}).send(adder);
  return {
    [self](int32_t y) {
      self->println("client1 received: {}", y);
    },
  };
}

caf::behavior client2_impl(caf::event_based_actor* self, caf::actor adder) {
  // Request-response: the result goes to the one-shot handler passed to then.
  self->mail(int32_t{1}, int32_t{2}).request(adder, 1s).then([self](int32_t y) {
    self->println("client2 received: {} (response)", y);
  });
  return {
    [self](int32_t y) {
      self->println("client2 received: {} (behavior)", y);
    },
  };
}

void caf_main(caf::actor_system& sys) {
  auto adder = sys.spawn(adder_impl);
  sys.spawn(client1_impl, adder);
  sys.spawn(client2_impl, adder);
}
Output
client1 received: 3
client2 received: 3 (response)
When looking at adder_impl, we can see that there is no explicit messaging at all. The actor simply returns the sum of two integers. When returning a value from a message handler, CAF automatically sends the value back to the sender. This mechanism allows CAF to correlate input and output messages correctly.
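The receiver side of this mechanism can be sketched as follows. This plain C++17 toy is not CAF's implementation; it simply assumes that a request carries the sender's identity and a request ID (toy_request, toy_response, and dispatch are hypothetical names). The handler only returns a value, and the glue code addresses the result to the original sender under the same ID.

```cpp
#include <cassert>
#include <cstdint>

// Toy model (not CAF's implementation): a request carries the sender's
// identity and a request ID.
struct toy_request {
  int sender;
  std::uint64_t id;
  std::int32_t x;
  std::int32_t y;
};

struct toy_response {
  int receiver;
  std::uint64_t id;
  std::int32_t value;
};

// Like adder_impl: no explicit messaging, just a return value.
std::int32_t add(std::int32_t x, std::int32_t y) {
  return x + y;
}

// Invokes the handler and wraps its result into a response for the
// original sender, tagged with the same ID for correlation.
toy_response dispatch(const toy_request& req) {
  return toy_response{req.sender, req.id, add(req.x, req.y)};
}
```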
In client 1, we use self->mail(...).send(...) to send a fire-and-forget message to the adder. The response is processed by the regular behavior of the client, which prints client1 received: 3.
In client 2, we use self->mail(...).request(...).then(...) instead. This causes CAF to send a request-response message to the adder, with a timeout of one second. If the response arrives in time, the client prints client2 received: 3 (response). Otherwise, it terminates with an error by default. Optionally, we could pass a second lambda to then that takes a caf::error as argument to override the default error handling. The output shows that client 2 uses the response handler when the message from the adder arrives, and the handler from the regular behavior is not called.
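On the client side, the distinction between a response handler and the regular behavior can be modeled with a table of pending one-shot handlers keyed by request ID. This is a plain C++17 sketch under that assumption, not CAF code; toy_client and its members are hypothetical. A message whose ID matches a pending request goes to that handler, anything else goes to the regular behavior, and an expired timeout turns into an error.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy model (not CAF): one-shot response handlers vs. regular behavior.
struct toy_client {
  std::unordered_map<std::uint64_t, std::function<void(int32_t)>> pending;
  std::function<void(int32_t)> behavior;
  std::vector<std::string> log;
  std::uint64_t next_id = 1;

  // Registers a one-shot response handler and returns the request ID.
  std::uint64_t request(std::function<void(int32_t)> on_response) {
    auto id = next_id++;
    pending.emplace(id, std::move(on_response));
    return id;
  }

  // id == 0 marks a message that is not a response to any request.
  void receive(std::uint64_t id, int32_t value) {
    if (auto i = pending.find(id); i != pending.end()) {
      auto handler = std::move(i->second);
      pending.erase(i); // one-shot: remove before invoking
      handler(value);
    } else {
      behavior(value);
    }
  }

  // Called when the timeout for a request expires without a response.
  void timeout(std::uint64_t id) {
    if (pending.erase(id) > 0)
      log.push_back("error: request timed out");
  }
};
```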
We can also call await instead of then for passing a response handler. The difference is that await suspends the regular actor behavior until the response arrives, while then multiplexes the response handler with the regular behavior. Usually, await is only necessary to enforce a specific ordering of messages, so we use then in most cases.
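The effect of await versus then on message ordering can be simulated in plain C++17 (no CAF; toy_message and process are hypothetical). With then, messages are handled in arrival order; with await, regular messages that arrive before the response are held back and handled only after the response.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Toy model (not CAF): a message is either regular or the awaited response.
struct toy_message {
  bool is_response;
  std::string text;
};

// Returns the order in which messages get handled.
std::vector<std::string> process(const std::vector<toy_message>& arrivals,
                                 bool use_await) {
  std::vector<std::string> handled;
  std::deque<std::string> stash; // regular messages deferred by await
  bool waiting = use_await;
  for (const auto& msg : arrivals) {
    if (msg.is_response) {
      handled.push_back(msg.text);
      waiting = false;
      // Resume the regular behavior: replay deferred messages in order.
      while (!stash.empty()) {
        handled.push_back(stash.front());
        stash.pop_front();
      }
    } else if (waiting) {
      stash.push_back(msg.text); // behavior suspended: defer the message
    } else {
      handled.push_back(msg.text); // multiplexed with the response handler
    }
  }
  return handled;
}
```

For the arrival sequence "a", response, "b", the then variant handles "a", "response", "b", while the await variant handles "response", "a", "b".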
As we can see, the request-response style of messaging directly feeds into the control flow of an actor.
Storage
Now, we know how to define the control flow of an actor and how to communicate with other actors. The last part of the actor model is about Storage. Here, we mean Storage in the sense of any state that an actor may hold.
We have actually already seen one way of holding on to some state in an actor: the capture list of the lambda expressions that we have used to define the behavior. We captured self in the lambda expressions, but we could have captured any other state needed by a particular message handler.
To illustrate this, we implement a simple cell actor that stores a single 32-bit integer. The actor provides two message handlers: one to get the current value and one to set a new value.
Source Code
using namespace std::literals; // enables the 1s duration literal

caf::behavior cell_impl(caf::event_based_actor* self) {
  // Both handlers share ownership of the same integer.
  auto value = std::make_shared<int32_t>(0);
  return {
    [value](caf::get_atom) {
      return *value;
    },
    [self, value](caf::put_atom, int32_t new_value) {
      self->println("cell changes its value from {} to {}", *value, new_value);
      *value = new_value;
    },
  };
}

void client_impl(caf::event_based_actor* self, caf::actor cell) {
  self->mail(caf::put_atom_v, int32_t{1}).send(cell);
  self->mail(caf::put_atom_v, int32_t{2}).send(cell);
  self->mail(caf::put_atom_v, int32_t{3}).send(cell);
  self->mail(caf::get_atom_v).request(cell, 1s).then([self](int32_t value) {
    self->println("client received: {}", value);
  });
}

void caf_main(caf::actor_system& sys) {
  auto cell = sys.spawn(cell_impl);
  sys.spawn(client_impl, cell);
}
Output
cell changes its value from 0 to 1
cell changes its value from 1 to 2
cell changes its value from 2 to 3
client received: 3
Since multiple message handlers access the same state, we use a shared pointer to store the value. While this is a simple way to store state, it is not the most efficient, and it quickly becomes cumbersome and hard to maintain once we need to store more complex state.
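Why a shared pointer is needed here can be shown without CAF. In this plain C++17 sketch (the handlers struct and both factory functions are hypothetical), capturing an int32_t by value gives each lambda its own private copy, so a "put" never becomes visible to "get". Capturing a std::shared_ptr gives both lambdas access to one shared value.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>

// Plain C++ illustration (no CAF): two handlers that should share state.
struct handlers {
  std::function<int32_t()> get;
  std::function<void(int32_t)> put;
};

handlers make_by_value() {
  int32_t value = 0;
  return {
    [value] { return value; },                 // its own copy
    [value](int32_t x) mutable { value = x; }, // modifies a different copy
  };
}

handlers make_shared_state() {
  auto value = std::make_shared<int32_t>(0);
  return {
    [value] { return *value; },         // both lambdas point
    [value](int32_t x) { *value = x; }, // to the same integer
  };
}
```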
CAF allows us to store state in a more structured way by writing a state class. A state class is a user-defined class that must provide a (public) make_behavior member function to initialize the actor. CAF automatically constructs and destructs the state object when the actor starts and stops. To spawn an actor from a state class, we use the actor_from_state utility and pass the state class as template argument, as shown in our next example.
Source Code
using namespace std::literals; // enables the 1s duration literal

struct cell_state {
  cell_state(caf::event_based_actor* ptr, int32_t init_value)
    : self(ptr), value(init_value) {
    // nop
  }

  // Called by CAF to obtain the initial behavior of the actor.
  caf::behavior make_behavior() {
    return {
      [this](caf::get_atom) {
        return value;
      },
      [this](caf::put_atom, int32_t new_value) {
        self->println("cell changes its value from {} to {}", value, new_value);
        value = new_value;
      },
    };
  }

  caf::event_based_actor* self = nullptr;
  int32_t value = 0;
};

void client_impl(caf::event_based_actor* self, caf::actor cell) {
  self->mail(caf::put_atom_v, int32_t{1}).send(cell);
  self->mail(caf::put_atom_v, int32_t{2}).send(cell);
  self->mail(caf::put_atom_v, int32_t{3}).send(cell);
  self->mail(caf::get_atom_v).request(cell, 1s).then([self](int32_t value) {
    self->println("client received: {}", value);
  });
}

void caf_main(caf::actor_system& sys) {
  // -1 is forwarded to the constructor of cell_state.
  auto cell = sys.spawn(caf::actor_from_state<cell_state>, -1);
  sys.spawn(client_impl, cell);
}
Output
cell changes its value from -1 to 1
cell changes its value from 1 to 2
cell changes its value from 2 to 3
client received: 3
Our implementation of client_impl did not change. However, this time we have a new cell_state class that encapsulates the state of the cell actor. Now, message handlers only need to capture this to get access to all member variables of the state class. This makes it easier to maintain and extend the state of an actor. The state class also provides the make_behavior member function to initialize the actor.
In our example, the constructor of cell_state takes the self pointer and the initial value of the cell. The self pointer is optional. The rules for actor_from_state are straightforward:
- The type for self is deduced from the signature of the make_behavior member function. If the make_behavior member function returns a caf::behavior, then self is of type caf::event_based_actor*. In part 2, we will also see examples with statically typed actors.
- Any additional arguments args... that follow after the actor_from_state argument are passed to the constructor of the state class. CAF will try to find a constructor for the state class by first trying to construct the state with (self, args...). If no such constructor exists, CAF will try to construct the state with just (args...).
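The constructor-selection rule from the second bullet can be sketched with if constexpr and std::is_constructible. This is a plain C++17 toy, not CAF's actual code; make_state, toy_self, with_self, and without_self are hypothetical names. It tries State(self, args...) first and falls back to State(args...).

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>

// Toy sketch (not CAF's code) of the constructor-selection rule.
struct toy_self {};

template <class State, class... Ts>
State make_state(toy_self* self, Ts&&... args) {
  if constexpr (std::is_constructible_v<State, toy_self*, Ts...>)
    return State(self, std::forward<Ts>(args)...); // preferred: (self, args...)
  else
    return State(std::forward<Ts>(args)...); // fallback: (args...)
}

// A state class that wants access to self, like cell_state.
struct with_self {
  with_self(toy_self* ptr, int32_t init) : self(ptr), value(init) {}
  toy_self* self;
  int32_t value;
};

// A state class that does not need self.
struct without_self {
  explicit without_self(int32_t init) : value(init) {}
  int32_t value;
};
```

Because the discarded if constexpr branch is never instantiated, a state class only needs to provide one of the two constructor shapes.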
Next Up
In this part, we have learned how to implement actors in an idiomatic way. We started with simple actors implemented as free functions and moved on to more complex actors implemented using a state class. Along the way, we went from actors that terminate after processing the init event to actors with more complex behavior that multiplex regular message handlers with one-shot response handlers.
In part 2, we will revisit some of the examples from this part and learn how to implement them with statically typed actors.