Implementing Actors

Part 1


Introduction

CAF attaches many labels to actors: event-based, blocking, statically typed, dynamically typed, and so on. When delving into the depths of the API, users may get lost in technical details and subtleties without grasping the big picture first.

In this guide, we first take a high-level look at the basic ideas and concepts behind the API before we discuss how the pieces fit together and delve into implementation details. We aim to provide a solid foundation for understanding the API and its design choices.

The Actor Model

Before we discuss API details of CAF, here is a quick refresher on actors in general. The original formulation of the Actor Model of computation by Hewitt, Bishop, and Steiger ties three things to an actor:

  1. Processing: CPU cycles and internal control flow.
  2. Communications: Sending and receiving messages.
  3. Storage: Member variables and other state.

Conceptually, we can imagine an actor as depicted in this diagram:

Figure: Conceptual view of an actor.

As the name suggests, an actor is an active software entity. It shields its inside from the outside world. Only messages may cross the boundary between inside and outside. No internal variable may be accessed from the outside, only the mailbox.

When holding a handle to an actor, you may do one of two things: send a message to the actor by enqueueing it into its mailbox or observe the lifetime of the actor by monitoring or linking to it.
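To make the inside/outside split concrete, here is a minimal sketch that uses only the C++ standard library. It is a toy model and not CAF code: the class toy_counter, its string messages "inc" and "print", and the run() member (standing in for the scheduler) are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Toy model of an actor (not CAF code): private state plus a mailbox.
class toy_counter {
public:
  // `out` plays the role of the console in this sketch.
  explicit toy_counter(std::vector<std::string>& out) : out_(out) {}

  // The only way in from the outside: put a message into the mailbox.
  void enqueue(std::string msg) {
    mailbox_.push(std::move(msg));
  }

  // Stands in for the scheduler giving the actor CPU time. Processes all
  // queued messages in FIFO order and returns how many were handled.
  std::size_t run() {
    std::size_t handled = 0;
    while (!mailbox_.empty()) {
      auto msg = std::move(mailbox_.front());
      mailbox_.pop();
      handle(msg);
      ++handled;
    }
    return handled;
  }

private:
  // Processing: reacts to one message at a time.
  void handle(const std::string& msg) {
    if (msg == "inc")
      ++count_;
    else if (msg == "print")
      out_.push_back("count = " + std::to_string(count_));
    // Unknown messages are silently dropped in this sketch.
  }

  std::queue<std::string> mailbox_; // Communications
  int count_ = 0;                   // Storage: invisible from the outside
  std::vector<std::string>& out_;
};
```

Callers can only enqueue messages. The counter itself is never read or written directly, which mirrors the rule that only messages cross the boundary.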

Handle Types: The Outside View

The conceptual view now allows us to categorize some of the building blocks in the CAF API. Remember, there is an inside and an outside. When standing outside of an actor, it appears as a black box with only a mailbox standing out for interacting with it. What kind of mailbox, though? That depends on the handle type:

                     strong              weak
  dynamically typed  actor
  statically typed   typed_actor<...>
  untyped            strong_actor_ptr    actor_addr

There are four different handle types in CAF in total. We ignore the strong/weak categorization for now. The most important difference between the handle types is: what kind of mailbox do you see on the outside?

The handle type actor gives you a dynamically typed mailbox on the outside. This means you can put any message into the mailbox. Whether the actor actually understands this input or not is decided once the actor processes the message. This means errors only happen at run-time.

The handle type typed_actor<...> gives you a statically typed mailbox on the outside. This means there is a list of allowed message types and you may not enqueue anything else. The compiler catches type errors for you. This comes at the cost of more boilerplate code.
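The difference between the two mailbox flavors can be sketched with standard library types alone. This is an analogy and not how CAF implements its handles: std::any stands in for a dynamically typed mailbox, std::variant for a statically typed one, and the names dynamic_receive, static_receive, static_message, and unrelated are hypothetical.

```cpp
#include <any>
#include <string>
#include <type_traits>
#include <variant>

// Dynamically typed: accepts anything and decides at run time.
// Returns a description of what happened to the message.
std::string dynamic_receive(const std::any& msg) {
  if (const auto* str = std::any_cast<std::string>(&msg))
    return "got string: " + *str;
  if (const auto* num = std::any_cast<int>(&msg))
    return "got int: " + std::to_string(*num);
  return "unexpected message"; // the error only shows up at run time
}

// Statically typed: the list of allowed message types is part of the type.
using static_message = std::variant<int, std::string>;

std::string static_receive(const static_message& msg) {
  if (const auto* str = std::get_if<std::string>(&msg))
    return "got string: " + *str;
  return "got int: " + std::to_string(std::get<int>(msg));
}

// A message type outside the list is rejected by the compiler.
struct unrelated {};
static_assert(!std::is_constructible_v<static_message, unrelated>);
static_assert(std::is_constructible_v<static_message, std::string>);
```

Passing a double to dynamic_receive compiles fine and yields "unexpected message" at run time, whereas an unrelated object cannot even be converted to static_message.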

Lastly, there are two untyped handle types. Here, you see no mailbox at all! The best way to think of these two is as a type-erased pointer to an actor. Usually, you need to restore type information via actor_cast before you can do anything useful with these handles.

Typing aside, CAF also distinguishes between strong and weak references to an actor. Actors in CAF are reference counted in order to enable the run-time system to detect and dispose of unreachable actors.

If they do not allow sending messages, what use case do the untyped handles have? Let us start with actor_addr, because it has a very specific use case: exit messages. When two actors are linked, CAF sends an exit_msg to the other if one of them terminates. CAF has no type information about the terminated actor. Further, exit messages cannot include a strong reference to the terminated actor, because this actor is probably already destroyed and we may no longer access its mailbox.

The most useful thing actor_addr has to offer is its operator==. Once an actor receives an exit message, it may compare the received actor_addr to some list or map of known actors. The same applies to down messages, which actors receive for the actors they monitor. For example, a supervisor may monitor its workers in order to re-spawn them on error. After receiving a down message, the supervisor iterates its list of workers to replace the matching entry with a newly spawned worker. More generally, actor_addr can be used to store a reference to an actor without keeping it alive.

Our second untyped handle is strong_actor_ptr. Unlike its weak counterpart, this handle type keeps actors from becoming unreachable. The most notable use case for strong_actor_ptr is for storing the sender of a message. CAF has no knowledge of whether a message came from a dynamically or statically typed actor. A receiver can reasonably assume that the sender is going to provide a handler for the response message (if any), but the receiver has no further knowledge regarding the type of the sender. Actors can access the source of a message when inside a message handler by calling self->current_sender(). If CAF used a weak handle for storing sender information, then an actor could become unreachable immediately after sending a message and before receiving the response. Aside from sender information, there remain only a few places in CAF where users may encounter a strong_actor_ptr. Notable examples include the actor registry and when communicating with the middleman actor.
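The strong/weak distinction behaves much like std::shared_ptr and std::weak_ptr. The following sketch relies on that analogy only; it does not use CAF's handle types, and toy_actor, strong_handle, weak_handle, same_actor, and find_worker are hypothetical names. It shows the two properties discussed above: a weak handle does not keep its target alive, yet it can still be compared against a list of known actors after the target is gone.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

struct toy_actor {
  std::string name;
};

using strong_handle = std::shared_ptr<toy_actor>; // keeps the actor alive
using weak_handle = std::weak_ptr<toy_actor>;     // only observes it

// True if both weak handles refer to the same actor, even if that actor
// has been destroyed in the meantime.
bool same_actor(const weak_handle& lhs, const weak_handle& rhs) {
  return !lhs.owner_before(rhs) && !rhs.owner_before(lhs);
}

// What a supervisor might do after being told that `who` terminated:
// returns the index of `who` in `workers`, or workers.size() if unknown.
std::size_t find_worker(const std::vector<weak_handle>& workers,
                        const weak_handle& who) {
  for (std::size_t i = 0; i < workers.size(); ++i)
    if (same_actor(workers[i], who))
      return i;
  return workers.size();
}
```

Once the last strong_handle is released, lock() on a weak_handle yields a null pointer, while find_worker still locates the matching entry.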

Implementation Types: The Inside View

At first glance, there seem to be many actor types in CAF. However, we can prune some choices right away by following this simple rule: do not implement blocking actors! If you ever find yourself in need of a blocking way to interact with other actors, then use scoped_actor. If you need an actor to have its own thread of execution, then spawn it using the detached flag. Consider blocking_actor an implementation detail.

When reading the Actors Section in the official Manual without having a solid grasp of terminology and concepts, the various actor types in the class hierarchy may be confusing.

Luckily, the picture drastically simplifies if we just focus on the choices we need to make when implementing an actor. We really have only one choice to make: what type do we need to assign to self. The concept of self is analogous to the this pointer in object-oriented programming and allows actors to refer back to themselves. Furthermore, CAF will also look at our choice for self to pick the appropriate handle type for the actor, i.e., actor or typed_actor<...>.

So ultimately, we have to distinguish between dynamically and statically typed actors. For dynamically typed actors, self is of type event_based_actor*. For statically typed actors, self is of type typed_actor<...>::pointer_view. In part 1, we will only look at the dynamically typed actors, i.e., self will be of type event_based_actor*. In part 2, we will discuss statically typed actors.

To break things down further, we can step back and remember the three things an actor encapsulates: Processing, Communications, and Storage.

Processing

The Processing part is the control flow of an actor. Actors in CAF are event-based, meaning they follow a simple state machine:

  1. Wait for an event (usually a message).
  2. Process the event.
  3. Terminate when done, otherwise go to 1.

With this in mind, we can look at our first example.

Source Code

void my_actor_impl(caf::event_based_actor* self) {
  self->println("initialize my_actor");
}

void caf_main(caf::actor_system& sys) {
  sys.spawn(my_actor_impl);
}

Output

initialize my_actor

The simplest way to write an actor is to provide a free function. It optionally takes a self argument as its first parameter. If we omit the self parameter, CAF will assign the actor a dynamically typed handle.

The function body of my_actor_impl is called to handle the first event for the actor: the implicit init event. Here, we print a message to the console and do nothing else. Hence, the actor terminates immediately after the first event.

So how can we tell CAF what to do with the next event? Any other event besides the init event is a message. We can tell an actor how to process a message by providing a behavior. In other words, defining what to do in step 2 of the state machine. Actors may set a behavior by calling self->become(...), as shown in our next example.

Source Code

void my_actor_impl(caf::event_based_actor* self) {
  self->println("initialize my_actor");
  self->become(
    [self](const std::string& str) {
      self->println("my_actor received: {}", str);
    }
  );
}

void caf_main(caf::actor_system& sys) {
  auto my_actor = sys.spawn(my_actor_impl);
  caf::anon_mail("hello!").send(my_actor);
}

Output

initialize my_actor
my_actor received: hello!

Because actors will almost always set a behavior, there is a shortcut in CAF to define the initial behavior. Instead of calling self->become(...) in the function body, we can simply return a caf::behavior, as shown below.

Source Code

caf::behavior my_actor_impl(caf::event_based_actor* self) {
  self->println("initialize my_actor");
  return {
    [self](const std::string& str) {
      self->println("my_actor received: {}", str);
    },
  };
}

void caf_main(caf::actor_system& sys) {
  auto my_actor = sys.spawn(my_actor_impl);
  caf::anon_mail("hello!").send(my_actor);
}

Output

initialize my_actor
my_actor received: hello!

These two examples are equivalent, but the second one is more idiomatic. This time, we use the init event to tell the actor how to proceed with the next events, i.e., messages. The actor will use its behavior to process any incoming messages until it terminates. However, we did not tell it to terminate in our example. So why does the application terminate? This goes back to our observation that there are strong and weak references to actors. The actor system will automatically terminate any actor once its strong reference count drops to zero. In our example, the variable my_actor holds a strong reference to the actor. Once my_actor goes out of scope, the reference count decreases by one. We also create a strong reference to the actor by sending it a message. As long as there is at least one message in the mailbox, the scheduler in CAF will hold a strong reference to the actor. Once the mailbox is empty and the actor is no longer running, CAF will decrease the reference count by one. So in our example, the actor will be cleaned up as unreachable after caf_main returns and the message has been processed.

We can also explicitly tell an actor to terminate by calling self->quit(). The actor will then discard any further messages and terminate after processing the current one, as shown in the next example.

Source Code

caf::behavior my_actor_impl(caf::event_based_actor* self) {
  self->println("initialize my_actor");
  return {
    [self](const std::string& str) {
      self->println("my_actor received: {}", str);
      self->quit();
    },
  };
}

void caf_main(caf::actor_system& sys) {
  auto my_actor = sys.spawn(my_actor_impl);
  caf::anon_mail("message 1").send(my_actor);
  caf::anon_mail("message 2").send(my_actor);
}

Output

initialize my_actor
my_actor received: message 1

Although we are sending two messages to the actor, it will only process the first one and then terminate. The second message will be discarded.
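The interplay between the event loop and quitting can be approximated in a few lines of standard C++. This is a toy model under simplifying assumptions, not CAF's scheduler: toy_loop is a hypothetical class, its quit() merely sets a flag, and "waiting" for an event is reduced to checking whether the mailbox is empty.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Toy event loop (not CAF's scheduler) for the three-step state machine.
class toy_loop {
public:
  using handler = std::function<void(toy_loop&, const std::string&)>;

  explicit toy_loop(handler behavior) : behavior_(std::move(behavior)) {}

  void enqueue(std::string msg) {
    mailbox_.push_back(std::move(msg));
  }

  // Marks the loop as done; takes effect after the current message.
  void quit() {
    done_ = true;
  }

  // Runs until the mailbox is empty or quit() was called. Returns the
  // number of messages that were discarded without being processed.
  std::size_t run() {
    while (!done_ && !mailbox_.empty()) { // 1. wait for an event
      auto msg = std::move(mailbox_.front());
      mailbox_.pop_front();
      behavior_(*this, msg);              // 2. process the event
    }                                     // 3. terminate or go to 1
    auto discarded = mailbox_.size();
    mailbox_.clear();
    return discarded;
  }

private:
  handler behavior_;
  std::deque<std::string> mailbox_;
  bool done_ = false;
};
```

With two queued messages and a handler that calls quit(), run() processes the first message and reports one discarded message, matching the output above.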

One note on self->quit(): the actor will terminate with exit reason normal. The function also takes an optional argument to specify a different exit reason in case the actor terminates due to an error. As part of the termination process, CAF sends the exit reason to all linked actors and monitors (to learn more about that topic, please read our guide on Monitoring and Linking).

Communications

We have seen how to define the control flow of an actor. The second part of the actor model is about Communications. In CAF, actors communicate by sending messages. We have already seen how to send a message to an actor by using anon_mail. However, we usually use self->mail(...) to initiate a message. There are generally two styles of messages: fire-and-forget and request-response.

A fire-and-forget message is sent without expecting a response. The sender does not wait for a response and any message that the receiver sends back is processed by the regular actor behavior. In contrast, a request-response message has a timeout and an explicit response handler.

Our next example shows how to send both styles of messages and how actors process responses.

Source Code

using namespace std::literals; // enables the 1s duration literal below

caf::behavior adder_impl() {
  return {
    // The returned sum is sent back to the sender automatically.
    [](int32_t x, int32_t y) {
      return x + y;
    },
  };
}

caf::behavior client1_impl(caf::event_based_actor* self, caf::actor adder) {
  // Fire-and-forget: the reply arrives at the regular behavior.
  self->mail(int32_t{1}, int32_t{2}).send(adder);
  return {
    [self](int32_t y) {
      self->println("client1 received: {}", y);
    },
  };
}

caf::behavior client2_impl(caf::event_based_actor* self, caf::actor adder) {
  // Request-response: the reply arrives at the one-shot response handler.
  self->mail(int32_t{1}, int32_t{2}).request(adder, 1s).then([self](int32_t y) {
    self->println("client2 received: {} (response)", y);
  });
  return {
    [self](int32_t y) {
      self->println("client2 received: {} (behavior)", y);
    },
  };
}

void caf_main(caf::actor_system& sys) {
  auto adder = sys.spawn(adder_impl);
  sys.spawn(client1_impl, adder);
  sys.spawn(client2_impl, adder);
}

Output

client1 received: 3
client2 received: 3 (response)

When looking at the adder_impl, we can see that there is no explicit messaging at all. The actor simply returns the sum of two integers. When returning a value from a message handler, CAF automatically sends the value back to the sender. This mechanism allows CAF to correlate input and output messages correctly.

In client 1, we use self->mail(...).send(...) to send a fire-and-forget message to the adder. The response is processed by the regular behavior of the client and it will print client1 received: 3.

In client 2, we use self->mail(...).request(...).then(...) instead. This will cause CAF to send a request-response message to the adder. The client will wait up to one second for a response. If the response arrives in time, the client will print client2 received: 3 (response). Otherwise, it will terminate with an error by default. Optionally, we could pass a second lambda to then that takes a caf::error as argument to override the default error handling. From the output, we can see that client 2 will use the response handler when the message from the adder arrives and the handler from the regular behavior will not be called.

We can also call await instead of then for passing a response handler. The difference is that await will suspend the regular actor behavior until the response arrives while then will multiplex the response handler with the regular behavior. Usually, await is only necessary to enforce a specific ordering of messages and we will use then in most cases.

As we can see, the request-response style of messaging directly feeds into the control flow of an actor.

Storage

Now, we know how to define the control flow of an actor and how to communicate with other actors. The last part of the actor model is about Storage. Here, we mean Storage in the sense of any state that an actor may hold.

We have actually already seen one way of holding on to some state in an actor: the capture list of the lambda expressions that we have used to define the behavior. We captured self in the lambda expressions, but we could have captured any other state needed by a particular message handler.

To illustrate this, we implement a simple cell actor that stores a single 32-bit integer. The actor provides two message handlers: one to get the current value and one to set a new value.

Source Code

caf::behavior cell_impl(caf::event_based_actor* self) {
  auto value = std::make_shared<int32_t>(0);
  return {
    [value](caf::get_atom) {
      return *value;
    },
    [self, value](caf::put_atom, int32_t new_value) {
      self->println("cell changes its value from {} to {}", *value, new_value);
      *value = new_value;
    },
  };
}

using namespace std::literals; // enables the 1s duration literal below

void client_impl(caf::event_based_actor* self, caf::actor cell) {
  self->mail(caf::put_atom_v, int32_t{1}).send(cell);
  self->mail(caf::put_atom_v, int32_t{2}).send(cell);
  self->mail(caf::put_atom_v, int32_t{3}).send(cell);
  self->mail(caf::get_atom_v).request(cell, 1s).then([self](int32_t value) {
    self->println("client received: {}", value);
  });
}

void caf_main(caf::actor_system& sys) {
  auto cell = sys.spawn(cell_impl);
  sys.spawn(client_impl, cell);
}

Output

cell changes its value from 0 to 1
cell changes its value from 1 to 2
cell changes its value from 2 to 3
client received: 3

Since multiple message handlers access the same state, we use a shared pointer to store the value. While this is a simple way to store state, it is not the most efficient and it gets very cumbersome and hard to maintain when we need to store more complex state.
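Why the shared pointer is needed becomes obvious when we compare it with capturing a plain integer by value. The following sketch is independent of CAF and uses std::function in place of message handlers; cell_handlers, make_broken_cell, and make_shared_cell are hypothetical names.

```cpp
#include <functional>
#include <memory>

// Stand-ins for the two message handlers of the cell.
struct cell_handlers {
  std::function<int()> get;
  std::function<void(int)> put;
};

// Each lambda stores its own copy of `value`, so put modifies a copy
// that get never sees.
cell_handlers make_broken_cell() {
  int value = 0;
  return {
    [value] { return value; },
    [value](int new_value) mutable { value = new_value; },
  };
}

// Both lambdas copy the shared_ptr, but the copies point to the same
// integer, which lives for as long as any handler does.
cell_handlers make_shared_cell() {
  auto value = std::make_shared<int>(0);
  return {
    [value] { return *value; },
    [value](int new_value) { *value = new_value; },
  };
}
```

Capturing a local variable by reference is not an option either, because the variable goes out of scope as soon as the function returns the handlers.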

CAF allows us to store state in a more structured way by writing a state class. A state class is a user-defined class that must provide a (public) make_behavior member function to initialize the actor. CAF automatically constructs and destructs the state object when the actor starts and stops. To spawn an actor from a state class, we use the actor_from_state utility and pass the state class as template argument, as shown in our next example.

Source Code

struct cell_state {
  cell_state(caf::event_based_actor* ptr, int32_t init_value)
    : self(ptr), value(init_value) {
    // nop
  }

  caf::behavior make_behavior() {
    return {
      [this](caf::get_atom) {
        return value;
      },
      [this](caf::put_atom, int32_t new_value) {
        self->println("cell changes its value from {} to {}", value, new_value);
        value = new_value;
      },
    };
  }

  caf::event_based_actor* self = nullptr;
  int32_t value = 0;
};

using namespace std::literals; // enables the 1s duration literal below

void client_impl(caf::event_based_actor* self, caf::actor cell) {
  self->mail(caf::put_atom_v, int32_t{1}).send(cell);
  self->mail(caf::put_atom_v, int32_t{2}).send(cell);
  self->mail(caf::put_atom_v, int32_t{3}).send(cell);
  self->mail(caf::get_atom_v).request(cell, 1s).then([self](int32_t value) {
    self->println("client received: {}", value);
  });
}

void caf_main(caf::actor_system& sys) {
  auto cell = sys.spawn(caf::actor_from_state<cell_state>, -1);
  sys.spawn(client_impl, cell);
}

Output

cell changes its value from -1 to 1
cell changes its value from 1 to 2
cell changes its value from 2 to 3
client received: 3

Our implementation for client_impl did not change. However, this time we have a new cell_state class that encapsulates the state of the cell actor. Now, message handlers only need to capture this to get access to all member variables of the state class. This makes it easier to maintain and extend the state of an actor. The state class also provides the make_behavior member function to initialize the actor.

In our example, the constructor of cell_state takes the self pointer and the initial value of the cell. The self pointer is optional. The rules for actor_from_state are straightforward:

  • The type for self is deduced from the signature of the make_behavior member function. If the make_behavior member function returns a caf::behavior, then self is of type caf::event_based_actor*. In part 2, we will also see examples with statically typed actors.
  • Any additional arguments args... that follow after the actor_from_state argument are passed to the constructor of the state class. CAF will try to find a constructor for the state class by first trying to construct the state with (self, args...). If no such constructor exists, CAF will try to construct the state with just (args...).
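The constructor lookup described in the second rule can be expressed with a type trait and if constexpr. The sketch below illustrates the rule and is not taken from CAF's implementation of actor_from_state: toy_self is a placeholder for the actor type, and make_state, with_self, and without_self are hypothetical. It requires C++17.

```cpp
#include <type_traits>
#include <utility>

struct toy_self {}; // placeholder for the actor type

// Tries State(self, args...) first and falls back to State(args...).
template <class State, class... Args>
State make_state(toy_self* self, Args&&... args) {
  if constexpr (std::is_constructible_v<State, toy_self*, Args&&...>) {
    return State(self, std::forward<Args>(args)...);
  } else {
    static_assert(std::is_constructible_v<State, Args&&...>,
                  "State must accept (self, args...) or (args...)");
    return State(std::forward<Args>(args)...);
  }
}

// A state class that wants the self pointer.
struct with_self {
  with_self(toy_self* ptr, int init) : self(ptr), value(init) {}
  toy_self* self;
  int value;
};

// A state class that does not need it.
struct without_self {
  explicit without_self(int init) : value(init) {}
  int value;
};
```

Calling make_state<with_self>(&self, -1) selects the two-argument constructor, while make_state<without_self>(&self, 42) silently drops the self pointer and passes only 42.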

Next Up

In this part, we have learned how to implement actors in an idiomatic way. From simple actors implemented as a free function to more complex actors implemented using a state class. From actors that terminate after processing the init event to actors with more complex behavior that multiplex regular message handlers with one-shot response handlers.

In part 2, we will re-visit some of the examples from this part and learn how to implement them with statically typed actors.