Configuration


Introduction

Barely any application runs without a user-provided configuration. At the very least, a distributed system needs to allow users to specify how to reach remote nodes in the network or how other nodes in the network may reach this node.

CAF makes it easy to configure various aspects of applications and also allows users to fine-tune CAF components. As we will see, CAF uses a simple and yet powerful API to handle configurations. Whether the user provides a configuration file, a command line argument, or environment variables, CAF can handle it all.

The Actor System Configuration

The central customization point in CAF is the actor_system_config. On this class, we can register various settings before creating the actor system. Once CAF has parsed the user-provided configuration, it stores the settings in the actor_system_config object. This object is then passed to the actor system constructor.

Of course, the actor_system_config does not only store the user-provided settings. It also allows us to read settings programmatically, not just for primitive types but also for arbitrary user-defined types, as long as they provide an inspect function.

To retrieve values from a configuration object, CAF offers functions such as get_as and get_or. These functions allow us to convert the stored values to the desired type. If the requested value does not exist or the type conversion fails, CAF either provides an error when using get_as or falls back to a default value when using get_or.

Adding Custom Options

The idiomatic way to run a CAF application with user-defined settings is to implement a subclass of actor_system_config. This subclass may add custom options to the configuration object in the constructor. CAF already adds the default options to the object in the parent constructor.

To see the pieces in motion, let us consider an example service that needs to connect to a database and listens for clients on a specific port. We want to have the environment variables DB_PORT, DB_HOST, and DB_TABLE to override the default values. To register our custom options, we subclass actor_system_config and add the options in the constructor as follows:

class my_config : public caf::actor_system_config {
public:
  my_config() {
    opt_group{custom_options_, "global"}
      .add<bool>("verbose", "enable additional output");
    opt_group{custom_options_, "server"}
      .add<std::string>("listen-address,l", "the optional listen address")
      .add<uint16_t>("port,p", "the port to listen on");
    opt_group{custom_options_, "database"}
      .add<uint16_t>("port,,DB_PORT", "the port to connect to for the database")
      .add<std::string>("host,,DB_HOST", "the database host")
      .add<std::string>("table,,DB_TABLE", "the name of the table to use");
  }
};

When passing a name for an option, CAF expects a comma-separated list of arguments. The first argument is the long name, the second argument is a series of short names (where each character represents a short name), and the third argument is the environment variable name. Only the long name is mandatory.
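To make the naming rule concrete, the following stand-alone sketch splits such a name string into its three parts. It is not CAF's parser: the type option_names and the function split_option_names are hypothetical names chosen for illustration, and the sketch only uses the standard library.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper type for illustration, not part of CAF.
struct option_names {
  std::string long_name;   // mandatory
  std::string short_names; // each character is one short name (may be empty)
  std::string env_name;    // environment variable name (may be empty)
};

// Splits "long,shorts,ENV" at the first two commas.
option_names split_option_names(std::string_view spec) {
  option_names result;
  auto first = spec.find(',');
  result.long_name = std::string{spec.substr(0, first)};
  assert(!result.long_name.empty() && "the long name is mandatory");
  if (first == std::string_view::npos)
    return result; // only a long name was given
  auto rest = spec.substr(first + 1);
  auto second = rest.find(',');
  result.short_names = std::string{rest.substr(0, second)};
  if (second != std::string_view::npos)
    result.env_name = std::string{rest.substr(second + 1)};
  return result;
}
```

For "port,,DB_PORT", this sketch yields the long name port, no short names, and the environment variable name DB_PORT.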

CAF organizes options in a tree-like structure. Each option group is a node in the tree. The root node is the actor_system_config object itself. When passing global as the group name, the options will be stored in the root node directly.

In our example, the options are organized as follows:

  • root:
    • verbose: a boolean flag to enable additional output
    • server:
      • listen-address: the optional listen address
      • port: the port to listen on
    • database:
      • port: the port to connect to for the database
      • host: the database host
      • table: the name of the table to use

Writing Configuration Files

The tree-based structure of the configuration object is also reflected in the format for the configuration files. For our custom configuration object from above, a configuration file would look like this in JSON:

{
  "verbose": true,
  "server": {
    "listen-address": "127.0.0.1",
    "port": 8080
  },
  "database": {
    "port": 3306,
    "host": "localhost",
    "table": "my_table"
  }
}

JSON is a good choice if you generate your configuration programmatically. However, if configurations are written by hand, JSON is not the most user-friendly format. Luckily, CAF supports a more streamlined syntax for configuration files. We simply drop the outermost braces, omit quotes for keys, use '=' instead of ':', and get rid of unnecessary commas:

verbose = true
server = {
  listen-address = "127.0.0.1"
  port = 8080
}
database = {
  port = 3306
  host = "localhost"
  table = "my_table"
}

These two examples are equivalent. The second one is simply more concise. Of course, the user can also input the configuration via environment variables as well as on the command line. We will get to this later. First, we want to see how we can use custom types in our configuration to consume sub-trees of the configuration in one go.

Reading Custom Types from the Configuration

In order to read custom types from the configuration, we need to implement an inspect function for the type. This is the same mechanism that allows CAF to serialize and deserialize custom types on the network.

First, we define a server_config struct that holds the server configuration.

struct server_config {
  uint16_t port;
  std::optional<std::string> listen_address;
};

template <class Inspector>
bool inspect(Inspector& f, server_config& x) {
  return f.object(x).fields(f.field("port", x.port),
                            f.field("listen-address", x.listen_address));
}

Next, we define a database_config struct that holds the database configuration.

struct database_config {
  uint16_t port;
  std::optional<std::string> host;
  std::string table;
};

template <class Inspector>
bool inspect(Inspector& f, database_config& x) {
  return f.object(x).fields(f.field("port", x.port),
                            f.field("host", x.host),
                            f.field("table", x.table));
}

As you probably have noticed, we have defined some member variables as std::optional. This tells CAF that a missing value is not an error. If the user does not provide a value for server_config::listen_address or database_config::host, CAF will simply leave the member variable empty.
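The difference between required and optional fields can be sketched without CAF. The snippet below is a simplified model, not the inspection API: the names field_value, field_dict, toy_server_config, and read_server are hypothetical, and the range check for the port is left out for brevity.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <variant>

// Hypothetical stand-ins for a parsed configuration sub-tree.
using field_value = std::variant<int64_t, std::string>;
using field_dict = std::map<std::string, field_value>;

struct toy_server_config {
  uint16_t port = 0;
  std::optional<std::string> listen_address;
};

// Returns nullopt if a required field is missing or has the wrong type.
std::optional<toy_server_config> read_server(const field_dict& dict) {
  toy_server_config result;
  // "port" is required: a missing entry is an error.
  auto port = dict.find("port");
  if (port == dict.end())
    return std::nullopt;
  auto* port_val = std::get_if<int64_t>(&port->second);
  if (port_val == nullptr)
    return std::nullopt;
  result.port = static_cast<uint16_t>(*port_val); // range check omitted
  // "listen-address" is optional: a missing entry leaves the member empty.
  auto addr = dict.find("listen-address");
  if (addr != dict.end()) {
    auto* addr_val = std::get_if<std::string>(&addr->second);
    if (addr_val == nullptr)
      return std::nullopt;
    result.listen_address = *addr_val;
  }
  return result;
}
```

With only a port entry, read_server succeeds and leaves listen_address empty. Without a port entry, it fails, which mirrors the behavior described above.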

With these two structs in place, we can start playing with our custom configuration. We define a caf_main function that takes the actor system and our custom configuration as arguments. CAF will look at the type of the second parameter and automatically use that configuration type to initialize the actor system. Then, we will use get_as and get_or to give a couple of examples of how to read values from the configuration.

void caf_main(caf::actor_system& sys, const my_config& cfg) {
  // Get the verbose flag, defaults to false.
  sys.println("verbose: {}", caf::get_or(cfg, "verbose", false));
  // Get the server configuration in one go.
  if (auto server = caf::get_as<server_config>(cfg, "server")) {
    sys.println("server: port = {}, listen-address = {}",
                server->port, server->listen_address);
  } else {
    sys.println("no valid server config available");
  }
  // Get the database configuration in one go.
  if (auto db = caf::get_as<database_config>(cfg, "database")) {
    sys.println("database: port = {}, host = {}, table = {}",
                db->port, db->host, db->table);
  } else {
    sys.println("no valid database config available");
  }
}

CAF_MAIN()

Running the Example

If we compile our example application from before as example1 and then run it without arguments, we will see:

$ ./example1
verbose: false
no valid server config available
no valid database config available

The only field required for the server configuration is the port. We can set this parameter on the command line and see how the output changes:

$ ./example1 -p 8080
verbose: false
server: port = 8080, listen-address = null
no valid database config available

We can also use the long name for the port option:

$ ./example1 --server.port=8080
verbose: false
server: port = 8080, listen-address = null
no valid database config available

Note that the long options support both --option=value and --option value:

$ ./example1 --server.port 8080
verbose: false
server: port = 8080, listen-address = null
no valid database config available

Now, let's pass the JSON configuration from above as a command line argument:

$ ./example1 --config-file example1.json
verbose: true
server: port = 8080, listen-address = *"127.0.0.1"
database: port = 3306, host = *"localhost", table = my_table

Note: CAF renders values with a nullable type using a * prefix, e.g., *"localhost" to indicate that this value could be null.

The config-file option is an implicit option that CAF provides. By default, CAF also provides a help option with short names -h and -? as well as a few other default options. Running the binary with -h prints the following help text for our example:

$ ./example1 -h
database options:
  --database.port=<uint16_t>                 : the port to connect to for the database
  --database.host=<std::string>              : the database host
  --database.table=<std::string>             : the name of the table to use

global options:
  (-h|-?|--help)                             : print help text to STDERR and exit
  --long-help                                : same as --help but list options that are omitted by default
  --dump-config                              : print configuration and exit
  --config-file=<std::string>                : sets a path to a configuration file
  --verbose                                  : enable additional output

server options:
  (-l|--server.listen-address) <std::string> : the optional listen address
  (-p|--server.port) <uint16_t>              : the port to listen on

The last way to pass the configuration is via environment variables. By default, CAF converts the option name to all-uppercase and separates words with an underscore. We didn't explicitly set an environment variable name for the server.port option, so CAF will auto-generate one:

$ export SERVER_PORT=8080
$ ./example1
verbose: false
server: port = 8080, listen-address = null
no valid database config available
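The naming rule for auto-generated environment variables can be approximated with a few lines of standard C++. This is a sketch under assumptions, not CAF's implementation: the function default_env_name is hypothetical, and both the mapping of characters such as '-' to an underscore and the omission of a prefix for the global group are guesses that the text above does not confirm. Only the server.port to SERVER_PORT case is taken from the example.

```cpp
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical helper that builds a default environment variable name.
std::string default_env_name(std::string_view group, std::string_view name) {
  std::string result;
  auto append = [&result](std::string_view str) {
    for (unsigned char c : str) {
      // Assumption: anything that is not alphanumeric becomes '_'.
      result += std::isalnum(c) ? static_cast<char>(std::toupper(c)) : '_';
    }
  };
  // Assumption: options in the "global" group get no group prefix.
  if (group != "global") {
    append(group);
    result += '_';
  }
  append(name);
  return result;
}
```

Calling default_env_name("server", "port") returns SERVER_PORT, matching the variable we exported above.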

For the database options, we did override the default environment variable names with DB_PORT, DB_HOST, and DB_TABLE. When setting these environment variables, we see the following output:

$ export DB_HOST=127.0.0.1
$ export DB_TABLE=foo
$ export DB_PORT=1234
$ ./example1
verbose: false
no valid server config available
database: port = 1234, host = *"127.0.0.1", table = foo

Before we move on to the next section, let's see what happens if we provide a configuration file and environment variables:

$ export DB_HOST=127.0.0.1
$ export DB_TABLE=foo
$ export DB_PORT=1234
$ ./example1 --config-file example1.json
verbose: true
server: port = 8080, listen-address = *"127.0.0.1"
database: port = 1234, host = *"127.0.0.1", table = foo

Remember, our configuration file sets the database port to 3306, the host to localhost, and the table to my_table. As we can see, the environment variables override the configuration file.

When parsing the configuration, CAF will use the following order of precedence:

  1. Command line arguments
  2. Environment variables
  3. Configuration file

The command line arguments have the highest precedence, as we can see in the following example:

$ export DB_HOST=127.0.0.1
$ export DB_TABLE=foo
$ export DB_PORT=1234
$ ./example1 --config-file example1.json --database.port=2200
verbose: true
server: port = 8080, listen-address = *"127.0.0.1"
database: port = 2200, host = *"127.0.0.1", table = foo
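The precedence order boils down to "first source that provides a value wins". The following sketch models this with std::optional. The function resolve is a hypothetical name, and the sketch says nothing about how CAF merges the sources internally.

```cpp
#include <optional>
#include <string>

using maybe_string = std::optional<std::string>;

// Picks the value from the source with the highest precedence:
// command line first, then environment, then configuration file.
maybe_string resolve(const maybe_string& from_cli, const maybe_string& from_env,
                     const maybe_string& from_file) {
  if (from_cli)
    return from_cli;
  if (from_env)
    return from_env;
  return from_file; // may be empty if no source provides a value
}
```

For the database port in the last run, the three sources provide 2200, 1234, and 3306, so this sketch picks 2200.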

Using and Extending Dump Config

The --dump-config option is a built-in option that prints the configuration and exits. The output of this option is generated from the dump_content member function. The output includes any options that were set by the user, as well as the default values if available.

In order to customize the output of --dump-config, we can override the dump_content member function. For example, we could use port 8080 as the default port for the server (by always using get_or with a default value of 8080). To show this default value in the output of --dump-config, we need to override the dump_content member function and add the default value to the output.

The following code snippet shows how to override the dump_content member function. By calling super::dump_content(), we first retrieve the default output. Then, we extend the output with caf::put_missing (which will not modify the dictionary if a value is already present) and return it.

class my_config : public caf::actor_system_config {
public:
  using super = actor_system_config;

  my_config() {
    opt_group{custom_options_, "global"}
      .add<bool>("verbose", "enable additional output");
    opt_group{custom_options_, "server"}
      .add<std::string>("listen-address,l", "the optional listen address")
      .add<uint16_t>("port,p", "the port to listen on");
    opt_group{custom_options_, "database"}
      .add<uint16_t>("port,,DB_PORT", "the port to connect to for the database")
      .add<std::string>("host,,DB_HOST", "the database host")
      .add<std::string>("table,,DB_TABLE", "the name of the table to use");
  }

  caf::settings dump_content() const override {
    auto result = super::dump_content();
    auto& server = result["server"].as_dictionary();
    caf::put_missing(server, "port", 8080);
    return result;
  }
};

The type settings is a dictionary that maps strings to config_value, which is a recursive type that can hold primitive values as well as lists and dictionaries. We will see how to work with config_value in more detail in the next section.

Now, when we run the application with the --dump-config option, we see the default port value in the output:

$ ./example2 --dump-config
server {
  port = 8080
}
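The "insert only if missing" behavior of put_missing is easy to model with a plain std::map. The sketch below uses hypothetical names (flat_settings, with_defaults) and flat string values instead of config_value, so it only illustrates the semantics, not the CAF API.

```cpp
#include <map>
#include <string>

// Hypothetical flat stand-in for caf::settings.
using flat_settings = std::map<std::string, std::string>;

// Adds each default only if the user did not provide a value for the key.
flat_settings with_defaults(flat_settings user, const flat_settings& defaults) {
  for (const auto& [key, value] : defaults)
    user.try_emplace(key, value); // no-op if the key already exists
  return user;
}
```

If the user sets the port to 9090, the merged result keeps 9090. The default 8080 only shows up when the user provides no port.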

Config Values

The class config_value represents a primitive value (numbers, booleans, strings), none (no value), a list of config_value objects, or a dictionary mapping strings to config_value objects.

The type settings that we have seen earlier is an alias for dictionary<config_value>, where dictionary is a std::map-like type that always uses strings as keys.

Usually, we do not work with config_value directly. Instead, we call the functions get_as and get_or on the actor_system_config object. However, some cases require working with config_value directly, for example when overriding the dump_content member function. If you are interested in the details of the config_value API, read on! Otherwise, feel free to skip right to the conclusion.

Basics

Much like a regular std::variant, a config_value accepts any input in its constructor that it can convert to one of its types. For example, we can define the three config values x, y and z as integer, floating point and string as follows:

Source Code

auto x = caf::config_value{1};
auto y = caf::config_value{2.0};
auto z = caf::config_value{"three"};
sys.println("x = {}", x);
sys.println("y = {}", y);
sys.println("z = {}", z);

Output

x = 1
y = 2
z = three

Users may treat config_value as a simple sum type similar to std::variant by using functions like get, get_if, and holds_alternative. However, most of the time we will use the function pair get_as and get_or that we have seen in the previous sections.

On-the-fly Conversions with get_as

Configuration values never exist in a vacuum. They typically represent input by the user. That input may be a string that actually represents a timespan. Luckily, the function get_as is the Swiss Army knife of type conversions. The function takes one template parameter T that represents the target type and it returns expected<T>. An expected<T> represents an optional value, but unlike std::optional it carries an error if no value exists or if the conversion fails.

Source Code

auto x = caf::config_value{"5s"};
if (auto ts = caf::get_as<caf::timespan>(x))
  sys.println("ts = {}", *ts);
else
  sys.println("oops: {}", ts.error());

Output

ts = 5s

CAF knows how to convert the string "5s" into a timespan. It also knows how to convert numbers in case a user typed in an integer while the system expects a double. Basically, CAF may perform all sorts of type conversions as long as the target type may reasonably hold the value. CAF will also perform bound checks for integral types. For example, if the user inputs a number that is too large to fit into a 16-bit integer, the conversion will fail:

Source Code

auto x = caf::config_value{42};
if (auto narrow_x = caf::get_as<uint16_t>(x))
  sys.println("narrow_x = {}", *narrow_x);
else
  sys.println("oops: {}", narrow_x.error());
auto y = caf::config_value{1'000'000};
if (auto narrow_y = caf::get_as<uint16_t>(y))
  sys.println("narrow_y = {}", *narrow_y);
else
  sys.println("oops: {}", narrow_y.error());

Output

narrow_x = 42
oops: conversion_failed("narrowing error")

The function get_as only performs safe conversions, in this case by performing bound checks. Hence, only the conversion for 42 succeeds, because 1,000,000 does not fit into 16 bits!
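A bound check of this kind can be sketched in plain C++ as follows. The function narrow_to_u16 is a hypothetical name and uses std::optional instead of expected<T>, so it drops the error information that get_as would carry.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Converts to uint16_t only if the value fits, otherwise returns nullopt.
std::optional<uint16_t> narrow_to_u16(int64_t value) {
  if (value < 0 || value > std::numeric_limits<uint16_t>::max())
    return std::nullopt; // out of range: refuse instead of truncating
  return static_cast<uint16_t>(value);
}
```

In this sketch, 42 converts successfully, whereas 1'000'000 and any negative number produce an empty result.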

While converting between builtin types is neat, the real power of get_as comes from the fact that it tightly integrates with the type inspection API! We have already seen this in action when we converted a dictionary to a server_config object. However, CAF can go even further and first convert a string to a dictionary and then convert that dictionary to a custom type. All in one shot.

For our next example, we will use this simple point_2d struct:

struct point_2d {
  int32_t x;
  int32_t y;
};

template <class Inspector>
bool inspect(Inspector& f, point_2d& x) {
  return f.object(x).fields(f.field("x", x.x), f.field("y", x.y));
}

Then, we can read a point_2d directly from a config_value that holds a string (representing a dictionary) as follows:

Source Code

auto x = caf::config_value{"{x = 12, y = 21}"};
if (auto point = caf::get_as<point_2d>(x))
  sys.println("got a point: ({}, {})", point->x, point->y);
else
  sys.println("oops: {}", point.error());

Output

got a point: (12, 21)

We have also used get_or before. The only thing left worth mentioning is that get_or will use get_as internally. Hence, it can do all the conversions that get_as can do. The only difference is that get_or will return the fallback value if the conversion fails.
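The relationship between the two functions can be illustrated with a toy pair that converts strings to integers. The names toy_get_as_int and toy_get_or_int are hypothetical, and the sketch uses std::optional where CAF uses expected<T>.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Toy counterpart to get_as: succeeds only if the entire input is an integer.
std::optional<int> toy_get_as_int(std::string_view input) {
  int result = 0;
  auto* last = input.data() + input.size();
  auto [ptr, ec] = std::from_chars(input.data(), last, result);
  if (ec != std::errc{} || ptr != last)
    return std::nullopt;
  return result;
}

// Toy counterpart to get_or: delegates to the function above and falls back
// to the default if the conversion fails.
int toy_get_or_int(std::string_view input, int fallback) {
  if (auto value = toy_get_as_int(input))
    return *value;
  return fallback;
}
```

Here, toy_get_or_int("8080", 0) returns 8080, while toy_get_or_int("abc", 42) returns the fallback 42.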

Lists

Aside from storing single values, the type config_value can also store lists. Each element in the list is a config_value again. Hence, we can nest lists arbitrarily.

Creating Lists

The constructor of config_value is explicit to stop the compiler from automatically converting values to config_value everywhere. However, this makes initializing a list of config values cumbersome:

// Note: caf::config_value::list xs{1, 2, 3}; -- will not compile
auto xs = caf::config_value::list{caf::config_value{1},
                                  caf::config_value{2},
                                  caf::config_value{3}};
sys.println("{}", xs);

The above snippet prints [1, 2, 3], but it requires a lot of boilerplate code to initialize the list. Constructing a config value from a list adds even more boilerplate code, because we need to wrap the entire initialization again:

auto xs = caf::config_value{caf::config_value::list{caf::config_value{1},
                                                    caf::config_value{2},
                                                    caf::config_value{3}}};
sys.println("{}", xs);

The second snippet also prints [1, 2, 3]. The only difference is that xs is a config value holding a list this time. To make working with lists easier, CAF offers the factory function make_config_value_list:

Source Code

auto xs = caf::make_config_value_list(1, 2, 3);
sys.println("{}", xs);

Output

[1, 2, 3]

Since config value lists are heterogeneous, we can also construct a list with mixed types:

Source Code

auto xs = caf::make_config_value_list(1, "two", 3.0);
sys.println("{}", xs);

Output

[1, "two", 3]

Using as_list

Sometimes, we receive a config value and need to convert it to a list before continuing. If the value already contains a list, then we want to make sure not to overwrite it, because we want to keep existing entries. For this particular use case, config_value provides the member function as_list:

Source Code

auto x = caf::config_value{};
auto y = caf::config_value{42};
auto z = caf::make_config_value_list(1, 2, 3);
sys.println("(1) x as list = {}", x.as_list());
sys.println("(2) y as list = {}", y.as_list());
sys.println("(3) z as list = {}", z.as_list());

Output

(1) x as list = []
(2) y as list = [42]
(3) z as list = [1, 2, 3]

In the first case, we convert nothing to a list. The only way CAF could perform this conversion is by creating an empty list. In the second case, the variable y contains the integer 42. Here, CAF simply lifts the single value into a list with one element. Lastly, z already contains a list, so CAF can simply return the stored list in this case without any conversion.
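These three cases map naturally onto a small std::variant. The sketch below is a simplified model with hypothetical names (lifted_list, lifted_value, as_list_sketch) that only supports integers, whereas config_value supports more types and nested lists.

```cpp
#include <cstdint>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical, simplified stand-in for config_value.
using lifted_list = std::vector<int64_t>;
using lifted_value = std::variant<std::monostate, int64_t, lifted_list>;

// Converts x to a list in place (if necessary) and returns a reference to it.
lifted_list& as_list_sketch(lifted_value& x) {
  if (std::holds_alternative<std::monostate>(x)) {
    x = lifted_list{}; // nothing stored: create an empty list
  } else if (auto* single = std::get_if<int64_t>(&x)) {
    auto lifted = lifted_list{*single}; // lift the value into a list
    x = std::move(lifted);
  }
  // Otherwise, x already holds a list and remains untouched.
  return std::get<lifted_list>(x);
}
```

Because the function returns a reference to the stored list, appending to the result also changes the original value, just like in the CAF example that follows.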

Working with as_list avoids unnecessary boilerplate code, as we can see in the following example that creates a list of three lists of three integers each:

Source Code

caf::config_value x;
sys.println("(1) x = {}", x);
auto& ls = x.as_list();
sys.println("(2) x = {}", x);
ls.resize(3); // Fills the list with three null elements.
sys.println("(3) x = {}", x);
auto num = int64_t{0};
for (auto& element : ls) {
  auto& nested = element.as_list();
  nested.emplace_back(num);
  for (++num; num % 3 != 0; ++num)
    nested.emplace_back(num);
}
sys.println("(4) x = {}", x);

Output

(1) x = null
(2) x = []
(3) x = [null, null, null]
(4) x = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

Running the example prints x four times:

  1. The first line prints the default-constructed x. As we can see, initially it is just null.
  2. The second time we print x is after we have called as_list on it. This member function converts the config value to a list. Hence, the second line shows [].
  3. After resizing the vector, we have three null objects in the list.
  4. Finally, the fourth line displays our final result after filling x with the desired content in the for-loop.

Converting to Homogeneous Lists

We already know how to conveniently create a list of config values using make_config_value_list:

auto xs = caf::make_config_value_list(1, 2, 3);
sys.println("{}", xs);

Our container xs from the snippet above consists of integers only. Just like get_as converts to single values, the function also knows how to convert the config value lists to homogeneous list types:

Source Code

auto xs = caf::make_config_value_list(1, 2, 3);
if (auto ints = caf::get_as<std::vector<int32_t>>(xs)) {
  sys.println("xs is a vector of int: {}", *ints);
} else {
  sys.println("xs is not a vector of int: {}", ints.error());
}

Output

xs is a vector of int: [1, 2, 3]

Note that CAF does not limit its users to std::vector. The automatic unboxing supports all types that behave like STL containers such as std::vector, std::list, std::set and std::unordered_set.
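Conceptually, converting to a homogeneous list means converting every element and failing as soon as one element does not fit. The following sketch shows this idea with a hypothetical variant type. It is stricter than get_as, because it only accepts elements that already are integers and performs no string-to-number conversions.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical element type for a heterogeneous list.
using mixed_value = std::variant<int64_t, std::string>;

// Returns a vector of integers only if every element holds an integer.
std::optional<std::vector<int64_t>>
to_int_vector(const std::vector<mixed_value>& xs) {
  std::vector<int64_t> result;
  result.reserve(xs.size());
  for (const auto& x : xs) {
    auto* val = std::get_if<int64_t>(&x);
    if (val == nullptr)
      return std::nullopt; // one bad element fails the entire conversion
    result.push_back(*val);
  }
  return result;
}
```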

Converting to Tuples

Because config_value may hold a variety of types, config_value::list may hold elements of different types. Such lists cannot convert to std::vector or similar data structures unless the element type itself can be constructed from each of the stored types. However, heterogeneous lists can convert to tuples:

Source Code

auto xs = caf::make_config_value_list(1, "two", 3.3);
if (auto tup = caf::get_as<std::tuple<int32_t, std::string, double>>(xs)) {
  sys.println("tup: {}", *tup);
} else {
  sys.println("oops: {}", tup.error());
}

Output

tup: [1, "two", 3.3]

This also applies to std::array and std::pair. To CAF, every type that specializes std::tuple_size etc. is treated the same way with respect to get_as and get_or.

Source Code

auto xs = caf::make_config_value_list(1, "two");
if (auto tup = caf::get_as<std::pair<int32_t, std::string>>(xs)) {
  sys.println("tup: {}", *tup);
} else {
  sys.println("oops: {}", tup.error());
}
auto ys = caf::make_config_value_list(1, 2, 3);
if (auto arr = caf::get_as<std::array<int32_t, 3>>(ys)) {
  sys.println("arr: {}", *arr);
} else {
  sys.println("oops: {}", arr.error());
}

Output

tup: [1, "two"]
arr: [1, 2, 3]

Dictionaries and Settings

In CAF, settings is an alias for dictionary<config_value>. A dictionary is a map with string keys. Semantically, a dictionary<T> is equivalent to a std::map<std::string, T>.

Using as_dictionary

Analogous to as_list, CAF also offers an as_dictionary member function that returns the config_value as a dictionary, converting it to a dictionary if needed.

Source Code

auto x = caf::config_value{};
sys.println("(1) x = {}", x);
auto& dict = x.as_dictionary();
sys.println("(2) x = {}", x);
dict.emplace("foo", "bar");
dict.emplace("int-value", 42);
dict.emplace("int-value", 23);
sys.println("(3) x = {}", x);

Output

(1) x = null
(2) x = {}
(3) x = {foo = "bar", "int-value" = 42}

Running the example prints x three times again:

  1. Initially, x is null once again.
  2. After calling as_dictionary, x now is an empty dictionary.
  3. Just like std::map, calling emplace tries to add a new entry to the dictionary and does nothing if the entry already exists. This means that 42 will remain associated to the key int-value (not 23).

Just like with as_list, CAF tries to convert the content of a config value to a dictionary. However, dictionaries require two values: key and value. Hence, the only other data structures that CAF can convert into dictionaries are lists of lists, where each nested list has exactly two values, and strings that can be parsed into a valid dictionary:

Source Code

auto x = caf::make_config_value_list(caf::make_config_value_list("one", 1),
                                     caf::make_config_value_list("two", 2),
                                     caf::make_config_value_list("three", 3));
sys.println("(1) x = {}", x);
x.as_dictionary();
sys.println("(2) x = {}", x);
auto y = caf::config_value{"{answer = 42}"};
sys.println("(3) y = {}", y);
y.as_dictionary();
sys.println("(4) y = {}", y);

Output

(1) x = [["one", 1], ["two", 2], ["three", 3]]
(2) x = {one = 1, three = 3, two = 2}
(3) y = {answer = 42}
(4) y = {answer = 42}
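The list-of-lists rule can be sketched in plain C++ as well. The names pair_lists, string_dictionary, and pairs_to_dictionary are hypothetical, and the sketch restricts itself to string values for brevity. Since it builds on std::map, iterating over the result visits the keys in sorted order, which matches the order one, three, two in the output above.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins: a list of lists and a dictionary with string values.
using pair_lists = std::vector<std::vector<std::string>>;
using string_dictionary = std::map<std::string, std::string>;

// Succeeds only if every nested list has exactly two elements (key and value).
std::optional<string_dictionary> pairs_to_dictionary(const pair_lists& xs) {
  string_dictionary result;
  for (const auto& entry : xs) {
    if (entry.size() != 2)
      return std::nullopt;
    result.emplace(entry[0], entry[1]); // first key wins on duplicates
  }
  return result;
}
```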

The automatic parsing of strings to dictionaries also enables the conversion from strings to point_2d we observed earlier. The inspect function exposes the fields inside an object to CAF and we can naturally translate a dictionary to an object by interpreting the keys as field names. So as long as a config_value represents a dictionary, CAF can use the inspect function for trying to construct a C++ object.

Converting to Regular Map Types

At this point, you can probably guess what our next example illustrates. Yes, get_as one last time!

Source Code

auto x = caf::config_value{};
auto& dict = x.as_dictionary();
dict.emplace("1", 10);
dict.emplace("2", 20);
dict.emplace("3", 30);
if (auto m1 = caf::get_as<std::map<double, int32_t>>(x))
  sys.println("m1: {}", *m1);
else
  sys.println("oops: {}", m1.error());
if (auto m2 = caf::get_as<std::unordered_map<int32_t, double>>(x))
  sys.println("m2: {}", *m2);
else
  sys.println("oops: {}", m2.error());

Output

m1: {1 = 10, 2 = 20, 3 = 30}
m2: {3 = 30, 2 = 20, 1 = 10}

As you can see, get_as also performs "deep" conversions by converting the string key of the dictionary to another type. In this case, CAF converts the strings to integers or floating point numbers as needed.
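To illustrate what such a deep conversion involves, the following sketch converts a map with string keys into a map with integer keys by parsing each key. The function convert_keys is a hypothetical name, and the code only approximates the idea with the standard library. It makes no claim about how CAF implements the conversion.

```cpp
#include <charconv>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <system_error>

// Parses every key as an integer and converts every value to double.
// Returns nullopt if any key is not a valid 32-bit integer.
std::optional<std::map<int32_t, double>>
convert_keys(const std::map<std::string, int64_t>& dict) {
  std::map<int32_t, double> result;
  for (const auto& [key, value] : dict) {
    int32_t parsed = 0;
    auto* last = key.data() + key.size();
    auto [ptr, ec] = std::from_chars(key.data(), last, parsed);
    if (ec != std::errc{} || ptr != last)
      return std::nullopt; // key is not an integer
    result.emplace(parsed, static_cast<double>(value));
  }
  return result;
}
```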

Conclusion

CAF provides a powerful API for handling configurations. The actor_system_config object is the central customization point in CAF. It allows users to register custom options and read settings programmatically.

From the custom options, CAF automatically creates parsers for command line arguments, environment variables, and configuration files. The user can provide a configuration in any of these formats, and CAF will parse it into the actor_system_config object.

If multiple sources provide the same option, CAF follows the order of precedence that most POSIX applications use: command line arguments have the highest precedence, followed by environment variables, and finally configuration files.

To deal with the parsed configuration, CAF provides the get_as and get_or functions. These functions allow users to convert the stored values to the desired type. The type conversions leverage the type inspection API to allow users to convert custom types from the configuration. The conversions will also perform bound checks for integral types.