diff --git a/docs/runtime/consensus.md b/docs/runtime/consensus.md
index f15b30cf115e4f693f1a8c052b2fb403d7527b04..d75798c73439b007ab04652cdd8286b52b06ac30 100644
--- a/docs/runtime/consensus.md
+++ b/docs/runtime/consensus.md
@@ -1,51 +1,113 @@
-# Consensus Algorithm
-
-## Introduction.
-
-An essential concept within the Reowolf language is the `sync` block. Behaviours that are specified within such a block (as imperative code, containing the instructions to send or receive information, and conditions on the values in memory) must agree with all other parties that participate in the interaction. An additional concept within such a `sync` block is speculative execution. Code that uses this execution temporarily forks and is allowed to perform multiple behaviours at the same time. At the end of the `sync` block only one particular execution (i.e. local behaviour) is allowed to complete. This results in additional complexity in finding satisfactory global behaviour.
-
-This document attempts to explain the chosen implementation of the initial consensus protocol. At some point one should be able to write consensus protocols associated with `sync` blocks within PDL. As initial experimentation (mainly to see which information should be available to a programmer using PDL) the consensus protocol will be written in the language in which the runtime is written.
-
-The Reowolf 1.2 consensus protocol aims to fix several issues that were present in the Reowolf 1.0 consensus protocol, among which:
-
-- The newer protocol should not synchronize among all known connectors on a machine. Rather, it should only aim to achieve consensus among the connectors that are actually communicating to one-another in the same interaction. Any connector that does not send or receive messages to this "synchronous region" does not belong to that synchronous region.
-- The newer protocol should aim to be leaderless. The old protocol featured a leader per interaction. Both the leader election itself and the subsequent picking of the global behaviour caused a large message overhead. Additionally the leader is tasked with a large computational overhead. Especially in the context of relatively large synchronous regions where some participants are running code on low-powered devices this is troublesome.
-- With regards to performance, the new consensus protocol should aim to reduce the message complexity and amount of transmitted bytes as much as possible. Additionally computational complexity should be reduced by attempting to perform a reduction in the number of valid local connector behaviours, thereby reducing the search space of the global connector behaviour.
-
-In the following discussion, there is a lot of room for optimization. But we'll describe the general algorithm first, and the specific optimizations in a different document in the future.
-
-## Data Structures
-
-### Speculative Execution
-
-First we create a data structure for the speculative execution itself. Speculative execution is tracked in memory as an execution tree. At the root we find the very first bit of code that is executed without any speculative execution. Each node contains the executed code associated with a particular branch. The edges in this tree might imply speculative execution (if there is more than one edge leaving a particular node), or might simply imply a "dependency" (explained later) if there is only one edge.
-
-At the leaves of the tree we find successful executions, where the very last instruction is the "end of the sync block" instruction. Reaching such a leaf implies that we found a local behaviour that satisfies the local constraints placed upon the behaviour. If we trace the path from the root to the particular leaf then we find the execution path. If one imagines that all of the code in all of the nodes in the execution path are concatenated, then one finds all executed instructions in order of execution.
-
-Each time a connector reaches a sync block, it will associate a number with that sync block. We'll call this the `round ID`. Each executed sync block will have a unique `round ID` (up to some reasonable limit in case of integer overflow). Likewise each of the nodes in the execution tree will have a unique number called the `branch ID`. The branch ID is unique among all branches in the execution tree, but numbers may be reused in different `round ID`s.
-
-### Tracking Dependencies
-
-One of the implications of being able to send messages and perform speculative execution is that branches will also be created upon receiving messages. One may imagine connectors `S` and `R`. `R` simply has the behaviour of receiving a message and handing it off to some native application. But `S` branches and sends, in each branch, a message over the same port. This implies that `R` will also end up with two branches: one per received message. In order to track dependencies between these two parties it is sufficient to annotate each message with its sender's branch number. Afterwards we can pick the branch numbers that are consistent between the two parties.
-
-When more than two parties are communicating, the behaviour becomes more complicated. A series of connectors `A`, `B`, `C`, etc. may have behaviours that depend on one-another in convoluted fashion. A particular valid execution trace in `A` may have send message to multiple different connectors `B`, `C` and `D`, influencing their speculative behaviour. In turn `B`, `C` and `D` may have done some branching on their own, and each of them sends messages to a final connector `E`. We now have that the branches in `B`, `C` and `D` depend on `A`, and `E` depending on the former three. A consensus protocol needs to be able to reason about these dependencies and, when a solution is possible, pick a single execution path in each of the connectors.
-
-In order to achieve this, we'll simplify the subsequent discussion for now by assuming that there is some later algorithm that will kick in once a connector has found a local solution. This algorithm will somehow seek among all connectors if they agree with a particular solution. For now we'll just consider the necessary information that needs to be provided to this algorithm in order for it to find a solution.
-\
-\
-\
-To start the train of thought, suppose that each connector that sends a message will append its execution path's branch numbers, and any of the branch numbers it has received through messages. This implies that each branch in the execution tree is associated with a mapping from `connector ID` to a set of branch numbers. If a connector receives a message then it can deposit the message in a branch if the received message's mapping contains `connector ID`s that map to a branch number set that is a superset of branch numbers in the branch's mapping itself. There are no restrictions on the set of `connector ID`s itself. Only on the branch number sets that are associated with the intersection of the `connector ID` sets.
-
-The upside of this scheme is that each connector has a complete view of the dependencies that exist within the synchonous region that resulted in the branch. The downside is that the amount of data quickly balloons. Each branch that encountered a `get` call needs to wait for more messages, and needs to keep the complete branch number mapping around.
-
-The subsequent algorithm, the one that makes sure that everyone agrees to a particular solution, then results in sending around this mapping, each connector adding its own compatible branch number mapping to it (or, if there is no compatible mapping, deleting the solution). If this messages reaches all connectors, and all connectors agree to the chosen mapping, then we have found a solution.
-\
-\
-\
-A different approach would be to take a different look at the global behaviour centered around the channels themselves. Two connectors can only have a dependency on one another if they communicate through a channel. Furthermore, suppose connector `A` sends to `B` and `B` sends to `C`. In the scheme described above `C` would know about its dependency on `A`. However, this is redundant information. If `C` knows about its dependency on `B`, and `B` knows about its dependency on `A`, then globally we have a full view on the dependencies as well. If `A` sends to `C` as well, then `C` does not know about the interdependency between the message traversing `A -> B -> C` and the message traversing `A -> C`. But again: if we take a global view and join the branch number mapping of `A`, `B` and `C`, then we're able to determine the global behaviour.
-
-So instead of sending all branch number information received. We can append only the sending connector's branch numbers along with a message. A receiving connector will now associate these branch numbers with the port through which the message was received. Hence a connector's branch will have a branch number, but also a mapping from `port ID` to the branch number set of the sending party.
-
-If we send around a solution to all connectors (again, the procedure for which will be detailed later) they can be reconciled in the following manner. The connectors sharing a port will always have the "putter" choosing the port number mapping. And the "putter" may have advanced its execution and increased the number of elements in the branch number set. So if the "putter" receives a solution, then it needs to check if the port's branch number set is a subset of its own branch number set. If a "getter" receives a solution then it needs to check if the port's branch number set is a superset of its own branch number set.
-
-Taking a step back: if a global solution exists, then it is composed out of the local solutions per connector, of which there is at least one per connector. The fact that all connectors are part of the same synchronous region implies that each channel will have seen at least one interaction between the connector(s) that own the ports. Hence each channel will have had one set of branch IDs mapped to it. Hence if we were to take the branch ID sets associated with each channel, then we're able to find the global solution.
\ No newline at end of file
+# Previous Consensus Algorithm
+
+## Introduction
+
+The previous consensus algorithm (the one within Reowolf 1.0 and 1.1) had support for speculative execution. This means that the user may (directly or indirectly) fork the execution of a component: that particular execution then becomes two executions. At some point the component has to choose which particular execution is committed to memory. This is one reason for the existence of a `sync` block: a block of code wherein one may perform forking, and at whose end the component must choose the single execution that is committed to memory.
+
+With speculative execution we may have multiple components that are all forking their execution and sending/receiving messages. So we do not end up with one component that has to choose its final execution, but with all components having to choose their final execution. Note that one component's execution may place restrictions on the validity of another component's execution. As an example, suppose the following components and their executions:
+
+- Component A: Has two executions:
+  - Execution A1: Component A has sent a message to component B.
+  - Execution A2: Component A has received a message from component B.
+- Component B: Has three executions:
+  - Execution B1: Component B has received a message from component A, then sends a message back to component A.
+  - Execution B2: Component B has received a message from component A.
+  - Execution B3: Component B has sent two messages to component A.
+
+Without delving into too much detail, one may see that the only valid solution to this problem is the combination of `A1` and `B2`: every message that one component sends must also be received by the other, and `A1` together with `B2` is the only pairing for which this holds (a small sketch of this check follows below).
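+
+To make this concrete, the following is a minimal Rust sketch that enumerates the candidate executions above and brute-forces which combinations are consistent. It is illustrative only: the `ExecutionSummary` type, its fields and the `compatible` check are hypothetical and are not part of the runtime; the only assumption baked in is that a message sent by one component must be received by the other.
+
+```
+/// Hypothetical summary of one speculative execution: how many messages it
+/// sent to its peer and how many it received from its peer.
+#[derive(Clone, Copy)]
+struct ExecutionSummary {
+    name: &'static str,
+    sent_to_peer: u32,
+    received_from_peer: u32,
+}
+
+/// Two executions are compatible when every message one side sends is
+/// received by the other side, and vice versa.
+fn compatible(a: ExecutionSummary, b: ExecutionSummary) -> bool {
+    a.sent_to_peer == b.received_from_peer && b.sent_to_peer == a.received_from_peer
+}
+
+fn main() {
+    let component_a = [
+        ExecutionSummary { name: "A1", sent_to_peer: 1, received_from_peer: 0 },
+        ExecutionSummary { name: "A2", sent_to_peer: 0, received_from_peer: 1 },
+    ];
+    let component_b = [
+        ExecutionSummary { name: "B1", sent_to_peer: 1, received_from_peer: 1 },
+        ExecutionSummary { name: "B2", sent_to_peer: 0, received_from_peer: 1 },
+        ExecutionSummary { name: "B3", sent_to_peer: 2, received_from_peer: 0 },
+    ];
+
+    // Brute-force search over all pairs of local executions.
+    for a in &component_a {
+        for b in &component_b {
+            if compatible(*a, *b) {
+                println!("valid combination: {} + {}", a.name, b.name);
+            }
+        }
+    }
+}
+```
+
+The only pair that passes this check is `A1` with `B2`, matching the claim above.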
+
+## Component Execution Tree and Execution Traces
+
+Components execute PDL code, which may contain calls to `fork`, `put`, and `get`. A `fork` explicitly forks the execution of the code. A `put` sends a message to a particular component, and a `get` receives a message from a component and forks (as explained later).
+
+As the component enters a `sync` block, it has only one possible execution. But as stated above there are reasons for the execution to split up. These individual executions may themselves split up later, thereby forming a so-called "execution tree":
+
+```
+                                  +-----+      +----------+
+                             +--->| put |----->| sync end |
+                             |    +-----+      +----------+
++------------+    +------+   |
+| sync start |--->| fork |---+
++------------+    +------+   |    +-----+      +----------+
+                             +--->| get |--+-->| sync end |
+                                  +-----+  |   +----------+
+                                           |
+                                           |   +----------+
+                                           +-->| sync end |
+                                           |   +----------+
+                                           |
+                                           +--> ...
+```
+
+This corresponds to the following PDL code:
+
+```
+primitive some_component(out tx, in rx) {
+    sync {
+        fork {
+            put(tx, 5);
+        } or fork {
+            get(rx, 1);
+        }
+    }
+}
+```
+
+We can see the reason for calling the execution tree a "tree". There are several things to note about it. Firstly, some executions have been completed and form a complete trace: starting from the "sync start", a complete trace is represented by the path running to a "sync end". Conversely, one trace is incomplete: it is still waiting at the `get` for a message. We'll call a place where the execution splits into multiple branches/executions a "branching point".
+
+Note that in the *general case* the branching points can only be discovered at runtime. Any code may have control flow points like `if` statements or `while` loops. Consider the following code:
+
+```
+primitive some_component(out tx, bool which_way) {
+    sync {
+        if (which_way) {
+            fork {
+                put(tx, 1);
+            } or fork {
+                put(tx, 2);
+            }
+        } else {
+            put(tx, 3);
+        }
+    }
+}
+```
+
+Depending on the value of `which_way` we produce two different execution trees (of which we can determine all traces). The compiler cannot decide at compile time which execution tree will be generated.
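+
+Although the shape of the tree is only known at runtime, representing it and collecting its traces is ordinary tree bookkeeping. The following Rust sketch is illustrative only (the `Node` type and the `complete_traces` function are hypothetical, not the runtime's actual data structures): it stores an execution tree and collects every complete trace, i.e. every path from the sync start to a "sync end" leaf.
+
+```
+/// Hypothetical node in an execution tree: what kind of point it represents
+/// and which forked executions continue from it.
+enum Node {
+    SyncStart { children: Vec<Node> },
+    Fork { children: Vec<Node> },
+    Put { children: Vec<Node> },
+    /// A `get` is a waiting point: more children may appear at runtime.
+    Get { children: Vec<Node> },
+    /// A leaf: this execution reached the end of the sync block.
+    SyncEnd,
+}
+
+/// Collect every complete trace: a path from the root to a `SyncEnd` leaf.
+fn complete_traces(node: &Node, path: &mut Vec<&'static str>, out: &mut Vec<Vec<&'static str>>) {
+    let (label, children) = match node {
+        Node::SyncStart { children } => ("sync start", Some(children)),
+        Node::Fork { children } => ("fork", Some(children)),
+        Node::Put { children } => ("put", Some(children)),
+        Node::Get { children } => ("get", Some(children)),
+        Node::SyncEnd => ("sync end", None),
+    };
+    path.push(label);
+    match children {
+        // A "sync end" leaf: the path from the root to here is a complete trace.
+        None => out.push(path.clone()),
+        // Otherwise recurse into every forked execution below this point.
+        Some(children) => {
+            for child in children {
+                complete_traces(child, path, out);
+            }
+        }
+    }
+    path.pop();
+}
+
+fn main() {
+    // The tree of the first PDL example: sync start -> fork -> (put | get),
+    // where only the `put` execution has reached the end of the sync block.
+    let tree = Node::SyncStart {
+        children: vec![Node::Fork {
+            children: vec![
+                Node::Put { children: vec![Node::SyncEnd] },
+                Node::Get { children: vec![] }, // still waiting for a message
+            ],
+        }],
+    };
+
+    let mut out = Vec::new();
+    complete_traces(&tree, &mut Vec::new(), &mut out);
+    println!("{:?}", out); // [["sync start", "fork", "put", "sync end"]]
+}
+```
+
+Every path ending in a "sync end" is reported as a complete trace, while an execution still waiting at a `get` contributes no trace until a message arrives.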
+
+Note that the `get` branching points can have an arbitrary number of forked executions arising from them. We'll call them "waiting points". In the *general case* we cannot figure out how many forked executions arise from a `get` branching point. The reason may be illustrated by the following simple example:
+
+```
+primitive sender(out tx, u32 num_forks) {
+    sync {
+        auto fork_counter = 0;
+        while (fork_counter < num_forks) {
+            fork {
+                put(tx, fork_counter);
+            } or fork { } // empty case
+            fork_counter = fork_counter + 1; // advance the loop
+        }
+        put(tx, num_forks);
+    }
+}
+
+primitive receiver(in rx) {
+    u32[] values = {};
+    sync {
+        bool keep_going = true;
+        while (keep_going) {
+            auto new_value = get(rx);
+            values @= { new_value }; // append
+            fork {
+                keep_going = false;
+            } or fork { }
+        }
+    }
+}
+```
+
+If the sender is connected to the receiver, then the sender will send anywhere between `1` and `num_forks + 1` messages, depending on a user-supplied parameter (which we cannot figure out at compile time). The receiver, viewed in isolation, can generate an infinite number of forked executions. We can deduce that the receiver will have at most `num_forks + 1` forked executions arising from its `get` branching point, but the compiler cannot.
+
+For this reason a `get` branching point is a "waiting point": the runtime always needs to keep a copy of the component's memory and execution state at the moment it encountered the `get` instruction. It might just be that another component sends it another message, such that it needs to produce a new forked execution.
+
+A `get` operation is also a "blocking operation": in the *general case* the component needs to know the value produced by the `get` operation in order to continue its execution (perhaps more specifically: at the first moment a `read` operation is performed on the variable that will store the transmitted message). In the simplest case the received message contains a boolean that is used in the test expression of an `if` statement: we need to have actually received that boolean before we can decide which control flow path to take. Speculating on the contents of messages is too computationally expensive to be taken seriously.
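+
+To sketch what the bookkeeping for such a waiting point could look like, consider the following Rust snippet. It is illustrative only: the `Snapshot`, `WaitingPoint` and `ExecutionTree` types are hypothetical and do not correspond to the runtime's actual data structures. The idea it shows is the one described above: the memory and execution position of the component are copied at the `get`, and every further message that arrives spawns a new forked execution from that stored copy.
+
+```
+use std::collections::HashMap;
+
+/// Hypothetical identifier of a branch in the execution tree.
+#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
+struct BranchId(u32);
+
+/// Hypothetical snapshot of a component at the moment it executed `get`:
+/// the speculative memory it had built up and where it was in the code.
+#[derive(Clone)]
+struct Snapshot {
+    memory: Vec<u8>,
+    program_counter: usize,
+}
+
+/// Bookkeeping for a single `get` waiting point.
+struct WaitingPoint {
+    snapshot: Snapshot,
+    /// One forked execution per message that has arrived so far.
+    forks: Vec<BranchId>,
+}
+
+/// A heavily simplified execution tree: just the waiting points.
+struct ExecutionTree {
+    next_branch_id: u32,
+    waiting_points: HashMap<BranchId, WaitingPoint>,
+}
+
+impl ExecutionTree {
+    /// Called when a message arrives for a branch blocked on `get`: clone the
+    /// stored snapshot into a fresh forked execution. The snapshot stays
+    /// behind, because yet another message may arrive later.
+    fn fork_on_message(&mut self, waiting: BranchId, message: Vec<u8>) -> Option<(BranchId, Snapshot)> {
+        let point = self.waiting_points.get_mut(&waiting)?;
+        let new_branch = BranchId(self.next_branch_id);
+        self.next_branch_id += 1;
+
+        // The new execution resumes from the copied memory and position,
+        // with the received message made available to it.
+        let mut resume_from = point.snapshot.clone();
+        resume_from.memory.extend(message);
+
+        point.forks.push(new_branch);
+        Some((new_branch, resume_from))
+    }
+}
+```
+
+In this sketch nothing ever discards a waiting point; deciding when it is safe to do so is left to the surrounding consensus logic.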