Changeset - f030cfda190d
MH - 3 years ago 2022-05-15 14:41:14
contact@maxhenger.nl
WIP on consensus/issues documentation
2 files changed with 106 insertions and 4 deletions:
0 comments (0 inline, 0 general)
docs/runtime/consensus.md
 
@@ -81,7 +81,7 @@ Note that the `get` branching points have an arbitrary number of forked executio
 
```
primitive sender(out<u32> tx, u32 num_forks) {
  sync {
    auto fork_counter = 1;
    while (fork_counter < num_forks) {
      fork {
        put(tx, fork_counter);
@@ -106,8 +106,108 @@ primitive receiver(in<u32> rx) {
}
```
 

	
 
If the sender is connected to the receiver, then the sender will send anywhere between `1` and `num_forks` messages (distributed over `num_forks` forks), depending on a user-supplied parameter (which we cannot figure out at compile-time). The isolated receiver can generate an infinite number of forked executions. We can analyze that the receiver will at most have `num_forks + 1` forked executions arising from its `get` branching point (the `num_forks` branches that do receive, and one final fork that is infinitely waiting on another message), but the compiler cannot.
 

	
 
For this reason a `get` branching point needs to be kept around for the entire duration of the sync block. The runtime will always need to have a copy of the component's memory and execution state the moment it encountered a `get` instruction, because it might just be that another component (in perhaps a new fork, which we cannot predict) will send it another message, such that it needs to produce a new forked execution.
 

	
 
A `get` operation is also a "blocking operation": in the *general case* the component needs to know the value produced by the `get` operation in order to continue its execution (perhaps more specifically: the first time a `read` operation is performed on the variable that will store the transmitted message). Consider the simple case where the received message contains a boolean that is used in the test expression of an `if` statement: we'll need to have actually received that boolean before we can decide which control flow path to take. Speculating on the contents of messages is too computationally expensive to be taken seriously. A `put` operation, by contrast, is not blocking: the message is sent and the component continues executing its code.
 

	
 
We've touched upon control flow points multiple times. We'll touch upon some aspects of control flow here, to more easily introduce the algorithm for finding consensus later. A component is fully described by its memory (i.e. all of the memory locations it has access to through its variables) and execution state (i.e. its current position in its code). So once a component encounters a control flow point, it can only take one control flow path. The calling of certain impure functions (e.g. retrieving a cryptographically secure random number) does not change this fact. Note that receiving values from other components might change a component's memory state, hence influence the control flow path it takes in the subsequent forked execution. Conversely, a component sending a value might influence another component's memory state.
 

	
 
So, before going into more detail, we have found that in the general case:
 

	
 
- The interior of a sync block demarcates the place where speculative execution may occur.
 
- Speculative execution implies that we end up with an execution tree.
 
- A path through the execution tree that reaches the end of the sync block is called a trace, and represents a valid execution of the sync block for the component (but perhaps not for a peer it interacted with).
 
- The set of traces produced by a component in its sync block can practically only be discovered at runtime.
 
- A `get` operation is necessarily a blocking operation that always incurs a branching point. A `put` operation is a nonblocking operation that does not incur a branching point.
 
- The trace of a component is influenced by the messages it has received.
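
As a toy illustration of these points (Python, purely illustrative, not the Reowolf runtime): model the execution tree as nested branching points, where every root-to-leaf path that completes the sync block is a trace, and a fork that blocks forever on a `get` contributes none.

```python
# Toy model of a component's execution tree inside a sync block.
# A node is ("branch", label, children) for a fork/get branching point,
# ("end", events) for an execution that reaches the end of the sync
# block, or ("blocked",) for a fork waiting forever on a message.

def traces(node, prefix=()):
    """Collect every root-to-leaf path that completes the sync block."""
    kind = node[0]
    if kind == "end":
        return [prefix + tuple(node[1])]
    if kind == "blocked":
        return []                     # an incomplete path is not a trace
    _, label, children = node
    result = []
    for child in children:
        result.extend(traces(child, prefix + (label,)))
    return result

# The receiver from the text: every arriving message spawns a new forked
# execution at the `get` branching point; here truncated at two messages.
receiver = ("branch", "get#1", [
    ("end", ["got msg 1"]),           # fork that consumed one message
    ("branch", "get#2", [
        ("end", ["got msg 1 and 2"]),
        ("blocked",),                 # fork still waiting on a third message
    ]),
])

print(traces(receiver))               # two complete traces
```

The `blocked` leaf mirrors the "one final fork that is infinitely waiting" from the discussion above: it must be kept around at runtime, but it never becomes part of a valid trace.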
 

	
 
## Towards a Consensus Algorithm
 

	
 
The solution to the consensus problem is somehow discovering the ways in which the components have influenced the memory state of their peers. If we have a complete trace for each component, for which all peer components agree on the way they have influenced that complete trace, then we've found a solution to the consensus problem. Hence we can subdivide the consensus problem into four parts:
 

	
 
1. Keeping track of the messages that influence the memory state of components.
 
2. Keeping track of the peers that influence the memory state of components.
 
3. Finding a set of interactions between components on which all involved components agree, i.e. each `put` should have a corresponding `get` at the peer.
 
4. Somehow having a communication protocol that finds these agreeable interactions.
 

	
 
We'll incrementally work towards a solution that satisfies the first three points. We'll not consider the last point, as this is essentially a gossip protocol. We define some terms to make the following discussion easier:
 

	
 
- "sync region": The group of components that have interacted with one another and should agree on the global consensus solution.
 
- "local solution": A complete trace of a component. For the component this is a valid local solution, but might not be part of a global solution.
 
- "global solution": A set of traces, one for each of the components in the sync region, that all agree on the interactions that took place between the components in the sync region.
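
These three notions can be written down as a tiny data model (illustrative Python; the real runtime representation differs, and all names here are made up):

```python
from dataclasses import dataclass

# A local solution: one complete trace of a component, recorded as the
# interactions (port, value) it performed during the sync block.
@dataclass(frozen=True)
class LocalSolution:
    component: str
    interactions: tuple              # e.g. (("tx", 1),)

# A global solution pairs exactly one local solution per component in
# the sync region; whether the traces actually agree on their
# interactions is a separate check, developed in the text.
@dataclass(frozen=True)
class GlobalSolution:
    locals: tuple                    # one LocalSolution per component

region = ("sender", "receiver")      # the sync region
g = GlobalSolution((
    LocalSolution("sender", (("tx", 1),)),
    LocalSolution("receiver", (("rx", 1),)),
))
print(len(g.locals))                 # 2: one trace per component
```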
 

	
 
Suppose that a component could somehow predict exactly which messages it is going to receive during the execution of its code; we'll assume that each received message has the appropriate `get` call associated with it. In this case we're able to produce the set of complete traces of the component by symbolically executing its code: we start out with the initial memory state, perhaps perform some explicit `fork`ing, know exactly which messages we receive and how they influence the control flow, and arrive at the end of the sync block.
 

	
 
However, as we've outlined above, we cannot know exactly which messages we're going to receive; we'll have to discover these messages while executing a component. The next best thing is to keep track of the values of the messages that we've received in a complete trace. Once we have complete traces for all of the interacting components, we can check that each received value corresponds to a sent value. For example:
 

	
 
```
primitive sender(out<u32> tx) {
  sync {
    fork {
      put(tx, 1);
    } or fork {
      put(tx, 2);
    }
  }
}

primitive receiver(in<u32> rx) {
  u32 value = 0;
  sync {
    value = get(rx);
  }
}
```
 

	
 
Here `tx` is part of the same channel as `rx`. In this case we'll have two traces for each of the components, resulting in two valid global consensus solutions: in one solution the message `1` was transferred, in the other the message `2` was transferred. There are two problems with this approach: first, it doesn't take the identity of the channel into account; second, it doesn't take the effects of previous messages into account.
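
The value-only matching can be sketched as follows (an illustrative Python model, not the actual runtime): each complete trace contributes the tuple of values it sent or received, and a candidate global solution is any combination in which the received values equal the sent values.

```python
from itertools import product

# Each local solution is the tuple of message values a complete trace
# sent (for the sender) or received (for the receiver).
sender_traces   = [(1,), (2,)]   # the fork that put 1, the fork that put 2
receiver_traces = [(1,), (2,)]   # one forked execution per received value

# A global solution: one trace per component, every received value
# matched by an equal sent value (value-only matching).
solutions = [(s, r) for s, r in product(sender_traces, receiver_traces)
             if s == r]
print(solutions)   # two valid global solutions, as described in the text
```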
 

	
 
To illustrate the first problem, consider:
 

	
 
```
primitive sender(out<u32> tx_a, out<u32> tx_b) {
  sync {
    fork {
      put(tx_a, 1);
    } or fork {
      put(tx_b, 1);
    }
  }
}

primitive receiver(in<u32> rx_a, in<u32> rx_b) {
  u32 value = 0;
  sync {
    value = get(rx_a);
  }
}
```
 

	
 
Here the fact that the sender has the solutions `1` and `1` does not help the receiver figure out which of those corresponds to its own solution of `1`.
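
To make the failure concrete: under value-only matching the sender's two local solutions are literally indistinguishable, while tagging every interaction with the port it used restores the distinction. A hypothetical sketch (the channel pairing `tx_a`/`rx_a` and `tx_b`/`rx_b` is assumed):

```python
from itertools import product

# Value-only view: both sender forks look like they sent "1".
sender_values = [(1,), (1,)]
assert sender_values[0] == sender_values[1]    # indistinguishable

# Tagging every interaction with the channel it used disambiguates.
sender_traces   = [(("tx_a", 1),), (("tx_b", 1),)]
receiver_traces = [(("rx_a", 1),)]             # receiver only gets on rx_a

# Assumed channel topology: tx_a pairs with rx_a, tx_b with rx_b.
peer = {"tx_a": "rx_a", "tx_b": "rx_b"}

def matches(sent, received):
    # Translate the sender's port names to the receiver's side and compare.
    return [(peer[p], v) for p, v in sent] == list(received)

solutions = [(s, r) for s, r in product(sender_traces, receiver_traces)
             if matches(s, r)]
print(len(solutions))   # only the fork that put on tx_a matches
```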
 

	
 
To illustrate the second problem, consider:
 

	
 
```
primitive sender(out<u32> tx) {
  sync {
    fork {
      put(tx, 1);
      put(tx, 2);
    } or fork {
      put(tx, 2);
    }
  }
}

primitive receiver(in<u32> rx) {
  u32 value = 0;
  sync {
    value = get(rx);
  }
}
```
 

	
 
Now we'll have `sender` contributing the solutions `1, 2` and `2`, while the receiver will generate the solutions `1`, `2` and `2`. The reason the receiver has three solutions is that it cannot figure out that the message `2` from the sender's first fork depended on the first message `1` from the sender having arrived.
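
The ambiguity becomes visible if we annotate each message with how many interactions preceded it on its channel (a hypothetical annotation, not the runtime's actual format): the sender's two `2`s are then distinct, and only a message with zero predecessors can pair with the receiver's single `get`.

```python
# Sender traces annotated as (channel, prior_interactions_on_channel, value).
fork_a = (("tx", 0, 1), ("tx", 1, 2))   # put 1, then put 2
fork_b = (("tx", 0, 2),)                # put 2 immediately

# The receiver performs exactly one get, so a compatible sender message
# must be the first interaction on the channel (zero predecessors).
def first_messages(trace):
    return [m for m in trace if m[1] == 0]

# With plain values the two 2's are indistinguishable; annotated, the
# second message of fork_a can no longer pair with a lone get.
candidates = first_messages(fork_a) + first_messages(fork_b)
print(candidates)
```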
 

	
 
We can solve this by somehow embedding the identity of the channel associated with the message, and by describing all of the previous interactions.
 
\ No newline at end of file
docs/runtime/known_issues.md
 
@@ -20,4 +20,6 @@ The current implementation of Reowolf has the following known issues:
 

	
 
- Reserved memory for ports will grow without bounds: Ports can be given away from one component to another by creating a component, or by sending a message containing them. The component sending those ports cannot remove them from its own memory if there are still other references to the transferred port in its memory. This is because we want to throw a reasonable error if that transferred port is used by the original owner. Hence we need to keep some information about that transferred port in the sending component's memory. The solution is to have reference counting for the ports, but this is not implemented.
 

	
 
- In addition to the issue above: when transferring ports to a new component, the memory that remembers the state of that port is removed from the component that is creating the new one. Hence using old references to that port within the creating component's PDL code results in a crash.
 

	
 
- Some control algorithms are not robust under multithreading: mainly error handling while in sync mode (there needs to be a revision in which we keep track of which components are still reachable by another component), and complicated scenarios where ports are transferred.
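
The reference-counting scheme mentioned in the first issue could look roughly like this (a sketch under assumed semantics, not the Reowolf implementation; all names are illustrative):

```python
# Sketch of reference-counted port bookkeeping: a component keeps a
# transferred port's record alive while references to it remain, so a
# stale reference yields a reasonable error instead of a crash, and the
# record is reclaimed once the last reference is dropped.
class PortTable:
    def __init__(self):
        self.ports = {}          # port id -> {"refs": n, "transferred": bool}

    def add(self, pid):
        self.ports[pid] = {"refs": 1, "transferred": False}

    def add_ref(self, pid):
        self.ports[pid]["refs"] += 1

    def transfer(self, pid):
        # Mark as given away, but keep the record while references remain.
        self.ports[pid]["transferred"] = True
        self.drop_ref(pid)

    def drop_ref(self, pid):
        entry = self.ports[pid]
        entry["refs"] -= 1
        if entry["refs"] == 0:
            del self.ports[pid]  # memory reclaimed once unreferenced

    def use(self, pid):
        entry = self.ports.get(pid)
        if entry is None or entry["transferred"]:
            raise RuntimeError("port was transferred away")
        return entry

table = PortTable()
table.add("p0")
table.add_ref("p0")      # a second reference in the component's memory
table.transfer("p0")     # given away; record survives for error reporting
try:
    table.use("p0")
except RuntimeError as e:
    print(e)             # reasonable error instead of a crash
table.drop_ref("p0")     # last reference gone -> record reclaimed
print(table.ports)       # {}
```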
 
\ No newline at end of file