Changeset - 50a64dc885bc
MH - 3 years ago 2022-05-16 01:06:35
contact@maxhenger.nl
Start on new consensus algorithm documentation
2 files changed with 72 insertions and 1 deletions:
docs/runtime/consensus2.md
 
new file 100644
 
# Current Consensus Algorithm
 

	
 
## Introduction
 

	
 
For all of the reasons described in the document on the previous consensus algorithm, speculative execution was removed from the runtime. This greatly simplifies the consensus algorithm; in fact, apart from some edge cases, there isn't much need for a consensus algorithm anymore. Hence the `fork` statement is gone from the language.
 

	
 
Most of this document will describe a different problem that wasn't discussed in the document describing the previous consensus algorithm: the introduction of a `select` statement and the way in which components are allowed into a synchronous region.
 

	
 
## The Consensus Algorithm
 

	
 
Without speculative execution there is no longer an execution tree: there is only one control flow path a component can take through its PDL code. That execution may fail to form a complete trace (i.e. to reach the end of the sync block) if it encounters a blocking operation. But to reiterate: if there is a complete trace, then it can only have taken one control flow path through its code.
 

	
 
However, there are still reasons for incorporating parts of the old consensus algorithm. That is: there is still a port mapping, `put` operations update this port mapping, and `get` operations (which no longer fork, but simply block until a message is present) do the same.
 

	
 
There are two reasons for this. Firstly, there is the design choice of enforcing a strict ordering on the way channels are used between components. If there are two channels between two components, then we may have the following code:
 

	
 
```
 
comp sender_variant_a(out<u32> tx_a, out<u32> tx_b) {
 
    sync {
 
        put(tx_a, 1);
 
        put(tx_b, 2);
 
    }
 
}
 

	
 
comp sender_variant_b(out<u32> tx_a, out<u32> tx_b) {
 
    sync {
 
        put(tx_b, 2);
 
        put(tx_a, 1);
 
    }
 
}
 

	
 
comp receiver(in<u32> rx_a, in<u32> rx_b) {
 
    sync {
 
        auto va = get(rx_a);
 
        auto vb = get(rx_b);
 
    }
 
}
 
```
 

	
 
If we would not send along the port mapping, then both `sender_variant_a` and `sender_variant_b` would be valid peers for the `receiver` component. This is because `put` operations are asynchronous (that is: they send the message and continue executing, without waiting for the corresponding `get` to complete), so both messages will arrive at the `receiver` component, which can then retrieve them in either order. However, if we *do* send along the port mapping, then only `sender_variant_a` can be a valid peer of `receiver`. Note that this is a design choice.
 

	
 
We could relax this design choice by only sending along the port mapping of the port over which a message is sent. This would imply that messages over the same channel have to be received in the correct order, but messages over different channels can be received in any desired order. We've kept the strict ordering in place by sending along the full port mapping.
 

	
 
The second reason for still including the port mapping has to do with the fact that, by design, `put` operations are not acknowledged. That is to say: the `put` operation is asynchronous, and does not prevent the sending component from continuing its execution. For a `put` operation to be valid, the message *must* be received by the peer. Without any kind of port mapping we have no way of detecting the following mismatch:
 

	
 
```
 
comp sender(out<u32> tx) {
 
    sync {
 
        put(tx, 0);
 
        put(tx, 1);
 
    }
 
}
 

	
 
comp receiver(in<u32> rx) {
 
    sync get(rx);
 
}
 
```
 

	
 
However, there are much simpler ways of designing a consensus algorithm without speculative execution: one may annotate each port with the number of messages sent over it, one may (as stated above) only send along the port mapping of a single port instead of all shared channels, etc.
 

	
 
That being said, the current implementation still uses branch IDs (interpreted differently in this case: as a simple operation counter). There is still a port mapping from `port ID` to `branch ID`, and finding the global solution still proceeds in the same way (except that now there is only one complete trace per component).
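
To make this concrete, the following is a minimal sketch in Rust of such a port mapping, in which the branch ID acts as a plain operation counter. The names and types are hypothetical illustrations, not the runtime's actual data structures: `put` bumps the counter of the sending port and annotates the outgoing message with a snapshot of the mapping, `get` bumps the counter of the receiving port, and at the end of the round the per-component mappings are compared as local solutions.

```
use std::collections::HashMap;

type PortId = u32;
type BranchId = u32; // without speculative execution: a simple operation counter

/// Per-component, per-round consensus state: the "port mapping".
#[derive(Default)]
struct PortMapping {
    counters: HashMap<PortId, BranchId>,
}

/// Messages carry the sender's mapping of all shared channels; this is what
/// enforces the strict ordering described earlier in this document.
struct Annotated<T> {
    payload: T,
    sender_mapping: HashMap<PortId, BranchId>,
}

impl PortMapping {
    /// `put`: bump the counter of the sending port and snapshot the mapping.
    fn on_put<T>(&mut self, port: PortId, payload: T) -> Annotated<T> {
        *self.counters.entry(port).or_insert(0) += 1;
        Annotated { payload, sender_mapping: self.counters.clone() }
    }

    /// `get` (blocking, no forking): bump the counter of the receiving port.
    /// The annotation could additionally be checked against the local counters
    /// to reject inconsistent orderings as early as possible.
    fn on_get<T>(&mut self, port: PortId, msg: Annotated<T>) -> T {
        *self.counters.entry(port).or_insert(0) += 1;
        msg.payload
    }

    /// The component's local solution for the round. The runtime gathers these
    /// and checks that both ends of every channel report the same counter: a
    /// sender that put twice on a channel whose receiver only got once does not
    /// match, so the mismatch from the earlier example is detected.
    fn local_solution(&self) -> &HashMap<PortId, BranchId> {
        &self.counters
    }
}
```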
 

	
 
## Synchronous Regions
 

	
 
Some aspects of the consensus algorithm were deliberately left out of the previous document. Among them is the question of when to consider a component part of the synchronous region. In version 1.0 of the Reowolf runtime, each peer of each component was considered part of the synchronous region. The reason behind this decision was that the fact that a channel remains unused in a synchronous round should itself be part of the consensus: if a component did not `get` on a port, then the peer must agree that it did not `put` on the corresponding port.
 

	
 
So when a complete trace was found, a component went over all of its ports and sent a token message over each unused channel, asking the peer to join the synchronous round with its local solution, where that local solution is required not to have sent/received anything over the silent channel.
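
As a purely illustrative sketch of this version 1.0 behaviour (hypothetical Rust types, not the actual runtime code): once a component has a complete trace, it walks over its ports and emits a token over every channel that stayed silent, pulling the peer into the synchronous region.

```
use std::collections::HashMap;

type PortId = u32;

/// What a component recorded for each of its ports during the sync round.
#[derive(Clone, Copy, PartialEq)]
enum PortActivity {
    Silent,    // no put/get happened on this port
    Used(u32), // operation counter of the last put/get
}

/// Stand-in for the "please join this synchronous round" token message.
struct SilentPortToken;

/// Called once a complete trace has been found: every peer reachable through a
/// silent channel is asked to join the region, and its local solution must
/// agree that the channel indeed stayed silent.
fn announce_silent_channels(
    port_activity: &HashMap<PortId, PortActivity>,
    mut send_token: impl FnMut(PortId, SilentPortToken),
) {
    for (&port, &activity) in port_activity {
        if activity == PortActivity::Silent {
            send_token(port, SilentPortToken);
        }
    }
}
```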
 

	
 
The problem with this approach can be illustrated with a simple example: suppose a set of servers that are all connected to a database of some sort. Normally it would be perfectly fine for multiple servers to request database resources at the same time; these requests can all be handled in parallel.
 
\ No newline at end of file
docs/runtime/known_issues.md
 
@@ -25,7 +25,9 @@ The current implementation of Reowolf has the following known issues:
 
- In addition to the statements above: when transferring ports to a new component, the memory that remembers the state of that port is removed from the component that is creating the new one. Hence using old references to that port within the creating component's PDL code results in a crash.
 

	
 
- Some control algorithms are not robust under multithreading: mainly error handling while in sync mode (because there needs to be a revision where we keep track of which components are still reachable by another component), and complicated scenarios where ports are transferred.
 

	
 
- There is an assertion in the interpreter that makes sure that there are no values left on the expression stack when a statement has completed. This is not true when you have an expression statement! If you want to remove this assertion make sure to clear the stack (using the method on the `Store`).
 

	
 
- The TCP listener component should probably do a `shutdown` before a `close` on the socket handle. Also it should set the `SO_REUSEADDR` option.
 
\ No newline at end of file
 
- The TCP listener component should probably do a `shutdown` before a `close` on the socket handle. It should also set the `SO_REUSEADDR` option.
 

	
 
- The way in which putting ports are instructed to block when the corresponding getter port's main inbox is full is rather clumsy; it led to the introduction of the "backup inbox" as found in the runtime's code. There is a design decision to make here. There are two options: (a) have an atomic boolean indicating whether the message slot for an inbox is full (see the sketch below), or (b) do away with the "main inbox" altogether and use an unbounded message queue.
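
As a sketch of what option (a) might look like (hypothetical Rust types, not the runtime's actual code): a one-message slot per getter port, with an atomic boolean that tells a putter whether it must block until the getter has drained the slot.

```
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

/// Option (a): a one-message "main inbox" guarded by an atomic flag. A putter
/// that finds the slot full blocks (or retries) instead of spilling messages
/// into a separate "backup inbox".
struct InboxSlot<T> {
    full: AtomicBool,
    value: Mutex<Option<T>>,
}

impl<T> InboxSlot<T> {
    fn new() -> Self {
        InboxSlot { full: AtomicBool::new(false), value: Mutex::new(None) }
    }

    /// Putter side: returns `false` if the slot is occupied, in which case the
    /// caller should block and retry once the getter has drained the slot.
    fn try_put(&self, msg: T) -> bool {
        if self
            .full
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
        {
            *self.value.lock().unwrap() = Some(msg);
            true
        } else {
            false
        }
    }

    /// Getter side: drains the slot and clears the flag so the next put succeeds.
    fn take(&self) -> Option<T> {
        let msg = self.value.lock().unwrap().take();
        if msg.is_some() {
            self.full.store(false, Ordering::Release);
        }
        msg
    }
}
```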
 
\ No newline at end of file