Changeset - b17fad1f8c03
[Not reviewed]
MH - 3 years ago 2022-05-19 22:52:30
contact@maxhenger.nl
Small addition to documentation regarding removal of primitive/composite nomenclature
2 files changed with 9 insertions and 1 deletion:
0 comments (0 inline, 0 general)
docs/runtime/01_runtime.md
 
@@ -219,49 +219,55 @@ We'll talk ourselves through the case of a component crashing before coming up w
 

	
 
We'll first consider that a component may crash inside or outside of a synchronous block. From the point of view of the peer component, we'll have four cases to consider:
 

	
 
1. The peer component is not in a synchronous block. 
 
2. The crashing component died before the peer component entered the synchronous block.
 
3. The crashing component died during the same synchronous block as the peer component.
 
4. The crashing component died after reaching consensus on the synchronous block that the peer component is currently still in.
 

	
 
Before discussing these cases, it is important to remember that in the runtime each component runs in its own thread of execution. The crashing component may be unaware of its peers (because peer ports might change ownership at any point in time). We'll discuss the consensus algorithm in more detail later in this documentation. For now it is important to note that components discover the synchronous region they are part of while the PDL code is executing. So if a component crashes within a synchronous region before the end of the sync block is reached, it may never discover the full synchronous region it would have been part of.
 

	
 
Because the crashing component is potentially unaware of the component IDs it will end up notifying of its failure, we cannot design the crash-handling algorithm in such a way that the crashing component tells its peers when they have to crash. We'll do the opposite: the crashing component simply crashes and makes a best effort to notify its peers. Those peers themselves decide whether they have to crash in response to such a notification.
 

	
 
For this reason, it does not make a lot of sense to handle component failure through the consensus algorithm. Dealing with failure through the consensus algorithm only makes sense if we can find the synchronous region that we would have discovered had we been able to fully execute the sync block of each participating component. As explained above: we can't, so we'll opt to deal with failure on a peer-by-peer basis.
 

	
 
We'll go back to the four cases we've discussed above. We'll change our point of view: we're now considering a component (the "handling component") that has to deal with the failure of a peer (the "crashing component"). We'll introduce a small part of our solution a priori: like a component shutting down, a failing component simply ends its life by broadcasting a `ClosePort` message over each of its owned ports that is not yet closed (and, as in the other control algorithms, the failing component will wait for the port that is shutting down to become unblocked before it sends the `ClosePort` message).
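
To make this broadcast concrete, here is a minimal Rust sketch. The `Port` and `ControlMessage` types and their fields are hypothetical illustrations, not the runtime's actual API:

```
// Hypothetical types: the runtime's real `ClosePort` message and port
// bookkeeping differ; this only illustrates the broadcast.
enum ControlMessage {
    ClosePort { port_id: u32 },
}

struct Port {
    id: u32,
    closed: bool,
    blocked: bool, // e.g. while a port transfer is in progress
}

// On failure: send `ClosePort` over every owned port that is not yet closed.
// The real runtime first waits for blocked ports to become unblocked; this
// sketch simply skips them.
fn broadcast_close<F: FnMut(ControlMessage)>(ports: &mut [Port], mut send: F) {
    for port in ports.iter_mut().filter(|p| !p.closed && !p.blocked) {
        send(ControlMessage::ClosePort { port_id: port.id });
        port.closed = true;
    }
}
```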
 

	
 
In the first case, we're dealing with a failing component while the handling component is not in a synchronous block. This means that if there was a previous synchronous block, it succeeded. We might still have data messages in our inbox that were sent by the failing component, but this case is rather easy to deal with: we mark the ports as closed, and if we end up using them in the next synchronous block, then we crash ourselves.
 

	
 
In the second case the peer component died before we ourselves entered the synchronous block. This case is somewhat similar to the one described above. The crashing component cannot have sent the handling component any messages. So we mark the port as closed, potentially failing in the future if it ends up being used. However, the handling component itself might already have performed `put` operations. Now that the handling component receives a `ClosePort` message, it realizes that those earlier `put` operations can never be acknowledged. For this reason a component stores when it last used a port in the metadata associated with that port. When, in this second case, a `ClosePort` message comes in while the port has already been used, the handling component should crash as well.
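
A minimal sketch of this decision, assuming a hypothetical `PortMetadata` type with illustrative field names:

```
// Hypothetical per-port metadata; field names are illustrative.
struct PortMetadata {
    closed: bool,
    used_since_last_sync: bool, // did we already `put` on this port?
}

// Case two: a `ClosePort` arrives for a port we may already have `put` on.
// Returns whether the handling component must crash as well.
fn handle_close_port(meta: &mut PortMetadata) -> bool {
    meta.closed = true;
    // Earlier `put`s can never be acknowledged by the dead peer.
    meta.used_since_last_sync
}
```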
 

	
 
Next up is the third case, where the crashing component and the handling component were in the same synchronous round. Like before, we mark the port as closed and future use will cause a crash. And as in the second case, if the handling component has already used the port (which here may also mean having received a message from the crashing component), then it should crash as well.
 

	
 
The fourth case is where the failing component crashes *after* the handling component finished its sync round. This is an edge case dealing with the following situation: both the handling and the crashing component have submitted their local solution to the consensus algorithm (assumed to be running somewhere in a thread of execution different from the two components). The crashing component receives a global solution, finishes the sync round, and then crashes, thereby sending a `ClosePort` message to the handling component. The handling component, due to the asynchronous nature of the runtime, receives the `ClosePort` message before the global solution has a chance to reach it. In this case, however, the handling component should be able to finish the synchronous round, and it shouldn't crash.
 

	
 
### Distinguishing the crashing cases
 

	
 
So far we've pretended that we could already determine the relation between the crashing component's synchronous round and the handling component's synchronous round. But in order to do this we need to add a bit of extra information to the `ClosePort` message.
 

	
 
The simplest case is determining whether the two components are in the same synchronous round (case three, as described above). The crashing component annotates the `ClosePort` message with whether it was in a synchronous round or not. Then, if both components are in a synchronous round (as checked by the handling component), and the about-to-be-closed port at the handling component was used in that round, or will be used in that round, the handling component should crash.
 

	
 
Equally simple: the handling component can figure out by itself whether it is in a synchronous round (case one, as described above). If not: the port is marked closed and future use causes a crash.
 

	
 
The last two cases require a bit more work: how do we distinguish the edge case where the handling component's round will complete in the future from the case where it should crash? To distinguish them we need the handling component to know whether the last interaction the crashing component handled was the one in the handling component's current synchronous round.
 

	
 
For this reason we keep track of the synchronous round number. That is to say: there is a counter that increments each time a synchronous round completes for a component. We have a field in the metadata for a port that registers this round number. If a component performs a `put` operation, then it stores its own round number in that port's metadata, and sends this round number along with the message. If a component performs a `get` operation, then it stores the *received* round number in the port's metadata.
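
A sketch of this round-number bookkeeping, with hypothetical types and field names:

```
// Hypothetical bookkeeping; the real runtime's fields differ.
struct Component {
    round: u32, // increments each time a sync round completes
}

struct PortMeta {
    last_registered_round: Option<u32>,
}

// On `put`: store our own round number and send it along with the message.
fn on_put(comp: &Component, port: &mut PortMeta) -> u32 {
    port.last_registered_round = Some(comp.round);
    comp.round // travels with the outgoing message
}

// On `get`: store the round number that was received with the message.
fn on_get(port: &mut PortMeta, received_round: u32) {
    port.last_registered_round = Some(received_round);
}
```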
 

	
 
When a component closes a port, it will also send along the last registered round number in the `ClosePort` message. If the handling component receives a `ClosePort` message, and the last registered round number in the port's metadata matches the round number in the `ClosePort` message, and the crashing component was not in a synchronous round, then the crashing component crashed after the handling component's sync round. Hence: the handling component can complete its sync round.
 

	
 
To conclude: if we receive a `ClosePort` message, then we always mark the port as closed. If the handling and the crashing component were in a synchronous round, and the closed port was used in that synchronous round, then the handling component crashes as well. If the handling component *is* in a synchronous round but the crashing component *is not*, and the port of the handling component is used in the synchronous round while the port's last registered round number does not match the round number in the `ClosePort` message, then the handling component crashes as well.
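
The following Rust sketch pulls these rules together. All types and names are hypothetical, and the real runtime's handling is more involved:

```
// Hypothetical summary of the `ClosePort` message and per-port state.
struct ClosePort {
    peer_was_in_sync_round: bool,
    last_round: Option<u32>, // last round number the peer registered for this port
}

struct PortState {
    used_in_current_round: bool,
    last_registered_round: Option<u32>,
}

// Returns whether the handling component must crash. The port is always
// marked as closed first (not shown here).
fn must_crash(msg: &ClosePort, port: &PortState, in_sync_round: bool) -> bool {
    if !in_sync_round {
        // Case 1: not in a sync round; only future use of the closed port crashes.
        return false;
    }
    if msg.peer_was_in_sync_round {
        // Case 3: same sync round; crash if the port participates in this round.
        return port.used_in_current_round;
    }
    // Cases 2 and 4: the peer was not in a sync round. Crash only if we used
    // the port this round AND the peer never registered that interaction
    // (round numbers do not match); otherwise this is the benign edge case.
    port.used_in_current_round && port.last_registered_round != msg.last_round
}
```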
 

	
 
## Sync Algorithm
 

	
 
A description of the synchronous algorithm can be found in other documents. We will mention here that central to the consensus algorithm is that two components agree on the interactions that took place over a specific channel. In order for this to happen we'll send along a lot of metadata when trying to reach consensus, but here we're just concerned with matching up the two ends of a channel.
 

	
 
A port is identified by a `(component ID, port ID)` pair, and a channel is a pair of those identifying pairs. So to match up the two ends of a channel we would have to find a consistent pair of ports that agree on who their peers are. However, we're dealing with the problem of eventual consistency: `put`ting ports never know who their peer is, because the sent message might be relayed. `get`ting ports, on the other hand, *will* know who their peer is for the duration of a single synchronous round once they've received a single message.
 

	
 
This is the trick we will apply in the consensus algorithm. If a channel did not see any messages passing through it, then the components that own those ports will not have to reach consensus because they will not be part of the same synchronous region. However if a message did go through the channel then the components join the same synchronous region, and they'll have to form some sort of consensus on what interaction took place on that channel.
 

	
 
And so the `put`ting component will only submit its own `(component ID, port ID, metadata_for_sync_round)` triplet. The `get`ting port will submit information containing `(self component ID, self port ID, peer component ID, peer port ID, metadata_for_sync_round)`. The consensus algorithm can now figure out which two ports belong to the same channel.
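
A sketch of this matching step, assuming hypothetical `PutterInfo`/`GetterInfo` types that carry the submitted information (the per-round metadata is omitted):

```
// Hypothetical submission types; only the identifying parts are shown.
#[derive(PartialEq, Eq)]
struct PortId { component: u32, port: u32 }

struct PutterInfo { own: PortId }
struct GetterInfo { own: PortId, peer: PortId }

// Pair each getter with the putter it received from: the getter knows its
// peer for this round, the putter does not.
fn match_channels<'a>(
    putters: &'a [PutterInfo],
    getters: &'a [GetterInfo],
) -> Vec<(&'a PutterInfo, &'a GetterInfo)> {
    getters
        .iter()
        .filter_map(|g| putters.iter().find(|p| p.own == g.peer).map(|p| (p, g)))
        .collect()
}
```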
 

	
 
## Component Nomenclature
 

	
 
Earlier versions of the Reowolf runtime featured a distinction between primitive and composite components. This distinction was introduced for design reasons. Primitive components did the nitty-gritty protocol work: performing `put`/`get` operations and entering sync blocks. Conversely, composite components were tasked with setting up a network of interconnected components: creating channels and handing off the appropriate ports to the instantiated components.
 

	
 
Once the runtime was capable of sending ports over channels, it became apparent that this distinction no longer made sense: if only primitive components can send/receive ports, but cannot create new components, then the programmer is limited to using those received ports directly in the primitive's code! And so the split between primitive and composite components was removed: only the concept of a "component" is left.
 
docs/runtime/04_known_issues.md
 
# Known Issues
 

	
 
The current implementation of Reowolf has the following known issues:
 

	
 
- Cannot create uninitialized variables that are later known to be initialized. This is not a problem for the regular types (perhaps a bit tedious), but it is a problem for channels/ports. That is to say: if a component needs a temporary variable for a port, then it must create a complete channel, e.g.:
 

	
 
  ```
  comp send(out<u32> tx1, out<u32> tx2, in<bool> which) {
    channel unused -> temporary;
    while (true) sync {
      if (get(which)) {
        temporary = tx1;
      } else {
        temporary = tx2;
      }
      put(temporary, 1);
    }
  }
  ```
 

	
 
  Another solution would be to use an empty array and to put a port inside of that. Hacks galore!
 

	
 
- Reserved memory for ports will grow without bounds: Ports can be given away from one component to another by creating a component, or by sending a message containing them. The component sending those ports cannot remove them from its own memory if there are still other references to the transferred port in its memory. This is because we want to throw a reasonable error if that transferred port is used by the original owner. Hence we need to keep some information about that transferred port in the sending component's memory. The solution is to have reference counting for the ports, but this is not implemented.
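
  A minimal sketch of what such (currently unimplemented) reference counting could look like; all names are hypothetical:

  ```
  use std::collections::HashMap;

  // A transferred port's bookkeeping entry can be dropped once no references
  // to it remain in the sending component's memory.
  struct PortTable {
      refcounts: HashMap<u32, u32>, // port ID -> number of live references
  }

  impl PortTable {
      fn add_ref(&mut self, port: u32) {
          *self.refcounts.entry(port).or_insert(0) += 1;
      }

      fn drop_ref(&mut self, port: u32) {
          if let Some(count) = self.refcounts.get_mut(&port) {
              *count -= 1;
              if *count == 0 {
                  // Last reference gone: safe to forget the transferred port.
                  self.refcounts.remove(&port);
              }
          }
      }
  }
  ```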
 

	
 
- In addition to the above: when transferring ports to a new component, the memory that remembers the state of those ports is removed from the component that is creating the new one. Hence using old references to such a port within the creating component's PDL code results in a crash.
 

	
 
- Some control algorithms are not robust under multithreading. This mainly concerns error handling in sync mode (which needs a revision that keeps track of which components are still reachable by another component), and complicated scenarios in which ports are transferred.
 

	
 
- There is an assertion in the interpreter that makes sure that there are no values left on the expression stack when a statement has completed. This is not true when you have an expression statement! If you want to remove this assertion make sure to clear the stack (using the method on the `Store`).
 

	
 
- The TCP listener component should probably do a `shutdown` before a `close` on the socket handle. It should also set the `SO_REUSEADDR` option.
 

	
 
- The TCP listener and TCP sender components have not been tested extensively in a multi-threaded setup.
 

	
 
- The way in which putting ports are made to block when the corresponding getter port's main inbox is full is rather silly. This led to the introduction of the "backup inbox" as it is found in the runtime's code. There is a design decision to make here. There are two options: (a) have an atomic boolean indicating whether the message slot for an inbox is full, or (b) do away with the "main inbox" altogether and use an unbounded message queue.
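
  A sketch of option (a), assuming a single consumer (the getter) per inbox slot; the type and its names are illustrative:

  ```
  use std::sync::atomic::{AtomicBool, Ordering};
  use std::sync::Mutex;

  // One-message inbox slot with an atomic "full" flag.
  struct InboxSlot<T> {
      full: AtomicBool,
      slot: Mutex<Option<T>>,
  }

  impl<T> InboxSlot<T> {
      // The putter tries to deposit a message; on `Err` the caller blocks/retries.
      fn try_put(&self, msg: T) -> Result<(), T> {
          let mut slot = self.slot.lock().unwrap();
          if slot.is_some() {
              return Err(msg); // slot occupied: putter must block
          }
          *slot = Some(msg);
          self.full.store(true, Ordering::Release);
          Ok(())
      }

      // The getter checks the flag cheaply before taking the lock.
      fn try_get(&self) -> Option<T> {
          if !self.full.load(Ordering::Acquire) {
              return None;
          }
          let msg = self.slot.lock().unwrap().take();
          self.full.store(false, Ordering::Release);
          msg
      }
  }
  ```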
 

	
 
- For practical use in components whose code supports an arbitrary number of peers (i.e. their code contains an array of ports that is used for communication and changes in size during the component's lifetime), the `select` statement somehow needs to support waiting on any one of those ports.
 

	
 
- The compiler currently prevents one from using `sync` blocks (or the corresponding `get` and/or `put` operations) in functions. They can only be used within components. When writing large programs this makes it rather hard to re-use code: all code that interacts with other components has to be written directly in a component's body. I would advise creating `sync func`tions, `nonsync func`tions and regular `func`tions. Where:
 
  
 
  - `sync func`tions can only be called from within `sync` functions. They may not open new sync blocks, but may perform calls to `get`/`put`. These are useful to encapsulate sequences of `put`/`get` calls together with some common message-modifying code.
 
  - `nonsync func`tions (or `async func`tions) may only be called outside of sync blocks, and may open new sync blocks themselves. They are useful to encapsulate a single interaction with other components. One may also create new components here.
 
  - regular `func`tions are as useful as in any other language, but here we disallow calling `nonsync func`tions or `sync func`tions.
 

	
 
- The `Ack` messages that are sent in response to `PeerPortChanged_Block` messages should contain the sending component's `(component ID, port ID)` pair in case the `PeerPortChanged_Block` message is relayed. When such an `Ack` message is received, the peer of the port must be updated before transferring the port to the new owner.
 

	
 
- The compiler currently accepts a select arm's guard that is formulated as `auto a = get(get(rx))`. This should be disallowed.
 

	
 
- The work queue in the runtime is still a mutex-locked queue. The `QueueMpsc` type should be extended to be a multiple-producer multiple-consumer queue. This type should then replace the mutex-locked work queue.
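
  As an illustration of the target behaviour, here is a sketch using the `crossbeam` crate's lock-free `SegQueue`, which already is a multiple-producer multiple-consumer queue (this does not extend `QueueMpsc` itself):

  ```
  use crossbeam::queue::SegQueue;
  use std::sync::Arc;
  use std::thread;

  fn main() {
      let queue: Arc<SegQueue<u32>> = Arc::new(SegQueue::new());

      // Multiple producers may push concurrently...
      let producers: Vec<_> = (0u32..4)
          .map(|i| {
              let q = Arc::clone(&queue);
              thread::spawn(move || q.push(i))
          })
          .collect();
      for p in producers {
          p.join().unwrap();
      }

      // ...and multiple consumers may pop concurrently.
      while let Some(work) = queue.pop() {
          println!("got work item {work}");
      }
  }
  ```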
 