Changeset - 27f48b4078d7
mh - 3 years ago 2022-04-13 16:55:29
contact@maxhenger.nl
WIP on documenting shutdown process
1 file changed with 18 insertions and 0 deletions:
docs/runtime/sync.md
 
@@ -135,12 +135,30 @@ There are some simple assumptions we can make that makes the problem a little bi
 
Without adding any extra overhead (e.g. some kind of discovery round per synchronous interaction), we can take three approaches:
 

	
 
1. We simply don't care. It is impossible for a round where messages are received out of order to complete. Hence we temporarily allow a component to take the wrong actions, therefore wasting some CPU time, and to crash/error afterward.
 
2. We remove the entire concept of ordering of channels at a single component. Channels are always independent entities. This way we truly do not have to care. All we care about is that the messages that have been sent over a channel arrive at the other side.
 
3. We slightly modify the algorithm to detect these problems. This can be done in a reasonable, albeit slightly "hacky", fashion. For each channel there is a slot to receive messages. Messages wait there until the receiver performs a `get` in the PDL code. So far we've only considered learning about the component/port IDs that constitute a channel the moment they're received with a `get`. The algorithm could be changed to already learn about the peer component/port ID the moment the message arrives in the slot.
 

	
 
We'll go with the last option in the current implementation. We return to the problematic example above. Note that messages between components are sent in order, and `a_put` happens before `b_put`. Component `B` will therefore first learn that `a_put` is the peer of `a_get`, and only then perform its first `get`, on the message from `b_put` to `b_get`. This message is annotated with a port mapping indicating that `a_put` has been used before. We're now able to detect at component `B` that we cannot accept `b_get` before `a_get`.
 

	
 
Concluding:
 

	
 
- Every data message that is transmitted needs to contain the port mapping of all `put`ting ports (annotating them appropriately if they have not yet been used). We also need to include the port mapping of all `get`ting ports that have a pending/received message. The port mapping for `put`ting ports will only include their own ID; the port mapping for `get`ting ports will include the IDs of their peer as well.
 
- Every arriving data message will immediately be used to identify the sender as the peer of the corresponding `get`ter port. Since messages between components arrive in order, this allows us to detect when the `put`s at the sender are in a different order than the `get`s at the receiver.
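
The two rules above can be illustrated with a minimal sketch. It is not the runtime's implementation: the names `DataMessage`, `Receiver` and `OrderingError` are hypothetical, and the port mapping is simplified to the set of the sender's `put`ting ports that were already used when the message was sent.

```python
from dataclasses import dataclass


class OrderingError(Exception):
    """Raised when a `get` contradicts the order of the sender's `put`s."""


@dataclass(frozen=True)
class DataMessage:
    put_port: str    # ID of the sending (putting) port
    get_port: str    # ID of the receiving (getting) port
    payload: object
    # Simplified port mapping (an assumption of this sketch): the sender's
    # putting ports that had already been used when this message was sent.
    used_put_ports: frozenset = frozenset()


class Receiver:
    def __init__(self, get_ports):
        self.slots = {port: None for port in get_ports}  # one slot per channel
        self.peer_of = {}       # get port -> peer put port, learned on arrival
        self.completed = set()  # get ports already read in this round

    def deliver(self, msg):
        # Learn the peer as soon as the message lands in the slot,
        # not only when the PDL code performs the `get`.
        self.slots[msg.get_port] = msg
        self.peer_of[msg.get_port] = msg.put_port

    def get(self, port):
        msg = self.slots[port]
        if msg is None:
            raise LookupError(f"no message waiting on {port}")
        # Any put port the sender used earlier, and whose peer is one of our
        # get ports, must already have been read by us.
        for get_port, put_port in self.peer_of.items():
            if put_port in msg.used_put_ports and get_port not in self.completed:
                raise OrderingError(f"{port} cannot be accepted before {get_port}")
        self.slots[port] = None
        self.completed.add(port)
        return msg.payload


# Component B receives both messages, then tries `b_get` before `a_get`.
b = Receiver(["a_get", "b_get"])
b.deliver(DataMessage("a_put", "a_get", 1))
b.deliver(DataMessage("b_put", "b_get", 2, frozenset({"a_put"})))
try:
    b.get("b_get")
except OrderingError as err:
    print(err)  # b_get cannot be accepted before a_get
```

Performing the `get`s in the sender's order (`a_get`, then `b_get`) succeeds in this sketch, because the peer of every previously used `put`ting port has then already been read.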
 

	
 
## Handling Fatal Component Errors
 

	
 
Components may, during their execution, encounter errors that prevent them from continuing to execute their code. For the purposes of this chapter we consider these to occur in one of two phases of their execution:
 

	
 
1. The error occurs outside of a sync-block.
 
2. The error occurs anywhere inside of a sync-block. Or more specifically: the error occurs inside of a sync-block where the component has already performed an interaction with the outside world (i.e. performed a `put` or a `get`, **note:** I need to think about whether a select block influences the error-handling as well).
 

	
 
### Handling Fatal Errors outside of Synchronous Rounds
 

	
 
In the first case we're dealing with a component that has finished all previous interactions with the outside world, so it does not have to report the outcome of a sync round to the outside world. The component will perhaps log something to `stdout` to indicate that it has failed, but apart from that it will simply initiate the exit procedure as described earlier: reporting to all peers that the ports will be closed.
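
As a rough sketch of this first case, assuming a hypothetical `ClosePort` control message and a hypothetical `Component` type that records the peer of each of its ports (neither name is taken from the runtime):

```python
from dataclasses import dataclass, field


@dataclass
class ClosePort:
    """Hypothetical control message: 'the channel ending at this port is closing'."""
    port_id: str


@dataclass
class Component:
    name: str
    # Own port ID -> (peer component name, peer port ID).
    peers: dict
    in_sync_round: bool = False
    # Messages queued for sending, as (recipient component, message) pairs.
    outbox: list = field(default_factory=list)

    def fail(self, reason):
        if self.in_sync_round:
            # Errors inside a sync round need a different procedure.
            raise NotImplementedError("failure inside a sync round")
        # Log the failure, then run the regular exit procedure:
        # tell every peer that the shared port is being closed.
        print(f"component {self.name} failed: {reason}")
        for peer_name, peer_port in self.peers.values():
            self.outbox.append((peer_name, ClosePort(peer_port)))
        self.peers.clear()


e = Component("E", {"e_in": ("C", "c_out")})
e.fail("division by zero")
```

After the call, `e.outbox` holds one `ClosePort` notification addressed to `C`, and `E` no longer owns any open ports.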
 

	
 
There is one more remark that should be made here. Although the component `E` that has encountered the error might not be part of a sync round, another component `C` might have sent a message to component `E`. If the message is being sent from `C` while it has already received the information from `E` that its port should be closed, then `C` needs to handle the error as well.
 

	
 
Hence, if the component `E` encounters a critical error, while there are still data messages from component `C` in the inbox (and the corresponding port is not yet closed), then component `E` sends a `DeliveryFailed` message to `C`. We may annotate each sent data message with the origin of the message in the PDL source, such that we can send this annotation back to the sender. Once the `DeliveryFailed` message arrives at `C` there are two possible scenarios (consider that it has sent a message, hence must have done this in a sync round that has not yet finished):
 

	
 
1. It is still waiting on the conclusion to a synchronous round that, if it were not for component `E`, would have succeeded. In this case the component `C` prints the `put`-error, and initiates failure in the synchronous round (we'll come back to this later in the other subchapter).
 
2. It is not waiting for the conclusion of a synchronous round, because, after the send, some other component (maybe even `C` itself) experienced a fatal error. `C` received the notification of the failed synchronous round first, and hence is busy shutting down. In this case the component likely already printed an error, hence can ignore the `DeliveryFailed` message and continue shutting down.
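
Both sides of the `DeliveryFailed` exchange can be sketched as follows. This is only an illustration under assumptions: the function names, the `State` enum and the `file:line` format of the origin annotation are hypothetical, and only the `DeliveryFailed` message itself comes from the text above.

```python
from dataclasses import dataclass
from enum import Enum, auto


@dataclass(frozen=True)
class DataMessage:
    sender: str        # component that performed the `put`
    target_port: str   # getting port at the failing component
    origin: str        # location of the `put` in the PDL source


@dataclass(frozen=True)
class DeliveryFailed:
    origin: str        # annotation copied from the undelivered message


def on_fatal_error(inbox, closed_ports):
    """At the failing component `E`: answer every data message that is still
    in the inbox, and whose port is not yet closed, with `DeliveryFailed`."""
    return [
        (msg.sender, DeliveryFailed(msg.origin))
        for msg in inbox
        if msg.target_port not in closed_ports
    ]


class State(Enum):
    AWAITING_ROUND_CONCLUSION = auto()  # scenario 1
    SHUTTING_DOWN = auto()              # scenario 2


def on_delivery_failed(state, msg, log):
    """At the sender `C`: return True if `C` must initiate failure of the
    synchronous round, False if the message can be ignored."""
    if state is State.AWAITING_ROUND_CONCLUSION:
        log.append(f"put at {msg.origin} could not be delivered")
        return True
    # Already shutting down: an error was likely printed before.
    return False


inbox = [DataMessage("C", "e_in", "main.pdl:12")]
replies = on_fatal_error(inbox, closed_ports=set())
log = []
must_fail_round = on_delivery_failed(State.AWAITING_ROUND_CONCLUSION, replies[0][1], log)
```

Returning a boolean keeps the sketch independent of how failure inside a synchronous round is actually initiated, which this section has not yet described.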
 
\ No newline at end of file