Changeset - afcf9a87186d
[Not reviewed]
0 1 0
MH - 3 years ago 2022-03-03 14:40:36
contact@maxhenger.nl
More documentation on runtime architecture
1 file changed with 50 insertions and 3 deletions:
0 comments (0 inline, 0 general)
docs/runtime/sync.md
Show inline comments
 
@@ -44,7 +44,7 @@ Within PDL code it is possible to create components. Upon their creation they ca
 

	
 
Here we run into a plethora of problems. The other endpoint might have been given away to another created component. The other endpoint may have already been used in communication, such that we already have messages underway for the port we're trying to give to a newly created component. We may have that the local port ID assigned by the creating component is not the same as the local port ID that the newly created component may want to assign to it. We may have that this port has been passed along multiple times already, etc.
 

	
 
We cannot help that messages have already arrived, or are in transit, for the transferred port. But we can make some assumptions that simplify the transfer of ports. As a first one, we have that the creating component decides when the created component is scheduled for execution. We'll choose not to execute it initially, such that we can be sure that it will not send messages over its ports the moment is created. To further simplify the problem, we have assumed that messages arrive in order. So although messages might still be underway for the transferred ports, if we ask the sender to stop sending, and the sender blocks the port and acknowledges that it has received this command. Then the moment the creator receives the acknowledgement it is certain that it has received all messages intended for the transferred ports.
 
We cannot help that messages have already arrived, or are in transit, for the transferred port. But we can make some assumptions that simplify the transfer of ports. As a first one, we have that the creating component decides when the created component is scheduled for execution. We'll choose not to execute it initially, such that we can be sure that it will not send messages over its ports the moment is created. To further simplify the problem, we have assumed that messages arrive in order. So although messages might still be underway for the transferred ports, if we ask the sender to stop sending, and the sender blocks the port and acknowledges that it has received this command. Then the moment the creator receives the acknowledgement it is certain that it has received all messages intended for the transferred ports. We'll also disallow the creation of components during synchronous interactions.
 

	
 
And so here we have our first control protocol. If a port is transferred then we might have:
 

	
 
@@ -63,6 +63,8 @@ In this last case we take the following actions, `C` for creating component, `N`
 

	
 
There is a bit of extra overhead here with the `PeerChangeBlockPort` -> `Acknowledge` -> `PeerChangeUnblockPort` sequence with respect to other possible schemes. But this one allows `P` to continue executing its code as long as it does not use its blocked port. It also ensures that messages arrive in order: they're all collected by `C`, given to `N`, and only then may `P` continue creating messages to `N`, hence arriving after the earlier messages have been handed off to `N`.
 

	
 
As we'll later introduce, ports can end up being closed. When a port is closed the above procedure is not initiated. A port cannot be reopened, and once a port is closed its peer is also notified that the channel cannot be used anymore. As such, it is not needed (and perhaps impossible, if the memory backing the peer component has already been freed) to notify that the port will change peer.
 

	
 
## Managing the Lifetime of Components
 

	
 
Components are created by other components or by the API. Note that the API may for intents and purposes be viewed as a component. It may own and create channels, and it may create components. Component start, like a function, executing at the top of their program listing and may end in two ways. Either they encounter an unrecoverable error, or they neatly reach the end of their execution.
 
@@ -75,8 +77,53 @@ For this reason there are two systems that make sure that components stay alive
 

	
 
Through this mechanism the consensus algorithm can be greatly simplified. If a component encounters a critical error and is already participated in a sync round, then it can notify the other participants, but remain reachable to all other components until the round ends (and the reference counts are decreased again).
 

	
 
A second system is needed to ensure that a component actually exits (because all the peers hold a reference, and we need all of those references to drop to 0 to truly remove the component from the runtime). And so when a component exits it will send `ClosePort` messages to all of its peers. These peers will `Acknowledge` those messages and close the respective ports, meaning that they will no longer allow sending messages over those ports, that will be a fatal error. However, messages that were sent to the exiting component before receiving the `ClosePort` message might still be underway.
 
A second system is needed to ensure that a component actually exits (because all the peers hold a reference, and we need all of those references to drop to 0 to truly remove the component from the runtime). And so when a component exits it will send `ClosePort` messages to all of its peers. These peers will `Acknowledge` those messages and close the respective ports, meaning that they will no longer allow sending messages over those ports, that will be a fatal error. However, messages that were sent to the exiting component before receiving the `ClosePort` message might still be underway. And so the terminating component will wait for all of the `Acknowledge` messages for all of its channels such that it knows that it has received all data messages. The component will respond to these intermediate messages with a `DataMessageFailed` message, meaning that the message has been received the moment the component was already terminated, hence the sender should consider this a failed message transmission.
 

	
 
Bringing these systems together: apart from data messages there might still be control messages in transit, or the exiting component might still have some control/sync work to do. And so we need to modify something we said earlier: instead of a component removing its self-reference the moment it terminates, we do this the moment we have received all the `Acknowledge` messages we were expecting. This way if a component is busy creating another component, we're certain that the appropriate protocols are observed. So:
 

	
 
Concluding everything described above, two separate mechanisms will act in conjunction:
 

	
 
- A. The exiting component `E` waiting until it has finished notifying all peers `P` of its termination:
 
  1. Component `E` sends a `ClosePort` message to each peer `P` for each of the shared channels (except when the port is already closed because `P` is also in the process of shutting down).
 
  2. The peer `P` receives the `ClosePort` message and closes the port as a result. This is a change of port state that will cause any subsequent `put` operations on that port to become a fatal error for component `P`. In response the peer `P` sends an `Acknowledge` message to component `E`, unless component `P` exited itself (that is: sending a `ClosePort` message to `E` before the `ClosePort` message from `P` arrived). After closing the port the component `P` will remove the reference to `E`.
 
  3. Component `E` waits until all of its pending control operations are finished (i.e. waiting for the `Acknowledge` messages following `ClosePort` messages, `PeerChangeBlockPort` messages, etc.). Once all of these are finished, note that we can no longer participate in any future control actions: component `E` will not create channels/components itself. Since all of its ports are closed, the peers `P` will also not send any data or control messages.
 
  4. Component `E` now checks its inbox for any remaining messages. It will respond to any data messages that arrived after `E` sending `ClosePort` and before `E` receiving `Acknowledge` with a `DataMessageFailed` message (except for ports that were closed before the `Acknowledge`). Then it removes the reference to itself, therefore decrementing the reference counter for the component by 1.
 

	
 
- B. The reference counting mechanism. Any sync round the exiting component `E` is participating in will conceptually hold a reference to `E`. The component `E` will always respond to sync messages as if it were alive (albeit trying to indicate to everyone that it is actually exiting). The component that removes the last reference to the component `E` (which may be `E` itself, but also a peer `P`) will truly remove the associated memory from the runtime. 
 

	
 
**Note**: We did not consider machine termination. That is to say: once we reach the runtime maturity where communication occurs over different machines, then we have to consider that machines encounter fatal errors. However these can only be resolved by embedding the possibility of failure inside the protocol over which these machines communicate. 
 

	
 
## Synchronization of 
 
\ No newline at end of file
 
## Sending Data Messages
 

	
 
So far we've discussed the following properties associated with sending data messages:
 

	
 
1. Port IDs are decided locally. So a peer may have an ID that is outdated for the intended recipient.
 
2. Ports can move from owner to owner. So a peer might have a component ID that is outdated for the intended recipient.
 
3. Ports may be closed.
 
4. Message intended for specific ports may end up at an intermediate component that is passing that message along.
 

	
 
However, there are some properties that we can take advantage of:
 

	
 
1. When a component sends a message, it knows for certain what its own component ID and port ID is. So a transmitting port always sends the correct source component/port ID.
 
2. When a component receives a message, it knows for certain what its own component ID and port ID is. So once a receiving port receives a properly annotated message from a transmitted port, the receiving end can be certain about the component IDs and port IDs that make up the channel.
 

	
 
Note that although the message transmitter may not be certain about its final recipient, the components along the way *are* aware of the routing that is necessary to make the message arrive at the intended target. Combing back to the case where we have a creator `C`, new component `N` and peer `P`. Then `P` will send a message intended for `N`, but arriving at `C`. Here `C` can change the target port and component to `N` (as it is in the process of transferring that port, so knows both its original and new port ID). Once the message arrives and is accepted by the recipient then it is certain about the component and port IDs.
 

	
 
## Sending Sync Messages
 

	
 
Sync messages have the purpose of making sure that consensus is reached about the interactions that took place in all of the participating components' sync blocks. The previous runtime featured speculative execution and a branching execution model: a component could exhibit multiple behaviours, and at the end all components decide which combination of local behaviours constitute a satisfying single global behaviour (if one exists at all). Without speculative execution the model can be a lot simpler.
 

	
 
We'll only shortly discuss the mechanisms that are present in the synchronization algorithm. A component has a local counter, that is reset for each synchronous interaction, that is used when transmitting data messages. Such a message will be annotated with the counter at `N`, after which the component sends the next message with annotation `N+1`. At the same time the component will keep track of a local mapping from port ID to port annotation, we'll call this the port mapping. Likewise, when a component receives a data message it will assign the received annotation in its own port mapping. If two components assign the same annotation to the ports that constitute a channel, then there is an agreeable interaction there.
 

	
 
And so at the end of the synchronous round a component will somehow broadcast its port mapping. Note from the discussion above that a transmitting port's annotation is only associated with that transmitting port, since a transmitting port can only truly ever know its own component/port ID. While the receiving port's annotation knows about the peer's component/port ID as well. And so a component can broadcast `(component ID, port ID, mapping)` for each of its transmitting ports, while it can broadcast `(own component ID, own port ID, peer component ID, peer port ID, mapping)` for each receiving port. Then a recipient of these mappings can match them up and make sure that the mappings agree.
 

	
 
Note that this broadcasting of synchronous messages is essentially a component-to-component operation. However these messages must still be sent over ports anyway (and any port that was used to transmit messages to a particular receiving component will do). There are two reasons:
 

	
 
1. The sync message may need to be rerouted (e.g. a sender quickly fires both a data message and a subsequent sync message while the receiving port is being transferred to a new component), but needs to arrive at the same target as the data message. This is essentially restating that a transmitter never knows about the component ID of the recipient.
 
2. The sync message must not be taken into account by the recipient if it has not accepted any messages from the sender yet. Ofcourse this can be achieved in various ways but a simple way to achieve this is to send the sync message over ports.
 

	
 
## Annotating Data Messages
 

	
 
These port mappings are also sent along when sending data messages. We will not go into details but here the mapping makes sure that messages arrive in the right order, and certain kinds of deadlock or inconsistent protocol behaviour may be detected. This port mapping is checked for consistency by the recipient and, when consistent, the target port is updated with its new mapping.
 

	
 
As we'll send along this mapping we will only consider the ports that are shared between the two components. But in the most general case the transmitting ports of the component do not have knowledge about the peer component. And so the sent port mapping will have to contain the annotation for *all* transmitting ports. Receiving port mappings only have to be sent along if they received a message, and here we can indeed apply filtering.
 
\ No newline at end of file
0 comments (0 inline, 0 general)