CSY/reowolf Changeset - 51eca459d3e5 · Centrum Wiskunde & Informatica (CWI)

MH - 3 years ago 2022-03-02 18:19:48
contact@maxhenger.nl

Replicating test failure, starting runtime documentation

2 files changed with 115 insertions and 2 deletions:

docs/runtime/sync.md

src/runtime2/tests/mod.rs

docs/runtime/sync.md

-# Synchronous Communication
+# Synchronous Communication and Component Orchestration
 ##
@@ \ No newline at end of file @@
 ## Introduction
 The Reowolf runtime consists of a system that allows multiple components to run within their own thread of execution. These components are able to exchange messages with one another. Components are capable of creating other components, and of creating channels. We may visualise the runtime as a cloud of all kinds of components, connected by the communication channels between them, hence a kind of communication graph.
 Several goals drove the design of this version of the runtime. For performance reasons we want:
 - As little centralized information as possible (because centralization of information implies synchronization to access it).
 - As much parallelism as possible (when information *must* be exchanged, then make sure as few components are affected as possible).
 To keep the complexity of the runtime to a reasonable minimum, the following requirements have to be met as well:
 - A component may terminate, therefore potentially not being able to participate in synchronous communication or receive messages. Experimentation showed that the system that ensures that termination is broadcast to all peers should be kept simple (earlier systems depended on a state-machine with assumptions about the order in which messages were exchanged, which greatly complicates the messaging subsystem of the runtime).
 - Messages should arrive in order (with some exceptions). As we'll see later in this document we have different types of messages. Reasoning about a component's operational state becomes much simpler if we can assume that the transmission of messages between components is ordered.
 As we will see there are several types of messages that can be exchanged. Among them we have:
 - Data messages: these messages contain the data that is "transmitted from and to PDL code". For each `put` a data message is annotated by the runtime and sent along to the receiving component, which will then hopefully retrieve the data with a `get`. These messages are conceptually sent over channels.
 - Sync messages: these messages are sent between components to communicate their consensus-state. These messages are not necessarily associated with channels.
 - Control messages: these messages are sent between components to ensure that the entire runtime is reliably facilitating data exchange. That is: they ensure that the language is working as intended. As an example: sending a port to a different component requires a bit of bookkeeping to ensure that all involved components are aware of the port exchange.
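The three message categories above could be modelled roughly as follows. This is a minimal sketch, not the runtime's actual definitions: the names `Message`, `ControlMessage`, `PortId`, the field names, and the `kind` helper are hypothetical. Only the control message names `PeerChangeBlockPort`, `Acknowledge`, and `PeerChangeUnblockPort` appear later in this document.

```rust
/// Hypothetical port identifier; the real runtime's type may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PortId(pub u32);

/// Hypothetical control messages, named after the port-transfer protocol
/// described in this document.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ControlMessage {
    PeerChangeBlockPort { port: PortId },
    Acknowledge,
    PeerChangeUnblockPort { port: PortId, new_peer_port: PortId },
}

/// Hypothetical top-level message type with one variant per category.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Message {
    /// Payload produced by a `put`, conceptually sent over a channel.
    Data { target_port: PortId, payload: Vec<u8> },
    /// Consensus-state exchange; not necessarily tied to a channel.
    Sync { round: u64 },
    /// Runtime bookkeeping, such as a port changing ownership.
    Control(ControlMessage),
}

impl Message {
    /// Returns the category of this message as a short label.
    pub fn kind(&self) -> &'static str {
        match self {
            Message::Data { .. } => "data",
            Message::Sync { .. } => "sync",
            Message::Control(_) => "control",
        }
    }
}
```

Keeping the categories in one enum lets a component dispatch on a single inbox while still treating the three kinds differently.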
 The remainder of this document tries to describe the various technical aspects of synchronous communication and component orchestration.
 ## Brief Description of Schedulers
 Each component conceptually has its own thread of execution: at any given point in time it is executing a program with a particular memory state. In reality there are a limited number of threads that execute components. Correct scheduling relies on the fact that components generally execute programs that block at some point: a message needs to be received, or a port is blocked so we cannot send any information. At that point a component is considered "sleeping". Should another component, scheduled on a particular thread, send a message to this sleeping component, then it is "woken up" by putting it into the execution queue.
 The job of the scheduler is then to: execute a component scheduled for execution, wait until a component is scheduled, or shut down in case there are no more components to execute.
 The details of the execution queue (currently rather simplistically implemented) are not of interest. What is of interest is that a component can only be scheduled once.
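One way to enforce the "scheduled only once" property is a per-component flag that is set atomically when the component is queued. The sketch below illustrates that idea only; `Component`, `ExecutionQueue`, `wake`, and `pop` are hypothetical names, and the actual runtime may implement its queue differently.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical component handle.
pub struct Component {
    pub id: u32,
    /// True while the component sits in the execution queue.
    scheduled: AtomicBool,
}

impl Component {
    pub fn new(id: u32) -> Arc<Self> {
        Arc::new(Component { id, scheduled: AtomicBool::new(false) })
    }
}

/// Hypothetical execution queue shared by all scheduler threads.
#[derive(Default)]
pub struct ExecutionQueue {
    queue: Mutex<VecDeque<Arc<Component>>>,
}

impl ExecutionQueue {
    /// Wakes a sleeping component. Returns false if it was already queued,
    /// so a component never appears in the queue twice.
    pub fn wake(&self, component: &Arc<Component>) -> bool {
        if component.scheduled.swap(true, Ordering::AcqRel) {
            return false;
        }
        self.queue.lock().unwrap().push_back(Arc::clone(component));
        true
    }

    /// Takes the next component to run and clears its flag, so that it can
    /// be woken again once it goes back to sleep.
    pub fn pop(&self) -> Option<Arc<Component>> {
        let next = self.queue.lock().unwrap().pop_front();
        if let Some(component) = &next {
            component.scheduled.store(false, Ordering::Release);
        }
        next
    }
}
```

A scheduler thread would repeatedly call `pop`, run the returned component until it blocks, and wait or shut down when `pop` returns `None`.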
 ## Creation of Channels
 Within PDL code it is possible to create new channels. A component will always (that is to say: for now) create both endpoints of the channel, hence own both endpoints of the channel upon creation. Identifiers for these ports are generated locally (we don't want to resort to synchronizing on some kind of global port ID counter).
 As these IDs are generated locally there is no real technical challenge, but note that ports at different components may have the same port ID.
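A minimal sketch of locally generated port IDs, assuming a simple per-component counter. The names `PortId`, `LocalPorts`, and `create_channel` are hypothetical and only illustrate why IDs at different components can collide.

```rust
/// Hypothetical port identifier, unique only within one component.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct PortId(pub u32);

/// Hypothetical per-component ID generator; no global counter is involved.
#[derive(Default)]
pub struct LocalPorts {
    next_id: u32,
}

impl LocalPorts {
    fn fresh(&mut self) -> PortId {
        let id = PortId(self.next_id);
        self.next_id += 1;
        id
    }

    /// Creates both endpoints of a channel, returned as (putter, getter).
    /// The creating component owns both of them.
    pub fn create_channel(&mut self) -> (PortId, PortId) {
        (self.fresh(), self.fresh())
    }
}
```

Two components that each create a channel will hand out the same first ID, which is why a port ID alone does not identify a port across the whole runtime.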
 ## Creation of Components
 Within PDL code it is possible to create components. Upon their creation they can be given endpoints of channels. Hence at component creation we are changing the configuration of the communication graph. All of the relevant components need to be informed about the port changing ownership.
 Here we run into a plethora of problems. The other endpoint might have been given away to another created component. The other endpoint may have already been used in communication, such that we already have messages underway for the port we're trying to give to a newly created component. We may have that the local port ID assigned by the creating component is not the same as the local port ID that the newly created component may want to assign to it. We may have that this port has been passed along multiple times already, etc.
 We cannot help that messages have already arrived, or are in transit, for the transferred port. But we can make some assumptions that simplify the transfer of ports. As a first one, we have that the creating component decides when the created component is scheduled for execution. We'll choose not to execute it initially, such that we can be sure that it will not send messages over its ports the moment it is created. To further simplify the problem, we have assumed that messages arrive in order. So although messages might still be underway for the transferred ports, we can ask the sender to stop sending, after which the sender blocks the port and acknowledges that it has received this command. The moment the creator receives the acknowledgement, it is certain that it has received all messages intended for the transferred ports.
 And so here we have our first control protocol. If a port is transferred then we might have:
 . That the peer port is transferred to the new component as well. All is fine and we can take care of the exchange immediately.
 . That the peer port stays with the creating component. Here all is fine as well, everything is running in a single thread of execution so we diligently do our bookkeeping on the data associated with the port and the channel and we can transfer the port.
 . That the peer port is already owned by a different component. Here we need to have a slightly more complicated protocol.
 In this last case we take the following actions, `C` for creating component, `N` for newly created component, and `P` for the peer component that holds the other port of the same channel.
 . `C` transfers the port to the newly created component `N`, and asks it to come up with a new ID for that port. The port had an ID that was decided by its old owner `C`, and now has one that is agreeable with component `N`.
 . `C` sends a control message `PeerChangeBlockPort` to the peer `P`.
 . `P` receives the `PeerChangeBlockPort` message. It causes the peer port to be temporarily blocked. `P` may still continue executing its code, but the moment it wishes to send something over this port it is forced to block its execution. In response `P` sends an `Acknowledge` message back to `C`.
 . `C` waits for the `Acknowledge` message of `P`. Since the `Acknowledge` message was sent after the last data message that `P` sent to the port under consideration, and because `P` has blocked that port, we are certain that we received all messages. We transfer these messages to `N`.
 . Note that there may be multiple ports being transferred from `C` to `N`, each with a potentially different peer. And so here `C` waits until steps 2 through 4 are completed for all of the transferred ports.
 . Once `C` has received all of the `Acknowledge` messages it was waiting for, it will send a `PeerChangeUnblockPort` message to each peer `P`. This message contains the new port ID, such that `P` can unblock its port, and continue sending messages over this channel, but now correctly arriving at `N`. Additionally, `C` will now schedule the new component `N`.
 There is a bit of extra overhead here with the `PeerChangeBlockPort` -> `Acknowledge` -> `PeerChangeUnblockPort` sequence with respect to other possible schemes. But this one allows `P` to continue executing its code as long as it does not use its blocked port. It also ensures that messages arrive in order: they're all collected by `C`, given to `N`, and only then may `P` continue creating messages to `N`, hence arriving after the earlier messages have been handed off to `N`.
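The block, acknowledge, unblock sequence can be simulated with ordinary ordered channels. The sketch below is an illustration under stated assumptions, not the runtime's implementation: it uses `std::sync::mpsc` as a stand-in for ordered message delivery, a single transferred port, and hypothetical types `ToCreator` and `ToPeer`. The unblock message carries a sender handle in place of the new port ID.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Messages arriving at the creating component `C` (hypothetical type).
enum ToCreator {
    Data(u32),
    Acknowledge,
}

/// Control messages arriving at the peer `P` (hypothetical type).
enum ToPeer {
    PeerChangeBlockPort,
    /// Carries the new destination, standing in for the new port ID.
    PeerChangeUnblockPort(Sender<u32>),
}

/// Runs one port transfer and returns what `N` received, in order.
pub fn run_port_transfer() -> Vec<u32> {
    let (to_c, c_inbox) = channel::<ToCreator>();
    let (to_p, p_inbox) = channel::<ToPeer>();
    let (to_n, n_inbox) = channel::<u32>();

    // Peer `P`: sends two data messages to the old owner `C`.
    let peer = thread::spawn(move || {
        to_c.send(ToCreator::Data(1)).unwrap();
        to_c.send(ToCreator::Data(2)).unwrap();
        // Block the port and acknowledge; the acknowledgement is ordered
        // after the data messages sent above.
        if let Ok(ToPeer::PeerChangeBlockPort) = p_inbox.recv() {
            to_c.send(ToCreator::Acknowledge).unwrap();
        }
        // Unblock: from now on messages go directly to `N`.
        if let Ok(ToPeer::PeerChangeUnblockPort(to_new_owner)) = p_inbox.recv() {
            to_new_owner.send(3).unwrap();
        }
    });

    // Creator `C`: ask `P` to block, then collect everything up to the ack.
    to_p.send(ToPeer::PeerChangeBlockPort).unwrap();
    let mut in_flight = Vec::new();
    loop {
        match c_inbox.recv().unwrap() {
            ToCreator::Data(value) => in_flight.push(value),
            ToCreator::Acknowledge => break,
        }
    }
    // Hand the collected messages to `N` before unblocking `P`.
    for value in in_flight {
        to_n.send(value).unwrap();
    }
    to_p.send(ToPeer::PeerChangeUnblockPort(to_n)).unwrap();

    peer.join().unwrap();
    n_inbox.try_iter().collect()
}
```

Calling `run_port_transfer()` yields `[1, 2, 3]`: the two messages that were in flight reach `N` via `C`, and the message sent after unblocking arrives behind them.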
 ## Managing the Lifetime of Components
 Components are created by other components or by the API. Note that the API may for all intents and purposes be viewed as a component. It may own and create channels, and it may create components. They start
 ## Synchronization of
@@ \ No newline at end of file @@

src/runtime2/tests/mod.rs

@@ @@ -84,6 +84,52 @@ fn test_component_communication() { @@
     create_component(&rt, "", "constructor", no_args());
+}
 #[test]
 fn test_intermediate_messenger() {
     let pd = ProtocolDescription::parse(b"
     primitive receiver<T>(in<T> rx, u32 num) {
         auto index = 0;
         while (index < num) {
             sync { auto v = get(rx); }
             index += 1;
+        }
+    }
     primitive middleman<T>(in<T> rx, out<T> tx, u32 num) {
         auto index = 0;
         while (index < num) {
             sync { put(tx, get(rx)); }
             index += 1;
+        }
+    }
     primitive sender<T>(out<T> tx, u32 num) {
         auto index = 0;
         while (index < num) {
             sync put(tx, 1337);
             index += 1;
+        }
+    }
     composite constructor_template<T>() {
         auto num = 0;
         channel<T> tx_a -> rx_a;
         channel tx_b -> rx_b;
         new sender(tx_a, 1);
         new middleman(rx_a, tx_b, 2);
         new receiver(rx_b, 3);
+    }
     composite constructor() {
         new constructor_template<u16>();
         // new constructor_template<u32>();
         // new constructor_template<u64>();
+    }
     ").expect("compilation");
     let rt = Runtime::new(1, true, pd);
     create_component(&rt, "", "constructor", no_args());
+}
 #[test]
 fn test_simple_select() {
     let pd = ProtocolDescription::parse(b"
