diff --git a/docs/runtime/01_runtime.md b/docs/runtime/01_runtime.md index f8eeb89ad3feaef71ef8ba402f41ff97e2549f77..b29808a2ad37dea3470d99caa50b1396e8b89cc1 100644 --- a/docs/runtime/01_runtime.md +++ b/docs/runtime/01_runtime.md @@ -11,11 +11,11 @@ Roughly speaking the project consists of the following parts: Once the interpreter hits a point where it needs to interact with the runtime (generally in order to communicate with another component) it will halt and emit a signal to the runtime. 3. - The runtime. This is the code that keeps track of the components, decides when they can run, where their messages should end up, bring various control algorithms (running behind the scene) to completion. + The runtime. This is the code that keeps track of the components, decides when they can run, where their messages should end up, and brings various control algorithms (running behind the scenes) to completion. We'll not go into points 1 and 2 in this document. One may simply assume that at the language level there is support for all of the things that are implemented in the runtime: sync blocks, channels, ports, component creation, sending and receiving messages, etc. -Once such features are encountered in the interpreter, a signal will be emitted to the runtime. Handling these signals is discussed in this document. +Once such built-in features are encountered in the interpreter (e.g. the creation of a channel), a signal will be emitted to the runtime. A rough outline of the architecture, and the handling of these signals, is discussed in this document. ## Runtime Architecture @@ -23,7 +23,7 @@ The runtime's code essentially consists of: - The machinery that keeps track of the components. That is to say: there is some memory reserved for each of the components. And using some kind of component ID we can look up this memory in a registry. If the runtime finds that there are no more user-controlled components running (i.e.
the ones that a programmer uses to create more components) and there are no more regular components running, then the runtime shuts down. - A bit of shared memory for all of the OS threads that will be managed by the runtime. This mainly consists of a work queue. This work queue contains the identities of the components that are scheduled for execution. -- A set of scheduler threads. They attempt to retrieve work from the work queue. This work is in the form of component IDs, which they use to retrieve the associated component and run its PDL code. They do this by invoking the interpreter on the component's current execution and memory state. Once a runtime signal (as mentioned above) as emitted by the interpreter, the scheduler thread will deal with it appropriately. +- A set of scheduler threads. They attempt to retrieve work from the work queue. This work is in the form of component IDs, which they use to retrieve the associated component and run its PDL code. They do this by invoking the interpreter on the component's current execution and memory state. Once a runtime signal (as mentioned above) is emitted by the interpreter, the scheduler thread will deal with it appropriately. - An auxiliary polling thread. Not of great importance, so a short description suffices: although most components react to one another, some components (e.g. a TCP socket) might have nothing to do until the OS instructs them to do something. This polling thread ensures that once there is something to do, the component is put back onto the work queue. As long as the runtime doesn't shut down, there will be `T` threads executing `N` components. A component can either be running (by being executed by a scheduler thread), scheduled for execution (its ID is in the work queue), or sleeping. All of these states are exclusive. Maintaining the exclusivity of these states is of great importance!
We never want to end up in a place where two threads are both modifying the code/memory state of the same component! @@ -32,15 +32,15 @@ A component will start its lifecycle as being put into the work queue (not exact Once the execution of a component is blocked, it will attempt to go to sleep. Meaning: it will be in a state where it is not running, but also not scheduled for execution. A component may then be woken up by a different component, or the polling thread, by sending the sleeping component a message. To prevent components from scheduling a sleeping component multiple times, the memory of a component contains an atomic `sleeping` boolean. -It is instructive at this point to roughly explain how components are stored in memory. The components memory is roughly divided into two regionh. There is the publicly accessible memory of a component and a private memory region. The public part is accessible by all scheduler threads. So all of the memory in the public memory region is somehow behind a lock, or some kind of locking mechanism (and will include the concept of atomics into "some kind of locking mechanism" for now). Hence the aforementioned `sleeping` boolean lives in this region. Conversely, the private memory region of a component is only accessed by the scheduler thread that is running the component. So here we store things like the memory and execution state of its code. +It is instructive at this point to roughly explain how components are stored in memory. The component's memory is roughly divided into two regions. There is the publicly accessible memory of a component and a private memory region. The public part is accessible by all scheduler threads. So all of the memory in the public memory region is somehow behind a lock, or some kind of locking mechanism (we will include the concept of atomics into "some kind of locking mechanism" for now). Hence the aforementioned `sleeping` boolean lives in this region.
Conversely, the private memory region of a component is only accessed by the scheduler thread that is running the component. So here we store things like the memory state and execution state of the component. Returning to the idea of a component wishing to enter the "sleeping" state: the procedure in pseudocode is written as: ``` func go_to_sleep(this_component) { - // We are currently, executing, so our sleeping flag MUST be `false` + // We are currently executing, so our sleeping flag MUST be `false` assert(atomic::load(this_component.sleeping) == false); - atomic::store(this_component.sleeping, true); // we force the flag to false + atomic::store(this_component.sleeping, true); // we force the flag to true // Note that while we were trying to go to sleep, someone has sent us a new // message, but it did not see yet that we stored `true` in the sleeping // flag, so we need to check ourselves. @@ -71,9 +71,9 @@ Note that, because we cannot predict how the OS threads are scheduled, we can al ## General Messaging -With this rough architecture, components can send eachother messages. One will find three kinds of messages in the runtime (okay, four, but the last one is just to make the OS polling thread work): +With this rough architecture, components can send each other messages. One will find three kinds of messages in the runtime (okay, four, but the last one is just to make the OS polling thread work): -1. Data messages: these are the messages that are sent by `put` calls and received with `get` calls. As will be explained later in this document, more information is attached to data messages than the values given as argument to the `put` call. These messages will arrive in the target component's inbox. When the target component performs a call to `get` they're actually pulled out of the inbox. +1. Data messages: these are the messages that are sent by `put` calls and received with `get` calls.
As will be explained later in this document, more information is attached to data messages than the values given as argument to the `put` call. These messages will arrive in the target component's inbox. When the target component performs a call to `get` they're pulled out of the inbox and transferred to the component's memory state, such that the PDL code can read the transferred values. 2. Control messages: to facilitate certain operations permitted within PDL, the scheduler thread may decide to send messages to other components. These messages are called control messages. We'll encounter them later in this document when describing component creation, transmitting ports, and closing channels. Different from data messages, which may linger in the component's inbox for a while, control messages are handled by the receiving component immediately. This is important for various control algorithms. 3. Sync messages: to facilitate the consensus algorithm, there will be messages initiated by the scheduler thread as well. That is to say: when the component sends data messages there will be information attached to the data message that facilitates a successful sync round. But apart from that, when the components are all done executing their code, they must somehow reach consensus. This is done through these sync messages. @@ -85,33 +85,35 @@ As we'll later also see, the concept of a directional channel *may* be a useful Most design choices in the runtime were based on the fact that the Reowolf language should facilitate easier programming over the internet. So although the entire system currently only considers components running on a single machine, those components were conceptually regarded as living on different machines. In other words: components were conceptually considered not to have shared memory they can both access. -This has some implications for channels.
Certainly a component that sends a value must have it stored somewhere temporarily, and the component receiving it needs to keep it around as well. But the channel is not an entity that you'll find in memory. Rather there is one component owning one port, and when a message is `put` through it, it will arrive at another component owning the peer port; there is no shared memory that represents a message flowing through a channel. +This has some implications for channels. Certainly a component that sends a value must have it stored somewhere temporarily, and the component receiving it needs to keep it around as well. But the channel is not an entity that you'll find in memory. Rather there is one component owning one port, and when a message is `put` through it, it will arrive at another component owning the peer port; there is no memory shared between components that will store a message flowing through a channel. -A multi-machine runtime also requires the runtime to embrace the asynchronous nature of components. `put`s are non-blocking and can be performed one after the other before the peer has performed a corresponding `get`. The language does not contain the concept of a "lock" such that two components can agree on who owns a shared bit of memory. Rather each component is executed in its own thread of execution, and for multiple components to coordinate their actions they must use the messaging facilities. In order to make this coordinates somewhat simple to reason about one of the design drivers of the runtime was to ensure that each message sent in a specific order from one component to another will arrive in that same order. +A multi-machine runtime also requires the runtime to embrace the asynchronous nature of components. `put`s are non-blocking and can be performed one after the other before the peer has performed a corresponding `get`.
The language does not contain the concept of a "lock" such that two components can agree on who owns a shared bit of memory. Rather each component is executed in its own thread of execution, and for multiple components to coordinate their actions they must use the messaging facilities. In order to make this coordination-through-messages somewhat simple to reason about, one of the design drivers of the runtime was to ensure that each message sent in a specific order from one component to another will arrive in that same order at the target component. -And so we have a multi-machine runtime where components running in their own thread can only coordinate through messages. As a result an ever-important in designing internal (control) algorithms is to consider something called "message crossing". Two components may decide to initiate a protocol at the same time, hence send eachother the exact same protocol-initiating message. +And so we have a multi-machine runtime where components running in their own thread can only coordinate through messages. As a result an ever-important consideration in designing internal (control) algorithms is something called "message crossing". Two components may decide to initiate a protocol at the same time, hence send each other the exact same protocol-initiating message (e.g. if we have components `A` and `B`, and a protocol that requires an initiator to send a `Request` message and then wait for a `Response` message, then we may have `A` and `B` both sending each other `Request` at the same time). Yet another result is that we decided to design the runtime without any globally unique component and/or port IDs. Certainly: on a single machine a component ID allows one to retrieve a component's memory. But when sending a message to a component living on another machine, it may well be that we're sending it through a port that has the same port ID as ours, and targeting a component that has the same ID as ours.
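To make the crossing scenario concrete, here is a small, hypothetical simulation (illustrative names only, not the runtime's actual code). It uses one possible symmetric rule for resolving crossed initiations: a peer that is itself waiting for a `Response` treats an incoming `Request` as that response, so both initiators complete without deadlocking.

```python
from collections import deque

# Hypothetical sketch: two peers both initiate the same protocol at the
# same time, so their `Request` messages cross. The symmetric rule used
# here: a peer awaiting a `Response` treats a crossing `Request` as that
# response. (This document later applies a similar idea to crossing
# `ClosePort` messages, which are treated like an `Ack`.)

class Peer:
    def __init__(self, name):
        self.name = name
        self.inbox = deque()
        self.awaiting_response = False
        self.done = False

    def initiate(self, other):
        # Start the protocol by sending a Request to the other peer.
        self.awaiting_response = True
        other.inbox.append(("Request", self.name))

    def handle_one(self, other):
        # Handle a single incoming message.
        kind, sender = self.inbox.popleft()
        if kind == "Request":
            if self.awaiting_response:
                # Crossing case: the peer initiated too; treat its
                # Request as the Response we were waiting for.
                self.done = True
            else:
                other.inbox.append(("Response", self.name))
        elif kind == "Response":
            self.done = True

a, b = Peer("A"), Peer("B")
a.initiate(b)
b.initiate(a)      # both Requests are now in flight: they cross
a.handle_one(b)
b.handle_one(a)
print(a.done, b.done)  # both complete without deadlock
```

A rule like this matters precisely because, as described above, neither component can grab a lock to decide who is "the" initiator.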
## Control Algorithms -We'll now discuss several of the control algorithms. These control algorithms may be initiated by the scheduler threads when certain runtime signals are emitted by the interpreter, and are executed by sending messages. We'll talk about these messages as if they're sent from component to another component. For certain reasons (introduced later) these messages may be sent to ports of peer components, but that does not mean that the peer can `get` these messages: they are control messages that are intercepted and handled by a bit of code that lives above the interpreter. +We'll now discuss several of the control algorithms. These control algorithms may be initiated by the scheduler threads when certain runtime signals are emitted by the interpreter. The control algorithms are brought to completion by sending messages. We'll talk about these messages as if they're sent from one component to another (this is for the sake of clarity: in reality they're sent by one scheduler thread to the memory location reserved for the target component's inbox). Because messages may be relayed one or more times before they arrive at the intended receiver (we'll introduce this concept soon), most messages include their intended target port in some kind of message header. This is true for all data messages, and most control messages. Only when a component is certain about the identity of the receiving component can it send messages without a target port in a header. ### Changing Port Peers due to Component Creation -Components, when they're not in sync mode, may decide to create new components. Ports may be the arguments to these newly created components. The rule we place on such a kind of port transfer is that the component that is creating the new component fully relinquishes ownership of the transferred port, and after the component creation the new component owns that port.
As an annotated example: +Components, when they're not in sync mode, may decide to create new components. Ports may be used as the arguments to these newly created components. The rule we place on such a kind of port transfer is that the component that is creating the new component fully relinquishes ownership of the transferred port, and after the new component is created, the new component owns that port. As an annotated example: ``` comp creator(in one_port) { channel another_port -> and_a_final_one; sync { - auto value = get(one_port); // legal - put(another_port); // legal + auto value = get(one_port); // legal, just receiving an integer + put(another_port, 1337); // legal, sending a value over an owned port } // perform component creation - new some_other_component(one_port, and_a_final_one); + new some_other_component(one_port, and_a_final_one); // transferring two ports - sync put(one_port); // illegal! Port was transferred + sync get(one_port); // illegal! Port was transferred + sync put(another_port, 1338); // still legal, we still own this port + sync get(and_a_final_one); // also illegal, port was transferred. } ``` @@ -119,28 +121,28 @@ We have several runtime properties to uphold when we're transferring ports: - No globally unique port IDs, so the new component is allowed to choose new port IDs for the ports it is adopting ownership of. - The peers of the transferred ports may be unaware that a new component is created. In fact those peers may have already transferred messages to the instantiating component! As a design decision (the one that we find makes sense) any incoming, but unread, messages for a port are transferred along to the new component. -- Similarly to above: a peer of a transferred port needs to be aware at some point that its peer port has changed owner. +- Similarly to the above: a peer of a transferred port needs to be aware at some point that its peer port has changed ownership.
- Together with the requirement that peers need to be aware of the transferred ports, we also need to maintain ordering in the sent messages that intend to arrive at that transferred port at some point in time. -Here we see the asynchronous nature of the runtime rear its head. Because the transferring of ports does not just happen to the receiving end of a port (hence message only eventually arrive at their correct destination). It may happen to the transmitting end of a port as well. What this means for the receiver is that it is never sure which component is its peer until it has recevied a data message that is annotated with the origin of the message. At that moment in time the peer of the port is known, but only until the end of the synchronous round. Because after the synchronous round it is perfectly possible for the port to be passed around again. +Here we see the asynchronous nature of the runtime rear its head. Because the transferring of ports does not just happen to the receiving end of a port (in which case we transfer already received messages, hence messages only arrive at their correct destination eventually). It may happen to the transmitting end of a port as well. What this means for the receiver is that it is never sure which component is its peer until it has received a data message that is annotated with the origin of the message. At that moment in time the peer of the port is known, but only until the end of the synchronous round. Because after the synchronous round it is perfectly possible for the port to be passed around again. For all of the requirements above, the internal control algorithm to transfer a port to a new component is as follows: -1. The component that is creating the new component (we'll call the creator the instantiator component, and the created the new component) temporarily has access to the private memory of the new component.
Reason being is that a component is always created on the same machine as the instantiator component. And so the first step it takes is to create new port IDs (that make sense for the newly created component, instead of for itself) and map the old port IDs to the new ones. +1. The component that is creating the new component (we'll call the creator the instantiator component, and the created one the new component) temporarily has access to the private memory of the new component. The reason is that a component is always created on the same machine as the instantiator component. And so the first step it takes is to create new port IDs (that make sense for the newly created component, instead of for the instantiator component) and map the old port IDs to the new ones. 2. The component transfers all of the metadata associated with the port, and transfers all of the messages that are targeted at those transferred ports to the new component. 3. For each transferred port the instantiator sends a `PortPeerChanged_Block` control message to the peers. This message instructs the peer that the port should be temporarily blocked. Any component that tries to send a message through that port enters a blocked state that can only be lifted if the corresponding `PortPeerChanged_Unblock` control message is sent. At the same time the instantiator sets up a special bit of code that will relay all incoming messages from that peer to the new component. We've mentioned earlier that all messages will have a target port. So when messages arrive at the instantiator component that need to be relayed, the instantiator component will modify the target port to the new component's chosen port ID. 4. Once a peer has received a `PortPeerChanged_Block`, it will, as stated above, stop sending messages over that channel. Not only data messages, but control messages as well. This also means that the other component cannot start transferring ports itself.
In any case, it will respond with an `Ack`nowledgement back to the instantiator component. 5. The instantiator component waits until it has received an `Ack` for all of the `PortPeerChanged_Block` messages it has sent. This is done such that we're sure that we've received all of the messages that are actually intended for the new component (because while the new component is being created the peer may still be sending messages intended for the new component, but sent to the instantiator component). As a result, the new component will have all of the data messages in the inbox in the order in which they were sent, therefore maintaining the runtime property of message ordering. 6. When all of the `Ack`s are received, the instantiator component will remove the bit of code that relays all of the messages and will schedule the new component for execution. At this point the instantiator component will no longer access the private memory of the new component. Since the instantiator component is aware of the new component's ID and the new port IDs for all of the transferred ports, it will send `PortPeerChanged_Unblock` messages to all of the peer components. This message will also contain the new component's ID and its port ID. -7. The peers, upon receiving the `PortPeerChanged_Unblock` message, will update the metadata of their ports such that they point to the new components IDs. They will also unblock the port such that messages can be sent again. +7. The peers, upon receiving the `PortPeerChanged_Unblock` message, will update the metadata of their ports such that they point to the new component's ports. They will also unblock the port such that messages can be sent again. -With this control algorithm, all peers are now are of the new port's position. We've also maintained message ordering for the new component.
Although it was mentioned in point (4), we'll mention it here to be extra clear: creating a new component will be blocked until all of the transferred ports are unblocked. If we don't do this a data/control message may end up at the wrong component. +With this control algorithm, all peers are now aware of the new port's position. We've also maintained message ordering for the messages sent to the new component. Although it was mentioned in point (4), we'll mention it here to be extra clear: creating a new component will be blocked until all of the transferred ports are unblocked. If we don't do this, a data/control message may end up at the wrong component. -Likewise we see the asynchronous nature of ports: the peers are eventually consistent. To maintain this consistence it is important to modify port IDs when we're relaying messages. +Likewise we see the asynchronous nature of ports: the peers are eventually consistent. This is why we stressed earlier that almost all messages have their targeted port in their message header. This is needed such that a component like the instantiator discussed above knows when to relay messages. In this process the relaying component will also update the target port ID in the header to the new port ID. ### Shutting Down Components -A component will require a bit of memory to run. So when we're done executing a component (either because it has crashes, or because its program has terminated) we would like to release this memory again. Earlier we mentioned that components send messages by accessing an inbox in the public memory region of a component. This memory will, ofcourse, be freed as well. So we need to make sure that when a component shuts down, that all of its peers will somehow be notified that they can never send messages to that terminated component again. +A component will require a bit of memory to run.
So when we're done executing a component (either because it has crashed, or because its program has terminated) we would like to release this memory again. Earlier we mentioned that components send messages by accessing an inbox in the public memory region of a component. This memory will, of course, be freed as well. So we need to make sure that when a component shuts down, all of its peers will somehow be notified that they can never send messages to that terminated component again. In order to do so we have another control protocol. We'll extend this protocol when we discuss encountering crashing components, but we'll introduce a simpler variant here. The protocol is relatively simple. For each of the ports that are not yet closed and are owned by the terminating component we will: @@ -149,7 +151,7 @@ In order to do so we have another control protocol. We'll extend this protocol w 3. The peer of the port, upon receiving a `ClosePort` message, will mark the port as being closed in its metadata. From that point onwards, any attempt to `put` or `get` on that port will result in the peer component crashing. In response to the `ClosePort` message, the peer component will send an `Ack`. There is one exception, and that is when the peer component itself already initiated a `ClosePort` control algorithm for that port. In that case the incoming `ClosePort` message is treated like an `Ack`. 4. The terminating component will wait until all of the `Ack`s have arrived (or crossing `ClosePort` messages, as stated in point (3)). Once they have, it will instruct the runtime to remove the component from memory. -Note here that we have to be careful and send annotate the `ClosePort` message with the target port. We'll wait if the port is already blocked, but it may be that we have the `ClosePort` message crossing with a `PortPeerChanged_Block` message. Which implies that our `ClosePort` message will be relayed by the peer component.
+To reiterate: we have to be careful and annotate the `ClosePort` message with the target port. The terminating component will delay sending a `ClosePort` message if the port is blocked, but it may be that we have the `ClosePort` message crossing with a `PortPeerChanged_Block` message, which implies that our `ClosePort` message will be relayed by the peer component. ### Transferring Ports through Data Messages @@ -193,15 +195,15 @@ To facilitate this, we'll follow roughly the same procedure as when we're transf Note that it is perfectly okay to send closed ports. The adopting component will receive this port together with the information that the port is closed. In this way, if the adopting component attempts a `put` or `get` on that received port, it will crash. -We'll enforce a second rule upon transferring ports. Namely that ports transferred in a synchronous round may not have been used in `get` or `put` operations. I'm certain that it is possible to come up with a set of rules that will make this possible. But the protocol for transferring components over channels is a lot easier if we disallow this. For this reason we'll introduce a field in the metadata for each port that registers when the port was last used. If the relinquishing component attempts to transfer a port that has been used, then it will crash. +We'll enforce a second rule upon transferring ports. Namely that ports transferred in a synchronous round may not have been used in `get` or `put` operations. I'm certain that it is possible to come up with a set of rules that will make this possible. But the protocol for transferring components over channels is a lot easier if we disallow this. For this reason we'll introduce a field in the metadata for each port that registers when the port was last used. If the relinquishing component attempts to transfer a port that has been used within the same sync round, then it will crash.
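The last-used bookkeeping can be sketched as follows. This is a hypothetical illustration (the names `PortMeta`, `last_used_in_round` and the exception type are made up for this sketch), showing only the rule described above: transferring a port that was used for a `put` or `get` within the current sync round is an error, while sending a closed port is allowed.

```python
# Hypothetical sketch of the "may not transfer a used port" rule: each
# port's metadata records the sync round in which it was last used for a
# put/get. Transferring a port used in the current round is an error
# (the relinquishing component crashes). All names are illustrative.

class ComponentCrash(Exception):
    pass

class PortMeta:
    def __init__(self):
        self.closed = False
        self.last_used_in_round = None  # sync round of last put/get

def use_port(port, current_round):
    if port.closed:
        raise ComponentCrash("put/get on a closed port")
    port.last_used_in_round = current_round

def transfer_port(port, current_round):
    # Sending a closed port is fine: the adopting component simply
    # learns that the port is closed. Sending a *used* port is not.
    if port.last_used_in_round == current_round:
        raise ComponentCrash("port already used in this sync round")
    return port

p = PortMeta()
use_port(p, current_round=3)
transfer_port(p, current_round=4)   # ok: last used in an earlier round
try:
    use_port(p, current_round=4)
    transfer_port(p, current_round=4)
except ComponentCrash as e:
    print("crash:", e)
```

The single `last_used_in_round` field is enough because the rule only compares against the current round; no history of earlier rounds is needed.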
-As for transferring ports to a newly created component, we want to ensure that all of the messages intended for that port arrive in the correct order at the adopting component. +Like before we want to ensure that all messages intended for the transferred port arrive in the correct order at the adopting component. -That said: the control protocol for transmitting ports proceeds as following: +And so the control protocol for transmitting ports proceeds as follows: 1. The relinquishing component will first make sure that none of the ports are blocked. If the ports are blocked then it will sleep until the ports become unblocked. As stated above the relinquishing component will also make sure that the ports were not previously used within the synchronous round. 2. The relinquishing component will send `PortPeerChanged_Block` messages to all of the peers of the ports that will be transferred. However, in this case it will not relay any messages to the new component; they will still pile up in the relinquishing component's inbox. -3. The peers, upon receiving the `PortPeerChanged_Block` message, will proceed as they would in the case where ports werer transferred to a new component: they'll block the port and send an `Ack`. +3. The peers, upon receiving the `PortPeerChanged_Block` message, will proceed as they would in the case where ports were transferred to a new component: they'll block the port and send an `Ack`. 4. The relinquishing component will wait until all of the expected `Ack` messages are received. Once they are received, the component will wait until the port the message will travel through becomes unblocked (that is: the port that is used to transfer the ports to the adopting component). 5. The relinquishing component will send the data message containing the transferred ports to the adopting component. It will annotate this message with a list containing `(transferred port ID, peer component ID, peer port ID)` triples.
Note that since those peer ports are blocked, they will not be transferred in the meantime. This is essential for the next step. 6. The adopting component will receive the annotated data message containing the transferred ports. For each transferred port it will decide upon a new port ID. @@ -222,9 +224,9 @@ We'll first consider that a component may crash inside or outside of a synchron Before discussing these cases, it is important to remember that the entire runtime has components running in their own thread of execution. It may be that the crashing component is unaware of its peers (due to the fact that peer ports might change ownership at any point in time). We'll discuss the consensus algorithm in more detail later within the documentation. For now it is important to note that the components will discover the synchronous region they are part of while the PDL code is executing. So if a component crashes within a synchronous region before the end of the sync block is reached, it may be possible that it will not discover the full synchronous region it would be part of. -Because the crashing component is potentially unaware of the component IDs it will end up notifying that is has failed. We can not design the crash-handling algorithm in a way such that the crashing component notifies the peers of when they have to crash. We'll do the opposite: the crashing component simply crashes and somehow attempts to notify the peers. +Because the crashing component is potentially unaware of the component IDs it will end up notifying that it has failed, we cannot design the crash-handling algorithm in such a way that the crashing component notifies the peers of when they have to crash. We'll do the opposite: the crashing component simply crashes and somehow attempts to notify the peers.
Those peers themselves decide whether they have to crash in response to such a notification. -For this reason, it does not make a lot of sense to deal with component failure through the consensus algorithm. Dealing with the failure through the consensus algorithm makes sense if we can find the synchronous region that we would discovery if we were able to fully execute the sync block of each participating component. And so we'll opt to deal with failure on a peer-by-peer basis. +For this reason, it does not make a lot of sense to deal with component failure through the consensus algorithm. Dealing with the failure through the consensus algorithm only makes sense if we can find the synchronous region that we would have discovered if we were able to fully execute the sync block of each participating component. As explained above: we can't, and so we'll opt to deal with failure on a peer-by-peer basis. We'll go back to the four cases we've discussed above. We'll change our point of view: we're now considering a component (the "handling component") that has to deal with the failure of a peer (the "crashing component"). We'll introduce a small part of our solution a priori: like a component shutting down, a failing component will simply end its life by broadcasting a `ClosePort` message over all of its owned ports that are not closed (and the failing component will also wait until all of those ports are not blocked). @@ -232,12 +234,22 @@ In the first case, we're dealing with a failing component while the handling com In the second case we have that the peer component died before we ourselves have entered the synchronous block. This case is somewhat equivalent to the case we described above. The crashing component cannot have sent the handling component any messages. So we mark the port as closed, potentially failing in the future if it ends up being used.
-Next up is the third case, where both the crashing component and the handling component were both in the same synchronous round. Like before we mark the port as closed and future use will cause a crash. The difference is that the handling component may be blocked on attempting to `get` from a port which the crashing component now indicates is closed, perhaps we might have already performed successful `get` operations. In that case the handling component should crash: the crashing component can never submit its local solution, so the synchronous round can never succeed! And so we need to have metadata stored for the component that tracks if the port was used in the synchronous round. If the component is used in the synchronous round and a `ClosePort` message comes in, then the component should crash as well. +Next up is the third case, where the crashing component and the handling component were both in the same synchronous round. Like before we mark the port as closed and future use will cause a crash. The difference is that the handling component may be blocked on attempting to `get` from a port which the crashing component now indicates is closed; perhaps we might have already performed successful `get`/`put` operations. In that case the handling component should crash: the crashing component can never submit its local solution, so the synchronous round can never succeed! And so we need to have metadata stored for the component that tracks if the port was used in the synchronous round. If the port was used in the synchronous round and a `ClosePort` message comes in, then the component should crash as well. The fourth case is where the failing component crashes *after* the handling component finished its sync round. This is an edge case dealing with the following situation: both the handling and the crashing component have submitted their local solution to the consensus algorithm.
The crashing component receives a global solution, finishes the sync round, and then crashes, thereby sending the `ClosePort` message to the handling component. The handling component, due to the asynchronous nature of the runtime, receives the `ClosePort` message before the global solution has a chance to reach the handling component. In this case, however, the handling component should be able to finish the synchronous round, and it shouldn't crash. Here is where we arrive at the protocol for dealing with component crashes. To let the handling component deal with this last case, we'll let the crashing component send the `ClosePort` messages together with a boolean indicating if it crashed inside or outside of a synchronous block. -If the crashing component sends `ClosePort` together with the indication that it crashed inside of the synchronous block, then we're dealing with case 3 *if* there are messages in the inbox, or if the handling component uses the closed port after receiving the `ClosePort` message. But if the crashing component sends `ClosePort` together with the indication that it crashed outside of a synchronous block, then we check if we have performed any operations on the synchronous port. If we have, then we crash. If we haven't, then the handling component must be dealing with a synchronous round that succeeds in the future. +If the crashing component sends `ClosePort` together with the indication that it crashed inside of the synchronous block, then we're dealing with case 3 *if* there are messages in the inbox, or if the handling component uses the closed port after receiving the `ClosePort` message. But if the crashing component sends `ClosePort` together with the indication that it crashed outside of a synchronous block, then we check if we have performed any operations on the port in the synchronous round. If we have performed operations *and* have received messages from that component, then apparently the synchronous round will succeed.
So we will not immediately crash. Otherwise we will. -In this way we modify the control algorithm for terminating components. Now we're able to deal correctly with crashing components. \ No newline at end of file +In this way we modify the control algorithm for terminating components. Now we're able to deal correctly with crashing components. + +## Sync Algorithm + +A description of the synchronous algorithm is present in different documents. We do mention here that it is central to the consensus algorithm that two components agree on the interactions that took place over a specific channel. In order for this to happen we'll send along a lot of metadata when trying to reach consensus, but here we're just concerned with attempting to match up the two ends of a channel. + +A port is identified by a `(component ID, port ID)` pair, and a channel is a pair of those identifying pairs. So to match up the two ends of a channel we would have to find a consistent pair of ports that agree on who their peers are. However, we're dealing with the problem of eventual consistency: `put`ting ports never know who their peer is, because the sent message might be relayed. On the other hand, `get`ting ports *will* know who their peer is for the duration of a single synchronous round once they've received a single message. + +This is the trick we will apply in the consensus algorithm. If a channel did not see any messages passing through it, then the components that own those ports will not have to reach consensus because they will not be part of the same synchronous region. However, if a message did go through the channel then the components join the same synchronous region, and they'll have to form some sort of consensus on what interaction took place on that channel. + +And so the `put`ting component will only submit its own `(component ID, port ID, metadata_for_sync_round)` triplet.
The `get`ting port will submit information containing `(self component ID, self port ID, peer component ID, peer port ID, metadata_for_sync_round)`. The consensus algorithm can now figure out which two ports belong to the same channel. \ No newline at end of file diff --git a/docs/runtime/02_consensus_1.md b/docs/runtime/02_consensus_1.md index cd652e9a0088b178c80d30c2f237ae44c5aa9a4d..ca914cd0d0ecd51e49fe61ffaffed49622636ac1 100644 --- a/docs/runtime/02_consensus_1.md +++ b/docs/runtime/02_consensus_1.md @@ -41,7 +41,7 @@ As the component enters a `sync` block, it has only one possible execution. But +--> ... ``` -This corresponds to the following PDL code: +This tree was formed by executing the following PDL code: ``` primitive some_component(out tx, in rx) { @@ -54,7 +54,7 @@ primitive some_component(out tx, in rx) { } ``` -We can see the reason for calling the execution tree a "tree". There are several things to note about the execution tree: Firstly that some executions have been completed and form a complete trace, that is: starting from the "sync start" a complete trace may be represented by the line running to the "sync end". Conversely, there is one trace that is incomplete: there is a trace waiting at the `get` for a message. We'll call a place where the execution splits into multiple branches/executions a "branching point". +We can see the reason for calling the execution tree a "tree". There are several things to note about the execution tree: Firstly that some executions have been completed and form a complete trace, that is: starting from the "sync start" a complete trace may be represented by the path running to the "sync end". Conversely, there is one trace that is incomplete: there is a trace waiting at the `get` for a message. We'll call a place where the execution splits into multiple branches/executions a "branching point". Note that the branching points can in the *general case* only be discovered at runtime.
Any code may have control flow points like `if` statements, or `while` loops. Consider the following code: @@ -110,18 +110,18 @@ If the sender is connected to the receiver, then the sender will send anywhere b For this reason a `get` branching point needs to be kept around for the entire duration of the sync block. The runtime will always need to have a copy of the component's memory and execution state the moment it encountered a `get` instruction, because it might just be that another component (in perhaps a new fork, which we cannot predict) will send it another message, such that it needs to produce a new forked execution. -A `get` operation is also a "blocking operation": in the *general case* the component needs to know the value produced by the `get` operation in order to continue its execution (perhaps more specifically: the first time a `read` operation is performed on the variable that will store the transmitted message). Consider the simple case where the received message contains a boolean that is used in the test expression of an `if` statement: we'll need to have actually received that boolean before we can decide which control flow path to take. Speculating on the contents of messages is too computationally expensive to be taken seriously. A put operation is not a blocking operation: the message is sent and the component continues executing its code. +A `get` operation is also a "blocking operation": in the *general case* the component needs to know the value produced by the `get` operation in order to continue its execution (perhaps more specifically: the first time a read operation is performed on the variable that will store the transmitted message). Consider the simple case where the received message contains a boolean that is used in the test expression of an `if` statement: we'll need to have actually received that boolean before we can decide which control flow path to take. 
Speculating on the contents of messages is too computationally expensive to be taken seriously. A `put` operation is not a blocking operation: the message is sent and the component continues executing its code. -We've touched upon control flow points multiple times. We'll touch upon some aspects of control flow here, to more easily introduce the algorithm for finding consensus later. A component is fully described by its memory (i.e. all of the memory locations it has access to through its variables) and execution state (i.e. its current position in its code). So once a component encounters a control flow point, it can only take one control flow path. The calling of certain impure functions (e.g. retrieving a cryptographically secure random number) does not change this fact. Note that receiving values from other components might change a component's memory state, hence influence the control flow path it takes in the subsequent forked execution. Conversely, a component sending a value might influence another component's memory state. +We've touched upon control flow points multiple times. We'll cover some aspects of control flow here, to more easily introduce the algorithm for finding consensus later. A component is fully described by its memory (i.e. all of the memory locations it has access to through its variables) and execution state (i.e. its current position in its code). So once a component encounters a control flow point, it can only take one control flow path, which may be computed from its memory state. The calling of certain impure functions (e.g. retrieving a cryptographically secure random number) does not change this fact. Note that receiving values from other components might change a component's memory state, hence influence the control flow path it takes in the subsequent forked execution. Conversely, a component sending a value might influence another component's memory state.
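The boolean example above can be sketched as a minimal PDL component (a hypothetical sketch modeled on the `some_component` example elsewhere in these documents; the exact comment and declaration syntax is assumed):

```
primitive branch_on_message(in rx, out tx) {
    sync {
        // Blocking: the component cannot pick a control flow path
        // until the message carrying the boolean has arrived.
        auto flag = get(rx);
        if (flag) {
            put(tx, 1); // only reached in executions that received `true`
        } else {
            put(tx, 2);
        }
    }
}
```

Until `get(rx)` completes, neither branch can be entered, which is exactly why `get` must block while `put` need not.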
So before treading into more detail, here we've found that in the general case: -- A interior of a sync block demarks the place where speculative execution may occur. +- Speculative execution may only occur inside sync blocks. - Speculative execution implies that we end up with an execution tree. -- A path through the execution tree that reaches the end of the sync block is called a trace, and represents a valid execution of the sync block for the component (but perhaps not for a peer it interacted with). +- A path through the execution tree that reaches the end of the sync block is called a complete trace, and represents a valid execution of the sync block for the component (but perhaps not compatible with a particular complete trace of a peer it interacted with). - The set of traces produced by a component in its sync block can practically only be discovered at runtime. -- A `get` operation is necessarily a blocking operation that always incurs a branching point. A `put` operation is a nonblocking operation that does not incur a branching point. -- The trace of a component is influenced by the messages it has received. +- A `get` operation is necessarily a blocking operation that always incurs a branching point. A `put` operation is a nonblocking operation that will not fork into multiple executions. +- The control flow path of a trace of a component may be influenced by the messages it has received. ## Towards a Consensus Algorithm @@ -134,14 +134,14 @@ The key to the consensus problem is somehow discovering the ways in which the co We'll not consider the last point, as this is essentially a gossip protocol, and the appropriate gossip protocol varies with the requirements of the user (e.g. robust to failure, memory efficient, runtime efficiency, message complexity). We define some terms to make the following discussion easier: -- "component graph": A graph where each node is a component, and each channel between components forms the edges to the graph. 
+- "component graph": A graph where each node is a component, and each channel between components forms an edge in that graph. - "sync region": The group of components that have interacted with one another and should agree on the global consensus solution. - "local solution": A complete trace of a component. For the component this is a valid local solution, but might not be part of a global solution. - "global solution": A set of traces, one for each of the components in the sync region, that all agree on the interactions that took place between the components in the sync region. We'll now work incrementally to the final consensus algorithm, making it a bit of a story in order to explain the reasoning and intuition behind the consensus algorithm. -Suppose a component can somehow predict exactly which messages we're going to receive during the execution of its code, we'll assume that each received message has the appropriate `get` call associated with it. In this case we're able to produce the set of complete traces that a component produces by symbolically executing its code: we start out with the initial memory state, might perhaps do some explicit `fork`ing, know exactly which messages we receive and how they influence the control flow, and arrive at the end of the sync block. +Suppose a component can somehow predict exactly which messages it is going to receive during the execution of its code; we'll assume that each received message has the appropriate `get` call associated with it. In this case we're able to produce the set of complete traces that a component produces by symbolically executing its code: we start out with the initial memory state, might perhaps do some explicit `fork`ing, know exactly which messages we receive and how they influence the control flow, and arrive at the end of the sync block.
Hence each component can figure out independently which complete trace is the solution to its consensus problem. However, as we've outlined above, we cannot know exactly which messages we're going to receive. We'll have to discover these messages while executing a component. The next best thing is to keep track of the values of the messages that we've received in a complete trace. Once we have complete traces for all of the interacting components, we can check that the received value corresponds to a sent value. e.g. @@ -222,7 +222,7 @@ We're already calling this information the "port mapping" (because we'll truly t This changes the way we can interpret the execution tree: each node is not only associated with the performed operation (`fork`, `put` or `get`), but also associated with a particular port mapping that indicates the influence of other components that allowed it to reach that execution node. We modify the port mapping per node in the following way: -- For a `fork`: we fork the execution as many times as needed, and for those forks we copy the port mapping of the part node. +- For a `fork`: we fork the execution as many times as needed, and for those forks we copy the port mapping of the ancestor node in the execution tree. - For a `put`: we transmit the current port mapping, and the transmitted value in the message. We then update the mapping from the sending port: that particular port ID now maps to the recently transmitted value. - For a `get`: we receive the transmitted port mapping and value. Note that this `get` might be a particular statement executed by multiple different forked executions (each with their own port mapping). And so for a `get` to succeed, we need the shared channels between the sender and the receiver to agree on their port mapping. If such a `get` succeeds, then it forks into a new execution where the receiving port now maps to the received value.
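The three rules above can be sketched as a small Python model (the function names and data layout are invented for this sketch, not the runtime's actual data structures; a mapping is modeled as a dictionary from port ID to the last value transmitted over that port):

```python
# Illustrative model of the per-node port-mapping rules; all names are
# hypothetical, not the runtime's API.

def on_fork(parent_mapping, num_forks):
    # Rule 1: every fork starts from a copy of the ancestor node's mapping.
    return [dict(parent_mapping) for _ in range(num_forks)]

def on_put(mapping, port_id, value):
    # Rule 2: transmit (mapping, value), then record the transmitted
    # value under the sending port's ID.
    message = (dict(mapping), value)
    mapping[port_id] = value
    return message

def on_get(mapping, port_id, message):
    # Rule 3: accept the message only if sender and receiver agree on the
    # mappings of the ports they share; on success, fork a new execution
    # in which the receiving port maps to the received value.
    sent_mapping, value = message
    shared = mapping.keys() & sent_mapping.keys()
    if any(mapping[p] != sent_mapping[p] for p in shared):
        return None  # incompatible mapping: this execution keeps waiting
    forked_mapping = dict(mapping)
    forked_mapping[port_id] = value
    return forked_mapping
```

For example, a value `put` on one end of a channel and `get` on the other leaves both sides with a mapping that records that single interaction, which is what the consensus check later compares.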
@@ -256,7 +256,7 @@ primitive responder(in rx_a, in rx_b, out tx_c) { } ``` -Here, once the components have completed as much fork executions as possible, we'll have the following execution trees (and mappings). The square bracketed terms denote port mapping. The parenthesized terms correspond to the operations in the code, and the curly bracketed terms are the names for the traces (so we can refer to them in this document). +Here, once the components have brought as many forked executions to completion as possible, we'll have the following execution trees (and mappings). The square bracketed terms denote port mapping. The parenthesized terms correspond to the operations in the code, and the curly bracketed terms are the names for the traces (so we can refer to them in this document). ``` For the initiator: @@ -331,10 +331,10 @@ We'll now apply the last bit of trickery to this algorithm. Firstly keeping trac The reason this is a valid trick is because of the arguments made earlier regarding control flow being influenced by received messages. If we know how a component was influenced by external influences, then the control flow path it takes is deterministic, hence the content of the sent messages will be deterministic. Locally a component `A` may only describe the way it was influenced by its peer `B`, but `B` also records how it was influenced by its peers `C` and `D`. So transitively `A` will also know the indirect mutual influences between it and `C` and `D`. -Lastly, we can turn the list of `(port ID, branch number)` pairs into a true mapping `{port ID -> branch number}`, we do not actually need to keep the entire history around. The reason behind this is the fact that the `get` operation is blocking and requires the sent port mapping to be compatible with its execution node's port mapping.
Consider the following cases: +Lastly, we can turn the list of `(port ID, branch number)` pairs into a true mapping `{port ID -> branch number}`; we do not actually need to keep the entire history around. The reason behind this is the fact that the `get` operation is blocking and requires the sent port mapping to be compatible with its execution node's port mapping. So when a `get` operation succeeds, it agrees that the executions of both sender and receiver are compatible up until that point. Continued execution only has to check that the subsequent interactions are compatible. Consider the following cases: - A `get` operation never receives a message: in that case we keep blocking indefinitely, we'll never get a complete trace, hence are prevented from finding a local solution. -- A `get` operation receives a message containing an incompatible port mapping: in that case we have the same case as above. The interpretation is different: the sender and the receiver did not agree on the control flow paths they took. +- A `get` operation receives a message containing an incompatible port mapping: in that case we have roughly the same case as above. The message is not accepted and we keep blocking indefinitely. The interpretation is different: the sender and the receiver did not agree on the control flow paths they took. - A `get` operation receives a message containing a compatible port mapping: in this case both components agree on the order of operations that took place for the message to be transferred. - A `put` operation is never received: again, we're fine. The `put`ting component might submit a local solution for a complete trace. But the mapping for that never-received message will also never appear in the supposed receiver's port mapping. Hence the two components will never agree on the control flow paths they took.
The fact that a `get` operation never knows when it will receive a new message requires us to keep a complete copy of the component's memory and execution state at that point. Hence for each `get` operation we're incurring a rather large memory overhead. 2. The fact that we never know if a received message can be discarded because it cannot be received by any of the `get` operations in the component's code. There may be another message coming in that causes a fork with a `get` operation that *can* receive this message. Hence we need to keep around *all* of the messages received in a synchronous round. -3. The incredible computational complexity of finding a global solution to the consensus algorithm. We need to check for each component all of its completed traces. For all of those `N` components, each supplying `T` traces (to simplify the math), we need to check each pair of traces. So the computational complexity is `(N*T)*(N*(T-1))/2`. +3. The incredible computational complexity of finding a global solution to the consensus algorithm. We need to check for each component all of its completed traces. For all of those `N` components, each supplying `T` traces (to simplify the math), we need to check each pair of traces from different components. So the maximum computational complexity is `(N*T)*((N-1)*T)/2`. In reality this is a bit less, because we can very likely quickly eliminate certain traces. 4. Considering the previous points: the ease (especially when running over the internet) with which a nefarious actor can incur computational overhead for the receiver. All it has to do is to keep sending messages to the receiver with an acceptable port mapping, but to never offer the consensus algorithm a valid local solution. Each accepted message will spawn a new fork at the receiver.
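The pair count in point 3 can be checked with a brute-force enumeration (a pure illustration; `N` and `T` are made-up numbers, and only traces of *different* components are paired, since a single component's own traces are mutually exclusive alternatives):

```python
# Count the trace pairs a brute-force global-solution search must compare:
# N components, each contributing T complete traces (hypothetical numbers).
N, T = 3, 4
traces = [(comp, trace) for comp in range(N) for trace in range(T)]

pairs = [
    (a, b)
    for i, a in enumerate(traces)
    for b in traces[i + 1:]
    if a[0] != b[0]  # only pair traces owned by different components
]

# Each of the N*T traces can pair with the (N-1)*T traces of the other
# components; halve the product to avoid double counting.
assert len(pairs) == (N * T) * ((N - 1) * T) // 2
```

For `N = 3` and `T = 4` this yields 48 pairwise checks, and the count grows quadratically in both `N` and `T`.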
\ No newline at end of file diff --git a/docs/runtime/03_consensus_2.md b/docs/runtime/03_consensus_2.md index 33509ae5adb683b2d35dbaa38bb7aa112757a59b..372b75601bf60763288636c557cdef7057558a59 100644 --- a/docs/runtime/03_consensus_2.md +++ b/docs/runtime/03_consensus_2.md @@ -136,7 +136,7 @@ And so the rule for joining a synchronous region was changed. Now it is simply: The example above hints at a second problem (figured out a long time ago by our unix overlords): there is no way within PDL code to say that we can perform a variety of behaviours triggered by the arrival of messages. For our `database` component above, we see that it can have multiple requesters connected to it, but there is no way to indicate that we can have valid behaviour for any one of them. So we introduce the `select` statement. It is a statement with a set of arms. Each arm has a guard, where we indicate which ports need to have incoming messages, and a block of code, that is executed when all of the indicated ports actually have received a message. -The select statement may only appear within async block. The arm's guard is formulated in terms of an expression or a variable declaration (that may contain `get` calls, but not `put` calls). The corresponding block of code is executed when all of the `get` calls in the guard have a message ready for them. In case multiple arm guards are satisfied then a random arms is chosen by the runtime for execution. +The select statement may only appear within a sync block. The arm's guard is formulated in terms of an expression or a variable declaration (that may contain `get` calls, but not `put` calls). The corresponding block of code is executed when all of the `get` calls in the guard have a message ready for them. In case multiple arm guards are satisfied, a random arm is chosen by the runtime for execution.
So the select statement takes the form: @@ -164,6 +164,7 @@ while (still_waiting_for_a_message) { break; } + block_until_there_is_a_new_message(); } ``` @@ -177,4 +178,4 @@ Because the current consensus algorithm is a scaled down version of the previous Once a component has reached the end of the sync block it will submit its local solution (i.e. the port mapping) for validation by the consensus algorithm. If all components in the synchronous region have submitted a local solution, whose port mappings are pairwise consistent with one another, then we've found a global solution. With this global solution the components in the synchronous region are ordered to continue the execution of their PDL code. -As a small side-note: with speculative execution the consensus algorithm amounted to finding, for each component, a complete trace whose interactions are consistent with all of its peers. Without speculative execution we have the multi-party equivalent to an `Ack` message in the TCP protocol. (Yes, a lot of details of the TCP protocol are left out, but) in TCP both parties perform a best-effort attempt to ensure that a sent message has arrived at the receiver by the receiver `Ack`nowledging that the message has been received. In this cases the consensus algorithm performs this function: it ensures that the sent messages have all arrived at their peer. \ No newline at end of file +As a small side-note: with speculative execution the consensus algorithm amounted to finding, for each component, a complete trace whose interactions are consistent with all of its peers. Without speculative execution we have the multi-party equivalent to an `Ack` message in the TCP protocol. (Yes, this statement skips over a *lot* of details of the TCP protocol, but) in TCP both parties perform a best-effort attempt to ensure that a sent message has arrived at the receiver by the receiver `Ack`nowledging that the message has been received. 
In this case the consensus algorithm performs this function: it ensures that the sent messages have all arrived at their peers in a single multi-party interaction. \ No newline at end of file diff --git a/docs/runtime/04_known_issues.md b/docs/runtime/04_known_issues.md index ef93ebc0f6feb444770e89d6044cee4e4594f4dc..2fa7b4437d7e3ce4074a471796d081418566c744 100644 --- a/docs/runtime/04_known_issues.md +++ b/docs/runtime/04_known_issues.md @@ -30,12 +30,16 @@ The current implementation of Reowolf has the following known issues: - The TCP listener component should probably do a `shutdown` before a `close` on the socket handle. It should also set the `SO_REUSEADDR` option. -- The way in which putting ports are ordered to block if the corresponding getter port's main inbox is full is rather silly. This lead to the introduction of the "backup inbox" as it is found in the runtime's code. There is a design decision to make here, but the current implementation is a bit silly. There are two options: (a) have an atomic boolean indicating if the message slot for an inbox is full, or (b) do away with the "main inbox" alltogether, and have an unbounded message queue. +- The way in which putting ports are ordered to block if the corresponding getter port's main inbox is full is rather silly. This led to the introduction of the "backup inbox" as it is found in the runtime's code. There is a design decision to make here, but the current implementation is a bit silly. There are two options: (a) have an atomic boolean indicating if the message slot for an inbox is full, or (b) do away with the "main inbox" altogether, and have an unbounded message queue. - For practical use in components whose code supports an arbitrary number of peers (i.e. their code contains an array of ports that is used for communication and changes in size during the component's lifetime), the `select` statement somehow needs to support waiting on any one of those ports.
- The compiler currently prevents one from using `sync` blocks (or corresponding `get` and/or `put` operations) in functions. They can only be used within components. When writing large programs this makes it rather hard to re-use code: all code that interacts with other components can only be written within a sync block. I would advise creating `sync func`, `nonsync func` and regular `func`tions. Where: - `sync func`tions can only be called from within `sync` functions. They may not open new sync blocks, but may perform calls to `get`/`put`. These are useful to encapsulate sequences of `put`/`get` calls together with some common message-modifying code. - - `nonsync func`tions (or `async func`tions) may only be called outside of sync blocks, and may open new sync blocks themselves. They are useful to encapsulate a single interaction with other components. - - regular `func`tions. Are as useful as in any other language, but here we disallow calling `nonsync func`tions or `sync func`tions. \ No newline at end of file + - `nonsync func`tions (or `async func`tions) may only be called outside of sync blocks, and may open new sync blocks themselves. They are useful to encapsulate a single interaction with other components. One may also create new components here. + - regular `func`tions, which are as useful as in any other language, but here we disallow calling `nonsync func`tions or `sync func`tions. + +- The `Ack` messages that are sent in response to `PortPeerChanged_Block` messages should contain the sending component's `(component ID, port ID)` pair in case the `PortPeerChanged_Block` message is relayed. When such an `Ack` message is received, the peer of the port must be updated before transferring the port to the new owner. + +- The compiler currently accepts a select arm's guard that is formulated as `auto a = get(get(rx))`. This should be disallowed.
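The proposed split of function kinds could look roughly as follows in PDL. Note that this is a hypothetical sketch of *proposed* syntax (none of these keywords are implemented) with invented function names:

```
// Hypothetical syntax for the `sync func` / `nonsync func` proposal above.
sync func u32 request_reply(out tx, in rx, u32 query) {
    // Allowed: get/put calls, but no new sync block may be opened.
    put(tx, query);
    return get(rx);
}

nonsync func u32 single_interaction(out tx, in rx) {
    // Allowed: opening sync blocks (and creating components), but this
    // function may only be called outside of a sync block.
    u32 result = 0;
    sync {
        result = request_reply(tx, rx, 42);
    }
    return result;
}
```

Regular `func`tions would then be callable from anywhere, but could not call either of the two kinds above.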