diff --git a/docs/spec/overview.md b/docs/spec/overview.md index 4c9034d11097832bce1c2f1f26e541482b16f608..0fec260e9b6d187ba790b8dcfeaf83dde4d7d07b 100644 --- a/docs/spec/overview.md +++ b/docs/spec/overview.md @@ -11,4 +11,5 @@ Please view the following documents (in roughly this order): 1. tokens.md 2. definitions.md 3. statements.md -4. expressions.md \ No newline at end of file +4. expressions.md +5. validation.md \ No newline at end of file diff --git a/docs/spec/validation.md b/docs/spec/validation.md index 4fec475c04f86fd6bc652cbf9ec0e882de712c9a..a692b0a3de06993395e6c41bc5332094cd78bf76 100644 --- a/docs/spec/validation.md +++ b/docs/spec/validation.md @@ -4,15 +4,15 @@ So far we've defined the language in terms of its grammar. Especially for expres ## Scoping and Naming Rules -Without any further constraints one may define variables or types with the same name. So we introduce rules for the various identifiers that are found within a program. +Without us specifying any further constraints one may define variables or types with the same name. So we introduce rules for the various identifiers that are found within a program. Within the root of a module we are at the module scope. It's only parent is the global scope (whose contents we'll introduce later). The identifiers of type and procedure declarations within the module may not conflict with one another, or with any of the identifiers in the global scope. -Secondly, imports introduce identifiers into the module scope. Identifiers of procedures and types imported into a module may not conflict with one another, the global scope (which is implicitly already satisfied, unless the import becomes aliased), or the types defined within a module. In case an import produces an alias for a particular type or procedure, then that is the identifier that may not conflict. +Secondly, imports introduce identifiers into the module scope. Identifiers of procedures and types imported into a module may not conflict with one another, the global scope (which is implicitly already satisfied, unless the import becomes aliased), or the types defined within a module. The identifier we insert into the module scope depends on the type of import: if the import is aliased then we insert the alias, otherwise we insert the name of the imported type/procedure or last section of the module's name. Whenever we're parsing a procedure (function or component), we enter a new scope whose parent is the module scope. Block statements implicitly introduce a new child scope. Any defined variable's identifier will be added to the scope it is in, and may not conflict with another identifier in the same scope or parent scopes. -**note**: Look up if I made mistakes with labels. +The same rules apply to labeled statements. Albeit in a simpler form, as labeled statements can only appear in the bodies of procedures. Defining labels will add the label to the current scope. These labels may not conflict with labels in the same scope or labels in parent scopes. Various other groups of identifiers in the program may not conflict among themselves. These are: @@ -28,7 +28,23 @@ Types and procedures may be defined using polymorphic variables. Whenever we use 1. Specifying all of the polymorphic arguments. Note that one may still allow for type inference by using the `auto` type in the place of the polymorphic argument. 2. Specifying none of the polymorphic arguments, in this the compiler must assume that all of the polymorphic arguments are to be inferred (i.e. as if the programmer had written `auto` for each of the polymorphic arguments). -Any other case should be a compile-time error. +Any other case should be a compile-time error. Some examples: + +``` +// With a single argument +func maximum(T a, T b) -> T { + if (a > b) return a; + else return b; +} + +// Used as +u32 zero = 0; +maximum(zero, 5); // fine, no argument is specified, so `T` will be inferred. +maximum(zero, 5); // fine, practically the same as above +maximum(zero, 5); // fine, T is explicitly `u32` + +maximum(zero, 5); // error: mismatch in arguments specified and polymorphic variables on definition. +``` **note**: At the moment polymorphism in the language is rather underpowered until a reasonable and simple scheme for constraining them can be found. At the moment one can only specify one of the builtin types or struct/enum/union types of polymorphic arguments. Furthermore functions and procedures defined with polymorphic variables may not construct literals by using them. @@ -36,7 +52,7 @@ Any other case should be a compile-time error. ### Assignment -Assignment expressions may only be placed at the statement level. That is to say: viewing a series of expressions as a tree, where each node has a parent expression or a parent statement (e.g. the test expression of an if-statement), then assignment expressions may only have the expression-statement as a parent. +Assignment expressions may only be placed at the statement level. That is to say: viewing a complicated expressions as a tree, where each node has a parent expression or a parent statement (e.g. the test expression of an if-statement), then assignment expressions may only have the expression-statement as a parent. The left hand side of an assignment expression should be assignable. An expression is assignable if it is: @@ -47,21 +63,92 @@ The left hand side of an assignment expression should be assignable. An expressi ### Literals -An enum literal is valid if its `TypeRef` points to an enum declaration (either directly, or through a potentially aliased import) and the subsequent identifier is a variant declared in the type definition. +An enum literal is valid if its `TypeRef` points to an enum declaration (either directly, or through a potentially aliased import) and the subsequent identifier is a variant declared in the type definition. Some examples: + +``` +enum Answer { Yes, No }; + +// Used as: +auto a = Answer::Yes; // Valid +auto b = Answer::No; // Valid +Answer c = Answer::Yes; // Valid + +auto d = Answer; // Invalid: points to the type instead of a variant +auto e = Answer::DoesNotExist; // Invalid: no such variant defined on `Answer` +``` A union literal is valid if its `TypeRef` points (indirectly) to a union declaration, the subsequent identifier is a variant in the type definition, and if (where applicable) the number of embedded expressions matches the number of embedded types in the variant's definitions. +``` +union Value { Empty, Unsigned(u64), Signed(s64) }; + +// Used as: +auto a = Value::Empty; // Valid +auto b = Value::Unsigned(5); // Valid +Value c = Value::Signed(-123); // Valid + +auto d = Value; // Invalid: points to type instead of variant +auto e = Value::Empty(5); // Invalid: `Empty` has no embedded types +auto f = Value::Unsigned; // Invalid: Variant `Unsigned` expected 1 embedded value, none are provided +auto g = Value::Signed(-1, -5); // Invalid: Variant `Signed` expected 1 embedde value, 2 are provided +``` + A struct literal is valid if its `TypeRef` points (indirectly) to a struct declaration, and it provides an expression to initialize all of the struct's fields, and only initializes each field once. +``` +struct Pair { u32 left, u32 right } + +// Used as: +auto a = Pair{ left: 0, right: 1 }; // Valid +auto b = Pair{ right: 0, left: 1 }; // Valid + +auto c = Pair; // Invalid: Points to type instead of constructing literal +auto d = Pair{}; // Invalid: Not all fields specified +auto e = Pair{ left: 0 }; // Invalid: Not all fields specified +auto f = Pair{ left: 0, center: 5, right: 2 }; // Invalid: field `center` does not exist +auto g = Pair{ left: 0, left: 0, right: 2 }; // Invalid: field `left` specified more than once +``` + For all of these literals the rules for polymorphic arguments apply: if the type specifies polymorphic arguments, then all of them have to be specified or none of them have to be specified (for implicit type inference). ### Binding -A binding expression itself is only valid if it is nested in the expression tree under no expressions, or only binary expressions. The chain of parents must terminate at an if-statement or while-statements test expression. As a result, binding expressions may not be nested in one another. +Again viewing a complex expression as a tree: A binding expression is only valid if it is nested in the expression tree under a statement, or under a chain of parents that are all binary expressions of the logical-and variant. The chain of parents must terminate at an if-statement's or while-statement's test expression. As a result of these rules, binding expressions may not be nested in one another. A binding expression's left hand side is valid if it is a binding variable, or if it is a literal that may contain a binding variable. A binding variable is a variable that has not yet been declared using a `StmtLocalMem`. Such a variable will then be implicitly be declared at the binding location, and usable within the scope of the if-statement's then-block, or the while-statement's loop body. The same scoping rules apply to that scope: so binding variables may not be declared twice, nor may conflict to any variables declared in the scope or it's parents. -Furthermore, the parent of a binding variable may only be some kind of literal expression, or the left hand side of the binding expression itself. +The interpretation of the binding expression is that it tests for equality between the left hand side and the right hand side. Except for those places where a binding variable is used, here no equality test is applied. Rather, the value from the right hand side is assigned to the binding variable on the left hand side. Hence if you use an expression containing e.g. regular variable inside a binding expression, then no binding will occur. Instead, the resulting value from that regular variable expression will be checked for equality against whatever is on the right hand side of the binding expression. + +The parent of a binding variable may only be some kind of literal expression, or the left hand side of the binding expression itself. Some examples: + +``` +// Example only for a struct, but binding expressions can be applied to all +// kinds of literals (although, for some types, the binding then cannot contain +// any binding variables, and resolves to a test of equality between the left +// hand side and the right hand side). +struct Inside { u32 value }; +struct Outside { u32 left, Inside right }; + +// Used as: +auto v_inside = Inside{ value: 0 }; +auto v_outside = Outside{ left: 0, right: Inside{ value: 1 } }; + +// All valid: +if (let foo = v_outside && let bar = v_inside) { /* ... */ } +if (let foo = v_outside && (5 + 2 == 7) { /* ... */ } +while (let Inside{ value: foo } = v_inside) { /* ... */ } +if (let Outside{ left: 0, right: Inside{ value: foo } } = v_outside) { /* ... */ } +while (let Outside{ left: foo, right: Inside { value: bar }} = v_outside) { /* ... */ } +if (let Outside{ left: foo, right: bar } = v_outside) { /* ... */ } + +// Invalid: parent expression that is not `&&` +if (let foo = v_outside || (5 + 2 == 7)) { /* ... */ } +// Invalid: not nested under an if/while statement +let foo = v_outside; +// Invalid: binding variable is used in a more complicated expression +while (let Inside{ value: foo + 5 } = v_inside) { /* ... */ } + +``` ### Function Calls @@ -73,3 +160,85 @@ The `TypeRef` for a component call should resolve to either a builtin component, ## Builtin Procedures +Several functions are builtin procedures in the language. We'll shortly provide their meaning and their syntax here. Some of these functions cannot be expressed in PDL itself, so some ad-hoc syntax is used instead. + +### Get(...) + +The `get` function receives a value from an input port. This function may only be called in primitive components and may only be called within sync-blocks. If no value is present then the execution halts until a value becomes available. Upon receiving a value the execution forks in two. One branch will remain waiting for any potentially other messages, the other branch will actually obtain the sent message and continue execution. + +If a `get` function is called in a branch that has previously assumed that nothing will ever be sent through the port provided as argument, then that branch is considered inconsistent and will be removed. If a `get` function is called in a branch that has not decided on the firing state of a port then it will be forced to fire without forking of execution. + +The get function would be written in PDL as: + +``` +func get(in input_port) -> T { /* ... */ } +``` + +### Put(...) + +The `put` function may put a value into an output port. This function may only be called in primitive components and may only be called within sync-blocks. + +If a `put` function is called in a branch that has previously assumed that nothing will ever be sent through the port provided as argument, then that branch is considered inconsistent and will be removed. If a `put` function is called in a branch that has not decided on the firing state of a port then it will be forced to fire without forking of execution. + +The put function would be written in PDL as: + +``` +func put(out output_port, T value_to_put) -> T { /* ... */ } +``` + +**note**: Whether the `put` statement blocks until a corresponding `get` has been performed has not yet been decided. + +### Fires(...) + +The `fires` function will check if a port has previously been assumed to fire or not fire. If such a decision has not yet been reached then the execution forks. One branch assumes that the port will fire and the other branch assumes it will not. Otherwise this function returns the firing state of the port: true if it fires, false if it does not. + +This function may only be called in primitive components and may only occur inside sync-blocks. + +The `fires` function would be written in PDL, if the language would support function overloading, as: + +``` +func fires(in input_port) -> bool { /* ... */ } +func fires(out output_port) -> bool { /* ... */ } +``` + +### Create(...) + +The `create` function creates a value of the type `msg`. Its argument is the length of the `msg` type. It would be written in PDL as: + +``` +func create(u32 length) -> msg { /* ... */ } +``` + +**note**: This function will most likely be removed from the language at some point. This because the `msg` type will be removed at some point, and because it should be possible to reserve size for values of array-like types without setting the length of those values. + +### Length(...) + +The `length` function returns the length of arrays and strings. If PDL would support function overloading, it would be written as: + +``` +func length(T[] v) -> u32 { /* ... */ } +func length(string v) -> u32 { /* ... */ } +func length(msg v) -> u32 { /* ... */ } +``` + +**note**: As stated above, the `msg` type will likely be removed. The `string` implementation is tentative, so might be changed as well. + +### Assert(...) + +The `assert` function is used to prune execution branches within components. It works much like an assert would in another language: the assert statement excepts an expression as argument that resolves to a boolean value. If the boolean value is `true` then execution continues. If the boolean value is `false` then the execution branch is removed. + +The `assert` function may only appear in components (primitive or composite) and may only be called in sync-blocks. It would be written in PDL, if it would support the C-like "void" type, as: + +``` +func assert(bool test) -> void { /* ... */ } +``` + +**note**: Perhaps the name of this function will change in the future. + +## Statement Rules + +### Synchronous Blocks + +### Use of Builtin Procedures + +### Control-Flow Rules \ No newline at end of file