Files
@ f55ca167cf51
Branch filter:
Location: CSY/reowolf/docs/spec/validation.md - annotation
f55ca167cf51
14.6 KiB
text/markdown
expand section on language validation
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 | 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df f55ca167cf51 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df f55ca167cf51 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df f55ca167cf51 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df f55ca167cf51 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 8e4c0fc8b4df 8e4c0fc8b4df f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df f55ca167cf51 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df 8e4c0fc8b4df f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 f55ca167cf51 | # Validating the AST
So far we've defined the language in terms of its grammar. Especially for expressions we're still dealing with ambiguity in the interpretation of raw text, and with certain statements or expressions that are invalid in a particular context. We'll deal with these in several phases in this section. Note that some conditions for validity are delayed until the section on typing, as it is more appropriate to introduce those restrictions there.
## Scoping and Naming Rules
Without us specifying any further constraints one may define variables or types with the same name. So we introduce rules for the various identifiers that are found within a program.
Within the root of a module we are at the module scope. It's only parent is the global scope (whose contents we'll introduce later). The identifiers of type and procedure declarations within the module may not conflict with one another, or with any of the identifiers in the global scope.
Secondly, imports introduce identifiers into the module scope. Identifiers of procedures and types imported into a module may not conflict with one another, the global scope (which is implicitly already satisfied, unless the import becomes aliased), or the types defined within a module. The identifier we insert into the module scope depends on the type of import: if the import is aliased then we insert the alias, otherwise we insert the name of the imported type/procedure or last section of the module's name.
Whenever we're parsing a procedure (function or component), we enter a new scope whose parent is the module scope. Block statements implicitly introduce a new child scope. Any defined variable's identifier will be added to the scope it is in, and may not conflict with another identifier in the same scope or parent scopes.
The same rules apply to labeled statements. Albeit in a simpler form, as labeled statements can only appear in the bodies of procedures. Defining labels will add the label to the current scope. These labels may not conflict with labels in the same scope or labels in parent scopes.
Various other groups of identifiers in the program may not conflict among themselves. These are:
- The polymorphic variables of type or procedure definitions.
- The fields names of a struct.
- The variants of an enum.
- The identifiers of variants of a union (even though they might have had a different number of embedded values).
## Polymorphic Argument Rules
Types and procedures may be defined using polymorphic variables. Whenever we use these types to construct literals, or use functions to perform calls, we may specify these as polymorphic arguments. One has the choice of:
1. Specifying all of the polymorphic arguments. Note that one may still allow for type inference by using the `auto` type in the place of the polymorphic argument.
2. Specifying none of the polymorphic arguments, in this the compiler must assume that all of the polymorphic arguments are to be inferred (i.e. as if the programmer had written `auto` for each of the polymorphic arguments).
Any other case should be a compile-time error. Some examples:
```
// With a single argument
func maximum<T>(T a, T b) -> T {
if (a > b) return a;
else return b;
}
// Used as
u32 zero = 0;
maximum(zero, 5); // fine, no argument is specified, so `T` will be inferred.
maximum<auto>(zero, 5); // fine, practically the same as above
maximum<u32>(zero, 5); // fine, T is explicitly `u32`
maximum<u32, u64>(zero, 5); // error: mismatch in arguments specified and polymorphic variables on definition.
```
**note**: At the moment polymorphism in the language is rather underpowered until a reasonable and simple scheme for constraining them can be found. At the moment one can only specify one of the builtin types or struct/enum/union types of polymorphic arguments. Furthermore functions and procedures defined with polymorphic variables may not construct literals by using them.
## Expression Rules
### Assignment
Assignment expressions may only be placed at the statement level. That is to say: viewing a complicated expressions as a tree, where each node has a parent expression or a parent statement (e.g. the test expression of an if-statement), then assignment expressions may only have the expression-statement as a parent.
The left hand side of an assignment expression should be assignable. An expression is assignable if it is:
- An indexing expression with an assignable subject expression.
- A slicing expression with an assignable subject expression.
- A select expression with an assignable subject expression.
- A variable expression.
### Literals
An enum literal is valid if its `TypeRef` points to an enum declaration (either directly, or through a potentially aliased import) and the subsequent identifier is a variant declared in the type definition. Some examples:
```
enum Answer { Yes, No };
// Used as:
auto a = Answer::Yes; // Valid
auto b = Answer::No; // Valid
Answer c = Answer::Yes; // Valid
auto d = Answer; // Invalid: points to the type instead of a variant
auto e = Answer::DoesNotExist; // Invalid: no such variant defined on `Answer`
```
A union literal is valid if its `TypeRef` points (indirectly) to a union declaration, the subsequent identifier is a variant in the type definition, and if (where applicable) the number of embedded expressions matches the number of embedded types in the variant's definitions.
```
union Value { Empty, Unsigned(u64), Signed(s64) };
// Used as:
auto a = Value::Empty; // Valid
auto b = Value::Unsigned(5); // Valid
Value c = Value::Signed(-123); // Valid
auto d = Value; // Invalid: points to type instead of variant
auto e = Value::Empty(5); // Invalid: `Empty` has no embedded types
auto f = Value::Unsigned; // Invalid: Variant `Unsigned` expected 1 embedded value, none are provided
auto g = Value::Signed(-1, -5); // Invalid: Variant `Signed` expected 1 embedde value, 2 are provided
```
A struct literal is valid if its `TypeRef` points (indirectly) to a struct declaration, and it provides an expression to initialize all of the struct's fields, and only initializes each field once.
```
struct Pair { u32 left, u32 right }
// Used as:
auto a = Pair{ left: 0, right: 1 }; // Valid
auto b = Pair{ right: 0, left: 1 }; // Valid
auto c = Pair; // Invalid: Points to type instead of constructing literal
auto d = Pair{}; // Invalid: Not all fields specified
auto e = Pair{ left: 0 }; // Invalid: Not all fields specified
auto f = Pair{ left: 0, center: 5, right: 2 }; // Invalid: field `center` does not exist
auto g = Pair{ left: 0, left: 0, right: 2 }; // Invalid: field `left` specified more than once
```
For all of these literals the rules for polymorphic arguments apply: if the type specifies polymorphic arguments, then all of them have to be specified or none of them have to be specified (for implicit type inference).
### Binding
Again viewing a complex expression as a tree: A binding expression is only valid if it is nested in the expression tree under a statement, or under a chain of parents that are all binary expressions of the logical-and variant. The chain of parents must terminate at an if-statement's or while-statement's test expression. As a result of these rules, binding expressions may not be nested in one another.
A binding expression's left hand side is valid if it is a binding variable, or if it is a literal that may contain a binding variable. A binding variable is a variable that has not yet been declared using a `StmtLocalMem`. Such a variable will then be implicitly be declared at the binding location, and usable within the scope of the if-statement's then-block, or the while-statement's loop body. The same scoping rules apply to that scope: so binding variables may not be declared twice, nor may conflict to any variables declared in the scope or it's parents.
The interpretation of the binding expression is that it tests for equality between the left hand side and the right hand side. Except for those places where a binding variable is used, here no equality test is applied. Rather, the value from the right hand side is assigned to the binding variable on the left hand side. Hence if you use an expression containing e.g. regular variable inside a binding expression, then no binding will occur. Instead, the resulting value from that regular variable expression will be checked for equality against whatever is on the right hand side of the binding expression.
The parent of a binding variable may only be some kind of literal expression, or the left hand side of the binding expression itself. Some examples:
```
// Example only for a struct, but binding expressions can be applied to all
// kinds of literals (although, for some types, the binding then cannot contain
// any binding variables, and resolves to a test of equality between the left
// hand side and the right hand side).
struct Inside { u32 value };
struct Outside { u32 left, Inside right };
// Used as:
auto v_inside = Inside{ value: 0 };
auto v_outside = Outside{ left: 0, right: Inside{ value: 1 } };
// All valid:
if (let foo = v_outside && let bar = v_inside) { /* ... */ }
if (let foo = v_outside && (5 + 2 == 7) { /* ... */ }
while (let Inside{ value: foo } = v_inside) { /* ... */ }
if (let Outside{ left: 0, right: Inside{ value: foo } } = v_outside) { /* ... */ }
while (let Outside{ left: foo, right: Inside { value: bar }} = v_outside) { /* ... */ }
if (let Outside{ left: foo, right: bar } = v_outside) { /* ... */ }
// Invalid: parent expression that is not `&&`
if (let foo = v_outside || (5 + 2 == 7)) { /* ... */ }
// Invalid: not nested under an if/while statement
let foo = v_outside;
// Invalid: binding variable is used in a more complicated expression
while (let Inside{ value: foo + 5 } = v_inside) { /* ... */ }
```
### Function Calls
The `TypeRef` for a function call should resolve to either a builtin function, or a user-defined function. Function calls follow the rules for polymorphic arguments as outlined above. Furthermore the number of expressions given as arguments to the function should match the number of arguments in the definition of the function.
### Component "Calls"
The `TypeRef` for a component call should resolve to either a builtin component, or a user-defined component. Component calls, like function calls, must following the rules for polymorphic arguments and the arguments to the components themselves. Furthermore component "call" expressions may only be placed with a `StmtNew` as parent. That is to say: one doesn't really call components, instead one instantiates them.
## Builtin Procedures
Several functions are builtin procedures in the language. We'll shortly provide their meaning and their syntax here. Some of these functions cannot be expressed in PDL itself, so some ad-hoc syntax is used instead.
### Get(...)
The `get` function receives a value from an input port. This function may only be called in primitive components and may only be called within sync-blocks. If no value is present then the execution halts until a value becomes available. Upon receiving a value the execution forks in two. One branch will remain waiting for any potentially other messages, the other branch will actually obtain the sent message and continue execution.
If a `get` function is called in a branch that has previously assumed that nothing will ever be sent through the port provided as argument, then that branch is considered inconsistent and will be removed. If a `get` function is called in a branch that has not decided on the firing state of a port then it will be forced to fire without forking of execution.
The get function would be written in PDL as:
```
func get<T>(in<T> input_port) -> T { /* ... */ }
```
### Put(...)
The `put` function may put a value into an output port. This function may only be called in primitive components and may only be called within sync-blocks.
If a `put` function is called in a branch that has previously assumed that nothing will ever be sent through the port provided as argument, then that branch is considered inconsistent and will be removed. If a `put` function is called in a branch that has not decided on the firing state of a port then it will be forced to fire without forking of execution.
The put function would be written in PDL as:
```
func put<T>(out<T> output_port, T value_to_put) -> T { /* ... */ }
```
**note**: Whether the `put` statement blocks until a corresponding `get` has been performed has not yet been decided.
### Fires(...)
The `fires` function will check if a port has previously been assumed to fire or not fire. If such a decision has not yet been reached then the execution forks. One branch assumes that the port will fire and the other branch assumes it will not. Otherwise this function returns the firing state of the port: true if it fires, false if it does not.
This function may only be called in primitive components and may only occur inside sync-blocks.
The `fires` function would be written in PDL, if the language would support function overloading, as:
```
func fires<T>(in<T> input_port) -> bool { /* ... */ }
func fires<T>(out<T> output_port) -> bool { /* ... */ }
```
### Create(...)
The `create` function creates a value of the type `msg`. Its argument is the length of the `msg` type. It would be written in PDL as:
```
func create(u32 length) -> msg { /* ... */ }
```
**note**: This function will most likely be removed from the language at some point. This because the `msg` type will be removed at some point, and because it should be possible to reserve size for values of array-like types without setting the length of those values.
### Length(...)
The `length` function returns the length of arrays and strings. If PDL would support function overloading, it would be written as:
```
func length<T>(T[] v) -> u32 { /* ... */ }
func length(string v) -> u32 { /* ... */ }
func length(msg v) -> u32 { /* ... */ }
```
**note**: As stated above, the `msg` type will likely be removed. The `string` implementation is tentative, so might be changed as well.
### Assert(...)
The `assert` function is used to prune execution branches within components. It works much like an assert would in another language: the assert statement excepts an expression as argument that resolves to a boolean value. If the boolean value is `true` then execution continues. If the boolean value is `false` then the execution branch is removed.
The `assert` function may only appear in components (primitive or composite) and may only be called in sync-blocks. It would be written in PDL, if it would support the C-like "void" type, as:
```
func assert(bool test) -> void { /* ... */ }
```
**note**: Perhaps the name of this function will change in the future.
## Statement Rules
### Synchronous Blocks
### Use of Builtin Procedures
### Control-Flow Rules
|