Changeset - b06a29504c82
[Not reviewed]
0 2 0
mh - 4 years ago 2021-06-03 15:52:43
contact@maxhenger.nl
more documentation of language validation
2 files changed with 9 insertions and 2 deletions:
0 comments (0 inline, 0 general)
docs/spec/definitions.md
Show inline comments
 
# Declarations
 

	
 
A module is a text stream that is convertable into a token stream. This token stream may contain declarations. At the top level of the module we expect three types of declarations: pragmas, imports, type definitions and procedure definitions. We'll specify each of these in this file in terms of the tokens (where appropriate).
 

	
 
## Types and Identifiers
 

	
 
Throughout the various declarations we will use the following building blocks. Some of these are exactly equal to one another, their meaning will arise from the context in which they're used.
 

	
 
```
 
Ident = KwIdent
 
Path = KwIdent "::"
 

	
 
TypeName = KwIdent
 
TypeRef = Path* KwIdent ("<" (Type ",")* Type? ">")?
 

	
 
ModuleName = (KwIdent ".")* KwIdent
 
```
 

	
 
So a type's name is a single identifier (which may be defined with polymorphic variables). But the reference to a type may exist in some other module, so we can have a path prefix to it (e.g. `other_module::type_name`). The optionally postfixed bit with the angular brackets is the specification of the polymorphic arguments (e.g. `other_module::some_type<u32, yet_another_module::another_type<s32>>`). The pattern `(Thing ",")* Thing?` appears quite often, and is a comma-separated list of `Thing`s which may have a trailing comma.
 

	
 
## Pragmas
 

	
 
Only two types of pragmas are currently supported: one to specify the module's name and one to specify the module's version:
 

	
 
```
 
PragmaName = "#module" ModuleName RawNL
 
PragmaVersion = "#version" TkInt RawNL
 

	
 
DeclPragma = PragmaName | PragmaVersion
 
```
 

	
 
Note that the pragma's are terminated by a newline. Hence we can have a module name defined like `#module some_name` or `#module some.dot.separated.name`, and a version is simply `#version 123` or even `#version 0x00_01_00_A2`.
 

	
 
## Imports
 

	
 
An import statement is followed by a module name and optionally a combination of aliases, specific symbols to import or a wildcard. These are specified as:
 

	
 
```
 
ImportModule = KwImport ModuleName (KwAs Ident)? ";"
 
ImportWildcard = KwImport ModuleName "::" "*" ";"
 

	
 
ImportedSymbol = TypeName (KwAs Ident)?
 
ImportOneSymbol = KwImport ModuleName "::" ImportedSymbol ";"
 
ImportMultipleSymbols = KwImport ModuleName "::" "{"
 
        (ImportedSymbol ",")* ImportedSymbol?
 
    "}" ";" 
 

	
 
DeclImport = ImportModule | ImportWildcard | ImportSymbols
 
```
 

	
 
The `ImportModule` is the import of a particular name where one may optionally specify an alias (e.g. `import some_module;` or `import some_module as some_alias;`). Whenever there is a dot-separated module name, we take the last section as its alias. So if module `Foo.Foozle.Wozzle` contains a type `Bar`, then if we `import Foo.Foozle.Wozzle;`, then we refer to `Bar` as `Wozzle::Bar` (**note**: this is an initial implementation without much thought about its practical use, default alias rule may change in the future).
 

	
 
The `ImportWildcard` is the import of all *defined* symbols (i.e. not the symbols the imported module imports itself) within the module, such that we can refer to them directly. Taking the above example, if we write `import Foo.Foozle.Wozzle::*;`, then we may directly refer to `Bar` in the importing module.
 

	
 
Finally, one may also import particular symbols from a module. One may import a single symbol with an optional alias with `import SomeModule::Symbol;` or `import SomeModule::Symbol as Alias;`. Or one may import multiple symbols using `import SomeModule::{One as AnAlias, Two, Three as AnotherAlias};`.
 

	
 
## Enum Type Definitions
 

	
 
An enum acts much like a C enumeration. It is a collection of type-safe identifiers associated with an integer. It is defined as:
 

	
 
```
 
PolyVars = "<" (Ident ",")* Ident ">"
 
EnumVariantDef = Ident ("=" TkInt)?
 
DeclEnum = KwEnum Ident PolyVars? "{"
 
        (EnumVariantDef ",")* EnumVariantDef?
 
    "}"
 
```
 

	
 
Hence we may write a non-polymorphic enumeration as:
 

	
 
```
 
enum FileStatus {
 
    Ok = 0,
 
    DoesNotExist,
 
    NotPermitted,
 
    FailedReading = 0xFF,
 
}
 
```
 

	
 
And a polymorphic enumeration as:
 

	
 
```
 
enum Handle<T> {
 
    Present,
 
    NotPresent
 
}
 
```
 

	
 
We'll mention in passing that polymorphic variables to enum types can never exist in the body of the enum type. But they're allowed for now as they might be used in a useful way within struct or procedure declarations. **note**: this decision might change in the future.
 

	
 
## Union Type Definitions
 

	
 
A union in PDL is a tagged union. The tag itself cannot be decided by the programmer. A union variant may act like an enum variant in that it only specifies variants. A union differs from an enum in that it may contain an arbitrary number of embedded types. A union definition is specified as:
 

	
 
```
 
UnionVariantDef = Ident ("(" (TypeRef ",")* TypeRef? ")")?
 
DeclUnion = KwUnion Ident PolyVars? "{"
 
        (UnionVariantDef ",")* UnionVariantDef?
 
    "}"
 
```
 

	
 
Hence we may write a non-polymorphic union as:
 

	
 
```
 
union IntegerAndBool {
 
    UnsignedSmall(u8, bool),
 
    UnsignedLarge(u64, bool),
 
    SignedSmall(s8, bool),
 
    SignedLarge(s64, bool),
 
    VeryLarge(some_module::BigIntType, bool)
 
}
 
```
 

	
 
And some classic examples of polymorphic unions are:
 

	
 
```
 
union Option<T> {
 
    Some(T),
 
    None
 
}
 

	
 
union Result<T, E> {
 
    Ok(T),
 
    Error(E)
 
}
 
```
 

	
 
## Struct Type Definitions
 

	
 
A struct type is a collection of named fields with a particular type. It is rather similar to structs/records in many other languages. A struct definition is specified as:
 

	
 
```
 
StructFieldDef = TypeRef Ident
 
DeclStruct = KwStruct Ident PolyVars? "{"
 
        (StructFieldDef ",")* StructFieldDef?
 
    "}"
 
```
 

	
 
Hence we may write a non-polymorphic struct as:
 

	
 
```
 
struct Integer3D {
 
    u32 x,
 
    u32 y,
 
    u32 z,
 
}
 

	
 
// Or alternatively:
 
struct Integer3D { u32 x, u32 y, u32 z }
 
```
 

	
 
And two possible polymorphic structs as:
 

	
 
```
 
struct Point3D<T> { T x, T y, T z }
 
struct Pair<L, R> {
 
    L left,
 
    R right,
 
}
 
```
 

	
 
## Function Definition
 

	
 
A function is a callable subprocedure. It is defined as:
 

	
 
```
 
FuncArgDef = TypeRef Ident
 
FuncRetDef = TypeRef
 
DeclFunc = KwFunc Ident PolyVars? "("
 
        (FuncArgDef ",")* FuncArgDef?
 
    ")" "->" FuncRetDef
 
    StmtBlock
 
```
 

	
 
Hence we can have polymorphic and non-polymorphic functions with zero or multiple arguments with one return type (**note**: The language will likely feature multiple return types in one form or the other in the future). The `StmtBlock` will be introduced in the section on statements.
 

	
 
Two examples of functions:
 

	
 
```
 
func PickAnInt(u32 left, u32 right, bool choose_left) -> u32 {
 
    if (choose_left) {
 
        return left;
 
    } else {
 
        return right;
 
    }
 
}
 

	
 
func PickASide<T>(Pair<T, T> pair, bool choose_left) -> T {
 
    if (choose_left) {
 
        return pair.left;
 
    } else {
 
        return pair.right;
 
    }
 
}
 
```
 

	
 
## Component Definition
 

	
 
A component definition is the definition of a Reo connector. Currently there are two variants. The meaning of these variants will be introduced in a later section. A component is defined much in the same way as a function, except that it doesn't have a return type. Is is defined as:
 

	
 
```
 
CompVariant = KwPrim | KwComp
 
DeclComp = CompVariant Ident PolyVars? "("
 
        (FuncArgDef ",")* FuncArgDef?
 
    ")"
 
    StmtBlock
 
```
 

	
 
We'll delay examples of components until we've introduced the concept of connectors in this document.
 

	
 
**note**: It might be that the distinction between primitive and composite connectors (and the restrictions they place on the available language features inside each of them) will be removed in the future.
 

	
 
## Combining all Declarations
 

	
 
A module is the combination of any of the above declarations. So we may specify the possible contents of a module as:
 

	
 
```
 
Module = (
 
        DeclPragma | 
 
        DeclImport | 
 
        DeclEnum | 
 
        DeclUnion | 
 
        DeclStruct |
 
        DeclFunc |
 
        DeclComp
 
    )*
 
```
 
\ No newline at end of file
docs/spec/validation.md
Show inline comments
 
# Validating the AST
 

	
 
So far we've defined the language in terms of its grammar. Especially for expressions we're still dealing with ambiguity in the interpretation of raw text, and with certain statements or expressions that are invalid in a particular context. We'll deal with these in several phases in this section. Note that some conditions for validity are delayed until the section on typing, as it is more appropriate to introduce those restrictions there.
 

	
 
## Scoping and Naming Rules
 

	
 
Without us specifying any further constraints one may define variables or types with the same name. So we introduce rules for the various identifiers that are found within a program. 
 

	
 
Within the root of a module we are at the module scope. It's only parent is the global scope (whose contents we'll introduce later). The identifiers of type and procedure declarations within the module may not conflict with one another, or with any of the identifiers in the global scope.
 

	
 
Secondly, imports introduce identifiers into the module scope. Identifiers of procedures and types imported into a module may not conflict with one another, the global scope (which is implicitly already satisfied, unless the import becomes aliased), or the types defined within a module. The identifier we insert into the module scope depends on the type of import: if the import is aliased then we insert the alias, otherwise we insert the name of the imported type/procedure or last section of the module's name.
 

	
 
Whenever we're parsing a procedure (function or component), we enter a new scope whose parent is the module scope. Block statements implicitly introduce a new child scope. Any defined variable's identifier will be added to the scope it is in, and may not conflict with another identifier in the same scope or parent scopes.
 

	
 
The same rules apply to labeled statements. Albeit in a simpler form, as labeled statements can only appear in the bodies of procedures. Defining labels will add the label to the current scope. These labels may not conflict with labels in the same scope or labels in parent scopes.
 

	
 
Various other groups of identifiers in the program may not conflict among themselves. These are:
 

	
 
- The polymorphic variables of type or procedure definitions.
 
- The fields names of a struct.
 
- The variants of an enum.
 
- The identifiers of variants of a union (even though they might have had a different number of embedded values).
 

	
 
## Polymorphic Argument Rules
 

	
 
Types and procedures may be defined using polymorphic variables. Whenever we use these types to construct literals, or use functions to perform calls, we may specify these as polymorphic arguments. One has the choice of:
 

	
 
1. Specifying all of the polymorphic arguments. Note that one may still allow for type inference by using the `auto` type in the place of the polymorphic argument.
 
2. Specifying none of the polymorphic arguments, in this the compiler must assume that all of the polymorphic arguments are to be inferred (i.e. as if the programmer had written `auto` for each of the polymorphic arguments).
 

	
 
Any other case should be a compile-time error. Some examples:
 

	
 
```
 
// With a single argument
 
func maximum<T>(T a, T b) -> T {
 
    if (a > b) return a;
 
    else       return b;
 
}
 

	
 
// Used as
 
u32 zero = 0;
 
maximum(zero, 5); // fine, no argument is specified, so `T` will be inferred.
 
maximum<auto>(zero, 5); // fine, practically the same as above
 
maximum<u32>(zero, 5); // fine, T is explicitly `u32`
 

	
 
maximum<u32, u64>(zero, 5); // error: mismatch in arguments specified and polymorphic variables on definition.
 
```
 

	
 
**note**: At the moment polymorphism in the language is rather underpowered until a reasonable and simple scheme for constraining them can be found. At the moment one can only specify one of the builtin types or struct/enum/union types of polymorphic arguments. Furthermore functions and procedures defined with polymorphic variables may not construct literals by using them.
 

	
 
## Expression Rules
 

	
 
### Assignment
 

	
 
Assignment expressions may only be placed at the statement level. That is to say: viewing a complicated expressions as a tree, where each node has a parent expression or a parent statement (e.g. the test expression of an if-statement), then assignment expressions may only have the expression-statement as a parent.
 

	
 
The left hand side of an assignment expression should be assignable. An expression is assignable if it is:
 

	
 
- An indexing expression with an assignable subject expression.
 
- A slicing expression with an assignable subject expression.
 
- A select expression with an assignable subject expression.
 
- A variable expression.
 

	
 
### Literals
 

	
 
An enum literal is valid if its `TypeRef` points to an enum declaration (either directly, or through a potentially aliased import) and the subsequent identifier is a variant declared in the type definition. Some examples:
 

	
 
```
 
enum Answer { Yes, No };
 

	
 
// Used as:
 
auto a = Answer::Yes; // Valid
 
auto b = Answer::No; // Valid
 
Answer c = Answer::Yes; // Valid
 

	
 
auto d = Answer; // Invalid: points to the type instead of a variant
 
auto e = Answer::DoesNotExist; // Invalid: no such variant defined on `Answer`
 
```
 

	
 
A union literal is valid if its `TypeRef` points (indirectly) to a union declaration, the subsequent identifier is a variant in the type definition, and if (where applicable) the number of embedded expressions matches the number of embedded types in the variant's definitions.
 

	
 
```
 
union Value { Empty, Unsigned(u64), Signed(s64) };
 

	
 
// Used as:
 
auto a = Value::Empty; // Valid
 
auto b = Value::Unsigned(5); // Valid
 
Value c = Value::Signed(-123); // Valid
 

	
 
auto d = Value; // Invalid: points to type instead of variant
 
auto e = Value::Empty(5); // Invalid: `Empty` has no embedded types
 
auto f = Value::Unsigned; // Invalid: Variant `Unsigned` expected 1 embedded value, none are provided
 
auto g = Value::Signed(-1, -5); // Invalid: Variant `Signed` expected 1 embedde value, 2 are provided 
 
```
 

	
 
A struct literal is valid if its `TypeRef` points (indirectly) to a struct declaration, and it provides an expression to initialize all of the struct's fields, and only initializes each field once.
 

	
 
```
 
struct Pair { u32 left, u32 right }
 

	
 
// Used as:
 
auto a = Pair{ left: 0, right: 1 }; // Valid
 
auto b = Pair{ right: 0, left: 1 }; // Valid
 

	
 
auto c = Pair; // Invalid: Points to type instead of constructing literal
 
auto d = Pair{}; // Invalid: Not all fields specified
 
auto e = Pair{ left: 0 }; // Invalid: Not all fields specified
 
auto f = Pair{ left: 0, center: 5, right: 2 }; // Invalid: field `center` does not exist
 
auto g = Pair{ left: 0, left: 0, right: 2 }; // Invalid: field `left` specified more than once
 
```
 

	
 
For all of these literals the rules for polymorphic arguments apply: if the type specifies polymorphic arguments, then all of them have to be specified or none of them have to be specified (for implicit type inference).
 

	
 
### Binding
 

	
 
Again viewing a complex expression as a tree: A binding expression is only valid if it is nested in the expression tree under a statement, or under a chain of parents that are all binary expressions of the logical-and variant. The chain of parents must terminate at an if-statement's or while-statement's test expression. As a result of these rules, binding expressions may not be nested in one another.
 

	
 
A binding expression's left hand side is valid if it is a binding variable, or if it is a literal that may contain a binding variable. A binding variable is a variable that has not yet been declared using a `StmtLocalMem`. Such a variable will then be implicitly be declared at the binding location, and usable within the scope of the if-statement's then-block, or the while-statement's loop body. The same scoping rules apply to that scope: so binding variables may not be declared twice, nor may conflict to any variables declared in the scope or it's parents.
 

	
 
The interpretation of the binding expression is that it tests for equality between the left hand side and the right hand side. Except for those places where a binding variable is used, here no equality test is applied. Rather, the value from the right hand side is assigned to the binding variable on the left hand side. Hence if you use an expression containing e.g. regular variable inside a binding expression, then no binding will occur. Instead, the resulting value from that regular variable expression will be checked for equality against whatever is on the right hand side of the binding expression.
 

	
 
The parent of a binding variable may only be some kind of literal expression, or the left hand side of the binding expression itself. Some examples:
 

	
 
```
 
// Example only for a struct, but binding expressions can be applied to all 
 
// kinds of literals (although, for some types, the binding then cannot contain
 
// any binding variables, and resolves to a test of equality between the left
 
// hand side and the right hand side).
 
struct Inside { u32 value };
 
struct Outside { u32 left, Inside right };
 

	
 
// Used as:
 
auto v_inside = Inside{ value: 0 };
 
auto v_outside = Outside{ left: 0, right: Inside{ value: 1 } };
 

	
 
// All valid:
 
if (let foo = v_outside && let bar = v_inside) { /* ... */ }
 
if (let foo = v_outside && (5 + 2 == 7) { /* ... */ }
 
while (let Inside{ value: foo } = v_inside) { /* ... */ }
 
if (let Outside{ left: 0, right: Inside{ value: foo } } = v_outside) { /* ... */ }
 
while (let Outside{ left: foo, right: Inside { value: bar }} = v_outside) { /* ... */ }
 
if (let Outside{ left: foo, right: bar } = v_outside) { /* ... */ }
 

	
 
// Invalid: parent expression that is not `&&`
 
if (let foo = v_outside || (5 + 2 == 7)) { /* ... */ }
 
// Invalid: not nested under an if/while statement
 
let foo = v_outside;
 
// Invalid: binding variable is used in a more complicated expression
 
while (let Inside{ value: foo + 5 } = v_inside) { /* ... */ }
 

	
 
```
 

	
 
### Function Calls
 

	
 
The `TypeRef` for a function call should resolve to either a builtin function, or a user-defined function. Function calls follow the rules for polymorphic arguments as outlined above. Furthermore the number of expressions given as arguments to the function should match the number of arguments in the definition of the function.
 

	
 
### Component "Calls"
 

	
 
The `TypeRef` for a component call should resolve to either a builtin component, or a user-defined component. Component calls, like function calls, must following the rules for polymorphic arguments and the arguments to the components themselves. Furthermore component "call" expressions may only be placed with a `StmtNew` as parent. That is to say: one doesn't really call components, instead one instantiates them.
 

	
 
## Builtin Procedures
 

	
 
Several functions are builtin procedures in the language. We'll shortly provide their meaning and their syntax here. Some of these functions cannot be expressed in PDL itself, so some ad-hoc syntax is used instead.
 

	
 
### Get(...)
 

	
 
The `get` function receives a value from an input port. This function may only be called in primitive components and may only be called within sync-blocks. If no value is present then the execution halts until a value becomes available. Upon receiving a value the execution forks in two. One branch will remain waiting for any potentially other messages, the other branch will actually obtain the sent message and continue execution.
 

	
 
If a `get` function is called in a branch that has previously assumed that nothing will ever be sent through the port provided as argument, then that branch is considered inconsistent and will be removed. If a `get` function is called in a branch that has not decided on the firing state of a port then it will be forced to fire without forking of execution.
 

	
 
The get function would be written in PDL as:
 

	
 
```
 
func get<T>(in<T> input_port) -> T { /* ... */ }
 
```
 

	
 
### Put(...)
 

	
 
The `put` function may put a value into an output port. This function may only be called in primitive components and may only be called within sync-blocks. 
 

	
 
If a `put` function is called in a branch that has previously assumed that nothing will ever be sent through the port provided as argument, then that branch is considered inconsistent and will be removed. If a `put` function is called in a branch that has not decided on the firing state of a port then it will be forced to fire without forking of execution.
 

	
 
The put function would be written in PDL as:
 
The put function would be written in PDL as, if it would support the C-like `void` type:
 

	
 
```
 
func put<T>(out<T> output_port, T value_to_put) -> T { /* ... */ }
 
func put<T>(out<T> output_port, T value_to_put) -> void { /* ... */ }
 
```
 

	
 
**note**: Whether the `put` statement blocks until a corresponding `get` has been performed has not yet been decided. 
 

	
 
### Fires(...)
 

	
 
The `fires` function will check if a port has previously been assumed to fire or not fire. If such a decision has not yet been reached then the execution forks. One branch assumes that the port will fire and the other branch assumes it will not. Otherwise this function returns the firing state of the port: true if it fires, false if it does not.
 

	
 
This function may only be called in primitive components and may only occur inside sync-blocks.
 

	
 
The `fires` function would be written in PDL, if the language would support function overloading, as:
 

	
 
```
 
func fires<T>(in<T> input_port) -> bool { /* ... */ }
 
func fires<T>(out<T> output_port) -> bool { /* ... */ }
 
```
 

	
 
### Create(...)
 

	
 
The `create` function creates a value of the type `msg`. Its argument is the length of the `msg` type. It would be written in PDL as:
 

	
 
```
 
func create(u32 length) -> msg { /* ... */ }
 
```
 

	
 
**note**: This function will most likely be removed from the language at some point. This because the `msg` type will be removed at some point, and because it should be possible to reserve size for values of array-like types without setting the length of those values. 
 

	
 
### Length(...)
 

	
 
The `length` function returns the length of arrays and strings. If PDL would support function overloading, it would be written as:
 

	
 
```
 
func length<T>(T[] v) -> u32 { /* ... */ }
 
func length(string v) -> u32 { /* ... */ }
 
func length(msg v) -> u32 { /* ... */ }
 
```
 

	
 
**note**: As stated above, the `msg` type will likely be removed. The `string` implementation is tentative, so might be changed as well.
 

	
 
### Assert(...)
 

	
 
The `assert` function is used to prune execution branches within components. It works much like an assert would in another language: the assert statement excepts an expression as argument that resolves to a boolean value. If the boolean value is `true` then execution continues. If the boolean value is `false` then the execution branch is removed.
 

	
 
The `assert` function may only appear in components (primitive or composite) and may only be called in sync-blocks. It would be written in PDL, if it would support the C-like "void" type, as:
 

	
 
```
 
func assert(bool test) -> void { /* ... */ }
 
```
 

	
 
**note**: Perhaps the name of this function will change in the future.
 

	
 
## Statement Rules
 

	
 
### Synchronous Blocks
 

	
 
Synchronous blocks may only be placed in primitive components. Synchronous blocks may not be nested. One may not jump into or out of a synchronous block.
 

	
 
### Use of Builtin Procedures
 

	
 
Builtin procedures that do not have a return value (currently only `put` and `assert`) may not be called at the expression level and must be called at the statement level. i.e. the parent of the builtin call expression may only be the `StmtExpr` statement.
 

	
 
### Control-Flow Rules
 

	
0 comments (0 inline, 0 general)