From 8e4c0fc8b4df32c39739961ffc67e61444377675 2021-06-03 09:44:50
From: mh
Date: 2021-06-03 09:44:50
Subject: [PATCH] Initial commit of language spec
---

diff --git a/docs/spec/definitions.md b/docs/spec/definitions.md
new file mode 100644
index 0000000000000000000000000000000000000000..aa83df09814a0b62fb32ff6c5f8bb458ad030f5f
--- /dev/null
+++ b/docs/spec/definitions.md
@@ -0,0 +1,225 @@

# Declarations

A module is a text stream that is convertible into a token stream. This token stream may contain declarations. At the top level of the module we expect four kinds of declarations: pragmas, imports, type definitions and procedure definitions. We'll specify each of these in this file in terms of the tokens (where appropriate).

## Types and Identifiers

Throughout the various declarations we will use the following building blocks. Some of these are exactly equal to one another; their meaning will arise from the context in which they're used.

```
Ident = KwIdent
Path = KwIdent "::"

TypeName = KwIdent
TypeRef = Path* KwIdent ("<" (TypeRef ",")* TypeRef? ">")?

ModuleName = (KwIdent ".")* KwIdent
```

So a type's name is a single identifier (which may be defined with polymorphic variables). But the referenced type may exist in some other module, so we can have a path prefix to it (e.g. `other_module::type_name`). The optional postfix with the angular brackets is the specification of the polymorphic arguments (e.g. `other_module::some_type<u32>`). The pattern `(Thing ",")* Thing?` appears quite often, and is a comma-separated list of `Thing`s which may have a trailing comma.

## Pragmas

Only two types of pragmas are currently supported: one to specify the module's name and one to specify the module's version:

```
PragmaName = "#module" ModuleName RawNL
PragmaVersion = "#version" TkInt RawNL

DeclPragma = PragmaName | PragmaVersion
```

Note that pragmas are terminated by a newline. Hence we can have a module name defined like `#module some_name` or `#module some.dot.separated.name`, and a version is simply `#version 123` or even `#version 0x00_01_00_A2`.

## Imports

An import statement is followed by a module name and optionally an alias, a list of specific symbols to import, or a wildcard. These are specified as:

```
ImportModule = KwImport ModuleName (KwAs Ident)? ";"
ImportWildcard = KwImport ModuleName "::" "*" ";"

ImportedSymbol = TypeName (KwAs Ident)?
ImportOneSymbol = KwImport ModuleName "::" ImportedSymbol ";"
ImportMultipleSymbols = KwImport ModuleName "::" "{"
        (ImportedSymbol ",")* ImportedSymbol?
    "}" ";"

DeclImport = ImportModule | ImportWildcard | ImportOneSymbol | ImportMultipleSymbols
```

The `ImportModule` is the import of a particular module where one may optionally specify an alias (e.g. `import some_module;` or `import some_module as some_alias;`). Whenever there is a dot-separated module name, we take the last section as its alias. So if module `Foo.Foozle.Wozzle` contains a type `Bar`, and we `import Foo.Foozle.Wozzle;`, then we refer to `Bar` as `Wozzle::Bar` (**note**: this is an initial implementation without much thought about its practical use, the default alias rule may change in the future).

The `ImportWildcard` is the import of all *defined* symbols (i.e. not the symbols the imported module imports itself) within the module, such that we can refer to them directly. Taking the above example, if we write `import Foo.Foozle.Wozzle::*;`, then we may directly refer to `Bar` in the importing module.
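As an illustrative sketch of these declarations together (the module names used here are made up, not part of the specification), the top of a module might read:

```
#module fancy.data.structures
#version 1

import some.other.module as Other;
import some.other.utilities::*;
```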
Finally, one may also import particular symbols from a module. One may import a single symbol with an optional alias with `import SomeModule::Symbol;` or `import SomeModule::Symbol as Alias;`. Or one may import multiple symbols using `import SomeModule::{One as AnAlias, Two, Three as AnotherAlias};`.

## Enum Type Definitions

An enum acts much like a C enumeration: it is a collection of type-safe identifiers, each associated with an integer. It is defined as:

```
PolyVars = "<" (Ident ",")* Ident ">"
EnumVariantDef = Ident ("=" TkInt)?
DeclEnum = KwEnum Ident PolyVars? "{"
        (EnumVariantDef ",")* EnumVariantDef?
    "}"
```

Hence we may write a non-polymorphic enumeration as:

```
enum FileStatus {
    Ok = 0,
    DoesNotExist,
    NotPermitted,
    FailedReading = 0xFF,
}
```

And a polymorphic enumeration as:

```
enum Handle<T> {
    Present,
    NotPresent
}
```

We'll mention in passing that the polymorphic variables of an enum type can never be used in the body of the enum type. But they're allowed for now as they might be used in a useful way within struct or procedure declarations. **note**: this decision might change in the future.

## Union Type Definitions

A union in PDL is a tagged union. The tag itself cannot be decided by the programmer. A union variant may act like an enum variant in that it need not embed any values. A union differs from an enum in that its variants may contain an arbitrary number of embedded types. A union definition is specified as:

```
UnionVariantDef = Ident ("(" (TypeRef ",")* TypeRef? ")")?
DeclUnion = KwUnion Ident PolyVars? "{"
        (UnionVariantDef ",")* UnionVariantDef?
    "}"
```

Hence we may write a non-polymorphic union as:

```
union IntegerAndBool {
    UnsignedSmall(u8, bool),
    UnsignedLarge(u64, bool),
    SignedSmall(s8, bool),
    SignedLarge(s64, bool),
    VeryLarge(some_module::BigIntType, bool)
}
```

And some classic examples of polymorphic unions are:

```
union Option<T> {
    Some(T),
    None
}

union Result<T, E> {
    Ok(T),
    Error(E)
}
```

## Struct Type Definitions

A struct type is a collection of named fields, each with a particular type. It is rather similar to structs/records in many other languages. A struct definition is specified as:

```
StructFieldDef = TypeRef Ident
DeclStruct = KwStruct Ident PolyVars? "{"
        (StructFieldDef ",")* StructFieldDef?
    "}"
```

Hence we may write a non-polymorphic struct as:

```
struct Integer3D {
    u32 x,
    u32 y,
    u32 z,
}

// Or alternatively:
struct Integer3D { u32 x, u32 y, u32 z }
```

And two possible polymorphic structs as:

```
struct Point3D<T> { T x, T y, T z }
struct Pair<L, R> {
    L left,
    R right,
}
```

## Function Definition

A function is a callable subprocedure. It is defined as:

```
FuncArgDef = TypeRef Ident
FuncRetDef = TypeRef
DeclFunc = KwFunc Ident PolyVars? "("
        (FuncArgDef ",")* FuncArgDef?
    ")" "->" FuncRetDef
    StmtBlock
```

Hence we can have polymorphic and non-polymorphic functions with zero or more arguments and a single return type (**note**: the language will likely feature multiple return types in one form or another in the future). The `StmtBlock` will be introduced in the section on statements.
Two examples of functions:

```
func PickAnInt(u32 left, u32 right, bool choose_left) -> u32 {
    if (choose_left) {
        return left;
    } else {
        return right;
    }
}

func PickASide<T>(Pair<T, T> pair, bool choose_left) -> T {
    if (choose_left) {
        return pair.left;
    } else {
        return pair.right;
    }
}
```

## Component Definition

A component definition is the definition of a Reo connector. Currently there are two variants. The meaning of these variants will be introduced in a later section. A component is defined much in the same way as a function, except that it doesn't have a return type. It is defined as:

```
CompVariant = KwPrim | KwComp
DeclComp = CompVariant Ident PolyVars? "("
        (FuncArgDef ",")* FuncArgDef?
    ")"
    StmtBlock
```

We'll delay examples of components until we've introduced the concept of connectors in this document.

## Combining all Declarations

A module is the combination of any of the above declarations. So we may specify the possible contents of a module as:

```
Module = (
        DeclPragma |
        DeclImport |
        DeclEnum |
        DeclUnion |
        DeclStruct |
        DeclFunc |
        DeclComp
    )*
```
\ No newline at end of file
diff --git a/docs/spec/expressions.md b/docs/spec/expressions.md
new file mode 100644
index 0000000000000000000000000000000000000000..4fc2095f63d574297de1c22bf6ba10d867b9694c
--- /dev/null
+++ b/docs/spec/expressions.md
@@ -0,0 +1,197 @@

# Expressions

Expressions are the parts of the language that compute values (and optionally assign them).

**note:** We could give a recursive-descent style formal specification, but I think it is rather ugly. We could also specify the generic case, and then simply state what the precedence rules are?

## Assignment Expression

An assignment expression is a right-to-left associative binary operator that evaluates the right hand side expression and assigns it to the left hand side. It is defined as:

```
AssignOp =
    "=" |
    "@=" |
    "*=" | "/=" | "%=" | "+=" | "-=" |
    "<<=" | ">>=" |
    "&=" | "^=" | "|="
ExprAssign = ExprConditional (AssignOp ExprAssign)?
```

Where the assignment operators are:
- `=`: Set assignment: assigns the right hand side to the left hand side.
- `@=`: Concatenate assignment: concatenates the right hand side to the left hand side.
- `*=`: Multiply: multiplies the left hand side by the value on the right hand side.
- `/=`: Divide: divides the left hand side by the number on the right hand side.
- `%=`: Remainder: applies the modulo operator to the left hand side using the divisor on the right hand side.
- `+=`: Addition: adds the right hand side to the left hand side.
- `-=`: Subtraction: subtracts the right hand side from the left hand side.
- `<<=`: Shift left: bitshifts the number on the left hand side, towards the left, by the number of bits on the right hand side.
- `>>=`: Shift right: bitshifts the number on the left hand side, towards the right, by the number of bits on the right hand side.
- `&=`: Bitwise and: performs a bitwise `and` on the left hand side using the integer on the right hand side.
- `^=`: Bitwise xor: performs a bitwise `xor` on the left hand side using the integer on the right hand side.
- `|=`: Bitwise or: performs a bitwise `or` on the left hand side using the integer on the right hand side.

## Conditional Expression

A conditional expression is an expression that contains a test expression and two nested subexpressions. If the test expression evaluates to true, then the first nested subexpression is evaluated. Otherwise the second nested subexpression is evaluated.
+ +It is defined as: + +``` +ExprConditional = ExprConcat "?" Expr ":" Expr +``` + +Due to its formulation one can see that nested conditional expressions are eagerly evaluated in the subexpressions, but not in the test expression. + +## Binary Operator Expressions + +Binary operators apply an operation to its two subexpressions. We have the following possible binary operators: + +``` +BinaryOp = + "@" | + "||" | "&&" | + "|" | "^" | "&" | + "==" | "!=" | + "<" | ">" | "<=" | ">=" | + "<<" | ">>" | + "+" | "-" | "*" | "/" | "%" +``` + +Where we have: +- `@`: Concatenation +- `||`: Logical or +- `&&`: Logical and +- `|`: Bitwise or +- `^`: Bitwise xor +- `&`: Bitwise and +- `==`: Equality test +- `!=`: Inequality test +- `<`: Less-than comparison +- `>`: Greater-than comparison +- `<=`: Less-than-or-equal-to comparison +- `>=`: Greater-than-or-equal-to comparison +- `<<`: Bitshift left +- `>>`: Bitshift right +- `+`: Addition +- `-`: Subtraction +- `*`: Multiplication +- `/`: Division +- `%`: Remainder/modulo + +We'll define the particular kinds of expressions that use these operators (and remaining kinds of operators in expressions) in such an order that the precedence rules are satisfied. The first in the chain has the lowest precedence. + +``` +ExprBinConcat = ExprBinLogicalOr ("@" ExprBinLogicalOr)* +ExprBinLogicalOr = ExprBinLogicalAnd ("||" ExprBinLogicalAnd)* +ExprBinLogicalAnd = ExprBinBitwiseOr ("&&" ExprBinBitwiseOr)* +ExprBinBitwiseOr = ExprBinBitwiseXor ("|" ExprBinBitwiseXor)* +ExprBinBitwiseXor = ExprBinBitwiseAnd ("^" ExprBinBitwiseAnd)* +ExprBinBitwiseAnd = ExprBinEquality ("&" ExprBinEquality)* +ExprBinEquality = ExprBinRelational (["==" | "!="] ExprBinRelational)* +ExprBinRelational = ExprBinShift (["<", ">", "<=", ">="] ExprBinShift)* +ExprBinShift = ExprBinAddSub (["<<", ">>"] ExprBinAddSub)* +ExprBinAddSub = ExprBinMulDivMod (["+" | "-"] ExprBinMulDivMod)* +ExprBinMulDivMod = ExprPrefix (["*" | "/" | "%"] ExprPrefix)* +``` + +## Prefix Operator Expression + +Prefix expressions apply their operators to the suffixed expression. Prefix expressions are evaluated right-to-left. The following prefix operators are supported: + +``` +PrefixOp = "+" | "-" | "~" | "!" +``` + +Where we have: +- `+`: Positive. Essentially meaningless in this rather strictly typed language, but supported for completeness (and sometimes it looks visually appealing in e.g. `sign = (x > 0) ? +1 : -1`). +- `-`: Negation. +- `~`: Bitwise not. +- `!`: Logical not. + +Which are used to define the prefix expression itself as: + +``` +ExprPrefix = (PrefixOp ExprPrefix)? | ExprPostfix +``` + +## Postfix Operator Expression + +Postfix operations are slightly more involves, because the postfixes take arguments following a particular pattern. We have the following possible postfix expressions: + +``` +PostfixOp = + ("[" Expr "]") | + ("[" Expr ".." Expr "]") | + ("." Ident) +``` + +Where we have: + +- `[a_number]`: Indexing expression, can be applied to arrays and strings. +- `[lower_idx..higher_idx]`: Slicing expressions, takes a range of values from an array or string. +- `.a_field`: Select expression, can only be applied to struct values and returns the selected field's value. + +Which are used in: + +``` +ExprPostfix = ExprPrimary PostfixOp* +``` + +## Primary Expressions + +Primary expressions are the ones with the highest precedence. We have the variants: + +``` +ExprParens = "(" Expr ")" + +ExprLitArray = "{" (Expr ",")* Expr? 
"}" +ExprLitInt = TkInt +ExprLitString = TkStr +ExprLitChar = TkChar +ExprLitBool = TkBool + +ExprLitStructField = Ident ":" Expr +ExprLitStruct = TypeRef "{" + (ExprLitStructField ",")* ExprLitStructField? + "}" +ExprLitEnum = TypeRef "::" Ident +ExprLitUnion = TypeRef "::" Ident ("(" (Expr ",")* Expr? ")")? + +ExprBinding = KwLet ExprPrefix "=" ExprPrefix + +ExprCall = TypeRef "(" (Expr ",")* Expr? ")" +ExprCast = "cast" PolyArgs? "(" Expr ")" + +ExprVar = Ident +``` + +Here we find, consecutively: +- The parenthesized expression, indicating its contents should be evaluated at a higher precedence. +- The various constructable literals. +- The binding expression, where the values on the right hand side of the equality are bound to the binding variables on the left hand side. +- Calling of functions (and components). +- Casting of values. +- Reference to a variable. + +Naturally the validity of each of these expressions depends strongly on the context in which they're used and their arguments. Likewise for deciding which expression should be used (e.g. an enum variant and a union variant without embedded values differ only in the fact that their type has been defined using a different keyword). We'll come back to all of these rules in a later section of this document. + +For now, we can group all of these possibilities together as: + +``` +ExprPrimary = + ExprParens | + ExprLitArray | ExprLitInt | ExprLitString | ExprLitChar | ExprLitBool | + ExprLitStruct | ExprLitEnum | ExprLitUnion | + ExprBinding | ExprCall | ExprCast | + ExprVar +``` + +## Putting Everything Together + +Now that we have a chain of expressions defined, each depending on the next one in the chain, we can finish by simply defining the basic `Expr` as: + +``` +Expr = ExprAssign +``` \ No newline at end of file diff --git a/docs/spec/overview.md b/docs/spec/overview.md new file mode 100644 index 0000000000000000000000000000000000000000..4c9034d11097832bce1c2f1f26e541482b16f608 --- /dev/null +++ b/docs/spec/overview.md @@ -0,0 +1,14 @@ +# Protocol Description Language Specification + +The Protocol Description Language (PDL) will be specified in this document. There are several levels to the specification. To simplify the specification we will roughly follow the manner in which the language is lexed and parsed by the compiler. We begin by assuming a set of source code streams/files. The specification here assumes that the source file is ASCII-encoded, but the lexer will throw human-readable errors in case it isn't. The source file is lexed into tokens. Where each token has a specific meaning within the language. Although the compiler has tokens for line- and block-comments, these are not important for the parsing. Likewise, apart from the fact that it might serve as a separator between tokens, whitespace will be ignored in the tokenizer. + +Having defined the possible tokens, we will assume that comments and whitespace no longer factor into the interpretation of the source code, and continue defining the layout of a single source file. Such a source file consists of pragmas, imports, type, and procedure definitions. These are all specified in terms of tokens. Procedure definitions may contain statements, which will be specified next. Finally we will specify the expressions that may appear within these statements. + +Note that the entire language is not yet fully set in stone. Parts of the syntax and it's interpretation may change. + +Please view the following documents (in roughly this order): + +1. tokens.md +2. 
definitions.md +3. statements.md +4. expressions.md \ No newline at end of file diff --git a/docs/spec/statements.md b/docs/spec/statements.md new file mode 100644 index 0000000000000000000000000000000000000000..dd4094e211d6a9396881f7c447c182f4aa37428f --- /dev/null +++ b/docs/spec/statements.md @@ -0,0 +1,156 @@ +# Statements + +Alongside expressions, statements are the main building block of the imperative procedure bodies. We'll define all of the possible statements in this document, and combine them at the end of this chapter into a `Stmt`. + +## Block Statement + +A block statement is a list of statements wrapped in curly braces. It has the additional meaning of scoping all of its defined variables. That is: it can access variables from its outer scopes. But not of the outer scopes can access the variables inside the inner block statement. We'll expand on the scoping rules in a later chapter. + +A block statement is defined as: + +``` +StmtBlock = "{" Stmt* "}" +``` + +## If Statement + +An if-statement contains a test expression that is evaluated and should resolve to a boolean. If that boolean is true then we take the "then" branch of the if-statement. If that boolean is false then we can take "else" branch if it is defined by the programmer, otherwise simply skip the "then" branch. An if statement is defined as: + +``` +StmtIf = + KwIf "(" Expr ")" Stmt + (KwElse Stmt)? +``` + +## While Statement + +A while statement is a loop that continues executing based on a test expression. If the test expression is true, then we enter the inner statement and execute it. If the test expression is false then we go to the next statement. If we have entered the inner statement, then upon reaching its end we go back to the while statement and re-evaluate the test expression. A while statement is defined as: + +``` +StmtWhile = KwWhile "(" Expr ")" Stmt +``` + +## Labeled Statement + +A labeled statement attaches a label to a particular nested label. This label can be used for goto-statements. Secondly, if the label is placed before a loop statement, then one may use break- and continue-statements to refer to that particular loop. + +``` +Label = Ident +StmtLabeled = Label ":" Stmt +``` + +## Break Statement + +A break statement may halt executing a loop body and jump to the statement coming after the loop statement. It may optionally have a label to break a particular loop. It is defined as: + +``` +StmtBreak = KwBreak Label? ";" +``` + +Some (contrived) examples: + +``` +u32 counter = 0; +while (true) { + if (counter == 10) break; + counter += 1; +} + +u32 i = 0; +u32 counter = 0; + +outer: while (i < 10) { + u32 j = 0; + while (j < 10) { + if (counter == 42) break outer; + j += 1; + counter += 1; + } + i += 1; +} +``` + +## Continue Statement + +A continue statement may halt executing a loop body and jump to the loop statement in order to evaluate its test expression (and optionally execute the loop body) again. It may optionally have a label to continue at a particular loop. It is defined as: + +``` +StmtContinue = KwContinue Label? ";" +``` + +## Return Statement + +A return statement is used in function bodies in order to halt the execution of the function, and provide a return value to the caller. It is defined as: + +``` +StmtReturn = KwReturn Expr ";" +``` + +**note**: Like with functions, in the future we will likely support multiple return types, hence the syntax may change in the future. 
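As an illustrative sketch of the continue and return statements working together (the function and its contents are made up, not part of the specification):

```
func SumOfOdds(u32 limit) -> u32 {
    u32 total = 0;
    u32 i = 0;
    while (i < limit) {
        i += 1;
        // Skip the even numbers, only the odd ones are accumulated.
        if (i % 2 == 0) continue;
        total += i;
    }
    return total;
}
```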
+ +## Goto Statement + +A goto statement is used to jump to a particular label within a procedure body. it is defined as: + +``` +StmtGoto = KwGoto Label ";" +``` + +## New Statement + +A new statement instantiates a primitive component. It is defined as: + +``` +StmtNew = KwNew CallExpr ";" +``` + +## Local Variable Statement + +A local variable statement instantiates a new variable which will have assigned a particular initial value. It is defined as: + +``` +VarRef = Ident +StmtLocVar = TypeRef VarRef "=" Expr ";" +``` + +**note**: We may change local variable statements to act as expressions in the future. This has no practical effect on the manner in which valid code can be written. See the comment at "expression statement" + +## Local Channel Statement + +A local channel statement instantiates a new channel. It returns the two ports that compose the channel. It is defined as: + +``` +PolyArgs = "<" (TypeRef ",")* TypeRef ">" +StmtLocChan = KwChannel PolyArgs? VarRef "->" VarRef ";" +``` + +Some examples are: + +``` +channel input_of_channel -> output_from_channel +channel output_of_component -> input_of_component +``` + +Although we'll expand on types later. One should note that in practice a channel only accepts one polymorphic argument. The port on the left hand side of the channel we call the output port (with type `out`) and the right hand side we call the input port (with type `in`). These names depend on the manner in which one views the two ends of the channel. From the point of view of the channel data flows in from the left hand side and out from the right hand side. However, we take the point of view from the component that uses these ports. Hence we view the left hand side as the `out` port, because a component puts data into it, and the right hand side as the `in` port, because a component gets data from it. + +## Expression Statement + +An expression statement is simply an expression placed in a statement. Hence it is simply defined as: + +``` +StmtExpr = Expr ";" +``` + +**note**: Should I "formally" define assignment expressions here as well? They act like statements, but in code we treat them as expressions (and apply some checks to ensure they're used at the statement level) because this simplifies all code a lot. Maybe we make the document simpler as well by simply stating that they're expressions? + +## Combining All Possible Statements + +Combining all of the possible statements, we arrive at: + +``` +Stmt = + StmtBlock | StmtLabeled + StmtIf | StmtWhile | + StmtBreak | StmtContinue | StmtReturn | StmtGoto | + StmtLocVar | StmtLocChan | StmtExpr +``` \ No newline at end of file diff --git a/docs/spec/tokens.md b/docs/spec/tokens.md new file mode 100644 index 0000000000000000000000000000000000000000..5fc856b3358f00f1831512495a4688e5f158070e --- /dev/null +++ b/docs/spec/tokens.md @@ -0,0 +1,265 @@ +# Tokens + +## Notation + +While parsing the raw text into tokens, we will use `"c"` to specify the ASCII character `c`, `"word"` to specify the ASCII string `word`, and we will use `0x20` to mean the ASCII character associated with hexadecimal integer 0x20 (i.e. the space character). Terms placed one after the other must appear in that order. When different combinations are possible within a term we will use the syntax `["option 1" | "option 2" | "option 3"]`. We will reserve the `*` symbol to mean "0 or more repetitions", the `+` symbol to mean "1 or more repetitions", and the `?` symbol to mean "0 or 1 occurrence". 
To apply these symbols to multiple terms we will put them between parentheses. So `(["0" | "1"])+` means that we expect a sequence of at least 1 character, where each character must be "0" or "1". + +Finally, we may specify (inclusive) ranges of characters by putting a dash between two elements. So `0x00-0x1F` represent all control characters. `0x41-0x5A`, or equivalently: `"A"-"Z"` represent all uppercase characters. + +## Parsing Raw Text + +We will define the following terms in terms of the raw input text: + +``` +RawWS = [" ", 0x09] +RawNL = [(0x0D 0x0A) | 0x0A] +RawWSNL = RawWS | RawNL +``` + +That is: whitespace may be the space character or a tab character. A newline may be the unix-like newline character, or the windows-like carriage feed + newline character. + +With regards to visible character groups, we may define: + +``` +RawDigit = "0"-"9" +RawUpperAlpha = "A"-"Z" +RawLowerAlpha = "a"-"z" +RawVChar = 0x20-0x7E | 0x09 +``` + +Where the `RawVChar` defines a visible character. + +Finally, because the text stream ends at some point, we interpret the text stream as if it always has an `RawEOF` token at the end, indicating the end of the file. + +## Parsing Unimportant Tokens + +The raw text stream will be interpreted as a series of tokens. For simplicity we will call these `TkRaw`, which may essentially be one of two variants: firstly we will define `TkUnimportant`, which we will lex but which will not continue to influence the parsing of the raw text stream. Secondly we have the `Tk` the token that serves as the building block for the remainder of the parsing. + +``` +TkRaw = TkUnimportant | Tk +TkUnimportant = TkCommentLine | TkCommentBlock | TkWS + +TkCommentLine = "//" RawVChar* [RawNL | RawEOF] +TkCommentBlock = "/*" [RawVChar | RawNL]* ["*/" | RawEOF] +TkWS = RawWSNL +``` + +Essentially: we consider all line comments, block comments (which do not nest) and whitespace to be unimportant. + +## Parsing Important Tokens + +There are two main types of important tokens: the varying length tokens (e.g. an identifier or an integer literal), and the fixed-length tokens (e.g. "!"). We'll start with the various kinds of varying length tokens. + +### Identifiers + +Identifiers are defined as: + +``` +IdentInit = "_" | RawUpperAlpha | RawLowerAlpha +IdentRem = IdentInit | RawDigit +TkIdent = IdentInit IdentRem* +``` + +That is: an identifier starts with an identifier or an underscore, and is followed by a sequence containing those characters or a number. Note that the later definition of keywords also matches the definition of identifiers. The tokenizer will prefer to pick keywords instead of identifiers. + +### Pragmas + +A pragma is a hint to the compiler and is indicated by a pound sign. Hence we define a pragma as: + +``` +TkPragma = "#" IdentInit IdentRem* +``` + +### Character + +A character literal is a single character bounded by single-quote marks. It is defined as: + +``` +CharUnescaped = 0x20-0x26 | 0x28-0x5B | 0x5D-0x7E +CharEscaped = "\" ["r" | "n" | "t" | "0" | "\" | "'" | 0x22] +CharElement = CharUnescaped | CharEscaped +TkChar = "'" CharElement "'" +``` + +That is: a character is any of the visible characters (except for the quotation mark, because that has to be escaped, and except for the backslash character, because that is the indicator of the escaping). Or it is an escaped character. From left to right we have the following supported escape characters: + +1. `r`, `0x0D`: Carriage feed, +2. `n`, `0x0A`: Newline +3. `t`, `0x09`: Horizontal tab +4. 
`0`, `0x00`: Null character +5. `\`, `0x5C`: Backslash character +6. `'`, `0x27`: Single quote character +7. `"`, `0x22`: Double quote character + +### String Literal + +A string literal is essentially defined in the same way as the character literal, however now we have to escape the '"' character. So it is defined as: + +``` +StrUnescaped = 0x20-0x21 | 0x23-0x5B | 0x5D-0x7E +StrEscaped = CharEscaped +StrElement = StrUnescaped | StrEscaped +TkStr = 0x22 StrElement 0x22 +``` + +Where again, 0x22 is the double quote character itself. + +### Integer Literal + +PDL currently supports binary, octal, decimal and hexadecimal integers. These are defined as: + +``` +IntBinEl = "0" | "1" +IntBinElSep = IntBinEl | "_" +IntBin = "0" ["b" | "B"] IntBinElSep* IntBinEl IntBinElSep* + +IntOctEl = "0"-"7" +IntOctElSep = IntOctEl | "_" +IntOct = "0" ["o" | "O"] IntOctElSep* IntOctEl IntOctElSep* + +IntDecEl = "0"-"9" +IntDecElSep = IntDecEl | "_" +IntDec = IntDecEl IntDecElSep* + +IntHexEl = "0"-"9" | "A"-"F" | "a"-"f" +IntHexElSep = IntHexEl | "_" +IntHex = "0" ["x" | "X"] IntHexElSep* IntHexEl IntHexElSep* + +TkInt = IntBin | IntOct | IntDec | IntHex +``` + +For the regular decimal integers we expect the first character to be an actual digit (to prevent ambiguity with the `TkIdent` token). The remainder may be any decimal digit or the separating `_` character. + +For the non-decimal integers we expect two initial characters indicating the radix of the integer. The remainder of the integer literal must then consist at least once of an element in its alphabet, and the remainder may contain the separating `_` character where possible. + +The separating character is visually useful for the programmer (e.g. `0xDEADBEEF_CAB3CAF3_DEADC0D3_C0D3CAF3` or `0b0001_0010_0100_1000`). But will not contribute to the interpretation of the integer character. + +### Boolean literals + +A boolean literal is just the word "true" or "false". The interpretation of these strings takes precedence over identifiers. We define: + +``` +TkBool = "true" | "false" +``` + +### Keywords + +Several sequences of characters are reserved keywords which have a special meaning within the context of importing modules or the definition of a procedure body. These keywords take precedence over the interpretation of the character sequence as an identifier. Hence, all of these keywords may not be used as identifiers within the program. + +``` +KwLet = "let" +KwAs = "as" +KwStruct = "struct" +KwEnum = "enum" +KwUnion = "union" +KwFunc = "func" +KwPrim = "primitive" +KwComp = "composite" +KwImport = "import" + +Kw = KwLet | KwAs | KwStruct | + KwEnum | KwUnion | KwStruct | + KwFunc | KwPrim | KwComp | + KwImport +``` + +For statements we have the following keywords: + +``` +KwChannel = "channel" +KwIf = "if" +KwElse = "else" +KwWhile = "while" +KwBreak = "break" +KwContinue = "continue" +KwGoto = "goto" +KwReturn = "return" +KwSync = "synchronous" +KwNew = "new" + +Stmt = KwChannel | + KwIf | KwElse | KwWhile | + KwBreak | KwContinue | KwGoto | KwReturn | + KwSync | KwNew +``` + +We use these two lists to define the keyword token as: + +``` +TkKw = Kw | Stmt +``` + +Apart from these keywords the language also features several builtin methods and types. However, from the point of view of the tokenizer these are simply interpreted as identifiers. Scoping rules will ensure that they are not doubly defined. The same is true for types: several types (e.g. the basic integers) are given special attention, but we will define these later. 
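For illustration, the lexemes below (their concrete values are arbitrary) show the varying-length tokens defined above, each annotated with the token it would produce:

```
my_variable2          // TkIdent
#module               // TkPragma
'\n'                  // TkChar (an escaped newline)
"hello, \"world\""    // TkStr (with escaped double quotes)
0xDEAD_BEEF           // TkInt (hexadecimal, with a separator)
true                  // TkBool (takes precedence over TkIdent)
struct                // TkKw (may not be used as an identifier)
```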
+ +### Fixed-Width Tokens + +The remaining tokens are the fixed-width types of punctuation. There are several ones whose first couple of characters are identical. In this case we pick the largest matching sequence of characters. We have: + +``` +TkExcl = "!" +TkQuestion = "?" +TkPound = "#" + +TkLAngle = "<" +TkLCurly = "{" +TkLParen = "(" +TkLSquare = "[" +TkRAngle = ">" +TkRCurly = "}" +TkRParen = ")" + +TkRSquare = "]" +TkColon = ":" +TkComma = "," +TkDot = "." +TkSemiColon = ";" + +TkAt = "@" +TkPlus = "+" +TkMinus = "-" +TkStar = "*" +TkSlash = "/" +TkPercent = "%" +TkCaret = "^" +TkAnd = "&" +TkOr = "|" +TkTilde = "~" +TkEqual = "=" + +TkColonColon = "::" +TkDotDot = ".." +TkArrowRight = "->" +TkAtEquals = "@=" +TkPlusPlus = "++" +TkPlusEquals = "+=" +TkMinusMinus = "--" +TkMinusEquals = "-=" +TkStarEquals = "*=" +TkSlashEquals = "/=" +TkPercentEquals = "%=" +TkCaretEquals = "^=" +TkAndAnd = "&&" +TkAndEquals = "&=" +TkOrOr = "||" +TkOrEquals = "|=" +TkEqualEqual = "==" +TkNotEqual = "!=" +TkShiftLeft = "<<" +TkLessEqual = "<=" +TkShiftRight = ">>" +TkGreaterEqual = ">=" + +TkShiftLeftEqual = "<<=" +TkShiftRightEqual = ">>=" + +TkPunct = ... all of the above +``` + +For brevity's sake, we will not actually use the identifier above when we move onto specifying how definitions/statements/expressions are defined. The reason for specifying all of these combinations is that the tokenizer produces these tokens, and reports errors based on these tokens. Some of the tokens above are not used by the parser at all, and are merely parsed to produce reasonable error messages + +### Combining All Variants + +Our definition for a useful token now becomes: + +``` +Tk = TkChar | TkStr | TkInt | TkBool | TkKw | TkPunct diff --git a/docs/spec/validation.md b/docs/spec/validation.md new file mode 100644 index 0000000000000000000000000000000000000000..4fec475c04f86fd6bc652cbf9ec0e882de712c9a --- /dev/null +++ b/docs/spec/validation.md @@ -0,0 +1,75 @@ +# Validating the AST + +So far we've defined the language in terms of its grammar. Especially for expressions we're still dealing with ambiguity in the interpretation of raw text, and with certain statements or expressions that are invalid in a particular context. We'll deal with these in several phases in this section. Note that some conditions for validity are delayed until the section on typing, as it is more appropriate to introduce those restrictions there. + +## Scoping and Naming Rules + +Without any further constraints one may define variables or types with the same name. So we introduce rules for the various identifiers that are found within a program. + +Within the root of a module we are at the module scope. It's only parent is the global scope (whose contents we'll introduce later). The identifiers of type and procedure declarations within the module may not conflict with one another, or with any of the identifiers in the global scope. + +Secondly, imports introduce identifiers into the module scope. Identifiers of procedures and types imported into a module may not conflict with one another, the global scope (which is implicitly already satisfied, unless the import becomes aliased), or the types defined within a module. In case an import produces an alias for a particular type or procedure, then that is the identifier that may not conflict. + +Whenever we're parsing a procedure (function or component), we enter a new scope whose parent is the module scope. Block statements implicitly introduce a new child scope. 
Any defined variable's identifier will be added to the scope it is in, and may not conflict with another identifier in the same scope or any of its parent scopes.

**note**: Look up if I made mistakes with labels.

Various other groups of identifiers in the program may not conflict among themselves. These are:

- The polymorphic variables of type or procedure definitions.
- The field names of a struct.
- The variants of an enum.
- The identifiers of variants of a union (even though they might have a different number of embedded values).

## Polymorphic Argument Rules

Types and procedures may be defined using polymorphic variables. Whenever we use these types to construct literals, or use functions to perform calls, we may specify these as polymorphic arguments. One has the choice of:

1. Specifying all of the polymorphic arguments. Note that one may still allow for type inference by using the `auto` type in the place of a polymorphic argument.
2. Specifying none of the polymorphic arguments. In this case the compiler must assume that all of the polymorphic arguments are to be inferred (i.e. as if the programmer had written `auto` for each of the polymorphic arguments).

Any other case should be a compile-time error.

**note**: At the moment polymorphism in the language is rather underpowered, until a reasonable and simple scheme for constraining polymorphic variables can be found. At the moment one can only specify builtin types or struct/enum/union types as polymorphic arguments. Furthermore, functions and procedures defined with polymorphic variables may not construct literals using them.

## Expression Rules

### Assignment

Assignment expressions may only be placed at the statement level. That is to say: viewing a series of expressions as a tree, where each node has a parent expression or a parent statement (e.g. the test expression of an if-statement), assignment expressions may only have an expression statement as a parent.

The left hand side of an assignment expression should be assignable. An expression is assignable if it is:

- An indexing expression with an assignable subject expression.
- A slicing expression with an assignable subject expression.
- A select expression with an assignable subject expression.
- A variable expression.

### Literals

An enum literal is valid if its `TypeRef` points to an enum declaration (either directly, or through a potentially aliased import) and the subsequent identifier is a variant declared in the type definition.

A union literal is valid if its `TypeRef` points (indirectly) to a union declaration, the subsequent identifier is a variant in the type definition, and (where applicable) the number of embedded expressions matches the number of embedded types in the variant's definition.

A struct literal is valid if its `TypeRef` points (indirectly) to a struct declaration, it provides an expression to initialize each of the struct's fields, and it initializes each field only once.

For all of these literals the rules for polymorphic arguments apply: if the type specifies polymorphic arguments, then either all of them have to be specified or none of them have to be specified (for implicit type inference).

### Binding

A binding expression itself is only valid if it is nested in the expression tree under no other expressions, or only under binary expressions. The chain of parents must terminate at an if-statement's or while-statement's test expression. As a result, binding expressions may not be nested in one another.
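As a sketch of what this rule does and does not permit, reusing the `Option` union defined earlier (the variable names `maybe_number` and `other_flag` are made up for illustration):

```
// Valid: the binding is nested only under a binary operator ("&&"), and the
// chain of parent expressions terminates at the if-statement's test expression.
if (let Option::Some(value) = maybe_number && other_flag) {
    // `value` is implicitly declared here and usable within this then-block.
}

// Invalid: the binding is nested under a prefix operator, which is not a
// binary expression.
if (!let Option::Some(value) = maybe_number) { }
```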
+ +A binding expression's left hand side is valid if it is a binding variable, or if it is a literal that may contain a binding variable. A binding variable is a variable that has not yet been declared using a `StmtLocalMem`. Such a variable will then be implicitly be declared at the binding location, and usable within the scope of the if-statement's then-block, or the while-statement's loop body. The same scoping rules apply to that scope: so binding variables may not be declared twice, nor may conflict to any variables declared in the scope or it's parents. + +Furthermore, the parent of a binding variable may only be some kind of literal expression, or the left hand side of the binding expression itself. + +### Function Calls + +The `TypeRef` for a function call should resolve to either a builtin function, or a user-defined function. Function calls follow the rules for polymorphic arguments as outlined above. Furthermore the number of expressions given as arguments to the function should match the number of arguments in the definition of the function. + +### Component "Calls" + +The `TypeRef` for a component call should resolve to either a builtin component, or a user-defined component. Component calls, like function calls, must following the rules for polymorphic arguments and the arguments to the components themselves. Furthermore component "call" expressions may only be placed with a `StmtNew` as parent. That is to say: one doesn't really call components, instead one instantiates them. + +## Builtin Procedures + diff --git a/language_spec.md b/language_spec.md deleted file mode 100644 index 0e45dbf4ac71ae535689eb6b144b172bfd766e79..0000000000000000000000000000000000000000 --- a/language_spec.md +++ /dev/null @@ -1,291 +0,0 @@ -# Protocol Description Language - -## Introduction - -## Grammar - -Beginning with the basics from which we'll construct the grammar, various characters and special variations thereof: - -``` -SP = " " // space -HTAB = 0x09 // horizontal tab -VCHAR = 0x21-0x7E // visible ASCII character -VCHAR-ESCLESS = 0x20-0x5B | 0x5D-0x7E // visible ASCII character without "\" -WSP = SP | HTAB // whitespace -ALPHA = 0x41-0x5A | 0x61-0x7A // characters (lower and upper case) -DIGIT = 0x30-0x39 // digit -NEWLINE = (0x15 0x0A) | 0x0A // carriage return and line feed, or just line feed - -// Classic backslash escaping to produce particular ASCII charcters -ESCAPE_CHAR = "\" -ESCAPED_CHARS = - ESCAPE_CHAR ESCAPE_CHAR | - ESCAPE_CHAR "t" | - ESCAPE_CHAR "r" | - ESCAPE_CHAR "n" | - ESCAPE_CHAR "0" | - ESCAPE_CHAR "'" | - ESCAPE_CHAR """ -``` - -Which are composed into the following components of an input file that do not directly contribute towards the AST: - -``` -// asterisk followed by any ASCII char, excluding "/", or just any ASCII char without "*" -block-comment-contents = "*" (0x00-0x2E | 0x30-0x7E) | (0x20-0x29 | 0x2B-0x7E) -block-comment = "/*" block-comment-contents* "*/" -line-comment = "//" (WSP | VCHAR)* NEWLINE -comment = block-comment | line-comment -cw = (comment | WSP | NEWLINE)* -cwb = (comment | WSP | newline)+ -``` - -Where it should be noted that the `cw` rule allows for not encountering any of the indicated characters, while the `cwb` rule expects at least one instance. 
- -The following operators are defined: - -``` -binary-operator = "||" | "&&" | - "!=" | "==" | "<=" | ">=" | "<" | ">" | - "|" | "&" | "^" | "<<" | ">>" | - "+" | "-" | "*" | "/" | "%" -assign-operator = "=" | - "|=" | "&=" | "^=" | "<<=" | ">>=" | - "+=" | "-=" | "*=" | "/=" | "%=" -unary-operator = "++" | "--" | "+" | "-" | "~" | "!" -``` - -**QUESTION**: Do we include the pre/postfix "++" and "--" operators? They were introduced in C to reduce the amount of required characters. But is still necessary? - -And to define various constants in the language, we allow for the following: - -``` -// Various integer constants, binary, octal, decimal, or hexadecimal, with a -// utility underscore to enhance humans reading the characters. Allowing use to -// write something like 100_000_256 or 0xDEAD_BEEF -int-bin-char = "0" | "1" -int-bin-constant = "0b" int-bin-char (int-bin-char | "_")* // 0b0100_1110 -int-oct-char = "0"-"7" -int-oct-constant = "0o" int-oct-char (int-oct-char | "_")* // 0o777 -int-dec-constant = DIGIT (DIGIT | "_")* // -int-hex-char = DIGIT | "a"-"f" | "A"-"F" -int-hex-constant = "0x" int-hex-char (int-hex-char | "_")* // 0xFEFE_1337 -int-constant = int-bin-constant | int-oct-constant | int-dec-constant | int-hex-constant - -// Floating point numbers -// TODO: Maybe support exponential notation? Seems silly for a networking -// language, but might be useful? -float-constant = DIGIT* "." DIGIT+ - -// Character constants: a single character. Its element may be an escaped -// character or a VCHAR (excluding "'" and "\") -char-element = ESCAPED_CHARS | (0x20-0x26 | 0x28-0x5B | 0x5D-0x7E) -char-constant = "'" char-element "'" - -// Same thing for strings, but these may contain 0 or more characters -str-element = ESCAPED_CHARS | (0x20-0x21 | 0x23-0x5B | 0x5D-0x7E) -str-constant = """ str-element* """ -``` - -Note that the integer characters are forced, somewhat arbitrarily without hampering the programmer's expressiveness, to start with a valid digit. Only then may one introduce the `_` character. And non-rigorously speaking characters may not contain an unescaped `'`-character, and strings may not contain an unescaped `"`-character. - -We now introduce the various identifiers that exist within the language, we make a distinction between "any identifier" and "any identifier except for the builtin ones". Because we h - -``` -identifier-any = ALPHA | (ALPHA | DIGIT | "_")* -keyword = - "composite" | "primitive" | - type-primitive | "true" | "false" | "null" | - "struct" | "enum" | - "if" | "else" | - "while" | "break" | "continue" | "return" | - "synchronous" | "assert" | - "goto" | "skip" | "new" | "let" -builtin = "put" | "get" | "fires" | "create" | "assert" -identifier = identifier-any WITHOUT (keyword | builtin) - -// Identifier with any number of prefixed namespaces -ns-identifier = (identifier "::")* identifier -``` - -We then start introducing the type system. Learning from the "mistake" of C/C++ of having types like `byte` and `short` with unspecified and compiler-dependent byte-sizes (followed by everyone using `stdint.h`), we use the Rust/Zig-like `u8`, `i16`, etc. Currently we will limit the programmer to not produce integers which take up more than 64 bits. Furthermore, as one is writing network code, it would be quite neat to be able to put non-byte-aligned integers into a struct in order to directly access meaningful bits. Hence, with restrictions introduced later, we will allow for types like `i4` or `u1`. 
When actually retrieving them or performing computations with them we will use the next-largest byte-size to operate on them in "registers". - -**Question**: Difference between u1 and bool? Do we allow assignments between them? What about i1 and bool? - -As the language semantics are value-based, we are prevented from returning information from functions through its arguments. We may only return information through its (single) return value. If we consider the common case of having to parse a series of bytes into a meaningful struct, we cannot return both the struct and a value as a success indicator. For this reason, we introduce algebraic datatypes (or: tagged unions, or: enums) as well. - -Lastly, since functions are currently without internal side-effects (since functions cannot perform communication with components, and there is no functionality to interact "with the outside world" from within a function), it does not make sense to introduce the "void" type, as found in C/C++ to indicate that a function doesn't return anything of importance. However, internally we will allow for a "void" type, this will allow treating builtins such as "assert" and "put" like functions while constructing and evaluating the AST. - -``` -// The digits 1-64, without any leading zeros allowed, to allow specifying the -// signed and unsigned integer types -number-1-64 = NZ-DIGIT | (0x31-0x35 DIGIT) | ("6" 0x30-0x34) -type-signed-int = "i" number-1-64 // i1 through i64 -type-unsigned-int = "u" number-1-64 // u1 through u64 - -// Standard floats and bools -type-float = "f32" | "f64" -type-bool = "bool" - -// Messages, may be removed later -type-msg = "msg" - -// Indicators of port types -type-port = "in" | "out" - -// Unions and tagged unions, so we allow: -// enum SpecialBool { True, False } -// enum SpecialBool{True,False,} -// enum Tagged{Boolean(bool),SignedInt(i64),UnsignedInt(u64),Nothing} -type-union-element = identifier cw (("(" cw type cw ")") | ("=" cw int-constant))? -type-union-def = "enum" cwb identifier cw "{" cw type-union-element (cw "," cw type-union-element)* (cw ",")? cw "}" - -// Structs, so we allow: -// struct { u8 type, u2 flag0, u6 reserved } -type-struct-element = type cwb identifier -type-struct-def = "struct" cwb identifier cw "{" cw type-struct-element (cw "," cw type-struct-element)* (cw ",")? cw "}" - -type-primitive = type-signed-int | - type-unsigned-int | - type-float | - type-bool | - type-msg | - type-port - -// A type may be a user-defined type (e.g. "struct Bla"), a namespaced -// user type (e.g. "Module::Bla"), or a non-namespaced primitive type. We -// currently have no way (yet) to access nested modules, so we don't need to -// care about identifier nesting. -type = type-primitive | ns-identifier -``` - -With these types, we need to introduce some extra constant types. Ones that are used to construct struct instances and ones that are used to construct/assign enums. These are constructed as: - -``` -// Struct literals -struct-constant-element = identifier cw ":" cw expr -struct-constant = ns-identifier cw "{" cw struct-constant-element (cw "," struct-constant-element)* cw "}" - -enum-constant = ns-identifier "::" identifier cw "(" cw expr cw ")" -``` - -Finally, we declare methods and field accessors as: - -``` -method = builtin | ns-identifier - -field = "length" | identifier -``` - -**Question**: This requires some discussion. We allow for a "length" field on messages, and allow the definition of arrays. 
But if we wish to perform computation in a simple fashion, we need to allow for variable-length arrays of custom types. This requires builtin methods like "push", "pop", etc. But I suppose there is a much nicer way... In any case, this reminds me of programming in Fortran, which I definitely don't want to impose on other people (that, or I will force 72-character line lengths on them as well) - -When we parse a particular source file, we may expect the following "pragmas" to be sprinkled at the top of the source -file. They may exist at any position in the global scope of a source file. - -``` -// A domain identifier is a dot-separated sequence of identifiers. As these are -// only used to identify modules we allow any identifier to be used in them. -// The exception is the last identifier, which we, due to namespacing rules, -// force to be a non-reserved identifier. -domain-identifier = (identifier-any ".")* identifier - -pragma-version = "#version" cwb int-constant cw ";" // e.g. #version 500 -pragma-module = "#module" cwb domain-identifier cw ";" // e.g. #module hello.there - -// Import, e.g. -// #import module.submodule // access through submodule::function(), or submodule::Type -// #import module.submodule as Sub // access through Sub::function(), or Sub::type -// #import module.submodule::* // access through function(), or Type -// #import module.submodule::{function} // access through function() -// #import module.submodule::{function as func, type} // access through func() or type - -pragma-import-alias = cwb "as" cwb identifier -pragma-import-all = "::*" -pragma-import-single-symbol = "::" identifier pragma-import-alias? -pragma-import-multi-symbol = "::{" ... - cw identifier pragma-import-alias? ... - (cw "," cw identifier pragma-import-alias?)* ... - (cw ",")? cw "}" -pragma-import = "#import" cwb domain-identifier ... - (pragma-import-alias | pragma-import-all | pragma-import-single-symbol | pragma-import-multi-symbol)? - -// Custom pragmas for people which may be using (sometime, somewhere) -// metaprogramming with pragmas -pragma-custom = "#" identifier-any (cwb VCHAR (VCHAR | WS)*) cw ";" - -// Finally, a pragma may be any of the ones above -pragma = pragma-version | pragma-module | pragma-import | pragma-custom -``` - -Note that, different from C-like languages, we do require semicolons to exist at the end of a pragma statement. The reason is to prevent future hacks using the "\" character to indicate an end-of-line-but-not-really-end-of-line statements. - -Apart from these pragmas, we can have component definitions, type definitions and function definitions within the source file. The grammar for these may be formulated as: - -``` -// Annotated types and function/component arguments -type-annotation = type (cw [])? -var-declaration = type-annotation cwb identifier -params-list = "(" cw (var-declaration (cw "," cw var-declaration)*)? 
cw ")" - -// Functions and components -function-def = type-annotation cwb identifier cw params-list cw block -composite-def = "composite" cwb identifier cw params-list cw block -primitive-def = "primitive" cwb identifier cw params-list cw block -component-def = composite-def | primitive-def - -// Symbol definitions now become -symbol-def = type-union-def | type-struct-def | function-def | component-def -``` - -Using these rules, we can now describe the grammar of a single file as: - -``` -file = cw (pragma | symbol-def)* cw -``` - -Of course, we currently cannot do anything useful with our grammar, hence we have to describe blocks to let the functions and component definitions do something. To do so, we proceed as: - -``` -// channel a->b;, or channel a -> b; -channel-decl = channel cwb identifier cw "->" cw identifier cw ";" -// int a = 5, b = 2 + 3; -memory-decl = var-declaration cw "=" cw expression (cw "," cw identifier cw "=" cw expression)* cw ";" - -stmt = block | - identifier cw ":" cw stmt | // label - "if" cw pexpr cw stmt (cw "else" cwb stmt)? | - "while" cw pexpr cw stmt | - "break" (cwb identifier)? cw ";" | - "continue" (cwb identifier)? cw ";" | - "synchronous" stmt | - "return" cwb identifier cw ";" | - "goto" cwb identifier cw ";" | - "skip" cw ";" | - "new" cwb method-expr cw ";" | - expr cw ";" - -// TODO: Add all the other expressions -// TODO: Also: add struct construction and enum construction -method-params-list = "(" cw (expr (cw "," cw expr)* )? cw ")" -method-expr = method cw method-params-list - -enum-destructure-expr = "let" cw ns-identifier "::" identifier cw "(" cw identifier cw ")" cw "=" expr -enum-test-expr = ns-identifier "::" identifier cw "==" cw expr - -block = "{" (cw (channel-decl | memory-decl | stmt))* cw "}" -``` - -Note that we have a potential collision of various expressions/statements. The following cases are of importance: - -1. An empty block is written as `{}`, while an empty array construction is also written as `{}`. -2. Both function calls as enum constants feature the same construction syntax. That is: `foo::bar(expression)` may refer to a function call to `bar` in the namespace `foo`, but may also be the construction of enum `foo`'s `bar` variant (containing a value `expression`). These may be disambiguated using the type system. -3. The enumeration destructuring expression may collide with the constant enumeration literal. These may be disambiguated by looking at the inner value. If the inner value is an identifier and not yet defined as a variable, then it is a destructuring expression. Otherwise it must be interpreted as a constant enumeration. The enumeration destructuring expression must then be completed by it being a child of an binary equality operator. If not, then it is invalid syntax. - -Finally, for consistency, there are additional rules to the enumeration destructuring. As a preamble: the language should allow programmers to express any kind of trickery they want, as long as it is correct. But programmers should be prevented from expressing something that is by definition incorrect/illogical. So enumeration destructuring (e.g. `Enum::Variant(bla) == expression`) should return a value with a special type (e.g. `EnumDestructureBool`) that may only reside within the testing expressions of `if` and `while` statements. Furthermore, this special boolean type only supports the logical-and (`&&`) operator. This way we prevent invalid expressions such as `if (Enum::Variant1(foo) == expr || Enum::Variant2(bar) == expr) { ... 
}`, but we do allow potentially valid expressions like `if (Enum::Variant1(foo) == expr_foo && Enum::Variant2(bar) == expr_bar) { ... }`. - -**Question**: In the documentation for V1.0 we find the `synchronous cw (params-list cw stmt | block)` rule. Why the `params-list`? - -**TODO**: Release constructions on memory declarations: as long as we have a write to it before a read we should be fine. Can be done once we add semantic analysis in order to optimize putting and getting port values. -**TODO**: Implement type inference, should be simpler once I figure out how to write a typechecker. -**TODO**: Add constants assigned in the global scope. -**TODO**: Add a runtime expression evaluator (probably before constants in global scope) to simplify expressions and/or remove impossible branches. \ No newline at end of file