diff --git a/notes_max.md b/notes_max.md index fcfb3e8b6d9fe970f0a2548a7741642c64bbf4e9..6ee6773d472ccc029713f344aeeb97437850743d 100644 --- a/notes_max.md +++ b/notes_max.md @@ -54,45 +54,77 @@ For now we will use the C++-like trait/interfaceless polymorphism: once we know Within functions and connectors we may employ polymorphic types. The specification of the polymorphic arguments is not required if they can be inferred. Polymorphic arguments may be partially specified by using somekind of throwaway `_` or `auto` type. When we are monomorphizing a type we must be able to fully determine all of the polymorphic arguments, if we can't then we throw a compiler error. -## Conclusions +## Type Inference - Monomorphization -- - Type definitions may contain polymorphic arguments. These are simple identifiers. The polymorphic arguments are embedded in definitions, they are erased after type inference on the type's use (e.g. function call or struct instantiation) has finished. +There are two parts to type inference: determining polymorphic variables and determining the `auto` variables. Among the polymorphic types we have: `struct`s, `enum`s, `union`s, `component`s and `function`s. The first three are datatypes and have no body to infer. The latter two are, for the lack of a better word, procedural types or proctypes. -- - While instantiating polymorphic types (e.g. `T something = call_a_func(3, 5)` or the inferred variant `let something = call_a_func(3, 5)`) we actually need to resolve these types. This implies: - - - Memory declarations will have their type embedded within the memory declaration itself. If we do not know the type yet, or only partially know the type, then we need to mark the type as being inferred. - - Expressions will have their type inferred from the constraints the expression imposes on its arguments, and the types of the arguments themselves. Within an expression tree type information may flow from the root towards the leaves or from the leaves towards the root. - - This implies that type information in the AST must be fully specified. If the type information is not yet fully known then it must be embedded in some kind of datastructure that is used for type inference. +We may only infer the types within a proctype if its polymorphic arguments are already inferred. Hence the type inference algorithm for a proctype does not work on the polymorphic types in its body. The type inference algorithm needs to work on all of the `auto` types in the body (potentially those of polymorphic types without explicitly specified polymorphic arguments). + +If the type inference algorithm determines the arguments to a polymorphic type within the body of a proctype then we need to (recursively) instantiate a monomorph of that type. This process is potentially recursive in two ways: + +1. **for datatypes**: when one instantiates a monomorph we suddenly know all of the embedded types within a datatype. At this point we may determine whether the type is potentially recursive or not. We may use this fact to mark struct members or union variants as pointerlike. We can at this point lay out the size and the offsets of the type. -- - A function definition with polyargs seems to behave a bit differently: During the initial definition parsing we may want to make sure that each type in the definition either resolves to a concrete type (optionally with polymorphic arguments) or to one of the polyargs. This would also provide information to the type inference algorithm. +2. **for proctypes**: when one instantiates a monomorph we can restart the inference process for the body of the function. During this process we may arrive at new monomorphs for datatypes or new monomorphs for function types. -## Conclusions - Parsing +## Type Inference - The Algorithm -During parsing we want to embed all of the information the programmer has provided to the compiler in the AST. So polyargs go inside type definitions, types that are supposed to be inferred are embedded in the AST, types that depend on polyargs also go into the AST. So we may have: +So as determined above: when we perform type inference for a function we create a lookup for the assigned polymorphic variables and assign these where used in the expression trees. After this we end up with expression trees with `auto` types scattered here and there, potentially embedded within concrete types (e.g. `in>>`, an input port from which we get arrays of a monomorphized struct `SomeStruct` whose second argument should be inferred). -- Partially concrete types with embedded inferred types (e.g. `let[] list_of_things` is an `Array` or `Result thing = Result::Ok(5)` or `let thing = Result::Ok(5)`). -- Direct references to polyarg types (e.g. `T list_of_things` or `Result thing = ...` or `T[][] array_of_arrays_yes_sir` and `Tuple2 = construct_a_thing()`) -- Fully inferred types (e.g. `auto thing = 5` or `auto result = call_a_thing()`) -- Completely specified types. +Since memory statements have already been converted into variable declarations (present in the appropriate scope) we only truly need to care about the expression trees that reside within a body. The one exception is the place where those expression trees are used: the expression tree used as the condition in an `if` statement must have a boolean return type. We have the following types of expressions, together with the constraints they impose on the inference algorithm (note that I will use the word "matching" types a lot. With that I will disregard implicit casting, for now. This is later fixed by inserting casting expressions that allow for (builtin) type conversion): + +- **assignment**: + Assignment expression (e.g. `a = 5 + 2` or `a <<= 2`). Where we make the distinction between: + + - **=**: We require that the RHS type matches the type of the LHS. If any side is known (partially) then the other side may be inferred. The return type is the type of the LHS. + - **\*=**, **/=**, **%=**, **+=**, **-=**: We require that the LHS is of the appropriate number type and the RHS is of equal type. The return type is the type of the LHS. + - **<<=**, **>>=**: The LHS is of an integer type and the RHS is of any integer type. The return type is the type of the LHS. + - **&=**, **|=**, **^=**: The LHS and the RHS are of equal integer type. The return type is the type of the LHS. + +- **conditional**: + C-like ternary operator (e.g. `something() ? true : false`). We require that the test-expression is of boolean type. We require that the true-expression and the false-expression are of equal type. The return type is of the same type as the true/false-expression return type. + +- **binary**: + Binary operator, where we make the distinction between: + - **@** (concatenate): *I might kick this one out*, requires that the LHS and RHS are arrays with equal subtype (`Array`), or LHS and RHS are strings. The result is an array with the same subtype (`Array`), or string. + - **||**, **&&**: Both LHS and RHS must be booleans. The result is a boolean. + - **|**, **&**, **^**: Both LHS and RHS must be of equal integer type. The return type is the type of the LHS. + - **==**, **!=**: Both LHS and RHS must be of the same type. The return type is boolean + - **<**, **>**, **<=**, **>=**: Both LHS and RHS must be of equal numeric type. The return type is boolean. + - **<<**, **>>**: The LHS is of an integer type and the RHS if of any integer type. The return type is the type of the LHS. + - **+**, **-**, **\***, **/**, **%**: The LHS and RHS are of equal integer type. The return type is the type of the LHS. + +- **unary**: + Unary operator, where we make the distinction between: + **+**, **-**: Argument may be any numeric type, the return type is the same as the argument. + **!**: Argument must be of boolean type, the return type is also boolean. + **~**: Argument must be of any integer type, the return type is the same as the argument. + **--**, **++**: Argument must be of any integer type. The return type is the same as the argument. + +- **indexing expression**: + Indexes into an array or a string (e.g. `variable[i]`). The index must always be a `usize`/`isize` integer type (with the appropriate implicit casts). The subject must always be of type `Array` or string and the result is of type `T`. + +- **slicing expression**: + Slices an array or a string (e.g. `some_array[5..calculate_the_end()]`). The beginning and ending indices must always be of **equal** `usize`/`isize` integer type. The subject must always be of type `Array` or string, the return type is of the same type as the subject. + +- **select expression**: + Selects a member from a struct (e.g. `some_struct.some_field`). This expression can only be evaluated if we know the type of the subject (which must be a struct). If the field exists on that struct then we may determine the return type to be the type of that field. + +- **array expression**: + Constructs an array by explicitly specifying the members of that array. Lets assume for now that you cannot specify a string in this silly fashion. Then all of the array elements must have the same type `T`, and the result type of the expression is `Array`. -## Conclusions - Type Inference +- **call expression**: + Calls a particular function (e.g. `some_function(some_arg)`). Non-polymorphic function arguments imply that the argument must be of the type of the argument. A non-polymorphic return type fixes the output type of the expression. -During type inference and typechecking we need to determine all concrete types. So we will arrive at a fully specified type: every polymorphic type will have its polyargs specified. Likewise every inferred type will be fully specified as well. + In the polymorphic case the inference works the other way around: once we know the output type of the expression that is used as a polymorphic variable, then we can fix that particular polymorphic argument and feed that information to all places where that polymorphic argument is used. -All of this hints at having to specify two classes of types: `ParserType` and a `Type`. The `ParserType` can be: + The same is true for the return type: if the return type of the call is polymorphic then the place where this return type is used determines the type of the polymorphic argument. -- A builtin type (with embedded `ParserType`s where appropriate) -- An inferred type -- A polyarg type -- A symbolic type (with embedded `ParsedType`s where appropriate) +- **constant expression**: *This requires a lot of rework at some point.* + A constant expression contains a literal of a specific type. Hence the output type of the literal is the type of the literal itself. In case of literal `struct`, `enum` and `union` instances the output type is that exact datatype. -While the final `Type` can be: + For the embedded types we have two cases: if the embedded type is not polymorphic and some expression is used at that position, then the embedded type must match the output type of the expression. If the embedded type is polymorphic then the output type of the expression determines the polymorphic argument. -- A builtin type (with embedded types where appropriate) -- A known user-defined type (with embedded types where appropriate) + In case of `string` literals the output type is also a `string`. However, in the case of integer literals we can only determine that the output type is some integer, the place where the literal is employed determines the integer type. It is valid to use this for byteshifting, but also for operating on floating-point types. In the case of decimal literals we can use these for operations on floating point types. But again: whether it is a `float` or a `double` depends on the place where the literal is used. -Note that we cannot typecheck polymorphic connectors and functions until we monomorphize them. \ No newline at end of file +- **variable expression**: + Refers to some variable declared somewhere. Hence the return type is the same as the type of the variable. \ No newline at end of file