diff --git a/notes_max.md b/notes_max.md deleted file mode 100644 index 3b771c846278c9851d416d2f969afdc6b68b05cb..0000000000000000000000000000000000000000 --- a/notes_max.md +++ /dev/null @@ -1,146 +0,0 @@ -# Noteworthy changes - -- `if`, `while`, `synchronous`, `function`, `primitive` and `composite` body statements, if not yet a block, are all converted into a block for simpler parsing/compiling. - -# Thinkings: Type System - -## Basic Types - -Builtin primitive types: - -- Boolean: `bool` -- Unsigned integers: `u8`, `u16`, `u32`, `u64` -- Signed integers: `i8`, `i16`, `i32`, `i64` -- Decimal numbers: `f32`, `f64` -- Message-based: `in`, `out`, `msg` - -Builtin compound types: - -- Array: `array` - -User-specified types (excluding polymorphism for now): - -- Connectors: `primitive`, `composite` -- Functions: `func` -- Enums: `enum` -- Unions: `enum` -- Structs: `struct` - -## Polymorphic Type Definitions - -Various types may be polymorphic. We will exclude builtin polymorphs from the following considerations (i.e. `in`, `out` and `array`, and their associated functions). - -Connectors, functions, enums, unions and structs may all be polymorphic. We'll use the C++/rust/Java syntax for specifying polymorphic types: - -```pdl -primitive fifo(in i, out o) { - // impl -} - -T pick_one_of_two(T one, T two) { - // impl -} - -enum Option{ None, Some(T) } -enum Result{ Ok(T), Err(E) } -struct Pair { T first, T second } -``` - -This means that during the initial validation/linker phase all of the types may have polymorphic arguments. These polyargs have just an identifier, we can only determine the concrete type when we encounter it in another body where we either defer the type or where the user has explicitly instantiated the type. - -For now we will use the C++-like trait/interfaceless polymorphism: once we know all of the polymorphic arguments we will try to monomorphize the type and check whether or not all calls make sense. This in itself is a recursive operation: inside the polymorph we may use other polymorphic types. - -## Polymorphic Type Usage - -Within functions and connectors we may employ polymorphic types. The specification of the polymorphic arguments is not required if they can be inferred. Polymorphic arguments may be partially specified by using somekind of throwaway `_` or `auto` type. When we are monomorphizing a type we must be able to fully determine all of the polymorphic arguments, if we can't then we throw a compiler error. - -## Type Inference - Monomorphization - -There are two parts to type inference: determining polymorphic variables and determining the `auto` variables. Among the polymorphic types we have: `struct`s, `enum`s, `union`s, `component`s and `function`s. The first three are datatypes and have no body to infer. The latter two are, for the lack of a better word, procedural types or proctypes. - -We may only infer the types within a proctype if its polymorphic arguments are already inferred. Hence the type inference algorithm for a proctype does not work on the polymorphic types in its body. The type inference algorithm needs to work on all of the `auto` types in the body (potentially those of polymorphic types without explicitly specified polymorphic arguments). - -If the type inference algorithm determines the arguments to a polymorphic type within the body of a proctype then we need to (recursively) instantiate a monomorph of that type. This process is potentially recursive in two ways: - -1. 
**for datatypes**: when one instantiates a monomorph we suddenly know all of the embedded types within a datatype. At this point we may determine whether the type is potentially recursive or not. We may use this fact to mark struct members or union variants as pointerlike. We can at this point lay out the size and the offsets of the type. - -2. **for proctypes**: when one instantiates a monomorph we can restart the inference process for the body of the function. During this process we may arrive at new monomorphs for datatypes or new monomorphs for function types. - -## Type Inference - The Algorithm - -So as determined above: when we perform type inference for a function we create a lookup for the assigned polymorphic variables and assign these where used in the expression trees. After this we end up with expression trees with `auto` types scattered here and there, potentially embedded within concrete types (e.g. `in>>`, an input port from which we get arrays of a monomorphized struct `SomeStruct` whose second argument should be inferred). - -Since memory statements have already been converted into variable declarations (present in the appropriate scope) we only truly need to care about the expression trees that reside within a body. The one exception is the place where those expression trees are used: the expression tree used as the condition in an `if` statement must have a boolean return type. We have the following types of expressions, together with the constraints they impose on the inference algorithm (note that I will use the word "matching" types a lot. With that I will disregard implicit casting, for now. This is later fixed by inserting casting expressions that allow for (builtin) type conversion): - -- **assignment**: - Assignment expression (e.g. `a = 5 + 2` or `a <<= 2`). Where we make the distinction between: - - - **=**: We require that the RHS type matches the type of the LHS. If any side is known (partially) then the other side may be inferred. The return type is the type of the LHS. - - **\*=**, **/=**, **%=**, **+=**, **-=**: We require that the LHS is of the appropriate number type and the RHS is of equal type. The return type is the type of the LHS. - - **<<=**, **>>=**: The LHS is of an integer type and the RHS is of any integer type. The return type is the type of the LHS. - - **&=**, **|=**, **^=**: The LHS and the RHS are of equal integer type. The return type is the type of the LHS. - -- **conditional**: - C-like ternary operator (e.g. `something() ? true : false`). We require that the test-expression is of boolean type. We require that the true-expression and the false-expression are of equal type. The return type is of the same type as the true/false-expression return type. - -- **binary**: - Binary operator, where we make the distinction between: - - **@** (concatenate): *I might kick this one out*, requires that the LHS and RHS are arrays with equal subtype (`Array`), or LHS and RHS are strings. The result is an array with the same subtype (`Array`), or string. - - **||**, **&&**: Both LHS and RHS must be booleans. The result is a boolean. - - **|**, **&**, **^**: Both LHS and RHS must be of equal integer type. The return type is the type of the LHS. - - **==**, **!=**: Both LHS and RHS must be of the same type. The return type is boolean - - **<**, **>**, **<=**, **>=**: Both LHS and RHS must be of equal numeric type. The return type is boolean. - - **<<**, **>>**: The LHS is of an integer type and the RHS if of any integer type. 
The return type is the type of the LHS. - - **+**, **-**, **\***, **/**, **%**: The LHS and RHS are of equal integer type. The return type is the type of the LHS. - -- **unary**: - Unary operator, where we make the distinction between: - **+**, **-**: Argument may be any numeric type, the return type is the same as the argument. - **!**: Argument must be of boolean type, the return type is also boolean. - **~**: Argument must be of any integer type, the return type is the same as the argument. - **--**, **++**: Argument must be of any integer type. The return type is the same as the argument. - -- **indexing expression**: - Indexes into an array or a string (e.g. `variable[i]`). The index must always be a `usize`/`isize` integer type (with the appropriate implicit casts). The subject must always be of type `Array` or string and the result is of type `T`. - -- **slicing expression**: - Slices an array or a string (e.g. `some_array[5..calculate_the_end()]`). The beginning and ending indices must always be of **equal** `usize`/`isize` integer type. The subject must always be of type `Array` or string, the return type is of the same type as the subject. - -- **select expression**: - Selects a member from a struct (e.g. `some_struct.some_field`). This expression can only be evaluated if we know the type of the subject (which must be a struct). If the field exists on that struct then we may determine the return type to be the type of that field. - -- **array expression**: - Constructs an array by explicitly specifying the members of that array. Lets assume for now that you cannot specify a string in this silly fashion. Then all of the array elements must have the same type `T`, and the result type of the expression is `Array`. - -- **call expression**: - Calls a particular function (e.g. `some_function(some_arg)`). Non-polymorphic function arguments imply that the argument must be of the type of the argument. A non-polymorphic return type fixes the output type of the expression. - - In the polymorphic case the inference works the other way around: once we know the output type of the expression that is used as a polymorphic variable, then we can fix that particular polymorphic argument and feed that information to all places where that polymorphic argument is used. - - The same is true for the return type: if the return type of the call is polymorphic then the place where this return type is used determines the type of the polymorphic argument. - -- **constant expression**: *This requires a lot of rework at some point.* - A constant expression contains a literal of a specific type. Hence the output type of the literal is the type of the literal itself. In case of literal `struct`, `enum` and `union` instances the output type is that exact datatype. - - For the embedded types we have two cases: if the embedded type is not polymorphic and some expression is used at that position, then the embedded type must match the output type of the expression. If the embedded type is polymorphic then the output type of the expression determines the polymorphic argument. - - In case of `string` literals the output type is also a `string`. However, in the case of integer literals we can only determine that the output type is some integer, the place where the literal is employed determines the integer type. It is valid to use this for byteshifting, but also for operating on floating-point types. In the case of decimal literals we can use these for operations on floating point types. 
But again: whether it is a `float` or a `double` depends on the place where the literal is used. - -- **variable expression**: - Refers to some variable declared somewhere. Hence the return type is the same as the type of the variable. - -## Type Inference - The Concrete Algorithmic Steps - -Lets start with places where the parser types (i.e. not the types assigned to the nodes in the expression trees) are actually specified: - -- Function and component arguments -- Function return types -- Local variable declarations (from both regular and channel variables) -- Struct fields -- Enum variants - -For now we do not consider struct/enum/union literals yet. We perform type inference in bodies of functions/components. Whenever we actually lay out a monomorphed proctype we already know the polymorphic arguments of that proctype. Hence, by definition, we know all the types of the proctype arguments and function return types of the proctype we're resolving. - -Hence we encounter: -- Inferred types of variable declarations: These may be fully inferred, partially inferred, or depend on the polymorphic variables of the wrapping proctype's polymorphic arguments (which are determined). Hence if they're somehow inferred, then we mark them as such. -- Inferred types of polyargs of called functions/components: Special case where we use the return type of the proctype and the expressions used as arguments to determine the polymorphic types. Once we know the polymorphic types we may determine other types, or the other way around. Now that I think about it, this seems like a special case of inference in the call/literal expression itself. So there seems to be no reason to pay particular attention to this. \ No newline at end of file diff --git a/src/protocol/parser/mod.rs b/src/protocol/parser/mod.rs index f1020b6808e082bf53e036106d419aada60a8c0f..55863e937755362f19a0a6deb35d74b09b6751d1 100644 --- a/src/protocol/parser/mod.rs +++ b/src/protocol/parser/mod.rs @@ -2,7 +2,7 @@ mod depth_visitor; mod symbol_table; // mod type_table_old; mod type_table; -// mod type_resolver; +mod type_resolver; mod visitor; mod visitor_linker; mod utils; diff --git a/src/protocol/parser/type_resolver.rs b/src/protocol/parser/type_resolver.rs index fbe7543187f3b9ae66fe601ef795f59cd5a84035..56d7b441234e7c575c8c677a3d10e0a9e77816ac 100644 --- a/src/protocol/parser/type_resolver.rs +++ b/src/protocol/parser/type_resolver.rs @@ -1,3 +1,5 @@ +use std::collections::{HashMap, VecDeque}; + use crate::protocol::ast::*; use crate::protocol::inputsource::*; use super::type_table::*; @@ -9,11 +11,14 @@ use super::visitor::{ Visitor2, VisitorResult }; -use std::collections::HashMap; +#[derive(Clone, Eq, PartialEq)] pub(crate) enum InferredPart { - // Unknown section of inferred type, yet to be inferred + // Unknown type, yet to be inferred Unknown, + // Special cases + Void, // For builtin functions without a return type. 
Result of a "call to a component" + IntegerLike, // For integer literals without a concrete type // No subtypes Message, Bool, @@ -31,6 +36,16 @@ pub(crate) enum InferredPart { Instance(DefinitionId, usize), } +impl InferredPart { + fn is_concrete_int(&self) -> bool { + use InferredPart as IP; + match self { + IP::Byte | IP::Short | IP::Int | IP::Long => true, + _ => false + } + } +} + impl From for InferredPart { fn from(v: ConcreteTypeVariant) -> Self { use ConcreteTypeVariant as CTV; @@ -54,24 +69,194 @@ impl From for InferredPart { } pub(crate) struct InferenceType { - origin: ParserTypeId, done: bool, - inferred: Vec, + parts: Vec, } impl InferenceType { - fn new(inferred_type: ParserTypeId) -> Self { - Self{ origin: inferred_type, done: false, inferred: vec![InferredPart::Unknown] } + fn new(done: bool, inferred: Vec) -> Self { + Self{ done, parts: inferred } + } + + fn find_subtree_end_idx(&self, mut idx: usize) -> usize { + use InferredPart as IP; + + let mut depth = 1; + loop { + match self.parts[idx] { + IP::Unknown | IP::Void | IP::IntegerLike | + IP::Message | IP::Bool | + IP::Byte | IP::Short | IP::Int | IP::Long | + IP::String => { + depth -= 1; + }, + IP::Array | IP::Slice | IP::Input | IP::Output => { + // depth remains unaltered + }, + IP::Instance(_, num_sub) => { + depth += (num_sub as i32) - 1 + } + } + + idx += 1; + if depth == 0 { + return idx + } + } } - fn assign_concrete(&mut self, concrete_type: &ConcreteType) { - self.done = true; - self.inferred.clear(); - self.inferred.reserve(concrete_type.v.len()); - for variant in concrete_type.v { - self.inferred.push(InferredPart::from(variant)) + // TODO: @float + fn might_be_numeric(&self) -> bool { + use InferredPart as IP; + + debug_assert!(!self.parts.is_empty()); + if self.parts.len() != 1 { return false; } + match self.parts[0] { + IP::Unknown | IP::IntegerLike | IP::Byte | IP::Short | IP::Int | IP::Long => true, + _ => false } } + + fn might_be_integer(&self) -> bool { + // TODO: @float Once floats are implemented this is no longer true + self.might_be_numeric() + } +} + +impl std::fmt::Display for InferenceType { + fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result { + fn write_recursive(arg: ) + } +} + +enum InferenceResult { + Neither, // neither argument is clarified + First, // first argument is clarified using the second one + Second, // second argument is clarified using the first one + Both, // both arguments are clarified + Incompatible, // types are incompatible: programmer error +} + +impl InferenceResult { + fn modified_any(&self) -> bool { + match self { + InferenceResult::First | InferenceResult::Second | InferenceResult::Both => true, + _ => false + } + } + fn modified_lhs(&self) -> bool { + match self { + InferenceResult::First | InferenceResult::Both => true, + _ => false + } + } +} + +// Attempts to infer types within two `InferenceType` instances. If they are +// compatible then the result indicates which of the arguments were modified. +// After successful inference the parts of the inferred type have equal length. +// +// Personal note: inference is not the copying of types: this algorithm must +// infer that `TypeOuter` and `Struct` resolves to +// `TypeOuter`. +unsafe fn progress_inference_types(a: *mut InferenceType, b: *mut InferenceType) -> InferenceResult { + debug_assert!(!a.parts.is_empty()); + debug_assert!(!b.parts.is_empty()); + + // Iterate over the elements of both types. Each (partially) known type + // element must be compatible. 
Unknown types will be replaced by their + // concrete counterpart if possible. + let mut modified_a = false; + let mut modified_b = false; + let mut iter_idx = 0; + + while iter_idx < a.parts.len() { + let a_part = &a.parts[iter_idx]; + let b_part = &b.parts[iter_idx]; + + if a_part == b_part { + // Exact match + iter_idx += 1; + continue; + } + + // Not an exact match, so deal with edge cases + // - inference of integerlike types to conrete integers + if a_part == InferredPart::IntegerLike && b_part.is_concrete_int() { + modified_a = true; + a.parts[iter_idx] = b_part.clone(); + iter_idx += 1; + continue; + } + if b_part == InferredPart::IntegerLike && a_part.is_concrete_int() { + modified_b = true; + b.parts[iter_idx] = a_part.clone(); + iter_idx += 1; + continue; + } + + // - inference of unknown type + if a_part == InferredPart::Unknown { + let end_idx = b.find_subtree_end_idx(iter_idx); + a.parts[iter_idx] = b.parts[iter_idx].clone(); + for insert_idx in (iter_idx + 1)..end_idx { + a.parts.insert(insert_idx, b.parts[insert_idx].clone()); + } + + modified_a = true; + iter_idx = end_idx; + continue; + } + + if b_part == InferredPart::Unknown { + let end_idx = a.find_subtree_end_idx(iter_idx); + b.parts[iter_idx] = a.parts[iter_idx].clone(); + for insert_idx in (iter_idx + 1)..end_idx { + b.parts.insert(insert_idx, a.parts[insert_idx].clone()); + } + + modified_b = true; + iter_idx = end_idx; + continue; + } + + // If here then we cannot match the two parts + return InferenceResult::Incompatible; + } + + // TODO: @performance, can be done inline + a.done = true; + for part in &a.parts { + if part == InferredPart::Unknown { + a.done = false; + break + } + } + + b.done = true; + for part in &b.parts { + if part == InferredPart::Unknown { + b.done = false; + break; + } + } + + // If the inference parts are correctly constructed (>0 elements, proper + // trees) then being here means that both types are not only of compatible + // types, but MUST have the same length. + debug_assert_eq!(a.parts.len(), b.parts.len()); + match (modified_a, modified_b) { + (true, true) => InferenceResult::Both, + (true, false) => InferenceResult::First, + (false, true) => InferenceResult::Second, + (false, false) => InferenceResult::Neither + } +} + +enum DefinitionType{ + None, + Component(ComponentId), + Function(FunctionId), } // TODO: @cleanup I will do a very dirty implementation first, because I have no idea @@ -95,6 +280,8 @@ impl InferenceType { /// This will be achieved by slowly descending into the AST. At any given /// expression we may depend on pub(crate) struct TypeResolvingVisitor { + definition_type: DefinitionType, + // Buffers for iteration over substatements and subexpressions stmt_buffer: Vec, expr_buffer: Vec, @@ -105,27 +292,29 @@ pub(crate) struct TypeResolvingVisitor { polyvars: Vec, // Mapping from parser type to inferred type. We attempt to continue to // specify these types until we're stuck or we've fully determined the type. 
- infer_types: HashMap, - // Mapping from variable ID to parser type, optionally inferred, so then - var_types: HashMap, + infer_types: HashMap, + expr_types: HashMap, } impl TypeResolvingVisitor { pub(crate) fn new() -> Self { TypeResolvingVisitor{ + definition_type: DefinitionType::None, stmt_buffer: Vec::with_capacity(STMT_BUFFER_INIT_CAPACITY), expr_buffer: Vec::with_capacity(EXPR_BUFFER_INIT_CAPACITY), polyvars: Vec::new(), infer_types: HashMap::new(), - var_types: HashMap::new(), + expr_types: HashMap::new(), } } fn reset(&mut self) { + self.definition_type = DefinitionType::None; self.stmt_buffer.clear(); self.expr_buffer.clear(); + self.polyvars.clear(); self.infer_types.clear(); - self.var_types.clear(); + self.expr_types.clear(); } } @@ -134,10 +323,16 @@ impl Visitor2 for TypeResolvingVisitor { fn visit_component_definition(&mut self, ctx: &mut Ctx, id: ComponentId) -> VisitorResult { self.reset(); + self.definition_type = DefinitionType::Component(id); + let comp_def = &ctx.heap[id]; + debug_assert_eq!(comp_def.poly_vars.len(), self.polyvars.len(), "component polyvars do not match imposed polyvars"); + for param_id in comp_def.parameters.clone() { let param = &ctx.heap[param_id]; - self.var_types.insert(param_id.upcast(), param.parser_type); + let infer_type = self.determine_inference_type_from_parser_type(ctx, param.parser_type); + debug_assert!(infer_type.done, "expected component arguments to be concrete types"); + self.infer_types.insert(param_id.upcast(), infer_type); } let body_stmt_id = ctx.heap[id].body; @@ -146,14 +341,19 @@ impl Visitor2 for TypeResolvingVisitor { fn visit_function_definition(&mut self, ctx: &mut Ctx, id: FunctionId) -> VisitorResult { self.reset(); + self.definition_type = DefinitionType::Function(id); + let func_def = &ctx.heap[id]; + debug_assert_eq!(func_def.poly_vars.len(), self.polyvars.len(), "function polyvars do not match imposed polyvars"); + for param_id in func_def.parameters.clone() { let param = &ctx.heap[param_id]; - self.var_types.insert(param_id.upcast(), param.parser_type); + let infer_type = self.determine_inference_type_from_parser_type(ctx, param.parser_type); + debug_assert!(infer_type.done, "expected function arguments to be concrete types"); + self.infer_types.insert(param_id.upcast(), infer_type); } - let body_stmt_id = ctx.heap[id].body; - + let body_stmt_id = ctx.heap[id].body; self.visit_stmt(ctx, body_stmt_id) } @@ -172,67 +372,413 @@ impl Visitor2 for TypeResolvingVisitor { fn visit_local_memory_stmt(&mut self, ctx: &mut Ctx, id: MemoryStatementId) -> VisitorResult { let memory_stmt = &ctx.heap[id]; + let local = &ctx.heap[memory_stmt.variable]; - self.var_types.insert(memory_stmt.variable, ) + let infer_type = self.determine_inference_type_from_parser_type(ctx, local.parser_type); + self.infer_types.insert(memory_stmt.variable.upcast(), infer_type); + + let expr_id = memory_stmt.initial; + self.visit_expr(ctx, expr_id)?; Ok(()) } fn visit_local_channel_stmt(&mut self, ctx: &mut Ctx, id: ChannelStatementId) -> VisitorResult { + let channel_stmt = &ctx.heap[id]; + + let from_local = &ctx.heap[channel_stmt.from]; + let from_infer_type = self.determine_inference_type_from_parser_type(ctx, from_local.parser_type); + self.infer_types.insert(from_local.this.upcast(), from_infer_type); + + let to_local = &ctx.heap[channel_stmt.to]; + let to_infer_type = self.determine_inference_type_from_parser_type(ctx, to_local.parser_type); + self.infer_types.insert(to_local.this.upcast(), to_infer_type); + Ok(()) } + + fn 
visit_labeled_stmt(&mut self, ctx: &mut Ctx, id: LabeledStatementId) -> VisitorResult {
+        let labeled_stmt = &ctx.heap[id];
+        let substmt_id = labeled_stmt.body;
+        self.visit_stmt(ctx, substmt_id)
+    }
+
+    fn visit_if_stmt(&mut self, ctx: &mut Ctx, id: IfStatementId) -> VisitorResult {
+        let if_stmt = &ctx.heap[id];
+
+        let true_body_id = if_stmt.true_body;
+        let false_body_id = if_stmt.false_body;
+        let test_expr_id = if_stmt.test;
+
+        self.visit_expr(ctx, test_expr_id)?;
+        self.visit_stmt(ctx, true_body_id)?;
+        self.visit_stmt(ctx, false_body_id)?;
+
+        Ok(())
+    }
+
+    fn visit_while_stmt(&mut self, ctx: &mut Ctx, id: WhileStatementId) -> VisitorResult {
+        let while_stmt = &ctx.heap[id];
+
+        let body_id = while_stmt.body;
+        let test_expr_id = while_stmt.test;
+
+        self.visit_expr(ctx, test_expr_id)?;
+        self.visit_stmt(ctx, body_id)?;
+
+        Ok(())
+    }
+
+    fn visit_synchronous_stmt(&mut self, ctx: &mut Ctx, id: SynchronousStatementId) -> VisitorResult {
+        let sync_stmt = &ctx.heap[id];
+        let body_id = sync_stmt.body;
+
+        self.visit_stmt(ctx, body_id)
+    }
+
+    fn visit_return_stmt(&mut self, ctx: &mut Ctx, id: ReturnStatementId) -> VisitorResult {
+        let return_stmt = &ctx.heap[id];
+        let expr_id = return_stmt.expression;
+
+        self.visit_expr(ctx, expr_id)
+    }
+
+    fn visit_assert_stmt(&mut self, ctx: &mut Ctx, id: AssertStatementId) -> VisitorResult {
+        let assert_stmt = &ctx.heap[id];
+        let test_expr_id = assert_stmt.expression;
+
+        self.visit_expr(ctx, test_expr_id)
+    }
+
+    fn visit_new_stmt(&mut self, ctx: &mut Ctx, id: NewStatementId) -> VisitorResult {
+        let new_stmt = &ctx.heap[id];
+        let call_expr_id = new_stmt.expression;
+
+        self.visit_call_expr(ctx, call_expr_id)
+    }
+
+    fn visit_put_stmt(&mut self, ctx: &mut Ctx, id: PutStatementId) -> VisitorResult {
+        let put_stmt = &ctx.heap[id];
+
+        let port_expr_id = put_stmt.port;
+        let msg_expr_id = put_stmt.message;
+        // TODO: What what?
+
+        self.visit_expr(ctx, port_expr_id)?;
+        self.visit_expr(ctx, msg_expr_id)?;
+
+        Ok(())
+    }
+
+    fn visit_expr_stmt(&mut self, ctx: &mut Ctx, id: ExpressionStatementId) -> VisitorResult {
+        let expr_stmt = &ctx.heap[id];
+        let subexpr_id = expr_stmt.expression;
+
+        self.visit_expr(ctx, subexpr_id)
+    }
+
+    // Expressions
+
+    fn visit_assignment_expr(&mut self, ctx: &mut Ctx, id: AssignmentExpressionId) -> VisitorResult {
+        let upcast_id = id.upcast();
+        self.insert_initial_expr_inference_type(ctx, upcast_id);
+
+        let assign_expr = &ctx.heap[id];
+        let left_expr_id = assign_expr.left;
+        let right_expr_id = assign_expr.right;
+
+        self.visit_expr(ctx, left_expr_id)?;
+        self.visit_expr(ctx, right_expr_id)?;
+
+        // TODO: Initial progress?
+        Ok(())
+    }
+
+    fn visit_conditional_expr(&mut self, ctx: &mut Ctx, id: ConditionalExpressionId) -> VisitorResult {
+        let upcast_id = id.upcast();
+        self.insert_initial_expr_inference_type(ctx, upcast_id);
+
+        let conditional_expr = &ctx.heap[id];
+        let test_expr_id = conditional_expr.test;
+        let true_expr_id = conditional_expr.true_expression;
+        let false_expr_id = conditional_expr.false_expression;
+
+        self.expr_types.insert(test_expr_id, InferenceType::new(true, vec![InferredPart::Bool]));
+        self.visit_expr(ctx, test_expr_id)?;
+        self.visit_expr(ctx, true_expr_id)?;
+        self.visit_expr(ctx, false_expr_id)?;
+
+        // TODO: Initial progress?
+        Ok(())
+    }
+}
+
+enum TypeClass {
+    Numeric, // int and float
+    Integer,
+}
+
+macro_rules! 
debug_assert_expr_ids_unique_and_known { + // Base case for a single expression ID + ($id:ident) => { + if cfg!(debug_assertions) { + self.expr_types.contains_key(&$id); + } + }; + // Base case for two expression IDs + ($id1:ident, $id2:ident) => { + debug_assert_ne!($id1, $id2); + debug_assert_expr_id!($id1); + debug_assert_expr_id!($id2); + }; + // Generic case + ($id1:ident, $id2:ident, $($tail:ident),+) => { + debug_assert_ne!($id1, $id2); + debug_assert_expr_id!($id1); + debug_assert_expr_id!($id2, $($tail),+); + }; +} + +enum TypeConstraintResult { + Progress, // Success: Made progress in applying constraints + NoProgess, // Success: But did not make any progress in applying constraints + ErrExprType, // Error: Expression type did not match the argument(s) of the expression type + ErrArgType, // Error: Expression argument types did not match } impl TypeResolvingVisitor { - // We have a function that traverses the types of variable expressions. If - // we do not know the type yet we insert it in the "infer types" list. - // If we do know the type then we can return it and assign it in the - // variable expression. - // Hence although the parser types are recursive structures with nested - // ParserType IDs, here we need to traverse the type all at once. - fn insert_parser_type_if_needs_inference( - &mut self, ctx: &mut Ctx, root_id: RootId, parser_type_id: ParserTypeId - ) -> Result<(), ParseError2> { + fn progress_assignment_expr(&mut self, ctx: &mut Ctx, id: AssignmentExpressionId) { + use AssignmentOperator as AO; + + // TODO: Assignable check + let (type_class, arg1_expr_id, arg2_expr_id) = { + let expr = &ctx.heap[id]; + let type_class = match expr.operation { + AO::Set => + None, + AO::Multiplied | AO::Divided | AO::Added | AO::Subtracted => + Some(TypeClass::Numeric), + AO::Remained | AO::ShiftedLeft | AO::ShiftedRight | + AO::BitwiseAnded | AO::BitwiseXored | AO::BitwiseOred => + Some(TypeClass::Integer), + }; + + (type_class, expr.left, expr.right) + }; + + let upcast_id = id.upcast(); + self.apply_equal3_constraint(ctx, upcast_id, arg1_expr_id, arg2_expr_id); + } + + /// Applies a type constraint that expects all three provided types to be + /// equal. In case we can make progress in inferring the types then we + /// attempt to do so. If the call is successful then the composition of all + /// types is made equal. + fn apply_equal3_constraint( + &mut self, ctx: &mut Ctx, expr_id: ExpressionId, + arg1_id: ExpressionId, arg2_id: ExpressionId + ) -> Result { + // Safety: all expression IDs are always distinct, and we do not modify + // the container + debug_assert_expr_ids_unique_and_known!(id1, id2, id3); + let expr_type: *mut _ = self.expr_types.get_mut(&expr_id).unwrap(); + let arg1_type: *mut _ = self.expr_types.get_mut(&arg1_id).unwrap(); + let arg2_type: *mut _ = self.expr_types.get_mut(&arg2_id).unwrap(); + + let expr_res = unsafe{ progress_inference_types(expr_type, arg1_type) }; + if expr_res == InferenceResult::Incompatible { return TypeConstraintResult::ErrExprType; } + + let args_res = unsafe{ progress_inference_types(arg1_type, arg2_type) }; + if args_res == InferenceResult::Incompatible { return TypeConstraintResult::ErrArgType; } + + // If all types are compatible, but the second call caused type2 to be + // expanded, then we must also re-expand type1. 
+        if args_res.modified_lhs() {
+            // The argument unification expanded `arg1_type`, so propagate
+            // that expansion back into the expression type as well.
+            unsafe { (*expr_type).parts = (*arg1_type).parts.clone(); }
+        }
+
+        if expr_res.modified_any() || args_res.modified_any() {
+            TypeConstraintResult::Progress
+        } else {
+            TypeConstraintResult::NoProgess
+        }
+    }
+
+    /// Applies a typeclass constraint: checks if the type is of a particular
+    /// class or not
+    fn expr_type_is_of_type_class(
+        &mut self, ctx: &mut Ctx, expr_id: ExpressionId, type_class: TypeClass
+    ) -> bool {
+        debug_assert_expr_ids_unique_and_known!(expr_id);
+        let expr_type = self.expr_types.get(&expr_id).unwrap();
+
+        match type_class {
+            TypeClass::Numeric => expr_type.might_be_numeric(),
+            TypeClass::Integer => expr_type.might_be_integer()
+        }
+    }
+
+    /// Determines the `InferenceType` for the expression based on the
+    /// expression parent. Note that if the parent is another expression, we do
+    /// not take special action, instead we let parent expressions fix the type
+    /// of subexpressions before they have a chance to call this function.
+    /// Hence: if the expression type is already set, this function doesn't do
+    /// anything.
+    fn insert_initial_expr_inference_type(
+        &mut self, ctx: &mut Ctx, expr_id: ExpressionId
+    ) {
+        // TODO: @cleanup Concept of "parent expression" can be removed, the
+        // type inferer/checker can set this upon the initial pass
+        use ExpressionParent as EP;
+        if self.expr_types.contains_key(&expr_id) { return; }
+
+        let expr = &ctx.heap[expr_id];
+        let inference_type = match expr.parent() {
+            EP::None =>
+                // Should have been set by linker
+                unreachable!(),
+            EP::Memory(_) | EP::ExpressionStmt(_) | EP::Expression(_, _) =>
+                // Determined during type inference
+                InferenceType::new(false, vec![InferredPart::Unknown]),
+            EP::If(_) | EP::While(_) | EP::Assert(_) =>
+                // Must be a boolean
+                InferenceType::new(true, vec![InferredPart::Bool]),
+            EP::Return(_) =>
+                // Must match the return type of the function
+                if let DefinitionType::Function(func_id) = self.definition_type {
+                    let return_parser_type_id = ctx.heap[func_id].return_type;
+                    self.determine_inference_type_from_parser_type(ctx, return_parser_type_id)
+                } else {
+                    // Cannot happen: definition always set upon body traversal
+                    // and "return" calls in components are illegal.
+ unreachable!(); + }, + EP::New(_) => + // Must be a component call, which we assign a "Void" return + // type + InferenceType::new(true, vec![InferredPart::Void]), + EP::Put(_, 0) => + // TODO: Change put to be a builtin function + // port of "put" call + InferenceType::new(false, vec![InferredPart::Output, InferredPart::Unknown]), + EP::Put(_, 1) => + // TODO: Change put to be a builtin function + // message of "put" call + InferenceType::new(true, vec![InferredPart::Message]), + EP::Put(_, _) => + unreachable!() + }; + + self.expr_types.insert(expr_id, inference_type); + } + + fn determine_inference_type_from_parser_type( + &mut self, ctx: &mut Ctx, parser_type_id: ParserTypeId + ) -> InferenceType { use ParserTypeVariant as PTV; + use InferredPart as IP; + + let mut to_consider = VecDeque::with_capacity(16); + to_consider.push_back(parser_type_id); + + let mut infer_type = Vec::new(); + let mut has_inferred = false; - let mut to_consider = vec![parser_type_id]; while !to_consider.is_empty() { - let parser_type_id = to_consider.pop().unwrap(); + let parser_type_id = to_consider.pop_front().unwrap(); let parser_type = &ctx.heap[parser_type_id]; - match &parser_type.variant { + PTV::Message => { infer_type.push(IP::Message); }, + PTV::Bool => { infer_type.push(IP::Bool); }, + PTV::Byte => { infer_type.push(IP::Byte); }, + PTV::Short => { infer_type.push(IP::Short); }, + PTV::Int => { infer_type.push(IP::Int); }, + PTV::Long => { infer_type.push(IP::Long); }, + PTV::String => { infer_type.push(IP::String); }, + PTV::IntegerLiteral => { unreachable!("integer literal type on variable type"); }, PTV::Inferred => { - self.env.insert(parser_type_id, InferenceType::new(parser_type_id)); + infer_type.push(IP::Unknown); + has_inferred = true; + }, + PTV::Array(subtype_id) => { + infer_type.push(IP::Array); + to_consider.push_front(*subtype_id); + }, + PTV::Input(subtype_id) => { + infer_type.push(IP::Input); + to_consider.push_front(*subtype_id); + }, + PTV::Output(subtype_id) => { + infer_type.push(IP::Output); + to_consider.push_front(*subtype_id); }, - PTV::Array(subtype_id) => { to_consider.push(*subtype_id); }, - PTV::Input(subtype_id) => { to_consider.push(*subtype_id); }, - PTV::Output(subtype_id) => { to_consider.push(*subtype_id); }, PTV::Symbolic(symbolic) => { - // variant is resolved in type table if a type definition, - // or in the linker phase if in a function body. - debug_assert!(symbolic.variant.is_some()); - + debug_assert!(symbolic.variant.is_some(), "symbolic variant not yet determined"); match symbolic.variant.unwrap() { SymbolicParserTypeVariant::PolyArg(_, arg_idx) => { - // Points to polyarg, which is resolved by definition - debug_assert!(arg_idx < self.polyvars.len()); + // Retrieve concrete type of argument and add it to + // the inference type. debug_assert!(symbolic.poly_args.is_empty()); // TODO: @hkt - - let mut inferred = InferenceType::new(parser_type_id); - inferred.assign_concrete(&self.polyvars[arg_idx]); - - self.infer_types.insert(parser_type_id, inferred); + debug_assert!(arg_idx < self.polyvars.len()); + for concrete_part in &self.polyvars[arg_idx].v { + infer_type.push(IP::from(*concrete_part)); + } }, SymbolicParserTypeVariant::Definition(definition_id) => { - // Points to a type definition, but if it has poly- - // morphic arguments then these need to be inferred. 
+ // TODO: @cleanup + if cfg!(debug_assertions) { + let definition = &ctx.heap[definition_id]; + debug_assert!(definition.is_struct() || definition.is_enum()); // TODO: @function_ptrs + let num_poly = match definition { + Definition::Struct(v) => v.poly_vars.len(), + Definition::Enum(v) => v.poly_vars.len(), + _ => unreachable!(), + }; + debug_assert_eq!(symbolic.poly_args.len(), num_poly); + } + + infer_type.push(IP::Instance(definition_id, symbolic.poly_args.len())); + let mut poly_arg_idx = symbolic.poly_args.len(); + while poly_arg_idx > 0 { + poly_arg_idx -= 1; + to_consider.push_front(symbolic.poly_args[poly_arg_idx]); + } } } - }, - _ => {} // Builtin, doesn't require inference + } } } - Ok(()) + InferenceType::new(!has_inferred, infer_type) + } + + /// Construct an error when an expression's type does not match. This + /// happens if we infer the expression type from its arguments (e.g. the + /// expression type of an addition operator is the type of the arguments) + /// But the expression type was already set due to our parent (e.g. an + /// "if statement" or a "logical not" always expecting a boolean) + fn construct_expr_type_error( + &self, ctx: &Ctx, expr_id: ExpressionId, arg_id: ExpressionId + ) -> ParseError2 { + // TODO: Expand and provide more meaningful information for humans + let expr = &ctx.heap[expr_id]; + let arg_expr = &ctx.heap[arg_id]; + let expr_type = self.expr_types.get(&expr_id).unwrap(); + let arg_type = self.expr_types.get(&arg_id).unwrap(); + + return ParseError2::new_error( + &ctx.module.source, expr.position(), + "Incompatible types: this expression expected a '{}'" + ).with_postfixed_info( + &ctx.module.source, arg_expr.position(), + "But this expression yields a '{}'" + ) + } + + fn construct_arg_type_error( + &self, ctx: &Ctx, expr_id: ExpressionId, + arg1_id: ExpressionId, arg2_id: ExpressionId + ) -> ParseError2 { + // TODO: Expand and provide more meaningful information for humans } } \ No newline at end of file
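
The new `InferenceType` stores a type as a flattened pre-order tree of `InferredPart`s, and `progress_inference_types` unifies two such trees in place: `Unknown` holes absorb the other side's whole subtree, and `IntegerLike` literal parts resolve against concrete integers. Below is a minimal, self-contained sketch of that idea for experimenting with the representation outside the compiler; the names (`Part`, `subtree_end`, `unify`) and the reduced set of parts are illustrative, not the crate's actual API.

```rust
// Hypothetical, simplified mirror of the flattened type representation used
// in the patch: a type tree stored in pre-order, where `Unknown` and
// `IntegerLike` are the two kinds of holes that unification may fill in.
#[derive(Clone, Debug, PartialEq, Eq)]
enum Part {
    Unknown,         // still to be inferred
    IntegerLike,     // integer literal without a concrete width yet
    Int,             // a concrete integer
    Bool,
    Array,           // exactly one subtree follows
    Instance(usize), // user-defined type with N subtrees following
}

// Walk the pre-order encoding; `depth` counts subtrees still expected.
fn subtree_end(parts: &[Part], mut idx: usize) -> usize {
    let mut depth = 1i32;
    loop {
        match parts[idx] {
            Part::Unknown | Part::IntegerLike | Part::Int | Part::Bool => depth -= 1,
            Part::Array => {} // consumes one expected subtree, adds one
            Part::Instance(n) => depth += n as i32 - 1,
        }
        idx += 1;
        if depth == 0 { return idx; }
    }
}

/// Unify `a` and `b` in place; returns false when they are incompatible.
fn unify(a: &mut Vec<Part>, b: &mut Vec<Part>) -> bool {
    let mut idx = 0;
    while idx < a.len() {
        if a[idx] == b[idx] { idx += 1; continue; }
        // An integer literal meets a concrete integer.
        if a[idx] == Part::IntegerLike && b[idx] == Part::Int { a[idx] = Part::Int; idx += 1; continue; }
        if b[idx] == Part::IntegerLike && a[idx] == Part::Int { b[idx] = Part::Int; idx += 1; continue; }
        // An unknown hole absorbs the other side's whole subtree.
        if a[idx] == Part::Unknown {
            let end = subtree_end(b, idx);
            a.splice(idx..idx + 1, b[idx..end].iter().cloned());
            idx = end;
            continue;
        }
        if b[idx] == Part::Unknown {
            let end = subtree_end(a, idx);
            b.splice(idx..idx + 1, a[idx..end].iter().cloned());
            idx = end;
            continue;
        }
        return false; // incompatible concrete parts
    }
    true
}

fn main() {
    // A declared `auto` element type unifying against an expression already
    // known to be an array of concrete integers.
    let mut declared = vec![Part::Array, Part::Unknown];
    let mut expr = vec![Part::Array, Part::Int];
    assert!(unify(&mut declared, &mut expr));
    assert_eq!(declared, vec![Part::Array, Part::Int]);
}
```

The pre-order `Vec` keeps unification allocation-light compared to a boxed tree, at the cost of the subtree bookkeeping shown in `subtree_end`.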
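
The deleted notes and `apply_equal3_constraint` describe the same three-way constraint for assignments and the ternary operator: the expression's type, the first argument's type and the second argument's type must all unify to one type. A toy sketch of just that constraint ordering, with a deliberately scalar-only type model; `Ty`, `unify2` and `equal3` are illustrative names, not part of the codebase.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Ty { Unknown, Int, Bool }

// Unify two scalar types; `None` means they are incompatible.
fn unify2(a: Ty, b: Ty) -> Option<Ty> {
    match (a, b) {
        (Ty::Unknown, x) | (x, Ty::Unknown) => Some(x),
        (x, y) if x == y => Some(x),
        _ => None,
    }
}

// Apply the three-way equality constraint, returning the common type:
// expression vs. first argument, then the result vs. the second argument.
fn equal3(expr: Ty, arg1: Ty, arg2: Ty) -> Option<Ty> {
    let e = unify2(expr, arg1)?;
    unify2(e, arg2)
}

fn main() {
    // `a = b`: the assignment's type, `a` and `b` must agree.
    assert_eq!(equal3(Ty::Unknown, Ty::Int, Ty::Unknown), Some(Ty::Int));
    // `cond ? 5 : true` is a type error: the branches disagree.
    assert_eq!(equal3(Ty::Unknown, Ty::Int, Ty::Bool), None);
}
```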