Changeset - d05cbf5057e3
[Not reviewed]
0 3 1
MH - 4 years ago 2021-07-09 14:12:53
contact@maxhenger.nl
extend documentation, add initial tests for failing cases
4 files changed with 93 insertions and 13 deletions:
0 comments (0 inline, 0 general)
docs/types/infinite_types.md
Show inline comments
 
@@ -32,207 +32,236 @@ union A { B(B), C(C) }
 
union B { A(A), C(C) }
 
union C { A(A), B(B) }
 
```
 

	
 
If one wishes to express a type whose value causes it to terminate somewhere, the only option in this language is to use `union`s to indicate the optionality of a certain member. For example:
 

	
 
```pdl
 
// One option, allows setting branches individually
 
union Option<T>{ Some(T), None }
 
struct U32TreeExample {
 
    u32 value, 
 
    Option<TreeExample> left,
 
    Option<TreeExample> right,
 
}
 

	
 
// Another option, must be a well-formed binary tree
 
union U32TreeLink {
 
    Leaf,
 
    Node(U32TreeLink, U32TreeLink)
 
}
 
struct U32TreeNode {
 
    u32 value,
 
    U32TreeLink link,
 
}
 

	
 
// Another kind of thing
 
union List<T> { 
 
    End,
 
    Entry(T, List<T>)
 
}
 
```
 

	
 
These are all valid types, and we should be able to express them, but naively calculating their size causes one to run into issues. 
 

	
 
## Onwards to an Algorithm
 

	
 
So far one can see that `struct`s and `union`s are the elements that might constitute an infinite type. All of the other types (user defined `enum`s and all of the builtins like `u32`, `string`, `s64`, etc.) do not have any members and have a fixed size. Whenever there is a union in the type, we might have a type that is not truly infinite, but may terminate somewhere.
 

	
 
Imagine a graph containing all user-defined `struct` and `union` types. The edges in this graph are directional, and indicate that one type is embedded in the other. It is important to construct this graph on a per-member basis. `struct` members may have one edge running towards it, coming from the member's type. `union`s can have multiple edges running towards it, one for each embedded type in the union member. Note that types may refer to themselves.
 

	
 
Having constructed this graph, one can visualise the potentially infinite types by loops in the graph. A struct is potentially infinite if one of its members is part of a loop. A union is potentially infinite if all of its members are part of a loop. A potentially infinite type is a truly infinite type if the relevant loops all contain potentially infinite types. Or, conversely: a potentially infinite type is not actually infinite if the relevant loops contain a union that is not potentially infinite.
 

	
 
More practically: a potentially infinite struct (implying that at least one of its members is part of a type loop) is not infinite if each of its "looped members" is part of a loop which contains a non-infinite union. Likewise, a potentially infinite union (implying that all of its members are part of a loop) is not actually infinite if, for one of its members that contains a set of loops (because a union can embed multiple types per variant), each of those loops contain a non-infinite union.
 

	
 
And finally, in human terms: a type is not actually infinite if, given enough time, a programmer can express a literal of that type. So one may type (using the examples above):
 

	
 
```pdl
 
auto thing = List::Entry(3, List::Entry(2, List::Entry(1, List::End)));
 
auto flong = U32TreeExample{
 
    value: 0,
 
    left: Option::Some(U32TreeExample{
 
        value: 1,
 
        left: Option::None
 
    },
 
    right: Option::None
 
}
 
```
 

	
 
But one simply cannot express:
 

	
 
```pdl
 
auto cannot_construct = A::B(B::C(C::A(A::C(C::B( /* onwards, to infinity )))));
 
```
 

	
 
Now we arrive at several conditions to end up at a concrete algorithm. Firsly we have that types whose literals are expressable should be constructable in the language. Secondly, for the performance reasons mentioned above we would like to specify the memory layout for each type. So that includes the size and alignment of the type itself, and the offset (with implicit alignment and size) of each of its members. Thirdly, because we're going to have to put pointers somewhere in the aforementioned type loops (necessarily, because we have variably sized values, we need to perform some kind of allocation somewhere, hence we need to have pointers somewhere), we need these pointers to be placed consistently throughout the code. This last point is a design decision: we could decide to have some types including pointers to members, and some types excluding pointers to members. This would then require special conversion functions implemented when two values of the same type, but with a different memory layout, are converted into one another. This on top of the fact that each of the types containing a pointer will require a constructor and a destructor. Finally, albeit this point looks ahead a bit at the implementation, we desire that subsequent compilations of the same files, but potentially in a different order, result in the same layout of each of the types.
 

	
 
We may observe that only unions may break up potentially infinite types. So for each loop that contains a non-infinite union we need at least one union that is implemented using some kind of pointer. There are multiple ways we can fix the byte size of the union, thereby breaking the type loop:
 

	
 
1. Make each instance of the union a pointer to the data in that union: both the tag and the potentially embedded values. This has the upside of being relatively simple to implement. Furthermore the union itself will just have a fixed size. The downside is that for a tag lookup we have to follow the pointer, generally causing cache misses in the case that we just want to check the tag.
 
   
 
2. Keep the tag in a union value, but dynamically allocate all of the contents. This way the size of a union value becomes a tag and a pointer. On 32-bit systems this is most likely fine, on 64-bit systems (which is the norm, if we're not talking about embedded computers) alignment will generally cause the union size to bloat to 16-bytes. This kind of implementation has two subcases to consider. 
 
   
 
    - Allocate as much memory as the maximum union variant. This way changing the union variant will not force a reallocation. Instead we can use up the allocated memory.
 
      
 
    - Allocate as much memory as the variant requires. This way we will not waste any memory, at the cost of having to reallocate when the variant is changed.
 
    
 
3. Keep the tag in the union value, with the same up- and downsides as the previous point. Then dynamically allocate the members of the variants (that is, if we have `union Foo { Bar(CyclicType, u64, u64), Qux }`, then we will just make `CyclicType` a pointer). The obvious downside is that changing variants might cause (de)allocations. And if we have multiple members that cause a loop then we need to allocate each of them. The upside is that when we're only accessing the non-pointer members that they're likely already in the cache.
 

	
 
There are many more tiny variants here that are not explicitly stated. I'll document my reasons for choosing the variant I think works best:
 

	
 
- The whole idea of breaking the type cycles is not to allocate an infinite amount of memory, so always allocating the union is naturally out of the question. Furthermore type loops will generally only occur in particular datastructures that are potentially infinite: trees, linked lists, hashmaps, etc.
 
  
 
- The contract when using a union is that the byte size of a union value is the size of the largest member plus the size of the tag (plus padding). So allocating the union just once, with enough size to fit all possible allocated variants seems like a good idea to me. I suspect that in most cases these unions will express something along the lines of: there is nothing, or there is nothing.
 

	
 
Finally, as a note: all of this type-loop checking has to be performed per monomorphization. A polymorphic type specification itself does not need to be checked for type loops.
 

	
 
Lastly, we have to consider the point of making sure that multiple compilations, where the AST is constructed in a different order, hence where the types are "discovered" in a different order, will result in the same types. As a rather simple solution, instead of just marking one union in the type loop as pointerlike, all of the unions in the type loops will be implemented using some kind of pointer scheme.
 

	
 
## Concluding on the Algorithm
 

	
 
As an initial conclusion:
 

	
 
- All non-infinite unions in type loops are implemented using dynamic allocation to break the type loops and to prevent potentially infinite types from being truly infinite.
 
- The tag of the union will be a non-allocated value associated with the union. Considering that most of the time these kinds of unions will represent an option-like type (there is one more entry in the linked list, there is one more node in the tree, etc.) I think it is nice to not have cache misses when checking whether a value is present, or whether it is not present.
 
- Considering that we cannot allocate all of the time (because then we will still end up with an infinite amount of allocated memory), but I do not currently want to implement very complicated logic in the compiler for handling memory allocations, I will initially implement the following scheme: all union variants that do not contain any types part of a type loop will be stored on the stack, contributing to the byte size of the union. All variants that do contain such types will be fully allocated on the heap in one allocation. The size of the allocated region will be the size of the largest variant that requires dynamic allocation.
 

	
 
## Rought Specification of the Algorithm
 
## Rough Specification of the Algorithm
 

	
 
I might be missing some stuff here, that stuff will probably pop up while I'm implementing this:
 

	
 
- 
 
   Perform the basic pre-compilation for symbol discovery and definition discovery. This constructs the AST such that we can parse the AST to determine what fields are present on each `struct` and what variants are used in each `union`.
 

	
 
-
 
    For each type added to the type table (including monomorphs where we have all of the polymorphic variables fully specified), we will traverse all of the members and their types. For each type we will check if was already checked not to be cyclic. If it was already completely checked, then all types up until the most recently checked type cannot be cyclic (another member might still make them cyclic). If the type was not completely checked, but previously encountered in our search for loops, then all of types from the first encounter of that type are considered part of a type loop.
 

	
 
- 
 
   For each type added to the type table (including monomorphs where we have all of the polymorphic variables fully specified), we will traverse all of the members and their types. For each type we will check if was already checked not to be cyclic. If it was already checked, then necessarily it isn't cyclic and we can stop further traversal (to prevent the complexity of the algorithm to grow beyond linear). If it is cyclic, then necessarily this will be the first time we encounter the loop.
 
- 
 
   Note that this algorithm is recursive in the sense that for structs we need to check each member for type loops. For unions we need to check each embedded type in each variant. So when we encounter a type loop for a specific set of members, we walk back and mark each of the unions along the way as requiring allocation. If there were no such unions then we immediately return an error. We keep all of the unions that require allocation around in a list.
 
    Note that this algorithm is recursive in the sense that for structs we need to check each member for type loops. For unions we need to check each embedded type in each variant. So when we encounter a type loop for a specific set of members, we walk back and mark each of the unions along the way as requiring allocation. If there were no such unions then we immediately return an error. We keep all of the unions that require allocation around in a list.
 
  
 
- 
 
   While we're doing the above, if we encounter a union who has a member that does not exist in a type loop then we mark it as non-infinite.
 
   
 
- 
 
   After all of the above is done, we need to check all of the unions we have encountered. Necessarily if we started with a particular type, and recursively checked all of the embedded types, then we will discover *all* type loops. If none of the unions is non-infinite, then we throw an error (which might be kind of hard, because what kind of error do we show in the case of the `union A; union B; union C` example?). With the algorithm above, we now know that if there are type loops, that each of them contains at least one union. With the condition of at least one non-infinite union we now know that the type is expressable. The constructed list of unions only contains unions that were part of a type loop.
 
   After all of the above is done, we need to check all of the unions we have encountered. Necessarily if we started with a particular type, and recursively checked all of the embedded types, then we will discover *all* type loops. If none of the unions is non-infinite, then we throw an error (which might be kind of hard, because what kind of error do we show in the case of the `union A; union B; union C;` example?). With the algorithm above, we now know that if there are type loops, that each of them contains at least one union. With the condition that there is at least one non-infinite union we now know that the type is expressable. The constructed list of unions only contains unions that were part of a type loop.
 

	
 
   With that knowledge, to guarantee consistent compilation, we will decide to make each union in each type loop, whether it is infinite or non-infinite, a pointerlike union.
 

	
 
-
 
   A pointerlike union's value will consist of a tag, plus reserved size (with the proper alignment) for the largest non-pointerlike union variant if it is larger than the size of a pointer on the particular machine. If it is smaller, than we just need a tag, and a pointer that is properly aligned. So the union value will have a fixed size that is not dependent on the types that were part of the type cycle. I'll call the variants that were not part of type cycles "value variants", and parts that *were* part of type cycles "pointer variants".
 

	
 
Then, during runtime, as an initial implementation (we could be smarter if we know the old value and know the new value, but that would take a lot of static analysis, we're not yet at that point in the compiler), we will check the tag value whether the old and new values are pointers. If the old and new variants are both pointer variants, or when they're both value variants, then we do not (re)allocate. If we go from a value to a pointer variant then we allocate. If we go from a pointer variant to a value variant then we deallocate.
 
Then, during runtime, as an initial implementation (we could be smarter if we know the old value and know the new value, but that would take a lot of static analysis, we're not yet at that point in the compiler), we will check the tag value whether the old and new values are pointers. If the old and new variants are both pointer variants, or when they're both value variants, then we do not (re)allocate. If we go from a value variant to a pointer variant then we allocate. If we go from a pointer variant to a value variant then we deallocate.
 

	
 
## Some Examples of the Algorithm
 

	
 
I wrote these before I started writing all of the above, I'll keep it here for some concrete examples of the algorithm.
 

	
 
### Truly Infinite Example with Structs
 

	
 
```pdl
 
// Most simple case
 
struct SelfReferential{ SelfReferential field }
 

	
 
// Slightly more complicated case
 
struct DualA { DualB b }
 
struct DualB { DualA a }
 
```
 

	
 
In some sense we would be able to compute the sizes of these datastructures, we simply make their fields pointers. Let's assume 64 bit systems for all of these examples. For just the first case we would then have that `SelfReferential` has a size of 8 bytes. However, we always need to allocate its member `field`. So this would always take up an infinite amount of memory. We disallow this case because we would never be able to construct a literal of these kinds of datatypes.
 

	
 
### Truly Infinite Example with Unions
 

	
 
```pdl
 
struct S { UOne one, UTwo two }
 
union UOne { Struct(S), One(UOne), Two(UTwo) }
 
union UTwo { One(UOne), Two(UTwo) } 
 
```
 

	
 
Again, here we have a case where we can compute the sizes. We make the unions have pointerlike contents. So the unions will be 16 bytes (8 bytes for the tag and padding for pointer alignment, then an 8 byte pointer). Hence the struct will be 32 bytes. But once more we can never construct a literal and instantiating an element would always take up an infinite amount of memory.
 

	
 
### Slightly Ambiguous Example
 

	
 
With the following example we get one union which is non-infinite.
 

	
 
```pdl
 
struct SEnd { u32 some_value }
 
struct SMiddle {
 
   SEnd this_ends,
 
   UOne union_one,
 
   UTwo union_two,
 
}
 
union UOne { One(UOne), Two(UTwo), Struct(SMiddle) }
 
union UTwo { One(UOne), Two(UTwo), End(u32) }
 
```
 

	
 
If we draw the type graph, we get something like:
 

	
 
```pdl
 
+--------+        +---------+        +-------+        +-------+
 
| struct |        | struct  | <----- | union | <----- | union |
 
| SEnd   | -----> | SMiddle | -----> | UOne  | -----> | UTwo  |
 
+--------+        +---------+        +-------+        +-------+
 
                                      |     ^          |     ^
 
                                      |     |          |     |
 
                                      \-----/          \-----/
 
```
 

	
 
For this case we can actually always construct a literal of `SMiddle`. Viewing the literal as a tree (each node a type instance) then we can terminate the tree branches using `UTwo::End`. However, if we would just make `UTwo` pointerlike, we cannot compute the size of `UOne` (it contains a reference to itself, so would grow infinitely in size). This hints at that we should make `UOne` pointerlike as well.
 

	
 
### Slightly Ambiguous Example, Version 2
 

	
 
Not really ambiguous, but just to show that not all unions which are part of a connected type graph need to be turned into pointerlike unions.
 

	
 
```pdl
 
union Value {
 
    Unsigned(u64),
 
    Signed(s64),
 
    Undefined
 
}
 
struct Node {
 
    Value value,
 
    Children children
 
}
 

	
 
union Children {
 
    One(Node),
 
    Two(Node, Node),
 
    None
 
}
 
```
 

	
 
We get a cycle between `Node` and `Children`, so we would want to implement `Children` as being pointerlike. But we don't have to make `Value` pointerlike. In terms of the described algorithm: `Value` is not part of a type loop, so is not considered a candidate for being pointerlike.
 

	
 
### Slightly Ambigious Example, Version 3
 

	
 
Contains part of the type graph that is not part of a type loop. So the associated union should not be pointerlike.
 

	
 
```pdl
 
// Parts that are outside of the type loop
 
struct SOutside {
 
    UOutside next,
 
}
 
union UOutside {
 
    Next(SInside)
 
}
 

	
 
// Parts that are inside the type loop
 
struct SInside {
 
    UInside next,
 
}
 
union UInside {
 
    Next(SInside),
 
    NoNext,
 
}
 
```
 

	
 
Here `UInside` should become pointerlike, but `UOutside` should remain a normal value.
 

	
 
### Which Union Breaks the Cycle?
 

	
 
```pdl
 
struct S { UOne one }
 
union UOne { Two(UTwo), Nope }
 
union UTwo { Struct(S), Nope }
 
```
 

	
 
Here we see that we have a type loop to which both unions contribute. We can either lay out `UOne` as pointerlike, or `UTwo`. Both would allow us to calculate the size of the types and make the type expressable. However we need consistent compilation. For this reason and this reason only (because it is much more efficient to only lay out one of the unions as a pointer) we have to make both unions pointerlike. Perhaps in the future some other consistent metric can be applied.
src/protocol/parser/type_table.rs
Show inline comments
 
@@ -80,201 +80,211 @@ impl DefinedTypeVariant {
 

	
 
    pub(crate) fn as_enum(&self) -> &EnumType {
 
        match self {
 
            DefinedTypeVariant::Enum(v) => v,
 
            _ => unreachable!("Cannot convert {} to enum variant", self.type_class())
 
        }
 
    }
 

	
 
    pub(crate) fn as_union(&self) -> &UnionType {
 
        match self {
 
            DefinedTypeVariant::Union(v) => v,
 
            _ => unreachable!("Cannot convert {} to union variant", self.type_class())
 
        }
 
    }
 

	
 
    pub(crate) fn data_monomorphs(&self) -> &Vec<DataMonomorph> {
 
        use DefinedTypeVariant::*;
 

	
 
        match self {
 
            Enum(v) => &v.monomorphs,
 
            Union(v) => &v.monomorphs,
 
            Struct(v) => &v.monomorphs,
 
            _ => unreachable!("cannot get data monomorphs from {}", self.type_class()),
 
        }
 
    }
 

	
 
    pub(crate) fn data_monomorphs_mut(&mut self) -> &mut Vec<DataMonomorph> {
 
        use DefinedTypeVariant::*;
 

	
 
        match self {
 
            Enum(v) => &mut v.monomorphs,
 
            Union(v) => &mut v.monomorphs,
 
            Struct(v) => &mut v.monomorphs,
 
            _ => unreachable!("cannot get data monomorphs from {}", self.type_class()),
 
        }
 
    }
 

	
 
    pub(crate) fn procedure_monomorphs(&self) -> &Vec<ProcedureMonomorph> {
 
        use DefinedTypeVariant::*;
 

	
 
        match self {
 
            Function(v) => &v.monomorphs,
 
            Component(v) => &v.monomorphs,
 
            _ => unreachable!("cannot get procedure monomorphs from {}", self.type_class()),
 
        }
 
    }
 

	
 
    pub(crate) fn procedure_monomorphs_mut(&mut self) -> &mut Vec<ProcedureMonomorph> {
 
        use DefinedTypeVariant::*;
 

	
 
        match self {
 
            Function(v) => &mut v.monomorphs,
 
            Component(v) => &mut v.monomorphs,
 
            _ => unreachable!("cannot get procedure monomorphs from {}", self.type_class()),
 
        }
 
    }
 
}
 

	
 
pub struct PolymorphicVariable {
 
    identifier: Identifier,
 
    is_in_use: bool, // a polymorphic argument may be defined, but not used by the type definition
 
}
 

	
 
/// Data associated with a monomorphized datatype
 
pub struct DataMonomorph {
 
    pub poly_args: Vec<ConcreteType>,
 
}
 

	
 
/// Data associated with a monomorphized procedure type. Has the wrong name,
 
/// because it will also be used to store expression data for a non-polymorphic
 
/// procedure. (in that case, there will only ever be one)
 
pub struct ProcedureMonomorph {
 
    // Expression data for one particular monomorph
 
    pub poly_args: Vec<ConcreteType>,
 
    pub expr_data: Vec<MonomorphExpression>,
 
}
 

	
 
/// `EnumType` is the classical C/C++ enum type. It has various variants with
 
/// an assigned integer value. The integer values may be user-defined,
 
/// compiler-defined, or a mix of the two. If a user assigns the same enum
 
/// value multiple times, we assume the user is an expert and we consider both
 
/// variants to be equal to one another.
 
pub struct EnumType {
 
    pub variants: Vec<EnumVariant>,
 
    pub monomorphs: Vec<DataMonomorph>,
 
}
 

	
 
// TODO: Also support maximum u64 value
 
pub struct EnumVariant {
 
    pub identifier: Identifier,
 
    pub value: i64,
 
}
 

	
 
/// `UnionType` is the algebraic datatype (or sum type, or discriminated union).
 
/// A value is an element of the union, identified by its tag, and may contain
 
/// a single subtype.
 
/// For potentially infinite types (i.e. a tree, or a linked list) only unions
 
/// can break the infinite cycle. So when we lay out these unions in memory we
 
/// will reserve enough space on the stack for all union variants that do not
 
/// cause "type loops" (i.e. a union `A` with a variant containing a struct
 
/// `B`). And we will reserve enough space on the heap (and store a pointer in
 
/// the union) for all variants which do cause type loops (i.e. a union `A`
 
/// with a variant to a struct `B` that contains the union `A` again).
 
pub struct UnionType {
 
    pub variants: Vec<UnionVariant>,
 
    pub monomorphs: Vec<DataMonomorph>,
 
    pub requires_allocation: bool,
 
    pub contains_unallocated_variant: bool,
 
}
 

	
 
pub struct UnionVariant {
 
    pub identifier: Identifier,
 
    pub embedded: Vec<ParserType>, // zero-length does not have embedded values
 
    pub tag_value: i64,
 
    pub exists_in_heap: bool,
 
}
 

	
 
pub struct StructType {
 
    pub fields: Vec<StructField>,
 
    pub monomorphs: Vec<DataMonomorph>,
 
}
 

	
 
pub struct StructField {
 
    pub identifier: Identifier,
 
    pub parser_type: ParserType,
 
}
 

	
 
pub struct FunctionType {
 
    pub return_types: Vec<ParserType>,
 
    pub arguments: Vec<FunctionArgument>,
 
    pub monomorphs: Vec<ProcedureMonomorph>,
 
}
 

	
 
pub struct ComponentType {
 
    pub variant: ComponentVariant,
 
    pub arguments: Vec<FunctionArgument>,
 
    pub monomorphs: Vec<ProcedureMonomorph>
 
}
 

	
 
pub struct FunctionArgument {
 
    identifier: Identifier,
 
    parser_type: ParserType,
 
}
 

	
 
/// Represents the data associated with a single expression after type inference
 
/// for a monomorph (or just the normal expression types, if dealing with a
 
/// non-polymorphic function/component).
 
pub struct MonomorphExpression {
 
    // The output type of the expression. Note that for a function it is not the
 
    // function's signature but its return type
 
    pub(crate) expr_type: ConcreteType,
 
    // Has multiple meanings: the field index for select expressions, the
 
    // monomorph index for polymorphic function calls or literals. Negative
 
    // values are never used, but used to catch programming errors.
 
    pub(crate) field_or_monomorph_idx: i32,
 
}
 

	
 
//------------------------------------------------------------------------------
 
// Type table
 
//------------------------------------------------------------------------------
 

	
 
// TODO: @cleanup Do I really need this, doesn't make the code that much cleaner
 
struct TypeIterator {
 
    breadcrumbs: Vec<(RootId, DefinitionId)>
 
}
 

	
 
impl TypeIterator {
 
    fn new() -> Self {
 
        Self{ breadcrumbs: Vec::with_capacity(32) }
 
    }
 

	
 
    fn reset(&mut self, root_id: RootId, definition_id: DefinitionId) {
 
        self.breadcrumbs.clear();
 
        self.breadcrumbs.push((root_id, definition_id))
 
    }
 

	
 
    fn push(&mut self, root_id: RootId, definition_id: DefinitionId) {
 
        self.breadcrumbs.push((root_id, definition_id));
 
    }
 

	
 
    fn contains(&self, root_id: RootId, definition_id: DefinitionId) -> bool {
 
        for (stored_root_id, stored_definition_id) in self.breadcrumbs.iter() {
 
            if *stored_root_id == root_id && *stored_definition_id == definition_id { return true; }
 
        }
 

	
 
        return false
 
    }
 

	
 
    fn top(&self) -> Option<(RootId, DefinitionId)> {
 
        self.breadcrumbs.last().map(|(r, d)| (*r, *d))
 
    }
 

	
 
    fn pop(&mut self) {
 
        debug_assert!(!self.breadcrumbs.is_empty());
 
        self.breadcrumbs.pop();
 
    }
 
}
 

	
 
/// Result from attempting to resolve a `ParserType` using the symbol table and
 
/// the type table.
 
enum ResolveResult {
 
    Builtin,
 
    PolymoprhicArgument,
 
    /// ParserType points to a user-defined type that is already resolved in the
 
    /// type table.
 
    Resolved(RootId, DefinitionId),
 
    /// ParserType points to a user-defined type that is not yet resolved into
 
    /// the type table.
 
    Unresolved(RootId, DefinitionId)
 
}
 

	
 
@@ -467,217 +477,220 @@ impl TypeTable {
 
        let definition = ctx.heap[definition_id].as_enum();
 

	
 
        let mut enum_value = -1;
 
        let mut min_enum_value = 0;
 
        let mut max_enum_value = 0;
 
        let mut variants = Vec::with_capacity(definition.variants.len());
 
        for variant in &definition.variants {
 
            enum_value += 1;
 
            match &variant.value {
 
                EnumVariantValue::None => {
 
                    variants.push(EnumVariant{
 
                        identifier: variant.identifier.clone(),
 
                        value: enum_value,
 
                    });
 
                },
 
                EnumVariantValue::Integer(override_value) => {
 
                    enum_value = *override_value;
 
                    variants.push(EnumVariant{
 
                        identifier: variant.identifier.clone(),
 
                        value: enum_value,
 
                    });
 
                }
 
            }
 
            if enum_value < min_enum_value { min_enum_value = enum_value; }
 
            else if enum_value > max_enum_value { max_enum_value = enum_value; }
 
        }
 

	
 
        // Ensure enum names and polymorphic args do not conflict
 
        self.check_identifier_collision(
 
            modules, root_id, &variants, |variant| &variant.identifier, "enum variant"
 
        )?;
 

	
 
        // Because we're parsing an enum, the programmer cannot put the
 
        // polymorphic variables inside the variants. But the polymorphic
 
        // variables might still be present as "marker types"
 
        self.check_poly_args_collision(modules, ctx, root_id, &definition.poly_vars)?;
 
        let poly_vars = Self::create_polymorphic_variables(&definition.poly_vars);
 

	
 
        self.lookup.insert(definition_id, DefinedType {
 
            ast_root: root_id,
 
            ast_definition: definition_id,
 
            definition: DefinedTypeVariant::Enum(EnumType{
 
                variants,
 
                monomorphs: Vec::new(),
 
            }),
 
            poly_vars,
 
            is_polymorph: false,
 
        });
 

	
 
        Ok(true)
 
    }
 

	
 
    /// Resolves the basic union definiton to an entry in the type table. It
 
    /// will not instantiate any monomorphized instances of polymorphic union
 
    /// definitions. If a subtype has to be resolved first then this function
 
    /// will return `false` after calling `ingest_resolve_result`.
 
    fn resolve_base_union_definition(&mut self, modules: &[Module], ctx: &mut PassCtx, root_id: RootId, definition_id: DefinitionId) -> Result<bool, ParseError> {
 
        debug_assert!(ctx.heap[definition_id].is_union());
 
        debug_assert!(!self.lookup.contains_key(&definition_id), "base union already resolved");
 

	
 
        let definition = ctx.heap[definition_id].as_union();
 

	
 
        // Make sure all embedded types are resolved
 
        for variant in &definition.variants {
 
            match &variant.value {
 
                UnionVariantValue::None => {},
 
                UnionVariantValue::Embedded(embedded) => {
 
                    for parser_type in embedded {
 
                        let resolve_result = self.resolve_base_parser_type(modules, ctx, root_id, parser_type, false)?;
 
                        if !self.ingest_resolve_result(modules, ctx, resolve_result)? {
 
                            return Ok(false)
 
                        }
 
                    }
 
                }
 
            }
 
        }
 

	
 
        // If here then all embedded types are resolved
 

	
 
        // Determine the union variants
 
        let mut tag_value = -1;
 
        let mut variants = Vec::with_capacity(definition.variants.len());
 
        for variant in &definition.variants {
 
            tag_value += 1;
 
            let embedded = match &variant.value {
 
                UnionVariantValue::None => { Vec::new() },
 
                UnionVariantValue::Embedded(embedded) => {
 
                    // Type should be resolvable, we checked this above
 
                    embedded.clone()
 
                },
 
            };
 

	
 
            variants.push(UnionVariant{
 
                identifier: variant.identifier.clone(),
 
                embedded,
 
                tag_value,
 
                exists_in_heap: false
 
            })
 
        }
 

	
 
        // Ensure union names and polymorphic args do not conflict
 
        self.check_identifier_collision(
 
            modules, root_id, &variants, |variant| &variant.identifier, "union variant"
 
        )?;
 
        self.check_poly_args_collision(modules, ctx, root_id, &definition.poly_vars)?;
 

	
 
        // Construct polymorphic variables and mark the ones that are in use
 
        let mut poly_vars = Self::create_polymorphic_variables(&definition.poly_vars);
 
        for variant in &variants {
 
            for parser_type in &variant.embedded {
 
                Self::mark_used_polymorphic_variables(&mut poly_vars, parser_type);
 
            }
 
        }
 
        let is_polymorph = poly_vars.iter().any(|arg| arg.is_in_use);
 

	
 
        // Insert base definition in type table
 
        self.lookup.insert(definition_id, DefinedType {
 
            ast_root: root_id,
 
            ast_definition: definition_id,
 
            definition: DefinedTypeVariant::Union(UnionType{
 
                variants,
 
                monomorphs: Vec::new(),
 
                requires_allocation: false,
 
                contains_unallocated_variant: true
 
            }),
 
            poly_vars,
 
            is_polymorph,
 
        });
 

	
 
        Ok(true)
 
    }
 

	
 
    /// Resolves the basic struct definition to an entry in the type table. It
 
    /// will not instantiate any monomorphized instances of polymorphic struct
 
    /// definitions.
 
    fn resolve_base_struct_definition(&mut self, modules: &[Module], ctx: &mut PassCtx, root_id: RootId, definition_id: DefinitionId) -> Result<bool, ParseError> {
 
        debug_assert!(ctx.heap[definition_id].is_struct());
 
        debug_assert!(!self.lookup.contains_key(&definition_id), "base struct already resolved");
 

	
 
        let definition = ctx.heap[definition_id].as_struct();
 

	
 
        // Make sure all fields point to resolvable types
 
        for field_definition in &definition.fields {
 
            let resolve_result = self.resolve_base_parser_type(modules, ctx, root_id, &field_definition.parser_type, false)?;
 
            if !self.ingest_resolve_result(modules, ctx, resolve_result)? {
 
                return Ok(false)
 
            }
 
        }
 

	
 
        // All fields types are resolved, construct base type
 
        let mut fields = Vec::with_capacity(definition.fields.len());
 
        for field_definition in &definition.fields {
 
            fields.push(StructField{
 
                identifier: field_definition.field.clone(),
 
                parser_type: field_definition.parser_type.clone(),
 
            })
 
        }
 

	
 
        // And make sure no conflicts exist in field names and/or polymorphic args
 
        self.check_identifier_collision(
 
            modules, root_id, &fields, |field| &field.identifier, "struct field"
 
        )?;
 
        self.check_poly_args_collision(modules, ctx, root_id, &definition.poly_vars)?;
 

	
 
        // Construct representation of polymorphic arguments
 
        let mut poly_vars = Self::create_polymorphic_variables(&definition.poly_vars);
 
        for field in &fields {
 
            Self::mark_used_polymorphic_variables(&mut poly_vars, &field.parser_type);
 
        }
 

	
 
        let is_polymorph = poly_vars.iter().any(|arg| arg.is_in_use);
 

	
 
        self.lookup.insert(definition_id, DefinedType{
 
            ast_root: root_id,
 
            ast_definition: definition_id,
 
            definition: DefinedTypeVariant::Struct(StructType{
 
                fields,
 
                monomorphs: Vec::new(),
 
            }),
 
            poly_vars,
 
            is_polymorph,
 
        });
 

	
 
        Ok(true)
 
    }
 

	
 
    /// Resolves the basic function definition to an entry in the type table. It
 
    /// will not instantiate any monomorphized instances of polymorphic function
 
    /// definitions.
 
    fn resolve_base_function_definition(&mut self, modules: &[Module], ctx: &mut PassCtx, root_id: RootId, definition_id: DefinitionId) -> Result<bool, ParseError> {
 
        debug_assert!(ctx.heap[definition_id].is_function());
 
        debug_assert!(!self.lookup.contains_key(&definition_id), "base function already resolved");
 

	
 
        let definition = ctx.heap[definition_id].as_function();
 

	
 
        // Check the return type
 
        debug_assert_eq!(definition.return_types.len(), 1, "not one return type"); // TODO: @ReturnValues
 
        let resolve_result = self.resolve_base_parser_type(modules, ctx, root_id, &definition.return_types[0], definition.builtin)?;
 
        if !self.ingest_resolve_result(modules, ctx, resolve_result)? {
 
            return Ok(false)
 
        }
 

	
 
        // Check the argument types
 
        for param_id in &definition.parameters {
 
            let param = &ctx.heap[*param_id];
 
            let resolve_result = self.resolve_base_parser_type(modules, ctx, root_id, &param.parser_type, definition.builtin)?;
 
            if !self.ingest_resolve_result(modules, ctx, resolve_result)? {
 
                return Ok(false)
 
            }
 
        }
 

	
 
        // Construct arguments to function
 
        let mut arguments = Vec::with_capacity(definition.parameters.len());
 
        for param_id in &definition.parameters {
 
            let param = &ctx.heap[*param_id];
 
            arguments.push(FunctionArgument{
 
                identifier: param.identifier.clone(),
 
                parser_type: param.parser_type.clone(),
 
            })
 
        }
 
@@ -732,193 +745,192 @@ impl TypeTable {
 
        }
 

	
 
        // Construct argument types
 
        let mut arguments = Vec::with_capacity(definition.parameters.len());
 
        for param_id in &definition.parameters {
 
            let param = &ctx.heap[*param_id];
 
            arguments.push(FunctionArgument{
 
                identifier: param.identifier.clone(),
 
                parser_type: param.parser_type.clone()
 
            })
 
        }
 

	
 
        // Check conflict of argument and polyarg identifiers
 
        self.check_identifier_collision(
 
            modules, root_id, &arguments, |arg| &arg.identifier, "component argument"
 
        )?;
 
        self.check_poly_args_collision(modules, ctx, root_id, &definition.poly_vars)?;
 

	
 
        // Construct polymorphic arguments
 
        let mut poly_vars = Self::create_polymorphic_variables(&definition.poly_vars);
 
        for argument in &arguments {
 
            Self::mark_used_polymorphic_variables(&mut poly_vars, &argument.parser_type);
 
        }
 

	
 
        let is_polymorph = poly_vars.iter().any(|v| v.is_in_use);
 

	
 
        // Construct entry in type table
 
        self.lookup.insert(definition_id, DefinedType{
 
            ast_root: root_id,
 
            ast_definition: definition_id,
 
            definition: DefinedTypeVariant::Component(ComponentType{
 
                variant: component_variant,
 
                arguments,
 
                monomorphs: Vec::new(),
 
            }),
 
            poly_vars,
 
            is_polymorph,
 
        });
 

	
 
        Ok(true)
 
    }
 

	
 
    /// Takes a ResolveResult and returns `true` if the caller can happily
 
    /// continue resolving its current type, or `false` if the caller must break
 
    /// resolving the current type and exit to the outer resolving loop. In the
 
    /// latter case the `result` value was `ResolveResult::Unresolved`, implying
 
    /// that the type must be resolved first.
 
    fn ingest_resolve_result(&mut self, modules: &[Module], ctx: &PassCtx, result: ResolveResult) -> Result<bool, ParseError> {
 
        match result {
 
            ResolveResult::Builtin | ResolveResult::PolymoprhicArgument => Ok(true),
 
            ResolveResult::Resolved(_, _) => Ok(true),
 
            ResolveResult::Unresolved(root_id, definition_id) => {
 
                if self.iter.contains(root_id, definition_id) {
 
                    // Cyclic dependency encountered
 
                    // TODO: Allow this
 
                    let module_source = &modules[root_id.index as usize].source;
 
                    let mut error = ParseError::new_error_str_at_span(
 
                        module_source, ctx.heap[definition_id].identifier().span,
 
                        "Evaluating this type definition results in a cyclic type"
 
                    );
 

	
 
                    for (breadcrumb_idx, (root_id, definition_id)) in self.iter.breadcrumbs.iter().enumerate() {
 
                        let msg = if breadcrumb_idx == 0 {
 
                            "The cycle started with this definition"
 
                        } else {
 
                            "Which depends on this definition"
 
                        };
 

	
 
                        let module_source = &modules[root_id.index as usize].source;
 
                        error = error.with_info_str_at_span(module_source, ctx.heap[*definition_id].identifier().span, msg);
 
                    }
 

	
 
                    Err(error)
 
                } else {
 
                    // Type is not yet resolved, so push IDs on iterator and
 
                    // continue the resolving loop
 
                    self.iter.push(root_id, definition_id);
 
                    Ok(false)
 
                }
 
            }
 
        }
 
    }
 

	
 
    /// Each type may consist of embedded types. If this type does not have a
 
    /// fixed implementation (e.g. an input port may have an embedded type
 
    /// indicating the type of messages, but it always exists in the runtime as
 
    /// a port identifier, so it has a fixed implementation) then this function
 
    /// will traverse the embedded types to ensure all of them are resolved.
 
    ///
 
    /// Hence if one checks a particular parser type for being resolved, one may
 
    /// get back a result value indicating an embedded type (with a different
 
    /// DefinitionId) is unresolved.
 
    fn resolve_base_parser_type(
 
        &mut self, modules: &[Module], ctx: &PassCtx, root_id: RootId,
 
        parser_type: &ParserType, is_builtin: bool
 
    ) -> Result<ResolveResult, ParseError> {
 
        // Note that as we iterate over the elements of a
 
        use ParserTypeVariant as PTV;
 

	
 
        // Result for the very first time we resolve a type (i.e the outer type
 
        // that we're actually looking up)
 
        let mut resolve_result = None;
 
        let mut set_resolve_result = |v: ResolveResult| {
 
            if resolve_result.is_none() { resolve_result = Some(v); }
 
        };
 

	
 
        for element in parser_type.elements.iter() {
 
            match element.variant {
 
                PTV::Void | PTV::InputOrOutput | PTV::ArrayLike | PTV::IntegerLike => {
 
                    if is_builtin {
 
                        set_resolve_result(ResolveResult::Builtin);
 
                    } else {
 
                        unreachable!("compiler-only ParserTypeVariant within type definition");
 
                    }
 
                },
 
                PTV::Message | PTV::Bool |
 
                PTV::UInt8 | PTV::UInt16 | PTV::UInt32 | PTV::UInt64 |
 
                PTV::SInt8 | PTV::SInt16 | PTV::SInt32 | PTV::SInt64 |
 
                PTV::Character | PTV::String |
 
                PTV::Array | PTV::Input | PTV::Output => {
 
                    // Nothing to do: these are builtin types or types with a
 
                    // fixed implementation
 
                    set_resolve_result(ResolveResult::Builtin);
 
                },
 
                PTV::IntegerLiteral | PTV::Inferred => {
 
                    // As we're parsing the type definitions where these kinds
 
                    // of types are impossible/disallowed to express:
 
                    unreachable!("illegal ParserTypeVariant within type definition");
 
                },
 
                PTV::PolymorphicArgument(_, _) => {
 
                    set_resolve_result(ResolveResult::PolymoprhicArgument);
 
                },
 
                PTV::Definition(embedded_id, _) => {
 
                    let definition = &ctx.heap[embedded_id];
 
                    if !(definition.is_struct() || definition.is_enum() || definition.is_union()) {
 
                        let module_source = &modules[root_id.index as usize].source;
 
                        return Err(ParseError::new_error_str_at_span(
 
                            module_source, element.element_span, "expected a datatype (struct, enum or union)"
 
                        ))
 
                    }
 

	
 
                    if self.lookup.contains_key(&embedded_id) {
 
                        set_resolve_result(ResolveResult::Resolved(definition.defined_in(), embedded_id))
 
                    } else {
 
                        return Ok(ResolveResult::Unresolved(definition.defined_in(), embedded_id))
 
                    }
 
                }
 
            }
 
        }
 

	
 
        // If here then all types in the embedded type's tree were resolved.
 
        debug_assert!(resolve_result.is_some(), "faulty logic in ParserType resolver");
 
        return Ok(resolve_result.unwrap())
 
    }
 

	
 
    /// Go through a list of identifiers and ensure that all identifiers have
 
    /// unique names
 
    fn check_identifier_collision<T: Sized, F: Fn(&T) -> &Identifier>(
 
        &self, modules: &[Module], root_id: RootId, items: &[T], getter: F, item_name: &'static str
 
    ) -> Result<(), ParseError> {
 
        for (item_idx, item) in items.iter().enumerate() {
 
            let item_ident = getter(item);
 
            for other_item in &items[0..item_idx] {
 
                let other_item_ident = getter(other_item);
 
                if item_ident == other_item_ident {
 
                    let module_source = &modules[root_id.index as usize].source;
 
                    return Err(ParseError::new_error_at_span(
 
                        module_source, item_ident.span, format!("This {} is defined more than once", item_name)
 
                    ).with_info_at_span(
 
                        module_source, other_item_ident.span, format!("The other {} is defined here", item_name)
 
                    ));
 
                }
 
            }
 
        }
 

	
 
        Ok(())
 
    }
 

	
 
    /// Go through a list of polymorphic arguments and make sure that the
 
    /// arguments all have unique names, and the arguments do not conflict with
 
    /// any symbols defined at the module scope.
 
    fn check_poly_args_collision(
 
        &self, modules: &[Module], ctx: &PassCtx, root_id: RootId, poly_args: &[Identifier]
 
    ) -> Result<(), ParseError> {
 
        // Make sure polymorphic arguments are unique and none of the
 
        // identifiers conflict with any imported scopes
 
        for (arg_idx, poly_arg) in poly_args.iter().enumerate() {
 
            for other_poly_arg in &poly_args[..arg_idx] {
 
                if poly_arg == other_poly_arg {
 
                    let module_source = &modules[root_id.index as usize].source;
 
                    return Err(ParseError::new_error_str_at_span(
 
                        module_source, poly_arg.span,
 
                        "This polymorphic argument is defined more than once"
src/protocol/tests/mod.rs
Show inline comments
 
/**
 
 * protocol/tests.rs
 
 *
 
 * Contains tests for various parts of the lexer/parser and the evaluator of the
 
 * code. These are intended to be temporary tests such that we're sure that we
 
 * don't break existing functionality.
 
 *
 
 * In the future these should be replaced by proper testing protocols.
 
 *
 
 * If any of these tests fail, and you think they're not needed anymore, feel
 
 * free to cast them out into oblivion, where dead code goes to die.
 
 */
 

	
 
mod utils;
 
mod lexer;
 
mod parser_validation;
 
mod parser_inference;
 
mod parser_monomorphs;
 
mod parser_imports;
 
mod parser_binding;
 
mod parser_imports;
 
mod parser_inference;
 
mod parser_literals;
 
mod eval_operators;
 
mod parser_monomorphs;
 
mod parser_types;
 
mod parser_validation;
 
mod eval_binding;
 
mod eval_calls;
 
mod eval_casting;
 
mod eval_binding;
 
mod eval_operators;
 
mod eval_silly;
 

	
 
pub(crate) use utils::{Tester}; // the testing harness
 
pub(crate) use crate::protocol::eval::value::*; // to test functions
 
\ No newline at end of file
src/protocol/tests/parser_types.rs
Show inline comments
 
new file 100644
 
/// parser_types.rs
 
///
 
/// Tests for (infinite) type construction
 

	
 
use super::*;
 

	
 
#[test]
 
fn test_invalid_struct_type_loops() {
 
    Tester::new_single_source_expect_err(
 
        "self-referential struct",
 
        "struct Foo { Foo foo }"
 
    ).error(|e| { e
 
        .assert_occurs_at(0, "Foo {")
 
        .assert_msg_has(0, "cyclic type");
 
    });
 
}
 

	
 
#[test]
 
fn test_invalid_union_type_loops() {
 
    Tester::new_single_source_expect_err(
 
        "self-referential union",
 
        "union Foo { One(Foo) }"
 
    ).error(|e| { e
 
        .assert_occurs_at(0, "Foo {")
 
        .assert_msg_has(0, "cyclic type");
 
    });
 

	
 
    Tester::new_single_source_expect_err(
 
        "two-cycle union",
 
        "
 
        union UOne { Two(UTwo) }
 
        union UTwo { One(UOne) }
 
        "
 
    ).error(|e| { e
 
        .assert_occurs_at(0, "UOne")
 
        .assert_msg_has(0, "cyclic type");
 
    });
 
}
 
\ No newline at end of file
0 comments (0 inline, 0 general)