
Parsing Generics is hard.

One of my summer projects has been working on a programming language called Geode. Here's a simple look at it in its current state:

is main
include "std:io"


func fib(int n) int {
	if n < 2 {
		return n;
	}
	return fib(n - 1) + fib(n - 2);
}

func main(int argc) int {
	int a;
	a <- 30;
	io:print("%d", fib(a));
	return 0;
}

It’s pretty simple right now: there are no structs, and you cannot assign to an array index (though you can read from one). I also want to do some more work with generics, which is the topic of this post.

Here’s what I’d like to have regarding Generics:

# std:mem
is mem
func get<T>(int size) *T {
	# ...
}


# ./main.g
is main
include "std:mem"
func getData *int -> mem:get<int>(300);

The std:mem standard library will contain a get function that takes a type and a count and returns a pointer to memory holding that many values of that type. (I’ll probably do another post about the dependency system later, but it’s much like C’s, with simpler namespacing than C++.)

Well, as it turns out, the < operator is used in more places than you’d think. It’s the less-than operator, as in foo < bar, and it also opens the type list in the generic call above. This makes it difficult to tell whether a given < is a comparison or the start of a generic function call. My lexer is contextless, so at lex time it cannot tell whether a < token begins a generic argument list or is just a less-than sign.
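
To make the lexing problem concrete, here is a minimal sketch of a contextless lexer in Go. The Token type, the TokPunct kind, and the lex function are made up for illustration and are not Geode's actual lexer. The only point is that both kinds of statement produce an identical < token.

```go
package main

import (
	"fmt"
	"unicode"
)

// TokKind and Token are hypothetical stand-ins for the real lexer's types.
type TokKind int

const (
	TokIdent TokKind = iota
	TokOper
	TokPunct
)

type Token struct {
	Kind  TokKind
	Value string
}

// lex classifies each token by its own characters only. It never looks at
// neighbouring tokens, so every '<' comes out as the same TokOper.
func lex(src string) []Token {
	var toks []Token
	runes := []rune(src)
	for i := 0; i < len(runes); {
		r := runes[i]
		switch {
		case unicode.IsSpace(r):
			i++
		case unicode.IsLetter(r) || r == '_':
			start := i
			for i < len(runes) && (unicode.IsLetter(runes[i]) || unicode.IsDigit(runes[i]) || runes[i] == '_') {
				i++
			}
			toks = append(toks, Token{TokIdent, string(runes[start:i])})
		case r == '<' || r == '>':
			toks = append(toks, Token{TokOper, string(r)})
			i++
		default:
			toks = append(toks, Token{TokPunct, string(r)})
			i++
		}
	}
	return toks
}

func main() {
	generic := lex("foo<int>()")
	compare := lex("foo < bar")
	// The token after `foo` is indistinguishable in the two inputs.
	fmt.Println(generic[1] == compare[1])
}
```

Whatever decides between "generic" and "less than" has to live in the parser, because by the time the lexer is done the two cases already look the same.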

The big problem

The big problem lies here:

mem:get<int>(300);
       ^   ^

When my parser comes across an identifier token (a keyword, function name, variable name, etc.), it calls a single function, and that function makes every decision about the structure that follows. Because of this design, a limitation shared by most recursive descent parsers, the parser cannot tell the difference between these two statements:

# stmt 1
int a := foo<int>();

# stmt 2
int a := foo < bar;

This is a major problem, as it limits what I can do with the language. Currently, when my parser comes across an identifier, it takes one of three paths after parsing the identifier's name. It can:

  • parse parentheses (meaning this identifier leads to a function call)
  • parse a generic
  • stop because neither of the previous cases applies, resulting in a variable reference node
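
As a rough sketch of why that choice gets stuck, assume the path is picked from the single token that follows the name. The pathAfterName function and its string tokens are hypothetical and much simpler than the real parser:

```go
package main

import "fmt"

// pathAfterName picks one of the three paths by looking only at the token
// that follows the identifier at index i. Hypothetical helper, not Geode's
// parser.
func pathAfterName(toks []string, i int) string {
	if i+1 >= len(toks) {
		return "variable reference"
	}
	switch toks[i+1] {
	case "(":
		return "function call"
	case "<":
		return "generic"
	default:
		return "variable reference"
	}
}

func main() {
	stmt1 := []string{"foo", "<", "int", ">", "(", ")"} // foo<int>()
	stmt2 := []string{"foo", "<", "bar"}                // foo < bar

	// Both statements take the same path, although only stmt1 is a generic.
	fmt.Println(pathAfterName(stmt1, 0))
	fmt.Println(pathAfterName(stmt2, 0))
}
```

With only that one token to go on, stmt 1 and stmt 2 are sent down the same path, so the generic path has to be able to fail and back out.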

I managed to get it somewhat working by adding a save/restore system to my parser: I can save the parser's state and return to that state if parsing goes somewhere it shouldn’t have. For example, this is how I determine whether the node has generics:

func (p *Parser) parseIdentifierExpr(allowVariableDefn bool) Node {
	// ...
	name := p.parseName()
	// An array of generic symbols
	var generics []*GenericSymbol
	// Save the parser's state
	state := p.save()
	genValid := false
	// parse the generic if I can
	if p.token.Is(lexer.TokOper) && p.token.Value == "<" {
		generics, genValid = p.parseGenericExpression(false)
	}
	// If the generic parsing failed, restore state
	if !genValid {
		p.restore(state)
	}
	// ...

This works for basic function calls, but it falls apart on more complex code:

int foo := bar < baz > quux;

This would parse as a generic expression, even though it is not one, because the token layout of a generic expression is

TokIdent TokOper(<) TokIdent [TokComma TokIdent]... TokOper(>)

which matches the above code perfectly well.
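
Here is a self-contained toy version of the save/restore approach that reproduces the false positive. Everything in it (the Parser struct whose only state is a position, the Token type, the id/op/other helpers, and parseGenerics) is an assumption made for illustration; the real parseGenericExpression is surely more involved. Each token slice holds the tokens that come after the already-parsed name.

```go
package main

import "fmt"

// Hypothetical token types standing in for the real lexer package.
type TokKind int

const (
	TokIdent TokKind = iota
	TokOper
	TokComma
	TokOther
)

type Token struct {
	Kind  TokKind
	Value string
}

func id(v string) Token    { return Token{TokIdent, v} }
func op(v string) Token    { return Token{TokOper, v} }
func other(v string) Token { return Token{TokOther, v} }

// Parser is a toy: its entire state is an index into the token slice.
type Parser struct {
	toks []Token
	pos  int
}

func (p *Parser) save() int         { return p.pos }
func (p *Parser) restore(state int) { p.pos = state }

// peek returns the current token, or a TokOther sentinel at end of input.
func (p *Parser) peek() Token {
	if p.pos < len(p.toks) {
		return p.toks[p.pos]
	}
	return Token{TokOther, ""}
}

func (p *Parser) isOper(v string) bool {
	t := p.peek()
	return t.Kind == TokOper && t.Value == v
}

// parseGenerics speculatively consumes `< ident [, ident]... >`.
// On any mismatch it restores the saved position and reports failure.
func (p *Parser) parseGenerics() ([]string, bool) {
	state := p.save()
	if !p.isOper("<") {
		return nil, false
	}
	p.pos++
	var names []string
	for {
		if p.peek().Kind != TokIdent {
			p.restore(state)
			return nil, false
		}
		names = append(names, p.peek().Value)
		p.pos++
		if p.peek().Kind != TokComma {
			break
		}
		p.pos++
	}
	if !p.isOper(">") {
		p.restore(state)
		return nil, false
	}
	p.pos++
	return names, true
}

func main() {
	cases := []struct {
		src  string
		toks []Token
	}{
		{"foo<int>()", []Token{op("<"), id("int"), op(">"), other("("), other(")")}},
		{"bar < baz > quux", []Token{op("<"), id("baz"), op(">"), id("quux")}},
		{"foo < bar", []Token{op("<"), id("bar")}},
	}
	for _, c := range cases {
		p := &Parser{toks: c.toks}
		names, ok := p.parseGenerics()
		fmt.Printf("%-18s generic=%v names=%v pos=%d\n", c.src, ok, names, p.pos)
	}
}
```

The first two inputs both come back as a successful generic parse, and only the third triggers a restore. Backtracking rescues foo < bar, but it cannot rescue bar < baz > quux, because nothing in the generic layout itself ever fails there.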

My solution

Turns out, I haven’t found one beyond the state saving and restoring described above. I need to look at other languages that have generics to see how they handle this.


And as I was writing this, I found another bit of code that matches the structure required for a generic:

someFunction(a < foo, foo, foo > b);

yay