Lexing
Lexing is the act of taking in an input stream and splitting it into lexemes.
Colloquially, lexing is often described as splitting input into words. In
grmtools
, a Lexeme has a type (e.g. "INT", "ID"), a value (e.g. "23",
"xyz"), and knows which part of the user's input matched (e.g. "the input
starting at index 7 to index 10"). There is also a simple mechanism to
differentiate lexemes of zero length (e.g. DEDENT
tokens in Python) from
lexemes inserted by error recovery.
Users can write custom lexers that conform to the lrpar::lex::Lexer
trait.
This API allows users to deal with streaming data since the parser asks the
Lexer
for one token at a time. However, note that users can later ask the
Lexer
to return the string from the input matching a lexeme: users need to
buffer input to provide this information.
Hand-written lexers are not particularly difficult to write and, for better or
worse, are necessary for many real-world languages. However, a subset of
languages can use a simpler lex
/flex
style approach to lexing, for which
lrlex
can be used.