Generator

About

This is a parser generator for JavaScript.

Syntax

A grammar consists of definitions (production rules) of the form
name = expression
or
name : "name for errors" = expression
Names are identifiers: they may contain only letters (uppercase or lowercase), digits, and _, and they cannot start with a digit.
The optional string after the colon is used in error messages. Capitalize it as it would appear in the middle of a sentence.
The default starting production rule is main, but the generated parser can be called with a different starting rule, so parsing can begin from any production rule.
Important: whitespace in the input text is not treated differently from any other character, so you have to explicitly accept whitespace between tokens (TODO maybe add a way around this?).
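The naming rule described above can be written as a regular expression. This is a minimal sketch for checking rule names before putting them in a grammar; isValidRuleName is a hypothetical helper and not part of the generator, and it assumes that "alphanumeric" means ASCII letters and digits.

```javascript
// Hypothetical helper, not part of the generator.
// Assumption: only ASCII letters, digits, and _ are allowed,
// and the first character must not be a digit.
const RULE_NAME = /^[A-Za-z_][A-Za-z0-9_]*$/;

function isValidRuleName(name) {
    return RULE_NAME.test(name);
}

console.log(isValidRuleName("main"));     // true
console.log(isValidRuleName("_"));        // true
console.log(isValidRuleName("my_rule2")); // true
console.log(isValidRuleName("2fast"));    // false
console.log(isValidRuleName("my-rule"));  // false
```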

An expression can yield (emit) values that become part of the result of the generated parser. An expression can also yield nothing, or discard the yield of a sub-expression.
An expression can choose not to consume the input, leaving it to be consumed by another expression. Such an expression can still yield values, or yield nothing.
The output of the generated parser is an array-like object containing a list of yielded values.
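To make the difference between consuming and yielding concrete, here is a small hand-written model in JavaScript. It is only an illustration under stated assumptions: the names str, emit, and concat are made up for this sketch, and the generated parser's internals may work differently.

```javascript
// Illustrative model only; not the generator's implementation.
// A matcher takes (input, pos) and returns null on failure, or
// { pos, yields } on success: the new position and the yielded values.

// String: matches literal text, consumes it, yields nothing.
const str = (s) => (input, pos) =>
    input.startsWith(s, pos) ? { pos: pos + s.length, yields: [] } : null;

// Backquote emit: consumes nothing, yields a fixed JavaScript string.
const emit = (text) => (input, pos) => ({ pos, yields: [text] });

// Concatenation: runs the matchers in sequence and concatenates their
// yields. Fails as a whole if any sub-matcher fails.
const concat = (...matchers) => (input, pos) => {
    const yields = [];
    for (const m of matchers) {
        const r = m(input, pos);
        if (r === null) return null;
        pos = r.pos;
        yields.push(...r.yields);
    }
    return { pos, yields };
};

// Roughly corresponds to the grammar: "-" `negative` "1" `one`
const demo = concat(str("-"), emit("negative"), str("1"), emit("one"));

console.log(demo("-1", 0)); // { pos: 2, yields: [ 'negative', 'one' ] }
console.log(demo("+1", 0)); // null
```

The strings consume two characters in total but contribute nothing to the yields, while the two emits contribute values without consuming anything.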

Whitespace in the grammar of parsergen4 is significant in a few cases: a(b) is a named expression, but a (b) is a concatenation of a and b, with b in parentheses.

These are the expression types, in no particular order:

Name: String
Syntax: "string here"
Description: Matches the specified string. Uses JavaScript-like escape sequences. Yields nothing.

Name: Identifier
Syntax: name
Description: Matches the expression of the definition (production rule) with that name. If no such rule exists, the parser tries to get a JavaScript-implemented one from an optional function passed in the parser's arguments, which lets you implement custom matching logic in JavaScript.

Name: Concatenation
Syntax: expression1 expression2 ...
Description: Matches all the sub-expressions one after another. Concatenates the yields of the sub-expressions. Fails (does not match) if any sub-expression fails.

Name: Parentheses
Syntax: ( expression )
Description: Matches the sub-expression. Only changes the precedence of the expression.

Name: Choice
Syntax: expression1 | expression2 | ...
Description: Matches the first sub-expression that matches, or fails if none match.

Name: Repeat
Syntax: { expression }
Description: Greedily matches 0 or more repetitions of the sub-expression, as many as possible.

Name: Repeat 1 or more
Syntax: { expression }+
Description: Greedily matches 1 or more repetitions of the sub-expression, as many as possible.

Name: Optional
Syntax: [ expression ]
Description: Matches the sub-expression if possible; otherwise matches anyway but yields nothing.

Name: Not / Not followed by
Syntax: !expression
Description: Matches if the sub-expression does not match. Yields nothing. Because it only matches when the sub-expression fails, it never consumes anything.

Name: And / Followed by
Syntax: &expression
Description: Matches if the sub-expression matches, does not consume, and discards its yield (yields nothing).

Name: No emit / Skip
Syntax: !!expression
Description: Matches if the sub-expression matches, and consumes, but discards its yield (yields nothing).

Name: No consume / Yield following
Syntax: &&expression
Description: Matches if the sub-expression matches, does not consume, but yields anyway (yields the sub-expression's yield).

Name: Range
Syntax: A-B
Description: Matches any character in the inclusive range from the character before the hyphen to the character after the hyphen. Allows the same escape sequences as in strings, for example: \x00-\xFF

Name: Backquote emit
Syntax: `text to emit`
Description: Yields the specified text, which appears in the output of the generated parser as a JavaScript string. Does NOT allow escape sequences, so that code containing backslashes and the like can be emitted verbatim. This can be used to make the generated parser effectively a compiler, by concatenating the JavaScript strings emitted this way in the parser's output, but it does lead to messy grammar code.

Name: Emit matched text
Syntax: %expression
Description: Matches the sub-expression but discards its yield, and instead yields the input text that the sub-expression consumed, which appears in the output of the generated parser as a JavaScript string. Allows escape sequences. Can be used with backquote emits to make quick compilers.

Name: Named
Syntax: name(expression)
Description: Matches the sub-expression. Wraps the yield, a list of values, in a named value. Named values are explained later.
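The four prefix operators !, &, !! and && differ only in whether they consume input and whether they keep the sub-expression's yield. The sketch below models that with plain JavaScript functions. The names rangeText, not, and, skip, and peek are invented for this illustration, and the real generated parser may be implemented differently.

```javascript
// Illustrative model only; not the generator's implementation.
// A matcher returns null on failure or { pos, yields } on success.

// Like %A-B in the grammar: matches one character in the inclusive
// range [lo, hi] and yields the matched character as a string.
const rangeText = (lo, hi) => (input, pos) => {
    const ch = input[pos];
    return ch !== undefined && ch >= lo && ch <= hi
        ? { pos: pos + 1, yields: [ch] }
        : null;
};

// !e : succeeds only if e fails; never consumes, yields nothing.
const not = (e) => (input, pos) =>
    e(input, pos) === null ? { pos, yields: [] } : null;

// &e : succeeds if e succeeds; does not consume, yields nothing.
const and = (e) => (input, pos) =>
    e(input, pos) === null ? null : { pos, yields: [] };

// !!e : succeeds if e succeeds; consumes, yields nothing.
const skip = (e) => (input, pos) => {
    const r = e(input, pos);
    return r === null ? null : { pos: r.pos, yields: [] };
};

// &&e : succeeds if e succeeds; does not consume, keeps e's yield.
const peek = (e) => (input, pos) => {
    const r = e(input, pos);
    return r === null ? null : { pos, yields: r.yields };
};

const digit = rangeText("0", "9");
console.log(not(digit)("7", 0));  // null
console.log(and(digit)("7", 0));  // { pos: 0, yields: [] }
console.log(skip(digit)("7", 0)); // { pos: 1, yields: [] }
console.log(peek(digit)("7", 0)); // { pos: 0, yields: [ '7' ] }
```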

Comments start with #.

Named values

A named value is a grouped yield (list of values) with a name. In the parser's output it is an array-like object with the list of values.
Accessing a property on a named value returns all the named values inside it that have that name, concatenated into a single named value whose yields are the concatenation of theirs.
Named values in the output also have an .emitted property, which returns the emitted JavaScript strings joined together, and a .text property, which returns the input text consumed by the named value.
The value returned by the generated parser behaves the same way as a named value.
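The behaviour described above can be modelled with a JavaScript Proxy. This is only a sketch of how such an object might work, not the generator's actual code. The namedValue factory is hypothetical. It also assumes that .emitted includes strings from nested named values, that merged values join their .text, and that looking up a name with no matches returns undefined.

```javascript
// Hypothetical model of a named value; the real object may differ.
// items holds emitted JavaScript strings and nested named values.
function namedValue(name, items, text) {
    return new Proxy({ name, items, text }, {
        get(t, prop) {
            if (prop === "name" || prop === "text") return t[prop];
            if (prop === "length") return t.items.length;
            if (prop === Symbol.iterator) return () => t.items[Symbol.iterator]();
            if (typeof prop === "symbol") return undefined;
            if (prop === "emitted") {
                // Assumption: nested named values contribute their strings too.
                return t.items
                    .map((v) => (typeof v === "string" ? v : v.emitted))
                    .join("");
            }
            if (/^\d+$/.test(prop)) return t.items[Number(prop)];
            // Any other property: merge the directly nested named values
            // that have this name into one named value.
            const matches = t.items.filter(
                (v) => typeof v !== "string" && v.name === prop
            );
            if (matches.length === 0) return undefined; // assumption
            return namedValue(
                prop,
                matches.flatMap((m) => [...m]),
                matches.map((m) => m.text).join("")
            );
        },
    });
}

const ast = namedValue("main", [
    namedValue("Assignment", [
        namedValue("Key", [], "a"),
        namedValue("Value", [namedValue("Integer", ["-", "4", "2"], "-42")], "-42"),
    ], "a = -42"),
    namedValue("Assignment", [
        namedValue("Key", [], "b"),
        namedValue("Value", [namedValue("Identifier", [], "c")], "c"),
    ], "b = c"),
], "a = -42 b = c");

console.log(ast.length);                   // 2
console.log(ast[0].Key.text);              // a
console.log(ast[0].Value.Integer.emitted); // -42
console.log(ast[1].Value.Integer);         // undefined
console.log(ast.Assignment.length);        // 4 (two Keys and two Values)
```

Because every property name other than the few reserved ones is treated as a lookup, a named value called text or emitted would collide with the built-in properties in this model.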

A relatively simple example:
Grammar:
    whitespace = { " " | "\t" | "\r" | "\n" } # Optional whitespace
    _ = whitespace # An alias for whitespace that I like to use
    integer = ["-" `-`] {%0-9}+ # Here we emit the digits as JavaScript strings by using %, you can also use backquote strings as shown for the - sign.
    identifier = !0-9 {( A-Z | a-z | 0-9 | "_" )}+ # Here I show that you can do it a different and easier way, just use the .text property instead in JavaScript, and you don't have to emit stuff in the grammar
    main = _ "{" _ Assignments({ assignment _ }) _ "}" _
    assignment = Assignment( Key(identifier) _ "=" _ Value(expression) )
    expression = Identifier(identifier) | Integer(integer)
The named values start with an uppercase letter to avoid name collisions with properties of the JavaScript named value object and to make the grammar easier to read, but they can even have the same name as production rules.
Test input:
    { my_thing = hello a = b c = d}
JavaScript example usage after generating the parser:
    const input_text = ` { my_thing = hello a = b c = d} `;
    const ast = parse(input_text);
    for (let assignment of ast.Assignments) {
        if (assignment.Value.Integer)
            console.log(assignment.Key.text, "=", assignment.Value.Integer.emitted); // Here I use .emitted
        else if (assignment.Value.Identifier)
            console.log(assignment.Key.text, "=", assignment.Value.Identifier.text); // Here I use .text instead to show the difference; it is easier because you don't need to emit/yield strings in the grammar
        else
            throw new Error("Should not happen");
    }

TODO: show example of idend needed, show example of simple compiler (bf to c?)