How the Glimmer.js compiler powers the Ember.js ecosystem

  Ember.js might not be the hottest trend in town, but it is battle-hardened and packed with more than a decade of production learning and best practices. In this article we look at Glimmer VM, the virtual machine that powers Ember.js. Anyone who has visited a site like LinkedIn has almost certainly had a page rendered by this code.

  The main repo hosts multiple packages; we are going to focus on the parser and the compiler. The compiler is a multi-pass package that is part of Glimmer VM. It converts the template into Glimmer binary bytecode, known as the wire format. Because it is an optimizing compiler, it needs to know the full list of templates, resolved recursively, beforehand. The final output is saved in .gbx format and served to the client by frameworks like Ember.js.

Precompiler

During this conversion, the template goes through a state called the intermediate representation (IR), which holds symbolic references to handles for things like templates and helpers. This phase is called precompile and is provided by the @glimmer/compiler package. During the second pass, linking resolves all the references. The references are stored in the external module table, a data structure that enables Glimmer bytecode to execute in a JavaScript VM. This helps the program to be rehydrated with minimal overhead.
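
To make the two passes concrete, here is a minimal sketch of a linking step. The names SymbolicInstruction, LinkedInstruction and link are hypothetical and are not part of Glimmer; the sketch only shows the idea of replacing symbolic names with numeric handles that index into a table.

```typescript
// Hypothetical shapes for illustration only; these are not Glimmer's real types.
interface SymbolicInstruction {
  op: string;
  ref?: string; // symbolic name of a helper, template, etc.
}

interface LinkedInstruction {
  op: string;
  handle?: number; // index into the table returned by link()
}

// Second pass: swap every symbolic name for a numeric handle,
// reusing the same handle when a name appears more than once.
function link(ir: SymbolicInstruction[]): { program: LinkedInstruction[]; table: string[] } {
  const table: string[] = [];
  const handles = new Map<string, number>();

  const program = ir.map((instruction): LinkedInstruction => {
    if (instruction.ref === undefined) {
      return { op: instruction.op };
    }
    let handle = handles.get(instruction.ref);
    if (handle === undefined) {
      handle = table.length;
      table.push(instruction.ref);
      handles.set(instruction.ref, handle);
    }
    return { op: instruction.op, handle };
  });

  return { program, table };
}
```

After linking, the program carries only numbers, and the table is the single place where names are looked up at run time.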

Let's step into each stage:

@glimmer/syntax overview

  It has multiple submodules, such as parser, which converts source to an AST, and generation, which goes in the reverse direction. It also has a few other submodules such as traversal and source. We are ignoring ASTv1 for now and considering only the ASTv2 flow.

  The main parsing task is split into tokenization and AST conversion. Tokenization is a very compute-intensive task, as it is in any tool that processes source code. For example, its cost is strongly highlighted in the code comments of the PostCSS library, and it is reflected in that library's tokenizer code. We will discuss this in a separate article.

Detailed discussion

Before we get into the details, keep in mind that much of the IR takes the form of a tuple (in TypeScript terms): an array of fixed length with predefined types for its elements.
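
As a quick illustration of what such a tuple looks like, here is a small sketch. The opcode numbers and the element names are made up for this example and are not copied from the Glimmer type definitions.

```typescript
// Illustrative tuple shapes; opcode values and labels are assumptions.
type AppendStatement = [op: 1, value: string, flag: boolean];
type OpenElementStatement = [op: 9, tag: string, flag: boolean];
type IrStatement = AppendStatement | OpenElementStatement;

// The first element works as a discriminant, so TypeScript can narrow
// the type of the remaining elements once we check it.
function describeStatement(statement: IrStatement): string {
  if (statement[0] === 1) {
    return `append "${statement[1]}"`;
  }
  return `open <${statement[1]}>`;
}
```

Compared with objects, tuples drop the property names from the serialized output, which keeps the payload small.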

Now let's get into the nitty-gritty of the whole process.

  The story begins with the precompile function in @glimmer/compiler, which takes optional arguments to set the id and a custom component name. Along with these params, we can pass in plugins for AST transformations, which we will discuss later. In short, the source code is wrapped in a Source and passed to the normalize method, which converts it to ASTv2. The Source helps to track the location of tokens in the source code for debugging.

  Next, the flow passes to @glimmer/syntax to do the preprocessing in precompile mode, which directly uses @handlebars/parser to generate the AST. You can explore the return value on astexplorer.net with the Handlebars option selected. The offset of each token, in line and column form, is computed using the SourceSpan class and added to loc in the AST. We are not going into the codemod options here.
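
The offset-to-location conversion can be sketched as follows. This is not the SourceSpan implementation; offsetToPosition is a hypothetical helper, and the 1-based line with 0-based column convention is an assumption made for this sketch.

```typescript
interface Position {
  line: number; // 1-based (assumed convention)
  column: number; // 0-based (assumed convention)
}

// Hypothetical helper: walk the source up to the offset and count newlines.
function offsetToPosition(source: string, offset: number): Position {
  if (offset < 0 || offset > source.length) {
    throw new RangeError(`offset ${offset} is outside the source`);
  }
  let line = 1;
  let column = 0;
  for (let i = 0; i < offset; i++) {
    if (source[i] === "\n") {
      line += 1;
      column = 0;
    } else {
      column += 1;
    }
  }
  return { line, column };
}

const template = "<p>\n  {{count}}\n</p>";
const start = offsetToPosition(template, template.indexOf("{{count}}"));
// start is { line: 2, column: 2 }
```

A real implementation would likely precompute the line start offsets once instead of rescanning the source for every token.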

Now, this AST is of type HBS.Program, for which there is a HandlebarsNodeVisitors class that handles the different node types such as Program, BlockStatement, StringLiteral, PathExpression etc. These visitors convert it to an ASTv1.Template. Finally, the plugins that were passed via the optional precompile params are executed sequentially in a loop. This could be a separate topic in itself, covering custom codemods or pushing any custom syntax into a separate entity/file for further processing.
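
The sequential plugin loop can be pictured with a simplified sketch. The node shapes and the AstPlugin signature below are assumptions for illustration; real Glimmer AST plugins have a different, richer interface.

```typescript
// Simplified node shapes, assumed for this sketch only.
interface TextNode {
  type: "TextNode";
  chars: string;
}
interface ElementNode {
  type: "ElementNode";
  tag: string;
  children: AstNode[];
}
type AstNode = TextNode | ElementNode;

// Hypothetical plugin signature: take a tree, return a transformed tree.
type AstPlugin = (node: AstNode) => AstNode;

// Apply a function to the text of every TextNode in the tree.
function mapText(node: AstNode, fn: (chars: string) => string): AstNode {
  if (node.type === "TextNode") {
    return { type: "TextNode", chars: fn(node.chars) };
  }
  return { ...node, children: node.children.map((child) => mapText(child, fn)) };
}

const collapseWhitespace: AstPlugin = (node) =>
  mapText(node, (chars) => chars.replace(/\s+/g, " "));
const trimText: AstPlugin = (node) => mapText(node, (chars) => chars.trim());

// Each plugin receives the output of the previous one, so order matters.
function runPlugins(ast: AstNode, plugins: AstPlugin[]): AstNode {
  let current = ast;
  for (const plugin of plugins) {
    current = plugin(current);
  }
  return current;
}
```

Running collapseWhitespace and then trimText over a paragraph is a tiny version of the HTML minification use case.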

The final representation, the result of precompile, is expected to look like this:

wire-format.d.ts

export interface SerializedTemplateWithLazyBlock {
  id?: Nullable<string>;
  block: SerializedTemplateBlockJSON;   // Stringified JSON, alias for `string`
  moduleName: string;
  scope?: (() => unknown[]) | undefined | null;
  isStrictMode: boolean;
}
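
Since block is a stringified JSON, a consumer has to parse it before use. The sketch below uses a trimmed-down, hypothetical copy of the interface, and the decoded shape follows the example in this article rather than the actual Glimmer type definitions.

```typescript
// Hypothetical, trimmed-down mirror of the interface above.
interface SerializedTemplateSketch {
  id?: string | null;
  block: string; // stringified JSON
  moduleName: string;
  isStrictMode: boolean;
}

// Decoded shape assumed from the example in this article.
interface DecodedBlock {
  statements: unknown[];
  upvars: string[];
}

function decodeBlock(template: SerializedTemplateSketch): DecodedBlock {
  return JSON.parse(template.block) as DecodedBlock;
}

const sample: SerializedTemplateSketch = {
  id: "demo",
  block: JSON.stringify({ statements: [[1, "Hello", true]], upvars: [] }),
  moduleName: "demo.hbs",
  isStrictMode: false,
};
const decoded = decodeBlock(sample);
```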

This stage is followed by a call to buildStatements, which returns WireFormat.Statement[].

Encoded Wire Format

  UI rendering can be done using the IR itself, but it is further optimized into opcodes so that a smaller payload ships to the client. A 6-bit representation is chosen, with a mapping between characters and binary numbers. Using that map, it can even be called Base89 encoding. There are fixed rules and values for primitives, expressions etc. The example given was:

template.hbs

<h1>Welcome to the Glimmer Playground!</h1>
<p>You have clicked the button {{count}} times.</p>
<button onclick={{action increment}}>Click</button>

which is converted to a wire-format:

wire-format.json

{
  "statements": [
    [9, "h1", true],
    [10],
    [1, "Welcome to the Glimmer Playground!", true],
    [11],
    [9, "p", true],
    [10],
    [1, "You have clicked the button ", true],
    [1, [26, 0, "AppendSingleId"], false], // this string isn't right
    [1, " times", true],
    [11],
    [9, "button", true],
    [18, "onclick", [31, [25, 1], [[25, 2]]]],
    [10],
    [1, "Click", true],
    [11]
  ],
  "upvars": ["count", "action", "increment"]
}

and Base89 encoded to:

output.txt

2qcount"qaction"qincrement"Iqh1"8qWelcome to the Glimmer Playground!"HIp"8q"You have clicked the button "0C08q times"HGbutton"Qonclick"UCaction"Cincrement"H8qClick"I
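
To get a feel for how the wire-format statements shown earlier can be consumed, here is a rough sketch that walks them and rebuilds a static HTML string. The opcode meanings (1 for append, 9 for open element, 10 for flush element, 11 for close element) are inferred from the example and are assumptions, not taken from the Glimmer source. Dynamic values are printed as placeholders, and every other opcode is ignored.

```typescript
type WireStatement = [number, ...unknown[]];

// Opcode meanings below are inferred from the example and are assumptions.
function renderStatic(statements: WireStatement[], upvars: string[]): string {
  const openTags: string[] = [];
  let html = "";
  for (const [op, ...args] of statements) {
    switch (op) {
      case 9: // open element: start the tag and remember its name
        openTags.push(String(args[0]));
        html += `<${String(args[0])}`;
        break;
      case 10: // flush element: finish the opening tag
        html += ">";
        break;
      case 11: // close element
        html += `</${openTags.pop() ?? ""}>`;
        break;
      case 1: {
        // append: either a static string or an expression tuple
        const value = args[0];
        if (typeof value === "string") {
          html += value;
        } else if (Array.isArray(value) && typeof value[1] === "number") {
          html += `{{${upvars[value[1]]}}}`; // placeholder for a dynamic value
        }
        break;
      }
      default:
        break; // attributes and other opcodes are skipped in this sketch
    }
  }
  return html;
}

const paragraph: WireStatement[] = [
  [9, "p", true],
  [10],
  [1, "You have clicked the button ", true],
  [1, [26, 0], false],
  [1, " times", true],
  [11],
];
const rendered = renderStatic(paragraph, ["count"]);
// rendered is "<p>You have clicked the button {{count}} times</p>"
```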

Conclusion

  This is the optimized payload sent to the client. Knowing how compilation works is not mandatory for learning Ember. But it comes in handy when doing optimizations such as HTML minification or extracting metadata from templates into a separate file during the build.

References