How the Glimmer.js compiler powers the Ember.js ecosystem
Ember.js might not be the hottest trend in town, but it is battle-hardened and packed with more than a decade of lessons and best practices from production. In this article we look at Glimmer VM, the virtual machine that powers Ember.js. Anyone who has visited a site like LinkedIn has very likely been served pages rendered by this code.
The main repo hosts multiple packages; we are going to focus on the Parser and the Compiler. The Compiler is a multi-pass compiler package that is part of Glimmer VM. It converts a template into Glimmer binary bytecode, which is called the wire format. Because this is an optimizing compiler, it needs to know the full list of templates, recursively, beforehand. The final output is saved in .gbx format and served to the client by frameworks like Ember.js.
Precompiler
During this conversion, the template goes through a state called the intermediate representation (IR), with symbolic references to handles for templates, helpers etc. This phase is called precompile and comes from the @glimmer/compiler package. During the second pass, linking is done to resolve all references. The references are stored in the External Module Table, a data structure that enables execution of Glimmer bytecode in a JavaScript VM. This helps the template to be rehydrated with minimal overhead.
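The two passes can be pictured with a small sketch. It is only an illustration of the idea (symbolic names first, handles into a table after linking); the type names and the `linkOps` function are hypothetical and are not the @glimmer/compiler API.

```typescript
// Hypothetical shapes for illustration only, not Glimmer's real types.
type SymbolicRef = { kind: "helper" | "component"; name: string };
type IrOp = ["call", SymbolicRef] | ["text", string];
type LinkedOp = ["call", number] | ["text", string];

// Pass 1 output: IR that still refers to a helper by name.
const unlinkedIr: IrOp[] = [
  ["text", "Hello "],
  ["call", { kind: "helper", name: "format-name" }],
  ["call", { kind: "helper", name: "format-name" }],
];

// Pass 2: replace each symbolic reference with a handle (an index)
// into a module table, reusing the handle for repeated references.
function linkOps(ops: IrOp[]): { ops: LinkedOp[]; moduleTable: string[] } {
  const moduleTable: string[] = [];
  const handles = new Map<string, number>();
  const linked = ops.map((op): LinkedOp => {
    if (op[0] === "text") return op;
    const key = `${op[1].kind}:${op[1].name}`;
    let handle = handles.get(key);
    if (handle === undefined) {
      handle = moduleTable.length;
      moduleTable.push(key);
      handles.set(key, handle);
    }
    return ["call", handle];
  });
  return { ops: linked, moduleTable };
}

const linkedResult = linkOps(unlinkedIr);
console.log(linkedResult.moduleTable, linkedResult.ops);
```

After linking, the ops carry only small integers, and the table is the single place that knows which helper each integer stands for.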
Let's step into each stage:
- First, the template string is converted into an AST using @glimmer/syntax and the Handlebars parser; details below.
- Next comes normalization, where the high-level IR (HIR) handles expressions like yield, has-block, #in-element etc.
- Now variable bindings and references are identified for linking. This leads to the mid-level IR (MIR).
- At last, encoding is done for transmission to the rendering pipeline.
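To make the stages concrete, here is a toy pipeline that starts from an already parsed AST. The node shapes and the function names (`toHir`, `toMir`, `encodeMir`) are assumptions made for this sketch; Glimmer's real HIR and MIR are far richer, and the real encoder does not simply emit JSON.

```typescript
// Toy node shapes, assumed for this sketch only.
type AstNode = { type: "Text"; chars: string } | { type: "Mustache"; path: string };
type HirNode = AstNode | { type: "Keyword"; name: string };
type MirNode =
  | { type: "Text"; chars: string }
  | { type: "Keyword"; name: string }
  | { type: "Var"; symbol: number };

const KEYWORDS = new Set(["yield", "has-block"]);

// Normalization: recognise keyword expressions (the HIR step).
function toHir(ast: AstNode[]): HirNode[] {
  return ast.map((n): HirNode =>
    n.type === "Mustache" && KEYWORDS.has(n.path) ? { type: "Keyword", name: n.path } : n
  );
}

// Binding resolution: replace variable names with slots in a table (the MIR step).
function toMir(hir: HirNode[]): { nodes: MirNode[]; upvars: string[] } {
  const upvars: string[] = [];
  const nodes = hir.map((n): MirNode => {
    if (n.type !== "Mustache") return n;
    let symbol = upvars.indexOf(n.path);
    if (symbol === -1) symbol = upvars.push(n.path) - 1;
    return { type: "Var", symbol };
  });
  return { nodes, upvars };
}

// Encoding: serialise for the renderer (plain JSON in this sketch).
function encodeMir(mir: { nodes: MirNode[]; upvars: string[] }): string {
  return JSON.stringify(mir);
}

const toyAst: AstNode[] = [
  { type: "Text", chars: "Clicked " },
  { type: "Mustache", path: "count" },
  { type: "Mustache", path: "yield" },
];
const encodedToy = encodeMir(toMir(toHir(toyAst)));
console.log(encodedToy);
```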
@glimmer/syntax overview
It has multiple submodules, such as parser to convert a template to an AST and generation to go in the reverse direction. Along the way it has a few other submodules like traversal, source etc. We are ignoring ASTv1 for now and considering only the ASTv2 flow.
The main parsing task itself is split into tokenization and AST conversion. Tokenization is a very compute-intensive task, which is common to all source-code processing. For example, its cost is strongly highlighted in the code comments of the PostCSS library, and it is reflected in that library's tokenizer code. We will discuss this in a separate article.
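As a rough picture of what tokenization means here, the sketch below splits a template into text and mustache tokens. It is a deliberately naive, hypothetical tokenizer (no blocks, comments or triple curlies) and is not how the Handlebars parser is implemented.

```typescript
// Hypothetical token shape for this sketch.
type TemplateToken = { kind: "text" | "mustache"; value: string; start: number };

function tokenizeTemplate(template: string): TemplateToken[] {
  const tokens: TemplateToken[] = [];
  let pos = 0;
  while (pos < template.length) {
    const open = template.indexOf("{{", pos);
    if (open === -1) {
      // No more mustaches: the rest is plain text.
      tokens.push({ kind: "text", value: template.slice(pos), start: pos });
      break;
    }
    if (open > pos) {
      tokens.push({ kind: "text", value: template.slice(pos, open), start: pos });
    }
    const close = template.indexOf("}}", open + 2);
    if (close === -1) throw new Error(`Unclosed mustache at offset ${open}`);
    tokens.push({ kind: "mustache", value: template.slice(open + 2, close).trim(), start: open });
    pos = close + 2;
  }
  return tokens;
}

const sampleTemplate = "<p>You have clicked the button {{count}} times.</p>";
const sampleTokens = tokenizeTemplate(sampleTemplate);
console.log(sampleTokens);
```

Even this simple version scans every character of the input, which hints at why real tokenizers are written with so much care for speed.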
Detailed discussion
Before we get into the details, keep in mind that many IRs take the form of a tuple (in TypeScript terms): an array of fixed length with predefined types for its elements.
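A minimal sketch of such a tuple-shaped IR in TypeScript follows. The opcode values 1 and 9 are borrowed from the wire-format sample later in this article; the type names and the meaning of the boolean flag are assumptions of this sketch.

```typescript
// Opcode numbers taken from the sample later in the article; names are assumed.
const TEXT = 1;
const OPEN_ELEMENT = 9;

// Each node is a fixed-length array whose element types are known up front.
type TextNode = [op: typeof TEXT, chars: string, flag: boolean];
type OpenElementNode = [op: typeof OPEN_ELEMENT, tag: string, flag: boolean];
type WireNode = TextNode | OpenElementNode;

function describeNode(node: WireNode): string {
  // The first element acts as the discriminant of the union.
  if (node[0] === TEXT) return `text(${JSON.stringify(node[1])})`;
  return `open <${node[1]}>`;
}

const tupleNodes: WireNode[] = [
  [OPEN_ELEMENT, "h1", true],
  [TEXT, "Welcome", true],
];
console.log(tupleNodes.map(describeNode));
```

Tuples keep the payload compact (no repeated property names) while the type checker still knows what sits at each index.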
Now let's get into the nitty-gritty of the whole process.
The story begins with the precompile function in @glimmer/compiler, which takes optional arguments to set the id and a custom component name. Along with these params, we can pass in plugins for AST transformations, which we will discuss later. The source code is converted to ASTv2 using the normalize method, by passing in the source code wrapped in a Source; this is a short gist. The Source helps to track the location of tokens in the source code for debugging.
Next, the flow is passed to @glimmer/syntax to do the preprocessing in precompile mode, which directly employs @handlebars/parser to generate the AST. You can explore astexplorer.net with the handlebars option to understand the return value. The offset of each token, in line and column format, is computed using the SourceSpan class and added to loc in the AST. We are not going into the codemod options here.
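The conversion from a character offset to a line and column pair can be sketched as below. This is not the SourceSpan implementation; `positionAt` is a hypothetical helper, and it assumes the Handlebars convention of 1-based lines and 0-based columns.

```typescript
// Assumed convention: line starts at 1, column starts at 0.
type Position = { line: number; column: number };

function positionAt(source: string, offset: number): Position {
  let line = 1;
  let lineStart = 0;
  // Count the newlines that occur before the offset.
  for (let i = 0; i < offset && i < source.length; i++) {
    if (source[i] === "\n") {
      line++;
      lineStart = i + 1;
    }
  }
  return { line, column: offset - lineStart };
}

const demoSource = "<h1>Hi</h1>\n<p>{{count}}</p>";
const mustachePosition = positionAt(demoSource, demoSource.indexOf("{{"));
console.log(mustachePosition); // { line: 2, column: 3 }
```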
Now, this AST is of type HBS.Program, for which we have HandlebarsNodeVisitors to handle different node types like Program, BlockStatement, StringLiteral, PathExpression etc. They help to convert it to ASTv1.Template. Finally, the list of plugins passed via the optional precompile params is executed sequentially in a loop. This could be a separate topic in itself, where we could cover custom codemods or pushing any custom syntax additions into a separate entity/file for further processing.
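The shape of this step, one handler per node type followed by a plugin loop, can be sketched as follows. The node types are cut down to three, and `ToyPlugin` is a simplification assumed for this sketch: real Glimmer AST plugins are visitor-based rather than plain functions over the whole tree.

```typescript
// Reduced, assumed node shapes; the real HBS and ASTv1 types have many more fields.
type HbsNode =
  | { type: "Program"; body: HbsNode[] }
  | { type: "ContentStatement"; value: string }
  | { type: "PathExpression"; original: string };

type V1Node =
  | { type: "Template"; body: V1Node[] }
  | { type: "TextNode"; chars: string }
  | { type: "PathExpression"; original: string };

// Dispatch on node.type, in the spirit of a node visitor.
function convertNode(node: HbsNode): V1Node {
  if (node.type === "Program") return { type: "Template", body: node.body.map(convertNode) };
  if (node.type === "ContentStatement") return { type: "TextNode", chars: node.value };
  return { type: "PathExpression", original: node.original };
}

// Simplified plugin: a function from tree to tree.
type ToyPlugin = (ast: V1Node) => V1Node;

const trimTextPlugin: ToyPlugin = (node) => {
  if (node.type === "Template") return { ...node, body: node.body.map(trimTextPlugin) };
  if (node.type === "TextNode") return { ...node, chars: node.chars.trim() };
  return node;
};

// Plugins run one after another, each receiving the previous result.
function runPlugins(ast: V1Node, plugins: ToyPlugin[]): V1Node {
  let current = ast;
  for (const plugin of plugins) current = plugin(current);
  return current;
}

const hbsProgram: HbsNode = {
  type: "Program",
  body: [
    { type: "ContentStatement", value: "  Hello  " },
    { type: "PathExpression", original: "name" },
  ],
};
const pluginResult = runPlugins(convertNode(hbsProgram), [trimTextPlugin]);
console.log(JSON.stringify(pluginResult));
```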
The final representation is expected to look like this, a representation of the precompile result:
wire-format.d.ts
export interface SerializedTemplateWithLazyBlock {
  id?: Nullable<string>;
  block: SerializedTemplateBlockJSON; // Stringified JSON, alias for `string`
  moduleName: string;
  scope?: (() => unknown[]) | undefined | null;
  isStrictMode: boolean;
}
This stage is followed by a call to buildStatements, which returns WireFormat.Statement[].
Encoded Wire Format
UI rendering can be done using the IR itself, but it is further optimized into OpCodes to ship an optimized payload to the client side. A 6-bit representation is chosen, with a mapping between a character and a binary number. It can even be called Base89 encoding using that map. There are fixed rules and values for primitives, expressions etc. The example given was:
template.hbs
<h1>Welcome to the Glimmer Playground!</h1>
<p>You have clicked the button {{count}} times.</p>
<button onclick={{action increment}}>Click</button>
which is converted to a wire-format:
wire-format.json
{
  "statements": [
    [9, "h1", true],
    [10],
    [1, "Welcome to the Glimmer Playground!", true],
    [11],
    [9, "p", true],
    [10],
    [1, "You have clicked the button ", true],
    [1, [26, 0, "AppendSingleId"], false], // this string isn't right
    [1, " times", true],
    [11],
    [9, "button", true],
    [18, "onclick", [31, [25, 1], [[25, 2]]]],
    [10],
    [1, "Click", true],
    [11]
  ],
  "upvars": ["count", "action", "increment"]
}
and Base89 encoded to:
output.txt
2qcount"qaction"qincrement"Iqh1"8qWelcome to the Glimmer Playground!"HIp"8q"You have clicked the button "0C08q times"HGbutton"Qonclick"UCaction"Cincrement"H8qClick"I
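Going back to the JSON statements, a tiny walker shows how such tuples can drive rendering. The opcode meanings used here (9 opens an element, 10 finishes its opening tag, 11 closes it, 1 appends content) are inferred from the sample alone and are an assumption of this sketch, not a statement about Glimmer's opcode table; `renderStatic` is a hypothetical helper that skips every dynamic statement.

```typescript
type WireStatement = [number, ...unknown[]];

// Renders only the static parts; anything dynamic is ignored in this sketch.
function renderStatic(statements: WireStatement[]): string {
  let html = "";
  const openTags: string[] = [];
  for (const [op, ...args] of statements) {
    if (op === 9) {
      // Assumed: open element, args[0] is the tag name.
      openTags.push(String(args[0]));
      html += "<" + String(args[0]);
    } else if (op === 10) {
      // Assumed: opening tag is complete.
      html += ">";
    } else if (op === 11) {
      // Assumed: close the most recently opened element.
      html += "</" + String(openTags.pop()) + ">";
    } else if (op === 1 && typeof args[0] === "string") {
      // Assumed: append static text.
      html += args[0];
    }
  }
  return html;
}

const headingStatements: WireStatement[] = [
  [9, "h1", true],
  [10],
  [1, "Welcome to the Glimmer Playground!", true],
  [11],
];
const headingHtml = renderStatic(headingStatements);
console.log(headingHtml);
```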
Conclusion
This is the optimized payload sent to the client side. Knowing how compilation works is not mandatory for learning Ember. But it comes in handy when doing optimizations like HTML minification, extracting metadata from templates into a separate file during the build etc.
References
- Glimmer VM official guide
- Glimmer Compiler package - README
- EBNF notation and its railroad representation for reference.