../

Writing a JavaScript VM in Go - Part 1

· authored by yog

JavaScript virtual machines are becoming more commonplace in the commercial anti-bot space, with most major players adopting some kind of VM protection; Cloudflare, Akamai, DataDome and even Incapsula. We’re also seeing companies roll their own VM protections - Bet365 and ChatGPT for example.

We spend a lot of time reverse-engineering these VMs, but we rarely talk about how they’re actually made. In this post, we’ll walk through the basics of a JavaScript VM, how to write a basic compiler using go-fAST - a library I am a big advocate for and have helped contribute to. go-fAST is an SWC alternative for Go - it implements a similar architecture and its speeds are comparable.

Some terminology

Before we start, it’s important to understand what some of the different terminology of a VM actually means;

  • Stack-based - a type of VM that uses a stack. For example if we wanted to add two numbers, the instructions would be something like; PUSH 2, PUSH 3, ADD
  • Register-based - a type of VM that uses registers.
  • Register - a named/numbered slot the VM uses to hold a value while it works. In a register-based VM, instructions name their registers directly, e.g. ADD R3, R1, R2 meaning “add R1 and R2, store the result in R3”.
  • Operator / Opcode - the operation an instruction performs, the what to do part. In ADD the operator is addition; in PUSH it’s “push onto the stack”. The opcode is that operator encoded as a number in the bytecode, so where you write ADD, the VM actually stores something like 0x0A
  • Instruction - a single executable command, made up of an operator plus zero or more operands. PUSH 2 is an instruction (operator PUSH + operand 2). ADD is another (an operator with no operands). A program is just a sequence of instructions called bytecode.
  • Operand - the data an instruction works on. For example in PUSH 2, 2 is the operand (also sometimes called an immediate)

Why register-based, and why Go

The VM we’ll be making today is register-based, rather than stack-based. Most teaching I’ve seen online focuses on stack-based VMs, so I figured I will show you how register-based ones work. We’ll model it loosely on the Lua VM which uses the iABC layout. It packs the opcode plus three operands: A, B, and C. A typical use is an arithmetic op like ADD A B C, meaning roughly R[A] = R[B] + R[C], where the operands index into registers (or constants)

Writing a register-based VM is slightly more work than a stack-based VM as we have to hand out register numbers, but it’s worth it for two reasons. Register code is denser and faster (it has fewer instructions) and no push/pop churn.

The parser we’ll use is go-fAST. Why? Well I helped write some of the library so I have a natural bias, but also in a world where speed and performance is critical, using a fast JavaScript parsing library makes sense (and I really like writing code in Go). We could use libraries such as Babel, but they’re heavy and slow. Not to say that using Babel is wrong, in fact it’s still used by Akamai for their obfuscation :)

A snippet of Akamai’s Babel-generated obfuscation

go-fAST also has an update on its dev branch that puts it in serious contention with libraries such as OXC and SWC. It isn’t in a tagged release yet, but it’s far enough along that I wanted to show it off here. I’ll explain a little more about it further down.

go-fAST vs OXC and SWC benchmark: parse, traverse, codegen and combined timings

The shape

JavaScript source
   │  go-fAST: parser.Parse

AST
   │  compiler: walk the tree, hand out registers

Bytecode  (constants pool + instruction array)
   │  serialise → XOR-encrypt → base64

Encrypted blob  (one base64 string)
   │  emitter: splice the blob + interpreter into one file

JavaScript (blob + VM)  ──► node runs it

The sample program we’ll virtualise. Yes it’s small but it demonstrates constants, variables, a loop, a comparison and output:

let total = 0;
let i = 1;
while (i <= 5) {
  total = total + i;
  i = i + 1;
}
console.log(total);

Parsing with go-fAST

parser.Parse takes source and returns an *ast.Program.

prog, err := parser.Parse(src)
if err != nil {
    panic(err)
}
for i := range prog.Body {
    c.stmt(&prog.Body[i])
}

A note on tagged unions

This is the update I mentioned earlier. go-fAST reworked its AST on its dev branch to use tagged unions, and it’s worth a quick word because it changes how you walk the tree. Anywhere the grammar allows one of several node shapes - a statement, an expression, the property part of a member access - the old API used a Go interface (ast.Stmt, ast.Expr) boxed inside a thin wrapper struct, and you recovered the concrete node with a type switch. The new API replaces that with a concrete value type (ast.Statement, ast.Expression, and friends) carrying a small kind tag plus a pointer to the real node. No interface, no boxing - which is why we range over prog.Body by index above and hand stmt a pointer: the elements are values now, not interfaces.

In practice you switch on .Kind() rather than on the dynamic type, and pull the concrete node out with a generated accessor - s.MustWhile() returns the *ast.WhileStatement, or s.While() hands you a (*ast.WhileStatement, bool) if you’d rather check than panic. The fields hang off that the way you’d expect: a WhileStatement has n.Test (an *ast.Expression) and n.Body (an *ast.Statement); a BinaryExpression has n.Operator, n.Left, n.Right - all already typed, with none of the old .Expr / .Stmt unwrapping. The pay-off is fewer allocations and no interface dispatch on the hot path, which is exactly what you want when you’re parsing JavaScript at the edge.

go-fAST also comes with a resolver for scope analysis - we don’t actually need it for this example, but you’d reach for it the moment you need to add support for real variable or function scopes.

The instruction set

The next part of the design is to come up with an instruction format and a set of opcodes. Since we’re loosely modelling Lua’s VM, we can model this quite simply as;

const (
    LOADK = iota // R[A] = K[B]          load constant
    MOVE         // R[A] = R[B]
    ADD          // R[A] = R[B] + R[C]
    SUB          // R[A] = R[B] - R[C]
    MUL          // R[A] = R[B] * R[C]
    LE           // R[A] = R[B] <= R[C] ? 1 : 0
    LT           // R[A] = R[B] <  R[C] ? 1 : 0
    JMP          // pc = A
    JMPF         // if (!R[A]) pc = B    jump if false
    PRINT        // console.log(R[A])
    HALT
)

type Instr struct{ Op, A, B, C int }

Note that we have an explicit PRINT instruction - in the real world we probably wouldn’t have this instruction, or if we did we would hide it behind a property lookup and a call, but for the sake of demonstration we’ll leave it as its own instruction.

The program constants (e.g. the literals 0, 1, 5) don’t live in the instruction stream - they’ll go in a side table and instructions reference them by index. There are many ways this could be improved (for example Kasada bakes the literals into the bytecode stream and slices it out during setup). Incapsula use heavy type coercion to build literals rather than storing them as strings - so you can get creative with how you hide them. For this we’ll keep it simple.

The compiler

With real world compilers, they would typically lower the program to IR first, do some optimisations, and then emit the bytecode. For this example we’ll again keep it simple, we’ll virtualise the JavaScript example how we see it, without lowering to IR first. The only real bookkeeping is the register allocation, and we keep it as dumb as possible:

  • Each declared variable gets a permanent register, low numbers first
  • Expression temporaries are allocated above the variables with a bump pointer (top), and released back down at the end of every statement
type Compiler struct {
    consts []float64
    code   []Instr
    vars   map[string]int
    nvars  int
    top    int // next free scratch register
}

// emit appends an instruction and returns its index, so a caller can patch
// it later - the loop relies on this to back-patch its exit jump.
func (c *Compiler) emit(op, a, b, cc int) int {
    c.code = append(c.code, Instr{op, a, b, cc})
    return len(c.code) - 1
}

// konst interns a literal in the constants table and returns its index.
func (c *Compiler) konst(v float64) int {
    for i, k := range c.consts {
        if k == v {
            return i
        }
    }
    c.consts = append(c.consts, v)
    return len(c.consts) - 1
}

func (c *Compiler) declare(name string) int {
    r := c.nvars
    c.nvars++
    c.vars[name] = r
    c.top = c.nvars
    return r
}
func (c *Compiler) alloc() int { r := c.top; c.top++; return r } // scratch

The heart of it is two mutually-recursive expression helpers. expr(e, dest) compiles e so its result lands in a specific register; operand(e) returns some register holding e’s value, allocating scratch only when it has to - a variable is already in a register, so we just hand back its number instead of copying it:

// compile e, leaving the result in register dest
func (c *Compiler) expr(e *ast.Expression, dest int) {
    switch e.Kind() {
    case ast.ExprNumberLit:
        c.emit(LOADK, dest, c.konst(e.MustNumberLit().Value), 0)
    case ast.ExprIdentifier:
        if r := c.vars[e.MustIdentifier().Name]; r != dest {
            c.emit(MOVE, dest, r, 0)
        }
    case ast.ExprBinary:
        n := e.MustBinary()
        b := c.operand(n.Left)
        cc := c.operand(n.Right)
        c.emit(binop(n.Operator), dest, b, cc)
    case ast.ExprAssign:
        n := e.MustAssign()
        r := c.vars[n.Left.MustIdentifier().Name]
        c.expr(n.Right, r)
        if dest != r {
            c.emit(MOVE, dest, r, 0)
        }
    }
}

// a register holding e, without forcing a destination
func (c *Compiler) operand(e *ast.Expression) int {
    if id, ok := e.Identifier(); ok {
        if r, ok := c.vars[id.Name]; ok {
            return r
        }
    }
    r := c.alloc()
    c.expr(e, r)
    return r
}

binop is the small bridge between the two worlds. It maps a go-fAST source operator (an ast.BinaryOperator such as ast.BinaryAddition) to the matching VM opcode (ADD) - it’s the one place the AST’s + becomes our ADD. Note the operator now lives on the node as a typed enum value, so there’s no token package import any more:

func binop(op ast.BinaryOperator) int {
    switch op {
    case ast.BinaryAddition:
        return ADD
    case ast.BinarySubtraction:
        return SUB
    case ast.BinaryMultiplication:
        return MUL
    case ast.BinaryLessEqualThan:
        return LE
    case ast.BinaryLessThan:
        return LT
    }
    panic("unsupported operator: " + op.String())
}

Statements are the same idea one level up. The interesting one is the loop, because it needs back-patching: when you emit the conditional jump that exits the loop, you don’t yet know where the loop ends, so you emit it with a placeholder target and fill it in once you do.

case ast.StmtWhile:
    n := s.MustWhile()                 // pull the *ast.WhileStatement out of the union
    top := len(c.code)                 // remember the loop entry
    cond := c.operand(n.Test)          // evaluate i <= 5
    jmpf := c.emit(JMPF, cond, 0, 0)   // exit if false, target unknown
    c.top = c.nvars                    // release the condition's scratch
    c.stmt(n.Body)                     // compile the body
    c.emit(JMP, top, 0, 0)             // jump back to the test
    c.code[jmpf].B = len(c.code)       // back-patch: exit lands here

That’s the entire control-flow story for a while. An if is the same minus the back-jump.

This is the same shape Lua’s own compiler uses for a while: its whilestat marks the loop top with luaK_getlabel, compiles the condition, emits the body, jumps back to the top, then back-patches the exit with luaK_patchtohere. The only real difference is that Lua patches a whole list of jumps to one target (so a single label can collect breaks and short-circuit and/or), where we only ever have one jump to fix.

The bytecode

Compile the sample and dump it, and you get this:

  0  LOADK  0 0 0     ; R0 = K[0]=0      total = 0
  1  LOADK  1 1 0     ; R1 = K[1]=1      i = 1
  2  LOADK  3 2 0     ; R3 = K[2]=5      (scratch for the literal 5)
  3  LE     2 1 3     ; R2 = R1 <= R3    i <= 5
  4  JMPF   2 9 0     ; if !R2 goto 9    exit loop
  5  ADD    0 0 1     ; R0 = R0 + R1     total = total + i
  6  LOADK  2 1 0     ; R2 = K[1]=1      (scratch for the literal 1)
  7  ADD    1 1 2     ; R1 = R1 + R2     i = i + 1
  8  JMP    2 0 0     ; goto 2           back to the test
  9  PRINT  0 0 0     ; console.log(R0)
 10  HALT

R0 is total, R1 is i - the two variables, parked in the low registers for the whole program. Everything from R2 up is scratch, handed out and reused: R2/R3 hold the comparison at the top of the loop, then R2 is reused for the literal 1 further down. The loop is right there in the open - test at 2-4, body at 5-7, the back-jump at 8, the exit target at 9. Constants pool: [0, 1, 5].

Notice there’s nothing JavaScript-shaped left. No while, no <=, no variable names. Just registers and integers. That’s the point of virtualization - and it’s why the layers we add in later parts have so much room to hide.

Shipping the bytecode: binary, XOR, base64

Rather than just printing the bytecode as a readable JavaScript array, we can serialize it to a compact binary blob, XOR-encrypt it, and ship it as a single base64 string. The layout is a one-byte constants count, the constants as float64, a two-byte instruction count, then fixed-width instructions.

// nConsts:u8, consts:f64..., nInstrs:u16, instrs:{ op:u8, A:u16, B:u16, C:u16 }…
func (c *Compiler) serialize() []byte {
    buf := new(bytes.Buffer)
    buf.WriteByte(byte(len(c.consts)))
    for _, k := range c.consts {
        binary.Write(buf, binary.BigEndian, k) // float64, 8 bytes
    }
    binary.Write(buf, binary.BigEndian, uint16(len(c.code)))
    for _, in := range c.code {
        buf.WriteByte(byte(in.Op))
        binary.Write(buf, binary.BigEndian, uint16(in.A))
        binary.Write(buf, binary.BigEndian, uint16(in.B))
        binary.Write(buf, binary.BigEndian, uint16(in.C))
    }
    return buf.Bytes()
}

The cipher is the most basic one there is - repeating-key XOR. It’s symmetric, so the exact same routine encrypts in Go and decrypts in the VM:

func xorCipher(data, key []byte) []byte {
    out := make([]byte, len(data))
    for i := range data {
        out[i] = data[i] ^ key[i%len(key)]
    }
    return out
}

This isn’t real cryptography - the key ships in the clear right next to the payload, so anyone who reads the loader can undo it. But even now it earns its place: the bytecode is no longer human-readable in the output, and once the key is randomised per build (a one-line change), two builds of the same program look entirely different. Our sample serializes to 104 bytes; encrypted and base64’d it’s one opaque string:

nTdPwZ43T8GeCL/BnjdPwZ53W8GeN0/BnjdEwZ43T8GeN0/Bn...

The interpreter

The VM decodes that blob and runs it. And instead of one big switch, each opcode is its own function in a table indexed by opcode number. That’s how production VMs dispatch - a handler table, sometimes one trampoline per opcode - and it’s what makes the later tricks possible: you can shuffle the table per build, or swap handlers in and out at runtime, without touching the bytecode.

(function () {
  var B = "nTdPwZ43T8GeCL/BnjdPwZ53W8GeN0/B...";  // the encrypted program
  var KEY = [158, 55, 79, 193];

  // 1. base64 decode, then XOR-decrypt
  var raw = atob(B), bytes = new Uint8Array(raw.length);
  for (var n = 0; n < raw.length; n++) bytes[n] = raw.charCodeAt(n) ^ KEY[n % KEY.length];

  // 2. parse the binary blob into constants K and program P
  var dv = new DataView(bytes.buffer), off = 0;
  var K = [], nc = bytes[off++];
  for (var n = 0; n < nc; n++) { K.push(dv.getFloat64(off)); off += 8; }
  var P = [], ni = dv.getUint16(off); off += 2;
  for (var n = 0; n < ni; n++) {
    P.push([bytes[off], dv.getUint16(off + 1), dv.getUint16(off + 3), dv.getUint16(off + 5)]);
    off += 7;
  }

  // 3. one handler function per opcode, indexed by opcode number
  var R = [], pc = 0;
  var H = [
    function (i) { R[i[1]] = K[i[2]]; },                    // 0  LOADK
    function (i) { R[i[1]] = R[i[2]]; },                    // 1  MOVE
    function (i) { R[i[1]] = R[i[2]] + R[i[3]]; },          // 2  ADD
    function (i) { R[i[1]] = R[i[2]] - R[i[3]]; },          // 3  SUB
    function (i) { R[i[1]] = R[i[2]] * R[i[3]]; },          // 4  MUL
    function (i) { R[i[1]] = R[i[2]] <= R[i[3]] ? 1 : 0; }, // 5  LE
    function (i) { R[i[1]] = R[i[2]] <  R[i[3]] ? 1 : 0; }, // 6  LT
    function (i) { pc = i[1]; },                            // 7  JMP
    function (i) { if (!R[i[1]]) pc = i[2]; },              // 8  JMPF
    function (i) { console.log(R[i[1]]); },                 // 9  PRINT
    function (i) { pc = P.length; }                         // 10 HALT
  ];

  // 4. fetch-decode-dispatch
  while (pc < P.length) { var i = P[pc++]; H[i[0]](i); }
})();

The loop shrinks to almost nothing: fetch the next instruction, look its handler up by opcode, call it. The handlers close over R, K, P and pc, so a jump is just a handler assigning to pc, and HALT parks pc past the end to stop the loop. After decode each instruction is still a four-element array - the program counter indexes instructions and jumps assign to pc, exactly as the bytecode intended; the binary format is purely how it travels.

And running it:

$ node out.js
15

What we actually built

So there we have it - we’ve built a (very basic) JavaScript virtual machine obfuscator in Go. There are still a lot of concepts that would be needed to make this into a viable solution for protecting code in the real world such as; function closures, scoping, objects, arrays etc.

The beauty of building this in Go is for the speed. The thing that makes VM obfuscation difficult to reverse engineer isn’t necessarily how complex the instruction set is, or how well the constants are hidden - it’s how dynamic the VM is. If the VM is constantly changing shape, dropping opcodes, adding opcodes, randomly fusing opcodes together - that’s what makes it challenging to defeat. With go-fAST you could deliver this live, at the edge. Imagine a scenario where the user gets a new copy of the VM each time they visit the website. An attacker would have to disassemble it each time.

Some additional security mechanisms we could introduce to make reverse-engineering difficult:

  • Control-flow flattening - dissolve the structured loops and branches into a dispatch-driven state machine so the bytecode no longer reads top to bottom
  • Opcode permutation - shuffle the opcode numbers and the handler table on every build, so no two outputs agree on which number means ADD, and static signatures don’t carry across builds
  • Super-operators - fuse common instruction sequences into single bespoke opcodes, which both speeds things up and erases recognisable primitives.
  • Hardening the crypto - derive the XOR key at runtime, based on certain parameters
  • Anti-debug and self-integrity - Google’s botguard does a good job of this. Timing checks to see if we’re being debugged that silently corrupts random register indexes. The VM execution executes as normal, but the output is wrong.

Anyway - this wasn’t meant to be anything complex, rather a demonstration of how we can implement JavaScript obfuscation in Go. In the next parts we’ll look at implementing the VM further.

Until next time!

Full source

The complete compiler is on GitHub: disasm-dev/js-vm-in-go. It is the same code this post walks through, in a single main.go. Clone it and run it:

git clone https://github.com/disasm-dev/js-vm-in-go
cd js-vm-in-go
go run .      # virtualizes the embedded sample, writes out.js
node out.js   # prints 15