
From Replay to a Real Live Debugger

How do you build a WASM debugger that actually pauses at a breakpoint, in the browser, without Asyncify/JSPI? Run the program on a Web Worker against shared memory and block that worker with Atomics.wait — the main thread reads live memory while the worker is parked. Every compiler hook is gated, so a normal Run stays byte-identical (corpus 1004/1015, 0 RUNTIME regressions). This is the story of how MiniSwift Studio's debugger went from a fake to a real one.

What changed

Studio's old "debugger" was a replay / time-travel tool: it ran the program once to the end, recorded a {line, vars} snapshot of every step, then let you scrub back and forth over that recording. It worked, but it was a fake debugger — the program never actually stopped; you were just playing back a tape. On infinite/long loops the recording got capped, nothing was "live," and a breakpoint was just a jump to an index in the tape.

Now there's a real live breakpoint debugger: when the running program hits a breakpoint it genuinely freezes, you read the state at that instant (locals, call stack, watch), and continue/step resumes from where it stopped. Like Xcode.

                   Old: Replay (trace)          New: Live (breakpoint)
  Execution        run once → record → scrub    real-time, actually stops
  Pausing          none (index in a tape)       real pause via Atomics.wait
  State            recorded snapshot            live memory read
  Stepping         ±1 over the tape index       real step over/into/out (via call depth)
  Step back        yes (free)                   no (real execution is one-way)
  Infinite loops   tape overflows / caps        fine, stop it whenever you want
  Requirement      none                         cross-origin isolation (COI)

Key insight: we used to think "this needs Asyncify or JSPI" (you can't block WASM on the main thread — it'd freeze the UI). That was wrong. Thanks to the wasm threads that came with the worker-pool work (shared memory + Atomics.wait), running the program on a separate worker and blocking that worker is perfectly legal — the main thread never freezes. Asyncify/JSPI turned out to be unnecessary.

The live path requires COI; without it (e.g. a dev server that doesn't send COOP/COEP headers) Studio automatically falls back to the old replay debugger.
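A page is cross-origin isolated when it is served with Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp, and browsers expose the result as the crossOriginIsolated global. The fallback decision could be sketched roughly as follows; pickDebuggerMode and its "live" / "replay" return values are hypothetical names for illustration, not Studio's actual code.

```javascript
// Hypothetical helper: choose which debugger to start.
// `env` defaults to globalThis so the check can also be exercised outside a browser.
function pickDebuggerMode(env = globalThis) {
  // Browsers set crossOriginIsolated to true only when COOP/COEP are in effect.
  const isolated = env.crossOriginIsolated === true;
  // Shared memory is what the live path is built on, so require it as well.
  const hasSharedMemory = typeof env.SharedArrayBuffer === "function";
  return isolated && hasSharedMemory ? "live" : "replay";
}
```

Checking both conditions keeps the sketch conservative: if either one is missing, it picks the replay path.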

Why it's hard

The one trick a debugger must pull off: "stop now, I want to look." Natively the OS/ptrace does this. To stop WASM in a browser you have three options:

  1. Asyncify — the compiler bloats the code so the whole stack can be unwound/rewound. Heavy, slow, pollutes every function.
  2. JSPI (JS Promise Integration) — new, experimental, limited support.
  3. Our way — move the program onto a worker and block that worker with Atomics.wait. Atomics.wait is forbidden on the main thread but allowed on a worker. Meanwhile the main thread reads the worker's live memory over a SharedArrayBuffer / shared WebAssembly.Memory. Pause = the worker sleeps on a futex; continue = the main thread wakes the futex with Atomics.notify.

Approach 3 came "for free" because we already had threads, and it carries none of Asyncify's code bloat.
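The futex-style pause and resume in option 3 can be sketched in a few lines. This is a minimal illustration under assumptions: the names pauseAt and sendCmd mirror the ones mentioned in this post, but the bodies, the RESUME slot index, and the timeout parameter are invented here, and the real protocol carries more state.

```javascript
// Hypothetical slot index in the shared control buffer.
const RESUME = 0;

// Worker side: park until the resume counter moves past `seen`, the value the
// worker read before it announced that it was paused.
// Returns true when resumed, false if `timeoutMs` elapsed first. The timeout
// exists so the sketch can be tried on one thread; a real pause would omit it.
function pauseAt(ctrl, seen, timeoutMs = Infinity) {
  while (Atomics.load(ctrl, RESUME) === seen) {
    // Sleeps only while ctrl[RESUME] still equals `seen`; if the counter has
    // already changed, wait() returns "not-equal" immediately.
    const result = Atomics.wait(ctrl, RESUME, seen, timeoutMs);
    if (result === "timed-out") return false;
  }
  return true;
}

// Main-thread side: bump the counter, then wake whoever sleeps on that slot.
function sendCmd(ctrl) {
  Atomics.add(ctrl, RESUME, 1);
  Atomics.notify(ctrl, RESUME);
}
```

Comparing against a counter rather than a boolean flag means a resume that arrives just before the worker calls Atomics.wait is not lost: the wait sees a changed value and returns at once.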

Architecture — three pieces

  MAIN THREAD — debugui.js · debug_live.js
    · sets breakpoints + conditions, sends commands
    · reads live memory / locals / call stack
              │
              │   control SAB:  state · line · fidx · depth · cmd · resume
              │                 + breakpoint bitset (256 words)
              ▼
  DEBUG WORKER — debug_worker.js running the instrumented .wasm
    · __ms_step(line, fidx)   →  breakpoint?  →  Atomics.wait   (worker parks)
    · __ms_enter / __ms_exit  →  call stack + frameBase
              ▲
              │   while parked, the main thread reads live state through
              │   the shared WebAssembly.Memory
              ▼
  continue / step  →  RESUME++ , Atomics.notify  →  the worker wakes & resumes

  • debug_ctrl.js: the control-SAB protocol. It defines the state / line / fidx / depth / cmd slots, the resume futex, and a breakpoint bitset (256 words). sendCmd bumps RESUME and wakes the worker with Atomics.notify.
  • debug_worker.js: runs the instrumented wasm. It implements __ms_step (the pause decision, plus an fidx-reconcile fallback for implicit returns), maintains the call stack via __ms_enter(fidx, frameBase) / __ms_exit, implements step into/over/out by call depth, and throws a sentinel to implement Stop. pauseAt() is the Atomics.wait call.
  • debug_live.js: the main-thread controller. It spawns the worker, manages breakpoints, and reads globals plus any selected frame's locals from that frame's frameBase (recorded by __ms_enter). readSwiftString() decodes Strings; shouldPause() evaluates conditional breakpoints.
  • debugui.js: the UI, a docked bottom pane like Xcode's debug area, with a controls toolbar (continue / pause / three step buttons / stop) above a full-width Call Stack / Variables / Watch split.
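To make the control buffer concrete, here is a rough sketch of how a 256-word breakpoint bitset could sit behind the header slots. The slot order, BP_BASE, and the function names are assumptions made for illustration; only the slot names and the 256-word size come from the description above.

```javascript
// Hypothetical layout: six header slots, then the breakpoint bitset.
const STATE = 0, LINE = 1, FIDX = 2, DEPTH = 3, CMD = 4, RESUME = 5;
const BP_BASE = 6;
const BP_WORDS = 256;                  // 256 words x 32 bits = 8192 addressable lines
const MAX_LINE = BP_WORDS * 32;

function createControl() {
  return new Int32Array(new SharedArrayBuffer((BP_BASE + BP_WORDS) * 4));
}

// Main thread: set or clear the bit for `line`. Returns false if out of range.
function setBreakpoint(ctrl, line, on = true) {
  if (line < 0 || line >= MAX_LINE) return false;
  const word = BP_BASE + (line >>> 5);   // which 32-bit word holds the bit
  const bit = 1 << (line & 31);          // which bit inside that word
  if (on) Atomics.or(ctrl, word, bit);
  else Atomics.and(ctrl, word, ~bit);
  return true;
}

// Worker: one atomic load and a mask per statement.
function hasBreakpoint(ctrl, line) {
  if (line < 0 || line >= MAX_LINE) return false;
  return (Atomics.load(ctrl, BP_BASE + (line >>> 5)) & (1 << (line & 31))) !== 0;
}
```

A bitset keeps the per-statement check cheap, and the main thread can toggle breakpoints while the program is running without sending a message to the worker.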

Compiler side — the hooks

The critical constraint: debug instrumentation must never change a normal Run. Studio's Debug button calls the ms_set_debug() export to flip the gate flag (g_ms_debug_steps); when the flag is off, the emitted wasm is byte-identical to the output without any hooks. That's why the corpus stays at 1004/1015 with 0 RUNTIME regressions.

The step hook

At the start of every statement a host import __ms_step(line, fidx) is called. The worker catches it, checks whether the line is in the breakpoint set, and pauses if needed.
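The pause decision inside the step hook, combined with stepping by call depth, might look something like this. The mode strings and the function name shouldStop are hypothetical; the comparison rules are the usual way to express step over/into/out in terms of depth, not a transcription of debug_worker.js.

```javascript
// mode:         "run" | "into" | "over" | "out" (hypothetical names)
// depth:        call depth at the statement about to execute
// startDepth:   call depth when the step command was issued
// atBreakpoint: whether this statement's line is in the breakpoint set
function shouldStop(mode, depth, startDepth, atBreakpoint) {
  if (atBreakpoint) return true;                // breakpoints win in every mode
  switch (mode) {
    case "into": return true;                   // stop at the very next statement
    case "over": return depth <= startDepth;    // ignore statements in deeper calls
    case "out":  return depth < startDepth;     // wait until the current frame returns
    default:     return false;                  // "run": only breakpoints stop us
  }
}
```

For example, a step over issued at depth 2 skips every statement reported at depth 3 or more and stops at the next one reported at depth 2 or less.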

Locals — a debug shadow stack

The problem: named scalar locals live in WASM locals, which the host (JS) can't read. The fix: the step hook mirrors them into memory, and the host reads them from there. Each live frame gets its own slot:

  • A zero-init depth counter at a fixed address (DBG_DEPTH_ADDR).
  • frameBase = DBG_STACK_BASE (8 MB) + depth × DBG_FRAME_STRIDE (256).
  • frameBase is captured at function entry into a WASM local (__dbg_fb) and passed to __ms_enter(fidx, frameBase) — so the host reads each frame from the exact base the wasm mirrored to.
  • Depth is incremented at entry and decremented at every return. The two stay balanced because every function ends in a return (verified: enter count == exit count).

This is what makes parent frames' locals readable too — click any frame in the call stack and see its locals, not just the innermost.
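On the host side, reading one frame's mirrored locals then comes down to address arithmetic plus a typed read. The sketch below makes several assumptions: the stride is treated as bytes, off is taken to be a byte offset from frameBase, and wt is interpreted as a wasm value type string. The function names frameBaseFor and readLocals are invented for the example.

```javascript
const DBG_STACK_BASE = 8 * 1024 * 1024;   // 8 MB, as stated above
const DBG_FRAME_STRIDE = 256;             // as stated above; assumed to be bytes

function frameBaseFor(depth) {
  return DBG_STACK_BASE + depth * DBG_FRAME_STRIDE;
}

// `locals` follows the metadata shape {name, off, wt}. Reading `wt` as
// "i32" | "i64" | "f32" | "f64" is an assumption of this sketch.
function readLocals(buffer, frameBase, locals) {
  const view = new DataView(buffer);      // DataView also accepts a SharedArrayBuffer
  const out = {};
  for (const { name, off, wt } of locals) {
    const addr = frameBase + off;
    switch (wt) {                         // wasm linear memory is little-endian
      case "i32": out[name] = view.getInt32(addr, true); break;
      case "i64": out[name] = view.getBigInt64(addr, true); break;
      case "f32": out[name] = view.getFloat32(addr, true); break;
      case "f64": out[name] = view.getFloat64(addr, true); break;
      default:    out[name] = undefined;  // unknown type: show nothing rather than guess
    }
  }
  return out;
}
```

Because every frame has its own base, selecting a parent frame in the call stack only changes the frameBase argument; the read logic stays the same.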

Putting the imports last

The __ms_enter/__ms_exit imports are emitted last and only in debug builds, so existing function indices never shift (FIDX_USER_BASE moves 188→190 only when debugging). Both reuse the step hook's type signature, (i32,i32)->void, so every call site has to push two arguments.

Strings, loop variables, metadata

  • A String is an i32 absolute byte offset into memory (NUL-terminated UTF-8, no header). The host decodes the UTF-8 at that pointer (SAB-safe copy).
  • For-loop variables are named __for_var_<id>_<srcname> so the host can show i in Variables (nested loops that reuse a name are disambiguated as i, i#2).
  • A miniswift_debug custom section carries, per function, {fidx, name, locals:[{name, off, type, wt}]} — the host uses it to name the call stack and type the locals.
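Decoding a String as described in the first bullet can be sketched as below. The name readSwiftString comes from this post, but the body is an illustrative guess, not the real implementation. The copy step matters because some browsers refuse to run TextDecoder directly on a view backed by a SharedArrayBuffer.

```javascript
// Decode a NUL-terminated UTF-8 string starting at byte offset `ptr`.
// `memoryBuffer` may be a SharedArrayBuffer (for example memory.buffer of a
// shared WebAssembly.Memory) or a plain ArrayBuffer.
function readSwiftString(memoryBuffer, ptr) {
  const bytes = new Uint8Array(memoryBuffer);
  let end = ptr;
  // Scan for the terminator, stopping at the end of memory if it is missing.
  while (end < bytes.length && bytes[end] !== 0) end++;
  // slice() copies the range into a fresh, non-shared ArrayBuffer, which is
  // safe to hand to TextDecoder (subarray() would still alias shared memory).
  const copy = bytes.slice(ptr, end);
  return new TextDecoder("utf-8").decode(copy);
}
```

The copy also gives the decoder a stable snapshot. That matters little while the worker is parked, but it keeps the helper safe to call at other times.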

Conditional breakpoints

The worker can't read values (it only knows the line), so it pauses on every breakpoint hit. The condition is evaluated host-side, by reading the innermost frame's variables while the worker is parked; if the condition isn't met, the controller silently continues before touching the UI, so nothing flashes. The evaluator uses no eval(): a condition is a list of OR-groups, each made of AND-clauses. It is also fail-open: an unknown variable or a parse error means pause, so a real hit is never silently skipped.
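An evaluator with that shape fits in a few lines. The sketch below assumes a deliberately small clause syntax (identifier, comparison operator, numeric literal) joined by && and ||; the real condition grammar is not spelled out here, so treat the regex and operator table as placeholders. The name shouldPause matches the function mentioned earlier, but the body is illustrative.

```javascript
// One clause: <identifier> <op> <number>, e.g. "i >= 10". Assumed syntax.
const CLAUSE = /^\s*([A-Za-z_]\w*)\s*(==|!=|<=|>=|<|>)\s*(-?\d+(?:\.\d+)?)\s*$/;

const OPS = {
  "==": (a, b) => a === b,
  "!=": (a, b) => a !== b,
  "<":  (a, b) => a < b,
  "<=": (a, b) => a <= b,
  ">":  (a, b) => a > b,
  ">=": (a, b) => a >= b,
};

// Returns true when the debugger should pause. `vars` maps names to values
// read from the innermost frame.
function shouldPause(condition, vars) {
  if (typeof condition !== "string" || condition.trim() === "") return true;
  const groups = [];
  // Parse everything first, so a bad clause anywhere triggers fail-open even
  // if short-circuit evaluation would never have reached it.
  for (const groupText of condition.split("||")) {
    const clauses = [];
    for (const clauseText of groupText.split("&&")) {
      const m = CLAUSE.exec(clauseText);
      if (!m) return true;                                   // parse error: pause
      const [, name, op, literal] = m;
      if (!Object.prototype.hasOwnProperty.call(vars, name)) return true; // unknown: pause
      clauses.push({ name, op, value: Number(literal) });
    }
    groups.push(clauses);
  }
  // OR over groups, AND within each group.
  return groups.some((clauses) =>
    clauses.every(({ name, op, value }) => OPS[op](Number(vars[name]), value)));
}
```

With vars = { i: 1, n: 9 }, the condition "i > 5 && n < 2 || i == 1" pauses because the second OR-group matches, while "missing == 1" pauses because the variable is unknown.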

How we verified it

  • Compiler invariant: at every phase the corpus stays at 1004/1015 with 0 RUNTIME regressions (the gating keeps a normal Run byte-identical).
  • Three levels of end-to-end checks with a headless-Chrome harness: the node wasm level (live locals n=10→x=20), the dev and prod builds, and the live deploy (breakpoint → locals → step into/out → call stack [sq(), main()] → watch → output, with zero page errors).

What's next

Very deep recursion (more than roughly 32K frames) would overrun the shadow region; hit-count conditions and Substring decoding aren't done yet. And honestly, the editor's own download is still dominated by Monaco's ~3.5 MB core; trimming that is a separate job.

The debugger, though, is real now. You can set a breakpoint, watch it stop, and poke at live memory — in a browser tab, with no native toolchain in sight.
