From Replay to a Real Live Debugger
How do you build a WASM debugger that actually pauses at a breakpoint, in the browser, without Asyncify/JSPI? Run the program on a Web Worker against shared memory and block that worker with Atomics.wait — the main thread reads live memory while the worker is parked. Every compiler hook is gated, so a normal Run stays byte-identical (corpus 1004/1015, 0 RUNTIME regressions). This is the story of how MiniSwift Studio's debugger went from a fake to a real one.
What changed
Studio's old "debugger" was a replay / time-travel tool: it ran the program once to the end, recorded a {line, vars} snapshot of every step, then let you scrub back and forth over that recording. It worked, but it was a fake debugger — the program never actually stopped; you were just playing back a tape. On infinite/long loops the recording got capped, nothing was "live," and a breakpoint was just a jump to an index in the tape.
Now there's a real live breakpoint debugger: when the running program hits a breakpoint it genuinely freezes, you read the state at that instant (locals, call stack, watch), and continue/step resumes from where it stopped. Like Xcode.
| | Old: Replay (trace) | New: Live (breakpoint) |
|---|---|---|
| Execution | run once → record → scrub | real-time, actually stops |
| Pausing | none (index in a tape) | real pause via Atomics.wait |
| State | recorded snapshot | live memory read |
| Stepping | ±1 over the tape index | real step over/into/out (via call depth) |
| Step back | yes (free) | no (real execution is one-way) |
| Infinite loops | tape overflows / caps | fine — stop it whenever you want |
| Requirement | none | cross-origin isolation (COI) |
Key insight: we used to think "this needs Asyncify or JSPI" (you can't block WASM on the main thread; it would freeze the UI). That was wrong. Thanks to the wasm threads that came with the worker-pool work (shared memory + Atomics.wait), running the program on a separate worker and blocking that worker is perfectly legal, and the main thread never freezes. Asyncify/JSPI turned out to be unnecessary.
The live path requires COI; without it (e.g. a dev server that doesn't send COOP/COEP headers) Studio automatically falls back to the old replay debugger.
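The fallback decision boils down to a capability check. Here is a minimal sketch, assuming a helper called pickDebuggerMode (a hypothetical name, not Studio's actual code). The crossOriginIsolated global is a standard browser property; it is true only when the page is served with Cross-Origin-Opener-Policy: same-origin and a suitable Cross-Origin-Embedder-Policy such as require-corp.

```javascript
// Hypothetical helper: decide which debugger to start.
// `env` defaults to globalThis so the check can also be exercised outside a browser.
function pickDebuggerMode(env = globalThis) {
  const isolated = env.crossOriginIsolated === true;
  const hasSharedMemory = typeof env.SharedArrayBuffer === "function";
  // The live path needs shared memory, which browsers only expose under COI.
  return isolated && hasSharedMemory ? "live" : "replay";
}
```

Checking both conditions keeps the fallback safe in environments that define one but not the other.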
Why it's hard
The one trick a debugger must pull off: "stop now, I want to look." Natively the OS/ptrace does this. To stop WASM in a browser you have three options:
- Asyncify — the compiler bloats the code so the whole stack can be unwound/rewound. Heavy, slow, pollutes every function.
- JSPI (JS Promise Integration) — new, experimental, limited support.
- Our way: move the program onto a worker and block that worker with Atomics.wait. Atomics.wait is forbidden on the main thread but allowed on a worker. Meanwhile the main thread reads the worker's live memory over a SharedArrayBuffer / shared WebAssembly.Memory. Pause = the worker sleeps on a futex; continue = the main thread wakes the futex with Atomics.notify.
Approach 3 came "for free" because we already had threads — and none of Asyncify's code bloat.
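To make the futex idea concrete, here is a rough sketch of the pause/resume pair. It assumes `ctrl` is an Int32Array over a SharedArrayBuffer and that `slot` is the index of a resume counter; the function names and signatures are illustrative and not taken from Studio's source.

```javascript
// Worker side: park until the resume counter moves past the value last seen.
// Atomics.wait blocks only while ctrl[slot] still equals `seen`, and returns
// "ok" (woken), "not-equal" (counter already moved) or "timed-out".
function parkWorker(ctrl, slot, seen, timeoutMs) {
  return Atomics.wait(ctrl, slot, seen, timeoutMs);
}

// Main-thread side: bump the counter first, then wake any parked worker.
// Atomics.notify returns how many waiters were woken.
function resumeWorker(ctrl, slot) {
  Atomics.add(ctrl, slot, 1);
  return Atomics.notify(ctrl, slot);
}
```

Using a counter rather than a boolean flag avoids a lost wakeup: if the main thread resumes before the worker has actually parked, the wait sees a changed value and returns "not-equal" immediately instead of sleeping forever.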
Architecture — three pieces
MAIN THREAD — debugui.js · debug_live.js
· sets breakpoints + conditions, sends commands
· reads live memory / locals / call stack
│
│ control SAB: state · line · fidx · depth · cmd · resume
│ + breakpoint bitset (256 words)
▼
DEBUG WORKER — debug_worker.js running the instrumented .wasm
· __ms_step(line, fidx) → breakpoint? → Atomics.wait (worker parks)
· __ms_enter / __ms_exit → call stack + frameBase
▲
│ while parked, the main thread reads live state through
│ the shared WebAssembly.Memory
▼
continue / step → RESUME++ , Atomics.notify → the worker wakes & resumes
- debug_ctrl.js: the control-SAB protocol. state / line / fidx / depth / cmd / resume futex + a breakpoint bitset (256 words). sendCmd bumps RESUME and wakes the worker with Atomics.notify.
- debug_worker.js: runs the instrumented wasm. __ms_step (with an fidx-reconcile fallback for implicit returns + the pause decision), call-stack maintenance via __ms_enter(fidx, frameBase) / __ms_exit, step-into/over/out by call depth, and a thrown sentinel for Stop. pauseAt() = Atomics.wait.
- debug_live.js: the main-thread controller. Spawns the worker, manages breakpoints, and reads globals + any selected frame's locals from that frame's frameBase (recorded by __ms_enter). readSwiftString() decodes Strings; shouldPause() evaluates conditional breakpoints.
- debugui.js: the UI. A docked bottom pane (like Xcode's debug area), with a controls toolbar (continue/pause/step×3/stop) over a full-width Call Stack / Variables / Watch split.
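A breakpoint bitset of 256 32-bit words covers 8192 source lines. The sketch below shows one way such a control buffer could be laid out and updated. The slot names come from the description above, but their order, the indices, and the helper names are assumptions for illustration only.

```javascript
// Assumed layout: six control slots followed by the 256-word bitset.
const SLOT = { STATE: 0, LINE: 1, FIDX: 2, DEPTH: 3, CMD: 4, RESUME: 5 };
const BP_BASE = 6;    // first word of the bitset (assumed position)
const BP_WORDS = 256; // 256 * 32 bits = 8192 addressable lines

function createControl() {
  return new Int32Array(new SharedArrayBuffer((BP_BASE + BP_WORDS) * 4));
}

// Set or clear the bit for `line`; returns false if the line does not fit.
function setBreakpoint(ctrl, line, on) {
  if (line < 0 || line >= BP_WORDS * 32) return false;
  const word = BP_BASE + (line >>> 5);
  const bit = 1 << (line & 31);
  if (on) Atomics.or(ctrl, word, bit);
  else Atomics.and(ctrl, word, ~bit);
  return true;
}

// Cheap enough to call from the step hook on every statement.
function hasBreakpoint(ctrl, line) {
  if (line < 0 || line >= BP_WORDS * 32) return false;
  const word = Atomics.load(ctrl, BP_BASE + (line >>> 5));
  return (word & (1 << (line & 31))) !== 0;
}
```

Atomic read-modify-write operations let the main thread toggle a breakpoint while the worker is running without tearing a neighbouring bit in the same word.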
Compiler side — the hooks
The critical constraint: debug instrumentation must never change a normal Run. Studio's Debug button calls the ms_set_debug() export to flip the gate flag (g_ms_debug_steps); when it's off the emitted wasm is byte-identical. That's why the corpus stays 1004/1015, 0 RUNTIME.
The step hook
At the start of every statement a host import __ms_step(line, fidx) is called. The worker catches it, checks whether the line is in the breakpoint set, and pauses if needed.
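The pause decision inside the step hook can be sketched as a small pure function. This is an assumption about how stepping by call depth might work, not the worker's real code: `mode` stands for the last command issued from the UI, and `startDepth` for the call depth at the moment it was issued.

```javascript
// Hypothetical pause decision, evaluated on every __ms_step call.
function shouldStop(mode, depth, startDepth, onBreakpoint) {
  if (onBreakpoint) return true;              // a breakpoint always stops
  switch (mode) {
    case "into": return true;                 // stop at the very next statement
    case "over": return depth <= startDepth;  // ignore statements in deeper calls
    case "out":  return depth < startDepth;   // wait until the caller is running again
    default:     return false;                // "continue": run to the next breakpoint
  }
}
```

With this shape, step over and step out differ by a single comparison, which is why tracking call depth is enough to implement all three step commands.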
Locals — a debug shadow stack
The problem: named scalar locals live in WASM locals, which the host (JS) can't read. The fix: the step hook mirrors them into memory, and the host reads them from there. Each live frame gets its own slot:
- A zero-init depth counter at a fixed address (DBG_DEPTH_ADDR).
- frameBase = DBG_STACK_BASE (8 MB) + depth * DBG_FRAME_STRIDE (256).
- frameBase is captured at function entry into a WASM local (__dbg_fb) and passed to __ms_enter(fidx, frameBase), so the host reads each frame from the exact base the wasm mirrored to.
- Depth is ++ at entry and -- at every return, balanced because every function ends in a return (verified: enter==exit).
This is what makes parent frames' locals readable too — click any frame in the call stack and see its locals, not just the innermost.
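On the host side, reading a mirrored local is plain address arithmetic. The sketch below uses the two constants quoted above. The readLocal helper, the shape of its `local` argument, and the interpretation of `wt` as a wasm value-type name are assumptions, as is the depth value at which the first frame sits.

```javascript
// Values from the post: the shadow stack starts at 8 MB, 256 bytes per frame.
const DBG_STACK_BASE = 8 * 1024 * 1024;
const DBG_FRAME_STRIDE = 256;

function frameBase(depth) {
  return DBG_STACK_BASE + depth * DBG_FRAME_STRIDE;
}

// Read one mirrored local from the (possibly shared) memory buffer.
// `local.off` is the byte offset inside the frame; `local.wt` is assumed
// to name the wasm value type.
function readLocal(buffer, base, local) {
  const view = new DataView(buffer);
  const addr = base + local.off;
  switch (local.wt) {
    case "i32": return view.getInt32(addr, true); // wasm memory is little-endian
    case "i64": return view.getBigInt64(addr, true);
    case "f32": return view.getFloat32(addr, true);
    case "f64": return view.getFloat64(addr, true);
    default: return undefined; // unknown type: show nothing rather than garbage
  }
}
```

Because each frame has its own base, the same call works for any frame selected in the call stack: pass the frameBase recorded for that frame instead of the innermost one.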
Putting the imports last
The __ms_enter/__ms_exit imports are emitted last and only in debug, so existing function indices never shift (FIDX_USER_BASE moves 188→190 only when debugging). Both reuse the step hook's type signature (i32,i32)->void, so every call site has to push two arguments.
Strings, loop variables, metadata
- A String is an i32 absolute byte offset into memory (NUL-terminated UTF-8, no header). The host decodes the UTF-8 at that pointer (SAB-safe copy).
- For-loop variables are named __for_var_<id>_<srcname> so the host can show i in Variables (nested same-name loops get disambiguated as i, i#2).
- A miniswift_debug custom section carries, per function, {fidx, name, locals:[{name, off, type, wt}]}; the host uses it to name the call stack and type the locals.
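The string and loop-variable conventions above translate into two short host-side helpers. This is a sketch in the spirit of readSwiftString, not its actual implementation: the names readCString and displayName, the length cap, and the assumption that <id> is a decimal number are all illustrative.

```javascript
// Decode a NUL-terminated UTF-8 string starting at absolute offset `ptr`.
// `bytes` is a Uint8Array over the wasm memory, which may be shared.
function readCString(bytes, ptr, maxLen = 1 << 20) {
  const limit = Math.min(bytes.length, ptr + maxLen);
  let end = ptr;
  while (end < limit && bytes[end] !== 0) end++;
  // slice() copies into a fresh, non-shared buffer before decoding.
  return new TextDecoder("utf-8").decode(bytes.slice(ptr, end));
}

// Recover the source-level name of a mangled loop variable,
// e.g. "__for_var_7_i" becomes "i". Other names pass through unchanged.
function displayName(mangled) {
  const m = /^__for_var_(\d+)_(.+)$/.exec(mangled);
  return m ? m[2] : mangled;
}
```

Copying before decoding sidesteps any restriction a browser may place on decoding directly from a shared view, and the length cap keeps a corrupt pointer from scanning the whole memory.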
Conditional breakpoints
The worker can't read values (it only knows the line), so it pauses on every bp hit; the condition is evaluated host-side (reading the innermost frame's variables while parked), and if it isn't met the controller silently continues before touching the UI (no flash). The evaluator uses no eval() — a condition is OR-groups of AND-clauses, and it's fail-open: an unknown variable or a parse error means pause (a real hit is never silently skipped).
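An eval-free, fail-open evaluator of this shape fits in a few lines. The sketch below assumes a condition syntax of `name op number` clauses joined by `&&`, with groups joined by `||`. That syntax, the supported operators, and the function names are guesses for illustration; the real evaluator may accept more.

```javascript
const OPS = {
  "==": (a, b) => a === b,
  "!=": (a, b) => a !== b,
  "<=": (a, b) => a <= b,
  ">=": (a, b) => a >= b,
  "<": (a, b) => a < b,
  ">": (a, b) => a > b,
};

// Evaluate one "name op number" clause. Throws on anything it cannot handle.
function evalClause(text, vars) {
  const m = /^\s*([A-Za-z_]\w*)\s*(==|!=|<=|>=|<|>)\s*(-?\d+(?:\.\d+)?)\s*$/.exec(text);
  if (!m) throw new Error("parse error: " + text);
  if (!Object.prototype.hasOwnProperty.call(vars, m[1])) {
    throw new Error("unknown variable: " + m[1]);
  }
  return OPS[m[2]](vars[m[1]], Number(m[3]));
}

// Returns true when the debugger should pause at this hit.
function evaluateCondition(condition, vars) {
  try {
    // Evaluate every clause up front so an error anywhere forces a pause.
    const groups = condition
      .split("||")
      .map((group) => group.split("&&").map((clause) => evalClause(clause, vars)));
    return groups.some((results) => results.every(Boolean));
  } catch (err) {
    return true; // fail-open: never silently skip a real hit
  }
}
```

Evaluating all clauses before combining them is deliberate: with short-circuiting, a typo in a later clause could hide behind an earlier false one and the breakpoint would quietly never fire.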
How we verified it
- Compiler invariant: at every phase the corpus stays 1004/1015, 0 RUNTIME (the gating keeps a normal Run byte-identical).
- Three-level end-to-end with a headless-Chrome harness: node wasm level (live locals n=10 → x=20), dev + prod-build, and the live deploy (breakpoint → locals → step into/out → call stack [sq(), main()] → watch → output, zero page errors).
What's next
Very deep recursion (>~32K frames) would overrun the shadow region; hit-count conditions and Substring decoding aren't done yet. And honestly, the editor's own download is still dominated by Monaco's ~3.5 MB core — trimming that is a separate job.
The debugger, though, is real now. You can set a breakpoint, watch it stop, and poke at live memory — in a browser tab, with no native toolchain in sight.