Bug 1476251 - Generate stack maps in the Wasm Baseline compiler. r=lth.
author Julian Seward <jseward@acm.org>
Fri, 14 Dec 2018 13:00:44 +0100
changeset 450780 5ca49059949b3c73db196be7280cfd9472631492
parent 450779 7e88127111fe72ae74ed3d975d2566cfc01f6c69
child 450781 d9e2ee18925b37da06bb63c956acc165ab303837
push id 35208
push user csabou@mozilla.com
push date Sat, 15 Dec 2018 02:48:07 +0000
treeherder mozilla-central@d86d184dc7d6
reviewers lth
bugs 1476251
milestone 66.0a1
Bug 1476251 - Generate stack maps in the Wasm Baseline compiler. r=lth.

This is a first implementation of stack map generation and usage for wasm
baseline.  It is intended to be a relatively simple starting point for
testing and refinement.  With the patch in place, it is possible to run Wasm
test cases that involve GC objects and garbage collection.

The general way the patch works is:

* During compilation, all state to do with generating stack maps is held in a
  new struct, StackMapGenerator.  The BaseCompiler updates various fields in
  StackMapGenerator as it goes, and can call StackMapGenerator::createStackMap
  to create a map.  The StackMapGenerator holds various integer and
  Maybe-integer fields, but most importantly it maintains a
  MachineStackTracker, a vector of booleans which tracks the pointerness of
  each word in the current frame, except for pointers corresponding to
  stack-resident ref-typed entries in the compiler's evaluation stack.

  When we want to create a stack map, BaseCompiler calls one of its four
  different ::createStackMap functions.  These are simple wrappers which add
  various default and other parameters and call onwards to
  StackMapGenerator::createStackMap, which does the actual work.

* StackMapGenerator::createStackMap works by augmenting the
  MachineStackTracker with pointers that are implied to exist as a result of
  stack-resident ref-typed entries in the compiler's evaluation stack
  (BaseCompiler::stk_).  The resulting entries are then copied off into a bit
  vector (a wasm::StackMap) for storage.  Extra entries may be added for the
  case where a trap exit stub's not-really-a-frame may be pushed over the
  current frame.  The presence of a ref-typed DebugFrame, if any, is also
  noted in the map.

* All stack maps created also cover the words above the current frame's
  return address that are used to pass parameters in memory to this function.
  In other words, the incoming argument area for a function is covered by
  that function's stackmap.  The alignment padding that may be above that
  area is *not*, however, included; that belongs (logically) to the caller.

* For places where a function may invoke the wasm trap exit stub (by
  executing an illegal instruction), a composite map is created.  This
  contains entries for the stack entries that would exist in the absence of
  the stub, but also contains extra entries for the save area that the stub
  will push "on top" of the normal frame.  To describe the layout of the save
  area, a new routine, wasm::GenerateTrapExitMachineState, generates a
  description of the area, from which the stackmap component for the save
  area is computed.

* Completed stackmaps are temporarily stored in BaseCompiler::stackMaps_.
  They are further processed in WasmGenerator.cpp:

  - ModuleGenerator::linkCompiledCode biases them by the module offset and
    moves them to the metadataTier_.
  - ModuleGenerator::finishMetadata sorts and sanity-checks them.
  - ModuleGenerator::finish biases them again, so as to give them their final
    code addresses, and checks (in debug builds) that they are associated
    with plausible instructions.

* When it comes to GC time, TraceJitActivations iterates over activations,
  and TraceJitActivation iterates over individual frames.  The latter
  calculates, in each frame, the address of the next instruction to be
  executed in that frame.  It hands this, and, effectively, the stack pointer
  value, to the new function Instance::traceFrame.

* Instance::traceFrame does what you'd expect -- it looks up the map, using
  Code::lookupStackMap, and then scans the frame, doing as many sanity checks
  as it reasonably can on the way.  There are heuristic (but safe) checks to
  ensure that the maps sync with the actual stack values, and also that the
  map sizes are correct.  (A conceptual sketch of this lookup-and-scan step
  follows this message.)

* There are 3 new test files:

  stackmaps1.js -- tests unwinding w/ maps, for direct/indirect calls only
  stackmaps2.js -- as stackmaps1.js, but in the presence of many interrupts
  stackmaps3.js -- tries hard to generate many cells which are held live only
                   from the stack, whilst dealing with interrupts

* New types:

  MachineStackTracker   -- as described above
  StackMapGenerator     -- as described above
  struct wasm::StackMap -- a single stack map -- basically a bit vector
  class wasm::StackMaps -- a mapping from code addr to wasm::StackMap

* The zero value pushed by GenerateTrapExit has been changed to 1337 and has
  been given a symbolic name.  This isn't entirely frivolous: detecting the
  zero in Instance::traceFrame is a useful sanity check, but zeroes occur
  relatively frequently on the stack, whereas 1337 is much less likely to
  appear by chance.  It is not pointer-aligned and, viewed as an address,
  falls in "page zero", so it seems relatively safe.

Supporting changes:

* JitFrameIter and WasmFrameIter have a new method, returnAddressToFp(),
  which produces the address of the next instruction to be executed, for
  both JS and Wasm frames.

* For all call instructions generated, the relevant MacroAssembler routines
  have been modified so as to return a CodeOffset that is guaranteed to refer
  to the first byte of the instruction immediately following the call.  This
  ensures that the stack map refers to the correct instruction even when the
  MacroAssembler routine adds further instructions after the actual call
  instruction.

* Stackmap generation can fail due to lack of memory.  So all call chains
  that can lead to a call to StackMapGenerator::createStackMap now return a
  MOZ_MUST_USE bool and must detect and handle OOMs in the normal way.

* struct BaseCompiler::Stk (evaluation stack elements) has been lifted out to
  the top level and placed above struct StackMapGenerator, since
  StackMapGenerator needs to be able to inspect stack entries.
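To make the lookup-and-scan step above concrete, here is a minimal,
self-contained C++ sketch of the core idea: a per-safepoint stack map is a
bit vector over the words of a frame, held in a container keyed by code
address; at GC time the tracer looks up the map for a frame's
next-instruction address and visits exactly the words whose bits are set.
The names used here (SimpleStackMap, SimpleStackMaps, traceFrameWords,
visitPointer) are illustrative assumptions only and are not the
wasm::StackMap / wasm::StackMaps / Instance::traceFrame code added by this
patch; in particular the real maps also cover the incoming-argument area and,
where needed, the trap-exit save area.

// Illustrative sketch only -- not the types added by this patch.
#include <cstdint>
#include <cstdio>
#include <map>
#include <vector>

// One map per safepoint: bit i set means that word i above the frame's
// lowest covered address holds a GC pointer.
struct SimpleStackMap {
  std::vector<bool> bits;  // pointerness of each covered word
  size_t numWords() const { return bits.size(); }
};

// Mapping from "next instruction" code address to its stack map, analogous
// in spirit to looking a map up from the code metadata.
struct SimpleStackMaps {
  std::map<uintptr_t, SimpleStackMap> maps;
  const SimpleStackMap* lookup(uintptr_t nextPC) const {
    auto it = maps.find(nextPC);
    return it == maps.end() ? nullptr : &it->second;
  }
};

// Stand-in for the tracer: print the address and value of each slot that the
// map says holds a pointer.
static void visitPointer(uintptr_t* slot) {
  std::printf("tracing pointer slot at %p (value %p)\n", (void*)slot,
              (void*)*slot);
}

// Walk one frame: given the lowest covered stack address and the code
// address of the next instruction in the frame, visit every word that the
// corresponding map marks as a pointer.
static void traceFrameWords(uintptr_t lowestAddr, uintptr_t nextPC,
                            const SimpleStackMaps& allMaps) {
  const SimpleStackMap* map = allMaps.lookup(nextPC);
  if (!map) {
    return;  // no safepoint recorded at this address
  }
  for (size_t i = 0; i < map->numWords(); i++) {
    if (map->bits[i]) {
      visitPointer(
          reinterpret_cast<uintptr_t*>(lowestAddr + i * sizeof(uintptr_t)));
    }
  }
}

int main() {
  // Fake a 4-word frame whose words 1 and 3 hold "pointers".
  uintptr_t frame[4] = {0, 0x1000, 0, 0x2000};
  SimpleStackMaps maps;
  maps.maps[0xdeadbeef] = SimpleStackMap{{false, true, false, true}};
  traceFrameWords(reinterpret_cast<uintptr_t>(&frame[0]), 0xdeadbeef, maps);
  return 0;
}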
js/src/jit-test/tests/wasm/gc/stackmaps1.js
js/src/jit-test/tests/wasm/gc/stackmaps2.js
js/src/jit-test/tests/wasm/gc/stackmaps3.js
js/src/jit/JitFrames.cpp
js/src/jit/MacroAssembler-inl.h
js/src/jit/MacroAssembler.cpp
js/src/jit/MacroAssembler.h
js/src/jit/Registers.h
js/src/jit/arm/MacroAssembler-arm.cpp
js/src/jit/arm64/MacroAssembler-arm64.cpp
js/src/jit/mips-shared/MacroAssembler-mips-shared.cpp
js/src/jit/x86-shared/MacroAssembler-x86-shared.cpp
js/src/vm/Stack.cpp
js/src/vm/Stack.h
js/src/wasm/WasmBaselineCompile.cpp
js/src/wasm/WasmBaselineCompile.h
js/src/wasm/WasmCode.cpp
js/src/wasm/WasmCode.h
js/src/wasm/WasmFrameIter.cpp
js/src/wasm/WasmFrameIter.h
js/src/wasm/WasmGenerator.cpp
js/src/wasm/WasmGenerator.h
js/src/wasm/WasmInstance.cpp
js/src/wasm/WasmInstance.h
js/src/wasm/WasmStubs.cpp
js/src/wasm/WasmStubs.h
js/src/wasm/WasmTypes.h
new file mode 100644
--- /dev/null
+++ b/js/src/jit-test/tests/wasm/gc/stackmaps1.js
@@ -0,0 +1,89 @@
+// |jit-test| skip-if: !wasmGcEnabled()
+
+// Tests wasm frame tracing.  Only tests for direct and indirect call chains
+// in wasm that lead to JS allocation.  Does not test any timeout or interrupt
+// related aspects.  The structure is
+//
+//   test top level: call fn2
+//   fn2: call fn1
+//   fn1: do 60k times { call-direct fn0; call-indirect fn0; }
+//   fn0: call out to JS that does allocation
+//
+// Eventually fn0 will trigger GC and we expect the chain of resulting frames
+// to be traced correctly.  fn2, fn1 and fn0 have some ref-typed args, so
+// there will be traceable stack words to follow, in the sequence of frames.
+
+const {Module,Instance} = WebAssembly;
+
+let t =
+  `(module
+     (gc_feature_opt_in 2)
+     (import $check3 "" "check3" (func (param anyref) (param anyref) (param anyref)))
+     (type $typeOfFn0
+           (func (result i32) (param i32) (param anyref) (param i32)
+                              (param anyref) (param anyref) (param i32)))
+     (table 1 1 anyfunc)
+     (elem (i32.const 0) $fn0)
+
+     (import $alloc "" "alloc" (func (result anyref)))
+
+     ;; -- fn 0
+     (func $fn0 (export "fn0")
+                (result i32) (param $arg1 i32) (param $arg2 anyref) (param $arg3 i32)
+                             (param $arg4 anyref) (param $arg5 anyref) (param $arg6 i32)
+       (call $alloc)
+       drop
+       (i32.add (i32.add (get_local $arg1) (get_local $arg3)) (get_local $arg6))
+
+       ;; Poke the ref-typed arguments, to be sure that they got kept alive
+       ;; properly across any GC that the |alloc| call might have done.
+       (call $check3 (get_local $arg2) (get_local $arg4) (get_local $arg5))
+     )
+
+     ;; -- fn 1
+     (func $fn1 (export "fn1") (param $arg1 anyref) (result i32)
+       (local $i i32)
+
+       (loop i32
+         ;; call direct 0
+         (call $fn0 (i32.const 10) (get_local $arg1) (i32.const 12)
+                    (get_local $arg1) (get_local $arg1) (i32.const 15))
+
+         ;; call indirect 0
+         (call_indirect $typeOfFn0
+                    (i32.const 10) (get_local $arg1) (i32.const 12)
+                    (get_local $arg1) (get_local $arg1) (i32.const 15)
+                    (i32.const 0)) ;; table index
+
+         i32.add
+
+         ;; Do 60k iterations of this loop, to get a good amount of allocation
+         (set_local $i (i32.add (get_local $i) (i32.const 1)))
+         (br_if 0 (i32.lt_s (get_local $i) (i32.const 60000)))
+       )
+     )
+
+     ;; -- fn 2
+     (func $fn2 (export "fn2") (param $arg1 anyref) (result i32)
+       (call $fn1 (get_local $arg1))
+     )
+   )`;
+
+function Croissant(chocolate, number) {
+    this.chocolate = chocolate;
+    this.number = number;
+}
+
+function allocates() {
+    return new Croissant(true, 271828);
+}
+
+function check3(a1, a2, a3) {
+    assertEq(a1.number, 31415927);
+    assertEq(a2.number, 31415927);
+    assertEq(a3.number, 31415927);
+}
+
+let i = wasmEvalText(t, {"":{alloc: allocates, check3: check3}});
+
+print(i.exports.fn2( new Croissant(false, 31415927) ));
new file mode 100644
--- /dev/null
+++ b/js/src/jit-test/tests/wasm/gc/stackmaps2.js
@@ -0,0 +1,134 @@
+// |jit-test| skip-if: !wasmGcEnabled()
+
+// Tests wasm frame tracing in the presence of interrupt handlers that perform
+// allocation.  The structure is
+//
+//   test top level: call fn2
+//   fn2: call fn1
+//   fn1: repeat { call-direct fn0; call-indirect fn0; }
+//   fn0: a 100-iteration loop that does nothing except waste time
+//
+// At the same time we are asynchronously running handler(), which does a lot
+// of allocation.  At some point that will trigger a GC.  Assuming that
+// handler() runs whilst fn0 is running (the most likely scenario, since fn0
+// consumes the majority of the wasm running time), then the runtime will walk
+// the stack from the wasm exit frame, through fn0, fn1 and finally fn2.  As
+// with stackmaps1.js, there are some ref-typed args in use so as to provide
+// traceable stack slots to follow.
+//
+// The test runs until the loop in fn1 determines that handler() has allocated
+// sufficient memory to have caused at least three collections.  This helps
+// keep the test effective in the face of wide variations in the rate of
+// progress of the handler()'s loop (eg x86+native is fast, arm64+simulator is
+// slow).
+
+const {Module,Instance} = WebAssembly;
+
+const DEBUG = false;
+
+let t =
+  `(module
+     (gc_feature_opt_in 2)
+     (type $typeOfFn0
+           (func (result i32) (param i32) (param anyref) (param i32)
+                              (param anyref) (param anyref) (param i32)))
+     (table 1 1 anyfunc)
+     (elem (i32.const 0) $fn0)
+
+     (import $alloc "" "alloc" (func (result anyref)))
+     (import $quitp "" "quitp" (func (result i32)))
+     (import $check3 "" "check3" (func (param anyref) (param anyref) (param anyref)))
+
+     ;; -- fn 0
+     (func $fn0 (export "fn0")
+                (result i32) (param $arg1 i32) (param $arg2 anyref) (param $arg3 i32)
+                             (param $arg4 anyref) (param $arg5 anyref) (param $arg6 i32)
+       (local $i i32)
+
+       ;; spinloop to waste time
+       (loop
+         (set_local $i (i32.add (get_local $i) (i32.const 1)))
+         (br_if 0 (i32.lt_s (get_local $i) (i32.const 100)))
+       )
+
+       (i32.add (i32.add (get_local $arg1) (get_local $arg3)) (get_local $arg6))
+
+       ;; Poke the ref-typed arguments, to be sure that they got kept alive
+       ;; properly across any GC that might have happened.
+       (call $check3 (get_local $arg2) (get_local $arg4) (get_local $arg5))
+     )
+
+     ;; -- fn 1
+     (func $fn1 (export "fn1") (param $arg1 anyref) (result i32)
+       (loop i32
+         ;; call direct to $fn0
+         (call $fn0 (i32.const 10) (get_local $arg1) (i32.const 12)
+                    (get_local $arg1) (get_local $arg1) (i32.const 15))
+
+         ;; call indirect to table index 0, which is $fn0
+         (call_indirect $typeOfFn0
+                    (i32.const 10) (get_local $arg1) (i32.const 12)
+                    (get_local $arg1) (get_local $arg1) (i32.const 15)
+                    (i32.const 0)) ;; table index
+
+         i32.add
+
+         ;; Continue iterating until handler() has allocated enough
+         (br_if 0 (i32.eq (call $quitp) (i32.const 0)))
+       )
+     )
+
+     ;; -- fn 2
+     (func $fn2 (export "fn2") (param $arg1 anyref) (result i32)
+       (call $fn1 (get_local $arg1))
+     )
+   )`;
+
+function Croissant(chocolate, number) {
+    this.chocolate = chocolate;
+    this.number = number;
+}
+
+function allocates() {
+    return new Croissant(true, 271828);
+}
+
+let totAllocs = 0;
+
+function handler() {
+    if (DEBUG) {
+        print('XXXXXXXX icallback: START');
+    }
+    let q = allocates();
+    let sum = 0;
+    let iters = 15000;
+    for (let i = 0; i < iters; i++) {
+        let x = allocates();
+        // Without this hoop jumping to create an apparent use of |x|, Ion
+        // will remove the allocation call and make the test pointless.
+        if (x == q) { sum++; }
+    }
+    totAllocs += iters;
+    // Artificial use of |sum|.  See comment above.
+    if (sum == 133713371337) { print("unlikely!"); }
+    timeout(0.5, handler);
+    if (DEBUG) {
+        print('XXXXXXXX icallback: END');
+    }
+    return true;
+}
+
+function quitp() {
+    return totAllocs > 200000 ? 1 : 0;
+}
+
+function check3(a1, a2, a3) {
+    assertEq(a1.number, 31415927);
+    assertEq(a2.number, 31415927);
+    assertEq(a3.number, 31415927);
+}
+
+let i = wasmEvalText(t, {"":{alloc: allocates, quitp: quitp, check3: check3}});
+
+timeout(0.5, handler);
+print(i.exports.fn2( new Croissant(false, 31415927) ));
new file mode 100644
--- /dev/null
+++ b/js/src/jit-test/tests/wasm/gc/stackmaps3.js
@@ -0,0 +1,204 @@
+// |jit-test| skip-if: !wasmGcEnabled()
+
+// Generates a bunch of numbers-on-the-heap, and tries to ensure that they are
+// held live -- at least for a short while -- only by references from the wasm
+// evaluation stack.  Then assembles them in a list and checks that the list
+// is as expected (and we don't segfault).  While all this is running we also
+// have a regular interrupt whose handler does a bunch of allocation, so as
+// to cause as much disruption as possible.
+
+// Note this makes an assumption about how the wasm compiler works.  There's
+// no particular reason that the wasm compiler needs to keep the results of
+// the $mkBoxedInt calls on the machine stack.  It could equally cache them in
+// registers or even reorder the call sequences so as to interleave
+// construction of the list elements with construction of the list itself.  It
+// just happens that our baseline compiler will behave as described.  That
+// said, however, it's hard to imagine how an implementation could complete
+// the list construction without having at least one root in a register or on
+// the stack, so the test still has value regardless of how the underlying
+// implementation works.
+
+const {Module,Instance} = WebAssembly;
+
+const DEBUG = false;
+
+let t =
+  `(module
+     (gc_feature_opt_in 2)
+     (import $mkCons "" "mkCons" (func (result anyref)
+                                       (param anyref) (param anyref)))
+     (import $mkBoxedInt "" "mkBoxedInt" (func (result anyref)))
+
+     (func $mkNil (result anyref)
+       ref.null
+     )
+
+     (func $mkConsIgnoringScalar (result anyref)
+              (param $hd anyref) (param i32) (param $tl anyref)
+        (get_local $hd)
+        (get_local $tl)
+        call $mkCons
+     )
+
+     (func $mkList (export "mkList") (result anyref)
+        call $mkList20
+     )
+
+     (func $mkList20 (result anyref)
+       ;; create 20 pointers to boxed ints on the stack, plus a few
+       ;; scalars for added confusion
+       (local $scalar99 i32)
+       (local $scalar97 i32)
+       (set_local $scalar99 (i32.const 99))
+       (set_local $scalar97 (i32.const 97))
+
+       call $mkBoxedInt
+       get_local $scalar99
+       call $mkBoxedInt
+       call $mkBoxedInt
+       get_local $scalar97
+       call $mkBoxedInt
+       call $mkBoxedInt
+       call $mkBoxedInt
+       call $mkBoxedInt
+       call $mkBoxedInt
+       call $mkBoxedInt
+       call $mkBoxedInt
+       call $mkBoxedInt
+       call $mkBoxedInt
+       call $mkBoxedInt
+       call $mkBoxedInt
+       call $mkBoxedInt
+       call $mkBoxedInt
+       call $mkBoxedInt
+       call $mkBoxedInt
+       call $mkBoxedInt
+       call $mkBoxedInt
+       call $mkNil
+       ;; Now we have (pointers to) 20 boxed ints and a NIL on the stack, and
+       ;; nothing else holding them live.  Build a list from the elements.
+       call $mkCons
+       call $mkCons
+       call $mkCons
+       call $mkCons
+       call $mkCons
+       call $mkCons
+       call $mkCons
+       call $mkCons
+       call $mkCons
+       call $mkCons
+       call $mkCons
+       call $mkCons
+       call $mkCons
+       call $mkCons
+       call $mkCons
+       call $mkCons
+       call $mkCons
+       call $mkConsIgnoringScalar
+       call $mkCons
+       call $mkConsIgnoringScalar
+     )
+   )`;
+
+let boxedIntCounter = 0;
+
+function BoxedInt() {
+    this.theInt = boxedIntCounter;
+    boxedIntCounter++;
+}
+
+function mkBoxedInt() {
+    return new BoxedInt();
+}
+
+function printBoxedInt(bi) {
+    print(bi.theInt);
+}
+
+function Cons(hd, tl) {
+    this.hd = hd;
+    this.tl = tl;
+}
+
+function mkCons(hd, tl) {
+    return new Cons(hd, tl);
+}
+
+function showList(list) {
+    print("[");
+    while (list) {
+        printBoxedInt(list.hd);
+        print(",");
+        list = list.tl;
+    }
+    print("]");
+}
+
+function checkList(list, expectedHdValue, expectedLength) {
+    while (list) {
+        if (expectedLength <= 0)
+            return false;
+        if (list.hd.theInt !== expectedHdValue) {
+            return false;
+        }
+        list = list.tl;
+        expectedHdValue++;
+        expectedLength--;
+    }
+    if (expectedLength == 0) {
+        return true;
+    } else {
+        return false;
+    }
+}
+
+let i = wasmEvalText(t, {"":{mkCons: mkCons, mkBoxedInt: mkBoxedInt}});
+
+
+function Croissant(chocolate) {
+    this.chocolate = chocolate;
+}
+
+function allocates() {
+    return new Croissant(true);
+}
+
+function handler() {
+    if (DEBUG) {
+        print('XXXXXXXX icallback: START');
+    }
+    let q = allocates();
+    let sum = 0;
+    for (let i = 0; i < 15000; i++) {
+        let x = allocates();
+        // Without this hoop jumping to create an apparent use of |x|, Ion
+        // will remove the allocation call and make the test pointless.
+        if (x == q) { sum++; }
+    }
+    // Artificial use of |sum|.  See comment above.
+    if (sum == 133713371337) { print("unlikely!"); }
+    timeout(1, handler);
+    if (DEBUG) {
+        print('XXXXXXXX icallback: END');
+    }
+    return true;
+}
+
+timeout(1, handler);
+
+for (let n = 0; n < 10000; n++) {
+    let listLowest = boxedIntCounter;
+
+    // Create the list in wasm land, possibly inducing GC on the way
+    let aList = i.exports.mkList();
+
+    // Check it is as we expect
+    let ok = checkList(aList, listLowest, 20/*expected length*/);
+    if (!ok) {
+        print("Failed on list: ");
+        showList(aList);
+    }
+    assertEq(ok, true);
+}
+
+// If we get here, the test finished successfully.
--- a/js/src/jit/JitFrames.cpp
+++ b/js/src/jit/JitFrames.cpp
@@ -1290,16 +1290,20 @@ static void TraceJitActivation(JSTracer*
     // when a GC happens.
     activation->setCheckRegs(false);
   }
 #endif
 
   activation->traceRematerializedFrames(trc);
   activation->traceIonRecovery(trc);
 
+  // This is used for sanity checking continuity of the sequence of wasm stack
+  // maps as we unwind.  It has no functional purpose.
+  uintptr_t highestByteVisitedInPrevWasmFrame = 0;
+
   for (JitFrameIter frames(activation); !frames.done(); ++frames) {
     if (frames.isJSJit()) {
       const JSJitFrameIter& jitFrame = frames.asJSJit();
       switch (jitFrame.type()) {
         case FrameType::Exit:
           TraceJitExitFrame(trc, jitFrame);
           break;
         case FrameType::BaselineJS:
@@ -1326,19 +1330,26 @@ static void TraceJitActivation(JSTracer*
           // in the next iteration.
           break;
         case FrameType::JSJitToWasm:
           TraceJSJitToWasmFrame(trc, jitFrame);
           break;
         default:
           MOZ_CRASH("unexpected frame type");
       }
+      highestByteVisitedInPrevWasmFrame = 0; /* "unknown" */
     } else {
       MOZ_ASSERT(frames.isWasm());
-      frames.asWasm().instance()->trace(trc);
+      uint8_t* nextPC = frames.returnAddressToFp();
+      MOZ_ASSERT(nextPC != 0);
+      wasm::WasmFrameIter& wasmFrameIter = frames.asWasm();
+      wasm::Instance* instance = wasmFrameIter.instance();
+      instance->trace(trc);
+      highestByteVisitedInPrevWasmFrame = instance->traceFrame(
+          trc, wasmFrameIter, nextPC, highestByteVisitedInPrevWasmFrame);
     }
   }
 }
 
 void TraceJitActivations(JSContext* cx, JSTracer* trc) {
   for (JitActivationIterator activations(cx); !activations.done();
        ++activations) {
     TraceJitActivation(trc, activations->asJit());
--- a/js/src/jit/MacroAssembler-inl.h
+++ b/js/src/jit/MacroAssembler-inl.h
@@ -46,37 +46,42 @@ CodeOffset MacroAssembler::PushWithPatch
   return PushWithPatch(ImmWord(uintptr_t(imm.value)));
 }
 
 // ===============================================================
 // Simple call functions.
 
 void MacroAssembler::call(TrampolinePtr code) { call(ImmPtr(code.value)); }
 
-void MacroAssembler::call(const wasm::CallSiteDesc& desc, const Register reg) {
+CodeOffset MacroAssembler::call(const wasm::CallSiteDesc& desc,
+                                const Register reg) {
   CodeOffset l = call(reg);
   append(desc, l);
+  return l;
 }
 
-void MacroAssembler::call(const wasm::CallSiteDesc& desc, uint32_t funcIndex) {
+CodeOffset MacroAssembler::call(const wasm::CallSiteDesc& desc,
+                                uint32_t funcIndex) {
   CodeOffset l = callWithPatch();
   append(desc, l, funcIndex);
+  return l;
 }
 
 void MacroAssembler::call(const wasm::CallSiteDesc& desc, wasm::Trap trap) {
   CodeOffset l = callWithPatch();
   append(desc, l, trap);
 }
 
-void MacroAssembler::call(const wasm::CallSiteDesc& desc,
-                          wasm::SymbolicAddress imm) {
+CodeOffset MacroAssembler::call(const wasm::CallSiteDesc& desc,
+                                wasm::SymbolicAddress imm) {
   MOZ_ASSERT(wasm::NeedsBuiltinThunk(imm),
              "only for functions which may appear in profiler");
-  call(imm);
-  append(desc, CodeOffset(currentOffset()));
+  CodeOffset raOffset = call(imm);
+  append(desc, raOffset);
+  return raOffset;
 }
 
 // ===============================================================
 // ABI function calls.
 
 void MacroAssembler::passABIArg(Register reg) {
   passABIArg(MoveOperand(reg), MoveOp::GENERAL);
 }
--- a/js/src/jit/MacroAssembler.cpp
+++ b/js/src/jit/MacroAssembler.cpp
@@ -3147,37 +3147,41 @@ void MacroAssembler::callWithABINoProfil
     branch32(Assembler::Equal, flagAddr, Imm32(0), &ok);
     assumeUnreachable("callWithABI: callee did not use AutoUnsafeCallWithABI");
     bind(&ok);
     pop(ReturnReg);
   }
 #endif
 }
 
-void MacroAssembler::callWithABI(wasm::BytecodeOffset bytecode,
-                                 wasm::SymbolicAddress imm,
-                                 MoveOp::Type result) {
+CodeOffset MacroAssembler::callWithABI(wasm::BytecodeOffset bytecode,
+                                       wasm::SymbolicAddress imm,
+                                       MoveOp::Type result) {
   MOZ_ASSERT(wasm::NeedsBuiltinThunk(imm));
 
   // We clobber WasmTlsReg below in the loadWasmTlsRegFromFrame(), but Ion
   // assumes it is non-volatile, so preserve it manually.
   Push(WasmTlsReg);
 
   uint32_t stackAdjust;
   callWithABIPre(&stackAdjust, /* callFromWasm = */ true);
 
   // The TLS register is used in builtin thunks and must be set, by ABI:
   // reload it after passing arguments, which might have used it at spill
   // points when placing arguments.
   loadWasmTlsRegFromFrame();
 
-  call(wasm::CallSiteDesc(bytecode.offset(), wasm::CallSite::Symbolic), imm);
+  CodeOffset raOffset = call(
+      wasm::CallSiteDesc(bytecode.offset(), wasm::CallSite::Symbolic), imm);
+
   callWithABIPost(stackAdjust, result, /* callFromWasm = */ true);
 
   Pop(WasmTlsReg);
+
+  return raOffset;
 }
 
 // ===============================================================
 // Exit frame footer.
 
 void MacroAssembler::linkExitFrame(Register cxreg, Register scratch) {
   loadPtr(Address(cxreg, JSContext::offsetOfActivation()), scratch);
   storeStackPtr(Address(scratch, JitActivation::offsetOfPackedExitFP()));
@@ -3389,18 +3393,18 @@ void MacroAssembler::wasmReserveStackChe
     branchStackPtrRhs(Assembler::Below,
                       Address(WasmTlsReg, offsetof(wasm::TlsData, stackLimit)),
                       &ok);
     wasmTrap(wasm::Trap::StackOverflow, trapOffset);
     bind(&ok);
   }
 }
 
-void MacroAssembler::wasmCallImport(const wasm::CallSiteDesc& desc,
-                                    const wasm::CalleeDesc& callee) {
+CodeOffset MacroAssembler::wasmCallImport(const wasm::CallSiteDesc& desc,
+                                          const wasm::CalleeDesc& callee) {
   // Load the callee, before the caller's registers are clobbered.
   uint32_t globalDataOffset = callee.importGlobalDataOffset();
   loadWasmGlobalPtr(globalDataOffset + offsetof(wasm::FuncImportTls, code),
                     ABINonArgReg0);
 
 #ifndef JS_CODEGEN_NONE
   static_assert(ABINonArgReg0 != WasmTlsReg, "by constraint");
 #endif
@@ -3411,20 +3415,20 @@ void MacroAssembler::wasmCallImport(cons
   loadPtr(Address(WasmTlsReg, offsetof(wasm::TlsData, cx)), ABINonArgReg2);
   storePtr(ABINonArgReg1, Address(ABINonArgReg2, JSContext::offsetOfRealm()));
 
   // Switch to the callee's TLS and pinned registers and make the call.
   loadWasmGlobalPtr(globalDataOffset + offsetof(wasm::FuncImportTls, tls),
                     WasmTlsReg);
   loadWasmPinnedRegsFromTls();
 
-  call(desc, ABINonArgReg0);
+  return call(desc, ABINonArgReg0);
 }
 
-void MacroAssembler::wasmCallBuiltinInstanceMethod(
+CodeOffset MacroAssembler::wasmCallBuiltinInstanceMethod(
     const wasm::CallSiteDesc& desc, const ABIArg& instanceArg,
     wasm::SymbolicAddress builtin) {
   MOZ_ASSERT(instanceArg != ABIArg());
 
   if (instanceArg.kind() == ABIArg::GPR) {
     loadPtr(Address(WasmTlsReg, offsetof(wasm::TlsData, instance)),
             instanceArg.gpr());
   } else if (instanceArg.kind() == ABIArg::Stack) {
@@ -3432,22 +3436,22 @@ void MacroAssembler::wasmCallBuiltinInst
     Register scratch = ABINonArgReg0;
     loadPtr(Address(WasmTlsReg, offsetof(wasm::TlsData, instance)), scratch);
     storePtr(scratch,
              Address(getStackPointer(), instanceArg.offsetFromArgBase()));
   } else {
     MOZ_CRASH("Unknown abi passing style for pointer");
   }
 
-  call(desc, builtin);
+  return call(desc, builtin);
 }
 
-void MacroAssembler::wasmCallIndirect(const wasm::CallSiteDesc& desc,
-                                      const wasm::CalleeDesc& callee,
-                                      bool needsBoundsCheck) {
+CodeOffset MacroAssembler::wasmCallIndirect(const wasm::CallSiteDesc& desc,
+                                            const wasm::CalleeDesc& callee,
+                                            bool needsBoundsCheck) {
   Register scratch = WasmTableCallScratchReg0;
   Register index = WasmTableCallIndexReg;
 
   // Optimization opportunity: when offsetof(FunctionTableElem, code) == 0, as
   // it is at present, we can probably generate better code here by folding
   // the address computation into the load.
 
   static_assert(sizeof(wasm::FunctionTableElem) == 8 ||
@@ -3460,18 +3464,17 @@ void MacroAssembler::wasmCallIndirect(co
     loadWasmGlobalPtr(callee.tableFunctionBaseGlobalDataOffset(), scratch);
     if (sizeof(wasm::FunctionTableElem) == 8) {
       computeEffectiveAddress(BaseIndex(scratch, index, TimesEight), scratch);
     } else {
       lshift32(Imm32(4), index);
       addPtr(index, scratch);
     }
     loadPtr(Address(scratch, offsetof(wasm::FunctionTableElem, code)), scratch);
-    call(desc, scratch);
-    return;
+    return call(desc, scratch);
   }
 
   MOZ_ASSERT(callee.which() == wasm::CalleeDesc::WasmTable);
 
   // Write the functype-id into the ABI functype-id register.
   wasm::FuncTypeIdDesc funcTypeId = callee.wasmTableSigId();
   switch (funcTypeId.kind()) {
     case wasm::FuncTypeIdDescKind::Global:
@@ -3514,17 +3517,17 @@ void MacroAssembler::wasmCallIndirect(co
   wasmTrap(wasm::Trap::IndirectCallToNull, trapOffset);
   bind(&nonNull);
 
   loadWasmPinnedRegsFromTls();
   switchToWasmTlsRealm(index, WasmTableCallScratchReg1);
 
   loadPtr(Address(scratch, offsetof(wasm::FunctionTableElem, code)), scratch);
 
-  call(desc, scratch);
+  return call(desc, scratch);
 }
 
 void MacroAssembler::emitPreBarrierFastPath(JSRuntime* rt, MIRType type,
                                             Register temp1, Register temp2,
                                             Register temp3, Label* noBarrier) {
   MOZ_ASSERT(temp1 != PreBarrierReg);
   MOZ_ASSERT(temp2 != PreBarrierReg);
   MOZ_ASSERT(temp3 != PreBarrierReg);
--- a/js/src/jit/MacroAssembler.h
+++ b/js/src/jit/MacroAssembler.h
@@ -416,32 +416,36 @@ class MacroAssembler : public MacroAssem
   // Manipulated by the AutoGenericRegisterScope class.
   AllocatableRegisterSet debugTrackedRegisters_;
 #endif  // DEBUG
 
  public:
   // ===============================================================
   // Simple call functions.
 
+  // The returned CodeOffset is the assembler offset for the instruction
+  // immediately following the call; that is, for the return point.
   CodeOffset call(Register reg) PER_SHARED_ARCH;
   CodeOffset call(Label* label) PER_SHARED_ARCH;
+
   void call(const Address& addr) PER_SHARED_ARCH;
   void call(ImmWord imm) PER_SHARED_ARCH;
   // Call a target native function, which is neither traceable nor movable.
   void call(ImmPtr imm) PER_SHARED_ARCH;
-  void call(wasm::SymbolicAddress imm) PER_SHARED_ARCH;
-  inline void call(const wasm::CallSiteDesc& desc, wasm::SymbolicAddress imm);
+  CodeOffset call(wasm::SymbolicAddress imm) PER_SHARED_ARCH;
+  inline CodeOffset call(const wasm::CallSiteDesc& desc,
+                         wasm::SymbolicAddress imm);
 
   // Call a target JitCode, which must be traceable, and may be movable.
   void call(JitCode* c) PER_SHARED_ARCH;
 
   inline void call(TrampolinePtr code);
 
-  inline void call(const wasm::CallSiteDesc& desc, const Register reg);
-  inline void call(const wasm::CallSiteDesc& desc, uint32_t funcDefIndex);
+  inline CodeOffset call(const wasm::CallSiteDesc& desc, const Register reg);
+  inline CodeOffset call(const wasm::CallSiteDesc& desc, uint32_t funcDefIndex);
   inline void call(const wasm::CallSiteDesc& desc, wasm::Trap trap);
 
   CodeOffset callWithPatch() PER_SHARED_ARCH;
   void patchCall(uint32_t callerOffset, uint32_t calleeOffset) PER_SHARED_ARCH;
 
   // Push the return address and make a call. On platforms where this function
   // is not defined, push the link register (pushReturnAddress) at the entry
   // point of the callee.
@@ -577,18 +581,18 @@ class MacroAssembler : public MacroAssem
 
   inline void callWithABI(
       void* fun, MoveOp::Type result = MoveOp::GENERAL,
       CheckUnsafeCallWithABI check = CheckUnsafeCallWithABI::Check);
   inline void callWithABI(Register fun, MoveOp::Type result = MoveOp::GENERAL);
   inline void callWithABI(const Address& fun,
                           MoveOp::Type result = MoveOp::GENERAL);
 
-  void callWithABI(wasm::BytecodeOffset offset, wasm::SymbolicAddress fun,
-                   MoveOp::Type result = MoveOp::GENERAL);
+  CodeOffset callWithABI(wasm::BytecodeOffset offset, wasm::SymbolicAddress fun,
+                         MoveOp::Type result = MoveOp::GENERAL);
 
  private:
   // Reinitialize the variables which have to be cleared before making a call
   // with callWithABI.
   void setupABICall();
 
   // Reserve the stack and resolve the arguments move.
   void callWithABIPre(uint32_t* stackAdjust,
@@ -1858,29 +1862,30 @@ class MacroAssembler : public MacroAssem
       DEFINED_ON(arm64, x86, x64, mips64);
   void oolWasmTruncateCheckF32ToI64(FloatRegister input, Register64 output,
                                     TruncFlags flags, wasm::BytecodeOffset off,
                                     Label* rejoin)
       DEFINED_ON(arm, arm64, x86_shared, mips_shared);
 
   // This function takes care of loading the callee's TLS and pinned regs but
   // it is the caller's responsibility to save/restore TLS or pinned regs.
-  void wasmCallImport(const wasm::CallSiteDesc& desc,
-                      const wasm::CalleeDesc& callee);
+  CodeOffset wasmCallImport(const wasm::CallSiteDesc& desc,
+                            const wasm::CalleeDesc& callee);
 
   // WasmTableCallIndexReg must contain the index of the indirect call.
-  void wasmCallIndirect(const wasm::CallSiteDesc& desc,
-                        const wasm::CalleeDesc& callee, bool needsBoundsCheck);
+  CodeOffset wasmCallIndirect(const wasm::CallSiteDesc& desc,
+                              const wasm::CalleeDesc& callee,
+                              bool needsBoundsCheck);
 
   // This function takes care of loading the pointer to the current instance
   // as the implicit first argument. It preserves TLS and pinned registers.
   // (TLS & pinned regs are non-volatile registers in the system ABI).
-  void wasmCallBuiltinInstanceMethod(const wasm::CallSiteDesc& desc,
-                                     const ABIArg& instanceArg,
-                                     wasm::SymbolicAddress builtin);
+  CodeOffset wasmCallBuiltinInstanceMethod(const wasm::CallSiteDesc& desc,
+                                           const ABIArg& instanceArg,
+                                           wasm::SymbolicAddress builtin);
 
   // As enterFakeExitFrame(), but using register conventions appropriate for
   // wasm stubs.
   void enterFakeExitFrameForWasm(Register cxreg, Register scratch,
                                  ExitFrameType type) PER_SHARED_ARCH;
 
  public:
   // ========================================================================
--- a/js/src/jit/Registers.h
+++ b/js/src/jit/Registers.h
@@ -258,16 +258,19 @@ class MachineState {
 
   bool has(Register reg) const { return regs_[reg.code()] != nullptr; }
   bool has(FloatRegister reg) const { return fpregs_[reg.code()] != nullptr; }
   uintptr_t read(Register reg) const { return regs_[reg.code()]->r; }
   double read(FloatRegister reg) const { return fpregs_[reg.code()]->d; }
   void write(Register reg, uintptr_t value) const {
     regs_[reg.code()]->r = value;
   }
+  const Registers::RegisterContent* address(Register reg) const {
+    return regs_[reg.code()];
+  }
   const FloatRegisters::RegisterContent* address(FloatRegister reg) const {
     return fpregs_[reg.code()];
   }
 };
 
 class MacroAssembler;
 
 // Declares a register as owned within the scope of the object.
--- a/js/src/jit/arm/MacroAssembler-arm.cpp
+++ b/js/src/jit/arm/MacroAssembler-arm.cpp
@@ -4172,19 +4172,19 @@ CodeOffset MacroAssembler::call(Label* l
 void MacroAssembler::call(ImmWord imm) { call(ImmPtr((void*)imm.value)); }
 
 void MacroAssembler::call(ImmPtr imm) {
   BufferOffset bo = m_buffer.nextOffset();
   addPendingJump(bo, imm, RelocationKind::HARDCODED);
   ma_call(imm);
 }
 
-void MacroAssembler::call(wasm::SymbolicAddress imm) {
+CodeOffset MacroAssembler::call(wasm::SymbolicAddress imm) {
   movePtr(imm, CallReg);
-  call(CallReg);
+  return call(CallReg);
 }
 
 void MacroAssembler::call(const Address& addr) {
   loadPtr(addr, CallReg);
   call(CallReg);
 }
 
 void MacroAssembler::call(JitCode* c) {
--- a/js/src/jit/arm64/MacroAssembler-arm64.cpp
+++ b/js/src/jit/arm64/MacroAssembler-arm64.cpp
@@ -578,22 +578,22 @@ CodeOffset MacroAssembler::call(Label* l
 void MacroAssembler::call(ImmWord imm) { call(ImmPtr((void*)imm.value)); }
 
 void MacroAssembler::call(ImmPtr imm) {
   syncStackPtr();
   movePtr(imm, ip0);
   Blr(vixl::ip0);
 }
 
-void MacroAssembler::call(wasm::SymbolicAddress imm) {
+CodeOffset MacroAssembler::call(wasm::SymbolicAddress imm) {
   vixl::UseScratchRegisterScope temps(this);
   const Register scratch = temps.AcquireX().asUnsized();
   syncStackPtr();
   movePtr(imm, scratch);
-  call(scratch);
+  return call(scratch);
 }
 
 void MacroAssembler::call(const Address& addr) {
   vixl::UseScratchRegisterScope temps(this);
   const Register scratch = temps.AcquireX().asUnsized();
   syncStackPtr();
   loadPtr(addr, scratch);
   call(scratch);
--- a/js/src/jit/mips-shared/MacroAssembler-mips-shared.cpp
+++ b/js/src/jit/mips-shared/MacroAssembler-mips-shared.cpp
@@ -1452,19 +1452,19 @@ CodeOffset MacroAssembler::farJumpWithPa
 
 void MacroAssembler::patchFarJump(CodeOffset farJump, uint32_t targetOffset) {
   uint32_t* u32 =
       reinterpret_cast<uint32_t*>(editSrc(BufferOffset(farJump.offset())));
   MOZ_ASSERT(*u32 == UINT32_MAX);
   *u32 = targetOffset - farJump.offset();
 }
 
-void MacroAssembler::call(wasm::SymbolicAddress target) {
+CodeOffset MacroAssembler::call(wasm::SymbolicAddress target) {
   movePtr(target, CallReg);
-  call(CallReg);
+  return call(CallReg);
 }
 
 void MacroAssembler::call(const Address& addr) {
   loadPtr(addr, CallReg);
   call(CallReg);
 }
 
 void MacroAssembler::call(ImmWord target) { call(ImmPtr((void*)target.value)); }
--- a/js/src/jit/x86-shared/MacroAssembler-x86-shared.cpp
+++ b/js/src/jit/x86-shared/MacroAssembler-x86-shared.cpp
@@ -594,19 +594,19 @@ void MacroAssembler::PopStackPtr() { Pop
 CodeOffset MacroAssembler::call(Register reg) { return Assembler::call(reg); }
 
 CodeOffset MacroAssembler::call(Label* label) { return Assembler::call(label); }
 
 void MacroAssembler::call(const Address& addr) {
   Assembler::call(Operand(addr.base, addr.offset));
 }
 
-void MacroAssembler::call(wasm::SymbolicAddress target) {
+CodeOffset MacroAssembler::call(wasm::SymbolicAddress target) {
   mov(target, eax);
-  Assembler::call(eax);
+  return Assembler::call(eax);
 }
 
 void MacroAssembler::call(ImmWord target) { Assembler::call(target); }
 
 void MacroAssembler::call(ImmPtr target) { Assembler::call(target); }
 
 void MacroAssembler::call(JitCode* target) { Assembler::call(target); }
 
--- a/js/src/vm/Stack.cpp
+++ b/js/src/vm/Stack.cpp
@@ -535,16 +535,23 @@ JS::Realm* JitFrameIter::realm() const {
 
   if (isWasm()) {
     return asWasm().instance()->realm();
   }
 
   return asJSJit().script()->realm();
 }
 
+uint8_t* JitFrameIter::returnAddressToFp() const {
+  if (isWasm()) {
+    return asWasm().returnAddressToFp();
+  }
+  return asJSJit().returnAddressToFp();
+}
+
 bool JitFrameIter::done() const {
   if (!isSome()) {
     return true;
   }
   if (isJSJit()) {
     return asJSJit().done();
   }
   if (isWasm()) {
--- a/js/src/vm/Stack.h
+++ b/js/src/vm/Stack.h
@@ -1878,16 +1878,20 @@ class JitFrameIter {
 
   // Operations common to all frame iterators.
   const jit::JitActivation* activation() const { return act_; }
   bool done() const;
   void operator++();
 
   JS::Realm* realm() const;
 
+  // Returns the return address of the frame above this one (that is, the
+  // return address that returns back to the current frame).
+  uint8_t* returnAddressToFp() const;
+
   // Operations which have an effect only on JIT frames.
   void skipNonScriptedJSFrames();
 
   // Returns true iff this is a JIT frame with a self-hosted script. Note: be
   // careful, JitFrameIter does not consider functions inlined by Ion.
   bool isSelfHostedIgnoringInlining() const;
 };
 
--- a/js/src/wasm/WasmBaselineCompile.cpp
+++ b/js/src/wasm/WasmBaselineCompile.cpp
@@ -137,16 +137,17 @@
 #include "jit/mips-shared/Assembler-mips-shared.h"
 #include "jit/mips64/Assembler-mips64.h"
 #endif
 
 #include "wasm/WasmGenerator.h"
 #include "wasm/WasmInstance.h"
 #include "wasm/WasmOpIter.h"
 #include "wasm/WasmSignalHandlers.h"
+#include "wasm/WasmStubs.h"
 #include "wasm/WasmValidate.h"
 
 #include "jit/MacroAssembler-inl.h"
 
 using mozilla::DebugOnly;
 using mozilla::FloorLog2;
 using mozilla::IsPowerOfTwo;
 using mozilla::Maybe;
@@ -1577,20 +1578,20 @@ class BaseStackFrame final : public Base
   void storeLocalF64(RegF64 src, const Local& dest) {
     masm.storeDouble(src, Address(sp_, localOffset(dest)));
   }
 
   void storeLocalF32(RegF32 src, const Local& dest) {
     masm.storeFloat32(src, Address(sp_, localOffset(dest)));
   }
 
- private:
   // Offset off of sp_ for `local`.
   int32_t localOffset(const Local& local) { return localOffset(local.offs); }
 
+ private:
   // Offset off of sp_ for a local with offset `offset` from Frame.
   int32_t localOffset(int32_t offset) { return masm.framePushed() - offset; }
 
  public:
   ///////////////////////////////////////////////////////////////////////////
   //
   // Dynamic area
 
@@ -1822,16 +1823,562 @@ void BaseStackFrame::zeroLocals(BaseRegA
     masm.storePtr(zero, Address(p, -(wordSize * i)));
   }
 
   ra->freeI32(p);
   ra->freeI32(lim);
   ra->freeI32(zero);
 }
 
+// Value stack: stack elements
+
+struct Stk {
+ private:
+  Stk() : kind_(Unknown), i64val_(0) {}
+
+ public:
+  enum Kind {
+    // The Mem opcodes are all clustered at the beginning to
+    // allow for a quick test within sync().
+    MemI32,  // 32-bit integer stack value ("offs")
+    MemI64,  // 64-bit integer stack value ("offs")
+    MemF32,  // 32-bit floating stack value ("offs")
+    MemF64,  // 64-bit floating stack value ("offs")
+    MemRef,  // reftype (pointer wide) stack value ("offs")
+
+    // The Local opcodes follow the Mem opcodes for a similar
+    // quick test within hasLocal().
+    LocalI32,  // Local int32 var ("slot")
+    LocalI64,  // Local int64 var ("slot")
+    LocalF32,  // Local float32 var ("slot")
+    LocalF64,  // Local double var ("slot")
+    LocalRef,  // Local reftype (pointer wide) var ("slot")
+
+    RegisterI32,  // 32-bit integer register ("i32reg")
+    RegisterI64,  // 64-bit integer register ("i64reg")
+    RegisterF32,  // 32-bit floating register ("f32reg")
+    RegisterF64,  // 64-bit floating register ("f64reg")
+    RegisterRef,  // reftype (pointer wide) register ("refReg")
+
+    ConstI32,  // 32-bit integer constant ("i32val")
+    ConstI64,  // 64-bit integer constant ("i64val")
+    ConstF32,  // 32-bit floating constant ("f32val")
+    ConstF64,  // 64-bit floating constant ("f64val")
+    ConstRef,  // reftype (pointer wide) constant ("refval")
+
+    Unknown,
+  };
+
+  Kind kind_;
+
+  static const Kind MemLast = MemRef;
+  static const Kind LocalLast = LocalRef;
+
+  union {
+    RegI32 i32reg_;
+    RegI64 i64reg_;
+    RegPtr refReg_;
+    RegF32 f32reg_;
+    RegF64 f64reg_;
+    int32_t i32val_;
+    int64_t i64val_;
+    intptr_t refval_;
+    float f32val_;
+    double f64val_;
+    uint32_t slot_;
+    uint32_t offs_;
+  };
+
+  explicit Stk(RegI32 r) : kind_(RegisterI32), i32reg_(r) {}
+  explicit Stk(RegI64 r) : kind_(RegisterI64), i64reg_(r) {}
+  explicit Stk(RegPtr r) : kind_(RegisterRef), refReg_(r) {}
+  explicit Stk(RegF32 r) : kind_(RegisterF32), f32reg_(r) {}
+  explicit Stk(RegF64 r) : kind_(RegisterF64), f64reg_(r) {}
+  explicit Stk(int32_t v) : kind_(ConstI32), i32val_(v) {}
+  explicit Stk(int64_t v) : kind_(ConstI64), i64val_(v) {}
+  explicit Stk(float v) : kind_(ConstF32), f32val_(v) {}
+  explicit Stk(double v) : kind_(ConstF64), f64val_(v) {}
+  explicit Stk(Kind k, uint32_t v) : kind_(k), slot_(v) {
+    MOZ_ASSERT(k > MemLast && k <= LocalLast);
+  }
+  static Stk StkRef(intptr_t v) {
+    Stk s;
+    s.kind_ = ConstRef;
+    s.refval_ = v;
+    return s;
+  }
+
+  void setOffs(Kind k, uint32_t v) {
+    MOZ_ASSERT(k <= MemLast);
+    kind_ = k;
+    offs_ = v;
+  }
+
+  Kind kind() const { return kind_; }
+  bool isMem() const { return kind_ <= MemLast; }
+
+  RegI32 i32reg() const {
+    MOZ_ASSERT(kind_ == RegisterI32);
+    return i32reg_;
+  }
+  RegI64 i64reg() const {
+    MOZ_ASSERT(kind_ == RegisterI64);
+    return i64reg_;
+  }
+  RegPtr refReg() const {
+    MOZ_ASSERT(kind_ == RegisterRef);
+    return refReg_;
+  }
+  RegF32 f32reg() const {
+    MOZ_ASSERT(kind_ == RegisterF32);
+    return f32reg_;
+  }
+  RegF64 f64reg() const {
+    MOZ_ASSERT(kind_ == RegisterF64);
+    return f64reg_;
+  }
+
+  int32_t i32val() const {
+    MOZ_ASSERT(kind_ == ConstI32);
+    return i32val_;
+  }
+  int64_t i64val() const {
+    MOZ_ASSERT(kind_ == ConstI64);
+    return i64val_;
+  }
+  intptr_t refval() const {
+    MOZ_ASSERT(kind_ == ConstRef);
+    return refval_;
+  }
+
+  // For these two, use an out-param instead of simply returning, to
+  // use the normal stack and not the x87 FP stack (which has effect on
+  // NaNs with the signaling bit set).
+
+  void f32val(float* out) const {
+    MOZ_ASSERT(kind_ == ConstF32);
+    *out = f32val_;
+  }
+  void f64val(double* out) const {
+    MOZ_ASSERT(kind_ == ConstF64);
+    *out = f64val_;
+  }
+
+  uint32_t slot() const {
+    MOZ_ASSERT(kind_ > MemLast && kind_ <= LocalLast);
+    return slot_;
+  }
+  uint32_t offs() const {
+    MOZ_ASSERT(isMem());
+    return offs_;
+  }
+};
+
+typedef Vector<Stk, 8, SystemAllocPolicy> StkVector;
+
+// MachineStackTracker, used for stack-slot pointerness tracking.
+
+class MachineStackTracker {
+  // Simulates the machine's stack, with one bool per word.  Index zero in
+  // this vector corresponds to the highest address in the machine stack.  The
+  // last entry corresponds to what SP currently points at.  This all assumes
+  // a grow-down stack.
+  //
+  // numPtrs_ contains the number of "true" values in vec_, and is therefore
+  // redundant.  But it serves as a constant-time way to detect the common
+  // case where vec_ holds no "true" values.
+  size_t numPtrs_;
+  Vector<bool, 64, SystemAllocPolicy> vec_;
+
+ public:
+  MachineStackTracker() : numPtrs_(0) {}
+
+  ~MachineStackTracker() {
+#ifdef DEBUG
+    size_t n = 0;
+    for (bool b : vec_) {
+      n += (b ? 1 : 0);
+    }
+    MOZ_ASSERT(n == numPtrs_);
+#endif
+  }
+
+  // Clone this MachineStackTracker, writing the result at |dst|.
+  MOZ_MUST_USE bool cloneTo(MachineStackTracker* dst) {
+    MOZ_ASSERT(dst->vec_.empty());
+    if (!dst->vec_.appendAll(vec_)) {
+      return false;
+    }
+    dst->numPtrs_ = numPtrs_;
+    return true;
+  }
+
+  // Notionally push |n| non-pointers on the stack.
+  MOZ_MUST_USE bool pushNonGCPointers(size_t n) {
+    return vec_.appendN(false, n);
+  }
+
+  // Mark the stack slot |offsetFromSP| up from the bottom as holding a
+  // pointer.
+  void setGCPointer(size_t offsetFromSP) {
+    // Offset 0 is the most recently pushed, offset 1 is the second most
+    // recently pushed item, etc.
+    MOZ_ASSERT(offsetFromSP < vec_.length());
+
+    size_t offsetFromTop = vec_.length() - 1 - offsetFromSP;
+    numPtrs_ = numPtrs_ + 1 - (vec_[offsetFromTop] ? 1 : 0);
+    vec_[offsetFromTop] = true;
+  }
+
+  // Query the pointerness of the slot |offsetFromSP| up from the bottom.
+  bool isGCPointer(size_t offsetFromSP) {
+    MOZ_ASSERT(offsetFromSP < vec_.length());
+    return vec_[offsetFromSP];
+  }
+
+  // Return the number of words tracked by this MachineStackTracker.
+  size_t length() { return vec_.length(); }
+
+  // Return the number of pointer-typed words tracked by this
+  // MachineStackTracker.
+  size_t numPtrs() {
+    MOZ_ASSERT(numPtrs_ <= length());
+    return numPtrs_;
+  }
+};
+
+// StackMapGenerator, which carries all state needed to create stack maps.
+
+enum class HasRefTypedDebugFrame { No, Yes };
+
+struct StackMapGenerator {
+ private:
+  // --- These are constant for the life of the function's compilation ---
+
+  // For generating stack maps, we'll need to know the offsets of registers
+  // as saved by the trap exit stub.
+  const MachineState& trapExitLayout_;
+  const size_t trapExitLayoutNumWords_;
+
+  // Completed stackmaps are added here
+  StackMaps* stackMaps_;
+
+  // So as to be able to get current offset when creating stack maps
+  const MacroAssembler& masm_;
+
+ public:
+  // --- These are constant once we've completed beginFunction() ---
+
+  // The number of words of arguments passed to this function in memory.
+  size_t numStackArgWords_;
+
+  MachineStackTracker mst_;  // tracks machine stack pointerness
+
+  // This holds masm.framePushed at entry to the function's body.  It is a
+  // Maybe because createStackMap needs to know whether or not we're still
+  // in the prologue.  It makes a Nothing-to-Some transition just once per
+  // function.
+  Maybe<uint32_t> framePushedAtEntryToBody_;
+
+  // --- These can change at any point ---
+
+  // This holds masm.framePushed immediately before we move the stack
+  // pointer down so as to reserve space, in a function call, for arguments
+  // passed in memory.  To be more precise: this holds the value
+  // masm.framePushed would have had after moving the stack pointer over any
+  // alignment padding pushed before the arguments proper, but before the
+  // downward movement of the stack pointer that allocates space for the
+  // arguments proper.
+  //
+  // When not inside a function call setup/teardown sequence, it is Nothing.
+  // It can make Nothing-to/from-Some transitions arbitrarily as we progress
+  // through the function body.
+  Maybe<uint32_t> framePushedBeforePushingCallArgs_;
+
+  // The number of memory-resident, ref-typed entries on the containing
+  // BaseCompiler::stk_.
+  size_t memRefsOnStk_;
+
+  StackMapGenerator(StackMaps* stackMaps, const MachineState& trapExitLayout,
+                    const size_t trapExitLayoutNumWords,
+                    const MacroAssembler& masm)
+      : trapExitLayout_(trapExitLayout),
+        trapExitLayoutNumWords_(trapExitLayoutNumWords),
+        stackMaps_(stackMaps),
+        masm_(masm),
+        memRefsOnStk_(0) {}
+
+  // At the beginning of a function, we may have live roots in registers (as
+  // arguments) at the point where we perform a stack overflow check.  This
+  // method generates the "extra" stackmap entries to describe that, in the
+  // case that the check fails and we wind up calling into the wasm exit
+  // stub, as generated by GenerateTrapExit().
+  //
+  // The resulting map must correspond precisely with the stack layout
+  // created for the integer registers as saved by (code generated by)
+  // GenerateTrapExit().  To do that we use trapExitLayout_ and
+  // trapExitLayoutNumWords_, which together comprise a description of the
+  // layout and are created by GenerateTrapExitMachineState().
+  MOZ_MUST_USE bool generateStackmapEntriesForTrapExit(
+      const ValTypeVector& args, ExitStubMapVector& extras) {
+    MOZ_ASSERT(extras.empty());
+
+    // If this doesn't hold, we can't distinguish saved and not-saved
+    // registers in the MachineState.  See MachineState::MachineState().
+    MOZ_ASSERT(trapExitLayoutNumWords_ < 0x100);
+
+    if (!extras.appendN(false, trapExitLayoutNumWords_)) {
+      return false;
+    }
+
+    for (ABIArgIter<const ValTypeVector> i(args); !i.done(); i++) {
+      if (!i->argInRegister() || i.mirType() != MIRType::Pointer) {
+        continue;
+      }
+
+      size_t offsetFromTop =
+          reinterpret_cast<size_t>(trapExitLayout_.address(i->gpr()));
+
+      // If this doesn't hold, the associated register wasn't saved by
+      // the trap exit stub.  Better to crash now than much later, in
+      // some obscure place, and possibly with security consequences.
+      MOZ_RELEASE_ASSERT(offsetFromTop < trapExitLayoutNumWords_);
+
+      // offsetFromTop is an offset in words down from the highest
+      // address in the exit stub save area.  Switch it around to be an
+      // offset up from the bottom of the (integer register) save area.
+      size_t offsetFromBottom = trapExitLayoutNumWords_ - 1 - offsetFromTop;
+
+      extras[offsetFromBottom] = true;
+    }
+
+    return true;
+  }
+
+  // Creates a stackmap associated with the instruction denoted by
+  // |assemblerOffset|, incorporating pointers from the current operand
+  // stack |stk|, incorporating possible extra pointers in |extra| at the
+  // lower addressed end, and possibly with the associated frame having a
+  // ref-typed DebugFrame as indicated by |refDebugFrame|.
+  MOZ_MUST_USE bool createStackMap(const char* who,
+                                   const ExitStubMapVector& extras,
+                                   uint32_t assemblerOffset,
+                                   HasRefTypedDebugFrame refDebugFrame,
+                                   const StkVector& stk) {
+    size_t countedPointers = mst_.numPtrs() + memRefsOnStk_;
+#ifndef DEBUG
+    // An important optimization.  If there are obviously no pointers, as
+    // we expect in the majority of cases, exit quickly.
+    if (countedPointers == 0 && extras.empty() &&
+        refDebugFrame == HasRefTypedDebugFrame::No) {
+      return true;
+    }
+#else
+    // In the debug case, create the stack map regardless, and cross-check
+    // the pointer-counting below.  We expect the final map to have
+    // |countedPointers| in total.  This doesn't include those in the
+    // DebugFrame, but they do not appear in the map's bitmap.  Note that
+    // |countedPointers| is debug-only from this point onwards.
+    for (bool b : extras) {
+      countedPointers += (b ? 1 : 0);
+    }
+#endif
+
+    // Start with the frame-setup map, and add operand-stack information
+    // to that.
+    MachineStackTracker augmentedMst;
+    if (!mst_.cloneTo(&augmentedMst)) {
+      return false;
+    }
+
+    // At this point, augmentedMst only contains entries covering the
+    // incoming argument area (if any) and for the area allocated by this
+    // function's prologue.  We now need to calculate how far the machine's
+    // stack pointer is below where it was at the start of the body.  But we
+    // must take care not to include any words pushed as arguments to an
+    // upcoming function call, since those words "belong" to the stackmap of
+    // the callee, not to the stackmap of this function.  Note however that
+    // any alignment padding pushed prior to pushing the args *does* belong to
+    // this function.  That padding is taken into account at the point where
+    // framePushedBeforePushingCallArgs_ is set.
+    Maybe<uint32_t> framePushedExcludingArgs;
+    if (framePushedAtEntryToBody_.isNothing()) {
+      // Still in the prologue.  framePushedExcludingArgs remains Nothing.
+      MOZ_ASSERT(framePushedBeforePushingCallArgs_.isNothing());
+    } else {
+      // In the body.
+      MOZ_ASSERT(masm_.framePushed() >= framePushedAtEntryToBody_.value());
+      if (framePushedBeforePushingCallArgs_.isSome()) {
+        // In the body, and we've potentially pushed some args onto the stack.
+        // We must ignore them when sizing the stackmap.
+        MOZ_ASSERT(masm_.framePushed() >=
+                   framePushedBeforePushingCallArgs_.value());
+        MOZ_ASSERT(framePushedBeforePushingCallArgs_.value() >=
+                   framePushedAtEntryToBody_.value());
+        framePushedExcludingArgs =
+            Some(framePushedBeforePushingCallArgs_.value());
+      } else {
+        // In the body, but not with call args on the stack.  The stackmap
+        // must be sized so as to extend all the way "down" to
+        // masm_.framePushed().
+        framePushedExcludingArgs = Some(masm_.framePushed());
+      }
+    }
+
+    if (framePushedExcludingArgs.isSome()) {
+      uint32_t bodyPushedBytes =
+          framePushedExcludingArgs.value() - framePushedAtEntryToBody_.value();
+      MOZ_ASSERT(0 == bodyPushedBytes % sizeof(void*));
+      if (!augmentedMst.pushNonGCPointers(bodyPushedBytes / sizeof(void*))) {
+        return false;
+      }
+    }
+
+    // Scan the operand stack, marking pointers in the just-added new
+    // section.
+    MOZ_ASSERT_IF(framePushedAtEntryToBody_.isNothing(), stk.empty());
+    MOZ_ASSERT_IF(framePushedExcludingArgs.isNothing(), stk.empty());
+
+    for (const Stk& v : stk) {
+#ifndef DEBUG
+      // We don't track roots in registers, per rationale below, so if this
+      // doesn't hold, something is seriously wrong, and we're likely to get a
+      // GC-related crash.
+      MOZ_RELEASE_ASSERT(v.kind() != Stk::RegisterRef);
+      if (v.kind() != Stk::MemRef) {
+        continue;
+      }
+#else
+      // Take the opportunity to check everything we reasonably can about
+      // operand stack elements.
+      switch (v.kind()) {
+        case Stk::MemI32:
+        case Stk::MemI64:
+        case Stk::MemF32:
+        case Stk::MemF64:
+        case Stk::ConstI32:
+        case Stk::ConstI64:
+        case Stk::ConstF32:
+        case Stk::ConstF64:
+          // All of these have uninteresting type.
+          continue;
+        case Stk::LocalI32:
+        case Stk::LocalI64:
+        case Stk::LocalF32:
+        case Stk::LocalF64:
+          // These also have uninteresting type.  Check that they live in the
+          // section of stack set up by beginFunction().  The unguarded use of
+          // |value()| here is safe due to the assertion above this loop.
+          MOZ_ASSERT(v.offs() <= framePushedAtEntryToBody_.value());
+          continue;
+        case Stk::RegisterI32:
+        case Stk::RegisterI64:
+        case Stk::RegisterF32:
+        case Stk::RegisterF64:
+          // These also have uninteresting type, but more to the point: all
+          // registers holding live values should have been flushed to the
+          // machine stack immediately prior to the instruction to which this
+          // stackmap pertains.  So these can't happen.
+          MOZ_CRASH("createStackMap: operand stack has Register-non-Ref");
+        case Stk::MemRef:
+          // This is the only case we care about.  We'll handle it after the
+          // switch.
+          break;
+        case Stk::LocalRef:
+          // We need the stackmap to mention this pointer, but it should
+          // already be in the mst_ section created by beginFunction().
+          MOZ_ASSERT(v.offs() <= framePushedAtEntryToBody_.value());
+          continue;
+        case Stk::ConstRef:
+          // This can currently only be a null pointer.
+          MOZ_ASSERT(v.refval() == 0);
+          continue;
+        case Stk::RegisterRef:
+          // This can't happen, per rationale above.
+          MOZ_CRASH("createStackMap: operand stack contains RegisterRef");
+        default:
+          MOZ_CRASH("createStackMap: unknown operand stack element");
+      }
+#endif
+      // v.offs() holds masm.framePushed() at the point immediately after the
+      // value was pushed on the stack.  Since it's still on the operand
+      // stack, and any outgoing call args were pushed after it,
+      // framePushedExcludingArgs.value() can't be less than v.offs().
+      MOZ_ASSERT(v.offs() <= framePushedExcludingArgs.value());
+      uint32_t offsFromMapLowest = framePushedExcludingArgs.value() - v.offs();
+      MOZ_ASSERT(0 == offsFromMapLowest % sizeof(void*));
+      augmentedMst.setGCPointer(offsFromMapLowest / sizeof(void*));
+    }
+
+    // Create the final StackMap.  The initial map is zeroed out, so there's
+    // no need to write zero bits in it.
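+    // Its layout is: |extras.length()| bits for the trap exit stub save area
+    // at the lower-addressed end, followed by the bits for the frame proper,
+    // including the incoming stack args.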
+    const uint32_t extraWords = extras.length();
+    const uint32_t augmentedMstWords = augmentedMst.length();
+    const uint32_t numMappedWords = extraWords + augmentedMstWords;
+    StackMap* stackMap = StackMap::create(numMappedWords);
+    if (!stackMap) {
+      return false;
+    }
+
+    {
+      // First the exit stub extra words, if any.
+      uint32_t i = 0;
+      for (bool b : extras) {
+        if (b) {
+          stackMap->setBit(i);
+        }
+        i++;
+      }
+    }
+    // Followed by the "main" part of the map.
+    for (uint32_t i = 0; i < augmentedMstWords; i++) {
+      if (augmentedMst.isGCPointer(i)) {
+        stackMap->setBit(numMappedWords - 1 - i);
+      }
+    }
+
+    stackMap->setExitStubWords(extraWords);
+
+    // Record in the map how far down from the highest address the Frame* is.
+    // Take the opportunity to check that we haven't marked any part of the
+    // Frame itself as a pointer.
+    stackMap->setFrameOffsetFromTop(numStackArgWords_ +
+                                    sizeof(Frame) / sizeof(void*));
+#ifdef DEBUG
+    for (uint32_t i = 0; i < sizeof(Frame) / sizeof(void*); i++) {
+      MOZ_ASSERT(stackMap->getBit(stackMap->numMappedWords -
+                                  stackMap->frameOffsetFromTop + i) == 0);
+    }
+#endif
+
+    // Note the presence of a ref-typed DebugFrame, if any.
+    if (refDebugFrame == HasRefTypedDebugFrame::Yes) {
+      stackMap->setHasRefTypedDebugFrame();
+    }
+
+    // Add the completed map to the running collection thereof.
+    if (!stackMaps_->add((uint8_t*)(uintptr_t)assemblerOffset, stackMap)) {
+      return false;
+    }
+
+#ifdef DEBUG
+    {
+      // Cross-check the map's pointer counting.
+      uint32_t nw = stackMap->numMappedWords;
+      uint32_t np = 0;
+      for (uint32_t i = 0; i < nw; i++) {
+        np += stackMap->getBit(i);
+      }
+      MOZ_ASSERT(size_t(np) == countedPointers);
+    }
+#endif
+
+    return true;
+  }
+};
+
 // The baseline compiler proper.
 
 class BaseCompiler final : public BaseCompilerInterface {
   using Local = BaseStackFrame::Local;
   using LabelVector = Vector<NonAssertingLabel, 8, SystemAllocPolicy>;
   using MIRTypeVector = Vector<MIRType, 8, SystemAllocPolicy>;
 
   // Bit set used for simple bounds check elimination.  Capping this at 64
@@ -1974,16 +2521,18 @@ class BaseCompiler final : public BaseCo
       latentDoubleCmp_;  // Comparison operator, if latentOp_ == Compare, float
                          // types
 
   FuncOffsets offsets_;
   MacroAssembler& masm;  // No '_' suffix - too tedious...
   BaseRegAlloc ra;       // Ditto
   BaseStackFrame fr;
 
+  StackMapGenerator smgen_;
+
   BaseStackFrame::LocalVector localInfo_;
   Vector<OutOfLineCode*, 8, SystemAllocPolicy> outOfLine_;
 
   // On specific platforms we sometimes need to use specific registers.
 
   SpecificRegs specific_;
 
   // The join registers are used to carry values out of blocks.
@@ -1995,19 +2544,20 @@ class BaseCompiler final : public BaseCo
   RegPtr joinRegPtr_;
   RegF32 joinRegF32_;
   RegF64 joinRegF64_;
 
   // There are more members scattered throughout.
 
  public:
   BaseCompiler(const ModuleEnvironment& env, const FuncCompileInput& input,
-               const ValTypeVector& locals, Decoder& decoder,
+               const ValTypeVector& locals, const MachineState& trapExitLayout,
+               size_t trapExitLayoutNumWords, Decoder& decoder,
                ExclusiveDeferredValidationState& dvs, TempAllocator* alloc,
-               MacroAssembler* masm);
+               MacroAssembler* masm, StackMaps* stackMaps);
 
   MOZ_MUST_USE bool init();
 
   FuncOffsets finish();
 
   MOZ_MUST_USE bool emitFunction();
   void emitInitStackLocals();
 
@@ -2296,163 +2846,35 @@ class BaseCompiler final : public BaseCo
   // and immediate-constant use.  It tracks constants, latent
   // references to locals, register contents, and values on the CPU
   // stack.
   //
   // The stack can be flushed to memory using sync().  This is handy
   // to avoid problems with control flow and messy register usage
   // patterns.
 
-  struct Stk {
-   private:
-    Stk() : kind_(Unknown), i64val_(0) {}
-
-   public:
-    enum Kind {
-      // The Mem opcodes are all clustered at the beginning to
-      // allow for a quick test within sync().
-      MemI32,  // 32-bit integer stack value ("offs")
-      MemI64,  // 64-bit integer stack value ("offs")
-      MemF32,  // 32-bit floating stack value ("offs")
-      MemF64,  // 64-bit floating stack value ("offs")
-      MemRef,  // reftype (pointer wide) stack value ("offs")
-
-      // The Local opcodes follow the Mem opcodes for a similar
-      // quick test within hasLocal().
-      LocalI32,  // Local int32 var ("slot")
-      LocalI64,  // Local int64 var ("slot")
-      LocalF32,  // Local float32 var ("slot")
-      LocalF64,  // Local double var ("slot")
-      LocalRef,  // Local reftype (pointer wide) var ("slot")
-
-      RegisterI32,  // 32-bit integer register ("i32reg")
-      RegisterI64,  // 64-bit integer register ("i64reg")
-      RegisterF32,  // 32-bit floating register ("f32reg")
-      RegisterF64,  // 64-bit floating register ("f64reg")
-      RegisterRef,  // reftype (pointer wide) register ("refReg")
-
-      ConstI32,  // 32-bit integer constant ("i32val")
-      ConstI64,  // 64-bit integer constant ("i64val")
-      ConstF32,  // 32-bit floating constant ("f32val")
-      ConstF64,  // 64-bit floating constant ("f64val")
-      ConstRef,  // reftype (pointer wide) constant ("refval")
-
-      Unknown,
-    };
-
-    Kind kind_;
-
-    static const Kind MemLast = MemRef;
-    static const Kind LocalLast = LocalRef;
-
-    union {
-      RegI32 i32reg_;
-      RegI64 i64reg_;
-      RegPtr refReg_;
-      RegF32 f32reg_;
-      RegF64 f64reg_;
-      int32_t i32val_;
-      int64_t i64val_;
-      intptr_t refval_;
-      float f32val_;
-      double f64val_;
-      uint32_t slot_;
-      uint32_t offs_;
-    };
-
-    explicit Stk(RegI32 r) : kind_(RegisterI32), i32reg_(r) {}
-    explicit Stk(RegI64 r) : kind_(RegisterI64), i64reg_(r) {}
-    explicit Stk(RegPtr r) : kind_(RegisterRef), refReg_(r) {}
-    explicit Stk(RegF32 r) : kind_(RegisterF32), f32reg_(r) {}
-    explicit Stk(RegF64 r) : kind_(RegisterF64), f64reg_(r) {}
-    explicit Stk(int32_t v) : kind_(ConstI32), i32val_(v) {}
-    explicit Stk(int64_t v) : kind_(ConstI64), i64val_(v) {}
-    explicit Stk(float v) : kind_(ConstF32), f32val_(v) {}
-    explicit Stk(double v) : kind_(ConstF64), f64val_(v) {}
-    explicit Stk(Kind k, uint32_t v) : kind_(k), slot_(v) {
-      MOZ_ASSERT(k > MemLast && k <= LocalLast);
-    }
-    static Stk StkRef(intptr_t v) {
-      Stk s;
-      s.kind_ = ConstRef;
-      s.refval_ = v;
-      return s;
-    }
-
-    void setOffs(Kind k, uint32_t v) {
-      MOZ_ASSERT(k <= MemLast);
-      kind_ = k;
-      offs_ = v;
-    }
-
-    Kind kind() const { return kind_; }
-    bool isMem() const { return kind_ <= MemLast; }
-
-    RegI32 i32reg() const {
-      MOZ_ASSERT(kind_ == RegisterI32);
-      return i32reg_;
-    }
-    RegI64 i64reg() const {
-      MOZ_ASSERT(kind_ == RegisterI64);
-      return i64reg_;
-    }
-    RegPtr refReg() const {
-      MOZ_ASSERT(kind_ == RegisterRef);
-      return refReg_;
-    }
-    RegF32 f32reg() const {
-      MOZ_ASSERT(kind_ == RegisterF32);
-      return f32reg_;
-    }
-    RegF64 f64reg() const {
-      MOZ_ASSERT(kind_ == RegisterF64);
-      return f64reg_;
-    }
-
-    int32_t i32val() const {
-      MOZ_ASSERT(kind_ == ConstI32);
-      return i32val_;
-    }
-    int64_t i64val() const {
-      MOZ_ASSERT(kind_ == ConstI64);
-      return i64val_;
-    }
-    intptr_t refval() const {
-      MOZ_ASSERT(kind_ == ConstRef);
-      return refval_;
-    }
-
-    // For these two, use an out-param instead of simply returning, to
-    // use the normal stack and not the x87 FP stack (which has effect on
-    // NaNs with the signaling bit set).
-
-    void f32val(float* out) const {
-      MOZ_ASSERT(kind_ == ConstF32);
-      *out = f32val_;
-    }
-    void f64val(double* out) const {
-      MOZ_ASSERT(kind_ == ConstF64);
-      *out = f64val_;
-    }
-
-    uint32_t slot() const {
-      MOZ_ASSERT(kind_ > MemLast && kind_ <= LocalLast);
-      return slot_;
-    }
-    uint32_t offs() const {
-      MOZ_ASSERT(isMem());
-      return offs_;
-    }
-  };
-
-  Vector<Stk, 8, SystemAllocPolicy> stk_;
-
-  template <typename... Args>
-  void push(Args&&... args) {
-    stk_.infallibleEmplaceBack(Stk(std::forward<Args>(args)...));
+  StkVector stk_;
+
+#ifdef DEBUG
+  size_t countMemRefsOnStk() {
+    size_t nRefs = 0;
+    for (Stk& v : stk_) {
+      if (v.kind() == Stk::MemRef) {
+        nRefs++;
+      }
+    }
+    return nRefs;
+  }
+#endif
+
+  template <typename T>
+  void push(T item) {
+    // None of the single-arg Stk constructors create a Stk::MemRef, so
+    // there's no need to increment smgen_.memRefsOnStk_ here.
+    stk_.infallibleEmplaceBack(Stk(item));
   }
 
   void pushConstRef(intptr_t v) { stk_.infallibleEmplaceBack(Stk::StkRef(v)); }
 
   void loadConstI32(const Stk& src, RegI32 dest) {
     moveImm32(src.i32val(), dest);
   }
 
@@ -2774,29 +3196,71 @@ class BaseCompiler final : public BaseCo
           v.setOffs(Stk::MemF32, offs);
           break;
         }
         case Stk::LocalRef: {
           ScratchPtr scratch(*this);
           loadLocalRef(v, scratch);
           uint32_t offs = fr.pushPtr(scratch);
           v.setOffs(Stk::MemRef, offs);
+          smgen_.memRefsOnStk_++;
           break;
         }
         case Stk::RegisterRef: {
           uint32_t offs = fr.pushPtr(v.refReg());
           freeRef(v.refReg());
           v.setOffs(Stk::MemRef, offs);
+          smgen_.memRefsOnStk_++;
           break;
         }
         default: { break; }
       }
     }
   }
 
+  // Various methods for creating a stack map.  Stack maps are indexed by the
+  // lowest address of the instruction immediately *after* the instruction of
+  // interest.  In practice that means one of: the return point of a call, the
+  // instruction immediately after a trap instruction (the "resume"
+  // instruction), or the instruction immediately following a no-op (when
+  // debugging is enabled).
+
+  // Create a vanilla stack map.
+  MOZ_MUST_USE bool createStackMap(const char* who) {
+    const ExitStubMapVector noExtras;
+    return smgen_.createStackMap(who, noExtras, masm.currentOffset(),
+                                 HasRefTypedDebugFrame::No, stk_);
+  }
+
+  // Create a stack map as vanilla, but for a custom assembler offset.
+  MOZ_MUST_USE bool createStackMap(const char* who,
+                                   CodeOffset assemblerOffset) {
+    const ExitStubMapVector noExtras;
+    return smgen_.createStackMap(who, noExtras, assemblerOffset.offset(),
+                                 HasRefTypedDebugFrame::No, stk_);
+  }
+
+  // Create a stack map as vanilla, and note the presence of a ref-typed
+  // DebugFrame on the stack.
+  MOZ_MUST_USE bool createStackMap(const char* who,
+                                   HasRefTypedDebugFrame refDebugFrame) {
+    const ExitStubMapVector noExtras;
+    return smgen_.createStackMap(who, noExtras, masm.currentOffset(),
+                                 refDebugFrame, stk_);
+  }
+
+  // The most general stack map construction.
+  MOZ_MUST_USE bool createStackMap(const char* who,
+                                   const ExitStubMapVector& extras,
+                                   uint32_t assemblerOffset,
+                                   HasRefTypedDebugFrame refDebugFrame) {
+    return smgen_.createStackMap(who, extras, assemblerOffset, refDebugFrame,
+                                 stk_);
+  }
+
   // This is an optimization used to avoid calling sync() for
   // setLocal(): if the local does not exist unresolved on the stack
   // then we can skip the sync.
 
   bool hasLocal(uint32_t slot) {
     for (size_t i = stk_.length(); i > 0; i--) {
       // Memory opcodes are first in the enum, single check against MemLast is
       // fine.
@@ -2819,64 +3283,74 @@ class BaseCompiler final : public BaseCo
       sync();  // TODO / OPTIMIZE: Improve this?  (Bug 1316817)
     }
   }
 
   // Push the register r onto the stack.
 
   void pushI32(RegI32 r) {
     MOZ_ASSERT(!isAvailableI32(r));
-    push(r);
+    push(Stk(r));
   }
 
   void pushI64(RegI64 r) {
     MOZ_ASSERT(!isAvailableI64(r));
-    push(r);
+    push(Stk(r));
   }
 
   void pushRef(RegPtr r) {
     MOZ_ASSERT(!isAvailableRef(r));
-    push(r);
+    push(Stk(r));
   }
 
   void pushF64(RegF64 r) {
     MOZ_ASSERT(!isAvailableF64(r));
-    push(r);
+    push(Stk(r));
   }
 
   void pushF32(RegF32 r) {
     MOZ_ASSERT(!isAvailableF32(r));
-    push(r);
+    push(Stk(r));
   }
 
   // Push the value onto the stack.
 
-  void pushI32(int32_t v) { push(v); }
-
-  void pushI64(int64_t v) { push(v); }
+  void pushI32(int32_t v) { push(Stk(v)); }
+
+  void pushI64(int64_t v) { push(Stk(v)); }
 
   void pushRef(intptr_t v) { pushConstRef(v); }
 
-  void pushF64(double v) { push(v); }
-
-  void pushF32(float v) { push(v); }
+  void pushF64(double v) { push(Stk(v)); }
+
+  void pushF32(float v) { push(Stk(v)); }
 
   // Push the local slot onto the stack.  The slot will not be read
   // here; it will be read when it is consumed, or when a side
   // effect to the slot forces its value to be saved.
 
-  void pushLocalI32(uint32_t slot) { push(Stk::LocalI32, slot); }
-
-  void pushLocalI64(uint32_t slot) { push(Stk::LocalI64, slot); }
-
-  void pushLocalRef(uint32_t slot) { push(Stk::LocalRef, slot); }
-
-  void pushLocalF64(uint32_t slot) { push(Stk::LocalF64, slot); }
-
-  void pushLocalF32(uint32_t slot) { push(Stk::LocalF32, slot); }
+  void pushLocalI32(uint32_t slot) {
+    stk_.infallibleEmplaceBack(Stk(Stk::LocalI32, slot));
+  }
+
+  void pushLocalI64(uint32_t slot) {
+    stk_.infallibleEmplaceBack(Stk(Stk::LocalI64, slot));
+  }
+
+  void pushLocalRef(uint32_t slot) {
+    stk_.infallibleEmplaceBack(Stk(Stk::LocalRef, slot));
+  }
+
+  void pushLocalF64(uint32_t slot) {
+    stk_.infallibleEmplaceBack(Stk(Stk::LocalF64, slot));
+  }
+
+  void pushLocalF32(uint32_t slot) {
+    stk_.infallibleEmplaceBack(Stk(Stk::LocalF32, slot));
+  }
 
   // Call only from other popI32() variants.
   // v must be the stack top.  May pop the CPU stack.
 
   void popI32(const Stk& v, RegI32 dest) {
     MOZ_ASSERT(&v == &stk_.back());
     switch (v.kind()) {
       case Stk::ConstI32:
@@ -3013,28 +3487,34 @@ class BaseCompiler final : public BaseCo
       needRef(specific);
       popRef(v, specific);
       if (v.kind() == Stk::RegisterRef) {
         freeRef(v.refReg());
       }
     }
 
     stk_.popBack();
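+    // If the popped value was a stack-resident ref, there is now one fewer
+    // pointer for the stack-map generator to track.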
+    if (v.kind() == Stk::MemRef) {
+      smgen_.memRefsOnStk_--;
+    }
     return specific;
   }
 
   MOZ_MUST_USE RegPtr popRef() {
     Stk& v = stk_.back();
     RegPtr r;
     if (v.kind() == Stk::RegisterRef) {
       r = v.refReg();
     } else {
       popRef(v, (r = needRef()));
     }
     stk_.popBack();
+    if (v.kind() == Stk::MemRef) {
+      smgen_.memRefsOnStk_--;
+    }
     return r;
   }
 
   // Call only from other popF64() variants.
   // v must be the stack top.  May pop the CPU stack.
 
   void popF64(const Stk& v, RegF64 dest) {
     MOZ_ASSERT(&v == &stk_.back());
@@ -3400,16 +3880,19 @@ class BaseCompiler final : public BaseCo
           freeF64(v.f64reg());
           break;
         case Stk::RegisterF32:
           freeF32(v.f32reg());
           break;
         case Stk::RegisterRef:
           freeRef(v.refReg());
           break;
+        case Stk::MemRef:
+          smgen_.memRefsOnStk_--;
+          break;
         default:
           break;
       }
     }
     stk_.shrinkTo(stackSize);
   }
 
   void popValueStackBy(uint32_t items) {
@@ -3522,79 +4005,166 @@ class BaseCompiler final : public BaseCo
     // loading of TLS into the FarJumpIsland created by linkCallSites.
     masm.nopPatchableToCall(CallSiteDesc(iter_.lastOpcodeOffset(), kind));
   }
 
   //////////////////////////////////////////////////////////////////////
   //
   // Function prologue and epilogue.
 
-  void beginFunction() {
+  MOZ_MUST_USE bool beginFunction() {
+    JitSpew(JitSpew_Codegen, "# ========================================");
     JitSpew(JitSpew_Codegen, "# Emitting wasm baseline code");
+    JitSpew(JitSpew_Codegen,
+            "# beginFunction: start of function prologue for index %d",
+            (int)func_.index);
+
+    // Make a start on the stack map for this function.  Inspect the args so
+    // as to determine which of them are both in-memory and pointer-typed, and
+    // add entries to mst_ as appropriate.
+
+    const ValTypeVector& argTys = env_.funcTypes[func_.index]->args();
+
+    size_t nStackArgBytes = stackArgAreaSize(argTys);
+    MOZ_ASSERT(nStackArgBytes % sizeof(void*) == 0);
+    smgen_.numStackArgWords_ = nStackArgBytes / sizeof(void*);
+
+    MOZ_ASSERT(smgen_.mst_.length() == 0);
+    if (!smgen_.mst_.pushNonGCPointers(smgen_.numStackArgWords_)) {
+      return false;
+    }
+
+    for (ABIArgIter<const ValTypeVector> i(argTys); !i.done(); i++) {
+      ABIArg argLoc = *i;
+      if (argLoc.kind() != ABIArg::Stack) {
+        continue;
+      }
+      const ValType& ty = argTys[i.index()];
+      if (!ty.isReference()) {
+        continue;
+      }
+      uint32_t offset = argLoc.offsetFromArgBase();
+      MOZ_ASSERT(offset < nStackArgBytes);
+      MOZ_ASSERT(offset % sizeof(void*) == 0);
+      smgen_.mst_.setGCPointer(offset / sizeof(void*));
+    }
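+    // mst_ now has one entry for each word of the incoming stack-arg area,
+    // with the words holding ref-typed args marked as GC pointers.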
 
     GenerateFunctionPrologue(
         masm, env_.funcTypes[func_.index]->id,
         env_.mode() == CompileMode::Tier1 ? Some(func_.index) : Nothing(),
         &offsets_);
 
+    // GenerateFunctionPrologue pushes exactly one wasm::Frame's worth of
+    // stuff, and none of the values are GC pointers.  Hence:
+    if (!smgen_.mst_.pushNonGCPointers(sizeof(Frame) / sizeof(void*))) {
+      return false;
+    }
+
     // Initialize DebugFrame fields before the stack overflow trap so that
     // we have the invariant that all observable Frames in a debugEnabled
     // Module have valid DebugFrames.
     if (env_.debugEnabled()) {
 #ifdef JS_CODEGEN_ARM64
       static_assert(DebugFrame::offsetOfFrame() % WasmStackAlignment == 0,
                     "aligned");
 #endif
       masm.reserveStack(DebugFrame::offsetOfFrame());
+      if (!smgen_.mst_.pushNonGCPointers(DebugFrame::offsetOfFrame() /
+                                         sizeof(void*))) {
+        return false;
+      }
+
       masm.store32(
           Imm32(func_.index),
           Address(masm.getStackPointer(), DebugFrame::offsetOfFuncIndex()));
       masm.storePtr(ImmWord(0), Address(masm.getStackPointer(),
                                         DebugFrame::offsetOfFlagsWord()));
-    }
+      // Zero out DebugFrame::cachedReturnJSValue_ and ::resultRef_ for
+      // safety, since it's not easy to establish whether they will always be
+      // defined before a GC.
+      masm.storePtr(ImmWord(0), Address(masm.getStackPointer(),
+                                        DebugFrame::offsetOfResults()));
+      for (size_t i = 0; i < sizeof(js::Value) / sizeof(void*); i++) {
+        masm.storePtr(ImmWord(0),
+                      Address(masm.getStackPointer(),
+                              DebugFrame::offsetOfCachedReturnJSValue() +
+                                  i * sizeof(void*)));
+      }
+    }
+
+    // Generate a stack-overflow check and its associated stack map.
 
     fr.checkStack(ABINonArgReg0, BytecodeOffset(func_.lineOrBytecode));
-    masm.reserveStack(fr.fixedSize() - masm.framePushed());
+
+    const ValTypeVector& args = funcType().args();
+    ExitStubMapVector extras;
+    if (!smgen_.generateStackmapEntriesForTrapExit(args, extras)) {
+      return false;
+    }
+    if (!createStackMap("stack check", extras, masm.currentOffset(),
+                        HasRefTypedDebugFrame::No)) {
+      return false;
+    }
+
+    size_t reservedBytes = fr.fixedSize() - masm.framePushed();
+    MOZ_ASSERT(0 == (reservedBytes % sizeof(void*)));
+
+    masm.reserveStack(reservedBytes);
     fr.onFixedStackAllocated();
+    if (!smgen_.mst_.pushNonGCPointers(reservedBytes / sizeof(void*))) {
+      return false;
+    }
 
     // Copy arguments from registers to stack.
-
-    const ValTypeVector& args = funcType().args();
-
     for (ABIArgIter<const ValTypeVector> i(args); !i.done(); i++) {
       if (!i->argInRegister()) {
         continue;
       }
       Local& l = localInfo_[i.index()];
       switch (i.mirType()) {
         case MIRType::Int32:
           fr.storeLocalI32(RegI32(i->gpr()), l);
           break;
         case MIRType::Int64:
           fr.storeLocalI64(RegI64(i->gpr64()), l);
           break;
-        case MIRType::Pointer:
+        case MIRType::Pointer: {
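+          // The pointer argument is about to be parked in its stack local;
+          // record that local's word in the machine-stack tracker.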
+          uint32_t offs = fr.localOffset(l);
+          MOZ_ASSERT(0 == (offs % sizeof(void*)));
           fr.storeLocalPtr(RegPtr(i->gpr()), l);
+          smgen_.mst_.setGCPointer(offs / sizeof(void*));
           break;
+        }
         case MIRType::Double:
           fr.storeLocalF64(RegF64(i->fpu()), l);
           break;
         case MIRType::Float32:
           fr.storeLocalF32(RegF32(i->fpu()), l);
           break;
         default:
           MOZ_CRASH("Function argument type");
       }
     }
 
     fr.zeroLocals(&ra);
 
     if (env_.debugEnabled()) {
       insertBreakablePoint(CallSiteDesc::EnterFrame);
-    }
+      if (!createStackMap("debug: breakable point")) {
+        return false;
+      }
+    }
+
+    JitSpew(JitSpew_Codegen,
+            "# beginFunction: enter body with masm.framePushed = %u",
+            masm.framePushed());
+    MOZ_ASSERT(smgen_.framePushedAtEntryToBody_.isNothing());
+    smgen_.framePushedAtEntryToBody_.emplace(masm.framePushed());
+
+    return true;
   }
 
   void saveResult() {
     MOZ_ASSERT(env_.debugEnabled());
     size_t debugFrameOffset = masm.framePushed() - DebugFrame::offsetOfFrame();
     Address resultsAddress(masm.getStackPointer(),
                            debugFrameOffset + DebugFrame::offsetOfResults());
     switch (funcType().ret().code()) {
@@ -3647,17 +4217,19 @@ class BaseCompiler final : public BaseCo
         masm.loadPtr(resultsAddress, RegPtr(ReturnReg));
         break;
       case ExprType::NullRef:
       default:
         MOZ_CRASH("Function return type");
     }
   }
 
-  bool endFunction() {
+  MOZ_MUST_USE bool endFunction() {
+    JitSpew(JitSpew_Codegen, "# endFunction: start of function epilogue");
+
     // Always branch to returnLabel_.
     masm.breakpoint();
 
     // Patch the add in the prologue so that it checks against the correct
     // frame size. Flush the constant pool in case it needs to be patched.
     masm.flush();
 
     // Precondition for patching.
@@ -3665,43 +4237,62 @@ class BaseCompiler final : public BaseCo
       return false;
     }
 
     fr.patchCheckStack();
 
     masm.bind(&returnLabel_);
 
     if (env_.debugEnabled()) {
+      // If the return type is a ref, we need to note that in the stack maps
+      // generated here.  Note that this assumes that DebugFrame::result* and
+      // DebugFrame::cachedReturnJSValue_ are either both ref-typed or both
+      // not ref-typed; the single flag can't represent the situation where
+      // one is and the other isn't.
+      HasRefTypedDebugFrame refDebugFrame = funcType().ret().isReference()
+                                                ? HasRefTypedDebugFrame::Yes
+                                                : HasRefTypedDebugFrame::No;
+
       // Store and reload the return value from DebugFrame::return so that
       // it can be clobbered, and/or modified by the debug trap.
       saveResult();
       insertBreakablePoint(CallSiteDesc::Breakpoint);
+      if (!createStackMap("debug: breakpoint", refDebugFrame)) {
+        return false;
+      }
       insertBreakablePoint(CallSiteDesc::LeaveFrame);
+      if (!createStackMap("debug: leave frame", refDebugFrame)) {
+        return false;
+      }
       restoreResult();
     }
 
     GenerateFunctionEpilogue(masm, fr.fixedSize(), &offsets_);
 
 #if defined(JS_ION_PERF)
     // FIXME - profiling code missing.  No bug for this.
 
     // Note the end of the inline code and start of the OOL code.
     // gen->perfSpewer().noteEndInlineCode(masm);
 #endif
 
+    JitSpew(JitSpew_Codegen, "# endFunction: end of function epilogue");
+    JitSpew(JitSpew_Codegen, "# endFunction: start of OOL code");
     if (!generateOutOfLineCode()) {
       return false;
     }
 
     offsets_.end = masm.currentOffset();
 
     if (!fr.checkStackHeight()) {
       return false;
     }
 
+    JitSpew(JitSpew_Codegen, "# endFunction: end of OOL code for index %d",
+            (int)func_.index);
     return !masm.oom();
   }
 
   //////////////////////////////////////////////////////////////////////
   //
   // Calls.
 
   struct FunctionCall {
@@ -3754,16 +4345,19 @@ class BaseCompiler final : public BaseCo
     call.frameAlignAdjustment = ComputeByteAlignment(
         masm.framePushed() + sizeof(Frame), JitStackAlignment);
   }
 
   void endCall(FunctionCall& call, size_t stackSpace) {
     size_t adjustment = call.stackArgAreaSize + call.frameAlignAdjustment;
     fr.freeArgAreaAndPopBytes(adjustment, stackSpace);
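+    // The outgoing-args area has now been freed, so the lower stackmap limit
+    // recorded in startCallArgs() no longer applies.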
 
+    MOZ_ASSERT(smgen_.framePushedBeforePushingCallArgs_.isSome());
+    smgen_.framePushedBeforePushingCallArgs_.reset();
+
     if (call.isInterModule) {
       masm.loadWasmTlsRegFromFrame();
       masm.loadWasmPinnedRegsFromTls();
       masm.switchToWasmTlsRealm(ABINonArgReturnReg0, ABINonArgReturnReg1);
     } else if (call.usesSystemAbi) {
       // On x86 there are no pinned registers, so don't waste time
       // reloading the Tls.
 #ifndef JS_CODEGEN_X86
@@ -3785,16 +4379,24 @@ class BaseCompiler final : public BaseCo
     ABIArgIter<const T> i(args);
     while (!i.done()) {
       i++;
     }
     return AlignBytes(i.stackBytesConsumedSoFar(), 16u);
   }
 
   void startCallArgs(size_t stackArgAreaSize, FunctionCall* call) {
+    // Record the masm.framePushed() value at this point, before we push args
+    // for the call, but including the alignment space placed above the args.
+    // This defines the lower limit of the stackmap that will be created for
+    // this call.
+    MOZ_ASSERT(smgen_.framePushedBeforePushingCallArgs_.isNothing());
+    smgen_.framePushedBeforePushingCallArgs_.emplace(
+        masm.framePushed() + call->frameAlignAdjustment);
+
     call->stackArgAreaSize = stackArgAreaSize;
 
     size_t adjustment = call->stackArgAreaSize + call->frameAlignAdjustment;
     fr.allocArgArea(adjustment);
   }
 
   const ABIArg reservePointerArgument(FunctionCall* call) {
     return call->abi.next(MIRType::Pointer);
@@ -3942,62 +4544,62 @@ class BaseCompiler final : public BaseCo
       }
       case ValType::NullRef:
         MOZ_CRASH("NullRef not expressible");
       default:
         MOZ_CRASH("Function argument type");
     }
   }
 
-  void callDefinition(uint32_t funcIndex, const FunctionCall& call) {
+  CodeOffset callDefinition(uint32_t funcIndex, const FunctionCall& call) {
     CallSiteDesc desc(call.lineOrBytecode, CallSiteDesc::Func);
-    masm.call(desc, funcIndex);
-  }
-
-  void callSymbolic(SymbolicAddress callee, const FunctionCall& call) {
+    return masm.call(desc, funcIndex);
+  }
+
+  CodeOffset callSymbolic(SymbolicAddress callee, const FunctionCall& call) {
     CallSiteDesc desc(call.lineOrBytecode, CallSiteDesc::Symbolic);
-    masm.call(desc, callee);
+    return masm.call(desc, callee);
   }
 
   // Precondition: sync()
 
-  void callIndirect(uint32_t funcTypeIndex, uint32_t tableIndex,
-                    const Stk& indexVal, const FunctionCall& call) {
+  CodeOffset callIndirect(uint32_t funcTypeIndex, uint32_t tableIndex,
+                          const Stk& indexVal, const FunctionCall& call) {
     const FuncTypeWithId& funcType = env_.types[funcTypeIndex].funcType();
     MOZ_ASSERT(funcType.id.kind() != FuncTypeIdDescKind::None);
 
     const TableDesc& table = env_.tables[tableIndex];
 
     loadI32(indexVal, RegI32(WasmTableCallIndexReg));
 
     CallSiteDesc desc(call.lineOrBytecode, CallSiteDesc::Dynamic);
     CalleeDesc callee = CalleeDesc::wasmTable(table, funcType.id);
-    masm.wasmCallIndirect(desc, callee, NeedsBoundsCheck(true));
+    return masm.wasmCallIndirect(desc, callee, NeedsBoundsCheck(true));
   }
 
   // Precondition: sync()
 
-  void callImport(unsigned globalDataOffset, const FunctionCall& call) {
+  CodeOffset callImport(unsigned globalDataOffset, const FunctionCall& call) {
     CallSiteDesc desc(call.lineOrBytecode, CallSiteDesc::Dynamic);
     CalleeDesc callee = CalleeDesc::import(globalDataOffset);
-    masm.wasmCallImport(desc, callee);
-  }
-
-  void builtinCall(SymbolicAddress builtin, const FunctionCall& call) {
-    callSymbolic(builtin, call);
-  }
-
-  void builtinInstanceMethodCall(SymbolicAddress builtin,
-                                 const ABIArg& instanceArg,
-                                 const FunctionCall& call) {
+    return masm.wasmCallImport(desc, callee);
+  }
+
+  CodeOffset builtinCall(SymbolicAddress builtin, const FunctionCall& call) {
+    return callSymbolic(builtin, call);
+  }
+
+  CodeOffset builtinInstanceMethodCall(SymbolicAddress builtin,
+                                       const ABIArg& instanceArg,
+                                       const FunctionCall& call) {
     // Builtin method calls assume the TLS register has been set.
     masm.loadWasmTlsRegFromFrame();
 
     CallSiteDesc desc(call.lineOrBytecode, CallSiteDesc::Symbolic);
-    masm.wasmCallBuiltinInstanceMethod(desc, instanceArg, builtin);
+    return masm.wasmCallBuiltinInstanceMethod(desc, instanceArg, builtin);
   }
 
   //////////////////////////////////////////////////////////////////////
   //
   // Sundry low-level code generators.
 
   // The compiler depends on moveImm32() clearing the high bits of a 64-bit
   // register on 64-bit systems except MIPS64 where high bits are sign extended
@@ -4008,20 +4610,21 @@ class BaseCompiler final : public BaseCo
   void moveImm64(int64_t v, RegI64 dest) { masm.move64(Imm64(v), dest); }
 
   void moveImmRef(intptr_t v, RegPtr dest) { masm.movePtr(ImmWord(v), dest); }
 
   void moveImmF32(float f, RegF32 dest) { masm.loadConstantFloat32(f, dest); }
 
   void moveImmF64(double d, RegF64 dest) { masm.loadConstantDouble(d, dest); }
 
-  void addInterruptCheck() {
+  MOZ_MUST_USE bool addInterruptCheck() {
     ScratchI32 tmp(*this);
     masm.loadWasmTlsRegFromFrame(tmp);
     masm.wasmInterruptCheck(tmp, bytecodeOffset());
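+    // The interrupt check can reach a point at which a GC may occur, so it
+    // needs an associated stack map.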
+    return createStackMap("addInterruptCheck");
   }
 
   void jumpTable(const LabelVector& labels, Label* theTable) {
     // Flush constant pools to ensure that the table is never interrupted by
     // constant pool entries.
     masm.flush();
 
 #if defined(JS_CODEGEN_ARM) || defined(JS_CODEGEN_ARM64)
@@ -5867,47 +6470,55 @@ class BaseCompiler final : public BaseCo
 
     // If the pointer being stored is to a tenured object, no barrier.
     masm.branchPtrInNurseryChunk(Assembler::NotEqual, setValue, otherScratch,
                                  skipBarrier);
   }
 
   // This frees the register `valueAddr`.
 
-  void emitPostBarrier(RegPtr valueAddr) {
+  MOZ_MUST_USE bool emitPostBarrier(RegPtr valueAddr) {
     uint32_t bytecodeOffset = iter_.lastOpcodeOffset();
 
     // The `valueAddr` is a raw pointer to the cell within some GC object or
     // TLS area, and we guarantee that the GC will not run while the
     // postbarrier call is active, so push a uintptr_t value.
 #ifdef JS_64BIT
     pushI64(RegI64(Register64(valueAddr)));
-    emitInstanceCall(bytecodeOffset, SigPL_, ExprType::Void,
-                     SymbolicAddress::PostBarrier);
+    if (!emitInstanceCall(bytecodeOffset, SigPL_, ExprType::Void,
+                          SymbolicAddress::PostBarrier)) {
+      return false;
+    }
 #else
     pushI32(RegI32(valueAddr));
-    emitInstanceCall(bytecodeOffset, SigPI_, ExprType::Void,
-                     SymbolicAddress::PostBarrier);
-#endif
-  }
-
-  void emitBarrieredStore(const Maybe<RegPtr>& object, RegPtr valueAddr,
-                          RegPtr value) {
+    if (!emitInstanceCall(bytecodeOffset, SigPI_, ExprType::Void,
+                          SymbolicAddress::PostBarrier)) {
+      return false;
+    }
+#endif
+    return true;
+  }
+
+  MOZ_MUST_USE bool emitBarrieredStore(const Maybe<RegPtr>& object,
+                                       RegPtr valueAddr, RegPtr value) {
     emitPreBarrier(valueAddr);  // Preserves valueAddr
     masm.storePtr(value, Address(valueAddr, 0));
 
     Label skipBarrier;
     sync();
 
     RegPtr otherScratch = needRef();
     emitPostBarrierGuard(object, otherScratch, value, &skipBarrier);
     freeRef(otherScratch);
 
-    emitPostBarrier(valueAddr);
+    if (!emitPostBarrier(valueAddr)) {
+      return false;
+    }
     masm.bind(&skipBarrier);
+    return true;
   }
 
   ////////////////////////////////////////////////////////////
   //
   // Machinery for optimized conditional branches.
   //
   // To disable this optimization it is enough always to return false from
   // sniffConditionalControl{Cmp,Eqz}.
@@ -6116,17 +6727,18 @@ class BaseCompiler final : public BaseCo
   void emitMultiplyI64();
   void emitMultiplyF32();
   void emitMultiplyF64();
   void emitQuotientI32();
   void emitQuotientU32();
   void emitRemainderI32();
   void emitRemainderU32();
 #ifdef RABALDR_INT_DIV_I64_CALLOUT
-  void emitDivOrModI64BuiltinCall(SymbolicAddress callee, ValType operandType);
+  MOZ_MUST_USE bool emitDivOrModI64BuiltinCall(SymbolicAddress callee,
+                                               ValType operandType);
 #else
   void emitQuotientI64();
   void emitQuotientU64();
   void emitRemainderI64();
   void emitRemainderU64();
 #endif
   void emitDivideF32();
   void emitDivideF64();
@@ -6204,18 +6816,19 @@ class BaseCompiler final : public BaseCo
   void emitConvertI64ToF32();
   void emitConvertU64ToF32();
   void emitConvertI64ToF64();
   void emitConvertU64ToF64();
 #endif
   void emitReinterpretI32AsF32();
   void emitReinterpretI64AsF64();
   void emitRound(RoundingMode roundingMode, ValType operandType);
-  void emitInstanceCall(uint32_t lineOrBytecode, const MIRTypeVector& sig,
-                        ExprType retType, SymbolicAddress builtin);
+  MOZ_MUST_USE bool emitInstanceCall(uint32_t lineOrBytecode,
+                                     const MIRTypeVector& sig, ExprType retType,
+                                     SymbolicAddress builtin);
   MOZ_MUST_USE bool emitGrowMemory();
   MOZ_MUST_USE bool emitCurrentMemory();
 
   MOZ_MUST_USE bool emitRefNull();
   void emitRefIsNull();
 
   MOZ_MUST_USE bool emitAtomicCmpXchg(ValType type, Scalar::Type viewType);
   MOZ_MUST_USE bool emitAtomicLoad(ValType type, Scalar::Type viewType);
@@ -7526,17 +8139,19 @@ bool BaseCompiler::emitLoop() {
   }
 
   initControl(controlItem());
   bceSafe_ = 0;
 
   if (!deadCode_) {
     masm.nopAlign(CodeAlignment);
     masm.bind(&controlItem(0).label);
-    addInterruptCheck();
+    if (!addInterruptCheck()) {
+      return false;
+    }
   }
 
   return true;
 }
 
 void BaseCompiler::endLoop(ExprType type) {
   Control& block = controlItem();
 
@@ -8026,20 +8641,26 @@ bool BaseCompiler::emitCall() {
   FunctionCall baselineCall(lineOrBytecode);
   beginCall(baselineCall, UseABI::Wasm,
             import ? InterModule::True : InterModule::False);
 
   if (!emitCallArgs(funcType.args(), &baselineCall)) {
     return false;
   }
 
+  CodeOffset raOffset;
   if (import) {
-    callImport(env_.funcImportGlobalDataOffsets[funcIndex], baselineCall);
+    raOffset =
+        callImport(env_.funcImportGlobalDataOffsets[funcIndex], baselineCall);
   } else {
-    callDefinition(funcIndex, baselineCall);
+    raOffset = callDefinition(funcIndex, baselineCall);
+  }
+
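+  // Associate a stack map with the call's return point.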
+  if (!createStackMap("emitCall", raOffset)) {
+    return false;
   }
 
   endCall(baselineCall, stackSpace);
 
   popValueStackBy(numArgs);
 
   pushReturnedIfNonVoid(baselineCall, funcType.ret());
 
@@ -8078,17 +8699,21 @@ bool BaseCompiler::emitCallIndirect() {
 
   FunctionCall baselineCall(lineOrBytecode);
   beginCall(baselineCall, UseABI::Wasm, InterModule::True);
 
   if (!emitCallArgs(funcType.args(), &baselineCall)) {
     return false;
   }
 
-  callIndirect(funcTypeIndex, tableIndex, callee, baselineCall);
+  CodeOffset raOffset =
+      callIndirect(funcTypeIndex, tableIndex, callee, baselineCall);
+  if (!createStackMap("emitCallIndirect", raOffset)) {
+    return false;
+  }
 
   endCall(baselineCall, stackSpace);
 
   popValueStackBy(numArgs);
 
   pushReturnedIfNonVoid(baselineCall, funcType.ret());
 
   return true;
@@ -8138,29 +8763,32 @@ bool BaseCompiler::emitUnaryMathBuiltinC
 
   FunctionCall baselineCall(lineOrBytecode);
   beginCall(baselineCall, UseABI::Builtin, InterModule::False);
 
   if (!emitCallArgs(signature, &baselineCall)) {
     return false;
   }
 
-  builtinCall(callee, baselineCall);
+  CodeOffset raOffset = builtinCall(callee, baselineCall);
+  if (!createStackMap("emitUnaryMathBuiltin[..]", raOffset)) {
+    return false;
+  }
 
   endCall(baselineCall, stackSpace);
 
   popValueStackBy(numArgs);
 
   pushReturnedIfNonVoid(baselineCall, retType);
 
   return true;
 }
 
 #ifdef RABALDR_INT_DIV_I64_CALLOUT
-void BaseCompiler::emitDivOrModI64BuiltinCall(SymbolicAddress callee,
+bool BaseCompiler::emitDivOrModI64BuiltinCall(SymbolicAddress callee,
                                               ValType operandType) {
   MOZ_ASSERT(operandType == ValType::I64);
   MOZ_ASSERT(!deadCode_);
 
   sync();
 
   needI64(specific_.abiReturnRegI64);
 
@@ -8177,22 +8805,26 @@ void BaseCompiler::emitDivOrModI64Builti
     checkDivideSignedOverflowI64(rhs, srcDest, &done, ZeroOnOverflow(true));
   }
 
   masm.setupWasmABICall();
   masm.passABIArg(srcDest.high);
   masm.passABIArg(srcDest.low);
   masm.passABIArg(rhs.high);
   masm.passABIArg(rhs.low);
-  masm.callWithABI(bytecodeOffset(), callee);
+  CodeOffset raOffset = masm.callWithABI(bytecodeOffset(), callee);
+  if (!createStackMap("emitDivOrModI64Bui[..]", raOffset)) {
+    return false;
+  }
 
   masm.bind(&done);
 
   freeI64(rhs);
   pushI64(srcDest);
+  return true;
 }
 #endif  // RABALDR_INT_DIV_I64_CALLOUT
 
 #ifdef RABALDR_I64_TO_FLOAT_CALLOUT
 bool BaseCompiler::emitConvertInt64ToFloatingCallout(SymbolicAddress callee,
                                                      ValType operandType,
                                                      ValType resultType) {
   sync();
@@ -8203,19 +8835,22 @@ bool BaseCompiler::emitConvertInt64ToFlo
 
   masm.setupWasmABICall();
 #ifdef JS_PUNBOX64
   MOZ_CRASH("BaseCompiler platform hook: emitConvertInt64ToFloatingCallout");
 #else
   masm.passABIArg(input.high);
   masm.passABIArg(input.low);
 #endif
-  masm.callWithABI(
+  CodeOffset raOffset = masm.callWithABI(
       bytecodeOffset(), callee,
       resultType == ValType::F32 ? MoveOp::FLOAT32 : MoveOp::DOUBLE);
+  if (!createStackMap("emitConvertInt64To[..]", raOffset)) {
+    return false;
+  }
 
   freeI64(input);
 
   if (resultType == ValType::F32) {
     pushF32(captureReturnedF32(call));
   } else {
     pushF64(captureReturnedF64(call));
   }
@@ -8245,17 +8880,20 @@ bool BaseCompiler::emitConvertFloatingTo
   pushF64(otherReg);
 
   sync();
 
   FunctionCall call(0);
 
   masm.setupWasmABICall();
   masm.passABIArg(doubleInput, MoveOp::DOUBLE);
-  masm.callWithABI(bytecodeOffset(), callee);
+  CodeOffset raOffset = masm.callWithABI(bytecodeOffset(), callee);
+  if (!createStackMap("emitConvertFloatin[..]", raOffset)) {
+    return false;
+  }
 
   freeF64(doubleInput);
 
   RegI64 rv = captureReturnedI64();
 
   RegF64 inputVal = popF64();
 
   TruncFlags flags = 0;
@@ -8551,17 +9189,20 @@ bool BaseCompiler::emitSetGlobal() {
       RegPtr valueAddr(PreBarrierReg);
       needRef(valueAddr);
       {
         ScratchI32 tmp(*this);
         masm.computeEffectiveAddress(addressOfGlobalVar(global, tmp),
                                      valueAddr);
       }
       RegPtr rv = popRef();
-      emitBarrieredStore(Nothing(), valueAddr, rv);  // Consumes valueAddr
+      if (!emitBarrieredStore(Nothing(), valueAddr,
+                              rv)) {  // Consumes valueAddr
+        return false;
+      }
       freeRef(rv);
       break;
     }
     case ValType::NullRef:
       MOZ_CRASH("NullRef not expressible");
     default:
       MOZ_CRASH("Global variable type");
       break;
@@ -9056,17 +9697,17 @@ void BaseCompiler::emitCompareRef(Assemb
   pop2xRef(&rs1, &rs2);
   RegI32 rd = needI32();
   masm.cmpPtrSet(compareOp, rs1, rs2, rd);
   freeRef(rs1);
   freeRef(rs2);
   pushI32(rd);
 }
 
-void BaseCompiler::emitInstanceCall(uint32_t lineOrBytecode,
+bool BaseCompiler::emitInstanceCall(uint32_t lineOrBytecode,
                                     const MIRTypeVector& sig, ExprType retType,
                                     SymbolicAddress builtin) {
   MOZ_ASSERT(sig[0] == MIRType::Pointer);
 
   sync();
 
   uint32_t numArgs = sig.length() - 1 /* instance */;
   size_t stackSpace = stackConsumed(numArgs);
@@ -9089,67 +9730,69 @@ void BaseCompiler::emitInstanceCall(uint
       case MIRType::Pointer:
         t = ValType::AnyRef;
         break;
       default:
         MOZ_CRASH("Unexpected type");
     }
     passArg(t, peek(numArgs - i), &baselineCall);
   }
-  builtinInstanceMethodCall(builtin, instanceArg, baselineCall);
+  CodeOffset raOffset =
+      builtinInstanceMethodCall(builtin, instanceArg, baselineCall);
+  if (!createStackMap("emitInstanceCall", raOffset)) {
+    return false;
+  }
+
   endCall(baselineCall, stackSpace);
 
   popValueStackBy(numArgs);
 
   // Note, many clients of emitInstanceCall currently assume that pushing the
   // result here does not destroy ReturnReg.
   //
   // Furthermore, clients assume that even if retType == ExprType::Void, the
   // callee may have returned a status result and left it in ReturnReg for us
   // to find, and that that register will not be destroyed here (or above).
   // In this case the callee will have a C++ declaration stating that there is
   // a return value.  Examples include memory and table operations that are
   // implemented as callouts.
 
   pushReturnedIfNonVoid(baselineCall, retType);
+  return true;
 }
 
 bool BaseCompiler::emitGrowMemory() {
   uint32_t lineOrBytecode = readCallSiteLineOrBytecode();
 
   Nothing arg;
   if (!iter_.readGrowMemory(&arg)) {
     return false;
   }
 
   if (deadCode_) {
     return true;
   }
 
-  // infallible
-  emitInstanceCall(lineOrBytecode, SigPI_, ExprType::I32,
-                   SymbolicAddress::GrowMemory);
-  return true;
+  return emitInstanceCall(lineOrBytecode, SigPI_, ExprType::I32,
+                          SymbolicAddress::GrowMemory);
 }
 
 bool BaseCompiler::emitCurrentMemory() {
   uint32_t lineOrBytecode = readCallSiteLineOrBytecode();
 
   if (!iter_.readCurrentMemory()) {
     return false;
   }
 
   if (deadCode_) {
     return true;
   }
 
-  // infallible
-  emitInstanceCall(lineOrBytecode, SigP_, ExprType::I32,
-                   SymbolicAddress::CurrentMemory);
-  return true;
+  return emitInstanceCall(lineOrBytecode, SigP_, ExprType::I32,
+                          SymbolicAddress::CurrentMemory);
 }
 
 bool BaseCompiler::emitRefNull() {
   if (!iter_.readRefNull()) {
     return false;
   }
 
   if (deadCode_) {
@@ -9453,22 +10096,26 @@ bool BaseCompiler::emitWait(ValType type
 
   if (deadCode_) {
     return true;
   }
 
   // Returns -1 on trap, otherwise nonnegative result.
   switch (type.code()) {
     case ValType::I32:
-      emitInstanceCall(lineOrBytecode, SigPIIL_, ExprType::I32,
-                       SymbolicAddress::WaitI32);
+      if (!emitInstanceCall(lineOrBytecode, SigPIIL_, ExprType::I32,
+                            SymbolicAddress::WaitI32)) {
+        return false;
+      }
       break;
     case ValType::I64:
-      emitInstanceCall(lineOrBytecode, SigPILL_, ExprType::I32,
-                       SymbolicAddress::WaitI64);
+      if (!emitInstanceCall(lineOrBytecode, SigPILL_, ExprType::I32,
+                            SymbolicAddress::WaitI64)) {
+        return false;
+      }
       break;
     default:
       MOZ_CRASH();
   }
 
   Label ok;
   masm.branchTest32(Assembler::NotSigned, ReturnReg, ReturnReg, &ok);
   trap(Trap::ThrowReported);
@@ -9486,18 +10133,20 @@ bool BaseCompiler::emitWake() {
     return false;
   }
 
   if (deadCode_) {
     return true;
   }
 
   // Returns -1 on trap, otherwise nonnegative result.
-  emitInstanceCall(lineOrBytecode, SigPII_, ExprType::I32,
-                   SymbolicAddress::Wake);
+  if (!emitInstanceCall(lineOrBytecode, SigPII_, ExprType::I32,
+                        SymbolicAddress::Wake)) {
+    return false;
+  }
 
   Label ok;
   masm.branchTest32(Assembler::NotSigned, ReturnReg, ReturnReg, &ok);
   trap(Trap::ThrowReported);
   masm.bind(&ok);
 
   return true;
 }
@@ -9517,23 +10166,27 @@ bool BaseCompiler::emitMemOrTableCopy(bo
   if (deadCode_) {
     return true;
   }
 
   // Returns -1 on trap, otherwise 0.
   if (isMem) {
     MOZ_ASSERT(srcMemOrTableIndex == 0);
     MOZ_ASSERT(dstMemOrTableIndex == 0);
-    emitInstanceCall(lineOrBytecode, SigPIII_, ExprType::Void,
-                     SymbolicAddress::MemCopy);
+    if (!emitInstanceCall(lineOrBytecode, SigPIII_, ExprType::Void,
+                          SymbolicAddress::MemCopy)) {
+      return false;
+    }
   } else {
     pushI32(dstMemOrTableIndex);
     pushI32(srcMemOrTableIndex);
-    emitInstanceCall(lineOrBytecode, SigPIIIII_, ExprType::Void,
-                     SymbolicAddress::TableCopy);
+    if (!emitInstanceCall(lineOrBytecode, SigPIIIII_, ExprType::Void,
+                          SymbolicAddress::TableCopy)) {
+      return false;
+    }
   }
 
   Label ok;
   masm.branchTest32(Assembler::NotSigned, ReturnReg, ReturnReg, &ok);
   trap(Trap::ThrowReported);
   masm.bind(&ok);
 
   return true;
@@ -9552,17 +10205,19 @@ bool BaseCompiler::emitMemOrTableDrop(bo
   }
 
   // Despite the cast to int32_t, the callee regards the value as unsigned.
   //
   // Returns -1 on trap, otherwise 0.
   pushI32(int32_t(segIndex));
   SymbolicAddress callee =
       isMem ? SymbolicAddress::MemDrop : SymbolicAddress::TableDrop;
-  emitInstanceCall(lineOrBytecode, SigPI_, ExprType::Void, callee);
+  if (!emitInstanceCall(lineOrBytecode, SigPI_, ExprType::Void, callee)) {
+    return false;
+  }
 
   Label ok;
   masm.branchTest32(Assembler::NotSigned, ReturnReg, ReturnReg, &ok);
   trap(Trap::ThrowReported);
   masm.bind(&ok);
 
   return true;
 }
@@ -9575,18 +10230,20 @@ bool BaseCompiler::emitMemFill() {
     return false;
   }
 
   if (deadCode_) {
     return true;
   }
 
   // Returns -1 on trap, otherwise 0.
-  emitInstanceCall(lineOrBytecode, SigPIII_, ExprType::Void,
-                   SymbolicAddress::MemFill);
+  if (!emitInstanceCall(lineOrBytecode, SigPIII_, ExprType::Void,
+                        SymbolicAddress::MemFill)) {
+    return false;
+  }
 
   Label ok;
   masm.branchTest32(Assembler::NotSigned, ReturnReg, ReturnReg, &ok);
   trap(Trap::ThrowReported);
   masm.bind(&ok);
 
   return true;
 }
@@ -9604,22 +10261,26 @@ bool BaseCompiler::emitMemOrTableInit(bo
 
   if (deadCode_) {
     return true;
   }
 
   // Returns -1 on trap, otherwise 0.
   pushI32(int32_t(segIndex));
   if (isMem) {
-    emitInstanceCall(lineOrBytecode, SigPIIII_, ExprType::Void,
-                     SymbolicAddress::MemInit);
+    if (!emitInstanceCall(lineOrBytecode, SigPIIII_, ExprType::Void,
+                          SymbolicAddress::MemInit)) {
+      return false;
+    }
   } else {
     pushI32(dstTableIndex);
-    emitInstanceCall(lineOrBytecode, SigPIIIII_, ExprType::Void,
-                     SymbolicAddress::TableInit);
+    if (!emitInstanceCall(lineOrBytecode, SigPIIIII_, ExprType::Void,
+                          SymbolicAddress::TableInit)) {
+      return false;
+    }
   }
 
   Label ok;
   masm.branchTest32(Assembler::NotSigned, ReturnReg, ReturnReg, &ok);
   trap(Trap::ThrowReported);
   masm.bind(&ok);
 
   return true;
@@ -9637,18 +10298,20 @@ bool BaseCompiler::emitTableGet() {
   if (deadCode_) {
     return true;
   }
   // get(index:u32, table:u32) -> anyref
   //
   // Returns (void*)-1 for error, this will not be confused with a real ref
   // value.
   pushI32(tableIndex);
-  emitInstanceCall(lineOrBytecode, SigPII_, ExprType::AnyRef,
-                   SymbolicAddress::TableGet);
+  if (!emitInstanceCall(lineOrBytecode, SigPII_, ExprType::AnyRef,
+                        SymbolicAddress::TableGet)) {
+    return false;
+  }
   Label noTrap;
   masm.branchPtr(Assembler::NotEqual, ReturnReg, Imm32(-1), &noTrap);
   trap(Trap::ThrowReported);
   masm.bind(&noTrap);
 
   return true;
 }
 
@@ -9663,20 +10326,18 @@ bool BaseCompiler::emitTableGrow() {
   }
   if (deadCode_) {
     return true;
   }
   // grow(delta:u32, initValue:anyref, table:u32) -> u32
   //
   // infallible.
   pushI32(tableIndex);
-  emitInstanceCall(lineOrBytecode, SigPIPI_, ExprType::I32,
-                   SymbolicAddress::TableGrow);
-
-  return true;
+  return emitInstanceCall(lineOrBytecode, SigPIPI_, ExprType::I32,
+                          SymbolicAddress::TableGrow);
 }
 
 MOZ_MUST_USE
 bool BaseCompiler::emitTableSet() {
   uint32_t lineOrBytecode = readCallSiteLineOrBytecode();
   Nothing index, value;
   uint32_t tableIndex;
   if (!iter_.readTableSet(&tableIndex, &index, &value)) {
@@ -9684,18 +10345,20 @@ bool BaseCompiler::emitTableSet() {
   }
   if (deadCode_) {
     return true;
   }
   // set(index:u32, value:ref, table:u32) -> i32
   //
   // Returns -1 on range error, otherwise 0 (which is then ignored).
   pushI32(tableIndex);
-  emitInstanceCall(lineOrBytecode, SigPIPI_, ExprType::Void,
-                   SymbolicAddress::TableSet);
+  if (!emitInstanceCall(lineOrBytecode, SigPIPI_, ExprType::Void,
+                        SymbolicAddress::TableSet)) {
+    return false;
+  }
   Label noTrap;
   masm.branchTest32(Assembler::NotSigned, ReturnReg, ReturnReg, &noTrap);
   trap(Trap::ThrowReported);
   masm.bind(&noTrap);
   return true;
 }
 
 MOZ_MUST_USE
@@ -9707,19 +10370,18 @@ bool BaseCompiler::emitTableSize() {
   }
   if (deadCode_) {
     return true;
   }
   // size(table:u32) -> u32
   //
   // infallible.
   pushI32(tableIndex);
-  emitInstanceCall(lineOrBytecode, SigPI_, ExprType::I32,
-                   SymbolicAddress::TableSize);
-  return true;
+  return emitInstanceCall(lineOrBytecode, SigPI_, ExprType::I32,
+                          SymbolicAddress::TableSize);
 }
 
 bool BaseCompiler::emitStructNew() {
   uint32_t lineOrBytecode = readCallSiteLineOrBytecode();
 
   uint32_t typeIndex;
   BaseOpIter::ValueVector args;
   if (!iter_.readStructNew(&typeIndex, &args)) {
@@ -9733,18 +10395,20 @@ bool BaseCompiler::emitStructNew() {
   // Allocate zeroed storage.  The parameter to StructNew is an index into a
   // descriptor table that the instance has.
   //
   // Returns null on OOM.
 
   const StructType& structType = env_.types[typeIndex].structType();
 
   pushI32(structType.moduleIndex_);
-  emitInstanceCall(lineOrBytecode, SigPI_, ExprType::AnyRef,
-                   SymbolicAddress::StructNew);
+  if (!emitInstanceCall(lineOrBytecode, SigPI_, ExprType::AnyRef,
+                        SymbolicAddress::StructNew)) {
+    return false;
+  }
 
   // Null pointer check.
 
   Label ok;
   masm.branchTestPtr(Assembler::NonZero, ReturnReg, ReturnReg, &ok);
   trap(Trap::ThrowReported);
   masm.bind(&ok);
 
@@ -9826,18 +10490,20 @@ bool BaseCompiler::emitStructNew() {
           freeRef(rowner);
         }
 
         freeRef(value);
 
         pushRef(rp);  // Save rp across the call
         RegPtr valueAddr = needRef();
         masm.computeEffectiveAddress(Address(rdata, offs), valueAddr);
-        emitPostBarrier(valueAddr);  // Consumes valueAddr
-        popRef(rp);                  // Restore rp
+        if (!emitPostBarrier(valueAddr)) {  // Consumes valueAddr
+          return false;
+        }
+        popRef(rp);  // Restore rp
         if (!structType.isInline_) {
           masm.loadPtr(Address(rp, OutlineTypedObject::offsetOfData()), rdata);
         }
 
         masm.bind(&skipBarrier);
         break;
       }
       case ValType::NullRef:
@@ -10006,17 +10672,19 @@ bool BaseCompiler::emitStructSet() {
     case ValType::F64: {
       masm.storeDouble(rd, Address(rp, offs));
       freeF64(rd);
       break;
     }
     case ValType::Ref:
     case ValType::AnyRef: {
       masm.computeEffectiveAddress(Address(rp, offs), valueAddr);
-      emitBarrieredStore(Some(rp), valueAddr, rr);  // Consumes valueAddr
+      if (!emitBarrieredStore(Some(rp), valueAddr, rr)) {  // Consumes valueAddr
+        return false;
+      }
       freeRef(rr);
       break;
     }
     case ValType::NullRef: {
       MOZ_CRASH("NullRef not expressible");
     }
     default: { MOZ_CRASH("Unexpected field type"); }
   }
@@ -10055,23 +10723,23 @@ bool BaseCompiler::emitStructNarrow() {
   //
   // Infallible.
   const StructType& outputStruct =
       env_.types[outputType.refTypeIndex()].structType();
 
   pushI32(mustUnboxAnyref);
   pushI32(outputStruct.moduleIndex_);
   pushRef(rp);
-  emitInstanceCall(lineOrBytecode, SigPIIP_, ExprType::AnyRef,
-                   SymbolicAddress::StructNarrow);
-
-  return true;
+  return emitInstanceCall(lineOrBytecode, SigPIIP_, ExprType::AnyRef,
+                          SymbolicAddress::StructNarrow);
 }
 
 bool BaseCompiler::emitBody() {
+  MOZ_ASSERT(smgen_.framePushedAtEntryToBody_.isSome());
+
   if (!iter_.readFunctionStart(funcType().ret())) {
     return false;
   }
 
   initControl(controlItem());
 
   uint32_t overhead = 0;
 
@@ -10102,24 +10770,44 @@ bool BaseCompiler::emitBody() {
   iter_.readConversion(inType, outType, &unused_a) && (deadCode_ || doEmit())
 
 #define emitCalloutConversionOOM(doEmit, symbol, inType, outType) \
   iter_.readConversion(inType, outType, &unused_a) &&             \
       (deadCode_ || doEmit(symbol, inType, outType))
 
 #define emitIntDivCallout(doEmit, symbol, type)   \
   iter_.readBinary(type, &unused_a, &unused_b) && \
-      (deadCode_ || (doEmit(symbol, type), true))
+      (deadCode_ || doEmit(symbol, type))
+
+#ifdef DEBUG
+    // Check that the number of ref-typed entries in the operand stack matches
+    // reality.
+#define CHECK_POINTER_COUNT                                  \
+  do {                                                       \
+    MOZ_ASSERT(countMemRefsOnStk() == smgen_.memRefsOnStk_); \
+  } while (0)
+#else
+#define CHECK_POINTER_COUNT \
+  do {                      \
+  } while (0)
+#endif
 
 #define CHECK(E) \
   if (!(E)) return false
-#define NEXT() continue
+#define NEXT()           \
+  {                      \
+    CHECK_POINTER_COUNT; \
+    continue;            \
+  }
 #define CHECK_NEXT(E)     \
   if (!(E)) return false; \
-  continue
+  {                       \
+    CHECK_POINTER_COUNT;  \
+    continue;             \
+  }
 
     // TODO / EVALUATE (bug 1316845): Not obvious that this attempt at
     // reducing overhead is really paying off relative to making the check
     // every iteration.
 
     if (overhead == 0) {
       // Check every 50 expressions -- a happy medium between
       // memory usage and checking overhead.
@@ -10142,17 +10830,28 @@ bool BaseCompiler::emitBody() {
     // When env_.debugEnabled(), every operator has breakpoint site but Op::End.
     if (env_.debugEnabled() && op.b0 != (uint16_t)Op::End) {
       // TODO sync only registers that can be clobbered by the exit
       // prologue/epilogue or disable these registers for use in
       // baseline compiler when env_.debugEnabled() is set.
       sync();
 
       insertBreakablePoint(CallSiteDesc::Breakpoint);
-    }
+      if (!createStackMap("debug: per insn")) {
+        return false;
+      }
+    }
+
+    // Going below framePushedAtEntryToBody_ would imply that we've popped
+    // part of the frame created by beginFunction() off the machine stack.
+    MOZ_ASSERT(masm.framePushed() >= smgen_.framePushedAtEntryToBody_.value());
+
+    // At this point we're definitely not generating code for a function call.
+    MOZ_ASSERT(smgen_.framePushedBeforePushingCallArgs_.isNothing());
 
     switch (op.b0) {
       case uint16_t(Op::End):
         if (!emitEnd()) {
           return false;
         }
 
         if (iter_.controlStackEmpty()) {
@@ -11034,63 +11733,70 @@ bool BaseCompiler::emitBody() {
 
       default:
         return iter_.unrecognizedOpcode(&op);
     }
 
 #undef CHECK
 #undef NEXT
 #undef CHECK_NEXT
+#undef CHECK_POINTER_COUNT
 #undef emitBinary
 #undef emitUnary
 #undef emitComparison
 #undef emitConversion
 #undef emitConversionOOM
 #undef emitCalloutConversionOOM
 
     MOZ_CRASH("unreachable");
   }
 
   MOZ_CRASH("unreachable");
 }
 
 bool BaseCompiler::emitFunction() {
-  beginFunction();
+  if (!beginFunction()) {
+    return false;
+  }
 
   if (!emitBody()) {
     return false;
   }
 
   if (!endFunction()) {
     return false;
   }
 
   return true;
 }
 
 BaseCompiler::BaseCompiler(const ModuleEnvironment& env,
                            const FuncCompileInput& func,
-                           const ValTypeVector& locals, Decoder& decoder,
+                           const ValTypeVector& locals,
+                           const MachineState& trapExitLayout,
+                           size_t trapExitLayoutNumWords, Decoder& decoder,
                            ExclusiveDeferredValidationState& dvs,
-                           TempAllocator* alloc, MacroAssembler* masm)
+                           TempAllocator* alloc, MacroAssembler* masm,
+                           StackMaps* stackMaps)
     : env_(env),
       iter_(env, decoder, dvs),
       func_(func),
       lastReadCallSite_(0),
       alloc_(*alloc),
       locals_(locals),
       deadCode_(false),
       bceSafe_(0),
       latentOp_(LatentOp::None),
       latentType_(ValType::I32),
       latentIntCmp_(Assembler::Equal),
       latentDoubleCmp_(Assembler::DoubleEqual),
       masm(*masm),
       ra(*this),
       fr(*masm),
+      smgen_(stackMaps, trapExitLayout, trapExitLayoutNumWords, *masm),
       joinRegI32_(RegI32(ReturnReg)),
       joinRegI64_(RegI64(ReturnReg64)),
       joinRegPtr_(RegPtr(ReturnReg)),
       joinRegF32_(RegF32(ReturnFloat32Reg)),
       joinRegF64_(RegF64(ReturnDoubleReg)) {}
 
 bool BaseCompiler::init() {
   if (!SigD_.append(ValType::F64)) {
@@ -11153,16 +11859,19 @@ bool BaseCompiler::init() {
 
   return true;
 }
 
 FuncOffsets BaseCompiler::finish() {
   MOZ_ASSERT(done(), "all bytes must be consumed");
   MOZ_ASSERT(func_.callSiteLineNums.length() == lastReadCallSite_);
 
+  MOZ_ASSERT(stk_.empty());
+  MOZ_ASSERT(smgen_.memRefsOnStk_ == 0);
+
   masm.flushBuffer();
 
   return offsets_;
 }
 
 }  // namespace wasm
 }  // namespace js
 
@@ -11205,33 +11914,39 @@ bool js::wasm::BaselineCompileFunctions(
   WasmMacroAssembler masm(alloc);
 
   // Swap in already-allocated empty vectors to avoid malloc/free.
   MOZ_ASSERT(code->empty());
   if (!code->swap(masm)) {
     return false;
   }
 
+  // Create a description of the stack layout created by GenerateTrapExit().
+  MachineState trapExitLayout;
+  size_t trapExitLayoutNumWords;
+  GenerateTrapExitMachineState(&trapExitLayout, &trapExitLayoutNumWords);
+
   for (const FuncCompileInput& func : inputs) {
     Decoder d(func.begin, func.end, func.lineOrBytecode, error);
 
     // Build the local types vector.
 
     ValTypeVector locals;
     if (!locals.appendAll(env.funcTypes[func.index]->args())) {
       return false;
     }
     if (!DecodeLocalEntries(d, env.kind, env.types, env.gcTypesEnabled(),
                             &locals)) {
       return false;
     }
 
     // One-pass baseline compilation.
 
-    BaseCompiler f(env, func, locals, d, dvs, &alloc, &masm);
+    BaseCompiler f(env, func, locals, trapExitLayout, trapExitLayoutNumWords, d,
+                   dvs, &alloc, &masm, &code->stackMaps);
     if (!f.init()) {
       return false;
     }
     if (!f.emitFunction()) {
       return false;
     }
     if (!code->codeRanges.emplaceBack(func.index, func.lineOrBytecode,
                                       f.finish())) {
@@ -11242,11 +11957,49 @@ bool js::wasm::BaselineCompileFunctions(
   masm.finish();
   if (masm.oom()) {
     return false;
   }
 
   return code->swap(masm);
 }
 
+#ifdef DEBUG
+bool js::wasm::IsValidStackMapKey(bool debugEnabled, const uint8_t* nextPC) {
+#if defined(JS_CODEGEN_X64) || defined(JS_CODEGEN_X86)
+  const uint8_t* insn = nextPC;
+  return (insn[-2] == 0x0F && insn[-1] == 0x0B) ||  // ud2
+         (insn[-2] == 0xFF && insn[-1] == 0xD0) ||  // call *%{rax,eax}
+         insn[-5] == 0xE8 ||                        // call simm32
+         (debugEnabled && insn[-5] == 0x0F && insn[-4] == 0x1F &&
+          insn[-3] == 0x44 && insn[-2] == 0x00 &&
+          insn[-1] == 0x00);  // nop_five
+
+#elif defined(JS_CODEGEN_ARM)
+  const uint32_t* insn = (const uint32_t*)nextPC;
+  return ((uintptr_t(insn) & 3) == 0) &&              // must be ARM, not Thumb
+         (insn[-1] == 0xe7f000f0 ||                   // udf
+          (insn[-1] & 0xfffffff0) == 0xe12fff30 ||    // blx reg (ARM, enc A1)
+          (insn[-1] & 0xff000000) == 0xeb000000 ||    // bl simm24 (ARM, enc A1)
+          (debugEnabled && insn[-1] == 0xe320f000));  // "as_nop"
+
+#elif defined(JS_CODEGEN_ARM64)
+#ifdef JS_SIMULATOR_ARM64
+  const uint32_t hltInsn = 0xd45bd600;
+#else
+  const uint32_t hltInsn = 0xd4a00000;
+#endif
+  const uint32_t* insn = (const uint32_t*)nextPC;
+  return ((uintptr_t(insn) & 3) == 0) &&
+         (insn[-1] == hltInsn ||                      // hlt
+          (insn[-1] & 0xfffffc1f) == 0xd63f0000 ||    // blr reg
+          (insn[-1] & 0xfc000000) == 0x94000000 ||    // bl simm26
+          (debugEnabled && insn[-1] == 0xd503201f));  // nop
+
+#else
+  MOZ_CRASH("IsValidStackMapKey: requires implementation on this platform");
+#endif
+}
+#endif
+
 #undef RABALDR_INT_DIV_I64_CALLOUT
 #undef RABALDR_I64_TO_FLOAT_CALLOUT
 #undef RABALDR_FLOAT_TO_I64_CALLOUT
--- a/js/src/wasm/WasmBaselineCompile.h
+++ b/js/src/wasm/WasmBaselineCompile.h
@@ -76,12 +76,18 @@ class BaseLocalIter {
 #ifdef DEBUG
   bool isArg() const {
     MOZ_ASSERT(!done_);
     return !argsIter_.done();
   }
 #endif
 };
 
+#ifdef DEBUG
+// Check whether |nextPC| is a valid code address for a stackmap created by
+// this compiler.
+bool IsValidStackMapKey(bool debugEnabled, const uint8_t* nextPC);
+#endif
+
 }  // namespace wasm
 }  // namespace js
 
 #endif  // asmjs_wasm_baseline_compile_h
--- a/js/src/wasm/WasmCode.cpp
+++ b/js/src/wasm/WasmCode.cpp
@@ -1256,16 +1256,26 @@ const CodeRange* Code::lookupFuncRange(v
     const CodeRange* result = codeTier(t).lookupRange(pc);
     if (result && result->isFunction()) {
       return result;
     }
   }
   return nullptr;
 }
 
+const StackMap* Code::lookupStackMap(uint8_t* nextPC) const {
+  for (Tier t : tiers()) {
+    const StackMap* result = metadata(t).stackMaps.findMap(nextPC);
+    if (result) {
+      return result;
+    }
+  }
+  return nullptr;
+}
+
 struct TrapSitePCOffset {
   const TrapSiteVector& trapSites;
   explicit TrapSitePCOffset(const TrapSiteVector& trapSites)
       : trapSites(trapSites) {}
   uint32_t operator[](size_t index) const { return trapSites[index].pcOffset; }
 };
 
 bool Code::lookupTrap(void* pc, Trap* trapOut, BytecodeOffset* bytecode) const {
--- a/js/src/wasm/WasmCode.h
+++ b/js/src/wasm/WasmCode.h
@@ -406,16 +406,17 @@ struct MetadataTier {
   const Tier tier;
 
   Uint32Vector funcToCodeRange;
   CodeRangeVector codeRanges;
   CallSiteVector callSites;
   TrapSiteVectorArray trapSites;
   FuncImportVector funcImports;
   FuncExportVector funcExports;
+  StackMaps stackMaps;
 
   // Debug information, not serialized.
   Uint32Vector debugTrapFarJumpOffsets;
 
   FuncExport& lookupFuncExport(uint32_t funcIndex,
                                size_t* funcExportIndex = nullptr);
   const FuncExport& lookupFuncExport(uint32_t funcIndex,
                                      size_t* funcExportIndex = nullptr) const;
@@ -694,16 +695,17 @@ class Code : public ShareableBase<Code> 
   const MetadataTier& metadata(Tier iter) const {
     return codeTier(iter).metadata();
   }
 
   // Metadata lookup functions:
 
   const CallSite* lookupCallSite(void* returnAddress) const;
   const CodeRange* lookupFuncRange(void* pc) const;
+  const StackMap* lookupStackMap(uint8_t* nextPC) const;
   bool containsCodePC(const void* pc) const;
   bool lookupTrap(void* pc, Trap* trap, BytecodeOffset* bytecode) const;
 
   // To save memory, profilingLabels_ are generated lazily when profiling mode
   // is enabled.
 
   void ensureProfilingLabels(bool profilingEnabled) const;
   const char* profilingLabel(uint32_t funcIndex) const;
--- a/js/src/wasm/WasmFrameIter.cpp
+++ b/js/src/wasm/WasmFrameIter.cpp
@@ -37,17 +37,18 @@ WasmFrameIter::WasmFrameIter(JitActivati
     : activation_(activation),
       code_(nullptr),
       codeRange_(nullptr),
       lineOrBytecode_(0),
       fp_(fp ? fp : activation->wasmExitFP()),
       unwoundIonCallerFP_(nullptr),
       unwoundIonFrameType_(jit::FrameType(-1)),
       unwind_(Unwind::False),
-      unwoundAddressOfReturnAddress_(nullptr) {
+      unwoundAddressOfReturnAddress_(nullptr),
+      returnAddressToFp_(nullptr) {
   MOZ_ASSERT(fp_);
 
   // When the stack is captured during a trap (viz., to create the .stack
   // for an Error object), use the pc/bytecode information captured by the
   // signal handler in the runtime.
 
   if (activation->isWasmTrapping()) {
     const TrapData& trapData = activation->wasmTrapData();
@@ -102,16 +103,17 @@ void WasmFrameIter::operator++() {
   }
 
   popFrame();
 }
 
 void WasmFrameIter::popFrame() {
   Frame* prevFP = fp_;
   fp_ = prevFP->callerFP;
+  returnAddressToFp_ = (uint8_t*)prevFP->returnAddress;
 
   if (uintptr_t(fp_) & ExitOrJitEntryFPTag) {
     // We just unwound a frame pointer which has the low bit set,
     // indicating this is a direct call from the jit into the wasm
     // function's body. The call stack resembles this at this point:
     //
     // |---------------------|
     // |      JIT FRAME      |
@@ -298,16 +300,25 @@ DebugFrame* WasmFrameIter::debugFrame() 
 }
 
 jit::FrameType WasmFrameIter::unwoundIonFrameType() const {
   MOZ_ASSERT(unwoundIonCallerFP_);
   MOZ_ASSERT(unwoundIonFrameType_ != jit::FrameType(-1));
   return unwoundIonFrameType_;
 }
 
+uint8_t* WasmFrameIter::returnAddressToFp() const {
+  if (returnAddressToFp_) {
+    return returnAddressToFp_;
+  }
+  MOZ_ASSERT(activation_->isWasmTrapping());
+  // The next instruction is the instruction following the trap instruction.
+  return (uint8_t*)activation_->wasmTrapData().resumePC;
+}
+
 /*****************************************************************************/
 // Prologue/epilogue code generation
 
 // These constants reflect statically-determined offsets in the
 // prologue/epilogue. The offsets are dynamically asserted during code
 // generation.
 #if defined(JS_CODEGEN_X64)
 static const unsigned PushedRetAddr = 0;
--- a/js/src/wasm/WasmFrameIter.h
+++ b/js/src/wasm/WasmFrameIter.h
@@ -64,16 +64,17 @@ class WasmFrameIter {
   const Code* code_;
   const CodeRange* codeRange_;
   unsigned lineOrBytecode_;
   Frame* fp_;
   uint8_t* unwoundIonCallerFP_;
   jit::FrameType unwoundIonFrameType_;
   Unwind unwind_;
   void** unwoundAddressOfReturnAddress_;
+  uint8_t* returnAddressToFp_;
 
   void popFrame();
 
  public:
   // See comment above this class definition.
   explicit WasmFrameIter(jit::JitActivation* activation, Frame* fp = nullptr);
   const jit::JitActivation* activation() const { return activation_; }
   void setUnwind(Unwind unwind) { unwind_ = unwind; }
@@ -88,16 +89,21 @@ class WasmFrameIter {
   unsigned computeLine(uint32_t* column) const;
   const CodeRange* codeRange() const { return codeRange_; }
   Instance* instance() const;
   void** unwoundAddressOfReturnAddress() const;
   bool debugEnabled() const;
   DebugFrame* debugFrame() const;
   jit::FrameType unwoundIonFrameType() const;
   uint8_t* unwoundIonCallerFP() const { return unwoundIonCallerFP_; }
+  Frame* frame() const { return fp_; }
+
+  // Returns the return address of the frame above this one (that is, the
+  // return address that returns back to the current frame).
+  uint8_t* returnAddressToFp() const;
 };
 
 enum class SymbolicAddress;
 
 // An ExitReason describes the possible reasons for leaving compiled wasm
 // code or the state of not having left compiled wasm code
 // (ExitReason::None). It is either a known reason, or a enumeration to a native
 // function that is used for better display in the profiler.
--- a/js/src/wasm/WasmGenerator.cpp
+++ b/js/src/wasm/WasmGenerator.cpp
@@ -614,17 +614,17 @@ static bool AppendForEach(Vec* dstVec, c
   for (T* dst = dstStart; dst != dstEnd; dst++, src++) {
     new (dst) T(*src);
     op(dst - dstBegin, dst);
   }
 
   return true;
 }
 
-bool ModuleGenerator::linkCompiledCode(const CompiledCode& code) {
+bool ModuleGenerator::linkCompiledCode(CompiledCode& code) {
   // All code offsets in 'code' must be incremented by their position in the
   // overall module when the code was appended.
 
   masm_.haltingAlign(CodeAlignment);
   const size_t offsetInModule = masm_.size();
   if (!masm_.appendRawCode(code.bytes.begin(), code.bytes.length())) {
     return false;
   }
@@ -680,16 +680,27 @@ bool ModuleGenerator::linkCompiledCode(c
 #ifdef JS_CODELABEL_LINKMODE
     link.mode = codeLabel.linkMode();
 #endif
     if (!linkData_->internalLinks.append(link)) {
       return false;
     }
   }
 
+  for (size_t i = 0; i < code.stackMaps.length(); i++) {
+    StackMaps::Maplet maplet = code.stackMaps.move(i);
+    maplet.offsetBy(offsetInModule);
+    if (!metadataTier_->stackMaps.add(maplet)) {
+      // This function is now the only owner of maplet.map, so we'd better
+      // free it right now.
+      maplet.map->destroy();
+      return false;
+    }
+  }
+
   return true;
 }
 
 static bool ExecuteCompileTask(CompileTask* task, UniqueChars* error) {
   MOZ_ASSERT(task->lifo.isEmpty());
   MOZ_ASSERT(task->output.empty());
 
   switch (task->env.tier()) {
@@ -905,18 +916,32 @@ bool ModuleGenerator::finishCodegen() {
   MOZ_ASSERT(masm_.symbolicAccesses().empty());
   MOZ_ASSERT(masm_.codeLabels().empty());
 
   masm_.finish();
   return !masm_.oom();
 }
 
 bool ModuleGenerator::finishMetadataTier() {
+  // The stack maps aren't yet sorted.  Do so now, since we'll need to
+  // binary-search them at GC time.
+  metadataTier_->stackMaps.sort();
+
+#ifdef DEBUG
+  // Check that no two stack maps share the same key (nextInsnAddr), since a
+  // duplicate could lead to ambiguity about stack slot pointerness.
+  uint8_t* previousNextInsnAddr = nullptr;
+  for (size_t i = 0; i < metadataTier_->stackMaps.length(); i++) {
+    const StackMaps::Maplet& maplet = metadataTier_->stackMaps.get(i);
+    MOZ_ASSERT_IF(i > 0, uintptr_t(maplet.nextInsnAddr) >
+                             uintptr_t(previousNextInsnAddr));
+    previousNextInsnAddr = maplet.nextInsnAddr;
+  }
+
   // Assert all sorted metadata is sorted.
-#ifdef DEBUG
   uint32_t last = 0;
   for (const CodeRange& codeRange : metadataTier_->codeRanges) {
     MOZ_ASSERT(codeRange.begin() >= last);
     last = codeRange.end();
   }
 
   last = 0;
   for (const CallSite& callSite : metadataTier_->callSites) {
@@ -1004,16 +1029,27 @@ UniqueCodeTier ModuleGenerator::finishCo
   }
 
   UniqueModuleSegment segment =
       ModuleSegment::create(tier(), masm_, *linkData_);
   if (!segment) {
     return nullptr;
   }
 
+  metadataTier_->stackMaps.offsetBy(uintptr_t(segment->base()));
+
+#ifdef DEBUG
+  // Check that each stack map is associated with a plausible instruction.
+  for (size_t i = 0; i < metadataTier_->stackMaps.length(); i++) {
+    MOZ_ASSERT(IsValidStackMapKey(env_->debugEnabled(),
+                                  metadataTier_->stackMaps.get(i).nextInsnAddr),
+               "wasm stack map does not reference a valid insn");
+  }
+#endif
+
   return js::MakeUnique<CodeTier>(std::move(metadataTier_), std::move(segment));
 }
 
 SharedMetadata ModuleGenerator::finishMetadata(const Bytes& bytecode) {
   // Finish initialization of Metadata, which is only needed for constructing
   // the initial Module, not for tier-2 compilation.
   MOZ_ASSERT(mode() != CompileMode::Tier2);
 
--- a/js/src/wasm/WasmGenerator.h
+++ b/js/src/wasm/WasmGenerator.h
@@ -60,36 +60,38 @@ struct CompiledCode {
   Bytes bytes;
   CodeRangeVector codeRanges;
   CallSiteVector callSites;
   CallSiteTargetVector callSiteTargets;
   TrapSiteVectorArray trapSites;
   CallFarJumpVector callFarJumps;
   SymbolicAccessVector symbolicAccesses;
   jit::CodeLabelVector codeLabels;
+  StackMaps stackMaps;
 
   MOZ_MUST_USE bool swap(jit::MacroAssembler& masm);
 
   void clear() {
     bytes.clear();
     codeRanges.clear();
     callSites.clear();
     callSiteTargets.clear();
     trapSites.clear();
     callFarJumps.clear();
     symbolicAccesses.clear();
     codeLabels.clear();
+    stackMaps.clear();
     MOZ_ASSERT(empty());
   }
 
   bool empty() {
     return bytes.empty() && codeRanges.empty() && callSites.empty() &&
            callSiteTargets.empty() && trapSites.empty() &&
            callFarJumps.empty() && symbolicAccesses.empty() &&
-           codeLabels.empty();
+           codeLabels.empty() && stackMaps.empty();
   }
 
   size_t sizeOfExcludingThis(mozilla::MallocSizeOf mallocSizeOf) const;
 };
 
 // The CompileTaskState of a ModuleGenerator contains the mutable state shared
 // between helper threads executing CompileTasks. Each CompileTask started on a
 // helper thread eventually either ends up in the 'finished' list or increments
@@ -178,17 +180,17 @@ class MOZ_STACK_CLASS ModuleGenerator {
 
   bool allocateGlobalBytes(uint32_t bytes, uint32_t align,
                            uint32_t* globalDataOff);
 
   bool funcIsCompiled(uint32_t funcIndex) const;
   const CodeRange& funcCodeRange(uint32_t funcIndex) const;
   bool linkCallSites();
   void noteCodeRange(uint32_t codeRangeIndex, const CodeRange& codeRange);
-  bool linkCompiledCode(const CompiledCode& code);
+  bool linkCompiledCode(CompiledCode& code);
   bool locallyCompileCurrentTask();
   bool finishTask(CompileTask* task);
   bool launchBatchCompile();
   bool finishOutstandingTask();
   bool finishCodegen();
   bool finishMetadataTier();
   UniqueCodeTier finishCodeTier();
   SharedMetadata finishMetadata(const Bytes& bytecode);
--- a/js/src/wasm/WasmInstance.cpp
+++ b/js/src/wasm/WasmInstance.cpp
@@ -22,16 +22,17 @@
 #include "jit/BaselineJIT.h"
 #include "jit/InlinableNatives.h"
 #include "jit/JitCommon.h"
 #include "jit/JitRealm.h"
 #include "util/StringBuffer.h"
 #include "util/Text.h"
 #include "wasm/WasmBuiltins.h"
 #include "wasm/WasmModule.h"
+#include "wasm/WasmStubs.h"
 
 #include "gc/StoreBuffer-inl.h"
 #include "vm/ArrayBufferObject-inl.h"
 #include "vm/JSObject-inl.h"
 
 using namespace js;
 using namespace js::jit;
 using namespace js::wasm;
@@ -1117,16 +1118,90 @@ void Instance::trace(JSTracer* trc) {
   // Technically, instead of having this method, the caller could use
   // Instance::object() to get the owning WasmInstanceObject to mark,
   // but this method is simpler and more efficient. The trace hook of
   // WasmInstanceObject will call Instance::tracePrivate at which point we
   // can mark the rest of the children.
   TraceEdge(trc, &object_, "wasm instance object");
 }
 
+uintptr_t Instance::traceFrame(JSTracer* trc, const wasm::WasmFrameIter& wfi,
+                               uint8_t* nextPC,
+                               uintptr_t highestByteVisitedInPrevFrame) {
+  const StackMap* map = code().lookupStackMap(nextPC);
+  if (!map) {
+    return 0;
+  }
+
+  Frame* frame = wfi.frame();
+
+  // |frame| points somewhere in the middle of the area described by |map|.
+  // We have to calculate |scanStart|, the lowest address that is described by
+  // |map|, by consulting |map->frameOffsetFromTop|.
+
+  const size_t numMappedBytes = map->numMappedWords * sizeof(void*);
+  const uintptr_t scanStart = uintptr_t(frame) +
+                              (map->frameOffsetFromTop * sizeof(void*)) -
+                              numMappedBytes;
+  MOZ_ASSERT(0 == scanStart % sizeof(void*));
+
+  // Do what we can to assert that, for consecutive wasm frames, their stack
+  // maps also abut exactly.  This is a useful sanity check on the sizing of
+  // stack maps.
+  MOZ_ASSERT_IF(highestByteVisitedInPrevFrame != 0,
+                highestByteVisitedInPrevFrame + 1 == scanStart);
+
+  uintptr_t* stackWords = (uintptr_t*)scanStart;
+
+  // If we have some exit stub words, the map also covers an area created by
+  // an exit stub, and so the highest word of that area should hold the
+  // sentinel value written by the code generated by GenerateTrapExit.
+  MOZ_ASSERT_IF(
+      map->numExitStubWords > 0,
+      stackWords[map->numExitStubWords - 1 - TrapExitDummyValueOffsetFromTop] ==
+          TrapExitDummyValue);
+
+  // And actually hand them off to the GC.
+  for (uint32_t i = 0; i < map->numMappedWords; i++) {
+    if (map->getBit(i) == 0) {
+      continue;
+    }
+
+    // This assertion seems at least moderately effective in detecting
+    // discrepancies or misalignments between the map and reality.
+    MOZ_ASSERT(js::gc::IsCellPointerValidOrNull((const void*)stackWords[i]));
+
+    if (stackWords[i]) {
+      TraceRoot(trc, (JSObject**)&stackWords[i],
+                "Instance::traceWasmFrame: normal word");
+    }
+  }
+
+  // Finally, deal with a ref-typed DebugFrame if it is present.
+  if (map->hasRefTypedDebugFrame) {
+    DebugFrame* debugFrame = DebugFrame::from(frame);
+    char* debugFrameP = (char*)debugFrame;
+
+    char* resultRefP = debugFrameP + DebugFrame::offsetOfResults();
+    if (*(intptr_t*)resultRefP) {
+      TraceRoot(trc, (JSObject**)resultRefP,
+                "Instance::traceWasmFrame: DebugFrame::resultRef_");
+    }
+
+    if (debugFrame->hasCachedReturnJSValue()) {
+      char* cachedReturnJSValueP =
+          debugFrameP + DebugFrame::offsetOfCachedReturnJSValue();
+      TraceRoot(trc, (js::Value*)cachedReturnJSValueP,
+                "Instance::traceWasmFrame: DebugFrame::cachedReturnJSValue_");
+    }
+  }
+
+  return scanStart + numMappedBytes - 1;
+}
+
 WasmMemoryObject* Instance::memory() const { return memory_; }
 
 SharedMem<uint8_t*> Instance::memoryBase() const {
   MOZ_ASSERT(metadata().usesMemory());
   MOZ_ASSERT(tlsData()->memoryBase == memory_->buffer().dataPointerEither());
   return memory_->buffer().dataPointerEither();
 }
 
--- a/js/src/wasm/WasmInstance.h
+++ b/js/src/wasm/WasmInstance.h
@@ -79,16 +79,27 @@ class Instance {
            HandleValVector globalImportValues,
            const WasmGlobalObjectVector& globalObjs,
            UniqueDebugState maybeDebug);
   ~Instance();
   bool init(JSContext* cx, const DataSegmentVector& dataSegments,
             const ElemSegmentVector& elemSegments);
   void trace(JSTracer* trc);
 
+  // Trace any GC roots on the stack, for the frame associated with |wfi|,
+  // whose next instruction to execute is |nextPC|.
+  //
+  // For consistency checking of StackMap sizes in debug builds, this also
+  // takes |highestByteVisitedInPrevFrame|, the address of the highest byte
+  // scanned in the frame below this one on the stack.  In turn, it returns
+  // the address of the highest byte scanned in this frame.
+  uintptr_t traceFrame(JSTracer* trc, const wasm::WasmFrameIter& wfi,
+                       uint8_t* nextPC,
+                       uintptr_t highestByteVisitedInPrevFrame);
+
   JS::Realm* realm() const { return realm_; }
   const Code& code() const { return *code_; }
   const CodeTier& code(Tier t) const { return code_->codeTier(t); }
   bool debugEnabled() const { return !!maybeDebug_; }
   DebugState& debug() { return *maybeDebug_; }
   const ModuleSegment& moduleSegment(Tier t) const { return code_->segment(t); }
   TlsData* tlsData() const { return tlsData_.get(); }
   uint8_t* globalData() const { return (uint8_t*)&tlsData_->globalArea; }
--- a/js/src/wasm/WasmStubs.cpp
+++ b/js/src/wasm/WasmStubs.cpp
@@ -241,17 +241,18 @@ static void AssertExpectedSP(const Macro
 #ifdef JS_CODEGEN_ARM64
   MOZ_ASSERT(sp.Is(masm.GetStackPointer64()));
 #endif
 }
 
 template <class Operand>
 static void WasmPush(MacroAssembler& masm, const Operand& op) {
 #ifdef JS_CODEGEN_ARM64
-  // Allocate a pad word so that SP can remain properly aligned.
+  // Allocate a pad word so that SP can remain properly aligned.  |op| will be
+  // written to the lower-addressed of the two words pushed here.
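+  //
+  // Illustratively, assuming a 16-byte WasmPushSize, the stack afterwards
+  // looks like:
+  //
+  //   sp + 8 : pad word (contents undefined)
+  //   sp + 0 : |op|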
   masm.reserveStack(WasmPushSize);
   masm.storePtr(op, Address(masm.getStackPointer(), 0));
 #else
   masm.Push(op);
 #endif
 }
 
 static void WasmPop(MacroAssembler& masm, Register r) {
@@ -268,20 +269,20 @@ static void MoveSPForJitABI(MacroAssembl
 #ifdef JS_CODEGEN_ARM64
   masm.moveStackPtrTo(PseudoStackPointer);
 #endif
 }
 
 #ifdef ENABLE_WASM_GC
 static void SuppressGC(MacroAssembler& masm, int32_t increment,
                        Register scratch) {
-  masm.loadPtr(Address(WasmTlsReg, offsetof(TlsData, cx)), scratch);
-  masm.add32(Imm32(increment),
-             Address(scratch, offsetof(JSContext, suppressGC) +
-                                  js::ThreadData<int32_t>::offsetOfValue()));
+  // masm.loadPtr(Address(WasmTlsReg, offsetof(TlsData, cx)), scratch);
+  // masm.add32(Imm32(increment),
+  //           Address(scratch, offsetof(JSContext, suppressGC) +
+  //                                js::ThreadData<int32_t>::offsetOfValue()));
 }
 #endif
 
 static void CallFuncExport(MacroAssembler& masm, const FuncExport& fe,
                            const Maybe<ImmPtr>& funcPtr) {
   MOZ_ASSERT(fe.hasEagerStubs() == !funcPtr);
   if (funcPtr) {
     masm.call(*funcPtr);
@@ -1773,33 +1774,52 @@ static_assert(!SupportsSimd,
 static const LiveRegisterSet RegsToPreserve(
     GeneralRegisterSet(Registers::AllMask &
                        ~(uint32_t(1) << Registers::StackPointer)),
     FloatRegisterSet(FloatRegisters::AllDoubleMask));
 static_assert(!SupportsSimd,
               "high lanes of SIMD registers need to be saved too");
 #endif
 
+// Generate a MachineState which describes the locations of the GPRs as saved
+// by GenerateTrapExit.  FP registers are ignored.  Note that the values
+// stored in the MachineState are offsets in words downwards from the top of
+// the save area.  That is, a higher value implies a lower address.
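+//
+// Illustrative sketch (assuming a 64-bit non-ARM64 target, where the
+// initial WasmPush occupies a single word): on return, *numWords is 1 plus
+// the number of GPRs in RegsToPreserve; offset 0 is the word written by
+// WasmPush (the dummy return-address slot), and the k-th GPR handed out by
+// the backward iterator below is recorded at offset k, that is, k words
+// below the top of the save area.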
+void wasm::GenerateTrapExitMachineState(MachineState* machine,
+                                        size_t* numWords) {
+  // This is the number of words pushed by the initial WasmPush().
+  *numWords = WasmPushSize / sizeof(void*);
+  MOZ_ASSERT(*numWords == TrapExitDummyValueOffsetFromTop + 1);
+
+  // And these correspond to the PushRegsInMask() that immediately follows.
+  for (GeneralRegisterBackwardIterator iter(RegsToPreserve.gprs()); iter.more();
+       ++iter) {
+    machine->setRegisterLocation(*iter,
+                                 reinterpret_cast<uintptr_t*>(*numWords));
+    (*numWords)++;
+  }
+}
+
 // Generate a stub which calls WasmReportTrap() and can be executed by having
 // the signal handler redirect PC from any trapping instruction.
 static bool GenerateTrapExit(MacroAssembler& masm, Label* throwLabel,
                              Offsets* offsets) {
   AssertExpectedSP(masm);
   masm.haltingAlign(CodeAlignment);
 
   masm.setFramePushed(0);
 
   offsets->begin = masm.currentOffset();
 
   // Traps can only happen at well-defined program points. However, since
   // traps may resume and the optimal assumption for the surrounding code is
   // that registers are not clobbered, we need to preserve all registers in
   // the trap exit. One simplifying assumption is that flags may be clobbered.
   // Push a dummy word to use as return address below.
-  WasmPush(masm, ImmWord(0));
+  WasmPush(masm, ImmWord(TrapExitDummyValue));
   unsigned framePushedBeforePreserve = masm.framePushed();
   masm.PushRegsInMask(RegsToPreserve);
   unsigned offsetOfReturnWord = masm.framePushed() - framePushedBeforePreserve;
 
   // We know that StackPointer is word-aligned, but not necessarily
   // stack-aligned, so we need to align it dynamically.
   Register preAlignStackPointer = ABINonVolatileReg;
   masm.moveStackPtrTo(preAlignStackPointer);
--- a/js/src/wasm/WasmStubs.h
+++ b/js/src/wasm/WasmStubs.h
@@ -39,16 +39,33 @@ extern bool GenerateStubs(const ModuleEn
 
 extern bool GenerateEntryStubs(jit::MacroAssembler& masm,
                                size_t funcExportIndex,
                                const FuncExport& funcExport,
                                const Maybe<jit::ImmPtr>& callee, bool isAsmJS,
                                HasGcTypes gcTypesConfigured,
                                CodeRangeVector* codeRanges);
 
+extern void GenerateTrapExitMachineState(jit::MachineState* machine,
+                                         size_t* numWords);
+
+// A value that is written into the trap exit frame, which is useful for
+// cross-checking during garbage collection.
+static constexpr uintptr_t TrapExitDummyValue = 1337;
+
+// And its offset, in words, down from the highest-addressed word of the trap
+// exit frame.  The value is written into the frame using WasmPush.  In the
+// case where WasmPush allocates more than one word, the value will therefore
+// be written at the lowest-addressed word.
+#ifdef JS_CODEGEN_ARM64
+static constexpr size_t TrapExitDummyValueOffsetFromTop = 1;
+#else
+static constexpr size_t TrapExitDummyValueOffsetFromTop = 0;
+#endif
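+
+// (Illustration: on ARM64, WasmPush reserves two words and writes its
+// operand to the lower-addressed one, so the dummy value ends up one word
+// below the top of the trap exit frame, hence offset 1.  On all other
+// targets WasmPush pushes a single word, so the dummy value is the topmost
+// word, offset 0.)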
+
 // An argument that will end up on the stack according to the system ABI, to be
 // passed to GenerateDirectCallFromJit. Since the direct JIT call creates its
 // own frame, it is its responsibility to put stack arguments to their expected
 // locations; so the caller of GenerateDirectCallFromJit can put them anywhere.
 
 class JitCallStackArg {
  public:
   enum class Tag {
--- a/js/src/wasm/WasmTypes.h
+++ b/js/src/wasm/WasmTypes.h
@@ -17,16 +17,17 @@
  */
 
 #ifndef wasm_types_h
 #define wasm_types_h
 
 #include "mozilla/Alignment.h"
 #include "mozilla/ArrayUtils.h"
 #include "mozilla/Atomics.h"
+#include "mozilla/BinarySearch.h"
 #include "mozilla/EnumeratedArray.h"
 #include "mozilla/HashFunctions.h"
 #include "mozilla/Maybe.h"
 #include "mozilla/RefPtr.h"
 #include "mozilla/Unused.h"
 
 #include "NamespaceImports.h"
 
@@ -1704,16 +1705,222 @@ class CallSiteTarget {
     MOZ_ASSERT(kind_ == TrapExit);
     MOZ_ASSERT(packed_ < uint32_t(Trap::Limit));
     return Trap(packed_);
   }
 };
 
 typedef Vector<CallSiteTarget, 0, SystemAllocPolicy> CallSiteTargetVector;
 
+typedef Vector<bool, 32, SystemAllocPolicy> ExitStubMapVector;
+
+struct StackMap final {
+  // A StackMap is a bit-array containing numMappedWords bits, one bit per
+  // word of stack.  Bit index zero is for the lowest addressed word in the
+  // range.
+  //
+  // This is a variable-length structure whose size must be known at creation
+  // time.
+  //
+  // Users of the map will know the address of the wasm::Frame that is covered
+  // by this map.  In order that they can calculate the exact address range
+  // covered by the map, the map also stores the offset, from the highest
+  // addressed word of the map, of the embedded wasm::Frame.  This is an
+  // offset down from the highest address, rather than up from the lowest, so
+  // as to limit its range to 11 bits, where
+  // 11 == ceil(log2(MaxParams * sizeof-biggest-param-type-in-words))
+  //
+  // The map may also cover a ref-typed DebugFrame.  If so, that is noted,
+  // since users of the map need to trace pointers in such a DebugFrame.
+  //
+  // Finally, for sanity checking only, for stack maps associated with a wasm
+  // trap exit stub, the number of words used by the trap exit stub save area
+  // is also noted.  This is used in Instance::traceFrame to check that the
+  // TrapExitDummyValue is in the expected place in the frame.
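+  //
+  // A worked example (illustrative only): given the wasm::Frame* |frame|
+  // covered by a map, the mapped range of stack bytes can be recovered as
+  //
+  //   scanStart = uintptr_t(frame)
+  //               + frameOffsetFromTop * sizeof(void*)
+  //               - numMappedWords * sizeof(void*)
+  //   scanEnd   = scanStart + numMappedWords * sizeof(void*)
+  //
+  // which is exactly the computation Instance::traceFrame performs.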
+
+  // The total number of stack words covered by the map ..
+  uint32_t numMappedWords : 30;
+
+  // .. of which this many are "exit stub" extras
+  uint32_t numExitStubWords : 6;
+
+  // Where is Frame* relative to the top?  This is an offset in words.
+  uint32_t frameOffsetFromTop : 11;
+
+  // Notes the presence of a ref-typed DebugFrame.
+  uint32_t hasRefTypedDebugFrame : 1;
+
+ private:
+  static constexpr uint32_t maxMappedWords = (1 << 30) - 1;
+  static constexpr uint32_t maxExitStubWords = (1 << 6) - 1;
+  static constexpr uint32_t maxFrameOffsetFromTop = (1 << 11) - 1;
+
+  uint32_t bitmap[1];
+
+  explicit StackMap(uint32_t numMappedWords)
+      : numMappedWords(numMappedWords),
+        numExitStubWords(0),
+        frameOffsetFromTop(0),
+        hasRefTypedDebugFrame(0) {
+    const uint32_t nBitmap = calcNBitmap(numMappedWords);
+    memset(bitmap, 0, nBitmap * sizeof(bitmap[0]));
+  }
+
+ public:
+  static StackMap* create(uint32_t numMappedWords) {
+    uint32_t nBitmap = calcNBitmap(numMappedWords);
+    char* buf =
+        (char*)js_malloc(sizeof(StackMap) + (nBitmap - 1) * sizeof(bitmap[0]));
+    if (!buf) {
+      return nullptr;
+    }
+    return ::new (buf) StackMap(numMappedWords);
+  }
+
+  void destroy() { js_free((char*)this); }
+
+  // Record the number of words in the map used as a wasm trap exit stub
+  // save area.  See comment above.
+  void setExitStubWords(uint32_t nWords) {
+    MOZ_ASSERT(numExitStubWords == 0);
+    MOZ_RELEASE_ASSERT(nWords <= maxExitStubWords);
+    MOZ_ASSERT(nWords <= numMappedWords);
+    numExitStubWords = nWords;
+  }
+
+  // Record the offset, down from the highest-addressed word of the map, at
+  // which the wasm::Frame lives.  See comment above.
+  void setFrameOffsetFromTop(uint32_t nWords) {
+    MOZ_ASSERT(frameOffsetFromTop == 0);
+    MOZ_RELEASE_ASSERT(nWords <= maxFrameOffsetFromTop);
+    MOZ_ASSERT(frameOffsetFromTop < numMappedWords);
+    frameOffsetFromTop = nWords;
+  }
+
+  // If the frame described by this StackMap includes a DebugFrame for a
+  // ref-typed return value, call here to record that fact.
+  void setHasRefTypedDebugFrame() {
+    MOZ_ASSERT(hasRefTypedDebugFrame == 0);
+    hasRefTypedDebugFrame = 1;
+  }
+
+  inline void setBit(uint32_t bitIndex) {
+    MOZ_ASSERT(bitIndex < numMappedWords);
+    uint32_t wordIndex = bitIndex / wordsPerBitmapElem;
+    uint32_t wordOffset = bitIndex % wordsPerBitmapElem;
+    bitmap[wordIndex] |= (1 << wordOffset);
+  }
+
+  inline uint32_t getBit(uint32_t bitIndex) const {
+    MOZ_ASSERT(bitIndex < numMappedWords);
+    uint32_t wordIndex = bitIndex / wordsPerBitmapElem;
+    uint32_t wordOffset = bitIndex % wordsPerBitmapElem;
+    return (bitmap[wordIndex] >> wordOffset) & 1;
+  }
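+
+  // (Worked example for the bit indexing above: with the 32-bit bitmap
+  // elements used here, bit index 37 lives in bitmap[1], bit 5.)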
+
+ private:
+  static constexpr uint32_t wordsPerBitmapElem = sizeof(bitmap[0]) * 8;
+
+  static uint32_t calcNBitmap(uint32_t numMappedWords) {
+    MOZ_RELEASE_ASSERT(numMappedWords <= maxMappedWords);
+    uint32_t nBitmap =
+        (numMappedWords + wordsPerBitmapElem - 1) / wordsPerBitmapElem;
+    return nBitmap == 0 ? 1 : nBitmap;
+  }
+};
+
+// This is the expected size for a map that covers 32 or fewer words.
+static_assert(sizeof(StackMap) == 12, "wasm::StackMap has unexpected size");
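+// (Worked size calculation, assuming the bit-fields pack into two 32-bit
+// units, as they do on the compilers this code targets: 30 + 6 + 11 + 1 ==
+// 48 bits of bit-fields occupy 8 bytes, and the single built-in bitmap
+// element adds 4 more, for 12 in total.  A map covering more than 32 words
+// needs extra bitmap elements; for example, StackMap::create(40) allocates
+// sizeof(StackMap) + (2 - 1) * sizeof(bitmap[0]) == 16 bytes.)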
+
+class StackMaps {
+ public:
+  // A Maplet holds a single code-address-to-map binding.  Note that the
+  // code address is the lowest address of the instruction immediately
+  // following the instruction of interest, not of the instruction of
+  // interest itself.  In practice (at least for the Wasm Baseline compiler)
+  // this means that |nextInsnAddr| points either immediately after a call
+  // instruction, after a trap instruction or after a no-op.
+  struct Maplet {
+    uint8_t* nextInsnAddr;
+    StackMap* map;
+    Maplet(uint8_t* nextInsnAddr, StackMap* map)
+        : nextInsnAddr(nextInsnAddr), map(map) {}
+    void offsetBy(uintptr_t delta) { nextInsnAddr += delta; }
+    bool operator<(const Maplet& other) const {
+      return uintptr_t(nextInsnAddr) < uintptr_t(other.nextInsnAddr);
+    }
+  };
+
+ private:
+  bool sorted_;
+  Vector<Maplet, 0, SystemAllocPolicy> mapping_;
+
+ public:
+  StackMaps() : sorted_(false) {}
+  ~StackMaps() {
+    for (size_t i = 0; i < mapping_.length(); i++) {
+      mapping_[i].map->destroy();
+      mapping_[i].map = nullptr;
+    }
+  }
+  MOZ_MUST_USE bool add(uint8_t* nextInsnAddr, StackMap* map) {
+    MOZ_ASSERT(!sorted_);
+    return mapping_.append(Maplet(nextInsnAddr, map));
+  }
+  MOZ_MUST_USE bool add(const Maplet& maplet) {
+    return add(maplet.nextInsnAddr, maplet.map);
+  }
+  void clear() {
+    for (size_t i = 0; i < mapping_.length(); i++) {
+      mapping_[i].nextInsnAddr = nullptr;
+      mapping_[i].map = nullptr;
+    }
+    mapping_.clear();
+  }
+  bool empty() const { return mapping_.empty(); }
+  size_t length() const { return mapping_.length(); }
+  Maplet get(size_t i) const { return mapping_[i]; }
+  Maplet move(size_t i) {
+    Maplet m = mapping_[i];
+    mapping_[i].map = nullptr;
+    return m;
+  }
+  void offsetBy(uintptr_t delta) {
+    for (size_t i = 0; i < mapping_.length(); i++) {
+      mapping_[i].offsetBy(delta);
+    }
+  }
+  void sort() {
+    MOZ_ASSERT(!sorted_);
+    std::sort(mapping_.begin(), mapping_.end());
+    sorted_ = true;
+  }
+  const StackMap* findMap(uint8_t* nextInsnAddr) const {
+    struct Comparator {
+      int operator()(Maplet aVal) const {
+        if (uintptr_t(mTarget) < uintptr_t(aVal.nextInsnAddr)) {
+          return -1;
+        }
+        if (uintptr_t(mTarget) > uintptr_t(aVal.nextInsnAddr)) {
+          return 1;
+        }
+        return 0;
+      }
+      explicit Comparator(uint8_t* aTarget) : mTarget(aTarget) {}
+      const uint8_t* mTarget;
+    };
+
+    size_t result;
+    if (BinarySearchIf(mapping_, 0, mapping_.length(), Comparator(nextInsnAddr),
+                       &result)) {
+      return mapping_[result].map;
+    }
+
+    return nullptr;
+  }
+};
+
 // A wasm::SymbolicAddress represents a pointer to a well-known function that is
 // embedded in wasm code. Since wasm code is serialized and later deserialized
 // into a different address space, symbolic addresses must be used for *all*
 // pointers into the address space. The MacroAssembler records a list of all
 // SymbolicAddresses and the offsets of their use in the code for later patching
 // during static linking.
 
 enum class SymbolicAddress {
@@ -2248,16 +2455,17 @@ class DebugFrame {
   GlobalObject* global() const;
   JSObject* environmentChain() const;
   bool getLocal(uint32_t localIndex, MutableHandleValue vp);
 
   // The return value must be written from the unboxed representation in the
   // results union into cachedReturnJSValue_ by updateReturnJSValue() before
   // returnValue() can return a Handle to it.
 
+  bool hasCachedReturnJSValue() const { return hasCachedReturnJSValue_; }
   void updateReturnJSValue();
   HandleValue returnValue() const;
   void clearReturnJSValue();
 
   // Once the debugger observes a frame, it must be notified via
   // onLeaveFrame() before the frame is popped. Calling observe() ensures the
   // leave frame traps are enabled. Both methods are idempotent so the caller
   // doesn't have to worry about calling them more than once.
@@ -2286,16 +2494,19 @@ class DebugFrame {
   void setHasCachedSavedFrame() { hasCachedSavedFrame_ = true; }
   void clearHasCachedSavedFrame() { hasCachedSavedFrame_ = false; }
 
   // DebugFrame is accessed directly by JIT code.
 
   static constexpr size_t offsetOfResults() {
     return offsetof(DebugFrame, resultI32_);
   }
+  static constexpr size_t offsetOfCachedReturnJSValue() {
+    return offsetof(DebugFrame, cachedReturnJSValue_);
+  }
   static constexpr size_t offsetOfFlagsWord() {
     return offsetof(DebugFrame, flagsWord_);
   }
   static constexpr size_t offsetOfFuncIndex() {
     return offsetof(DebugFrame, funcIndex_);
   }
   static constexpr size_t offsetOfFrame() {
     return offsetof(DebugFrame, frame_);