Bug 1316843 - annotate TODO items with bugzilla references. r=me DONTBUILD
author: Lars T Hansen <lhansen@mozilla.com>
date: Fri, 11 Nov 2016 12:47:27 +0100
changeset: 352191 3f1ec7d616d64068b447ede8d6fe2d59d2cf09b2
parent: 352190 fb5ae665310e4f60464ccd43b2626fdfc2087a67
child: 352192 f0103ae8bdd0200622f03235c02c3101bc9610a5
push id: 6795
push user: jlund@mozilla.com
push date: Mon, 23 Jan 2017 14:19:46 +0000
treeherder: mozilla-esr52@76101b503191
reviewers: me
bugs: 1316843
milestone: 52.0a1
Bug 1316843 - annotate TODO items with bugzilla references. r=me DONTBUILD
js/src/wasm/WasmBaselineCompile.cpp
--- a/js/src/wasm/WasmBaselineCompile.cpp
+++ b/js/src/wasm/WasmBaselineCompile.cpp
@@ -15,87 +15,89 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
 
 /* WebAssembly baseline compiler ("RabaldrMonkey")
  *
  * General status notes:
  *
- * "FIXME" indicates a known or suspected bug.
+ * "FIXME" indicates a known or suspected bug.  Always has a bug#.
  *
- * "TODO" indicates an opportunity for a general improvement, with an
- * additional tag to indicate the area of improvement.  These are
- * generally not filed as bugs.
+ * "TODO" indicates an opportunity for a general improvement, with an additional
+ * tag to indicate the area of improvement.  Usually has a bug#.
  *
  * Unimplemented functionality:
  *
  *  - Tiered compilation (bug 1277562)
  *  - profiler support / devtools (bug 1286948)
  *  - SIMD
  *  - Atomics
  *
- * There are lots of machine dependencies here but they are pretty
- * well isolated to a segment of the compiler.  Many dependencies
- * will eventually be factored into the MacroAssembler layer and shared
- * with other code generators.
+ * There are lots of machine dependencies here but they are pretty well isolated
+ * to a segment of the compiler.  Many dependencies will eventually be factored
+ * into the MacroAssembler layer and shared with other code generators.
  *
  *
  * High-value compiler performance improvements:
  *
- * - The specific-register allocator (the needI32(r), needI64(r) etc
- *   methods) can avoid syncing the value stack if the specific
- *   register is in use but there is a free register to shuffle the
- *   specific register into.  (This will also improve the generated
- *   code.)  The sync happens often enough here to show up in
- *   profiles, because it is triggered by integer multiply and divide.
+ * - (Bug 1316802) The specific-register allocator (the needI32(r), needI64(r)
+ *   etc methods) can avoid syncing the value stack if the specific register is
+ *   in use but there is a free register to shuffle the specific register into.
+ *   (This will also improve the generated code.)  The sync happens often
+ *   enough here to show up in profiles, because it is triggered by integer
+ *   multiply and divide.  (A sketch of this idea follows this comment.)
  *
  *
  * High-value code generation improvements:
  *
- * - Many opportunities for cheaply folding in a constant rhs, we do
- *   this already for I32 add and shift operators, this reduces
- *   register pressure and instruction count.
+ * - (Bug 1316803) Opportunities for cheaply folding a constant rhs into
+ *   arithmetic operations; we do this already for I32 add and the shift
+ *   operators.  This reduces register pressure and instruction count.
+ *
+ * - (Bug 1286816) Opportunities for cheaply folding in a constant rhs to
+ *   conditionals.
  *
- * - Boolean evaluation for control can be optimized by pushing a
- *   bool-generating operation onto the value stack in the same way
- *   that we now push latent constants and local lookups, or (easier)
- *   by remembering the operation in a side location if the next Expr
- *   will consume it.
+ * - (Bug 1286816) Boolean evaluation for control can be optimized by pushing a
+ *   bool-generating operation onto the value stack in the same way that we now
+ *   push latent constants and local lookups, or (easier) by remembering the
+ *   operation in a side location if the next Expr will consume it.
  *
- * - Conditional branches (br_if and br_table) pessimize by branching
- *   over code that performs stack cleanup and a branch.  But if no
- *   cleanup is needed we could just branch conditionally to the
- *   target.
+ * - (Bug 1286816) brIf pessimizes by branching over code that performs stack
+ *   cleanup and a branch.  If no cleanup is needed we can just branch
+ *   conditionally to the target.
+ *
+ * - (Bug 1316804) brTable pessimizes by always dispatching to code that pops
+ *   the stack and then jumps to the code for the target case.  If no cleanup is
+ *   needed we could just branch conditionally to the target; if the same amount
+ *   of cleanup is needed for all cases then the cleanup can be done before the
+ *   dispatch.  Both situations are common in practice.
  *
- * - Register management around calls: At the moment we sync the value
- *   stack unconditionally (this is simple) but there are probably
- *   many common cases where we could instead save/restore live
- *   caller-saves registers and perform parallel assignment into
- *   argument registers.  This may be important if we keep some locals
- *   in registers.
+ * - (Bug 1316806) Register management around calls: At the moment we sync the
+ *   value stack unconditionally (this is simple) but there are probably many
+ *   common cases where we could instead save/restore live caller-saves
+ *   registers and perform parallel assignment into argument registers.  This
+ *   may be important if we keep some locals in registers.
  *
- * - Allocate some locals to registers on machines where there are
- *   enough registers.  This is probably hard to do well in a one-pass
- *   compiler but it might be that just keeping register arguments and
- *   the first few locals in registers is a viable strategy; another
- *   (more general) strategy is caching locals in registers in
- *   straight-line code.  Such caching could also track constant
- *   values in registers, if that is deemed valuable.  A combination
- *   of techniques may be desirable: parameters and the first few
- *   locals could be cached on entry to the function but not
- *   statically assigned to registers throughout.
+ * - (Bug 1316808) Allocate some locals to registers on machines where there are
+ *   enough registers.  This is probably hard to do well in a one-pass compiler
+ *   but it might be that just keeping register arguments and the first few
+ *   locals in registers is a viable strategy; another (more general) strategy
+ *   is caching locals in registers in straight-line code.  Such caching could
+ *   also track constant values in registers, if that is deemed valuable.  A
+ *   combination of techniques may be desirable: parameters and the first few
+ *   locals could be cached on entry to the function but not statically assigned
+ *   to registers throughout.
  *
- *   (On a large corpus of code it should be possible to compute, for
- *   every signature comprising the types of parameters and locals,
- *   and using a static weight for loops, a list in priority order of
- *   which parameters and locals that should be assigned to registers.
- *   Or something like that.  Wasm makes this simple.  Static
- *   assignments are desirable because they are not flushed to memory
- *   by the pre-block sync() call.)
+ *   (On a large corpus of code it should be possible to compute, for every
+ *   signature comprising the types of parameters and locals, and using a static
+ *   weight for loops, a priority-ordered list of which parameters and locals
+ *   should be assigned to registers.  Or something like that.  Wasm makes
+ *   this simple.  Static assignments are desirable because they are not flushed
+ *   to memory by the pre-block sync() call.)
  */
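
A minimal sketch of the Bug 1316802 idea from the list above.  The
takeAnyFreeGPR() helper and the renameInValueStack() bookkeeping are
hypothetical; neither exists in the current allocator:

    // Sketch only: when the requested register is occupied but some other
    // GPR is free, relocate the occupant instead of syncing the whole
    // value stack.
    void needI32(RegI32 specific) {
        if (!isAvailable(specific.reg)) {
            if (hasGPR()) {
                RegI32 other(takeAnyFreeGPR());        // hypothetical helper
                masm.move32(specific.reg, other.reg);  // relocate the occupant
                renameInValueStack(specific, other);   // hypothetical bookkeeping
                freeI32(specific);
            } else {
                sync();  // no register to shuffle into; spill everything
            }
        }
        allocGPR(specific.reg);
    }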
 
 #include "wasm/WasmBaselineCompile.h"
 
 #include "mozilla/MathAlgorithms.h"
 
 #include "jit/AtomicOp.h"
 #include "jit/IonTypes.h"
@@ -137,18 +139,19 @@ struct BaseCompilePolicy : ExprIterPolic
 
     // The baseline compiler tracks values on a stack of its own -- it
     // needs to scan that stack for spilling -- and thus has no need
     // for the values maintained by the iterator.
     //
     // The baseline compiler tracks control items on a stack of its
     // own as well.
     //
-    // TODO / REDUNDANT: It would be nice if we could make use of the
-    // iterator's ControlItems and not require our own stack for that.
+    // TODO / REDUNDANT (Bug 1316814): It would be nice if we could
+    // make use of the iterator's ControlItems and not require our own
+    // stack for that.
 };
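
A minimal sketch of the Bug 1316814 direction, assuming the iterator consults
a ControlItem typedef on its policy, and that the compiler's Control record
(defined further down) could be hoisted to satisfy it:

    // Sketch only: let the iterator carry the control items so the
    // compiler no longer maintains a parallel control stack.
    struct BaseCompilePolicy : ExprIterPolicy
    {
        typedef BaseControl ControlItem;  // hypothetical hoisted Control type
    };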
 
 typedef ExprIter<BaseCompilePolicy> BaseExprIter;
 
 typedef bool IsUnsigned;
 typedef bool IsSigned;
 typedef bool ZeroOnOverflow;
 typedef bool IsKnownNotZero;
@@ -300,18 +303,19 @@ class BaseCompiler
         }
     };
 
     typedef UniquePtr<PooledLabel, UniquePooledLabelFreePolicy> UniquePooledLabel;
 
     // The strongly typed register wrappers have saved my bacon a few
     // times; though they are largely redundant they stay, for now.
 
-    // TODO / INVESTIGATE: Things would probably be simpler if these
-    // inherited from Register, Register64, and FloatRegister.
+    // TODO / INVESTIGATE (Bug 1316815): Things would probably be
+    // simpler if these inherited from Register, Register64, and
+    // FloatRegister.
 
     struct RegI32
     {
         RegI32() : reg(Register::Invalid()) {}
         explicit RegI32(Register reg) : reg(reg) {}
         Register reg;
         bool operator==(const RegI32& that) { return reg == that.reg; }
         bool operator!=(const RegI32& that) { return reg != that.reg; }
@@ -1004,72 +1008,72 @@ class BaseCompiler
     }
 
     void freeF32(RegF32 r) {
         freeFPU(r.reg);
     }
 
     MOZ_MUST_USE RegI32 needI32() {
         if (!hasGPR())
-            sync();            // TODO / OPTIMIZE: improve this
+            sync();            // TODO / OPTIMIZE: improve this (Bug 1316802)
         return RegI32(allocGPR());
     }
 
     void needI32(RegI32 specific) {
         if (!isAvailable(specific.reg))
-            sync();            // TODO / OPTIMIZE: improve this
+            sync();            // TODO / OPTIMIZE: improve this (Bug 1316802)
         allocGPR(specific.reg);
     }
 
     // TODO / OPTIMIZE: need2xI32() can be optimized along with needI32()
-    // to avoid sync().
+    // to avoid sync(). (Bug 1316802)
 
     void need2xI32(RegI32 r0, RegI32 r1) {
         needI32(r0);
         needI32(r1);
     }
 
     MOZ_MUST_USE RegI64 needI64() {
         if (!hasInt64())
-            sync();            // TODO / OPTIMIZE: improve this
+            sync();            // TODO / OPTIMIZE: improve this (Bug 1316802)
         return RegI64(allocInt64());
     }
 
     void needI64(RegI64 specific) {
         if (!isAvailable(specific.reg))
-            sync();            // TODO / OPTIMIZE: improve this
+            sync();            // TODO / OPTIMIZE: improve this (Bug 1316802)
         allocInt64(specific.reg);
     }
 
     void need2xI64(RegI64 r0, RegI64 r1) {
         needI64(r0);
         needI64(r1);
     }
 
     MOZ_MUST_USE RegF32 needF32() {
         if (!hasFPU<MIRType::Float32>())
-            sync();            // TODO / OPTIMIZE: improve this
+            sync();            // TODO / OPTIMIZE: improve this (Bug 1316802)
         return RegF32(allocFPU<MIRType::Float32>());
     }
 
     void needF32(RegF32 specific) {
         if (!isAvailable(specific.reg))
-            sync();            // TODO / OPTIMIZE: improve this
+            sync();            // TODO / OPTIMIZE: improve this (Bug 1316802)
         allocFPU(specific.reg);
     }
 
     MOZ_MUST_USE RegF64 needF64() {
         if (!hasFPU<MIRType::Double>())
-            sync();            // TODO / OPTIMIZE: improve this
+            sync();            // TODO / OPTIMIZE: improve this (Bug 1316802)
         return RegF64(allocFPU<MIRType::Double>());
     }
 
     void needF64(RegF64 specific) {
         if (!isAvailable(specific.reg))
-            sync();            // TODO / OPTIMIZE: improve this
+            sync();            // TODO / OPTIMIZE: improve this (Bug 1316802)
         allocFPU(specific.reg);
     }
 
     void moveI32(RegI32 src, RegI32 dest) {
         if (src != dest)
             masm.move32(src.reg, dest.reg);
     }
 
@@ -1127,17 +1131,18 @@ class BaseCompiler
             break;
           default:
             MOZ_CRASH("Compiler bug: Expected int on stack");
         }
     }
 
     // TODO / OPTIMIZE: Refactor loadI64, loadF64, and loadF32 in the
     // same way as loadI32 to avoid redundant dispatch in callers of
-    // these load() functions.
+    // these load() functions.  (Bug 1316816, also see annotations on
+    // popI64 et al below.)
 
     void loadI64(Register64 r, Stk& src) {
         switch (src.kind()) {
           case Stk::ConstI64:
             masm.move64(Imm64(src.i64val()), r);
             break;
           case Stk::MemI64:
             loadFromFrameI64(r, src.offs());
@@ -1245,17 +1250,17 @@ class BaseCompiler
             MOZ_CRASH("Compiler bug: expected float on stack");
         }
     }
 
     // Flush all local and register value stack elements to memory.
     //
     // TODO / OPTIMIZE: As this is fairly expensive and causes worse
     // code to be emitted subsequently, it is useful to avoid calling
-    // it.
+    // it.  (Bug 1316802)
     //
     // Some optimization has been done already.  Remaining
     // opportunities:
     //
     //  - It would be interesting to see if we can specialize it
     //    before calls with particularly simple signatures, or where
     //    we can do parallel assignment of register arguments, or
     //    similar.  See notes in emitCall().
@@ -1374,17 +1379,17 @@ class BaseCompiler
             if (kind <= Stk::LocalLast && stk_[i-1].slot() == slot)
                 return true;
         }
         return false;
     }
 
     void syncLocal(uint32_t slot) {
         if (hasLocal(slot))
-            sync();            // TODO / OPTIMIZE: Improve this?
+            sync();            // TODO / OPTIMIZE: Improve this?  (Bug 1316817)
     }
 
     // Push the register r onto the stack.
 
     void pushI32(RegI32 r) {
         MOZ_ASSERT(!isAvailable(r.reg));
         Stk& x = push();
         x.setI32Reg(r);
@@ -1511,17 +1516,17 @@ class BaseCompiler
         stk_.popBack();
         return specific;
     }
 
     // PRIVATE.  Call only from other popI64() variants.
     // v must be the stack top.
 
     void popI64(Stk& v, RegI64 r) {
-        // TODO / OPTIMIZE: avoid loadI64() here
+        // TODO / OPTIMIZE: avoid loadI64() here.  (Bug 1316816)
         switch (v.kind()) {
           case Stk::ConstI64:
           case Stk::LocalI64:
             loadI64(r.reg, v);
             break;
           case Stk::MemI64:
 #ifdef JS_PUNBOX64
             masm.Pop(r.reg.reg);
@@ -1570,17 +1575,17 @@ class BaseCompiler
         stk_.popBack();
         return specific;
     }
 
     // PRIVATE.  Call only from other popF64() variants.
     // v must be the stack top.
 
     void popF64(Stk& v, RegF64 r) {
-        // TODO / OPTIMIZE: avoid loadF64 here
+        // TODO / OPTIMIZE: avoid loadF64 here.  (Bug 1316816)
         switch (v.kind()) {
           case Stk::ConstF64:
           case Stk::LocalF64:
             loadF64(r.reg, v);
             break;
           case Stk::MemF64:
             masm.Pop(r.reg);
             break;
@@ -1619,17 +1624,17 @@ class BaseCompiler
         stk_.popBack();
         return specific;
     }
 
     // PRIVATE.  Call only from other popF32() variants.
     // v must be the stack top.
 
     void popF32(Stk& v, RegF32 r) {
-        // TODO / OPTIMIZE: avoid loadF32 here
+        // TODO / OPTIMIZE: avoid loadF32 here.  (Bug 1316816)
         switch (v.kind()) {
           case Stk::ConstF32:
           case Stk::LocalF32:
             loadF32(r.reg, v);
             break;
           case Stk::MemF32:
             masm.Pop(r.reg);
             break;
@@ -1673,21 +1678,21 @@ class BaseCompiler
         Stk& v = stk_.back();
         if (v.kind() != Stk::ConstI32)
             return false;
         c = v.i32val();
         stk_.popBack();
         return true;
     }
 
-    // TODO / OPTIMIZE: At the moment we use ReturnReg for JoinReg.
-    // It is possible other choices would lead to better register
-    // allocation, as ReturnReg is often first in the register set and
-    // will be heavily wanted by the register allocator that uses
-    // takeFirst().
+    // TODO / OPTIMIZE (Bug 1316818): At the moment we use ReturnReg
+    // for JoinReg.  It is possible other choices would lead to better
+    // register allocation, as ReturnReg is often first in the
+    // register set and will be heavily wanted by the register
+    // allocator that uses takeFirst().
     //
     // Obvious options:
     //  - pick a register at the back of the register set
     //  - pick a random register per block (different blocks have
     //    different join regs)
     //
     // On the other hand, we sync() before every block and only the
     // JoinReg is live out of the block.  But on the way out, we
@@ -1948,19 +1953,19 @@ class BaseCompiler
             popValueStackTo(ctl_.back().stackSize);
     }
 
     Control& controlItem(uint32_t relativeDepth) {
         return ctl_[ctl_.length() - 1 - relativeDepth];
     }
 
     MOZ_MUST_USE PooledLabel* newLabel() {
-        // TODO / INVESTIGATE: allocate() is fallible, but we can
-        // probably rely on an infallible allocator here.  That would
-        // simplify code later.
+        // TODO / INVESTIGATE (Bug 1316819): allocate() is fallible, but we can
+        // probably rely on an infallible allocator here.  That would simplify
+        // code later.
         PooledLabel* candidate = labelPool_.allocate();
         if (!candidate)
             return nullptr;
         return new (candidate) PooledLabel(this);
     }
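
A minimal sketch of the Bug 1316819 simplification, assuming a hypothetical
infallible (e.g. ballast-backed) variant of the pool's allocate():

    // Sketch only: with infallible allocation the null check disappears,
    // and callers of newLabel() no longer need to propagate failure.
    PooledLabel* newLabel() {
        return new (labelPool_.allocateInfallible()) PooledLabel(this);
    }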
 
     void freeLabel(PooledLabel* label) {
         label->~PooledLabel();
@@ -2021,16 +2026,18 @@ class BaseCompiler
         }
 
         // The TLS pointer is always passed as a hidden argument in WasmTlsReg.
         // Save it into its assigned local slot.
         storeToFramePtr(WasmTlsReg, localInfo_[tlsSlot_].offs());
 
         // Initialize the stack locals to zero.
         //
+        // The following are all Bug 1316820:
+        //
         // TODO / OPTIMIZE: on x64, at least, scratch will be a 64-bit
         // register and we can move 64 bits at a time.
         //
         // TODO / OPTIMIZE: On SSE2 or better SIMD systems we may be
         // able to store 128 bits at a time.  (I suppose on some
         // systems we have 512-bit SIMD for that matter.)
         //
         // TODO / OPTIMIZE: if we have only one initializing store
@@ -2161,19 +2168,18 @@ class BaseCompiler
             masm.freeStack(adjustment);
 
         if (call.reloadMachineStateAfter) {
             loadFromFramePtr(WasmTlsReg, frameOffsetFromSlot(tlsSlot_, MIRType::Pointer));
             masm.loadWasmPinnedRegsFromTls();
         }
     }
 
-    // TODO / OPTIMIZE: This is expensive; let's roll the iterator
-    // walking into the walking done for passArg.  See comments in
-    // passArg.
+    // TODO / OPTIMIZE (Bug 1316820): This is expensive; let's roll the iterator
+    // walking into the walking done for passArg.  See comments in passArg.
 
     size_t stackArgAreaSize(const ValTypeVector& args) {
         ABIArgIter<const ValTypeVector> i(args);
         while (!i.done())
             i++;
         return AlignBytes(i.stackBytesConsumedSoFar(), 16u);
     }
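
A minimal sketch of the Bug 1316820 idea: fold the sizing walk into the
argument-passing walk so the expensive ABIArgIter runs once.  passArgAt() is a
hypothetical per-argument helper hoisted out of passArg(), and the sketch
ignores that the stack area must be reserved before stack arguments can be
stored, which is the real obstacle:

    // Sketch only: assign arguments and measure the stack area in a
    // single iterator walk.
    size_t passAllArgs(const ValTypeVector& args, FunctionCall& call) {
        ABIArgIter<const ValTypeVector> i(args);
        for (; !i.done(); i++)
            passArgAt(i, call);  // hypothetical: pass argument i
        return AlignBytes(i.stackBytesConsumedSoFar(), 16u);
    }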
 
@@ -2185,21 +2191,22 @@ class BaseCompiler
         if (adjustment)
             masm.reserveStack(adjustment);
     }
 
     const ABIArg reservePointerArgument(FunctionCall& call) {
         return call.abi.next(MIRType::Pointer);
     }
 
-    // TODO / OPTIMIZE: Note passArg is used only in one place.  (Or it was,
-    // until Luke wandered through, but that can be fixed again.)  I'm not
-    // saying we should manually inline it, but we could hoist the dispatch into
-    // the caller and have type-specific implementations of passArg:
-    // passArgI32(), etc.  Then those might be inlined, at least in PGO builds.
+    // TODO / OPTIMIZE (Bug 1316820): Note passArg is used only in one place.
+    // (Or it was, until Luke wandered through, but that can be fixed again.)
+    // I'm not saying we should manually inline it, but we could hoist the
+    // dispatch into the caller and have type-specific implementations of
+    // passArg: passArgI32(), etc.  Then those might be inlined, at least in PGO
+    // builds.
     //
     // The bulk of the work here (60%) is in the next() call, though.
     //
     // Notably, since next() is so expensive, stackArgAreaSize() becomes
     // expensive too.
     //
     // Somehow there could be a trick here where the sequence of
     // argument types (read from the input stream) leads to a cached
@@ -2958,17 +2965,18 @@ class BaseCompiler
     }
 #endif // I64_TO_FLOAT_CALLOUT
 
     void cmp64Set(Assembler::Condition cond, RegI64 lhs, RegI64 rhs, RegI32 dest) {
 #if defined(JS_CODEGEN_X64)
         masm.cmpq(rhs.reg.reg, lhs.reg.reg);
         masm.emitSet(cond, dest.reg);
 #elif defined(JS_CODEGEN_X86) || defined(JS_CODEGEN_ARM)
-        // TODO / OPTIMIZE: This is pretty branchy, we should be able to do better.
+        // TODO / OPTIMIZE (Bug 1316822): This is pretty branchy, we should be
+        // able to do better.
         Label done, condTrue;
         masm.branch64(cond, lhs.reg, rhs.reg, &condTrue);
         masm.move32(Imm32(0), dest.reg);
         masm.jump(&done);
         masm.bind(&condTrue);
         masm.move32(Imm32(1), dest.reg);
         masm.bind(&done);
 #else
@@ -3740,39 +3748,39 @@ BaseCompiler::emitAddI32()
         freeI32(r1);
         pushI32(r0);
     }
 }
 
 void
 BaseCompiler::emitAddI64()
 {
-    // TODO / OPTIMIZE: Ditto check for constant here
+    // TODO / OPTIMIZE: Ditto check for constant here (Bug 1316803)
     RegI64 r0, r1;
     pop2xI64(&r0, &r1);
     masm.add64(r1.reg, r0.reg);
     freeI64(r1);
     pushI64(r0);
 }
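
A minimal sketch of the constant-rhs fold for the I64 case, mirroring what the
I32 path already does.  popConstI64() is a hypothetical analogue of the
popConstI32() above, and add64(Imm64, Register64) is an assumed
MacroAssembler overload:

    // Sketch only: fold a constant rhs instead of materializing it.
    void emitAddI64() {
        int64_t c;
        if (popConstI64(&c)) {            // hypothetical I64 variant
            RegI64 r = popI64();
            masm.add64(Imm64(c), r.reg);  // assumed overload
            pushI64(r);
        } else {
            RegI64 r0, r1;
            pop2xI64(&r0, &r1);
            masm.add64(r1.reg, r0.reg);
            freeI64(r1);
            pushI64(r0);
        }
    }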
 
 void
 BaseCompiler::emitAddF64()
 {
-    // TODO / OPTIMIZE: Ditto check for constant here
+    // TODO / OPTIMIZE: Ditto check for constant here (Bug 1316803)
     RegF64 r0, r1;
     pop2xF64(&r0, &r1);
     masm.addDouble(r1.reg, r0.reg);
     freeF64(r1);
     pushF64(r0);
 }
 
 void
 BaseCompiler::emitAddF32()
 {
-    // TODO / OPTIMIZE: Ditto check for constant here
+    // TODO / OPTIMIZE: Ditto check for constant here (Bug 1316803)
     RegF32 r0, r1;
     pop2xF32(&r0, &r1);
     masm.addFloat32(r1.reg, r0.reg);
     freeF32(r1);
     pushF32(r0);
 }
 
 void
@@ -3813,28 +3821,28 @@ BaseCompiler::emitSubtractF64()
     masm.subDouble(r1.reg, r0.reg);
     freeF64(r1);
     pushF64(r0);
 }
 
 void
 BaseCompiler::emitMultiplyI32()
 {
-    // TODO / OPTIMIZE: Multiplication by constant is common (bug 1275442)
+    // TODO / OPTIMIZE: Multiplication by constant is common (Bug 1275442, 1316803)
     RegI32 r0, r1;
     pop2xI32ForIntMulDiv(&r0, &r1);
     masm.mul32(r1.reg, r0.reg);
     freeI32(r1);
     pushI32(r0);
 }
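
A minimal sketch of the constant-multiply fold, using IsPowerOfTwo and
FloorLog2 from mozilla/MathAlgorithms.h (already included above) and assuming
a hypothetical mul32(Imm32, Register) overload for the general constant case:

    // Sketch only: strength-reduce multiplication by a power of two.
    void emitMultiplyI32() {
        int32_t c;
        if (popConstI32(&c)) {
            RegI32 r = popI32();
            if (c > 0 && mozilla::IsPowerOfTwo(uint32_t(c)))
                masm.lshift32(Imm32(mozilla::FloorLog2(uint32_t(c))), r.reg);
            else
                masm.mul32(Imm32(c), r.reg);  // assumed overload
            pushI32(r);
            return;
        }
        RegI32 r0, r1;
        pop2xI32ForIntMulDiv(&r0, &r1);
        masm.mul32(r1.reg, r0.reg);
        freeI32(r1);
        pushI32(r0);
    }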
 
 void
 BaseCompiler::emitMultiplyI64()
 {
-    // TODO / OPTIMIZE: Multiplication by constant is common (bug 1275442)
+    // TODO / OPTIMIZE: Multiplication by constant is common (Bug 1275442, 1316803)
     RegI64 r0, r1;
     RegI32 temp;
 #if defined(JS_CODEGEN_X64)
     // srcDest must be rax, and rdx will be clobbered.
     need2xI64(specific_rax, specific_rdx);
     r1 = popI64();
     r0 = popI64ToSpecific(specific_rax);
     freeI64(specific_rdx);
@@ -3872,67 +3880,67 @@ BaseCompiler::emitMultiplyF64()
     masm.mulDouble(r1.reg, r0.reg);
     freeF64(r1);
     pushF64(r0);
 }
 
 void
 BaseCompiler::emitQuotientI32()
 {
-    // TODO / OPTIMIZE: Fast case if lhs >= 0 and rhs is power of two.
+    // TODO / OPTIMIZE: Fast case if lhs >= 0 and rhs is power of two (Bug 1316803)
     RegI32 r0, r1;
     pop2xI32ForIntMulDiv(&r0, &r1);
 
     Label done;
     checkDivideByZeroI32(r1, r0, &done);
     checkDivideSignedOverflowI32(r1, r0, &done, ZeroOnOverflow(false));
     masm.quotient32(r1.reg, r0.reg, IsUnsigned(false));
     masm.bind(&done);
 
     freeI32(r1);
     pushI32(r0);
 }
 
 void
 BaseCompiler::emitQuotientU32()
 {
-    // TODO / OPTIMIZE: Fast case if lhs >= 0 and rhs is power of two.
+    // TODO / OPTIMIZE: Fast case if lhs >= 0 and rhs is power of two (Bug 1316803)
     RegI32 r0, r1;
     pop2xI32ForIntMulDiv(&r0, &r1);
 
     Label done;
     checkDivideByZeroI32(r1, r0, &done);
     masm.quotient32(r1.reg, r0.reg, IsUnsigned(true));
     masm.bind(&done);
 
     freeI32(r1);
     pushI32(r0);
 }
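
A minimal sketch of the power-of-two fast path for the unsigned case (the
signed case additionally needs the lhs known non-negative).  peekConstI32() is
a hypothetical non-popping variant of popConstI32():

    // Sketch only: unsigned division by 2^k is a logical right shift, and
    // a constant non-zero divisor needs no divide-by-zero check.
    void emitQuotientU32() {
        int32_t c;
        if (peekConstI32(&c) && c > 0 && mozilla::IsPowerOfTwo(uint32_t(c))) {
            popConstI32(&c);
            RegI32 r = popI32();
            masm.rshift32(Imm32(mozilla::FloorLog2(uint32_t(c))), r.reg);
            pushI32(r);
            return;
        }
        // ... general path as above ...
    }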
 
 void
 BaseCompiler::emitRemainderI32()
 {
-    // TODO / OPTIMIZE: Fast case if lhs >= 0 and rhs is power of two.
+    // TODO / OPTIMIZE: Fast case if lhs >= 0 and rhs is power of two (Bug 1316803)
     RegI32 r0, r1;
     pop2xI32ForIntMulDiv(&r0, &r1);
 
     Label done;
     checkDivideByZeroI32(r1, r0, &done);
     checkDivideSignedOverflowI32(r1, r0, &done, ZeroOnOverflow(true));
     masm.remainder32(r1.reg, r0.reg, IsUnsigned(false));
     masm.bind(&done);
 
     freeI32(r1);
     pushI32(r0);
 }
 
 void
 BaseCompiler::emitRemainderU32()
 {
-    // TODO / OPTIMIZE: Fast case if lhs >= 0 and rhs is power of two.
+    // TODO / OPTIMIZE: Fast case if lhs >= 0 and rhs is power of two (Bug 1316803)
     RegI32 r0, r1;
     pop2xI32ForIntMulDiv(&r0, &r1);
 
     Label done;
     checkDivideByZeroI32(r1, r0, &done);
     masm.remainder32(r1.reg, r0.reg, IsUnsigned(true));
     masm.bind(&done);
 
@@ -4031,33 +4039,34 @@ BaseCompiler::emitMaxI32()
 }
 
 void
 BaseCompiler::emitMinMaxI32(Assembler::Condition cond)
 {
     Label done;
     RegI32 r0, r1;
     pop2xI32(&r0, &r1);
-    // TODO / OPTIMIZE: Use conditional move on some platforms?
+    // TODO / OPTIMIZE (bug 1316823): Use conditional move on some platforms?
     masm.branch32(cond, r0.reg, r1.reg, &done);
     moveI32(r1, r0);
     masm.bind(&done);
     freeI32(r1);
     pushI32(r0);
 }
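
A minimal sketch of the bug 1316823 idea; the cmp32()/cmov32() pairing is
illustrative and cmov32() is a hypothetical MacroAssembler helper (cmovcc on
x86, a conditional mov on ARM):

    // Sketch only: branchless min/max via conditional move.
    void emitMinMaxI32(Assembler::Condition cond) {
        RegI32 r0, r1;
        pop2xI32(&r0, &r1);
        // r0 = cond(r0, r1) ? r0 : r1, without a branch.
        masm.cmp32(r0.reg, r1.reg);
        masm.cmov32(Assembler::InvertCondition(cond), r1.reg, r0.reg);
        freeI32(r1);
        pushI32(r0);
    }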
 
 void
 BaseCompiler::emitMinF32()
 {
     RegF32 r0, r1;
     pop2xF32(&r0, &r1);
     if (!isCompilingAsmJS()) {
         // Convert signaling NaN to quiet NaNs.
-        // TODO / OPTIMIZE: Don't do this if one of the operands is known to
-        // be a constant.
+        //
+        // TODO / OPTIMIZE (bug 1316824): Don't do this if one of the operands
+        // is known to be a constant.
         ScratchF32 zero(*this);
         masm.loadConstantFloat32(0.f, zero);
         masm.subFloat32(zero, r0.reg);
         masm.subFloat32(zero, r1.reg);
     }
     masm.minFloat32(r1.reg, r0.reg, HandleNaNSpecially(true));
     freeF32(r1);
     pushF32(r0);
@@ -4065,17 +4074,18 @@ BaseCompiler::emitMinF32()
 
 void
 BaseCompiler::emitMaxF32()
 {
     RegF32 r0, r1;
     pop2xF32(&r0, &r1);
     if (!isCompilingAsmJS()) {
         // Convert signaling NaN to quiet NaNs.
-        // TODO / OPTIMIZE: see comment in emitMinF32.
+        //
+        // TODO / OPTIMIZE (bug 1316824): see comment in emitMinF32.
         ScratchF32 zero(*this);
         masm.loadConstantFloat32(0.f, zero);
         masm.subFloat32(zero, r0.reg);
         masm.subFloat32(zero, r1.reg);
     }
     masm.maxFloat32(r1.reg, r0.reg, HandleNaNSpecially(true));
     freeF32(r1);
     pushF32(r0);
@@ -4083,17 +4093,18 @@ BaseCompiler::emitMaxF32()
 
 void
 BaseCompiler::emitMinF64()
 {
     RegF64 r0, r1;
     pop2xF64(&r0, &r1);
     if (!isCompilingAsmJS()) {
         // Convert signaling NaN to quiet NaNs.
-        // TODO / OPTIMIZE: see comment in emitMinF32.
+        //
+        // TODO / OPTIMIZE (bug 1316824): see comment in emitMinF32.
         ScratchF64 zero(*this);
         masm.loadConstantDouble(0, zero);
         masm.subDouble(zero, r0.reg);
         masm.subDouble(zero, r1.reg);
     }
     masm.minDouble(r1.reg, r0.reg, HandleNaNSpecially(true));
     freeF64(r1);
     pushF64(r0);
@@ -4101,17 +4112,18 @@ BaseCompiler::emitMinF64()
 
 void
 BaseCompiler::emitMaxF64()
 {
     RegF64 r0, r1;
     pop2xF64(&r0, &r1);
     if (!isCompilingAsmJS()) {
         // Convert signaling NaN to quiet NaNs.
-        // TODO / OPTIMIZE: see comment in emitMinF32.
+        //
+        // TODO / OPTIMIZE (bug 1316824): see comment in emitMinF32.
         ScratchF64 zero(*this);
         masm.loadConstantDouble(0, zero);
         masm.subDouble(zero, r0.reg);
         masm.subDouble(zero, r1.reg);
     }
     masm.maxDouble(r1.reg, r0.reg, HandleNaNSpecially(true));
     freeF64(r1);
     pushF64(r0);
@@ -4231,17 +4243,17 @@ BaseCompiler::emitShlI32()
         freeI32(r1);
         pushI32(r0);
     }
 }
 
 void
 BaseCompiler::emitShlI64()
 {
-    // TODO / OPTIMIZE: Constant rhs
+    // TODO / OPTIMIZE: Constant rhs (Bug 1316803)
     RegI64 r0, r1;
     pop2xI64ForShiftOrRotate(&r0, &r1);
     masm.lshift64(lowPart(r1), r0.reg);
     freeI64(r1);
     pushI64(r0);
 }
 
 void
@@ -4260,17 +4272,17 @@ BaseCompiler::emitShrI32()
         freeI32(r1);
         pushI32(r0);
     }
 }
 
 void
 BaseCompiler::emitShrI64()
 {
-    // TODO / OPTIMIZE: Constant rhs
+    // TODO / OPTIMIZE: Constant rhs (Bug 1316803)
     RegI64 r0, r1;
     pop2xI64ForShiftOrRotate(&r0, &r1);
     masm.rshift64Arithmetic(lowPart(r1), r0.reg);
     freeI64(r1);
     pushI64(r0);
 }
 
 void
@@ -4289,82 +4301,82 @@ BaseCompiler::emitShrU32()
         freeI32(r1);
         pushI32(r0);
     }
 }
 
 void
 BaseCompiler::emitShrU64()
 {
-    // TODO / OPTIMIZE: Constant rhs
+    // TODO / OPTIMIZE: Constant rhs (Bug 1316803)
     RegI64 r0, r1;
     pop2xI64ForShiftOrRotate(&r0, &r1);
     masm.rshift64(lowPart(r1), r0.reg);
     freeI64(r1);
     pushI64(r0);
 }
 
 void
 BaseCompiler::emitRotrI32()
 {
-    // TODO / OPTIMIZE: Constant rhs
+    // TODO / OPTIMIZE: Constant rhs (Bug 1316803)
     RegI32 r0, r1;
     pop2xI32ForShiftOrRotate(&r0, &r1);
     masm.rotateRight(r1.reg, r0.reg, r0.reg);
     freeI32(r1);
     pushI32(r0);
 }
 
 void
 BaseCompiler::emitRotrI64()
 {
-    // TODO / OPTIMIZE: Constant rhs
+    // TODO / OPTIMIZE: Constant rhs (Bug 1316803)
     RegI64 r0, r1;
     pop2xI64ForShiftOrRotate(&r0, &r1);
     masm.rotateRight64(lowPart(r1), r0.reg, r0.reg, maybeHighPart(r1));
     freeI64(r1);
     pushI64(r0);
 }
 
 void
 BaseCompiler::emitRotlI32()
 {
-    // TODO / OPTIMIZE: Constant rhs
+    // TODO / OPTIMIZE: Constant rhs (Bug 1316803)
     RegI32 r0, r1;
     pop2xI32ForShiftOrRotate(&r0, &r1);
     masm.rotateLeft(r1.reg, r0.reg, r0.reg);
     freeI32(r1);
     pushI32(r0);
 }
 
 void
 BaseCompiler::emitRotlI64()
 {
-    // TODO / OPTIMIZE: Constant rhs
+    // TODO / OPTIMIZE: Constant rhs (Bug 1316803)
     RegI64 r0, r1;
     pop2xI64ForShiftOrRotate(&r0, &r1);
     masm.rotateLeft64(lowPart(r1), r0.reg, r0.reg, maybeHighPart(r1));
     freeI64(r1);
     pushI64(r0);
 }
 
 void
 BaseCompiler::emitEqzI32()
 {
-    // TODO / OPTIMIZE: Boolean evaluation for control
+    // TODO / OPTIMIZE: Boolean evaluation for control (Bug 1286816)
     RegI32 r0 = popI32();
     masm.cmp32Set(Assembler::Equal, r0.reg, Imm32(0), r0.reg);
     pushI32(r0);
 }
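
A minimal sketch of the "boolean evaluation for control" idea (Bug 1286816):
rather than materializing the boolean, record a latent operation in a side
location for the next branch to consume.  latentOp_ and its plumbing are
hypothetical:

    // Sketch only: defer the eqz.  emitBrIf() would branch on the operand
    // directly; any other popI32() would force materialization.
    void emitEqzI32() {
        latentOp_ = LatentOp::Eqz;   // hypothetical side state
        latentType_ = ValType::I32;  // operand stays on the value stack
    }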
 
 void
 BaseCompiler::emitEqzI64()
 {
-    // TODO / OPTIMIZE: Boolean evaluation for control
-    // TODO / OPTIMIZE: Avoid the temp register
+    // TODO / OPTIMIZE: Boolean evaluation for control (Bug 1286816)
+    // TODO / OPTIMIZE: Avoid the temp register (Bug 1316848)
     RegI64 r0 = popI64();
     RegI64 r1 = needI64();
     setI64(0, r1);
     RegI32 i0 = fromI64(r0);
     cmp64Set(Assembler::Equal, r0, r1, i0);
     freeI64(r1);
     freeI64Except(r0, i0);
     pushI32(i0);
@@ -4438,17 +4450,17 @@ BaseCompiler::emitBitNotI32()
     RegI32 r0 = popI32();
     masm.not32(r0.reg);
     pushI32(r0);
 }
 
 void
 BaseCompiler::emitAbsI32()
 {
-    // TODO / OPTIMIZE: Use conditional move on some platforms?
+    // TODO / OPTIMIZE (bug 1316823): Use conditional move on some platforms?
     Label nonnegative;
     RegI32 r0 = popI32();
     masm.branch32(Assembler::GreaterThanOrEqual, r0.reg, Imm32(0), &nonnegative);
     masm.neg32(r0.reg);
     masm.bind(&nonnegative);
     pushI32(r0);
 }
 
@@ -5075,19 +5087,19 @@ BaseCompiler::emitBrIf()
     if (!iter_.readBrIf(&relativeDepth, &type, &unused_value, &unused_condition))
         return false;
 
     if (deadCode_)
         return true;
 
     Control& target = controlItem(relativeDepth);
 
-    // TODO / OPTIMIZE: Optimize boolean evaluation for control by
-    // allowing a conditional expression to be left on the stack and
-    // reified here as part of the branch instruction.
+    // TODO / OPTIMIZE (Bug 1286816): Optimize boolean evaluation for control by
+    // allowing a conditional expression to be left on the stack and reified
+    // here as part of the branch instruction.
 
     // Don't use joinReg for rc
     maybeReserveJoinRegI(type);
 
     // Condition value is on top, always I32.
     RegI32 rc = popI32();
 
     maybeUnreserveJoinRegI(type);
@@ -5161,16 +5173,19 @@ BaseCompiler::emitBrTable()
     masm.branch32(Assembler::Below, rc.reg, Imm32(tableLength), &dispatchCode);
 
     // This is the out-of-range stub.  rc is dead here but we don't need it.
 
     popStackBeforeBranch(controlItem(defaultDepth).framePushed);
     masm.jump(controlItem(defaultDepth).label);
 
     // Emit stubs.  rc is dead in all of these but we don't need it.
+    //
+    // TODO / OPTIMIZE (Bug 1316804): Branch directly to the case code if we
+    // can, don't emit an intermediate stub.
 
     for (uint32_t i = 0; i < tableLength; i++) {
         PooledLabel* stubLabel = newLabel();
         // The labels in the vector are in the TempAllocator and will
         // be freed by and by.
         if (!stubLabel)
             return false;
         stubs.infallibleAppend(stubLabel);
@@ -5333,29 +5348,28 @@ BaseCompiler::pushReturned(const Functio
         pushF64(rv);
         break;
       }
       default:
         MOZ_CRASH("Function return type");
     }
 }
 
-// For now, always sync() at the beginning of the call to easily save
-// live values.
+// For now, always sync() at the beginning of the call to easily save live
+// values.
 //
-// TODO / OPTIMIZE: We may be able to avoid a full sync(), since all
-// we want is to save live registers that won't be saved by the callee
-// or that we need for outgoing args - we don't need to sync the
-// locals.  We can just push the necessary registers, it'll be like a
-// lightweight sync.
+// TODO / OPTIMIZE (Bug 1316806): We may be able to avoid a full sync(), since
+// all we want is to save live registers that won't be saved by the callee or
+// that we need for outgoing args -- we don't need to sync the locals.  We can
+// just push the necessary registers; it'll be like a lightweight sync.
 //
-// Even some of the pushing may be unnecessary if the registers
-// will be consumed by the call, because then what we want is
-// parallel assignment to the argument registers or onto the stack
-// for outgoing arguments.  A sync() is just simpler.
+// Even some of the pushing may be unnecessary if the registers will be consumed
+// by the call, because then what we want is parallel assignment to the argument
+// registers or onto the stack for outgoing arguments.  A sync() is just
+// simpler.
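
A minimal sketch of that lightweight sync (Bug 1316806), keeping the real
sync()'s shape but touching only register items.  The stackToMem*() helpers
are hypothetical names for "push the register and retag the stack entry as a
memory item", and the Stk register-kind names follow the surrounding code:

    // Sketch only: save live register items across a call; constants and
    // locals need no flushing.
    void saveLiveRegistersForCall() {
        for (size_t i = 0; i < stk_.length(); i++) {
            Stk& v = stk_[i];
            switch (v.kind()) {
              case Stk::RegisterI32: stackToMemI32(v); break;
              case Stk::RegisterI64: stackToMemI64(v); break;
              case Stk::RegisterF32: stackToMemF32(v); break;
              case Stk::RegisterF64: stackToMemF64(v); break;
              default: break;  // ConstX and LocalX entries stay put
            }
        }
    }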
 
 bool
 BaseCompiler::emitCall()
 {
     uint32_t lineOrBytecode = readCallSiteLineOrBytecode();
 
     uint32_t funcIndex;
     if (!iter_.readCall(&funcIndex))
@@ -5383,18 +5397,18 @@ BaseCompiler::emitCall()
 
     if (import)
         callImport(mg_.funcImportGlobalDataOffsets[funcIndex], baselineCall);
     else
         callDefinition(funcIndex, baselineCall);
 
     endCall(baselineCall);
 
-    // TODO / OPTIMIZE: It would be better to merge this freeStack()
-    // into the one in endCall, if we can.
+    // TODO / OPTIMIZE (bug 1316827): It would be better to merge this
+    // freeStack() into the one in endCall, if we can.
 
     popValueStackBy(numArgs);
     masm.freeStack(stackSpace);
 
     if (!IsVoid(sig.ret()))
         pushReturned(baselineCall, sig.ret());
 
     return true;
@@ -5452,18 +5466,18 @@ BaseCompiler::emitCallIndirect(bool oldS
 
     endCall(baselineCall);
 
     // For new style calls, the callee was popped off the compiler's
     // stack above.
 
     popValueStackBy(oldStyle ? numArgs + 1 : numArgs);
 
-    // TODO / OPTIMIZE: It would be better to merge this freeStack()
-    // into the one in endCall, if we can.
+    // TODO / OPTIMIZE (bug 1316827): It would be better to merge this
+    // freeStack() into the one in endCall, if we can.
 
     masm.freeStack(stackSpace);
 
     if (!IsVoid(sig.ret()))
         pushReturned(baselineCall, sig.ret());
 
     return true;
 }
@@ -5485,18 +5499,18 @@ BaseCompiler::emitCommonMathCall(uint32_
 
     if (!iter_.readCallReturn(retType))
       return false;
 
     builtinCall(callee, baselineCall);
 
     endCall(baselineCall);
 
-    // TODO / OPTIMIZE: It would be better to merge this freeStack()
-    // into the one in endCall, if we can.
+    // TODO / OPTIMIZE (bug 1316827): It would be better to merge this
+    // freeStack() into the one in endCall, if we can.
 
     popValueStackBy(numArgs);
     masm.freeStack(stackSpace);
 
     pushReturned(baselineCall, retType);
 
     return true;
 }
@@ -5959,18 +5973,18 @@ BaseCompiler::emitLoad(ValType type, Sca
 {
     LinearMemoryAddress<Nothing> addr;
     if (!iter_.readLoad(type, Scalar::byteSize(viewType), &addr))
         return false;
 
     if (deadCode_)
         return true;
 
-    // TODO / OPTIMIZE: Disable bounds checking on constant accesses
-    // below the minimum heap length.
+    // TODO / OPTIMIZE (bug 1316831): Disable bounds checking on constant
+    // accesses below the minimum heap length.
 
     MemoryAccessDesc access(viewType, addr.align, addr.offset, trapIfNotAsmJS());
 
     size_t temps = loadStoreTemps(access);
     RegI32 tmp1 = temps >= 1 ? needI32() : invalidI32();
     RegI32 tmp2 = temps >= 2 ? needI32() : invalidI32();
 
     switch (type) {
@@ -6042,18 +6056,18 @@ BaseCompiler::emitStore(ValType resultTy
     LinearMemoryAddress<Nothing> addr;
     Nothing unused_value;
     if (!iter_.readStore(resultType, Scalar::byteSize(viewType), &addr, &unused_value))
         return false;
 
     if (deadCode_)
         return true;
 
-    // TODO / OPTIMIZE: Disable bounds checking on constant accesses
-    // below the minimum heap length.
+    // TODO / OPTIMIZE (bug 1316831): Disable bounds checking on constant
+    // accesses below the minimum heap length.
 
     MemoryAccessDesc access(viewType, addr.align, addr.offset, trapIfNotAsmJS());
 
     size_t temps = loadStoreTemps(access);
     RegI32 tmp1 = temps >= 1 ? needI32() : invalidI32();
     RegI32 tmp2 = temps >= 2 ? needI32() : invalidI32();
 
     switch (resultType) {
@@ -6112,18 +6126,18 @@ BaseCompiler::emitTeeStore(ValType resul
     LinearMemoryAddress<Nothing> addr;
     Nothing unused_value;
     if (!iter_.readTeeStore(resultType, Scalar::byteSize(viewType), &addr, &unused_value))
         return false;
 
     if (deadCode_)
         return true;
 
-    // TODO / OPTIMIZE: Disable bounds checking on constant accesses
-    // below the minimum heap length.
+    // TODO / OPTIMIZE (bug 1316831): Disable bounds checking on constant
+    // accesses below the minimum heap length.
 
     MemoryAccessDesc access(viewType, addr.align, addr.offset, trapIfNotAsmJS());
 
     size_t temps = loadStoreTemps(access);
     RegI32 tmp1 = temps >= 1 ? needI32() : invalidI32();
     RegI32 tmp2 = temps >= 2 ? needI32() : invalidI32();
 
     switch (resultType) {
@@ -6244,24 +6258,24 @@ BaseCompiler::emitSelect()
     freeI32(rc);
 
     return true;
 }
 
 void
 BaseCompiler::emitCompareI32(JSOp compareOp, MCompare::CompareType compareType)
 {
-    // TODO / OPTIMIZE: if we want to generate good code for boolean
-    // operators for control it is possible to delay generating code
-    // here by pushing a compare operation on the stack, after all it
-    // is side-effect free.  The popping code for br_if will handle it
-    // differently, but other popI32() will just force code generation.
+    // TODO / OPTIMIZE (bug 1286816): if we want to generate good code for
+    // boolean operators for control, it is possible to delay generating code
+    // here by pushing a compare operation on the stack; after all, it is
+    // side-effect free.  The popping code for br_if will handle it differently,
+    // but other popI32() calls will just force code generation.
     //
-    // TODO / OPTIMIZE: Comparisons against constants using the same
-    // popConstant pattern as for add().
+    // TODO / OPTIMIZE (bug 1286816): Comparisons against constants using the
+    // same popConstant pattern as for add().
 
     MOZ_ASSERT(compareType == MCompare::Compare_Int32 || compareType == MCompare::Compare_UInt32);
     RegI32 r0, r1;
     pop2xI32(&r0, &r1);
     bool u = compareType == MCompare::Compare_UInt32;
     switch (compareOp) {
       case JSOP_EQ:
         masm.cmp32Set(Assembler::Equal, r0.reg, r1.reg, r0.reg);
@@ -6405,18 +6419,18 @@ BaseCompiler::emitTeeStoreWithCoercion(V
     LinearMemoryAddress<Nothing> addr;
     Nothing unused_value;
     if (!iter_.readTeeStore(resultType, Scalar::byteSize(viewType), &addr, &unused_value))
         return false;
 
     if (deadCode_)
         return true;
 
-    // TODO / OPTIMIZE: Disable bounds checking on constant accesses
-    // below the minimum heap length.
+    // TODO / OPTIMIZE (bug 1316831): Disable bounds checking on constant
+    // accesses below the minimum heap length.
 
     MemoryAccessDesc access(viewType, addr.align, addr.offset, trapIfNotAsmJS());
 
     size_t temps = loadStoreTemps(access);
     RegI32 tmp1 = temps >= 1 ? needI32() : invalidI32();
     RegI32 tmp2 = temps >= 2 ? needI32() : invalidI32();
 
     if (resultType == ValType::F32 && viewType == Scalar::Float64) {
@@ -6542,18 +6556,18 @@ BaseCompiler::emitBody()
 #define emitCalloutConversionOOM(doEmit, symbol, inType, outType) \
         iter_.readConversion(inType, outType, &unused_a) && \
             (deadCode_ || doEmit(symbol, inType, outType))
 
 #define CHECK(E)      if (!(E)) goto done
 #define NEXT()        continue
 #define CHECK_NEXT(E) if (!(E)) goto done; continue
 
-        // TODO / EVALUATE: Not obvious that this attempt at reducing
-        // overhead is really paying off relative to making the check
+        // TODO / EVALUATE (bug 1316845): Not obvious that this attempt at
+        // reducing overhead is really paying off relative to making the check
         // every iteration.
 
         if (overhead == 0) {
             // Check every 50 expressions -- a happy medium between
             // memory usage and checking overhead.
             overhead = 50;
 
             // Checking every 50 expressions should be safe, as the