Bug 1356215 - LUL: increase size of unwound stack to 160k. r=froydnj.
authorJulian Seward <jseward@acm.org>
Tue, 18 Apr 2017 10:30:14 +0200
changeset 353908 0510315b5641dbab766002a58b306eb8b880569c
parent 353907 58c4711b8ca2cf89a74368f93c93d8fcd2b5c6f0
child 353909 2eca4618f579c77f8fd3f5303b19b999fcc3a177
push id 31681
push user kwierso@gmail.com
push date Thu, 20 Apr 2017 00:11:50 +0000
treeherder mozilla-central@20dff607fb88
reviewers froydnj
bugs 1356215
milestone 55.0a1
For reasons related to the architecture of the Gecko Profiler in previous years, which are no longer relevant, LUL will only unwind through the first 32KB of stack. This is mostly harmless, since current measurements show that most stacks are smaller than 4KB, but stacks occasionally exceed 32KB, causing unwinding to stop prematurely. This patch raises the maximum to 160KB and documents the rationale for copying the stack and then unwinding the copy, rather than unwinding in place. 160KB is big enough for all stacks observed in several minutes of profiling all threads at 1KHz.
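For orientation, here is a minimal sketch of the copy-then-unwind pattern the patch documents. SnapshotBuf and TakeStackSnapshot are hypothetical names introduced only for this sketch; the real code is DoNativeBacktrace in platform.cpp, shown in the diff below.

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>

static const size_t N_STACK_BYTES = 160 * 1024;  // the limit this patch raises

// Hypothetical snapshot buffer, shaped like lul::StackImage.
struct SnapshotBuf {
  uintptr_t mStartAvma;                // process address of mContents[0]
  size_t    mLen;                      // number of valid bytes
  uint8_t   mContents[N_STACK_BYTES];  // the copy itself
};

// Copy up to N_STACK_BYTES from |sp| upwards, never going past |stackTop|.
// The unwinder afterwards reads only from |buf|, so a dynamic analysis tool
// sees one easily-handled memcpy instead of reads scattered deep inside the
// unwinder, and the sampled thread's live stack is never touched again.
static void TakeStackSnapshot(SnapshotBuf* buf, uintptr_t sp,
                              uintptr_t stackTop) {
  size_t avail = (stackTop > sp) ? (stackTop - sp) : 0;
  size_t nBytes = std::min(avail, N_STACK_BYTES);
  buf->mStartAvma = sp;
  buf->mLen = nBytes;
  memcpy(buf->mContents, reinterpret_cast<const void*>(sp), nBytes);
}

A real implementation also starts the copy a little below the stack pointer to cover the platform's red zone, as the hunk below does.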
tools/profiler/core/platform.cpp
tools/profiler/lul/LulMain.h
--- a/tools/profiler/core/platform.cpp
+++ b/tools/profiler/core/platform.cpp
@@ -941,16 +941,42 @@ DoNativeBacktrace(PS::LockRef aLock, Pro
 # error "Unknown plat"
 #endif
 
   // Copy up to N_STACK_BYTES from rsp-REDZONE upwards, but not going past the
   // stack's registered top point.  Do some basic sanity checks too.  This
   // assumes that the TaggedUWord holding the stack pointer value is valid, but
   // it should be, since it was constructed that way in the code just above.
 
+  // We could construct |stackImg| so that LUL reads directly from the stack in
+  // question, rather than from a copy of it.  That would reduce overhead and
+  // space use a bit.  However, it causes a problem for dynamic analysis tools
+  // (ASan, TSan, Valgrind): such tools will report invalid or racing memory
+  // accesses, and those reports will surface deep inside LUL.  By taking a
+  // copy here, we can either sanitise the copy (for Valgrind) or copy it
+  // using an unchecked memcpy (for ASan, TSan).  That way we don't have to
+  // try to suppress errors inside LUL.
+  //
+  // N_STACK_BYTES is set to 160KB.  This is big enough to hold all stacks
+  // observed in some minutes of testing, whilst keeping the frame size of
+  // this function (DoNativeBacktrace) reasonable.  Most stacks observed in
+  // practice are small, 4KB or less, and so the copy costs are insignificant
+  // compared to other profiler overhead.
+  //
+  // |stackImg| is allocated on this (the sampling thread's) stack.  That
+  // implies that the frame for this function is at least N_STACK_BYTES large.
+  // In general it would be considered unacceptable to have such a large frame
+  // on a stack, but it exists only on the sampling thread, and so is not
+  // expected to be a problem.  Allocating it on the heap is troublesome because
+  // this function runs whilst the sampled thread is suspended, so any heap
+  // allocation risks deadlock.  Allocating it as a global variable is not
+  // thread safe, which would be a problem if we ever allow multiple sampler
+  // threads.  Hence allocating it on the stack seems to be the least-worst
+  // option.
+
   lul::StackImage stackImg;
 
   {
 #if defined(GP_PLAT_amd64_linux)
     uintptr_t rEDZONE_SIZE = 128;
     uintptr_t start = startRegs.xsp.Value() - rEDZONE_SIZE;
 #elif defined(GP_PLAT_arm_android)
     uintptr_t rEDZONE_SIZE = 0;
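
Aside: a sketch of the red-zone handling visible at the end of this hunk, using standard compiler macros rather than Gecko's GP_PLAT_* ones (an assumption made so the sketch is self-contained). The x86-64 SysV ABI lets leaf functions use the 128 bytes below %rsp without adjusting it, so the snapshot must start that far below SP; 32-bit ARM AAPCS defines no red zone.

#include <cstdint>

#if defined(__x86_64__)
static const uintptr_t REDZONE_SIZE = 128;  // SysV AMD64 ABI red zone
#elif defined(__arm__)
static const uintptr_t REDZONE_SIZE = 0;    // AAPCS: no red zone
#else
static const uintptr_t REDZONE_SIZE = 0;    // conservative default
#endif

// First address to include in the stack snapshot.
static uintptr_t SnapshotStart(uintptr_t sp) {
  return sp - REDZONE_SIZE;
}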
--- a/tools/profiler/lul/LulMain.h
+++ b/tools/profiler/lul/LulMain.h
@@ -153,22 +153,23 @@ struct UnwindRegs {
   TaggedUWord xsp;
   TaggedUWord xip;
 #else
 # error "Unknown plat"
 #endif
 };
 
 
-// The maximum number of bytes in a stack snapshot.  This can be
-// increased if necessary, but larger values cost performance, since a
-// stack snapshot needs to be copied between sampling and worker
-// threads for each snapshot.  In practice 32k seems to be enough
-// to get good backtraces.
-static const size_t N_STACK_BYTES = 32768;
+// The maximum number of bytes in a stack snapshot.  This value can be increased
+// if necessary, but testing showed that 160k is enough to obtain good
+// backtraces on x86_64 Linux.  Most backtraces fit comfortably into 4-8k of
+// stack space, but we do have some very deep stacks occasionally.  Please see
+// the comments in DoNativeBacktrace as to why it's OK to have this value be so
+// large.
+static const size_t N_STACK_BYTES = 160*1024;
 
 // The stack chunk image that will be unwound.
 struct StackImage {
   // [start_avma, +len) specify the address range in the buffer.
   // Obviously we require 0 <= len <= N_STACK_BYTES.
   uintptr_t mStartAvma;
   size_t    mLen;
   uint8_t   mContents[N_STACK_BYTES];
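
The hunk ends inside StackImage, but the invariant stated in its comment suggests how a consumer reads from the snapshot. The following is a hypothetical reader, not LUL's actual one: it maps a sampled-process address (AVMA) to an offset within mContents and refuses anything outside the captured range, which is where unwinding must stop.

#include <cstddef>
#include <cstdint>
#include <cstring>

static const size_t N_STACK_BYTES = 160 * 1024;

// Mirrors the declaration above, repeated so the sketch is self-contained.
struct StackImage {
  uintptr_t mStartAvma;
  size_t    mLen;
  uint8_t   mContents[N_STACK_BYTES];
};

// Read one word of the sampled stack out of the snapshot.  Returns false if
// the requested address is not fully covered by [mStartAvma, mStartAvma+mLen),
// in which case unwinding stops rather than touching the live stack.
static bool ReadStackWord(const StackImage& img, uintptr_t avma,
                          uintptr_t* out) {
  if (avma < img.mStartAvma) {
    return false;
  }
  size_t offset = avma - img.mStartAvma;
  if (offset > img.mLen || img.mLen - offset < sizeof(uintptr_t)) {
    return false;
  }
  uintptr_t w;
  memcpy(&w, &img.mContents[offset], sizeof(w));  // memcpy: may be misaligned
  *out = w;
  return true;
}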