Bug 674384: jprof - better thread support, support e10s profiling and profile-by-capture-section. r=dbaron DONTBUILD
authorRandell Jesup <rjesup@wgate.com>
Fri, 12 Aug 2011 11:59:20 -0400
changeset 74310 997256d4e070a07ea648ad5f554ae9302651ab01
parent 74309 71c422d27ed4ea3c3951ae23b134318310c8682e
child 74311 f1113713ce039ac0f0af397962d58d9b9674bb27
push id20974
push userrjesup@wgate.com
push dateFri, 12 Aug 2011 16:12:04 +0000
treeherdermozilla-central@997256d4e070 [default view] [failures only]
perfherder[talos] [build metrics] [platform microbench] (compared to previous push)
reviewersdbaron
bugs674384
milestone8.0a1
first release with
nightly linux32
nightly linux64
nightly mac
nightly win32
nightly win64
last release without
nightly linux32
nightly linux64
nightly mac
nightly win32
nightly win64
Bug 674384: jprof - better thread support, support e10s profiling and profile-by-capture-section. r=dbaron DONTBUILD
toolkit/xre/nsAppRunner.cpp
tools/jprof/README.html
tools/jprof/leaky.cpp
tools/jprof/leaky.h
tools/jprof/stub/libmalloc.cpp
--- a/toolkit/xre/nsAppRunner.cpp
+++ b/toolkit/xre/nsAppRunner.cpp
@@ -1553,17 +1553,22 @@ int OS2LaunchChild(const char *aExePath,
 // it was initially started with.
 static nsresult LaunchChild(nsINativeAppSupport* aNative,
                             PRBool aBlankCommandLine = PR_FALSE)
 {
   aNative->Quit(); // release DDE mutex, if we're holding it
 
   // Restart this process by exec'ing it into the current process
   // if supported by the platform.  Otherwise, use NSPR.
- 
+
+#ifdef MOZ_JPROF
+  // make sure JPROF doesn't think we're E10s
+  unsetenv("JPROF_SLAVE");
+#endif
+
   if (aBlankCommandLine) {
 #if defined(MOZ_WIDGET_QT)
     // Remove only arguments not given to Qt
     gRestartArgc = gQtOnlyArgc;
     gRestartArgv = gQtOnlyArgv;
 #else
     gRestartArgc = 1;
     gRestartArgv[gRestartArgc] = nsnull;
--- a/tools/jprof/README.html
+++ b/tools/jprof/README.html
@@ -43,28 +43,35 @@ making sure that you do <strong>not</str
 <code>--enable-strip</code> configure option set -- jprof needs symbols to
 operate.  On many architectures with GCC, you'll need to add
 <code>--enable-optimize="-O3 -fno-omit-frame-pointer"</code> or the
 equivalent to ensure frame pointer generation in the compiler you're using.</p>
 
 <p>Finally, build mozilla with your new configuration.  Now you can run jprof.</p>
 
 <h3><a name="usage">Usage</a></h3>
+<pre> jprof [-v] [-t] [-e exclude] [-i include] [-s stackdepth] [--last] [--all] [--start n [--end m]] [--output-dir dir] prog log [log2 ...]</pre>
 Options:
 <ul>
   <li><b>-s depth</b> : Limit depth looked at from captured stack
       frames</li>
   <li><b>-v</b> : Output some information about the symbols, memory map, etc.</li>
-  <li><b>-t</b> : Group output according to thread.  Requires external
+  <li><b>-t or --threads</b> : Group output according to thread.  May require external
       LD_PRELOAD library to help force sampling of spawned threads; jprof
-       normally captures the main thread only.  See <a
+       may capture the main thread only.  See <a
        href="http://sam.zoy.org/writings/programming/gprof.html">gprof-helper</a>;
        it may need adaption for jprof.</li>
+  <li><b>--only-thread id</b> : Only output data for thread 'id'</li>
   <li><b>-e exclusion</b> : Allows excluding specific stack frames</li>
   <li><b>-i inclusion</b> : Allows including specific stack frames</li>
+  <li><b>--last</b> : Only process data from the last 'section' of sampling
+      (starting at the last PROF)</li>
+  <li><b>--start N</b> : Start processing data at 'section' N </li>
+  <li><b>--end N</b> : Stop processing data at 'section' N </li>
+  <li><b>--output-dir dir</b> : Store generated .html files in the given directory </li>
 </ul>
 The behavior of jprof is determined by the value of the JPROF_FLAGS environment
 variable.  This environment variable can be composed of several substrings
 which have the following meanings:
 <ul>
     <li> <b>JP_START</b> : Install the signal handler, and start sending the
     timer signals.
     
@@ -90,16 +97,22 @@ which have the following meanings:
     using its own timer.  This option, like JP_REALTIME, uses intervals of real
     time.  This option overrides JP_PERIOD.  <code>freq</code> is the frequency
     at which the timer should fire, measured in Hz. It must be a power of 2.
     The maximal frequency allowed by the kernel can be changed by writing to
     <code>/proc/sys/dev/rtc/max-user-freq</code>; the maximum value it can be
     set to is 8192.  Note that <code>/dev/rtc</code> will need to be readable
     by the Firefox process; making that file world-readable is a simple way to
     accomplish that.
+
+    <li> <b>JP_FILENAME=basefilename</b> : This is the filename used for
+    saving the log files to; the default is "jprof-log".  If Electrolysis
+    is used, each process after the first will have the process ID
+    added ("jprof-log-3212");
+
 </ul>
 
 <h4>Starting and stopping jprof from JavaScript</h4>
 <p>
 A build with jprof enabled adds two functions to the Window object:
 <code>JProfStartProfiling()</code> and <code>JProfStopProfiling()</code>.  When used with JP_DEFER, these
 allow one to start and stop the timer just around whatever critical section is
 being profiled
@@ -144,22 +157,29 @@ depending on the mode) can be used to co
 <h4>Looking at the results</h4>
 
 Now that we have <code>jprof-log</code> and <code>jprof-map</code> files, we
 can use the jprof executable is used to turn them into readable output.  To do
 this jprof needs the name of the mozilla binary and the log file.  It deduces
 the name of the map file:
 
 <pre>
-  ./jprof /home/user/mozilla/debug/dist/bin/mozilla-bin ./jprof-log > tmp.html
+  ./jprof /home/user/mozilla/objdir/dist/bin/firefox ./jprof-log > tmp.html
 </pre>
 
 This will generate the file <code>tmp.html</code> which you should view in a
 web browser.
 
+<pre>
+  ./jprof --output-dir=/tmp /home/user/mozilla/objdir/dist/bin/firefox ./jprof-log*
+</pre>
+
+This will generate a set of files in /tmp for each process.
+
+
 <h3><a name="interpretation">Interpretation</a></h3>
 
 
 The Jprof output is split into a flat portion and a hierarchical portion.
 There are links to each section at the top of the page.  It is typically
 easier to analyze the profile by starting with the flat output and following
 the links contained in the flat output up to the hierarchical output.
 
--- a/tools/jprof/leaky.cpp
+++ b/tools/jprof/leaky.cpp
@@ -67,17 +67,40 @@ static const u_int MaxBuckets = 1000003;
 
 //----------------------------------------------------------------------
 
 int main(int argc, char** argv)
 {
   leaky* l = new leaky;
 
   l->initialize(argc, argv);
-  l->open();
+  l->outputfd = stdout;
+
+  for (int i = 0; i < l->numLogFiles; i++) {
+    if (l->output_dir || l->numLogFiles > 1) {
+      char name[2048]; // XXX fix
+      if (l->output_dir)
+        snprintf(name,sizeof(name),"%s/%s.html",l->output_dir,argv[l->logFileIndex + i]);
+      else
+        snprintf(name,sizeof(name),"%s.html",argv[l->logFileIndex + i]);
+
+      fprintf(stderr,"opening %s\n",name);
+      l->outputfd = fopen(name,"w");
+      // if an error we won't process the file
+    }
+    if (l->outputfd) { // paranoia
+      l->open(argv[l->logFileIndex + i]);
+
+      if (l->outputfd != stderr) {
+        fclose(l->outputfd);
+        l->outputfd = NULL;
+      }
+    }
+  }
+
   return 0;
 }
 
 char *
 htmlify(const char *in)
 {
   const char *p = in;
   char *out, *q;
@@ -120,22 +143,21 @@ htmlify(const char *in)
   *q = '\0';
 
   return out;
 }
 
 leaky::leaky()
 {
   applicationName = NULL;
-  logFile = NULL;
   progFile = NULL;
 
-  quiet = TRUE;
-  showAddress = FALSE;
-  showThreads = FALSE;
+  quiet = true;
+  showAddress = false;
+  showThreads = false;
   stackDepth = 100000;
   onlyThread = 0;
 
   mappedLogFile = -1;
   firstLogEntry = lastLogEntry = 0;
 
   sfd = -1;
   externalSymbols = 0;
@@ -152,66 +174,76 @@ leaky::leaky()
 }
 
 leaky::~leaky()
 {
 }
 
 void leaky::usageError()
 {
-  fprintf(stderr, "Usage: %s [-v] [-t] [-e exclude] [-i include] [-s stackdepth] [--last] [--all] [--start n [--end m]] prog log\n", (char*) applicationName);
+  fprintf(stderr, "Usage: %s [-v] [-t] [-e exclude] [-i include] [-s stackdepth] [--last] [--all] [--start n [--end m]] [--output-dir dir] prog log [log2 ...]\n", (char*) applicationName);
   fprintf(stderr, 
           "\t-v: verbose\n"
           "\t-t | --threads: split threads\n"
           "\t--only-thread n: only profile thread N\n"
           "\t-i include-id: stack must include specified id\n"
           "\t-e exclude-id: stack must NOT include specified id\n"
+          "\t-s stackdepth: Limit depth looked at from captured stack frames\n"
           "\t--last: only profile the last capture section\n"
           "\t--start n [--end m]: profile n to m (or end) capture sections\n"
+          "\t--output-dir dir: write output files to dir\n"
+          "\tIf there's one log, output goes to stdout unless --output-dir is set\n"
+          "\tIf there are more than one log, output files will be named with .html added\n"
           );
   exit(-1);
 }
 
 static struct option longopts[] = {
     { "threads", 0, NULL, 't' },
     { "only-thread", 1, NULL, 'T' },
     { "last", 0, NULL, 'l' },
     { "start", 1, NULL, 'x' },
     { "end", 1, NULL, 'n' },
+    { "output-dir", 1, NULL, 'd' },
     { NULL, 0, NULL, 0 },
 };
 
 void leaky::initialize(int argc, char** argv)
 {
   applicationName = argv[0];
   applicationName = strrchr(applicationName, '/');
   if (!applicationName) {
     applicationName = argv[0];
   } else {
     applicationName++;
   }
 
   int arg;
   int errflg = 0;
   int longindex = 0;
+
   onlyThread = 0;
+  output_dir = NULL;
+
   // XXX tons of cruft here left over from tracemalloc
+  // XXX The -- options shouldn't need short versions, or they should be documented
   while (((arg = getopt_long(argc, argv, "adEe:gh:i:r:Rs:tT:qvx:ln:",longopts,&longindex)) != -1)) {
     switch (arg) {
       case '?':
       default:
         fprintf(stderr,"error: unknown option %c\n",optopt);
 	errflg++;
 	break;
       case 'a':
 	break;
       case 'A': // not implemented
-	showAddress = TRUE;
+	showAddress = true;
 	break;
       case 'd':
+        output_dir = optarg; // reference to an argv pointer
 	break;
       case 'R':
 	break;
       case 'e':
 	exclusions.add(optarg);
 	break;
       case 'g':
 	break;
@@ -248,29 +280,32 @@ void leaky::initialize(int argc, char** 
         collect_last = true;
         break;
       case 'q':
         break;
       case 'v':
         quiet = !quiet;
         break;
       case 't':
-        showThreads = TRUE;
+        showThreads = true;
 	break;
       case 'T':
-        showThreads = TRUE;
+        showThreads = true;
         onlyThread = atoi(optarg);
 	break;
     }
   }
   if (errflg || ((argc - optind) < 2)) {
     usageError();
   }
   progFile = argv[optind++];
-  logFile = argv[optind];
+  logFileIndex = optind;
+  numLogFiles  = argc - optind;
+  if (!quiet)
+    fprintf(stderr,"numlogfiles = %d\n",numLogFiles);
 }
 
 static void* mapFile(int fd, u_int flags, off_t* sz)
 {
   struct stat sb;
   if (fstat(fd, &sb) < 0) {
     perror("fstat");
     exit(-1);
@@ -284,53 +319,59 @@ static void* mapFile(int fd, u_int flags
   return base;
 }
 
 void leaky::LoadMap()
 {
   malloc_map_entry mme;
   char name[1000];
 
-  int fd = ::open(M_MAPFILE, O_RDONLY);
-  if (fd < 0) {
-    perror("open: " M_MAPFILE);
-    exit(-1);
+  if (!loadMap) {
+    // all files use the same map
+    int fd = ::open(M_MAPFILE, O_RDONLY);
+    if (fd < 0) {
+      perror("open: " M_MAPFILE);
+      exit(-1);
+    }
+    for (;;) {
+      int nb = read(fd, &mme, sizeof(mme));
+      if (nb != sizeof(mme)) break;
+      nb = read(fd, name, mme.nameLen);
+      if (nb != (int)mme.nameLen) break;
+      name[mme.nameLen] = 0;
+      if (!quiet) {
+        fprintf(stderr,"%s @ %lx\n", name, mme.address);
+      }
+
+      LoadMapEntry* lme = new LoadMapEntry;
+      lme->address = mme.address;
+      lme->name = strdup(name);
+      lme->next = loadMap;
+      loadMap = lme;
+    }
+    close(fd);
   }
-  for (;;) {
-    int nb = read(fd, &mme, sizeof(mme));
-    if (nb != sizeof(mme)) break;
-    nb = read(fd, name, mme.nameLen);
-    if (nb != (int)mme.nameLen) break;
-    name[mme.nameLen] = 0;
-    if (!quiet) {
-      fprintf(stderr,"%s @ %lx\n", name, mme.address);
-    }
-
-    LoadMapEntry* lme = new LoadMapEntry;
-    lme->address = mme.address;
-    lme->name = strdup(name);
-    lme->next = loadMap;
-    loadMap = lme;
-  }
-  close(fd);
 }
 
-void leaky::open()
+void leaky::open(char *logFile)
 {
   int threadArray[100]; // should auto-expand
   int last_thread = -1;
   int numThreads = 0;
   int section = -1;
   bool collecting = false;
 
   LoadMap();
 
   setupSymbols(progFile);
 
   // open up the log file
+  if (mappedLogFile)
+    ::close(mappedLogFile);
+
   mappedLogFile = ::open(logFile, O_RDONLY);
   if (mappedLogFile < 0) {
     perror("open");
     exit(-1);
   }
   off_t size;
   firstLogEntry = (malloc_log_entry*) mapFile(mappedLogFile, PROT_READ, &size);
   lastLogEntry = (malloc_log_entry*)((char*)firstLogEntry + size);
@@ -355,18 +396,19 @@ void leaky::open()
       if (collect_start == section) {
         collecting = true;
         firstLogEntry = lep;
       }
       if (collect_end == section) {
         collecting = false;
         lastLogEntry = lep;
       }
-      fprintf(stderr,"New section %d: first=%x, last=%x, collecting=%d\n",
-              section,firstLogEntry,lastLogEntry,collecting);
+      if (!quiet)
+        fprintf(stderr,"New section %d: first=%p, last=%p, collecting=%d\n",
+                section,(void*)firstLogEntry,(void*)lastLogEntry,collecting);
     }
 
     // Capture thread info at the same time
 
     // Find all the threads captured
 
     // pthread/linux docs say the signal can be delivered to any thread in
     // the process.  In practice, it appears in Linux that it's always
@@ -388,48 +430,47 @@ void leaky::open()
           threadArray[i] = lep->thread;
           numThreads++;
           if (!quiet)
             fprintf(stderr,"new thread %d\n",lep->thread);
         }
       }
     }
   }  
-  fprintf(stderr,"Done collecting: sections %d: first=%x, last=%x, numThreads=%d\n",
-          section,firstLogEntry,lastLogEntry,numThreads);
+  if (!quiet)
+    fprintf(stderr,"Done collecting: sections %d: first=%p, last=%p, numThreads=%d\n",
+            section,(void*)firstLogEntry,(void*)lastLogEntry,numThreads);
 
-  fprintf(stdout,"<html><head><title>Jprof Profile Report</title></head><body>\n");
-  fprintf(stdout,"<h1><center>Jprof Profile Report</center></h1>\n");
+  fprintf(outputfd,"<html><head><title>Jprof Profile Report</title></head><body>\n");
+  fprintf(outputfd,"<h1><center>Jprof Profile Report</center></h1>\n");
 
   if (showThreads)
   {
     fprintf(stderr,"Num threads %d\n",numThreads);
 
-    fprintf(stdout,"<hr>Threads:<p><pre>\n");
+    fprintf(outputfd,"<hr>Threads:<p><pre>\n");
     for (int i=0; i<numThreads; i++)
     {
-      fprintf(stdout,"   <a href=\"#thread_%d\">%d</a>  ",
+      fprintf(outputfd,"   <a href=\"#thread_%d\">%d</a>  ",
               threadArray[i],threadArray[i]);
     }
-    fprintf(stdout,"</pre>");
+    fprintf(outputfd,"</pre>");
 
     for (int i=0; i<numThreads; i++)
     {
       if (!onlyThread || onlyThread == threadArray[i])
         analyze(threadArray[i]);
     }
   }
   else
   {
     analyze(0);
   }
 
-  fprintf(stdout,"</pre></body></html>\n");
-
-  exit(0);
+  fprintf(outputfd,"</pre></body></html>\n");
 }
 
 //----------------------------------------------------------------------
 
 
 static int symbolOrder(void const* a, void const* b)
 {
   Symbol const* ap = (Symbol const *)a;
@@ -444,30 +485,34 @@ void leaky::ReadSharedLibrarySymbols()
   while (NULL != lme) {
     ReadSymbols(lme->name, lme->address);
     lme = lme->next;
   }
 }
 
 void leaky::setupSymbols(const char *fileName)
 {
-  // Read in symbols from the program
-  ReadSymbols(fileName, 0);
+  if (usefulSymbols == 0) {
+    // only read once!
 
-  // Read in symbols from the .so's
-  ReadSharedLibrarySymbols();
+    // Read in symbols from the program
+    ReadSymbols(fileName, 0);
+
+    // Read in symbols from the .so's
+    ReadSharedLibrarySymbols();
 
-  if (!quiet) {
-    fprintf(stderr,"A total of %d symbols were loaded\n", usefulSymbols);
-  }
+    if (!quiet) {
+      fprintf(stderr,"A total of %d symbols were loaded\n", usefulSymbols);
+    }
 
-  // Now sort them
-  qsort(externalSymbols, usefulSymbols, sizeof(Symbol), symbolOrder);
-  lowestSymbolAddr = externalSymbols[0].address;
-  highestSymbolAddr = externalSymbols[usefulSymbols-1].address;
+    // Now sort them
+    qsort(externalSymbols, usefulSymbols, sizeof(Symbol), symbolOrder);
+    lowestSymbolAddr = externalSymbols[0].address;
+    highestSymbolAddr = externalSymbols[usefulSymbols-1].address;
+  }
 }
 
 // Binary search the table, looking for a symbol that covers this
 // address.
 int leaky::findSymbolIndex(u_long addr)
 {
   u_int base = 0;
   u_int limit = usefulSymbols - 1;
@@ -563,17 +608,17 @@ void leaky::displayStackTrace(FILE* out,
   }
   fputc('\n', out);
 }
 
 void leaky::dumpEntryToLog(malloc_log_entry* lep)
 {
   printf("%ld\t", lep->delTime);
   printf(" --> ");
-  displayStackTrace(stdout, lep);
+  displayStackTrace(outputfd, lep);
 }
 
 void leaky::generateReportHTML(FILE *fp, int *countArray, int count, int thread)
 {
   fprintf(fp,"<center>");
   if (showThreads)
   {
     fprintf(fp,"<hr><A NAME=thread_%d><b>Thread: %d</b></A><p>",
@@ -769,17 +814,17 @@ void leaky::analyze(int thread)
     }
 
     // idx should be the function that we were in when we received the signal.
     if(idx>=0) {
       ++externalSymbols[idx].timerHit;
     }
   }
 
-  generateReportHTML(stdout, countArray, stacks, thread);
+  generateReportHTML(outputfd, countArray, stacks, thread);
 }
 
 void FunctionCount::printReport(FILE *fp, leaky *lk, int parent, int total)
 {
     const char *fmt = "                      <A href=\"#%d\">%8d (%3.1f%%)%s %s</A>%s\n";
 
     int nmax, tmax=((~0U)>>1);
     
--- a/tools/jprof/leaky.h
+++ b/tools/jprof/leaky.h
@@ -78,27 +78,30 @@ struct LoadMapEntry {
   LoadMapEntry* next;
 };
 
 struct leaky {
   leaky();
   ~leaky();
 
   void initialize(int argc, char** argv);
-  void open();
+  void open(char *arg);
 
   char*  applicationName;
-  char*  logFile;
+  int    logFileIndex;
+  int    numLogFiles;
   char*  progFile;
+  FILE*  outputfd;
 
-  int   quiet;
-  int   showAddress;
-  int   showThreads;
-  u_int  stackDepth;
+  bool  quiet;
+  bool  showAddress;
+  bool  showThreads;
+  u_int stackDepth;
   int   onlyThread;
+  char* output_dir;
 
   int   mappedLogFile;
   malloc_log_entry* firstLogEntry;
   malloc_log_entry* lastLogEntry;
 
   int    stacks;
 
   int sfd;
--- a/tools/jprof/stub/libmalloc.cpp
+++ b/tools/jprof/stub/libmalloc.cpp
@@ -394,16 +394,18 @@ void *ucontext)
 
     if (!rtcHz)
         startSignalCounter(timerMiliSec);
 }
 
 NS_EXPORT_(void) setupProfilingStuff(void)
 {
     static int gFirstTime = 1;
+    char filename[2048]; // XXX fix
+
     if(gFirstTime && !(gFirstTime=0)) {
 	int  startTimer = 1;
 	int  doNotStart = 1;
 	int  firstDelay = 0;
         int  append = O_TRUNC;
         char *tst  = getenv("JPROF_FLAGS");
 
 	/* Options from JPROF_FLAGS environment variable:
@@ -415,16 +417,20 @@ NS_EXPORT_(void) setupProfilingStuff(voi
 	 *   JP_FIRST  -> Extra delay before starting
 	 *   JP_REALTIME -> Take stack traces in intervals of real time
 	 *               rather than time used by the process (and the
 	 *               system for the process).  This is useful for
 	 *               finding time spent by the X server.
          *   JP_APPEND -> Append to jprof-log rather than overwriting it.
          *               This is somewhat risky since it depends on the
          *               address map staying constant across multiple runs.
+         *   JP_FILENAME -> base filename to use when saving logs.  Note that
+         *               this does not affect the mapfile.
+         *
+         * JPROF_SLAVE is set if this is not the first process.
 	*/
 	if(tst) {
 	    if(strstr(tst, "JP_DEFER"))
 	    {
 		doNotStart = 0;
 		startTimer = 0;
 	    }
 	    if(strstr(tst, "JP_START")) doNotStart = 0;
@@ -465,22 +471,39 @@ NS_EXPORT_(void) setupProfilingStuff(voi
                     rtcHz = 2048;
                 }
 #else
                 fputs("JP_RTC_HZ found, but RTC profiling only supported on "
                       "Linux!\n", stderr);
                   
 #endif
             }
+            char *f = strstr(tst,"JP_FILENAME=");
+            if (f)
+                f = f + strlen("JP_FILENAME=");
+            else
+                f = M_LOGFILE;
+
+            char *is_slave = getenv("JPROF_SLAVE");
+            if (!is_slave)
+                setenv("JPROF_SLAVE","", 0);
+
+            int pid = syscall(SYS_gettid); //gettid();
+            if (is_slave)
+                snprintf(filename,sizeof(filename),"%s-%d",f,pid);
+            else
+                snprintf(filename,sizeof(filename),"%s",f);
+
+            // XXX FIX! inherit current capture state!
 	}
 
 	if(!doNotStart) {
 
 	    if(gLogFD<0) {
-		gLogFD = open(M_LOGFILE, O_CREAT | O_WRONLY | append, 0666);
+		gLogFD = open(filename, O_CREAT | O_WRONLY | append, 0666);
 		if(gLogFD<0) {
 		    fprintf(stderr, "Unable to create " M_LOGFILE);
 		    perror(":");
 		} else {
 		    struct sigaction action;
 		    sigset_t mset;
 
 		    // Dump out the address map when we terminate