NO BUG - Expand build system documentation DONTBUILD (NPOTB)
author Gregory Szorc <gps@mozilla.com>
Mon, 23 Sep 2013 17:21:10 -0700
changeset 162147 b8cebe1d71db7d666c3d1a9123fda7545edbc982
parent 162146 8b28b4bed72cef438095abde2d77e2234804ea5e
child 162148 28e5d67b6b5a33a00c95ac52fb0dfe9ca61d3500
push id 3066
push user akeybl@mozilla.com
push date Mon, 09 Dec 2013 19:58:46 +0000
milestone 27.0a1
build/docs/build-overview.rst
build/docs/conf.py
build/docs/environment-variables.rst
build/docs/glossary.rst
build/docs/index.rst
build/docs/mozconfigs.rst
build/docs/slow.rst
new file mode 100644
--- /dev/null
+++ b/build/docs/build-overview.rst
@@ -0,0 +1,108 @@
+.. _build_overview:
+
+=====================
+Build System Overview
+=====================
+
+This document provides an overview of how the build system works. It is
+targeted at people wanting to learn about internals of the build system.
+It is not meant for persons who casually interact with the build system.
+That being said, knowledge empowers, so consider reading on.
+
+The build system is composed of many different components working in
+harmony to build the source tree. We begin with a graphic overview.
+
+.. graphviz::
+
+   digraph build_components {
+      rankdir="LR";
+      "configure" -> "config.status" -> "build backend" -> "build output"
+   }
+
+Phase 1: Configuration
+======================
+
+Phase 1 centers around the configure script, which is a bash shell script.
+The file is generated from a file called configure.in which is written in M4
+and processed using Autoconf 2.13 to create the final configure script.
+You don't have to worry about how you obtain a configure file: the build system
+does this for you.
+
+The primary job of configure is to determine characteristics of the system and
+compiler, apply options passed into it, and validate everything looks OK to
+build. The primary output of the configure script is an executable file in the
+object directory called config.status. configure also produces some additional
+files (like autoconf.mk). However, the most important file in terms of
+architecture is config.status.
+
+The existence of a config.status file may be familiar to those who have worked
+with Autoconf before. However, Mozilla's config.status is different from almost
+any other config.status you've ever seen: it's written in Python! Instead of
+having our configure script produce a shell script, we have it generate Python.
+
+Now is as good a time as any to mention that Python is prevalent in our build
+system. If we need to write code for the build system, we do it in Python.
+That's just how we roll.
+
+config.status contains 2 parts: data structures representing the output of
+configure and a command-line interface for preparing/configuring/generating
+an appropriate build backend. (A build backend is merely a tool used to build
+the tree, like GNU Make or Tup.) These data structures essentially describe
+the current state of the system and what the existing build configuration looks
+like. For example, it defines which compiler to use, how to invoke it, which
+application features are enabled, etc. You are encouraged to open up
+config.status to have a look for yourself!
+
+Once we have emitted a config.status file, we pass into the realm of phase 2.
+
+Phase 2: Build Backend Preparation and the Build Definition
+===========================================================
+
+Once configure has determined what the current build configuration is, we need
+to apply this to the source tree so we can actually build.
+
+What essentially happens is the automatically-produced config.status Python
+script is executed as soon as configure has generated it. config.status is charged
+with the task of telling a tool how to build the tree. To do this, config.status
+must first scan the build system definition.
+
+The build system definition consists of various moz.build files in the tree.
+There is roughly one moz.build file per directory or per set of related directories.
+Each moz.build file defines how its part of the build config works. For example, it
+says "I want these C++ files compiled" or "look for additional information in these
+directories." config.status starts with the main moz.build file and then recurses
+into all referenced files and directories. As the moz.build files are read, data
+structures describing the overall build system definition are emitted. These data
+structures are then read by a build backend generator which then converts them
+into files, function calls, etc. In the case of a `make` backend, the generator
+writes out Makefiles.
+
+When config.status runs, you'll see the following output::
+
+   Reticulating splines...
+   Finished reading 1096 moz.build files into 1276 descriptors in 2.40s
+   Backend executed in 2.39s
+   2188 total backend files. 0 created; 1 updated; 2187 unchanged
+   Total wall time: 5.03s; CPU time: 3.79s; Efficiency: 75%
+
+What this is saying is that a total of 1096 moz.build files were read. Altogether,
+1276 data structures describing the build configuration were derived from them.
+It took 2.40s wall time to just read these files and produce the data structures.
+The 1276 data structures were fed into the build backend which then determined it
+had to manage 2188 files derived from those data structures. Most of them
+already existed and didn't need to be changed. However, 1 was updated as a result of
+the new configuration. The whole process took 5.03s, although only 3.79s of that was
+CPU time. That likely means we spent roughly 25% of the time waiting on I/O.
+
+Phase 3: Invocation of the Build Backend
+========================================
+
+When most people think of the build system, they think of phase 3. This is
+where we take all the code in the tree and produce Firefox or whatever
+application you are creating. Phase 3 effectively takes whatever was
+generated by phase 2 and runs it. Since the dawn of Mozilla, this has been
+make consuming Makefiles. However, with the transition to moz.build files,
+you may soon see non-Make build backends, such as Tup or Visual Studio.
+
+When building the tree, most of the time is spent in phase 3. This is when
+header files are installed, C++ files are compiled, files are preprocessed, etc.
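Note on the build-overview.rst hunk above: the efficiency figure in the sample config.status output is CPU time divided by wall time. The Python sketch below reproduces that arithmetic from the summary line. The function name parse_efficiency and the regular expression are illustrative assumptions, not part of the build system; only the summary line itself is taken from the documentation.

```python
import re

# Summary line copied from the sample output in build-overview.rst.
SUMMARY_LINE = "Total wall time: 5.03s; CPU time: 3.79s; Efficiency: 75%"

# Hypothetical pattern for illustration; the real output format may change.
_TIMES = re.compile(r"Total wall time: ([\d.]+)s; CPU time: ([\d.]+)s")


def parse_efficiency(line):
    """Return CPU time / wall time as a fraction between 0 and 1."""
    match = _TIMES.search(line)
    if match is None:
        raise ValueError("not a config.status summary line: %r" % line)
    wall = float(match.group(1))
    cpu = float(match.group(2))
    return cpu / wall


if __name__ == "__main__":
    fraction = parse_efficiency(SUMMARY_LINE)
    # The remainder is time not spent on the CPU, likely waiting on I/O.
    print("efficiency: %.0f%%, idle: %.0f%%"
          % (fraction * 100, (1 - fraction) * 100))
```

For the sample line this gives roughly 75% efficiency, which matches the "roughly 25% of the time waiting on I/O" reading in the text.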
--- a/build/docs/conf.py
+++ b/build/docs/conf.py
@@ -12,16 +12,17 @@ from datetime import datetime
 
 here = os.path.abspath(os.path.dirname(__file__))
 mozilla_dir = os.path.normpath(os.path.join(here, '..', '..'))
 
 import mdn_theme
 
 extensions = [
     'sphinx.ext.autodoc',
+    'sphinx.ext.graphviz',
 ]
 
 templates_path = ['_templates']
 source_suffix = '.rst'
 master_doc = 'index'
 project = u'Mozilla Build System'
 year = datetime.now().year
 copyright = u'%s, Mozilla Foundation, CC BY-SA 3.0' % year
new file mode 100644
--- /dev/null
+++ b/build/docs/environment-variables.rst
@@ -0,0 +1,44 @@
+.. _environment_variables:
+
+================================================
+Environment Variables Impacting the Build System
+================================================
+
+Various environment variables have an impact on the behavior of the
+build system. This document attempts to document them.
+
+AUTOCLOBBER
+   If defined, the build system will automatically clobber as needed.
+   The default behavior is to print a message and error out when a
+   clobber is needed.
+
+   This variable is typically defined in a :ref:`mozconfig <mozconfig>`
+   file via ``mk_add_options``.
+
+REBUILD_CHECK
+   If defined, the build system will print information about why
+   certain files were rebuilt.
+
+   This feature is disabled by default because it makes the build slower.
+
+MACH_NO_TERMINAL_FOOTER
+   If defined, the terminal footer displayed when building with mach in
+   a TTY is disabled.
+
+MACH_NO_WRITE_TIMES
+   If defined, mach commands will not prefix output lines with the
+   elapsed time since program start. This option is equivalent to
+   passing ``--log-no-times`` to mach.
+
+MOZ_PSEUDO_DERECURSE
+   Activate an *experimental* build mode where make directory traversal
+   is derecursified. This mode should result in faster build times at
+   the expense of busted builds from time to time. The end goal is for
+   this build mode to be the default, at which time this variable will
+   likely go away.
+
+   A value of ``1`` activates the mode with full optimizations.
+
+   A value of ``no-parallel-export`` activates the mode without
+   optimizations to the *export* tier, which are known to be slightly
+   buggy.
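Note on the environment-variables.rst hunk above: most of the variables are simple "if defined" switches, while MOZ_PSEUDO_DERECURSE distinguishes two values. The sketch below shows one way a Python tool could interpret them. It is an illustration only: the helper names are hypothetical, the document does not say how the build system actually reads these variables, and treating an empty value as "not defined" is an assumption.

```python
import os


def is_defined(name, environ=None):
    """Treat a variable as defined when it is set to a non-empty value.

    Assumption: the documentation only says "if defined"; whether an
    empty value counts is not specified there.
    """
    environ = os.environ if environ is None else environ
    return bool(environ.get(name))


def pseudo_derecurse_mode(environ=None):
    """Map MOZ_PSEUDO_DERECURSE to a mode name, or None when inactive."""
    environ = os.environ if environ is None else environ
    value = environ.get("MOZ_PSEUDO_DERECURSE")
    if value == "1":
        return "full"                # all optimizations enabled
    if value == "no-parallel-export":
        return "no-parallel-export"  # skip the export tier optimizations
    return None                      # unset or unrecognized value


if __name__ == "__main__":
    for name in ("AUTOCLOBBER", "REBUILD_CHECK",
                 "MACH_NO_TERMINAL_FOOTER", "MACH_NO_WRITE_TIMES"):
        print(name, "defined" if is_defined(name) else "not defined")
    print("MOZ_PSEUDO_DERECURSE mode:", pseudo_derecurse_mode())
```

Passing an explicit dictionary instead of relying on os.environ keeps the helpers easy to exercise without touching the real environment.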
--- a/build/docs/glossary.rst
+++ b/build/docs/glossary.rst
@@ -21,8 +21,24 @@ Glossary
        options, and writes out metadata to be consumed by the build
        system.
 
    config.status
        An executable file produced by **configure** that takes the
        generated build config and writes out files used to build the
        tree. Traditionally, config.status writes out a bunch of
        Makefiles.
+
+   install manifest
+       A file containing metadata describing file installation rules.
+       A large part of the build system consists of copying files
+       around to appropriate places. We write out special files
+       describing the set of required operations so we can process the
+       actions efficiently. These files are install manifests.
+
+   clobber build
+      A build performed with an initially empty object directory. All
+      build actions must be performed.
+
+   incremental build
+      A build performed with the result of a previous build in an
+      object directory. The build should not have to work as hard because
+      it will be able to reuse the work from previous builds.
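Note on the glossary.rst hunk above: the idea behind an install manifest can be sketched as a list of copy rules that is processed in one pass, skipping files that are already up to date. The code below is a simplified illustration under stated assumptions, not the build system's implementation: the function apply_manifest, the (source, relative destination) entry format, and the created/updated/unchanged counters are all hypothetical.

```python
import filecmp
import os
import shutil


def apply_manifest(entries, dest_root):
    """Copy each (source_path, relative_dest) entry under dest_root.

    Returns a (created, updated, unchanged) tuple. Files whose contents
    already match are left alone so their modification times do not change.
    """
    created = updated = unchanged = 0
    for source, relative_dest in entries:
        dest = os.path.join(dest_root, relative_dest)
        if not os.path.exists(dest):
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.copy2(source, dest)  # copy2 also preserves the mtime
            created += 1
        elif filecmp.cmp(source, dest, shallow=False):
            unchanged += 1              # identical contents: nothing to do
        else:
            shutil.copy2(source, dest)
            updated += 1
    return created, updated, unchanged
```

Describing the whole set of operations up front lets a single process handle every copy, and leaving unchanged files untouched avoids disturbing the timestamps that incremental builds depend on.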
--- a/build/docs/index.rst
+++ b/build/docs/index.rst
@@ -10,18 +10,21 @@ Overview
 
    glossary
 
 Important Concepts
 ==================
 .. toctree::
    :maxdepth: 1
 
+   build-overview
    Mozconfig Files <mozconfigs>
    Profile Guided Optimization <pgo>
+   slow
+   environment-variables
 
 mozbuild
 ========
 
 mozbuild is a Python package containing a lot of the code for the
 Mozilla build system.
 
 .. toctree::
--- a/build/docs/mozconfigs.rst
+++ b/build/docs/mozconfigs.rst
@@ -1,8 +1,10 @@
+.. _mozconfig:
+
 ===============
 mozconfig Files
 ===============
 
 mozconfig files are used to configure how a build works.
 
 mozconfig files are actually shell scripts. They are executed in a
 special context with specific variables and functions exposed to them.
new file mode 100644
--- /dev/null
+++ b/build/docs/slow.rst
@@ -0,0 +1,156 @@
+.. _slow:
+
+============================
+Why the Build System is Slow
+============================
+
+A common complaint about the build system is that it's slow. There are
+many reasons contributing to its slowness. We will attempt to document
+them here.
+
+First, it is important to distinguish between a :term:`clobber build`
+and an :term:`incremental build`. The reasons why each is slow can
+be different.
+
+The build does a lot of work
+============================
+
+It may not be obvious, but the main reason the build system is slow is
+that it does a lot of work! The source tree consists of a few
+thousand C++ files. On a modern machine, we spend over 120 minutes of CPU
+core time compiling files! So, if you are looking for the root cause of
+slow clobber builds, look at the sheer volume of C++ files in the tree.
+
+You don't have enough CPU cores and MHz
+=======================================
+
+The build should be CPU bound. If the build system maintainers are
+optimizing the build system perfectly, every CPU core in your machine
+should be 100% saturated during a build. While this isn't currently the
+case (keep reading below), generally speaking, the more CPU cores you
+have in your machine and the more total MHz in your machine, the better.
+
+**We highly recommend building with no fewer than 4 physical CPU
+cores.** Please note the *physical* in this sentence. Hyperthreaded
+cores (an Intel Core i7 will report 8 CPU cores but only 4 are physical
+for example) only yield at most a 1.25x speedup per core.
+
+We also recommend using the most modern CPU model possible. Haswell
+chips deliver much more performance per CPU cycle than say Sandy Bridge
+CPUs.
+
+This cause impacts both clobber and incremental builds.
+
+You are building with a slow I/O layer
+======================================
+
+The build system can be I/O bound if your I/O layer is slow. Linking
+libxul on some platforms and build architectures can perform gigabytes
+of I/O.
+
+To minimize the impact of slow I/O on build performance, **we highly
+recommend building with an SSD.** Power users with enough memory may opt
+to build from a RAM disk. Mechanical disks should be avoided if at all
+possible.
+
+This cause impacts both clobber and incremental builds.
+
+You don't have enough memory
+============================
+
+The build system allocates a lot of memory, especially when building
+many things in parallel. If you don't have enough free system memory,
+the build will cause swap activity, slowing down your system and the
+build. Even if you never get to the point of swapping, the build system
+performs a lot of I/O and having all accessed files in memory and the
+page cache can significantly reduce the influence of the I/O layer on
+the build system.
+
+**We recommend building with no less than 8 GB of system memory.** As
+always, the more memory you have, the better. For a bare bones machine
+doing nothing more than building the source tree, anything more than 16
+GB is likely entering the point of diminishing returns.
+
+This cause impacts both clobber and incremental builds.
+
+You are building with pymake
+============================
+
+Pymake is slower than GNU make. One reason is Python is generally slower
+than C. The build system maintainers are consistently looking at
+optimizing pymake. However, it is death by a thousand cuts.
+
+This cause impacts both clobber and incremental builds.
+
+You are building on Windows
+===========================
+
+Builds on Windows are slow for a few reasons. First, Windows builds use
+pymake, not GNU make (because of compatibility issues with GNU make).
+But, there are other sources of slowness.
+
+New processes on Windows are about an order of magnitude slower to spawn than on
+UNIX-y systems such as Linux. This is because Windows has optimized new
+threads while the \*NIX platforms typically optimize new processes.
+Anyway, the build system spawns thousands of new processes during a
+build. Parts of the build that rely on rapid spawning of new processes
+are slow on Windows as a result. This is most pronounced when running
+*configure*. The configure file is a giant shell script and shell
+scripts rely heavily on new processes. This is why configure can run
+over a minute slower on Windows than on other platforms.
+
+Another reason Windows builds are slower is because Windows lacks proper
+symlink support. On systems that support symlinks, we can generate a
+file into a staging area then symlink it into the final directory very
+quickly. On Windows, we have to perform a full file copy. This incurs
+much more I/O and, if done poorly, can muck with file modification
+times, messing up build dependencies. As of the summer of 2013, the
+impact of symlinks is being mitigated through the use
+of an :term:`install manifest`.
+
+These issues impact both clobber and incremental builds.
+
+Recursive make traversal is slow
+================================
+
+The build system has traditionally been built by employing recursive
+make. Recursive make involves make iterating through directories / make
+files sequentially and executing each in turn. This is inefficient for
+directories containing few targets/tasks because make could be *starved*
+for work when processing these directories. Any time make is starved,
+the build isn't using all available CPU cycles and the build is slower
+as a result.
+
+Work has started in bug 907365 to fix this issue by changing the way
+make traverses all the make files.
+
+The impact of slow recursive make traversal is mostly felt on
+incremental builds. Traditionally, most of the wall time during a
+no-op build is spent in make traversal.
+
+make is inefficient
+===================
+
+Compared to modern build backends like Tup or Ninja, make is slow and
+inefficient. We can only make make so fast. At some point, we'll hit a
+performance plateau and will need to use a different tool to make builds
+faster.
+
+Please note that clobber and incremental builds are different. A clobber
+build with make will likely be as fast as a clobber build with e.g. Tup.
+However, Tup should vastly outperform make when it comes to incremental
+builds. Therefore, this issue is mostly seen when performing incremental
+builds.
+
+C++ header dependency hell
+==========================
+
+Modifying a *.h* file can have significant impact on the build system.
+If you modify a *.h* that is used by 1000 C++ files, all of those 1000
+C++ files will be recompiled.
+
+Our code base has traditionally been sloppy managing the impact of
+changed headers on build performance. Bug 785103 tracks improving the
+situation.
+
+This issue mostly impacts the times of an :term:`incremental build`.
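Note on the "C++ header dependency hell" section of slow.rst above: the cost of touching a header comes from transitive includes, so a header can force recompilation of files that never name it directly. The sketch below walks a toy include graph to list the affected C++ files. The graph format, the file names, and the function affected_sources are hypothetical and chosen only for illustration; real dependency data would come from the compiler.

```python
def affected_sources(includes, changed_header):
    """Return the sorted .cpp files that include changed_header, directly
    or through other headers.

    includes maps each file name to the set of headers it includes directly.
    """
    def reaches(start):
        seen = set()
        stack = [start]
        while stack:
            current = stack.pop()
            for header in includes.get(current, ()):
                if header == changed_header:
                    return True
                if header not in seen:
                    seen.add(header)
                    stack.append(header)
        return False

    return sorted(name for name in includes
                  if name.endswith(".cpp") and reaches(name))


if __name__ == "__main__":
    # Toy graph: a.cpp reaches base.h only through a.h.
    graph = {
        "a.cpp": {"a.h"},
        "b.cpp": {"b.h"},
        "c.cpp": {"a.h", "b.h"},
        "a.h": {"base.h"},
        "b.h": set(),
    }
    print(affected_sources(graph, "base.h"))
```

In this toy graph, changing base.h rebuilds a.cpp and c.cpp even though neither includes it directly, which is the same effect the section describes at the scale of 1000 files.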