Bug 1426275: document SCHEDULES in reStructuredText; r=ahal
author Dustin J. Mitchell <dustin@mozilla.com>
Wed, 27 Dec 2017 22:19:45 +0000
changeset 398296 b3cc94c0accd6830aea6513a06b38b5952d19d98
parent 398295 f0aa8bfa1fa67daf53be5a759ea7a5292902592d
child 398297 7e989a375aa8c57296632b88656a1931b762a331
push id 33213
push user ebalazs@mozilla.com
push date Tue, 09 Jan 2018 09:51:47 +0000
treeherder mozilla-central@4248602674ff
reviewers ahal
bugs 1426275
milestone 59.0a1
first release with
nightly linux32
nightly linux64
nightly mac
nightly win32
nightly win64
last release without
nightly linux32
nightly linux64
nightly mac
nightly win32
nightly win64
Bug 1426275: document SCHEDULES in reStructuredText; r=ahal MozReview-Commit-ID: 9TdVLzBfXHF
taskcluster/docs/optimization-process.rst
taskcluster/docs/optimization-schedules.rst
taskcluster/docs/optimization.rst
new file mode 100644
--- /dev/null
+++ b/taskcluster/docs/optimization-process.rst
@@ -0,0 +1,75 @@
+Optimization Process
+====================
+
+Optimization proceeds in three phases: removing tasks, replacing tasks,
+and finally generating a subgraph containing only the remaining tasks.
+
+Assume the following task graph as context for these examples::
+
+    TC1 <--\     ,- UP1
+          , B1 <--- T1a
+    I1 <-|       `- T1b
+          ` B2 <--- T2a
+    TC2 <--/     |- T2b
+                 `- UP2
+
+Removing Tasks
+--------------
+
+This phase begins with tasks on which nothing depends and follows the
+dependency graph backward from there -- right to left in the diagram above. If
+a task is not removed, then nothing it depends on will be removed either.
+Thus if T1a and T1b are both removed, B1 may be removed as well. But if T2b is
+not removed, then B2 may not be removed either.
+
+For each task on which no remaining task depends, the decision whether to remove
+it is made by calling the optimization strategy's ``should_remove_task`` method.
+If this method returns True, the task is removed.
+
+The optimization process takes a ``do_not_optimize`` argument containing a list
+of tasks that cannot be removed under any circumstances. This is used to
+"force" running specific tasks.
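
A minimal sketch of this phase, assuming the task graph is a plain dictionary mapping each label to the set of labels it depends on, and that ``should_remove_task`` is a simplified, hypothetical callable taking only a label (not the real strategy method). It uses the standard library's ``graphlib`` (Python 3.9+) to visit each task only after all of its dependents:

```python
from graphlib import TopologicalSorter

# The example graph from the diagram: label -> labels it depends on.
GRAPH = {
    "TC1": set(), "I1": set(), "TC2": set(),
    "B1": {"TC1", "I1"}, "B2": {"I1", "TC2"},
    "UP1": {"B1"}, "T1a": {"B1"}, "T1b": {"B1"},
    "T2a": {"B2"}, "T2b": {"B2"}, "UP2": {"B2"},
}


def remove_tasks(deps, should_remove_task, do_not_optimize=frozenset()):
    """Return the set of labels removed by the (hypothetical) strategy."""
    # Reverse the edges: dependents[x] holds the tasks that depend on x.
    dependents = {label: set() for label in deps}
    for label, needs in deps.items():
        for dep in needs:
            dependents[dep].add(label)

    removed = set()
    # Using `dependents` as the predecessor map makes static_order() yield
    # every task after all of the tasks that depend on it (right to left).
    for label in TopologicalSorter(dependents).static_order():
        if label in do_not_optimize:
            continue  # forced to run
        if dependents[label] - removed:
            continue  # a task that still runs depends on this one
        if should_remove_task(label):
            removed.add(label)
    return removed
```

With a strategy that removes everything except T2b, the result is UP1, T1a, T1b, B1, TC1, T2a and UP2: because T2b survives, B2 survives, and so do I1 and TC2.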
+
+Replacing Tasks
+---------------
+
+This phase begins with tasks having no dependencies and follows the reversed
+dependency graph from there -- left to right in the diagram above. If a task is
+not replaced, then anything depending on that task cannot be replaced.
+Replacement is generally done on the basis of some hash of the inputs to the
+task. In the diagram above, if both TC1 and I1 are replaced with existing
+tasks, then B1 is a candidate for replacement. But if TC2 has no replacement,
+then replacement of B2 will not be considered.
+
+It is possible to replace a task with nothing.  This is similar to optimizing
+away, but is useful for utility tasks like UP1. If such a task is considered
+for replacement, then all of its dependencies (here, B1) have already been
+replaced, so there is no point in running the task and no need for a
+replacement task.  It is an error for a task on which others depend to be
+replaced with nothing.
+
+The ``do_not_optimize`` set applies to task replacement, as does an additional
+``existing_tasks`` dictionary which allows the caller to supply a set of
+known, pre-existing tasks. This is used for action tasks, for example, where it
+contains the entire task-graph generated by the original decision task.
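
The replacement phase can be sketched the same way, walking dependencies before dependents. Here ``should_replace_task`` is a hypothetical callable (a real strategy would typically compare a hash of the task's inputs) that returns a replacement taskId, ``True`` to replace the task with nothing, or ``False`` to keep it; the taskId strings are made up:

```python
from graphlib import TopologicalSorter

# The example graph from the diagram: label -> labels it depends on.
GRAPH = {
    "TC1": set(), "I1": set(), "TC2": set(),
    "B1": {"TC1", "I1"}, "B2": {"I1", "TC2"},
    "UP1": {"B1"}, "T1a": {"B1"}, "T1b": {"B1"},
    "T2a": {"B2"}, "T2b": {"B2"}, "UP2": {"B2"},
}


def replace_tasks(deps, should_replace_task, do_not_optimize=frozenset(),
                  existing_tasks=None, removed=frozenset()):
    """Return (replaced, nothing): label -> replacement taskId, and the
    labels replaced with nothing."""
    existing_tasks = existing_tasks or {}
    replaced, nothing = {}, set()
    # With `deps` as the predecessor map, static_order() yields every task
    # after all of its dependencies (left to right in the diagram).
    for label in TopologicalSorter(deps).static_order():
        if label in removed or label in do_not_optimize:
            continue
        gone = removed | nothing
        # Only a task whose dependencies are all replaced (or gone) is a candidate.
        if any(dep not in replaced and dep not in gone for dep in deps[label]):
            continue
        if label in existing_tasks:
            replaced[label] = existing_tasks[label]
            continue
        result = should_replace_task(label)
        if result is True:
            nothing.add(label)
        elif result:
            replaced[label] = result

    # It is an error for a task that will still run to depend on a vanished task.
    for label, needs in deps.items():
        if label in removed or label in nothing or label in replaced:
            continue
        if needs & nothing:
            raise ValueError(f"{label} depends on {sorted(needs & nothing)}, "
                             "which were replaced with nothing")
    return replaced, nothing
```

If TC1 and I1 are supplied in ``existing_tasks``, B1 becomes a candidate and UP1 can then be replaced with nothing; because TC2 has no replacement, B2, T2a, T2b and UP2 are never considered.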
+
+Subgraph Generation
+-------------------
+
+The first two phases annotate each task in the existing taskgraph with its
+fate: removed, replaced, or retained. The tasks that are replaced also have a
+replacement taskId.
+
+The last phase constructs a subgraph containing the retained tasks, and
+simultaneously rewrites all dependencies to refer to taskIds instead of labels.
+To do so, it assigns a taskId to each retained task and uses the replacement
+taskId for all replaced tasks.
+
+The result is an optimized taskgraph with tasks named by taskId instead of
+label. At this phase, the edges in the task graph diverge from the
+``task.dependencies`` attributes, as the latter may contain dependencies
+outside of the taskgraph (for replacement tasks).
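
A sketch of this last phase, assuming the earlier phases produced a set of removed labels and a dict of replacement taskIds. A counter stands in for real taskId generation (real taskIds are not ``task-N`` strings), and the ``dependencies`` field is a simplified stand-in for ``task.dependencies``. The sketch keeps both views described above: edges that only connect retained tasks, and per-task dependencies that may point at replacement tasks outside the graph:

```python
import itertools

# The example graph from the diagram: label -> labels it depends on.
GRAPH = {
    "TC1": set(), "I1": set(), "TC2": set(),
    "B1": {"TC1", "I1"}, "B2": {"I1", "TC2"},
    "UP1": {"B1"}, "T1a": {"B1"}, "T1b": {"B1"},
    "T2a": {"B2"}, "T2b": {"B2"}, "UP2": {"B2"},
}


def build_subgraph(deps, removed, replaced):
    """Return (tasks, edges, label_to_taskid) for the retained tasks."""
    removed = set(removed)
    # Replaced tasks are referred to by their replacement taskId.
    label_to_taskid = dict(replaced)
    retained = [label for label in deps if label not in removed and label not in replaced]
    ids = itertools.count(1)
    for label in retained:
        label_to_taskid[label] = f"task-{next(ids)}"  # stand-in for a real taskId
    retained_ids = {label_to_taskid[label] for label in retained}

    tasks, edges = {}, set()
    for label in retained:
        if deps[label] & removed:
            raise ValueError(f"{label} depends on a removed task")
        task_id = label_to_taskid[label]
        # Dependencies are rewritten from labels to taskIds and may point at
        # replacement tasks that are not part of the new graph.
        dep_ids = {label_to_taskid[dep] for dep in deps[label]}
        tasks[task_id] = {"label": label, "dependencies": dep_ids}
        # Graph edges, in contrast, only connect retained tasks.
        edges |= {(task_id, dep_id) for dep_id in dep_ids if dep_id in retained_ids}
    return tasks, edges, label_to_taskid
```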
+
+As a side-effect, this phase also expands all ``{"task-reference": ".."}``
+objects within the task definitions.
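
A sketch of that expansion, assuming (this document does not say so) that a reference is written as ``<name>``, where ``name`` is a key of the task's ``dependencies`` mapping from names to labels. ``resolve_task_references`` and its arguments are hypothetical names used for illustration:

```python
import re

# Assumed reference syntax: <name> inside the task-reference string.
REFERENCE = re.compile(r"<([^<>]+)>")


def resolve_task_references(value, dependencies, label_to_taskid):
    """Recursively replace {"task-reference": ...} objects with plain strings."""
    if isinstance(value, list):
        return [resolve_task_references(v, dependencies, label_to_taskid) for v in value]
    if isinstance(value, dict):
        if list(value) == ["task-reference"]:
            # Each <name> becomes the taskId of the dependency with that name.
            return REFERENCE.sub(
                lambda m: label_to_taskid[dependencies[m.group(1)]],
                value["task-reference"],
            )
        return {k: resolve_task_references(v, dependencies, label_to_taskid)
                for k, v in value.items()}
    return value
```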
+
new file mode 100644
--- /dev/null
+++ b/taskcluster/docs/optimization-schedules.rst
@@ -0,0 +1,89 @@
+Optimization and SCHEDULES
+==========================
+
+Most optimization of builds and tests is handled with ``SCHEDULES``.
+The concept is this: we allocate tasks into named components, and associate a set of such components to each file in the source tree.
+Given a set of files changed in a push, we then calculate the union of components affected by each file, and remove tasks that are not tagged with any of them.
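
At its core this is a set union followed by a filter. In this sketch the per-file components and the task tags are hypothetical data; in the tree they come from ``moz.build`` files and task definitions:

```python
def scheduled_components(changed_files, components_for_file):
    """Union of the components affected by every file in the push."""
    scheduled = set()
    for path in changed_files:
        scheduled |= components_for_file[path]
    return scheduled


def remaining_tasks(task_components, scheduled):
    """Keep only the tasks tagged with at least one scheduled component."""
    return {label for label, tags in task_components.items() if tags & scheduled}
```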
+
+This optimization system is intended to be *conservative*.
+It represents what could *possibly* be affected, rather than any intuitive notion of what tasks would be useful to run for changes to a particular file.
+For example:
+
+* ``dom/url/URL.cpp`` schedules tasks on all platforms and could potentially cause failures in any test suite
+
+* ``dom/system/mac/CoreLocationLocationProvider.mm`` cannot affect any platform other than ``macosx``, but could affect any test suite
+
+* ``python/mozbuild/mozbuild/preprocessor.py`` could possibly affect any platform, and should also schedule Python lint tasks
+
+Exclusive and Inclusive
+-----------------------
+
+The first wrinkle in this "simple" plan is that there are a lot of files, and for the most part they all affect most components.
+But there are some components which are only affected by a well-defined set of files.
+For example, a Python lint component need only be scheduled when Python files are changed.
+
+We divide the components into "exclusive" and "inclusive" components.
+Absent any other configuration, any file in the repository is assumed to affect all of the exclusive components and none of the inclusive components.
+
+Exclusive components can be thought of as a series of families.
+For example, the platform (linux, windows, macosx, android) is a component family.
+The test suite (mochitest, reftest, xpcshell, etc.) is another.
+By default, source files are associated with every component in every family.
+This means tasks tagged with an exclusive component will *always* run, unless none of the modified source files are associated with that component.
+
+But what if we only want to run a particular task when a pre-determined file is modified?
+This is where inclusive components are used.
+Any task tagged with an inclusive component will *only* be run when a source file associated with that component is modified.
+Lint tasks and well-separated unit test tasks are good examples of things you might want to schedule inclusively.
+
+A good way to keep this straight is to think of exclusive platform-family components (``macosx``, ``android``, ``windows``, ``linux``) and inclusive linting components (``py-lint``, ``js-lint``).
+An arbitrary file in the repository affects all platform families, but does not necessarily require a lint run.
+But we can configure mac-only files such as ``CoreLocationLocationProvider.mm`` to affect exclusively ``macosx``, and Python files like ``preprocessor.py`` to affect ``py-lint`` in addition to the exclusive components.
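
A worked version of this example, using small hypothetical subsets of components and hard-coding the per-file configuration that would really come from ``moz.build`` files. A file starts out affecting every exclusive component and no inclusive one; configuration can narrow the former or add to the latter:

```python
# Illustrative subsets only; the real lists are defined in mozbuild.schedules.
EXCLUSIVE = frozenset({"android", "linux", "macosx", "windows", "mochitest", "reftest"})
INCLUSIVE = frozenset({"py-lint", "js-lint"})

# Hypothetical per-file configuration: path -> (exclusive override, inclusive additions).
# An override of None keeps the default of "every exclusive component".
CONFIG = {
    "dom/system/mac/CoreLocationLocationProvider.mm": ({"macosx"}, set()),
    "python/mozbuild/mozbuild/preprocessor.py": (None, {"py-lint"}),
}


def affected_components(path):
    exclusive, inclusive = CONFIG.get(path, (None, set()))
    unknown = set(inclusive) - INCLUSIVE
    if unknown:
        raise ValueError(f"not inclusive components: {sorted(unknown)}")
    if exclusive is None:
        exclusive = EXCLUSIVE  # default: all exclusive, no inclusive
    return set(exclusive) | set(inclusive)
```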
+
+It is also possible to define a file as affecting an inclusive component and nothing else.
+For example, the source code and configuration for the Python linting tasks do not affect any tasks other than linting.
+
+.. note::
+
+    Most unit test suite tasks are allocated to components for their platform family and for the test suite.
+    This indicates that if a platform family is affected (for example, ``android``) then the builds for that platform should execute as well as the full test suite.
+    If only a single suite is affected (for example, by a change to a reftest source file), then the reftests should execute for all platforms.
+
+    However, some test suites, for which the set of contributing files is well-defined, are represented as inclusive components.
+    These components will not be executed by default for any platform families, but only when one or more of the contributing files are changed.
+
+Specification
+-------------
+
+Components are defined as either inclusive or exclusive in :py:mod:`mozbuild.schedules`.
+
+File Annotation
+:::::::::::::::
+
+Files are annotated with their affected components in ``moz.build`` files with stanzas like ::
+
+    with Files('**/*.py'):
+        SCHEDULES.inclusive += ['py-lint']
+
+for inclusive components and ::
+
+    with Files('*gradle*'):
+        SCHEDULES.exclusive = ['android']
+
+for exclusive components.
+Note the use of ``+=`` for inclusive components (as this is adding to the existing set of affected components) but ``=`` for exclusive components (as this is resetting the affected set to something smaller).
+For cases where an inclusive component is affected exclusively (such as the Python lint configuration mentioned above), that component can be assigned to ``SCHEDULES.exclusive``::
+
+    with Files('**/pep8rc'):
+        SCHEDULES.exclusive = ['py-lint']
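
To make the difference between ``+=`` and ``=`` concrete, here is a rough model of how the three stanzas above could combine for a single path. It is not the ``mozbuild`` implementation: the ``Schedules`` class, the stanza functions and ``STANZAS`` are hypothetical, stanzas are simply applied in order, and ``fnmatchcase`` only approximates ``moz.build`` pattern matching (its ``*`` also matches ``/``):

```python
from fnmatch import fnmatchcase

# Illustrative subset; the real defaults come from mozbuild.schedules.
EXCLUSIVE_COMPONENTS = ("android", "linux", "macosx", "windows")


class Schedules:
    """Hypothetical model of the SCHEDULES value for one path."""

    def __init__(self):
        self.inclusive = []                          # only ever added to (+=)
        self.exclusive = list(EXCLUSIVE_COMPONENTS)  # only ever reset (=)

    @property
    def components(self):
        return sorted(set(self.inclusive) | set(self.exclusive))


def python_files(schedules):
    schedules.inclusive += ["py-lint"]


def gradle_files(schedules):
    schedules.exclusive = ["android"]


def pep8rc_file(schedules):
    schedules.exclusive = ["py-lint"]


# (pattern, stanza) pairs standing in for the `with Files(...)` blocks above.
STANZAS = [
    ("**/*.py", python_files),
    ("*gradle*", gradle_files),
    ("**/pep8rc", pep8rc_file),
]


def components_for(path):
    schedules = Schedules()
    for pattern, stanza in STANZAS:
        if fnmatchcase(path, pattern):
            stanza(schedules)
    return schedules.components
```

A Python file keeps every platform and gains ``py-lint``, while ``pep8rc`` ends up affecting ``py-lint`` alone.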
+
+Task Annotation
+:::::::::::::::
+
+Tasks are annotated with the components they belong to using the ``"skip-unless-schedules"`` optimization, which takes a list of components for this task::
+
+    task['optimization'] = {'skip-unless-schedules': ['windows', 'gtest']}
+
+For tests, this value is set automatically by the test transform based on the suite name and the platform family, doing the correct thing for inclusive test suites.
+Tests also use SETA via ``"skip-unless-schedules-or-seta"``, which skips a task if it is not affected *or* if SETA deems it unimportant.
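
A sketch of both optimizations, assuming simplified ``should_remove_task`` methods that receive the task's component list (the value of ``task['optimization']['skip-unless-schedules']``) directly; the real optimization classes may differ. ``LowValueList`` is a hypothetical stand-in for SETA, and ``Either`` shows one way to express "not affected *or* unimportant":

```python
class SkipUnlessSchedules:
    """Remove a task unless one of its components is scheduled by the push."""

    def __init__(self, scheduled):
        self.scheduled = set(scheduled)

    def should_remove_task(self, label, components):
        return not (set(components) & self.scheduled)


class LowValueList:
    """Hypothetical stand-in for SETA: remove the tasks it lists as low value."""

    def __init__(self, low_value_labels):
        self.low_value_labels = set(low_value_labels)

    def should_remove_task(self, label, components):
        return label in self.low_value_labels


class Either:
    """Remove a task if any of the wrapped strategies would remove it."""

    def __init__(self, *strategies):
        self.strategies = strategies

    def should_remove_task(self, label, components):
        return any(s.should_remove_task(label, components) for s in self.strategies)
```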
--- a/taskcluster/docs/optimization.rst
+++ b/taskcluster/docs/optimization.rst
@@ -38,82 +38,15 @@ considered for optimization. This behavi
 .. note:
 
     Because it is a mix of "what the push author wanted" and "what should run
     when necessary", try pushes with the old option syntax (``-b do -p all``,
     etc.) *do* optimize target tasks.  This can cause unexpected results when
     requested jobs are optimized away.  If those jobs were actually necessary,
     then a try push with ``try_task_config.json`` is the solution.
 
-Optimization Process
---------------------
-
-Optimization proceeds in three phases: removing tasks, replacing tasks,
-and finally generating a subgraph containing only the remaining tasks.
-
-Assume the following task graph as context for these examples::
-
-    TC1 <--\     ,- UP1
-          , B1 <--- T1a
-    I1 <-|       `- T1b
-          ` B2 <--- T2a
-    TC2 <--/     |- T2b
-                 `- UP2
-
-Removing Tasks
-::::::::::::::
-
-This phase begins with tasks on which nothing depends and follows the
-dependency graph backward from there -- right to left in the diagram above. If
-a task is not removed, then nothing it depends on will be removed either.
-Thus if T1a and T1b are both removed, B1 may be removed as well. But if T2b is
-not removed, then B2 may not be removed either.
-
-For each task with no remaining dependencies, the decision whether to remove is
-made by calling the optimization strategy's ``should_remove_task`` method. If
-this method returns True, the task is removed.
-
-The optimization process takes a ``do_not_optimize`` argument containing a list
-of tasks that cannot be removed under any circumstances. This is used to
-"force" running specific tasks.
-
-Replacing Tasks
-:::::::::::::::
+More Information
+----------------
 
-This phase begins with tasks having no dependencies and follows the reversed
-dependency graph from there -- left to right in the diagram above. If a task is
-not replaced, then anything depending on that task cannot be replaced.
-Replacement is generally done on the basis of some hash of the inputs to the
-task. In the diagram above, if both TC1 and I1 are replaced with existing
-tasks, then B1 is a candidate for replacement. But if TC2 has no replacement,
-then replacement of B2 will not be considered.
-
-It is possible to replace a task with nothing.  This is similar to optimzing
-away, but is useful for utility tasks like UP1. If such a task is considered
-for replacement, then all of its dependencies (here, B1) have already been
-replaced and there is no utility in running the task and no need for a
-replacement task.  It is an error for a task on which others depend to be
-replaced with nothing.
-
-The ``do_not_optimize`` set applies to task replacement, as does an additional
-``existing_tasks`` dictionary which allows the caller to supply as set of
-known, pre-existing tasks. This is used for action tasks, for example, where it
-contains the entire task-graph generated by the original decision task.
+.. toctree::
 
-Subgraph Generation
-:::::::::::::::::::
-
-The first two phases annotate each task in the existing taskgraph with their
-fate: removed, replaced, or retained. The tasks that are replaced also have a
-replacement taskId.
-
-The last phase constructs a subgraph containing the retained tasks, and
-simultaneously rewrites all dependencies to refer to taskIds instead of labels.
-To do so, it assigns a taskId to each retained task and uses the replacement
-taskId for all replaced tasks.
-
-The result is an optimized taskgraph with tasks named by taskId instead of
-label. At this phase, the edges in the task graph diverge from the
-``task.dependencies`` attributes, as the latter may contain dependencies
-outside of the taskgraph (for replacement tasks).
-
-As a side-effect, this phase also expands all ``{"task-reference": ".."}``
-objects within the task definitions.
+    optimization-process
+    optimization-schedules