Bug 1478472 [wpt PR 12187] - Add support for fuzzy matching in reftests., a=testonly
author: jgraham <james@hoppipolla.co.uk>
date: Tue, 26 Mar 2019 13:55:29 +0000
changeset: 467268 18fb7bd95d12a7f535be2fda194e4b5cab9620db
parent: 467267 88d4845c21396701d526fd9b1303fe715b3e065f
child: 467269 875a1979d02963e8d2188cf2a42edf34fd923f05
push id: 35796
push user: csabou@mozilla.com
push date: Mon, 01 Apr 2019 21:56:51 +0000
treeherder: mozilla-central@e8b3c73b4e32
reviewers: testonly
bugs: 1478472, 12187
milestone: 68.0a1
Bug 1478472 [wpt PR 12187] - Add support for fuzzy matching in reftests., a=testonly

Automatic update from web-platform-tests
Add support for fuzzy matching in reftests (#12187)

This allows fuzzy matching in reftests, in which a comparison can succeed if the images differ within a specified tolerance. It is useful in the case of antialiasing, and in other scenarios where it's not possible to produce an exact match on all platforms.

Differences between the test and its reference are characterised by two values:

 * The maximum difference for any pixel on any color channel (in the range 0 to 255)
 * The maximum total number of differing pixels

The fuzziness can be supplied in two places, according to whether it's a property of the test or of the implementation:

 * In the reftest itself, using a <meta name=fuzzy> tag
 * In the expectation metadata file, using a fuzzy: key that takes a list

The general format of the fuzziness specifier is

  range = [name "="] [digits, "-"], digits
  fuzziness = [url, ":"], range, ";", range
  name = "maxDifference" | "totalPixels"

The first range gives the maximum difference of any channel per pixel and the second gives the total number of differing pixels. So for example a specifier could be:

 * "maxDifference=10;totalPixels=300" - meaning a difference of exactly 10 per color channel and exactly 300 pixels different in total (all ranges are inclusive).
 * "5-10;200-300" - meaning a maximum difference of between 5 and 10 per color channel and between 200 and 300 pixels differing in total.

The definition of url is a little different between the meta element and the expectation metadata. In the first case the url is resolved against the current file, and applies to any reference in the current file with that name. So for example

  <meta name="fuzzy" content="option-1-ref.html:5;200">

would allow a fuzziness of 5 on any color channel and 200 pixels different for comparisons involving the file containing the meta element and option-1-ref.html.

In the case of expectation metadata, the metadata is always associated with the root test, so urls are always resolved relative to that. Where, as above, only a single URL is supplied, any reference document with that URL will have the fuzziness applied for whatever comparisons it's involved in, e.g.

  [test1.html]
    fuzzy: option-1-ref.html:5;200

would apply the fuzziness to any comparison involving option-1-ref.html whilst running the set of reftests rooted on test1.html. To restrict the fuzziness to an exact comparison, one can also supply a full reference pair, e.g.

  [test1.html]
    fuzzy: subtest.html==option-1-ref.html:5;200

in which case the fuzziness would only apply to a "match" comparison with subtest.html on the lhs and option-1-ref.html on the rhs (both resolved relative to test1.html).

--
wpt-commits: 1f570a686843ca10f151a79956ee16110f4a4d42
wpt-pr: 12187
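As an illustration of the specifier grammar above, here is a minimal Python sketch. The helper names parse_fuzzy and fuzzy_match are invented for this example; the real parsers live in tools/manifest/sourcefile.py and tools/wptrunner/wptrunner/manifestexpected.py in the diff below, and the check in RefTestImplementation.is_pass. The optional url prefix is omitted for brevity.

```
def parse_fuzzy(spec):
    """Parse "<range>;<range>" into ((min_diff, max_diff), (min_pixels, max_pixels))."""
    names = ["maxDifference", "totalPixels"]
    named = {}
    positional = []
    for part in spec.split(";"):
        name = None
        if "=" in part:
            name, part = [p.strip() for p in part.split("=", 1)]
            if name not in names:
                raise ValueError("%s is not a valid fuzzy property" % name)
        low, _, high = part.strip().partition("-")
        value = (int(low), int(high or low))  # a single number N means the range N-N
        if name is None:
            positional.append(value)
        else:
            named[name] = value
    # Named arguments win; remaining slots are filled positionally, in order.
    return tuple(named[n] if n in named else positional.pop(0) for n in names)


def fuzzy_match(max_per_channel, pixels_different, fuzzy):
    """A comparison counts as equal if both measured values fall inside their ranges."""
    (diff_lo, diff_hi), (count_lo, count_hi) = fuzzy
    return (diff_lo <= max_per_channel <= diff_hi and
            count_lo <= pixels_different <= count_hi)


assert parse_fuzzy("maxDifference=10;totalPixels=300") == ((10, 10), (300, 300))
assert parse_fuzzy("5-10;200-300") == ((5, 10), (200, 300))
assert fuzzy_match(7, 250, parse_fuzzy("5-10;200-300"))
```

A single number is treated as the degenerate range N-N, which is why the commit message describes "maxDifference=10;totalPixels=300" as an exact requirement.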
testing/web-platform/tests/docs/_writing-tests/reftests.md
testing/web-platform/tests/infrastructure/metadata/infrastructure/reftest/reftest_fuzzy.html.ini
testing/web-platform/tests/infrastructure/reftest/fuzzy-ref-1.html
testing/web-platform/tests/infrastructure/reftest/reftest_fuzzy.html
testing/web-platform/tests/infrastructure/reftest/reftest_fuzzy_1.html
testing/web-platform/tests/tools/manifest/item.py
testing/web-platform/tests/tools/manifest/sourcefile.py
testing/web-platform/tests/tools/manifest/tests/test_sourcefile.py
testing/web-platform/tests/tools/wptrunner/docs/expectation.rst
testing/web-platform/tests/tools/wptrunner/requirements.txt
testing/web-platform/tests/tools/wptrunner/wptrunner/executors/base.py
testing/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py
testing/web-platform/tests/tools/wptrunner/wptrunner/manifestexpected.py
testing/web-platform/tests/tools/wptrunner/wptrunner/tests/test_manifestexpected.py
testing/web-platform/tests/tools/wptrunner/wptrunner/tests/test_wpttest.py
testing/web-platform/tests/tools/wptrunner/wptrunner/wptmanifest/backends/conditional.py
testing/web-platform/tests/tools/wptrunner/wptrunner/wpttest.py
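The measured quantities those ranges are compared against are produced by RefTestImplementation.get_differences in executors/base.py (see the patch below). A rough standalone sketch of the same two measurements using Pillow, assuming the two screenshots are available as PNG bytes; the patched code instead uses ImageStat over the cropped difference image, but it computes essentially the same quantities.

```
import io
from PIL import Image, ImageChops


def get_differences(lhs_png, rhs_png):
    """Return (max per-channel difference, count of differing pixels)."""
    lhs = Image.open(io.BytesIO(lhs_png)).convert("RGB")
    rhs = Image.open(io.BytesIO(rhs_png)).convert("RGB")
    diff = ImageChops.difference(lhs, rhs)  # per-pixel |lhs - rhs|
    max_per_channel = max(hi for _, hi in diff.getextrema())
    pixels_different = sum(1 for px in diff.getdata() if px != (0, 0, 0))
    return max_per_channel, pixels_different
```

A "==" comparison then passes when both values fall inside their allowed ranges; a "!=" comparison passes when they do not.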
--- a/testing/web-platform/tests/docs/_writing-tests/reftests.md
+++ b/testing/web-platform/tests/docs/_writing-tests/reftests.md
@@ -107,20 +107,77 @@ attribute specified on the root element.
 screenshot to be delayed until the `load` event has fired and the
 `reftest-wait` class has been removed from the root element. Note that
 in neither case is exact timing of the screenshot guaranteed: it is
 only guaranteed to be after those events.
 
 ## Fuzzy Matching
 
 In some situations a test may have subtle differences in rendering
-compared to the reference due to, e.g., anti-aliasing. This may cause
-the test to pass on some platforms but fail on others. In this case
-some affordance for subtle discrepancies is desirable. However no
-mechanism to allow this has yet been standardized.
+compared to the reference due to, e.g., anti-aliasing. To allow for
+these small differences, we allow tests to specify a fuzziness
+characterised by two parameters, both of which must be specified:
+
+ * A maximum difference in the per-channel color value for any pixel.
+ * A number of total pixels that may be different.
+
+The maximum difference in the per pixel color value is formally
+defined as follows: let <code>T<sub>x,y,c</sub></code> be the value of
+colour channel `c` at pixel coordinates `x`, `y` in the test image and
+<code>R<sub>x,y,c</sub></code> be the corresponding value in the
+reference image, and let <code>width</code> and <code>height</code> be
+the dimensions of the image in pixels. Then <code>maxDifference =
+max<sub>x=[0,width) y=[0,height), c={r,g,b}</sub>(|T<sub>x,y,c</sub> -
+R<sub>x,y,c</sub>|)</code>.
+
+To specify the fuzziness one may add a `<meta name=fuzzy>` element to
+the test file (or, in the case of more complex tests, to any page
+containing the `<link rel=[mis]match>` elements). In the simplest case
+this has a `content` attribute containing the parameters above,
+separated by a semicolon e.g.
+
+```
+<meta name=fuzzy content="maxDifference=15;totalPixels=300">
+```
+
+would allow for a difference of exactly 15 / 255 on any color channel
+and exactly 300 pixels total difference. The argument names are optional
+and may be elided; the above is the same as:
+
+```
+<meta name=fuzzy content="15;300">
+```
+
+The values may also be given as ranges e.g.
+
+```
+<meta name=fuzzy content="maxDifference=10-15;totalPixels=200-300">
+```
+
+or
+
+```
+<meta name=fuzzy content="10-15;200-300">
+```
+
+In this case the maximum per-channel difference must be in the range
+`10-15` and the total number of differing pixels must be in the range
+`200-300`.
+
+In cases where a single test has multiple possible refs and the
+fuzziness is not the same for all refs, a ref may be specified by
+prefixing the `content` value with the relative url for the ref e.g.
+
+```
+<meta name=fuzzy content="option1-ref.html:10-15;200-300">
+```
+
+One meta element is required per reference requiring a unique
+fuzziness value, but any unprefixed value will automatically be
+applied to any ref that doesn't have a more specific value.
 
 ## Limitations
 
 In some cases, a test cannot be a reftest. For example, there is no
 way to create a reference for underlining, since the position and
 thickness of the underline depends on the UA, the font, and/or the
 platform. However, once it's established that underlining an inline
 element works, it's possible to construct a reftest for underlining
new file mode 100644
--- /dev/null
+++ b/testing/web-platform/tests/infrastructure/metadata/infrastructure/reftest/reftest_fuzzy.html.ini
@@ -0,0 +1,2 @@
+[reftest_fuzzy.html]
+  fuzzy: fuzzy-ref-1.html:maxDifference=255;100-100
new file mode 100644
--- /dev/null
+++ b/testing/web-platform/tests/infrastructure/reftest/fuzzy-ref-1.html
@@ -0,0 +1,9 @@
+<!DOCTYPE html>
+<style>
+div {
+  width: 100px;
+  height: 100px;
+  background-color: green;
+}
+</style>
+<div></div>
new file mode 100644
--- /dev/null
+++ b/testing/web-platform/tests/infrastructure/reftest/reftest_fuzzy.html
@@ -0,0 +1,13 @@
+<!DOCTYPE html>
+<link rel=match href=fuzzy-ref-1.html>
+<!-- This meta is overridden in the corresponding ini file -->
+<meta name=fuzzy content="fuzzy-ref-1.html:128;100">
+<style>
+div {
+  width: 99px;
+  height: 100px;
+  background-color: green;
+}
+</style>
+<div></div>
+
new file mode 100644
--- /dev/null
+++ b/testing/web-platform/tests/infrastructure/reftest/reftest_fuzzy_1.html
@@ -0,0 +1,12 @@
+<!DOCTYPE html>
+<link rel=match href=fuzzy-ref-1.html>
+<meta name=fuzzy content="fuzzy-ref-1.html:255;100">
+<style>
+div {
+  width: 99px;
+  height: 100px;
+  background-color: green;
+}
+</style>
+<div></div>
+
--- a/testing/web-platform/tests/tools/manifest/item.py
+++ b/testing/web-platform/tests/tools/manifest/item.py
@@ -1,10 +1,10 @@
 from copy import copy
-
+from six import iteritems
 from six.moves.urllib.parse import urljoin, urlparse
 from abc import ABCMeta, abstractproperty
 
 item_types = {}
 
 
 class ManifestItemMeta(ABCMeta):
     """Custom metaclass that registers all the subclasses in the
@@ -164,28 +164,38 @@ class RefTestBase(URLManifestItem):
     @property
     def viewport_size(self):
         return self._extras.get("viewport_size")
 
     @property
     def dpi(self):
         return self._extras.get("dpi")
 
+    @property
+    def fuzzy(self):
+        rv = self._extras.get("fuzzy", [])
+        if isinstance(rv, list):
+            return {tuple(item[0]): item[1]
+                    for item in self._extras.get("fuzzy", [])}
+        return rv
+
     def meta_key(self):
         return (self.timeout, self.viewport_size, self.dpi)
 
     def to_json(self):
         rv = [self.url, self.references, {}]
         extras = rv[-1]
         if self.timeout is not None:
             extras["timeout"] = self.timeout
         if self.viewport_size is not None:
             extras["viewport_size"] = self.viewport_size
         if self.dpi is not None:
             extras["dpi"] = self.dpi
+        if self.fuzzy:
+            extras["fuzzy"] = list(iteritems(self.fuzzy))
         return rv
 
     @classmethod
     def from_json(cls, manifest, path, obj):
         url, references, extras = obj
         return cls(manifest.tests_root,
                    path,
                    manifest.url_base,
--- a/testing/web-platform/tests/tools/manifest/sourcefile.py
+++ b/testing/web-platform/tests/tools/manifest/sourcefile.py
@@ -1,11 +1,12 @@
 import hashlib
 import re
 import os
+from collections import deque
 from six import binary_type
 from six.moves.urllib.parse import urljoin
 from fnmatch import fnmatch
 try:
     from xml.etree import cElementTree as ElementTree
 except ImportError:
     from xml.etree import ElementTree
 
@@ -448,16 +449,89 @@ class SourceFile(object):
             return None
 
         if not self.dpi_nodes:
             return None
 
         return self.dpi_nodes[0].attrib.get("content", None)
 
     @cached_property
+    def fuzzy_nodes(self):
+        """List of ElementTree Elements corresponding to nodes in a test that
+        specify reftest fuzziness"""
+        return self.root.findall(".//{http://www.w3.org/1999/xhtml}meta[@name='fuzzy']")
+
+    @cached_property
+    def fuzzy(self):
+        rv = {}
+        if self.root is None:
+            return rv
+
+        if not self.fuzzy_nodes:
+            return rv
+
+        args = ["maxDifference", "totalPixels"]
+
+        for node in self.fuzzy_nodes:
+            item = node.attrib.get("content", "")
+
+            parts = item.rsplit(":", 1)
+            if len(parts) == 1:
+                key = None
+                value = parts[0]
+            else:
+                key = urljoin(self.url, parts[0])
+                reftype = None
+                for ref in self.references:
+                    if ref[0] == key:
+                        reftype = ref[1]
+                        break
+                if reftype not in ("==", "!="):
+                    raise ValueError("Fuzzy key %s doesn't correspond to a reference" % key)
+                key = (self.url, key, reftype)
+                value = parts[1]
+            ranges = value.split(";")
+            if len(ranges) != 2:
+                raise ValueError("Malformed fuzzy value %s" % item)
+            arg_values = {None: deque()}
+            for range_str_value in ranges:
+                if "=" in range_str_value:
+                    name, range_str_value = [part.strip()
+                                             for part in range_str_value.split("=", 1)]
+                    if name not in args:
+                        raise ValueError("%s is not a valid fuzzy property" % name)
+                    if arg_values.get(name):
+                        raise ValueError("Got multiple values for argument %s" % name)
+                else:
+                    name = None
+                if "-" in range_str_value:
+                    range_min, range_max = range_str_value.split("-")
+                else:
+                    range_min = range_str_value
+                    range_max = range_str_value
+                try:
+                    range_value = [int(x.strip()) for x in (range_min, range_max)]
+                except ValueError:
+                    raise ValueError("Fuzzy value %s must be a range of integers" %
+                                     range_str_value)
+                if name is None:
+                    arg_values[None].append(range_value)
+                else:
+                    arg_values[name] = range_value
+            rv[key] = []
+            for arg_name in args:
+                if arg_values.get(arg_name):
+                    value = arg_values.pop(arg_name)
+                else:
+                    value = arg_values[None].popleft()
+                rv[key].append(value)
+            assert list(arg_values.keys()) == [None] and len(arg_values[None]) == 0
+        return rv
+
+    @cached_property
     def testharness_nodes(self):
         """List of ElementTree Elements corresponding to nodes representing a
         testharness.js script"""
         return self.root.findall(".//{http://www.w3.org/1999/xhtml}script[@src='/resources/testharness.js']")
 
     @cached_property
     def content_is_testharness(self):
         """Boolean indicating whether the file content represents a
@@ -744,17 +818,18 @@ class SourceFile(object):
                 RefTestNode(
                     self.tests_root,
                     self.rel_path,
                     self.url_base,
                     self.rel_url,
                     references=self.references,
                     timeout=self.timeout,
                     viewport_size=self.viewport_size,
-                    dpi=self.dpi
+                    dpi=self.dpi,
+                    fuzzy=self.fuzzy
                 )]
 
         elif self.content_is_css_visual and not self.name_is_reference:
             rv = VisualTest.item_type, [
                 VisualTest(
                     self.tests_root,
                     self.rel_path,
                     self.url_base,
--- a/testing/web-platform/tests/tools/manifest/tests/test_sourcefile.py
+++ b/testing/web-platform/tests/tools/manifest/tests/test_sourcefile.py
@@ -784,8 +784,46 @@ test()"""
                                             u'/_fake_base/html/test.any.serviceworker.html',
                                             u'/_fake_base/html/test.any.serviceworker.html?wss',
                                             u'/_fake_base/html/test.any.sharedworker.html',
                                             u'/_fake_base/html/test.any.sharedworker.html?wss',
                                             u'/_fake_base/html/test.any.worker.html',
                                             u'/_fake_base/html/test.any.worker.html?wss']
 
     assert items[0].url_base == "/_fake_base/"
+
+
+@pytest.mark.parametrize("fuzzy, expected", [
+    (b"ref.html:1;200", {("/foo/test.html", "/foo/ref.html", "=="): [[1, 1], [200, 200]]}),
+    (b"ref.html:0-1;100-200", {("/foo/test.html", "/foo/ref.html", "=="): [[0, 1], [100, 200]]}),
+    (b"0-1;100-200", {None: [[0,1], [100, 200]]}),
+    (b"maxDifference=1;totalPixels=200", {None: [[1, 1], [200, 200]]}),
+    (b"totalPixels=200;maxDifference=1", {None: [[1, 1], [200, 200]]}),
+    (b"totalPixels=200;1", {None: [[1, 1], [200, 200]]}),
+    (b"maxDifference=1;200", {None: [[1, 1], [200, 200]]}),])
+def test_reftest_fuzzy(fuzzy, expected):
+    content = b"""<link rel=match href=ref.html>
+<meta name=fuzzy content="%s">
+""" % fuzzy
+
+    s = create("foo/test.html", content)
+
+    assert s.content_is_ref_node
+    assert s.fuzzy == expected
+
+
+@pytest.mark.parametrize("fuzzy, expected", [
+    ([b"1;200"], {None: [[1, 1], [200, 200]]}),
+    ([b"ref-2.html:0-1;100-200"], {("/foo/test.html", "/foo/ref-2.html", "=="): [[0, 1], [100, 200]]}),
+    ([b"1;200", b"ref-2.html:0-1;100-200"],
+     {None: [[1, 1], [200, 200]],
+      ("/foo/test.html", "/foo/ref-2.html", "=="): [[0,1], [100, 200]]})])
+def test_reftest_fuzzy_multi(fuzzy, expected):
+    content = b"""<link rel=match href=ref-1.html>
+<link rel=match href=ref-2.html>
+"""
+    for item in fuzzy:
+        content += b'\n<meta name=fuzzy content="%s">' % item
+
+    s = create("foo/test.html", content)
+
+    assert s.content_is_ref_node
+    assert s.fuzzy == expected
--- a/testing/web-platform/tests/tools/wptrunner/docs/expectation.rst
+++ b/testing/web-platform/tests/tools/wptrunner/docs/expectation.rst
@@ -185,33 +185,39 @@ When used for expectation data, manifest
  * A section per test URL described by the manifest, with the section
    heading being the part of the test URL following the last ``/`` in
    the path (this allows multiple tests in a single manifest file with
    the same path part of the URL, but different query parts).
 
  * A subsection per subtest, with the heading being the title of the
    subtest.
 
- * A key ``type`` indicating the test type. This takes the values
-   ``testharness`` and ``reftest``.
-
- * For reftests, keys ``reftype`` indicating the reference type
-   (``==`` or ``!=``) and ``refurl`` indicating the URL of the
-   reference.
-
  * A key ``expected`` giving the expectation value of each (sub)test.
 
  * A key ``disabled`` which can be set to any value to indicate that
    the (sub)test is disabled and should either not be run (for tests)
    or that its results should be ignored (subtests).
 
  * A key ``restart-after`` which can be set to any value to indicate that
    the runner should restart the browser after running this test (e.g. to
    clear out unwanted state).
 
+ * A key ``fuzzy`` that is used for reftests. This is interpreted as a
+   list of entries with the same format as the ``<meta name=fuzzy>``
+   content value: an optional reference identifier followed by a
+   colon, then a range indicating the maximum permitted per-channel
+   pixel difference, then a semicolon, then a range indicating the
+   maximum permitted total number of differing pixels. The reference
+   identifier is either a single relative URL, resolved against the
+   base test URL, in which case the fuzziness applies to any
+   comparison with that URL, or takes the form lhs url, comparison,
+   rhs url, in which case the fuzziness only applies to the
+   comparison involving that specific pair of URLs. Some illustrative
+   examples are given below.
+
  * Variables ``debug``, ``os``, ``version``, ``processor`` and
    ``bits`` that describe the configuration of the browser under
    test. ``debug`` is a boolean indicating whether a build is a debug
    build. ``os`` is a string indicating the operating system, and
    ``version`` a string indicating the particular version of that
    operating system. ``processor`` is a string indicating the
    processor architecture and ``bits`` an integer indicating the
    number of bits. This information is typically provided by
@@ -241,8 +247,23 @@ A more complex manifest with conditional
   [canvas_test.html]
     expected:
       if os == "osx": FAIL
       if os == "windows" and version == "XP": FAIL
       PASS
 
 Note that ``PASS`` in the above works, but is unnecessary; ``PASS``
 (or ``OK``) is always the default expectation for (sub)tests.
+
+A manifest with fuzzy reftest values might be::
+
+  [reftest.html]
+    fuzzy: [10;200, ref1.html:20;200-300, subtest1.html==ref2.html:10-15;20]
+
+In this case the default fuzziness for any comparison would require a
+maximum difference per channel of exactly 10 and exactly 200 total
+pixels different (a single value denotes an exact, inclusive range).
+For any comparison involving ref1.html on the right hand side, the
+limits would instead be a difference per channel of exactly 20 and a
+total difference count of not less than 200 and not more than 300. For
+the specific comparison subtest1.html == ref2.html (both resolved
+against the test URL) these limits would instead be 10 to 15 and
+exactly 20, respectively.
--- a/testing/web-platform/tests/tools/wptrunner/requirements.txt
+++ b/testing/web-platform/tests/tools/wptrunner/requirements.txt
@@ -1,5 +1,7 @@
 html5lib == 1.0.1
 mozinfo == 0.10
 mozlog==4.0
 mozdebug==0.1.1
+pillow == 5.2.0
 urllib3[secure]==1.24.1
+
--- a/testing/web-platform/tests/tools/wptrunner/wptrunner/executors/base.py
+++ b/testing/web-platform/tests/tools/wptrunner/wptrunner/executors/base.py
@@ -1,18 +1,21 @@
 import base64
 import hashlib
 import httplib
+import io
 import os
 import threading
 import traceback
 import socket
 import urlparse
 from abc import ABCMeta, abstractmethod
 
+from PIL import Image, ImageChops, ImageStat
+
 from ..testrunner import Stop
 from protocol import Protocol, BaseProtocolPart
 
 here = os.path.split(__file__)[0]
 
 # Extra timeout to use after internal test timeout at which the harness
 # should force a timeout
 extra_timeout = 5  # seconds
@@ -281,58 +284,80 @@ class RefTestImplementation(object):
         if key not in self.screenshot_cache:
             success, data = self.executor.screenshot(test, viewport_size, dpi)
 
             if not success:
                 return False, data
 
             screenshot = data
             hash_value = hash_screenshot(data)
-
-            self.screenshot_cache[key] = (hash_value, None)
+            self.screenshot_cache[key] = (hash_value, screenshot)
 
             rv = (hash_value, screenshot)
         else:
             rv = self.screenshot_cache[key]
 
         self.message.append("%s %s" % (test.url, rv[0]))
         return True, rv
 
     def reset(self):
         self.screenshot_cache.clear()
 
-    def is_pass(self, lhs_hash, rhs_hash, relation):
+    def is_pass(self, hashes, screenshots, relation, fuzzy):
         assert relation in ("==", "!=")
-        self.message.append("Testing %s %s %s" % (lhs_hash, relation, rhs_hash))
-        return ((relation == "==" and lhs_hash == rhs_hash) or
-                (relation == "!=" and lhs_hash != rhs_hash))
+        if not fuzzy or fuzzy == ((0,0), (0,0)):
+            equal = hashes[0] == hashes[1]
+        else:
+            max_per_channel, pixels_different = self.get_differences(screenshots)
+            allowed_per_channel, allowed_different = fuzzy
+            self.logger.info("Allowed %s pixels different, maximum difference per channel %s" %
+                             ("-".join(str(item) for item in allowed_different),
+                              "-".join(str(item) for item in allowed_per_channel)))
+            equal = (allowed_per_channel[0] <= max_per_channel <= allowed_per_channel[1] and
+                     allowed_different[0] <= pixels_different <= allowed_different[1])
+        return equal if relation == "==" else not equal
+
+    def get_differences(self, screenshots):
+        lhs = Image.open(io.BytesIO(base64.b64decode(screenshots[0]))).convert("RGB")
+        rhs = Image.open(io.BytesIO(base64.b64decode(screenshots[1]))).convert("RGB")
+        diff = ImageChops.difference(lhs, rhs)
+        minimal_diff = diff.crop(diff.getbbox())
+        mask = minimal_diff.convert("L", dither=None)
+        stat = ImageStat.Stat(minimal_diff, mask)
+        per_channel = max(item[1] for item in stat.extrema)
+        count = stat.count[0]
+        self.logger.info("Found %s pixels different, maximum difference per channel %s" %
+                         (count, per_channel))
+        return per_channel, count
 
     def run_test(self, test):
         viewport_size = test.viewport_size
         dpi = test.dpi
         self.message = []
 
         # Depth-first search of reference tree, with the goal
         # of reachings a leaf node with only pass results
 
         stack = list(((test, item[0]), item[1]) for item in reversed(test.references))
         while stack:
             hashes = [None, None]
             screenshots = [None, None]
 
             nodes, relation = stack.pop()
+            fuzzy = self.get_fuzzy(test, nodes, relation)
 
             for i, node in enumerate(nodes):
                 success, data = self.get_hash(node, viewport_size, dpi)
                 if success is False:
                     return {"status": data[0], "message": data[1]}
 
                 hashes[i], screenshots[i] = data
 
-            if self.is_pass(hashes[0], hashes[1], relation):
+            if self.is_pass(hashes, screenshots, relation, fuzzy):
+                fuzzy = self.get_fuzzy(test, nodes, relation)
                 if nodes[1].references:
                     stack.extend(list(((nodes[1], item[0]), item[1]) for item in reversed(nodes[1].references)))
                 else:
                     # We passed
                     return {"status":"PASS", "message": None}
 
         # We failed, so construct a failure message
 
@@ -347,16 +372,35 @@ class RefTestImplementation(object):
             relation,
             {"url": nodes[1].url, "screenshot": screenshots[1], "hash": hashes[1]},
         ]
 
         return {"status": "FAIL",
                 "message": "\n".join(self.message),
                 "extra": {"reftest_screenshots": log_data}}
 
+    def get_fuzzy(self, root_test, test_nodes, relation):
+        full_key = tuple([item.url for item in test_nodes] + [relation])
+        ref_only_key = test_nodes[1].url
+
+        fuzzy_override = root_test.fuzzy_override
+        fuzzy = test_nodes[0].fuzzy
+
+        sources = [fuzzy_override, fuzzy]
+        keys = [full_key, ref_only_key, None]
+        value = None
+        for source in sources:
+            for key in keys:
+                if key in source:
+                    value = source[key]
+                    break
+            if value:
+                break
+        return value
+
     def retake_screenshot(self, node, viewport_size, dpi):
         success, data = self.executor.screenshot(node, viewport_size, dpi)
         if not success:
             return False, data
 
         key = (node.url, viewport_size, dpi)
         hash_val, _ = self.screenshot_cache[key]
         self.screenshot_cache[key] = hash_val, data
--- a/testing/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py
+++ b/testing/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py
@@ -841,17 +841,17 @@ class MarionetteRefTestExecutor(RefTestE
         screenshot = protocol.marionette.screenshot(full=False)
         # strip off the data:img/png, part of the url
         if screenshot.startswith("data:image/png;base64,"):
             screenshot = screenshot.split(",", 1)[1]
 
         return screenshot
 
 
-class InternalRefTestImplementation(object):
+class InternalRefTestImplementation(RefTestImplementation):
     def __init__(self, executor):
         self.timeout_multiplier = executor.timeout_multiplier
         self.executor = executor
 
     @property
     def logger(self):
         return self.executor.logger
 
@@ -865,31 +865,32 @@ class InternalRefTestImplementation(obje
         self.executor.protocol.marionette._send_message("reftest:setup", data)
 
     def reset(self, screenshot=None):
         # this is obvious wrong; it shouldn't be a no-op
         # see https://github.com/web-platform-tests/wpt/issues/15604
         pass
 
     def run_test(self, test):
-        references = self.get_references(test)
+        references = self.get_references(test, test)
         timeout = (test.timeout * 1000) * self.timeout_multiplier
         rv = self.executor.protocol.marionette._send_message("reftest:run",
                                                              {"test": self.executor.test_url(test),
                                                               "references": references,
                                                               "expected": test.expected(),
                                                               "timeout": timeout,
                                                               "width": 800,
                                                               "height": 600})["value"]
         return rv
 
-    def get_references(self, node):
+    def get_references(self, root_test, node):
         rv = []
         for item, relation in node.references:
-            rv.append([self.executor.test_url(item), self.get_references(item), relation])
+            rv.append([self.executor.test_url(item), self.get_references(root_test, item), relation,
+                       {"fuzzy": self.get_fuzzy(root_test, [node, item], relation)}])
         return rv
 
     def teardown(self):
         try:
             if self.executor.protocol.marionette and self.executor.protocol.marionette.session_id:
                 self.executor.protocol.marionette._send_message("reftest:teardown", {})
                 self.executor.protocol.marionette.set_context(self.executor.protocol.marionette.CONTEXT_CONTENT)
                 # the reftest runner opens/closes a window with focus, so as
--- a/testing/web-platform/tests/tools/wptrunner/wptrunner/manifestexpected.py
+++ b/testing/web-platform/tests/tools/wptrunner/wptrunner/manifestexpected.py
@@ -1,10 +1,11 @@
 import os
 import urlparse
+from collections import deque
 
 from wptmanifest.backends import static
 from wptmanifest.backends.static import ManifestItem
 
 import expected
 
 """Manifest structure used to store expected results of a test.
 
@@ -92,16 +93,115 @@ def leak_threshold(node):
         for item in node_items:
             process, value = item.rsplit(":", 1)
             rv[process.strip()] = int(value.strip())
     except KeyError:
         pass
     return rv
 
 
+def fuzzy_prop(node):
+    """Fuzzy reftest match
+
+    This can either be a list of strings or a single string. When a list is
+    supplied, the format of each item matches the description below.
+
+    The general format is
+    fuzzy = [key ":"] <prop> ";" <prop>
+    key = <test name> [reftype <reference name>]
+    reftype = "==" | "!="
+    prop = [propName "=" ] range
+    propName = "maxDifference" | "totalPixels"
+    range = <digits> ["-" <digits>]
+
+    So for example:
+      maxDifference=10;totalPixels=10-20
+
+      specifies that for any test/ref pair for which no other rule is supplied,
+      there must be a maximum pixel difference of exactly 10, and between 10 and
+      20 total pixels different.
+
+      test.html==ref.htm:10;20
+
+      specifies that for an equality comparison between test.html and ref.htm,
+      resolved relative to the test path, there can be a maximum difference
+      of 10 in the pixel value for any channel and 20 pixels total difference.
+
+      ref.html:10;20
+
+      is just like the above but applies to any comparison involving ref.html
+      on the right hand side.
+
+    The return format is [(key, (maxDifferenceRange, totalPixelsRange))], where
+    the key is either None where no specific reference is specified, the reference
+    name where there is only one component or a tuple (test, ref, reftype) when the
+    exact comparison is specified. maxDifferenceRange and totalPixelsRange are tuples
+    of integers indicating the inclusive range of allowed values.
+"""
+    rv = []
+    args = ["maxDifference", "totalPixels"]
+    try:
+        value = node.get("fuzzy")
+    except KeyError:
+        return rv
+    if not isinstance(value, list):
+        value = [value]
+    for item in value:
+        if not isinstance(item, (str, unicode)):
+            rv.append(item)
+            continue
+        parts = item.rsplit(":", 1)
+        if len(parts) == 1:
+            key = None
+            fuzzy_values = parts[0]
+        else:
+            key, fuzzy_values = parts
+            for reftype in ["==", "!="]:
+                if reftype in key:
+                    key = key.split(reftype)
+                    key.append(reftype)
+                    key = tuple(key)
+        ranges = fuzzy_values.split(";")
+        if len(ranges) != 2:
+            raise ValueError("Malformed fuzzy value %s" % item)
+        arg_values = {None: deque()}
+        for range_str_value in ranges:
+            if "=" in range_str_value:
+                name, range_str_value = [part.strip()
+                                         for part in range_str_value.split("=", 1)]
+                if name not in args:
+                    raise ValueError("%s is not a valid fuzzy property" % name)
+                if arg_values.get(name):
+                    raise ValueError("Got multiple values for argument %s" % name)
+            else:
+                name = None
+            if "-" in range_str_value:
+                range_min, range_max = range_str_value.split("-")
+            else:
+                range_min = range_str_value
+                range_max = range_str_value
+            try:
+                range_value = tuple(int(item.strip()) for item in (range_min, range_max))
+            except ValueError:
+                raise ValueError("Fuzzy value %s must be a range of integers" % range_str_value)
+            if name is None:
+                arg_values[None].append(range_value)
+            else:
+                arg_values[name] = range_value
+        range_values = []
+        for arg_name in args:
+            if arg_values.get(arg_name):
+                value = arg_values.pop(arg_name)
+            else:
+                value = arg_values[None].popleft()
+            range_values.append(value)
+        rv.append((key, tuple(range_values)))
+    return rv
+
+
 class ExpectedManifest(ManifestItem):
     def __init__(self, name, test_path, url_base):
         """Object representing all the tests in a particular manifest
 
         :param name: Name of the AST Node associated with this object.
                      Should always be None since this should always be associated with
                      the root node of the AST.
         :param test_path: Path of the test file associated with this manifest.
@@ -178,16 +278,20 @@ class ExpectedManifest(ManifestItem):
     @property
     def leak_threshold(self):
         return leak_threshold(self)
 
     @property
     def lsan_max_stack_depth(self):
         return int_prop("lsan-max-stack-depth", self)
 
+    @property
+    def fuzzy(self):
+        return fuzzy_prop(self)
+
 
 class DirectoryManifest(ManifestItem):
     @property
     def disabled(self):
         return bool_prop("disabled", self)
 
     @property
     def restart_after(self):
@@ -224,16 +328,21 @@ class DirectoryManifest(ManifestItem):
     @property
     def leak_threshold(self):
         return leak_threshold(self)
 
     @property
     def lsan_max_stack_depth(self):
         return int_prop("lsan-max-stack-depth", self)
 
+    @property
+    def fuzzy(self):
+        return fuzzy_prop(self)
+
+
 class TestNode(ManifestItem):
     def __init__(self, name):
         """Tree node associated with a particular test in a manifest
 
         :param name: name of the test"""
         assert name is not None
         ManifestItem.__init__(self, name)
         self.updated_expected = []
@@ -296,16 +405,20 @@ class TestNode(ManifestItem):
     @property
     def leak_threshold(self):
         return leak_threshold(self)
 
     @property
     def lsan_max_stack_depth(self):
         return int_prop("lsan-max-stack-depth", self)
 
+    @property
+    def fuzzy(self):
+        return fuzzy_prop(self)
+
     def append(self, node):
         """Add a subtest to the current test
 
         :param node: AST Node associated with the subtest"""
         child = ManifestItem.append(self, node)
         self.subtests[child.name] = child
 
     def get_subtest(self, name):
new file mode 100644
--- /dev/null
+++ b/testing/web-platform/tests/tools/wptrunner/wptrunner/tests/test_manifestexpected.py
@@ -0,0 +1,38 @@
+import os
+import sys
+from io import BytesIO
+
+import pytest
+
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", ".."))
+
+from wptrunner import manifestexpected
+
+
+@pytest.mark.parametrize("fuzzy, expected", [
+    (b"ref.html:1;200", [("ref.html", ((1, 1), (200, 200)))]),
+    (b"ref.html:0-1;100-200", [("ref.html", ((0, 1), (100, 200)))]),
+    (b"0-1;100-200", [(None, ((0, 1), (100, 200)))]),
+    (b"maxDifference=1;totalPixels=200", [(None, ((1, 1), (200, 200)))]),
+    (b"totalPixels=200;maxDifference=1", [(None, ((1, 1), (200, 200)))]),
+    (b"totalPixels=200;1", [(None, ((1, 1), (200, 200)))]),
+    (b"maxDifference=1;200", [(None, ((1, 1), (200, 200)))]),
+    (b"test.html==ref.html:maxDifference=1;totalPixels=200",
+     [((u"test.html", u"ref.html", "=="), ((1, 1), (200, 200)))]),
+    (b"test.html!=ref.html:maxDifference=1;totalPixels=200",
+     [((u"test.html", u"ref.html", "!="), ((1, 1), (200, 200)))]),
+    (b"[test.html!=ref.html:maxDifference=1;totalPixels=200, test.html==ref1.html:maxDifference=5-10;100]",
+     [((u"test.html", u"ref.html", "!="), ((1, 1), (200, 200))),
+      ((u"test.html", u"ref1.html", "=="), ((5,10), (100, 100)))]),
+])
+def test_fuzzy(fuzzy, expected):
+    data = """
+[test.html]
+  fuzzy: %s""" % fuzzy
+    f = BytesIO(data)
+    manifest = manifestexpected.static.compile(f,
+                                               {},
+                                               data_cls_getter=manifestexpected.data_cls_getter,
+                                               test_path="test/test.html",
+                                               url_base="/")
+    assert manifest.get_test("/test/test.html").fuzzy == expected
--- a/testing/web-platform/tests/tools/wptrunner/wptrunner/tests/test_wpttest.py
+++ b/testing/web-platform/tests/tools/wptrunner/wptrunner/tests/test_wpttest.py
@@ -1,16 +1,17 @@
 import os
 import sys
 from io import BytesIO
 
 from mock import Mock
 
 sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", ".."))
 
+from manifest import manifest as wptmanifest
 from manifest.item import TestharnessTest
 from wptrunner import manifestexpected, wpttest
 
 dir_ini_0 = """\
 prefs: [a:b]
 """
 
 dir_ini_1 = """\
@@ -39,16 +40,21 @@ test_1 = """\
     if os == 'win': FAIL
 """
 
 test_2 = """\
 [2.html]
   lsan-max-stack-depth: 42
 """
 
+test_fuzzy = """\
+[fuzzy.html]
+  fuzzy: fuzzy-ref.html:1;200
+"""
+
 
 testharness_test = """<script src="/resources/testharness.js"></script>
 <script src="/resources/testharnessreport.js"></script>"""
 
 
 def make_mock_manifest(*items):
     rv = Mock(tests_root="/foobar")
     tests = []
@@ -134,8 +140,31 @@ def test_metadata_lsan_stack_depth():
             {},
             data_cls_getter=lambda x,y: manifestexpected.DirectoryManifest)
     ]
 
     test = tests[0][2].pop()
     test_obj = wpttest.from_manifest(tests, test, inherit_metadata, test_metadata.get_test(test.id))
 
     assert test_obj.lsan_max_stack_depth == 42
+
+
+def test_metadata_fuzzy():
+    manifest_data = {
+        "items": {"reftest": {"a/fuzzy.html": [["/a/fuzzy.html",
+                                                [["/a/fuzzy-ref.html", "=="]],
+                                                {"fuzzy": [[["/a/fuzzy.html", '/a/fuzzy-ref.html', '=='],
+                                                            [[2, 3], [10, 15]]]]}]]}},
+        "paths": {"a/fuzzy.html": ["0"*40, "reftest"]},
+        "version": wptmanifest.CURRENT_VERSION,
+        "url_base": "/"}
+    manifest = wptmanifest.Manifest.from_json(".", manifest_data)
+    test_metadata = manifestexpected.static.compile(BytesIO(test_fuzzy),
+                                                    {},
+                                                    data_cls_getter=manifestexpected.data_cls_getter,
+                                                    test_path="a/fuzzy.html",
+                                                    url_base="/")
+
+    test = manifest.iterpath("a/fuzzy.html").next()
+    test_obj = wpttest.from_manifest(manifest, test, [], test_metadata.get_test(test.id))
+
+    assert test_obj.fuzzy == {('/a/fuzzy.html', '/a/fuzzy-ref.html', '=='): [[2, 3], [10, 15]]}
+    assert test_obj.fuzzy_override == {'/a/fuzzy-ref.html': ((1, 1), (200, 200))}
--- a/testing/web-platform/tests/tools/wptrunner/wptrunner/wptmanifest/backends/conditional.py
+++ b/testing/web-platform/tests/tools/wptrunner/wptrunner/wptmanifest/backends/conditional.py
@@ -336,16 +336,18 @@ class ManifestItem(object):
         for item in self._flatten().iteritems():
             yield item
 
     def iterkeys(self):
         for item in self._flatten().iterkeys():
             yield item
 
     def remove_value(self, key, value):
+        if key not in self._data:
+            return
         try:
             self._data[key].remove(value)
         except ValueError:
             return
         if not self._data[key]:
             del self._data[key]
         value.remove()
 
--- a/testing/web-platform/tests/tools/wptrunner/wptrunner/wpttest.py
+++ b/testing/web-platform/tests/tools/wptrunner/wptrunner/wpttest.py
@@ -1,10 +1,11 @@
 import os
 import subprocess
+import urlparse
 from collections import defaultdict
 
 from wptmanifest.parser import atoms
 
 atom_reset = atoms["Reset"]
 enabled_tests = set(["testharness", "reftest", "wdspec"])
 
 
@@ -274,22 +275,22 @@ class Test(object):
 
         tags.add("dir:%s" % self.id.lstrip("/").split("/")[0])
 
         return tags
 
     @property
     def prefs(self):
         prefs = {}
-        for meta in self.itermeta():
+        for meta in reversed(list(self.itermeta())):
             meta_prefs = meta.prefs
+            if atom_reset in meta_prefs:
+                del meta_prefs[atom_reset]
+                prefs = {}
             prefs.update(meta_prefs)
-            if atom_reset in meta_prefs:
-                del prefs[atom_reset]
-                break
         return prefs
 
     def expected(self, subtest=None):
         if subtest is None:
             default = self.result_cls.default_expected
         else:
             default = self.subtest_result_cls.default_expected
 
@@ -354,27 +355,28 @@ class ManualTest(Test):
         return self.url
 
 
 class ReftestTest(Test):
     result_cls = ReftestResult
     test_type = "reftest"
 
     def __init__(self, tests_root, url, inherit_metadata, test_metadata, references,
-                 timeout=None, path=None, viewport_size=None, dpi=None, protocol="http"):
+                 timeout=None, path=None, viewport_size=None, dpi=None, fuzzy=None, protocol="http"):
         Test.__init__(self, tests_root, url, inherit_metadata, test_metadata, timeout,
                       path, protocol)
 
         for _, ref_type in references:
             if ref_type not in ("==", "!="):
                 raise ValueError
 
         self.references = references
         self.viewport_size = viewport_size
         self.dpi = dpi
+        self._fuzzy = fuzzy or {}
 
     @classmethod
     def from_manifest(cls,
                       manifest_file,
                       manifest_test,
                       inherit_metadata,
                       test_metadata,
                       nodes=None,
@@ -393,17 +395,18 @@ class ReftestTest(Test):
                    manifest_test.url,
                    inherit_metadata,
                    test_metadata,
                    [],
                    timeout=timeout,
                    path=manifest_test.path,
                    viewport_size=manifest_test.viewport_size,
                    dpi=manifest_test.dpi,
-                   protocol="https" if hasattr(manifest_test, "https") and manifest_test.https else "http")
+                   protocol="https" if hasattr(manifest_test, "https") and manifest_test.https else "http",
+                   fuzzy=manifest_test.fuzzy)
 
         nodes[url] = node
 
         for ref_url, ref_type in manifest_test.references:
             comparison_key = (ref_type,) + tuple(sorted([url, ref_url]))
             if ref_url in nodes:
                 manifest_node = ref_url
                 if comparison_key in references_seen:
@@ -449,16 +452,40 @@ class ReftestTest(Test):
     @property
     def id(self):
         return self.url
 
     @property
     def keys(self):
         return ("reftype", "refurl")
 
+    @property
+    def fuzzy(self):
+        return self._fuzzy
+
+    @property
+    def fuzzy_override(self):
+        values = {}
+        for meta in reversed(list(self.itermeta(None))):
+            value = meta.fuzzy
+            if not value:
+                continue
+            if atom_reset in value:
+                value.remove(atom_reset)
+                values = {}
+            for key, data in value:
+                if len(key) == 3:
+                    key[0] = urlparse.urljoin(self.url, key[0])
+                    key[1] = urlparse.urljoin(self.url, key[1])
+                else:
+                    # Key is just a relative url to a ref
+                    key = urlparse.urljoin(self.url, key)
+                values[key] = data
+        return values
+
 
 class WdspecTest(Test):
 
     result_cls = WdspecResult
     subtest_result_cls = WdspecSubtestResult
     test_type = "wdspec"
 
     default_timeout = 25