Bug 968419 - Store and submit a persistent health report identifier. r=rnewman, r=bsmedberg, a=lsblakk
authorGregory Szorc <gps@mozilla.com>
Thu, 20 Feb 2014 11:30:52 -0800
changeset 192148 751264a8a3a073b2f10d6d25cb8a29d6cde3335f
parent 192147 1cad4c4ddeaa1c9ce857b74022a772ee4dc90343
child 192149 bd93ca95b9b6d5938b5153001f1d31099aece27d
push id3503
push userraliiev@mozilla.com
push dateMon, 28 Apr 2014 18:51:11 +0000
treeherdermozilla-beta@c95ac01e332e [default view] [failures only]
perfherder[talos] [build metrics] [platform microbench] (compared to previous push)
reviewersrnewman, bsmedberg, lsblakk
bugs968419
milestone30.0a2
Bug 968419 - Store and submit a persistent health report identifier. r=rnewman, r=bsmedberg, a=lsblakk Up to this point, Firefox Health Report has generated and submitted a random UUID with each upload. Generated UUIDs were stored on the client. During upload, the client asked the server to delete all old UUIDs. Well-behaving clients thus left at most one record/ID on the server. Unfortunately, clients in the wild have not been behaving properly. We are seeing multiple documents on the server that appear to come from the same client. Clients are uploading new records but failing to delete the old ones. These old, undeleted "orphan" records are severely impacting the ability to derive useful knowledge from FHR data because it is difficult, resource intensive, and error prone to filter the records on the server. This is undermining the ability for FHR data to be put to good use. This patch introduces a persistent client identifier. When the client is initialized, it generates a random UUID. That UUID is persisted to the profile and sent as part of every upload. For privacy reasons, if a client opts out of data submission, the client ID will be reset as soon as all remote data has been deleted. We still issue and send upload IDs. They exist mostly for forensics purposes so we may log client behavior and more accurately determine what exactly misbehaving, orphan-producing clients are doing. It is worth noting that this persistent client identifier will not solve all problems of branching and orphaned records. For example, profile copying will result in multiple clients sharing a client identifier. A "client ID version" field has been added to facilitate an upgrade path towards client IDs with different generation semantics.
services/healthreport/docs/dataformat.rst
services/healthreport/docs/identifiers.rst
services/healthreport/docs/index.rst
services/healthreport/healthreporter.jsm
services/healthreport/tests/xpcshell/test_healthreporter.js
--- a/services/healthreport/docs/dataformat.rst
+++ b/services/healthreport/docs/dataformat.rst
@@ -378,16 +378,31 @@ lastPingDate
     this will not be present.
 
 thisPingDate
     UTC date when this payload was constructed.
 
 version
     Integer version of this payload format. Currently only 1 is defined.
 
+clientID
+    An identifier that identifies the client that is submitting data.
+
+    This property may not be present in older clients.
+
+    See :ref:`healthreport_identifiers` for more info on identifiers.
+
+clientIDVersion
+    Integer version associated with the generation semantics for the
+    ``clientID``.
+
+    If the value is ``1``, ``clientID`` is a randomly-generated UUID.
+
+    This property may not be present in older clients.
+
 data
     Object holding data constituting health report.
 
 Data Properties
 ---------------
 
 The bulk of the health report is contained within the *data* object. This
 object has the following keys:
new file mode 100644
--- /dev/null
+++ b/services/healthreport/docs/identifiers.rst
@@ -0,0 +1,83 @@
+.. _healthreport_identifiers:
+
+===========
+Identifiers
+===========
+
+Firefox Health Report records some identifiers to keep track of clients
+and uploaded documents.
+
+Identifier Types
+================
+
+Document/Upload IDs
+-------------------
+
+A random UUID called the *Document ID* or *Upload ID* is generated when the FHR
+client creates or uploads a new document.
+
+When clients generate a new *Document ID*, they persist this ID to disk
+**before** the upload attempt.
+
+As part of the upload, the client sends all old *Document IDs* to the server
+and asks the server to delete them. In well-behaving clients, the server
+has a single record for each client with a randomly-changing *Document ID*.
+
+Client IDs
+----------
+
+A *Client ID* is an identifier that **attempts** to uniquely identify an
+individual FHR client. Please note the emphasis on *attempts* in that last
+sentence: *Client IDs* do not guarantee uniqueness.
+
+The *Client ID* is generated when the client first runs or as needed.
+
+The *Client ID* is transferred to the server as part of every upload. The
+server is thus able to affiliate multiple document uploads with a single
+*Client ID*.
+
+Client ID Versions
+^^^^^^^^^^^^^^^^^^
+
+The semantics for how a *Client ID* is generated are versioned.
+
+Version 1
+   The *Client ID* is a randomly-generated UUID.
+
+History of Identifiers
+======================
+
+In the beginning, there were just *Document IDs*. The thinking was clients
+would clean up after themselves and leave at most 1 active document on the
+server.
+
+Unfortunately, this did not work out. Using brute force analysis to
+deduplicate records on the server, a number of interesting patterns emerged.
+
+Orphaning
+   Clients would upload a new payload while not deleting the old payload.
+
+Divergent records
+   Records would share data up to a certain date and then the data would
+   almost completely diverge. This appears to be indicative of profile
+   copying.
+
+Rollback
+   Records would share data up to a certain date. Each record in this set
+   would contain data for a day or two but no extra data. This could be
+   explained by filesystem rollback on the client.
+
+A significant percentage of the records on the server belonged to
+misbehaving clients. Identifying these records was extremely resource
+intensive and error-prone. These records were undermining the ability
+to use Firefox Health Report data.
+
+Thus, the *Client ID* was born. The intent of the *Client ID* was to
+uniquely identify clients so the extreme effort required and the
+questionable reliability of deduplicating server data would become
+problems of the past.
+
+The *Client ID* was originally a randomly-generated UUID (version 1). This
+allowed detection of orphaning and rollback. However, these version 1
+*Client IDs* were still susceptible to use on multiple profiles and
+machines if the profile was copied.
--- a/services/healthreport/docs/index.rst
+++ b/services/healthreport/docs/index.rst
@@ -19,16 +19,17 @@ application. In other words, everything 
 a reusable library. However, the terminology and some of the features
 are very specific to what the Firefox Health Report feature requires.
 
 .. toctree::
    :maxdepth: 1
 
    architecture
    dataformat
+   identifiers
 
 Legal and Privacy Concerns
 ==========================
 
 Because Firefox Health Report collects and submits data to remote
 servers and is an opt-out feature, there are legal and privacy
 concerns over what data may be collected and submitted. **Additions or
 changes to submitted data should be signed off by responsible
--- a/services/healthreport/healthreporter.jsm
+++ b/services/healthreport/healthreporter.jsm
@@ -55,21 +55,29 @@ const TELEMETRY_SHUTDOWN = "HEALTHREPORT
 const TELEMETRY_COLLECT_CHECKPOINT = "HEALTHREPORT_POST_COLLECT_CHECKPOINT_MS";
 
 
 /**
  * Helper type to assist with management of Health Reporter state.
  *
  * Instances are not meant to be created outside of a HealthReporter instance.
  *
- * Please note that remote IDs are treated as a queue. When a new remote ID is
- * added, it goes at the back of the queue. When we look for the current ID, we
- * pop the ID at the front of the queue. This helps ensure that all IDs on the
- * server are eventually deleted. This should eventually become irrelevant once
- * the server supports multiple ID deletion.
+ * There are two types of IDs associated with clients.
+ *
+ * Since the beginning of FHR, there has existed a per-upload ID: a UUID is
+ * generated at upload time and associated with the state before upload starts.
+ * That same upload includes a request to delete all other upload IDs known by
+ * the client.
+ *
+ * Per-upload IDs had the unintended side-effect of creating "orphaned"
+ * records/upload IDs on the server. So, a stable client identifer has been
+ * introduced. This client identifier is generated when it's missing and sent
+ * as part of every upload.
+ *
+ * There is a high chance we may remove upload IDs in the future.
  */
 function HealthReporterState(reporter) {
   this._reporter = reporter;
 
   let profD = OS.Constants.Path.profileDir;
 
   if (!profD || !profD.length) {
     throw new Error("Could not obtain profile directory. OS.File not " +
@@ -84,16 +92,30 @@ function HealthReporterState(reporter) {
   let leaf = reporter._stateLeaf || "state.json";
 
   this._filename = OS.Path.join(this._stateDir, leaf);
   this._log.debug("Storing state in " + this._filename);
   this._s = null;
 }
 
 HealthReporterState.prototype = Object.freeze({
+  /**
+   * Persistent string identifier associated with this client.
+   */
+  get clientID() {
+    return this._s.clientID;
+  },
+
+  /**
+   * The version associated with the client ID.
+   */
+  get clientIDVersion() {
+    return this._s.clientIDVersion;
+  },
+
   get lastPingDate() {
     return new Date(this._s.lastPingTime);
   },
 
   get lastSubmitID() {
     return this._s.remoteIDs[0];
   },
 
@@ -112,19 +134,29 @@ HealthReporterState.prototype = Object.f
       } catch (ex if ex instanceof OS.FileError) {
         if (!ex.becauseExists) {
           throw ex;
         }
       }
 
       let resetObjectState = function () {
         this._s = {
+          // The payload version. This is bumped whenever there is a
+          // backwards-incompatible change.
           v: 1,
+          // The persistent client identifier.
+          clientID: CommonUtils.generateUUID(),
+          // Denotes the mechanism used to generate the client identifier.
+          // 1: Random UUID.
+          clientIDVersion: 1,
+          // Upload IDs that might be on the server.
           remoteIDs: [],
+          // When we last performed an uploaded.
           lastPingTime: 0,
+          // Tracks whether we removed an outdated payload.
           removedOutdatedLastpayload: false,
         };
       }.bind(this);
 
       try {
         this._s = yield CommonUtils.readJSON(this._filename);
       } catch (ex if ex instanceof OS.File.Error) {
         if (!ex.becauseNoSuchFile) {
@@ -149,16 +181,33 @@ HealthReporterState.prototype = Object.f
 
       if (this._s.v != 1) {
         this._log.warn("Unknown version in state file: " + this._s.v);
         resetObjectState();
         // We explicitly don't save here in the hopes an application re-upgrade
         // comes along and fixes us.
       }
 
+      let regen = false;
+      if (!this._s.clientID) {
+        this._log.warn("No client ID stored. Generating random ID.");
+        regen = true;
+      }
+
+      if (typeof(this._s.clientID) != "string") {
+        this._log.warn("Client ID is not a string. Regenerating.");
+        regen = true;
+      }
+
+      if (regen) {
+        this._s.clientID = CommonUtils.generateUUID();
+        this._s.clientIDVersion = 1;
+        yield this.save();
+      }
+
       // Always look for preferences. This ensures that downgrades followed
       // by reupgrades don't result in excessive data loss.
       for (let promise of this._migratePrefs()) {
         yield promise;
       }
     }.bind(this));
   },
 
@@ -209,16 +258,34 @@ HealthReporterState.prototype = Object.f
       return this.setLastPingDate(date);
     }
 
     this._log.info("Recording last ping time and deleted remote document.");
     this._s.lastPingTime = date.getTime();
     return this.removeRemoteIDs(ids);
   },
 
+  /**
+   * Reset the client ID to something else.
+   *
+   * This fails if remote IDs are stored because changing the client ID
+   * while there is remote data will create orphaned records.
+   */
+  resetClientID: function () {
+    if (this.remoteIDs.length) {
+      throw new Error("Cannot reset client ID while remote IDs are stored.");
+    }
+
+    this._log.warn("Resetting client ID.");
+    this._s.clientID = CommonUtils.generateUUID();
+    this._s.clientIDVersion = 1;
+
+    return this.save();
+  },
+
   _migratePrefs: function () {
     let prefs = this._reporter._prefs;
 
     let lastID = prefs.get("lastSubmitID", null);
     let lastPingDate = CommonUtils.getDatePref(prefs, "lastPingTime",
                                                0, this._log, OLDEST_ALLOWED_YEAR);
 
     // If we have state from prefs, migrate and save it to a file then clear
@@ -892,16 +959,18 @@ AbstractHealthReporter.prototype = Objec
 
     // May not be present if we are generating as a result of init error.
     if (this._providerManager) {
       yield this._providerManager.ensurePullOnlyProvidersRegistered();
     }
 
     let o = {
       version: 2,
+      clientID: this._state.clientID,
+      clientIDVersion: this._state.clientIDVersion,
       thisPingDate: pingDateString,
       geckoAppInfo: this.obtainAppInfo(this._log),
       data: {last: {}, days: {}},
     };
 
     let outputDataDays = o.data.days;
 
     // Guard here in case we don't track this (e.g., on Android).
@@ -1452,14 +1521,28 @@ this.HealthReporter.prototype = Object.f
       this._log.info("Received request to delete remote data but no data stored.");
       request.onNoDataAvailable();
       return;
     }
 
     this._log.warn("Deleting remote data.");
     let client = new BagheeraClient(this.serverURI);
 
-    return client.deleteDocument(this.serverNamespace, this.lastSubmitID)
-                 .then(this._onBagheeraResult.bind(this, request, true, this._now()),
-                       this._onSubmitDataRequestFailure.bind(this));
+    return Task.spawn(function* doDelete() {
+      try {
+        let result = yield client.deleteDocument(this.serverNamespace,
+                                                 this.lastSubmitID);
+        yield this._onBagheeraResult(request, true, this._now(), result);
+      } catch (ex) {
+        this._log.error("Error processing request to delete data: " +
+                        CommonUtils.exceptionStr(error));
+      } finally {
+        // If we don't have any remote documents left, nuke the ID.
+        // This is done for privacy reasons. Why preserve the ID if we
+        // don't need to?
+        if (!this.haveRemoteData()) {
+          yield this._state.resetClientID();
+        }
+      }
+    }.bind(this));
   },
 });
 
--- a/services/healthreport/tests/xpcshell/test_healthreporter.js
+++ b/services/healthreport/tests/xpcshell/test_healthreporter.js
@@ -105,16 +105,18 @@ add_task(function test_constructor() {
   let reporter = yield getReporter("constructor");
 
   try {
     do_check_eq(reporter.lastPingDate.getTime(), 0);
     do_check_null(reporter.lastSubmitID);
     do_check_eq(typeof(reporter._state), "object");
     do_check_eq(reporter._state.lastPingDate.getTime(), 0);
     do_check_eq(reporter._state.remoteIDs.length, 0);
+    do_check_eq(reporter._state.clientIDVersion, 1);
+    do_check_neq(reporter._state.clientID, null);
 
     let failed = false;
     try {
       new HealthReporter("foo.bar");
     } catch (ex) {
       failed = true;
       do_check_true(ex.message.startsWith("Branch must end"));
     } finally {
@@ -288,33 +290,40 @@ add_task(function test_remove_old_lastpa
   } finally {
     reporter._shutdown();
   }
 });
 
 add_task(function test_json_payload_simple() {
   let reporter = yield getReporter("json_payload_simple");
 
+  let clientID = reporter._state.clientID;
+  do_check_neq(clientID, null);
+
   try {
     let now = new Date();
     let payload = yield reporter.getJSONPayload();
     do_check_eq(typeof payload, "string");
     let original = JSON.parse(payload);
 
     do_check_eq(original.version, 2);
     do_check_eq(original.thisPingDate, reporter._formatDate(now));
+    do_check_eq(original.clientID, clientID);
+    do_check_eq(original.clientIDVersion, reporter._state.clientIDVersion);
+    do_check_eq(original.clientIDVersion, 1);
     do_check_eq(Object.keys(original.data.last).length, 0);
     do_check_eq(Object.keys(original.data.days).length, 0);
     do_check_false("notInitialized" in original);
 
     yield reporter._state.setLastPingDate(
       new Date(now.getTime() - 24 * 60 * 60 * 1000 - 10));
 
     original = JSON.parse(yield reporter.getJSONPayload());
     do_check_eq(original.lastPingDate, reporter._formatDate(reporter.lastPingDate));
+    do_check_eq(original.clientID, clientID);
 
     // This could fail if we cross UTC day boundaries at the exact instance the
     // test is executed. Let's tempt fate.
     do_check_eq(original.thisPingDate, reporter._formatDate(now));
 
     payload = yield reporter.getJSONPayload(true);
     do_check_eq(typeof payload, "object");
   } finally {
@@ -631,23 +640,25 @@ add_task(function test_data_submission_s
 
     let data = yield getHealthReportProviderValues(reporter, now);
     do_check_eq(data.continuationUploadAttempt, 1);
     do_check_eq(data.uploadSuccess, 1);
     do_check_eq(Object.keys(data).length, 3);
 
     let d = reporter.lastPingDate;
     let id = reporter.lastSubmitID;
+    let clientID = reporter._state.clientID;
 
     reporter._shutdown();
 
     // Ensure reloading state works.
     reporter = yield getReporter("data_submission_success");
     do_check_eq(reporter.lastSubmitID, id);
     do_check_eq(reporter.lastPingDate.getTime(), d.getTime());
+    do_check_eq(reporter._state.clientID, clientID);
 
     reporter._shutdown();
   } finally {
     yield shutdownServer(server);
   }
 });
 
 add_task(function test_recurring_daily_pings() {
@@ -698,24 +709,37 @@ add_task(function test_request_remote_da
     defineNow(policy, policy._futureDate(-24 * 60 * 60 * 1000));
     policy.recordUserAcceptance();
     defineNow(policy, policy.nextDataSubmissionDate);
     yield policy.checkStateAndTrigger();
     let id = reporter.lastSubmitID;
     do_check_neq(id, null);
     do_check_true(server.hasDocument(reporter.serverNamespace, id));
 
+    let clientID = reporter._state.clientID;
+    do_check_neq(clientID, null);
+
     defineNow(policy, policy._futureDate(10 * 1000));
 
     let promise = reporter.requestDeleteRemoteData();
     do_check_neq(promise, null);
     yield promise;
     do_check_null(reporter.lastSubmitID);
     do_check_false(reporter.haveRemoteData());
     do_check_false(server.hasDocument(reporter.serverNamespace, id));
+
+    // Client ID should be updated.
+    do_check_neq(reporter._state.clientID, null);
+    do_check_neq(reporter._state.clientID, clientID);
+    do_check_eq(reporter._state.clientIDVersion, 1);
+
+    // And it should be persisted to disk.
+    let o = yield CommonUtils.readJSON(reporter._state._filename);
+    do_check_eq(o.clientID, reporter._state.clientID);
+    do_check_eq(o.clientIDVersion, 1);
   } finally {
     reporter._shutdown();
     yield shutdownServer(server);
   }
 });
 
 add_task(function test_multiple_simultaneous_uploads() {
   let [reporter, server] = yield getReporterAndServer("multiple_simultaneous_uploads");
@@ -1091,8 +1115,83 @@ add_task(function test_state_downgrade_u
 
     let o = yield CommonUtils.readJSON(reporter._state._filename);
     do_check_eq(o.remoteIDs.length, 3);
     do_check_eq(o.lastPingTime, now.getTime() + 1000);
   } finally {
     reporter._shutdown();
   }
 });
+
+// Missing client ID in state should be created on state load.
+add_task(function* test_state_create_client_id() {
+  let reporter = getHealthReporter("state_create_client_id");
+
+  yield CommonUtils.writeJSON({
+    v: 1,
+    remoteIDs: ["id1", "id2"],
+    lastPingTime: Date.now(),
+    removeOutdatedLastPayload: true,
+  }, reporter._state._filename);
+
+  try {
+    yield reporter.init();
+
+    do_check_eq(reporter.lastSubmitID, "id1");
+    do_check_neq(reporter._state.clientID, null);
+    do_check_eq(reporter._state.clientID.length, 36);
+    do_check_eq(reporter._state.clientIDVersion, 1);
+
+    let clientID = reporter._state.clientID;
+
+    // The client ID should be persisted as soon as it is created.
+    reporter._shutdown();
+
+    reporter = getHealthReporter("state_create_client_id");
+    yield reporter.init();
+    do_check_eq(reporter._state.clientID, clientID);
+  } finally {
+    reporter._shutdown();
+  }
+});
+
+// Invalid stored client ID is reset automatically.
+add_task(function* test_empty_client_id() {
+  let reporter = getHealthReporter("state_empty_client_id");
+
+  yield CommonUtils.writeJSON({
+    v: 1,
+    clientID: "",
+    remoteIDs: ["id1", "id2"],
+    lastPingTime: Date.now(),
+    removeOutdatedLastPayload: true,
+  }, reporter._state._filename);
+
+  try {
+    yield reporter.init();
+
+    do_check_neq(reporter._state.clientID, null);
+    do_check_eq(reporter._state.clientID.length, 36);
+  } finally {
+    reporter._shutdown();
+  }
+});
+
+add_task(function* test_nonstring_client_id() {
+  let reporter = getHealthReporter("state_nonstring_client_id");
+
+  yield CommonUtils.writeJSON({
+    v: 1,
+    clientID: 42,
+    remoteIDs: ["id1", "id2"],
+    lastPingTime: Date.now(),
+    remoteOutdatedLastPayload: true,
+  }, reporter._state._filename);
+
+  try {
+    yield reporter.init();
+
+    do_check_neq(reporter._state.clientID, null);
+    do_check_eq(reporter._state.clientID.length, 36);
+  } finally {
+    reporter._shutdown();
+  }
+});