author | Nicholas Nethercote <nnethercote@mozilla.com> |
Mon, 25 Aug 2014 17:43:57 -0700 | |
changeset 201816 | 52b9465cc90b3141a46cd05daf52c7eded1c328d |
parent 201815 | 9ebed11d8a2bf847913e14be0e9f33a6c3421040 |
child 201817 | 857d543db683db4217da6d5eab55cad73a50fc2b |
push id | 48265 |
push user | nnethercote@mozilla.com |
push date | Wed, 27 Aug 2014 05:20:30 +0000 |
treeherder | mozilla-inbound@52b9465cc90b [default view] [failures only] |
perfherder | [talos] [build metrics] [platform microbench] (compared to previous push) |
reviewers | roc |
bugs | 1058335 |
milestone | 34.0a1 |
first release with | nightly linux32
nightly linux64
nightly mac
nightly win32
nightly win64
|
last release without | nightly linux32
nightly linux64
nightly mac
nightly win32
nightly win64
|
xpcom/glue/pldhash.cpp | file | annotate | diff | comparison | revisions | |
xpcom/glue/pldhash.h | file | annotate | diff | comparison | revisions |
--- a/xpcom/glue/pldhash.cpp +++ b/xpcom/glue/pldhash.cpp @@ -238,26 +238,16 @@ MinCapacity(uint32_t aLength) { return (aLength * 4 + (3 - 1)) / 3; // == ceil(aLength * 4 / 3) } MOZ_ALWAYS_INLINE bool PLDHashTable::Init(const PLDHashTableOps* aOps, void* aData, uint32_t aEntrySize, const fallible_t&, uint32_t aLength) { -#ifdef DEBUG - if (aEntrySize > 16 * sizeof(void*)) { - printf_stderr( - "pldhash: for the table at address %p, the given aEntrySize" - " of %lu definitely favors chaining over double hashing.\n", - (void*)this, - (unsigned long) aEntrySize); - } -#endif - if (aLength > PL_DHASH_MAX_INITIAL_LENGTH) { return false; } ops = aOps; data = aData; // Compute the smallest capacity allowing |aLength| elements to be inserted
--- a/xpcom/glue/pldhash.h +++ b/xpcom/glue/pldhash.h @@ -162,82 +162,24 @@ typedef PLDHashOperator (*PLDHashEnumera typedef size_t (*PLDHashSizeOfEntryExcludingThisFun)( PLDHashEntryHdr* aHdr, mozilla::MallocSizeOf aMallocSizeOf, void* aArg); /* * A PLDHashTable is currently 8 words (without the PL_DHASHMETER overhead) * on most architectures, and may be allocated on the stack or within another * structure or class (see below for the Init and Finish functions to use). * - * To decide whether to use double hashing vs. chaining, we need to develop a - * trade-off relation, as follows: - * - * Let alpha be the load factor, esize the entry size in words, count the - * entry count, and pow2 the power-of-two table size in entries. - * - * (PLDHashTable overhead) > (PLHashTable overhead) - * (unused table entry space) > (malloc and .next overhead per entry) + - * (buckets overhead) - * (1 - alpha) * esize * pow2 > 2 * count + pow2 - * - * Notice that alpha is by definition (count / pow2): - * - * (1 - alpha) * esize * pow2 > 2 * alpha * pow2 + pow2 - * (1 - alpha) * esize > 2 * alpha + 1 - * - * esize > (1 + 2 * alpha) / (1 - alpha) - * - * This assumes both tables must keep keyHash, key, and value for each entry, - * where key and value point to separately allocated strings or structures. - * If key and value can be combined into one pointer, then the trade-off is: - * - * esize > (1 + 3 * alpha) / (1 - alpha) - * - * If the entry value can be a subtype of PLDHashEntryHdr, rather than a type - * that must be allocated separately and referenced by an entry.value pointer - * member, and provided key's allocation can be fused with its entry's, then - * k (the words wasted per entry with chaining) is 4. - * - * To see these curves, feed gnuplot input like so: - * - * gnuplot> f(x,k) = (1 + k * x) / (1 - x) - * gnuplot> plot [0:.75] f(x,2), f(x,3), f(x,4) - * - * For k of 2 and a well-loaded table (alpha > .5), esize must be more than 4 - * words for chaining to be more space-efficient than double hashing. - * - * Solving for alpha helps us decide when to shrink an underloaded table: - * - * esize > (1 + k * alpha) / (1 - alpha) - * esize - alpha * esize > 1 + k * alpha - * esize - 1 > (k + esize) * alpha - * (esize - 1) / (k + esize) > alpha - * - * alpha < (esize - 1) / (esize + k) - * - * Therefore double hashing should keep alpha >= (esize - 1) / (esize + k), - * assuming esize is not too large (in which case, chaining should probably be - * used for any alpha). For esize=2 and k=3, we want alpha >= .2; for esize=3 - * and k=2, we want alpha >= .4. For k=4, esize could be 6, and alpha >= .5 - * would still obtain. - * - * The current implementation uses a lower bound of 0.25 for alpha when - * deciding whether to shrink the table (while still respecting - * PL_DHASH_MIN_CAPACITY). - * - * Note a qualitative difference between chaining and double hashing: under - * chaining, entry addresses are stable across table shrinks and grows. With - * double hashing, you can't safely hold an entry pointer and use it after an - * ADD or REMOVE operation, unless you sample aTable->mGeneration before adding - * or removing, and compare the sample after, dereferencing the entry pointer - * only if aTable->mGeneration has not changed. - * - * The moral of this story: there is no one-size-fits-all hash table scheme, - * but for small table entry size, and assuming entry address stability is not - * required, double hashing wins. + * There used to be a long, math-heavy comment here about the merits of + * double hashing vs. chaining; it was removed in bug 1058335. In short, double + * hashing is more space-efficient unless the element size gets large (in which + * case you should keep using double hashing but switch to using pointer + * elements). Also, with double hashing, you can't safely hold an entry pointer + * and use it after an ADD or REMOVE operation, unless you sample + * aTable->mGeneration before adding or removing, and compare the sample after, + * dereferencing the entry pointer only if aTable->mGeneration has not changed. */ struct PLDHashTable { /* * Virtual operations; see below. This field is public because it's commonly * zeroed to indicate that a table is no longer live. */ const PLDHashTableOps* ops;