Bug 1401883 - Don't hold unnecessary references to the Windows app shell. r=jimm, a=RyanVM
authorTing-Yu Chou <janus926@gmail.com>
Wed, 06 Jun 2018 11:05:18 -0400
changeset 473729 8a24c581a1820b0619f34a5f2f6fd240d83cb3f5
parent 473728 05f93f4378c67c9a3e608edf11c6ef6dde18e091
child 473730 159957bbef8f0c8c4adcc461d3fd25a3dbf19b74
push id1728
push userjlund@mozilla.com
push dateMon, 18 Jun 2018 21:12:27 +0000
treeherdermozilla-release@c296fde26f5f [default view] [failures only]
perfherder[talos] [build metrics] [platform microbench] (compared to previous push)
reviewersjimm, RyanVM
bugs1401883, 1220517
Bug 1401883 - Don't hold unnecessary references to the Windows app shell. r=jimm, a=RyanVM When runnables are posted to the main thread's event loop, the event loop notifies any thread observers that this has been done. The app shell registers itself as just such a runnable, and posts messages to the native event loop that processing native events needs to be done. On Windows, this posting takes an extra reference to the app shell, to keep it alive until the message is processed by the native event loop, since app shell code needs to be invoked during that processing. The processing releases this extra reference, so everything stays balanced. Except that it's possible for messages to be posted to the native event loop, and then browser shutdown happens. Those messages are not processed and the associated references taken are not released. This imbalance means that in debug builds, we appear to be leaking the app shell, and that leaking results in intermittent oranges. This intermittent orange has manifested itself in a variety of ways over the years, depending on how big the app shell itself was (since that changes the number of bytes leaked) and how many leak-checked things the app shell was holding on to. This bug is merely the latest manifestation; the last serious work on analyzing the phenomenon and fixing it was done in bug 1220517. The solution proposed in that bug was that we simply stop the extra reference counting; when the app shell is destroyed normally, we shouldn't be processing the native event loop any more anyway. So even if the native event loop is holding (freed) pointers to the app shell, we'd never execute the callback and perform a use-after-free. Reading through the code suggests that this *ought* to be the case, but the potential for shooting ourselves in the foot seems awfully high. In any event, this is not a problem unique to Windows; we have seen this same sort of thing happen on OS X. See nsAppShell::ProcessGeckoEvents in widget/cocoa/nsAppShell.mm. Here we propose a slightly different solution: we keep track of the number of native event callbacks we have scheduled, incrementing when we schedule, and decrementing when we actually run one. When the app shell is destroyed, we simply set the number of outstanding events to zero, and we prohibit the callback from accessing the app shell if there are no outstanding events. This solution is analogous to dropping the extra reference counting, but avoids potential badness if we do wind up processing native events after the app shell is destroyed.
--- a/widget/windows/nsAppShell.cpp
+++ b/widget/windows/nsAppShell.cpp
@@ -21,16 +21,17 @@
 #include "mozilla/StaticPtr.h"
 #include "nsTHashtable.h"
 #include "nsHashKeys.h"
 #include "GeckoProfiler.h"
 #include "nsComponentManagerUtils.h"
 #include "ScreenHelperWin.h"
 #include "HeadlessScreenHelper.h"
 #include "mozilla/widget/ScreenManager.h"
+#include "mozilla/Atomics.h"
 #if defined(ACCESSIBILITY)
 #include "mozilla/a11y/Compatibility.h"
 #include "mozilla/a11y/Platform.h"
 #endif // defined(ACCESSIBILITY)
 // These are two messages that the code in winspool.drv on Windows 7 explicitly
 // waits for while it is pumping other Windows messages, during display of the
@@ -180,36 +181,49 @@ namespace crashreporter {
 void LSPAnnotate();
 } // namespace crashreporter
 } // namespace mozilla
 using mozilla::crashreporter::LSPAnnotate;
+// Note that since we're on x86-ish processors here, ReleaseAcquire is the
+// semantics that normal loads and stores would use anyway.
+static Atomic<size_t, ReleaseAcquire> sOutstandingNativeEventCallbacks;
 nsAppShell::EventWindowProc(HWND hwnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
   if (uMsg == sAppShellGeckoMsgId) {
+    // The app shell might have been destroyed between this message being
+    // posted and being executed, so be extra careful.
+    if (!sOutstandingNativeEventCallbacks) {
+      return TRUE;
+    }
     nsAppShell *as = reinterpret_cast<nsAppShell *>(lParam);
-    NS_RELEASE(as);
+    --sOutstandingNativeEventCallbacks;
     return TRUE;
   return DefWindowProc(hwnd, uMsg, wParam, lParam);
   if (mEventWnd) {
     // DestroyWindow doesn't do anything when called from a non UI thread.
     // Since mEventWnd was created on the UI thread, it must be destroyed on
     // the UI thread.
     SendMessage(mEventWnd, WM_CLOSE, 0, 0);
+  // Cancel any outstanding native event callbacks.
+  sOutstandingNativeEventCallbacks = 0;
 #if defined(ACCESSIBILITY)
 static ULONG gUiaMsg;
 static HHOOK gUiaHook;
 static uint32_t gUiaAttempts;
 static const uint32_t kMaxUiaAttempts = 5;
@@ -458,17 +472,17 @@ nsAppShell::DoProcessMoreGeckoEvents()
              "We should have created mEventWnd in Init, if this is called.");
   // Post a message to the hidden message window
-  NS_ADDREF_THIS(); // will be released when the event is processed
+  ++sOutstandingNativeEventCallbacks;
     MutexAutoLock lock(mLastNativeEventScheduledMutex);
     // Time stamp this event so we can detect cases where the event gets
     // dropping in sub classes / modal loops we do not control.
     mLastNativeEventScheduled = TimeStamp::NowLoRes();
   ::PostMessage(mEventWnd, sAppShellGeckoMsgId, 0, reinterpret_cast<LPARAM>(this));