Bug 1401883 - don't hold unnecessary references to the Windows app shell; r=jimm
authorTing-Yu Chou <janus926@gmail.com>
Wed, 06 Jun 2018 11:05:18 -0400
changeset 421588 5fd3144d5d009fdac3fae2ee33143901164e0d12
parent 421587 2491b63fe79aa6113a77c49f2fa9f7c2ed7e59ff
child 421589 7b760c430347f138d9c5105e70a67a56a3b595a8
push id104070
push usernfroyd@mozilla.com
push dateWed, 06 Jun 2018 15:08:37 +0000
treeherdermozilla-inbound@5fd3144d5d00 [default view] [failures only]
perfherder[talos] [build metrics] [platform microbench] (compared to previous push)
bugs1401883, 1220517
first release with
nightly linux32
nightly linux64
nightly mac
nightly win32
nightly win64
last release without
nightly linux32
nightly linux64
nightly mac
nightly win32
nightly win64
Bug 1401883 - don't hold unnecessary references to the Windows app shell; r=jimm When runnables are posted to the main thread's event loop, the event loop notifies any thread observers that this has been done. The app shell registers itself as just such a runnable, and posts messages to the native event loop that processing native events needs to be done. On Windows, this posting takes an extra reference to the app shell, to keep it alive until the message is processed by the native event loop, since app shell code needs to be invoked during that processing. The processing releases this extra reference, so everything stays balanced. Except that it's possible for messages to be posted to the native event loop, and then browser shutdown happens. Those messages are not processed and the associated references taken are not released. This imbalance means that in debug builds, we appear to be leaking the app shell, and that leaking results in intermittent oranges. This intermittent orange has manifested itself in a variety of ways over the years, depending on how big the app shell itself was (since that changes the number of bytes leaked) and how many leak-checked things the app shell was holding on to. This bug is merely the latest manifestation; the last serious work on analyzing the phenomenon and fixing it was done in bug 1220517. The solution proposed in that bug was that we simply stop the extra reference counting; when the app shell is destroyed normally, we shouldn't be processing the native event loop any more anyway. So even if the native event loop is holding (freed) pointers to the app shell, we'd never execute the callback and perform a use-after-free. Reading through the code suggests that this *ought* to be the case, but the potential for shooting ourselves in the foot seems awfully high. In any event, this is not a problem unique to Windows; we have seen this same sort of thing happen on OS X. See nsAppShell::ProcessGeckoEvents in widget/cocoa/nsAppShell.mm. Here we propose a slightly different solution: we keep track of the number of native event callbacks we have scheduled, incrementing when we schedule, and decrementing when we actually run one. When the app shell is destroyed, we simply set the number of outstanding events to zero, and we prohibit the callback from accessing the app shell if there are no outstanding events. This solution is analogous to dropping the extra reference counting, but avoids potential badness if we do wind up processing native events after the app shell is destroyed.
--- a/widget/windows/nsAppShell.cpp
+++ b/widget/windows/nsAppShell.cpp
@@ -21,16 +21,17 @@
 #include "mozilla/StaticPtr.h"
 #include "nsTHashtable.h"
 #include "nsHashKeys.h"
 #include "GeckoProfiler.h"
 #include "nsComponentManagerUtils.h"
 #include "ScreenHelperWin.h"
 #include "HeadlessScreenHelper.h"
 #include "mozilla/widget/ScreenManager.h"
+#include "mozilla/Atomics.h"
 #if defined(ACCESSIBILITY)
 #include "mozilla/a11y/Compatibility.h"
 #include "mozilla/a11y/Platform.h"
 #endif // defined(ACCESSIBILITY)
 // These are two messages that the code in winspool.drv on Windows 7 explicitly
 // waits for while it is pumping other Windows messages, during display of the
@@ -180,36 +181,49 @@ namespace crashreporter {
 void LSPAnnotate();
 } // namespace crashreporter
 } // namespace mozilla
 using mozilla::crashreporter::LSPAnnotate;
+// Note that since we're on x86-ish processors here, ReleaseAcquire is the
+// semantics that normal loads and stores would use anyway.
+static Atomic<size_t, ReleaseAcquire> sOutstandingNativeEventCallbacks;
 nsAppShell::EventWindowProc(HWND hwnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
   if (uMsg == sAppShellGeckoMsgId) {
+    // The app shell might have been destroyed between this message being
+    // posted and being executed, so be extra careful.
+    if (!sOutstandingNativeEventCallbacks) {
+      return TRUE;
+    }
     nsAppShell *as = reinterpret_cast<nsAppShell *>(lParam);
-    NS_RELEASE(as);
+    --sOutstandingNativeEventCallbacks;
     return TRUE;
   return DefWindowProc(hwnd, uMsg, wParam, lParam);
   if (mEventWnd) {
     // DestroyWindow doesn't do anything when called from a non UI thread.
     // Since mEventWnd was created on the UI thread, it must be destroyed on
     // the UI thread.
     SendMessage(mEventWnd, WM_CLOSE, 0, 0);
+  // Cancel any outstanding native event callbacks.
+  sOutstandingNativeEventCallbacks = 0;
 #if defined(ACCESSIBILITY)
 static ULONG gUiaMsg;
 static HHOOK gUiaHook;
 static uint32_t gUiaAttempts;
 static const uint32_t kMaxUiaAttempts = 5;
@@ -458,17 +472,17 @@ nsAppShell::DoProcessMoreGeckoEvents()
              "We should have created mEventWnd in Init, if this is called.");
   // Post a message to the hidden message window
-  NS_ADDREF_THIS(); // will be released when the event is processed
+  ++sOutstandingNativeEventCallbacks;
     MutexAutoLock lock(mLastNativeEventScheduledMutex);
     // Time stamp this event so we can detect cases where the event gets
     // dropping in sub classes / modal loops we do not control.
     mLastNativeEventScheduled = TimeStamp::NowLoRes();
   ::PostMessage(mEventWnd, sAppShellGeckoMsgId, 0, reinterpret_cast<LPARAM>(this));