Bug 1274584 - [mozprocess] Fix IO Completion Port failed to signal process shutdown, r=jgriffin
author Andrew Halberstadt <ahalberstadt@mozilla.com>
Mon, 30 May 2016 11:02:13 -0400
changeset 339214 fd75d39e98b629bfd05189d2c0b493cad17b3a2e
parent 339213 fe37010e2b84f4374bed42220615f30807efb781
child 339215 84a2a85ac27edf3b5ec8291fb3ded0cb31a4c907
push id 6249
push user jlund@mozilla.com
push date Mon, 01 Aug 2016 13:59:36 +0000
treeherder mozilla-beta@bad9d4f5bf7e
reviewers jgriffin
bugs 1274584
milestone 49.0a1
Sometimes the IO completion port fails to signal that the child processes have shut down. When this happens, mozprocess attempts to force-kill the child processes manually. However, a bug in that fallback causes an OSError to be raised. This patch fixes that bug, but the original issue(s) that prevented the IOC port from signaling shutdown remain undiagnosed.

MozReview-Commit-ID: L3DQPW0Is5v
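A rough Python 3 sketch of the force-kill fallback, not the actual mozprocess code. The names `monitor_children`, `poll_port`, and `kill_children` are hypothetical. `poll_port` stands in for the Windows completion-port query and is assumed to return `"parent_exited"`, `"all_exited"`, or `None`. A monotonic clock replaces the `datetime` arithmetic in the real loop. Once the parent has exited, a countdown starts. If the countdown exceeds `max_delay`, the loop force-kills the children, posts a single `FINISHED` event, and exits. That exit is the job of the `break` the patch adds.

```python
import queue
import time
from typing import Callable, Optional


def monitor_children(
    poll_port: Callable[[], Optional[str]],
    kill_children: Callable[[], None],
    events: queue.Queue,
    pid: int,
    max_delay: float,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Hypothetical model of the IO completion port monitor loop.

    poll_port() is assumed to block briefly (like a completion-port query
    with a timeout) and return "parent_exited", "all_exited", or None.
    """
    countdown_start: Optional[float] = None  # set when the parent exits
    while True:
        msg = poll_port()
        if msg == "all_exited":
            # Normal path: the port reported that every process is gone.
            events.put({pid: "FINISHED"})
            return "clean"
        if msg == "parent_exited" and countdown_start is None:
            countdown_start = clock()
        if countdown_start is not None and clock() - countdown_start > max_delay:
            # The port never reported the last child: kill the leftovers,
            # report completion exactly once, and leave the loop. Without
            # this break, every later iteration would kill and post again.
            kill_children()
            events.put({pid: "FINISHED"})
            break
    return "forced"
```

Injecting `clock` keeps the timeout path testable without real delays. For example, a counter that advances by one per call, together with `max_delay=3`, reaches the forced path after a few iterations.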
testing/mozbase/mozprocess/mozprocess/processhandler.py
--- a/testing/mozbase/mozprocess/mozprocess/processhandler.py
+++ b/testing/mozbase/mozprocess/mozprocess/processhandler.py
@@ -390,22 +390,24 @@ falling back to not using job objects fo
                     if countdowntokill != 0:
                         diff = datetime.now() - countdowntokill
                         # Arbitrarily wait 3 minutes for windows to get its act together
                         # Windows sometimes takes a small nap between notifying the
                         # IO Completion port and actually killing the children, and we
                         # don't want to mistake that situation for the situation of an unexpected
                         # parent abort (which is what we're looking for here).
                         if diff.seconds > self.MAX_IOCOMPLETION_PORT_NOTIFICATION_DELAY:
+                            print >> sys.stderr, "WARNING | IO Completion Port failed to signal process shutdown"
                             print >> sys.stderr, "Parent process %s exited with children alive:" % self.pid
                             print >> sys.stderr, "PIDS: %s" %  ', '.join([str(i) for i in self._spawned_procs])
-                            print >> sys.stderr, "Attempting to kill them..."
+                            print >> sys.stderr, "Attempting to kill them, but no guarantee of success"
 
                             self.kill()
                             self._process_events.put({self.pid: 'FINISHED'})
+                            break
 
                     if not portstatus:
                         # Check to see what happened
                         errcode = winprocess.GetLastError()
                         if errcode == winprocess.ERROR_ABANDONED_WAIT_0:
                             # Then something has killed the port, break the loop
                             print >> sys.stderr, "IO Completion Port unexpectedly closed"
                             self._process_events.put({self.pid: 'FINISHED'})
@@ -467,17 +469,17 @@ falling back to not using job objects fo
                     self.returncode = winprocess.GetExitCodeProcess(self._handle)
                 else:
                     # Dude, the process is like totally dead!
                     return self.returncode
 
                 threadalive = False
                 if hasattr(self, "_procmgrthread"):
                     threadalive = self._procmgrthread.is_alive()
-                if self._job and threadalive:
+                if self._job and threadalive and threading.current_thread() != self._procmgrthread:
                     self.debug("waiting with IO completion port")
                     # Then we are managing with IO Completion Ports
                     # wait on a signal so we know when we have seen the last
                     # process come through.
                     # We use queues to synchronize between the thread and this
                     # function because events just didn't have robust enough error
                     # handling on pre-2.7 versions
                     err = None
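The second hunk stops `wait()` from blocking on the completion queue when it is called from the manager thread itself, as happens when that thread calls `self.kill()` in the fallback path. Below is a minimal Python 3 model of why that matters, assuming a simplified handler rather than mozprocess's real one. `ManagedJob`, `_manage`, and the string-based queue protocol are hypothetical. Modeling a timeout as an `OSError` is also an assumption, because the diff ends before mozprocess's own timeout handling. Only the manager thread posts `FINISHED`, so a manager that waited on that queue itself could only time out.

```python
import queue
import threading


class ManagedJob:
    """Hypothetical model: a manager thread posts "FINISHED" to a queue,
    and wait() on any other thread blocks until that event arrives."""

    def __init__(self, timeout: float = 2.0) -> None:
        self.events = queue.Queue()
        self.timeout = timeout
        self.manager_error = None
        self.manager = threading.Thread(target=self._manage)

    def start(self) -> None:
        self.manager.start()

    def _manage(self) -> None:
        # Mirrors the order in the diff: kill() (which calls wait()) runs
        # before FINISHED is posted.
        try:
            self.kill()
        except OSError as exc:
            self.manager_error = exc
        self.events.put("FINISHED")

    def kill(self) -> int:
        # A real handler would terminate the job object here.
        return self.wait()

    def wait(self) -> int:
        on_manager = threading.current_thread() is self.manager
        if self.manager.is_alive() and not on_manager:
            # Another thread: block until the manager reports completion.
            try:
                self.events.get(timeout=self.timeout)
            except queue.Empty:
                raise OSError("timed out waiting for the manager thread")
            self.manager.join()
        # On the manager thread nothing else will post FINISHED before this
        # call returns, so waiting here could only time out.
        return 0
```

With the guard, the manager's own `kill()` returns immediately and the caller's `wait()` receives `FINISHED`. If the `on_manager` check is removed, the manager blocks for `timeout` seconds and then records an `OSError`.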