Fix potential race-condition in DisplayLink::drop on macOS (#32116)

Fix a segfault in CVDisplayLink

We see 1-2 crashes a day on macOS on the `CVDisplayLink` thread.

```
Segmentation fault: 11 on thread 9325960 (CVDisplayLink)
CoreVideo	CVHWTime::reset()
CoreVideo	CVXTime::reset()
CoreVideo	CVDisplayLink::runIOThread()
libsystem_pthread.dylib	_pthread_start
libsystem_pthread.dylib	thread_start
```

With the help of the Zed AI, I dove into the crash report, which looks
like this:

```
Crashed Thread:        49  CVDisplayLink

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x00000000000001f6
Exception Codes:       0x0000000000000001, 0x00000000000001f6

Thread 49 Crashed:: CVDisplayLink
0   CoreVideo                     	       0x18c1ed994 CVHWTime::reset() + 64
1   CoreVideo                     	       0x18c1ee474 CVXTime::reset() + 52
2   CoreVideo                     	       0x18c1ee198 CVDisplayLink::runIOThread() + 176
3   libsystem_pthread.dylib       	       0x18285ac0c _pthread_start + 136
4   libsystem_pthread.dylib       	       0x182855b80 thread_start + 8

Thread 49 crashed with ARM Thread State (64-bit):
    x0: 0x0000000000000000   x1: 0x000000018c206e08   x2: 0x0000002c00001513   x3: 0x0001d4630002a433
    x4: 0x00000e2100000000   x5: 0x0001d46300000000   x6: 0x000000000000002c   x7: 0x0000000000000000
    x8: 0x000000000000002e   x9: 0x000000004d555458  x10: 0x0000000000000000  x11: 0x0000000000000000
   x12: 0x0000000000000000  x13: 0x0000000000000000  x14: 0x0000000000000000  x15: 0x0000000000000000
   x16: 0x0000000182856a9c  x17: 0x00000001f19bc540  x18: 0x0000000000000000  x19: 0x0000600003c56ed8
   x20: 0x000000000002a433  x21: 0x0000000000000000  x22: 0x0000000000000000  x23: 0x0000000000000000
   x24: 0x0000000000000000  x25: 0x0000000000000000  x26: 0x0000000000000000  x27: 0x0000000000000000
   x28: 0x0000000000000000   fp: 0x000000016b02ade0   lr: 0x000000018c1ed984
    sp: 0x000000016b02adc0   pc: 0x000000018c1ed994 cpsr: 0x80001000
   far: 0x00000000000001f6  esr: 0x92000006 (Data Abort) byte read Translation fault

Binary Images:
       0x1828c9000 -        0x182e07fff com.apple.CoreFoundation (6.9) <df489a59-b4f6-32b8-9bb4-9b832960aa52> /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation
```

Using lldb to disassemble `CVHWTime::reset()` (and the AI to interpret
it), the crash is caused by dereferencing the pointer at the start of
the CVHWTime struct + 0x1c8. In this case the pointer has (the clearly
nonsense) value 0x2e (and 0x2e + 0x1c8 = 0x1f6, the failing address).

As to how this could happen...

Looking at the implementation of `CVDisplayLinkRelease`, it calls
straight into `CFRelease` on the main thread; and so it is not safe to
call `CVDisplayLinkRelease` concurrently with other threads that access
the CVDisplayLink. While we already stopped the display link, it turns
out that `CVDisplayLinkStop` just sets a flag on the struct to instruct
the io-thread to exit "soon", and returns immediately. That means we
don't know when the other thread will actually exit, and so we can't
safely call `CVDisplayLinkRelease`.

So, for now, we just leak these objects. They should be created
relatively infrequently (when the app is foregrounded/backgrounded), so
I don't think this is a huge problem.

Release Notes:

- Fix a rare crash on macOS when putting the app in the background.
This commit is contained in:
Conrad Irwin 2025-06-04 17:10:27 -06:00 committed by GitHub
parent 7c64737e00
commit a2e98e9f0e
No known key found for this signature in database
GPG key ID: B5690EEEBB952194

View file

@ -12,7 +12,7 @@ use std::ffi::c_void;
use util::ResultExt;
pub struct DisplayLink {
display_link: sys::DisplayLink,
display_link: Option<sys::DisplayLink>,
frame_requests: dispatch_source_t,
}
@ -59,7 +59,7 @@ impl DisplayLink {
)?;
Ok(Self {
display_link,
display_link: Some(display_link),
frame_requests,
})
}
@ -70,7 +70,7 @@ impl DisplayLink {
dispatch_resume(crate::dispatch_sys::dispatch_object_t {
_ds: self.frame_requests,
});
self.display_link.start()?;
self.display_link.as_mut().unwrap().start()?;
}
Ok(())
}
@ -80,7 +80,7 @@ impl DisplayLink {
dispatch_suspend(crate::dispatch_sys::dispatch_object_t {
_ds: self.frame_requests,
});
self.display_link.stop()?;
self.display_link.as_mut().unwrap().stop()?;
}
Ok(())
}
@ -89,6 +89,14 @@ impl DisplayLink {
impl Drop for DisplayLink {
fn drop(&mut self) {
self.stop().log_err();
// We see occasional segfaults on the CVDisplayLink thread.
//
// It seems possible that this happens because CVDisplayLinkRelease releases the CVDisplayLink
// on the main thread immediately, but the background thread that CVDisplayLink uses for timers
// is still accessing it.
//
// We might also want to upgrade to CADisplayLink, but that requires dropping old macOS support.
std::mem::forget(self.display_link.take());
unsafe {
dispatch_source_cancel(self.frame_requests);
}