linux: Fix restarting by waiting for sockets to be closed (#11488)

This fixes a race-condition that showed up when trying to restart
Nightly/Preview/...

When running with these release channels, Zed tries to ensure that
there's only one instance of Zed running.

It does that by listening on a TCP socket to which other instances can
connect on start. If a connecting instance receives a message back, it
knows that another Zed instance is already running and exits.
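
A minimal sketch of the "only one instance" idea, simplified to a plain
bind check rather than the connect-and-handshake flow described above.
The function name `claim_instance_port` is hypothetical and is not taken
from the Zed codebase:

```rust
use std::io;
use std::net::{Ipv4Addr, SocketAddrV4, TcpListener};

/// Tries to claim a localhost TCP port as a "single instance" lock.
///
/// Returns `Ok(Some(listener))` if the port was free (we are now the only
/// instance and must keep the listener alive), and `Ok(None)` if the port
/// is already in use, which we take to mean another instance is running.
/// NOTE: assumption for illustration; the real check also exchanges a message.
fn claim_instance_port(port: u16) -> io::Result<Option<TcpListener>> {
    let addr = SocketAddrV4::new(Ipv4Addr::LOCALHOST, port);
    match TcpListener::bind(addr) {
        Ok(listener) => Ok(Some(listener)),
        Err(err) if err.kind() == io::ErrorKind::AddrInUse => Ok(None),
        Err(err) => Err(err),
    }
}
```

The lock only lasts as long as the listener is open, which is why a
restart has to wait for the old process to actually release the port.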

On Linux, though, we ran into a race condition:

1. `kill -0`, which checks whether a process is still running, returns
an error, signalling that the old Zed process has exited
2. BUT: the process was still listening on the TCP port.

It seems that on Linux, a process's resources aren't guaranteed to be
cleaned up as soon as signal handling stops working for that process.

The fix is to wait until the process is no longer listening on any TCP
sockets.

There's a slight downside to this: GPUI processes that never listen on
any TCP sockets now have to pay the cost of an additional `lsof` call
when restarting. We think that's a reasonable tradeoff for now,
though, since the other options (extending the platform interface to
provide callbacks, sharing the listening port in the framework, ...)
seem too wide-reaching for fixing a very local bug.
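
The two wait loops can be sketched as a standalone helper that builds
the restart script. The function name `restart_script` and its
parameters are hypothetical; the shell commands mirror the ones in the
diff of this commit:

```rust
/// Builds a shell script that waits for process `pid` to go away and then
/// launches `app_path`. (Hypothetical helper, for illustration only.)
fn restart_script(pid: u32, app_path: &str) -> String {
    format!(
        r#"
# Step 1: `kill -0` sends no signal; it only checks whether the process
# can still be signalled. The loop ends once that check fails.
while kill -0 {pid} 2>/dev/null; do
    sleep 0.1
done
# Step 2: wait until the process no longer holds any TCP sockets.
# -n / -P skip host and port name lookups, -iTCP selects TCP sockets,
# -a ANDs the conditions, -p restricts the listing to this PID.
# lsof exits non-zero once nothing matches, which ends the loop.
while lsof -nP -iTCP -a -p {pid} 2>/dev/null; do
    sleep 0.1
done
{app_path}
"#,
        pid = pid,
        app_path = app_path,
    )
}
```

One way to run such a script would be `/bin/sh -c "<script>"` from a
detached child process; that invocation is an assumption here, not
something this commit message specifies.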



Release Notes:

- N/A

Co-authored-by: Bennet <bennetbo@gmx.de>
Thorsten Ball 2024-05-07 15:46:41 +02:00 committed by GitHub
parent 0c11d841e8
commit 5a7b8f7fe3

@@ -150,12 +150,22 @@ impl<P: LinuxClient + 'static> Platform for P {
}
};
// script to wait for the current process to exit and then restart the app
log::info!("Restarting process, using app path: {:?}", app_path);
// Script to wait for the current process to exit and then restart the app.
// We also wait for possibly open TCP sockets by the process to be closed,
// since on Linux it's not guaranteed that a process' resources have been
// cleaned up when `kill -0` returns.
let script = format!(
r#"
while kill -0 {pid} 2>/dev/null; do
sleep 0.1
done
while lsof -nP -iTCP -a -p {pid} 2>/dev/null; do
sleep 0.1
done
{app_path}
"#,
pid = app_pid,