Kb/wasm panics (#35319)

Follow-up of https://github.com/zed-industries/zed/pull/34208
Closes https://github.com/zed-industries/zed/issues/35185

The previous code assumed that extensions' language server wrappers could
leak only through static data (e.g. fields that were not cleared on
deinit), but there appears to be a race that breaks this assumption.

1. The `all_lsp_adapters` field is cleared since
https://github.com/zed-industries/zed/pull/34334, and that cleanup runs for
every extension that is unregistered.
2. The `LspStore::maintain_workspace_config` ->
`LspStore::refresh_workspace_configurations` chain is triggered
independently, apparently on the `ToolchainStoreEvent::ToolchainActivated`
event, which means that somewhere behind it Python code may be executed to
activate the toolchain, making the start timing of
`refresh_workspace_configurations` unpredictable.
3. Toolchain activation seems to overlap with extension reload, as
`2025-07-28T12:16:19+03:00 INFO [extension_host] extensions updated.
loading 0, reloading 1, unloading 0` in the issue logs suggests.

The plugin reload seems to happen faster than the workspace configuration
refresh in


c65da547c9/crates/project/src/lsp_store.rs (L7426-L7456)

as the language servers are just starting and take extra time to respond
to the notification.

At least one of the `.clone()`d `adapter`s there is an adapter that was
removed during the plugin reload and has its channel closed, which causes a
panic later.

----------------------------

A good fix would be to re-architect the workspace refresh approach, along
with the other accesses to the language server collections.
One option is to use `Weak`-based structures instead, since the extension
server data clearly belongs to the extension, not to the `LspStore`.
This is quite a large undertaking near the extension core, though, so it is
not done yet.
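As an illustration of the `Weak`-based idea, here is a minimal sketch with hypothetical `Store` and `ServerHandle` types (not Zed's actual structures): if the extension owned the server state and the store kept only `Weak` references, a reload would invalidate stale handles instead of leaving live clones behind.

```rust
use std::sync::{Arc, Weak};

// Hypothetical stand-in for a language server wrapper owned by an extension.
struct ServerHandle {
    name: &'static str,
}

// Hypothetical stand-in for `LspStore`: it only observes servers, it does
// not keep them alive.
struct Store {
    servers: Vec<Weak<ServerHandle>>,
}

impl Store {
    // Only servers that are still alive are returned; handles whose owning
    // extension was unloaded simply fail to upgrade.
    fn live_servers(&self) -> Vec<Arc<ServerHandle>> {
        self.servers.iter().filter_map(Weak::upgrade).collect()
    }
}

fn main() {
    let extension_owned = Arc::new(ServerHandle { name: "example-server" });
    let store = Store {
        servers: vec![Arc::downgrade(&extension_owned)],
    };
    assert_eq!(store.live_servers().len(), 1);

    // Extension reload drops the owning Arc...
    drop(extension_owned);
    // ...and the store no longer hands out the stale server.
    assert!(store.live_servers().is_empty());
    println!("stale server excluded: {}", store.live_servers().is_empty());
}
```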

For now, to stop the excessive panics, `.expect` is no longer called on the
channel result, since the channel can indeed be closed at any moment.
This will result in more errors (and, presumably, backtraces) printed in
the logs, but no panics.
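The failure mode being avoided can be sketched with a plain `std::sync::mpsc` channel (a stand-in for the server wrapper's real channel, not Zed's actual types): once the receiving side is dropped during a reload, `send` returns `Err`, and calling `.expect` on that result is exactly the panic; logging and moving on is the safe alternative.

```rust
use std::sync::mpsc;

fn main() {
    // Hypothetical stand-in for a language server wrapper's request channel.
    let (tx, rx) = mpsc::channel::<&str>();

    // The refresh task clones a handle, much like the `.clone()`d adapters
    // in `refresh_workspace_configurations`.
    let cloned_tx = tx.clone();

    // Extension reload: the server wrapper (the receiving side) is dropped.
    drop(rx);

    // The clone outlives the wrapper; sending now fails.
    // `.expect()` here would panic, which is what the fix removes.
    let result = cloned_tx.send("workspace/didChangeConfiguration");
    assert!(result.is_err());

    // The fix: log the error and continue instead of panicking.
    if let Err(e) = result {
        eprintln!("server channel closed, skipping notification: {e}");
    }
}
```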

More logging and comments are added, and workspace querying is now
concurrent: there is no need to wait for one server to process the
notification before sending the same one to the next.
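The sequential-to-concurrent change can be sketched with plain threads (the hypothetical `notify_server` stands in for sending `DidChangeConfiguration` to one server; the real code collects spawned async tasks and joins them all):

```rust
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical stand-in for notifying one language server; each server
// takes a while to process the notification.
fn notify_server(id: usize) -> usize {
    thread::sleep(Duration::from_millis(100));
    id
}

fn main() {
    let start = Instant::now();

    // Concurrent: every notification is in flight at the same time,
    // analogous to collecting spawned tasks and awaiting them together.
    let handles: Vec<_> = (0..4)
        .map(|id| thread::spawn(move || notify_server(id)))
        .collect();
    let results: Vec<usize> = handles
        .into_iter()
        .map(|handle| handle.join().unwrap())
        .collect();

    assert_eq!(results, vec![0, 1, 2, 3]);
    // Sequential sends would take ~4 * 100ms; concurrent sends stay close
    // to a single server's latency.
    assert!(start.elapsed() < Duration::from_millis(300));
    println!("notified {} servers concurrently", results.len());
}
```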

Release Notes:

- Fixed a wasm-related panic happening during startup
Kirill Bulatov 2025-07-30 12:18:26 +03:00 committed by GitHub
parent 7be1f2418d
commit 49b75e9e93
GPG key ID: B5690EEEBB952194
2 changed files with 69 additions and 55 deletions


```diff
@@ -7393,21 +7393,23 @@ impl LspStore {
     }
 
     pub(crate) async fn refresh_workspace_configurations(
-        this: &WeakEntity<Self>,
+        lsp_store: &WeakEntity<Self>,
         fs: Arc<dyn Fs>,
         cx: &mut AsyncApp,
     ) {
         maybe!(async move {
-            let servers = this
-                .update(cx, |this, cx| {
-                    let Some(local) = this.as_local() else {
+            let mut refreshed_servers = HashSet::default();
+            let servers = lsp_store
+                .update(cx, |lsp_store, cx| {
+                    let toolchain_store = lsp_store.toolchain_store(cx);
+                    let Some(local) = lsp_store.as_local() else {
                         return Vec::default();
                     };
                     local
                         .language_server_ids
                         .iter()
                         .flat_map(|((worktree_id, _), server_ids)| {
-                            let worktree = this
+                            let worktree = lsp_store
                                 .worktree_store
                                 .read(cx)
                                 .worktree_for_id(*worktree_id, cx);
@@ -7423,43 +7425,54 @@ impl LspStore {
                                 )
                             });
-                            server_ids.iter().filter_map(move |server_id| {
+                            let fs = fs.clone();
+                            let toolchain_store = toolchain_store.clone();
+                            server_ids.iter().filter_map(|server_id| {
+                                let delegate = delegate.clone()? as Arc<dyn LspAdapterDelegate>;
                                 let states = local.language_servers.get(server_id)?;
                                 match states {
                                     LanguageServerState::Starting { .. } => None,
                                     LanguageServerState::Running {
                                         adapter, server, ..
-                                    } => Some((
-                                        adapter.adapter.clone(),
-                                        server.clone(),
-                                        delegate.clone()? as Arc<dyn LspAdapterDelegate>,
-                                    )),
+                                    } => {
+                                        let fs = fs.clone();
+                                        let toolchain_store = toolchain_store.clone();
+                                        let adapter = adapter.clone();
+                                        let server = server.clone();
+                                        refreshed_servers.insert(server.name());
+                                        Some(cx.spawn(async move |_, cx| {
+                                            let settings =
+                                                LocalLspStore::workspace_configuration_for_adapter(
+                                                    adapter.adapter.clone(),
+                                                    fs.as_ref(),
+                                                    &delegate,
+                                                    toolchain_store,
+                                                    cx,
+                                                )
+                                                .await
+                                                .ok()?;
+                                            server
+                                                .notify::<lsp::notification::DidChangeConfiguration>(
+                                                    &lsp::DidChangeConfigurationParams { settings },
+                                                )
+                                                .ok()?;
+                                            Some(())
+                                        }))
+                                    }
                                 }
                             })
-                        }).collect::<Vec<_>>()
+                        })
+                        .collect::<Vec<_>>()
                 })
                 .ok()?;
-            let toolchain_store = this.update(cx, |this, cx| this.toolchain_store(cx)).ok()?;
-            for (adapter, server, delegate) in servers {
-                let settings = LocalLspStore::workspace_configuration_for_adapter(
-                    adapter,
-                    fs.as_ref(),
-                    &delegate,
-                    toolchain_store.clone(),
-                    cx,
-                )
-                .await
-                .ok()?;
-                server
-                    .notify::<lsp::notification::DidChangeConfiguration>(
-                        &lsp::DidChangeConfigurationParams { settings },
-                    )
-                    .ok();
-            }
+            log::info!("Refreshing workspace configurations for servers {refreshed_servers:?}");
+            // TODO this asynchronous job runs concurrently with extension (de)registration
+            // and may take enough time for a certain extension to stop and unregister its
+            // language server wrapper.
+            // This is racy: an extension might have already removed all
+            // `local.language_servers` state, but here we `.clone()` and hold onto it anyway.
+            // This now causes errors in the logs, we should find a way to remove such
+            // servers from the processing everywhere.
+            let _: Vec<Option<()>> = join_all(servers).await;
             Some(())
         })
         .await;
```