Commit graph

111 commits

Author SHA1 Message Date
Oleksiy Syvokon
d87603dd60
agent: Send stale file notifications using the project_notifications tool (#34005)
This commit introduces the `project_notifications` tool, which
proactively pushes notifications to the agent.

Unlike other tools, `Thread` automatically invokes this tool on every
turn, even when the LLM doesn't ask for it. When notifications are
available, the tool use and results are inserted into the thread,
simulating an LLM tool call.

As with other tools, users can disable `project_notifications` in
Profiles if they do not want them.

Currently, the tool only notifies users about stale files: that is,
files that have been edited by the user while the agent is also working
on them. In the future, notifications may be expanded to include
compiler diagnostics, long-running processes, and more.

Release Notes:

- Added `project_notifications` tool
2025-07-07 19:48:18 +03:00
Julia Ryan
0068de0386
debugger: Handle the envFile setting for Go (#33666)
Fixes #32984

Release Notes:

- The Go debugger now respects the `envFile` setting.
2025-07-01 09:14:59 -07:00
Michael Sloan
d497f52e17
agent: Improve error handling and retry for zed-provided models (#33565)
* Updates to `zed_llm_client-0.8.5` which adds support for `retry_after`
when anthropic provides it.

* Distinguishes upstream provider errors and rate limits from errors
that originate from zed's servers

* Moves `LanguageModelCompletionError::BadInputJson` to
`LanguageModelCompletionEvent::ToolUseJsonParseError`. While arguably
this is an error case, the logic in thread is cleaner with this move.
There is also precedent for inclusion of errors in the event type -
`CompletionRequestStatus::Failed` is how cloud errors arrive.

* Updates `PROVIDER_ID` / `PROVIDER_NAME` constants to use proper types
instead of `&str`, since they can be constructed in a const fashion.

* Removes use of `CLIENT_SUPPORTS_EXA_WEB_SEARCH_PROVIDER_HEADER_NAME`
as the server no longer reads this header and just defaults to that
behavior.

Release notes for this is covered by #33275

Release Notes:

- N/A

---------

Co-authored-by: Richard Feldman <oss@rtfeldman.com>
Co-authored-by: Richard <richard@zed.dev>
2025-06-30 21:01:32 -06:00
Richard Feldman
6073d2c93c
Automatically retry when API is Overloaded or 500s (#33275)
<img width="484" alt="Screenshot 2025-06-25 at 2 26 16 PM"
src="https://github.com/user-attachments/assets/340f15d7-b115-4895-bae8-b12a915bfda1"
/>

<img width="460" alt="Screenshot 2025-06-25 at 2 26 08 PM"
src="https://github.com/user-attachments/assets/6e587a38-d542-405f-809f-402e87520538"
/>

Now we:
* Automatically retry up to 3 times on upstream Overloaded or 500 errors
(currently for Anthropic only; will add others in future PRs)
* Also automatically retry on rate limit errors (using the provided
duration to wait, if we were given one)
* Give you a notification if you don't have Zed open and we stopped the
thread because of an error

Still todo in future PRs:
* Update collab to report Overloaded and 500 errors differently if
collab itself is passing through an upstream error vs not (currently we
report these as "Zed's API is overloaded" when actually it's the
upstream one!)
* Updating providers other than Anthropic to categorize their errors so
that they benefit from this
* Expanding graceful error handling/retry to other things besides
Overloaded and 500 errors (e.g. connection reset)

Release Notes:

- Automatically retry in Agent Panel instead of erroring out when an
upstream AI API is overloaded or 500s
- Show a notification when an Agent thread errors out and Zed is not the
active window
2025-06-26 10:53:33 -04:00
Bennet Bo Fenner
224de2ec6c
settings: Remove version fields (#33372)
This cleans up our settings to not include any `version` fields, as we
have an actual settings migrator now.

This PR removes `language_models > anthropic > version`,
`language_models > openai > version` and `agent > version`.

We had migration paths in the code for a long time, so in practice
almost everyone should be using the latest version of these settings.


Release Notes:

- Remove `version` fields in settings for `agent`, `language_models >
anthropic`, `language_models > openai`. Your settings will automatically
be migrated. If you're running into issues with this open an issue
[here](https://github.com/zed-industries/zed/issues)
2025-06-25 19:05:29 +02:00
Bennet Bo Fenner
7be57baef0
agent: Fix issue with Anthropic thinking models (#33317)
cc @osyvokon 

We were seeing a bunch of errors in our backend when people were using
Claude models with thinking enabled.

In the logs we would see
> an error occurred while interacting with the Anthropic API:
invalid_request_error: messages.x.content.0.type: Expected `thinking` or
`redacted_thinking`, but found `text`. When `thinking` is enabled, a
final `assistant` message must start with a thinking block (preceeding
the lastmost set of `tool_use` and `tool_result` blocks). We recommend
you include thinking blocks from previous turns. To avoid this
requirement, disable `thinking`. Please consult our documentation at
https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking

However, this issue did not occur frequently and was not easily
reproducible. Turns out it was triggered by us not correctly handling
[Redacted Thinking
Blocks](https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking#thinking-redaction).

I could constantly reproduce this issue by including this magic string:
`ANTHROPIC_MAGIC_STRING_TRIGGER_REDACTED_THINKING_46C9A13E193C177646C7398A98432ECCCE4C1253D5E2D82641AC0E52CC2876CB
` in the request, which forces `claude-3-7-sonnet` to emit redacted
thinking blocks (confusingly the magic string does not seem to be
working for `claude-sonnet-4`). As soon as we hit a tool call Anthropic
would return an error.

Thanks to @osyvokon for pointing me in the right direction 😄!


Release Notes:

- agent: Fixed an issue where Anthropic models would sometimes return an
error when thinking was enabled
2025-06-24 16:23:59 +00:00
Max Brunsfeld
2283ec5de2
Extract an agent_ui crate from agent (#33284)
This PR moves the UI-dependent logic in the `agent` crate into its own
crate, `agent_ui`. The remaining `agent` crate no longer depends on
`editor`, `picker`, `ui`, `workspace`, etc.

This has compile time benefits, but the main motivation is to isolate
our core agentic logic, so that we can make agents more
pluggable/configurable.

Release Notes:

- N/A
2025-06-23 18:00:28 -07:00
Oleksiy Syvokon
6df4c537b9
agent: Less disruptive changed file notification (#31693)
When the user edits one of the tracked files, we used to notify the
agent by inserting a user message at the end of the thread. This was
causing a few problems:
- The agent would stop doing its work and start reading changed files
- The agent would write something like, "Thank you for letting me know
about these changed files."

This fix contains two parts:
1. Changing the prompt to indicate this is a service message
2. Moving the message higher in the conversation thread

This works, but it slightly hurts caching.

We may consider making these notification messages stick in history,
trading context tokens count for the cache.

This might be related to #30906

Release Notes:

- N/A

---------

Co-authored-by: Marshall Bowers <git@maxdeviant.com>
2025-06-16 18:45:24 +03:00
Antonio Scandurra
019a14bcde
Replace async-watch with a custom watch (#32245)
The `async-watch` crate doesn't seem to be maintained and we noticed
several panics coming from it, such as:

```
[bug] failed to observe change after notificaton.
zed::reliability::init_panic_hook::{{closure}}::hea8cdcb6299fad6b+154543526
std::panicking::rust_panic_with_hook::h33b18b24045abff4+127578547
std::panicking::begin_panic_handler::{{closure}}::hf8313cc2fd0126bc+127577770
std::sys::backtrace::__rust_end_short_backtrace::h57fe07c8aea5c98a+127571385
__rustc[95feac21a9532783]::rust_begin_unwind+127576909
core::panicking::panic_fmt::hd54fb667be51beea+9433328
core::option::expect_failed::h8456634a3dada3e4+9433291
assistant_tools::edit_agent::EditAgent::apply_edit_chunks::{{closure}}::habe2e1a32b267fd4+26921553
gpui::app::async_context::AsyncApp::spawn::{{closure}}::h12f5f25757f572ea+25923441
async_task::raw::RawTask<F,T,S,M>::run::h3cca0d402690ccba+25186815
<gpui::platform::linux::x11::client::X11Client as gpui::platform::linux::platform::LinuxClient>::run::h26264aefbcfbc14b+73961666
gpui::platform::linux::platform::<impl gpui::platform::Platform for P>::run::hb12dcd4abad715b5+73562509
gpui::app::Application::run::h0f936a5f855a3f9f+150676820
zed::main::ha17f9a25fe257d35+154788471
std::sys::backtrace::__rust_begin_short_backtrace::h1edd02429370b2bd+154624579
std::rt::lang_start::{{closure}}::h3d2e300f10059b0a+154264777
std::rt::lang_start_internal::h418648f91f5be3a1+127502049
main+154806636
__libc_start_main+46051972301573
_start+12358494
```

I didn't find an executor-agnostic watch crate that was well maintained
(we already tried postage and async-watch), so decided to implement it
our own version.

Release Notes:

- Fixed a panic that could sometimes occur when the agent performed
edits.
2025-06-06 16:00:09 +00:00
Ben Brandt
709523bf36
Store profile per thread (#31907)
This allows storing the profile per thread, as well as moving the logic
of which tools are enabled or not to the profile itself.

This makes it much easier to switch between profiles, means there is
less global state being changed on every profile change.

Release Notes:

- agent panel: allow saving the profile per thread

---------

Co-authored-by: Ben Kunkle <ben.kunkle@gmail.com>
2025-06-06 12:05:27 +00:00
Oleksiy Syvokon
3884de937b
assistant: Partial fix for HTML entities in tools params (#32148)
This problem seems to be specific to Opus 4. Eval shows improvement from
89% to 97%.

Closes: https://github.com/zed-industries/zed/issues/32060

Release Notes:

- N/A

Co-authored-by: Ben Brandt <benjamin.j.brandt@gmail.com>
2025-06-05 10:36:55 +00:00
Danilo Leal
63c1033448
agent: Generate a notification when reaching tool use limit (#31894)
When reaching the consecutive tool call limit, the agent gets blocked
and without a notification, you wouldn't know that. This PR adds the
ability to be notified when that happens, and you can use either sound
_and_ toast, or just one of them.

Release Notes:

- agent: Added support for getting notified (via toast and/or sound)
when reaching the consecutive tool call limit.
2025-06-02 21:57:42 -03:00
Marshall Bowers
a23ee61a4b
Pass up intent with completion requests (#31710)
This PR adds a new `intent` field to completion requests to assist in
categorizing them correctly.

Release Notes:

- N/A

---------

Co-authored-by: Ben Brandt <benjamin.j.brandt@gmail.com>
2025-05-29 20:43:12 +00:00
Oleksiy Syvokon
cb187b0b4d
evals: Configurable number of max dialog turns (#31680)
Release Notes:

- N/A
2025-05-29 10:35:29 +00:00
Richard Feldman
00fd045844
Make language model deserialization more resilient (#31311)
This expands our deserialization of JSON from models to be more tolerant
of different variations that the model may send, including
capitalization, wrapping things in objects vs. being plain strings, etc.

Also when deserialization fails, it reports the entire error in the JSON
so we can see what failed to deserialize. (Previously these errors were
very unhelpful at diagnosing the problem.)

Finally, also removes the `WrappedText` variant since the custom
deserializer just turns that style of JSON into a normal `Text` variant.

Release Notes:

- N/A
2025-05-28 12:06:07 -04:00
Marshall Bowers
8faeb34367
Rename assistant_settings to agent_settings (#31513)
This PR renames the `assistant_settings` crate to `agent_settings`, as
well a number of constructs within it.

Release Notes:

- N/A
2025-05-27 15:16:55 +00:00
Oleksiy Syvokon
61a40e293d
evals: Allow threads explorer to search for JSON files recursively (#31509)
It's just more convenient to call it from CLI this way.

+ minor fixes in evals

Release Notes:

- N/A
2025-05-27 14:18:47 +00:00
Joseph T. Lyons
c208532693
Use read-only access methods for read-only entity operations (#31479)
Another follow-up to #31254

Release Notes:

- N/A
2025-05-26 23:04:31 -04:00
Oleksiy Syvokon
68a46c3627
evals: Configurable judge model (#31282)
This is needed for apples-to-apples comparison of different agent
models.

Another change is that now `cargo -p eval` accepts model names as
`provider_id/model_id` instead of separate `--provider` and `--model`
params.


Release Notes:

- N/A
2025-05-23 15:03:09 +00:00
Marshall Bowers
cb52acbf3d
eval: Don't read the model from the user settings (#31230)
This PR fixes an issue where the eval was incorrectly pulling the
provider/model from the user settings, which could cause problems when
running certain evals.

Was introduced in #30168 due to the restructuring after the removal of
the `assistant` crate.

Release Notes:

- N/A
2025-05-23 00:21:35 +00:00
Marshall Bowers
5c0b161563
Handle new refusal stop reason from Claude 4 models (#31217)
This PR adds support for handling the new [`refusal` stop
reason](https://docs.anthropic.com/en/docs/test-and-evaluate/strengthen-guardrails/handle-streaming-refusals)
from Claude 4 models.

<img width="409" alt="Screenshot 2025-05-22 at 4 31 56 PM"
src="https://github.com/user-attachments/assets/707b04f5-5a52-4a19-95d9-cbd2be2dd86f"
/>

Release Notes:

- Added handling for `"stop_reason": "refusal"` from Claude 4 models.
2025-05-22 16:56:59 -04:00
Piotr Osiewicz
77dadfedfe
chore: Make terminal_view own the TerminalSlashCommand (#31070)
This reduces 'touch crates/editor/src/editor.rs && cargo +nightly build'
from 8.9s to 8.5s. That same scenario used to take 8s less than a week
ago. :)
I'm measuring with nightly rustc, because it's compile times are better
than those of stable thanks to
https://github.com/rust-lang/rust/pull/138522

main (8.2s total):

![image](https://github.com/user-attachments/assets/767a2ac4-7bba-4147-bd16-9b09eed5b433)

[cargo-timing.html.zip](https://github.com/user-attachments/files/20364175/cargo-timing.html.zip)

#22be776 (7.5s total):

[cargo-timing-20250521T085303.892834Z.html.zip](https://github.com/user-attachments/files/20364391/cargo-timing-20250521T085303.892834Z.html.zip)

![image](https://github.com/user-attachments/assets/c4476df9-cb6e-4403-b0db-de00521f1fd0)


Release Notes:

- N/A
2025-05-21 09:27:54 +00:00
Kirill Bulatov
16366cf9f2
Use anyhow more idiomatically (#31052)
https://github.com/zed-industries/zed/issues/30972 brought up another
case where our context is not enough to track the actual source of the
issue: we get a general top-level error without inner error.

The reason for this was `.ok_or_else(|| anyhow!("failed to read HEAD
SHA"))?; ` on the top level.

The PR finally reworks the way we use anyhow to reduce such issues (or
at least make it simpler to bubble them up later in a fix).
On top of that, uses a few more anyhow methods for better readability.

* `.ok_or_else(|| anyhow!("..."))`, `map_err` and other similar error
conversion/option reporting cases are replaced with `context` and
`with_context` calls
* in addition to that, various `anyhow!("failed to do ...")` are
stripped with `.context("Doing ...")` messages instead to remove the
parasitic `failed to` text
* `anyhow::ensure!` is used instead of `if ... { return Err(...); }`
calls
* `anyhow::bail!` is used instead of `return Err(anyhow!(...));`

Release Notes:

- N/A
2025-05-20 23:06:07 +00:00
Richard Feldman
4bb04cef9d
Accept wrapped text content from LLM providers (#31048)
Some providers sometimes send `{ "type": "text", "text": ... }` instead
of just the text as a string. Now we accept those instead of erroring.

Release Notes:

- N/A
2025-05-20 20:50:02 +00:00
Piotr Osiewicz
a092e2dc03
extension: Add debug_adapters to extension manifest (#30676)
Also pass worktree to the get_dap_binary.

Release Notes:

- N/A
2025-05-20 11:01:33 +02:00
Oleksiy Syvokon
6420df3975
eval: Count execution errors as failures (#30712)
- Evals returning an error (e.g., LLM API format mismatch) were silently
skipped in the aggregated results. Now we count them as a failure (0%
success score).

- Setting the `VERBOSE` environment variable to something non-empty
disables string truncation

Release Notes:

- N/A
2025-05-14 20:44:19 +03:00
Oleksiy Syvokon
255d8f7cf8
agent: Overwrite files more cautiously (#30649)
1. The `edit_file` tool tended to use `create_or_overwrite` a bit too
often, leading to corruption of long files. This change replaces the
boolean flag with an `EditFileMode` enum, which helps Agent make a more
deliberate choice when overwriting files.

With this change, the pass rate of the new eval increased from 10% to
100%.

2. eval: Added ability to run eval on top of an existing thread. Threads
can now be loaded from JSON files in the `SerializedThread` format,
which makes it easy to use real threads as starting points for
tests/evals.

3. Don't try to restore tool cards when running in headless or eval mode
-- we don't have a window to properly do this.

Release Notes:

- N/A
2025-05-14 10:40:44 +03:00
Richard Feldman
8fdf309a4a
Have read_file support images (#30435)
This is very basic support for them. There are a number of other TODOs
before this is really a first-class supported feature, so not adding any
release notes for it; for now, this PR just makes it so that if
read_file tries to read a PNG (which has come up in practice), it at
least correctly sends it to Anthropic instead of messing up.

This also lays the groundwork for future PRs for more first-class
support for images in tool calls across more image file formats and LLM
providers.

Release Notes:

- N/A

---------

Co-authored-by: Agus Zubiaga <hi@aguz.me>
Co-authored-by: Agus Zubiaga <agus@zed.dev>
2025-05-13 10:58:00 +02:00
Richard Feldman
49887d6934
Add no_tools_enabled eval (#30537)
This is our first eval of the Minimal tool profile. Right now they're
all passing; the value of having it is to catch regressions in the
system prompt (which has special logic in it for the case where no tools
are enabled).

Release Notes:

- N/A
2025-05-12 08:52:03 +00:00
Max Brunsfeld
65b13968a2
Wait to locate system-installed Node until the shell environment is loaded (#30416)
Release Notes:

- Fixed a race condition that sometimes prevented a system-installed
`node` binary from being detected.
- Fixed a bug where the `node.path` setting was not respected when
invoking npm.
2025-05-09 19:24:28 +00:00
Antonio Scandurra
1b593f616f
Include EditAgent's raw output when inspecting thread (#30337)
This allows us to debug the raw edits that were generated when people
report feedback, when running evals and when opening the thread as
Markdown.

Release Notes:

- Improved debug output for agent threads.
2025-05-09 06:58:45 +00:00
Antonio Scandurra
9f6809a28d
Reuse conversation cache when streaming edits (#30245)
Release Notes:

- Improved latency when the agent applies edits.
2025-05-08 14:36:34 +02:00
Antonio Scandurra
89430a019c
Fix agent reading and editing files over SSH (#30144)
Release Notes:

- Fixed a bug that would prevent the agent from working over SSH.

---------

Co-authored-by: Nathan Sobo <nathan@zed.dev>
Co-authored-by: Richard Feldman <oss@rtfeldman.com>
Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
Co-authored-by: Cole Miller <m@cole-miller.net>
2025-05-07 17:07:01 +00:00
Oleksiy Syvokon
ac007139ab
evals: Enable Python LSP (#29987)
We now have one eval that uses a Python repo


Release Notes:

- N/A
2025-05-06 10:28:59 +00:00
Cole Miller
c12e6376b8
Terminal tool improvements (#29924)
WIP

- On macOS/Linux, run the command in bash instead of the user's shell
- Try to prevent the agent from running commands that expect interaction

Release Notes:

- Agent Beta: Switched to using `bash` (if available) instead of the
user's shell when calling the terminal tool.
- Agent Beta: Prevented the agent from hanging when trying to run
interactive commands.

---------

Co-authored-by: WeetHet <stas.ale66@gmail.com>
2025-05-05 15:57:03 -04:00
Bennet Bo Fenner
9cb5ffac25
context_store: Refactor state management (#29910)
Because we instantiated `ContextServerManager` both in `agent` and
`assistant-context-editor`, and these two entities track the running MCP
servers separately, we were effectively running every MCP server twice.

This PR moves the `ContextServerManager` into the project crate (now
called `ContextServerStore`). The store can be accessed via a project
instance. This ensures that we only instantiate one `ContextServerStore`
per project.

Also, this PR adds a bunch of tests to ensure that the
`ContextServerStore` behaves correctly (Previously there were none).

Closes #28714
Closes #29530

Release Notes:

- N/A
2025-05-05 21:36:12 +02:00
Oleksiy Syvokon
8199664a5a
agent: Handle attempts to use hallucinated tools (#29946)
This change:

1. Catches attempts to use missing tools. If this happens, we now send
Agent a message listing available tools, after which Agent can
gracefully recover. Prior behavior: thread would stop in a broken state.

Example of a hallucinated call and a message we send back: 

![image](https://github.com/user-attachments/assets/92a8f700-b192-4038-8c7e-0a74ca2e0146)

2. Adds evals for hallucinated tool use and imagined edits
3. Adds ability to configure a profile name in evals.



Release Notes:

- N/A
2025-05-05 19:31:11 +00:00
Marshall Bowers
3db4744e18
agent: Remove unneeded tracking of request usage (#29894)
This PR removes some unneeded tracking of the model request usage in the
`ActiveThread` and `ThreadEvent::UsageUpdated` events.

Release Notes:

- N/A
2025-05-05 01:16:53 +00:00
Michael Sloan
bb82d9ca82
agent eval: Fix --model arg and add --provider (#29883)
Release Notes:

- N/A
2025-05-04 13:43:57 -06:00
Max Brunsfeld
c3d9cdecab
Change cloud language model provider JSON protocol to surface errors and usage information (#29830)
Release Notes:

- N/A

---------

Co-authored-by: Nathan Sobo <nathan@zed.dev>
Co-authored-by: Marshall Bowers <git@maxdeviant.com>
2025-05-04 17:37:42 +00:00
Antonio Scandurra
4d51602e7b
Encourage editing over re-creating a file from scratch (#29870)
I also introduced a new eval to prove the encouragement actually makes a
difference.

Release Notes:

- Improved agent behavior when streaming edits, encouraging it to
editing files as opposed to creating them from scratch
2025-05-04 13:18:28 +00:00
Agus Zubiaga
64316309aa
agent: Review edits in single-file editors (#29820)
Enables reviewing agent edits from single-file editors in addition to
the multibuffer experience we already had.


https://github.com/user-attachments/assets/a2c287f0-51d6-43a1-8537-821498b91983


This feature can be turned off by setting `assistant.single_file_review:
false`.

Release Notes:

- agent: Review edits in single-file editors
2025-05-02 17:57:16 -03:00
Max Brunsfeld
04772bf17d
Add support for queuing status updates in cloud language model provider (#29818)
This sets us up to display queue position information to the user, once
our language model backend is updated to support request queuing.

The JSON returned by the LLM backend will need to look like this:

```json
{"queue": {"status": "queued", "position": 1}}
{"queue": {"status": "started"}}
{"event": {"THE_UPSTREAM_MODEL_PROVIDER_EVENT": "..."}} 
```

Release Notes:

- N/A

---------

Co-authored-by: Marshall Bowers <git@maxdeviant.com>
2025-05-02 20:36:39 +00:00
Cole Miller
9547d42b15
Support @-mentions in inline assists and when editing old agent panel messages (#29734)
Closes #ISSUE

Co-authored-by: Bennet <bennet@zed.dev>

Release Notes:

- Added support for context `@mentions` in the inline prompt editor and
when editing past messages in the agent panel.

---------

Co-authored-by: Bennet Bo Fenner <bennet@zed.dev>
Co-authored-by: Ben Brandt <benjamin.j.brandt@gmail.com>
2025-05-02 20:08:53 +00:00
Richard Feldman
9efc09c5a6
Add eval for open_tool (#29801)
Also have its description say it should only be used on request

Release Notes:

- N/A
2025-05-02 15:56:07 +00:00
Bennet Bo Fenner
24eb039752
context servers: Show configuration modal when extension is installed (#29309)
WIP

Release Notes:

- N/A

---------

Co-authored-by: Danilo Leal <67129314+danilo-leal@users.noreply.github.com>
Co-authored-by: Danilo Leal <daniloleal09@gmail.com>
Co-authored-by: Marshall Bowers <git@maxdeviant.com>
Co-authored-by: Cole Miller <m@cole-miller.net>
Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Oleksiy Syvokon <oleksiy.syvokon@gmail.com>
2025-05-01 20:02:14 +02:00
Antonio Scandurra
f891dfb358
Introduce a new StreamingEditFileTool (#29733)
This pull request introduces a new tool for streaming edits. The
short-term goal is for this tool to replace the existing `EditFileTool`,
but we want to get this out the door as soon as possible so that we can
start testing it.

`StreamingEditFileTool` is mutually exclusive with `EditFileTool`. It
will be enabled by default for anyone who has the `agent-stream-edits`
feature flag, as well as people that set `assistant.stream_edits` to
`true` in their settings.

### Implementation

Streaming is achieved by requesting a completion while the `edit_file`
tool gets called. We invoke the model by taking the existing
conversation with the agent and appending a prompt specifically tailored
for editing. In that prompt, we ask the model to produce a stream of
`<old_text>`/`<new_text>` tags. As the model streams text in, we
incrementally parse it and start editing as soon as we can.

### Evals

Note that, as part of this pull request, I also defined some new evals
that I used to drive the behavior of the recursive LLM call. To run
them, use this command:

```bash
cargo test --package=assistant_tools --features eval -- eval_extract_handle_command_output
```

Or comment out the `#[cfg_attr(not(feature = "eval"), ignore)]` macro.

I recommend running them one at a time, because right now we don't
really have a way of orchestrating of all these evals. I think we should
invest into that effort once the new agent panel goes live.

Release Notes:

- N/A

---------

Co-authored-by: Nathan Sobo <nathan@zed.dev>
Co-authored-by: Bennet Bo Fenner <bennetbo@gmx.de>
Co-authored-by: Oleksiy Syvokon <oleksiy.syvokon@gmail.com>
2025-05-01 17:37:43 +02:00
Richard Feldman
afeb3d4fd9
Make eval more resilient to bad input from LLM (#29703)
I saw a slice panic (for begin > end) in a debug build of the eval. This
should just be a failed assertion, not a panic that takes out the whole
eval run!

Release Notes:

- N/A
2025-04-30 18:13:45 -04:00
Richard Feldman
04c68dc0cf
Make the default repetitions be 8, and concurrency 4 (#29576)
This is based on having observed that there is a lot of variation
between runs on `n=1` and `n=3`.

* With `n=8` two runs on the same branch give answers that seem close
enough to be reasonably consistent.
* With higher concurrency, trying to run this many repetitions seems to
lead language servers to time out a lot, causing evals to fail.

Release Notes:

- N/A
2025-04-30 15:21:19 -04:00
Richard Feldman
c8685dc90f
Fix eval judging missing final response (#29638)
Fixed issue where eval thread judges were not considering the last
response in the thread.

The problem was that they were getting the full list of messages from
`last_request`, which (being a request!) did not have the response yet.

Release Notes:

- N/A
2025-04-29 23:02:46 -04:00