Commit graph

13 commits

Author SHA1 Message Date
Oleksiy Syvokon
5112fcebeb
evals: Make LLMs configurable in edit_agent evals (#30813)
Release Notes:

- N/A
2025-05-16 11:10:15 +00:00
Piotr Osiewicz
0f17e82154
chore: Bump Rust to 1.87 (#30739)
Closes #ISSUE

Release Notes:

- N/A
2025-05-15 22:28:52 +00:00
Oleksiy Syvokon
255d8f7cf8
agent: Overwrite files more cautiously (#30649)
1. The `edit_file` tool tended to use `create_or_overwrite` a bit too
often, leading to corruption of long files. This change replaces the
boolean flag with an `EditFileMode` enum, which helps Agent make a more
deliberate choice when overwriting files.

With this change, the pass rate of the new eval increased from 10% to
100%.

2. eval: Added ability to run eval on top of an existing thread. Threads
can now be loaded from JSON files in the `SerializedThread` format,
which makes it easy to use real threads as starting points for
tests/evals.

3. Don't try to restore tool cards when running in headless or eval mode
-- we don't have a window to properly do this.

Release Notes:

- N/A
2025-05-14 10:40:44 +03:00
Richard Feldman
8fdf309a4a
Have read_file support images (#30435)
This is very basic support for them. There are a number of other TODOs
before this is really a first-class supported feature, so not adding any
release notes for it; for now, this PR just makes it so that if
read_file tries to read a PNG (which has come up in practice), it at
least correctly sends it to Anthropic instead of messing up.

This also lays the groundwork for future PRs for more first-class
support for images in tool calls across more image file formats and LLM
providers.

Release Notes:

- N/A

---------

Co-authored-by: Agus Zubiaga <hi@aguz.me>
Co-authored-by: Agus Zubiaga <agus@zed.dev>
2025-05-13 10:58:00 +02:00
Antonio Scandurra
1b593f616f
Include EditAgent's raw output when inspecting thread (#30337)
This allows us to debug the raw edits that were generated when people
report feedback, when running evals and when opening the thread as
Markdown.

Release Notes:

- Improved debug output for agent threads.
2025-05-09 06:58:45 +00:00
Antonio Scandurra
9f6809a28d
Reuse conversation cache when streaming edits (#30245)
Release Notes:

- Improved latency when the agent applies edits.
2025-05-08 14:36:34 +02:00
Antonio Scandurra
89430a019c
Fix agent reading and editing files over SSH (#30144)
Release Notes:

- Fixed a bug that would prevent the agent from working over SSH.

---------

Co-authored-by: Nathan Sobo <nathan@zed.dev>
Co-authored-by: Richard Feldman <oss@rtfeldman.com>
Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
Co-authored-by: Cole Miller <m@cole-miller.net>
2025-05-07 17:07:01 +00:00
Mikayla Maki
0cdd8bdded
Restore tool cards on thread deserialization (#30053)
Release Notes:

- N/A

---------

Co-authored-by: Julia Ryan <juliaryan3.14@gmail.com>
2025-05-06 18:16:34 -07:00
Antonio Scandurra
07e6e49583
Add new editing eval scenario and improve it substantially (#29997)
This improves the new eval scenario by ~80% (`0.29` vs `0.525`) without
decreasing performance in the other evals.

Release Notes:

- Improved the performance of the `edit_file` tool.
2025-05-06 12:22:42 +00:00
Antonio Scandurra
210c338df4
Restore original file content when rejecting an overwritten file (#29974)
Release Notes:

- Fixed a bug that would cause rejecting a hunk from the agent to delete
the file if the agent had decided to rewrite that file from scratch.
2025-05-06 07:05:55 +00:00
Antonio Scandurra
545ae27079
Add the ability to follow the agent as it makes edits (#29839)
Nathan here: I also tacked on a bunch of UI refinement.

Release Notes:

- Introduced the ability to follow the agent around as it reads and
edits files.

---------

Co-authored-by: Nathan Sobo <nathan@zed.dev>
Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
2025-05-04 08:28:39 +00:00
Antonio Scandurra
35539847a4
Allow StreamingEditFileTool to also create files (#29785)
Refs #29733 

This pull request introduces a new field to the `StreamingEditFileTool`
that lets the model create or overwrite a file in a streaming way. When
one of the `assistant.stream_edits` setting / `agent-stream-edits`
feature flag is enabled, we are going to disable the `CreateFileTool` so
that the agent model can only use `StreamingEditFileTool` for file
creation.

Release Notes:

- N/A

---------

Co-authored-by: Ben Brandt <benjamin.j.brandt@gmail.com>
Co-authored-by: Oleksiy Syvokon <oleksiy.syvokon@gmail.com>
2025-05-02 09:57:04 +00:00
Antonio Scandurra
f891dfb358
Introduce a new StreamingEditFileTool (#29733)
This pull request introduces a new tool for streaming edits. The
short-term goal is for this tool to replace the existing `EditFileTool`,
but we want to get this out the door as soon as possible so that we can
start testing it.

`StreamingEditFileTool` is mutually exclusive with `EditFileTool`. It
will be enabled by default for anyone who has the `agent-stream-edits`
feature flag, as well as people that set `assistant.stream_edits` to
`true` in their settings.

### Implementation

Streaming is achieved by requesting a completion while the `edit_file`
tool gets called. We invoke the model by taking the existing
conversation with the agent and appending a prompt specifically tailored
for editing. In that prompt, we ask the model to produce a stream of
`<old_text>`/`<new_text>` tags. As the model streams text in, we
incrementally parse it and start editing as soon as we can.

### Evals

Note that, as part of this pull request, I also defined some new evals
that I used to drive the behavior of the recursive LLM call. To run
them, use this command:

```bash
cargo test --package=assistant_tools --features eval -- eval_extract_handle_command_output
```

Or comment out the `#[cfg_attr(not(feature = "eval"), ignore)]` macro.

I recommend running them one at a time, because right now we don't
really have a way of orchestrating of all these evals. I think we should
invest into that effort once the new agent panel goes live.

Release Notes:

- N/A

---------

Co-authored-by: Nathan Sobo <nathan@zed.dev>
Co-authored-by: Bennet Bo Fenner <bennetbo@gmx.de>
Co-authored-by: Oleksiy Syvokon <oleksiy.syvokon@gmail.com>
2025-05-01 17:37:43 +02:00