This introduces a new field `thinking_allowed` on `LanguageModelRequest`
which lets us control whether thinking should be enabled if the model
supports it.
We permit thinking in the Inline Assistant, Edit File tool and the Git
Commit message generator, this should make generation faster when using
a thinking model, e.g. `claude-sonnet-4-thinking`
Release Notes:
- N/A
This format is enabled for Google models as they seem to prefer it.
A relevant unit eval's pass rate has increased from 0.77 to 0.98.
Diff-fenced format looks like this (markdown fences and a line hint are
optional):
```diff
<<<<<<< SEARCH line=42
...
=======
...
>>>>>>> REPLACE
```
Release Notes:
- Agent: Gemini models now use the diff-fenced format when making edits
These changes help the agent edit files when `<old_text>` matches more
than one location.
First, the agent can specify an optional `<old_text line=XX>` parameter.
When this is provided and multiple matches exist, we use this hint to
identify the best match.
Second, when there is ambiguity in matches, we now return the agent a
more helpful message listing the line numbers of all possible matches.
Together, these changes should reduce the number of misplaced edits and
agent confusion.
I have ensured the LLM Worker works with these prompt changes.
Release Notes:
- Agent: Improved locating edits
The `async-watch` crate doesn't seem to be maintained and we noticed
several panics coming from it, such as:
```
[bug] failed to observe change after notificaton.
zed::reliability::init_panic_hook::{{closure}}::hea8cdcb6299fad6b+154543526
std::panicking::rust_panic_with_hook::h33b18b24045abff4+127578547
std::panicking::begin_panic_handler::{{closure}}::hf8313cc2fd0126bc+127577770
std::sys::backtrace::__rust_end_short_backtrace::h57fe07c8aea5c98a+127571385
__rustc[95feac21a9532783]::rust_begin_unwind+127576909
core::panicking::panic_fmt::hd54fb667be51beea+9433328
core::option::expect_failed::h8456634a3dada3e4+9433291
assistant_tools::edit_agent::EditAgent::apply_edit_chunks::{{closure}}::habe2e1a32b267fd4+26921553
gpui::app::async_context::AsyncApp::spawn::{{closure}}::h12f5f25757f572ea+25923441
async_task::raw::RawTask<F,T,S,M>::run::h3cca0d402690ccba+25186815
<gpui::platform::linux::x11::client::X11Client as gpui::platform::linux::platform::LinuxClient>::run::h26264aefbcfbc14b+73961666
gpui::platform::linux::platform::<impl gpui::platform::Platform for P>::run::hb12dcd4abad715b5+73562509
gpui::app::Application::run::h0f936a5f855a3f9f+150676820
zed::main::ha17f9a25fe257d35+154788471
std::sys::backtrace::__rust_begin_short_backtrace::h1edd02429370b2bd+154624579
std::rt::lang_start::{{closure}}::h3d2e300f10059b0a+154264777
std::rt::lang_start_internal::h418648f91f5be3a1+127502049
main+154806636
__libc_start_main+46051972301573
_start+12358494
```
I didn't find an executor-agnostic watch crate that was well maintained
(we already tried postage and async-watch), so decided to implement it
our own version.
Release Notes:
- Fixed a panic that could sometimes occur when the agent performed
edits.
When `<old_text>` points to more than one location in a file, we used to
edit the first match, confusing the agent along the way. Now we will
return an error, asking to expand `<old_text>` selection.
Closes #ISSUE
Release Notes:
- agent: Fixed incorrect file edits when edit locations are ambiguous
This PR adds a new `intent` field to completion requests to assist in
categorizing them correctly.
Release Notes:
- N/A
---------
Co-authored-by: Ben Brandt <benjamin.j.brandt@gmail.com>
This change instructs models to wrap new file content in Markdown fences
and introduces a parser for this format. The reasons are:
1. This is the format we put a lot of effort into explaining in the
system prompt.
2. Gemini really prefers to do it.
3. It adds an option for a model to think before writing the content
The `eval_zode` pass rate for GEmini models goes from 0% to 100%. Other
models were already at 100%, this hasn't changed.
Release Notes:
- N/A
This eval checks that Edit Agent can create an empty file without
writing its thoughts into it. This issue is not specific to empty files,
but it's easier to reproduce with them.
For some mysterious reason, I could easily reproduce this issue roughly
90% of the time in actual Zed. However, once I extract the exact LLM
request before the failure point and generate from that, the
reproduction rate drops to 2%!
Things I've tried to make sure it's not a fluke: disabling prompt
caching, capturing the LLM request via a proxy server, running the
prompt on Claude separately from evals. Every time it was mostly giving
good outcomes, which doesn't match my actual experience in Zed.
At some point I discovered that simply adding one insignificant space or
a newline to the prompt suddenly results in an outcome I tried to
reproduce almost perfectly.
This weirdness happens even outside the Zed code base and even when
using a different subscription. The result is the same: an extra newline
or space changes the model behavior significantly enough, so that the
pass rate drops from 99% to 0-3%
I have no explanation to this.
Release Notes:
- N/A
This allows us to debug the raw edits that were generated when people
report feedback, when running evals and when opening the thread as
Markdown.
Release Notes:
- Improved debug output for agent threads.
This improves the new eval scenario by ~80% (`0.29` vs `0.525`) without
decreasing performance in the other evals.
Release Notes:
- Improved the performance of the `edit_file` tool.
Nathan here: I also tacked on a bunch of UI refinement.
Release Notes:
- Introduced the ability to follow the agent around as it reads and
edits files.
---------
Co-authored-by: Nathan Sobo <nathan@zed.dev>
Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
Refs #29733
This pull request introduces a new field to the `StreamingEditFileTool`
that lets the model create or overwrite a file in a streaming way. When
one of the `assistant.stream_edits` setting / `agent-stream-edits`
feature flag is enabled, we are going to disable the `CreateFileTool` so
that the agent model can only use `StreamingEditFileTool` for file
creation.
Release Notes:
- N/A
---------
Co-authored-by: Ben Brandt <benjamin.j.brandt@gmail.com>
Co-authored-by: Oleksiy Syvokon <oleksiy.syvokon@gmail.com>
This pull request introduces a new tool for streaming edits. The
short-term goal is for this tool to replace the existing `EditFileTool`,
but we want to get this out the door as soon as possible so that we can
start testing it.
`StreamingEditFileTool` is mutually exclusive with `EditFileTool`. It
will be enabled by default for anyone who has the `agent-stream-edits`
feature flag, as well as people that set `assistant.stream_edits` to
`true` in their settings.
### Implementation
Streaming is achieved by requesting a completion while the `edit_file`
tool gets called. We invoke the model by taking the existing
conversation with the agent and appending a prompt specifically tailored
for editing. In that prompt, we ask the model to produce a stream of
`<old_text>`/`<new_text>` tags. As the model streams text in, we
incrementally parse it and start editing as soon as we can.
### Evals
Note that, as part of this pull request, I also defined some new evals
that I used to drive the behavior of the recursive LLM call. To run
them, use this command:
```bash
cargo test --package=assistant_tools --features eval -- eval_extract_handle_command_output
```
Or comment out the `#[cfg_attr(not(feature = "eval"), ignore)]` macro.
I recommend running them one at a time, because right now we don't
really have a way of orchestrating of all these evals. I think we should
invest into that effort once the new agent panel goes live.
Release Notes:
- N/A
---------
Co-authored-by: Nathan Sobo <nathan@zed.dev>
Co-authored-by: Bennet Bo Fenner <bennetbo@gmx.de>
Co-authored-by: Oleksiy Syvokon <oleksiy.syvokon@gmail.com>