This format is enabled for Google models, as they seem to prefer it.
The pass rate of a relevant unit eval increased from 0.77 to 0.98.
The diff-fenced format looks like this (the Markdown fences and the line
hint are optional):
```diff
<<<<<<< SEARCH line=42
...
=======
...
>>>>>>> REPLACE
```
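As a rough illustration of how blocks in this format could be read back, here is a minimal sketch of a line-based parser. The names `EditBlock` and `parse_diff_fenced` are hypothetical and are not taken from the actual implementation; the sketch also assumes the markers sit on their own lines exactly as shown above.

```rust
/// One parsed edit: the text to find, its replacement, and the optional line hint.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
pub struct EditBlock {
    pub search: String,
    pub replace: String,
    pub line_hint: Option<u32>,
}

/// Collects every SEARCH/REPLACE block found in `input`.
/// Markdown fences and any other text outside a block are skipped.
pub fn parse_diff_fenced(input: &str) -> Vec<EditBlock> {
    #[derive(Clone, Copy)]
    enum State {
        Outside,
        InSearch,
        InReplace,
    }

    let mut state = State::Outside;
    let mut blocks = Vec::new();
    let mut search: Vec<&str> = Vec::new();
    let mut replace: Vec<&str> = Vec::new();
    let mut line_hint: Option<u32> = None;

    for line in input.lines() {
        match state {
            State::Outside => {
                if let Some(rest) = line.strip_prefix("<<<<<<< SEARCH") {
                    // The hint is optional: `line=42` may or may not follow the marker.
                    line_hint = rest
                        .trim()
                        .strip_prefix("line=")
                        .and_then(|n| n.parse::<u32>().ok());
                    search.clear();
                    replace.clear();
                    state = State::InSearch;
                }
            }
            State::InSearch => {
                if line == "=======" {
                    state = State::InReplace;
                } else {
                    search.push(line);
                }
            }
            State::InReplace => {
                if line == ">>>>>>> REPLACE" {
                    blocks.push(EditBlock {
                        search: search.join("\n"),
                        replace: replace.join("\n"),
                        line_hint,
                    });
                    state = State::Outside;
                } else {
                    replace.push(line);
                }
            }
        }
    }
    blocks
}
```

Because lines outside a block are ignored, the same function accepts input with or without the surrounding Markdown fences. A block that is never closed is dropped rather than reported, which a real implementation would likely want to surface as an error.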
Release Notes:
- Agent: Gemini models now use the diff-fenced format when making edits
These changes help the agent edit files when `<old_text>` matches more
than one location.
First, the agent can specify an optional `<old_text line=XX>` parameter.
When this is provided and multiple matches exist, we use this hint to
identify the best match.
Second, when there is ambiguity in matches, we now return a more helpful
message to the agent, listing the line numbers of all possible matches.
Together, these changes should reduce the number of misplaced edits and
agent confusion.
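The two behaviors described above can be sketched as follows. This is an illustration under stated assumptions, not the actual matcher: `Resolution`, `match_lines`, and `resolve` are hypothetical names, matching is done by exact substring search, and "best match" is assumed to mean the candidate closest to the hinted line.

```rust
/// Outcome of locating `old_text` in a buffer. (Hypothetical type.)
#[derive(Debug, PartialEq)]
pub enum Resolution {
    /// A single location was selected (1-based line number).
    Match(u32),
    /// `old_text` does not occur in the buffer.
    NotFound,
    /// Several matches and no hint: every candidate line is reported back.
    Ambiguous(Vec<u32>),
}

/// Returns the 1-based line numbers at which non-overlapping occurrences of
/// `old_text` start.
pub fn match_lines(buffer: &str, old_text: &str) -> Vec<u32> {
    if old_text.is_empty() {
        return Vec::new();
    }
    buffer
        .match_indices(old_text)
        .map(|(offset, _)| buffer[..offset].matches('\n').count() as u32 + 1)
        .collect()
}

/// Picks a match, using `line_hint` only when more than one match exists.
pub fn resolve(buffer: &str, old_text: &str, line_hint: Option<u32>) -> Resolution {
    let lines = match_lines(buffer, old_text);
    if lines.is_empty() {
        return Resolution::NotFound;
    }
    if lines.len() == 1 {
        return Resolution::Match(lines[0]);
    }
    match line_hint {
        Some(hint) => {
            // Assumption: the best match is the one nearest to the hinted line.
            let best = lines
                .iter()
                .copied()
                .min_by_key(|line| line.abs_diff(hint))
                .expect("lines is non-empty");
            Resolution::Match(best)
        }
        None => Resolution::Ambiguous(lines),
    }
}
```

The `Ambiguous` variant carries the candidate line numbers so that the caller can build the error message for the agent, which can then retry with a `line=` hint.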
I have ensured the LLM Worker works with these prompt changes.
Release Notes:
- Agent: Improved locating edits
This allows us to debug the raw edits that were generated when people
report feedback, when we run evals, and when a thread is opened as
Markdown.
Release Notes:
- Improved debug output for agent threads.
This pull request introduces a new tool for streaming edits. The
short-term goal is for this tool to replace the existing `EditFileTool`,
but we want to get this out the door as soon as possible so that we can
start testing it.
`StreamingEditFileTool` is mutually exclusive with `EditFileTool`. It
will be enabled by default for anyone who has the `agent-stream-edits`
feature flag, as well as anyone who sets `assistant.stream_edits` to
`true` in their settings.
### Implementation
Streaming is achieved by requesting a completion during the `edit_file`
tool call. We invoke the model by taking the existing
conversation with the agent and appending a prompt specifically tailored
for editing. In that prompt, we ask the model to produce a stream of
`<old_text>`/`<new_text>` tags. As the model streams text in, we
incrementally parse it and start editing as soon as we can.
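To make the incremental parsing idea concrete, here is a simplified sketch. It is not the tool's actual parser: `Edit` and `StreamingEditParser` are hypothetical names, the optional `line=` attribute on `<old_text>` is not handled, and an edit is only emitted once its closing `</new_text>` tag has arrived, whereas a real implementation could start applying partial text earlier.

```rust
/// A completed pair of tags from the stream. (Hypothetical type.)
#[derive(Debug, PartialEq)]
pub struct Edit {
    pub old_text: String,
    pub new_text: String,
}

/// Accumulates streamed chunks and yields edits as soon as they are complete.
#[derive(Default)]
pub struct StreamingEditParser {
    buffer: String,
}

impl StreamingEditParser {
    /// Feeds one chunk and returns every edit that became complete with it.
    /// Tags split across chunk boundaries are handled because the search
    /// always runs over the accumulated buffer.
    pub fn push(&mut self, chunk: &str) -> Vec<Edit> {
        self.buffer.push_str(chunk);
        let mut edits = Vec::new();
        while let Some((edit, consumed)) = next_edit(&self.buffer) {
            edits.push(edit);
            // Drop the bytes that belong to the edit we just emitted.
            self.buffer.drain(..consumed);
        }
        edits
    }
}

/// Finds the first complete old/new pair and the byte offset just past it.
fn next_edit(buffer: &str) -> Option<(Edit, usize)> {
    let (old_text, after_old) = tag_body(buffer, 0, "<old_text>", "</old_text>")?;
    let (new_text, after_new) = tag_body(buffer, after_old, "<new_text>", "</new_text>")?;
    let edit = Edit {
        old_text: old_text.to_string(),
        new_text: new_text.to_string(),
    };
    Some((edit, after_new))
}

/// Returns the text between `open` and `close`, searching from byte `from`,
/// together with the offset just past the closing tag.
fn tag_body<'a>(buffer: &'a str, from: usize, open: &str, close: &str) -> Option<(&'a str, usize)> {
    let start = from + buffer[from..].find(open)? + open.len();
    let end = start + buffer[start..].find(close)?;
    Some((&buffer[start..end], end + close.len()))
}
```

Calling `push` once per streamed chunk keeps the caller simple: each returned `Edit` can be applied to the buffer immediately while the model continues generating the next one.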
### Evals
Note that, as part of this pull request, I also defined some new evals
that I used to drive the behavior of the recursive LLM call. To run
them, use this command:
```bash
cargo test --package=assistant_tools --features eval -- eval_extract_handle_command_output
```
Or comment out the `#[cfg_attr(not(feature = "eval"), ignore)]` attribute.
I recommend running them one at a time, because right now we don't
really have a way of orchestrating all of these evals. I think we should
invest in that effort once the new agent panel goes live.
Release Notes:
- N/A
---------
Co-authored-by: Nathan Sobo <nathan@zed.dev>
Co-authored-by: Bennet Bo Fenner <bennetbo@gmx.de>
Co-authored-by: Oleksiy Syvokon <oleksiy.syvokon@gmail.com>