Commit graph

55 commits

Author SHA1 Message Date
Oleksiy Syvokon
3884de937b
assistant: Partial fix for HTML entities in tools params (#32148)
This problem seems to be specific to Opus 4. The eval shows an improvement
from 89% to 97%.
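
Here is a minimal Rust sketch of what decoding entities in a string parameter could look like. It assumes that only a handful of common entities matter. `decode_basic_entities` is a hypothetical name. This PR calls itself a partial fix, and this is not its actual code.

```rust
/// Hypothetical helper: decodes a small, fixed set of HTML entities that can
/// leak into tool-call string parameters.
fn decode_basic_entities(input: &str) -> String {
    // Fast path: nothing that could be an entity.
    if !input.contains('&') {
        return input.to_string();
    }
    // `&amp;` is decoded last so "&amp;lt;" becomes "&lt;", not "<".
    input
        .replace("&lt;", "<")
        .replace("&gt;", ">")
        .replace("&quot;", "\"")
        .replace("&#39;", "'")
        .replace("&amp;", "&")
}
```

Decoding blindly is risky. A file that legitimately contains `&lt;` (HTML, XML, docs) would be corrupted, which is one reason a real fix would likely be narrower than this.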

Closes: https://github.com/zed-industries/zed/issues/32060

Release Notes:

- N/A

Co-authored-by: Ben Brandt <benjamin.j.brandt@gmail.com>
2025-06-05 10:36:55 +00:00
Oleksiy Syvokon
cde47e60cd
assistant_tools: Disallow extra tool parameters by default (#32081)
This prevents models from hallucinating tool parameters.
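
One way to picture the check is a hypothetical std-only Rust sketch. It compares the parameter names the model sent against the names the tool declares, and rejects the call if anything extra shows up. The real change may do this declaratively instead, for example with `"additionalProperties": false` in the JSON schema or serde's `#[serde(deny_unknown_fields)]`. This sketch does not claim which.

```rust
/// Hypothetical: parameter names the model sent that the tool never declared.
fn unknown_params<'a>(sent: &[&'a str], declared: &[&str]) -> Vec<&'a str> {
    sent.iter()
        .copied()
        .filter(|name| !declared.iter().any(|d| d == name))
        .collect()
}

/// Rejects the whole tool call if any undeclared parameter was supplied.
fn validate_params(sent: &[&str], declared: &[&str]) -> Result<(), String> {
    let extra = unknown_params(sent, declared);
    if extra.is_empty() {
        Ok(())
    } else {
        Err(format!("unexpected tool parameters: {}", extra.join(", ")))
    }
}
```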


Release Notes:

- Prevent models from hallucinating tool parameters
2025-06-04 16:11:40 +00:00
Marshall Bowers
8faeb34367
Rename assistant_settings to agent_settings (#31513)
This PR renames the `assistant_settings` crate to `agent_settings`, as
well as a number of constructs within it.

Release Notes:

- N/A
2025-05-27 15:16:55 +00:00
Oleksiy Syvokon
255d8f7cf8
agent: Overwrite files more cautiously (#30649)
1. The `edit_file` tool tended to use `create_or_overwrite` a bit too
often, leading to corruption of long files. This change replaces the
boolean flag with an `EditFileMode` enum, which helps the agent make a more
deliberate choice when overwriting files.

With this change, the pass rate of the new eval increased from 10% to
100%.

2. eval: Added the ability to run an eval on top of an existing thread.
Threads can now be loaded from JSON files in the `SerializedThread`
format, which makes it easy to use real threads as starting points for
tests/evals.

3. Don't try to restore tool cards when running in headless or eval mode,
since we don't have a window to do this properly.
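
Here is a sketch of the enum-over-boolean idea from point 1. It assumes three modes (`Edit`, `Create`, `Overwrite`). The variant names and the `check` policy below are illustrative assumptions, not the exact code from this PR.

```rust
/// Hypothetical reconstruction: an explicit mode instead of a
/// `create_or_overwrite: bool` flag.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EditFileMode {
    /// Apply targeted edits to an existing file.
    Edit,
    /// Create a file that does not exist yet.
    Create,
    /// Replace the entire contents of a file.
    Overwrite,
}

impl EditFileMode {
    /// Parses the mode string a model would send in its tool input.
    fn parse(s: &str) -> Option<Self> {
        match s {
            "edit" => Some(Self::Edit),
            "create" => Some(Self::Create),
            "overwrite" => Some(Self::Overwrite),
            _ => None,
        }
    }

    /// Assumed policy: a mode that contradicts the file system is an error
    /// instead of a silent clobber.
    fn check(self, file_exists: bool) -> Result<(), &'static str> {
        match (self, file_exists) {
            (Self::Edit, false) => Err("cannot edit a file that does not exist"),
            (Self::Create, true) => Err("file already exists; use edit or overwrite"),
            _ => Ok(()),
        }
    }
}
```

The point of the enum is that the model has to name its intent. A boolean left at `true` silently permits rewriting a long file from scratch. A separate `Overwrite` variant makes that choice explicit.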

Release Notes:

- N/A
2025-05-14 10:40:44 +03:00
Cole Miller
8b764a5477
Add a test for remote tool use by the agent (#30289)
- Adds a new smoke test for the use of the read_file tool by the agent
in an SSH project
- Fixes the SSH shutdown sequence to use a timer from the app's executor
instead of always using a real timer
- Changes the main executor loop for tests to advance the clock
automatically instead of panicking with `parked with nothing left to
run` when there is a delayed task
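
The third bullet can be illustrated with a toy deterministic executor. This hypothetical sketch is far simpler than the real test executor. When no task is runnable but delayed tasks remain, it jumps a fake clock to the earliest deadline instead of panicking.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, VecDeque};

/// Toy single-threaded executor with a fake clock (hypothetical).
struct FakeExecutor {
    now_ms: u64,
    ready: VecDeque<&'static str>,
    // Min-heap of (deadline, task name) for delayed tasks.
    delayed: BinaryHeap<Reverse<(u64, &'static str)>>,
}

impl FakeExecutor {
    fn new() -> Self {
        Self { now_ms: 0, ready: VecDeque::new(), delayed: BinaryHeap::new() }
    }

    fn spawn(&mut self, task: &'static str) {
        self.ready.push_back(task);
    }

    fn spawn_after(&mut self, delay_ms: u64, task: &'static str) {
        self.delayed.push(Reverse((self.now_ms + delay_ms, task)));
    }

    /// Runs until no work remains; returns (time, task) in execution order.
    fn run_until_parked(&mut self) -> Vec<(u64, &'static str)> {
        let mut log = Vec::new();
        loop {
            if let Some(task) = self.ready.pop_front() {
                log.push((self.now_ms, task));
                continue;
            }
            // Nothing runnable: instead of panicking with "parked with
            // nothing left to run", advance the clock to the next timer.
            match self.delayed.pop() {
                Some(Reverse((deadline, task))) => {
                    self.now_ms = deadline;
                    self.ready.push_back(task);
                }
                None => break,
            }
        }
        log
    }
}
```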

Release Notes:

- N/A
2025-05-08 16:53:04 -04:00
Antonio Scandurra
89430a019c
Fix agent reading and editing files over SSH (#30144)
Release Notes:

- Fixed a bug that would prevent the agent from working over SSH.

---------

Co-authored-by: Nathan Sobo <nathan@zed.dev>
Co-authored-by: Richard Feldman <oss@rtfeldman.com>
Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
Co-authored-by: Cole Miller <m@cole-miller.net>
2025-05-07 17:07:01 +00:00
Marshall Bowers
5539d82ea6
agent: Remove feature flag checks (#30055)
This PR removes all of the feature flag checks related to the Agent.

Tried to do this in the least invasive way possible; we can follow up
with a full removal.

Release Notes:

- N/A
2025-05-06 21:38:05 -04:00
Cole Miller
c12e6376b8
Terminal tool improvements (#29924)
WIP

- On macOS/Linux, run the command in bash instead of the user's shell
- Try to prevent the agent from running commands that expect interaction
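
Here is a std-only sketch of the first bullet. It assumes "if available" means "found on `PATH`", and that the fallback is the user's shell and then `/bin/sh`. `pick_shell` and `find_in_path` are hypothetical names. The `exists` callback is injected so the lookup can be tested without touching the file system.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::{Path, PathBuf};

/// Returns the first `dir/name` from a PATH-style string that `exists`.
fn find_in_path(name: &str, path_var: &OsStr, exists: impl Fn(&Path) -> bool) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(name))
        .find(|candidate| exists(candidate.as_path()))
}

/// Hypothetical macOS/Linux policy: prefer `bash`, else the user's shell,
/// else `/bin/sh`.
fn pick_shell(path_var: &OsStr, user_shell: Option<&str>, exists: impl Fn(&Path) -> bool) -> String {
    match find_in_path("bash", path_var, exists) {
        Some(bash) => bash.to_string_lossy().into_owned(),
        None => user_shell.unwrap_or("/bin/sh").to_string(),
    }
}
```

In real use, `path_var` would come from `std::env::var_os("PATH")` and `exists` could simply be `Path::is_file`.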

Release Notes:

- Agent Beta: Switched to using `bash` (if available) instead of the
user's shell when calling the terminal tool.
- Agent Beta: Prevented the agent from hanging when trying to run
interactive commands.

---------

Co-authored-by: WeetHet <stas.ale66@gmail.com>
2025-05-05 15:57:03 -04:00
Antonio Scandurra
4d51602e7b
Encourage editing over re-creating a file from scratch (#29870)
I also introduced a new eval to prove the encouragement actually makes a
difference.

Release Notes:

- Improved agent behavior when streaming edits, encouraging it to edit
files as opposed to creating them from scratch
2025-05-04 13:18:28 +00:00
Richard Feldman
e6b0d8e48b
Delete obsolete tools (#29808)
Release Notes:

- Removed some obsolete tools: batch_tool, code_actions, code_symbols,
contents, symbol_info, rename

Co-authored-by: Cole Miller <m@cole-miller.net>
2025-05-02 18:52:42 +00:00
Richard Feldman
9efc09c5a6
Add eval for open_tool (#29801)
Also have its description say it should only be used on request

Release Notes:

- N/A
2025-05-02 15:56:07 +00:00
Bennet Bo Fenner
fde621f0e3
agent: Ensure that web search tool is always available (#29799)
Some changes in the LanguageModelRegistry caused the web search tool not
to show up, because the `DefaultModelChanged` event is no longer emitted
at startup.

Release Notes:

- agent: Fixed an issue where the web search tool would not be available
after starting Zed (only when using zed.dev as a provider).
2025-05-02 15:34:08 +00:00
Antonio Scandurra
35539847a4
Allow StreamingEditFileTool to also create files (#29785)
Refs #29733 

This pull request introduces a new field to the `StreamingEditFileTool`
that lets the model create or overwrite a file in a streaming way. When
either the `assistant.stream_edits` setting or the `agent-stream-edits`
feature flag is enabled, we disable the `CreateFileTool` so that the
agent model can only use `StreamingEditFileTool` for file creation.
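
The selection rule described here, combined with the mutual exclusivity from #29733, comes down to something like this hypothetical sketch. `file_tools` is an illustrative name, not Zed's registry API.

```rust
/// Hypothetical: which file-writing tools are registered.
fn file_tools(stream_edits_setting: bool, stream_edits_flag: bool) -> Vec<&'static str> {
    if stream_edits_setting || stream_edits_flag {
        // StreamingEditFileTool replaces EditFileTool and, since it can now
        // create files itself, CreateFileTool is disabled as well.
        vec!["StreamingEditFileTool"]
    } else {
        vec!["EditFileTool", "CreateFileTool"]
    }
}
```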

Release Notes:

- N/A

---------

Co-authored-by: Ben Brandt <benjamin.j.brandt@gmail.com>
Co-authored-by: Oleksiy Syvokon <oleksiy.syvokon@gmail.com>
2025-05-02 09:57:04 +00:00
Antonio Scandurra
f891dfb358
Introduce a new StreamingEditFileTool (#29733)
This pull request introduces a new tool for streaming edits. The
short-term goal is for this tool to replace the existing `EditFileTool`,
but we want to get this out the door as soon as possible so that we can
start testing it.

`StreamingEditFileTool` is mutually exclusive with `EditFileTool`. It
will be enabled by default for anyone who has the `agent-stream-edits`
feature flag, as well as anyone who sets `assistant.stream_edits` to
`true` in their settings.

### Implementation

Streaming is achieved by requesting a completion while the `edit_file`
tool is being called. We invoke the model by taking the existing
conversation with the agent and appending a prompt specifically tailored
for editing. In that prompt, we ask the model to produce a stream of
`<old_text>`/`<new_text>` tags. As the model streams text in, we
incrementally parse it and start editing as soon as we can.

### Evals

Note that, as part of this pull request, I also defined some new evals
that I used to drive the behavior of the recursive LLM call. To run
them, use this command:

```bash
cargo test --package=assistant_tools --features eval -- eval_extract_handle_command_output
```

Alternatively, comment out the `#[cfg_attr(not(feature = "eval"), ignore)]` attribute.

I recommend running them one at a time, because right now we don't
really have a way of orchestrating all these evals. I think we should
invest in that effort once the new agent panel goes live.

Release Notes:

- N/A

---------

Co-authored-by: Nathan Sobo <nathan@zed.dev>
Co-authored-by: Bennet Bo Fenner <bennetbo@gmx.de>
Co-authored-by: Oleksiy Syvokon <oleksiy.syvokon@gmail.com>
2025-05-01 17:37:43 +02:00
Max Brunsfeld
57d8397f53
Remove unnecessary fields from the tool schemas (#29381)
This PR removes two fields from JSON schemas (`$schema` and `title`),
which are not expected by any model provider, but were spuriously
included by our JSON schema library, `schemars`.

These fields added noise to requests and wasted input tokens.

### Old

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "FetchToolInput",
  "type": "object",
  "required": [
    "url"
  ],
  "properties": {
    "url": {
      "description": "The URL to fetch.",
      "type": "string"
    }
  }
}
```

### New

```json
{
  "properties": {
    "url": {
      "description": "The URL to fetch.",
      "type": "string"
    }
  },
  "required": [
    "url"
  ],
  "type": "object"
}
```
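
Here is a sketch of the transformation shown above. It uses a tiny hand-rolled JSON type so it needs no crates. A real implementation would more naturally work on `serde_json::Value` or on schemars' own types, and we don't claim which this PR uses. The sketch deliberately removes keys from the root object only.

```rust
use std::collections::BTreeMap;

/// Minimal JSON model so the sketch needs no external crates.
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Str(String),
    Arr(Vec<Json>),
    Obj(BTreeMap<String, Json>),
}

/// Removes the root-level keys that model providers don't need.
fn strip_schema_noise(schema: &mut Json) {
    if let Json::Obj(root) = schema {
        for key in ["$schema", "title"] {
            root.remove(key);
        }
    }
}
```

Touching only the root matters. A tool could legitimately have an input property named `title`, and that key lives under `properties`, where a recursive strip would wrongly delete it.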

Release Notes:

- N/A
2025-04-24 18:09:25 -07:00
Agus Zubiaga
8b5835de17
agent: Improve initial file search quality (#29317)
This PR significantly improves the quality of the initial file search
that occurs when the model doesn't yet know the full path to a file it
needs to read/edit.

Previously, the assertions in file_search often failed on main as the
model attempted to guess full file paths. On this branch, it reliably
calls `find_path` (previously `path_search`) before reading files.

After getting the model to find paths first, I noticed it would try
using `grep` instead of `path_search`. This motivated renaming
`path_search` to `find_path` (continuing the analogy to unix commands)
and adding system prompt instructions about proper tool selection.

Note: I know the command is just called `find`, but that seemed too
general.

In my eval runs, the `file_search` example improved from 40% ± 10% to
98% ± 2%. The only assertion I'm seeing occasionally fail is "glob
starts with `**` or project". We can probably add some instructions in
that regard.

Release Notes:

- N/A
2025-04-23 21:24:41 -03:00
Agus Zubiaga
45d3f5168a
eval: New add_arg_to_trait_method example (#29297)
Release Notes:

- N/A

---------

Co-authored-by: Richard Feldman <oss@rtfeldman.com>
2025-04-23 18:46:39 +00:00
Bennet Bo Fenner
822b6f837d
agent: Expose web search tool to beta users (#29273)
This gives all beta users access to the web search tool.

Release Notes:

- agent: Added `web_search` tool
2025-04-23 15:30:20 +00:00
Agus Zubiaga
ce1a674eba
eval: Fine-grained assertions (#29246)
- Support programmatic examples
([example](17feb260a0/crates/eval/src/examples/file_search.rs))
- Combine data-driven example declarations into a single `.toml` file
([example](17feb260a0/crates/eval/src/examples/find_and_replace_diff_card.toml))
- Run judge on individual assertions (previously called "criteria")
- Report judge and programmatic assertions in one combined table

Note: We still need to work on naming these concepts.

<img width=400
src="https://github.com/user-attachments/assets/fc719c93-467f-412b-8d47-68821bd8a5f5">

Release Notes:

- N/A

---------

Co-authored-by: Richard Feldman <oss@rtfeldman.com>
Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
Co-authored-by: Thomas Mickley-Doyle <tmickleydoyle@gmail.com>
2025-04-22 23:58:58 -03:00
Danilo Leal
19b547565d
agent: Refine the web search tool call UI (#29190)
This PR slightly refines the web search tool UI by introducing a component
(`ToolCallCardHeader`) that aims to standardize the heading element of
tool calls in the thread.

In terms of next steps, I plan to evolve this component further soon
(e.g., building a full-blown "tool call card" component), and even move
it to a place where I can reuse it in the active_thread as well, without
making `assistant_tools` a dependency of it.

Release Notes:

- N/A
2025-04-22 09:51:57 -03:00
Nathan Sobo
107d8ca483
Rename regex search tool to grep and accept an include glob pattern (#29100)
This PR renames the `regex_search` tool to `grep` because I think it
better conveys to the model the idea of searching the filesystem with a
regular expression. It's also one word, and the model seems to be using
it effectively after some additional prompt tuning.

It also accepts an include glob pattern to filter which files are
searched. I'd like to encourage the model to scope its searches more
aggressively; in my testing, I'm only seeing it filter on file
extension.
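
Here is a simplified illustration of scoping a search with an include pattern. All of the following are hypothetical simplifications. A plain substring stands in for the regex. The glob matcher supports only `*` and `?`, and its `*` also matches `/`, so real glob semantics (including `**`) differ.

```rust
/// Wildcard match: `*` = any run of chars (including `/`), `?` = one char.
/// Greedy, backtracking to the most recent `*` on mismatch.
fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0, 0);
    let mut last_star: Option<usize> = None; // index of the most recent `*`
    let mut star_ti = 0; // text index where that `*` began matching
    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            last_star = Some(pi);
            star_ti = ti;
            pi += 1;
        } else if pi < p.len() && (p[pi] == '?' || p[pi] == t[ti]) {
            pi += 1;
            ti += 1;
        } else if let Some(star) = last_star {
            // Let the last `*` absorb one more character and retry.
            pi = star + 1;
            star_ti += 1;
            ti = star_ti;
        } else {
            return false;
        }
    }
    // Trailing `*`s can match the empty string.
    while pi < p.len() && p[pi] == '*' {
        pi += 1;
    }
    pi == p.len()
}

/// Hypothetical `grep`: (path, 1-based line) for lines containing `needle`,
/// restricted to paths matching `include` when one is given.
fn grep<'a>(files: &[(&'a str, &str)], needle: &str, include: Option<&str>) -> Vec<(&'a str, usize)> {
    let mut hits = Vec::new();
    for &(path, contents) in files {
        if let Some(glob) = include {
            if !glob_match(glob, path) {
                continue;
            }
        }
        for (i, line) in contents.lines().enumerate() {
            if line.contains(needle) {
                hits.push((path, i + 1));
            }
        }
    }
    hits
}
```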

Release Notes:

- N/A
2025-04-20 00:53:30 +00:00
Nathan Sobo
bab28560ef
Systematically optimize agentic editing performance (#28961)
Now that we've established a proper eval in tree, this PR reboots our
agent loop back to a minimal set of tools and simpler prompts. We
should aim to get this branch feeling subjectively competitive with
what's on main and then merge it, and build from there.

Let's invest in our eval and use it to drive better performance of the
agent loop. How you can help: Pick an example, and then make the outcome
faster or better. It's fine to use your own subjective judgment, as
our evaluation criteria likely need tuning as well at this point. Focus
on making the agent work better in your own subjective experience first.
Let's focus on simple/practical improvements to make this thing work
better, then determine how we can craft our judgment criteria to lock
those improvements in.

Release Notes:

- N/A

---------

Co-authored-by: Max <max@zed.dev>
Co-authored-by: Antonio <antonio@zed.dev>
Co-authored-by: Agus <agus@zed.dev>
Co-authored-by: Richard <richard@zed.dev>
Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Michael Sloan <mgsloan@gmail.com>
2025-04-19 02:47:59 +00:00
Bennet Bo Fenner
456e54b87c
agent: Add websearch tool (#28621)
Staff only for now. We'll work on making this usable for non-zed.dev
users later.

Release Notes:

- N/A

---------

Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Danilo Leal <daniloleal09@gmail.com>
Co-authored-by: Marshall Bowers <git@maxdeviant.com>
2025-04-16 19:25:00 +02:00
Richard Feldman
b794919842
Add contents_tool (#28738)
This combines the "read file" and "list directory contents" tools, as
part of a push to reduce the number of built-in tools by combining some
of them.

The functionality is all there for this tool, although there's room for
improvement on the visuals side: it currently always shows the same icon
and always says "Read", so you can't tell at a glance when it's reading
a directory vs an individual file. Changing this will require a change
to the `Tool` trait, which can be in a separate PR. (FYI @danilo-leal!)

<img width="606" alt="Screenshot 2025-04-14 at 11 56 27 PM"
src="https://github.com/user-attachments/assets/bded72af-6476-4469-97c6-2f344629b0e4"
/>

Release Notes:

- Added `contents` tool
2025-04-15 00:54:25 -04:00
Bennet Bo Fenner
2603f36737
agent: Improve compatibility when using MCP servers with Gemini models (#28700)
WIP

Release Notes:

- agent: Improve compatibility when using MCPs with Gemini models
2025-04-14 21:55:25 +02:00
Bennet Bo Fenner
a051194195
agent: Check built-in tools schema compatibility in tests (#28691)
This ensures that we respect the `LanguageModelToolSchemaFormat` value
when we call `tool.input_schema`. This prevents us from breaking Gemini
compatibility when adding/changing built-in tools. See #28634.

The test suite will now fail with an error message like this when a tool
provides an incompatible `input_schema`:

```
thread 'tests::test_tool_schema_compatibility' panicked at crates/assistant_tools/src/assistant_tools.rs:108:17:
Tool schema for `code_actions` is not compatible with `language_model::LanguageModelToolSchemaFormat::JsonSchemaSubset` (Gemini Models).
Are you using `schema::json_schema_for<T>(format)` to generate the schema?
```
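
Here is a sketch of the kind of check this test performs: walk a schema and report any keyword that the target format does not accept. The unsupported keyword list is passed in as a parameter, because this sketch does not reproduce the actual `JsonSchemaSubset` rules. The `Json` type and the function names are hypothetical.

```rust
use std::collections::BTreeMap;

/// Minimal JSON model so the sketch needs no external crates.
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Str(String),
    Arr(Vec<Json>),
    Obj(BTreeMap<String, Json>),
}

/// Records the path of every schema keyword that appears in `unsupported`.
fn find_unsupported(schema: &Json, unsupported: &[&str], path: &str, out: &mut Vec<String>) {
    match schema {
        Json::Str(_) => {}
        Json::Arr(items) => {
            for (i, item) in items.iter().enumerate() {
                find_unsupported(item, unsupported, &format!("{path}/{i}"), out);
            }
        }
        Json::Obj(map) => {
            for (key, value) in map {
                let here = format!("{path}/{key}");
                if unsupported.iter().any(|u| *u == key.as_str()) {
                    out.push(here.clone());
                }
                match (key.as_str(), value) {
                    // Keys under `properties` are names chosen by the tool
                    // author, not keywords: check each property's schema only.
                    ("properties", Json::Obj(props)) => {
                        for (name, prop) in props {
                            find_unsupported(prop, unsupported, &format!("{here}/{name}"), out);
                        }
                    }
                    _ => find_unsupported(value, unsupported, &here, out),
                }
            }
        }
    }
}

/// Ok if no unsupported keyword is used anywhere; otherwise lists offenders.
fn check_compatible(schema: &Json, unsupported: &[&str]) -> Result<(), Vec<String>> {
    let mut hits = Vec::new();
    find_unsupported(schema, unsupported, "", &mut hits);
    if hits.is_empty() { Ok(()) } else { Err(hits) }
}
```

The special case for `properties` keeps a property named `format` from being mistaken for the `format` keyword. Other name-keyed sections such as `$defs` would need the same treatment.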


Release Notes:

- N/A
2025-04-14 17:50:01 +02:00
Agus Zubiaga
90bcde116f
agent: Use current shell (#28470)
Release Notes:

- agent: Replace `bash` tool with `terminal` tool which uses the current
shell

---------

Co-authored-by: Bennet <bennet@zed.dev>
Co-authored-by: Antonio <antonio@zed.dev>
2025-04-09 23:38:36 -06:00
Richard Feldman
6db4ab381c
Add code action tool and rename tool (#28453)
Having a separate rename tool seems to make the agent more likely to use
it compared to having it be part of the code actions tool.

Release Notes:

- Added code action tool and rename tool.
2025-04-09 22:38:01 -04:00
Antonio Scandurra
97641c3298
Use tree-sitter when returning symbols to the model for a given file (#28352)
This also increases the threshold for when we return an outline during
`read_file`.

Release Notes:

- Fixed an issue that caused the agent to fail reading large files if
the LSP hadn't started yet.
2025-04-08 16:11:05 -04:00
Agus Zubiaga
cc9cc12f7b
agent: Remove edit_files tool (#28041)
Release Notes:

- agent: Remove `edit_files` tool in favor of `find_replace`
2025-04-04 16:37:14 +00:00
Bennet Bo Fenner
c8a9a74e6a
Add tool calling support for Gemini models (#27772)
Release Notes:

- N/A
2025-03-31 17:46:42 +02:00
Richard Feldman
9b40770e9f
Add Code Symbols tool (#27733)
Lets you get all the code symbols in the project (like the Code Symbols
panel) or in a particular file (like the Outline panel), optionally
paginated and filtering results by regex. The tool gives the files and
line numbers of all of these, which means they can be used in
conjunction with the read file tool to read subsets of large files
without having to open the entire large file and poke around in it.

<img width="621" alt="Screenshot 2025-03-29 at 12 00 21 PM"
src="https://github.com/user-attachments/assets/d78259d7-2746-44c0-ac18-2e21f2505c0a"
/>

Release Notes:

- N/A
2025-03-31 05:13:13 +00:00
Richard Feldman
078b241223
Add symbol info tool (#27742)
Performs various read-only LSP operations: get definition, get declaration,
get implementation, get type definition, and find all references.

<img width="635" alt="Screenshot 2025-03-30 at 1 24 11 AM"
src="https://github.com/user-attachments/assets/87eae2b0-9791-4e7f-b91f-79dfc2b746cc"
/>

Release Notes:

- N/A
2025-03-31 00:23:03 -04:00
Richard Feldman
56eb650f09
Add Batch tool call for calling multiple tools (#27621)
<img width="620" alt="Screenshot 2025-03-27 at 2 29 13 PM"
src="https://github.com/user-attachments/assets/dd023507-61bc-4722-a095-f65f4b6c746a"
/>

We'll iterate on the UI, but first the goal is to just get it to work at
all so we can see if it's useful in terms of getting correct output
faster.

Release Notes:

- N/A

---------

Co-authored-by: Agus Zubiaga <hi@aguz.me>
2025-03-27 18:21:26 -04:00
Richard Feldman
61be869352
Add Open Tool (#27499)
I've seen models try to run `open` in Bash. This is a cross-platform
version of that.
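
Here is a hypothetical sketch of the cross-platform part. It maps the OS name reported by `std::env::consts::OS` to a command that opens a path or URL with the default handler. The per-OS commands (`open`, `xdg-open`, `cmd /C start`) are common conventions. The actual tool may rely on a crate or on Zed's platform layer instead. The function only builds the `Command` and does not spawn it.

```rust
use std::process::Command;

/// Hypothetical: command that opens `target` with the OS default handler.
fn opener_command(os: &str, target: &str) -> Option<Command> {
    let mut cmd = match os {
        "macos" => Command::new("open"),
        "linux" => Command::new("xdg-open"),
        "windows" => {
            // `start` is a cmd.exe builtin; the empty string is the window title.
            let mut c = Command::new("cmd");
            c.args(["/C", "start", ""]);
            c
        }
        _ => return None,
    };
    cmd.arg(target);
    Some(cmd)
}
```

Typical use would be `opener_command(std::env::consts::OS, path)` followed by `.spawn()` on the result.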

<img width="634" alt="Screenshot 2025-03-26 at 10 27 40 AM"
src="https://github.com/user-attachments/assets/b18cb50f-6e2f-4770-b15c-1040916a420a"
/>

Release Notes:

- N/A
2025-03-27 18:20:59 -04:00
Richard Feldman
9db4c8b710
Add Create Directory Tool (#27505)
`mkdir -p` but it works cross-platform and uses project abstractions.
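
Here is a std-only sketch of the `mkdir -p` behavior. `std::fs::create_dir_all` creates missing parents and succeeds if the directory already exists. The project abstraction is reduced to a root directory. The rule that rejects anything other than plain relative components is an assumption added for illustration.

```rust
use std::fs;
use std::io;
use std::path::{Component, Path, PathBuf};

/// Hypothetical: create `relative` (and missing parents) under `root`.
fn create_in_project(root: &Path, relative: &str) -> io::Result<PathBuf> {
    let rel = Path::new(relative);
    // Only plain name components are accepted, so absolute paths and `..`
    // are rejected and the directory can't land outside the project.
    if rel.components().any(|c| !matches!(c, Component::Normal(_))) {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "path must be relative and stay inside the project",
        ));
    }
    let full = root.join(rel);
    // Like `mkdir -p`: creates every missing ancestor, existing dirs are fine.
    fs::create_dir_all(&full)?;
    Ok(full)
}
```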

<img width="629" alt="Screenshot 2025-03-26 at 11 02 37 AM"
src="https://github.com/user-attachments/assets/9ef58d53-3343-4c94-a8f3-b82ab942611b"
/>

Release Notes:

- N/A
2025-03-26 11:59:03 -04:00
Richard Feldman
bf255486c0
Add find-replace-file tool, use it by default over edit-files-tool (#27438)
@agu-z and I paired on trying out a "one tool call per edit" approach for
editing files. (The previous approach is still available, it's just
unchecked by default for now.)

Release Notes:

- N/A

---------

Co-authored-by: Agus <agus@zed.dev>
2025-03-25 13:12:50 -04:00
Richard Feldman
7046b9641d
Add create-file-tool (#27381)
<img width="627" alt="Screenshot 2025-03-24 at 12 52 04 PM"
src="https://github.com/user-attachments/assets/0e8c061a-11c5-4d60-a694-55575b6c8f5e"
/>

Release Notes:

- N/A
2025-03-25 10:56:41 -04:00
Richard Feldman
e9e6529df4
Add copy-path tool (#27371)
<img width="631" alt="Screenshot 2025-03-24 at 11 01 10 AM"
src="https://github.com/user-attachments/assets/7e144619-83d0-4455-8d80-cc7ec6a7b03e"
/>

Release Notes:

- N/A
2025-03-24 21:21:55 -04:00
Richard Feldman
43712285bf
Add move_path tool (#27366)
<img width="629" alt="Screenshot 2025-03-24 at 10 06 39 AM"
src="https://github.com/user-attachments/assets/b099fcc0-b2f4-44ee-8c8f-416808363689"
/>

Release Notes:

- N/A

---------

Co-authored-by: Marshall Bowers <git@maxdeviant.com>
2025-03-24 14:45:19 +00:00
Marshall Bowers
ed4e654fdf
assistant_tools: Add fetch tool (#26999)
This PR adds a new `fetch` tool to the set of tools the Assistant has
available.

This tool accepts a URL and fetches the content as Markdown.

<img width="1394" alt="Screenshot 2025-03-18 at 11 52 21 AM"
src="https://github.com/user-attachments/assets/e5bcde14-a0dd-4835-9d42-8f45def68f4d"
/>

<img width="1394" alt="Screenshot 2025-03-18 at 11 52 37 AM"
src="https://github.com/user-attachments/assets/3bcce4f5-f61b-40d7-8b30-2c673ce3c06a"
/>

Release Notes:

- N/A
2025-03-18 16:25:51 +00:00
Marshall Bowers
b70f21c08d
assistant_tools: Rename RegexSearchTool module to match the others (#27001)
This PR renames the `RegexSearchTool` module to `regex_search_tool.rs`
to match the other tools.

Release Notes:

- N/A
2025-03-18 16:20:15 +00:00
Richard Feldman
8490d0d4ef
Add thinking tool (#26675)
Release Notes:

- N/A
2025-03-14 16:26:22 -04:00
Antonio Scandurra
8cf5af1a84
Introduce DiagnosticsTool (#26670)
Release Notes:

- N/A
2025-03-13 14:53:00 +01:00
Agus Zubiaga
606aa7a78c
Edit tool debugging (#26637)
Adds a `debug: edit tool` action that opens a new view which will help
us debug the edit tool internals. As the edit tool runs, the log
displays:

- Instructions provided by the main model
- Response stream from the editor model
- Parsed edit blocks
- Tool output provided back to main model

The log automatically records all edit tool interactions for staff, so
if you notice something weird, you can debug it retroactively without
having to open the debug tool first. We may want to limit the number of
recorded requests later.

I have a few more ideas for it, but this seems like a good starting
point.


https://github.com/user-attachments/assets/c61f5ce8-08b1-4500-accb-db2a480eb3ab


Release Notes:

- N/A
2025-03-13 04:03:01 +00:00
Richard Feldman
6044773043
Add path search glob tool (#26567)
<img width="638" alt="Screenshot 2025-03-12 at 1 33 31 PM"
src="https://github.com/user-attachments/assets/f29b9dae-59eb-4d7a-bc26-aa4721cb829a"
/>

Release Notes:

- N/A
2025-03-12 22:00:54 +00:00
Richard Feldman
9be7934f12
Add Bash tool (#26597)
<img width="636" alt="Screenshot 2025-03-12 at 4 24 18 PM"
src="https://github.com/user-attachments/assets/6f317031-f495-4a5a-8260-79a56b10d628"
/>

<img width="634" alt="Screenshot 2025-03-12 at 4 24 36 PM"
src="https://github.com/user-attachments/assets/27283432-4f94-49f3-9d61-a0a9c737de40"
/>


Release Notes:

- N/A
2025-03-12 20:51:29 +00:00
Richard Feldman
be8f3b3791
Add delete-path tool (#26590)
Release Notes:

- N/A
2025-03-12 20:16:26 +00:00
Antonio Scandurra
6259ad559b
Add RegexSearchTool (#26555)
Release Notes:

- N/A
2025-03-12 16:23:15 +00:00
Antonio Scandurra
349f57381f
Add ListDirectoryTool (#26549)
Release Notes:

- N/A
2025-03-12 15:17:12 +00:00