Some MCP servers expose tools that take absolute paths as arguments. To
interact with these, the agent needs to know the absolute path to the
project directories, not just their names. This PR changes the system
prompt to include the full path to each worktree, and updates some tool
descriptions to reflect this.
Todo:
* [x] Run evals, make sure assistant still understand how to specify
paths for tools, now that we include abs paths in the system prompt.
Release Notes:
- Improved the agent's ability to use MPC tools that require absolute
paths to files and directories in the project.
---------
Co-authored-by: Ben Brandt <benjamin.j.brandt@gmail.com>
This problem seems to be specific to Opus 4. Eval shows improvement from
89% to 97%.
Closes: https://github.com/zed-industries/zed/issues/32060
Release Notes:
- N/A
Co-authored-by: Ben Brandt <benjamin.j.brandt@gmail.com>
Closes#29753
The template contains an error: `has_default_user_rules` is always
undefined and should be `has_user_rules` instead.
Release Notes:
- Fixed default user rules ignored during prompt building.
Adds popular examples of long-running commands to system prompt.
Unfortunately, I couldn't add an eval example as the new terminal tool
no longer works in `eval`. We can look into that tomorrow, but I'm
seeing improvements when manually testing this, so I'd like to merge it.
<img
src="https://github.com/user-attachments/assets/ac24e617-e068-466f-875d-c30e1f2465c4"
width=400></img>
Release Notes:
- agent: Discourage long-running commands
This change updates the system prompt to conditionally include
`grep`-related instructions based on whether the `grep` tool is enabled.
Implementation details:
1. Add a `has_tool` handlebars helper.
2. Pass the `model` to all locations where the prompt is built.
3. Use `{{#if has_tool "grep"}}` in the system prompt to gate
`grep`-specific instructions.
Testing:
- Unit tests for the `hasTool` helper.
- Unit tests to verify that `grep`-related instructions are included /
omitted from the prompt as appropriate.
- Manual agent evaluation:
- Setup: Asked the Agent "List all impls of MyTrait in the project"
using a custom "No tools" profile (all tools disabled).
- Before the change: The Agent attempted to call `grep`, encountered an
error, then realized the tool was unavailable.
- After the change: The Agent immediately asked to enable a search tool.
Note: in principle, `grep`/`read_file` tool descriptions alone might be
enough, but to confirm this we need more evaluation. If it turns out to
be true, we'll be able to remove grep-specific instructions from the
system prompt and undo this change.
Release Notes:
- N/A
This PR significantly improves the quality of the initial file search
that occurs when the model doesn't yet know the full path to a file it
needs to read/edit.
Previously, the assertions in file_search often failed on main as the
model attempted to guess full file paths. On this branch, it reliably
calls `find_path` (previously `path_search`) before reading files.
After getting the model to find paths first, I noticed it would try
using `grep` instead of `path_search`. This motivated renaming
`path_search` to `find_path` (continuing the analogy to unix commands)
and adding system prompt instructions about proper tool selection.
Note: I know the command is just called `find`, but that seemed too
general.
In my eval runs, the `file_search` example improved from 40% ± 10% to
98% ± 2%. The only assertion I'm seeing occasionally fail is "glob
starts with `**` or project". We can probably add some instructions in
that regard.
Release Notes:
- N/A
This PR renames the `regex_search` tool to `grep` because I think it
conveys more meaning to the model, the idea of searching the filesystem
with a regular expression. It's also one word and the model seems to be
using it effectively after some additional prompt tuning.
It also takes an include pattern to filter on the specific files we try
to search. I'd like to encourage the model to scope its searches more
aggressively, as in my testing, I'm only seeing it filter on file
extension.
Release Notes:
- N/A
Now that we've established a proper eval in tree, this PR is reboots of
our agent loop back to a set of minimal tools and simpler prompts. We
should aim to get this branch feeling subjectively competitive with
what's on main and then merge it, and build from there.
Let's invest in our eval and use it to drive better performance of the
agent loop. How you can help: Pick an example, and then make the outcome
faster or better. It's fine to even use your own subjective judgment, as
our evaluation criteria likely need tuning as well at this point. Focus
on making the agent work better in your own subjective experience first.
Let's focus on simple/practical improvements to make this thing work
better, then determine how we can craft our judgment criteria to lock
those improvements in.
Release Notes:
- N/A
---------
Co-authored-by: Max <max@zed.dev>
Co-authored-by: Antonio <antonio@zed.dev>
Co-authored-by: Agus <agus@zed.dev>
Co-authored-by: Richard <richard@zed.dev>
Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Michael Sloan <mgsloan@gmail.com>
Related to #28490.
- Default prompts from the prompt library are now included as "user
rules" in the system prompt.
- Presence of these user rules is shown at the beginning of the thread
in the UI.
_ Now uses an `Entity<PromptStore>` instead of an `Arc<PromptStore>`.
Motivation for this is emitting a `PromptsUpdatedEvent`.
- Now disallows concurrent reloading of the system prompt. Before this
change it was possible for reloads to race.
Release Notes:
- agent: Added support for including default prompts from the Prompt
Library as "user rules" in the system prompt.
---------
Co-authored-by: Danilo Leal <daniloleal09@gmail.com>
Release Notes:
- agent: Replace `bash` tool with `terminal` tool which uses the current
shell
---------
Co-authored-by: Bennet <bennet@zed.dev>
Co-authored-by: Antonio <antonio@zed.dev>
Release Notes:
- Adjusted system prompt to direct it to never act on TODO-type comments
it encounters, unless the user directly asked it to do so or they relate
to the current task at hand.
Release Notes:
- Adjusted system prompt to direct it to never remove tests as a way to
have the test suite pass, unless the user directly asks for test
removal.
Sometimes agents do this. I've had some success responding by telling it
not to do this, so trying out having it in the system prompt.
Release Notes:
- Adjusted the system prompt to avoid incomplete code generation.
Currently, it's pretty common that when the agent gets stuck, it deletes
whatever it's stuck on and replaces it with a TODO comment, then
cheerfully reports that it has "simpified" the implementation. This is
worse than leaving the broken code, because at least a human could take
over and try to get it across the finish line.
This system prompt adjustment attempts to make the agent do something
more useful when in this situation: report that it's stuck, explain why
it's stuck, and ask the user what to do.
Release Notes:
- N/A
Adds some guidance for the assistant on how to respond to tool results
and other interactions
Release Notes:
- N/A
Co-authored-by: Richard Feldman <richard@zed.dev>