* Updates to `zed_llm_client-0.8.5`, which adds support for `retry_after`
when Anthropic provides it.
* Distinguishes upstream provider errors and rate limits from errors
that originate from Zed's servers
* Moves `LanguageModelCompletionError::BadInputJson` to
`LanguageModelCompletionEvent::ToolUseJsonParseError` (sketched after
this list). While this is arguably an error case, the thread logic is
cleaner with this move. There is also precedent for including errors
in the event type - `CompletionRequestStatus::Failed` is how cloud
errors arrive.
* Updates `PROVIDER_ID` / `PROVIDER_NAME` constants to use proper types
instead of `&str`, since they can be constructed in a const fashion.
* Removes use of `CLIENT_SUPPORTS_EXA_WEB_SEARCH_PROVIDER_HEADER_NAME`
as the server no longer reads this header and just defaults to that
behavior.
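A minimal sketch of the resulting event shape (the field names here are illustrative assumptions, not the exact Zed definitions):

```rust
// Sketch only: the real enum has many more variants (text deltas, tool uses,
// stop reasons, etc.); the field names below are assumptions for illustration.
pub enum LanguageModelCompletionEvent {
    ToolUseJsonParseError {
        tool_name: String,
        raw_input: String, // the malformed JSON the model actually sent
        error: String,     // why it failed to parse, kept for diagnostics
    },
    // ...other events...
}
```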
Release notes for this are covered by #33275
Release Notes:
- N/A
---------
Co-authored-by: Richard Feldman <oss@rtfeldman.com>
Co-authored-by: Richard <richard@zed.dev>
Seeing this come up in our server logs when sending requests to
Anthropic: `final assistant content cannot end with trailing
whitespace`.
Release Notes:
- agent: Fixed an issue where Anthropic requests would sometimes fail
because of malformed assistant messages
<img width="484" alt="Screenshot 2025-06-25 at 2 26 16 PM"
src="https://github.com/user-attachments/assets/340f15d7-b115-4895-bae8-b12a915bfda1"
/>
<img width="460" alt="Screenshot 2025-06-25 at 2 26 08 PM"
src="https://github.com/user-attachments/assets/6e587a38-d542-405f-809f-402e87520538"
/>
Now we:
* Automatically retry up to 3 times on upstream Overloaded or 500 errors
(currently for Anthropic only; will add others in future PRs)
* Also automatically retry on rate limit errors, using the provided
duration to wait if we were given one (see the sketch after this list)
* Give you a notification if you don't have Zed open and we stopped the
thread because of an error
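A minimal sketch of the retry policy described above (the error variants, retry limit, and fallback backoff are illustrative, not the exact Zed implementation):

```rust
use std::time::Duration;

// Illustrative error type: what matters is that rate-limit errors can carry a
// provider-supplied `retry_after` duration.
enum CompletionError {
    Overloaded,
    InternalServerError,
    RateLimitExceeded { retry_after: Option<Duration> },
    Other(String),
}

const MAX_RETRIES: u32 = 3; // per the description above
const DEFAULT_BACKOFF: Duration = Duration::from_secs(5); // assumed fallback wait

/// Returns how long to wait before retrying, or `None` to surface the error.
fn next_wait(error: &CompletionError, attempt: u32) -> Option<Duration> {
    if attempt >= MAX_RETRIES {
        return None; // give up and report the error to the user
    }
    match error {
        // Upstream overload / 500: wait a fixed interval and try again.
        CompletionError::Overloaded | CompletionError::InternalServerError => {
            Some(DEFAULT_BACKOFF)
        }
        // Rate limit: prefer the duration the provider told us to wait.
        CompletionError::RateLimitExceeded { retry_after } => {
            Some(retry_after.unwrap_or(DEFAULT_BACKOFF))
        }
        // Anything else is not retried (yet).
        CompletionError::Other(_) => None,
    }
}
```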
Still todo in future PRs:
* Update collab to report Overloaded and 500 errors differently if
collab itself is passing through an upstream error vs not (currently we
report these as "Zed's API is overloaded" when actually it's the
upstream one!)
* Update providers other than Anthropic to categorize their errors so
that they benefit from this
* Expand graceful error handling/retry to other things besides
Overloaded and 500 errors (e.g. connection reset)
Release Notes:
- Automatically retry in Agent Panel instead of erroring out when an
upstream AI API is overloaded or 500s
- Show a notification when an Agent thread errors out and Zed is not the
active window
While working on retries, I discovered some opportunities to reduce
cloning of message segments. These segments hold full `String`s (not
`SharedString`s), so cloning them means copying all the bytes of all
the strings in the message, which would be nice to avoid!
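A rough sketch of the difference (types simplified; Zed's `SharedString` is, roughly, a cheaply clonable reference-counted string):

```rust
use std::sync::Arc;

// Owned text: cloning a segment copies every byte of `text`.
struct OwnedSegment {
    text: String,
}

// Reference-counted text: cloning only bumps a counter.
#[derive(Clone)]
struct SharedSegment {
    text: Arc<str>,
}

// Often the clone can be avoided entirely by borrowing the segments instead
// of taking ownership of them.
fn total_len(segments: &[OwnedSegment]) -> usize {
    segments.iter().map(|segment| segment.text.len()).sum()
}
```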
Release Notes:
- N/A
The agent now checks if a tool is enabled in the current profile before
calling it. Previously, the agent could still call disabled tools, which
commonly happened after switching profiles in the middle of a thread.
Release Notes:
- Fixed a bug where the agent could sometimes use disabled tools
cc @osyvokon
We were seeing a bunch of errors in our backend when people were using
Claude models with thinking enabled.
In the logs we would see
> an error occurred while interacting with the Anthropic API:
invalid_request_error: messages.x.content.0.type: Expected `thinking` or
`redacted_thinking`, but found `text`. When `thinking` is enabled, a
final `assistant` message must start with a thinking block (preceeding
the lastmost set of `tool_use` and `tool_result` blocks). We recommend
you include thinking blocks from previous turns. To avoid this
requirement, disable `thinking`. Please consult our documentation at
https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking
However, this issue did not occur frequently and was not easily
reproducible. Turns out it was triggered by us not correctly handling
[Redacted Thinking
Blocks](https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking#thinking-redaction).
I could consistently reproduce this issue by including the magic string
`ANTHROPIC_MAGIC_STRING_TRIGGER_REDACTED_THINKING_46C9A13E193C177646C7398A98432ECCCE4C1253D5E2D82641AC0E52CC2876CB`
in the request, which forces `claude-3-7-sonnet` to emit redacted
thinking blocks (confusingly, the magic string does not seem to work
for `claude-sonnet-4`). As soon as we hit a tool call, Anthropic would
return an error.
Thanks to @osyvokon for pointing me in the right direction 😄!
Release Notes:
- agent: Fixed an issue where Anthropic models would sometimes return an
error when thinking was enabled
This PR moves the UI-dependent logic in the `agent` crate into its own
crate, `agent_ui`. The remaining `agent` crate no longer depends on
`editor`, `picker`, `ui`, `workspace`, etc.
This has compile time benefits, but the main motivation is to isolate
our core agentic logic, so that we can make agents more
pluggable/configurable.
Release Notes:
- N/A
This PR is in preparation for doing automatic retries for certain
errors, e.g. Overloaded. It doesn't change behavior yet (aside from some
granularity of error messages shown to the user), but rather mostly
changes some error handling to be exhaustive enum matches instead of
`anyhow` downcasts, and leaves some comments for where the behavior
change will be in a future PR.
Release Notes:
- N/A
Closes #31903
Release Notes:
- agent: Fix an issue where an error would occur when MCP servers
specified tools with the same name
---------
Co-authored-by: Ben Brandt <benjamin.j.brandt@gmail.com>
Having `Thread::last_usage` as an override of the initially fetched
usage could cause the initial usage to be displayed when the current
thread is empty or in text threads. The fix is to store the last usage
info in `UserStore` and not have these overrides.
Release Notes:
- Agent: Fixed request usage display to always show the most recently
known usage - there were some cases where it would show the initially
fetched usage.
Removing it for two reasons:
1. We need a better implementation that doesn't hurt caching and doesn't
distract the agent too much (see
https://github.com/zed-industries/zed/pull/32876 for more context)
2. The current insertion point of notifications doesn't play well with
Claude Thinking models (see
https://github.com/zed-industries/zed/issues/33000#issuecomment-2991709484)
I think we should bring this code back in the form of a tool. But for
now, I'm dropping it to resolve recent issues.
Closes #33000
Release Notes:
- N/A
Previously we were using a mix of `u32` and `usize`, e.g. `max_tokens:
usize, max_output_tokens: Option<u32>` in the same `struct`.
Although [tiktoken](https://github.com/openai/tiktoken) uses `usize`,
token counts should be consistent across targets (e.g. the same model
doesn't suddenly get a smaller context window if you're compiling for
wasm32), and these token counts could end up getting serialized using a
binary protocol, so `usize` is not the right choice for token counts.
I chose to standardize on `u64` over `u32` because we don't store many
of them (so the extra size should be insignificant) and future models
may exceed `u32::MAX` tokens.
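A minimal sketch of the convention (field names are illustrative, not the exact Zed structs):

```rust
// Token counts are u64 on every target, so a model's context window is the
// same size on a 64-bit native build and on wasm32, and the values round-trip
// cleanly through a binary protocol.
struct ModelLimits {
    max_tokens: u64,                // context window size
    max_output_tokens: Option<u64>, // per-response output cap, if the model has one
}
```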
Release Notes:
- N/A
When the user edits one of the tracked files, we used to notify the
agent by inserting a user message at the end of the thread. This was
causing a few problems:
- The agent would stop doing its work and start reading changed files
- The agent would write something like, "Thank you for letting me know
about these changed files."
This fix contains two parts:
1. Changing the prompt to indicate this is a service message
2. Moving the message higher in the conversation thread
This works, but it slightly hurts caching.
We may consider making these notification messages stick in history,
trading context token count for better cache reuse.
This might be related to #30906
Release Notes:
- N/A
---------
Co-authored-by: Marshall Bowers <git@maxdeviant.com>
Bubbles up rate limit information so that, higher up in the stack, we
can retry after a given duration if needed.
Also caps the number of evals running concurrently, which helps as well.
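The concurrency cap might look roughly like this (a sketch using the `futures` crate; the limit of 8 is an arbitrary example and not necessarily what the eval runner uses):

```rust
use futures::{stream, StreamExt};
use std::future::Future;

// Run at most 8 evals at a time; the rest wait their turn. Each eval can still
// observe a rate-limit error and sleep for the provider-supplied duration
// before retrying.
async fn run_all<F>(evals: Vec<F>)
where
    F: Future<Output = ()>,
{
    stream::iter(evals)
        .for_each_concurrent(8, |eval| eval)
        .await;
}
```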
Release Notes:
- N/A
This allows storing the profile per thread, as well as moving the logic
of which tools are enabled or not to the profile itself.
This makes it much easier to switch between profiles and means there is
less global state being changed on every profile change.
Release Notes:
- agent panel: allow saving the profile per thread
---------
Co-authored-by: Ben Kunkle <ben.kunkle@gmail.com>
Co-authored-by: Cole Miller <m@cole-miller.net>
Release Notes:
- agent: Fixed a panic when re-editing old messages
---------
Co-authored-by: Cole Miller <m@cole-miller.net>
Co-authored-by: Cole Miller <cole@zed.dev>
This PR introduces the "Reject All" and "Accept All" buttons in the
panel's edit bar, which appears as soon as the agent starts editing a
file. I'm also adding here a new method to the thread called
`has_pending_edit_tool_uses`, which is a more specific way than the
`is_generating` method of knowing whether the reject/accept all actions
can be triggered.
Previously, without this new method, you'd have to wait for the whole
generation to end (e.g., the agent could still be generating markdown
with things like a change summary) before you could click those buttons,
even though the edits were already there, ready for you. Waiting for the
whole thing always felt unnecessary when you really just wanted to wait
for the _edits_ to be done, so as to avoid any potential conflicting state.
<img
src="https://github.com/user-attachments/assets/0927f3a6-c9ee-46ae-8f7b-97157d39a7b5"
width="500"/>
---
Release Notes:
- agent: Added ability to reject and accept all changes from the agent
panel.
---------
Co-authored-by: Agus Zubiaga <hi@aguz.me>
When reaching the consecutive tool call limit, the agent gets blocked,
and without a notification you wouldn't know that. This PR adds the
ability to be notified when that happens; you can use both sound
_and_ toast, or just one of them.
Release Notes:
- agent: Added support for getting notified (via toast and/or sound)
when reaching the consecutive tool call limit.
This PR adds a Danger check to remind engineers that any changes to our
various prompts need to be verified against the LLM Worker.
When changes to the prompt files are detected, we will fail the PR with
a message:
<img width="929" alt="Screenshot 2025-05-30 at 8 40 58 AM"
src="https://github.com/user-attachments/assets/79afab4e-e799-45f1-a90e-0fd7c9a73706"
/>
Once the corresponding changes have been made (or no changes to the LLM
Worker have been determined to be necessary), including the indicated
attestation message will convert the errors into informational messages:
<img width="926" alt="Screenshot 2025-05-30 at 8 41 52 AM"
src="https://github.com/user-attachments/assets/ff51c17a-7a76-46a7-b468-a7d864d480c3"
/>
Release Notes:
- N/A
This PR adds a new `intent` field to completion requests to assist in
categorizing them correctly.
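A minimal sketch of the shape (the variant names here are hypothetical examples; the actual set of intents is defined by the client/server protocol):

```rust
use serde::Serialize;

// Hypothetical intents, for illustration only.
#[derive(Clone, Copy, Debug, Serialize)]
#[serde(rename_all = "snake_case")]
enum CompletionIntent {
    UserPrompt,          // a message the user typed in the panel
    ThreadSummarization, // generating a thread title/summary
    InlineAssist,        // an inline editor transformation
}

#[derive(Serialize)]
struct CompletionRequest {
    // ...existing fields elided...
    intent: Option<CompletionIntent>, // lets the server categorize the request
}
```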
Release Notes:
- N/A
---------
Co-authored-by: Ben Brandt <benjamin.j.brandt@gmail.com>
This expands our deserialization of JSON from models to be more tolerant
of different variations that the model may send, including
capitalization, wrapping things in objects vs. being plain strings, etc.
Also, when deserialization fails, the error now includes the entire JSON
so we can see what failed to deserialize. (Previously these errors were
very unhelpful for diagnosing the problem.)
Finally, also removes the `WrappedText` variant since the custom
deserializer just turns that style of JSON into a normal `Text` variant.
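A minimal sketch of the wrapped-vs-plain tolerance using serde (the real custom deserializer handles more variations, such as capitalization):

```rust
use serde::Deserialize;

// Accept either a bare string or an object like { "type": "text", "text": "..." },
// then normalize both to a plain String. Unknown fields such as "type" are
// ignored by default when deserializing the struct variant.
#[derive(Deserialize)]
#[serde(untagged)]
enum TextInput {
    Plain(String),
    Wrapped { text: String },
}

impl From<TextInput> for String {
    fn from(value: TextInput) -> Self {
        match value {
            TextInput::Plain(text) | TextInput::Wrapped { text } => text,
        }
    }
}
```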
Release Notes:
- N/A
This PR improves the consecutive tool call UX by allowing users to
quickly continue an interrupted thread with one click. What we do here
is insert a hidden "Continue" message that will just nudge the LLM to
keep going. We're also using the opportunity to upsell the feature
previously called "Max Mode", now rebranded as "Burn Mode", which lets
users avoid being interrupted anymore if they ever hit 25 consecutive
tool calls again.
Release Notes:
- agent: Improve consecutive tool call UX by allowing users to quickly
continue an interrupted thread with one click.
---------
Co-authored-by: Ben Brandt <benjamin.j.brandt@gmail.com>
Co-authored-by: Agus Zubiaga <hi@aguz.me>
Co-authored-by: Agus Zubiaga <agus@zed.dev>
This PR replaces some `update()` calls with either `read()` or
`read_with()` when the `update()` call performed read-only operations on
the entity.
Many more likely exist; will follow up with more PRs.
Release Notes:
- N/A
This is a follow-up to https://github.com/zed-industries/zed/pull/31217
that removes the last turn after we get a `refusal` stop reason, as
advised by the Anthropic docs.
Meant to include it in that PR, but accidentally merged it before
pushing these changes 🤦🏻♂️.
Release Notes:
- N/A
https://github.com/zed-industries/zed/issues/30972 brought up another
case where our context is not enough to track the actual source of the
issue: we get a general top-level error without an inner error.
The reason for this was `.ok_or_else(|| anyhow!("failed to read HEAD SHA"))?`
at the top level.
The PR finally reworks the way we use anyhow to reduce such issues (or
at least make it simpler to bubble them up later in a fix).
On top of that, it uses a few more anyhow methods for better readability
(a sketch of these conventions follows the list below):
* `.ok_or_else(|| anyhow!("..."))`, `map_err` and other similar error
conversion/option reporting cases are replaced with `context` and
`with_context` calls
* in addition to that, various `anyhow!("failed to do ...")` messages
are replaced with `.context("Doing ...")` messages to remove the
parasitic `failed to` text
* `anyhow::ensure!` is used instead of `if ... { return Err(...); }`
calls
* `anyhow::bail!` is used instead of `return Err(anyhow!(...));`
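A small sketch of these conventions in practice (the function and its checks are hypothetical; only the anyhow idioms are the point):

```rust
use anyhow::{bail, ensure, Context as _, Result};

// Hypothetical helper, used only so the example is self-contained.
fn read_head_file() -> Option<String> {
    std::fs::read_to_string(".git/HEAD").ok()
}

fn head_sha() -> Result<String> {
    // Instead of `.ok_or_else(|| anyhow!("failed to read HEAD SHA"))?`:
    let contents = read_head_file().context("reading HEAD SHA")?;
    let sha = contents.trim().to_string();
    // Instead of `if sha.is_empty() { return Err(anyhow!(...)); }`:
    ensure!(!sha.is_empty(), "HEAD SHA is empty");
    if sha.len() < 7 {
        // Instead of `return Err(anyhow!(...));`:
        bail!("HEAD SHA `{sha}` is too short");
    }
    Ok(sha)
}
```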
Release Notes:
- N/A
Some providers sometimes send `{ "type": "text", "text": ... }` instead
of just the text as a string. Now we accept those instead of erroring.
Release Notes:
- N/A
1. The `edit_file` tool tended to use `create_or_overwrite` a bit too
often, leading to corruption of long files. This change replaces the
boolean flag with an `EditFileMode` enum (sketched after this list),
which helps the agent make a more deliberate choice when overwriting files.
With this change, the pass rate of the new eval increased from 10% to
100%.
2. eval: Added ability to run eval on top of an existing thread. Threads
can now be loaded from JSON files in the `SerializedThread` format,
which makes it easy to use real threads as starting points for
tests/evals.
3. Don't try to restore tool cards when running in headless or eval mode
-- we don't have a window to properly do this.
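A minimal sketch of the mode enum from point 1 (the variant names follow the description above but may not match the exact definition):

```rust
use serde::Deserialize;

// An explicit mode forces the model to make a deliberate choice, instead of
// flipping an easy-to-misuse `create_or_overwrite` boolean.
#[derive(Debug, Deserialize)]
#[serde(rename_all = "snake_case")]
enum EditFileMode {
    Edit,      // modify parts of an existing file
    Create,    // create a brand-new file
    Overwrite, // deliberately replace the entire contents of an existing file
}
```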
Release Notes:
- N/A
This is very basic support for them. There are a number of other TODOs
before this is really a first-class supported feature, so not adding any
release notes for it; for now, this PR just makes it so that if
`read_file` tries to read a PNG (which has come up in practice), it at
least correctly sends it to Anthropic instead of messing up.
This also lays the groundwork for future PRs for more first-class
support for images in tool calls across more image file formats and LLM
providers.
Release Notes:
- N/A
---------
Co-authored-by: Agus Zubiaga <hi@aguz.me>
Co-authored-by: Agus Zubiaga <agus@zed.dev>
The title of a (text) thread would get stuck in "Loading Summary..."
when the request to generate it failed. We now handle this case by
falling back to the default title, and letting the user manually edit
the title or retry generating it.
https://github.com/user-attachments/assets/898d26ad-d31f-4b62-9b05-519d923b1b22
Release Notes:
- agent: Handle thread title generation errors
---------
Co-authored-by: Richard Feldman <oss@rtfeldman.com>
This reverts commit 3615d6d96c.
Ultimately, we want to restore the ability to store a profile
per-thread, but for now reverting this fixes a fairly disruptive bug.
Release Notes:
- Fixed a bug causing the agent to use the wrong profile in some cases.
This allows us to debug the raw edits that were generated when people
report feedback, when running evals and when opening the thread as
Markdown.
Release Notes:
- Improved debug output for agent threads.
When deciding if a model supports tools or not, we weren't reading from
the configured model in a given thread.
This also stores the profile on the thread, which matches the behavior
of the model and Max Mode, both of which we already store per thread.
Hopefully this helps alleviate some confusion.
Release Notes:
- agent: Save profile selection per-Agent thread
Release Notes:
- Fixed a bug that would prevent the agent from working over SSH.
---------
Co-authored-by: Nathan Sobo <nathan@zed.dev>
Co-authored-by: Richard Feldman <oss@rtfeldman.com>
Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
Co-authored-by: Cole Miller <m@cole-miller.net>
This PR makes it so we send up an `x-zed-version` header with the
client's version when making a request to llm.zed.dev for edit
predictions and completions.
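A minimal sketch using the `http` crate (the path and version value are placeholders; only the `x-zed-version` header name comes from this change):

```rust
use http::Request;

fn completion_request(zed_version: &str, body: String) -> http::Result<Request<String>> {
    Request::builder()
        .method("POST")
        .uri("https://llm.zed.dev/completions") // hypothetical path, for illustration
        .header("x-zed-version", zed_version)   // tells the server which client version is calling
        .body(body)
}
```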
Release Notes:
- N/A
Adds a new `agent.model_parameters` setting that allows the user to
specify a custom temperature for a provider AND/OR model:
```json5
"model_parameters": [
// To set parameters for all requests to OpenAI models:
{
"provider": "openai",
"temperature": 0.5
},
// To set parameters for all requests in general:
{
"temperature": 0
},
// To set parameters for a specific provider and model:
{
"provider": "zed.dev",
"model": "claude-3-7-sonnet-latest",
"temperature": 1.0
}
],
```
Release Notes:
- agent: Allow customizing temperature by provider/model
---------
Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
Co-authored-by: Marshall Bowers <git@maxdeviant.com>
Supersedes: https://github.com/zed-industries/zed/pull/29936
Thanks for your contribution @imumesh18, but we had a slightly different
take on it :)
Release Notes:
- N/A
Co-authored-by: Ben Brandt <benjamin.j.brandt@gmail.com>