This PR updates the Agent panel to work with the `CloudUserStore`
instead of the `UserStore`, reducing its reliance on being connected to
Collab to function.
Release Notes:
- N/A
---------
Co-authored-by: Richard Feldman <oss@rtfeldman.com>
This includes making sure that both the agent panel and Zed's edit
prediction present a consistent narrative when onboarding users into
the AI features, accounting for the different possible plans and
conditions (such as being signed in/out, account age, etc.).
Release Notes:
- N/A
---------
Co-authored-by: Bennet Bo Fenner <53836821+bennetbo@users.noreply.github.com>
Co-authored-by: Bennet Bo Fenner <bennetbo@gmx.de>
This PR makes it so all LLM traffic is routed through `cloud.zed.dev`.
We're already routing `llm.zed.dev` to `cloud.zed.dev` on the server,
but we want to standardize on `cloud.zed.dev` moving forward.
Release Notes:
- N/A
This PR makes it so we refresh the list of models whenever the LLM token
is refreshed.
This allows us to add or remove models based on the plan in the new
token.
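A minimal sketch of the idea, assuming a callback-style token-refresh hook (names here are hypothetical; the real code presumably wires this through GPUI subscriptions):

```rust
struct ModelInfo {
    id: String,
}

struct ZedLanguageModelProvider {
    models: Vec<ModelInfo>,
}

impl ZedLanguageModelProvider {
    // Called whenever a fresh LLM token is obtained.
    fn on_token_refreshed(&mut self, token: &str) {
        // The token encodes the user's plan, so the set of available
        // models may have changed; refetch rather than trusting the cache.
        self.models = fetch_models(token);
    }
}

fn fetch_models(_token: &str) -> Vec<ModelInfo> {
    Vec::new() // stand-in for the HTTP request to the LLM service
}
```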
Release Notes:
- Fixed model list not refreshing when subscribing to Zed Pro.
---------
Co-authored-by: Bennet Bo Fenner <bennetbo@gmx.de>
This introduces a new field, `thinking_allowed`, on `LanguageModelRequest`
which lets us control whether thinking should be enabled if the model
supports it.
We disable thinking in the Inline Assistant, the Edit File tool, and the
Git Commit message generator; this should make generation faster when
using a thinking model, e.g. `claude-sonnet-4-thinking`.
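For concreteness, a minimal sketch of the new field (the surrounding struct fields are elided; the doc comment is ours):

```rust
pub struct LanguageModelRequest {
    // ...existing fields elided...
    /// When false, ask the provider not to use extended "thinking",
    /// even if the selected model supports it.
    pub thinking_allowed: bool,
}
```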
Release Notes:
- N/A
This PR adds a new `zed-cloud` feature flag that can be used to send
traffic to `cloud.zed.dev` instead of `llm.zed.dev`.
This is just so Zed staff can test the new infrastructure. When we're
ready for prime time we'll reroute traffic on the server.
Release Notes:
- N/A
As we are in the process of improving our Onboarding UX for Zed AI, I
added component previews for the Zed AI Configuration section. This
should make it easier to inspect the different states we can run into.
<img width="1198" alt="image"
src="https://github.com/user-attachments/assets/eb774f27-9091-450d-bfae-c688d533c25e"
/>
Release Notes:
- N/A
* Updates to `zed_llm_client-0.8.5`, which adds support for `retry_after`
when Anthropic provides it.
* Distinguishes upstream provider errors and rate limits from errors
that originate from Zed's servers.
* Moves `LanguageModelCompletionError::BadInputJson` to
`LanguageModelCompletionEvent::ToolUseJsonParseError` (see the sketch
after this list). While arguably this is an error case, the logic in
`Thread` is cleaner with this move. There is also precedent for
including errors in the event type -
`CompletionRequestStatus::Failed` is how cloud errors arrive.
* Updates `PROVIDER_ID` / `PROVIDER_NAME` constants to use proper types
instead of `&str`, since they can be constructed in a const fashion.
* Removes use of `CLIENT_SUPPORTS_EXA_WEB_SEARCH_PROVIDER_HEADER_NAME`
as the server no longer reads this header and just defaults to that
behavior.
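A rough sketch of the variant move (field names are illustrative, not the exact shape in the crate):

```rust
// Before: a dedicated error variant aborted handling of the stream.
pub enum LanguageModelCompletionError {
    // BadInputJson { .. } -- removed
}

// After: the parse failure arrives in-band with other stream events.
pub enum LanguageModelCompletionEvent {
    ToolUseJsonParseError {
        tool_name: String,
        raw_input: String,
        error: String,
    },
    // ...other events, including cloud failure statuses...
}
```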
Release notes for this are covered by #33275
Release Notes:
- N/A
---------
Co-authored-by: Richard Feldman <oss@rtfeldman.com>
Co-authored-by: Richard <richard@zed.dev>
Having `Thread::last_usage` as an override of the initially fetched
usage could cause the initial usage to be displayed when the current
thread is empty or in text threads. The fix is to just store the last
usage info in `UserStore` and not have these overrides.
Release Notes:
- agent: Fixed request usage display to always show the most recently
known usage - there were some cases where it would show the initially
fetched usage instead.
Previously we were using a mix of `u32` and `usize`, e.g. `max_tokens:
usize, max_output_tokens: Option<u32>` in the same `struct`.
Although [tiktoken](https://github.com/openai/tiktoken) uses `usize`,
token counts should be consistent across targets (e.g. the same model
doesn't suddenly get a smaller context window if you're compiling for
wasm32), and these token counts could end up getting serialized using a
binary protocol, so `usize` is not the right choice for token counts.
I chose to standardize on `u64` over `u32` because we don't store many
of them (so the extra size should be insignificant) and future models
may exceed `u32::MAX` tokens.
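A sketch of the resulting shape, with illustrative names:

```rust
pub struct ModelTokenLimits {
    /// Context window size, in tokens; identical on every compile target.
    pub max_tokens: u64,
    /// Output cap, now the same width as `max_tokens`.
    pub max_output_tokens: Option<u64>,
}
```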
Release Notes:
- N/A
Bubbles up rate limit information so that, higher up the stack, we can
retry after the indicated duration when needed.
Also caps the number of concurrent evals running at once, which further
reduces rate limiting.
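A small sketch of what "retry after the indicated duration" can look like at a call site (the error shape is hypothetical):

```rust
use std::time::Duration;

// Hypothetical error type carrying the provider's retry-after hint.
enum EvalError {
    RateLimited { retry_after: Duration },
    Other(String),
}

fn run_with_retry(mut attempt: impl FnMut() -> Result<(), EvalError>) {
    loop {
        match attempt() {
            Ok(()) => return,
            Err(EvalError::RateLimited { retry_after }) => {
                // Back off for exactly as long as the provider asked.
                std::thread::sleep(retry_after);
            }
            Err(EvalError::Other(message)) => {
                eprintln!("eval failed: {message}");
                return;
            }
        }
    }
}
```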
Release Notes:
- N/A
Closes #31243
As described in my issue, the [thinking
budget](https://ai.google.dev/gemini-api/docs/thinking) gets chosen
automatically by Gemini unless it is explicitly set. In order to get
fast responses (e.g. for the inline assistant), I prefer to set it to 0.
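A sketch of what the configurable mode might look like on the Rust side (the exact shape of Zed's settings may differ):

```rust
// Per-model thinking configuration for custom Google models.
pub enum GoogleModelMode {
    /// Let Gemini pick a thinking budget automatically.
    Default,
    /// Explicit budget; `Some(0)` disables thinking for fast responses.
    Thinking { budget_tokens: Option<u32> },
}
```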
Release Notes:
- ai: Added `thinking` mode for custom Google models with configurable
token budget
---------
Co-authored-by: Ben Brandt <benjamin.j.brandt@gmail.com>
This PR adds a new `intent` field to completion requests to assist in
categorizing them correctly.
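An illustrative sketch of the field; the real variant set lives in `zed_llm_client` and is richer than this:

```rust
// Hypothetical subset of intents; illustrative names only.
pub enum CompletionIntent {
    UserPrompt,
    InlineAssist,
    ThreadSummarization,
    // ...
}

pub struct CompletionRequest {
    // ...existing fields elided...
    pub intent: Option<CompletionIntent>,
}
```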
Release Notes:
- N/A
---------
Co-authored-by: Ben Brandt <benjamin.j.brandt@gmail.com>
This PR updates the Zed LLM provider to fetch the available models from
the server instead of hard-coding them in the binary.
Release Notes:
- Updated the Zed provider to fetch the list of available language
models from the server.
This PR updates the default/recommended models for the Anthropic and Zed
providers to be Claude Sonnet 4.
Release Notes:
- Updated default/recommended Anthropic models to Claude Sonnet 4.
This PR adds support for [Claude
4](https://www.anthropic.com/news/claude-4).
Release Notes:
- Added support for Claude Opus 4 and Claude Sonnet 4.
---------
Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Richard Feldman <oss@rtfeldman.com>
https://github.com/zed-industries/zed/issues/30972 brought up another
case where our context is not enough to track down the actual source of
the issue: we get a general top-level error without an inner error.
The reason for this was `.ok_or_else(|| anyhow!("failed to read HEAD
SHA"))?;` at the top level.
The PR finally reworks the way we use anyhow to reduce such issues (or
at least make them simpler to bubble up later in a fix).
On top of that, it uses a few more anyhow methods for better
readability; a before/after sketch follows the list:
* `.ok_or_else(|| anyhow!("..."))`, `map_err`, and other similar error
conversion/option reporting cases are replaced with `context` and
`with_context` calls
* various `anyhow!("failed to do ...")` messages are replaced with
`.context("Doing ...")` messages to remove the parasitic `failed to`
text
* `anyhow::ensure!` is used instead of `if ... { return Err(...); }`
* `anyhow::bail!` is used instead of `return Err(anyhow!(...));`
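A before/after sketch of the pattern, using a hypothetical `Repo` type:

```rust
use anyhow::{ensure, Context as _, Result};

struct Repo; // hypothetical stand-in for the real repository type

impl Repo {
    fn head_sha(&self) -> Option<String> {
        Some("abc123".into())
    }
}

fn read_head_sha(repo: &Repo) -> Result<String> {
    // Before: repo.head_sha().ok_or_else(|| anyhow!("failed to read HEAD SHA"))?
    // After: `context` keeps the call site in the error chain without
    // the parasitic "failed to" prefix.
    let sha = repo.head_sha().context("reading HEAD SHA")?;
    ensure!(!sha.is_empty(), "HEAD SHA is empty");
    Ok(sha)
}
```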
Release Notes:
- N/A
This is very basic support for images. There are a number of other
TODOs before this is really a first-class supported feature, so we're
not adding any release notes for it; for now, this PR just makes it so
that if read_file tries to read a PNG (which has come up in practice),
it at least correctly sends it to Anthropic instead of mishandling it.
This also lays the groundwork for future PRs that add more first-class
support for images in tool calls, across more image file formats and
LLM providers.
Release Notes:
- N/A
---------
Co-authored-by: Agus Zubiaga <hi@aguz.me>
Co-authored-by: Agus Zubiaga <agus@zed.dev>
This PR removes the individual URL overrides for the LLM service.
We initially had `ZED_PREDICT_EDITS_URL` to allow directing traffic to
the LLM Worker back when the LLM functionality was still split between
the Collab-based LLM Service and the Cloudflare-based LLM Worker.
But now that all of the LLM functionality has moved into the Worker, we
can just direct all traffic there.
Release Notes:
- N/A
This PR updates the copy around the Zed Pro description to be more
accurate.
Release Notes:
- agent: Updated some copy about Zed Pro in the configuration view.
This PR makes it so we send up an `x-zed-version` header with the
client's version when making a request to llm.zed.dev for edit
predictions and completions.
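A sketch of attaching the header, using the `http` crate's request builder (Zed's actual client plumbing differs):

```rust
use http::Request;

fn build_request(body: Vec<u8>) -> Result<Request<Vec<u8>>, http::Error> {
    Request::builder()
        .uri("https://llm.zed.dev/completions")
        // Tell the server which client version is calling.
        .header("x-zed-version", env!("CARGO_PKG_VERSION"))
        .body(body)
}
```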
Release Notes:
- N/A
* `CountTokensRequest` now takes a full `GenerateContentRequest` instead
of just the content (see the sketch after this list).
* Fixes use of the `models/` prefix in the `model` field of
`GenerateContentRequest`, since it's required for use in
`CountTokensRequest`. This didn't cause issues before because the field
was always cleared and used in the path.
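A sketch of the new request shape, with approximate field names:

```rust
// Approximate shape; the real types live in the google_ai crate.
struct CountTokensRequest {
    // Previously just the content; now the full request, so token
    // counting sees system instructions, tools, etc.
    generate_content_request: GenerateContentRequest,
}

struct GenerateContentRequest {
    // Must be fully qualified here, e.g. "models/gemini-1.5-pro";
    // for generation the prefix was stripped and used in the URL path.
    model: String,
}
```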
Release Notes:
- N/A
This PR adds a notice when reaching the consecutive tool use limit in
normal mode.
Here's an example with the limit artificially lowered to 2 consecutive
tool uses:
https://github.com/user-attachments/assets/32da8d38-67de-4d6b-8f24-754d2518e5d4
Release Notes:
- agent: Added a notice when reaching consecutive tool use limits when
using a model in normal mode.
This sets us up to display queue position information to the user, once
our language model backend is updated to support request queuing.
The JSON returned by the LLM backend will need to look like this:
```json
{"queue": {"status": "queued", "position": 1}}
{"queue": {"status": "started"}}
{"event": {"THE_UPSTREAM_MODEL_PROVIDER_EVENT": "..."}}
```
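A hypothetical serde shape for deserializing these messages on the client, to make the framing concrete:

```rust
use serde::Deserialize;

#[derive(Deserialize)]
#[serde(rename_all = "snake_case")]
enum StreamMessage {
    Queue(QueueEvent),
    // The upstream provider's event, passed through untouched, so a
    // catch-all JSON value is reasonable here.
    Event(serde_json::Value),
}

#[derive(Deserialize)]
#[serde(tag = "status", rename_all = "snake_case")]
enum QueueEvent {
    Queued { position: usize },
    Started,
}
```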
Release Notes:
- N/A
---------
Co-authored-by: Marshall Bowers <git@maxdeviant.com>
This PR changes the default fast model for the Zed provider from Claude
3.5 Haiku to Claude 3.5 Sonnet.
We don't offer Claude 3.5 Haiku to users.
Closes https://github.com/zed-industries/zed/issues/29505.
Release Notes:
- agent: Changed the default fast model for the Zed provider to Claude
3.5 Sonnet.
This PR makes it so we pass up the `mode` from the
`LanguageModelRequest` when interacting with the Zed provider instead of
passing a hard-coded value.
Release Notes:
- N/A
This PR adds the `FeatureFlag` suffix to the feature flag types that
were missing them.
This makes the names easier to search in the codebase.
Release Notes:
- N/A
This PR updates the Zed provider to use the `POST /completions`
endpoint.
There is no functional difference from `POST /completion`, but the
pluralized version reads better.
Release Notes:
- N/A
This PR wires the counting of Google AI tokens back up.
It now goes through the LLM service instead of collab's RPC.
This is still only available to Zed staff.
Release Notes:
- N/A
This PR removes the `CountLanguageModelTokens` RPC message from collab.
We were only using this for Google AI models through the Zed provider
(which is only available to Zed staff).
For now we're returning `0`, but we'll bring it back soon.
Release Notes:
- N/A