This PR makes it so we refresh the list of models whenever the LLM token
is refreshed.
This allows us to add or remove models based on the plan in the new
token.
Release Notes:
- Fixed model list not refreshing when subscribing to Zed Pro.
---------
Co-authored-by: Bennet Bo Fenner <bennetbo@gmx.de>
This introduces a new field `thinking_allowed` on `LanguageModelRequest`
which lets us control whether thinking should be enabled if the model
supports it.
We permit thinking in the Inline Assistant, Edit File tool and the Git
Commit message generator, this should make generation faster when using
a thinking model, e.g. `claude-sonnet-4-thinking`
Release Notes:
- N/A
This PR adds a new `zed-cloud` feature flag that can be used to send
traffic to `cloud.zed.dev` instead of `llm.zed.dev`.
This is just so Zed staff can test the new infrastructure. When we're
ready for prime-time we'll reroute traffic on the server.
Release Notes:
- N/A
Per [GitHub's documentation for VSCode's agent
mode](https://docs.github.com/en/copilot/how-tos/chat/asking-github-copilot-questions-in-your-ide#agent-mode),
a premium request is charged per user-submitted prompt. rather than per
individual request the agent makes to an LLM. This PR matches Zed's
functionality to VSCode's, accurately indicating to GitHub's API whether
a given request is initiated by the user or by an agent, allowing a user
to be metered only for prompts they send.
See also: #31068
Release Notes:
- Improve Copilot premium request tracking
As we are in the process of improving our Onboarding UX for Zed AI, I
added component previews for the Zed AI Configuration section. This
should make it easier to inspect the different states we can run into.
<img width="1198" alt="image"
src="https://github.com/user-attachments/assets/eb774f27-9091-450d-bfae-c688d533c25e"
/>
Release Notes:
- N/A
Closes#26030
Release Notes:
- Fixed Bedrock bug causing streaming responses to return as one big
chunk
---------
Co-authored-by: Peter Tripp <peter@zed.dev>
* Updates to `zed_llm_client-0.8.5` which adds support for `retry_after`
when anthropic provides it.
* Distinguishes upstream provider errors and rate limits from errors
that originate from zed's servers
* Moves `LanguageModelCompletionError::BadInputJson` to
`LanguageModelCompletionEvent::ToolUseJsonParseError`. While arguably
this is an error case, the logic in thread is cleaner with this move.
There is also precedent for inclusion of errors in the event type -
`CompletionRequestStatus::Failed` is how cloud errors arrive.
* Updates `PROVIDER_ID` / `PROVIDER_NAME` constants to use proper types
instead of `&str`, since they can be constructed in a const fashion.
* Removes use of `CLIENT_SUPPORTS_EXA_WEB_SEARCH_PROVIDER_HEADER_NAME`
as the server no longer reads this header and just defaults to that
behavior.
Release notes for this is covered by #33275
Release Notes:
- N/A
---------
Co-authored-by: Richard Feldman <oss@rtfeldman.com>
Co-authored-by: Richard <richard@zed.dev>
Seeing this come up in our server logs when sending requests to
Anthropic: `final assistant content cannot end with trailing
whitespace`.
Release Notes:
- agent: Fixed an issue where Anthropic requests would sometimes fail
because of malformed assistant messages
Context: In this PR: https://github.com/zed-industries/zed/pull/33362,
we started to use underlying open_ai crate for making api calls for
vercel as well. Now whenever we get the error we get something like the
below. Where on part of the error mentions OpenAI but the rest of the
error returns the actual error from provider. This PR tries to make the
error generic for now so that people don't get confused seeing OpenAI in
their v0 integration.
```
Error interacting with language model
Failed to connect to OpenAI API: 403 Forbidden {"success":false,"error":"Premium or Team plan required to access the v0 API: https://v0.dev/chat/settings/billing"}
```
Release Notes:
- N/A
This cleans up our settings to not include any `version` fields, as we
have an actual settings migrator now.
This PR removes `language_models > anthropic > version`,
`language_models > openai > version` and `agent > version`.
We had migration paths in the code for a long time, so in practice
almost everyone should be using the latest version of these settings.
Release Notes:
- Remove `version` fields in settings for `agent`, `language_models >
anthropic`, `language_models > openai`. Your settings will automatically
be migrated. If you're running into issues with this open an issue
[here](https://github.com/zed-industries/zed/issues)
Follow up to previous PRs:
- Return `true` in `supports_images` - v0 supports images already
- Rename model id to match the exact version of the model `v0-1.5-md`
(For now we do not expose `sm`/`lg` variants since they seem not to be
available via the API)
- Provide autocompletion in settings for using `vercel` as a `provider`
Release Notes:
- N/A
Closes#30714
Bedrock converse api expect to see tool options if at least one tool was
used in conversation in the past messages.
Right now if `LanguageModelToolChoice::None` isn't supported edit agent
[remove][1] tools from request. That point breaks Converse API of
Bedrock. As was proposed in [the issue][2] we won't drop tool choose but
instead will deny any of them if model will respond with a tool choose.
[1]:
fceba6c795/crates/assistant_tools/src/edit_agent.rs (L703)
[2]:
https://github.com/zed-industries/zed/issues/30714#issuecomment-2886422716
Release Notes:
- Fixed bedrock tool calls in edit mode
cc @osyvokon
We were seeing a bunch of errors in our backend when people were using
Claude models with thinking enabled.
In the logs we would see
> an error occurred while interacting with the Anthropic API:
invalid_request_error: messages.x.content.0.type: Expected `thinking` or
`redacted_thinking`, but found `text`. When `thinking` is enabled, a
final `assistant` message must start with a thinking block (preceeding
the lastmost set of `tool_use` and `tool_result` blocks). We recommend
you include thinking blocks from previous turns. To avoid this
requirement, disable `thinking`. Please consult our documentation at
https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking
However, this issue did not occur frequently and was not easily
reproducible. Turns out it was triggered by us not correctly handling
[Redacted Thinking
Blocks](https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking#thinking-redaction).
I could constantly reproduce this issue by including this magic string:
`ANTHROPIC_MAGIC_STRING_TRIGGER_REDACTED_THINKING_46C9A13E193C177646C7398A98432ECCCE4C1253D5E2D82641AC0E52CC2876CB
` in the request, which forces `claude-3-7-sonnet` to emit redacted
thinking blocks (confusingly the magic string does not seem to be
working for `claude-sonnet-4`). As soon as we hit a tool call Anthropic
would return an error.
Thanks to @osyvokon for pointing me in the right direction 😄!
Release Notes:
- agent: Fixed an issue where Anthropic models would sometimes return an
error when thinking was enabled
Vercel v0 is an OpenAI-compatible model, so this is mostly a dupe of the
OpenAI provider files with some adaptations for v0, including going
ahead and using the custom endpoint for the API URL field.
Release Notes:
- Added support for Vercel as a language model provider.
This PR is in preparation for doing automatic retries for certain
errors, e.g. Overloaded. It doesn't change behavior yet (aside from some
granularity of error messages shown to the user), but rather mostly
changes some error handling to be exhaustive enum matches instead of
`anyhow` downcasts, and leaves some comments for where the behavior
change will be in a future PR.
Release Notes:
- N/A
Did some bit cleanup of code for loading models for settings as that is
not required as we are fetching all the models from openrouter so it's
better to maintain one source of truth
Release Notes:
- Add thinking support to OpenRouter provider
Having `Thread::last_usage` as an override of the initially fetched
usage could cause the initial usage to be displayed when the current
thread is empty or in text threads. Fix is to just store last usage info
in `UserStore` and not have these overrides
Release Notes:
- Agent: Fixed request usage display to always include the most recently
known usage - there were some cases where it would show the initially
requested usage.
Previously, the OpenRouter models list (~412kb) was being downloaded
around 10 times during startup -- even when OpenRouter was not
configured.
This update addresses the issue by:
1. Fetching the models list only when OpenRouter settings change.
2. Skipping API calls if OpenRouter is not configured.
Release Notes:
- Avoid unnecessary requests to OpenRouter
Ran into this while adding support for Vercel v0s models:
- The timestamp seems to be returned in Milliseconds instead of seconds
so it breaks the bounds of `created: u32`. We did not use this field
anywhere so just decided to remove it
- Sometimes the `choices` field can be empty when the last chunk comes
in because it only contains `usage`
Release Notes:
- N/A
The `api_url` setting is one that most providers already support and can
be changed via the `settings.json`. We're adding the ability to change
it via the UI for OpenAI specifically so it can be more easily connected
to v0.
Release Notes:
- agent: Added ability to change the API base URL for OpenAI via the UI
---------
Co-authored-by: Bennet Bo Fenner <53836821+bennetbo@users.noreply.github.com>
Related to #32888, but will not fix the issue.
Turns out these assertions are wrong (Not sure if they were correct at
some point).
I tested with this code:
```
request = LanguageModelRequest {
messages: vec![
LanguageModelRequestMessage {
role: Role::User,
content: vec![MessageContent::Text("Give me 10 jokes".to_string())],
cache: false,
},
LanguageModelRequestMessage {
role: Role::Assistant,
content: vec![MessageContent::Text("Sure, here are 10 jokes:".to_string())],
cache: false,
},
],
..request
};
```
The API happily accepted this and Claude proceeded to tell me 10 jokes.
Release Notes:
- N/A
Updates google_ai to use latest model information from the respective
model cards: https://ai.google.dev/gemini-api/docs/models
Release Notes:
- google: Update to latest Gemini 2.5 models
Previously we were using a mix of `u32` and `usize`, e.g. `max_tokens:
usize, max_output_tokens: Option<u32>` in the same `struct`.
Although [tiktoken](https://github.com/openai/tiktoken) uses `usize`,
token counts should be consistent across targets (e.g. the same model
doesn't suddenly get a smaller context window if you're compiling for
wasm32), and these token counts could end up getting serialized using a
binary protocol, so `usize` is not the right choice for token counts.
I chose to standardize on `u64` over `u32` because we don't store many
of them (so the extra size should be insignificant) and future models
may exceed `u32::MAX` tokens.
Release Notes:
- N/A
We push the usage data whenever we receive it from the provider to make
sure the counting is correct after the turn has ended.
- [x] Ollama
- [x] Copilot
- [x] Mistral
- [x] OpenRouter
- [x] LMStudio
Put all the changes into a single PR open to move these to separate PR
if that makes the review and testing easier.
Release Notes:
- N/A
This addresses:
https://github.com/zed-industries/zed/pull/32248#issuecomment-2952060834.
This PR address two main things one allowing enterprise users to use
copilot chat and completion while also introducing the new way to handle
copilot url specific their subscription. Simplifying the UX around the
github copilot and removes the burden of users figuring out what url to
use for their subscription.
- [x] Pass enterprise_uri to copilot lsp so that it can redirect users
to their enterprise server. Ref:
https://github.com/github/copilot-language-server-release#configuration-management
- [x] Remove the old ui and config language_models.copilot which allowed
users to specify their copilot_chat specific endpoint. We now derive
that automatically using token endpoint for copilot so that we can send
the requests to specific copilot endpoint for depending upon the url
returned by copilot server.
- [x] Tested this for checking the both enterprise and non-enterprise
flow work. Thanks to @theherk for the help to debug and test it.
- [ ] Udpdate the zed.dev/docs to refelect how to setup enterprise
copilot.
What this doesn't do at the moment:
* Currently zed doesn't allow to have two seperate accounts as the token
used in chat is same as the one generated by lsp. After this changes
also this behaviour remains same and users can't have both enterprise
and personal copilot installed.
P.S: Might need to do some bit of code cleanup and other things but
overall I felt this PR was ready for atleast first pass of review to
gather feedback around the implementation and code itself.
Release Notes:
- Add enterprise support for GitHub copilot
---------
Signed-off-by: Umesh Yadav <git@umesh.dev>
- [x] Manual Testing(Tested this with Qwen2.5 VL 32B Instruct (free) and
Llama 4 Scout (free), Llama 4 Maverick (free). Llama models have some
issues in write profile due to one of the in built tools schema, so I
tested it with minimal profile.
Closes #ISSUE
Release Notes:
- Add image support to OpenRouter models
---------
Signed-off-by: Umesh Yadav <umesh4257@gmail.com>
Co-authored-by: Ben Brandt <benjamin.j.brandt@gmail.com>
Bubbles up rate limit information so that we can retry after a certain
duration if needed higher up in the stack.
Also caps the number of concurrent evals running at once to also help.
Release Notes:
- N/A
Add support for environment variables as authentication alternatives to
OAuth flow for Copilot. Closes#31172
We can include the token in HTTPS request headers to hopefully resolve
the rate limiting issue in #9483. This change will be part of a separate
PR.
Release Notes:
- Added support for manually providing an OAuth token for GitHub Copilot
Chat by assigning the GH_COPILOT_TOKEN environment variable
---------
Co-authored-by: Bennet Bo Fenner <bennetbo@gmx.de>
Tested with following models. Hallucinates with whites outline images
like white lined zed logo but works fine with zed black outlined logo:
Pixtral 12B (pixtral-12b-latest)
Pixtral Large (pixtral-large-latest)
Mistral Medium (mistral-medium-latest)
Mistral Small (mistral-small-latest)
After this PR, almost all of the zed's llm provider who support images
are now supported. Only remaining one is LMStudio. Hopefully we will get
that one as well soon.
Release Notes:
- Add support for images to mistral models
---------
Signed-off-by: Umesh Yadav <git@umesh.dev>
Co-authored-by: Bennet Bo Fenner <bennetbo@gmx.de>
Co-authored-by: Bennet Bo Fenner <bennet@zed.dev>
It works similar to how deepseek works where the thinking is returned as
reasoning_content and we don't have to send the reasoning_content back
in the request.
This is a experiment feature which can be enabled from settings like
this:
<img width="1381" alt="Screenshot 2025-06-08 at 4 26 06 PM"
src="https://github.com/user-attachments/assets/d2f60f3c-0f93-45fc-bae2-4ded42981820"
/>
Here is how it looks to use(tested with
`deepseek/deepseek-r1-0528-qwen3-8b`
<img width="528" alt="Screenshot 2025-06-08 at 5 12 33 PM"
src="https://github.com/user-attachments/assets/f7716f52-5417-4f14-82b8-e853de054f63"
/>
Release Notes:
- Add thinking support to LM Studio provider
For DeepSeek provider thinking is returned as reasoning_content and we
don't have to send the reasoning_content back in the request.
Release Notes:
- Add thinking support to DeepSeek provider
Recently in this PR: https://github.com/zed-industries/zed/pull/32248
github copilot settings was introduced. This had missing settings update
which was leading to github copilot models not getting fetched. This had
missing subscription to update the settings inside the copilot language
model provider. Which caused it not show models at all.
cc @osiewicz
Release Notes:
- N/A
---------
Co-authored-by: Piotr Osiewicz <24362066+osiewicz@users.noreply.github.com>
The info for `max_tokens` for the model is included in
`{api_url}/models`
I don't think this needs to be `.clamp` like in
`crates/ollama/src/ollama.rs` `get_max_tokens`, but it might need to be
## Before:
Every model shows 2k

## After:

### Json from `{api_url}/models` with model not loaded
```json
{
"id": "qwen2.5-coder-1.5b-instruct-mlx",
"object": "model",
"type": "llm",
"publisher": "lmstudio-community",
"arch": "qwen2",
"compatibility_type": "mlx",
"quantization": "4bit",
"state": "not-loaded",
"max_context_length": 32768
},
```
## Notes
The response from `{api_url}/models` seems to return the `max_tokens`
for the model, not the currently configured context length, but I think
showing the `max_tokens` for the model is better than setting 2k for
everything
`loaded_context_length` exists, but only if the model is loaded at the
startup of zed, which usually isn't the case
maybe `fetch_models` should be rerun when swapping lmstudio models
### Currently configured context
this isn't shown in `{api_url}/models`

### Json from `{api_url}/models` with model loaded
```json
{
"id": "qwen2.5-coder-1.5b-instruct-mlx",
"object": "model",
"type": "llm",
"publisher": "lmstudio-community",
"arch": "qwen2",
"compatibility_type": "mlx",
"quantization": "4bit",
"state": "loaded",
"max_context_length": 32768,
"loaded_context_length": 4096
},
```
Release Notes:
- lmstudio: Fixed showing `max_tokens` in the assistant panel
---------
Co-authored-by: Peter Tripp <peter@zed.dev>