Previously we were using a mix of `u32` and `usize`, e.g. `max_tokens:
usize, max_output_tokens: Option<u32>` in the same `struct`.
Although [tiktoken](https://github.com/openai/tiktoken) uses `usize`,
token counts should be consistent across targets (e.g. the same model
doesn't suddenly get a smaller context window if you're compiling for
wasm32), and these token counts could end up getting serialized using a
binary protocol, so `usize` is not the right choice for token counts.
I chose to standardize on `u64` over `u32` because we don't store many
of them (so the extra size should be insignificant) and future models
may exceed `u32::MAX` tokens.
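A minimal before/after sketch of the change (the field names come from the example above; the surrounding structs are illustrative, not Zed's actual types):

```rust
// Illustrative only: field names are taken from the example above,
// the structs themselves are hypothetical.
struct LimitsBefore {
    max_tokens: usize,              // width depends on the compilation target
    max_output_tokens: Option<u32>, // could overflow for future models
}

struct LimitsAfter {
    max_tokens: u64,
    max_output_tokens: Option<u64>,
}
```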
Release Notes:
- N/A
We push the usage data whenever we receive it from the provider so that the
token counts remain correct after the turn has ended (a minimal sketch of the
idea follows the provider list below).
- [x] Ollama
- [x] Copilot
- [x] Mistral
- [x] OpenRouter
- [x] LMStudio
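A minimal sketch of the idea, with hypothetical event and field names (each provider's actual event shape differs):

```rust
// Hypothetical types for illustration; not the actual provider types.
struct TokenUsage {
    input_tokens: u64,
    output_tokens: u64,
}

enum StreamEvent {
    Text(String),
    Usage(TokenUsage),
}

fn handle_event(event: StreamEvent, total: &mut TokenUsage) {
    match event {
        StreamEvent::Text(_) => { /* append to the streamed response */ }
        // Push the usage as soon as the provider reports it, so the counts
        // are correct once the turn has ended.
        StreamEvent::Usage(usage) => {
            total.input_tokens = usage.input_tokens;
            total.output_tokens = usage.output_tokens;
        }
    }
}
```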
I put all the changes into a single PR, but I'm open to moving them into
separate PRs if that makes review and testing easier.
Release Notes:
- N/A
Removes the load_model trait method and its implementations in Ollama
and LM Studio providers, along with associated preload_model functions
and unused imports.
Release Notes:
- N/A
This PR updates how we handle Ollama responses, leveraging the new
[v0.9.0](https://github.com/ollama/ollama/releases/tag/v0.9.0) release.
Previously, thinking text was embedded within the model's main content,
leading to it appearing directly in the agent's response. Now, thinking
content is provided as a separate parameter, allowing us to display it
correctly within the agent panel, similar to other providers. I have
tested this with qwen3:8b and it works nicely. ~~We can release this once
the Ollama release is stable.~~ It's now released as stable.
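A sketch of the relevant part of the response, assuming serde-based deserialization (Zed's actual types may differ):

```rust
use serde::Deserialize;

// Shape of the message object in Ollama's /api/chat response as of v0.9.0;
// the struct name and derive setup here are illustrative.
#[derive(Deserialize)]
struct ChatMessage {
    role: String,
    content: String,
    // New in v0.9.0: thinking text arrives in its own field instead of
    // being embedded in `content`, so it can be rendered separately.
    #[serde(default)]
    thinking: Option<String>,
}
```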
<img width="433" alt="image"
src="https://github.com/user-attachments/assets/2983ef06-6679-4033-82c2-231ea9cd6434"
/>
Release Notes:
- Add thinking support for ollama
---------
Co-authored-by: Bennet Bo Fenner <bennetbo@gmx.de>
Ollama increased their default context size from 2048 to 4096 tokens in
version v0.6.7, which was released over a month ago.
https://github.com/ollama/ollama/releases/tag/v0.6.7
Release Notes:
- ollama: Update default model context to 4096 (matching upstream)
Mistral just released a state-of-the-art coding model:
https://mistral.ai/news/devstral
This PR adds support for it in both the Ollama and Mistral providers.
Release Notes:
- Add DevstralSmallLatest model to Mistral and Ollama
https://github.com/zed-industries/zed/issues/30972 brought up another
case where our context is not enough to track the actual source of the
issue: we get a general top-level error without an inner error.
The reason for this was `.ok_or_else(|| anyhow!("failed to read HEAD
SHA"))?;` at the top level.
The PR finally reworks the way we use anyhow to reduce such issues (or
at least make it simpler to bubble them up later in a fix).
On top of that, it uses a few more anyhow methods for better readability
(see the sketch after this list):
* `.ok_or_else(|| anyhow!("..."))`, `map_err` and other similar error
conversion/option reporting cases are replaced with `context` and
`with_context` calls
* in addition to that, various `anyhow!("failed to do ...")` messages are
replaced with `.context("Doing ...")` messages to remove the parasitic
`failed to` text
* `anyhow::ensure!` is used instead of `if ... { return Err(...); }`
calls
* `anyhow::bail!` is used instead of `return Err(anyhow!(...));`
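A small before/after sketch of these patterns (the error string is the one quoted above; the surrounding functions are made up for illustration):

```rust
use anyhow::{anyhow, bail, ensure, Context as _, Result};

// Before: generic top-level errors with no inner context.
fn head_sha_before(head: Option<String>) -> Result<String> {
    let sha = head.ok_or_else(|| anyhow!("failed to read HEAD SHA"))?;
    if sha.is_empty() {
        return Err(anyhow!("HEAD SHA is empty"));
    }
    Ok(sha)
}

// After: `context`, `ensure!` and `bail!` keep the call sites short and
// preserve inner errors where they exist.
fn head_sha_after(head: Option<String>) -> Result<String> {
    let sha = head.context("reading HEAD SHA")?;
    ensure!(!sha.is_empty(), "HEAD SHA is empty");
    if !sha.is_ascii() {
        bail!("HEAD SHA contains non-ASCII characters: {sha}");
    }
    Ok(sha)
}
```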
Release Notes:
- N/A
The goal of this PR is to support tool calls using ollama. A lot of the
serialization work was done in
https://github.com/zed-industries/zed/pull/15803; however, the abstraction
over language models always disables tools.
## Changelog:
- Use `serde_json::Value` for the tool-call arguments inside
`OllamaFunctionCall`. This fixes deserialization of Ollama tool calls (see
the sketch after this list).
- Added deserialization tests using JSON from the official Ollama API docs.
- Fetch model capabilities during model enumeration from ollama provider
- Added `supports_tools` setting to manually configure if a model
supports tools
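A sketch of the deserialization side (only `OllamaFunctionCall` is named in this PR; the surrounding shape follows Ollama's chat API docs and may not match the crate exactly):

```rust
use serde::Deserialize;
use serde_json::Value;

#[derive(Deserialize)]
struct OllamaFunctionCall {
    name: String,
    // `serde_json::Value` accepts arbitrary JSON arguments without
    // committing to a concrete schema.
    arguments: Value,
}

#[derive(Deserialize)]
struct OllamaToolCall {
    function: OllamaFunctionCall,
}

#[test]
fn parses_tool_call_from_api_docs_style_json() {
    let json = r#"{"function": {"name": "get_current_weather",
                   "arguments": {"location": "Paris", "format": "celsius"}}}"#;
    let call: OllamaToolCall = serde_json::from_str(json).unwrap();
    assert_eq!(call.function.name, "get_current_weather");
}
```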
## TODO:
- [x] Fix tool call serialization/deserialization
- [x] Fetch model capabilities from ollama api
- [x] Add tests for parsing model capabilities
- [ ] Documentation for `supports_tools` field for ollama language model
config
- [ ] Convert between generic language model types
- [x] Pass tools to ollama
Release Notes:
- N/A
---------
Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Nathan Sobo <nathan@zed.dev>
If you have the VRAM you can increase the context by adding this to your
settings.json:
```json
"language_models": {
"ollama": {
"available_models": [
{ "max_tokens": 65536, "name": "qwen3", "display_name": "Qwen3-64k" }
]
}
},
```
Release Notes:
- ollama: Add support for Qwen3. Defaults to 16K token context. See:
[Assistant Configuration
Docs](https://zed.dev/docs/assistant/configuration#ollama-context) to
increase.
The `ollama` crate has a `use schemars::JsonSchema` statement even when
building with default features, which don't include the `schemars` crate.
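One way to make the import optional (a sketch; the actual fix may simply remove or gate the import differently):

```rust
// Only pull in schemars when the feature is enabled.
#[cfg(feature = "schemars")]
use schemars::JsonSchema;

// Hypothetical type for illustration.
#[cfg_attr(feature = "schemars", derive(JsonSchema))]
#[derive(serde::Serialize, serde::Deserialize)]
struct ExampleRequest {
    model: String,
    prompt: String,
}
```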
Release Notes:
- N/A
This adds a "workspace-hack" crate, see
[mozilla's](https://hg.mozilla.org/mozilla-central/file/3a265fdc9f33e5946f0ca0a04af73acd7e6d1a39/build/workspace-hack/Cargo.toml#l7)
for a concise explanation of why this is useful. For us in practice this
means that if I were to run all the tests (`cargo nextest r
--workspace`) and then `cargo r`, all the deps from the previous cargo
command will be reused. Before this PR it would rebuild many deps due to
resolving different sets of features for them. For me this frequently
caused long rebuilds when things "should" already be cached.
To avoid manually maintaining our workspace-hack crate, we will use
[cargo hakari](https://docs.rs/cargo-hakari) to update the build files
when there's a necessary change. I've added a step to CI that checks
whether the workspace-hack crate is up to date, and instructs you to
re-run `script/update-workspace-hack` when it fails.
Finally, to make sure that people can still depend on crates in our
workspace without pulling in all the workspace deps, we use a `[patch]`
section following [hakari's
instructions](https://docs.rs/cargo-hakari/0.9.36/cargo_hakari/patch_directive/index.html).
One possible followup task would be making guppy use our
`rust-toolchain.toml` instead of having to duplicate that list in its
config; I opened an issue for that upstream: guppy-rs/guppy#481.
TODO:
- [x] Fix the extension test failure
- [x] Ensure the dev dependencies aren't being unified by Hakari into
the main dependencies
- [x] Ensure that the remote-server binary continues to not depend on
LibSSL
Release Notes:
- N/A
---------
Co-authored-by: Mikayla <mikayla@zed.dev>
Co-authored-by: Mikayla Maki <mikayla.c.maki@gmail.com>
There's still a bit more work to do on this, but this PR is compiling
(with warnings) after eliminating the key types. When the tasks below
are complete, this will be the new narrative for GPUI:
- `Entity<T>` - This replaces `View<T>`/`Model<T>`. It represents a unit
of state, and if `T` implements `Render`, then `Entity<T>` implements
`Element`.
- `&mut App` - This replaces `AppContext` and represents the app.
- `&mut Context<T>` - This replaces `ModelContext` and derefs to `App`. It
is provided by the framework when updating an entity.
- `&mut Window` - Broken out of `&mut WindowContext`, which no longer
exists. Every method that once took `&mut WindowContext` now takes `&mut
Window, &mut App`, and every method that took `&mut ViewContext<T>` now
takes `&mut Window, &mut Context<T>` (see the sketch after this list).
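For example, a `Render` implementation under the new narrative looks roughly like this (a sketch assuming the post-refactor gpui API described above):

```rust
use gpui::{div, Context, IntoElement, Render, Window};

struct Counter {
    count: usize,
}

impl Render for Counter {
    // Previously: fn render(&mut self, cx: &mut ViewContext<Self>) -> impl IntoElement
    fn render(&mut self, _window: &mut Window, _cx: &mut Context<Self>) -> impl IntoElement {
        div().child(format!("count: {}", self.count))
    }
}
```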
Not pictured here are the two other failed attempts. It's been quite a
month!
Tasks:
- [x] Remove `View`, `ViewContext`, `WindowContext` and thread through
`Window`
- [x] [@cole-miller @mikayla-maki] Redraw window when entities change
- [x] [@cole-miller @mikayla-maki] Get examples and Zed running
- [x] [@cole-miller @mikayla-maki] Fix Zed rendering
- [x] [@mikayla-maki] Fix todo! macros and comments
- [x] Fix a bug where the editor would not be redrawn because of view
caching
- [x] remove the publicness of `window.notify()` and replace it with
`AppContext::notify`
- [x] remove `observe_new_window_models`, replace with
`observe_new_models` with an optional window
- [x] Fix a bug where the project panel would not be redrawn because of
the wrong refresh() call being used
- [x] Fix the tests
- [x] Fix warnings by eliminating `Window` params or using `_`
- [x] Fix conflicts
- [x] Simplify generic code where possible
- [x] Rename types
- [ ] Update docs
### issues post merge
- [x] Issues switching between normal and insert mode
- [x] Assistant re-rendering failure
- [x] Vim test failures
- [x] Mac build issue
Release Notes:
- N/A
---------
Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Cole Miller <cole@zed.dev>
Co-authored-by: Mikayla <mikayla@zed.dev>
Co-authored-by: Joseph <joseph@zed.dev>
Co-authored-by: max <max@zed.dev>
Co-authored-by: Michael Sloan <michael@zed.dev>
Co-authored-by: Mikayla Maki <mikaylamaki@Mikaylas-MacBook-Pro.local>
Co-authored-by: Mikayla <mikayla.c.maki@gmail.com>
Co-authored-by: joão <joao@zed.dev>
Add `phi4` maximum context length (128K).
By default this clamps to `16384`, but if you have enough video memory
you can set it higher or connect to a non-local machine via settings:
```json
"language_models": {
"ollama": {
"api_url": "http://localhost:11434",
"available_models": [
{
"name": "phi4",
"display_name": "Phi4 64K",
"max_tokens": 65536
}
]
}
}
```
Release Notes:
- Improve support for Phi4 with ollama.
Since Ollama/llama.cpp do not currently support YaRN for context length
extension, the context length is limited to `32768`. This can be
confirmed on the Ollama model card.
See the corresponding issue on the Ollama repo:
https://github.com/ollama/ollama/issues/6865
Co-authored-by: Patrick Samson <1416027+patricksamson@users.noreply.github.com>
This removes the `low_speed_timeout` setting from all providers as a
response to issue #19509.
Reason being that the original `low_speed_timeout` was only added as part of
#9913 because users wanted to _get rid of timeouts_. They wanted to bump
the default timeout from 5sec to a lot more.
Then, in the meantime, #19055 changed `low_speed_timeout` into a normal
`timeout`, which is a different thing and breaks slower LLMs that don't
send a complete response within the configured timeout.
So we figured: let's remove the whole thing and replace it with a
default _connect_ timeout to make sure that we can connect to a server
in 10s, but then give the server as long as it wants to complete its
response.
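For illustration, here is what that looks like with `reqwest` (Zed's own HTTP client layer is different; this is not the code that landed):

```rust
use std::time::Duration;

fn build_client() -> reqwest::Result<reqwest::Client> {
    reqwest::Client::builder()
        // Fail fast if the server can't be reached at all...
        .connect_timeout(Duration::from_secs(10))
        // ...but set no overall deadline, so a slow LLM can take as long as
        // it needs to finish its response.
        .build()
}
```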
Closes #19509
Release Notes:
- Removed the `low_speed_timeout` setting from LLM provider settings; it was
only used to _increase_ the timeout to give LLMs more time, and since we
have no other use for it, we simply removed the setting to give LLMs as
long as they need.
---------
Co-authored-by: Antonio <antonio@zed.dev>
Co-authored-by: Peter Tripp <peter@zed.dev>
This PR is a refactor to pave the way for allowing the user to view and
edit workflow step resolutions. I've made tool calls work more like
normal streaming completions for all providers. The `use_any_tool`
method returns a stream of strings (which contain chunks of JSON). I've
also done some minor cleanup of language model providers in general,
removing the duplication around handling streaming responses.
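The shape of the streaming interface described above, as a sketch (the real trait method takes more parameters, e.g. the request and the tool's JSON schema):

```rust
use anyhow::Result;
use futures::{future::BoxFuture, stream::BoxStream};

trait LanguageModel {
    // Returns a stream of strings; each item is a chunk of the JSON that the
    // model produces for the tool's input.
    fn use_any_tool(
        &self,
        tool_name: String,
        tool_description: String,
    ) -> BoxFuture<'static, Result<BoxStream<'static, Result<String>>>>;
}
```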
Release Notes:
- N/A
- [x] OpenAI
- [ ] ~Google~ Moved into a separate branch at:
https://github.com/zed-industries/zed/tree/tool-calls-in-google-ai I've
run into issues with having the API digest our schema without tripping
over itself - the function call parameters are malformed and whatnot. We
can resume from that branch if needed.
- [x] Ollama
- [x] Cloud
- [ ] ~Copilot Chat (?)~
Release Notes:
- Added tool calling capabilities to OpenAI and Ollama models.
This adds the ability to set the keep alive as an integer, including
`-1` for staying alive indefinitely until a new model is loaded or
Ollama exits. I've also set the default to `-1` so that models stay
ready to go for Zed to use.
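One way to model the setting (a sketch, not necessarily Zed's exact type): Ollama accepts either a number of seconds, with `-1` meaning keep the model loaded indefinitely, or a duration string such as `"10m"`.

```rust
use serde::{Deserialize, Serialize};

// Hypothetical modelling of the `keep_alive` setting.
#[derive(Serialize, Deserialize)]
#[serde(untagged)]
enum KeepAlive {
    // e.g. 3600, or -1 to keep the model loaded until Ollama exits or a
    // different model is loaded.
    Seconds(i64),
    // e.g. "10m" or "24h", which Ollama also understands.
    Duration(String),
}

impl Default for KeepAlive {
    fn default() -> Self {
        // Default to -1 so models stay ready for Zed to use.
        KeepAlive::Seconds(-1)
    }
}
```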
Release Notes:
- N/A
This PR adds a missing LICENSE file to the recently-added `ollama`
crate.
Also added the missing `lints.workspace = true` to the `Cargo.toml`.
Release Notes:
- N/A
Closes #4424.
A few design decisions that may need some rethinking or later PRs:
* Other providers have a check for authentication. I use this
opportunity to fetch the models, which doubles as a way of finding out if
the Ollama server is running (see the sketch after this list).
* Ollama has _no_ API for getting the max tokens per model
* Ollama has _no_ API for getting the current token count
https://github.com/ollama/ollama/issues/1716
* Ollama does allow setting the `num_ctx` so I've defaulted this to
4096. It can be overridden in settings.
* Ollama models will be "slow" to start inference because they're
loading the model into memory. It's faster after that. There's no UI
affordance to show that the model is being loaded.
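A sketch of the model-fetch-as-health-check idea (hypothetical types; using `reqwest` purely for illustration, against Ollama's real `GET /api/tags` endpoint):

```rust
use serde::Deserialize;

#[derive(Deserialize)]
struct LocalModel {
    name: String,
}

#[derive(Deserialize)]
struct TagsResponse {
    models: Vec<LocalModel>,
}

async fn fetch_models(api_url: &str) -> anyhow::Result<Vec<LocalModel>> {
    // If this request fails, the Ollama server isn't reachable, which doubles
    // as the provider's "authentication" check.
    let response = reqwest::get(format!("{api_url}/api/tags")).await?;
    let tags: TagsResponse = response.json().await?;
    Ok(tags.models)
}
```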
Release Notes:
- Added an Ollama Provider for the assistant. If you have
[Ollama](https://ollama.com/) running locally on your machine, you can
enable it in your settings under:
```jsonc
"assistant": {
"version": "1",
"provider": {
"name": "ollama",
// Recommended setting to allow for model startup
"low_speed_timeout_in_seconds": 30,
}
}
```
Chat like usual
<img width="1840" alt="image"
src="https://github.com/zed-industries/zed/assets/836375/4e0af266-4c4f-4d9e-9d74-1a91f76a12fe">
Interact with any model from the [Ollama
Library](https://ollama.com/library)
<img width="587" alt="image"
src="https://github.com/zed-industries/zed/assets/836375/87433ac6-bf87-4a99-89e1-96a93bf8de8a">
Open up the terminal to download new models via `ollama pull`:
