Commit graph

69 commits

Author SHA1 Message Date
Kirill Bulatov
16366cf9f2
Use anyhow more idiomatically (#31052)
https://github.com/zed-industries/zed/issues/30972 brought up another
case where our context is not enough to track the actual source of the
issue: we get a general top-level error without inner error.

The reason for this was `.ok_or_else(|| anyhow!("failed to read HEAD
SHA"))?; ` on the top level.

The PR finally reworks the way we use anyhow to reduce such issues (or
at least make it simpler to bubble them up later in a fix).
On top of that, uses a few more anyhow methods for better readability.

* `.ok_or_else(|| anyhow!("..."))`, `map_err` and other similar error
conversion/option reporting cases are replaced with `context` and
`with_context` calls
* in addition to that, various `anyhow!("failed to do ...")` are
stripped with `.context("Doing ...")` messages instead to remove the
parasitic `failed to` text
* `anyhow::ensure!` is used instead of `if ... { return Err(...); }`
calls
* `anyhow::bail!` is used instead of `return Err(anyhow!(...));`

Release Notes:

- N/A
2025-05-20 23:06:07 +00:00
Richard Feldman
8fdf309a4a
Have read_file support images (#30435)
This is very basic support for them. There are a number of other TODOs
before this is really a first-class supported feature, so not adding any
release notes for it; for now, this PR just makes it so that if
read_file tries to read a PNG (which has come up in practice), it at
least correctly sends it to Anthropic instead of messing up.

This also lays the groundwork for future PRs for more first-class
support for images in tool calls across more image file formats and LLM
providers.

Release Notes:

- N/A

---------

Co-authored-by: Agus Zubiaga <hi@aguz.me>
Co-authored-by: Agus Zubiaga <agus@zed.dev>
2025-05-13 10:58:00 +02:00
Antonio Scandurra
9f6809a28d
Reuse conversation cache when streaming edits (#30245)
Release Notes:

- Improved latency when the agent applies edits.
2025-05-08 14:36:34 +02:00
Max Brunsfeld
c3d9cdecab
Change cloud language model provider JSON protocol to surface errors and usage information (#29830)
Release Notes:

- N/A

---------

Co-authored-by: Nathan Sobo <nathan@zed.dev>
Co-authored-by: Marshall Bowers <git@maxdeviant.com>
2025-05-04 17:37:42 +00:00
Marshall Bowers
f0515d1c34
agent: Show a notice when reaching consecutive tool use limits (#29833)
This PR adds a notice when reaching consecutive tool use limits when
using normal mode.

Here's an example with the limit artificially lowered to 2 consecutive
tool uses:


https://github.com/user-attachments/assets/32da8d38-67de-4d6b-8f24-754d2518e5d4

Release Notes:

- agent: Added a notice when reaching consecutive tool use limits when
using a model in normal mode.
2025-05-03 02:09:54 +00:00
Max Brunsfeld
04772bf17d
Add support for queuing status updates in cloud language model provider (#29818)
This sets us up to display queue position information to the user, once
our language model backend is updated to support request queuing.

The JSON returned by the LLM backend will need to look like this:

```json
{"queue": {"status": "queued", "position": 1}}
{"queue": {"status": "started"}}
{"event": {"THE_UPSTREAM_MODEL_PROVIDER_EVENT": "..."}} 
```

Release Notes:

- N/A

---------

Co-authored-by: Marshall Bowers <git@maxdeviant.com>
2025-05-02 20:36:39 +00:00
Max Brunsfeld
17903a0999
Associate each thread with a model (#29573)
This PR makes it possible to use different LLM models in the agent
panels of two different projects, simultaneously. It also properly
restores a thread's original model when restoring it from the history,
rather than having it use the default model. As before, newly-created
threads will use the current default model.

Release Notes:

- Enabled different project windows to use different models in the agent
panel
- Enhanced the agent panel so that when revisiting old threads, their
original model will be used.

---------

Co-authored-by: Richard Feldman <oss@rtfeldman.com>
2025-04-28 23:43:16 +00:00
Marshall Bowers
ce93961fe0
agent: Add "max mode" toggle (#29549)
This PR adds a "max mode" toggle to the Agent panel, for models that
support it.

Only visible to folks in the `new-billing` feature flag.

Icon is just a placeholder.

Release Notes:

- N/A
2025-04-28 16:50:47 +00:00
Richard Feldman
720dfee803
Treat invalid JSON in tool calls as failed tool calls (#29375)
Release Notes:

- N/A

---------

Co-authored-by: Max <max@zed.dev>
Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
2025-04-24 16:54:27 -04:00
Nathan Sobo
8836c6fb42
Introduce LanguageModelToolUse::raw_input (#29322)
This is to enable alternative streaming solutions at the application
layer. I'm not sure we really should have performed parsing of the input
at this layer. Either way I want to experiment with streaming approaches
in a separate crate on a branch, and this will help.

/cc @maxdeviant @bennetbo @rtfeldman

Closes #ISSUE

Release Notes:

- N/A
2025-04-24 02:30:48 +00:00
Richard Feldman
4f2f9ff762
Streaming tool calls (#29179)
https://github.com/user-attachments/assets/7854a737-ef83-414c-b397-45122e4f32e8



Release Notes:

- Create file and edit file tools now stream their tool descriptions, so
you can see what they're doing sooner.

---------

Co-authored-by: Marshall Bowers <git@maxdeviant.com>
2025-04-21 22:28:32 +00:00
Michael Sloan
fbf7caf93e
Default to fast model for thread summaries and titles + don't include system prompt / context / thinking segments (#29102)
* Adds a fast / cheaper model to providers and defaults thread
summarization to this model. Initial motivation for this was that
https://github.com/zed-industries/zed/pull/29099 would cause these
requests to fail when used with a thinking model. It doesn't seem
correct to use a thinking model for summarization.

* Skips system prompt, context, and thinking segments.

* If tool use is happening, allows 2 tool uses + one more agent response
before summarizing.

Downside of this is that there was potential for some prefix cache reuse
before, especially for title summarization (thread summarization omitted
tool results and so would not share a prefix for those). This seems fine
as these requests should typically be fairly small. Even for full thread
summarization, skipping all tool use / context should greatly reduce the
token use.

Release Notes:

- N/A
2025-04-19 23:26:29 +00:00
Bennet Bo Fenner
bafc086d27
agent: Preserve thinking blocks between requests (#29055)
Looks like the required backend component of this was deployed.

https://github.com/zed-industries/monorepo/actions/runs/14541199197

Release Notes:

- N/A

---------

Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Agus Zubiaga <hi@aguz.me>
Co-authored-by: Richard Feldman <oss@rtfeldman.com>
Co-authored-by: Nathan Sobo <nathan@zed.dev>
2025-04-19 20:12:03 +00:00
Marshall Bowers
676cc109a3
agent: Report usage from thread summarization requests (#29012)
This PR makes it so the thread summarization also reports the model
request usage, to prevent the case where the count would appear to jump
by 2 the next time a message was sent after summarization.

Release Notes:

- N/A
2025-04-17 23:05:12 +00:00
Marshall Bowers
d93141bded
agent: Extract usage information from response headers (#29002)
This PR updates the Agent to extract the usage information from the
response headers, if they are present.

For now we just log the information, but we'll be using this soon to
populate some UI.

Release Notes:

- N/A
2025-04-17 20:11:07 +00:00
Agus Zubiaga
0286b8ab3e
agent: Fix conversation token usage and estimate unsent message (#28878)
The UI was mistakenly using the cumulative token usage for the token
counter. It will now display the last request token count, plus an
estimation of the tokens in the message editor and context entries that
haven't been sent yet.


https://github.com/user-attachments/assets/0438c501-b850-4397-9135-57214ca3c07a

Additionally, when the user edits a message, we'll display the actual
token count up to it and estimate the tokens in the new message.

Note: We don't currently estimate the delta when switching profiles. In
the future, we want to use the count tokens API to measure every part of
the request and display a breakdown.

Release Notes:

- agent: Made the token count more accurate and added back estimation of
used tokens as you type and add context.

---------

Co-authored-by: Bennet Bo Fenner <bennetbo@gmx.de>
Co-authored-by: Danilo Leal <daniloleal09@gmail.com>
2025-04-16 16:27:36 -03:00
Bennet Bo Fenner
c381a500f8
agent: Show a warning when some tools are incompatible with the selected model (#28755)
WIP

<img width="644" alt="image"
src="https://github.com/user-attachments/assets/b24e1a57-f82e-457c-b788-1b314ade7c84"
/>


<img width="644" alt="image"
src="https://github.com/user-attachments/assets/b158953c-2015-4cc8-b8ed-35c6fcbe162d"
/>


Release Notes:

- agent: Improve compatibility with Gemini Tool Calling APIs. When a
tool is incompatible with the Gemini APIs a warning indicator will be
displayed. Incompatible tools will be automatically excluded from the
conversation

---------

Co-authored-by: Danilo Leal <daniloleal09@gmail.com>
2025-04-15 16:58:11 +00:00
Michael Sloan
6b80eb556c
Add judge to new eval + provide LSP diagnostics (#28713)
Release Notes:

- N/A

---------

Co-authored-by: Antonio Scandurra <antonio@zed.dev>
Co-authored-by: agus <agus@zed.dev>
2025-04-14 20:18:47 +00:00
Agus Zubiaga
b45230784d
agent: Handle context window exceeded errors from Anthropic (#28688)
![CleanShot 2025-04-14 at 11 15
38@2x](https://github.com/user-attachments/assets/9e803ffb-74fd-486b-bebc-2155a407a9fa)

Release Notes:

- agent: Handle context window exceeded errors from Anthropic
2025-04-14 14:39:33 +00:00
Bennet Bo Fenner
b22faf96e0
agent: Refine language model selector (#28597)
Release Notes:

- agent: Show recommended models in the agent model selector and display
the provider in the model selector's trigger.

---------

Co-authored-by: Danilo Leal <daniloleal09@gmail.com>
Co-authored-by: Danilo Leal <67129314+danilo-leal@users.noreply.github.com>
2025-04-11 23:02:50 +00:00
Danilo Leal
b9724d9cbe
agent: Add token count in the thread view (#28037)
This PR adds the token count to the active thread view. It doesn't
behaves quite like Assistant 1 where it updates as you type, though; it
updates after you submit the message.

<img
src="https://github.com/user-attachments/assets/82d2a180-554a-43ee-b776-3743359b609b"
width="700" />

---

Release Notes:

- agent: Add token count in the thread view

---------

Co-authored-by: Agus Zubiaga <hi@aguz.me>
2025-04-03 15:43:58 -03:00
Marshall Bowers
889bc13b7d
language_model: Remove use_any_tool method from LanguageModel (#27930)
This PR removes the `use_any_tool` method from the `LanguageModel`
trait.

It was not being used anywhere, and doesn't really fit in our new tool
use story.

Release Notes:

- N/A
2025-04-02 15:49:21 +00:00
Danilo Leal
192097f58f
assistant2: Ensure errors are also displayed in populated new thread view (#27869)
Follow-up to https://github.com/zed-industries/zed/pull/27812

This PR makes sure these errors cases also show up in the panel's empty
state even when there is past data.

| No ToS | Missing Provider |
|--------|--------|
| ![CleanShot 2025-04-01 at 4  49
36@2x](https://github.com/user-attachments/assets/6da6bdc9-daa6-4a7b-a224-989eb845e205)
| ![CleanShot 2025-04-01 at 4  50
04@2x](https://github.com/user-attachments/assets/bddf62cb-3727-44b5-b115-9a88313c6d85)
|

Release Notes:

- N/A
2025-04-01 17:06:34 -03:00
Marshall Bowers
5880271b11
language_model: Add supports_tools method to LanguageModel (#27867)
This PR adds a new `supports_tools` method to the `LanguageModel` trait
to indicate whether a given model supports tool use.

Release Notes:

- N/A
2025-04-01 19:56:05 +00:00
Piotr Osiewicz
dc64ec9cc8
chore: Bump Rust edition to 2024 (#27800)
Follow-up to https://github.com/zed-industries/zed/pull/27791

Release Notes:

- N/A
2025-03-31 20:55:27 +02:00
Bennet Bo Fenner
c8a9a74e6a
Add tool calling support for Gemini models (#27772)
Release Notes:

- N/A
2025-03-31 17:46:42 +02:00
Thomas Mickley-Doyle
1e8b50f471
Add token usage to LanguageModelTextStream (#27490)
Release Notes:

- N/A

---------

Co-authored-by: Michael Sloan <michael@zed.dev>
2025-03-26 22:21:01 +00:00
Marshall Bowers
4a30b960d4
language_model: Remove dependency on ui (#27448)
This PR removes the dependency on the `ui` crate from the
`language_model` crate.

We were only depending on it to import `IconName`—which now lives in
`icons`—and some re-exported GPUI items.

Release Notes:

- N/A
2025-03-25 18:30:46 +00:00
Bennet Bo Fenner
a709d4c7c6
assistant: Add support for claude-3-7-sonnet-thinking (#27085)
Closes #25671

Release Notes:

- Added support for `claude-3-7-sonnet-thinking` in the assistant panel

---------

Co-authored-by: Danilo Leal <daniloleal09@gmail.com>
Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Agus Zubiaga <hi@aguz.me>
2025-03-21 12:29:07 +00:00
Michael Sloan
8e0e291bd5
Track cumulative token usage in assistant2 when using anthropic API (#26738)
Release Notes:

- N/A
2025-03-13 22:56:16 +00:00
Marshall Bowers
e7df5ce61c
assistant2: Avoid unnecessary String cloning in tool use (#25725)
This PR removes some unnecessary `String` cloning in the tool use paths.

We now store the data in `Arc<str>`s for cheap cloning.

Release Notes:

- N/A
2025-02-27 03:16:09 +00:00
Marshall Bowers
def342e35c
Remove dependents of language_models (#25511)
This PR removes the dependents of the `language_models` crate.

The following types have been moved from `language_models` to
`language_model` to facilitate this:

- `LlmApiToken`
- `RefreshLlmTokenListener`
- `MaxMonthlySpendReachedError`
- `PaymentRequiredError`

With this change only `zed` now depends on `language_models`.

Release Notes:

- N/A
2025-02-24 22:46:45 +00:00
Marshall Bowers
e5b97a5e48
Move report_assistant_event into language_model crate (#25508)
This PR moves the `report_assistant_event` function from the
`language_models` crate to the `language_model` crate.

This allows us to drop some dependencies on `language_models`.

Release Notes:

- N/A
2025-02-24 22:27:26 +00:00
Antonio Scandurra
f517050548
Partially fix assistant onboarding (#25313)
While investigating #24896, I noticed two issues:

1. The default configuration for the `zed.dev` provider was using the
wrong string for Claude 3.5 Sonnet. This meant the provider would always
result as not configured until the user selected it from the model
picker, because we couldn't deserialize that string to a valid
`anthropic::Model` enum variant.
2. When clicking on `Open New Chat`/`Start New Thread` in the provider
configuration, we would select `Claude 3.5 Haiku` by default instead of
Claude 3.5 Sonnet.

Release Notes:

- Fixed some issues that caused AI providers to sometimes be
misconfigured.
2025-02-24 07:29:55 +00:00
Marshall Bowers
7a6b652ebc
language_model: Return AuthenticateErrors from LanguageModelProvider::authenticate (#25126)
This PR updates the `LanguageModelProvider::authenticate` method to
return an `AuthenticateError` instead of an `anyhow::Error`.

This allows us to model the "credentials not found" state explicitly as
`AuthenticateError::CredentialsNotFound`, which enables the caller to
check for this state and act accordingly.

Planning to use this in #25123 to silence errors about missing
credentials when authenticating providers in the background.

Release Notes:

- N/A
2025-02-19 00:01:48 +00:00
Mikayla Maki
a6b1514246
Fix missed renames in #22632 (#23688)
Fix a bug where a GPUI macro still used `ModelContext`
Rename `AsyncAppContext` -> `AsyncApp`
Rename update_model, read_model, insert_model, and reserve_model to
update_entity, read_entity, insert_entity, and reserve_entity

Release Notes:

- N/A
2025-01-26 23:37:34 +00:00
Nathan Sobo
6fca1d2b0b
Eliminate GPUI View, ViewContext, and WindowContext types (#22632)
There's still a bit more work to do on this, but this PR is compiling
(with warnings) after eliminating the key types. When the tasks below
are complete, this will be the new narrative for GPUI:

- `Entity<T>` - This replaces `View<T>`/`Model<T>`. It represents a unit
of state, and if `T` implements `Render`, then `Entity<T>` implements
`Element`.
- `&mut App` This replaces `AppContext` and represents the app.
- `&mut Context<T>` This replaces `ModelContext` and derefs to `App`. It
is provided by the framework when updating an entity.
- `&mut Window` Broken out of `&mut WindowContext` which no longer
exists. Every method that once took `&mut WindowContext` now takes `&mut
Window, &mut App` and every method that took `&mut ViewContext<T>` now
takes `&mut Window, &mut Context<T>`

Not pictured here are the two other failed attempts. It's been quite a
month!

Tasks:

- [x] Remove `View`, `ViewContext`, `WindowContext` and thread through
`Window`
- [x] [@cole-miller @mikayla-maki] Redraw window when entities change
- [x] [@cole-miller @mikayla-maki] Get examples and Zed running
- [x] [@cole-miller @mikayla-maki] Fix Zed rendering
- [x] [@mikayla-maki] Fix todo! macros and comments
- [x] Fix a bug where the editor would not be redrawn because of view
caching
- [x] remove publicness window.notify() and replace with
`AppContext::notify`
- [x] remove `observe_new_window_models`, replace with
`observe_new_models` with an optional window
- [x] Fix a bug where the project panel would not be redrawn because of
the wrong refresh() call being used
- [x] Fix the tests
- [x] Fix warnings by eliminating `Window` params or using `_`
- [x] Fix conflicts
- [x] Simplify generic code where possible
- [x] Rename types
- [ ] Update docs

### issues post merge

- [x] Issues switching between normal and insert mode
- [x] Assistant re-rendering failure
- [x] Vim test failures
- [x] Mac build issue



Release Notes:

- N/A

---------

Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Cole Miller <cole@zed.dev>
Co-authored-by: Mikayla <mikayla@zed.dev>
Co-authored-by: Joseph <joseph@zed.dev>
Co-authored-by: max <max@zed.dev>
Co-authored-by: Michael Sloan <michael@zed.dev>
Co-authored-by: Mikayla Maki <mikaylamaki@Mikaylas-MacBook-Pro.local>
Co-authored-by: Mikayla <mikayla.c.maki@gmail.com>
Co-authored-by: joão <joao@zed.dev>
2025-01-26 03:02:45 +00:00
Agus Zubiaga
ba16b4eb65
assistant2: Show accept terms UI in thread empty state (#23630)
<img
src="https://github.com/user-attachments/assets/cea93cfb-8a40-48c4-9d90-f1751c79603b"
width=400>



Release Notes:

- N/A

---------

Co-authored-by: Danilo <danilo@zed.dev>
2025-01-24 19:34:46 -03:00
Cole Miller
ee6f834028
Fuse LLM completion stream to avoid a panic (#21914)
`LanguageModel::stream_completion_text` can poll the `stream_completion`
stream (ultimately a `futures::Unfold`) after it's returned
`Ready(None)`, which leads to a panic; avoid this by fusing the stream.

Release Notes:

- Fixed a panic when streaming language model completions
2024-12-12 11:39:35 -05:00
Marshall Bowers
f3140f54d8
assistant2: Wire up error messages (#21426)
This PR wires up the error messages for Assistant 2 so that they are
shown to the user:

<img width="1138" alt="Screenshot 2024-12-02 at 4 28 02 PM"
src="https://github.com/user-attachments/assets/d8a5b9bd-0cef-4304-b561-b2edadbc70ef">
<img width="1138" alt="Screenshot 2024-12-02 at 4 29 09 PM"
src="https://github.com/user-attachments/assets/0dd70841-0d5a-4de6-bebe-82c563246b65">
<img width="1138" alt="Screenshot 2024-12-02 at 4 32 49 PM"
src="https://github.com/user-attachments/assets/a8838866-fad1-43a9-8935-490dc1936016">

@danilo-leal I kept the existing UX from Assistant 1, as I didn't see
any errors in the design prototype, but we can revisit if another
approach would work better.

Release Notes:

- N/A
2024-12-02 16:54:46 -05:00
Marshall Bowers
968ffaa3fd
assistant2: Restructure storage of tool uses and results (#21194)
This PR restructures the storage of the tool uses and results in
`assistant2` so that they don't live on the individual messages.

It also introduces a `LanguageModelToolUseId` newtype for better type
safety.

Release Notes:

- N/A
2024-11-25 21:53:27 -05:00
Marshall Bowers
cbba44900d
Add language_models crate to house language model providers (#20945)
This PR adds a new `language_models` crate to house the various language
model providers.

By extracting the provider definitions out of `language_model`, we're
able to remove `language_model`'s dependency on `editor`, which improves
incremental compilation when changing `editor`.

Release Notes:

- N/A
2024-11-20 18:49:34 -05:00
Boris Cherny
b87c4a1e13
assistant: Add health telemetry (#19928)
This PR adds a bit of telemetry for Anthropic models, in order to
understand model health. With this logging, we can monitor and diagnose
dips in performance, for example due to model rollouts.

Release Notes:

- N/A

---------

Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
2024-10-31 16:21:26 -07:00
Boris Cherny
01ad22683d
telemetry: Add language_name and model_provider (#18640)
This PR adds a bit more metadata for assistant logging.

Release Notes:

- Assistant: Added `language_name` and `model_provider` fields to
telemetry events.

---------

Co-authored-by: Marshall Bowers <elliott.codes@gmail.com>
Co-authored-by: Max <max@zed.dev>
2024-10-04 14:37:27 -04:00
Marshall Bowers
497356b2ba
language_model: Add tool uses to message content (#17381)
This PR updates the message content for an LLM request to allow it
contain tool uses.

We need to send the tool uses back to the model in order for it to
recognize the subsequent tool results.

Release Notes:

- N/A
2024-09-04 19:29:11 -04:00
Marshall Bowers
f38956943b
assistant: Propagate LLM stop reason upwards (#17358)
This PR makes it so we propagate the `stop_reason` from Anthropic up to
the Assistant so that we can take action based on it.

The `extract_content_from_events` function was moved from `anthropic` to
the `anthropic` module in `language_model` since it is more useful if it
is able to name the `LanguageModelCompletionEvent` type, as otherwise
we'd need an additional layer of plumbing.

Release Notes:

- N/A
2024-09-04 12:31:10 -04:00
Marshall Bowers
452272e5df
assistant: Stream tool uses as structured data (#17322)
This PR adjusts the approach we use to encoding tool uses in the
completion response to use a structured format rather than simply
injecting it into the response stream as text.

In #17170 we would encode the tool uses as XML and insert them as text.
This would require then re-parsing the tool uses out of the buffer in
order to use them.

The approach taken in this PR is to make `stream_completion` return a
stream of `LanguageModelCompletionEvent`s. Each of these events can be
either text, or a tool use.

A new `stream_completion_text` method has been added to `LanguageModel`
for scenarios where we only care about textual content (currently,
everywhere that isn't the Assistant context editor).

Release Notes:

- N/A
2024-09-03 15:04:51 -04:00
Nathan Sobo
b9176fe4bb
Add custom icon for Anthropic hosted models (#16436)
This commit adds a custom icon for Anthropic hosted models.


![CleanShot 2024-08-18 at 15 40
38@2x](https://github.com/user-attachments/assets/d467ccab-9628-4258-89fc-782e0d4a48d4)
![CleanShot 2024-08-18 at 15 40
34@2x](https://github.com/user-attachments/assets/7efaff9c-6a58-47ba-87ea-e0fe0586fedc)


- Adding a new SVG icon for Anthropic hosted models.
  - The new icon is located at: `assets/icons/ai_anthropic_hosted.svg`
- Updating the LanguageModel trait to include an optional icon method
- Implementing the icon method for CloudModel to return the custom icon
for Anthropic hosted models
- Updating the UI components to use the model-specific icon when
available
- Adding a new IconName variant for the Anthropic hosted icon

We should change the non-hosted icon in some small way to distinguish it
from the hosted version. I duplicated the path for now so we can
hopefully add it for the next release.

Release Notes:

- N/A
2024-08-18 16:07:15 -06:00
Roy Williams
b4f5f5024e
Support 8192 output tokens for Claude Sonnet 3.5 (#16358)
Release Notes:

- Added support for 8192 output tokens from Claude Sonnet 3.5
(https://x.com/alexalbert__/status/1812921642143900036)
2024-08-16 11:47:39 -04:00
Roy Williams
46fb917e02
Implement Anthropic prompt caching (#16274)
Release Notes:

- Adds support for Prompt Caching in Anthropic. For models that support
it this can dramatically lower cost while improving performance.
2024-08-15 22:21:06 -05:00