This PR adds a `can_use_web_search_tool` field to the LLM token claims.
Currently anyone with the `assistant2` feature flag enabled will have access to
the web search tool.
Co-authored-by: Bennet <bennet@zed.dev>
Release Notes:
- N/A
This PR updates the `plan` field in the LLM token to be based on the
subscription.
We weren't using this field anywhere outside of the new billing code, so
it is safe to change its meaning.
Release Notes:
- N/A
This PR adds support for transferring any existing usage from a trial
subscription to a Zed Pro subscription when the user upgrades.
Release Notes:
- N/A
---------
Co-authored-by: Mikayla <mikayla@zed.dev>
This PR makes it so we use more types and constants from the
`zed_llm_client` crate to avoid duplicating information.
Also updates the current usage endpoint to use limits derived from the
`Plan`.
Release Notes:
- N/A
This PR adds a `plan` column to the `subscription_usages` table.
These tables don't have any records in them yet, so it's fine to make
the column required without a default.
Release Notes:
- N/A
This PR adds tracking for input and output tokens per minute separately
from the current aggregate tokens per minute.
We are not yet rate-limiting based on these measures.
Release Notes:
- N/A
This PR adds new granular tokens per minute columns to the `models`
table in preparation for more fine-grained rate limits.
The following columns have been added:
- `max_input_tokens_per_minute`
- `max_output_tokens_per_minute`
These mirror the "Maximum input tokens per minute (ITPM)" and "Maximum
output tokens per minute (OTPM)" [rate limits from
Anthropic](https://docs.anthropic.com/en/api/rate-limits#rate-limits).
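
As a rough sketch of how the new columns could feed a rate-limit check once one is enforced: the struct and function names below are hypothetical, and only the `max_input_tokens_per_minute` and `max_output_tokens_per_minute` names come from this PR.

```rust
/// Per-model limits. Field names follow the new columns; the struct itself is hypothetical.
struct ModelRateLimits {
    max_tokens_per_minute: u64,
    max_input_tokens_per_minute: u64,
    max_output_tokens_per_minute: u64,
}

/// Tokens consumed in the current one-minute window (hypothetical shape).
struct MinuteUsage {
    input_tokens: u64,
    output_tokens: u64,
}

#[derive(Debug, PartialEq)]
enum ExceededLimit {
    Input,
    Output,
    Aggregate,
}

/// Checks the granular limits first, then the existing aggregate limit.
fn exceeded_limit(limits: &ModelRateLimits, usage: &MinuteUsage) -> Option<ExceededLimit> {
    if usage.input_tokens > limits.max_input_tokens_per_minute {
        Some(ExceededLimit::Input)
    } else if usage.output_tokens > limits.max_output_tokens_per_minute {
        Some(ExceededLimit::Output)
    } else if usage.input_tokens + usage.output_tokens > limits.max_tokens_per_minute {
        Some(ExceededLimit::Aggregate)
    } else {
        None
    }
}
```

Keeping the three limits separate lets a request that is heavy on output be rejected even when the aggregate budget still has room.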
Release Notes:
- N/A
This PR makes the account age-related fields required in
`LlmTokenClaims`.
We've also removed the account age check from the LLM token issuance
endpoint, instead having it solely be enforced in the `POST /completion`
endpoint.
This change will be safe to deploy at ~8:01PM EDT.
Release Notes:
- N/A
This PR defers the account age check to the `POST /completion` endpoint
instead of doing it when an LLM token is generated.
This will allow us to lift the account age restriction for using Edit
Prediction.
Note: We're still temporarily performing the account age check when
issuing the LLM token until this change is deployed and the LLM tokens
have had a chance to cycle.
Release Notes:
- N/A
This PR cleans up the LLM token creation a bit.
We now pass in the entire list of feature flags to the
`LlmTokenClaims::create` method to prevent having a bunch of confusable
`bool` parameters.
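
To illustrate the motivation, here is a simplified sketch rather than the actual `LlmTokenClaims::create` signature. The `Claims` type, both functions, and the flag names are assumptions made up for the example.

```rust
// Hypothetical, simplified claims type; the real `LlmTokenClaims` has more fields.
#[derive(Debug, PartialEq)]
struct Claims {
    has_llm_closed_beta: bool,
    bypass_account_age_check: bool,
}

// Before: adjacent `bool` parameters are easy to swap at the call site.
fn create_from_bools(has_llm_closed_beta: bool, bypass_account_age_check: bool) -> Claims {
    Claims {
        has_llm_closed_beta,
        bypass_account_age_check,
    }
}

// After: the caller passes every flag and the lookup by name happens in one place.
fn create_from_flags(feature_flags: &[String]) -> Claims {
    let has = |name: &str| feature_flags.iter().any(|flag| flag == name);
    Claims {
        // Flag names here are illustrative assumptions.
        has_llm_closed_beta: has("llm-closed-beta"),
        bypass_account_age_check: has("bypass-account-age-check"),
    }
}
```

With the `bool` version, `create_from_bools(true, false)` and `create_from_bools(false, true)` both compile, so a transposition goes unnoticed. The flag-list version has a single argument to get right.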
Release Notes:
- N/A
This reverts commit 9ef0501853 due to a
panic.
```
{
  "thread": "main",
  "payload": "9 is not a valid char boundary in path \"crates/…/LiveKitBridge/\"",
  "location_data": {
    "file": "crates/file_finder/src/file_finder.rs",
    "line": 646
  }
}
```
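
For context, Rust panics with this kind of message when a `str` is sliced at a byte index that falls inside a multi-byte character. The sketch below shows two non-panicking alternatives using `str::is_char_boundary` and `str::get`. The helper names are hypothetical and this is not the fix that eventually landed.

```rust
/// Truncates `path` to at most `max_bytes` bytes without splitting a character.
fn truncate_at_char_boundary(path: &str, max_bytes: usize) -> &str {
    if max_bytes >= path.len() {
        return path;
    }
    let mut end = max_bytes;
    // Walk back until `end` sits on a character boundary; index 0 always is one.
    while !path.is_char_boundary(end) {
        end -= 1;
    }
    &path[..end]
}

/// Non-panicking prefix: returns `None` where `&path[..end]` would panic.
fn try_prefix(path: &str, end: usize) -> Option<&str> {
    path.get(..end)
}
```

For a made-up path such as `crates/né/x`, byte index 9 lands in the middle of `é`, so `try_prefix` returns `None` there and `truncate_at_char_boundary` backs up to index 8.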
Release Notes:
- N/A
This PR makes progress on #7711 by identifying any common prefix of the
paths in the file finder's search results, and replacing the "interior"
of that prefix---every path segment but the first and last---with `...`,
when a heuristic indicates that the longest path would otherwise
overflow the modal.
The elision is not applied to any segment that contains a match for the
search query.
There may be more work to do on #7711 in the case of long result paths
that do not share a significant common prefix.
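
A minimal sketch of the prefix elision described above, assuming `/`-separated paths. It deliberately omits the overflow heuristic and the rule that segments matching the query are never elided, and the function names are hypothetical rather than taken from the file finder.

```rust
/// Returns the leading path segments shared by every path in `paths`.
fn common_prefix_segments<'a>(paths: &[&'a str]) -> Vec<&'a str> {
    let mut iter = paths.iter().copied();
    let Some(first) = iter.next() else {
        return Vec::new();
    };
    let mut prefix: Vec<&'a str> = first.split('/').collect();
    for path in iter {
        // Count how many leading segments this path shares with the prefix so far.
        let shared = prefix
            .iter()
            .zip(path.split('/'))
            .take_while(|(a, b)| **a == *b)
            .count();
        prefix.truncate(shared);
    }
    prefix
}

/// Replaces the interior of the shared prefix with `...`, keeping its first and last segment.
fn elide_common_prefix(paths: &[&str]) -> Vec<String> {
    let prefix = common_prefix_segments(paths);
    // With fewer than three shared segments there is no interior to elide.
    if prefix.len() < 3 {
        return paths.iter().map(|path| path.to_string()).collect();
    }
    let head = prefix[0];
    let tail = prefix[prefix.len() - 1];
    paths
        .iter()
        .map(|path| {
            let rest: Vec<&str> = path.split('/').skip(prefix.len()).collect();
            let mut segments = vec![head, "...", tail];
            segments.extend(rest);
            segments.join("/")
        })
        .collect()
}
```

Working on whole segments rather than byte offsets means the elision never has to slice a path string at an arbitrary index.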
Release Notes:
- Improved display of long paths in the file finder modal
Co-authored-by: Max <max@zed.dev>
This PR removes the `POST /predict_edits` endpoint from the LLM service,
as it has been superseded by the corresponding endpoint running in
Cloudflare Workers.
All traffic is already being routed to the Cloudflare Workers via the
Workers route, so nothing is hitting this endpoint running in the LLM
service anymore.
You can see the drop-off in requests to this endpoint on this graph,
starting when the Workers route was added:
<img width="472" alt="Screenshot 2025-01-30 at 9 18 04 PM"
src="https://github.com/user-attachments/assets/fa60f7c8-2737-4329-88a3-17093bdb5a29"
/>
We also don't use the `fireworks` crate anymore in this repo, so it has
been removed.
Release Notes:
- N/A
This PR adjusts the billing logic to not write any records to
`billing_events` if:
- The user is staff, as we don't want to bill staff members
- Billing is disabled (we currently enable billing based on the presence
of the Stripe API key)
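
The two conditions above amount to a small guard. The sketch below uses a hypothetical `BillingContext` type and function name; only the rules themselves come from this PR.

```rust
/// Hypothetical inputs; the real code reads these from the user record and server config.
struct BillingContext<'a> {
    is_staff: bool,
    stripe_api_key: Option<&'a str>,
}

/// Returns whether a row should be written to `billing_events`.
fn should_record_billing_event(cx: &BillingContext) -> bool {
    // Billing counts as enabled only when a Stripe API key is configured.
    let billing_enabled = cx.stripe_api_key.is_some();
    !cx.is_staff && billing_enabled
}
```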
Release Notes:
- N/A
This PR adds usage-based billing for LLM interactions in the Assistant.
Release Notes:
- N/A
---------
Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Antonio <antonio@zed.dev>
Co-authored-by: Richard <richard@zed.dev>
Co-authored-by: Richard Feldman <oss@rtfeldman.com>
This PR makes the `has_llm_subscription` and
`max_monthly_spend_in_cents` fields in the `LlmTokenClaims` required.
This change will be safe to deploy in ~45 minutes.
Release Notes:
- N/A
This PR adds a new `Cents` type that can be used to represent a monetary
value in cents.
This cuts down on the primitive obsession in how we deal with money in
the billing code.
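
A minimal sketch of what such a newtype can look like. The inner integer type, the method set, and the derives are assumptions for illustration and may differ from the `Cents` type added here.

```rust
use std::ops::Add;

/// A monetary value in cents. Wrapping the integer keeps it from being
/// confused with token counts or other bare numbers.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Default)]
struct Cents(u32);

impl Cents {
    const ZERO: Self = Self(0);

    const fn new(cents: u32) -> Self {
        Self(cents)
    }

    const fn from_dollars(dollars: u32) -> Self {
        Self(dollars * 100)
    }

    /// Subtracts without underflowing, bottoming out at zero.
    fn saturating_sub(self, other: Cents) -> Self {
        Self(self.0.saturating_sub(other.0))
    }
}

impl Add for Cents {
    type Output = Self;

    fn add(self, rhs: Self) -> Self {
        Self(self.0 + rhs.0)
    }
}
```

With this in place, a function that takes `Cents` cannot be called with a raw token count by mistake, and dollar-to-cent conversions live in one constructor.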
Release Notes:
- N/A
This PR makes the `github_user_login` field required in the
`LlmTokenClaims`.
We previously added this in
https://github.com/zed-industries/zed/pull/16316 and made it optional
for backwards-compatibility.
It's been more than long enough for all of the previous LLM tokens to
have expired, so we can now make the field required.
Release Notes:
- N/A
This PR reworks our existing billing code in preparation for charging
based on LLM usage.
We aren't yet exercising the new billing-related code outside of
development.
There are some noteworthy changes for our existing LLM usage tracking:
- A new `monthly_usages` table has been added for tracking usage
per-user, per-model, per-month
- The per-month usage measures have been removed, in favor of the
`monthly_usages` table
- All of the per-month metrics in the ClickHouse rows have been changed
from a rolling 30-day window to a calendar month
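
To make the per-user, per-model, per-month shape concrete, here is an in-memory sketch. The real `monthly_usages` table lives in the database; the key fields and method names below are assumptions, not the actual schema.

```rust
use std::collections::HashMap;

/// One row's identity: a user, a model, and a calendar month (hypothetical fields).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct MonthlyUsageKey {
    user_id: u64,
    model_id: u32,
    year: i32,
    month: u32,
}

#[derive(Default)]
struct MonthlyUsages {
    tokens_by_key: HashMap<MonthlyUsageKey, u64>,
}

impl MonthlyUsages {
    /// Adds `tokens` to the row for `key`, creating it on first use.
    fn record(&mut self, key: MonthlyUsageKey, tokens: u64) {
        *self.tokens_by_key.entry(key).or_insert(0) += tokens;
    }

    fn tokens_for(&self, key: &MonthlyUsageKey) -> u64 {
        self.tokens_by_key.get(key).copied().unwrap_or(0)
    }
}
```

Because the month is part of the key, usage recorded on the last day of one month and the first day of the next lands in different rows, which is the difference from a rolling 30-day window.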
Release Notes:
- N/A
---------
Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Richard <richard@zed.dev>
Co-authored-by: Max <max@zed.dev>
This PR extends the LLM usage tracking to support tracking usage for
cache writes and reads for Anthropic models.
Release Notes:
- N/A
---------
Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Antonio <antonio@zed.dev>
Add `/auto` behind a feature flag that's disabled for now, even for
staff.
We've decided on a different design for context inference, but there are
parts of `/auto` that will be useful for that, so we want them in the
codebase even if they're unused for now.
Release Notes:
- N/A
---------
Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Marshall Bowers <elliott.codes@gmail.com>
This PR adds a `GET /models` endpoint to the LLM service.
This endpoint returns the models that the authenticated user has access
to.
This is the first step towards populating the models for the hosted
service from the server.
Release Notes:
- N/A
- Cloudflare provides separate ISO 3166-1 country codes for protectorates. Expand our allowlist to include the territories of countries already on the allowlist (US, UK, France, Australia, New Zealand).
- Also include the `country_code` in the error message when we block a request.
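
A sketch of the resulting check, under stated assumptions: the list below is a small illustrative subset and not the service's actual allowlist, and the function name and error wording are made up.

```rust
// Illustrative subset only. Territories have their own ISO 3166-1 alpha-2
// codes, so they must be listed alongside the parent country.
const ALLOWED_COUNTRY_CODES: &[&str] = &[
    "US", "GB", "FR", "AU", "NZ", // countries
    "PR", "GU", // US territories: Puerto Rico, Guam
    "GI", // UK territory: Gibraltar
    "RE", // French territory: Réunion
    "CX", // Australian territory: Christmas Island
    "TK", // New Zealand territory: Tokelau
];

/// Returns an error that names the blocked country code.
fn check_country(country_code: &str) -> Result<(), String> {
    if ALLOWED_COUNTRY_CODES.contains(&country_code) {
        Ok(())
    } else {
        Err(format!("access is not available in your region ({country_code})"))
    }
}
```

Including the code in the error makes a wrongly blocked territory easy to identify from a user report.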
Co-authored-by: Marshall Bowers <elliott.codes@gmail.com>
This PR fixes an issue where active user counts were being computed
across _all_ measures instead of the per-minute measures.
We now compute them using the tokens per minute measure, as we're
concerned with usage in recent minutes.
Release Notes:
- N/A
This PR fixes an issue where the active user count was aggregated
across all models instead of being computed for each model.
We now track the active user counts on a per-model basis.
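
As an illustration of the per-model grouping, here is a sketch over in-memory records. The record shape and function name are hypothetical, and the real counts come from the usage measures in the database.

```rust
use std::collections::{HashMap, HashSet};

/// One usage record: which user called which model (hypothetical shape).
struct UsageRecord<'a> {
    user_id: u64,
    model: &'a str,
}

/// Counts distinct active users separately for each model.
fn active_users_per_model<'a>(records: &[UsageRecord<'a>]) -> HashMap<&'a str, usize> {
    let mut users_by_model: HashMap<&str, HashSet<u64>> = HashMap::new();
    for record in records {
        users_by_model
            .entry(record.model)
            .or_default()
            .insert(record.user_id);
    }
    users_by_model
        .into_iter()
        .map(|(model, users)| (model, users.len()))
        .collect()
}
```

A user who calls two models counts once toward each, and repeated calls to the same model count only once.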
Release Notes:
- N/A