This PR adds a lifetime spending limit on LLM usage.
Exceeding this limit will prevent further use of the Zed LLM provider.
Currently the cap is $1,000.
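A minimal sketch of such a check, assuming spending is tracked in
cents. The constant and function names are hypothetical and may not
match the actual implementation:

```rust
// Hypothetical names for illustration only.

/// Lifetime spending cap in cents ($1,000, per the description above).
const LIFETIME_SPENDING_LIMIT_IN_CENTS: u64 = 1_000 * 100;

/// Returns true once a user's accumulated spending exceeds the cap.
/// Assumption: spending exactly equal to the cap is still allowed.
fn exceeds_lifetime_limit(lifetime_spending_in_cents: u64) -> bool {
    lifetime_spending_in_cents > LIFETIME_SPENDING_LIMIT_IN_CENTS
}
```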
Release Notes:
- N/A
This PR adds additional reporting of the active user counts as separate
logs.
We were already reporting these on individual rate limit events/logs,
but it is also useful to report them independently of user activity.
Release Notes:
- N/A
This PR fixes an issue where the active user count was shared across
all models instead of being tracked for each model.
We now track the active user counts on a per-model basis.
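As an in-memory illustration of per-model tracking (the service
presumably computes this from the database; the `ActiveUsers` type and
its methods are hypothetical):

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical in-memory stand-in for per-model active user tracking.
#[derive(Default)]
struct ActiveUsers {
    // Maps a model name to the set of user IDs active on that model.
    by_model: HashMap<String, HashSet<u64>>,
}

impl ActiveUsers {
    /// Records that `user_id` was active on `model`.
    fn record(&mut self, model: &str, user_id: u64) {
        self.by_model
            .entry(model.to_string())
            .or_default()
            .insert(user_id);
    }

    /// Returns the number of distinct active users for `model` only.
    fn count(&self, model: &str) -> usize {
        self.by_model.get(model).map_or(0, |users| users.len())
    }
}
```

A user active on two models counts once for each model, and activity
on one model no longer inflates the count for another.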
Release Notes:
- N/A
This PR adds traces for when users hit LLM rate limits.
We were already emitting telemetry events for these to Clickhouse, but
it will be handy to have them available in Axiom as well.
Release Notes:
- N/A
This PR adds the `is_staff` field to the `upstream rate limit` spans.
Since we use different API keys for staff vs non-staff, it will be
useful to break down the rate limits accordingly.
Release Notes:
- N/A
This PR reworks how we do checks for model names in the LLM service.
We now normalize the model names using the models defined in the
database.
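One way this normalization could work, sketched under the assumption
that a requested name is mapped to the longest known model name it
starts with. The function name, the matching rule, and the example
names are illustrative and not confirmed by this PR:

```rust
/// Maps `name` to the longest entry in `known_models` that it starts
/// with. Unknown names are returned unchanged.
/// Assumption: `known_models` stands in for the names loaded from the
/// database.
fn normalize_model_name(known_models: &[&str], name: &str) -> String {
    known_models
        .iter()
        .filter(|known| name.starts_with(**known))
        .max_by_key(|known| known.len())
        .map(|known| known.to_string())
        .unwrap_or_else(|| name.to_string())
}
```

With known models `claude-3-5-sonnet` and `claude-3-opus`, a dated name
such as `claude-3-5-sonnet-20240620` would normalize to
`claude-3-5-sonnet`.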
Release Notes:
- N/A
This PR updates the LLM service to include the GitHub login on its
spans.
We need to pass this information through on the LLM token, so it will
temporarily be `None` until this change is deployed and new tokens have
been issued.
Release Notes:
- N/A
This PR adds the ability to revoke access tokens for the LLM service.
There is a new `revoked_access_tokens` table that contains the
identifiers (`jti`) of revoked access tokens.
To revoke an access token, insert a record into this table:
```sql
insert into revoked_access_tokens (jti) values ('1e887b9e-37f5-49e8-8feb-3274e5a86b67');
```
We now attach the `jti` as `authn.jti` to the tracing spans so that we
can associate an access token with a given request to the LLM service.
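The revocation check itself amounts to a set lookup on the `jti`. A
minimal in-memory sketch follows; the `RevokedAccessTokens` type is a
hypothetical stand-in for a query against the `revoked_access_tokens`
table:

```rust
use std::collections::HashSet;

// Hypothetical in-memory stand-in for the `revoked_access_tokens` table.
#[derive(Default)]
struct RevokedAccessTokens {
    jtis: HashSet<String>,
}

impl RevokedAccessTokens {
    /// Marks the token with the given `jti` as revoked.
    fn revoke(&mut self, jti: &str) {
        self.jtis.insert(jti.to_string());
    }

    /// Returns true if the token with the given `jti` has been revoked.
    fn is_revoked(&self, jti: &str) -> bool {
        self.jtis.contains(jti)
    }
}
```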
Release Notes:
- N/A
Now, when an Anthropic request is invalid or Anthropic's API is down,
we'll expose that to the user instead of just returning a generic 500.
Release Notes:
- N/A
Co-authored-by: Marshall <marshall@zed.dev>
This PR makes it so Zed staff can use a separate Anthropic API key for
the LLM service.
We also added an `is_staff` column to the `usages` table so that we can
exclude staff usage from the "active users" metrics that influence the
rate limits.
Release Notes:
- N/A
---------
Co-authored-by: Max <max@zed.dev>
This PR makes it so hitting upstream rate limits from Anthropic results
in an HTTP 429 response instead of an HTTP 500.
To do this we need to surface structured errors out of the `anthropic`
crate.
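A rough sketch of the idea: once the error is structured, the service
can match on it to choose a status code. The `UpstreamError` enum below
is hypothetical and is not the `anthropic` crate's actual error type:

```rust
// Hypothetical structured error; the real `anthropic` crate types may differ.
#[derive(Debug)]
enum UpstreamError {
    /// The upstream provider rejected the request due to rate limiting.
    RateLimited,
    /// Any other upstream failure, with a description.
    Other(String),
}

/// Maps an upstream error to the HTTP status returned to the client.
fn status_code(error: &UpstreamError) -> u16 {
    match error {
        UpstreamError::RateLimited => 429,
        UpstreamError::Other(_) => 500,
    }
}
```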
Release Notes:
- N/A
This PR makes it so staff members will be exempt from rate limiting by
the LLM service.
This is just a temporary measure until we can tweak the rate-limiting
heuristics.
Staff members are still subject to upstream LLM provider rate limits.
Release Notes:
- N/A
When Anthropic releases a new version of their models, Zed AI users
should always get access to the new version even when using an old
version of Zed.
Release Notes:
- N/A
Co-authored-by: Thorsten <thorsten@zed.dev>
This PR puts the initial infrastructure for the LLM service's database
in place.
The LLM service will be using a separate Postgres database, with its own
set of migrations.
Currently we only connect to the database in development, as we don't
yet have the database set up for the staging/production environments.
Release Notes:
- N/A
This PR updates the LLM service to authorize access to language model
providers based on the requester's country.
We detect the country using Cloudflare's
[`CF-IPCountry`](https://developers.cloudflare.com/fundamentals/reference/http-request-headers/#cf-ipcountry)
header.
The country code is then checked against the list of supported countries
for the given LLM provider. Countries that are not supported will
receive an `HTTP 451: Unavailable For Legal Reasons` response.
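The check can be sketched as follows. The function name, the
case-insensitive comparison, and the example country lists are
assumptions for illustration:

```rust
/// Checks `country_code` (the value of the `CF-IPCountry` header)
/// against the provider's supported countries.
/// Returns `Err(451)` for unsupported countries.
/// Assumption: the codes are compared case-insensitively.
fn authorize_country(supported_countries: &[&str], country_code: &str) -> Result<(), u16> {
    let supported = supported_countries
        .iter()
        .any(|code| code.eq_ignore_ascii_case(country_code));
    if supported {
        Ok(())
    } else {
        Err(451) // HTTP 451: Unavailable For Legal Reasons
    }
}
```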
Release Notes:
- N/A
This PR introduces a separate backend service for making LLM calls.
It exposes an HTTP interface that can be called by Zed clients. To call
these endpoints, the client must provide a `Bearer` token. These tokens
are issued/refreshed by the collab service over RPC.
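On the service side, the first step is pulling the token out of the
`Authorization` header. A minimal sketch, with a hypothetical helper
name; validating the token itself is out of scope here:

```rust
/// Extracts the token from an `Authorization: Bearer <token>` header value.
/// Returns `None` if the scheme is not `Bearer` or the token is empty.
fn bearer_token(authorization: &str) -> Option<&str> {
    let token = authorization.strip_prefix("Bearer ")?.trim();
    if token.is_empty() {
        None
    } else {
        Some(token)
    }
}
```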
We're adding this in a backwards-compatible way. Right now the access
tokens can only be minted for Zed staff, and calling this separate LLM
service is behind the `llm-service` feature flag (which is not
automatically enabled for Zed staff).
Release Notes:
- N/A
---------
Co-authored-by: Marshall <marshall@zed.dev>
Co-authored-by: Marshall Bowers <elliott.codes@gmail.com>
This is just a refactor that we're landing ahead of any functional
changes to make sure we haven't broken anything.
Release Notes:
- N/A
Co-authored-by: Marshall <marshall@zed.dev>
Co-authored-by: Jason <jason@zed.dev>