Commit graph

26 commits

Author SHA1 Message Date
Marshall Bowers
96bcceed40
collab: Add traces for user LLM rate limits (#16610)
This PR adds traces for when users hit LLM rate limits.

We were already emitting telemetry events for these to Clickhouse, but
it will be handy to have them available in Axiom as well.

Release Notes:

- N/A
2024-08-21 15:13:55 -04:00
Marshall Bowers
de41c151c8
collab: Add is_staff to upstream rate limit spans (#16463)
This PR adds the `is_staff` field to the `upstream rate limit` spans.

Since we use different API keys for staff vs non-staff, it will be
useful to break down the rate limits accordingly.

Release Notes:

- N/A
2024-08-19 10:15:25 -04:00
Marshall Bowers
3d997e5fd6
collab: Add is_staff to spans (#16389)
This PR adds the `is_staff` field to our LLM spans so that we can
distinguish between staff and non-staff traffic.

Release Notes:

- N/A
2024-08-16 18:42:44 -04:00
Max Brunsfeld
1b1070e0f7
Add tracing needed for LLM rate limit dashboards (#16388)
Release Notes:

- N/A

---------

Co-authored-by: Marshall <marshall@zed.dev>
2024-08-16 17:52:31 -04:00
Marshall Bowers
7a5acc0b0c
collab: Rework model name checks (#16365)
This PR reworks how we do checks for model names in the LLM service.

We now normalize the model names using the models defined in the
database.

Release Notes:

- N/A
2024-08-16 13:54:28 -04:00
Marshall Bowers
9233418cb8
collab: Attach GitHub login to LLM spans (#16316)
This PR updates the LLM service to include the GitHub login on its
spans.

We need to pass this information through on the LLM token, so it will
temporarily be `None` until this change is deployed and new tokens have
been issued.

Release Notes:

- N/A
2024-08-15 17:06:20 -04:00
Marshall Bowers
5e05821d18
collab: Attach user_id to LLM spans (#16311)
This PR updates the LLM service to attach the user ID to the spans.

Release Notes:

- N/A
2024-08-15 15:49:12 -04:00
Marshall Bowers
b4c22cc861
collab: Add ability to revoke LLM service access tokens (#16143)
This PR adds the ability to revoke access tokens for the LLM service.

There is a new `revoked_access_tokens` table that contains the
identifiers (`jti`) of revoked access tokens.

To revoke an access token, insert a record into this table:

```sql
insert into revoked_access_tokens (jti) values ('1e887b9e-37f5-49e8-8feb-3274e5a86b67');
```

We now attach the `jti` as `authn.jti` to the tracing spans so that we
can associate an access token with a given request to the LLM service.

Release Notes:

- N/A
2024-08-12 21:47:05 -04:00
Max Brunsfeld
dbcd06642c
Track lifetime spending for each user and model (#16137)
Release Notes:

- N/A

Co-authored-by: Marshall <marshall@zed.dev>
2024-08-12 20:15:26 -04:00
Max Brunsfeld
a3c79218c4
Report telemetry events for rate limit errors (#16130)
clickhouse telemetry schema:

```
CREATE TABLE default.llm_rate_limit_events
(
    `time` DateTime64(3),
    `user_id` Int32,
    `is_staff` Bool,
    `plan` LowCardinality(String),
    `model` String,
    `provider` LowCardinality(String),
    `usage_measure` LowCardinality(String),
    `requests_this_minute` UInt64,
    `tokens_this_minute` UInt64,
    `tokens_this_day` UInt64,
    `max_requests_per_minute` UInt64,
    `max_tokens_per_minute` UInt64,
    `max_tokens_per_day` UInt64,
    `users_in_recent_minutes` UInt64,
    `users_in_recent_days` UInt64
)
ORDER BY tuple()
```

Release Notes:

- N/A

Co-authored-by: Marshall <marshall@zed.dev>
2024-08-12 16:31:11 -04:00
Max Brunsfeld
1674e12ccb
Expose anthropic API errors to the client (#16129)
Now, when an anthropic request is invalid or anthropic's API is down,
we'll expose that to the user instead of just returning a generic 500.

Release Notes:

- N/A

Co-authored-by: Marshall <marshall@zed.dev>
2024-08-12 13:11:48 -07:00
Marshall Bowers
f3ec8d425f
collab: Use a separate Anthropic API key for Zed staff (#16128)
This PR makes it so Zed staff can use a separate Anthropic API key for
the LLM service.

We also added an `is_staff` column to the `usages` table so that we can
exclude staff usage from the "active users" metrics that influence the
rate limits.

Release Notes:

- N/A

---------

Co-authored-by: Max <max@zed.dev>
2024-08-12 15:20:34 -04:00
Marshall Bowers
ebdb755fef
Surface upstream rate limits from Anthropic (#16118)
This PR makes it so hitting upstream rate limits from Anthropic result
in an HTTP 429 response instead of an HTTP 500.

To do this we need to surface structured errors out of the `anthropic`
crate.

Release Notes:

- N/A
2024-08-12 11:59:24 -04:00
Marshall Bowers
3140d6ce8c
collab: Temporarily bypass LLM rate limiting for staff (#16089)
This PR makes it so staff members will be exempt from rate limiting by
the LLM service.

This is just a temporary measure until we can tweak the rate-limiting
heuristics.

Staff members are still subject to upstream LLM provider rate limits.

Release Notes:

- N/A
2024-08-11 14:41:49 -04:00
Max Brunsfeld
33e120d964
Capture telemetry data on per-user monthly LLM spending (#16050)
Release Notes:

- N/A

---------

Co-authored-by: Marshall <marshall@zed.dev>
2024-08-09 16:38:37 -07:00
Max Brunsfeld
8688b2ad19
Add telemetry for LLM usage (#16049)
Release Notes:

- N/A

Co-authored-by: Marshall <marshall@zed.dev>
2024-08-09 18:15:57 -04:00
Max Brunsfeld
fbebb73d7b
Use LLM service for tool call requests (#16046)
Release Notes:

- N/A

---------

Co-authored-by: Marshall <marshall@zed.dev>
2024-08-09 16:22:58 -04:00
Max Brunsfeld
b1c69c2178
Fix usage recording in llm service (#16044)
Release Notes:

- N/A

Co-authored-by: Marshall <marshall@zed.dev>
2024-08-09 11:48:18 -07:00
Max Brunsfeld
225726ba4a
Remove code paths that skip LLM db in prod (#16008)
Release Notes:

- N/A
2024-08-09 10:41:50 -04:00
Max Brunsfeld
06625bfe94
Apply rate limits in LLM service (#15997)
Release Notes:

- N/A

---------

Co-authored-by: Marshall <marshall@zed.dev>
Co-authored-by: Marshall Bowers <elliott.codes@gmail.com>
2024-08-08 15:46:33 -07:00
Bennet Bo Fenner
514b79e461
collab: Always use newest anthropic model version (#15978)
When Anthropic releases a new version of their models, Zed AI users
should always get access to the new version even when using an old
version of zed.

Co-Authored-By: Thorsten <thorsten@zed.dev>

Release Notes:

- N/A

Co-authored-by: Thorsten <thorsten@zed.dev>
2024-08-08 15:24:08 +02:00
Marshall Bowers
7f6d0919c9
collab: Setup database for LLM service (#15882)
This PR puts the initial infrastructure for the LLM service's database
in place.

The LLM service will be using a separate Postgres database, with its own
set of migrations.

Currently we only connect to the database in development, as we don't
yet have the database setup for the staging/production environments.

Release Notes:

- N/A
2024-08-06 17:18:08 -04:00
Marshall Bowers
cf5f4dddf5
Authorize access to language model providers based on country (#15859)
This PR updates the LLM service to authorize access to language model
providers based on the requester's country.

We detect the country using Cloudflare's
[`CF-IPCountry`](https://developers.cloudflare.com/fundamentals/reference/http-request-headers/#cf-ipcountry)
header.

The country code is then checked against the list of supported countries
for the given LLM provider. Countries that are not supported will
receive an `HTTP 451: Unavailable For Legal Reasons` response.

Release Notes:

- N/A
2024-08-06 11:49:04 -04:00
Marshall Bowers
ca9511393b
collab: Add support for more providers to the LLM service (#15832)
This PR adds support for additional providers to the LLM service:

- OpenAI
- Google
- Custom Zed models (through Hugging Face)

Release Notes:

- N/A
2024-08-05 21:16:18 -04:00
Max Brunsfeld
8e9c2b1125
Introduce a separate backend service for LLM calls (#15831)
This PR introduces a separate backend service for making LLM calls.

It exposes an HTTP interface that can be called by Zed clients. To call
these endpoints, the client must provide a `Bearer` token. These tokens
are issued/refreshed by the collab service over RPC.

We're adding this in a backwards-compatible way. Right now the access
tokens can only be minted for Zed staff, and calling this separate LLM
service is behind the `llm-service` feature flag (which is not
automatically enabled for Zed staff).

Release Notes:

- N/A

---------

Co-authored-by: Marshall <marshall@zed.dev>
Co-authored-by: Marshall Bowers <elliott.codes@gmail.com>
2024-08-05 20:26:21 -04:00
Max Brunsfeld
27779e33fb
Refactor: Restructure collab main function to prepare for new subcommand: serve llm (#15824)
This is just a refactor that we're landing ahead of any functional
changes to make sure we haven't broken anything.

Release Notes:

- N/A

Co-authored-by: Marshall <marshall@zed.dev>
Co-authored-by: Jason <jason@zed.dev>
2024-08-05 12:07:38 -07:00