Commit graph

27 commits

Author SHA1 Message Date
Richard Feldman
91ffa02e2c
/auto (#16696)
Add `/auto` behind a feature flag that's disabled for now, even for
staff.

We've decided on a different design for context inference, but there are
parts of /auto that will be useful for that, so we want them in the code
base even if they're unused for now.

Release Notes:

- N/A

---------

Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Marshall Bowers <elliott.codes@gmail.com>
2024-09-13 13:17:49 -04:00
Piotr Osiewicz
e6c1c51b37
chore: Fix several style lints (#17488)
It's not comprehensive enough to start linting on `style` group, but
hey, it's a start.

Release Notes:

- N/A
2024-09-06 11:58:39 +02:00
Marshall Bowers
30056254f3
collab: Add GET /models endpoint to LLM service (#17307)
This PR adds a `GET /models` endpoint to the LLM service.

This endpoint returns the models that the authenticated user has access
to.

This is the first step towards populating the models for the hosted
service from the server.

Release Notes:

- N/A
2024-09-03 11:41:32 -04:00
Peter Tripp
4d6bb52d1f
Anthropic/OpenAI: Add country codes for territories (#17089)
- Cloudflare provides ISO-3166-1 country code for protectorates. Expand our allowlist to include the territories of countries on the allowlist (US, UK, France, Australia, New Zealand). 
- Also include the country_code in the error message when we block. 

Co-authored-by: Marshall Bowers <elliott.codes@gmail.com>
2024-08-29 11:32:29 -04:00
Marshall Bowers
93a7682659
collab: Count active users based on the tokens per minute measure (#16911)
This PR fixes an issue where active user counts were being computed
across _all_ measures instead of the per-minute measures.

We now compute them using the tokens per minute measure, as we're
concerned with usage in recent minutes.

Release Notes:

- N/A
2024-08-26 15:04:55 -04:00
Marshall Bowers
0229d3ccac
collab: Track active user counts independently for each model (#16624)
This PR fixes an issue where the active user count spanned individual
models.

We now track the active user counts on a per-model basis.

Release Notes:

- N/A
2024-08-21 17:19:47 -04:00
Max Brunsfeld
b5bd8a5c5d
Add logic for closed beta LLM models (#16482)
Release Notes:

- N/A

---------

Co-authored-by: Marshall <marshall@zed.dev>
2024-08-19 11:09:52 -07:00
Max Brunsfeld
1b1070e0f7
Add tracing needed for LLM rate limit dashboards (#16388)
Release Notes:

- N/A

---------

Co-authored-by: Marshall <marshall@zed.dev>
2024-08-16 17:52:31 -04:00
Marshall Bowers
a9441879c3
collab: Fix writing LLM rate limit events to Clickhouse (#16367)
This PR fixes the writing of LLM rate limit events to Clickhouse.

We had a table in the table name: `llm_rate_limits` instead of
`llm_rate_limit_events`.

I also extracted a helper function to write to Clickhouse so we can use
it anywhere we need to.

Release Notes:

- N/A
2024-08-16 14:03:34 -04:00
Marshall Bowers
7a5acc0b0c
collab: Rework model name checks (#16365)
This PR reworks how we do checks for model names in the LLM service.

We now normalize the model names using the models defined in the
database.

Release Notes:

- N/A
2024-08-16 13:54:28 -04:00
Marshall Bowers
9233418cb8
collab: Attach GitHub login to LLM spans (#16316)
This PR updates the LLM service to include the GitHub login on its
spans.

We need to pass this information through on the LLM token, so it will
temporarily be `None` until this change is deployed and new tokens have
been issued.

Release Notes:

- N/A
2024-08-15 17:06:20 -04:00
Max Brunsfeld
6b7664ef4a
Fix bugs preventing non-staff users from using LLM service (#16307)
- db deadlock in GetLlmToken for non-staff users
- typo in allowed model name for non-staff users

Release Notes:

- N/A

---------

Co-authored-by: Marshall <marshall@zed.dev>
Co-authored-by: Joseph <joseph@zed.dev>
2024-08-15 11:21:19 -07:00
Marshall Bowers
b4c22cc861
collab: Add ability to revoke LLM service access tokens (#16143)
This PR adds the ability to revoke access tokens for the LLM service.

There is a new `revoked_access_tokens` table that contains the
identifiers (`jti`) of revoked access tokens.

To revoke an access token, insert a record into this table:

```sql
insert into revoked_access_tokens (jti) values ('1e887b9e-37f5-49e8-8feb-3274e5a86b67');
```

We now attach the `jti` as `authn.jti` to the tracing spans so that we
can associate an access token with a given request to the LLM service.

Release Notes:

- N/A
2024-08-12 21:47:05 -04:00
Max Brunsfeld
dbcd06642c
Track lifetime spending for each user and model (#16137)
Release Notes:

- N/A

Co-authored-by: Marshall <marshall@zed.dev>
2024-08-12 20:15:26 -04:00
Max Brunsfeld
a3c79218c4
Report telemetry events for rate limit errors (#16130)
clickhouse telemetry schema:

```
CREATE TABLE default.llm_rate_limit_events
(
    `time` DateTime64(3),
    `user_id` Int32,
    `is_staff` Bool,
    `plan` LowCardinality(String),
    `model` String,
    `provider` LowCardinality(String),
    `usage_measure` LowCardinality(String),
    `requests_this_minute` UInt64,
    `tokens_this_minute` UInt64,
    `tokens_this_day` UInt64,
    `max_requests_per_minute` UInt64,
    `max_tokens_per_minute` UInt64,
    `max_tokens_per_day` UInt64,
    `users_in_recent_minutes` UInt64,
    `users_in_recent_days` UInt64
)
ORDER BY tuple()
```

Release Notes:

- N/A

Co-authored-by: Marshall <marshall@zed.dev>
2024-08-12 16:31:11 -04:00
Marshall Bowers
f3ec8d425f
collab: Use a separate Anthropic API key for Zed staff (#16128)
This PR makes it so Zed staff can use a separate Anthropic API key for
the LLM service.

We also added an `is_staff` column to the `usages` table so that we can
exclude staff usage from the "active users" metrics that influence the
rate limits.

Release Notes:

- N/A

---------

Co-authored-by: Max <max@zed.dev>
2024-08-12 15:20:34 -04:00
Max Brunsfeld
33e120d964
Capture telemetry data on per-user monthly LLM spending (#16050)
Release Notes:

- N/A

---------

Co-authored-by: Marshall <marshall@zed.dev>
2024-08-09 16:38:37 -07:00
Max Brunsfeld
8688b2ad19
Add telemetry for LLM usage (#16049)
Release Notes:

- N/A

Co-authored-by: Marshall <marshall@zed.dev>
2024-08-09 18:15:57 -04:00
Max Brunsfeld
423c7b999a
Larger rate limit integers (#16047)
Tokens per day may exceed the range of Postgres's 32-bit `integer` data
type.

Release Notes:

- N/A

Co-authored-by: Marshall <marshall@zed.dev>
2024-08-09 14:07:49 -07:00
Max Brunsfeld
240b7c641c
Fix llm queries (#16006)
Release Notes:

- N/A

---------

Co-authored-by: Marshall <marshall@zed.dev>
2024-08-08 17:21:38 -07:00
Max Brunsfeld
06625bfe94
Apply rate limits in LLM service (#15997)
Release Notes:

- N/A

---------

Co-authored-by: Marshall <marshall@zed.dev>
Co-authored-by: Marshall Bowers <elliott.codes@gmail.com>
2024-08-08 15:46:33 -07:00
Bennet Bo Fenner
3a52d6cc52
assistant: Limit model access for Zed AI users to Claude-3.5-sonnet (#15904)
This prevents users from accessing other models, such as OpenAI's GPT-4
or Google's Gemini-Pro.
Staff members can still access all models.

Co-authored-by: Thorsten <thorsten@zed.dev>

Release Notes:

- N/A

---------

Co-authored-by: Thorsten <thorsten@zed.dev>
2024-08-07 16:26:56 +02:00
Marshall Bowers
a54e16b7ea
collab: Add usages table to LLM database (#15884)
This PR adds a `usages` table to the LLM database.

We'll use this to track usage for rate-limiting purposes.

Release Notes:

- N/A
2024-08-06 18:40:10 -04:00
Marshall Bowers
b19f85f9b5
collab: Remove unused parameter to run_database_migrations (#15883)
This PR removes the unused `ignore_checksum_mismatch` parameter to
`run_database_migrations`.

We were always passing `false`, which meant the behavior didn't need to
be parameterized.

Release Notes:

- N/A
2024-08-06 17:31:52 -04:00
Marshall Bowers
7f6d0919c9
collab: Setup database for LLM service (#15882)
This PR puts the initial infrastructure for the LLM service's database
in place.

The LLM service will be using a separate Postgres database, with its own
set of migrations.

Currently we only connect to the database in development, as we don't
yet have the database setup for the staging/production environments.

Release Notes:

- N/A
2024-08-06 17:18:08 -04:00
Marshall Bowers
cf5f4dddf5
Authorize access to language model providers based on country (#15859)
This PR updates the LLM service to authorize access to language model
providers based on the requester's country.

We detect the country using Cloudflare's
[`CF-IPCountry`](https://developers.cloudflare.com/fundamentals/reference/http-request-headers/#cf-ipcountry)
header.

The country code is then checked against the list of supported countries
for the given LLM provider. Countries that are not supported will
receive an `HTTP 451: Unavailable For Legal Reasons` response.

Release Notes:

- N/A
2024-08-06 11:49:04 -04:00
Max Brunsfeld
8e9c2b1125
Introduce a separate backend service for LLM calls (#15831)
This PR introduces a separate backend service for making LLM calls.

It exposes an HTTP interface that can be called by Zed clients. To call
these endpoints, the client must provide a `Bearer` token. These tokens
are issued/refreshed by the collab service over RPC.

We're adding this in a backwards-compatible way. Right now the access
tokens can only be minted for Zed staff, and calling this separate LLM
service is behind the `llm-service` feature flag (which is not
automatically enabled for Zed staff).

Release Notes:

- N/A

---------

Co-authored-by: Marshall <marshall@zed.dev>
Co-authored-by: Marshall Bowers <elliott.codes@gmail.com>
2024-08-05 20:26:21 -04:00