This PR attaches two new properties to the `Language Model Used` event:
- `has_llm_subscription` - This will tell us if a user is a paid
subscriber.
- `max_monthly_spend_in_cents` - This will indicate what their maximum
monthly spend is set to.
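As a rough sketch of the shape (struct and field names here are illustrative, not the actual telemetry schema), the event payload gains two fields:

```rust
use serde::Serialize;

// Illustrative only: the real event definition lives in the telemetry
// code and may be structured differently.
#[derive(Serialize)]
struct LanguageModelUsedEvent {
    model: String,
    /// Whether the user is a paid subscriber.
    has_llm_subscription: bool,
    /// The user's configured maximum monthly spend, in cents.
    max_monthly_spend_in_cents: u32,
}
```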
Release Notes:
- N/A
This removes the `low_speed_timeout` setting from all providers as a
response to issue #19509.
The original `low_speed_timeout` was only added as part of #9913 because
users wanted to _get rid of timeouts_: they wanted to bump the default
timeout from 5 seconds to something much higher.
Then, in #19055, the meaning of `low_speed_timeout` changed: it was
turned into a normal `timeout`, which is a different thing and breaks
slower LLMs that don't deliver a complete response within the
configured time.
So we figured: let's remove the whole thing and replace it with a
default _connect_ timeout to make sure that we can connect to a server
in 10s, but then give the server as long as it wants to complete its
response.
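As a minimal sketch of that idea, assuming a `ureq`-style client (the actual client configuration in the codebase may differ):

```rust
use std::time::Duration;

fn build_agent() -> ureq::Agent {
    ureq::AgentBuilder::new()
        // Fail fast if we can't reach the server within 10 seconds...
        .timeout_connect(Duration::from_secs(10))
        // ...but set no read/overall timeout, so slow LLMs can stream a
        // complete response at their own pace.
        .build()
}
```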
Closes #19509
Release Notes:
- Removed the `low_speed_timeout` setting from LLM provider settings,
since it was only ever used to _increase_ the timeout to give LLMs more
time. With no other use for it, we removed the setting entirely so LLMs
get as long as they need.
---------
Co-authored-by: Antonio <antonio@zed.dev>
Co-authored-by: Peter Tripp <peter@zed.dev>
This PR updates the usage limit check to exempt Zed staff members from
usage limits.
We previously had such an exemption for the rate limits, but hadn't yet
carried it over to usage-based billing.
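A hedged sketch of the resulting check (names are illustrative, not the service's actual types):

```rust
// Illustrative only; the real enforcement lives in the LLM service.
struct User {
    is_staff: bool,
    has_llm_subscription: bool,
}

fn check_usage_limit(user: &User, spend_this_month_in_cents: u32) -> Result<(), String> {
    const FREE_TIER_MONTHLY_SPENDING_LIMIT_IN_CENTS: u32 = 1_000; // illustrative value

    // Zed staff members are exempt from usage limits entirely.
    if user.is_staff {
        return Ok(());
    }
    if spend_this_month_in_cents >= FREE_TIER_MONTHLY_SPENDING_LIMIT_IN_CENTS
        && !user.has_llm_subscription
    {
        return Err("monthly usage limit exceeded".into());
    }
    Ok(())
}
```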
Release Notes:
- N/A
This PR removes the conditional checks around the billing-related
enforcement for LLM completions.
These were just in place to prevent executing any billing code before we
had rolled it out. Now that it is rolled out, we don't need this
conditional execution anymore.
Release Notes:
- N/A
This PR removes the lifetime spending limit that was added in #16780.
We had previously added this as a way to prevent runaway usage, but now
that we have a cap on free usage per month with paid access after that,
we don't need this check anymore.
Release Notes:
- N/A
This PR adjusts the billing logic to not write any records to
`billing_events` if:
- The user is staff, as we don't want to bill staff members
- Billing is disabled (we currently enable billing based on the presence
of the Stripe API key)
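Sketched as a guard (function and parameter names are assumptions):

```rust
// Illustrative guard applied before writing to `billing_events`.
fn should_record_billing_event(user_is_staff: bool, stripe_api_key: Option<&str>) -> bool {
    // Billing counts as enabled only when a Stripe API key is configured.
    let billing_enabled = stripe_api_key.is_some();
    // Skip the write for staff (we don't bill them) and when billing is off.
    billing_enabled && !user_is_staff
}
```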
Release Notes:
- N/A
This PR adds usage-based billing for LLM interactions in the Assistant.
Release Notes:
- N/A
---------
Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Antonio <antonio@zed.dev>
Co-authored-by: Richard <richard@zed.dev>
Co-authored-by: Richard Feldman <oss@rtfeldman.com>
This PR makes the `has_llm_subscription` and
`max_monthly_spend_in_cents` fields in the `LlmTokenClaims` required.
This change will be safe to deploy in ~45 minutes.
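The ~45 minutes presumably covers the lifetime of already-issued tokens that still lack these fields. A sketch of the shape (the real `LlmTokenClaims` has more fields):

```rust
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct LlmTokenClaims {
    // Previously `Option<bool>` / `Option<u32>` so that tokens issued
    // before these fields existed would still deserialize; once those
    // tokens have expired, the fields can safely be made required.
    has_llm_subscription: bool,
    max_monthly_spend_in_cents: u32,
}
```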
Release Notes:
- N/A
This PR adds a new `billing_preferences` table.
Right now there is a single preference: the maximum monthly spend for
LLM usage.
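The row shape might look roughly like this (column names are assumptions based on the description, not the actual migration):

```rust
// Illustrative model of a `billing_preferences` row.
struct BillingPreferences {
    id: i32,
    user_id: i32,
    /// The only preference so far: the cap on paid monthly LLM spend, in cents.
    max_monthly_llm_usage_spending_in_cents: i32,
}
```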
Release Notes:
- N/A
---------
Co-authored-by: Richard <richard@zed.dev>
This PR renames the `MONTHLY_SPENDING_LIMIT` constant to
`FREE_TIER_MONTHLY_SPENDING_LIMIT` to clarify its meaning.
This will help distinguish it from the user's specified limit on their
paid monthly spending.
Release Notes:
- N/A
This PR adds a new `Cents` type that can be used to represent a monetary
value in cents.
This cuts down on the primitive obsession in our billing code, where
monetary values were previously passed around as bare integers.
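A sketch of the idea behind such a newtype (the actual type in the codebase may differ in representation and API):

```rust
/// A monetary value in cents, so amounts can't be confused with other
/// bare integers (token counts, IDs, etc.).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Cents(pub u32);

impl Cents {
    pub const ZERO: Cents = Cents(0);

    pub fn saturating_sub(self, other: Cents) -> Cents {
        Cents(self.0.saturating_sub(other.0))
    }
}

impl std::ops::Add for Cents {
    type Output = Cents;

    fn add(self, other: Cents) -> Cents {
        Cents(self.0 + other.0)
    }
}
```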
Release Notes:
- N/A
This PR reworks our existing billing code in preparation for charging
based on LLM usage.
We aren't yet exercising the new billing-related code outside of
development.
There are some noteworthy changes for our existing LLM usage tracking:
- A new `monthly_usages` table has been added for tracking usage
per-user, per-model, per-month
- The per-month usage measures have been removed, in favor of the
`monthly_usages` table
- All of the per-month metrics in the Clickhouse rows have been changed
from a rolling 30-day window to a calendar month
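For the last point, a small sketch of the bucketing change (using `chrono` for illustration):

```rust
use chrono::{Datelike, Utc};

fn main() {
    // Usage is now keyed by (user, model, calendar month) rather than
    // aggregated over a trailing 30-day window.
    let now = Utc::now();
    let (year, month) = (now.year(), now.month());
    // A `monthly_usages` row would be found or created for
    // (user_id, model_id, year, month) and its counters incremented.
    println!("usage bucket: {year}-{month:02}");
}
```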
Release Notes:
- N/A
---------
Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Richard <richard@zed.dev>
Co-authored-by: Max <max@zed.dev>
This PR extends the LLM usage tracking to support tracking usage for
cache writes and reads for Anthropic models.
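A sketch of the extended usage shape; the cache-related counts mirror the `cache_creation_input_tokens` and `cache_read_input_tokens` fields reported by Anthropic's API (the service's own struct may differ):

```rust
// Illustrative: token usage now distinguishes cache writes and reads.
#[derive(Default)]
struct TokenUsage {
    input_tokens: u64,
    output_tokens: u64,
    cache_creation_input_tokens: u64, // prompt-cache writes
    cache_read_input_tokens: u64,     // prompt-cache reads
}
```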
Release Notes:
- N/A
---------
Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Antonio <antonio@zed.dev>
Replace isahc with ureq everywhere gpui is used.
This should allow us to make HTTP requests without libssl, and avoid a
long tail of panics caused by isahc.
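For reference, a minimal (illustrative) example of the ureq API now used under the hood, noting that real requests go through gpui's HTTP client abstraction:

```rust
fn fetch(url: &str) -> Result<String, Box<dyn std::error::Error>> {
    // ureq uses rustls by default, so no libssl is required.
    let body = ureq::get(url).call()?.into_string()?;
    Ok(body)
}
```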
Release Notes:
- (potentially breaking change) updated our http client
---------
Co-authored-by: Mikayla <mikayla@zed.dev>
Add `/auto` behind a feature flag that's disabled for now, even for
staff.
We've decided on a different design for context inference, but there are
parts of /auto that will be useful for that, so we want them in the code
base even if they're unused for now.
Release Notes:
- N/A
---------
Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Marshall Bowers <elliott.codes@gmail.com>
This PR adds a `GET /models` endpoint to the LLM service.
This endpoint returns the models that the authenticated user has access
to.
This is the first step towards populating the models for the hosted
service from the server.
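A hedged sketch of the endpoint using axum (handler and response types here are illustrative; the real handler filters models by the authenticated user's access):

```rust
use axum::{routing::get, Json, Router};
use serde::Serialize;

#[derive(Serialize)]
struct ModelsResponse {
    models: Vec<String>,
}

async fn list_models() -> Json<ModelsResponse> {
    Json(ModelsResponse {
        models: vec!["claude-3-5-sonnet".into()], // placeholder list
    })
}

fn router() -> Router {
    Router::new().route("/models", get(list_models))
}
```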
Release Notes:
- N/A
This PR adds a lifetime spending limit on LLM usage.
Exceeding this limit will prevent further use of the Zed LLM provider.
Currently the cap is $1,000.
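Conceptually (constant and error names are illustrative):

```rust
const LIFETIME_SPENDING_LIMIT_IN_CENTS: u64 = 100_000; // $1,000

fn check_lifetime_spend(lifetime_spend_in_cents: u64) -> Result<(), String> {
    if lifetime_spend_in_cents >= LIFETIME_SPENDING_LIMIT_IN_CENTS {
        // Past the cap, further use of the Zed LLM provider is blocked.
        return Err("lifetime spending limit reached".into());
    }
    Ok(())
}
```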
Release Notes:
- N/A
This PR adds additional reporting of the active user counts as separate
logs.
We were already reporting these counts on individual rate-limit
events/logs, but they seem worth reporting independently of user
activity as well.
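Something along these lines, using `tracing` (field names are assumptions):

```rust
fn report_active_users(active_user_count: usize) {
    // Emit the count as its own structured log line, rather than only as
    // a field on individual rate-limit events.
    tracing::info!(active_user_count, "active user count");
}
```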
Release Notes:
- N/A
This PR fixes an issue where the active user count was computed across
all models combined.
We now track the active user counts on a per-model basis.
Release Notes:
- N/A
This PR adds traces for when users hit LLM rate limits.
We were already emitting telemetry events for these to Clickhouse, but
it will be handy to have them available in Axiom as well.
Release Notes:
- N/A
This PR adds the `is_staff` field to the `upstream rate limit` spans.
Since we use different API keys for staff vs non-staff, it will be
useful to break down the rate limits accordingly.
Release Notes:
- N/A
This PR reworks how we do checks for model names in the LLM service.
We now normalize the model names using the models defined in the
database.
Release Notes:
- N/A
This PR updates the LLM service to include the GitHub login on its
spans.
We need to pass this information through on the LLM token, so it will
temporarily be `None` until this change is deployed and new tokens have
been issued.
Release Notes:
- N/A
This PR adds the ability to revoke access tokens for the LLM service.
There is a new `revoked_access_tokens` table that contains the
identifiers (`jti`) of revoked access tokens.
To revoke an access token, insert a record into this table:
```sql
insert into revoked_access_tokens (jti) values ('1e887b9e-37f5-49e8-8feb-3274e5a86b67');
```
We now attach the `jti` as `authn.jti` to the tracing spans so that we
can associate an access token with a given request to the LLM service.
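The verification step might look roughly like this (using sqlx for illustration; the service's actual data layer may differ):

```rust
async fn is_token_revoked(db: &sqlx::PgPool, jti: &str) -> Result<bool, sqlx::Error> {
    // A token is revoked if its `jti` appears in `revoked_access_tokens`.
    let row: Option<(String,)> =
        sqlx::query_as("select jti from revoked_access_tokens where jti = $1")
            .bind(jti)
            .fetch_optional(db)
            .await?;
    Ok(row.is_some())
}
```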
Release Notes:
- N/A
Now, when an Anthropic request is invalid or Anthropic's API is down,
we'll expose that to the user instead of just returning a generic 500.
Release Notes:
- N/A
Co-authored-by: Marshall <marshall@zed.dev>
This PR makes it so Zed staff can use a separate Anthropic API key for
the LLM service.
We also added an `is_staff` column to the `usages` table so that we can
exclude staff usage from the "active users" metrics that influence the
rate limits.
Release Notes:
- N/A
---------
Co-authored-by: Max <max@zed.dev>
This PR makes it so hitting upstream rate limits from Anthropic results
in an HTTP 429 response instead of an HTTP 500.
To do this we need to surface structured errors out of the `anthropic`
crate.
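Sketching the shape of that mapping (the enum here is illustrative, not the crate's actual error type):

```rust
enum AnthropicError {
    // Upstream returned 429 / told us to slow down.
    RateLimit { retry_after_secs: Option<u64> },
    // Any other structured API error from Anthropic.
    ApiError { status: u16, message: String },
}

fn response_status(error: &AnthropicError) -> u16 {
    match error {
        AnthropicError::RateLimit { .. } => 429, // surface as Too Many Requests
        AnthropicError::ApiError { .. } => 500,
    }
}
```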
Release Notes:
- N/A
This PR makes it so staff members will be exempt from rate limiting by
the LLM service.
This is just a temporary measure until we can tweak the rate-limiting
heuristics.
Staff members are still subject to upstream LLM provider rate limits.
Release Notes:
- N/A