This PR updates the `user rate limit` and `user usage` log lines to
include additional information that will be useful for graphing in Axiom.
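For illustration, a minimal sketch of the shape of such a log line,
assuming `tracing`-style structured logging (the field names here are
illustrative, not the actual ones added):

```rust
// Hypothetical sketch: structured key-value fields make it easy to
// build per-user and per-model graphs in Axiom.
fn log_user_usage(user_id: u64, model: &str, input_tokens: usize, output_tokens: usize) {
    tracing::info!(
        user_id,
        model,
        input_tokens,
        output_tokens,
        "user usage"
    );
}
```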
Release Notes:
- N/A
This PR updates the usage measures used for rate limiting when using
Claude 3.7 Sonnet.
Instead of using the combined `tokens_per_minute` measure, we now rate-limit
individually on `input_tokens_per_minute` (which excludes cache
reads) and `output_tokens_per_minute`.
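As a rough sketch of the new check (names are illustrative, not the
actual code):

```rust
// Each measure is now checked independently; exceeding either one
// rate-limits the request.
struct Limits {
    input_tokens_per_minute: usize, // excludes cache reads
    output_tokens_per_minute: usize,
}

struct UsageThisMinute {
    input_tokens: usize, // cache reads are not counted here
    output_tokens: usize,
}

fn over_limit(usage: &UsageThisMinute, limits: &Limits) -> bool {
    usage.input_tokens > limits.input_tokens_per_minute
        || usage.output_tokens > limits.output_tokens_per_minute
}
```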
Release Notes:
- N/A
This PR adds tracking for input and output tokens per minute separately
from the current aggregate tokens per minute.
We are not yet rate-limiting based on these measures.
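Conceptually, the set of tracked measures now looks something like this
(hypothetical names):

```rust
// The aggregate measure is kept as-is; the two new measures are
// recorded alongside it but not yet enforced.
enum UsageMeasure {
    TokensPerMinute,       // existing aggregate, still used for rate limiting
    InputTokensPerMinute,  // new: tracked only
    OutputTokensPerMinute, // new: tracked only
}
```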
Release Notes:
- N/A
This PR makes the account age-related fields required in
`LlmTokenClaims`.
We've also removed the account age check from the LLM token issuance
endpoint; it is now enforced solely in the `POST /completion` endpoint.
This change will be safe to deploy at ~8:01PM EDT.
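A hypothetical sketch of the shape of this change (field names and
types are illustrative):

```rust
#[derive(serde::Serialize, serde::Deserialize)]
struct LlmTokenClaims {
    user_id: u64,
    // Previously optional; now required, so `POST /completion` can
    // always enforce the account age check from the token alone.
    account_created_at: i64, // unix timestamp
    // ...other claims elided...
}
```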
Release Notes:
- N/A
This PR defers the account age check to the `POST /completion` endpoint
instead of doing it when an LLM token is generated.
This will allow us to lift the account age restriction for using Edit
Prediction.
Note: We're still temporarily performing the account age check when
issuing the LLM token until this change is deployed and the LLM tokens
have had a chance to cycle.
Release Notes:
- N/A
This is a follow-up to https://github.com/zed-industries/zed/pull/25573.
When determining whether the user was over their maximum monthly spend,
we were still using the spend for a particular model instead of looking
at usage across all models.
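A minimal sketch of the fix (hypothetical types):

```rust
struct ModelUsage {
    spend_in_cents: u64,
}

fn is_over_monthly_spend(usages: &[ModelUsage], max_spend_in_cents: u64) -> bool {
    // Before: a single model's spend was compared against the limit.
    // After: the total across all models is what counts.
    let total: u64 = usages.iter().map(|usage| usage.spend_in_cents).sum();
    total >= max_spend_in_cents
}
```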
Release Notes:
- N/A
This PR adjusts the usage checks for the LLM free tier.
Previously we limited usage on a per-model basis, meaning the user
would get $10/mo free for each model they had access to.
Now usage for all models counts towards the free tier limit.
Release Notes:
- N/A
This PR removes the `POST /predict_edits` endpoint from the LLM service,
as it has been superseded by the corresponding endpoint running in
Cloudflare Workers.
All traffic is already being routed to the Cloudflare Workers via the
Workers route, so nothing is hitting this endpoint running in the LLM
service anymore.
You can see the drop-off in requests to this endpoint on this graph when
the Workers route was added:
<img width="472" alt="Screenshot 2025-01-30 at 9 18 04 PM"
src="https://github.com/user-attachments/assets/fa60f7c8-2737-4329-88a3-17093bdb5a29"
/>
We also don't use the `fireworks` crate anymore in this repo, so it has
been removed.
Release Notes:
- N/A
Realized that the logic in #23814 was more than needed, and harder to
maintain. Something like that could make sense if using the tokenizer
and wanting to precisely hit a token limit. However, in the case of edit
predictions it's more of a latency-and-expense vs. capability tradeoff,
so such precision is unnecessary.
Happily, this change didn't require much extra work; copy-modifying
parts of that change was sufficient.
Release Notes:
- N/A
This PR attaches two new properties to the `Language Model Used` event
(a sketch of the payload follows the list):
- `has_llm_subscription` - This will tell us if a user is a paid
subscriber.
- `max_monthly_spend_in_cents` - This will indicate what their maximum
monthly spend is set to.
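Hypothetically, the payload looks something like this (values are made
up):

```rust
fn example_event() -> serde_json::Value {
    serde_json::json!({
        "event": "Language Model Used",
        "has_llm_subscription": true,       // is the user a paid subscriber?
        "max_monthly_spend_in_cents": 1000  // the user's configured cap ($10)
    })
}
```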
Release Notes:
- N/A
This removes the `low_speed_timeout` setting from all providers as a
response to issue #19509.
The reason is that `low_speed_timeout` was originally only added as part
of #9913 because users wanted to _get rid of timeouts_: they wanted to
bump the default timeout from 5sec to a lot more.
Then, in #19055, `low_speed_timeout` was repurposed as a normal
`timeout`, which is a different thing and breaks slower LLMs that don't
return a complete response within the configured time.
So we figured: let's remove the whole thing and replace it with a
default _connect_ timeout to make sure that we can connect to a server
in 10s, but then give the server as long as it wants to complete its
response.
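As a sketch of the new behavior, assuming a `ureq` 2.x-style client
(the actual wiring in the codebase may differ):

```rust
use std::time::Duration;

fn build_agent() -> ureq::Agent {
    ureq::builder()
        // Fail fast if we can't reach the server...
        .timeout_connect(Duration::from_secs(10))
        // ...but set no read/overall timeout, so slow LLMs can take
        // as long as they need to complete their response.
        .build()
}
```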
Closes #19509
Release Notes:
- Removed the `low_speed_timeout` setting from LLM provider settings. It
was only ever used to _increase_ the timeout to give LLMs more time;
connections now time out after 10 seconds, but LLMs get as long as they
need to complete their response.
---------
Co-authored-by: Antonio <antonio@zed.dev>
Co-authored-by: Peter Tripp <peter@zed.dev>
This PR updates the usage limit check to exempt Zed staff members from
usage limits.
We previously had some affordances for the rate limits, but hadn't yet
updated them for usage-based billing.
Release Notes:
- N/A
This PR removes the conditional checks around the billing-related
enforcement for LLM completions.
These were just in place to prevent executing any billing code before we
had rolled it out. Now that it is rolled out, we don't need this
conditional execution anymore.
Release Notes:
- N/A
This PR removes the lifetime spending limit that was added in #16780.
We had previously added this as a way to prevent runaway usage, but now
that we have a cap on free usage per month with paid access after that,
we don't need this check anymore.
Release Notes:
- N/A
This PR adjusts the billing logic to not write any records to
`billing_events` (a sketch of this guard follows the list) if:
- The user is staff, as we don't want to bill staff members
- Billing is disabled (we currently enable billing based on the presence
of the Stripe API key)
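A minimal sketch of the guard (names are illustrative):

```rust
fn should_record_billing_event(is_staff: bool, stripe_api_key: Option<&str>) -> bool {
    // Billing is considered enabled only when a Stripe API key is present.
    let billing_enabled = stripe_api_key.is_some();
    !is_staff && billing_enabled
}
```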
Release Notes:
- N/A
This PR adds usage-based billing for LLM interactions in the Assistant.
Release Notes:
- N/A
---------
Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Antonio <antonio@zed.dev>
Co-authored-by: Richard <richard@zed.dev>
Co-authored-by: Richard Feldman <oss@rtfeldman.com>
This PR makes the `has_llm_subscription` and
`max_monthly_spend_in_cents` fields in the `LlmTokenClaims` required.
This change will be safe to deploy in ~45 minutes.
Release Notes:
- N/A
This PR adds a new `billing_preferences` table.
Right now there is a single preference: the maximum monthly spend for
LLM usage.
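A hypothetical sketch of the row shape (the real table is created by a
database migration; names here are illustrative):

```rust
struct BillingPreferences {
    id: i32,
    user_id: i32,
    // The single preference for now:
    max_monthly_llm_usage_spending_in_cents: i32,
}
```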
Release Notes:
- N/A
---------
Co-authored-by: Richard <richard@zed.dev>
This PR renames the `MONTHLY_SPENDING_LIMIT` constant to
`FREE_TIER_MONTHLY_SPENDING_LIMIT` to clarify its purpose.
This will help distinguish it from the user's specified limit on their
paid monthly spending.
Release Notes:
- N/A
This PR adds a new `Cents` type that can be used to represent a monetary
value in cents.
This cuts down on the primitive obsession in the billing code when
dealing with money.
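A sketch of the idea behind the type (details are illustrative):

```rust
// A newtype keeps monetary values from being confused with other integers.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Cents(pub u32);

impl Cents {
    pub const ZERO: Cents = Cents(0);

    pub fn from_dollars(dollars: u32) -> Self {
        Cents(dollars * 100)
    }
}
```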
Release Notes:
- N/A
This PR reworks our existing billing code in preparation for charging
based on LLM usage.
We aren't yet exercising the new billing-related code outside of
development.
There are some noteworthy changes for our existing LLM usage tracking:
- A new `monthly_usages` table has been added for tracking usage
per-user, per-model, per-month
- The per-month usage measures have been removed, in favor of the
`monthly_usages` table
- All of the per-month metrics in the ClickHouse rows have been changed
from a rolling 30-day window to a calendar month (see the sketch after
this list)
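A hypothetical sketch of the new per-user, per-model, per-month
bucketing:

```rust
struct MonthlyUsageKey {
    user_id: i32,
    model_id: i32,
    year: i32, // calendar year...
    month: u8, // ...and month (1-12) replace the rolling 30-day window
}
```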
Release Notes:
- N/A
---------
Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Richard <richard@zed.dev>
Co-authored-by: Max <max@zed.dev>
This PR extends the LLM usage tracking to support tracking usage for
cache writes and reads for Anthropic models.
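A sketch of the extended usage record; Anthropic reports cache activity
as separate token counts, and the field names below mirror that, but the
actual struct may differ:

```rust
#[derive(Default)]
struct TokenUsage {
    input_tokens: usize,
    output_tokens: usize,
    cache_creation_input_tokens: usize, // new: tokens written to the cache
    cache_read_input_tokens: usize,     // new: tokens served from the cache
}
```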
Release Notes:
- N/A
---------
Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Antonio <antonio@zed.dev>
Replace `isahc` with `ureq` everywhere `gpui` is used.
This should allow us to make HTTP requests without libssl, and avoid a
long tail of panics caused by `isahc`.
Release Notes:
- (potentially breaking change) updated our http client
---------
Co-authored-by: Mikayla <mikayla@zed.dev>
Add `/auto` behind a feature flag that's disabled for now, even for
staff.
We've decided on a different design for context inference, but there are
parts of /auto that will be useful for that, so we want them in the code
base even if they're unused for now.
Release Notes:
- N/A
---------
Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Marshall Bowers <elliott.codes@gmail.com>
This PR adds a `GET /models` endpoint to the LLM service.
This endpoint returns the models that the authenticated user has access
to.
This is the first step towards populating the models for the hosted
service from the server.
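A hypothetical sketch of the response shape (the actual schema may
differ):

```rust
#[derive(serde::Serialize)]
struct ListModelsResponse {
    models: Vec<ModelInfo>,
}

#[derive(serde::Serialize)]
struct ModelInfo {
    provider: String, // e.g. "anthropic"
    name: String,     // e.g. "claude-3-5-sonnet"
}
```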
Release Notes:
- N/A