Commit graph

79 commits

Author SHA1 Message Date
Marshall Bowers
9bd3dbcf28
collab: Include more information on some LLM usage log lines (#28116)
This PR updates the `user rate limit` and `user usage` log lines to
include some more information that will be useful for graphing in Axiom.

Release Notes:

- N/A
2025-04-04 18:33:23 +00:00
Marshall Bowers
558d61b907
collab: Adjust rate-limiting measures for Claude 3.7 Sonnet (#28111)
This PR updates the usage measures used for rate limiting when using
Claude 3.7 Sonnet.

Instead of using the combined `tokens_per_minute` measure we now rate
limit individually on `input_tokens_per_minute` (which exclude cache
reads) and `output_tokens_per_minute`.

Release Notes:

- N/A
2025-04-04 13:37:24 -04:00
Marshall Bowers
1a899fda60
collab: Capture upstream input/output rate limits from Anthropic (#28106)
This PR makes it so we capture the upstream rate limit information from
Anthropic for input and output tokens.

Release Notes:

- N/A
2025-04-04 17:09:00 +00:00
Marshall Bowers
183f57f318
collab: Include max input/output tokens per minute on "Language Model Rate Limited" event (#28108)
This PR adds the max input/output tokens per minute on the "Language
Model Rate Limited" event.

Missed this in https://github.com/zed-industries/zed/pull/28097.

Release Notes:

- N/A
2025-04-04 16:57:43 +00:00
Marshall Bowers
5fe86f7e70
collab: Track input and output tokens per minute separately (#28097)
This PR adds tracking for input and output tokens per minute separately
from the current aggregate tokens per minute.

We are not yet rate-limiting based on these measures.

Release Notes:

- N/A
2025-04-04 15:37:06 +00:00
Piotr Osiewicz
dc64ec9cc8
chore: Bump Rust edition to 2024 (#27800)
Follow-up to https://github.com/zed-industries/zed/pull/27791

Release Notes:

- N/A
2025-03-31 20:55:27 +02:00
Marshall Bowers
cd5d7e82d0
collab: Make account age-related fields required in LlmTokenClaims (#26959)
This PR makes the account age-related fields required in
`LlmTokenClaims`.

We've also removed the account age check from the LLM token issuance
endpoint, instead having it solely be enforced in the `POST /completion`
endpoint.

This change will be safe to deploy at ~8:01PM EDT.

Release Notes:

- N/A
2025-03-17 19:54:44 -04:00
Marshall Bowers
0851842d2c
collab: Defer account age check to POST /completion endpoint (#26956)
This PR defers the account age check to the `POST /completion` endpoint
instead of doing it when an LLM token is generated.

This will allow us to lift the account age restriction for using Edit
Prediction.

Note: We're still temporarily performing the account age check when
issuing the LLM token until this change is deployed and the LLM tokens
have had a chance to cycle.

Release Notes:

- N/A
2025-03-17 22:42:29 +00:00
Marshall Bowers
7f166298db
collab: Adjust maximum spending limit check (#25596)
This is a follow-up to https://github.com/zed-industries/zed/pull/25573.

We were still using the spend for a particular model when determining if
the user was over their maximum monthly spend instead of looking at the
usage across all models.

Release Notes:

- N/A
2025-02-25 16:45:01 -05:00
Marshall Bowers
3a3621f2d8
collab: Limit free tier usage across all models (#25573)
This PR adjusts the usage checks for the LLM free tier.

Previously we would limit the usage on a per-model basis, meaning the
user would get $10/mo free for each model they had access to.

We now have usage for all models count towards the free tier limit.

Release Notes:

- N/A
2025-02-25 16:42:55 +00:00
Peter Tripp
10a4760f90
Add Anthropic Claude 3.7 support (#25497) 2025-02-24 16:10:26 -05:00
Marshall Bowers
8be73bf187
collab: Remove unused POST /predict_edits endpoint from LLM service (#23997)
This PR removes the `POST /predict_edits` endpoint from the LLM service,
as it has been superseded by the corresponding endpoint running in
Cloudflare Workers.

All traffic is already being routed to the Cloudflare Workers via the
Workers route, so nothing is hitting this endpoint running in the LLM
service anymore.

You can see the drop off in requests to this endpoint on this graph when
the Workers route was added:

<img width="472" alt="Screenshot 2025-01-30 at 9 18 04 PM"
src="https://github.com/user-attachments/assets/fa60f7c8-2737-4329-88a3-17093bdb5a29"
/>

We also don't use the `fireworks` crate anymore in this repo, so it has
been removed.

Release Notes:

- N/A
2025-01-31 03:21:40 +00:00
Michael Sloan
87b0f62041
Implement simpler logic for edit predictions prompt byte limits (#23983)
Realized that the logic in #23814 was more than needed, and harder to
maintain. Something like that could make sense if using the tokenizer
and wanting to precisely hit a token limit. However in the case of edit
predictions it's more of a latency+expense vs capability tradeoff, and
so such precision is unnecessary.

Happily this change didn't require much extra work, just copy-modifying
parts of that change was sufficient.

Release Notes:

- N/A
2025-01-30 15:27:42 -07:00
Agus Zubiaga
e23e03592b
zeta: Onboarding and title bar banner (#23797)
Release Notes:

- N/A

---------

Co-authored-by: Danilo Leal <daniloleal09@gmail.com>
Co-authored-by: Danilo <danilo@zed.dev>
Co-authored-by: João Marcos <joao@zed.dev>
2025-01-30 16:55:32 -03:00
Thorsten Ball
0cb41754e2
llm: Sample ~10% of staff members inputs/outputs to LLM (#23537)
Release Notes:

- N/A
2025-01-23 15:32:25 +01:00
Antonio Scandurra
880f3ff243
Timeout if completion takes longer than 2s (#23215)
Release Notes:

- N/A
2025-01-16 11:13:25 +01:00
Agus Zubiaga
4a7630204a
Check for predict-edits feature flag, remove is_staff check (#23165)
Release Notes:

- N/A

---------

Co-authored-by: Thorsten Ball <mrnugget@gmail.com>
2025-01-15 13:52:10 +00:00
Antonio Scandurra
c26553de82
Add more metrics for Fireworks Completion Requested (#23062)
Release Notes:

- N/A

Co-authored-by: Thorsten <thorsten@zed.dev>
2025-01-13 12:04:28 +00:00
Thorsten Ball
1fcc9b36ba
zeta: Report Fireworks request data to Snowflake (#22973)
Release Notes:

- N/A

---------

Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Conrad <conrad@zed.dev>
2025-01-10 22:40:54 +00:00
Antonio Scandurra
c3301077af
Log errors when a prediction fails (#22961)
Release Notes:

- N/A

---------

Co-authored-by: Thorsten <thorsten@zed.dev>
2025-01-10 14:07:17 +00:00
Antonio Scandurra
a8ef0f2426
Include outline when predicting edits with Zeta (#22895)
Release Notes:

- N/A

Co-authored-by: Thorsten <thorsten@zed.dev>
2025-01-09 14:26:33 +00:00
Conrad Irwin
03efd0d1d9
Stop sending data to Clickhouse (#21763)
Release Notes:

- N/A
2024-12-10 08:47:29 -07:00
Marshall Bowers
158cdc33ba
collab: Attach additional properties to Language Model Used event (#21770)
This PR attaches two new properties to the `Language Model Used` event:

- `has_llm_subscription` - This will tell us if a user is a paid
subscriber.
- `max_monthly_spend_in_cents` - This will indicate what their maximum
monthly spend is set to.

Release Notes:

- N/A
2024-12-09 17:13:41 -05:00
Antonio Scandurra
77b8296fbb
Introduce staff-only inline completion provider (#21739)
Release Notes:

- N/A

---------

Co-authored-by: Thorsten Ball <mrnugget@gmail.com>
Co-authored-by: Bennet <bennet@zed.dev>
Co-authored-by: Thorsten <thorsten@zed.dev>
2024-12-09 14:26:36 +01:00
Conrad Irwin
984bb192ba
Send llm events to snowflake too (#21091)
Closes #ISSUE

Release Notes:

- N/A
2024-11-22 20:40:39 -07:00
Thorsten Ball
aee01f2c50
assistant: Remove low_speed_timeout (#20681)
This removes the `low_speed_timeout` setting from all providers as a
response to issue #19509.

Reason being that the original `low_speed_timeout` was only as part of
#9913 because users wanted to _get rid of timeouts_. They wanted to bump
the default timeout from 5sec to a lot more.

Then, in the meantime, the meaning of `low_speed_timeout` changed in
#19055 and was changed to a normal `timeout`, which is a different thing
and breaks slower LLMs that don't reply with a complete response in the
configured timeout.

So we figured: let's remove the whole thing and replace it with a
default _connect_ timeout to make sure that we can connect to a server
in 10s, but then give the server as long as it wants to complete its
response.

Closes #19509

Release Notes:

- Removed the `low_speed_timeout` setting from LLM provider settings,
since it was only used to _increase_ the timeout to give LLMs more time,
but since we don't have any other use for it, we simply remove the
setting to give LLMs as long as they need.

---------

Co-authored-by: Antonio <antonio@zed.dev>
Co-authored-by: Peter Tripp <peter@zed.dev>
2024-11-15 07:37:31 +01:00
Marshall Bowers
a451bcc3c4
collab: Exempt staff from LLM usage limits (#19836)
This PR updates the usage limit check to exempt Zed staff members from
usage limits.

We previously had some affordances for the rate limits, but hadn't yet
updated it for the usage-based billing.

Release Notes:

- N/A
2024-10-28 11:45:18 -04:00
Marshall Bowers
1a4b253ee5
collab: Add support for a custom monthly allowance for LLM usage (#19525)
This PR adds support for setting a monthly LLM usage allowance for
certain users.

Release Notes:

- N/A
2024-10-21 17:12:33 -04:00
Marshall Bowers
b44bed0115
collab: Unconditionally execute billing checks (#19432)
This PR removes the conditional checks around the billing-related
enforcement for LLM completions.

These were just in place to prevent executing any billing code before we
had rolled it out. Now that it is rolled out, we don't need this
conditional execution anymore.

Release Notes:

- N/A
2024-10-18 15:55:28 -04:00
Antonio Scandurra
8c910540ed
Subtract FREE_TIER_MONTHLY_SPENDING_LIMIT from reported monthly spend (#19358)
Release Notes:

- N/A
2024-10-17 13:09:50 +02:00
Marshall Bowers
f6fad3b09e
collab: Remove lifetime spending limit in favor of LLM usage billing (#19321)
This PR removes the lifetime spending limit that was added in #16780.

We had previously added this as a way to prevent runaway usage, but now
that we have a cap on free usage per month with paid access after that,
we don't need this check anymore.

Release Notes:

- N/A
2024-10-16 18:14:07 -04:00
Antonio Scandurra
474e670bbd
Increase monthly free tier spend from 5 dollars to 10 dollars (#19291)
Release Notes:

- N/A

Co-authored-by: Marshall <marshall@zed.dev>
Co-authored-by: Richard <richard@zed.dev>
2024-10-16 12:22:24 -04:00
Mikayla Maki
22ac178f9d
Restore HTTP client transition, but use reqwest everywhere (#19055)
Release Notes:

- N/A
2024-10-11 14:58:58 -07:00
Marshall Bowers
c709b66f35
collab: Don't record billing events if billing is not enabled (#19102)
This PR adjusts the billing logic to not write any records to
`billing_events` if:

- The user is staff, as we don't want to bill staff members
- Billing is disabled (we currently enable billing based on the presence
of the Stripe API key)

Release Notes:

- N/A
2024-10-11 17:54:10 -04:00
Marshall Bowers
22ea7cef7a
collab: Add usage-based billing for LLM interactions (#19081)
This PR adds usage-based billing for LLM interactions in the Assistant.

Release Notes:

- N/A

---------

Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Antonio <antonio@zed.dev>
Co-authored-by: Richard <richard@zed.dev>
Co-authored-by: Richard Feldman <oss@rtfeldman.com>
2024-10-11 13:36:54 -04:00
Marshall Bowers
69711660ab
collab: Make LLM billing fields required in LlmTokenClaims (#18959)
This PR makes the `has_llm_subscription` and
`max_monthly_spend_in_cents` fields in the `LlmTokenClaims` required.

This change will be safe to deploy in ~45 minutes.

Release Notes:

- N/A
2024-10-09 18:42:22 -04:00
Marshall Bowers
d316577fd5
collab: Add billing preferences for maximum LLM monthly spend (#18948)
This PR adds a new `billing_preferences` table.

Right now there is a single preference: the maximum monthly spend for
LLM usage.

Release Notes:

- N/A

---------

Co-authored-by: Richard <richard@zed.dev>
2024-10-09 16:29:07 -04:00
Marshall Bowers
f1053ff525
collab: Clarify naming around free tier spending limits (#18936)
This PR renames the `MONTHLY_SPENDING_LIMIT` constant to
`FREE_TIER_MONTHLY_SPENDING_LIMIT` to clarify it.

This will help distinguish it from the user's specified limit on their
paid monthly spending.

Release Notes:

- N/A
2024-10-09 15:05:53 -04:00
Marshall Bowers
817a41c4dc
collab: Add a Cents type (#18935)
This PR adds a new `Cents` type that can be used to represent a monetary
value in cents.

This cuts down on the primitive obsession we were using when dealing
with money in the billing code.

Release Notes:

- N/A
2024-10-09 14:22:32 -04:00
Mikayla Maki
5d5c4b6677
Revert http client changes (#18892)
These proved to be too unstable. Will restore these changes once the issues have been fixed.

Release Notes:

- N/A
2024-10-09 01:07:18 -07:00
Marshall Bowers
f861479890
collab: Update billing code for LLM usage billing (#18879)
This PR reworks our existing billing code in preparation for charging
based on LLM usage.

We aren't yet exercising the new billing-related code outside of
development.

There are some noteworthy changes for our existing LLM usage tracking:

- A new `monthly_usages` table has been added for tracking usage
per-user, per-model, per-month
- The per-month usage measures have been removed, in favor of the
`monthly_usages` table
- All of the per-month metrics in the Clickhouse rows have been changed
from a rolling 30-day window to a calendar month

Release Notes:

- N/A

---------

Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Richard <richard@zed.dev>
Co-authored-by: Max <max@zed.dev>
2024-10-08 18:29:38 -04:00
Marshall Bowers
d55f025906
collab: Track cache writes/reads in LLM usage (#18834)
This PR extends the LLM usage tracking to support tracking usage for
cache writes and reads for Anthropic models.

Release Notes:

- N/A

---------

Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Antonio <antonio@zed.dev>
2024-10-07 17:32:49 -04:00
Conrad Irwin
3a5deb5c6f
Replace isahc with async ureq (#18414)
REplace isahc with ureq everywhere gpui is used.

This should allow us to make http requests without libssl; and avoid a
long-tail of panics caused by ishac.

Release Notes:

- (potentially breaking change) updated our http client

---------

Co-authored-by: Mikayla <mikayla@zed.dev>
2024-10-02 12:30:48 -07:00
Richard Feldman
caaa9a00a9
Remove Qwen2 model (#18444)
Removed deprecated Qwen2 7B Instruct model from zed.dev provider (staff
only).

Release Notes:

- N/A
2024-09-27 13:30:25 -04:00
Piotr Osiewicz
2c8a6ee7cc
remote_server: Remove dependency on libssl and libcrypto (#15446)
Fixes: #15599
Release Notes:

- N/A

---------

Co-authored-by: Mikayla <mikayla@zed.dev>
Co-authored-by: Conrad <conrad@zed.dev>
2024-09-18 23:29:34 +02:00
Richard Feldman
91ffa02e2c
/auto (#16696)
Add `/auto` behind a feature flag that's disabled for now, even for
staff.

We've decided on a different design for context inference, but there are
parts of /auto that will be useful for that, so we want them in the code
base even if they're unused for now.

Release Notes:

- N/A

---------

Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Marshall Bowers <elliott.codes@gmail.com>
2024-09-13 13:17:49 -04:00
Piotr Osiewicz
e6c1c51b37
chore: Fix several style lints (#17488)
It's not comprehensive enough to start linting on `style` group, but
hey, it's a start.

Release Notes:

- N/A
2024-09-06 11:58:39 +02:00
Bennet Bo Fenner
f413ea90bf
assistant: Fix Google AI provider not respecting low_speed_timeout_in_seconds (#17423)
Release Notes:

- Fixed an issue when using Google Gemini models, where the setting
`low_speed_timeout_in_seconds` was not respected
2024-09-05 18:16:30 +02:00
Marshall Bowers
30056254f3
collab: Add GET /models endpoint to LLM service (#17307)
This PR adds a `GET /models` endpoint to the LLM service.

This endpoint returns the models that the authenticated user has access
to.

This is the first step towards populating the models for the hosted
service from the server.

Release Notes:

- N/A
2024-09-03 11:41:32 -04:00
Marshall Bowers
d666cc5fba
collab: Report when upstream rate limit is exceeded (#17083)
This PR makes it so we report a trace when the upstream rate limit is
exceeded.

Release Notes:

- N/A
2024-08-29 08:54:45 -04:00