Yehowshua/ZIm - Forgejo: Beyond coding. We Forge.

Author	SHA1	Message	Date
Oleksiy Syvokon	3884de937b	assistant: Partial fix for HTML entities in tools params (#32148 ) This problem seems to be specific to Opus 4. Eval shows improvement from 89% to 97%. Closes: https://github.com/zed-industries/zed/issues/32060 Release Notes: - N/A Co-authored-by: Ben Brandt <benjamin.j.brandt@gmail.com>	2025-06-05 10:36:55 +00:00
Danilo Leal	63c1033448	agent: Generate a notification when reaching tool use limit (#31894 ) When reaching the consecutive tool call limit, the agent gets blocked and without a notification, you wouldn't know that. This PR adds the ability to be notified when that happens, and you can use either sound _and_ toast, or just one of them. Release Notes: - agent: Added support for getting notified (via toast and/or sound) when reaching the consecutive tool call limit.	2025-06-02 21:57:42 -03:00
Marshall Bowers	a23ee61a4b	Pass up intent with completion requests (#31710 ) This PR adds a new `intent` field to completion requests to assist in categorizing them correctly. Release Notes: - N/A --------- Co-authored-by: Ben Brandt <benjamin.j.brandt@gmail.com>	2025-05-29 20:43:12 +00:00
Oleksiy Syvokon	cb187b0b4d	evals: Configurable number of max dialog turns (#31680 ) Release Notes: - N/A	2025-05-29 10:35:29 +00:00
Richard Feldman	00fd045844	Make language model deserialization more resilient (#31311 ) This expands our deserialization of JSON from models to be more tolerant of different variations that the model may send, including capitalization, wrapping things in objects vs. being plain strings, etc. Also when deserialization fails, it reports the entire error in the JSON so we can see what failed to deserialize. (Previously these errors were very unhelpful at diagnosing the problem.) Finally, also removes the `WrappedText` variant since the custom deserializer just turns that style of JSON into a normal `Text` variant. Release Notes: - N/A	2025-05-28 12:06:07 -04:00
Marshall Bowers	8faeb34367	Rename `assistant_settings` to `agent_settings` (#31513 ) This PR renames the `assistant_settings` crate to `agent_settings`, as well as a number of constructs within it. Release Notes: - N/A	2025-05-27 15:16:55 +00:00
Oleksiy Syvokon	61a40e293d	evals: Allow threads explorer to search for JSON files recursively (#31509 ) It's just more convenient to call it from CLI this way. + minor fixes in evals Release Notes: - N/A	2025-05-27 14:18:47 +00:00
Joseph T. Lyons	c208532693	Use read-only access methods for read-only entity operations (#31479 ) Another follow-up to #31254 Release Notes: - N/A	2025-05-26 23:04:31 -04:00
Oleksiy Syvokon	68a46c3627	evals: Configurable judge model (#31282 ) This is needed for apples-to-apples comparison of different agent models. Another change is that now `cargo -p eval` accepts model names as `provider_id/model_id` instead of separate `--provider` and `--model` params. Release Notes: - N/A	2025-05-23 15:03:09 +00:00
Marshall Bowers	cb52acbf3d	eval: Don't read the model from the user settings (#31230 ) This PR fixes an issue where the eval was incorrectly pulling the provider/model from the user settings, which could cause problems when running certain evals. Was introduced in #30168 due to the restructuring after the removal of the `assistant` crate. Release Notes: - N/A	2025-05-23 00:21:35 +00:00
Marshall Bowers	5c0b161563	Handle new `refusal` stop reason from Claude 4 models (#31217 ) This PR adds support for handling the new [`refusal` stop reason](https://docs.anthropic.com/en/docs/test-and-evaluate/strengthen-guardrails/handle-streaming-refusals) from Claude 4 models. <img width="409" alt="Screenshot 2025-05-22 at 4 31 56 PM" src="https://github.com/user-attachments/assets/707b04f5-5a52-4a19-95d9-cbd2be2dd86f" /> Release Notes: - Added handling for `"stop_reason": "refusal"` from Claude 4 models.	2025-05-22 16:56:59 -04:00
Piotr Osiewicz	77dadfedfe	chore: Make terminal_view own the TerminalSlashCommand (#31070 ) This reduces 'touch crates/editor/src/editor.rs && cargo +nightly build' from 8.9s to 8.5s. That same scenario used to take 8s less than a week ago. :) I'm measuring with nightly rustc, because its compile times are better than those of stable thanks to https://github.com/rust-lang/rust/pull/138522 main (8.2s total): ![image](https://github.com/user-attachments/assets/767a2ac4-7bba-4147-bd16-9b09eed5b433) [cargo-timing.html.zip](https://github.com/user-attachments/files/20364175/cargo-timing.html.zip) #`22be776` (7.5s total): [cargo-timing-20250521T085303.892834Z.html.zip](https://github.com/user-attachments/files/20364391/cargo-timing-20250521T085303.892834Z.html.zip) ![image](https://github.com/user-attachments/assets/c4476df9-cb6e-4403-b0db-de00521f1fd0) Release Notes: - N/A	2025-05-21 09:27:54 +00:00
Kirill Bulatov	16366cf9f2	Use `anyhow` more idiomatically (#31052 ) https://github.com/zed-industries/zed/issues/30972 brought up another case where our context is not enough to track the actual source of the issue: we get a general top-level error without inner error. The reason for this was `.ok_or_else(|| anyhow!("failed to read HEAD SHA"))?; ` on the top level. The PR finally reworks the way we use anyhow to reduce such issues (or at least make it simpler to bubble them up later in a fix). On top of that, uses a few more anyhow methods for better readability. * `.ok_or_else(|| anyhow!("..."))`, `map_err` and other similar error conversion/option reporting cases are replaced with `context` and `with_context` calls * in addition to that, various `anyhow!("failed to do ...")` messages are replaced with `.context("Doing ...")` messages to remove the parasitic `failed to` text * `anyhow::ensure!` is used instead of `if ... { return Err(...); }` calls * `anyhow::bail!` is used instead of `return Err(anyhow!(...));` Release Notes: - N/A	2025-05-20 23:06:07 +00:00
Richard Feldman	4bb04cef9d	Accept wrapped text content from LLM providers (#31048 ) Some providers sometimes send `{ "type": "text", "text": ... }` instead of just the text as a string. Now we accept those instead of erroring. Release Notes: - N/A	2025-05-20 20:50:02 +00:00
Piotr Osiewicz	a092e2dc03	extension: Add debug_adapters to extension manifest (#30676 ) Also pass worktree to the get_dap_binary. Release Notes: - N/A	2025-05-20 11:01:33 +02:00
Oleksiy Syvokon	6420df3975	eval: Count execution errors as failures (#30712 ) - Evals returning an error (e.g., LLM API format mismatch) were silently skipped in the aggregated results. Now we count them as a failure (0% success score). - Setting the `VERBOSE` environment variable to something non-empty disables string truncation Release Notes: - N/A	2025-05-14 20:44:19 +03:00
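The aggregation change described in 6420df3975 can be sketched as follows. This is a minimal illustration using only the standard library, not the eval crate's actual code: the `EvalOutcome` alias and the `mean_score` function are hypothetical names, and scores are assumed to lie between 0.0 and 1.0.

```rust
/// Outcome of one eval run: a score in 0.0..=1.0, or an execution error
/// (for example an LLM API format mismatch). Hypothetical type alias.
type EvalOutcome = Result<f64, String>;

/// Mean score over all runs. An errored run contributes 0.0 instead of
/// being skipped, so it lowers the aggregate like any other failure.
fn mean_score(outcomes: &[EvalOutcome]) -> f64 {
    if outcomes.is_empty() {
        return 0.0;
    }
    let total: f64 = outcomes
        .iter()
        .map(|outcome| match outcome {
            Ok(score) => *score,
            Err(_) => 0.0, // execution error counts as a 0% run
        })
        .sum();
    total / outcomes.len() as f64
}
```

Dividing by the total number of runs, rather than by the number of successful runs, is what keeps errors from silently inflating the result.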
Oleksiy Syvokon	255d8f7cf8	agent: Overwrite files more cautiously (#30649 ) 1. The `edit_file` tool tended to use `create_or_overwrite` a bit too often, leading to corruption of long files. This change replaces the boolean flag with an `EditFileMode` enum, which helps Agent make a more deliberate choice when overwriting files. With this change, the pass rate of the new eval increased from 10% to 100%. 2. eval: Added ability to run eval on top of an existing thread. Threads can now be loaded from JSON files in the `SerializedThread` format, which makes it easy to use real threads as starting points for tests/evals. 3. Don't try to restore tool cards when running in headless or eval mode -- we don't have a window to properly do this. Release Notes: - N/A	2025-05-14 10:40:44 +03:00
Richard Feldman	8fdf309a4a	Have read_file support images (#30435 ) This is very basic support for them. There are a number of other TODOs before this is really a first-class supported feature, so not adding any release notes for it; for now, this PR just makes it so that if read_file tries to read a PNG (which has come up in practice), it at least correctly sends it to Anthropic instead of messing up. This also lays the groundwork for future PRs for more first-class support for images in tool calls across more image file formats and LLM providers. Release Notes: - N/A --------- Co-authored-by: Agus Zubiaga <hi@aguz.me> Co-authored-by: Agus Zubiaga <agus@zed.dev>	2025-05-13 10:58:00 +02:00
Richard Feldman	49887d6934	Add no_tools_enabled eval (#30537 ) This is our first eval of the Minimal tool profile. Right now they're all passing; the value of having it is to catch regressions in the system prompt (which has special logic in it for the case where no tools are enabled). Release Notes: - N/A	2025-05-12 08:52:03 +00:00
Max Brunsfeld	65b13968a2	Wait to locate system-installed Node until the shell environment is loaded (#30416 ) Release Notes: - Fixed a race condition that sometimes prevented a system-installed `node` binary from being detected. - Fixed a bug where the `node.path` setting was not respected when invoking npm.	2025-05-09 19:24:28 +00:00
Antonio Scandurra	1b593f616f	Include `EditAgent`'s raw output when inspecting thread (#30337 ) This allows us to debug the raw edits that were generated when people report feedback, when running evals and when opening the thread as Markdown. Release Notes: - Improved debug output for agent threads.	2025-05-09 06:58:45 +00:00
Antonio Scandurra	9f6809a28d	Reuse conversation cache when streaming edits (#30245 ) Release Notes: - Improved latency when the agent applies edits.	2025-05-08 14:36:34 +02:00
Antonio Scandurra	89430a019c	Fix agent reading and editing files over SSH (#30144 ) Release Notes: - Fixed a bug that would prevent the agent from working over SSH. --------- Co-authored-by: Nathan Sobo <nathan@zed.dev> Co-authored-by: Richard Feldman <oss@rtfeldman.com> Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com> Co-authored-by: Cole Miller <m@cole-miller.net>	2025-05-07 17:07:01 +00:00
Oleksiy Syvokon	ac007139ab	evals: Enable Python LSP (#29987 ) We now have one eval that uses a Python repo Release Notes: - N/A	2025-05-06 10:28:59 +00:00
Cole Miller	c12e6376b8	Terminal tool improvements (#29924 ) WIP - On macOS/Linux, run the command in bash instead of the user's shell - Try to prevent the agent from running commands that expect interaction Release Notes: - Agent Beta: Switched to using `bash` (if available) instead of the user's shell when calling the terminal tool. - Agent Beta: Prevented the agent from hanging when trying to run interactive commands. --------- Co-authored-by: WeetHet <stas.ale66@gmail.com>	2025-05-05 15:57:03 -04:00
Bennet Bo Fenner	9cb5ffac25	context_store: Refactor state management (#29910 ) Because we instantiated `ContextServerManager` both in `agent` and `assistant-context-editor`, and these two entities track the running MCP servers separately, we were effectively running every MCP server twice. This PR moves the `ContextServerManager` into the project crate (now called `ContextServerStore`). The store can be accessed via a project instance. This ensures that we only instantiate one `ContextServerStore` per project. Also, this PR adds a bunch of tests to ensure that the `ContextServerStore` behaves correctly (Previously there were none). Closes #28714 Closes #29530 Release Notes: - N/A	2025-05-05 21:36:12 +02:00
Oleksiy Syvokon	8199664a5a	agent: Handle attempts to use hallucinated tools (#29946 ) This change: 1. Catches attempts to use missing tools. If this happens, we now send Agent a message listing available tools, after which Agent can gracefully recover. Prior behavior: thread would stop in a broken state. Example of a hallucinated call and a message we send back: ![image](https://github.com/user-attachments/assets/92a8f700-b192-4038-8c7e-0a74ca2e0146) 2. Adds evals for hallucinated tool use and imagined edits 3. Adds ability to configure a profile name in evals. Release Notes: - N/A	2025-05-05 19:31:11 +00:00
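The recovery path described in 8199664a5a (reply to a hallucinated tool call with the list of real tools) might look roughly like this. It is a sketch under assumptions: `resolve_tool`, the name-to-description map, and the wording of the message are all hypothetical and are not taken from the agent crate.

```rust
use std::collections::BTreeMap;

/// Looks up a tool by name in a hypothetical registry mapping tool names to
/// descriptions. On a miss, returns a message listing the available tools,
/// which the caller can send back to the model so it can retry.
fn resolve_tool<'a>(
    tools: &'a BTreeMap<String, String>,
    requested: &str,
) -> Result<&'a str, String> {
    match tools.get(requested) {
        Some(description) => Ok(description.as_str()),
        None => {
            // BTreeMap iterates in key order, so the listing is deterministic.
            let available: Vec<&str> = tools.keys().map(String::as_str).collect();
            Err(format!(
                "No tool named `{requested}` exists. Available tools: {}",
                available.join(", ")
            ))
        }
    }
}
```

Returning the message as an `Err` value rather than aborting keeps the thread alive, which matches the commit's stated goal of letting the agent recover gracefully.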
Marshall Bowers	3db4744e18	agent: Remove unneeded tracking of request usage (#29894 ) This PR removes some unneeded tracking of the model request usage in the `ActiveThread` and `ThreadEvent::UsageUpdated` events. Release Notes: - N/A	2025-05-05 01:16:53 +00:00
Michael Sloan	bb82d9ca82	agent eval: Fix `--model` arg and add `--provider` (#29883 ) Release Notes: - N/A	2025-05-04 13:43:57 -06:00
Max Brunsfeld	c3d9cdecab	Change cloud language model provider JSON protocol to surface errors and usage information (#29830 ) Release Notes: - N/A --------- Co-authored-by: Nathan Sobo <nathan@zed.dev> Co-authored-by: Marshall Bowers <git@maxdeviant.com>	2025-05-04 17:37:42 +00:00
Antonio Scandurra	4d51602e7b	Encourage editing over re-creating a file from scratch (#29870 ) I also introduced a new eval to prove the encouragement actually makes a difference. Release Notes: - Improved agent behavior when streaming edits, encouraging it to edit files as opposed to creating them from scratch	2025-05-04 13:18:28 +00:00
Agus Zubiaga	64316309aa	agent: Review edits in single-file editors (#29820 ) Enables reviewing agent edits from single-file editors in addition to the multibuffer experience we already had. https://github.com/user-attachments/assets/a2c287f0-51d6-43a1-8537-821498b91983 This feature can be turned off by setting `assistant.single_file_review: false`. Release Notes: - agent: Review edits in single-file editors	2025-05-02 17:57:16 -03:00
Max Brunsfeld	04772bf17d	Add support for queuing status updates in cloud language model provider (#29818 ) This sets us up to display queue position information to the user, once our language model backend is updated to support request queuing. The JSON returned by the LLM backend will need to look like this: ```json {"queue": {"status": "queued", "position": 1}} {"queue": {"status": "started"}} {"event": {"THE_UPSTREAM_MODEL_PROVIDER_EVENT": "..."}} ``` Release Notes: - N/A --------- Co-authored-by: Marshall Bowers <git@maxdeviant.com>	2025-05-02 20:36:39 +00:00
Cole Miller	9547d42b15	Support @-mentions in inline assists and when editing old agent panel messages (#29734 ) Closes #ISSUE Co-authored-by: Bennet <bennet@zed.dev> Release Notes: - Added support for context `@mentions` in the inline prompt editor and when editing past messages in the agent panel. --------- Co-authored-by: Bennet Bo Fenner <bennet@zed.dev> Co-authored-by: Ben Brandt <benjamin.j.brandt@gmail.com>	2025-05-02 20:08:53 +00:00
Richard Feldman	9efc09c5a6	Add eval for open_tool (#29801 ) Also have its description say it should only be used on request Release Notes: - N/A	2025-05-02 15:56:07 +00:00
Bennet Bo Fenner	24eb039752	context servers: Show configuration modal when extension is installed (#29309 ) WIP Release Notes: - N/A --------- Co-authored-by: Danilo Leal <67129314+danilo-leal@users.noreply.github.com> Co-authored-by: Danilo Leal <daniloleal09@gmail.com> Co-authored-by: Marshall Bowers <git@maxdeviant.com> Co-authored-by: Cole Miller <m@cole-miller.net> Co-authored-by: Antonio Scandurra <me@as-cii.com> Co-authored-by: Oleksiy Syvokon <oleksiy.syvokon@gmail.com>	2025-05-01 20:02:14 +02:00
Antonio Scandurra	f891dfb358	Introduce a new `StreamingEditFileTool` (#29733 ) This pull request introduces a new tool for streaming edits. The short-term goal is for this tool to replace the existing `EditFileTool`, but we want to get this out the door as soon as possible so that we can start testing it. `StreamingEditFileTool` is mutually exclusive with `EditFileTool`. It will be enabled by default for anyone who has the `agent-stream-edits` feature flag, as well as people that set `assistant.stream_edits` to `true` in their settings. ### Implementation Streaming is achieved by requesting a completion while the `edit_file` tool gets called. We invoke the model by taking the existing conversation with the agent and appending a prompt specifically tailored for editing. In that prompt, we ask the model to produce a stream of `<old_text>`/`<new_text>` tags. As the model streams text in, we incrementally parse it and start editing as soon as we can. ### Evals Note that, as part of this pull request, I also defined some new evals that I used to drive the behavior of the recursive LLM call. To run them, use this command: ```bash cargo test --package=assistant_tools --features eval -- eval_extract_handle_command_output ``` Or comment out the `#[cfg_attr(not(feature = "eval"), ignore)]` macro. I recommend running them one at a time, because right now we don't really have a way of orchestrating all of these evals. I think we should invest in that effort once the new agent panel goes live. Release Notes: - N/A --------- Co-authored-by: Nathan Sobo <nathan@zed.dev> Co-authored-by: Bennet Bo Fenner <bennetbo@gmx.de> Co-authored-by: Oleksiy Syvokon <oleksiy.syvokon@gmail.com>	2025-05-01 17:37:43 +02:00
Richard Feldman	afeb3d4fd9	Make eval more resilient to bad input from LLM (#29703 ) I saw a slice panic (for begin > end) in a debug build of the eval. This should just be a failed assertion, not a panic that takes out the whole eval run! Release Notes: - N/A	2025-04-30 18:13:45 -04:00
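The failure mode in afeb3d4fd9 (a slice panic when `begin > end`) can be avoided with checked slicing. The sketch below shows the general technique with the standard library only; `extract_span` and `check_span` are hypothetical helpers, not functions from the eval crate.

```rust
/// Returns `text[begin..end]`, or `None` when the range is inverted, out of
/// bounds, or not on a UTF-8 character boundary. Hypothetical helper.
fn extract_span(text: &str, begin: usize, end: usize) -> Option<&str> {
    // `str::get` returns None in the cases where `&text[begin..end]` panics.
    text.get(begin..end)
}

/// Reports a bad range as an ordinary failed check instead of panicking,
/// so one malformed LLM response cannot take down the whole run.
fn check_span(text: &str, begin: usize, end: usize, expected: &str) -> Result<(), String> {
    match extract_span(text, begin, end) {
        Some(actual) if actual == expected => Ok(()),
        Some(actual) => Err(format!("expected {expected:?}, got {actual:?}")),
        None => Err(format!(
            "invalid range {begin}..{end} for text of length {}",
            text.len()
        )),
    }
}
```

With this shape, offsets derived from model output are treated as untrusted input, and an inverted range simply yields an `Err` that the caller can record as a failed assertion.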
Richard Feldman	04c68dc0cf	Make the default repetitions be 8, and concurrency 4 (#29576 ) This is based on having observed that there is a lot of variation between runs on `n=1` and `n=3`. * With `n=8` two runs on the same branch give answers that seem close enough to be reasonably consistent. * With higher concurrency, trying to run this many repetitions seems to lead language servers to time out a lot, causing evals to fail. Release Notes: - N/A	2025-04-30 15:21:19 -04:00
Richard Feldman	c8685dc90f	Fix eval judging missing final response (#29638 ) Fixed issue where eval thread judges were not considering the last response in the thread. The problem was that they were getting the full list of messages from `last_request`, which (being a request!) did not have the response yet. Release Notes: - N/A	2025-04-29 23:02:46 -04:00
Richard Feldman	d566864891	Make code block eval resilient to indentation (#29633 ) This reduces spurious failures in the eval. Release Notes: - N/A	2025-04-30 02:13:13 +00:00
Richard Feldman	d7004030b3	Code block evals (#29619 ) Add a targeted eval for code block formatting, and revise the system prompt accordingly. ### Eval before, n=8 <img width="728" alt="eval before" src="https://github.com/user-attachments/assets/552b6146-3d26-4eaa-86f9-9fc36c0cadf2" /> ### Eval after prompt change, n=8 (excluding the new evals, so just testing the prompt change) <img width="717" alt="eval after" src="https://github.com/user-attachments/assets/c78c7a54-4c65-470c-b135-8691584cd73e" /> Release Notes: - N/A	2025-04-29 18:52:09 -04:00
Agus Zubiaga	fd17f2d8ae	agent: Enrich `grep` tool output with syntax information (#29601 ) The `grep` tool used to include 4 lines of context around the match, but the lines included would often be unhelpful. This PR improves this behavior by using the range of the parent syntax node that contains the full line(s) matched. The match headers will also now include symbol breadcrumbs so that the model can already gather code structure before/without reading files. ````md ### impl GitRepository for RealGitRepository › fn compare_checkpoints › L1278-1284 ```rust let result = git .run(&[ "diff-tree", "--quiet", &left.commit_sha.to_string(), &right.commit_sha.to_string(), ]) ``` ```` This positively impacts the `add_arg_to_trait_method` eval example with better diff output, fewer tool failures, and reduced total turns. Note: We have some plans to use an "elision" approach where we would combine all matches for a given file, skipping lines between them while keeping symbol declaration lines. The theory is that this would map more closely to the expected input for edits. For now, this PR is a significant improvement. Release Notes: - Agent: Enrich `grep` tool output with syntax information	2025-04-29 17:03:02 +00:00
Danilo Leal	bbe8d6a654	agent: Cancel pending in-edit user message upon new message submit (#29565 ) Previously, if you clicked on a user message to edit it, and then, while the user message has the editor pending, sent a new message via the textarea, the whole thread would be grayed out because we hadn't dismissed the to-be-edited pending user message. That's now fixed. Release Notes: - agent: Fixed a bug that would make the whole thread be grayed out upon sending a new message while a user message had a pending edit.	2025-04-28 18:51:41 -03:00
Marshall Bowers	ce93961fe0	agent: Add "max mode" toggle (#29549 ) This PR adds a "max mode" toggle to the Agent panel, for models that support it. Only visible to folks in the `new-billing` feature flag. Icon is just a placeholder. Release Notes: - N/A	2025-04-28 16:50:47 +00:00
Finn Evers	3a1bd38503	reqwest_client: Only register proxies with valid proxy URIs (#27773 ) Closes #27641 This PR fixes invalid proxy URIs being registered despite the URI not being a valid proxy URI. Whilst investigating #27641 , I noticed that currently any proxy URI passed to `ReqwestClient::proxy_and_user_agent` will be assigned to the created client, even if the URI is not a valid proxy URI. Given a test as an example: We create a URI here and pass it as a proxy to `ReqwestClient::proxy_and_user_agent`: https://github.com/zed-industries/zed/blob/main/crates/reqwest_client/src/reqwest_client.rs#L272-L273 In `ReqwestClient::proxy_and_user_agent` we take the proxy parameter here `9b40770e9f/crates/reqwest_client/src/reqwest_client.rs (L46)` and set it unconditionally here: `9b40770e9f/crates/reqwest_client/src/reqwest_client.rs (L62)` , not considering at all whether the proxy was successfully created above. Concluding, we currently do not actually check whether a proxy was successfully created, but rather whether a URI is equal to itself, which trivially holds. The existing test for a malformed proxy URI `9b40770e9f/crates/reqwest_client/src/reqwest_client.rs (L293-L297)` does not check whether invalid proxies cause an error, but rather checks whether `http::Uri::from_static` panics on an invalid URI, [which it does as documented](https://docs.rs/http/latest/http/uri/struct.Uri.html#panics). Thus, the tests currently do not really check anything proxy-related and invalid proxies are assigned as valid proxies. --- This PR fixes the behaviour by considering whether the proxy was actually properly parsed and only assigning it if that is the case. Furthermore, it improves logging in case of errors so issues like the linked one are easier to debug (for the linked issue, the log will now state that the proxy scheme is not supported). Lastly, it also updates the test for a malformed proxy URI.
The test now actually checks that malformed proxy URIs are not registered for the client rather than testing the `http` crate. The update also initially caused the [test for a `socks4a` proxy](`9b40770e9f/crates/reqwest_client/src/reqwest_client.rs (L280C1-L282C50)`) to fail. This happened because the reqwest library introduced support for `socks4a` proxies in [version 0.12.13](https://github.com/seanmonstar/reqwest/blob/master/CHANGELOG.md#v01213). Thus, this PR includes a bump of the reqwest library to add proper support for socks4a proxies. Release Notes: - Added support for socks4a proxies. --------- Co-authored-by: Peter Tripp <peter@zed.dev>	2025-04-28 11:12:16 -04:00
tidely	f060918b57	zed: Remove unnecessary clones (#29513 ) `App::http_client` and `Client::http_client` both return an owned `Arc` which it clones internally. This means we can remove unnecessary clones when calling these methods. Release Notes: - N/A	2025-04-27 19:23:37 -07:00
Michael Sloan	609c528ceb	Refactor markdown formatting utilities to avoid building intermediate strings (#29511 ) These were nearly always used when using `format!` / `write!` etc, so it makes sense to not have an intermediate `String`. Release Notes: - N/A	2025-04-27 19:04:51 +00:00
Marshall Bowers	b28756ae3f	eval: Use workspace dependencies (#29430 ) This PR updates the `eval` crate to use workspace dependencies. Also did a bit of cleanup of the `Cargo.toml`. Release Notes: - N/A	2025-04-25 16:11:26 +00:00
Marshall Bowers	a5405fcbd7	eval: Add support for reading from a `.env` file (#29426 ) This PR adds support for the eval to read environment variables from a `.env` file located in the `crates/eval` directory. For instance, you can use it to set your Anthropic API key: ``` ANTHROPIC_API_KEY=<secret> ``` Release Notes: - N/A	2025-04-25 15:53:02 +00:00


101 commits