This introduces a new field `thinking_allowed` on `LanguageModelRequest`
which lets us control whether thinking should be enabled if the model
supports it.
We disable thinking in the Inline Assistant, the Edit File tool, and the Git
Commit message generator; this should make generation faster when using
a thinking model, e.g. `claude-sonnet-4-thinking`.
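Roughly, the shape of the change (a minimal sketch; the real `LanguageModelRequest` has many more fields):

```rust
// Sketch only: the real LanguageModelRequest in Zed has many more fields.
#[derive(Debug, Clone)]
pub struct LanguageModelRequest {
    pub messages: Vec<String>, // simplified; the real type holds structured messages
    /// Whether the model may use extended "thinking" if it supports it.
    /// Callers like the Inline Assistant set this to `false` to keep
    /// generation fast.
    pub thinking_allowed: bool,
}

fn main() {
    let request = LanguageModelRequest {
        messages: vec!["Write a commit message".into()],
        thinking_allowed: false,
    };
    println!("thinking allowed: {}", request.thinking_allowed);
}
```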
Release Notes:
- N/A
* Updates to `zed_llm_client-0.8.5`, which adds support for `retry_after`
when Anthropic provides it.
* Distinguishes upstream provider errors and rate limits from errors
that originate from Zed's servers
* Moves `LanguageModelCompletionError::BadInputJson` to
`LanguageModelCompletionEvent::ToolUseJsonParseError`. While arguably
this is an error case, the logic in `Thread` is cleaner with this move.
There is also precedent for including errors in the event type -
`CompletionRequestStatus::Failed` is how cloud errors arrive.
* Updates `PROVIDER_ID` / `PROVIDER_NAME` constants to use proper types
instead of `&str`, since they can be constructed in a const fashion (see
the sketch after this list).
* Removes use of `CLIENT_SUPPORTS_EXA_WEB_SEARCH_PROVIDER_HEADER_NAME`
as the server no longer reads this header and just defaults to that
behavior.
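A minimal sketch of the const-construction idea (the newtype below is illustrative, not Zed's exact definition):

```rust
// Illustrative newtype; Zed's actual types may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LanguageModelProviderId(&'static str);

impl LanguageModelProviderId {
    pub const fn new(id: &'static str) -> Self {
        Self(id)
    }
}

// Because `new` is a `const fn`, the constant can hold the proper type
// instead of a bare `&str`.
pub const PROVIDER_ID: LanguageModelProviderId = LanguageModelProviderId::new("anthropic");

fn main() {
    println!("{:?}", PROVIDER_ID);
}
```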
Release notes for this are covered by #33275
Release Notes:
- N/A
---------
Co-authored-by: Richard Feldman <oss@rtfeldman.com>
Co-authored-by: Richard <richard@zed.dev>
This cleans up our settings to not include any `version` fields, as we
have an actual settings migrator now.
This PR removes `language_models > anthropic > version`,
`language_models > openai > version` and `agent > version`.
We had migration paths in the code for a long time, so in practice
almost everyone should be using the latest version of these settings.
Release Notes:
- Removed `version` fields in settings for `agent`, `language_models >
anthropic`, and `language_models > openai`. Your settings will be migrated
automatically. If you run into issues with this, open an issue
[here](https://github.com/zed-industries/zed/issues).
This PR is in preparation for doing automatic retries for certain
errors, e.g. `Overloaded`. It doesn't change behavior yet (aside from some
granularity of error messages shown to the user); rather, it mostly
changes some error handling to use exhaustive enum matches instead of
`anyhow` downcasts, and leaves comments marking where the behavior
change will land in a future PR.
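A sketch of the direction, assuming illustrative variant names rather than the crate's exact ones:

```rust
use std::time::Duration;

// Illustrative error enum; the real LanguageModelCompletionError has more variants.
#[derive(Debug)]
enum CompletionError {
    Overloaded,
    RateLimited { retry_after: Duration },
    Other(String),
}

fn should_retry(error: &CompletionError) -> Option<Duration> {
    // Exhaustive match: adding a variant forces every call site to decide
    // how to handle it, unlike a stringly-typed `anyhow` downcast.
    match error {
        CompletionError::Overloaded => Some(Duration::from_secs(1)), // actual retry lands in a future PR
        CompletionError::RateLimited { retry_after } => Some(*retry_after),
        CompletionError::Other(_) => None,
    }
}

fn main() {
    let error = CompletionError::RateLimited { retry_after: Duration::from_secs(30) };
    println!("{:?}", should_retry(&error));
}
```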
Release Notes:
- N/A
This format is enabled for Google models as they seem to prefer it.
A relevant unit eval's pass rate has increased from 0.77 to 0.98.
The diff-fenced format looks like this (the markdown fences and the line
hint are optional):
```diff
<<<<<<< SEARCH line=42
...
=======
...
>>>>>>> REPLACE
```
Release Notes:
- Agent: Gemini models now use the diff-fenced format when making edits
Some MCP servers expose tools that take absolute paths as arguments. To
interact with these, the agent needs to know the absolute path to the
project directories, not just their names. This PR changes the system
prompt to include the full path to each worktree, and updates some tool
descriptions to reflect this.
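A minimal sketch of the idea (the `Worktree` type and prompt wording below are hypothetical):

```rust
use std::path::PathBuf;

// Hypothetical worktree representation, for illustration only.
struct Worktree {
    root_name: String,
    abs_path: PathBuf,
}

// Builds the part of the system prompt that lists project roots.
// Including the absolute path (not just the name) lets the agent call
// MCP tools whose arguments must be absolute paths.
fn worktree_section(worktrees: &[Worktree]) -> String {
    let mut section = String::from("The project has the following root directories:\n");
    for worktree in worktrees {
        section.push_str(&format!(
            "- {} (absolute path: {})\n",
            worktree.root_name,
            worktree.abs_path.display()
        ));
    }
    section
}

fn main() {
    let worktrees = vec![Worktree {
        root_name: "zed".into(),
        abs_path: PathBuf::from("/Users/me/code/zed"),
    }];
    print!("{}", worktree_section(&worktrees));
}
```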
Todo:
* [x] Run evals; make sure the assistant still understands how to specify
paths for tools, now that we include absolute paths in the system prompt.
Release Notes:
- Improved the agent's ability to use MCP tools that require absolute
paths to files and directories in the project.
---------
Co-authored-by: Ben Brandt <benjamin.j.brandt@gmail.com>
These changes help the agent edit files when `<old_text>` matches more
than one location.
First, the agent can specify an optional `<old_text line=XX>` parameter.
When this is provided and multiple matches exist, we use this hint to
identify the best match.
Second, when there is ambiguity in matches, we now return the agent a
more helpful message listing the line numbers of all possible matches.
Together, these changes should reduce the number of misplaced edits and
agent confusion.
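A rough sketch of how a line hint can disambiguate (the helper below is illustrative, not the actual implementation):

```rust
// Illustrative: given the byte offsets of all matches of `old_text`,
// pick the one whose line number is closest to the model's hint.
fn best_match(text: &str, match_offsets: &[usize], line_hint: u32) -> Option<usize> {
    match_offsets.iter().copied().min_by_key(|&offset| {
        // 1-based line number of this match.
        let line = text[..offset].lines().count() as i64 + 1;
        (line - line_hint as i64).abs()
    })
}

fn main() {
    let text = "fn a() {}\nfn b() {}\nfn a() {}\n";
    // Both occurrences of "fn a() {}" match; the hint disambiguates.
    let offsets: Vec<usize> = text.match_indices("fn a() {}").map(|(i, _)| i).collect();
    println!("{:?}", best_match(text, &offsets, 3)); // picks the second occurrence
}
```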
I have ensured the LLM Worker works with these prompt changes.
Release Notes:
- Agent: Improved locating edits
Adds some jitter so that all requests don't retry at roughly the same
time in evals, where we have a lot of concurrent requests.
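A minimal sketch of jittered backoff, assuming the `rand` crate (not the exact implementation):

```rust
use rand::Rng; // rand = "0.8"
use std::time::Duration;

// Illustrative: spread retries out so concurrent eval requests don't all
// hit the provider again at the same instant.
fn jittered(base: Duration) -> Duration {
    let factor = rand::thread_rng().gen_range(0.5..1.5);
    base.mul_f64(factor)
}

fn main() {
    println!("{:?}", jittered(Duration::from_secs(4)));
}
```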
Release Notes:
- N/A
Bubbles up rate limit information so that, higher up the stack, we can
retry after a certain duration if needed.
Also caps the number of evals running concurrently, which helps as well.
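A sketch of the shape of the bubbled-up information (types are illustrative):

```rust
use std::time::Duration;

// Illustrative: a rate-limit error that carries the provider's
// suggested wait so callers higher up the stack can honor it.
#[derive(Debug)]
struct RateLimitExceeded {
    retry_after: Duration,
}

fn handle(result: Result<String, RateLimitExceeded>) {
    match result {
        Ok(output) => println!("{output}"),
        Err(RateLimitExceeded { retry_after }) => {
            println!("rate limited; retrying after {retry_after:?}");
        }
    }
}

fn main() {
    handle(Err(RateLimitExceeded { retry_after: Duration::from_secs(30) }));
}
```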
Release Notes:
- N/A
Replaces the hardcoded 0.10 threshold with a configurable parameter,
setting a 0.05 default for most tests and 0.2 for the
`from_pixels_constructor` eval, which produces more mismatched tags.
Release Notes:
- N/A
We run the unit evals once a day in the middle of the night, and trigger
a Slack post if the run fails.
Release Notes:
- N/A
---------
Co-authored-by: Oleksiy Syvokon <oleksiy.syvokon@gmail.com>
This change instructs models to wrap new file content in Markdown fences
and introduces a parser for this format. The reasons are:
1. This is the format we put a lot of effort into explaining in the
system prompt.
2. Gemini really prefers to do it.
3. It adds an option for a model to think before writing the content.
The `eval_zode` pass rate for Gemini models goes from 0% to 100%. Other
models were already at 100%; this hasn't changed.
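A simplified sketch of the parsing idea (the real parser works on a stream; this illustrative version operates on a complete string):

```rust
// Illustrative: extract file content wrapped in Markdown fences,
// ignoring any "thinking" text the model emits before the fence.
fn extract_fenced(response: &str) -> Option<String> {
    let start = response.find("```")?;
    // Skip the info string (e.g. ```rust) through the end of that line.
    let after_fence = &response[start..];
    let content_start = after_fence.find('\n')? + 1;
    let body = &after_fence[content_start..];
    // Everything up to the closing fence is the file content.
    let end = body.rfind("```")?;
    Some(body[..end].to_string())
}

fn main() {
    let response = "Let me write the file.\n```rust\nfn main() {}\n```\n";
    assert_eq!(extract_fenced(response).as_deref(), Some("fn main() {}\n"));
}
```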
Release Notes:
- N/A
This change improves `eval_extract_handle_command_output` results for
all models:
Model | Pass rate before | Pass rate after
----------------------------|------------------|----------------
claude-3.7-sonnet | 0.96 | 0.98
gemini-2.5-pro | 0.35 | 0.86
gpt-4.1 | 0.81 | 1.00
Part of this improvement comes from more robust evaluation, which now
accepts multiple possible outcomes. Another part is from the prompt
adaptation: addressing common Gemini failure modes, adding a few-shot
example, and, in the final commit, auto-rewriting instructions for
clarity and conciseness.
This change still needs validation from larger end-to-end evals.
Release Notes:
- N/A
1. Add system prompt: this is how it's called from threads. Previously,
we were sending requests without one.
2. Fix an issue where the agent's thoughts were written into a newly
created empty file.
Release Notes:
- N/A
---------
Co-authored-by: Ben Brandt <benjamin.j.brandt@gmail.com>
Co-authored-by: Antonio Scandurra <me@as-cii.com>
https://github.com/zed-industries/zed/issues/30972 brought up another
case where our context is not enough to track the actual source of the
issue: we get a general top-level error without inner error.
The reason for this was `.ok_or_else(|| anyhow!("failed to read HEAD
SHA"))?` at the top level.
This PR finally reworks the way we use `anyhow` to reduce such issues (or
at least make it simpler to bubble them up later in a fix).
On top of that, it uses a few more `anyhow` methods for better readability:
* `.ok_or_else(|| anyhow!("..."))`, `map_err` and other similar error
conversion/option reporting cases are replaced with `context` and
`with_context` calls
* in addition to that, various `anyhow!("failed to do ...")` messages are
replaced with `.context("Doing ...")` to remove the parasitic
`failed to` text
* `anyhow::ensure!` is used instead of `if ... { return Err(...); }`
calls
* `anyhow::bail!` is used instead of `return Err(anyhow!(...));`
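For illustration, the kind of rewrite this amounts to (the function and error messages are made up):

```rust
use anyhow::{bail, ensure, Context as _, Result};

// Made-up example showing the patterns above.
fn head_sha(raw: Option<String>) -> Result<String> {
    // Was: raw.ok_or_else(|| anyhow!("failed to read HEAD SHA"))?
    // `context` attaches the message; on a `Result` it also keeps the inner error.
    let sha = raw.context("reading HEAD SHA")?;
    // Was: if sha.len() != 40 { return Err(anyhow!("...")); }
    ensure!(sha.len() == 40, "unexpected SHA length: {}", sha.len());
    if sha.contains(' ') {
        // Was: return Err(anyhow!("SHA contains whitespace"));
        bail!("SHA contains whitespace");
    }
    Ok(sha)
}

fn main() {
    println!("{:?}", head_sha(Some("0".repeat(40))));
}
```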
Release Notes:
- N/A
This eval checks that the Edit Agent can create an empty file without
writing its thoughts into it. This issue is not specific to empty files,
but it's easier to reproduce with them.
For some mysterious reason, I could easily reproduce this issue roughly
90% of the time in actual Zed. However, once I extract the exact LLM
request before the failure point and generate from that, the
reproduction rate drops to 2%!
Things I've tried to make sure it's not a fluke: disabling prompt
caching, capturing the LLM request via a proxy server, running the
prompt on Claude separately from evals. Every time it was mostly giving
good outcomes, which doesn't match my actual experience in Zed.
At some point I discovered that simply adding one insignificant space or
newline to the prompt suddenly reproduces the outcome I was trying to
get almost perfectly.
This weirdness happens even outside the Zed code base and even when
using a different subscription. The result is the same: an extra newline
or space changes the model's behavior significantly enough that the
pass rate drops from 99% to 0-3%.
I have no explanation for this.
Release Notes:
- N/A
1. The `edit_file` tool tended to use `create_or_overwrite` a bit too
often, leading to corruption of long files. This change replaces the
boolean flag with an `EditFileMode` enum (see the sketch after this
list), which helps the agent make a more deliberate choice when
overwriting files.
With this change, the pass rate of the new eval increased from 10% to
100%.
2. eval: Added ability to run eval on top of an existing thread. Threads
can now be loaded from JSON files in the `SerializedThread` format,
which makes it easy to use real threads as starting points for
tests/evals.
3. Don't try to restore tool cards when running in headless or eval mode
-- we don't have a window to properly do this.
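A sketch of the shape of that enum (variant names are illustrative; the exact set in Zed may differ):

```rust
use serde::Deserialize; // serde = { version = "1", features = ["derive"] }, serde_json = "1"

// Illustrative variants. Compared to a boolean `create_or_overwrite`
// flag, an enum forces the model to state its intent explicitly in the
// tool input.
#[derive(Debug, Deserialize)]
#[serde(rename_all = "snake_case")]
enum EditFileMode {
    Edit,
    Create,
    Overwrite,
}

fn main() {
    let mode: EditFileMode = serde_json::from_str("\"overwrite\"").unwrap();
    println!("{mode:?}");
}
```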
Release Notes:
- N/A
This is very basic support for them. There are a number of other TODOs
before this is really a first-class supported feature, so we're not adding
any release notes for it; for now, this PR just makes it so that if
`read_file` tries to read a PNG (which has come up in practice), it at
least correctly sends it to Anthropic instead of messing up.
This also lays the groundwork for future PRs for more first-class
support for images in tool calls across more image file formats and LLM
providers.
Release Notes:
- N/A
---------
Co-authored-by: Agus Zubiaga <hi@aguz.me>
Co-authored-by: Agus Zubiaga <agus@zed.dev>
This allows us to debug the raw edits that were generated when people
report feedback, when running evals and when opening the thread as
Markdown.
Release Notes:
- Improved debug output for agent threads.
Release Notes:
- Fixed a bug that would prevent the agent from working over SSH.
---------
Co-authored-by: Nathan Sobo <nathan@zed.dev>
Co-authored-by: Richard Feldman <oss@rtfeldman.com>
Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
Co-authored-by: Cole Miller <m@cole-miller.net>
This improves the new eval scenario by ~80% (`0.29` vs `0.525`) without
decreasing performance in the other evals.
Release Notes:
- Improved the performance of the `edit_file` tool.
Release Notes:
- Fixed a bug where rejecting a hunk from the agent would delete the
file if the agent had decided to rewrite that file from scratch.
Nathan here: I also tacked on a bunch of UI refinement.
Release Notes:
- Introduced the ability to follow the agent around as it reads and
edits files.
---------
Co-authored-by: Nathan Sobo <nathan@zed.dev>
Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
Refs #29733
This pull request introduces a new field to the `StreamingEditFileTool`
that lets the model create or overwrite a file in a streaming way. When
either the `assistant.stream_edits` setting or the `agent-stream-edits`
feature flag is enabled, we disable the `CreateFileTool` so that the
agent model can only use `StreamingEditFileTool` for file creation.
Release Notes:
- N/A
---------
Co-authored-by: Ben Brandt <benjamin.j.brandt@gmail.com>
Co-authored-by: Oleksiy Syvokon <oleksiy.syvokon@gmail.com>
This pull request introduces a new tool for streaming edits. The
short-term goal is for this tool to replace the existing `EditFileTool`,
but we want to get this out the door as soon as possible so that we can
start testing it.
`StreamingEditFileTool` is mutually exclusive with `EditFileTool`. It
will be enabled by default for anyone who has the `agent-stream-edits`
feature flag, as well as people that set `assistant.stream_edits` to
`true` in their settings.
### Implementation
Streaming is achieved by requesting a completion while the `edit_file`
tool gets called. We invoke the model by taking the existing
conversation with the agent and appending a prompt specifically tailored
for editing. In that prompt, we ask the model to produce a stream of
`<old_text>`/`<new_text>` tags. As the model streams text in, we
incrementally parse it and start editing as soon as we can.
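A heavily simplified sketch of the incremental idea (the real implementation is a streaming parser with proper error handling; this illustrative version just buffers chunks and emits complete pairs):

```rust
// Illustrative: accumulate streamed text and emit (old, new) pairs as
// soon as a complete <old_text>...</old_text><new_text>...</new_text>
// pair has arrived, without waiting for the full response.
struct EditStream {
    buffer: String,
}

impl EditStream {
    fn new() -> Self {
        Self { buffer: String::new() }
    }

    fn push(&mut self, chunk: &str) -> Vec<(String, String)> {
        self.buffer.push_str(chunk);
        let mut edits = Vec::new();
        while let Some((old, new, consumed)) = parse_one(&self.buffer) {
            edits.push((old, new));
            self.buffer.drain(..consumed);
        }
        edits
    }
}

fn parse_one(buffer: &str) -> Option<(String, String, usize)> {
    let old_start = buffer.find("<old_text>")? + "<old_text>".len();
    let old_end = old_start + buffer[old_start..].find("</old_text>")?;
    let new_start = old_end + buffer[old_end..].find("<new_text>")? + "<new_text>".len();
    let new_end = new_start + buffer[new_start..].find("</new_text>")?;
    let consumed = new_end + "</new_text>".len();
    Some((
        buffer[old_start..old_end].to_string(),
        buffer[new_start..new_end].to_string(),
        consumed,
    ))
}

fn main() {
    let mut stream = EditStream::new();
    // First chunk contains only part of the pair: no edit yet.
    assert!(stream.push("<old_text>foo</old_text><new_te").is_empty());
    // Second chunk completes it: the edit can be applied immediately.
    assert_eq!(stream.push("xt>bar</new_text>").len(), 1);
}
```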
### Evals
Note that, as part of this pull request, I also defined some new evals
that I used to drive the behavior of the recursive LLM call. To run
them, use this command:
```bash
cargo test --package=assistant_tools --features eval -- eval_extract_handle_command_output
```
Or comment out the `#[cfg_attr(not(feature = "eval"), ignore)]` attribute.
I recommend running them one at a time, because right now we don't
really have a way of orchestrating all these evals. I think we should
invest in that effort once the new agent panel goes live.
Release Notes:
- N/A
---------
Co-authored-by: Nathan Sobo <nathan@zed.dev>
Co-authored-by: Bennet Bo Fenner <bennetbo@gmx.de>
Co-authored-by: Oleksiy Syvokon <oleksiy.syvokon@gmail.com>