This pull request should be a no-op behavior-wise, but lays the groundwork for
avoiding having to connect to collab in order to interact with AI features
provided by Zed.
Release Notes:
- N/A
---------
Co-authored-by: Marshall Bowers <git@maxdeviant.com>
Co-authored-by: Richard Feldman <oss@rtfeldman.com>
This PR updates the Agent panel to work with the `CloudUserStore`
instead of the `UserStore`, reducing its reliance on being connected to
Collab to function.
Release Notes:
- N/A
---------
Co-authored-by: Richard Feldman <oss@rtfeldman.com>
Fixes an issue that caused extension directory removal to fail on Windows,
because Zed never stopped the related processes.
Now:
* During shutdown, Zed waits until the language servers have fully shut
down
* Adds `impl Drop for WasmExtension`, which calls
`self.tx.close_channel();` to stop a receiver loop that holds the "lock"
on the extension's work dir (see the sketch below).
The extension was dropped, but the channel was not closed for some
reason.
* Does more unregistration to ensure the `Arc<WasmExtension>` holding the
`tx` does not leak further
* Tidies up the related errors, which never reported the problematic
path before
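A rough sketch of the `Drop` change (the channel and message types are simplified here for illustration):

```rust
use futures::channel::mpsc;

// Placeholder for the messages driving the extension's receiver loop.
struct ExtensionCall;

struct WasmExtension {
    // Sender side of the channel that feeds the receiver loop.
    tx: mpsc::UnboundedSender<ExtensionCall>,
}

impl Drop for WasmExtension {
    fn drop(&mut self) {
        // Closing the channel terminates the receiver loop, releasing its
        // hold on the extension's work dir so Windows can delete it.
        self.tx.close_channel();
    }
}
```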
Release Notes:
- N/A
---------
Co-authored-by: Smit Barmase <heysmitbarmase@gmail.com>
Co-authored-by: Smit <smit@zed.dev>
This cleans up our settings to not include any `version` fields, as we
have an actual settings migrator now.
This PR removes `language_models > anthropic > version`,
`language_models > openai > version` and `agent > version`.
We had migration paths in the code for a long time, so in practice
almost everyone should be using the latest version of these settings.
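For reference, these are the settings being removed; a rough sketch of the old shape (the version values are illustrative):

```json
{
  "agent": {
    "version": "2"
  },
  "language_models": {
    "anthropic": { "version": "1" },
    "openai": { "version": "1" }
  }
}
```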
Release Notes:
- Removed the `version` fields in settings for `agent`, `language_models >
anthropic`, and `language_models > openai`. Your settings will automatically
be migrated. If you're running into issues with this, open an issue
[here](https://github.com/zed-industries/zed/issues)
This PR moves the UI-dependent logic in the `agent` crate into its own
crate, `agent_ui`. The remaining `agent` crate no longer depends on
`editor`, `picker`, `ui`, `workspace`, etc.
This has compile-time benefits, but the main motivation is to isolate
our core agentic logic, so that we can make agents more
pluggable/configurable.
Release Notes:
- N/A
The `async-watch` crate doesn't seem to be maintained and we noticed
several panics coming from it, such as:
```
[bug] failed to observe change after notificaton.
zed::reliability::init_panic_hook::{{closure}}::hea8cdcb6299fad6b+154543526
std::panicking::rust_panic_with_hook::h33b18b24045abff4+127578547
std::panicking::begin_panic_handler::{{closure}}::hf8313cc2fd0126bc+127577770
std::sys::backtrace::__rust_end_short_backtrace::h57fe07c8aea5c98a+127571385
__rustc[95feac21a9532783]::rust_begin_unwind+127576909
core::panicking::panic_fmt::hd54fb667be51beea+9433328
core::option::expect_failed::h8456634a3dada3e4+9433291
assistant_tools::edit_agent::EditAgent::apply_edit_chunks::{{closure}}::habe2e1a32b267fd4+26921553
gpui::app::async_context::AsyncApp::spawn::{{closure}}::h12f5f25757f572ea+25923441
async_task::raw::RawTask<F,T,S,M>::run::h3cca0d402690ccba+25186815
<gpui::platform::linux::x11::client::X11Client as gpui::platform::linux::platform::LinuxClient>::run::h26264aefbcfbc14b+73961666
gpui::platform::linux::platform::<impl gpui::platform::Platform for P>::run::hb12dcd4abad715b5+73562509
gpui::app::Application::run::h0f936a5f855a3f9f+150676820
zed::main::ha17f9a25fe257d35+154788471
std::sys::backtrace::__rust_begin_short_backtrace::h1edd02429370b2bd+154624579
std::rt::lang_start::{{closure}}::h3d2e300f10059b0a+154264777
std::rt::lang_start_internal::h418648f91f5be3a1+127502049
main+154806636
__libc_start_main+46051972301573
_start+12358494
```
I didn't find an executor-agnostic watch crate that was well maintained
(we already tried postage and async-watch), so we decided to implement
our own version.
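For context, the replacement is along these lines: a minimal, executor-agnostic watch channel in which the sender overwrites the latest value and wakes pending receivers (a sketch under those assumptions, not the code that landed):

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Waker};

struct Shared<T> {
    value: T,
    version: u64,
    wakers: Vec<Waker>,
}

pub struct Sender<T>(Arc<Mutex<Shared<T>>>);
pub struct Receiver<T> {
    shared: Arc<Mutex<Shared<T>>>,
    seen: u64,
}

pub fn channel<T>(initial: T) -> (Sender<T>, Receiver<T>) {
    let shared = Arc::new(Mutex::new(Shared {
        value: initial,
        version: 0,
        wakers: Vec::new(),
    }));
    (Sender(shared.clone()), Receiver { shared, seen: 0 })
}

impl<T> Sender<T> {
    pub fn send(&self, value: T) {
        let mut shared = self.0.lock().unwrap();
        shared.value = value;
        shared.version += 1;
        // Wake every receiver currently waiting in `changed()`.
        for waker in shared.wakers.drain(..) {
            waker.wake();
        }
    }
}

impl<T: Clone> Receiver<T> {
    /// Resolves with the latest value once it differs from the last one seen.
    pub fn changed(&mut self) -> Changed<'_, T> {
        Changed(self)
    }
}

pub struct Changed<'a, T>(&'a mut Receiver<T>);

impl<T: Clone> Future for Changed<'_, T> {
    type Output = T;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        let this = self.get_mut();
        let mut shared = this.0.shared.lock().unwrap();
        if shared.version != this.0.seen {
            // A new value was sent since we last looked; return it.
            this.0.seen = shared.version;
            Poll::Ready(shared.value.clone())
        } else {
            // Register our waker under the same lock that guards the
            // version, so a concurrent `send` can't slip past unnoticed.
            shared.wakers.push(cx.waker().clone());
            Poll::Pending
        }
    }
}
```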
Release Notes:
- Fixed a panic that could sometimes occur when the agent performed
edits.
This is needed for apples-to-apples comparison of different agent
models.
Another change is that `cargo run -p eval` now accepts model names as
`provider_id/model_id` instead of separate `--provider` and `--model`
params.
Release Notes:
- N/A
This PR fixes an issue where the eval was incorrectly pulling the
provider/model from the user settings, which could cause problems when
running certain evals.
This was introduced in #30168 due to the restructuring after the removal
of the `assistant` crate.
Release Notes:
- N/A
https://github.com/zed-industries/zed/issues/30972 brought up another
case where our context is not enough to track the actual source of the
issue: we get a general top-level error without an inner error.
The reason for this was `.ok_or_else(|| anyhow!("failed to read HEAD SHA"))?;`
at the top level.
This PR finally reworks the way we use anyhow to reduce such issues (or
at least make it simpler to bubble them up later in a fix).
On top of that, it uses a few more anyhow methods for better readability:
* `.ok_or_else(|| anyhow!("..."))`, `map_err`, and similar error
conversion/option handling cases are replaced with `context` and
`with_context` calls
* in addition, various `anyhow!("failed to do ...")` messages are
replaced with `.context("Doing ...")` messages instead, to remove the
parasitic `failed to` text
* `anyhow::ensure!` is used instead of `if ... { return Err(...); }`
calls
* `anyhow::bail!` is used instead of `return Err(anyhow!(...));` (see
the sketch after this list)
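An illustrative before/after of these patterns (function names are made up for the example):

```rust
use anyhow::{anyhow, bail, ensure, Context as _, Result};

// Before: a bare top-level error with no inner cause or context chain.
fn head_sha_before(head: Option<String>) -> Result<String> {
    head.ok_or_else(|| anyhow!("failed to read HEAD SHA"))
}

// After: `context` attaches a message while preserving the error chain,
// and drops the parasitic "failed to" phrasing.
fn head_sha_after(head: Option<String>) -> Result<String> {
    head.context("reading HEAD SHA")
}

fn validate(input: &[u8]) -> Result<()> {
    // `ensure!` replaces `if ... { return Err(...); }`.
    ensure!(!input.is_empty(), "empty input");
    if input.len() > 1024 {
        // `bail!` replaces `return Err(anyhow!(...));`.
        bail!("input too large: {} bytes", input.len());
    }
    Ok(())
}
```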
Release Notes:
- N/A
- Evals returning an error (e.g., LLM API format mismatch) were silently
skipped in the aggregated results. Now we count them as a failure (0%
success score).
- Setting the `VERBOSE` environment variable to something non-empty
disables string truncation (see the example below)
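For example, to run the eval without truncated output:
```
VERBOSE=1 cargo run -p eval
```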
Release Notes:
- N/A
1. The `edit_file` tool tended to use `create_or_overwrite` a bit too
often, leading to corruption of long files. This change replaces the
boolean flag with an `EditFileMode` enum, which helps the agent make a
more deliberate choice when overwriting files (see the sketch after
this list).
With this change, the pass rate of the new eval increased from 10% to
100%.
2. eval: Added the ability to run an eval on top of an existing thread.
Threads can now be loaded from JSON files in the `SerializedThread`
format, which makes it easy to use real threads as starting points for
tests/evals.
3. Don't try to restore tool cards when running in headless or eval mode
-- we don't have a window to properly do this.
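The replacement parameter looks roughly like this (variant names are illustrative, not the exact shipped API):

```rust
// Before: a boolean the model often set carelessly.
//   create_or_overwrite: bool
// After: an explicit mode the model must choose deliberately.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EditFileMode {
    /// Apply targeted edits to an existing file.
    Edit,
    /// Create a new file (fails if one already exists).
    Create,
    /// Deliberately replace the entire contents of a file.
    Overwrite,
}
```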
Release Notes:
- N/A
Release Notes:
- Fixed a race condition that sometimes prevented a system-installed
`node` binary from being detected.
- Fixed a bug where the `node.path` setting was not respected when
invoking npm.
Because we instantiated `ContextServerManager` both in `agent` and
`assistant-context-editor`, and these two entities track the running MCP
servers separately, we were effectively running every MCP server twice.
This PR moves the `ContextServerManager` into the `project` crate,
renaming it to `ContextServerStore`. The store can be accessed via a
project instance. This ensures that we only instantiate one
`ContextServerStore` per project.
Also, this PR adds a bunch of tests to ensure that the
`ContextServerStore` behaves correctly (previously there were none).
Closes #28714
Closes #29530
Release Notes:
- N/A
This pull request introduces a new tool for streaming edits. The
short-term goal is for this tool to replace the existing `EditFileTool`,
but we want to get this out the door as soon as possible so that we can
start testing it.
`StreamingEditFileTool` is mutually exclusive with `EditFileTool`. It
will be enabled by default for anyone who has the `agent-stream-edits`
feature flag, as well as anyone who sets `assistant.stream_edits` to
`true` in their settings.
### Implementation
Streaming is achieved by requesting a completion when the `edit_file`
tool gets called. We invoke the model by taking the existing
conversation with the agent and appending a prompt specifically tailored
for editing. In that prompt, we ask the model to produce a stream of
`<old_text>`/`<new_text>` tags. As the model streams text in, we
incrementally parse it and start editing as soon as we can.
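Conceptually, the incremental parsing works along these lines (a simplified sketch; the real parser also has to handle tags split across chunk boundaries, and can start applying `new_text` before its closing tag arrives):

```rust
#[derive(Default)]
struct EditStreamParser {
    buffer: String,
}

struct Edit {
    old_text: String,
    new_text: String,
}

impl EditStreamParser {
    /// Feed in a streamed chunk; returns any edits that just completed.
    fn push(&mut self, chunk: &str) -> Vec<Edit> {
        self.buffer.push_str(chunk);
        let mut edits = Vec::new();
        while let Some(edit) = self.try_take_edit() {
            edits.push(edit);
        }
        edits
    }

    /// Extracts one complete <old_text>/<new_text> pair, if present.
    fn try_take_edit(&mut self) -> Option<Edit> {
        let old_text = extract(&self.buffer, "old_text")?;
        let new_text = extract(&self.buffer, "new_text")?;
        // Drop everything up to and including the consumed </new_text>.
        let end = self.buffer.find("</new_text>")? + "</new_text>".len();
        self.buffer.drain(..end);
        Some(Edit { old_text, new_text })
    }
}

/// Returns the text between the first <tag> and </tag>, if both are present.
fn extract(buffer: &str, tag: &str) -> Option<String> {
    let open = format!("<{tag}>");
    let close = format!("</{tag}>");
    let start = buffer.find(&open)? + open.len();
    let end = buffer[start..].find(&close)? + start;
    Some(buffer[start..end].to_string())
}
```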
### Evals
Note that, as part of this pull request, I also defined some new evals
that I used to drive the behavior of the recursive LLM call. To run
them, use this command:
```bash
cargo test --package=assistant_tools --features eval -- eval_extract_handle_command_output
```
Or comment out the `#[cfg_attr(not(feature = "eval"), ignore)]` attribute.
I recommend running them one at a time, because right now we don't
really have a way of orchestrating all these evals. I think we should
invest in that effort once the new agent panel goes live.
Release Notes:
- N/A
---------
Co-authored-by: Nathan Sobo <nathan@zed.dev>
Co-authored-by: Bennet Bo Fenner <bennetbo@gmx.de>
Co-authored-by: Oleksiy Syvokon <oleksiy.syvokon@gmail.com>
This is based on having observed that there is a lot of variation
between runs at `n=1` and `n=3`.
* With `n=8`, two runs on the same branch give answers that seem close
enough to be reasonably consistent.
* With higher concurrency, trying to run this many repetitions seems to
lead language servers to time out a lot, causing evals to fail.
Release Notes:
- N/A
Closes #27641
This PR fixes invalid proxy URIs being registered as proxies even though
they are not valid proxy URIs.
Whilst investigating #27641, I noticed that currently any proxy URI
passed to `ReqwestClient::proxy_and_user_agent` will be assigned to the
created client, even if the URI is not a valid proxy URI. Take a test
as an example:
We create a URI here and pass it as a proxy to
`ReqwestClient::proxy_and_user_agent`:
https://github.com/zed-industries/zed/blob/main/crates/reqwest_client/src/reqwest_client.rs#L272-L273
In `ReqwestClient::proxy_and_user_agent` we take the proxy parameter here
9b40770e9f/crates/reqwest_client/src/reqwest_client.rs (L46)
and set it unconditionally here:
9b40770e9f/crates/reqwest_client/src/reqwest_client.rs (L62)
without considering at all whether the proxy was successfully created
above. In conclusion, we currently do not actually check whether a proxy
was successfully created, but rather whether a URI is equal to itself,
which trivially holds. The existing test for a malformed proxy URI
9b40770e9f/crates/reqwest_client/src/reqwest_client.rs (L293-L297)
does not check whether invalid proxies cause an error, but rather checks
whether `http::Uri::from_static` panics on an invalid URI, [which it
does, as
documented](https://docs.rs/http/latest/http/uri/struct.Uri.html#panics).
Thus, the tests currently do not really check anything proxy-related,
and invalid proxies are assigned as valid proxies.
---
This PR fixes the behaviour by checking whether the proxy was actually
parsed successfully and only assigning it in that case.
Furthermore, it improves logging in case of errors, so issues like the
linked one are easier to debug (for the linked issue, the log will now
note that the proxy scheme is not supported).
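The fix boils down to only registering the proxy when it actually parses, roughly like this (simplified from the actual change):

```rust
use reqwest::Proxy;

fn build_client(proxy_uri: Option<http::Uri>) -> reqwest::Client {
    let mut builder = reqwest::Client::builder();
    if let Some(uri) = proxy_uri {
        match Proxy::all(uri.to_string()) {
            // Only a successfully parsed proxy is assigned to the client.
            Ok(proxy) => builder = builder.proxy(proxy),
            // Previously this error was discarded and the URI was treated
            // as a valid proxy anyway; now we surface it in the logs.
            Err(error) => log::error!("failed to parse proxy URI: {error}"),
        }
    }
    builder.build().expect("building reqwest client")
}
```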
Lastly, it also updates the test for a malformed proxy URI. The test now
actually checks that malformed proxy URIs are not registered for the
client rather than testing the `http` crate.
The update also initially caused the [test for a `socks4a`
proxy](9b40770e9f/crates/reqwest_client/src/reqwest_client.rs (L280C1-L282C50))
to fail. This happened because the reqwest library only introduced
support for `socks4a` proxies in [version
0.12.13](https://github.com/seanmonstar/reqwest/blob/master/CHANGELOG.md#v01213).
Thus, this PR includes a bump of the reqwest library to add proper
support for socks4a proxies.
Release Notes:
- Added support for socks4a proxies.
---------
Co-authored-by: Peter Tripp <peter@zed.dev>
`App::http_client` and `Client::http_client` both return an owned `Arc`,
which they clone internally. This means we can remove unnecessary clones
when calling these methods.
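A minimal illustration of the call-site cleanup (types elided down to the essentials):

```rust
use std::sync::Arc;

struct HttpClient;

struct Client {
    http_client: Arc<HttpClient>,
}

impl Client {
    // Already returns an owned Arc, cloning internally.
    fn http_client(&self) -> Arc<HttpClient> {
        self.http_client.clone()
    }
}

fn use_client(client: &Client) {
    // Before: let http = client.http_client().clone();
    // The extra clone is redundant, since we already own an Arc.
    let http = client.http_client();
    let _ = http;
}
```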
Release Notes:
- N/A
This PR adds support for the eval to read environment variables from a
`.env` file located in the `crates/eval` directory.
For instance, you can use it to set your Anthropic API key:
```
ANTHROPIC_API_KEY=<secret>
```
Release Notes:
- N/A
This update generates a single self-contained .html file that shows an
overview of evaluation threads in the browser. It's useful for:
- Quickly reviewing results
- Sharing evaluation runs
- Debugging
- Comparing models (TBD)
Features:
- Export thread JSON from the UI
- Keyboard navigation (j/k or Ctrl + ←/→)
- Toggle between compact and full views
Generating the overview:
- `cargo run -p eval` will write this file in the run dir's root.
- Or you can call `cargo run -p eval --bin explorer` to generate it
without running evals.
Release Notes:
- N/A
### Problem
We want to start continuously tracking our progress on agent evals over
time. As part of this, we'd like the *score* to have a clear,
interpretable meaning. Right now, it's a number from 0 to 5, but it's
not clear what any particular number means. In addition, scores vary
widely from run to run, because the agent's output is non-deterministic. We
try to stabilize the score using a panel of judges, but the behavior of
the agent itself varies much more widely than the judges' scores for a
given run.
### Solution
* **explicit meanings of scores** - In this PR, we're prescribing the
diff and thread criteria files so that they *must* be unordered lists of
assertions. For both the thread and the diff, rather than providing an
abstract score, the judge's task is simply to count how many of these
assertions are satisfied. A percentage score can be derived from this
number divided by the total number of assertions (see the sketch after
this list).
* **repetitions** - Rather than running each example once and judging
it N times, we'll **run** the example N times. Right now, I'm just
judging the output once per run, because I believe that with these
clearer scoring criteria, the main source of non-determinism will be the
*agent's* behavior, not the judge's.
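A trivial sketch of the derived percentage score:

```rust
// Percentage score = satisfied assertions / total assertions.
fn percentage_score(satisfied: usize, total: usize) -> f32 {
    100.0 * satisfied as f32 / total as f32
}

// e.g. percentage_score(3, 4) == 75.0
```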
### Questions
* **accounting for diagnostic errors** - Previously, the judge was asked
to incorporate diagnostics into their abstract scores. Now that the
"score" is determined directly from the criteria, the diagnostic will
not be captured in the score. How should the diagnostics be accounted
for in the eval? One thought is - let's simply count and report the
number of errors remaining after the agent finishes, as a separate field
of the run (along with diff score and thread score). We could consider
normalizing it using the total lines of added code (like errors per 100
lines of code added) in order to give it some semblance of stability
between examples.
* **repetitions** - How many repetitions should we run on CI? Each
repetition takes significant time, but I think running more than one
repetition will make the scores significantly less volatile.
### Todo
* [x] Fix `--concurrency` implementation so that only N tasks are
spawned
* [x] Support `--repetitions` efficiently (re-using the same worktree)
* [x] Restructure judge prompts to count passing criteria, not compute
abstract score
* [x] Report total number of diagnostics in some way
* [x] Format output nicely
Release Notes:
- N/A
---------
Co-authored-by: Antonio Scandurra <me@as-cii.com>
The old one wasn't linking, and
https://github.com/zed-industries/zed/pull/29081 has a bunch of merge
conflicts. Wanted to start simple/small.
## Todo
* [x] Remove low-signal examples
* [x] Make the eval run on a cron, on main, and on any PR with the
`run-eval` label
* [x] Noise in logs about failure to write settings
```
[2025-04-21T20:45:04Z ERROR settings] Failed to write settings to file
"/home/runner/.config/zed/settings.json"
Caused by:
No such file or directory (os error 2) at path
"/home/runner/.config/zed/.tmpLewFEs"
```
* [x] `Agentic loop stalled`
(https://github.com/zed-industries/zed/actions/runs/14581044243/job/40897622894)
* [x] Make sure that events are recorded in snowflake
* [ ] Change judge criteria to be more explicit about meanings of scores
Release Notes:
- N/A
---------
Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Agus Zubiaga <hi@aguz.me>
Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
Co-authored-by: Thomas Mickley-Doyle <tmickleydoyle@gmail.com>
This pull request prints all the used tools and their failure rates.
The objective should be to minimize that failure rate.
@tmickleydoyle: this also changes the telemetry event to report
`tool_metrics` as opposed to `tool_use_counts`. Ideally I'd love to be
able to plot failure rates by tool and hopefully see that percentage go
down. Can we do that with the data we're tracking with this pull
request?
Release Notes:
- N/A
Now that we've established a proper eval in tree, this PR reboots our
agent loop back to a set of minimal tools and simpler prompts. We
should aim to get this branch feeling subjectively competitive with
what's on main, then merge it and build from there.
Let's invest in our eval and use it to drive better performance of the
agent loop. How you can help: Pick an example, and then make the outcome
faster or better. It's fine to even use your own subjective judgment, as
our evaluation criteria likely need tuning as well at this point. Focus
on making the agent work better in your own subjective experience first.
Let's focus on simple/practical improvements to make this thing work
better, then determine how we can craft our judgment criteria to lock
those improvements in.
Release Notes:
- N/A
---------
Co-authored-by: Max <max@zed.dev>
Co-authored-by: Antonio <antonio@zed.dev>
Co-authored-by: Agus <agus@zed.dev>
Co-authored-by: Richard <richard@zed.dev>
Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Michael Sloan <mgsloan@gmail.com>
The `always_allow_tool_actions` setting would get overridden with the
default when we loaded each example project, leading to examples
stalling when they ran a tool that needed confirmation. There's now a
separate `runner_settings.json` file where we can configure the
environment for the eval.
Release Notes:
- N/A
---------
Co-authored-by: Oleksiy <oleksiy@zed.dev>
Release Notes:
- Fixed a regression that caused the agent to hang sometimes.
---------
Co-authored-by: Thomas Mickley-Doyle <tmickleydoyle@gmail.com>
Co-authored-by: Nathan Sobo <nathan@zed.dev>
Co-authored-by: Michael Sloan <mgsloan@gmail.com>