Zed Improved. Aiming to improve upon Zed and make a truly delightful code editor. https://zed.dev


Max Brunsfeld 36d02de784 Rework eval to support interpretable scores and efficient repetitions (#29197 )	2025-04-22 14:00:09 +00:00

### Problem

We want to start continuously tracking our progress on agent evals over time. As part of this, we'd like the score to have a clear, interpretable meaning. Right now, it's a number from 0 to 5, but it's not clear what any particular number means. In addition, scores vary widely from run to run, because the agent's output is non-deterministic. We try to stabilize the score using a panel of judges, but the behavior of the agent itself varies much more widely than the judges' scores for a given run.

### Solution

* explicit meanings of scores - In this PR, we're requiring the diff and thread criteria files to be unordered lists of assertions. For both the thread and the diff, rather than providing an abstract score, the judge's task is simply to count how many of these assertions are satisfied. A percentage score can be derived by dividing this count by the total number of assertions.
* repetitions - Rather than running each example once and judging it N times, we'll run the example N times. Right now, I'm judging the output only once per run, because I believe that with these clearer scoring criteria, the main source of non-determinism will be the agent's behavior, not the judge's.

### Questions

* accounting for diagnostic errors - Previously, the judge was asked to incorporate diagnostics into their abstract scores. Now that the "score" is determined directly from the criteria, the diagnostics will not be captured in the score. How should the diagnostics be accounted for in the eval? One thought: simply count and report the number of errors remaining after the agent finishes, as a separate field of the run (along with diff score and thread score). We could consider normalizing it by the total lines of added code (for example, errors per 100 lines of code added) to give it some semblance of stability between examples.
* repetitions - How many repetitions should we run on CI? Each repetition takes significant time, but I think running more than one repetition will make the scores significantly less volatile.

### Todo

* [x] Fix `--concurrency` implementation so that only N tasks are spawned
* [x] Support `--repetitions` efficiently (re-using the same worktree)
* [x] Restructure judge prompts to count passing criteria, not compute abstract score
* [x] Report total number of diagnostics in some way
* [x] Format output nicely

Release Notes:

- N/A or Added/Fixed/Improved ...

---------

Co-authored-by: Antonio Scandurra <me@as-cii.com>
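The scoring arithmetic described in the commit message above can be sketched in a few lines of Rust. This is only an illustration under stated assumptions, not the code from the eval crate: the function names `percentage`, `mean_percentage`, and `errors_per_100_lines` are hypothetical, and averaging the per-run percentages across repetitions is an assumed aggregation that the commit message does not specify.

```rust
/// Percentage of criteria assertions the judge counted as satisfied.
/// Returns `None` when there are no assertions or the count is inconsistent.
/// (Hypothetical helper; not taken from the Zed eval crate.)
fn percentage(passed: u32, total: u32) -> Option<f64> {
    if total == 0 || passed > total {
        return None;
    }
    Some(100.0 * f64::from(passed) / f64::from(total))
}

/// Mean percentage over N repetitions of the same example, where each entry
/// is `(passed, total)` for one run.
/// ASSUMPTION: a plain average; the source does not say how runs are combined.
fn mean_percentage(runs: &[(u32, u32)]) -> Option<f64> {
    if runs.is_empty() {
        return None;
    }
    let mut sum = 0.0;
    for &(passed, total) in runs {
        // Propagate `None` if any single run has an invalid count.
        sum += percentage(passed, total)?;
    }
    Some(sum / runs.len() as f64)
}

/// Diagnostic errors remaining after the agent finishes, normalized per
/// 100 lines of added code, as floated in the "Questions" section.
/// Returns `None` when no lines were added, since the ratio is undefined.
fn errors_per_100_lines(errors: u32, lines_added: u32) -> Option<f64> {
    if lines_added == 0 {
        return None;
    }
    Some(100.0 * f64::from(errors) / f64::from(lines_added))
}
```

For example, a run in which the judge counts 3 of 4 assertions as satisfied scores 75%, and two repetitions scoring 3/4 and 2/4 average to 62.5% under the assumed aggregation. Returning `Option` keeps the undefined cases (no assertions, no added lines) explicit rather than reporting a misleading zero.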
.cargo	nix: Clean up build (#27881 )	2025-04-01 22:35:15 +00:00
.cloudflare	docs: Document context servers (#21170 )	2024-11-25 11:05:14 -05:00
.config	workspace-hack: remove openssl from remote_server (#27990 )	2025-04-03 00:49:07 +00:00
.github	Rework eval to support interpretable scores and efficient repetitions (#29197 )	2025-04-22 14:00:09 +00:00
.zed	Add eval worktrees and repos to file_scan_exclusions in zed project settings (#29106 )	2025-04-19 23:43:54 +00:00
assets	agent: Support pasting images as context (#29177 )	2025-04-22 09:01:01 +00:00
crates	Rework eval to support interpretable scores and efficient repetitions (#29197 )	2025-04-22 14:00:09 +00:00
docs	editor: Improve selection highlights speed (#29097 )	2025-04-20 01:20:36 +05:30
extensions	html: Improve syntax highlighting (#28184 )	2025-04-17 13:40:56 -04:00
legal	legal: Terms of Use (2025-02-13) (#24803 )	2025-02-13 09:41:43 -05:00
nix	nix: Add libX11 dependency for X11 support (#28938 )	2025-04-17 12:58:45 -07:00
script	Fix reset_db script (#29067 )	2025-04-18 19:28:14 +00:00
tooling	agent: Attach thread ID and prompt ID to telemetry events (#29069 )	2025-04-18 20:41:02 +00:00
.clinerules	Initial `.rules` file for agent with symlinks for other rules file paths (#29014 )	2025-04-17 23:41:23 +00:00
.cursorrules	Initial `.rules` file for agent with symlinks for other rules file paths (#29014 )	2025-04-17 23:41:23 +00:00
.git-blame-ignore-revs	Add additional git-blame-ignore-revs (#27189 )	2025-03-20 09:17:56 -04:00
.gitattributes	Prevent GitHub from displaying comments within JSON files as errors (#7043 )	2024-01-29 23:11:25 -05:00
.gitignore	Remove .direnv from .gitignore as the correct file is `.envrc` (#29058 )	2025-04-18 16:53:41 +00:00
.mailmap	Add myself (Ben Kunkle) and Smit to the mailmap (#25590 )	2025-02-25 19:55:39 +00:00
.rules	Add a brief description of GPUI 2->GPUI 3 changes to `.rules` (#29180 )	2025-04-21 22:41:15 +00:00
.windsurfrules	Initial `.rules` file for agent with symlinks for other rules file paths (#29014 )	2025-04-17 23:41:23 +00:00
Cargo.lock	Rework eval to support interpretable scores and efficient repetitions (#29197 )	2025-04-22 14:00:09 +00:00
Cargo.toml	Streaming tool calls (#29179 )	2025-04-21 22:28:32 +00:00
CLAUDE.md	Initial `.rules` file for agent with symlinks for other rules file paths (#29014 )	2025-04-17 23:41:23 +00:00
clippy.toml	chore: Fix some violations of 'needless_pass_by_ref_mut' lint (#18795 )	2024-10-07 01:29:58 +02:00
CODE_OF_CONDUCT.md	Remove community content from docs and point to zed.dev (#19895 )	2024-10-29 09:44:58 -04:00
compose.yml	Add Postgrest to Docker Compose (#16498 )	2024-08-19 20:50:45 -04:00
CONTRIBUTING.md	Allow icon themes to provide their own file associations (#24926 )	2025-02-15 00:35:13 +00:00
Cross.toml	Add remote server cross compilation (#19136 )	2024-10-12 23:23:56 -07:00
debug.plist	WIP	2023-12-14 09:25:14 -07:00
default.nix	Fix nix build (#26270 )	2025-03-10 01:06:11 -07:00
docker-compose.sql	collab: Setup database for LLM service (#15882 )	2024-08-06 17:18:08 -04:00
Dockerfile-collab	chore: Bump Rust version to 1.86 (#28021 )	2025-04-03 23:32:50 +02:00
Dockerfile-collab.dockerignore	ci: Move collab to Dockerfile-collab (#18515 )	2024-09-30 16:14:26 -04:00
Dockerfile-cross	Add remote server cross compilation (#19136 )	2024-10-12 23:23:56 -07:00
Dockerfile-cross.dockerignore	Add remote server cross compilation (#19136 )	2024-10-12 23:23:56 -07:00
Dockerfile-distros	Support More Linux (#18480 )	2024-09-30 17:46:21 -04:00
Dockerfile-distros.dockerignore	Support More Linux (#18480 )	2024-09-30 17:46:21 -04:00
flake.lock	nix: Bump rust-overlay for Rust 1.86 (#28181 )	2025-04-14 01:14:54 -07:00
flake.nix	nix: Separate debug output (#27871 )	2025-04-01 14:19:10 -07:00
LICENSE-AGPL	Update license year (#24191 )	2025-02-04 09:02:59 -05:00
LICENSE-APACHE	Update license year (#24191 )	2025-02-04 09:02:59 -05:00
LICENSE-GPL	Licenses: change license fields in Cargo.toml to AGPL-3.0-or-later. (#5535 )	2024-01-27 13:51:16 +01:00
livekit.yaml	Add LiveKit server to Docker Compose (#7907 )	2024-02-16 10:49:48 -05:00
Procfile	Refactor: Restructure collab main function to prepare for new subcommand: `serve llm` (#15824 )	2024-08-05 12:07:38 -07:00
Procfile.postgrest	Fix llm queries (#16006 )	2024-08-08 17:21:38 -07:00
README.md	Format READMEs (#17454 )	2024-09-05 15:39:16 -04:00
renovate.json	renovate: Require dependency dashboard approval for updates (#29065 )	2025-04-18 18:44:30 +00:00
rust-toolchain.toml	chore: Bump Rust version to 1.86 (#28021 )	2025-04-03 23:32:50 +02:00
shell.nix	Fix nix build (#26270 )	2025-03-10 01:06:11 -07:00
typos.toml	Systematically optimize agentic editing performance (#28961 )	2025-04-19 02:47:59 +00:00

README.md

Zed

Welcome to Zed, a high-performance, multiplayer code editor from the creators of Atom and Tree-sitter.

Installation

On macOS and Linux you can download Zed directly or install Zed via your local package manager.

Other platforms are not yet available:

Windows (tracking issue)
Web (tracking issue)

Developing Zed

Contributing

See CONTRIBUTING.md for ways you can contribute to Zed.

Also... we're hiring! Check out our jobs page for open roles.

Licensing

License information for third party dependencies must be correctly provided for CI to pass.

We use cargo-about to automatically comply with open source licenses. If CI is failing, check the following:

Is it showing a `no license specified` error for a crate you've created? If so, add `publish = false` under `[package]` in your crate's `Cargo.toml`.
Is the error `failed to satisfy license requirements` for a dependency? If so, first determine what license the project has and whether this system is sufficient to comply with this license's requirements. If you're unsure, ask a lawyer. Once you've verified that this system is acceptable, add the license's SPDX identifier to the `accepted` array in `script/licenses/zed-licenses.toml`.
Is `cargo-about` unable to find the license for a dependency? If so, add a clarification field at the end of `script/licenses/zed-licenses.toml`, as specified in the cargo-about book.