ZIm/crates/eval
Nathan Sobo 458ffaa134
Add new action to run agent eval (#29158)
The old one wasn't linking, and
https://github.com/zed-industries/zed/pull/29081 has a bunch of merge
conflicts. Wanted to start simple/small.

## Todo

* [x] Remove low-signal examples
* [x] Make the eval run on a cron, on main, and on any PR with the
`run-eval` label
* [x] Noise in logs about failure to write settings
    ```
[2025-04-21T20:45:04Z ERROR settings] Failed to write settings to file
"/home/runner/.config/zed/settings.json"
    
       Caused by:
No such file or directory (os error 2) at path
"/home/runner/.config/zed/.tmpLewFEs"
    ```
* [x] `Agentic loop stalled`
(https://github.com/zed-industries/zed/actions/runs/14581044243/job/40897622894)
* [x] Make sure that events are recorded in snowflake
* [ ] Change judge criteria to be more explicit about meanings of scores

Release Notes:

- N/A

---------

Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Agus Zubiaga <hi@aguz.me>
Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
Co-authored-by: Thomas Mickley-Doyle <tmickleydoyle@gmail.com>
2025-04-21 21:30:21 -07:00
..
examples/find_and_replace_diff_card Add new action to run agent eval (#29158) 2025-04-21 21:30:21 -07:00
src Add new action to run agent eval (#29158) 2025-04-21 21:30:21 -07:00
.gitignore Add judge to new eval + provide LSP diagnostics (#28713) 2025-04-14 20:18:47 +00:00
Cargo.toml Write {result_count}.diff and last.diff eval run outputs (#29181) 2025-04-21 23:19:07 +00:00
LICENSE-GPL Lay the groundwork for a Rust-based eval (#28488) 2025-04-10 04:45:27 +00:00
README.md Lay the groundwork for a Rust-based eval (#28488) 2025-04-10 04:45:27 +00:00
runner_settings.json eval: Fix stalling on tool confirmation (#28786) 2025-04-15 16:53:45 +00:00

Eval

This eval assumes the working directory is the root of the repository. Run it with:

cargo run -p eval