ZIm/crates/eval
Antonio Scandurra 97ab0980d1
Start tracking tool failure rates in eval (#29122)
This pull request prints every tool that was used, along with its failure rate.
The goal should be to minimize that failure rate.

@tmickleydoyle: this also changes the telemetry event to report
`tool_metrics` as opposed to `tool_use_counts`. Ideally I'd love to be
able to plot failure rates by tool and hopefully see that percentage go
down. Can we do that with the data we're tracking with this pull
request?
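
A minimal sketch of how per-tool failure rates could be tracked and printed, assuming one counter of uses and one of failures per tool name. The `ToolMetrics` type, its fields, and its methods are hypothetical names chosen for illustration; they are not taken from the eval crate, and the actual `tool_metrics` telemetry payload may be shaped differently.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-tool counters (illustrative only, not the eval crate's type).
#[derive(Debug, Default)]
pub struct ToolMetrics {
    /// Number of times each tool was invoked.
    use_counts: BTreeMap<String, u32>,
    /// Number of invocations of each tool that ended in an error.
    failure_counts: BTreeMap<String, u32>,
}

impl ToolMetrics {
    /// Record one tool invocation and whether it succeeded.
    pub fn record(&mut self, tool_name: &str, succeeded: bool) {
        *self.use_counts.entry(tool_name.to_string()).or_insert(0) += 1;
        if !succeeded {
            *self.failure_counts.entry(tool_name.to_string()).or_insert(0) += 1;
        }
    }

    /// Failure rate in the range 0.0..=1.0, or `None` if the tool was never used.
    pub fn failure_rate(&self, tool_name: &str) -> Option<f64> {
        let uses = *self.use_counts.get(tool_name)?;
        let failures = self.failure_counts.get(tool_name).copied().unwrap_or(0);
        Some(f64::from(failures) / f64::from(uses))
    }

    /// One human-readable line per used tool, sorted by tool name.
    pub fn report(&self) -> Vec<String> {
        self.use_counts
            .keys()
            .map(|name| {
                let rate = self.failure_rate(name).unwrap_or(0.0);
                format!("{name}: {:.1}% failure rate", rate * 100.0)
            })
            .collect()
    }
}
```

Under these assumptions, calling `record("grep", false)` after a failed tool call and `record("grep", true)` after a successful one would give `grep` a failure rate of 0.5. Keeping raw counts rather than precomputed percentages means runs can be summed before a rate is derived, which is what plotting failure rates by tool over time would need.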

Release Notes:

- N/A
2025-04-21 16:16:43 +02:00
examples Rename regex search tool to grep and accept an include glob pattern (#29100) 2025-04-20 00:53:30 +00:00
src Start tracking tool failure rates in eval (#29122) 2025-04-21 16:16:43 +02:00
.gitignore Add judge to new eval + provide LSP diagnostics (#28713) 2025-04-14 20:18:47 +00:00
Cargo.toml Pretty tool inputs in eval output markdown + numbered assistant messages (#29082) 2025-04-19 06:59:22 +00:00
LICENSE-GPL Lay the groundwork for a Rust-based eval (#28488) 2025-04-10 04:45:27 +00:00
README.md Lay the groundwork for a Rust-based eval (#28488) 2025-04-10 04:45:27 +00:00
runner_settings.json eval: Fix stalling on tool confirmation (#28786) 2025-04-15 16:53:45 +00:00

Eval

This eval assumes the working directory is the root of the repository. Run it with:

cargo run -p eval