Antonio Scandurra 97ab0980d1 Start tracking tool failure rates in eval (#29122)	2025-04-21 16:16:43 +02:00

This pull request will print all the used tools and their failure rates. The objective goal should be to minimize that failure rate.

@tmickleydoyle: this also changes the telemetry event to report `tool_metrics` as opposed to `tool_use_counts`. Ideally I'd love to be able to plot failure rates by tool and hopefully see that percentage go down. Can we do that with the data we're tracking with this pull request?

Release Notes:

- N/A
examples	Rename regex search tool to grep and accept an include glob pattern (#29100)	2025-04-20 00:53:30 +00:00
src	Start tracking tool failure rates in eval (#29122)	2025-04-21 16:16:43 +02:00
.gitignore	Add judge to new eval + provide LSP diagnostics (#28713)	2025-04-14 20:18:47 +00:00
Cargo.toml	Pretty tool inputs in eval output markdown + numbered assistant messages (#29082)	2025-04-19 06:59:22 +00:00
LICENSE-GPL	Lay the groundwork for a Rust-based eval (#28488)	2025-04-10 04:45:27 +00:00
README.md	Lay the groundwork for a Rust-based eval (#28488)	2025-04-10 04:45:27 +00:00
runner_settings.json	eval: Fix stalling on tool confirmation (#28786)	2025-04-15 16:53:45 +00:00

Eval

This eval assumes the working directory is the root of the repository. Run it with:

cargo run -p eval
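
The latest commit to this directory mentions printing every tool used in a run together with its failure rate. As a rough illustration of that kind of bookkeeping, here is a minimal sketch. The `ToolMetrics` struct, its fields, and its methods are hypothetical names chosen for this example and are not taken from the eval crate; the real `tool_metrics` telemetry payload may be shaped differently.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-run tool statistics. The type and field names are
/// assumptions for illustration, not the eval crate's actual definitions.
#[derive(Debug, Default)]
pub struct ToolMetrics {
    /// Number of times each tool was invoked.
    pub use_counts: BTreeMap<String, u32>,
    /// Number of invocations of each tool that failed.
    pub failure_counts: BTreeMap<String, u32>,
}

impl ToolMetrics {
    /// Record one tool invocation and whether it succeeded.
    pub fn record(&mut self, tool: &str, succeeded: bool) {
        *self.use_counts.entry(tool.to_string()).or_insert(0) += 1;
        if !succeeded {
            *self.failure_counts.entry(tool.to_string()).or_insert(0) += 1;
        }
    }

    /// Failure rate per tool as a fraction in [0, 1], ordered by tool name.
    /// Every tool in `use_counts` has at least one use, so the division
    /// never sees a zero denominator.
    pub fn failure_rates(&self) -> Vec<(String, f64)> {
        self.use_counts
            .iter()
            .map(|(tool, &uses)| {
                let failures = self.failure_counts.get(tool).copied().unwrap_or(0);
                (tool.clone(), failures as f64 / uses as f64)
            })
            .collect()
    }
}
```

Keeping uses and failures as separate counters, rather than storing a precomputed percentage, means the raw counts can be summed across runs before a rate is derived, which is what plotting failure rates by tool over time would need.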