ZIm/crates/eval/src
Richard Feldman 49887d6934
Add no_tools_enabled eval (#30537)
This is our first eval of the Minimal tool profile. Right now they're
all passing; the value of having it is to catch regressions in the
system prompt (which has special logic in it for the case where no tools
are enabled).

Release Notes:

- N/A
2025-05-12 08:52:03 +00:00
examples Add no_tools_enabled eval (#30537) 2025-05-12 08:52:03 +00:00
assertions.rs eval: Fine-grained assertions (#29246) 2025-04-22 23:58:58 -03:00
eval.rs Wait to locate system-installed Node until the shell environment is loaded (#30416) 2025-05-09 19:24:28 +00:00
example.rs agent: Handle attempts to use hallucinated tools (#29946) 2025-05-05 19:31:11 +00:00
explorer.html eval: Add HTML overview for evaluation runs (#29413) 2025-04-25 17:49:05 +03:00
explorer.rs eval: Add HTML overview for evaluation runs (#29413) 2025-04-25 17:49:05 +03:00
ids.rs Add new action to run agent eval (#29158) 2025-04-21 21:30:21 -07:00
instance.rs Include EditAgent's raw output when inspecting thread (#30337) 2025-05-09 06:58:45 +00:00
judge_diff_prompt.hbs eval: Fine-grained assertions (#29246) 2025-04-22 23:58:58 -03:00
judge_thread_prompt.hbs eval: Fine-grained assertions (#29246) 2025-04-22 23:58:58 -03:00
tool_metrics.rs eval: Fine-grained assertions (#29246) 2025-04-22 23:58:58 -03:00