ZIm/crates/eval/src
Oleksiy Syvokon 6420df3975
eval: Count execution errors as failures (#30712)
- Evals returning an error (e.g., LLM API format mismatch) were silently
skipped in the aggregated results. Now we count them as a failure (0%
success score).

- Setting the `VERBOSE` environment variable to any non-empty value
disables string truncation.
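
The two changes above can be illustrated with a minimal sketch. This is not the code from the commit: `EvalOutcome`, `mean_score`, `verbose_enabled`, and `truncate_for_display` are hypothetical names, and the real aggregation and truncation logic in `eval.rs` may differ. It only assumes that each eval run either yields a score between 0.0 and 1.0 or an execution error.

```rust
use std::env;

/// Hypothetical outcome of one eval run: a success score in 0.0..=1.0,
/// or an execution error (e.g., an LLM API format mismatch).
type EvalOutcome = Result<f32, String>;

/// Mean success score over all runs. Errored runs contribute 0.0
/// instead of being skipped, so they lower the aggregate.
fn mean_score(outcomes: &[EvalOutcome]) -> f32 {
    if outcomes.is_empty() {
        return 0.0;
    }
    let total: f32 = outcomes
        .iter()
        .map(|outcome| match outcome {
            Ok(score) => *score,
            Err(_) => 0.0,
        })
        .sum();
    total / outcomes.len() as f32
}

/// True when `VERBOSE` is set to a non-empty value.
/// An unset (or non-Unicode) variable is treated as "not verbose".
fn verbose_enabled() -> bool {
    env::var("VERBOSE").map(|v| !v.is_empty()).unwrap_or(false)
}

/// Truncates `s` to `max_chars` characters and appends "..." unless
/// `verbose` is true or the string is already short enough.
fn truncate_for_display(s: &str, max_chars: usize, verbose: bool) -> String {
    if verbose || s.chars().count() <= max_chars {
        return s.to_string();
    }
    let mut out: String = s.chars().take(max_chars).collect();
    out.push_str("...");
    out
}
```

A caller would typically pass `verbose_enabled()` as the `verbose` argument. Taking it as a parameter rather than reading the environment inside `truncate_for_display` keeps the function easy to test.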

Release Notes:

- N/A
2025-05-14 20:44:19 +03:00
examples agent: Overwrite files more cautiously (#30649) 2025-05-14 10:40:44 +03:00
assertions.rs eval: Count execution errors as failures (#30712) 2025-05-14 20:44:19 +03:00
eval.rs eval: Count execution errors as failures (#30712) 2025-05-14 20:44:19 +03:00
example.rs agent: Overwrite files more cautiously (#30649) 2025-05-14 10:40:44 +03:00
explorer.html eval: Add HTML overview for evaluation runs (#29413) 2025-04-25 17:49:05 +03:00
explorer.rs eval: Add HTML overview for evaluation runs (#29413) 2025-04-25 17:49:05 +03:00
ids.rs Add new action to run agent eval (#29158) 2025-04-21 21:30:21 -07:00
instance.rs agent: Overwrite files more cautiously (#30649) 2025-05-14 10:40:44 +03:00
judge_diff_prompt.hbs eval: Fine-grained assertions (#29246) 2025-04-22 23:58:58 -03:00
judge_thread_prompt.hbs eval: Fine-grained assertions (#29246) 2025-04-22 23:58:58 -03:00
tool_metrics.rs eval: Fine-grained assertions (#29246) 2025-04-22 23:58:58 -03:00