ZIm/crates/eval/src at 8a24f9f2803a7eedffa14a26ed6b2a38746d6e4a - Yehowshua/ZIm

Oleksiy Syvokon 68a46c3627 evals: Configurable judge model (#31282)	2025-05-23 15:03:09 +00:00

This is needed for apples-to-apples comparison of different agent models. Another change is that now `cargo -p eval` accepts model names as `provider_id/model_id` instead of separate `--provider` and `--model` params.

Release Notes:
- N/A

examples	agent: Overwrite files more cautiously (#30649)	2025-05-14 10:40:44 +03:00
assertions.rs	eval: Count execution errors as failures (#30712)	2025-05-14 20:44:19 +03:00
eval.rs	evals: Configurable judge model (#31282)	2025-05-23 15:03:09 +00:00
example.rs	Handle new `refusal` stop reason from Claude 4 models (#31217)	2025-05-22 16:56:59 -04:00
explorer.html	eval: Add HTML overview for evaluation runs (#29413)	2025-04-25 17:49:05 +03:00
explorer.rs	Use `anyhow` more idiomatically (#31052)	2025-05-20 23:06:07 +00:00
ids.rs	Use `anyhow` more idiomatically (#31052)	2025-05-20 23:06:07 +00:00
instance.rs	Use `anyhow` more idiomatically (#31052)	2025-05-20 23:06:07 +00:00
judge_diff_prompt.hbs	eval: Fine-grained assertions (#29246)	2025-04-22 23:58:58 -03:00
judge_thread_prompt.hbs	eval: Fine-grained assertions (#29246)	2025-04-22 23:58:58 -03:00
tool_metrics.rs	eval: Fine-grained assertions (#29246)	2025-04-22 23:58:58 -03:00
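
The latest commit message says model names are now passed as `provider_id/model_id` rather than through separate `--provider` and `--model` params. The listing does not show how the crate parses that string, so the following is only a minimal sketch of one way it could be done. `ModelSpec` and `parse_model_spec` are hypothetical names, not taken from `eval.rs`, and splitting on the first `/` is an assumption.

```rust
/// Hypothetical type for illustration; not part of the eval crate.
#[derive(Debug, PartialEq)]
struct ModelSpec {
    provider_id: String,
    model_id: String,
}

/// Parses a `provider_id/model_id` string.
///
/// Assumption: the string is split on the FIRST `/`, so a model id that
/// itself contains a slash stays intact. Both halves must be non-empty.
fn parse_model_spec(input: &str) -> Result<ModelSpec, String> {
    match input.split_once('/') {
        Some((provider, model)) if !provider.is_empty() && !model.is_empty() => {
            Ok(ModelSpec {
                provider_id: provider.to_string(),
                model_id: model.to_string(),
            })
        }
        // No slash at all, or an empty provider or model part.
        _ => Err(format!("expected `provider_id/model_id`, got `{input}`")),
    }
}
```

Under these assumptions, `parse_model_spec("example_provider/example-model")` yields a `ModelSpec` with the two parts separated, while an input without a slash returns an error instead of silently falling back to a default provider.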