ZIm/crates/eval/src
Oleksiy Syvokon 68a46c3627
evals: Configurable judge model (#31282)
This is needed for apples-to-apples comparison of different agent
models.

Another change: `cargo run -p eval` now accepts model names as
`provider_id/model_id` instead of separate `--provider` and `--model`
params.
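
To illustrate the `provider_id/model_id` format mentioned above, here is a minimal Rust sketch of how such a string could be split into its two parts. `ModelSpec` and `parse_model_spec` are hypothetical names, not taken from the eval crate, and splitting on the first `/` (so that a model id may itself contain slashes) is an assumption rather than the crate's confirmed behavior.

```rust
/// Hypothetical holder for a parsed model spec; not a type from the eval crate.
#[derive(Debug, PartialEq)]
struct ModelSpec {
    provider_id: String,
    model_id: String,
}

/// Splits `provider_id/model_id` on the first `/`.
/// Assumption: anything after the first slash belongs to the model id.
fn parse_model_spec(spec: &str) -> Result<ModelSpec, String> {
    let (provider_id, model_id) = spec
        .split_once('/')
        .ok_or_else(|| format!("expected `provider_id/model_id`, got `{spec}`"))?;

    // Reject specs such as "/model" or "provider/" where one half is missing.
    if provider_id.is_empty() || model_id.is_empty() {
        return Err(format!("expected `provider_id/model_id`, got `{spec}`"));
    }

    Ok(ModelSpec {
        provider_id: provider_id.to_string(),
        model_id: model_id.to_string(),
    })
}
```

With this sketch, `parse_model_spec("some_provider/some_model")` yields a `ModelSpec` with both fields filled in, while a string without a slash returns an error instead of silently guessing a provider.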


Release Notes:

- N/A
2025-05-23 15:03:09 +00:00
..
examples                 agent: Overwrite files more cautiously (#30649)               2025-05-14 10:40:44 +03:00
assertions.rs            eval: Count execution errors as failures (#30712)             2025-05-14 20:44:19 +03:00
eval.rs                  evals: Configurable judge model (#31282)                      2025-05-23 15:03:09 +00:00
example.rs               Handle new refusal stop reason from Claude 4 models (#31217)  2025-05-22 16:56:59 -04:00
explorer.html            eval: Add HTML overview for evaluation runs (#29413)          2025-04-25 17:49:05 +03:00
explorer.rs              Use anyhow more idiomatically (#31052)                        2025-05-20 23:06:07 +00:00
ids.rs                   Use anyhow more idiomatically (#31052)                        2025-05-20 23:06:07 +00:00
instance.rs              Use anyhow more idiomatically (#31052)                        2025-05-20 23:06:07 +00:00
judge_diff_prompt.hbs    eval: Fine-grained assertions (#29246)                        2025-04-22 23:58:58 -03:00
judge_thread_prompt.hbs  eval: Fine-grained assertions (#29246)                        2025-04-22 23:58:58 -03:00
tool_metrics.rs          eval: Fine-grained assertions (#29246)                        2025-04-22 23:58:58 -03:00