ZIm/crates/eval/src
Oleksiy Syvokon 3884de937b
assistant: Partial fix for HTML entities in tools params (#32148)
This problem seems to be specific to Opus 4. Eval shows improvement from
89% to 97%.

Closes: https://github.com/zed-industries/zed/issues/32060

Release Notes:

- N/A

Co-authored-by: Ben Brandt <benjamin.j.brandt@gmail.com>
2025-06-05 10:36:55 +00:00
examples assistant: Partial fix for HTML entities in tools params (#32148) 2025-06-05 10:36:55 +00:00
assertions.rs eval: Count execution errors as failures (#30712) 2025-05-14 20:44:19 +03:00
eval.rs evals: Configurable judge model (#31282) 2025-05-23 15:03:09 +00:00
example.rs agent: Generate a notification when reaching tool use limit (#31894) 2025-06-02 21:57:42 -03:00
explorer.html eval: Add HTML overview for evaluation runs (#29413) 2025-04-25 17:49:05 +03:00
explorer.rs evals: Allow threads explorer to search for JSON files recursively (#31509) 2025-05-27 14:18:47 +00:00
ids.rs Use anyhow more idiomatically (#31052) 2025-05-20 23:06:07 +00:00
instance.rs Pass up intent with completion requests (#31710) 2025-05-29 20:43:12 +00:00
judge_diff_prompt.hbs eval: Fine-grained assertions (#29246) 2025-04-22 23:58:58 -03:00
judge_thread_prompt.hbs eval: Fine-grained assertions (#29246) 2025-04-22 23:58:58 -03:00
tool_metrics.rs eval: Fine-grained assertions (#29246) 2025-04-22 23:58:58 -03:00