ZIm/crates/eval/src
Marshall Bowers 5c0b161563
Handle new refusal stop reason from Claude 4 models (#31217)
This PR adds support for handling the new [`refusal` stop
reason](https://docs.anthropic.com/en/docs/test-and-evaluate/strengthen-guardrails/handle-streaming-refusals)
from Claude 4 models.

<img width="409" alt="Screenshot 2025-05-22 at 4 31 56 PM"
src="https://github.com/user-attachments/assets/707b04f5-5a52-4a19-95d9-cbd2be2dd86f"
/>

Release Notes:

- Added handling for `"stop_reason": "refusal"` from Claude 4 models.
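
As a rough illustration of what handling the new stop reason can look like, here is a minimal sketch in Rust. It is not the code from this PR: the `StopReason` enum, `parse`, and `describe` are hypothetical names chosen for the example, and the only details taken from the API are the wire strings such as `"refusal"`.

```rust
/// Hypothetical stop reasons for illustration; not the actual types in this crate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StopReason {
    EndTurn,
    MaxTokens,
    ToolUse,
    Refusal,
}

impl StopReason {
    /// Maps the raw `stop_reason` string to a variant.
    /// Unknown values yield `None` so callers decide how to treat them.
    fn parse(raw: &str) -> Option<Self> {
        match raw {
            "end_turn" => Some(Self::EndTurn),
            "max_tokens" => Some(Self::MaxTokens),
            "tool_use" => Some(Self::ToolUse),
            "refusal" => Some(Self::Refusal),
            _ => None,
        }
    }
}

/// Returns a user-facing message for each stop reason (wording is an assumption).
fn describe(reason: StopReason) -> &'static str {
    match reason {
        StopReason::EndTurn => "The model finished its turn.",
        StopReason::MaxTokens => "The model hit the token limit.",
        StopReason::ToolUse => "The model requested a tool call.",
        StopReason::Refusal => "The model declined to respond to this prompt.",
    }
}
```

Because `describe` matches exhaustively on the enum, adding a new variant such as `Refusal` produces a compile error at every site that has not yet handled it, which is one reason to model stop reasons as an enum instead of comparing strings.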
2025-05-22 16:56:59 -04:00
examples agent: Overwrite files more cautiously (#30649) 2025-05-14 10:40:44 +03:00
assertions.rs eval: Count execution errors as failures (#30712) 2025-05-14 20:44:19 +03:00
eval.rs chore: Make terminal_view own the TerminalSlashCommand (#31070) 2025-05-21 09:27:54 +00:00
example.rs Handle new refusal stop reason from Claude 4 models (#31217) 2025-05-22 16:56:59 -04:00
explorer.html eval: Add HTML overview for evaluation runs (#29413) 2025-04-25 17:49:05 +03:00
explorer.rs Use anyhow more idiomatically (#31052) 2025-05-20 23:06:07 +00:00
ids.rs Use anyhow more idiomatically (#31052) 2025-05-20 23:06:07 +00:00
instance.rs Use anyhow more idiomatically (#31052) 2025-05-20 23:06:07 +00:00
judge_diff_prompt.hbs eval: Fine-grained assertions (#29246) 2025-04-22 23:58:58 -03:00
judge_thread_prompt.hbs eval: Fine-grained assertions (#29246) 2025-04-22 23:58:58 -03:00
tool_metrics.rs eval: Fine-grained assertions (#29246) 2025-04-22 23:58:58 -03:00