ZIm/crates/eval/src
Richard Feldman 00fd045844
Make language model deserialization more resilient (#31311)
This expands our deserialization of JSON from models to be more tolerant
of different variations that the model may send, including
capitalization, wrapping things in objects vs. being plain strings, etc.

Also, when deserialization fails, it reports the entire error in the JSON
so we can see what failed to deserialize. (Previously, these errors were
of little help in diagnosing the problem.)

Finally, this removes the `WrappedText` variant, since the custom
deserializer turns that style of JSON into a normal `Text` variant.
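
The tolerant parsing described above might look roughly like the following. This is a minimal, dependency-free sketch and not the code from this commit: the `Json` and `Content` types and the `parse_content` function are hypothetical stand-ins, and the real change presumably works through a custom deserializer on the actual model types. It accepts either a plain string or an object whose `text` key is matched case-insensitively, maps both to the same `Text` variant, and includes the offending value in the error.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a parsed JSON value (only the shapes needed here).
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Str(String),
    Object(HashMap<String, Json>),
}

/// Hypothetical content type with a single `Text` variant.
#[derive(Debug, PartialEq)]
enum Content {
    Text(String),
}

/// Accepts a plain string, or an object whose `text` key (in any
/// capitalization) holds a string. Both shapes become `Content::Text`.
/// On failure, the error message embeds the value that could not be parsed.
fn parse_content(value: &Json) -> Result<Content, String> {
    match value {
        // Plain string: "hello"
        Json::Str(s) => Ok(Content::Text(s.clone())),
        // Wrapped form: {"text": "hello"}, {"Text": "hello"}, {"TEXT": "hello"}
        Json::Object(map) => map
            .iter()
            .find(|(key, _)| key.eq_ignore_ascii_case("text"))
            .and_then(|(_, inner)| match inner {
                Json::Str(s) => Some(Content::Text(s.clone())),
                Json::Object(_) => None,
            })
            .ok_or_else(|| format!("could not deserialize content from {value:?}")),
    }
}
```

Calling `parse_content` on `Json::Str("hello".to_string())` and on an object with a `"Text"` key holding the same string yields the same `Content::Text` value, which is why a separate wrapped variant is unnecessary in this sketch.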

Release Notes:

- N/A
2025-05-28 12:06:07 -04:00
examples Rename assistant_settings to agent_settings (#31513) 2025-05-27 15:16:55 +00:00
assertions.rs eval: Count execution errors as failures (#30712) 2025-05-14 20:44:19 +03:00
eval.rs evals: Configurable judge model (#31282) 2025-05-23 15:03:09 +00:00
example.rs Rename assistant_settings to agent_settings (#31513) 2025-05-27 15:16:55 +00:00
explorer.html eval: Add HTML overview for evaluation runs (#29413) 2025-04-25 17:49:05 +03:00
explorer.rs evals: Allow threads explorer to search for JSON files recursively (#31509) 2025-05-27 14:18:47 +00:00
ids.rs Use anyhow more idiomatically (#31052) 2025-05-20 23:06:07 +00:00
instance.rs Make language model deserialization more resilient (#31311) 2025-05-28 12:06:07 -04:00
judge_diff_prompt.hbs eval: Fine-grained assertions (#29246) 2025-04-22 23:58:58 -03:00
judge_thread_prompt.hbs eval: Fine-grained assertions (#29246) 2025-04-22 23:58:58 -03:00
tool_metrics.rs eval: Fine-grained assertions (#29246) 2025-04-22 23:58:58 -03:00