ZIm/crates/eval/src at 00fd045844a4bfc902863c64a2a331df16fea629 - Yehowshua/ZIm


Richard Feldman	00fd045844	Make language model deserialization more resilient (#31311)	2025-05-28 12:06:07 -04:00
This expands our deserialization of JSON from models to be more tolerant of different variations that the model may send, including capitalization, wrapping things in objects vs. being plain strings, etc. Also, when deserialization fails, it reports the entire error in the JSON so we can see what failed to deserialize. (Previously these errors were very unhelpful for diagnosing the problem.) Finally, it also removes the `WrappedText` variant, since the custom deserializer just turns that style of JSON into a normal `Text` variant.
Release Notes:
- N/A
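The commit message describes accepting several JSON shapes for the same piece of model output and folding a "wrapped" form into a plain `Text` variant. The following is a rough, std-only sketch of that idea, not the code from the commit. It assumes a hypothetical `Json` enum standing in for an already-parsed JSON value (real code would more likely use `serde_json::Value` or a custom serde `Deserialize` impl) and a hypothetical `Content::Text` target type.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal stand-in for a parsed JSON value.
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Str(String),
    Object(BTreeMap<String, Json>),
}

/// Hypothetical normalized content type: wrapped and unwrapped text
/// both end up as `Text`, so no separate "wrapped" variant is needed.
#[derive(Debug, PartialEq)]
enum Content {
    Text(String),
}

/// Case-insensitive key lookup, so `"text"`, `"Text"`, and `"TEXT"` all match.
fn get_ci<'a>(map: &'a BTreeMap<String, Json>, key: &str) -> Option<&'a Json> {
    map.iter()
        .find(|(k, _)| k.eq_ignore_ascii_case(key))
        .map(|(_, v)| v)
}

/// Accepts a plain string, `{"text": ...}` with any key capitalization, or
/// `{"type": "text", "text": ...}` with any tag capitalization. On failure,
/// the error message includes the full value that could not be normalized.
fn normalize(value: &Json) -> Result<Content, String> {
    match value {
        // A bare string is treated as text directly.
        Json::Str(s) => Ok(Content::Text(s.clone())),
        Json::Object(map) => {
            // If a type tag is present, it must say "text" (case-insensitively).
            if let Some(tag) = get_ci(map, "type") {
                match tag {
                    Json::Str(t) if t.eq_ignore_ascii_case("text") => {}
                    _ => return Err(format!("unsupported content: {:?}", value)),
                }
            }
            // The text may itself be a plain string or wrapped in another object.
            match get_ci(map, "text") {
                Some(inner) => normalize(inner),
                None => Err(format!("missing `text` field in {:?}", value)),
            }
        }
    }
}
```

With this sketch, the bare string `"hi"`, `{"Text": "hi"}`, and `{"type": "TEXT", "text": "hi"}` all normalize to `Content::Text("hi")`, while something like `{"type": "image"}` produces an error that prints the whole offending value, in the spirit of the more diagnosable errors the commit mentions.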
examples	Rename `assistant_settings` to `agent_settings` (#31513)	2025-05-27 15:16:55 +00:00
assertions.rs	eval: Count execution errors as failures (#30712)	2025-05-14 20:44:19 +03:00
eval.rs	evals: Configurable judge model (#31282)	2025-05-23 15:03:09 +00:00
example.rs	Rename `assistant_settings` to `agent_settings` (#31513)	2025-05-27 15:16:55 +00:00
explorer.html	eval: Add HTML overview for evaluation runs (#29413)	2025-04-25 17:49:05 +03:00
explorer.rs	evals: Allow threads explorer to search for JSON files recursively (#31509)	2025-05-27 14:18:47 +00:00
ids.rs	Use `anyhow` more idiomatically (#31052)	2025-05-20 23:06:07 +00:00
instance.rs	Make language model deserialization more resilient (#31311)	2025-05-28 12:06:07 -04:00
judge_diff_prompt.hbs	eval: Fine-grained assertions (#29246)	2025-04-22 23:58:58 -03:00
judge_thread_prompt.hbs	eval: Fine-grained assertions (#29246)	2025-04-22 23:58:58 -03:00
tool_metrics.rs	eval: Fine-grained assertions (#29246)	2025-04-22 23:58:58 -03:00