ZIm/crates/eval
Richard Feldman 00fd045844
Make language model deserialization more resilient (#31311)
This expands our deserialization of JSON from models to be more tolerant
of different variations that the model may send, including
capitalization, wrapping things in objects vs. being plain strings, etc.

Also when deserialization fails, it reports the entire error in the JSON
so we can see what failed to deserialize. (Previously these errors were
very unhelpful at diagnosing the problem.)

Finally, also removes the `WrappedText` variant since the custom
deserializer just turns that style of JSON into a normal `Text` variant.

Release Notes:

- N/A
2025-05-28 12:06:07 -04:00
..
docs eval: Add HTML overview for evaluation runs (#29413) 2025-04-25 17:49:05 +03:00
src Make language model deserialization more resilient (#31311) 2025-05-28 12:06:07 -04:00
.gitignore Add judge to new eval + provide LSP diagnostics (#28713) 2025-04-14 20:18:47 +00:00
Cargo.toml Rename assistant_settings to agent_settings (#31513) 2025-05-27 15:16:55 +00:00
LICENSE-GPL Lay the groundwork for a Rust-based eval (#28488) 2025-04-10 04:45:27 +00:00
README.md eval: Add support for reading from a .env file (#29426) 2025-04-25 15:53:02 +00:00
runner_settings.json Introduce a new StreamingEditFileTool (#29733) 2025-05-01 17:37:43 +02:00

Eval

This eval assumes the working directory is the root of the repository. Run it with:

cargo run -p eval

The eval will optionally read a .env file in crates/eval if you need it to set environment variables, such as API keys.

Explorer Tool

The explorer tool generates a self-contained HTML view from one or more thread JSON file. It provides a visual interface to explore the agent thread, including tool calls and results. See ./docs/explorer.md for more details.

Usage

cargo run -p eval --bin explorer -- --input <path-to-json-files> --output <output-html-path>

Example:

cargo run -p eval --bin explorer -- --input ./runs/2025-04-23_15-53-30/fastmcp_bugifx/*/last.messages.json --output /tmp/explorer.html