
Richard Feldman d7004030b3 Code block evals (#29619) Add a targeted eval for code block formatting, and revise the system prompt accordingly. Eval before, n=8: [image: eval before]. Eval after prompt change, n=8 (excluding the new evals, so just testing the prompt change): [image: eval after]. Release Notes: N/A		2025-04-29 18:52:09 -04:00
docs	eval: Add HTML overview for evaluation runs (#29413)	2025-04-25 17:49:05 +03:00
src	Code block evals (#29619)	2025-04-29 18:52:09 -04:00
.gitignore	Add judge to new eval + provide LSP diagnostics (#28713)	2025-04-14 20:18:47 +00:00
Cargo.toml	Code block evals (#29619)	2025-04-29 18:52:09 -04:00
LICENSE-GPL	Lay the groundwork for a Rust-based eval (#28488)	2025-04-10 04:45:27 +00:00
README.md	eval: Add support for reading from a `.env` file (#29426)	2025-04-25 15:53:02 +00:00
runner_settings.json	eval: Fix stalling on tool confirmation (#28786)	2025-04-15 16:53:45 +00:00

README.md

Eval

This eval assumes the working directory is the root of the repository. Run it with:

cargo run -p eval

The eval optionally reads a .env file in crates/eval, which you can use to set environment variables such as API keys.
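As a minimal sketch, a .env file could be created from the repository root as shown here. It assumes the usual dotenv layout of one KEY=VALUE pair per line, which this README does not spell out. EXAMPLE_API_KEY is a placeholder name, not a variable the eval is known to read; substitute whatever your model provider expects.

```shell
# Run from the repository root.
# Assumption: the file uses one KEY=VALUE pair per line.
# EXAMPLE_API_KEY is a hypothetical placeholder, not a documented variable.
mkdir -p crates/eval

# Note: this overwrites any existing crates/eval/.env.
printf 'EXAMPLE_API_KEY=replace-me\n' > crates/eval/.env
```

Because the file is optional, variables that are already exported in your shell should not need to be repeated in it.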

Explorer Tool

The explorer tool generates a self-contained HTML view from one or more thread JSON files. It provides a visual interface for exploring the agent thread, including tool calls and their results. See ./docs/explorer.md for more details.

Usage

cargo run -p eval --bin explorer -- --input <path-to-json-files> --output <output-html-path>

Example:

cargo run -p eval --bin explorer -- --input ./runs/2025-04-23_15-53-30/fastmcp_bugifx/*/last.messages.json --output /tmp/explorer.html