Eval
This eval assumes the working directory is the root of the repository. Run it with:
cargo run -p eval
The eval will optionally read a .env file in crates/eval if you need it to set environment variables, such as API keys.
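To illustrate what "optionally read" can mean in practice, here is a minimal sketch using only the Rust standard library. It is not the eval crate's actual loader: `parse_dotenv` and `load_optional_dotenv` are hypothetical names, and the simple `KEY=VALUE` handling (comments, optional double quotes) is an assumption about the file format.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;

/// Parse `KEY=VALUE` lines, skipping blank lines and `#` comments.
/// Hypothetical helper for illustration; not the eval crate's own parser.
fn parse_dotenv(contents: &str) -> HashMap<String, String> {
    let mut vars = HashMap::new();
    for line in contents.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // Split on the first `=` only, so values may themselves contain `=`.
        if let Some((key, value)) = line.split_once('=') {
            let value = value.trim().trim_matches('"');
            vars.insert(key.trim().to_string(), value.to_string());
        }
    }
    vars
}

/// Read a .env file if it exists; a missing file is not an error.
fn load_optional_dotenv(path: &Path) -> io::Result<HashMap<String, String>> {
    match fs::read_to_string(path) {
        Ok(contents) => Ok(parse_dotenv(&contents)),
        // "Optional" here means: no file, no variables, no failure.
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(HashMap::new()),
        Err(err) => Err(err),
    }
}
```

Calling `load_optional_dotenv(Path::new("crates/eval/.env"))` from the repository root would return an empty map when the file is absent, which matches the working-directory assumption stated above.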
Explorer Tool
The explorer tool generates a self-contained HTML view from one or more thread JSON files. It provides a visual interface to explore the agent thread, including tool calls and results. See ./docs/explorer.md for more details.
Usage
cargo run -p eval --bin explorer -- --input <path-to-json-files> --output <output-html-path>
Example:
cargo run -p eval --bin explorer -- --input ./runs/2025-04-23_15-53-30/fastmcp_bugifx/*/last.messages.json --output /tmp/explorer.html