eval: Add HTML overview for evaluation runs (#29413)
This update generates a single self-contained .html file that shows an overview of evaluation threads in the browser.

It's useful for:
- Quickly reviewing results
- Sharing evaluation runs
- Debugging
- Comparing models (TBD)

Features:
- Export thread JSON from the UI
- Keyboard navigation (j/k or Ctrl + ←/→)
- Toggle between compact and full views

Generating the overview:
- `cargo run -p eval` will write this file in the run dir's root.
- Or you can call `cargo run -p eval --bin explorer` to generate it without running evals.

Screenshot: (image omitted)

Release Notes:
- N/A
parent f106dfca42
commit 3389327df5
7 changed files with 1351 additions and 149 deletions
@@ -5,3 +5,21 @@ This eval assumes the working directory is the root of the repository. Run it wi

```sh
cargo run -p eval
```

## Explorer Tool

The explorer tool generates a self-contained HTML view from one or more thread
JSON files. It provides a visual interface to explore the agent thread, including
tool calls and results. See [./docs/explorer.md](./docs/explorer.md) for more details.

### Usage

```sh
cargo run -p eval --bin explorer -- --input <path-to-json-files> --output <output-html-path>
```

Example:

```sh
cargo run -p eval --bin explorer -- --input ./runs/2025-04-23_15-53-30/fastmcp_bugifx/*/last.messages.json --output /tmp/explorer.html
```
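The example above relies on a shell glob to pass several thread files to `--input`. As a rough sketch only, the same command could be assembled for every thread file under a run directory. The function name `explorer_cmd` is hypothetical and not part of the repository, and the file name `last.messages.json` is assumed from the example path. The function only prints the command, so it can be inspected before anything runs.

```sh
# Hypothetical helper (not part of the eval crate): print the explorer
# command for all thread JSON files found under one run directory.
# Assumes thread files are named last.messages.json, as in the example
# above, and that paths contain no whitespace.
explorer_cmd() {
    run_dir=$1
    output=$2

    # Collect thread files in a stable, sorted order.
    files=$(find "$run_dir" -type f -name 'last.messages.json' | sort)
    if [ -z "$files" ]; then
        echo "no last.messages.json files under $run_dir" >&2
        return 1
    fi

    # Build the argument list: the fixed prefix, every input file, the output path.
    set -- cargo run -p eval --bin explorer -- --input
    for f in $files; do
        set -- "$@" "$f"
    done
    set -- "$@" --output "$output"

    # Dry run: print the command instead of executing it.
    echo "$@"
}
```

Calling `explorer_cmd ./runs/2025-04-23_15-53-30 /tmp/explorer.html` prints a single `cargo run` line. Replacing the final `echo "$@"` with `"$@"` would execute the command instead of printing it.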