eval: Add HTML overview for evaluation runs (#29413)

This update generates a single self-contained .html file that shows an
overview of evaluation threads in the browser. It's useful for:

- Quickly reviewing results
- Sharing evaluation runs
- Debugging
- Comparing models (TBD)

Features:

- Export thread JSON from the UI
- Keyboard navigation (j/k or Ctrl + ←/→)
- Toggle between compact and full views

Generating the overview:

- `cargo run -p eval` writes this file to the root of the run directory.
- Alternatively, `cargo run -p eval --bin explorer` generates it
  without running the evals.


Screenshot:

![image](https://github.com/user-attachments/assets/4ead71f6-da08-48ea-8fcb-2148d2e4b4db)


Release Notes:

- N/A
Oleksiy Syvokon 2025-04-25 17:49:05 +03:00 committed by GitHub
parent f106dfca42
commit 3389327df5
7 changed files with 1351 additions and 149 deletions


@@ -5,3 +5,21 @@ This eval assumes the working directory is the root of the repository. Run it wi
```sh
cargo run -p eval
```
## Explorer Tool
The explorer tool generates a self-contained HTML view from one or more thread
JSON files. It provides a visual interface for exploring the agent thread, including
tool calls and results. See [./docs/explorer.md](./docs/explorer.md) for more details.
### Usage
```sh
cargo run -p eval --bin explorer -- --input <path-to-json-files> --output <output-html-path>
```
Example:
```sh
cargo run -p eval --bin explorer -- --input ./runs/2025-04-23_15-53-30/fastmcp_bugifx/*/last.messages.json --output /tmp/explorer.html
```
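To cover every thread of a run without spelling out a glob, the thread files can be collected with `find` first. This is only a sketch: `thread_files` and `build_overview` are hypothetical helper names, not part of the eval crate, and it assumes that every thread is stored as `last.messages.json` (as in the example above) and that no path contains whitespace.

```sh
#!/bin/sh
# Sketch only: thread_files and build_overview are hypothetical helpers.

# Print every last.messages.json under the given run directory, sorted.
thread_files() {
    find "$1" -type f -name 'last.messages.json' | sort
}

# Pass all thread files to the explorer in a single invocation.
# The file list is word-split on purpose, so paths must not contain whitespace.
build_overview() {
    # shellcheck disable=SC2046
    cargo run -p eval --bin explorer -- \
        --input $(thread_files "$1") \
        --output "$2"
}

# Usage: build_overview ./runs/2025-04-23_15-53-30 /tmp/explorer.html
```

Unlike a fixed glob, `find` does not depend on how deeply the thread directories are nested below the run directory.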