eval: Add HTML overview for evaluation runs (#29413)

This update generates a single self-contained .html file that shows an overview of evaluation threads in the browser.

It's useful for:

- Quickly reviewing results
- Sharing evaluation runs
- Debugging
- Comparing models (TBD)

Features:

- Export thread JSON from the UI
- Keyboard navigation (j/k or Ctrl + ←/→)
- Toggle between compact and full views

Generating the overview:

- `cargo run -p eval` will write this file in the run dir's root.
- Or you can call `cargo run -p eval --bin explorer` to generate it without running evals.

Screenshot:

![image](https://github.com/user-attachments/assets/4ead71f6-da08-48ea-8fcb-2148d2e4b4db)

Release Notes:

- N/A
2025-04-25 17:49:05 +03:00 · 2025-04-25 17:49:05 +03:00 · 3389327df5
commit 3389327df5
parent f106dfca42
7 changed files with 1351 additions and 149 deletions
--- a/crates/eval/README.md
+++ b/crates/eval/README.md
@@ -5,3 +5,21 @@ This eval assumes the working directory is the root of the repository. Run it wi
 ```sh
 cargo run -p eval
 ```
+
+## Explorer Tool
+
+The explorer tool generates a self-contained HTML view from one or more thread
+JSON files. It provides a visual interface to explore the agent thread, including
+tool calls and results. See [./docs/explorer.md](./docs/explorer.md) for more details.
+
+### Usage
+
+```sh
+cargo run -p eval --bin explorer -- --input <path-to-json-files> --output <output-html-path>
+```
+
+Example:
+
+```sh
+cargo run -p eval --bin explorer -- --input ./runs/2025-04-23_15-53-30/fastmcp_bugifx/*/last.messages.json --output /tmp/explorer.html
+```