eval: Add HTML overview for evaluation runs (#29413)

This update generates a single self-contained .html file that shows an overview of evaluation threads in the browser.

It's useful for:

- Quickly reviewing results
- Sharing evaluation runs
- Debugging
- Comparing models (TBD)

Features:

- Export thread JSON from the UI
- Keyboard navigation (j/k or Ctrl + ←/→)
- Toggle between compact and full views

Generating the overview:

- `cargo run -p eval` will write this file in the run dir's root.
- Or you can call `cargo run -p eval --bin explorer` to generate it without running evals.

Screenshot:

![image](https://github.com/user-attachments/assets/4ead71f6-da08-48ea-8fcb-2148d2e4b4db)

Release Notes:

- N/A
2025-04-25 17:49:05 +03:00 · 3389327df5
commit 3389327df5
parent f106dfca42
7 changed files with 1351 additions and 149 deletions
--- a/crates/eval/docs/explorer.md
+++ b/crates/eval/docs/explorer.md
@@ -0,0 +1,27 @@
+# Explorer
+
+Threads Explorer is a single self-contained HTML file that gives an overview of
+evaluation runs, while allowing for some interactivity.
+
+When you open the file, it shows a _thread overview_, which looks like this:
+
+| Turn | Text                                 | Tool                                         | Result                                        |
+| ---- | ------------------------------------ | -------------------------------------------- | --------------------------------------------- |
+| 1    | [User]:                              |                                              |                                               |
+|      | Fix the bug: kwargs not passed...    |                                              |                                               |
+| 2    | I'll help you fix that bug.          | **list_directory**(path="fastmcp")           | `fastmcp/src [...]`                           |
+|      |                                      |                                              |                                               |
+| 3    | Let's examine the code.              | **read_file**(path="fastmcp/main.py", [...]) | `def run_application(app, **kwargs): [...]`   |
+| 4    | I found the issue.                   | **edit_file**(path="fastmcp/core.py", [...]) | `Made edit to fastmcp/core.py`                |
+| 5    | Let's check if there are any errors. | **diagnostics**()                            | `No errors found`                             |
+
+### Implementation details
+
+`src/explorer.html` contains the template. You can open this template in a
+browser as is, and it will show some dummy values. Its main use, however, is to
+set the `threadsData` variable to real data, which is then rendered instead of
+the dummy values.
+
+`src/explorer.rs` takes one or more JSON files generated by
+`cargo run -p eval` and outputs an HTML file that renders these threads. Refer
+to the dummy data in `explorer.html` for a sample of the format.
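
The template-plus-data flow described in the implementation details can be illustrated with a small Rust sketch. This is not the code in `src/explorer.rs`; it only shows one plausible way to set `threadsData` from several JSON documents. The placeholder token `/*THREADS_DATA_PLACEHOLDER*/` and the functions `combine_threads`, `escape_for_script`, and `render_explorer` are all hypothetical names introduced for illustration, and the sample JSON does not reflect the real thread format.

```rust
// Hypothetical sketch: the placeholder token and all function names below are
// assumptions for illustration, not taken from `src/explorer.rs`.

/// Token assumed to mark where real data is injected into the template.
const PLACEHOLDER: &str = "/*THREADS_DATA_PLACEHOLDER*/";

/// Joins already-serialized JSON documents into a single JSON array literal.
fn combine_threads(json_docs: &[String]) -> String {
    format!("[{}]", json_docs.join(","))
}

/// Escapes `</` so that embedded JSON cannot close the surrounding <script> tag.
/// `\/` is a valid escape for `/` in both JSON and JavaScript strings.
fn escape_for_script(json: &str) -> String {
    json.replace("</", "<\\/")
}

/// Replaces the first placeholder with a `threadsData` assignment.
/// Returns `None` if the template does not contain the placeholder.
fn render_explorer(template: &str, json_docs: &[String]) -> Option<String> {
    if !template.contains(PLACEHOLDER) {
        return None;
    }
    let data = escape_for_script(&combine_threads(json_docs));
    let assignment = format!("threadsData = {};", data);
    Some(template.replacen(PLACEHOLDER, &assignment, 1))
}

fn main() {
    // Stand-in template: dummy values first, then the injection point.
    let template = "<script>let threadsData = []; /*THREADS_DATA_PLACEHOLDER*/</script>";
    // Stand-in documents; the real thread JSON format is not shown here.
    let docs = vec![
        String::from(r#"{"thread":"a"}"#),
        String::from(r#"{"thread":"b"}"#),
    ];
    let html = render_explorer(template, &docs).expect("placeholder missing");
    println!("{}", html);
}
```

Keeping the dummy values in the template and overriding them with a later assignment means the template still opens in a browser on its own, which matches the behavior the doc describes. The escaping step matters in any approach that embeds JSON inside a `<script>` element, because a literal `</script>` inside a string would otherwise end the script early.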