
This update generates a single self-contained .html file that shows an overview of evaluation threads in the browser.

It's useful for:

- Quickly reviewing results
- Sharing evaluation runs
- Debugging
- Comparing models (TBD)

Features:

- Export thread JSON from the UI
- Keyboard navigation (j/k or Ctrl + ←/→)
- Toggle between compact and full views

Generating the overview:

- `cargo run -p eval` will write this file in the run dir's root.
- Or you can call `cargo run -p eval --bin explorer` to generate it without running evals.

Release Notes:

- N/A
# Explorer

Threads Explorer is a single self-contained HTML file that gives an overview of
evaluation runs, while allowing for some interactivity.

When you open a file, it gives you a _thread overview_, which looks like this:

| Turn | Text                                 | Tool                                         | Result                                    |
| ---- | ------------------------------------ | -------------------------------------------- | ----------------------------------------- |
| 1    | [User]:                              |                                              |                                           |
|      | Fix the bug: kwargs not passed...    |                                              |                                           |
| 2    | I'll help you fix that bug.          | **list_directory**(path="fastmcp")           | `fastmcp/src [...]`                       |
|      |                                      |                                              |                                           |
| 3    | Let's examine the code.              | **read_file**(path="fastmcp/main.py", [...]) | `def run_application(app, **kwargs): [...]` |
| 4    | I found the issue.                   | **edit_file**(path="fastmcp/core.py", [...]) | `Made edit to fastmcp/core.py`            |
| 5    | Let's check if there are any errors. | **diagnostics**()                            | `No errors found`                         |

### Implementation details

`src/explorer.html` contains the template. You can open this template in a
browser as is, and it will show some dummy values. Its main use, though, is to
have the `threadsData` variable set to real data, which is then rendered instead
of the dummy values.

`src/explorer.rs` takes one or more JSON files as generated by
`cargo run -p eval`, and outputs an HTML file for rendering these threads. Refer
to the dummy data in `explorer.html` for a sample format.
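
The flow described above (several JSON files in, one HTML file out) can be illustrated with a minimal sketch. It is not the actual `explorer.rs` code. It assumes a hypothetical `<!-- THREADS_DATA -->` marker in the template, assumes the template reads `window.threadsData` when it is defined, and assumes the threads are passed as a JSON array. The function names are illustrative only.

```rust
/// Hypothetical marker. The real template may embed its data differently.
const PLACEHOLDER: &str = "<!-- THREADS_DATA -->";

/// Joins already-serialized JSON documents into one JSON array literal.
/// The documents are not parsed or validated here.
fn combine_json_documents(docs: &[String]) -> String {
    let trimmed: Vec<&str> = docs.iter().map(|doc| doc.trim()).collect();
    format!("[{}]", trimmed.join(","))
}

/// Escapes `</` so that a string such as "</script>" inside the data cannot
/// end the inline script element early. `\/` is a valid JSON escape for `/`.
fn escape_for_script(json: &str) -> String {
    json.replace("</", "<\\/")
}

/// Replaces the first marker in the template with a script element that
/// assigns the combined data (assumption: the page picks up
/// `window.threadsData`). Returns `None` if the marker is missing.
fn render_explorer(template: &str, docs: &[String]) -> Option<String> {
    if !template.contains(PLACEHOLDER) {
        return None;
    }
    let data = escape_for_script(&combine_json_documents(docs));
    let injected = format!("<script>window.threadsData = {};</script>", data);
    Some(template.replacen(PLACEHOLDER, &injected, 1))
}
```

In a command-line tool, each input file could be read with `std::fs::read_to_string`, and the returned string written out with `std::fs::write`. Because the marker is an HTML comment, a template built this way would still open unmodified in a browser and fall back to its dummy values.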