
This update generates a single self-contained .html file that shows an overview of evaluation threads in the browser.

It's useful for:
- Quickly reviewing results
- Sharing evaluation runs
- Debugging
- Comparing models (TBD)

Features:
- Export thread JSON from the UI
- Keyboard navigation (j/k or Ctrl + ←/→)
- Toggle between compact and full views

Generating the overview:
- `cargo run -p eval` will write this file in the run dir's root.
- Or you can call `cargo run -p eval --bin explorer` to generate it without running evals.

Release Notes:
- N/A
Explorer
Threads Explorer is a single self-contained HTML file that gives an overview of evaluation runs, while allowing for some interactivity.
When you open the file in a browser, it shows a thread overview like this:
| Turn | Text | Tool | Result |
|---|---|---|---|
| 1 | [User]: Fix the bug: kwargs not passed... | | |
| 2 | I'll help you fix that bug. | list_directory(path="fastmcp") | fastmcp/src [...] |
| 3 | Let's examine the code. | read_file(path="fastmcp/main.py", [...]) | def run_application(app, \*\*kwargs): [...] |
| 4 | I found the issue. | edit_file(path="fastmcp/core.py", [...]) | Made edit to fastmcp/core.py |
| 5 | Let's check if there are any errors. | diagnostics() | No errors found |
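
The `[...]` markers in the table stand for values that were cut short for display. As a rough illustration only, here is one way such elision could be written in Rust, assuming a simple character limit. The names `elide` and `max_chars` are hypothetical and are not taken from the explorer source.

```rust
/// Shortens `text` to at most `max_chars` characters and appends " [...]" when
/// something was cut. Hypothetical helper, not part of the explorer code.
/// It counts `char`s rather than bytes, so multi-byte UTF-8 text is never
/// split in the middle of a character.
fn elide(text: &str, max_chars: usize) -> String {
    // `char_indices().nth(max_chars)` yields the byte offset of the first
    // character past the limit, or `None` if the text already fits.
    match text.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => format!("{} [...]", text[..byte_idx].trim_end()),
        None => text.to_string(),
    }
}
```

Text that already fits within the limit is returned unchanged, so short results such as "No errors found" would stay intact.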
Implementation details
`src/explorer.html` contains the template. You can open this template in a browser as is, and it will show some dummy values. Its main use, though, is to set the `threadsData` variable with real data, which is then used instead of the dummy values.

`src/explorer.rs` takes one or more JSON files as generated by `cargo run -p eval` and outputs an HTML file for rendering these threads. Refer to the dummy data in `explorer.html` for a sample format.
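
To make the template-plus-data idea concrete, here is a minimal sketch of how thread JSON could be injected into an HTML template. It is not the actual `explorer.rs` logic. The placeholder marker and the function names (`to_json_array`, `escape_for_script`, `render_explorer`) are assumptions made for illustration, and the real template may mark the `threadsData` assignment differently.

```rust
/// Hypothetical marker; the real template may mark the injection point differently.
const PLACEHOLDER: &str = "/*THREADS_DATA_PLACEHOLDER*/";

/// Joins already-serialized JSON documents into one JSON array literal.
fn to_json_array(thread_jsons: &[String]) -> String {
    let items: Vec<&str> = thread_jsons.iter().map(|s| s.trim()).collect();
    format!("[{}]", items.join(","))
}

/// Keeps the data from closing an inline <script> early. In valid JSON, `<`
/// can only occur inside strings, and `\/` is a legal JSON escape for `/`,
/// so rewriting `</` as `<\/` does not change the parsed value.
fn escape_for_script(json: &str) -> String {
    json.replace("</", "<\\/")
}

/// Returns the template with the placeholder replaced by the thread data,
/// or `None` if the template does not contain the placeholder.
fn render_explorer(template: &str, thread_jsons: &[String]) -> Option<String> {
    if !template.contains(PLACEHOLDER) {
        return None;
    }
    let data = escape_for_script(&to_json_array(thread_jsons));
    // Replace only the first occurrence so stray copies elsewhere stay untouched.
    Some(template.replacen(PLACEHOLDER, &data, 1))
}
```

In this sketch the caller would read each input file with `std::fs::read_to_string`, pass the contents in as `thread_jsons`, and write the returned string to the output `.html` file. Treating the inputs as pre-serialized text avoids a JSON dependency, but it also means that malformed input is passed through unchecked.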