ZIm/crates/eval
Richard Feldman 57e8f5c5b9
Automatically retry in more situations (#34473)
In #33275 I was very conservative about when to retry when there are
errors in language completions in the Agent panel.

Now we retry in more scenarios (e.g. HTTP 5xx and 4xx errors that aren't
in the specific list of ones that we handle differently, such as 429s),
and also we show a notification if the thread halts for any reason.


Release Notes:

- Automatic retry for more Agent errors
- Whenever the Agent stops, play a sound (if configured) and show a
notification (if configured) if the Zed window was in the background.
2025-07-15 14:22:13 -04:00
..
docs eval: Add HTML overview for evaluation runs (#29413) 2025-04-25 17:49:05 +03:00
src Automatically retry in more situations (#34473) 2025-07-15 14:22:13 -04:00
.gitignore Add judge to new eval + provide LSP diagnostics (#28713) 2025-04-14 20:18:47 +00:00
Cargo.toml debugger: Handle the envFile setting for Go (#33666) 2025-07-01 09:14:59 -07:00
LICENSE-GPL Lay the groundwork for a Rust-based eval (#28488) 2025-04-10 04:45:27 +00:00
README.md eval: Add support for reading from a .env file (#29426) 2025-04-25 15:53:02 +00:00
runner_settings.json Introduce a new StreamingEditFileTool (#29733) 2025-05-01 17:37:43 +02:00

Eval

This eval assumes the working directory is the root of the repository. Run it with:

cargo run -p eval

If a .env file is present in crates/eval, the eval reads it, so you can use it to set environment variables such as API keys.
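As a rough illustration, a crates/eval/.env file might look like the sketch below. The variable name is an assumption chosen for the example; this README does not list which keys the eval reads, so substitute whatever your model provider requires.

```shell
# crates/eval/.env
# Hypothetical contents: the variable name below is an assumption,
# not something this README confirms. Use the key(s) your provider needs.
ANTHROPIC_API_KEY=your-key-here
```

Since the file would hold secrets, it is worth checking that it is not committed to version control.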

Explorer Tool

The explorer tool generates a self-contained HTML view from one or more thread JSON files. It provides a visual interface for exploring the agent thread, including tool calls and results. See ./docs/explorer.md for more details.

Usage

cargo run -p eval --bin explorer -- --input <path-to-json-files> --output <output-html-path>

Example:

cargo run -p eval --bin explorer -- --input ./runs/2025-04-23_15-53-30/fastmcp_bugifx/*/last.messages.json --output /tmp/explorer.html