ZIm/crates/eval
Agus Zubiaga fd17f2d8ae
agent: Enrich grep tool output with syntax information (#29601)
The `grep` tool used to include 4 lines of context around the match, but
the lines included would often be unhelpful. This PR improves this
behavior by using the range of the parent syntax node that contains the
full line(s) matched.

The match headers will also now include symbol breadcrumbs so that the
model can already gather code structure before/without reading files.

````md
### impl GitRepository for RealGitRepository › fn compare_checkpoints › L1278-1284
```rust
                let result = git
                    .run(&[
                        "diff-tree",
                        "--quiet",
                        &left.commit_sha.to_string(),
                        &right.commit_sha.to_string(),
                    ])
```
````

This positively impacts the `add_arg_to_trait_method` eval example with
better diff output, fewer tool failures, and reduced total turns.

Note: We have some plans to use a an "elision" approach where we would
combine all matches for a given file, skipping lines between them while
keeping symbol declaration lines. The theory is that this would be map
more closely to the expected input for edits. For now, this PR is a
significant improvement.

Release Notes:

- Agent: Enrich `grep` tool output with syntax information
2025-04-29 17:03:02 +00:00
..
docs eval: Add HTML overview for evaluation runs (#29413) 2025-04-25 17:49:05 +03:00
src agent: Enrich grep tool output with syntax information (#29601) 2025-04-29 17:03:02 +00:00
.gitignore Add judge to new eval + provide LSP diagnostics (#28713) 2025-04-14 20:18:47 +00:00
Cargo.toml eval: Use workspace dependencies (#29430) 2025-04-25 16:11:26 +00:00
LICENSE-GPL Lay the groundwork for a Rust-based eval (#28488) 2025-04-10 04:45:27 +00:00
README.md eval: Add support for reading from a .env file (#29426) 2025-04-25 15:53:02 +00:00
runner_settings.json eval: Fix stalling on tool confirmation (#28786) 2025-04-15 16:53:45 +00:00

Eval

This eval assumes the working directory is the root of the repository. Run it with:

cargo run -p eval

The eval will optionally read a .env file in crates/eval if you need it to set environment variables, such as API keys.

Explorer Tool

The explorer tool generates a self-contained HTML view from one or more thread JSON file. It provides a visual interface to explore the agent thread, including tool calls and results. See ./docs/explorer.md for more details.

Usage

cargo run -p eval --bin explorer -- --input <path-to-json-files> --output <output-html-path>

Example:

cargo run -p eval --bin explorer -- --input ./runs/2025-04-23_15-53-30/fastmcp_bugifx/*/last.messages.json --output /tmp/explorer.html