1. The `edit_file` tool tended to use `create_or_overwrite` a bit too
often, leading to corruption of long files. This change replaces the
boolean flag with an `EditFileMode` enum, which helps Agent make a more
deliberate choice when overwriting files.
With this change, the pass rate of the new eval increased from 10% to
100%.
2. eval: Added ability to run eval on top of an existing thread. Threads
can now be loaded from JSON files in the `SerializedThread` format,
which makes it easy to use real threads as starting points for
tests/evals.
3. Don't try to restore tool cards when running in headless or eval mode
-- we don't have a window to properly do this.
Release Notes:
- N/A
This change:
1. Catches attempts to use missing tools. If this happens, we now send
Agent a message listing available tools, after which Agent can
gracefully recover. Prior behavior: thread would stop in a broken state.
Example of a hallucinated call and a message we send back:

2. Adds evals for hallucinated tool use and imagined edits
3. Adds ability to configure a profile name in evals.
Release Notes:
- N/A
The `grep` tool used to include 4 lines of context around the match, but
the lines included would often be unhelpful. This PR improves this
behavior by using the range of the parent syntax node that contains the
full line(s) matched.
The match headers will also now include symbol breadcrumbs so that the
model can already gather code structure before/without reading files.
````md
### impl GitRepository for RealGitRepository › fn compare_checkpoints › L1278-1284
```rust
let result = git
.run(&[
"diff-tree",
"--quiet",
&left.commit_sha.to_string(),
&right.commit_sha.to_string(),
])
```
````
This positively impacts the `add_arg_to_trait_method` eval example with
better diff output, fewer tool failures, and reduced total turns.
Note: We have some plans to use a an "elision" approach where we would
combine all matches for a given file, skipping lines between them while
keeping symbol declaration lines. The theory is that this would be map
more closely to the expected input for edits. For now, this PR is a
significant improvement.
Release Notes:
- Agent: Enrich `grep` tool output with syntax information