ZIm/crates/eval/src/examples
Oleksiy Syvokon 255d8f7cf8
agent: Overwrite files more cautiously (#30649)
1. The `edit_file` tool used `create_or_overwrite` too often, which
corrupted long files. This change replaces the boolean flag with an
`EditFileMode` enum, which helps the agent make a more deliberate
choice when overwriting files.

With this change, the pass rate of the new eval increased from 10% to
100%.

2. eval: Added the ability to run an eval on top of an existing thread.
Threads can now be loaded from JSON files in the `SerializedThread`
format, which makes it easy to use real threads as starting points for
tests/evals.

3. Don't try to restore tool cards when running in headless or eval mode
-- we don't have a window to properly do this.
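
As a rough illustration of item 1, the sketch below shows how an enum can force an explicit choice where a boolean flag allowed an accidental overwrite. Only the name `EditFileMode` comes from the commit message. The variant names, `WriteAction`, and `resolve_write` are hypothetical and are not taken from the actual change.

```rust
/// Hypothetical variants; the commit message only names the enum itself.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EditFileMode {
    /// Apply targeted edits to a file that already exists.
    Edit,
    /// Create a file that does not exist yet.
    Create,
    /// Replace the entire contents of a file on purpose.
    Overwrite,
}

/// Hypothetical description of what the tool would do next.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WriteAction {
    ApplyEdits,
    CreateNew,
    ReplaceAll,
}

/// Hypothetical helper: decide what to do given the requested mode and
/// whether the target file exists. With a single boolean flag, "create"
/// and "overwrite" collapse into one case; here they are distinct, so
/// clobbering an existing file requires asking for `Overwrite` by name.
fn resolve_write(mode: EditFileMode, file_exists: bool) -> Result<WriteAction, String> {
    match (mode, file_exists) {
        (EditFileMode::Edit, true) => Ok(WriteAction::ApplyEdits),
        (EditFileMode::Edit, false) => {
            Err("cannot edit a file that does not exist".to_string())
        }
        (EditFileMode::Create, false) => Ok(WriteAction::CreateNew),
        (EditFileMode::Create, true) => {
            Err("file already exists; request Overwrite explicitly".to_string())
        }
        (EditFileMode::Overwrite, _) => Ok(WriteAction::ReplaceAll),
    }
}
```

Under these assumptions, a `Create` request against an existing long file is rejected and the file is left intact. Whether the real tool rejects such a request or handles it differently is not stated in the commit message.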

Release Notes:

- N/A
2025-05-14 10:40:44 +03:00
threads agent: Overwrite files more cautiously (#30649) 2025-05-14 10:40:44 +03:00
add_arg_to_trait_method.rs agent: Overwrite files more cautiously (#30649) 2025-05-14 10:40:44 +03:00
code_block_citations.rs agent: Overwrite files more cautiously (#30649) 2025-05-14 10:40:44 +03:00
comment_translation.rs agent: Overwrite files more cautiously (#30649) 2025-05-14 10:40:44 +03:00
file_search.rs agent: Overwrite files more cautiously (#30649) 2025-05-14 10:40:44 +03:00
find_and_replace_diff_card.toml eval: Fine-grained assertions (#29246) 2025-04-22 23:58:58 -03:00
hallucinated_tool_calls.toml agent: Handle attempts to use hallucinated tools (#29946) 2025-05-05 19:31:11 +00:00
mod.rs agent: Overwrite files more cautiously (#30649) 2025-05-14 10:40:44 +03:00
no_tools_enabled.toml Add no_tools_enabled eval (#30537) 2025-05-12 08:52:03 +00:00
overwrite_file.rs agent: Overwrite files more cautiously (#30649) 2025-05-14 10:40:44 +03:00
planets.rs agent: Overwrite files more cautiously (#30649) 2025-05-14 10:40:44 +03:00
tree_sitter_drop_emscripten_dep.toml Add tree-sitter example to the eval (#29321) 2025-04-23 18:46:38 -07:00