ZIm/crates/assistant_tools
Oleksiy Syvokon ab017129d8
agent: Improve Gemini support in the edit_file tool (#31116)
This change improves `eval_extract_handle_command_output` results for
all models:

Model                       | Pass rate before | Pass rate after
----------------------------|------------------|----------------
claude-3.7-sonnet           |  0.96            | 0.98
gemini-2.5-pro              |  0.35            | 0.86
gpt-4.1                     |  0.81            | 1.00

Part of this improvement comes from a more robust evaluation, which now
accepts multiple possible outcomes. Another part comes from adapting the
prompt: addressing common Gemini failure modes, adding a few-shot
example, and, in the final commit, auto-rewriting the instructions for
clarity and conciseness.
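
To illustrate what accepting multiple possible outcomes could mean for the pass rates above, here is a minimal sketch. It is not the eval code from this commit: `normalize`, `matches_any_expected`, and `pass_rate` are hypothetical names, and the whitespace normalization is an assumption made only for this example.

```rust
/// Hypothetical helper: strips trailing whitespace on each line and
/// surrounding blank space, so cosmetic differences do not count as failures.
fn normalize(text: &str) -> String {
    text.lines()
        .map(|line| line.trim_end())
        .collect::<Vec<_>>()
        .join("\n")
        .trim()
        .to_string()
}

/// Hypothetical check: a run passes if the model's output matches any one
/// of the accepted outcomes, rather than a single golden file.
fn matches_any_expected(actual: &str, expected_outcomes: &[&str]) -> bool {
    let actual = normalize(actual);
    expected_outcomes
        .iter()
        .any(|expected| normalize(expected) == actual)
}

/// Fraction of passing runs; returns 0.0 for an empty slice by convention.
fn pass_rate(results: &[bool]) -> f64 {
    if results.is_empty() {
        return 0.0;
    }
    let passed = results.iter().filter(|&&ok| ok).count();
    passed as f64 / results.len() as f64
}
```

Under these assumptions, an output that matches the second of two accepted variants counts as a pass, and three passing runs out of four give a pass rate of 0.75.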

This change still needs validation from larger end-to-end evals.


Release Notes:

- N/A
2025-05-22 12:01:43 +03:00
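Name        | Last commit                                                             | Date
------------|-------------------------------------------------------------------------|--------------------------
src         | agent: Improve Gemini support in the edit_file tool (#31116)            | 2025-05-22 12:01:43 +03:00
Cargo.toml  | evals: Add system prompt to edit agent evals + fix edit agent (#31082)  | 2025-05-21 10:14:58 +00:00
LICENSE-GPL | Factor tool definitions out of assistant (#21189)                       | 2024-11-25 18:26:34 -05:00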