agent: Improve Gemini support in the edit_file tool (#31116)

This change improves `eval_extract_handle_command_output` results for
all models:

Model                       | Pass rate before | Pass rate after
----------------------------|------------------|----------------
claude-3.7-sonnet           |  0.96            | 0.98
gemini-2.5-pro              |  0.35            | 0.86
gpt-4.1                     |  0.81            | 1.00

Part of this improvement comes from a more robust evaluation, which now
accepts multiple possible outcomes. Another part comes from prompt
adaptation: addressing common Gemini failure modes, adding a few-shot
example, and, in the final commit, auto-rewriting the instructions for
clarity and conciseness.
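
To illustrate what "accepts multiple possible outcomes" can mean in practice, here is a minimal sketch, not the actual eval harness. The names `matches_any_outcome` and `pass_rate` are hypothetical, and the whitespace normalization is an assumption made for this example only.

```rust
/// Hypothetical helper (not from the eval harness): a sample passes if the
/// model's output equals any one of the accepted outcomes. Trailing
/// whitespace on each line is ignored; that normalization is an assumption.
fn matches_any_outcome(actual: &str, accepted: &[&str]) -> bool {
    fn normalize(s: &str) -> String {
        s.lines().map(str::trim_end).collect::<Vec<_>>().join("\n")
    }

    let actual = normalize(actual);
    accepted.iter().any(|expected| normalize(expected) == actual)
}

/// Pass rate as the fraction of passing samples (0.0 for an empty run).
fn pass_rate(results: &[bool]) -> f64 {
    if results.is_empty() {
        return 0.0;
    }
    let passed = results.iter().filter(|&&ok| ok).count();
    passed as f64 / results.len() as f64
}
```

With a single expected output, a model that produces a different but equally valid edit is counted as a failure; checking against a set of accepted outcomes removes that source of false negatives from the pass rate.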

This change still needs validation from larger end-to-end evals.


Release Notes:

- N/A
Oleksiy Syvokon 2025-05-22 12:01:43 +03:00 committed by GitHub
parent 71fb17c507
commit ab017129d8
15 changed files with 307 additions and 398 deletions

@@ -65,7 +65,9 @@ use std::{num::NonZeroU32, sync::OnceLock};
 use syntax_map::{QueryCursorHandle, SyntaxSnapshot};
 use task::RunnableTag;
 pub use task_context::{ContextProvider, RunnableRange};
-pub use text_diff::{DiffOptions, line_diff, text_diff, text_diff_with_options, unified_diff};
+pub use text_diff::{
+    DiffOptions, apply_diff_patch, line_diff, text_diff, text_diff_with_options, unified_diff,
+};
 use theme::SyntaxTheme;
 pub use toolchain::{LanguageToolchainStore, Toolchain, ToolchainList, ToolchainLister};
 use tree_sitter::{self, Query, QueryCursor, WasmStore, wasmtime};