agent: Overwrite files more cautiously (#30649)

1. The `edit_file` tool tended to use `create_or_overwrite` a bit too
often, leading to corruption of long files. This change replaces the
boolean flag with an `EditFileMode` enum, which helps Agent make a more
deliberate choice when overwriting files.

With this change, the pass rate of the new eval increased from 10% to
100%.

2. eval: Added ability to run eval on top of an existing thread. Threads
can now be loaded from JSON files in the `SerializedThread` format,
which makes it easy to use real threads as starting points for
tests/evals.

3. Don't try to restore tool cards when running in headless or eval mode,
because those modes have no window to restore them in.
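
As a rough illustration of item 1, the sketch below shows how an enum can force an explicit choice where a boolean flag allowed an accidental overwrite. The variant names (`Edit`, `Create`, `Overwrite`) and the `validate_mode` helper are assumptions made for this example; the actual `EditFileMode` definition is not part of the excerpt shown here.

```rust
/// Hypothetical stand-in for the enum named in the commit message.
/// The variant names are assumptions, not taken from the diff shown here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EditFileMode {
    /// Make a targeted edit to an existing file.
    Edit,
    /// Create a file that does not exist yet.
    Create,
    /// Replace the entire contents of an existing file.
    Overwrite,
}

/// Hypothetical check: reject a mode that does not match the file's state on disk.
pub fn validate_mode(mode: EditFileMode, file_exists: bool) -> Result<(), String> {
    match (mode, file_exists) {
        // Editing or overwriting only makes sense when the file is already there.
        (EditFileMode::Edit, false) | (EditFileMode::Overwrite, false) => {
            Err(format!("{mode:?} requires an existing file"))
        }
        // Creating over an existing file is the accidental-overwrite case.
        (EditFileMode::Create, true) => {
            Err("Create would clobber an existing file; use Edit or Overwrite".to_string())
        }
        _ => Ok(()),
    }
}
```

With three named modes, replacing a long file has to be requested as `Overwrite` explicitly, whereas a single `create_or_overwrite: true` flag covers both creation and replacement.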

Release Notes:

- N/A
Oleksiy Syvokon 2025-05-14 10:40:44 +03:00 committed by GitHub
parent 22f76ac1a7
commit 255d8f7cf8
18 changed files with 425 additions and 37 deletions

@@ -386,6 +386,25 @@ impl ThreadStore {
         })
     }
 
+    pub fn create_thread_from_serialized(
+        &mut self,
+        serialized: SerializedThread,
+        cx: &mut Context<Self>,
+    ) -> Entity<Thread> {
+        cx.new(|cx| {
+            Thread::deserialize(
+                ThreadId::new(),
+                serialized,
+                self.project.clone(),
+                self.tools.clone(),
+                self.prompt_builder.clone(),
+                self.project_context.clone(),
+                None,
+                cx,
+            )
+        })
+    }
+
     pub fn open_thread(
         &self,
         id: &ThreadId,
@@ -411,7 +430,7 @@ impl ThreadStore {
                         this.tools.clone(),
                         this.prompt_builder.clone(),
                         this.project_context.clone(),
-                        window,
+                        Some(window),
                         cx,
                     )
                 })
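
The two hunks above pass `None` when a thread is created from a serialized file and `Some(window)` when it is opened in the UI, which suggests the window argument became optional. The sketch below shows that pattern in isolation, under stated assumptions: `Window`, `Thread`, and `deserialize_thread` here are simplified hypothetical stand-ins, not the real types or the actual `Thread::deserialize` signature.

```rust
/// Hypothetical stand-in for a UI window handle.
pub struct Window {
    pub title: String,
}

/// Hypothetical, simplified thread that only records what was restored.
#[derive(Debug, Default, PartialEq)]
pub struct Thread {
    pub messages: Vec<String>,
    pub restored_tool_cards: usize,
}

/// Rebuild a thread; UI-only work is skipped when no window is available.
pub fn deserialize_thread(
    messages: Vec<String>,
    tool_uses: usize,
    window: Option<&mut Window>,
) -> Thread {
    let mut thread = Thread {
        messages,
        restored_tool_cards: 0,
    };
    if let Some(window) = window {
        // A window exists (interactive mode), so tool cards can be rebuilt.
        window.title = format!("{} tool cards restored", tool_uses);
        thread.restored_tool_cards = tool_uses;
    }
    // With `None` (headless or eval mode) the messages load, but no UI state is touched.
    thread
}
```

Taking `Option<&mut Window>` keeps a single deserialization path for both modes, so the eval harness can call it with `None` instead of needing a separate headless constructor.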