Commit graph

117 commits

Author SHA1 Message Date
Michael Sloan
9a9f2e71ca
Agent Eval: Initial support for running examples repeatedly (#28844)
Not ideal as it creates a separate worktree for each repetition

Release Notes:

- N/A
2025-04-16 06:35:55 +00:00
Michael Sloan
609895d95f
Agent Eval: bounded concurrency (#28843)
Release Notes:

- N/A
2025-04-16 00:05:46 -06:00
Michael Sloan
da2d8bd845
Agent Eval: Distinguish tool successes and failures in log (#28839)
Release Notes:

- N/A
2025-04-15 22:51:33 -06:00
Thomas Mickley-Doyle
222d4a2546
agent: Add telemetry for eval runs (#28816)
Release Notes:

- N/A

---------

Co-authored-by: Joseph <joseph@zed.dev>
2025-04-16 02:54:26 +00:00
Michael Sloan
102ea6ac79
Add support for judge repetitions in eval (#28811)
Release Notes:

- N/A

---------

Co-authored-by: Thomas <thomas@zed.dev>
2025-04-15 23:18:02 +00:00
Agus Zubiaga
0182e09e33
eval: Do not create run files for skipped examples (#28800)
Release Notes:

- N/A
2025-04-15 18:00:04 +00:00
Agus Zubiaga
ff4334efc7
eval: Fix stalling on tool confirmation (#28786)
The `always_allow_tool_actions` setting would get overridden with the
default when we loaded each example project, leading to examples
stalling when they run a tool that needed confirmation. There's now a
separate `runner_settings.json` file where we can configure the
environment for the eval.

Release Notes:

- N/A

---------

Co-authored-by: Oleksiy <oleksiy@zed.dev>
2025-04-15 16:53:45 +00:00
Thomas Mickley-Doyle
b1e4e6048a
agent: Add more Rust code examples, update TODO check (#28737)
Release Notes:

- N/A
2025-04-15 16:52:08 +00:00
Agus Zubiaga
e4cf7fe8f5
eval: Improve readability with colors and alignment (#28761)
![CleanShot 2025-04-15 at 10 35
39@2x](https://github.com/user-attachments/assets/495d96fb-fe2f-478b-a9d6-678c1184db9a)


Release Notes:

- N/A
2025-04-15 13:50:01 +00:00
Bennet Bo Fenner
e26f0a331f
agent: Make ToolWorkingSet an Entity (#28757)
Motivation is to emit events when enabled tools change, want to use this
in #28755

Release Notes:

- N/A
2025-04-15 14:42:31 +02:00
Michael Sloan
0d6e455bf6
Agent eval: output paths to log files at the end (#28724)
Release Notes:

- N/A
2025-04-14 23:04:07 +00:00
Michael Sloan
5f897b0e00
Agent Eval: Fail example when there are no events in 2 minutes (#28725)
Release Notes:

- N/A
2025-04-14 23:01:21 +00:00
Thomas Mickley-Doyle
d74f0735c2
Add more eval examples + filtering examples by language + fix git concurrent usage (#28719)
Release Notes:

- N/A

---------

Co-authored-by: michael <michael@zed.dev>
Co-authored-by: agus <agus@zed.dev>
2025-04-14 22:05:46 +00:00
Michael Sloan
c8ccc472b5
Track tool use counts (#28722)
Release Notes:

- N/A
2025-04-14 21:45:36 +00:00
Michael Sloan
6b80eb556c
Add judge to new eval + provide LSP diagnostics (#28713)
Release Notes:

- N/A

---------

Co-authored-by: Antonio Scandurra <antonio@zed.dev>
Co-authored-by: agus <agus@zed.dev>
2025-04-14 20:18:47 +00:00
Antonio Scandurra
2440faf4b2
Actually run the eval and fix a hang when retrieving outline (#28547)
Release Notes:

- Fixed a regression that caused the agent to hang sometimes.

---------

Co-authored-by: Thomas Mickley-Doyle <tmickleydoyle@gmail.com>
Co-authored-by: Nathan Sobo <nathan@zed.dev>
Co-authored-by: Michael Sloan <mgsloan@gmail.com>
2025-04-11 00:01:33 +00:00
Antonio Scandurra
8ac378b86e
Lay the groundwork for a Rust-based eval (#28488)
Also, we moved the logic for driving the agentic loop into `Thread` so
that we don't have to re-implement it.

Release Notes:

- N/A

---------

Co-authored-by: Nathan Sobo <nathan@zed.dev>
2025-04-10 04:45:27 +00:00