Michael Sloan
609895d95f
Agent Eval: bounded concurrency ( #28843 )
...
Release Notes:
- N/A
2025-04-16 00:05:46 -06:00
Michael Sloan
da2d8bd845
Agent Eval: Distinguish tool successes and failures in log ( #28839 )
...
Release Notes:
- N/A
2025-04-15 22:51:33 -06:00
Thomas Mickley-Doyle
222d4a2546
agent: Add telemetry for eval runs ( #28816 )
...
Release Notes:
- N/A
---------
Co-authored-by: Joseph <joseph@zed.dev>
2025-04-16 02:54:26 +00:00
Michael Sloan
102ea6ac79
Add support for judge repetitions in eval ( #28811 )
...
Release Notes:
- N/A
---------
Co-authored-by: Thomas <thomas@zed.dev>
2025-04-15 23:18:02 +00:00
Agus Zubiaga
0182e09e33
eval: Do not create run files for skipped examples ( #28800 )
...
Release Notes:
- N/A
2025-04-15 18:00:04 +00:00
Agus Zubiaga
ff4334efc7
eval: Fix stalling on tool confirmation ( #28786 )
...
The `always_allow_tool_actions` setting was overridden by the default
whenever an example project was loaded, so examples stalled when they
ran a tool that needed confirmation. There is now a separate
`runner_settings.json` file where we can configure the environment
for the eval.
Release Notes:
- N/A
---------
Co-authored-by: Oleksiy <oleksiy@zed.dev>
2025-04-15 16:53:45 +00:00
Thomas Mickley-Doyle
b1e4e6048a
agent: Add more Rust code examples, update TODO check ( #28737 )
...
Release Notes:
- N/A
2025-04-15 16:52:08 +00:00
Agus Zubiaga
e4cf7fe8f5
eval: Improve readability with colors and alignment ( #28761 )
...
Release Notes:
- N/A
2025-04-15 13:50:01 +00:00
Bennet Bo Fenner
e26f0a331f
agent: Make ToolWorkingSet an Entity ( #28757 )
...
The motivation is to emit events when the set of enabled tools changes;
this will be used in #28755.
Release Notes:
- N/A
2025-04-15 14:42:31 +02:00
Michael Sloan
0d6e455bf6
Agent eval: output paths to log files at the end ( #28724 )
...
Release Notes:
- N/A
2025-04-14 23:04:07 +00:00
Michael Sloan
5f897b0e00
Agent Eval: Fail example when there are no events in 2 minutes ( #28725 )
...
Release Notes:
- N/A
2025-04-14 23:01:21 +00:00
Thomas Mickley-Doyle
d74f0735c2
Add more eval examples + filtering examples by language + fix git concurrent usage ( #28719 )
...
Release Notes:
- N/A
---------
Co-authored-by: michael <michael@zed.dev>
Co-authored-by: agus <agus@zed.dev>
2025-04-14 22:05:46 +00:00
Michael Sloan
c8ccc472b5
Track tool use counts ( #28722 )
...
Release Notes:
- N/A
2025-04-14 21:45:36 +00:00
Michael Sloan
6b80eb556c
Add judge to new eval + provide LSP diagnostics ( #28713 )
...
Release Notes:
- N/A
---------
Co-authored-by: Antonio Scandurra <antonio@zed.dev>
Co-authored-by: agus <agus@zed.dev>
2025-04-14 20:18:47 +00:00
Antonio Scandurra
2440faf4b2
Actually run the eval and fix a hang when retrieving outline ( #28547 )
...
Release Notes:
- Fixed a regression that caused the agent to hang sometimes.
---------
Co-authored-by: Thomas Mickley-Doyle <tmickleydoyle@gmail.com>
Co-authored-by: Nathan Sobo <nathan@zed.dev>
Co-authored-by: Michael Sloan <mgsloan@gmail.com>
2025-04-11 00:01:33 +00:00
Antonio Scandurra
8ac378b86e
Lay the groundwork for a Rust-based eval ( #28488 )
...
Also, we moved the logic for driving the agentic loop into `Thread` so
that we don't have to re-implement it.
Release Notes:
- N/A
---------
Co-authored-by: Nathan Sobo <nathan@zed.dev>
2025-04-10 04:45:27 +00:00