Yehowshua/ZIm - Forgejo

Author	SHA1	Message	Date
Michael Sloan	9a9f2e71ca	Agent Eval: Initial support for running examples repeatedly (#28844). Not ideal as it creates a separate worktree for each repetition. Release Notes: N/A	2025-04-16 06:35:55 +00:00
Michael Sloan	609895d95f	Agent Eval: bounded concurrency (#28843). Release Notes: N/A	2025-04-16 00:05:46 -06:00
Michael Sloan	da2d8bd845	Agent Eval: Distinguish tool successes and failures in log (#28839). Release Notes: N/A	2025-04-15 22:51:33 -06:00
Thomas Mickley-Doyle	222d4a2546	agent: Add telemetry for eval runs (#28816). Release Notes: N/A. Co-authored-by: Joseph <joseph@zed.dev>	2025-04-16 02:54:26 +00:00
Michael Sloan	102ea6ac79	Add support for judge repetitions in eval (#28811). Release Notes: N/A. Co-authored-by: Thomas <thomas@zed.dev>	2025-04-15 23:18:02 +00:00
Agus Zubiaga	0182e09e33	eval: Do not create run files for skipped examples (#28800). Release Notes: N/A	2025-04-15 18:00:04 +00:00
Agus Zubiaga	ff4334efc7	eval: Fix stalling on tool confirmation (#28786). The `always_allow_tool_actions` setting would get overridden with the default when we loaded each example project, leading to examples stalling when they run a tool that needed confirmation. There's now a separate `runner_settings.json` file where we can configure the environment for the eval. Release Notes: N/A. Co-authored-by: Oleksiy <oleksiy@zed.dev>	2025-04-15 16:53:45 +00:00
Thomas Mickley-Doyle	b1e4e6048a	agent: Add more Rust code examples, update TODO check (#28737). Release Notes: N/A	2025-04-15 16:52:08 +00:00
Agus Zubiaga	e4cf7fe8f5	eval: Improve readability with colors and alignment (#28761). ![CleanShot 2025-04-15 at 10 35 39@2x](https://github.com/user-attachments/assets/495d96fb-fe2f-478b-a9d6-678c1184db9a) Release Notes: N/A	2025-04-15 13:50:01 +00:00
Bennet Bo Fenner	e26f0a331f	agent: Make `ToolWorkingSet` an `Entity` (#28757). Motivation is to emit events when enabled tools change, want to use this in #28755. Release Notes: N/A	2025-04-15 14:42:31 +02:00
Michael Sloan	0d6e455bf6	Agent eval: output paths to log files at the end (#28724). Release Notes: N/A	2025-04-14 23:04:07 +00:00
Michael Sloan	5f897b0e00	Agent Eval: Fail example when there are no events in 2 minutes (#28725). Release Notes: N/A	2025-04-14 23:01:21 +00:00
Thomas Mickley-Doyle	d74f0735c2	Add more eval examples + filtering examples by language + fix git concurrent usage (#28719). Release Notes: N/A. Co-authored-by: michael <michael@zed.dev> Co-authored-by: agus <agus@zed.dev>	2025-04-14 22:05:46 +00:00
Michael Sloan	c8ccc472b5	Track tool use counts (#28722). Release Notes: N/A	2025-04-14 21:45:36 +00:00
Michael Sloan	6b80eb556c	Add judge to new eval + provide LSP diagnostics (#28713). Release Notes: N/A. Co-authored-by: Antonio Scandurra <antonio@zed.dev> Co-authored-by: agus <agus@zed.dev>	2025-04-14 20:18:47 +00:00
Antonio Scandurra	2440faf4b2	Actually run the eval and fix a hang when retrieving outline (#28547). Release Notes: Fixed a regression that caused the agent to hang sometimes. Co-authored-by: Thomas Mickley-Doyle <tmickleydoyle@gmail.com> Co-authored-by: Nathan Sobo <nathan@zed.dev> Co-authored-by: Michael Sloan <mgsloan@gmail.com>	2025-04-11 00:01:33 +00:00
Antonio Scandurra	8ac378b86e	Lay the groundwork for a Rust-based eval (#28488). Also, we moved the logic for driving the agentic loop into `Thread` so that we don't have to re-implement it. Release Notes: N/A. Co-authored-by: Nathan Sobo <nathan@zed.dev>	2025-04-10 04:45:27 +00:00

117 commits