Yehowshua/ZIm

Author	SHA1	Message	Date
Michael Sloan	70c51b513b	agent eval: Default to also running typescript examples (#29185) Release Notes: - N/A	2025-04-21 23:59:35 +00:00
Antonio Scandurra	97ab0980d1	Start tracking tool failure rates in eval (#29122) This pull request will print all the used tools and their failure rates. The objective goal should be to minimize that failure rate. @tmickleydoyle: this also changes the telemetry event to report `tool_metrics` as opposed to `tool_use_counts`. Ideally I'd love to be able to plot failure rates by tool and hopefully see that percentage go down. Can we do that with the data we're tracking with this pull request? Release Notes: - N/A	2025-04-21 16:16:43 +02:00
Michael Sloan	d88b06a5dc	Simplify language model registry + only emit change events on change (#29086) * Now only does default fallback logic in the registry * Only emits change events when there is actually a change Release Notes: - N/A	2025-04-19 08:26:42 +00:00
Nathan Sobo	bab28560ef	Systematically optimize agentic editing performance (#28961) Now that we've established a proper eval in tree, this PR reboots our agent loop back to a set of minimal tools and simpler prompts. We should aim to get this branch feeling subjectively competitive with what's on main and then merge it, and build from there. Let's invest in our eval and use it to drive better performance of the agent loop. How you can help: Pick an example, and then make the outcome faster or better. It's fine to even use your own subjective judgment, as our evaluation criteria likely need tuning as well at this point. Focus on making the agent work better in your own subjective experience first. Let's focus on simple/practical improvements to make this thing work better, then determine how we can craft our judgment criteria to lock those improvements in. Release Notes: - N/A --------- Co-authored-by: Max <max@zed.dev> Co-authored-by: Antonio <antonio@zed.dev> Co-authored-by: Agus <agus@zed.dev> Co-authored-by: Richard <richard@zed.dev> Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com> Co-authored-by: Antonio Scandurra <me@as-cii.com> Co-authored-by: Michael Sloan <mgsloan@gmail.com>	2025-04-19 02:47:59 +00:00
Michael Sloan	327fee4d22	Init prompt store in agent eval (#29068) Needed after #28915 Release Notes: - N/A	2025-04-18 20:06:34 +00:00
Thomas Mickley-Doyle	8de53bd89f	agent: Add git commit ID to the eval telemetry data (#28895) Release Notes: - N/A	2025-04-16 14:13:43 -05:00
Michael Sloan	9a9f2e71ca	Agent Eval: Initial support for running examples repeatedly (#28844) Not ideal as it creates a separate worktree for each repetition Release Notes: - N/A	2025-04-16 06:35:55 +00:00
Michael Sloan	609895d95f	Agent Eval: bounded concurrency (#28843) Release Notes: - N/A	2025-04-16 00:05:46 -06:00
Thomas Mickley-Doyle	222d4a2546	agent: Add telemetry for eval runs (#28816) Release Notes: - N/A --------- Co-authored-by: Joseph <joseph@zed.dev>	2025-04-16 02:54:26 +00:00
Michael Sloan	102ea6ac79	Add support for judge repetitions in eval (#28811) Release Notes: - N/A --------- Co-authored-by: Thomas <thomas@zed.dev>	2025-04-15 23:18:02 +00:00
Agus Zubiaga	0182e09e33	eval: Do not create run files for skipped examples (#28800) Release Notes: - N/A	2025-04-15 18:00:04 +00:00
Agus Zubiaga	ff4334efc7	eval: Fix stalling on tool confirmation (#28786) The `always_allow_tool_actions` setting would get overridden with the default when we loaded each example project, leading to examples stalling when they run a tool that needed confirmation. There's now a separate `runner_settings.json` file where we can configure the environment for the eval. Release Notes: - N/A --------- Co-authored-by: Oleksiy <oleksiy@zed.dev>	2025-04-15 16:53:45 +00:00
Agus Zubiaga	e4cf7fe8f5	eval: Improve readability with colors and alignment (#28761) ![CleanShot 2025-04-15 at 10 35 39@2x](https://github.com/user-attachments/assets/495d96fb-fe2f-478b-a9d6-678c1184db9a) Release Notes: - N/A	2025-04-15 13:50:01 +00:00
Michael Sloan	0d6e455bf6	Agent eval: output paths to log files at the end (#28724) Release Notes: - N/A	2025-04-14 23:04:07 +00:00
Thomas Mickley-Doyle	d74f0735c2	Add more eval examples + filtering examples by language + fix git concurrent usage (#28719) Release Notes: - N/A --------- Co-authored-by: michael <michael@zed.dev> Co-authored-by: agus <agus@zed.dev>	2025-04-14 22:05:46 +00:00
Michael Sloan	6b80eb556c	Add judge to new eval + provide LSP diagnostics (#28713) Release Notes: - N/A --------- Co-authored-by: Antonio Scandurra <antonio@zed.dev> Co-authored-by: agus <agus@zed.dev>	2025-04-14 20:18:47 +00:00
Antonio Scandurra	2440faf4b2	Actually run the eval and fix a hang when retrieving outline (#28547) Release Notes: - Fixed a regression that caused the agent to hang sometimes. --------- Co-authored-by: Thomas Mickley-Doyle <tmickleydoyle@gmail.com> Co-authored-by: Nathan Sobo <nathan@zed.dev> Co-authored-by: Michael Sloan <mgsloan@gmail.com>	2025-04-11 00:01:33 +00:00
Antonio Scandurra	8ac378b86e	Lay the groundwork for a Rust-based eval (#28488) Also, we moved the logic for driving the agentic loop into `Thread` so that we don't have to re-implement it. Release Notes: - N/A --------- Co-authored-by: Nathan Sobo <nathan@zed.dev>	2025-04-10 04:45:27 +00:00

18 commits