This eval checks that the Edit Agent can create an empty file without
writing its thoughts into it. The issue is not specific to empty files,
but it's easier to reproduce with them.
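
For context, the core of the check boils down to something like this (a minimal sketch, not the actual eval harness code; the function name and path handling are hypothetical):

```rust
use std::fs;

// After asking the agent to create an empty file, assert that it is
// actually empty. A failing run leaves the model's "thoughts" (its
// reasoning text) in the file instead.
fn assert_created_file_is_empty(path: &str) {
    let contents = fs::read_to_string(path).expect("agent should have created the file");
    assert!(
        contents.is_empty(),
        "expected an empty file, but found: {contents:?}"
    );
}
```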
For some mysterious reason, I could easily reproduce this issue roughly
90% of the time in actual Zed. However, once I extracted the exact LLM
request sent just before the failure point and generated from it
directly, the reproduction rate dropped to 2%!
Things I tried to make sure it's not a fluke:
- disabling prompt caching
- capturing the LLM request via a proxy server
- running the prompt on Claude separately from the evals

Every time the model mostly produced good outcomes, which doesn't match
my actual experience in Zed.
At some point I discovered that simply adding a single insignificant
space or newline to the prompt suddenly reproduces the failure almost
perfectly.
This weirdness happens even outside the Zed code base and even when
using a different subscription. The result is the same: an extra newline
or space changes the model's behavior significantly enough that the pass
rate drops from 99% to 0-3%.
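
Schematically, the comparison looks like this (a minimal sketch;
`run_trial` and `captured_prompt` are hypothetical stand-ins for sending
the captured request to the model once and scoring the outcome):

```rust
/// Fraction of trials that pass for a given prompt. `run_trial` is a
/// hypothetical closure that sends the prompt to the model once and
/// returns whether the outcome was good.
fn pass_rate(prompt: &str, trials: usize, run_trial: impl Fn(&str) -> bool) -> f64 {
    let passes = (0..trials).filter(|_| run_trial(prompt)).count();
    passes as f64 / trials as f64
}

// The two prompts differ only by trailing whitespace, yet the observed
// pass rate flips from ~99% down to 0-3%:
// pass_rate(captured_prompt, 100, &run_trial);               // ~0.99
// pass_rate(&format!("{captured_prompt} "), 100, &run_trial); // ~0.00-0.03
```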
I have no explanation for this.
Release Notes:
- N/A