ZIm/crates
Oleksiy Syvokon 5e5a124ae1
evals: Eval for creating an empty file (#31034)
This eval checks that Edit Agent can create an empty file without
writing its thoughts into it. This issue is not specific to empty files,
but it's easier to reproduce with them.

For some mysterious reason, I could easily reproduce this issue roughly
90% of the time in actual Zed. However, once I extracted the exact LLM
request before the failure point and generated from that, the
reproduction rate dropped to 2%!

Things I've tried to make sure it's not a fluke: disabling prompt
caching, capturing the LLM request via a proxy server, and running the
prompt on Claude separately from evals. Each time, the outcomes were
mostly good, which doesn't match my actual experience in Zed.

At some point I discovered that simply adding one insignificant space or
newline to the prompt suddenly reproduces the failure I was chasing
almost every time.

This weirdness happens even outside the Zed code base and even when
using a different subscription. The result is the same: an extra newline
or space changes the model's behavior enough that the pass rate drops
from 99% to 0-3%.

I have no explanation for this.


Release Notes:

- N/A
2025-05-20 20:03:08 +03:00
activity_indicator Allow updater to check for updates after downloading one (#30969) 2025-05-19 18:27:39 +00:00
agent Add end of service notifications (#30982) 2025-05-20 00:20:00 +00:00
anthropic Add image input support for OpenAI models (#30639) 2025-05-13 17:32:42 +02:00
askpass askpass: Workaround rust lang 69343 (#30774) 2025-05-16 05:04:36 -04:00
assets Add workspace-hack (#27277) 2025-04-02 13:26:34 -07:00
assistant_context_editor Reduce allocations (#30693) 2025-05-14 18:29:28 +02:00
assistant_settings language_models: Add tool use support for Mistral models (#29994) 2025-05-19 18:36:59 +02:00
assistant_slash_command Add workspace-hack (#27277) 2025-04-02 13:26:34 -07:00
assistant_slash_commands assistant_slash_commands: Be more precise in content type matching (#29124) 2025-05-06 04:38:03 +00:00
assistant_tool Fix rejecting overwritten files if the agent previously edited them (#30744) 2025-05-15 09:47:54 +00:00
assistant_tools evals: Eval for creating an empty file (#31034) 2025-05-20 20:03:08 +03:00
audio Add workspace-hack (#27277) 2025-04-02 13:26:34 -07:00
auto_update Allow updater to check for updates after downloading one (#30969) 2025-05-19 18:27:39 +00:00
auto_update_helper Update block diagnostics (#28006) 2025-04-15 09:35:13 -06:00
auto_update_ui Restyle notification close control (#30262) 2025-05-08 14:10:30 +00:00
aws_http_client Add workspace-hack (#27277) 2025-04-02 13:26:34 -07:00
bedrock bedrock: Fix Claude 3.5 Haiku support (#30560) 2025-05-12 12:45:35 +00:00
breadcrumbs breadcrumbs: Update multibuffer to match singleton (#28267) 2025-04-07 20:26:55 +00:00
buffer_diff Fix diff recalculation hang (#28377) 2025-04-10 22:58:41 +00:00
call Deny unknown keys in settings in JSON schema so user gets warnings but settings still parses (#30583) 2025-05-12 17:48:36 -04:00
channel Add workspace-hack (#27277) 2025-04-02 13:26:34 -07:00
cli Try to weak-link ScreenCaptureKit always (#28585) 2025-04-11 17:38:14 +00:00
client Revert "client: Add support for HTTP/HTTPS proxy" (#30979) 2025-05-19 20:19:40 -04:00
clock Add the ability to follow the agent as it makes edits (#29839) 2025-05-04 08:28:39 +00:00
collab debugger: Surface validity of breakpoints (#30380) 2025-05-20 15:56:15 +00:00
collab_ui gpui: Add a standard text example (#30747) 2025-05-16 17:35:44 +02:00
collections Add workspace-hack (#27277) 2025-04-02 13:26:34 -07:00
command_palette Simplify the SerializableItem::cleanup implementation (#29567) 2025-04-28 22:15:24 +00:00
command_palette_hooks Add workspace-hack (#27277) 2025-04-02 13:26:34 -07:00
component component: Replace linkme with inventory (#30705) 2025-05-14 23:29:11 +02:00
context_server context_store: Refactor state management (#29910) 2025-05-05 21:36:12 +02:00
copilot Add image input support for OpenAI models (#30639) 2025-05-13 17:32:42 +02:00
credentials_provider Add workspace-hack (#27277) 2025-04-02 13:26:34 -07:00
dap extension/dap: Add resolve_tcp_template function (#31010) 2025-05-20 15:17:13 +02:00
dap_adapters extension/dap: Add resolve_tcp_template function (#31010) 2025-05-20 15:17:13 +02:00
db Add end of service notifications (#30982) 2025-05-20 00:20:00 +00:00
debug_adapter_extension extension: Add debug_adapters to extension manifest (#30676) 2025-05-20 11:01:33 +02:00
debugger_tools Rename debug: commands to dev: (#30675) 2025-05-14 11:15:27 +02:00
debugger_ui debugger: Surface validity of breakpoints (#30380) 2025-05-20 15:56:15 +00:00
deepseek Default to fast model for thread summaries and titles + don't include system prompt / context / thinking segments (#29102) 2025-04-19 23:26:29 +00:00
diagnostics component: Replace linkme with inventory (#30705) 2025-05-14 23:29:11 +02:00
docs_preprocessor Simplify docs preprocessing (#30947) 2025-05-19 08:16:14 -04:00
editor debugger: Surface validity of breakpoints (#30380) 2025-05-20 15:56:15 +00:00
eval extension: Add debug_adapters to extension manifest (#30676) 2025-05-20 11:01:33 +02:00
extension extension: Add debug_adapters to extension manifest (#30676) 2025-05-20 11:01:33 +02:00
extension_api extension/dap: Add resolve_tcp_template function (#31010) 2025-05-20 15:17:13 +02:00
extension_cli Add workspace-hack (#27277) 2025-04-02 13:26:34 -07:00
extension_host extension/dap: Add resolve_tcp_template function (#31010) 2025-05-20 15:17:13 +02:00
extensions_ui Improve error message around failing to install dev extensions (#30711) 2025-05-14 17:22:17 +00:00
feature_flags Add a picker for jj bookmark list (#30883) 2025-05-17 16:42:45 +00:00
feedback feedback: Update issue template URL (#28790) 2025-04-15 21:36:30 -04:00
file_finder Reduce allocations (#30693) 2025-05-14 18:29:28 +02:00
file_icons Add workspace-hack (#27277) 2025-04-02 13:26:34 -07:00
fs windows: Remove extra empty line when loading default settings (#30344) 2025-05-09 19:00:16 +08:00
fsevent Add workspace-hack (#27277) 2025-04-02 13:26:34 -07:00
fuzzy Fix out-of-bounds panic in fuzzy matcher with Unicode/multibyte characters (#30546) 2025-05-12 14:43:14 +00:00
git git: Don't filter local upstreams from branch picker (#30557) 2025-05-19 13:41:58 +00:00
git_hosting_providers VSCode Settings import (#29018) 2025-04-23 20:54:09 +00:00
git_ui git: Save buffer when resolving a conflict from the project diff (#30762) 2025-05-19 17:32:31 +00:00
go_to_line editor: Add minimap (#26893) 2025-05-07 23:11:09 +03:00
google_ai Add support for getting the token count for all parts of Gemini generation requests (#29630) 2025-05-04 21:32:45 +00:00
gpui Revert "gpui: Fix shape_text split to support \r\n" (#31031) 2025-05-20 16:01:47 +00:00
gpui_macros ui_macros: Remove DerivePathStr macro (#30862) 2025-05-17 10:05:55 +00:00
gpui_tokio Add workspace-hack (#27277) 2025-04-02 13:26:34 -07:00
html_to_markdown Add workspace-hack (#27277) 2025-04-02 13:26:34 -07:00
http_client Remove individual URL overrides for LLM service (#30290) 2025-05-08 17:54:46 +00:00
http_client_tls Add workspace-hack (#27277) 2025-04-02 13:26:34 -07:00
icons agent: Fix layout shift due to the "Generating" label (#30422) 2025-05-09 16:20:14 -03:00
image_viewer Restore the ability to drag and drop images into the editor (#31009) 2025-05-20 12:38:24 +00:00
indexed_docs indexed_docs: Remove some unnecessary cloning (#30236) 2025-05-08 10:59:56 +00:00
inline_completion Add end of service notifications (#30982) 2025-05-20 00:20:00 +00:00
inline_completion_button Add end of service notifications (#30982) 2025-05-20 00:20:00 +00:00
install_cli install_cli: Show feedback when installing CLI from welcome screen (#28532) 2025-04-11 01:47:40 +05:30
jj Add a picker for jj bookmark list (#30883) 2025-05-17 16:42:45 +00:00
jj_ui Add a picker for jj bookmark list (#30883) 2025-05-17 16:42:45 +00:00
journal VSCode Settings import (#29018) 2025-04-23 20:54:09 +00:00
language extension: Add debug_adapters to extension manifest (#30676) 2025-05-20 11:01:33 +02:00
language_extension debugger/extensions: Revert changes to extension store related to language config (#30225) 2025-05-08 14:01:39 +02:00
language_model evals: Make LLMs configurable in edit_agent evals (#30813) 2025-05-16 11:10:15 +00:00
language_model_selector agent: Don't duplicate recommended models in all models list (#30692) 2025-05-14 13:21:41 +00:00
language_models language_models: Add tool use support for Mistral models (#29994) 2025-05-19 18:36:59 +02:00
language_selector Add workspace-hack (#27277) 2025-04-02 13:26:34 -07:00
language_tools Rename debug: commands to dev: (#30675) 2025-05-14 11:15:27 +02:00
languages editor: Add python indentation tests (#30902) 2025-05-18 07:29:25 +05:30
livekit_api Add workspace-hack (#27277) 2025-04-02 13:26:34 -07:00
livekit_client Fix deafening new participants (#28330) 2025-04-08 16:01:27 +00:00
lmstudio lmstudio: Fix streaming not working in v0.3.15 (#30013) 2025-05-06 12:59:36 -04:00
lsp Remove the minimap from the debugger console (#30610) 2025-05-13 08:09:38 +00:00
markdown markdown: Fix out of range panic in parser (#30510) 2025-05-11 15:08:37 +00:00
markdown_preview Use image cache to stop leaking images (#29452) 2025-04-29 19:30:16 +00:00
media chore: Make objc a workspace level crate (#28258) 2025-04-07 18:46:09 +00:00
menu agent: Add new panel navigation dropdown (#29539) 2025-04-29 21:58:45 -03:00
migrator settings: Migration for fixing duplicated agent keys (#30237) 2025-05-08 12:38:19 +00:00
mistral language_models: Add tool use support for Mistral models (#29994) 2025-05-19 18:36:59 +02:00
multi_buffer editor: Trim indent guides at last non-empty line (#29482) 2025-05-12 17:04:46 +02:00
node_runtime Wait to locate system-installed Node until the shell environment is loaded (#30416) 2025-05-09 19:24:28 +00:00
notifications component: Replace linkme with inventory (#30705) 2025-05-14 23:29:11 +02:00
ollama Improve Ollama tool use (#30120) 2025-05-07 15:37:06 +00:00
open_ai Add image input support for OpenAI models (#30639) 2025-05-13 17:32:42 +02:00
outline Highlight merge conflicts and provide for resolving them (#28065) 2025-04-23 12:38:46 -04:00
outline_panel chore: Bump Rust to 1.87 (#30739) 2025-05-15 22:28:52 +00:00
panel Add workspace-hack (#27277) 2025-04-02 13:26:34 -07:00
paths Add a way to import ssh host names from the ssh config (#30926) 2025-05-18 20:34:47 +00:00
picker agent: Keyboard navigation improvements (#30274) 2025-05-09 13:52:06 +00:00
prettier Separate timeout and connection dropped errors out (#30457) 2025-05-10 15:12:58 +03:00
project debugger: Surface validity of breakpoints (#30380) 2025-05-20 15:56:15 +00:00
project_panel zed: Fix no way to open local folder from remote window (#30954) 2025-05-19 21:26:30 +05:30
project_symbols Add workspace-hack (#27277) 2025-04-02 13:26:34 -07:00
prompt_store assistant_context_editor: Remove suggest edits (#30286) 2025-05-08 17:27:49 +00:00
proto debugger: Surface validity of breakpoints (#30380) 2025-05-20 15:56:15 +00:00
recent_projects zed: Fix no way to open local folder from remote window (#30954) 2025-05-19 21:26:30 +05:30
refineable Add workspace-hack (#27277) 2025-04-02 13:26:34 -07:00
release_channel Fix handling of --system-specs argument so it happens before Application::new (#29240) 2025-04-22 21:32:32 +00:00
remote chore: Bump Rust to 1.87 (#30739) 2025-05-15 22:28:52 +00:00
remote_server extension: Add debug_adapters to extension manifest (#30676) 2025-05-20 11:01:33 +02:00
repl Remove unnecessary result in line shaping (#30721) 2025-05-16 23:48:36 +02:00
reqwest_client Fix license symlinks (#29758) 2025-05-01 19:24:14 +00:00
rich_text Add workspace-hack (#27277) 2025-04-02 13:26:34 -07:00
rope extension_host: Turn on parallel compilation (#30942) 2025-05-19 18:06:33 +02:00
rpc chore: Bump Rust to 1.87 (#30739) 2025-05-15 22:28:52 +00:00
rules_library Reuse conversation cache when streaming edits (#30245) 2025-05-08 14:36:34 +02:00
schema_generator Add workspace-hack (#27277) 2025-04-02 13:26:34 -07:00
search Project Search: Don't prompt to save edited buffers in project search results if buffers open elsewhere (#31026) 2025-05-20 15:34:42 +00:00
semantic_index Reuse conversation cache when streaming edits (#30245) 2025-05-08 14:36:34 +02:00
semantic_version Add workspace-hack (#27277) 2025-04-02 13:26:34 -07:00
session Avoid unnecessary DB writes (#29417) 2025-04-25 17:41:49 +03:00
settings Rename debug: commands to dev: (#30675) 2025-05-14 11:15:27 +02:00
settings_ui Add searchable global tab switcher (#28047) 2025-04-28 09:21:27 +00:00
snippet Add workspace-hack (#27277) 2025-04-02 13:26:34 -07:00
snippet_provider editor: Improve snippet completion to show key inline in completion and description as aside (#30603) 2025-05-13 05:28:59 +05:30
snippets_ui Add workspace-hack (#27277) 2025-04-02 13:26:34 -07:00
sqlez Simplify the SerializableItem::cleanup implementation (#29567) 2025-04-28 22:15:24 +00:00
sqlez_macros Add workspace-hack (#27277) 2025-04-02 13:26:34 -07:00
story gpui: Add a standard text example (#30747) 2025-05-16 17:35:44 +02:00
storybook gpui: Add a standard text example (#30747) 2025-05-16 17:35:44 +02:00
streaming_diff Introduce a new StreamingEditFileTool (#29733) 2025-05-01 17:37:43 +02:00
sum_tree Add workspace-hack (#27277) 2025-04-02 13:26:34 -07:00
supermaven Add workspace-hack (#27277) 2025-04-02 13:26:34 -07:00
supermaven_api Add workspace-hack (#27277) 2025-04-02 13:26:34 -07:00
tab_switcher Add searchable global tab switcher (#28047) 2025-04-28 09:21:27 +00:00
task chore: Bump Rust to 1.87 (#30739) 2025-05-15 22:28:52 +00:00
tasks_ui debugger: Add debug task picker to new session modal (#29702) 2025-05-02 08:38:29 +00:00
telemetry Add workspace-hack (#27277) 2025-04-02 13:26:34 -07:00
telemetry_events Add new action to run agent eval (#29158) 2025-04-21 21:30:21 -07:00
terminal Reduce allocations (#30693) 2025-05-14 18:29:28 +02:00
terminal_view Remove unnecessary result in line shaping (#30721) 2025-05-16 23:48:36 +02:00
text format: Re-implement support for formatting with code actions that contain commands (#28392) 2025-04-09 01:53:54 +00:00
theme Fix prevent zero value for buffer line height (#30832) 2025-05-19 00:55:35 +00:00
theme_extension Add workspace-hack (#27277) 2025-04-02 13:26:34 -07:00
theme_importer theme: Add scrollbar_thumb_active_background color (#30177) 2025-05-07 23:15:32 +03:00
theme_selector theme_selector: Don't select last theme when fuzzy searching (#28278) 2025-04-28 14:29:17 +00:00
time_format agent: Add date separators to Thread History (#29961) 2025-05-06 10:18:48 +00:00
title_bar title_bar: Fix config merging to respect priority (#30980) 2025-05-20 07:56:24 +00:00
toolchain_selector toolchain: Respect currently focused file when querying toolchains (#28875) 2025-04-16 19:05:57 +02:00
ui Add end of service notifications (#30982) 2025-05-20 00:20:00 +00:00
ui_input component: Replace linkme with inventory (#30705) 2025-05-14 23:29:11 +02:00
ui_macros ui_macros: Remove DerivePathStr macro (#30862) 2025-05-17 10:05:55 +00:00
ui_prompt markdown: Don't retain MarkdownStyle in favor of using MarkdownElement directly (#28255) 2025-04-07 19:03:24 +00:00
util Separate timeout and connection dropped errors out (#30457) 2025-05-10 15:12:58 +03:00
util_macros Fix license symlinks (#29758) 2025-05-01 19:24:14 +00:00
vim vim: Add g M motion to go to the middle of a line (#30227) 2025-05-16 21:21:30 +00:00
vim_mode_setting VSCode Settings import (#29018) 2025-04-23 20:54:09 +00:00
web_search agent: Expose web search tool to beta users (#29273) 2025-04-23 15:30:20 +00:00
web_search_providers Remove individual URL overrides for LLM service (#30290) 2025-05-08 17:54:46 +00:00
welcome component: Replace linkme with inventory (#30705) 2025-05-14 23:29:11 +02:00
workspace debugger: Surface validity of breakpoints (#30380) 2025-05-20 15:56:15 +00:00
worktree Restore the ability to drag and drop images into the editor (#31009) 2025-05-20 12:38:24 +00:00
zed extension: Add debug_adapters to extension manifest (#30676) 2025-05-20 11:01:33 +02:00
zed_actions zed: Fix no way to open local folder from remote window (#30954) 2025-05-19 21:26:30 +05:30
zeta Add end of service notifications (#30982) 2025-05-20 00:20:00 +00:00
zlog zlog: Fall back to printing module path instead of *unknown* or just crate name (#29691) 2025-05-01 10:59:51 -04:00
zlog_settings VSCode Settings import (#29018) 2025-04-23 20:54:09 +00:00