ZIm/crates/eval/src
Cole Miller c12e6376b8
Terminal tool improvements (#29924)
WIP

- On macOS/Linux, run the command in bash instead of the user's shell
- Try to prevent the agent from running commands that expect interaction
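
A minimal sketch of the two changes above, not the code from this PR. `find_in_path`, `pick_shell`, and `run_non_interactive` are hypothetical names, and redirecting stdin from the null device is only one assumed way to keep an interactive command from hanging:

```rust
use std::env;
use std::path::PathBuf;
use std::process::{Command, Output, Stdio};

/// Hypothetical helper: look for a file named `name` in a PATH-style string.
/// Simplification: this checks that the file exists, not that it is executable.
fn find_in_path(name: &str, path_var: &str) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(name))
        .find(|candidate| candidate.is_file())
}

/// Hypothetical helper: prefer bash when it is available, otherwise fall back
/// to the user's shell, and finally to plain `sh`.
fn pick_shell(path_var: &str, user_shell: Option<String>) -> String {
    match find_in_path("bash", path_var) {
        Some(bash) => bash.to_string_lossy().into_owned(),
        None => user_shell.unwrap_or_else(|| "sh".to_string()),
    }
}

/// Run `command` through `shell -c` with stdin attached to the null device,
/// so a command that tries to read input sees end-of-file instead of blocking.
fn run_non_interactive(shell: &str, command: &str) -> std::io::Result<Output> {
    Command::new(shell)
        .arg("-c")
        .arg(command)
        .stdin(Stdio::null())
        .output()
}

fn main() -> std::io::Result<()> {
    let path_var = env::var("PATH").unwrap_or_default();
    let shell = pick_shell(&path_var, env::var("SHELL").ok());

    // `read` would normally wait for the user; with stdin closed it returns
    // immediately and the rest of the command still runs.
    let output = run_non_interactive(&shell, "read -r answer; echo finished")?;
    print!("{}", String::from_utf8_lossy(&output.stdout));
    Ok(())
}
```

This sketch targets macOS/Linux only, matching the first bullet; it does not cover Windows or commands that open the terminal device directly.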

Release Notes:

- Agent Beta: Switched to using `bash` (if available) instead of the
user's shell when calling the terminal tool.
- Agent Beta: Prevented the agent from hanging when trying to run
interactive commands.

---------

Co-authored-by: WeetHet <stas.ale66@gmail.com>
2025-05-05 15:57:03 -04:00
| Name | Last commit | Date |
| --- | --- | --- |
| examples | Terminal tool improvements (#29924) | 2025-05-05 15:57:03 -04:00 |
| assertions.rs | eval: Fine-grained assertions (#29246) | 2025-04-22 23:58:58 -03:00 |
| eval.rs | context_store: Refactor state management (#29910) | 2025-05-05 21:36:12 +02:00 |
| example.rs | agent: Handle attempts to use hallucinated tools (#29946) | 2025-05-05 19:31:11 +00:00 |
| explorer.html | eval: Add HTML overview for evaluation runs (#29413) | 2025-04-25 17:49:05 +03:00 |
| explorer.rs | eval: Add HTML overview for evaluation runs (#29413) | 2025-04-25 17:49:05 +03:00 |
| ids.rs | Add new action to run agent eval (#29158) | 2025-04-21 21:30:21 -07:00 |
| instance.rs | agent: Handle attempts to use hallucinated tools (#29946) | 2025-05-05 19:31:11 +00:00 |
| judge_diff_prompt.hbs | eval: Fine-grained assertions (#29246) | 2025-04-22 23:58:58 -03:00 |
| judge_thread_prompt.hbs | eval: Fine-grained assertions (#29246) | 2025-04-22 23:58:58 -03:00 |
| tool_metrics.rs | eval: Fine-grained assertions (#29246) | 2025-04-22 23:58:58 -03:00 |