Antonio Scandurra
4d51602e7b
Encourage editing over re-creating a file from scratch ( #29870 )
...
I also introduced a new eval to prove the encouragement actually makes a
difference.
Release Notes:
- Improved agent behavior when streaming edits, encouraging it to
edit files rather than re-create them from scratch
2025-05-04 13:18:28 +00:00
Richard Feldman
9efc09c5a6
Add eval for open_tool ( #29801 )
...
Also have its description say it should only be used on request
Release Notes:
- N/A
2025-05-02 15:56:07 +00:00
Richard Feldman
d7004030b3
Code block evals ( #29619 )
...
Add a targeted eval for code block formatting, and revise the system
prompt accordingly.
### Eval before, n=8
<img width="728" alt="eval before"
src="https://github.com/user-attachments/assets/552b6146-3d26-4eaa-86f9-9fc36c0cadf2"
/>
### Eval after prompt change, n=8 (excluding the new evals, so just testing the prompt change)
<img width="717" alt="eval after"
src="https://github.com/user-attachments/assets/c78c7a54-4c65-470c-b135-8691584cd73e"
/>
Release Notes:
- N/A
2025-04-29 18:52:09 -04:00
Agus Zubiaga
45d3f5168a
eval: New add_arg_to_trait_method example ( #29297 )
...
Release Notes:
- N/A
---------
Co-authored-by: Richard Feldman <oss@rtfeldman.com>
2025-04-23 18:46:39 +00:00
Agus Zubiaga
ce1a674eba
eval: Fine-grained assertions ( #29246 )
...
- Support programmatic examples
([example](17feb260a0/crates/eval/src/examples/file_search.rs))
- Combine data-driven example declarations into a single `.toml` file
([example](17feb260a0/crates/eval/src/examples/find_and_replace_diff_card.toml))
- Run judge on individual assertions (previously called "criteria")
- Report judge and programmatic assertions in one combined table
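
The combined reporting described above could look roughly like the following. This is a minimal sketch, not the eval crate's actual code: `Source`, `AssertionResult`, `combine`, `pass_rate`, and `render_table` are all hypothetical names introduced for illustration, and the table layout is an assumption.

```rust
// Hypothetical sketch: none of these types come from the eval crate itself.

/// Where an assertion result came from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    /// Checked directly by code in a programmatic example.
    Programmatic,
    /// Scored by the judge, one assertion at a time.
    Judge,
}

/// Outcome of a single assertion.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AssertionResult {
    pub source: Source,
    pub description: String,
    pub passed: bool,
}

impl AssertionResult {
    pub fn new(source: Source, description: &str, passed: bool) -> Self {
        Self {
            source,
            description: description.to_string(),
            passed,
        }
    }
}

/// Merge both kinds of results into one list, programmatic results first.
pub fn combine(
    programmatic: Vec<AssertionResult>,
    judge: Vec<AssertionResult>,
) -> Vec<AssertionResult> {
    programmatic.into_iter().chain(judge).collect()
}

/// Fraction of assertions that passed, or `None` when there are no results.
pub fn pass_rate(results: &[AssertionResult]) -> Option<f64> {
    if results.is_empty() {
        return None;
    }
    let passed = results.iter().filter(|r| r.passed).count();
    Some(passed as f64 / results.len() as f64)
}

/// Render every result as one row of a single plain-text table.
pub fn render_table(results: &[AssertionResult]) -> String {
    let mut out = String::from("source       | result | assertion\n");
    for r in results {
        let source = match r.source {
            Source::Programmatic => "programmatic",
            Source::Judge => "judge",
        };
        let status = if r.passed { "pass" } else { "fail" };
        out.push_str(&format!("{:<12} | {:<6} | {}\n", source, status, r.description));
    }
    out
}
```

Keeping the source on each row, rather than in two separate tables, means one pass rate can be computed over all assertions while a failure can still be traced to either the judge or a programmatic check.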
Note: We still need to work on concept naming
<img width=400
src="https://github.com/user-attachments/assets/fc719c93-467f-412b-8d47-68821bd8a5f5">
Release Notes:
- N/A
---------
Co-authored-by: Richard Feldman <oss@rtfeldman.com>
Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
Co-authored-by: Thomas Mickley-Doyle <tmickleydoyle@gmail.com>
2025-04-22 23:58:58 -03:00