ZIm/crates/eval/src/judge_diff_prompt.hbs
Agus Zubiaga ce1a674eba
eval: Fine-grained assertions (#29246)
- Support programmatic examples
([example](17feb260a0/crates/eval/src/examples/file_search.rs))
- Combine data-driven example declarations into a single `.toml` file
([example](17feb260a0/crates/eval/src/examples/find_and_replace_diff_card.toml))
- Run judge on individual assertions (previously called "criteria")
- Report judge and programmatic assertions in one combined table

Note: We still need to work on concept naming 

<img width=400
src="https://github.com/user-attachments/assets/fc719c93-467f-412b-8d47-68821bd8a5f5">

Release Notes:

- N/A

---------

Co-authored-by: Richard Feldman <oss@rtfeldman.com>
Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
Co-authored-by: Thomas Mickley-Doyle <tmickleydoyle@gmail.com>
2025-04-22 23:58:58 -03:00


You are an expert software developer. Your task is to evaluate a diff produced by an AI agent
in response to a prompt. Here is the prompt and the diff:
<prompt>
{{{prompt}}}
</prompt>
<diff>
{{{repository_diff}}}
</diff>
Evaluate whether or not the diff passes the following assertion:
<assertion>
{{assertion}}
</assertion>
Analyze the diff hunk by hunk, and structure your answer in the following XML format:
```
<analysis>{YOUR ANALYSIS HERE}</analysis>
<passed>{PASSED_ASSERTION}</passed>
```
Where `PASSED_ASSERTION` is either `true` or `false`.
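The template asks the judge to answer with an `<analysis>` element followed by a `<passed>` element whose body is `true` or `false`. Below is a minimal sketch of how a caller might read that reply back, using only the Rust standard library. The names `extract_tag`, `JudgeVerdict`, and `parse_judge_response` are hypothetical and are not taken from the `eval` crate. The sketch also assumes the reply contains each tag once, because it matches the first occurrence.

```rust
/// Hypothetical helper: return the trimmed text between the first `<tag>`
/// and the next `</tag>` after it, or None if either tag is missing.
fn extract_tag<'a>(response: &'a str, tag: &str) -> Option<&'a str> {
    let open = format!("<{tag}>");
    let close = format!("</{tag}>");
    let start = response.find(&open)? + open.len();
    let end = start + response[start..].find(&close)?;
    Some(response[start..end].trim())
}

/// Hypothetical verdict for a single assertion.
#[derive(Debug, PartialEq)]
struct JudgeVerdict {
    analysis: String,
    passed: bool,
}

/// Parse a judge reply in the `<analysis>…</analysis><passed>…</passed>` format.
/// Returns None if a tag is missing or `<passed>` holds anything other than
/// exactly `true` or `false` (after trimming whitespace).
fn parse_judge_response(response: &str) -> Option<JudgeVerdict> {
    let analysis = extract_tag(response, "analysis")?.to_string();
    let passed = match extract_tag(response, "passed")? {
        "true" => true,
        "false" => false,
        _ => return None,
    };
    Some(JudgeVerdict { analysis, passed })
}

fn main() {
    let reply = "<analysis>The only hunk adds the requested function.</analysis>\n<passed>true</passed>";
    let verdict = parse_judge_response(reply).expect("reply should be well-formed");
    println!("passed = {}, analysis = {}", verdict.passed, verdict.analysis);
}
```

Returning `None` for an unexpected `<passed>` value, rather than defaulting to `false`, lets a caller report a malformed judge reply separately from a genuine failed assertion.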