
- Support programmatic examples ([example](17feb260a0/crates/eval/src/examples/file_search.rs))
- Combine data-driven example declarations into a single `.toml` file ([example](17feb260a0/crates/eval/src/examples/find_and_replace_diff_card.toml))
- Run judge on individual assertions (previously called "criteria")
- Report judge and programmatic assertions in one combined table

Note: We still need to work on concept naming

<img width=400 src="https://github.com/user-attachments/assets/fc719c93-467f-412b-8d47-68821bd8a5f5">

Release Notes:

- N/A

---------

Co-authored-by: Richard Feldman <oss@rtfeldman.com>
Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
Co-authored-by: Thomas Mickley-Doyle <tmickleydoyle@gmail.com>
25 lines
558 B
Handlebars
You are an expert software developer. Your task is to evaluate a diff produced by an AI agent
in response to a prompt. Here is the prompt and the diff:

<prompt>
{{{prompt}}}
</prompt>

<diff>
{{{repository_diff}}}
</diff>

Evaluate whether or not the diff passes the following assertion:

<assertion>
{{assertion}}
</assertion>

Analyze the diff hunk by hunk, and structure your answer in the following XML format:

```
<analysis>{YOUR ANALYSIS HERE}</analysis>
<passed>{PASSED_ASSERTION}</passed>
```

Where `PASSED_ASSERTION` is either `true` or `false`.
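
The template asks the judge to reply with an `<analysis>` tag and a `<passed>` tag holding `true` or `false`. A minimal sketch of how such a reply could be parsed is shown here, using only the standard library. The names `JudgeVerdict`, `extract_tag`, and `parse_judge_response` are hypothetical and are not taken from the `eval` crate; the crate's actual parsing may differ.

```rust
/// Outcome of judging one assertion (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
pub struct JudgeVerdict {
    pub analysis: String,
    pub passed: bool,
}

/// Returns the trimmed text between the first `<tag>` and the next `</tag>`,
/// or `None` if either delimiter is missing.
fn extract_tag<'a>(response: &'a str, tag: &str) -> Option<&'a str> {
    let open = format!("<{tag}>");
    let close = format!("</{tag}>");
    let start = response.find(&open)? + open.len();
    let end = start + response[start..].find(&close)?;
    Some(response[start..end].trim())
}

/// Parses a judge reply in the format requested by the template.
/// Returns `None` if a tag is missing or if `<passed>` holds anything
/// other than the literal `true` or `false`.
pub fn parse_judge_response(response: &str) -> Option<JudgeVerdict> {
    let analysis = extract_tag(response, "analysis")?.to_string();
    let passed = match extract_tag(response, "passed")? {
        "true" => true,
        "false" => false,
        _ => return None,
    };
    Some(JudgeVerdict { analysis, passed })
}
```

This sketch is deliberately strict: a reply such as `<passed>True</passed>` is rejected rather than guessed at, so a malformed judge answer surfaces as a parse failure instead of a silent pass or fail. It also stops at the first `</analysis>` it finds, so an analysis that quotes that closing tag verbatim would be cut short.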