Re-add code block formatting instructions (#29574)
Re-enabled instructions about code block formatting. In practice, the model doesn't seem to use these very often, but there's no negative effect on evals. In a future PR, I'll experiment with adding more evals around the model actually using the code blocks.

Two runs before this change (`--repetitions=8`):

```
=================================================================
AGGREGATE
=================================================================
4 examples failed to run!
Average programmatic score: 37%
Average diff score: 66%
Average thread score: 93%
-----------------------------------------------------------------
CUMULATIVE TOOL METRICS
-----------------------------------------------------------------
┌──────────────────────────────┬──────────┬──────────┬──────────┐
│             Tool             │   Uses   │ Failures │   Rate   │
├──────────────────────────────┼──────────┼──────────┼──────────┤
│edit_file                     │   398    │    53    │   13%    │
│terminal                      │    11    │    1     │    9%    │
│create_file                   │    40    │    2     │    5%    │
│read_file                     │   245    │    8     │    3%    │
│find_path                     │    48    │    0     │    0%    │
│list_directory                │    13    │    0     │    0%    │
│grep                          │   133    │    0     │    0%    │
│thinking                      │    18    │    0     │    0%    │
│diagnostics                   │   130    │    0     │    0%    │
└──────────────────────────────┴──────────┴──────────┴──────────┘
```

```
=================================================================
AGGREGATE
=================================================================
1 examples failed to run!
Average programmatic score: 41%
Average diff score: 68%
Average thread score: 96%
-----------------------------------------------------------------
CUMULATIVE TOOL METRICS
-----------------------------------------------------------------
┌──────────────────────────────┬──────────┬──────────┬──────────┐
│             Tool             │   Uses   │ Failures │   Rate   │
├──────────────────────────────┼──────────┼──────────┼──────────┤
│fetch                         │    1     │    1     │   100%   │
│edit_file                     │   553    │    63    │   11%    │
│read_file                     │   349    │    3     │    1%    │
│diagnostics                   │   158    │    0     │    0%    │
│find_path                     │    70    │    0     │    0%    │
│list_directory                │    10    │    0     │    0%    │
│thinking                      │    45    │    0     │    0%    │
│grep                          │   213    │    0     │    0%    │
│create_file                   │    24    │    0     │    0%    │
│terminal                      │    17    │    0     │    0%    │
└──────────────────────────────┴──────────┴──────────┴──────────┘
```

One run after this change:

```
=================================================================
AGGREGATE
=================================================================
Average programmatic score: 42%
Average diff score: 74%
Average thread score: 100%
-----------------------------------------------------------------
CUMULATIVE TOOL METRICS
-----------------------------------------------------------------
┌──────────────────────────────┬──────────┬──────────┬──────────┐
│             Tool             │   Uses   │ Failures │   Rate   │
├──────────────────────────────┼──────────┼──────────┼──────────┤
│edit_file                     │   534    │    92    │   17%    │
│read_file                     │   325    │    6     │    2%    │
│list_directory                │    6     │    0     │    0%    │
│thinking                      │    12    │    0     │    0%    │
│create_file                   │    16    │    0     │    0%    │
│diagnostics                   │    49    │    0     │    0%    │
│grep                          │   234    │    0     │    0%    │
│find_path                     │    65    │    0     │    0%    │
│terminal                      │    38    │    0     │    0%    │
└──────────────────────────────┴──────────┴──────────┴──────────┘
```

Release Notes:

- N/A
parent 4812c9094b
commit 2b431d3e9d

1 changed file with 14 additions and 0 deletions
```diff
@@ -36,6 +36,20 @@ If appropriate, use tool calls to explore the current project, which contains th
 - The user might specify a partial file path. If you don't know the full path, use `find_path` (not `grep`) before you read the file.
 {{/if}}
 
+## Code Block Formatting
+
+Whenever you mention a code block, you MUST use ONLY use the following format when the code in the block comes from a file
+in the project:
+
+```path/to/Something.blah#L123-456
+(code goes here)
+```
+
+The `#L123-456` means the line number range 123 through 456, and the path/to/Something.blah
+is a path in the project. (If this code block does not come from a file in the project, then you may instead use
+the normal markdown style of three backticks followed by language name. However, you MUST use this format if
+the code in the block comes from a file in the project.)
+
 ## Fixing Diagnostics
 
 1. Make 1-2 attempts at fixing diagnostics, then defer to the user.
```
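The prompt section added here asks for fences whose info string is a project path followed by an inclusive line range (`path/to/Something.blah#L123-456`), falling back to an ordinary language tag for code that does not come from the project. As a rough illustration of how a consumer might tell the two fence styles apart, here is a minimal Python sketch. `parse_info_string`, `ProjectSnippet`, and the regular expression are hypothetical names chosen for this example, not the parser used by the codebase this commit belongs to, and the sketch assumes paths contain no `#` and no whitespace.

```python
import re
from dataclasses import dataclass
from typing import Union

# Hypothetical pattern: "<path>#L<start>-<end>", e.g. "path/to/Something.blah#L123-456".
# Assumption: the path contains no '#' and no whitespace.
_PATH_RANGE = re.compile(r"^(?P<path>[^#\s]+)#L(?P<start>\d+)-(?P<end>\d+)$")


@dataclass(frozen=True)
class ProjectSnippet:
    """A code block that quotes lines from a file in the project."""
    path: str
    start_line: int  # first line of the range (inclusive)
    end_line: int    # last line of the range (inclusive)


def parse_info_string(info: str) -> Union[ProjectSnippet, str, None]:
    """Classify the text that follows the opening three backticks.

    Returns a ProjectSnippet for the project-file format, the language
    name for an ordinary markdown fence, or None when the string is empty.
    """
    info = info.strip()
    if not info:
        return None
    match = _PATH_RANGE.match(info)
    if match is None:
        # Not a path with a line range: treat it as a plain language tag.
        return info
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError(f"line range is reversed: {info!r}")
    return ProjectSnippet(match["path"], start, end)
```

Under this sketch, `parse_info_string("path/to/Something.blah#L123-456")` yields a `ProjectSnippet` with `start_line=123` and `end_line=456`, while `parse_info_string("rust")` simply returns `"rust"`, mirroring the prompt's fallback to ordinary language-tagged fences.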