Improve content generation prompt to reduce over-generation (#16333)

I focused on cases where we're inserting doc comments or annotations
above symbols.

I added 5 new examples to the content generation prompt, covering
various scenarios:

1. Inserting documentation for a Rust struct
2. Writing docstrings for a Python class
3. Adding comments to a TypeScript method
4. Adding a derive attribute to a Rust struct
5. Adding a decorator to a Python class

These examples demonstrate how to handle different languages and common
tasks like adding documentation, attributes, and decorators.

To improve context integration, I've made the following changes:

1. Added a `transform_context_range` that includes 3 lines before and
after the transform range
2. Introduced `rewrite_section_prefix` and `rewrite_section_suffix` to
provide more context around the section being rewritten
3. Updated the prompt template to include this additional context in a
separate code snippet

Release Notes:

- Reduced instances of over-generation when inserting docs or
annotations above a symbol.
This commit is contained in:
Nathan Sobo 2024-08-15 22:20:11 -06:00 committed by GitHub
parent bac39d7743
commit ad44b459cd
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
3 changed files with 434 additions and 33 deletions

View file

@ -1,52 +1,426 @@
Here's a text file that I'm going to ask you to make an edit to. You are an expert developer assistant working in an AI-enabled text editor.
Your task is to rewrite a specific section of the provided document based on a user-provided prompt.
{{#if language_name}} <guidelines>
The file is in {{language_name}}. 1. Scope: Modify only content within <rewrite_this> tags. Do not alter anything outside these boundaries.
{{/if}} 2. Precision: Make changes strictly necessary to fulfill the given prompt. Preserve all other content as-is.
3. Seamless integration: Ensure rewritten sections flow naturally with surrounding text and maintain document structure.
You need to rewrite a portion of it. 4. Tag exclusion: Never include <rewrite_this>, </rewrite_this>, <edit_here>, or <insert_here> tags in the output.
5. Indentation: Maintain the original indentation level of the file in rewritten sections.
The section you'll need to edit is marked with <rewrite_this></rewrite_this> tags. 6. Completeness: Rewrite the entire tagged section, even if only partial changes are needed. Avoid omissions or elisions.
7. Insertions: Replace <insert_here></insert_here> tags with appropriate content as specified by the prompt.
8. Code integrity: Respect existing code structure and functionality when making changes.
9. Consistency: Maintain a uniform style and tone throughout the rewritten text.
</guidelines>
<examples>
<example>
<input>
<document> <document>
use std::cell::Cell;
use std::collections::HashMap;
use std::cmp;
<rewrite_this>
<insert_here></insert_here>
</rewrite_this>
pub struct LruCache<K, V> {
/// The maximum number of items the cache can hold.
capacity: usize,
/// The map storing the cached items.
items: HashMap<K, V>,
}
// The rest of the implementation...
</document>
<prompt>
doc this
</prompt>
</input>
<incorrect_output failure="Over-generation. The text starting with `pub struct AabbTree<T> {` is *after* the rewrite_this tag">
/// Represents an Axis-Aligned Bounding Box (AABB) tree data structure.
///
/// This structure is used for efficient spatial queries and collision detection.
/// It organizes objects in a hierarchical tree structure based on their bounding boxes.
///
/// # Type Parameters
///
/// * `T`: The type of data associated with each node in the tree.
pub struct AabbTree<T> {
root: Option<usize>,
</incorrect_output>
<corrected_output improvement="Generation stops before repeating content after the rewrite_this section">
/// Represents an Axis-Aligned Bounding Box (AABB) tree data structure.
///
/// This structure is used for efficient spatial queries and collision detection.
/// It organizes objects in a hierarchical tree structure based on their bounding boxes.
///
/// # Type Parameters
///
/// * `T`: The type of data associated with each node in the tree.
</corrected_output>
</example>
<example>
<input>
<document>
import math
def calculate_circle_area(radius):
"""Calculate the area of a circle given its radius."""
return math.pi * radius ** 2
<rewrite_this>
<insert_here></insert_here>
</rewrite_this>
class Circle:
def __init__(self, radius):
self.radius = radius
def area(self):
return math.pi * self.radius ** 2
def circumference(self):
return 2 * math.pi * self.radius
# Usage example
circle = Circle(5)
print(f"Area: {circle.area():.2f}")
print(f"Circumference: {circle.circumference():.2f}")
</document>
<prompt>
write docs
</prompt>
</input>
<incorrect_output failure="Over-generation. The text starting with `class Circle:` is *after* the rewrite_this tag">
"""
Represents a circle with methods to calculate its area and circumference.
This class provides a simple way to work with circles in a geometric context.
It allows for the creation of Circle objects with a specified radius and
offers methods to compute the circle's area and circumference.
Attributes:
radius (float): The radius of the circle.
Methods:
area(): Calculates and returns the area of the circle.
circumference(): Calculates and returns the circumference of the circle.
"""
class Circle:
</incorrect_output>
<corrected_output improvement="Generation stops before repeating content after the rewrite_this section">
"""
Represents a circle with methods to calculate its area and circumference.
This class provides a simple way to work with circles in a geometric context.
It allows for the creation of Circle objects with a specified radius and
offers methods to compute the circle's area and circumference.
Attributes:
radius (float): The radius of the circle.
Methods:
area(): Calculates and returns the area of the circle.
circumference(): Calculates and returns the circumference of the circle.
"""
</corrected_output>
</example>
<example>
<input>
<document>
class BankAccount {
private balance: number;
constructor(initialBalance: number) {
this.balance = initialBalance;
}
<rewrite_this>
<insert_here></insert_here>
</rewrite_this>
deposit(amount: number): void {
if (amount > 0) {
this.balance += amount;
}
}
withdraw(amount: number): boolean {
if (amount > 0 && this.balance >= amount) {
this.balance -= amount;
return true;
}
return false;
}
getBalance(): number {
return this.balance;
}
}
// Usage
const account = new BankAccount(1000);
account.deposit(500);
console.log(account.getBalance()); // 1500
account.withdraw(200);
console.log(account.getBalance()); // 1300
</document>
<prompt>
//
</prompt>
</input>
<incorrect_output failure="Over-generation. The text starting with `deposit(amount: number): void {` is *after* the rewrite_this tag">
/**
* Deposits the specified amount into the bank account.
*
* @param amount The amount to deposit. Must be a positive number.
* @throws Error if the amount is not positive.
*/
deposit(amount: number): void {
if (amount > 0) {
this.balance += amount;
} else {
throw new Error("Deposit amount must be positive");
}
}
</incorrect_output>
<corrected_output improvement="Generation stops before repeating content after the rewrite_this section">
/**
* Deposits the specified amount into the bank account.
*
* @param amount The amount to deposit. Must be a positive number.
* @throws Error if the amount is not positive.
*/
</corrected_output>
</example>
<example>
<input>
<document>
use std::collections::VecDeque;
pub struct BinaryTree<T> {
root: Option<Node<T>>,
}
<rewrite_this>
<insert_here></insert_here>
</rewrite_this>
struct Node<T> {
value: T,
left: Option<Box<Node<T>>>,
right: Option<Box<Node<T>>>,
}
</document>
<prompt>
derive clone
</prompt>
</input>
<incorrect_output failure="Over-generation below the rewrite_this tags. Extra space between derive annotation and struct definition.">
#[derive(Clone)]
struct Node<T> {
value: T,
left: Option<Box<Node<T>>>,
right: Option<Box<Node<T>>>,
}
</incorrect_output>
<incorrect_output failure="Over-generation above the rewrite_this tags">
pub struct BinaryTree<T> {
root: Option<Node<T>>,
}
#[derive(Clone)]
</incorrect_output>
<incorrect_output failure="Over-generation below the rewrite_this tags">
#[derive(Clone)]
struct Node<T> {
value: T,
left: Option<Box<Node<T>>>,
right: Option<Box<Node<T>>>,
}
impl<T> Node<T> {
fn new(value: T) -> Self {
Node {
value,
left: None,
right: None,
}
}
}
</incorrect_output>
<corrected_output improvement="Only includes the new content within the rewrite_this tags">
#[derive(Clone)]
</corrected_output>
</example>
<example>
<input>
<document>
import math
def calculate_circle_area(radius):
"""Calculate the area of a circle given its radius."""
return math.pi * radius ** 2
<rewrite_this>
<insert_here></insert_here>
</rewrite_this>
class Circle:
def __init__(self, radius):
self.radius = radius
def area(self):
return math.pi * self.radius ** 2
def circumference(self):
return 2 * math.pi * self.radius
# Usage example
circle = Circle(5)
print(f"Area: {circle.area():.2f}")
print(f"Circumference: {circle.circumference():.2f}")
</document>
<prompt>
add dataclass decorator
</prompt>
</input>
<incorrect_output failure="Over-generation. The text starting with `class Circle:` is *after* the rewrite_this tag">
@dataclass
class Circle:
radius: float
def __init__(self, radius):
self.radius = radius
def area(self):
return math.pi * self.radius ** 2
</incorrect_output>
<corrected_output improvement="Generation stops before repeating content after the rewrite_this section">
@dataclass
</corrected_output>
</example>
<example>
<input>
<document>
interface ShoppingCart {
items: string[];
total: number;
}
<rewrite_this>
<insert_here></insert_here>class ShoppingCartManager {
</rewrite_this>
private cart: ShoppingCart;
constructor() {
this.cart = { items: [], total: 0 };
}
addItem(item: string, price: number): void {
this.cart.items.push(item);
this.cart.total += price;
}
getTotal(): number {
return this.cart.total;
}
}
// Usage
const manager = new ShoppingCartManager();
manager.addItem("Book", 15.99);
console.log(manager.getTotal()); // 15.99
</document>
<prompt>
add readonly modifier
</prompt>
</input>
<incorrect_output failure="Over-generation. The line starting with ` items: string[];` is *after* the rewrite_this tag">
readonly interface ShoppingCart {
items: string[];
total: number;
}
class ShoppingCartManager {
private readonly cart: ShoppingCart;
constructor() {
this.cart = { items: [], total: 0 };
}
</incorrect_output>
<corrected_output improvement="Only includes the new content within the rewrite_this tags and integrates cleanly into surrounding code">
readonly interface ShoppingCart {
</corrected_output>
</example>
</examples>
With these examples in mind, edit the following file:
<document language="{{ language_name }}">
{{{ document_content }}} {{{ document_content }}}
</document> </document>
{{#if is_truncated}} {{#if is_truncated}}
The context around the relevant section has been truncated (possibly in the middle of a line) for brevity. The provided document has been truncated (potentially mid-line) for brevity.
{{/if}} {{/if}}
Rewrite the section of {{content_type}} in <rewrite_this></rewrite_this> tags based on the following prompt: <instructions>
{{#if has_insertion}}
Insert text anywhere you see marked with <insert_here></insert_here> tags. It's CRITICAL that you DO NOT include <insert_here> tags in your output.
{{/if}}
{{#if has_replacement}}
Edit text that you see surrounded with <edit_here>...</edit_here> tags. It's CRITICAL that you DO NOT include <edit_here> tags in your output.
{{/if}}
Make no changes to the rewritten content outside these tags.
<snippet language="Rust" annotated="true">
{{{ rewrite_section_prefix }}}
<rewrite_this>
{{{ rewrite_section_with_edits }}}
</rewrite_this>
{{{ rewrite_section_suffix }}}
</snippet>
Rewrite the lines enclosed within the <rewrite_this></rewrite_this> tags in accordance with the provided instructions and the prompt below.
<prompt> <prompt>
{{{ user_prompt }}} {{{ user_prompt }}}
</prompt> </prompt>
Here's the section to edit based on that prompt again for reference: Do not include <insert_here> or <edit_here> annotations in your output. Here is a clean copy of the snippet without annotations for your reference.
<rewrite_this> <snippet>
{{{ rewrite_section_prefix }}}
{{{ rewrite_section }}} {{{ rewrite_section }}}
</rewrite_this> {{{ rewrite_section_suffix }}}
</snippet>
</instructions>
You'll rewrite this entire section, but you will only make changes within certain subsections. <guidelines_reminder>
1. Focus on necessary changes: Modify only what's required to fulfill the prompt.
{{#if has_insertion}} 2. Preserve context: Maintain all surrounding content as-is, ensuring the rewritten section seamlessly integrates with the existing document structure and flow.
Insert text anywhere you see it marked with with <insert_here></insert_here> tags. Do not include <insert_here> tags in your output. 3. Exclude annotation tags: Do not output <rewrite_this>, </rewrite_this>, <edit_here>, or <insert_here> tags.
{{/if}} 4. Maintain indentation: Begin at the original file's indentation level.
{{#if has_replacement}} 5. Complete rewrite: Continue until the entire section is rewritten, even if no further changes are needed.
Edit edit text that you see surrounded with <edit_here></edit_here> tags. Do not include <edit_here> tags in your output. 6. Avoid elisions: Always write out the full section without unnecessary omissions. NEVER say `// ...` or `// ...existing code` in your output.
{{/if}} 7. Respect content boundaries: Preserve code integrity.
</guidelines_reminder>
<rewrite_this>
{{{rewrite_section_with_selections}}}
</rewrite_this>
Only make changes that are necessary to fulfill the prompt, leave everything else as-is. All surrounding {{content_type}} will be preserved. Do not output the <rewrite_this></rewrite this> tags or anything outside of them.
Start at the indentation level in the original file in the rewritten {{content_type}}. Don't stop until you've rewritten the entire section, even if you have no more changes to make. Always write out the whole section with no unnecessary elisions.
Immediately start with the following format with no remarks: Immediately start with the following format with no remarks:
``` ```
\{{REWRITTEN_CODE}} {{REWRITTEN_CODE}}
``` ```

View file

@ -45,6 +45,7 @@ use std::{
task::{self, Poll}, task::{self, Poll},
time::{Duration, Instant}, time::{Duration, Instant},
}; };
use text::OffsetRangeExt as _;
use theme::ThemeSettings; use theme::ThemeSettings;
use ui::{prelude::*, CheckboxWithLabel, IconButtonShape, Popover, Tooltip}; use ui::{prelude::*, CheckboxWithLabel, IconButtonShape, Popover, Tooltip};
use util::{RangeExt, ResultExt}; use util::{RangeExt, ResultExt};
@ -2354,6 +2355,15 @@ impl Codegen {
return Err(anyhow::anyhow!("invalid transformation range")); return Err(anyhow::anyhow!("invalid transformation range"));
}; };
let mut transform_context_range = transform_range.to_point(&transform_buffer);
transform_context_range.start.row = transform_context_range.start.row.saturating_sub(3);
transform_context_range.start.column = 0;
transform_context_range.end =
(transform_context_range.end + Point::new(3, 0)).min(transform_buffer.max_point());
transform_context_range.end.column =
transform_buffer.line_len(transform_context_range.end.row);
let transform_context_range = transform_context_range.to_offset(&transform_buffer);
let selected_ranges = self let selected_ranges = self
.selected_ranges .selected_ranges
.iter() .iter()
@ -2376,6 +2386,7 @@ impl Codegen {
transform_buffer, transform_buffer,
transform_range, transform_range,
selected_ranges, selected_ranges,
transform_context_range,
) )
.map_err(|e| anyhow::anyhow!("Failed to generate content prompt: {}", e))?; .map_err(|e| anyhow::anyhow!("Failed to generate content prompt: {}", e))?;

View file

@ -16,7 +16,9 @@ pub struct ContentPromptContext {
pub document_content: String, pub document_content: String,
pub user_prompt: String, pub user_prompt: String,
pub rewrite_section: String, pub rewrite_section: String,
pub rewrite_section_with_selections: String, pub rewrite_section_prefix: String,
pub rewrite_section_suffix: String,
pub rewrite_section_with_edits: String,
pub has_insertion: bool, pub has_insertion: bool,
pub has_replacement: bool, pub has_replacement: bool,
} }
@ -173,6 +175,7 @@ impl PromptBuilder {
buffer: BufferSnapshot, buffer: BufferSnapshot,
transform_range: Range<usize>, transform_range: Range<usize>,
selected_ranges: Vec<Range<usize>>, selected_ranges: Vec<Range<usize>>,
transform_context_range: Range<usize>,
) -> Result<String, RenderError> { ) -> Result<String, RenderError> {
let content_type = match language_name { let content_type = match language_name {
None | Some("Markdown" | "Plain Text") => "text", None | Some("Markdown" | "Plain Text") => "text",
@ -202,6 +205,7 @@ impl PromptBuilder {
for chunk in buffer.text_for_range(truncated_before) { for chunk in buffer.text_for_range(truncated_before) {
document_content.push_str(chunk); document_content.push_str(chunk);
} }
document_content.push_str("<rewrite_this>\n"); document_content.push_str("<rewrite_this>\n");
for chunk in buffer.text_for_range(transform_range.clone()) { for chunk in buffer.text_for_range(transform_range.clone()) {
document_content.push_str(chunk); document_content.push_str(chunk);
@ -217,7 +221,17 @@ impl PromptBuilder {
rewrite_section.push_str(chunk); rewrite_section.push_str(chunk);
} }
let rewrite_section_with_selections = { let mut rewrite_section_prefix = String::new();
for chunk in buffer.text_for_range(transform_context_range.start..transform_range.start) {
rewrite_section_prefix.push_str(chunk);
}
let mut rewrite_section_suffix = String::new();
for chunk in buffer.text_for_range(transform_range.end..transform_context_range.end) {
rewrite_section_suffix.push_str(chunk);
}
let rewrite_section_with_edits = {
let mut section_with_selections = String::new(); let mut section_with_selections = String::new();
let mut last_end = 0; let mut last_end = 0;
for selected_range in &selected_ranges { for selected_range in &selected_ranges {
@ -254,7 +268,9 @@ impl PromptBuilder {
document_content, document_content,
user_prompt, user_prompt,
rewrite_section, rewrite_section,
rewrite_section_with_selections, rewrite_section_prefix,
rewrite_section_suffix,
rewrite_section_with_edits,
has_insertion, has_insertion,
has_replacement, has_replacement,
}; };