Add max_output_tokens to OpenAI models and integrate into requests (#16381)

### Pull Request Title
Introduce `max_output_tokens` Field for OpenAI Models


Background (DeepSeek API news, 8K `max_tokens` beta): https://platform.deepseek.com/api-docs/news/news0725/#4-8k-max_tokens-betarelease-longer-possibilities

### Description
This commit introduces a new field `max_output_tokens` to the OpenAI
models, which allows specifying the maximum number of tokens that can be
generated in the output. This field is now integrated into the request
handling across multiple crates, ensuring that the output token limit is
respected during language model completions.
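
A minimal sketch of the per-model lookup, assuming a simplified, std-only copy of `open_ai::Model` with its serde attributes dropped. The built-in limits mirror the values in this commit's diff. The custom model names and numbers used against it (such as a `deepseek-chat` entry capped at 8192) are illustrative and are not taken from Zed's settings.

```rust
/// Simplified, std-only stand-in for `open_ai::Model` (serde attributes omitted).
#[allow(dead_code)]
#[derive(Debug, Clone)]
pub enum Model {
    ThreePointFiveTurbo,
    Four,
    FourTurbo,
    FourOmni,
    FourOmniMini,
    Custom {
        name: String,
        /// Existing token limit field, unchanged by this commit.
        max_tokens: usize,
        /// Optional cap on generated tokens; `None` means no cap is configured.
        max_output_tokens: Option<u32>,
    },
}

impl Model {
    /// Returns the output-token cap, using the same values as the diff.
    pub fn max_output_tokens(&self) -> Option<u32> {
        match self {
            Self::ThreePointFiveTurbo => Some(4096),
            Self::Four => Some(8192),
            Self::FourTurbo => Some(4096),
            Self::FourOmni => Some(4096),
            Self::FourOmniMini => Some(16384),
            // Custom models report a cap only if the user configured one.
            Self::Custom { max_output_tokens, .. } => *max_output_tokens,
        }
    }
}
```

Built-in variants always return `Some(..)`, while a `Custom` model returns whatever was configured, so callers must handle `None`.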

This gives users explicit control over the maximum length of OpenAI
model responses.

### Changes
- Added `max_output_tokens` to the `Custom` variant of the
`open_ai::Model` enum.
- Updated the `into_open_ai` method in `LanguageModelRequest` to accept
and use `max_output_tokens`.
- Modified the `OpenAiLanguageModel` and `CloudLanguageModel`
implementations to pass `max_output_tokens` when converting requests.
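- Ensured that the `max_output_tokens` field is correctly serialized and
deserialized in relevant structures.
- Changed `open_ai::Request::max_tokens` from `Option<usize>` to
`Option<u32>`, matching the type of `max_output_tokens`.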
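
The request plumbing described in the list above can be sketched as follows. `Request` here is a hypothetical two-field subset of `open_ai::Request`, and the free function `into_open_ai` only approximates the updated `LanguageModelRequest::into_open_ai`, whose full signature and message conversion are not shown on this commit page.

```rust
/// Hypothetical subset of `open_ai::Request`. In the real struct, `max_tokens`
/// carries `#[serde(default, skip_serializing_if = "Option::is_none")]`.
#[derive(Debug, PartialEq)]
pub struct Request {
    pub model: String,
    pub max_tokens: Option<u32>,
}

/// Hypothetical stand-in for `LanguageModelRequest::into_open_ai`: the caller
/// supplies the model id and its output cap, which becomes `max_tokens`.
pub fn into_open_ai(model: String, max_output_tokens: Option<u32>) -> Request {
    Request {
        model,
        // Forward the cap unchanged; `None` leaves the limit to the provider.
        max_tokens: max_output_tokens,
    }
}
```

A caller such as `OpenAiLanguageModel` would pass `model.max_output_tokens()` as the second argument. When that is `None`, the `skip_serializing_if` attribute on the real struct leaves `max_tokens` out of the JSON body, so the provider's default limit applies.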

### Related Issue
https://github.com/zed-industries/zed/pull/16358

### Screenshots / Media
N/A

### Checklist
- [x] Code compiles correctly.
- [x] All tests pass.
- [ ] Documentation has been updated accordingly.
- [ ] Additional tests have been added to cover new functionality.
- [ ] Relevant documentation has been updated or added.

### Release Notes

- Added `max_output_tokens` field to OpenAI models for controlling
output token length.
邻二氮杂菲 2024-08-21 12:39:10 +08:00 committed by GitHub
parent 36d51fe4a5
commit f1778dd9de
6 changed files with 46 additions and 15 deletions

```diff
@@ -66,7 +66,11 @@ pub enum Model {
     #[serde(rename = "gpt-4o-mini", alias = "gpt-4o-mini-2024-07-18")]
     FourOmniMini,
     #[serde(rename = "custom")]
-    Custom { name: String, max_tokens: usize },
+    Custom {
+        name: String,
+        max_tokens: usize,
+        max_output_tokens: Option<u32>,
+    },
 }
 
 impl Model {
@@ -113,6 +117,19 @@ impl Model {
             Self::Custom { max_tokens, .. } => *max_tokens,
         }
     }
+
+    pub fn max_output_tokens(&self) -> Option<u32> {
+        match self {
+            Self::ThreePointFiveTurbo => Some(4096),
+            Self::Four => Some(8192),
+            Self::FourTurbo => Some(4096),
+            Self::FourOmni => Some(4096),
+            Self::FourOmniMini => Some(16384),
+            Self::Custom {
+                max_output_tokens, ..
+            } => *max_output_tokens,
+        }
+    }
 }
 
 #[derive(Debug, Serialize, Deserialize)]
@@ -121,7 +138,7 @@ pub struct Request {
     pub messages: Vec<RequestMessage>,
     pub stream: bool,
     #[serde(default, skip_serializing_if = "Option::is_none")]
-    pub max_tokens: Option<usize>,
+    pub max_tokens: Option<u32>,
     pub stop: Vec<String>,
     pub temperature: f32,
     #[serde(default, skip_serializing_if = "Option::is_none")]
```
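
The diff narrows `Request::max_tokens` from `Option<usize>` to `Option<u32>`, while `Model::Custom`'s `max_tokens` stays a `usize`. Code that only has a `usize` count therefore needs an explicit conversion. A minimal sketch, with hypothetical helper names, that uses `u32::try_from` and saturates at `u32::MAX` instead of silently truncating with `as`:

```rust
use std::convert::TryFrom;

/// Hypothetical helper: narrow a `usize` token count to the `u32` used by
/// `Request::max_tokens`, saturating rather than wrapping on overflow.
pub fn to_u32_tokens(count: usize) -> u32 {
    u32::try_from(count).unwrap_or(u32::MAX)
}

/// Hypothetical helper for optional limits; `None` stays `None`.
pub fn to_request_max_tokens(limit: Option<usize>) -> Option<u32> {
    limit.map(to_u32_tokens)
}
```

Saturating is one reasonable choice; returning a `Result` and rejecting oversized values would be stricter.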