Don't auto-retry in certain circumstances (#35037)
Someone encountered this in production, which should not happen:

<img width="1266" height="623" alt="Screenshot 2025-07-24 at 10 38 40 AM" src="https://github.com/user-attachments/assets/40f3f977-5110-4808-a456-7e708d953b3b" />

This moves certain errors into the "never retry" category and reduces the number of retries for some others. It also adds diagnostic logging for the retry policy.

It's not a complete fix for the above. The underlying issue is that the server is sending an HTTP 403 response, and although we were already treating 403s as "do not retry", the retry logic still decided to retry with 2 attempts. Further debugging is needed to figure out why the error wasn't going down the 403 branch by the time the request got here.

Release Notes:

- N/A
parent fa788a39a4
commit 2a9355a3d2
1 changed file with 12 additions and 7 deletions
@@ -2037,6 +2037,12 @@ impl Thread {
     if let Some(retry_strategy) =
         Thread::get_retry_strategy(completion_error)
     {
+        log::info!(
+            "Retrying with {:?} for language model completion error {:?}",
+            retry_strategy,
+            completion_error
+        );
+
         retry_scheduled = thread
             .handle_retryable_error_with_delay(
                 &completion_error,
@@ -2246,15 +2252,14 @@ impl Thread {
         ..
     }
     | AuthenticationError { .. }
-    | PermissionError { .. } => None,
-    // These errors might be transient, so retry them
-    SerializeRequest { .. }
-    | BuildRequestBody { .. }
-    | PromptTooLarge { .. }
+    | PermissionError { .. }
+    | NoApiKey { .. }
     | ApiEndpointNotFound { .. }
-    | NoApiKey { .. } => Some(RetryStrategy::Fixed {
+    | PromptTooLarge { .. } => None,
+    // These errors might be transient, so retry them
+    SerializeRequest { .. } | BuildRequestBody { .. } => Some(RetryStrategy::Fixed {
         delay: BASE_RETRY_DELAY,
-        max_attempts: 2,
+        max_attempts: 1,
     }),
     // Retry all other 4xx and 5xx errors once.
     HttpResponseError { status_code, .. }
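
For readers who want to see the resulting policy in isolation, here is a minimal sketch of the classification after this change. It is an illustration under assumptions, not the actual code: `CompletionError` is a simplified stand-in that only models the variants touched by the diff (as unit variants), the 5-second value of `BASE_RETRY_DELAY` is made up because the diff does not show it, and `retry_log_line` is a hypothetical helper that mirrors the new `log::info!` message as a plain string. The `HttpResponseError` branch is left out because the diff cuts off before its body.

```rust
use std::time::Duration;

// Assumed value for illustration only; the real constant is not shown in the diff.
const BASE_RETRY_DELAY: Duration = Duration::from_secs(5);

#[derive(Debug, Clone, PartialEq)]
enum RetryStrategy {
    Fixed { delay: Duration, max_attempts: u8 },
}

// Simplified stand-in for the real error type (hypothetical: unit variants only).
#[derive(Debug)]
enum CompletionError {
    AuthenticationError,
    PermissionError,
    NoApiKey,
    ApiEndpointNotFound,
    PromptTooLarge,
    SerializeRequest,
    BuildRequestBody,
}

// Mirrors the post-change match arms: the first group is never retried,
// the second group is retried once after a fixed delay.
fn get_retry_strategy(error: &CompletionError) -> Option<RetryStrategy> {
    use CompletionError::*;
    match error {
        AuthenticationError
        | PermissionError
        | NoApiKey
        | ApiEndpointNotFound
        | PromptTooLarge => None,
        // These errors might be transient, so retry them
        SerializeRequest | BuildRequestBody => Some(RetryStrategy::Fixed {
            delay: BASE_RETRY_DELAY,
            max_attempts: 1,
        }),
    }
}

// Hypothetical helper: builds the same message the commit logs via `log::info!`,
// returning `None` when no retry is scheduled.
fn retry_log_line(error: &CompletionError) -> Option<String> {
    get_retry_strategy(error).map(|strategy| {
        format!(
            "Retrying with {:?} for language model completion error {:?}",
            strategy, error
        )
    })
}
```

With this sketch, `get_retry_strategy(&CompletionError::NoApiKey)` returns `None`, whereas before the change that variant shared the `Fixed` arm with `max_attempts: 2`. Note that this does not explain the 403 case described in the commit message; it only shows which variants moved between arms.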