Don't auto-retry in certain circumstances (#35037)

Someone encountered this in production, which should not happen:

<img width="1266" height="623" alt="Screenshot 2025-07-24 at 10 38
40 AM"
src="https://github.com/user-attachments/assets/40f3f977-5110-4808-a456-7e708d953b3b"
/>

This moves `NoApiKey`, `ApiEndpointNotFound`, and `PromptTooLarge` into
the "never retry" category and reduces `SerializeRequest` and
`BuildRequestBody` from 2 retry attempts to 1. It also adds diagnostic
logging of the chosen retry strategy.
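
As a rough illustration of that categorization, here is a minimal sketch using simplified stand-in types. The unit-like variants and the `BASE_RETRY_DELAY` value are assumptions made for the example and do not match the real definitions:

```rust
use std::time::Duration;

// Assumed value for illustration only; the real constant may differ.
const BASE_RETRY_DELAY: Duration = Duration::from_secs(5);

#[derive(Debug, Clone, PartialEq)]
enum RetryStrategy {
    Fixed { delay: Duration, max_attempts: u8 },
}

// Hypothetical, simplified error type: the real variants carry fields.
#[derive(Debug)]
enum CompletionError {
    AuthenticationError,
    PermissionError,
    NoApiKey,
    ApiEndpointNotFound,
    PromptTooLarge,
    SerializeRequest,
    BuildRequestBody,
}

fn retry_strategy(error: &CompletionError) -> Option<RetryStrategy> {
    use CompletionError::*;
    match error {
        // Retrying cannot fix these, so never retry.
        AuthenticationError
        | PermissionError
        | NoApiKey
        | ApiEndpointNotFound
        | PromptTooLarge => None,
        // These might be transient, so allow a single retry.
        SerializeRequest | BuildRequestBody => Some(RetryStrategy::Fixed {
            delay: BASE_RETRY_DELAY,
            max_attempts: 1,
        }),
    }
}
```

Returning `None` for the first group means the caller never schedules a retry for those errors.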

It's not a complete fix for the above, because the underlying issue is
that the server is sending an HTTP 403 response, and although we were
already treating 403s as "do not retry", it was deciding to retry with 2
attempts anyway. So further debugging is needed to figure out why it
wasn't going down the 403 branch by the time the request got here.
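
For reference while debugging, the intended status-code behavior can be pinned down in a small model. This is only a sketch: `Decision` and `decide_for_status` are hypothetical names, and the real match handles more cases than are shown here:

```rust
// Hypothetical, simplified model of the status-code branch.
#[derive(Debug, PartialEq)]
enum Decision {
    NoRetry,
    Retry { max_attempts: u8 },
}

fn decide_for_status(status_code: u16) -> Decision {
    match status_code {
        // Forbidden: retrying will not change the outcome.
        403 => Decision::NoRetry,
        // Simplification: every other 4xx and 5xx status is retried once.
        400..=599 => Decision::Retry { max_attempts: 1 },
        // Assumption for this sketch: non-error statuses are never retried.
        _ => Decision::NoRetry,
    }
}
```

Under this model `decide_for_status(403)` yields `Decision::NoRetry`. A check like this only helps if the error reaches the match as an HTTP response error with its status code intact, which is what still needs to be verified.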

Release Notes:

- N/A
Richard Feldman 2025-07-24 11:11:26 -04:00 committed by GitHub
parent fa788a39a4
commit 2a9355a3d2
GPG key ID: B5690EEEBB952194

@@ -2037,6 +2037,12 @@ impl Thread {
                 if let Some(retry_strategy) =
                     Thread::get_retry_strategy(completion_error)
                 {
+                    log::info!(
+                        "Retrying with {:?} for language model completion error {:?}",
+                        retry_strategy,
+                        completion_error
+                    );
                     retry_scheduled = thread
                         .handle_retryable_error_with_delay(
                             &completion_error,
@@ -2246,15 +2252,14 @@ impl Thread {
                 ..
             }
             | AuthenticationError { .. }
-            | PermissionError { .. } => None,
-            // These errors might be transient, so retry them
-            SerializeRequest { .. }
-            | BuildRequestBody { .. }
-            | PromptTooLarge { .. }
+            | PermissionError { .. }
+            | NoApiKey { .. }
             | ApiEndpointNotFound { .. }
-            | NoApiKey { .. } => Some(RetryStrategy::Fixed {
+            | PromptTooLarge { .. } => None,
+            // These errors might be transient, so retry them
+            SerializeRequest { .. } | BuildRequestBody { .. } => Some(RetryStrategy::Fixed {
                 delay: BASE_RETRY_DELAY,
-                max_attempts: 2,
+                max_attempts: 1,
             }),
             // Retry all other 4xx and 5xx errors once.
             HttpResponseError { status_code, .. }