agent: Fix issue with Anthropic thinking models (#33317)

cc @osyvokon

We were seeing a bunch of errors in our backend when people were using Claude models with thinking enabled.

In the logs we would see:

> an error occurred while interacting with the Anthropic API: invalid_request_error: messages.x.content.0.type: Expected `thinking` or `redacted_thinking`, but found `text`. When `thinking` is enabled, a final `assistant` message must start with a thinking block (preceeding the lastmost set of `tool_use` and `tool_result` blocks). We recommend you include thinking blocks from previous turns. To avoid this requirement, disable `thinking`. Please consult our documentation at https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking

However, this issue did not occur frequently and was not easily reproducible. It turns out it was triggered by us not correctly handling [Redacted Thinking Blocks](https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking#thinking-redaction).

I could consistently reproduce this issue by including this magic string: `ANTHROPIC_MAGIC_STRING_TRIGGER_REDACTED_THINKING_46C9A13E193C177646C7398A98432ECCCE4C1253D5E2D82641AC0E52CC2876CB` in the request, which forces `claude-3-7-sonnet` to emit redacted thinking blocks (confusingly, the magic string does not seem to work for `claude-sonnet-4`). As soon as we hit a tool call, Anthropic would return an error.

Thanks to @osyvokon for pointing me in the right direction 😄!

Release Notes:

- agent: Fixed an issue where Anthropic models would sometimes return an error when thinking was enabled
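To illustrate the failure mode described above, here is a minimal, self-contained sketch. It does not use the real Zed types: `StreamEvent`, `ContentBlock`, `collect_assistant_content`, and `starts_with_thinking` are hypothetical names, and the `data` field is assumed from the shape of Anthropic's `redacted_thinking` block. The idea is that a redacted thinking event has to be kept as a content block of the assistant turn; if it is dropped, the turn starts with `text` and the follow-up request after a tool call is rejected.

```rust
/// Hypothetical, simplified stand-in for streamed completion events.
#[derive(Debug, Clone)]
enum StreamEvent {
    Thinking { text: String },
    /// Opaque, encrypted reasoning; `data` is an assumed field name.
    RedactedThinking { data: String },
    Text(String),
    ToolUse { id: String, name: String },
    Stop,
}

/// Hypothetical content block stored on the assistant message and
/// sent back to the API on the next request.
#[derive(Debug, Clone, PartialEq)]
enum ContentBlock {
    Thinking { text: String },
    RedactedThinking { data: String },
    Text(String),
    ToolUse { id: String, name: String },
}

/// Collects the events of one assistant turn into content blocks,
/// preserving redacted thinking instead of silently discarding it.
fn collect_assistant_content(events: &[StreamEvent]) -> Vec<ContentBlock> {
    let mut blocks = Vec::new();
    for event in events {
        match event {
            StreamEvent::Thinking { text } => {
                blocks.push(ContentBlock::Thinking { text: text.clone() });
            }
            // The case that is easy to miss: keep the opaque payload as-is.
            StreamEvent::RedactedThinking { data } => {
                blocks.push(ContentBlock::RedactedThinking { data: data.clone() });
            }
            StreamEvent::Text(text) => blocks.push(ContentBlock::Text(text.clone())),
            StreamEvent::ToolUse { id, name } => blocks.push(ContentBlock::ToolUse {
                id: id.clone(),
                name: name.clone(),
            }),
            StreamEvent::Stop => break,
        }
    }
    blocks
}

/// Mirrors the API-side check quoted in the error message: with thinking
/// enabled, the assistant turn must begin with a thinking block.
fn starts_with_thinking(blocks: &[ContentBlock]) -> bool {
    matches!(
        blocks.first(),
        Some(ContentBlock::Thinking { .. } | ContentBlock::RedactedThinking { .. })
    )
}
```

With a turn of the form redacted thinking, text, tool use, the collected blocks pass `starts_with_thinking`; filtering out the redacted block makes the same check fail, which matches the error quoted above.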
2025-06-24 18:23:59 +02:00 · 7be57baef0
commit 7be57baef0
parent 39dc4b9040
7 changed files with 36 additions and 10 deletions
--- a/crates/eval/src/instance.rs
+++ b/crates/eval/src/instance.rs
@@ -1030,6 +1030,7 @@ pub fn response_events_to_markdown(
            Ok(LanguageModelCompletionEvent::Thinking { text, .. }) => {
                thinking_buffer.push_str(text);
            }
+            Ok(LanguageModelCompletionEvent::RedactedThinking { .. }) => {}
            Ok(LanguageModelCompletionEvent::Stop(reason)) => {
                flush_buffers(&mut response, &mut text_buffer, &mut thinking_buffer);
                response.push_str(&format!("**Stop**: {:?}\n\n", reason));
@@ -1126,6 +1127,7 @@ impl ThreadDialog {

                // Skip these
                Ok(LanguageModelCompletionEvent::UsageUpdate(_))
+                | Ok(LanguageModelCompletionEvent::RedactedThinking { .. })
                | Ok(LanguageModelCompletionEvent::StatusUpdate { .. })
                | Ok(LanguageModelCompletionEvent::StartMessage { .. })
                | Ok(LanguageModelCompletionEvent::Stop(_)) => {}