fix: resolve false-positive rate limit and one-message-behind in chat sessions

The Claude Code CLI v2.1.45+ emits a `rate_limit_event` message type that the Python SDK v0.1.19 cannot parse, raising MessageParseError. Two bugs resulted: 1. **False-positive rate limit**: check_rate_limit_error() matched "rate_limit" in the exception string "Unknown message type: rate_limit_event" via both an explicit type check and a regex fallback, triggering 15-19s backoff + query re-send on every session. 2. **One-message-behind**: The MessageParseError killed the receive_response() async generator, but the CLI subprocess was still alive with buffered response data. Catching and returning meant the response was never consumed. The next send_message() would read the previous response first, creating a one-behind offset. Changes: - chat_constants.py: check_rate_limit_error() now returns (False, None) for any MessageParseError, blocking both false-positive paths. Added safe_receive_response() helper that retries receive_response() on MessageParseError — the SDK's decoupled producer/consumer architecture (anyio memory channel) allows the new generator to continue reading remaining messages without data loss. Removed calculate_rate_limit_backoff re-export and MAX_CHAT_RATE_LIMIT_RETRIES constant. - spec_chat_session.py, assistant_chat_session.py, expand_chat_session.py: Replaced retry-with-backoff loops with safe_receive_response() wrapper. Removed asyncio.sleep backoff, query re-send, and rate_limited yield. Cleaned up unused imports (asyncio, calculate_rate_limit_backoff, MAX_CHAT_RATE_LIMIT_RETRIES). - agent.py: Added inner retry loop around receive_response() with same MessageParseError skip-and-restart pattern. Removed early-return that truncated responses. - types.ts: Removed SpecChatRateLimitedMessage, AssistantChatRateLimitedMessage, and their union entries. - useSpecChat.ts, useAssistantChat.ts, useExpandChat.ts: Removed dead 'rate_limited' case handlers. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 19:33:09 +00:00 · 2026-02-23 13:00:16 +02:00
parent 9af0f309b7
commit 4f102e7bc2
9 changed files with 258 additions and 363 deletions
--- a/server/services/chat_constants.py
+++ b/server/services/chat_constants.py
@@ -12,7 +12,7 @@ imports (``from .chat_constants import API_ENV_VARS``) continue to work.
 import logging
 import sys
 from pathlib import Path
-from typing import AsyncGenerator
+from typing import Any, AsyncGenerator

 # -------------------------------------------------------------------
 # Root directory of the autoforge project (repository root).
@@ -33,15 +33,10 @@ if _root_str not in sys.path:
 # imports continue to work unchanged.
 # -------------------------------------------------------------------
 from env_constants import API_ENV_VARS  # noqa: E402, F401
-from rate_limit_utils import calculate_rate_limit_backoff, is_rate_limit_error, parse_retry_after  # noqa: E402, F401
+from rate_limit_utils import is_rate_limit_error, parse_retry_after  # noqa: E402, F401

 logger = logging.getLogger(__name__)

-# -------------------------------------------------------------------
-# Rate-limit handling for chat sessions
-# -------------------------------------------------------------------
-MAX_CHAT_RATE_LIMIT_RETRIES = 3
-

 def check_rate_limit_error(exc: Exception) -> tuple[bool, int | None]:
    """Inspect an exception and determine if it represents a rate-limit.
@@ -49,24 +44,15 @@ def check_rate_limit_error(exc: Exception) -> tuple[bool, int | None]:
    Returns ``(is_rate_limit, retry_seconds)``.  ``retry_seconds`` is the
    parsed Retry-After value when available, otherwise ``None`` (caller
    should use exponential backoff).
-
-    Handles:
-    - ``MessageParseError`` whose raw *data* dict has
-      ``type == "rate_limit_event"`` (Claude CLI sends this).
-    - Any exception whose string representation matches known rate-limit
-      patterns (via ``rate_limit_utils.is_rate_limit_error``).
    """
+    # MessageParseError = unknown CLI message type (e.g. "rate_limit_event").
+    # These are informational events, NOT actual rate limit errors.
+    # The word "rate_limit" in the type name would false-positive the regex.
+    if type(exc).__name__ == "MessageParseError":
+        return False, None
+
+    # For all other exceptions: match error text against known rate-limit patterns
    exc_str = str(exc)
-
-    # Check for MessageParseError with a rate_limit_event payload
-    cls_name = type(exc).__name__
-    if cls_name == "MessageParseError":
-        raw_data = getattr(exc, "data", None)
-        if isinstance(raw_data, dict) and raw_data.get("type") == "rate_limit_event":
-            retry = parse_retry_after(str(raw_data)) if raw_data else None
-            return True, retry
-
-    # Fallback: match error text against known rate-limit patterns
    if is_rate_limit_error(exc_str):
        retry = parse_retry_after(exc_str)
        return True, retry
@@ -74,6 +60,34 @@ def check_rate_limit_error(exc: Exception) -> tuple[bool, int | None]:
    return False, None


+async def safe_receive_response(client: Any, log: logging.Logger) -> AsyncGenerator:
+    """Wrap ``client.receive_response()`` to skip ``MessageParseError``.
+
+    The Claude Code CLI may emit message types (e.g. ``rate_limit_event``)
+    that the installed Python SDK does not recognise, causing
+    ``MessageParseError`` which kills the async generator.  The CLI
+    subprocess is still alive and the SDK uses a buffered memory channel,
+    so we restart ``receive_response()`` to continue reading remaining
+    messages without losing data.
+    """
+    max_retries = 50
+    retries = 0
+    while True:
+        try:
+            async for msg in client.receive_response():
+                yield msg
+            return  # Normal completion
+        except Exception as exc:
+            if type(exc).__name__ == "MessageParseError":
+                retries += 1
+                if retries > max_retries:
+                    log.error(f"Too many unrecognized CLI messages ({retries}), stopping")
+                    return
+                log.warning(f"Ignoring unrecognized message from Claude CLI: {exc}")
+                continue
+            raise
+
+
 async def make_multimodal_message(content_blocks: list[dict]) -> AsyncGenerator[dict, None]:
    """Yield a single multimodal user message in Claude Agent SDK format.