Mirror of https://github.com/leonvanzyl/autocoder.git (synced 2026-03-17 02:43:09 +00:00)

Merge pull request #211 from AutoForgeAI/fix/rate-limit-event-crash

fix: handle rate_limit_event crash in chat sessions
@@ -55,10 +55,10 @@ Pull request(s): $ARGUMENTS
    - Reviewing large, unfocused PRs is impractical and error-prone; the review cannot provide adequate assurance for such changes
 
 6. **Vision Alignment Check**
-   - Read the project's README.md and CLAUDE.md to understand the application's core purpose
-   - Assess whether this PR aligns with the application's intended functionality
-   - If the changes deviate significantly from the core vision or add functionality that doesn't serve the application's purpose, note this in the review
-   - This is not a blocker, but should be flagged for the reviewer's consideration
+   - **VISION.md protection**: First, check whether the PR diff modifies `VISION.md` in any way (edits, deletions, renames). If it does, **stop the review immediately** — verdict is **DON'T MERGE**. VISION.md is immutable and no PR is permitted to alter it. Explain this to the user and skip all remaining steps.
+   - Read the project's `VISION.md`, `README.md`, and `CLAUDE.md` to understand the application's core purpose and mandatory architectural constraints
+   - Assess whether this PR aligns with the vision defined in `VISION.md`
+   - **Vision deviation is a merge blocker.** If the PR introduces functionality, integrations, or architectural changes that conflict with `VISION.md`, the verdict must be **DON'T MERGE**. This is not negotiable — the vision document takes precedence over any PR rationale.
 
 7. **Safety Assessment**
    - Provide a review on whether the PR is safe to merge as-is
.claude/launch.json (new file, 18 lines)
@@ -0,0 +1,18 @@
+{
+  "version": "0.0.1",
+  "configurations": [
+    {
+      "name": "backend",
+      "runtimeExecutable": "python",
+      "runtimeArgs": ["-m", "uvicorn", "server.main:app", "--host", "127.0.0.1", "--port", "8888", "--reload"],
+      "port": 8888
+    },
+    {
+      "name": "frontend",
+      "runtimeExecutable": "cmd",
+      "runtimeArgs": ["/c", "cd ui && npx vite"],
+      "port": 5173
+    }
+  ],
+  "autoVerify": true
+}
VISION.md (new file, 22 lines)
@@ -0,0 +1,22 @@
+# VISION
+
+This document defines the mandatory project vision for AutoForge. All contributions must align with these principles. PRs that deviate from this vision will be rejected. This file itself is immutable via PR — any PR that modifies VISION.md will be rejected outright.
+
+## Claude Agent SDK Exclusivity
+
+AutoForge is a wrapper around the **Claude Agent SDK**. This is a foundational architectural decision, not a preference.
+
+**What this means:**
+
+- AutoForge only supports providers, models, and integrations that work through the Claude Agent SDK.
+- We will not integrate with, accommodate, or add support for other AI SDKs, CLIs, or coding agent platforms (e.g., Codex, OpenCode, Aider, Continue, Cursor agents, or similar tools).
+
+**Why:**
+
+Each platform has its own approach to MCP tools, skills, context management, and feature integration. Attempting to support multiple agent frameworks creates an unsustainable maintenance burden and dilutes the quality of the core experience. By committing to the Claude Agent SDK exclusively, we can build deep, reliable integration rather than shallow compatibility across many targets.
+
+**In practice:**
+
+- PRs adding support for non-Claude agent frameworks will be rejected.
+- PRs introducing abstractions designed to make AutoForge "agent-agnostic" will be rejected.
+- Alternative API providers (e.g., Vertex AI, AWS Bedrock) are acceptable only when accessed through the Claude Agent SDK's own configuration.
@@ -1,6 +1,6 @@
 {
   "name": "autoforge-ai",
-  "version": "0.1.13",
+  "version": "0.1.14",
   "description": "Autonomous coding agent with web UI - build complete apps with AI",
   "license": "AGPL-3.0",
   "bin": {
@@ -1,6 +1,6 @@
 # Production runtime dependencies only
 # For development, use requirements.txt (includes ruff, mypy, pytest)
-claude-agent-sdk>=0.1.0,<0.2.0
+claude-agent-sdk>=0.1.39,<0.2.0
 python-dotenv>=1.0.0
 sqlalchemy>=2.0.0
 fastapi>=0.115.0
@@ -1,4 +1,4 @@
-claude-agent-sdk>=0.1.0,<0.2.0
+claude-agent-sdk>=0.1.39,<0.2.0
 python-dotenv>=1.0.0
 sqlalchemy>=2.0.0
 fastapi>=0.115.0
@@ -7,6 +7,7 @@ The assistant can answer questions about the codebase and features
 but cannot modify any files.
 """
 
+import asyncio
 import json
 import logging
 import os
@@ -25,7 +26,12 @@ from .assistant_database import (
     create_conversation,
     get_messages,
 )
-from .chat_constants import ROOT_DIR
+from .chat_constants import (
+    MAX_CHAT_RATE_LIMIT_RETRIES,
+    ROOT_DIR,
+    calculate_rate_limit_backoff,
+    check_rate_limit_error,
+)
 
 # Load environment variables from .env file if present
 load_dotenv()
@@ -393,7 +399,9 @@ class AssistantChatSession:
 
         full_response = ""
 
-        # Stream the response
+        # Stream the response (with rate-limit retry)
+        for _attempt in range(MAX_CHAT_RATE_LIMIT_RETRIES + 1):
+            try:
                 async for msg in self.client.receive_response():
                     msg_type = type(msg).__name__
@@ -426,6 +434,31 @@ class AssistantChatSession:
                             "tool": tool_name,
                             "input": tool_input,
                         }
+                # Completed successfully — break out of retry loop
+                break
+            except Exception as exc:
+                is_rate_limit, retry_secs = check_rate_limit_error(exc)
+                if is_rate_limit and _attempt < MAX_CHAT_RATE_LIMIT_RETRIES:
+                    delay = retry_secs if retry_secs else calculate_rate_limit_backoff(_attempt)
+                    logger.warning(f"Rate limited (attempt {_attempt + 1}/{MAX_CHAT_RATE_LIMIT_RETRIES}), retrying in {delay}s")
+                    yield {
+                        "type": "rate_limited",
+                        "retry_in": delay,
+                        "attempt": _attempt + 1,
+                        "max_attempts": MAX_CHAT_RATE_LIMIT_RETRIES,
+                    }
+                    await asyncio.sleep(delay)
+                    await self.client.query(message)
+                    continue
+                if is_rate_limit:
+                    logger.error("Rate limit retries exhausted for assistant chat")
+                    yield {"type": "error", "content": "Rate limited. Please try again later."}
+                    return
+                # Non-rate-limit MessageParseError: log and break (don't crash)
+                if type(exc).__name__ == "MessageParseError":
+                    logger.warning(f"Ignoring unrecognized message from Claude CLI: {exc}")
+                    break
+                raise
 
         # Store the complete response in the database
         if full_response and self.conversation_id:
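The retry pattern this hunk wraps around `receive_response()` can be reduced to a small standalone skeleton. This is an illustrative sketch, not the project's code: `RateLimited` stands in for the SDK parse error carrying a `rate_limit_event`, and the real backoff delay is replaced with `asyncio.sleep(0)` so the example runs instantly.

```python
import asyncio

MAX_RETRIES = 3

class RateLimited(Exception):
    """Stand-in for a parse error carrying a rate_limit_event payload."""

async def stream_with_retry(receive, resend):
    """Collect streamed events, retrying the stream when rate-limited."""
    events = []
    for attempt in range(MAX_RETRIES + 1):
        try:
            async for msg in receive():
                events.append(msg)
            break  # stream completed: leave the retry loop
        except RateLimited:
            if attempt < MAX_RETRIES:
                events.append({"type": "rate_limited", "attempt": attempt + 1})
                await asyncio.sleep(0)  # stand-in for the real backoff delay
                await resend()          # re-send the query before retrying
                continue
            events.append({"type": "error", "content": "Rate limited."})
    return events

calls = {"n": 0}

async def fake_receive():
    # First stream attempt fails with a rate limit; second succeeds.
    calls["n"] += 1
    if calls["n"] == 1:
        raise RateLimited()
    yield {"type": "text", "content": "ok"}

async def fake_resend() -> None:
    pass

events = asyncio.run(stream_with_retry(fake_receive, fake_resend))
# events: [{'type': 'rate_limited', 'attempt': 1}, {'type': 'text', 'content': 'ok'}]
```

The key property, mirrored from the diff: a rate-limit failure both surfaces a `rate_limited` event to the caller and transparently resumes the stream, instead of crashing the session.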
@@ -9,6 +9,7 @@ project root and is re-exported here for convenience so that existing
 imports (``from .chat_constants import API_ENV_VARS``) continue to work.
 """
 
+import logging
 import sys
 from pathlib import Path
 from typing import AsyncGenerator
@@ -32,6 +33,45 @@ if _root_str not in sys.path:
 # imports continue to work unchanged.
 # -------------------------------------------------------------------
 from env_constants import API_ENV_VARS  # noqa: E402, F401
+from rate_limit_utils import calculate_rate_limit_backoff, is_rate_limit_error, parse_retry_after  # noqa: E402, F401
+
+logger = logging.getLogger(__name__)
+
+# -------------------------------------------------------------------
+# Rate-limit handling for chat sessions
+# -------------------------------------------------------------------
+MAX_CHAT_RATE_LIMIT_RETRIES = 3
+
+
+def check_rate_limit_error(exc: Exception) -> tuple[bool, int | None]:
+    """Inspect an exception and determine if it represents a rate-limit.
+
+    Returns ``(is_rate_limit, retry_seconds)``. ``retry_seconds`` is the
+    parsed Retry-After value when available, otherwise ``None`` (caller
+    should use exponential backoff).
+
+    Handles:
+    - ``MessageParseError`` whose raw *data* dict has
+      ``type == "rate_limit_event"`` (Claude CLI sends this).
+    - Any exception whose string representation matches known rate-limit
+      patterns (via ``rate_limit_utils.is_rate_limit_error``).
+    """
+    exc_str = str(exc)
+
+    # Check for MessageParseError with a rate_limit_event payload
+    cls_name = type(exc).__name__
+    if cls_name == "MessageParseError":
+        raw_data = getattr(exc, "data", None)
+        if isinstance(raw_data, dict) and raw_data.get("type") == "rate_limit_event":
+            retry = parse_retry_after(str(raw_data)) if raw_data else None
+            return True, retry
+
+    # Fallback: match error text against known rate-limit patterns
+    if is_rate_limit_error(exc_str):
+        retry = parse_retry_after(exc_str)
+        return True, retry
+
+    return False, None
+
+
 async def make_multimodal_message(content_blocks: list[dict]) -> AsyncGenerator[dict, None]:
@@ -22,7 +22,13 @@ from claude_agent_sdk import ClaudeAgentOptions, ClaudeSDKClient
 from dotenv import load_dotenv
 
 from ..schemas import ImageAttachment
-from .chat_constants import ROOT_DIR, make_multimodal_message
+from .chat_constants import (
+    MAX_CHAT_RATE_LIMIT_RETRIES,
+    ROOT_DIR,
+    calculate_rate_limit_backoff,
+    check_rate_limit_error,
+    make_multimodal_message,
+)
 
 # Load environment variables from .env file if present
 load_dotenv()
@@ -298,7 +304,9 @@ class ExpandChatSession:
         else:
             await self.client.query(message)
 
-        # Stream the response
+        # Stream the response (with rate-limit retry)
+        for _attempt in range(MAX_CHAT_RATE_LIMIT_RETRIES + 1):
+            try:
                 async for msg in self.client.receive_response():
                     msg_type = type(msg).__name__
@@ -316,6 +324,47 @@ class ExpandChatSession:
                         "content": text,
                         "timestamp": datetime.now().isoformat()
                     })
+                # Completed successfully — break out of retry loop
+                break
+            except Exception as exc:
+                is_rate_limit, retry_secs = check_rate_limit_error(exc)
+                if is_rate_limit and _attempt < MAX_CHAT_RATE_LIMIT_RETRIES:
+                    delay = retry_secs if retry_secs else calculate_rate_limit_backoff(_attempt)
+                    logger.warning(f"Rate limited (attempt {_attempt + 1}/{MAX_CHAT_RATE_LIMIT_RETRIES}), retrying in {delay}s")
+                    yield {
+                        "type": "rate_limited",
+                        "retry_in": delay,
+                        "attempt": _attempt + 1,
+                        "max_attempts": MAX_CHAT_RATE_LIMIT_RETRIES,
+                    }
+                    await asyncio.sleep(delay)
+                    # Re-send the query before retrying receive_response
+                    if attachments and len(attachments) > 0:
+                        content_blocks_retry: list[dict[str, Any]] = []
+                        if message:
+                            content_blocks_retry.append({"type": "text", "text": message})
+                        for att in attachments:
+                            content_blocks_retry.append({
+                                "type": "image",
+                                "source": {
+                                    "type": "base64",
+                                    "media_type": att.mimeType,
+                                    "data": att.base64Data,
+                                }
+                            })
+                        await self.client.query(make_multimodal_message(content_blocks_retry))
+                    else:
+                        await self.client.query(message)
+                    continue
+                if is_rate_limit:
+                    logger.error("Rate limit retries exhausted for expand chat")
+                    yield {"type": "error", "content": "Rate limited. Please try again later."}
+                    return
+                # Non-rate-limit MessageParseError: log and break (don't crash)
+                if type(exc).__name__ == "MessageParseError":
+                    logger.warning(f"Ignoring unrecognized message from Claude CLI: {exc}")
+                    break
+                raise
 
     def get_features_created(self) -> int:
         """Get the total number of features created in this session."""
@@ -6,6 +6,7 @@ Manages interactive spec creation conversation with Claude.
 Uses the create-spec.md skill to guide users through app spec creation.
 """
 
+import asyncio
 import json
 import logging
 import os
@@ -19,7 +20,13 @@ from claude_agent_sdk import ClaudeAgentOptions, ClaudeSDKClient
 from dotenv import load_dotenv
 
 from ..schemas import ImageAttachment
-from .chat_constants import ROOT_DIR, make_multimodal_message
+from .chat_constants import (
+    MAX_CHAT_RATE_LIMIT_RETRIES,
+    ROOT_DIR,
+    calculate_rate_limit_backoff,
+    check_rate_limit_error,
+    make_multimodal_message,
+)
 
 # Load environment variables from .env file if present
 load_dotenv()
@@ -304,7 +311,9 @@ class SpecChatSession:
         # Store paths for the completion message
         spec_path = None
 
-        # Stream the response using receive_response
+        # Stream the response using receive_response (with rate-limit retry)
+        for _attempt in range(MAX_CHAT_RATE_LIMIT_RETRIES + 1):
+            try:
                 async for msg in self.client.receive_response():
                     msg_type = type(msg).__name__
@@ -415,6 +424,32 @@ class SpecChatSession:
                         "type": "spec_complete",
                         "path": str(spec_path)
                     }
+                # Completed successfully — break out of retry loop
+                break
+            except Exception as exc:
+                is_rate_limit, retry_secs = check_rate_limit_error(exc)
+                if is_rate_limit and _attempt < MAX_CHAT_RATE_LIMIT_RETRIES:
+                    delay = retry_secs if retry_secs else calculate_rate_limit_backoff(_attempt)
+                    logger.warning(f"Rate limited (attempt {_attempt + 1}/{MAX_CHAT_RATE_LIMIT_RETRIES}), retrying in {delay}s")
+                    yield {
+                        "type": "rate_limited",
+                        "retry_in": delay,
+                        "attempt": _attempt + 1,
+                        "max_attempts": MAX_CHAT_RATE_LIMIT_RETRIES,
+                    }
+                    await asyncio.sleep(delay)
+                    # Re-send the query before retrying receive_response
+                    await self.client.query(message)
+                    continue
+                if is_rate_limit:
+                    logger.error("Rate limit retries exhausted for spec chat")
+                    yield {"type": "error", "content": "Rate limited. Please try again later."}
+                    return
+                # Non-rate-limit MessageParseError: log and break (don't crash)
+                if type(exc).__name__ == "MessageParseError":
+                    logger.warning(f"Ignoring unrecognized message from Claude CLI: {exc}")
+                    break
+                raise
 
     def is_complete(self) -> bool:
         """Check if spec creation is complete."""
ui/package-lock.json (generated, 2 lines changed)
@@ -56,7 +56,7 @@
         },
         "..": {
             "name": "autoforge-ai",
-            "version": "0.1.13",
+            "version": "0.1.14",
             "license": "AGPL-3.0",
             "bin": {
                 "autoforge": "bin/autoforge.js"
@@ -269,6 +269,20 @@ export function useAssistantChat({
         break;
       }
 
+      case "rate_limited": {
+        // Show rate limit info as system message
+        setMessages((prev) => [
+          ...prev,
+          {
+            id: generateId(),
+            role: "system",
+            content: `Rate limited. Retrying in ${data.retry_in}s... (attempt ${data.attempt}/${data.max_attempts})`,
+            timestamp: new Date(),
+          },
+        ]);
+        break;
+      }
+
       case "pong": {
         // Keep-alive response, nothing to do
         break;
@@ -226,6 +226,20 @@ export function useExpandChat({
         break
       }
 
+      case 'rate_limited': {
+        // Show rate limit info as system message
+        setMessages((prev) => [
+          ...prev,
+          {
+            id: generateId(),
+            role: 'system',
+            content: `Rate limited. Retrying in ${data.retry_in}s... (attempt ${data.attempt}/${data.max_attempts})`,
+            timestamp: new Date(),
+          },
+        ])
+        break
+      }
+
       case 'pong': {
         // Keep-alive response, nothing to do
         break
@@ -322,6 +322,20 @@ export function useSpecChat({
         break
       }
 
+      case 'rate_limited': {
+        // Show rate limit info as system message
+        setMessages((prev) => [
+          ...prev,
+          {
+            id: generateId(),
+            role: 'system',
+            content: `Rate limited. Retrying in ${data.retry_in}s... (attempt ${data.attempt}/${data.max_attempts})`,
+            timestamp: new Date(),
+          },
+        ])
+        break
+      }
+
       case 'pong': {
         // Keep-alive response, nothing to do
         break
@@ -407,6 +407,13 @@ export interface SpecChatResponseDoneMessage {
   type: 'response_done'
 }
 
+export interface SpecChatRateLimitedMessage {
+  type: 'rate_limited'
+  retry_in: number
+  attempt: number
+  max_attempts: number
+}
+
 export type SpecChatServerMessage =
   | SpecChatTextMessage
   | SpecChatQuestionMessage
@@ -416,6 +423,7 @@ export type SpecChatServerMessage =
   | SpecChatErrorMessage
   | SpecChatPongMessage
   | SpecChatResponseDoneMessage
+  | SpecChatRateLimitedMessage
 
 // Image attachment for chat messages
 export interface ImageAttachment {
@@ -501,6 +509,13 @@ export interface AssistantChatPongMessage {
   type: 'pong'
 }
 
+export interface AssistantChatRateLimitedMessage {
+  type: 'rate_limited'
+  retry_in: number
+  attempt: number
+  max_attempts: number
+}
+
 export type AssistantChatServerMessage =
   | AssistantChatTextMessage
   | AssistantChatToolCallMessage
@@ -509,6 +524,7 @@ export type AssistantChatServerMessage =
   | AssistantChatErrorMessage
   | AssistantChatConversationCreatedMessage
   | AssistantChatPongMessage
+  | AssistantChatRateLimitedMessage
 
 // ============================================================================
 // Expand Chat Types
@@ -532,6 +548,7 @@ export type ExpandChatServerMessage =
   | SpecChatErrorMessage // Reuse error message type
   | SpecChatPongMessage // Reuse pong message type
   | SpecChatResponseDoneMessage // Reuse response_done type
+  | SpecChatRateLimitedMessage // Reuse rate_limited message type
 
 // Bulk feature creation
 export interface FeatureBulkCreate {
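The `rate_limited` shape is now shared across all three chat channels: the backend yields it, the TypeScript interfaces above type it, and the hooks render it. A Python sketch of the same contract using `TypedDict` (the class name and renderer function are illustrative):

```python
from typing import TypedDict

class RateLimitedMessage(TypedDict):
    type: str        # always "rate_limited"
    retry_in: float  # seconds until the backend retries
    attempt: int
    max_attempts: int

def render(msg: RateLimitedMessage) -> str:
    """Produce the same system-message text the UI hooks display."""
    return (
        f"Rate limited. Retrying in {msg['retry_in']}s... "
        f"(attempt {msg['attempt']}/{msg['max_attempts']})"
    )

msg: RateLimitedMessage = {"type": "rate_limited", "retry_in": 8, "attempt": 2, "max_attempts": 3}
# render(msg) → "Rate limited. Retrying in 8s... (attempt 2/3)"
```

Keeping the field names identical on both sides (`retry_in`, `attempt`, `max_attempts`) is what lets one interface be reused by the spec, assistant, and expand channels.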