refactor: optimize token usage, deduplicate code, fix bugs across agents

Token reduction (~40% per session, ~2.3M fewer tokens per 200-feature project):
- Agent-type-specific tool lists: coding 9, testing 5, init 5 (was 19 for all)
- Right-sized max_turns: coding 300, testing 100 (was 1000 for all)
- Trimmed coding prompt template (~150 lines removed)
- Streamlined testing prompt with batch support
- YOLO mode now strips browser testing instructions from prompt
- Added Grep, WebFetch, WebSearch to expand project session

Performance improvements:
- Rate limit retries start at ~15s with jitter (was fixed 60s)
- Post-spawn delay reduced to 0.5s (was 2s)
- Orchestrator consolidated to 1 DB query per loop (was 5-7)
- Testing agents batch 3 features per session (was 1)
- Smart context compaction preserves critical state, discards noise

Bug fixes:
- Removed ghost feature_release_testing MCP tool (wasted tokens every test session)
- Forward all 9 Vertex AI env vars to chat sessions (was missing 3)
- Fix DetachedInstanceError risk in test batch ORM access
- Prevent duplicate testing of same features in parallel mode

Code deduplication:
- _get_project_path(): 9 copies -> 1 shared utility (project_helpers.py)
- validate_project_name(): 9 copies -> 2 variants in 1 file (validation.py)
- ROOT_DIR: 10 copies -> 1 definition (chat_constants.py)
- API_ENV_VARS: 4 copies -> 1 source of truth (env_constants.py)

Security hardening:
- Unified sensitive directory blocklist (14 dirs, was two divergent lists)
- Cached get_blocked_paths() for O(1) directory listing checks
- Terminal security warning when ALLOW_REMOTE=1 exposes WebSocket
- 20 new security tests for EXTRA_READ_PATHS blocking
- Extracted _validate_command_list() and _validate_pkill_processes() helpers

Type safety:
- 87 mypy errors -> 0 across 58 source files
- Installed types-PyYAML for proper yaml stub types
- Fixed SQLAlchemy Column[T] coercions across all routers

Dead code removed:
- 13 files deleted (~2,679 lines): unused UI components, debug logs, outdated docs
- 7 unused npm packages removed (Radix UI components with 0 imports)
- AgentAvatar.tsx reduced from 615 -> 119 lines (SVGs extracted to mascotData.tsx)

New CLI options:
- --testing-batch-size (1-5) for parallel mode test batching
- --testing-feature-ids for direct multi-feature testing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 07:23:35 +00:00 · 2026-02-01 13:16:24 +02:00
parent dc5bcc4ae9
commit 94e0b05cb1
57 changed files with 1974 additions and 4300 deletions
--- a/rate_limit_utils.py
+++ b/rate_limit_utils.py
@@ -6,6 +6,7 @@ Shared utilities for detecting and handling API rate limits.
 Used by both agent.py (production) and test_rate_limit_utils.py (tests).
 """

+import random
 import re
 from typing import Optional

@@ -81,18 +82,25 @@ def is_rate_limit_error(error_message: str) -> bool:

 def calculate_rate_limit_backoff(retries: int) -> int:
    """
-    Calculate exponential backoff for rate limits.
+    Calculate exponential backoff with jitter for rate limits.

-    Formula: min(60 * 2^retries, 3600) - caps at 1 hour
-    Sequence: 60s, 120s, 240s, 480s, 960s, 1920s, 3600s...
+    Base formula: min(15 * 2^retries, 3600)
+    Jitter: adds 0-30% random jitter to prevent thundering herd.
+    Base sequence: ~15-20s, ~30-40s, ~60-78s, ~120-156s, ...
+
+    The lower starting delay (15s vs 60s) allows faster recovery from
+    transient rate limits, while jitter prevents synchronized retries
+    when multiple agents hit limits simultaneously.

    Args:
        retries: Number of consecutive rate limit retries (0-indexed)

    Returns:
-        Delay in seconds (clamped to 1-3600 range)
+        Delay in seconds (base clamped to 1-3600; jitter can add up to 30%)
    """
-    return int(min(max(60 * (2 ** retries), 1), 3600))
+    base = int(min(max(15 * (2 ** retries), 1), 3600))
+    jitter = random.uniform(0, base * 0.3)
+    return int(base + jitter)


 def calculate_error_backoff(retries: int) -> int:
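
Review note on the hunk above: the base delay is clamped to 3600s before jitter is added, so the returned value can exceed the cap by up to 30%. The following is a minimal standalone sketch of the same backoff scheme, not the project's code. The function name `backoff_with_jitter` and the parameters `base_delay`, `cap`, `jitter_fraction`, and `clamp_after_jitter` are hypothetical and introduced only for illustration. The optional flag shows one possible way to keep the final delay within the cap, assuming that is the desired behavior.

```python
import random


def backoff_with_jitter(
    retries: int,
    base_delay: int = 15,
    cap: int = 3600,
    jitter_fraction: float = 0.3,
    clamp_after_jitter: bool = False,
) -> int:
    """Exponential backoff with additive random jitter (illustrative sketch)."""
    # The base delay doubles with each retry and is clamped to [1, cap].
    base = int(min(max(base_delay * (2 ** retries), 1), cap))
    # Add between 0% and jitter_fraction of the base as random jitter.
    delay = base + random.uniform(0, base * jitter_fraction)
    if clamp_after_jitter:
        # Hypothetical variant: also keep the final delay within the cap.
        delay = min(delay, cap)
    return int(delay)


if __name__ == "__main__":
    # Print a few sample delays; values vary between runs because of jitter.
    for retries in range(10):
        print(
            retries,
            backoff_with_jitter(retries),
            backoff_with_jitter(retries, clamp_after_jitter=True),
        )
```

Clamping after jitter trades some desynchronization at the cap for a hard upper bound: once the base reaches 3600s, every agent would wait exactly 3600s. Whether that matters depends on how many agents are expected to reach the cap at the same time.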