- Enhance E2E testing and LLM analysis reporting:
  - Add --analyze-log flag to run_e2e.sh to re-run LLM analysis on existing logs.
  - Add test:e2e and analyze-log scripts to package.json for easier execution.
- Correct display errors and dependency validation output:
  - Update chalk usage in add-task.js to use bracket notation (chalk[color]) compatible with v5, resolving the 'chalk.keyword is not a function' error.
  - Modify fix-dependencies command output to show a red failure box with the issue count instead of a green success box when validation fails.
- Refactor interactive model setup:
  - Verify inclusion of the 'No change' option during the interactive model setup flow (task-master models --setup).
- Update model definitions:
  - Add the max_tokens field for gpt-4o in supported-models.json.
- Remove unused scripts:
  - Delete prepare-package.js and rule-transformer.test.js.

Release candidate
#!/bin/bash

# --- LLM Analysis Helper Function ---
# This function should be sourced by the main E2E script or test scripts.
# It requires curl and jq to be installed.
# It expects the log file path as the first argument and the project root path as the second argument.
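#
# Example usage (a minimal, illustrative sketch; the variable names and the
# source path below are hypothetical, not taken verbatim from run_e2e.sh):
#
#   source "$PROJECT_ROOT/tests/e2e/e2e_helpers.sh"
#   analyze_log_with_llm "$E2E_LOG_FILE" "$PROJECT_ROOT"
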
analyze_log_with_llm() {
  local log_file="$1"
  local project_root="$2" # Expect project root as the second argument

  if [ -z "$project_root" ]; then
    echo "[HELPER_ERROR] Project root argument is missing. Skipping LLM analysis." >&2
    return 1
  fi

  local env_file="${project_root}/.env" # Path to .env in project root

  local provider_summary_log="provider_add_task_summary.log" # File summarizing provider test outcomes
  local api_key=""
  # !!! IMPORTANT: Replace with your actual Claude API endpoint if different !!!
  local api_endpoint="https://api.anthropic.com/v1/messages"
  # !!! IMPORTANT: Ensure this matches the variable name in your .env file !!!
  local api_key_name="ANTHROPIC_API_KEY"

  echo "" # Add a newline before analysis starts

  # Check for jq and curl
  if ! command -v jq &> /dev/null; then
    echo "[HELPER_ERROR] LLM Analysis requires 'jq'. Skipping analysis." >&2
    return 1
  fi
  if ! command -v curl &> /dev/null; then
    echo "[HELPER_ERROR] LLM Analysis requires 'curl'. Skipping analysis." >&2
    return 1
  fi

  # Check for API Key in the PROJECT ROOT's .env file
  if [ -f "$env_file" ]; then
    # Read the key value from the project root .env file
    api_key=$(grep "^${api_key_name}=" "$env_file" | sed -e "s/^${api_key_name}=//" -e 's/^[[:space:]"]*//' -e 's/[[:space:]"]*$//')
  fi
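  # Illustrative note: the grep/sed pipeline above tolerates surrounding
  # whitespace and double quotes, i.e. it expects a line in .env such as
  #   ANTHROPIC_API_KEY="<your-key-here>"
  # where the quotes are optional.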
if [ -z "$api_key" ]; then
|
|
echo "[HELPER_ERROR] ${api_key_name} not found or empty in project root .env file ($env_file). Skipping LLM analysis." >&2 # Updated error message
|
|
return 1
|
|
fi
|
|
|
|

  # The log file path is passed as an argument; it must be absolute or relative
  # to the caller's working directory. The calling script passes a path relative
  # to the original PWD.
  if [ ! -f "$log_file" ]; then
    echo "[HELPER_ERROR] Log file not found: $log_file (PWD: $(pwd)). Check path passed to function. Skipping LLM analysis." >&2
    return 1
  fi

  local log_content
  # Read entire file, handle potential errors
  log_content=$(cat "$log_file") || {
    echo "[HELPER_ERROR] Failed to read log file: $log_file. Skipping LLM analysis." >&2
    return 1
  }

  # Prepare the prompt using a quoted heredoc for literal interpretation
  read -r -d '' prompt_template <<'EOF'
Analyze the following E2E test log for the task-master tool. The log contains output from various 'task-master' commands executed sequentially.

Your goal is to:
1. Verify if the key E2E steps completed successfully based on the log messages (e.g., init, parse PRD, list tasks, analyze complexity, expand task, set status, manage models, add/remove dependencies, add/update/remove tasks/subtasks, generate files).
2. **Specifically analyze the Multi-Provider Add-Task Test Sequence:**
   a. Identify which providers were tested for `add-task`. Look for log steps like "Testing Add-Task with Provider: ..." and the summary log 'provider_add_task_summary.log'.
   b. For each tested provider, determine if `add-task` succeeded or failed. Note the created task ID if successful.
   c. Review the corresponding `add_task_show_output_<provider>_id_<id>.log` file (if created) for each successful `add-task` execution.
   d. **Compare the quality and completeness** of the task generated by each successful provider based on their `show` output. Assign a score (e.g., 1-10, 10 being best) based on relevance to the prompt, detail level, and correctness.
   e. Note any providers where `add-task` failed or where the task ID could not be extracted.
3. Identify any general explicit "[ERROR]" messages or stack traces throughout the *entire* log.
4. Identify any potential warnings or unusual output that might indicate a problem even if not marked as an explicit error.
5. Provide an overall assessment of the test run's health based *only* on the log content.

Return your analysis **strictly** in the following JSON format. Do not include any text outside of the JSON structure:

{
  "overall_status": "Success|Failure|Warning",
  "verified_steps": [ "Initialization", "PRD Parsing", /* ...other general steps observed... */ ],
  "provider_add_task_comparison": {
    "prompt_used": "... (extract from log if possible or state 'standard auth prompt') ...",
    "provider_results": {
      "anthropic": { "status": "Success|Failure|ID_Extraction_Failed|Set_Model_Failed", "task_id": "...", "score": "X/10 | N/A", "notes": "..." },
      "openai": { "status": "Success|Failure|...", "task_id": "...", "score": "X/10 | N/A", "notes": "..." },
      /* ... include all tested providers ... */
    },
    "comparison_summary": "Brief overall comparison of generated tasks..."
  },
  "detected_issues": [ { "severity": "Error|Warning|Anomaly", "description": "...", "log_context": "[Optional, short snippet from log near the issue]" } ],
  "llm_summary_points": [ "Overall summary point 1", "Provider comparison highlight", "Any major issues noted" ]
}

Here is the main log content:

%s
EOF
  # Note: The final %s is a placeholder for printf later

  local full_prompt
  # Use printf to substitute the log content into the %s placeholder
  if ! printf -v full_prompt "$prompt_template" "$log_content"; then
    echo "[HELPER_ERROR] Failed to format prompt using printf." >&2
    # It's unlikely printf itself fails, but good practice
    return 1
  fi
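  # Because printf treats the template as a format string, the prompt text must
  # not contain any % directives other than the single %s placeholder substituted above.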

  # Construct the JSON payload for Claude Messages API
  local payload
  payload=$(jq -n --arg prompt "$full_prompt" '{
    "model": "claude-3-haiku-20240307", # Using Haiku for faster/cheaper testing
    "max_tokens": 3072, # Increased slightly
    "messages": [
      {"role": "user", "content": $prompt}
    ]
    # "temperature": 0.0 # Optional: Lower temperature for more deterministic JSON output
  }') || {
    echo "[HELPER_ERROR] Failed to create JSON payload using jq." >&2
    return 1
  }
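  # The resulting payload has this shape (illustrative, prompt elided):
  #   {"model":"claude-3-haiku-20240307","max_tokens":3072,"messages":[{"role":"user","content":"..."}]}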

  local response_raw response_http_code response_body
  # Capture body and HTTP status code separately
  response_raw=$(curl -s -w "\nHTTP_STATUS_CODE:%{http_code}" -X POST "$api_endpoint" \
    -H "Content-Type: application/json" \
    -H "x-api-key: $api_key" \
    -H "anthropic-version: 2023-06-01" \
    --data "$payload")

  # Extract status code and body
  response_http_code=$(echo "$response_raw" | grep '^HTTP_STATUS_CODE:' | sed 's/HTTP_STATUS_CODE://')
  response_body=$(echo "$response_raw" | sed '$d') # Remove last line (status code)
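  # Thanks to the -w format above, "$response_raw" is the JSON body followed by a
  # final line such as "HTTP_STATUS_CODE:200" (value illustrative), which is what
  # the grep/sed pair above splits apart.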
if [ "$response_http_code" != "200" ]; then
|
|
echo "[HELPER_ERROR] LLM API call failed with HTTP status $response_http_code." >&2
|
|
echo "[HELPER_ERROR] Response Body: $response_body" >&2
|
|
return 1
|
|
fi
|
|
|
|
if [ -z "$response_body" ]; then
|
|
echo "[HELPER_ERROR] LLM API call returned empty response body." >&2
|
|
return 1
|
|
fi
|
|
|
|

  # Pipe the raw response body directly to the Node.js parser script
  if echo "$response_body" | node "${project_root}/tests/e2e/parse_llm_output.cjs" "$log_file"; then
    echo "[HELPER_SUCCESS] LLM analysis parsed and printed successfully by Node.js script."
    return 0 # Success
  else
    local node_exit_code=$?
    echo "[HELPER_ERROR] Node.js parsing script failed with exit code ${node_exit_code}." >&2
    echo "[HELPER_ERROR] Raw API response body (first 500 chars): $(echo "$response_body" | head -c 500)" >&2
    return 1 # Failure
  fi
}

# Export the function so it might be available to subshells if sourced
export -f analyze_log_with_llm