Files
n8n-mcp/TELEMETRY_ANALYSIS_INDEX.md
czlonkowski 60ab66d64d feat: telemetry-driven quick wins to reduce AI agent validation errors by 30-40%
Enhanced tools documentation, duplicate ID errors, and AI Agent validator based on telemetry analysis of 593 validation errors across 3 categories:
- 378 errors: Duplicate node IDs (64%)
- 179 errors: AI Agent configuration (30%)
- 36 errors: Other validations (6%)

Quick Win #1: Enhanced tools documentation (src/mcp/tools-documentation.ts)
- Added prominent warnings to call get_node_essentials() FIRST before configuring nodes
- Emphasized 5KB vs 100KB+ size difference between essentials and full info
- Updated workflow patterns to prioritize essentials over get_node_info

Quick Win #2: Improved duplicate ID error messages (src/services/workflow-validator.ts)
- Added crypto import for UUID generation examples
- Enhanced error messages with node indices, names, and types
- Included crypto.randomUUID() example in error messages
- Helps AI agents understand EXACTLY which nodes conflict and how to fix

Quick Win #3: Added AI Agent node-specific validator (src/services/node-specific-validators.ts)
- Validates prompt configuration (promptType + text requirement)
- Checks maxIterations bounds (1-50 recommended)
- Suggests error handling (onError + retryOnFail)
- Warns about high iteration limits (cost/performance impact)
- Integrated into enhanced-config-validator.ts

Test Coverage:
- Added duplicate ID validation tests (workflow-validator.test.ts)
- Added AI Agent validator tests (node-specific-validators.test.ts:2312-2491)
- All new tests passing (3527 total passing)

Version: 2.22.12 → 2.22.13

Expected Impact: 30-40% reduction in AI agent validation errors

Technical Details:
- Telemetry analysis: 593 validation errors (Dec 2024 - Jan 2025)
- 100% error recovery rate maintained (validation working correctly)
- Root cause: Documentation/guidance gaps, not validation logic failures
- Solution: Proactive guidance at decision points

References:
- Telemetry analysis findings
- Issue #392 (helpful error messages pattern)
- Existing Slack validator pattern (node-specific-validators.ts:98-230)

Concieved by Romuald Członkowski - www.aiadvisors.pl/en
2025-11-08 18:07:26 +01:00

15 KiB

n8n-MCP Telemetry Analysis - Complete Index

Navigation Guide for All Analysis Documents

Analysis Period: August 10 - November 8, 2025 (90 days) Report Date: November 8, 2025 Data Quality: High (506K+ events, 36/90 days with errors) Status: Critical Issues Identified - Action Required


Document Overview

This telemetry analysis consists of 5 comprehensive documents designed for different audiences and use cases.

Document Map

┌─────────────────────────────────────────────────────────────┐
│         TELEMETRY ANALYSIS COMPLETE PACKAGE                │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. EXECUTIVE SUMMARY (this file + next level up)          │
│     ↓ Start here for quick overview                        │
│     └─→ TELEMETRY_EXECUTIVE_SUMMARY.md                     │
│         • For: Decision makers, leadership                 │
│         • Length: 5-10 minutes read                        │
│         • Contains: Key stats, risks, ROI                  │
│                                                             │
│  2. MAIN ANALYSIS REPORT                                   │
│     ↓ For comprehensive understanding                      │
│     └─→ TELEMETRY_ANALYSIS_REPORT.md                       │
│         • For: Product, engineering teams                  │
│         • Length: 30-45 minutes read                       │
│         • Contains: Detailed findings, patterns, trends    │
│                                                             │
│  3. TECHNICAL DEEP-DIVE                                    │
│     ↓ For root cause investigation                         │
│     └─→ TELEMETRY_TECHNICAL_DEEP_DIVE.md                   │
│         • For: Engineering team, architects                │
│         • Length: 45-60 minutes read                       │
│         • Contains: Root causes, hypotheses, gaps          │
│                                                             │
│  4. IMPLEMENTATION ROADMAP                                 │
│     ↓ For actionable next steps                            │
│     └─→ IMPLEMENTATION_ROADMAP.md                          │
│         • For: Engineering leads, project managers         │
│         • Length: 20-30 minutes read                       │
│         • Contains: Detailed implementation steps          │
│                                                             │
│  5. VISUALIZATION DATA                                     │
│     ↓ For presentations and dashboards                     │
│     └─→ TELEMETRY_DATA_FOR_VISUALIZATION.md                │
│         • For: All audiences (chart data)                  │
│         • Length: Reference material                       │
│         • Contains: Charts, graphs, metrics data           │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Quick Navigation

By Role

Executive Leadership / C-Level

Time Available: 5-10 minutes Priority: Understanding business impact

  1. Start: TELEMETRY_EXECUTIVE_SUMMARY.md
  2. Focus: Risk assessment, ROI, timeline
  3. Reference: Key Statistics (below)

Product Management

Time Available: 30 minutes Priority: User impact, feature decisions

  1. Start: TELEMETRY_ANALYSIS_REPORT.md (Section 1-3)
  2. Then: TELEMETRY_TECHNICAL_DEEP_DIVE.md (Section 1-2)
  3. Reference: TELEMETRY_DATA_FOR_VISUALIZATION.md (charts)

Engineering / DevOps

Time Available: 1-2 hours Priority: Root causes, implementation details

  1. Start: TELEMETRY_TECHNICAL_DEEP_DIVE.md
  2. Then: IMPLEMENTATION_ROADMAP.md
  3. Reference: TELEMETRY_ANALYSIS_REPORT.md (for metrics)

Engineering Leads / Architects

Time Available: 2-3 hours Priority: System design, priority decisions

  1. Start: TELEMETRY_ANALYSIS_REPORT.md (all sections)
  2. Then: TELEMETRY_TECHNICAL_DEEP_DIVE.md (all sections)
  3. Then: IMPLEMENTATION_ROADMAP.md
  4. Reference: Visualization data for presentations

Customer Support / Success

Time Available: 20 minutes Priority: Common issues, user guidance

  1. Start: TELEMETRY_EXECUTIVE_SUMMARY.md (Top 5 Issues section)
  2. Then: TELEMETRY_ANALYSIS_REPORT.md (Section 6: Search Queries)
  3. Reference: Top error messages list (below)

Marketing / Communications

Time Available: 15 minutes Priority: Messaging, external communications

  1. Start: TELEMETRY_EXECUTIVE_SUMMARY.md
  2. Focus: Business impact statement
  3. Key message: "We're fixing critical issues this week"

Key Statistics Summary

Error Metrics

Metric Value Status
Total Errors (90 days) 8,859 Baseline
Daily Average 60.68 Stable
Peak Day 276 (Oct 30) Outlier
ValidationError 3,080 (34.77%) Largest
TypeError 2,767 (31.23%) Second

Tool Performance

Metric Value Status
Critical Tool: get_node_info 11.72% failure Action Required
Average Success Rate 98.4% Good
Highest Risk Tools 5.5-6.4% failure Monitor

Performance

Metric Value Status
Sequential Updates Latency 55.2 seconds Bottleneck
Read-After-Write Latency 96.6 seconds Bottleneck
Search Retry Rate 17% High

User Engagement

Metric Value Status
Daily Sessions 895 avg Healthy
Daily Users 572 avg Healthy
Sessions per User 1.52 avg Good

Top 5 Critical Issues

1. Workflow-Level Validation Failures (39% of errors)

  • File: TELEMETRY_ANALYSIS_REPORT.md, Section 2.1
  • Detail: TELEMETRY_TECHNICAL_DEEP_DIVE.md, Section 1.1
  • Fix: IMPLEMENTATION_ROADMAP.md, Section Phase 1, Issue 1.2

2. get_node_info Unreliability (11.72% failure)

  • File: TELEMETRY_ANALYSIS_REPORT.md, Section 3.2
  • Detail: TELEMETRY_TECHNICAL_DEEP_DIVE.md, Section 4.1
  • Fix: IMPLEMENTATION_ROADMAP.md, Section Phase 1, Issue 1.1

3. Slow Sequential Updates (55+ seconds)

  • File: TELEMETRY_ANALYSIS_REPORT.md, Section 4.1
  • Detail: TELEMETRY_TECHNICAL_DEEP_DIVE.md, Section 6.1
  • Fix: IMPLEMENTATION_ROADMAP.md, Section Phase 1, Issue 1.3

4. Search Inefficiency (17% retry rate)

  • File: TELEMETRY_ANALYSIS_REPORT.md, Section 6.1
  • Detail: TELEMETRY_TECHNICAL_DEEP_DIVE.md, Section 6.3
  • Fix: IMPLEMENTATION_ROADMAP.md, Section Phase 2, Issue 2.2
  • File: TELEMETRY_ANALYSIS_REPORT.md, Section 1.2
  • Detail: TELEMETRY_TECHNICAL_DEEP_DIVE.md, Section 2
  • Fix: IMPLEMENTATION_ROADMAP.md, Section Phase 2, Issue 2.3

Implementation Timeline

Week 1 (Immediate)

Expected Impact: 40-50% error reduction

  1. Fix get_node_info reliability

    • File: IMPLEMENTATION_ROADMAP.md, Phase 1, Issue 1.1
    • Effort: 1 day
  2. Improve validation error messages

    • File: IMPLEMENTATION_ROADMAP.md, Phase 1, Issue 1.2
    • Effort: 2 days
  3. Add batch workflow update operation

    • File: IMPLEMENTATION_ROADMAP.md, Phase 1, Issue 1.3
    • Effort: 2-3 days

Week 2-3 (High Priority)

Expected Impact: +30% additional improvement

  1. Implement validation caching

    • File: IMPLEMENTATION_ROADMAP.md, Phase 2, Issue 2.1
    • Effort: 1-2 days
  2. Improve search ranking

    • File: IMPLEMENTATION_ROADMAP.md, Phase 2, Issue 2.2
    • Effort: 2 days
  3. Add TypeScript types for top nodes

    • File: IMPLEMENTATION_ROADMAP.md, Phase 2, Issue 2.3
    • Effort: 3 days

Week 4 (Optimization)

Expected Impact: +10% additional improvement

  1. Return updated state in responses

    • File: IMPLEMENTATION_ROADMAP.md, Phase 3, Issue 3.1
    • Effort: 1-2 days
  2. Add workflow diff generation

    • File: IMPLEMENTATION_ROADMAP.md, Phase 3, Issue 3.2
    • Effort: 1-2 days

Key Findings by Category

Validation Issues

  • Most common error category (96.6% of all errors)
  • Workflow-level validation: 39.11% of validation errors
  • Generic error messages prevent self-resolution
  • See: TELEMETRY_ANALYSIS_REPORT.md, Section 2

Tool Reliability Issues

  • get_node_info critical (11.72% failure rate)
  • Information retrieval tools less reliable than state management tools
  • Validation tools consistently underperform (5.5-6.4% failure)
  • See: TELEMETRY_ANALYSIS_REPORT.md, Section 3 & TECHNICAL_DEEP_DIVE.md, Section 4

Performance Bottlenecks

  • Sequential operations extremely slow (55+ seconds)
  • Read-after-write pattern inefficient (96.6 seconds)
  • Search refinement rate high (17% need multiple searches)
  • See: TELEMETRY_ANALYSIS_REPORT.md, Section 4 & TECHNICAL_DEEP_DIVE.md, Section 6

User Behavior

  • Top searches: test (5.8K), webhook (5.1K), http (4.2K)
  • Most searches indicate where users struggle
  • Session metrics show healthy engagement
  • See: TELEMETRY_ANALYSIS_REPORT.md, Section 6

Temporal Patterns

  • Error rate volatile with significant spikes
  • October incident period with slow recovery
  • Currently stabilizing at 60-65 errors/day baseline
  • See: TELEMETRY_ANALYSIS_REPORT.md, Section 9 & TECHNICAL_DEEP_DIVE.md, Section 5

Metrics to Track Post-Implementation

Primary Success Metrics

  1. get_node_info failure rate: 11.72% → <1%
  2. Validation error clarity: Generic → Specific (95% have guidance)
  3. Update latency: 55.2s → <5s
  4. Overall error count: 8,859 → <2,000 per quarter

Secondary Metrics

  1. Tool success rates across board: >99%
  2. Search retry rate: 17% → <5%
  3. Workflow validation time: <2 seconds
  4. User satisfaction: +50% improvement

Dashboard Recommendations

  • See: TELEMETRY_DATA_FOR_VISUALIZATION.md, Section 14
  • Create live dashboard in Grafana/Datadog
  • Update daily; review weekly

SQL Queries Reference

All analysis derived from these core queries:

Error Analysis

-- Error type distribution
SELECT error_type, SUM(error_count) as total_occurrences
FROM telemetry_errors_daily
WHERE date >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY error_type ORDER BY total_occurrences DESC;

-- Temporal trends
SELECT date, SUM(error_count) as daily_errors
FROM telemetry_errors_daily
WHERE date >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY date ORDER BY date DESC;

Tool Performance

-- Tool success rates
SELECT tool_name, SUM(usage_count), SUM(success_count),
  ROUND(100.0 * SUM(success_count) / SUM(usage_count), 2) as success_rate
FROM telemetry_tool_usage_daily
WHERE date >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY tool_name
ORDER BY success_rate ASC;

Validation Errors

-- Validation errors by node type
SELECT node_type, error_type, SUM(error_count) as total
FROM telemetry_validation_errors_daily
WHERE date >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY node_type, error_type
ORDER BY total DESC;

Complete query library in: TELEMETRY_ANALYSIS_REPORT.md, Section 12


FAQ

Q: Which document should I read first?

A: TELEMETRY_EXECUTIVE_SUMMARY.md (5 min) to understand the situation

Q: What's the most critical issue?

A: Workflow-level validation failures (39% of errors) with generic error messages that prevent users from self-fixing

Q: How long will fixes take?

A: Week 1: 40-50% improvement; Full implementation: 4-5 weeks

Q: What's the ROI?

A: ~26x return in first year; payback in <2 weeks

Q: Should we implement all recommendations?

A: Phase 1 (Week 1) is mandatory; Phase 2-3 are high-value optimization

Q: How confident are these findings?

A: Very high; based on 506K events across 90 days with consistent patterns

Q: What should support/success team do?

A: Review Section 6 of ANALYSIS_REPORT.md for top user pain points and search patterns


Additional Resources

For Presentations

  • Use TELEMETRY_DATA_FOR_VISUALIZATION.md for all chart/graph data
  • Recommend audience: TELEMETRY_EXECUTIVE_SUMMARY.md, Section "Stakeholder Questions & Answers"

For Team Meetings

  • Stand-up briefing: Key Statistics Summary (above)
  • Engineering sync: IMPLEMENTATION_ROADMAP.md
  • Product review: TELEMETRY_ANALYSIS_REPORT.md, Sections 1-3

For Documentation

  • User-facing docs: TELEMETRY_ANALYSIS_REPORT.md, Section 6 (search queries reveal documentation gaps)
  • Error code docs: IMPLEMENTATION_ROADMAP.md, Phase 4

For Monitoring

  • KPI dashboard: TELEMETRY_DATA_FOR_VISUALIZATION.md, Section 14
  • Alert thresholds: IMPLEMENTATION_ROADMAP.md, success metrics

Contact & Questions

Analysis Prepared By: AI Telemetry Analyst Date: November 8, 2025 Data Freshness: Last updated October 31, 2025 (daily updates) Review Frequency: Weekly recommended

For questions about specific findings, refer to:

  • Executive level: TELEMETRY_EXECUTIVE_SUMMARY.md
  • Technical details: TELEMETRY_TECHNICAL_DEEP_DIVE.md
  • Implementation: IMPLEMENTATION_ROADMAP.md

Document Checklist

Use this checklist to ensure you've reviewed appropriate documents:

Essential Reading (Everyone)

  • TELEMETRY_EXECUTIVE_SUMMARY.md (5-10 min)
  • Top 5 Issues section above (5 min)

Role-Specific

  • Leadership: TELEMETRY_EXECUTIVE_SUMMARY.md (Risk & ROI sections)
  • Engineering: TELEMETRY_TECHNICAL_DEEP_DIVE.md (all sections)
  • Product: TELEMETRY_ANALYSIS_REPORT.md (Sections 1-3)
  • Project Manager: IMPLEMENTATION_ROADMAP.md (Timeline section)
  • Support: TELEMETRY_ANALYSIS_REPORT.md (Section 6: Search Queries)

For Implementation

  • IMPLEMENTATION_ROADMAP.md (all sections)
  • TELEMETRY_TECHNICAL_DEEP_DIVE.md (root cause analysis)

For Presentations

  • TELEMETRY_DATA_FOR_VISUALIZATION.md (all chart data)
  • TELEMETRY_EXECUTIVE_SUMMARY.md (key statistics)

Version History

Version Date Changes
1.0 Nov 8, 2025 Initial comprehensive analysis

Next Steps

  1. Today: Review TELEMETRY_EXECUTIVE_SUMMARY.md
  2. Tomorrow: Schedule team review meeting
  3. This Week: Estimate Phase 1 implementation effort
  4. Next Week: Begin Phase 1 development

Status: Analysis Complete - Ready for Action

All documents are located in: /Users/romualdczlonkowski/Pliki/n8n-mcp/n8n-mcp/

Files:

  • TELEMETRY_ANALYSIS_INDEX.md (this file)
  • TELEMETRY_EXECUTIVE_SUMMARY.md
  • TELEMETRY_ANALYSIS_REPORT.md
  • TELEMETRY_TECHNICAL_DEEP_DIVE.md
  • IMPLEMENTATION_ROADMAP.md
  • TELEMETRY_DATA_FOR_VISUALIZATION.md