mirror of https://github.com/czlonkowski/n8n-mcp.git synced 2026-01-30 06:22:04 +00:00

czlonkowski 60ab66d64d feat: telemetry-driven quick wins to reduce AI agent validation errors by 30-40%

Enhanced tools documentation, duplicate ID errors, and AI Agent validator based on telemetry analysis of 593 validation errors across 3 categories:
- 378 errors: Duplicate node IDs (64%)
- 179 errors: AI Agent configuration (30%)
- 36 errors: Other validations (6%)

Quick Win #1: Enhanced tools documentation (src/mcp/tools-documentation.ts)
- Added prominent warnings to call get_node_essentials() FIRST before configuring nodes
- Emphasized 5KB vs 100KB+ size difference between essentials and full info
- Updated workflow patterns to prioritize essentials over get_node_info

Quick Win #2: Improved duplicate ID error messages (src/services/workflow-validator.ts)
- Added crypto import for UUID generation examples
- Enhanced error messages with node indices, names, and types
- Included crypto.randomUUID() example in error messages
- Helps AI agents understand EXACTLY which nodes conflict and how to fix
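
As a rough illustration of the kind of message described above, here is a minimal sketch. It is not the code in workflow-validator.ts: the `WorkflowNode` shape and the `describeDuplicateIds` name are hypothetical, and the message wording is an assumption.

```typescript
// Hypothetical minimal node shape; the real workflow types are richer.
interface WorkflowNode {
  id: string;
  name: string;
  type: string;
}

// Returns one message per duplicated ID, naming every conflicting node
// by index, name, and type, and suggesting how to generate a fresh ID.
function describeDuplicateIds(nodes: WorkflowNode[]): string[] {
  // Group node indices by ID.
  const indicesById: Record<string, number[]> = {};
  nodes.forEach((node, index) => {
    if (!indicesById[node.id]) {
      indicesById[node.id] = [];
    }
    indicesById[node.id].push(index);
  });

  const messages: string[] = [];
  Object.keys(indicesById).forEach((id) => {
    const indices = indicesById[id];
    if (indices.length < 2) {
      return; // ID is unique, nothing to report
    }
    const conflicts = indices
      .map((i) => `index ${i} ("${nodes[i].name}", ${nodes[i].type})`)
      .join(", ");
    messages.push(
      `Duplicate node ID "${id}" at ${conflicts}. ` +
        `Give each node a unique ID, e.g. crypto.randomUUID().`
    );
  });
  return messages;
}
```

Naming each conflicting node in the message lets an agent repair the workflow in one pass, without re-reading the whole node list.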

Quick Win #3: Added AI Agent node-specific validator (src/services/node-specific-validators.ts)
- Validates prompt configuration (promptType + text requirement)
- Checks maxIterations bounds (1-50 recommended)
- Suggests error handling (onError + retryOnFail)
- Warns about high iteration limits (cost/performance impact)
- Integrated into enhanced-config-validator.ts
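
The checks listed above could look roughly like the sketch below. It is an illustration under stated assumptions, not the validator in node-specific-validators.ts. The `AgentConfig` and `Finding` shapes and the `validateAgentConfig` name are hypothetical, and so is the rule that a `"define"` prompt type requires inline text.

```typescript
// Hypothetical config shape; field names follow the bullet list above.
interface AgentConfig {
  promptType?: string;
  text?: string;
  maxIterations?: number;
  onError?: string;
  retryOnFail?: boolean;
}

interface Finding {
  level: "error" | "warning" | "suggestion";
  message: string;
}

function validateAgentConfig(config: AgentConfig): Finding[] {
  const findings: Finding[] = [];

  // Assumption: promptType "define" means the prompt text is supplied inline.
  const hasText = !!(config.text && config.text.trim());
  if (config.promptType === "define" && !hasText) {
    findings.push({
      level: "error",
      message: 'promptType "define" requires a non-empty text.',
    });
  }

  // Bounds from the commit message: 1-50 recommended.
  if (config.maxIterations !== undefined) {
    if (config.maxIterations < 1) {
      findings.push({ level: "error", message: "maxIterations must be at least 1." });
    } else if (config.maxIterations > 50) {
      findings.push({
        level: "warning",
        message: "maxIterations above 50 may increase cost and latency.",
      });
    }
  }

  // Suggest error handling when neither option is configured.
  if (!config.onError && !config.retryOnFail) {
    findings.push({
      level: "suggestion",
      message: "Consider setting onError and retryOnFail.",
    });
  }
  return findings;
}
```

Separating errors from warnings and suggestions keeps hard failures distinct from cost and resilience advice.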

Test Coverage:
- Added duplicate ID validation tests (workflow-validator.test.ts)
- Added AI Agent validator tests (node-specific-validators.test.ts:2312-2491)
- All new tests passing (3527 total passing)

Version: 2.22.12 → 2.22.13

Expected Impact: 30-40% reduction in AI agent validation errors

Technical Details:
- Telemetry analysis: 593 validation errors (Dec 2024 - Jan 2025)
- 100% error recovery rate maintained (validation working correctly)
- Root cause: Documentation/guidance gaps, not validation logic failures
- Solution: Proactive guidance at decision points

References:
- Telemetry analysis findings
- Issue #392 (helpful error messages pattern)
- Existing Slack validator pattern (node-specific-validators.ts:98-230)

Conceived by Romuald Członkowski - www.aiadvisors.pl/en

2025-11-08 18:07:26 +01:00

n8n-MCP Telemetry Analysis - Complete Index

Navigation Guide for All Analysis Documents

Analysis Period: August 10 - November 8, 2025 (90 days)
Report Date: November 8, 2025
Data Quality: High (506K+ events, 36/90 days with errors)
Status: Critical Issues Identified - Action Required

Document Overview

This telemetry analysis consists of 5 comprehensive documents designed for different audiences and use cases.

Document Map

┌─────────────────────────────────────────────────────────────┐
│         TELEMETRY ANALYSIS COMPLETE PACKAGE                │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. EXECUTIVE SUMMARY (this file + next level up)          │
│     ↓ Start here for quick overview                        │
│     └─→ TELEMETRY_EXECUTIVE_SUMMARY.md                     │
│         • For: Decision makers, leadership                 │
│         • Length: 5-10 minutes read                        │
│         • Contains: Key stats, risks, ROI                  │
│                                                             │
│  2. MAIN ANALYSIS REPORT                                   │
│     ↓ For comprehensive understanding                      │
│     └─→ TELEMETRY_ANALYSIS_REPORT.md                       │
│         • For: Product, engineering teams                  │
│         • Length: 30-45 minutes read                       │
│         • Contains: Detailed findings, patterns, trends    │
│                                                             │
│  3. TECHNICAL DEEP-DIVE                                    │
│     ↓ For root cause investigation                         │
│     └─→ TELEMETRY_TECHNICAL_DEEP_DIVE.md                   │
│         • For: Engineering team, architects                │
│         • Length: 45-60 minutes read                       │
│         • Contains: Root causes, hypotheses, gaps          │
│                                                             │
│  4. IMPLEMENTATION ROADMAP                                 │
│     ↓ For actionable next steps                            │
│     └─→ IMPLEMENTATION_ROADMAP.md                          │
│         • For: Engineering leads, project managers         │
│         • Length: 20-30 minutes read                       │
│         • Contains: Detailed implementation steps          │
│                                                             │
│  5. VISUALIZATION DATA                                     │
│     ↓ For presentations and dashboards                     │
│     └─→ TELEMETRY_DATA_FOR_VISUALIZATION.md                │
│         • For: All audiences (chart data)                  │
│         • Length: Reference material                       │
│         • Contains: Charts, graphs, metrics data           │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Quick Navigation

By Role

Executive Leadership / C-Level

Time Available: 5-10 minutes
Priority: Understanding business impact

Start: TELEMETRY_EXECUTIVE_SUMMARY.md
Focus: Risk assessment, ROI, timeline
Reference: Key Statistics (below)

Product Management

Time Available: 30 minutes
Priority: User impact, feature decisions

Start: TELEMETRY_ANALYSIS_REPORT.md (Section 1-3)
Then: TELEMETRY_TECHNICAL_DEEP_DIVE.md (Section 1-2)
Reference: TELEMETRY_DATA_FOR_VISUALIZATION.md (charts)

Engineering / DevOps

Time Available: 1-2 hours
Priority: Root causes, implementation details

Start: TELEMETRY_TECHNICAL_DEEP_DIVE.md
Then: IMPLEMENTATION_ROADMAP.md
Reference: TELEMETRY_ANALYSIS_REPORT.md (for metrics)

Engineering Leads / Architects

Time Available: 2-3 hours
Priority: System design, priority decisions

Start: TELEMETRY_ANALYSIS_REPORT.md (all sections)
Then: TELEMETRY_TECHNICAL_DEEP_DIVE.md (all sections)
Then: IMPLEMENTATION_ROADMAP.md
Reference: Visualization data for presentations

Customer Support / Success

Time Available: 20 minutes
Priority: Common issues, user guidance

Start: TELEMETRY_EXECUTIVE_SUMMARY.md (Top 5 Issues section)
Then: TELEMETRY_ANALYSIS_REPORT.md (Section 6: Search Queries)
Reference: Top error messages list (below)

Marketing / Communications

Time Available: 15 minutes
Priority: Messaging, external communications

Start: TELEMETRY_EXECUTIVE_SUMMARY.md
Focus: Business impact statement
Key message: "We're fixing critical issues this week"

Key Statistics Summary

Error Metrics

Metric	Value	Status
Total Errors (90 days)	8,859	Baseline
Daily Average	60.68	Stable
Peak Day	276 (Oct 30)	Outlier
ValidationError	3,080 (34.77%)	Largest
TypeError	2,767 (31.23%)	Second
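
The percentages in the table follow directly from the raw counts. A minimal sketch of the arithmetic, with `sharePercent` as a hypothetical helper name:

```typescript
// Share of a count within a total, as a percentage rounded to 2 decimals.
function sharePercent(count: number, total: number): number {
  return Math.round((count / total) * 10000) / 100;
}

const totalErrors = 8859; // Total Errors (90 days) from the table

console.log(sharePercent(3080, totalErrors)); // 34.77 (ValidationError)
console.log(sharePercent(2767, totalErrors)); // 31.23 (TypeError)
```

Together the two categories account for about two thirds of all recorded errors.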

Tool Performance

Metric	Value	Status
Critical Tool: get_node_info	11.72% failure	Action Required
Average Success Rate	98.4%	Good
Highest Risk Tools	5.5-6.4% failure	Monitor

Performance

Metric	Value	Status
Sequential Updates Latency	55.2 seconds	Bottleneck
Read-After-Write Latency	96.6 seconds	Bottleneck
Search Retry Rate	17%	High

User Engagement

Metric	Value	Status
Daily Sessions	895 avg	Healthy
Daily Users	572 avg	Healthy
Sessions per User	1.52 avg	Good

Top 5 Critical Issues

1. Workflow-Level Validation Failures (39% of errors)

File: TELEMETRY_ANALYSIS_REPORT.md, Section 2.1
Detail: TELEMETRY_TECHNICAL_DEEP_DIVE.md, Section 1.1
Fix: IMPLEMENTATION_ROADMAP.md, Section Phase 1, Issue 1.2

2. `get_node_info` Unreliability (11.72% failure)

File: TELEMETRY_ANALYSIS_REPORT.md, Section 3.2
Detail: TELEMETRY_TECHNICAL_DEEP_DIVE.md, Section 4.1
Fix: IMPLEMENTATION_ROADMAP.md, Section Phase 1, Issue 1.1

3. Slow Sequential Updates (55+ seconds)

File: TELEMETRY_ANALYSIS_REPORT.md, Section 4.1
Detail: TELEMETRY_TECHNICAL_DEEP_DIVE.md, Section 6.1
Fix: IMPLEMENTATION_ROADMAP.md, Section Phase 1, Issue 1.3

4. Search Inefficiency (17% retry rate)

File: TELEMETRY_ANALYSIS_REPORT.md, Section 6.1
Detail: TELEMETRY_TECHNICAL_DEEP_DIVE.md, Section 6.3
Fix: IMPLEMENTATION_ROADMAP.md, Section Phase 2, Issue 2.2

5. Type-Related Validation Errors (31.23% of errors)

File: TELEMETRY_ANALYSIS_REPORT.md, Section 1.2
Detail: TELEMETRY_TECHNICAL_DEEP_DIVE.md, Section 2
Fix: IMPLEMENTATION_ROADMAP.md, Section Phase 2, Issue 2.3

Implementation Timeline

Week 1 (Immediate)

Expected Impact: 40-50% error reduction

Fix get_node_info reliability
- File: IMPLEMENTATION_ROADMAP.md, Phase 1, Issue 1.1
- Effort: 1 day
Improve validation error messages
- File: IMPLEMENTATION_ROADMAP.md, Phase 1, Issue 1.2
- Effort: 2 days
Add batch workflow update operation
- File: IMPLEMENTATION_ROADMAP.md, Phase 1, Issue 1.3
- Effort: 2-3 days

Week 2-3 (High Priority)

Expected Impact: +30% additional improvement

Implement validation caching
- File: IMPLEMENTATION_ROADMAP.md, Phase 2, Issue 2.1
- Effort: 1-2 days
Improve search ranking
- File: IMPLEMENTATION_ROADMAP.md, Phase 2, Issue 2.2
- Effort: 2 days
Add TypeScript types for top nodes
- File: IMPLEMENTATION_ROADMAP.md, Phase 2, Issue 2.3
- Effort: 3 days

Week 4 (Optimization)

Expected Impact: +10% additional improvement

Return updated state in responses
- File: IMPLEMENTATION_ROADMAP.md, Phase 3, Issue 3.1
- Effort: 1-2 days
Add workflow diff generation
- File: IMPLEMENTATION_ROADMAP.md, Phase 3, Issue 3.2
- Effort: 1-2 days

Key Findings by Category

Validation Issues

Most common error category (ValidationError: 34.77% of all errors)
Workflow-level validation: 39.11% of validation errors
Generic error messages prevent self-resolution
See: TELEMETRY_ANALYSIS_REPORT.md, Section 2

Tool Reliability Issues

get_node_info critical (11.72% failure rate)
Information retrieval tools less reliable than state management tools
Validation tools consistently underperform (5.5-6.4% failure)
See: TELEMETRY_ANALYSIS_REPORT.md, Section 3 & TELEMETRY_TECHNICAL_DEEP_DIVE.md, Section 4

Performance Bottlenecks

Sequential operations extremely slow (55+ seconds)
Read-after-write pattern inefficient (96.6 seconds)
Search refinement rate high (17% need multiple searches)
See: TELEMETRY_ANALYSIS_REPORT.md, Section 4 & TELEMETRY_TECHNICAL_DEEP_DIVE.md, Section 6

User Behavior

Top searches: test (5.8K), webhook (5.1K), http (4.2K)
Most searches indicate where users struggle
Session metrics show healthy engagement
See: TELEMETRY_ANALYSIS_REPORT.md, Section 6

Temporal Patterns

Error rate volatile with significant spikes
October incident period with slow recovery
Currently stabilizing at 60-65 errors/day baseline
See: TELEMETRY_ANALYSIS_REPORT.md, Section 9 & TELEMETRY_TECHNICAL_DEEP_DIVE.md, Section 5

Metrics to Track Post-Implementation

Primary Success Metrics

get_node_info failure rate: 11.72% → <1%
Validation error clarity: Generic → Specific (95% have guidance)
Update latency: 55.2s → <5s
Overall error count: 8,859 → <2,000 per quarter
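
One way to track the numeric targets above is to measure how much of the gap between baseline and target has been closed. The sketch below is only an illustration: `MetricGoal`, `gapClosed`, and `isMet` are hypothetical names, and the non-numeric "validation error clarity" metric is left out.

```typescript
// Lower-is-better metric with a baseline and a target, as in the list above.
interface MetricGoal {
  name: string;
  baseline: number;
  target: number; // the goal is to get strictly below this value
}

const primaryGoals: MetricGoal[] = [
  { name: "get_node_info failure rate (%)", baseline: 11.72, target: 1 },
  { name: "update latency (s)", baseline: 55.2, target: 5 },
  { name: "errors per quarter", baseline: 8859, target: 2000 },
];

// Fraction of the baseline-to-target gap closed so far, clamped to [0, 1].
function gapClosed(goal: MetricGoal, current: number): number {
  const closed = (goal.baseline - current) / (goal.baseline - goal.target);
  return Math.min(1, Math.max(0, closed));
}

// True once the current value is strictly below the target.
function isMet(goal: MetricGoal, current: number): boolean {
  return current < goal.target;
}
```

For example, an update latency of 30.1 seconds would mean roughly half of the gap to the 5-second target is closed.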

Secondary Metrics

Tool success rates across the board: >99%
Search retry rate: 17% → <5%
Workflow validation time: <2 seconds
User satisfaction: +50% improvement

Dashboard Recommendations

See: TELEMETRY_DATA_FOR_VISUALIZATION.md, Section 14
Create live dashboard in Grafana/Datadog
Update daily; review weekly

SQL Queries Reference

All analysis derived from these core queries:

Error Analysis

-- Error type distribution
SELECT error_type, SUM(error_count) as total_occurrences
FROM telemetry_errors_daily
WHERE date >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY error_type ORDER BY total_occurrences DESC;

-- Temporal trends
SELECT date, SUM(error_count) as daily_errors
FROM telemetry_errors_daily
WHERE date >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY date ORDER BY date DESC;

Tool Performance

-- Tool success rates (lowest first); NULLIF avoids division by zero
SELECT tool_name,
  SUM(usage_count) AS total_calls,
  SUM(success_count) AS successful_calls,
  ROUND(100.0 * SUM(success_count) / NULLIF(SUM(usage_count), 0), 2) AS success_rate
FROM telemetry_tool_usage_daily
WHERE date >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY tool_name
ORDER BY success_rate ASC;

Validation Errors

-- Validation errors by node type
SELECT node_type, error_type, SUM(error_count) as total
FROM telemetry_validation_errors_daily
WHERE date >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY node_type, error_type
ORDER BY total DESC;

Complete query library in: TELEMETRY_ANALYSIS_REPORT.md, Section 12
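
For readers who prefer to see the aggregation outside SQL, the tool success-rate query above can be mirrored in memory. This is a sketch under assumptions: the `ToolUsageRow` shape simply copies the column names used in the query, `successRates` is a hypothetical helper, and tools with zero recorded usage are skipped.

```typescript
// Hypothetical row shape mirroring the columns used in the query.
interface ToolUsageRow {
  tool_name: string;
  usage_count: number;
  success_count: number;
}

interface ToolSuccessRate {
  tool_name: string;
  success_rate: number; // percent, rounded to 2 decimals
}

// Sums usage and success per tool, then returns rates sorted lowest first.
function successRates(rows: ToolUsageRow[]): ToolSuccessRate[] {
  const totals: Record<string, { usage: number; success: number }> = {};
  for (const row of rows) {
    const t =
      totals[row.tool_name] ||
      (totals[row.tool_name] = { usage: 0, success: 0 });
    t.usage += row.usage_count;
    t.success += row.success_count;
  }
  return Object.keys(totals)
    .filter((name) => totals[name].usage > 0) // assumption: skip unused tools
    .map((name) => ({
      tool_name: name,
      success_rate:
        Math.round((10000 * totals[name].success) / totals[name].usage) / 100,
    }))
    .sort((a, b) => a.success_rate - b.success_rate);
}
```

Sorting ascending puts the least reliable tools at the top, matching the `ORDER BY success_rate ASC` in the query.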

FAQ

Q: Which document should I read first?

A: TELEMETRY_EXECUTIVE_SUMMARY.md (5 min) to understand the situation

Q: What's the most critical issue?

A: Workflow-level validation failures (39% of errors) with generic error messages that prevent users from self-fixing

Q: How long will fixes take?

A: Week 1: 40-50% improvement; Full implementation: 4-5 weeks

Q: What's the ROI?

A: ~26x return in first year; payback in <2 weeks

Q: Should we implement all recommendations?

A: Phase 1 (Week 1) is mandatory; Phases 2-3 are high-value optimizations

Q: How confident are these findings?

A: Very high; based on 506K events across 90 days with consistent patterns

Q: What should the support/success team do?

A: Review Section 6 of TELEMETRY_ANALYSIS_REPORT.md for top user pain points and search patterns

Additional Resources

For Presentations

Use TELEMETRY_DATA_FOR_VISUALIZATION.md for all chart/graph data
For audience questions: TELEMETRY_EXECUTIVE_SUMMARY.md, Section "Stakeholder Questions & Answers"

For Team Meetings

Stand-up briefing: Key Statistics Summary (above)
Engineering sync: IMPLEMENTATION_ROADMAP.md
Product review: TELEMETRY_ANALYSIS_REPORT.md, Sections 1-3

For Documentation

User-facing docs: TELEMETRY_ANALYSIS_REPORT.md, Section 6 (search queries reveal documentation gaps)
Error code docs: IMPLEMENTATION_ROADMAP.md, Phase 4

For Monitoring

KPI dashboard: TELEMETRY_DATA_FOR_VISUALIZATION.md, Section 14
Alert thresholds: IMPLEMENTATION_ROADMAP.md, success metrics

Contact & Questions

Analysis Prepared By: AI Telemetry Analyst
Date: November 8, 2025
Data Freshness: Last updated October 31, 2025 (daily updates)
Review Frequency: Weekly recommended

For questions about specific findings, refer to:

Executive level: TELEMETRY_EXECUTIVE_SUMMARY.md
Technical details: TELEMETRY_TECHNICAL_DEEP_DIVE.md
Implementation: IMPLEMENTATION_ROADMAP.md

Document Checklist

Use this checklist to ensure you've reviewed appropriate documents:

Essential Reading (Everyone)

TELEMETRY_EXECUTIVE_SUMMARY.md (5-10 min)
Top 5 Issues section above (5 min)

Role-Specific

Leadership: TELEMETRY_EXECUTIVE_SUMMARY.md (Risk & ROI sections)
Engineering: TELEMETRY_TECHNICAL_DEEP_DIVE.md (all sections)
Product: TELEMETRY_ANALYSIS_REPORT.md (Sections 1-3)
Project Manager: IMPLEMENTATION_ROADMAP.md (Timeline section)
Support: TELEMETRY_ANALYSIS_REPORT.md (Section 6: Search Queries)

For Implementation

IMPLEMENTATION_ROADMAP.md (all sections)
TELEMETRY_TECHNICAL_DEEP_DIVE.md (root cause analysis)

For Presentations

TELEMETRY_DATA_FOR_VISUALIZATION.md (all chart data)
TELEMETRY_EXECUTIVE_SUMMARY.md (key statistics)

Version History

Version	Date	Changes
1.0	Nov 8, 2025	Initial comprehensive analysis

Next Steps

Today: Review TELEMETRY_EXECUTIVE_SUMMARY.md
Tomorrow: Schedule team review meeting
This Week: Estimate Phase 1 implementation effort
Next Week: Begin Phase 1 development

Status: Analysis Complete - Ready for Action

All documents are located in: /Users/romualdczlonkowski/Pliki/n8n-mcp/n8n-mcp/

Files:

TELEMETRY_ANALYSIS_INDEX.md (this file)
TELEMETRY_EXECUTIVE_SUMMARY.md
TELEMETRY_ANALYSIS_REPORT.md
TELEMETRY_TECHNICAL_DEEP_DIVE.md
IMPLEMENTATION_ROADMAP.md
TELEMETRY_DATA_FOR_VISUALIZATION.md
