mirror of https://github.com/czlonkowski/n8n-mcp.git synced 2026-01-30 06:22:04 +00:00

czlonkowski 1e586c0b23 feat: add template metadata generation and smart discovery

- Implement OpenAI batch API integration for metadata generation
- Add search_templates_by_metadata tool with advanced filtering
- Enhance list_templates to include descriptions and optional metadata
- Generate metadata for 2,534 templates (97.5% coverage)
- Update README with Template Tools section and enhanced Claude setup
- Add comprehensive documentation for metadata system

Enables intelligent template discovery through:
- Complexity levels (simple/medium/complex)
- Setup time estimates (5-480 minutes)
- Target audience filtering (developers/marketers/analysts)
- Required services detection
- Category and use case classification

Co-Authored-By: Claude <noreply@anthropic.com>

2025-09-15 00:18:53 +02:00

8.1 KiB

Template Metadata Generation

This document describes the template metadata generation system introduced in n8n-MCP v2.10.0, which uses OpenAI's batch API to automatically analyze and categorize workflow templates.

Overview

The template metadata system analyzes n8n workflow templates to extract structured information about their purpose, complexity, requirements, and target audience. This enables intelligent template discovery through advanced filtering capabilities.

Architecture

Components

MetadataGenerator (src/templates/metadata-generator.ts)
- Interfaces with OpenAI API
- Generates structured metadata using JSON schemas
- Provides fallback defaults for error cases
BatchProcessor (src/templates/batch-processor.ts)
- Manages OpenAI batch API operations
- Handles parallel batch submission
- Monitors batch status and retrieves results
Template Repository (src/templates/template-repository.ts)
- Stores metadata in SQLite database
- Provides advanced search capabilities
- Supports JSON extraction queries

Metadata Schema

Each template's metadata contains:

interface TemplateMetadata {
  categories: string[];              // Max 5 categories (e.g., "automation", "integration")
  complexity: "simple" | "medium" | "complex";
  use_cases: string[];               // Max 5 primary use cases
  estimated_setup_minutes: number;   // 5-480 minutes
  required_services: string[];       // External services needed
  key_features: string[];            // Max 5 main capabilities
  target_audience: string[];         // Max 3 target user types
}
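
The limits in the schema comments can be checked mechanically. The following is a minimal sketch of such a check, written by hand so it has no dependencies; the project itself is described as using Zod for validation, so `validateMetadata` and `LIST_LIMITS` are hypothetical names used only for illustration.

```typescript
type Complexity = "simple" | "medium" | "complex";

interface TemplateMetadata {
  categories: string[];
  complexity: Complexity;
  use_cases: string[];
  estimated_setup_minutes: number;
  required_services: string[];
  key_features: string[];
  target_audience: string[];
}

// Fields that hold string lists with a documented maximum length.
type LimitedListField = "categories" | "use_cases" | "key_features" | "target_audience";

// Maximum lengths taken from the schema comments (hypothetical constant name).
const LIST_LIMITS: Array<[LimitedListField, number]> = [
  ["categories", 5],
  ["use_cases", 5],
  ["key_features", 5],
  ["target_audience", 3],
];

// Returns a list of problems; an empty list means the metadata is within limits.
function validateMetadata(m: TemplateMetadata): string[] {
  const errors: string[] = [];
  for (const [field, max] of LIST_LIMITS) {
    if (m[field].length > max) {
      errors.push(`${field}: at most ${max} items allowed`);
    }
  }
  if (!["simple", "medium", "complex"].includes(m.complexity)) {
    errors.push("complexity: must be simple, medium, or complex");
  }
  if (m.estimated_setup_minutes < 5 || m.estimated_setup_minutes > 480) {
    errors.push("estimated_setup_minutes: must be between 5 and 480");
  }
  return errors;
}
```

Returning a list of problems rather than throwing on the first one makes it easier to log everything that is wrong with a single model response.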

Generation Process

1. Initial Setup

# Set OpenAI API key in .env
OPENAI_API_KEY=your-api-key-here

2. Generate Metadata for Existing Templates

# Generate metadata only (no template fetching)
npm run fetch:templates -- --metadata-only

# Generate metadata during update
npm run fetch:templates -- --mode=update --generate-metadata

3. Batch Processing

The system uses OpenAI's batch API for cost-effective processing:

50% cost reduction compared to synchronous API calls
24-hour processing window for batch completion
Parallel batch submission for faster processing
Automatic retry for failed items
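
To make the batch workflow concrete, here is a rough sketch of how templates might be split into batches and serialized into the JSONL input format that the OpenAI Batch API expects (one request object per line, with `custom_id`, `method`, `url`, and `body`). The `TemplateInput` shape, the `template-<id>` naming scheme, and the system prompt text are assumptions for illustration, not the project's actual code.

```typescript
// One line of a Batch API input file (JSONL).
interface BatchRequestLine {
  custom_id: string;
  method: "POST";
  url: "/v1/chat/completions";
  body: {
    model: string;
    messages: { role: "system" | "user"; content: string }[];
  };
}

// Hypothetical minimal template shape used only for this illustration.
interface TemplateInput {
  id: number;
  prompt: string;
}

// Split items into chunks of at most `size`; each chunk becomes one batch.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Serialize one chunk as JSONL: one JSON request object per line.
function toBatchJsonl(templates: TemplateInput[], model: string): string {
  return templates
    .map((t): BatchRequestLine => ({
      custom_id: `template-${t.id}`, // assumed scheme for matching results to templates
      method: "POST",
      url: "/v1/chat/completions",
      body: {
        model,
        messages: [
          { role: "system", content: "Return template metadata as JSON." }, // placeholder prompt
          { role: "user", content: t.prompt },
        ],
      },
    }))
    .map((line) => JSON.stringify(line))
    .join("\n");
}
```

Because results come back keyed by `custom_id`, embedding the template ID there is one simple way to match each response to its template and to retry only the failed items.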

Configuration Options

Environment variables:

OPENAI_API_KEY: Required for metadata generation
OPENAI_MODEL: Model to use (default: "gpt-4o-mini")
OPENAI_BATCH_SIZE: Templates per batch (default: 100, max: 500)
METADATA_LIMIT: Maximum number of templates to process (useful for testing)
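
As an illustration of how these variables and their defaults fit together, the sketch below reads them from an environment-like record. The function name `loadMetadataConfig`, the `MetadataConfig` shape, and the choice to clamp an oversized batch size to 500 (rather than reject it) are assumptions; the document only states the default and the maximum.

```typescript
interface MetadataConfig {
  apiKey: string;
  model: string;
  batchSize: number;
  limit: number | undefined; // undefined means "process all templates"
}

// Build a config from environment-style key/value pairs (e.g. process.env in Node.js).
function loadMetadataConfig(env: Record<string, string | undefined>): MetadataConfig {
  const apiKey = env.OPENAI_API_KEY;
  if (!apiKey) {
    throw new Error("OPENAI_API_KEY is required for metadata generation");
  }

  // Default 100; non-numeric input falls back to the default; result is clamped to 1..500.
  const parsedSize = Number(env.OPENAI_BATCH_SIZE || "100");
  const batchSize = Number.isFinite(parsedSize)
    ? Math.min(Math.max(Math.trunc(parsedSize), 1), 500)
    : 100;

  // A missing, non-numeric, or non-positive limit means no limit.
  const parsedLimit = Number(env.METADATA_LIMIT || "");
  const limit = parsedLimit > 0 ? Math.trunc(parsedLimit) : undefined;

  return { apiKey, model: env.OPENAI_MODEL || "gpt-4o-mini", batchSize, limit };
}
```

Taking the environment as a parameter instead of reading a global keeps the function easy to test; in a Node.js script it could be called with `process.env`.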

How It Works

1. Template Analysis

For each template, the generator analyzes:

Template name and description
Node types and their frequency
Workflow structure and connections
Overall complexity

2. Node Summarization

Nodes are grouped into categories:

HTTP/Webhooks
Database operations
Communication (Slack, Email)
AI/ML operations
Spreadsheets
Service-specific nodes
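
The grouping step can be pictured as a keyword lookup followed by a frequency count. The sketch below is one possible way to do it; the keyword rules in `CATEGORY_RULES` are invented for illustration and are not the generator's actual rules.

```typescript
// Hypothetical keyword rules: the first matching rule wins,
// and anything unmatched counts as a service-specific node.
const CATEGORY_RULES: Array<[string, string[]]> = [
  ["HTTP/Webhooks", ["http", "webhook"]],
  ["Database", ["postgres", "mysql", "mongo", "redis"]],
  ["Communication", ["slack", "email", "gmail", "telegram"]],
  ["AI/ML", ["openai", "langchain"]],
  ["Spreadsheets", ["sheets", "excel", "airtable"]],
];

// Map a node type such as "n8n-nodes-base.slack" to a category name.
function categorize(nodeType: string): string {
  const name = nodeType.toLowerCase();
  for (const [category, keywords] of CATEGORY_RULES) {
    if (keywords.some((k) => name.includes(k))) {
      return category;
    }
  }
  return "Service-specific";
}

// Count how many nodes of a workflow fall into each category.
function summarizeNodes(nodeTypes: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const t of nodeTypes) {
    const category = categorize(t);
    counts.set(category, (counts.get(category) ?? 0) + 1);
  }
  return counts;
}
```

Summarizing by category rather than listing every node keeps the prompt short, which matters when the per-template token count drives the cost.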

3. Metadata Generation

The AI model receives:

Template: [name]
Description: [description]
Nodes Used (X): [summarized node list]
Workflow has X nodes with Y connections

And generates structured metadata following the JSON schema.
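
Assembling the prompt from the pieces listed above might look like the following sketch. The `TemplateSummary` shape and the `buildPrompt` name are assumptions, and the node count is used for both "X" placeholders because the prompt format shows the same symbol in both places.

```typescript
// Hypothetical input shape: everything the prompt needs, already computed.
interface TemplateSummary {
  name: string;
  description: string;
  nodeSummary: string[]; // e.g. ["HTTP/Webhooks (1)", "Communication (2)"]
  nodeCount: number;
  connectionCount: number;
}

// Render the four-line prompt in the format shown above.
function buildPrompt(t: TemplateSummary): string {
  return [
    `Template: ${t.name}`,
    `Description: ${t.description}`,
    `Nodes Used (${t.nodeCount}): ${t.nodeSummary.join(", ")}`,
    `Workflow has ${t.nodeCount} nodes with ${t.connectionCount} connections`,
  ].join("\n");
}
```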

4. Storage and Indexing

Metadata is stored as JSON in SQLite and indexed for fast querying:

-- Example query for simple automation templates.
-- json_extract returns arrays as JSON text, so LIKE does a substring match on that text.
SELECT * FROM templates
WHERE json_extract(metadata, '$.complexity') = 'simple'
  AND json_extract(metadata, '$.categories') LIKE '%automation%';

MCP Tool Integration

search_templates_by_metadata

Advanced filtering tool with multiple parameters:

search_templates_by_metadata({
  category: "automation",           // Filter by category
  complexity: "simple",             // Skill level
  maxSetupMinutes: 30,             // Time constraint
  targetAudience: "marketers",     // Role-based
  requiredService: "slack"         // Service dependency
})
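
One way these filter parameters could translate into a SQLite query is sketched below. It is not the repository's actual implementation: `buildMetadataQuery` and `MetadataFilters` are hypothetical names, and matching array values with `LIKE` on the JSON text is an assumption that follows the example query shown earlier.

```typescript
interface MetadataFilters {
  category?: string;
  complexity?: "simple" | "medium" | "complex";
  maxSetupMinutes?: number;
  targetAudience?: string;
  requiredService?: string;
}

type SqlParam = string | number;

// Build a parameterized query so filter values are never spliced into the SQL text.
function buildMetadataQuery(f: MetadataFilters): { sql: string; params: SqlParam[] } {
  const where: string[] = ["metadata IS NOT NULL"];
  const params: SqlParam[] = [];

  // json_extract returns arrays as JSON text, so match the quoted element inside it.
  // Note: '%' or '_' inside `value` would act as LIKE wildcards in this simple sketch.
  const arrayContains = (path: string, value: string): void => {
    where.push(`json_extract(metadata, '${path}') LIKE ?`);
    params.push(`%"${value}"%`);
  };

  if (f.category) arrayContains("$.categories", f.category);
  if (f.complexity) {
    where.push("json_extract(metadata, '$.complexity') = ?");
    params.push(f.complexity);
  }
  if (f.maxSetupMinutes !== undefined) {
    where.push("json_extract(metadata, '$.estimated_setup_minutes') <= ?");
    params.push(f.maxSetupMinutes);
  }
  if (f.targetAudience) arrayContains("$.target_audience", f.targetAudience);
  if (f.requiredService) arrayContains("$.required_services", f.requiredService);

  return { sql: `SELECT * FROM templates WHERE ${where.join(" AND ")}`, params };
}
```

Matching on the quoted element (`%"slack"%`) rather than the bare word avoids false hits where one value is a substring of another.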

list_templates

Enhanced to include metadata:

list_templates({
  includeMetadata: true,  // Include full metadata
  limit: 20,
  offset: 0
})

Usage Examples

Finding Beginner-Friendly Templates

const templates = await search_templates_by_metadata({
  complexity: "simple",
  maxSetupMinutes: 15
});

Role-Specific Templates

const marketingTemplates = await search_templates_by_metadata({
  targetAudience: "marketers",
  category: "communication"
});

Service Integration Templates

const openaiTemplates = await search_templates_by_metadata({
  requiredService: "openai",
  complexity: "medium"
});

Performance Metrics

Coverage: 97.5% of templates have metadata (2,534/2,598)
Generation Time: ~2-4 hours for full database (using batch API)
Query Performance: <100ms for metadata searches
Storage Overhead: ~2MB additional database size

Troubleshooting

Common Issues

Batch Processing Stuck
- Check the batch status through the OpenAI Batch API, which reports the current state of each batch
- Batches auto-expire after 24 hours
- Monitor using the batch ID in logs
Missing Metadata
- ~2.5% of templates may fail metadata generation
- Fallback defaults are provided
- Can regenerate with --metadata-only flag
API Rate Limits
- The Batch API accepts up to 50,000 requests per batch
- Cost is 50% of synchronous API
- Processing happens within 24-hour window

Monitoring Batch Status

# Check the status of a batch (replace [batch-id] with the batch ID from the logs)
curl https://api.openai.com/v1/batches/[batch-id] \
  -H "Authorization: Bearer $OPENAI_API_KEY"
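
The same check can be automated. The sketch below shows a polling loop under some assumptions: the status values are those reported by the OpenAI Batch API, while `nextStep`, `waitForBatch`, and the injected `fetchStatus` and `sleep` functions are hypothetical names. Treating every failed, expired, or cancelled batch the same way is a simplification.

```typescript
// Status values reported by the OpenAI Batch API.
type BatchStatus =
  | "validating" | "in_progress" | "finalizing" | "cancelling"
  | "completed" | "failed" | "expired" | "cancelled";

type NextStep = "wait" | "download_results" | "handle_failure";

// Decide what to do for a given status.
function nextStep(status: BatchStatus): NextStep {
  switch (status) {
    case "completed":
      return "download_results";
    case "failed":
    case "expired":
    case "cancelled":
      return "handle_failure";
    default:
      return "wait"; // validating, in_progress, finalizing, cancelling
  }
}

// Poll until the batch reaches a terminal state or the poll budget runs out.
// fetchStatus would wrap the GET /v1/batches/{id} request shown above;
// sleep would wrap a timer. Both are injected to keep the loop testable.
async function waitForBatch(
  batchId: string,
  fetchStatus: (id: string) => Promise<BatchStatus>,
  sleep: (ms: number) => Promise<void>,
  intervalMs: number = 60000, // one minute between polls
  maxPolls: number = 1440,    // 1440 polls at one minute covers the 24-hour window
): Promise<NextStep> {
  for (let i = 0; i < maxPolls; i++) {
    const step = nextStep(await fetchStatus(batchId));
    if (step !== "wait") {
      return step;
    }
    await sleep(intervalMs);
  }
  return "handle_failure";
}
```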

Cost Analysis

Batch API Pricing (gpt-4o-mini)

Input: $0.075 per 1M tokens (50% of standard)
Output: $0.30 per 1M tokens (50% of standard)
Average template: ~300 input tokens, ~200 output tokens
Total cost for 2,500 templates: ~$0.50 (a conservative figure; the per-template averages above alone imply a lower total)

Comparison with Synchronous API

Synchronous cost: ~$1.00 for same volume
Time saved: Batches are processed in parallel rather than one request at a time
Reliability: Automatic retries included
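
The arithmetic behind these estimates can be worked through directly. The sketch below uses only the prices and per-template averages stated above; the synchronous prices are derived from the "50% of standard" note. With those inputs the totals come to about $0.21 (batch) and $0.41 (synchronous), so the quoted ~$0.50 and ~$1.00 presumably leave headroom for prompt overhead and retries, which is an assumption on my part.

```typescript
interface Pricing {
  inputPerMillion: number;  // USD per 1M input tokens
  outputPerMillion: number; // USD per 1M output tokens
}

// Batch prices as stated above; standard prices are double (batch is 50% of standard).
const BATCH_PRICING: Pricing = { inputPerMillion: 0.075, outputPerMillion: 0.30 };
const SYNC_PRICING: Pricing = { inputPerMillion: 0.15, outputPerMillion: 0.60 };

// Estimated cost in USD for processing `templates` templates.
function estimateCost(
  templates: number,
  inputTokensEach: number,
  outputTokensEach: number,
  pricing: Pricing,
): number {
  const inputCost = ((templates * inputTokensEach) / 1000000) * pricing.inputPerMillion;
  const outputCost = ((templates * outputTokensEach) / 1000000) * pricing.outputPerMillion;
  return inputCost + outputCost;
}

// 2,500 templates at ~300 input and ~200 output tokens each.
const batchCost = estimateCost(2500, 300, 200, BATCH_PRICING); // about 0.21
const syncCost = estimateCost(2500, 300, 200, SYNC_PRICING);   // about 0.41
```

Whatever the absolute figures, the 2:1 ratio between synchronous and batch cost holds for any token counts, since both prices are halved.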

Future Enhancements

Planned Improvements

Incremental Updates
- Only generate metadata for new templates
- Track metadata version for updates
Enhanced Analysis
- Workflow complexity scoring
- Dependency graph analysis
- Performance impact estimates
User Feedback Loop
- Collect accuracy feedback
- Refine categorization over time
- Community-driven corrections
Alternative Models
- Support for local LLMs
- Claude API integration
- Configurable model selection

Implementation Details

Database Schema

-- Metadata stored as JSON column
ALTER TABLE templates ADD COLUMN metadata TEXT;

-- Indexes for common queries
CREATE INDEX idx_templates_complexity ON templates(
  json_extract(metadata, '$.complexity')
);
CREATE INDEX idx_templates_setup_time ON templates(
  json_extract(metadata, '$.estimated_setup_minutes')
);

Error Handling

Each failure mode is handled as follows:

API Failures: Fallback to default metadata
Parsing Errors: Logged with template ID
Batch Failures: Individual item retry
Validation Errors: Zod schema enforcement
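
A fallback path for API and parsing failures could look roughly like the sketch below. The default values in `defaultMetadata` are invented for illustration, since this document does not list the actual fallback values, and `parseMetadataResponse` is a hypothetical name.

```typescript
interface TemplateMetadata {
  categories: string[];
  complexity: "simple" | "medium" | "complex";
  use_cases: string[];
  estimated_setup_minutes: number;
  required_services: string[];
  key_features: string[];
  target_audience: string[];
}

// Hypothetical defaults used when a response cannot be parsed.
function defaultMetadata(): TemplateMetadata {
  return {
    categories: ["automation"],
    complexity: "medium",
    use_cases: [],
    estimated_setup_minutes: 30,
    required_services: [],
    key_features: [],
    target_audience: [],
  };
}

interface ParseResult {
  metadata: TemplateMetadata;
  usedFallback: boolean;
  error?: string; // includes the template ID so the failure can be traced in logs
}

// Parse a raw model response; on any failure return the defaults plus an error message.
function parseMetadataResponse(templateId: number, raw: string): ParseResult {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      throw new Error("response is not a JSON object");
    }
    // Fill any missing fields from the defaults.
    const metadata = { ...defaultMetadata(), ...(parsed as Partial<TemplateMetadata>) };
    return { metadata, usedFallback: false };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return {
      metadata: defaultMetadata(),
      usedFallback: true,
      error: `template ${templateId}: ${message}`,
    };
  }
}
```

This sketch only checks that the response is a JSON object; field-level checks would still be left to schema validation.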

Maintenance

Regenerating Metadata

# Full regeneration (caution: costs ~$0.50)
npm run fetch:templates -- --mode=rebuild --generate-metadata

# Partial regeneration (templates without metadata)
npm run fetch:templates -- --metadata-only

Database Backup

# Backup before regeneration
cp data/nodes.db data/nodes.db.backup

# Restore if needed
cp data/nodes.db.backup data/nodes.db

Security Considerations

API Key Management
- Store in .env file (gitignored)
- Never commit API keys
- Use environment variables in CI/CD
Data Privacy
- Only the template name, description, and node structure are sent to the API
- No user data or credentials included
- Processing happens in OpenAI's secure environment

Conclusion

The template metadata system transforms template discovery from simple text search to intelligent, multi-dimensional filtering. By leveraging OpenAI's batch API, we achieve cost-effective, scalable metadata generation that significantly improves the user experience for finding relevant workflow templates.
