n8n-mcp/docs/TEMPLATE_METADATA.md
czlonkowski 1e586c0b23 feat: add template metadata generation and smart discovery
- Implement OpenAI batch API integration for metadata generation
- Add search_templates_by_metadata tool with advanced filtering
- Enhance list_templates to include descriptions and optional metadata
- Generate metadata for 2,534 templates (97.5% coverage)
- Update README with Template Tools section and enhanced Claude setup
- Add comprehensive documentation for metadata system

Enables intelligent template discovery through:
- Complexity levels (simple/medium/complex)
- Setup time estimates (5-480 minutes)
- Target audience filtering (developers/marketers/analysts)
- Required services detection
- Category and use case classification

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-15 00:18:53 +02:00

Template Metadata Generation

This document describes the template metadata generation system introduced in n8n-MCP v2.10.0, which uses OpenAI's batch API to automatically analyze and categorize workflow templates.

Overview

The template metadata system analyzes n8n workflow templates to extract structured information about their purpose, complexity, requirements, and target audience. This enables intelligent template discovery through advanced filtering capabilities.

Architecture

Components

  1. MetadataGenerator (src/templates/metadata-generator.ts)

    • Interfaces with OpenAI API
    • Generates structured metadata using JSON schemas
    • Provides fallback defaults for error cases
  2. BatchProcessor (src/templates/batch-processor.ts)

    • Manages OpenAI batch API operations
    • Handles parallel batch submission
    • Monitors batch status and retrieves results
  3. Template Repository (src/templates/template-repository.ts)

    • Stores metadata in SQLite database
    • Provides advanced search capabilities
    • Supports JSON extraction queries

Metadata Schema

Each template's metadata contains:

{
  categories: string[]           // Max 5 categories (e.g., "automation", "integration")
  complexity: "simple" | "medium" | "complex"
  use_cases: string[]           // Max 5 primary use cases
  estimated_setup_minutes: number // 5-480 minutes
  required_services: string[]    // External services needed
  key_features: string[]        // Max 5 main capabilities
  target_audience: string[]     // Max 3 target user types
}
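
The limits in the schema comments can be checked mechanically. The following is a minimal sketch in plain TypeScript, not the project's actual validation code (which, per the Error Handling section, relies on Zod). The names `TemplateMetadata`, `isStringArray`, and `validateMetadata` are hypothetical; only the field names and limits come from the schema above.

```typescript
// Hypothetical type names; fields and limits mirror the schema above.
type Complexity = "simple" | "medium" | "complex";

interface TemplateMetadata {
  categories: string[];
  complexity: Complexity;
  use_cases: string[];
  estimated_setup_minutes: number;
  required_services: string[];
  key_features: string[];
  target_audience: string[];
}

const COMPLEXITY_LEVELS: Complexity[] = ["simple", "medium", "complex"];

// True when value is an array of strings with at most maxLength entries.
function isStringArray(value: unknown, maxLength: number): value is string[] {
  return (
    Array.isArray(value) &&
    value.length <= maxLength &&
    value.every((item) => typeof item === "string")
  );
}

// Returns a list of problems; an empty list means the object fits the schema.
function validateMetadata(candidate: unknown): string[] {
  if (typeof candidate !== "object" || candidate === null) {
    return ["metadata must be an object"];
  }
  const m = candidate as Record<string, unknown>;
  const errors: string[] = [];
  if (!isStringArray(m.categories, 5)) errors.push("categories: at most 5 strings");
  if (COMPLEXITY_LEVELS.indexOf(m.complexity as Complexity) === -1) {
    errors.push("complexity: must be simple, medium, or complex");
  }
  if (!isStringArray(m.use_cases, 5)) errors.push("use_cases: at most 5 strings");
  const minutes = m.estimated_setup_minutes;
  if (typeof minutes !== "number" || minutes < 5 || minutes > 480) {
    errors.push("estimated_setup_minutes: number between 5 and 480");
  }
  if (!isStringArray(m.required_services, Infinity)) errors.push("required_services: strings");
  if (!isStringArray(m.key_features, 5)) errors.push("key_features: at most 5 strings");
  if (!isStringArray(m.target_audience, 3)) errors.push("target_audience: at most 3 strings");
  return errors;
}

// Illustrative sample values, not taken from a real template.
const sample: TemplateMetadata = {
  categories: ["automation", "integration"],
  complexity: "simple",
  use_cases: ["Notify a channel when a form is submitted"],
  estimated_setup_minutes: 15,
  required_services: ["slack"],
  key_features: ["Webhook trigger"],
  target_audience: ["marketers"],
};
```

Calling `validateMetadata(sample)` returns an empty array; an out-of-range value such as `estimated_setup_minutes: 2` produces a single error message.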

Generation Process

1. Initial Setup

# Set OpenAI API key in .env
OPENAI_API_KEY=your-api-key-here

2. Generate Metadata for Existing Templates

# Generate metadata only (no template fetching)
npm run fetch:templates -- --metadata-only

# Generate metadata during update
npm run fetch:templates -- --mode=update --generate-metadata

3. Batch Processing

The system uses OpenAI's batch API for cost-effective processing:

  • 50% cost reduction compared to synchronous API calls
  • 24-hour processing window for batch completion
  • Parallel batch submission for faster processing
  • Automatic retry for failed items
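
To make the batch workflow concrete, here is a rough sketch of how templates might be split into batch input files. OpenAI's Batch API takes a JSONL file in which each line carries a `custom_id`, `method`, `url`, and `body`. The helper names (`toBatchLine`, `chunk`, `buildBatchFiles`), the `template-<id>` ID format, and the use of the `/v1/chat/completions` endpoint are assumptions for illustration, not details confirmed by this document.

```typescript
// Hypothetical input shape: a template ID plus the prompt text to analyze.
interface TemplateInput {
  id: number;
  prompt: string;
}

// One line of the JSONL file uploaded to the Batch API.
interface BatchRequestLine {
  custom_id: string;
  method: "POST";
  url: "/v1/chat/completions"; // assumed endpoint
  body: {
    model: string;
    messages: { role: "system" | "user"; content: string }[];
  };
}

function toBatchLine(template: TemplateInput, model: string): BatchRequestLine {
  return {
    // custom_id lets each result be matched back to its template later.
    custom_id: `template-${template.id}`,
    method: "POST",
    url: "/v1/chat/completions",
    body: { model, messages: [{ role: "user", content: template.prompt }] },
  };
}

// Splits items into consecutive groups of at most `size` entries.
function chunk<T>(items: T[], size: number): T[][] {
  const groups: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Returns one JSONL document per batch, ready to be uploaded separately.
function buildBatchFiles(
  templates: TemplateInput[],
  model: string,
  batchSize: number
): string[] {
  return chunk(templates, batchSize).map((group) =>
    group.map((t) => JSON.stringify(toBatchLine(t, model))).join("\n")
  );
}
```

With 250 templates and a batch size of 100, `buildBatchFiles` yields three JSONL documents (100, 100, and 50 lines), which could then be submitted in parallel.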

Configuration Options

Environment variables:

  • OPENAI_API_KEY: Required for metadata generation
  • OPENAI_MODEL: Model to use (default: "gpt-4o-mini")
  • OPENAI_BATCH_SIZE: Templates per batch (default: 100, max: 500)
  • METADATA_LIMIT: Limit templates to process (for testing)
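
As an illustration of how these variables and their defaults fit together, the sketch below resolves a configuration object from an environment map. The function name `loadMetadataConfig` and the decision to clamp an oversized batch size to 500 (rather than reject it) are assumptions; the variable names and defaults are the ones listed above.

```typescript
interface MetadataConfig {
  apiKey: string;
  model: string;
  batchSize: number;
  limit: number | undefined; // undefined means "process all templates"
}

// Takes the environment as a parameter (e.g. process.env) so it is easy to test.
function loadMetadataConfig(env: Record<string, string | undefined>): MetadataConfig {
  const apiKey = env.OPENAI_API_KEY;
  if (!apiKey) {
    throw new Error("OPENAI_API_KEY is required for metadata generation");
  }

  // Default 100, documented maximum 500. Clamping is an assumed behavior.
  const requested = Number.parseInt(env.OPENAI_BATCH_SIZE ?? "100", 10);
  const batchSize = Number.isNaN(requested)
    ? 100
    : Math.min(Math.max(requested, 1), 500);

  const parsedLimit = Number.parseInt(env.METADATA_LIMIT ?? "", 10);
  const limit = Number.isNaN(parsedLimit) ? undefined : parsedLimit;

  return {
    apiKey,
    model: env.OPENAI_MODEL ?? "gpt-4o-mini",
    batchSize,
    limit,
  };
}
```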

How It Works

1. Template Analysis

For each template, the generator analyzes:

  • Template name and description
  • Node types and their frequency
  • Workflow structure and connections
  • Overall complexity

2. Node Summarization

Nodes are grouped into categories:

  • HTTP/Webhooks
  • Database operations
  • Communication (Slack, Email)
  • AI/ML operations
  • Spreadsheets
  • Service-specific nodes
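
A simple way to picture this grouping is keyword matching on the node type string. The sketch below is an assumption about how such a summarizer could work, not the project's implementation: the keyword lists, `categorizeNode`, and `summarizeNodes` are hypothetical, while the category labels come from the list above.

```typescript
// Hypothetical keyword rules; the first matching rule wins.
const CATEGORY_RULES: { label: string; keywords: string[] }[] = [
  { label: "HTTP/Webhooks", keywords: ["httprequest", "webhook"] },
  { label: "Database operations", keywords: ["postgres", "mysql", "mongodb"] },
  { label: "Communication", keywords: ["slack", "email", "gmail"] },
  { label: "AI/ML operations", keywords: ["openai", "langchain"] },
  { label: "Spreadsheets", keywords: ["googlesheets", "spreadsheet"] },
];

const FALLBACK_CATEGORY = "Service-specific nodes";

// Maps a node type such as "n8n-nodes-base.slack" to a category label.
function categorizeNode(nodeType: string): string {
  const normalized = nodeType.toLowerCase();
  for (const rule of CATEGORY_RULES) {
    if (rule.keywords.some((keyword) => normalized.indexOf(keyword) !== -1)) {
      return rule.label;
    }
  }
  return FALLBACK_CATEGORY;
}

// Counts nodes per category and renders a compact, prompt-friendly summary.
function summarizeNodes(nodeTypes: string[]): string {
  const counts: Record<string, number> = {};
  for (const nodeType of nodeTypes) {
    const category = categorizeNode(nodeType);
    counts[category] = (counts[category] ?? 0) + 1;
  }
  return Object.keys(counts)
    .map((category) => `${category} (${counts[category]})`)
    .join(", ");
}
```

Summarizing by category rather than listing every node keeps the prompt short, which matters when thousands of templates are processed.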

3. Metadata Generation

The AI model receives:

Template: [name]
Description: [description]
Nodes Used (X): [summarized node list]
Workflow has X nodes with Y connections

And generates structured metadata following the JSON schema.
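
The prompt layout shown above could be assembled along these lines. This is a sketch only: `TemplateSummary` and `buildPrompt` are hypothetical names, and it assumes both occurrences of X refer to the total node count.

```typescript
// Hypothetical shape holding the fields that appear in the prompt.
interface TemplateSummary {
  name: string;
  description: string;
  nodeSummary: string; // e.g. the output of a node summarization step
  nodeCount: number;
  connectionCount: number;
}

// Produces the four-line prompt body in the layout shown above.
function buildPrompt(template: TemplateSummary): string {
  return [
    `Template: ${template.name}`,
    `Description: ${template.description}`,
    `Nodes Used (${template.nodeCount}): ${template.nodeSummary}`,
    `Workflow has ${template.nodeCount} nodes with ${template.connectionCount} connections`,
  ].join("\n");
}
```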

4. Storage and Indexing

Metadata is stored as JSON text in SQLite and queried with json_extract; frequently filtered fields are indexed (see Database Schema) for fast lookups:

-- Example query for simple automation templates
SELECT * FROM templates 
WHERE json_extract(metadata, '$.complexity') = 'simple'
AND json_extract(metadata, '$.categories') LIKE '%automation%'

MCP Tool Integration

search_templates_by_metadata

Filters templates by any combination of the following parameters:

search_templates_by_metadata({
  category: "automation",           // Filter by category
  complexity: "simple",             // Skill level
  maxSetupMinutes: 30,             // Time constraint
  targetAudience: "marketers",     // Role-based
  requiredService: "slack"         // Service dependency
})
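
One plausible way to turn these parameters into a SQLite query is sketched below, reusing the json_extract pattern from the Storage and Indexing section. `MetadataFilters`, `SqlQuery`, and `buildMetadataQuery` are hypothetical names, and the actual repository code may build its queries differently. Values are passed as bound parameters rather than interpolated into the SQL string.

```typescript
interface MetadataFilters {
  category?: string;
  complexity?: "simple" | "medium" | "complex";
  maxSetupMinutes?: number;
  targetAudience?: string;
  requiredService?: string;
}

interface SqlQuery {
  sql: string;
  params: (string | number)[];
}

// Builds a parameterized query; only the filters that are set add a clause.
function buildMetadataQuery(filters: MetadataFilters): SqlQuery {
  const clauses: string[] = [];
  const params: (string | number)[] = [];

  if (filters.category !== undefined) {
    // Array fields are matched against their JSON text, as in the example query.
    clauses.push("json_extract(metadata, '$.categories') LIKE ?");
    params.push(`%${filters.category}%`);
  }
  if (filters.complexity !== undefined) {
    clauses.push("json_extract(metadata, '$.complexity') = ?");
    params.push(filters.complexity);
  }
  if (filters.maxSetupMinutes !== undefined) {
    clauses.push("json_extract(metadata, '$.estimated_setup_minutes') <= ?");
    params.push(filters.maxSetupMinutes);
  }
  if (filters.targetAudience !== undefined) {
    clauses.push("json_extract(metadata, '$.target_audience') LIKE ?");
    params.push(`%${filters.targetAudience}%`);
  }
  if (filters.requiredService !== undefined) {
    clauses.push("json_extract(metadata, '$.required_services') LIKE ?");
    params.push(`%${filters.requiredService}%`);
  }

  const where = clauses.length > 0 ? ` WHERE ${clauses.join(" AND ")}` : "";
  return { sql: `SELECT * FROM templates${where}`, params };
}
```

Note that a LIKE match on the JSON text of an array also matches substrings, so a filter of "mail" would match a "gmail" entry.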

list_templates

Now returns template descriptions and, optionally, full metadata:

list_templates({
  includeMetadata: true,  // Include full metadata
  limit: 20,
  offset: 0
})

Usage Examples

Finding Beginner-Friendly Templates

const templates = await search_templates_by_metadata({
  complexity: "simple",
  maxSetupMinutes: 15
});

Role-Specific Templates

const marketingTemplates = await search_templates_by_metadata({
  targetAudience: "marketers",
  category: "communication"
});

Service Integration Templates

const openaiTemplates = await search_templates_by_metadata({
  requiredService: "openai",
  complexity: "medium"
});

Performance Metrics

  • Coverage: 97.5% of templates have metadata (2,534/2,598)
  • Generation Time: ~2-4 hours for full database (using batch API)
  • Query Performance: <100ms for metadata searches
  • Storage Overhead: ~2MB additional database size

Troubleshooting

Common Issues

  1. Batch Processing Stuck

    • Check the batch status through the OpenAI Batch API (see Monitoring Batch Status)
    • Batches auto-expire after 24 hours
    • Monitor using the batch ID in logs
  2. Missing Metadata

    • ~2.5% of templates may fail metadata generation
    • Fallback defaults are provided
    • Regenerate missing metadata with the --metadata-only flag
  3. API Rate Limits

    • Batch API has generous limits (50,000 requests/batch)
    • Cost is 50% of synchronous API
    • Processing happens within 24-hour window

Monitoring Batch Status

# Check a batch's status; replace [batch-id] with the batch ID from the logs
curl https://api.openai.com/v1/batches/[batch-id] \
  -H "Authorization: Bearer $OPENAI_API_KEY"
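
The same check can be automated by polling until the batch reaches a terminal state. The sketch below assumes the status values reported by OpenAI's Batch API (`validating`, `in_progress`, `finalizing`, `completed`, `failed`, `expired`, `cancelling`, `cancelled`). `waitForBatch` is a hypothetical helper, and the `fetchBatch` and `sleep` functions are injected so the logic does not depend on a particular HTTP client.

```typescript
type BatchStatus =
  | "validating" | "in_progress" | "finalizing" | "completed"
  | "failed" | "expired" | "cancelling" | "cancelled";

interface BatchInfo {
  id: string;
  status: BatchStatus;
  request_counts?: { total: number; completed: number; failed: number };
}

const TERMINAL_STATUSES: BatchStatus[] = ["completed", "failed", "expired", "cancelled"];

// A terminal batch will not change state again, so polling can stop.
function isTerminal(status: BatchStatus): boolean {
  return TERMINAL_STATUSES.indexOf(status) !== -1;
}

// Polls until the batch is terminal or maxPolls is exhausted.
// fetchBatch would typically wrap GET /v1/batches/{id}.
async function waitForBatch(
  batchId: string,
  fetchBatch: (id: string) => Promise<BatchInfo>,
  sleep: (ms: number) => Promise<void>,
  intervalMs: number,
  maxPolls: number
): Promise<BatchInfo> {
  for (let attempt = 0; attempt < maxPolls; attempt++) {
    const batch = await fetchBatch(batchId);
    if (isTerminal(batch.status)) {
      return batch;
    }
    await sleep(intervalMs);
  }
  throw new Error(`Batch ${batchId} did not finish after ${maxPolls} polls`);
}
```

Because batches can take up to 24 hours, a polling interval of a minute or more is usually sufficient.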

Cost Analysis

Batch API Pricing (gpt-4o-mini)

  • Input: $0.075 per 1M tokens (50% of standard)
  • Output: $0.30 per 1M tokens (50% of standard)
  • Average template: ~300 input tokens, ~200 output tokens
  • Total cost for 2,500 templates: ~$0.50 or less (the averages above imply roughly $0.21)
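
The arithmetic behind a token-based estimate can be sketched as follows. Plugging in the averages and batch prices listed above yields about $0.21 for 2,500 templates, so the ~$0.50 total reads as a rounded upper estimate. The function and constant names are hypothetical.

```typescript
interface TokenPrices {
  inputPerMillion: number;  // USD per 1M input tokens
  outputPerMillion: number; // USD per 1M output tokens
}

// Batch prices for gpt-4o-mini as listed above.
const GPT_4O_MINI_BATCH: TokenPrices = { inputPerMillion: 0.075, outputPerMillion: 0.3 };

// Estimated cost in USD from per-template token averages.
function estimateCost(
  templateCount: number,
  avgInputTokens: number,
  avgOutputTokens: number,
  prices: TokenPrices
): number {
  const inputTokens = templateCount * avgInputTokens;
  const outputTokens = templateCount * avgOutputTokens;
  return (
    (inputTokens / 1e6) * prices.inputPerMillion +
    (outputTokens / 1e6) * prices.outputPerMillion
  );
}

// 750,000 input tokens and 500,000 output tokens:
// 0.75 * 0.075 + 0.5 * 0.30 = 0.05625 + 0.15 = 0.20625
const estimate = estimateCost(2500, 300, 200, GPT_4O_MINI_BATCH);
```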

Comparison with Synchronous API

  • Synchronous cost: ~$1.00 for same volume
  • Throughput: batches are submitted in parallel instead of one request at a time
  • Reliability: Automatic retries included

Future Enhancements

Planned Improvements

  1. Incremental Updates

    • Only generate metadata for new templates
    • Track metadata version for updates
  2. Enhanced Analysis

    • Workflow complexity scoring
    • Dependency graph analysis
    • Performance impact estimates
  3. User Feedback Loop

    • Collect accuracy feedback
    • Refine categorization over time
    • Community-driven corrections
  4. Alternative Models

    • Support for local LLMs
    • Claude API integration
    • Configurable model selection

Implementation Details

Database Schema

-- Metadata is stored as JSON text in a single column
ALTER TABLE templates ADD COLUMN metadata TEXT;

-- Indexes for common queries
CREATE INDEX idx_templates_complexity ON templates(
  json_extract(metadata, '$.complexity')
);
CREATE INDEX idx_templates_setup_time ON templates(
  json_extract(metadata, '$.estimated_setup_minutes')
);

Error Handling

The system handles four classes of errors:

  1. API Failures: Fallback to default metadata
  2. Parsing Errors: Logged with template ID
  3. Batch Failures: Individual item retry
  4. Validation Errors: Zod schema enforcement
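
The first two cases (fallback defaults and logging with the template ID) might look roughly like this. It is a sketch under stated assumptions: `DEFAULT_METADATA` uses placeholder values, since the actual defaults are not listed in this document, and `parseMetadataOrFallback` is a hypothetical name.

```typescript
interface Metadata {
  categories: string[];
  complexity: "simple" | "medium" | "complex";
  use_cases: string[];
  estimated_setup_minutes: number;
  required_services: string[];
  key_features: string[];
  target_audience: string[];
}

// Placeholder defaults for illustration only.
const DEFAULT_METADATA: Metadata = {
  categories: ["automation"],
  complexity: "medium",
  use_cases: [],
  estimated_setup_minutes: 30,
  required_services: [],
  key_features: [],
  target_audience: [],
};

// Parses a model response; on any failure, logs the template ID and
// returns a copy of the defaults so the template still gets metadata.
function parseMetadataOrFallback(
  templateId: number,
  raw: string,
  log: (message: string) => void
): Metadata {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) {
      throw new Error("response is not a JSON object");
    }
    const complexity = (parsed as Record<string, unknown>).complexity;
    if (complexity !== "simple" && complexity !== "medium" && complexity !== "complex") {
      throw new Error("missing or invalid complexity");
    }
    // A full implementation would validate every field before this cast.
    return parsed as Metadata;
  } catch (error) {
    const reason = error instanceof Error ? error.message : String(error);
    log(`Template ${templateId}: metadata parsing failed (${reason}); using defaults`);
    return { ...DEFAULT_METADATA };
  }
}
```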

Maintenance

Regenerating Metadata

# Full regeneration (caution: costs ~$0.50)
npm run fetch:templates -- --mode=rebuild --generate-metadata

# Partial regeneration (templates without metadata)
npm run fetch:templates -- --metadata-only

Database Backup

# Backup before regeneration
cp data/nodes.db data/nodes.db.backup

# Restore if needed
cp data/nodes.db.backup data/nodes.db

Security Considerations

  1. API Key Management

    • Store in .env file (gitignored)
    • Never commit API keys
    • Use environment variables in CI/CD
  2. Data Privacy

    • Only the template name, description, and node structure are sent to the API
    • No user data or credentials are included
    • Processing happens on OpenAI's infrastructure

Conclusion

The template metadata system moves template discovery from plain text search to multi-dimensional filtering by complexity, setup time, audience, service, and category. Using OpenAI's batch API keeps metadata generation inexpensive and scalable as the template library grows.