mirror of
https://github.com/czlonkowski/n8n-mcp.git
synced 2026-01-30 22:42:04 +00:00
Compare commits
8 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
48df87f76c | ||
|
|
540c5270c6 | ||
|
|
6210378687 | ||
|
|
8c2b1cfbbe | ||
|
|
d862f4961d | ||
|
|
2057f98e76 | ||
|
|
fff47f9f9d | ||
|
|
87cc84f593 |
4
.gitignore
vendored
4
.gitignore
vendored
@@ -89,6 +89,10 @@ docker-compose.override.yml
|
||||
temp/
|
||||
tmp/
|
||||
|
||||
# Batch processing error files (may contain API tokens from templates)
|
||||
docs/batch_*.jsonl
|
||||
**/batch_*_error.jsonl
|
||||
|
||||
# Database files
|
||||
# Database files - nodes.db is now tracked directly
|
||||
# data/*.db
|
||||
|
||||
336
MEMORY_TEMPLATE_UPDATE.md
Normal file
336
MEMORY_TEMPLATE_UPDATE.md
Normal file
@@ -0,0 +1,336 @@
|
||||
# Template Update Process - Quick Reference
|
||||
|
||||
## Overview
|
||||
|
||||
The n8n-mcp project maintains a database of workflow templates from n8n.io. This guide explains how to update the template database incrementally without rebuilding from scratch.
|
||||
|
||||
## Current Database State
|
||||
|
||||
As of the last update:
|
||||
- **2,598 templates** in database
|
||||
- Templates from the last 12 months
|
||||
- Latest template: September 12, 2025
|
||||
|
||||
## Quick Commands
|
||||
|
||||
### Incremental Update (Recommended)
|
||||
```bash
|
||||
# Build if needed
|
||||
npm run build
|
||||
|
||||
# Fetch only NEW templates (5-10 minutes)
|
||||
npm run fetch:templates:update
|
||||
```
|
||||
|
||||
### Full Rebuild (Rare)
|
||||
```bash
|
||||
# Rebuild entire database from scratch (30-40 minutes)
|
||||
npm run fetch:templates
|
||||
```
|
||||
|
||||
## How It Works
|
||||
|
||||
### Incremental Update Mode (`--update`)
|
||||
|
||||
The incremental update is **smart and efficient**:
|
||||
|
||||
1. **Loads existing template IDs** from database (~2,598 templates)
|
||||
2. **Fetches template list** from n8n.io API (all templates from last 12 months)
|
||||
3. **Filters** to find only NEW templates not in database
|
||||
4. **Fetches details** for new templates only (saves time and API calls)
|
||||
5. **Saves** new templates to database (existing ones untouched)
|
||||
6. **Rebuilds FTS5** search index for new templates
|
||||
|
||||
### Key Benefits
|
||||
|
||||
✅ **Non-destructive**: All existing templates preserved
|
||||
✅ **Fast**: Only fetches new templates (5-10 min vs 30-40 min)
|
||||
✅ **API friendly**: Reduces load on n8n.io API
|
||||
✅ **Safe**: Preserves AI-generated metadata
|
||||
✅ **Smart**: Automatically skips duplicates
|
||||
|
||||
## Performance Comparison
|
||||
|
||||
| Mode | Templates Fetched | Time | Use Case |
|
||||
|------|------------------|------|----------|
|
||||
| **Update** | Only new (~50-200) | 5-10 min | Regular updates |
|
||||
| **Rebuild** | All (~8000+) | 30-40 min | Initial setup or corruption |
|
||||
|
||||
## Command Options
|
||||
|
||||
### Basic Update
|
||||
```bash
|
||||
npm run fetch:templates:update
|
||||
```
|
||||
|
||||
### Full Rebuild
|
||||
```bash
|
||||
npm run fetch:templates
|
||||
```
|
||||
|
||||
### With Metadata Generation
|
||||
```bash
|
||||
# Update templates and generate AI metadata
|
||||
npm run fetch:templates -- --update --generate-metadata
|
||||
|
||||
# Or just generate metadata for existing templates
|
||||
npm run fetch:templates -- --metadata-only
|
||||
```
|
||||
|
||||
### Help
|
||||
```bash
|
||||
npm run fetch:templates -- --help
|
||||
```
|
||||
|
||||
## Update Frequency
|
||||
|
||||
Recommended update schedule:
|
||||
- **Weekly**: Run incremental update to get latest templates
|
||||
- **Monthly**: Review database statistics
|
||||
- **As needed**: Rebuild only if database corruption suspected
|
||||
|
||||
## Template Filtering
|
||||
|
||||
The fetcher automatically filters templates:
|
||||
- ✅ **Includes**: Templates from last 12 months
|
||||
- ✅ **Includes**: Templates with >10 views
|
||||
- ❌ **Excludes**: Templates with ≤10 views (too niche)
|
||||
- ❌ **Excludes**: Templates older than 12 months
|
||||
|
||||
## Workflow
|
||||
|
||||
### Regular Update Workflow
|
||||
|
||||
```bash
|
||||
# 1. Check current state
|
||||
sqlite3 data/nodes.db "SELECT COUNT(*) FROM templates"
|
||||
|
||||
# 2. Build project (if code changed)
|
||||
npm run build
|
||||
|
||||
# 3. Run incremental update
|
||||
npm run fetch:templates:update
|
||||
|
||||
# 4. Verify new templates added
|
||||
sqlite3 data/nodes.db "SELECT COUNT(*) FROM templates"
|
||||
```
|
||||
|
||||
### After n8n Dependency Update
|
||||
|
||||
When you update n8n dependencies, templates remain compatible:
|
||||
```bash
|
||||
# 1. Update n8n (from MEMORY_N8N_UPDATE.md)
|
||||
npm run update:all
|
||||
|
||||
# 2. Fetch new templates incrementally
|
||||
npm run fetch:templates:update
|
||||
|
||||
# 3. Check how many templates were added
|
||||
sqlite3 data/nodes.db "SELECT COUNT(*) FROM templates"
|
||||
|
||||
# 4. Generate AI metadata for new templates (optional, requires OPENAI_API_KEY)
|
||||
npm run fetch:templates -- --metadata-only
|
||||
|
||||
# 5. IMPORTANT: Sanitize templates before pushing database
|
||||
npm run build
|
||||
npm run sanitize:templates
|
||||
```
|
||||
|
||||
Templates are independent of n8n version - they're just workflow JSON data.
|
||||
|
||||
**CRITICAL**: Always run `npm run sanitize:templates` before pushing the database to remove API tokens from template workflows.
|
||||
|
||||
**Note**: New templates fetched via `--update` mode will NOT have AI-generated metadata by default. You need to run `--metadata-only` separately to generate metadata for templates that don't have it yet.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### No New Templates Found
|
||||
|
||||
This is normal! It means:
|
||||
- All recent templates are already in your database
|
||||
- n8n.io hasn't published many new templates recently
|
||||
- Your database is up to date
|
||||
|
||||
```bash
|
||||
📊 Update mode: 0 new templates to fetch (skipping 2598 existing)
|
||||
✅ All templates already have metadata
|
||||
```
|
||||
|
||||
### API Rate Limiting
|
||||
|
||||
If you hit rate limits:
|
||||
- The fetcher includes built-in delays (150ms between requests)
|
||||
- Wait a few minutes and try again
|
||||
- Use `--update` mode instead of full rebuild
|
||||
|
||||
### Database Corruption
|
||||
|
||||
If you suspect corruption:
|
||||
```bash
|
||||
# Full rebuild from scratch
|
||||
npm run fetch:templates
|
||||
|
||||
# This will:
|
||||
# - Drop and recreate templates table
|
||||
# - Fetch all templates fresh
|
||||
# - Rebuild search indexes
|
||||
```
|
||||
|
||||
## Database Schema
|
||||
|
||||
Templates are stored with:
|
||||
- Basic info (id, name, description, author, views, created_at)
|
||||
- Node types used (JSON array)
|
||||
- Complete workflow (gzip compressed, base64 encoded)
|
||||
- AI-generated metadata (optional, requires OpenAI API key)
|
||||
- FTS5 search index for fast text search
|
||||
|
||||
## Metadata Generation
|
||||
|
||||
Generate AI metadata for templates:
|
||||
```bash
|
||||
# Requires OPENAI_API_KEY in .env
|
||||
export OPENAI_API_KEY="sk-..."
|
||||
|
||||
# Generate for templates without metadata (recommended after incremental update)
|
||||
npm run fetch:templates -- --metadata-only
|
||||
|
||||
# Generate during template fetch (slower, but automatic)
|
||||
npm run fetch:templates:update -- --generate-metadata
|
||||
```
|
||||
|
||||
**Important**: Incremental updates (`--update`) do NOT generate metadata by default. After running `npm run fetch:templates:update`, you'll have new templates without metadata. Run `--metadata-only` separately to generate metadata for them.
|
||||
|
||||
### Check Metadata Coverage
|
||||
|
||||
```bash
|
||||
# See how many templates have metadata
|
||||
sqlite3 data/nodes.db "SELECT
|
||||
COUNT(*) as total,
|
||||
SUM(CASE WHEN metadata_json IS NOT NULL THEN 1 ELSE 0 END) as with_metadata,
|
||||
SUM(CASE WHEN metadata_json IS NULL THEN 1 ELSE 0 END) as without_metadata
|
||||
FROM templates"
|
||||
|
||||
# See recent templates without metadata
|
||||
sqlite3 data/nodes.db "SELECT id, name, created_at
|
||||
FROM templates
|
||||
WHERE metadata_json IS NULL
|
||||
ORDER BY created_at DESC
|
||||
LIMIT 10"
|
||||
```
|
||||
|
||||
Metadata includes:
|
||||
- Categories
|
||||
- Complexity level (simple/medium/complex)
|
||||
- Use cases
|
||||
- Estimated setup time
|
||||
- Required services
|
||||
- Key features
|
||||
- Target audience
|
||||
|
||||
### Metadata Generation Troubleshooting
|
||||
|
||||
If metadata generation fails:
|
||||
|
||||
1. **Check error file**: Errors are saved to `temp/batch/batch_*_error.jsonl`
|
||||
2. **Common issues**:
|
||||
- `"Unsupported value: 'temperature'"` - Model doesn't support custom temperature
|
||||
- `"Invalid request"` - Check OPENAI_API_KEY is valid
|
||||
- Model availability issues
|
||||
3. **Model**: Uses `gpt-5-mini-2025-08-07` by default
|
||||
4. **Token limit**: 3000 tokens per request for detailed metadata
|
||||
|
||||
The system will automatically:
|
||||
- Process error files and assign default metadata to failed templates
|
||||
- Save error details for debugging
|
||||
- Continue processing even if some templates fail
|
||||
|
||||
**Example error handling**:
|
||||
```bash
|
||||
# If you see: "No output file available for batch job"
|
||||
# Check: temp/batch/batch_*_error.jsonl for error details
|
||||
# The system now automatically processes errors and generates default metadata
|
||||
```
|
||||
|
||||
## Environment Variables
|
||||
|
||||
Optional configuration:
|
||||
```bash
|
||||
# OpenAI for metadata generation
|
||||
OPENAI_API_KEY=sk-...
|
||||
OPENAI_MODEL=gpt-4o-mini # Default model
|
||||
OPENAI_BATCH_SIZE=50 # Batch size for metadata generation
|
||||
|
||||
# Metadata generation limits
|
||||
METADATA_LIMIT=100 # Max templates to process (0 = all)
|
||||
```
|
||||
|
||||
## Statistics
|
||||
|
||||
After update, check stats:
|
||||
```bash
|
||||
# Template count
|
||||
sqlite3 data/nodes.db "SELECT COUNT(*) FROM templates"
|
||||
|
||||
# Most recent template
|
||||
sqlite3 data/nodes.db "SELECT MAX(created_at) FROM templates"
|
||||
|
||||
# Templates by view count
|
||||
sqlite3 data/nodes.db "SELECT COUNT(*),
|
||||
CASE
|
||||
WHEN views < 50 THEN '<50'
|
||||
WHEN views < 100 THEN '50-100'
|
||||
WHEN views < 500 THEN '100-500'
|
||||
ELSE '500+'
|
||||
END as view_range
|
||||
FROM templates GROUP BY view_range"
|
||||
```
|
||||
|
||||
## Integration with n8n-mcp
|
||||
|
||||
Templates are available through MCP tools:
|
||||
- `list_templates`: List all templates
|
||||
- `get_template`: Get specific template with workflow
|
||||
- `search_templates`: Search by keyword
|
||||
- `list_node_templates`: Templates using specific nodes
|
||||
- `get_templates_for_task`: Templates for common tasks
|
||||
- `search_templates_by_metadata`: Advanced filtering
|
||||
|
||||
See `npm run test:templates` for usage examples.
|
||||
|
||||
## Time Estimates
|
||||
|
||||
Typical incremental update:
|
||||
- Loading existing IDs: 1-2 seconds
|
||||
- Fetching template list: 2-3 minutes
|
||||
- Filtering new templates: instant
|
||||
- Fetching details for 100 new templates: ~15 seconds (0.15s each)
|
||||
- Saving and indexing: 5-10 seconds
|
||||
- **Total: 3-5 minutes**
|
||||
|
||||
Full rebuild:
|
||||
- Fetching 8000+ templates: 25-30 minutes
|
||||
- Saving and indexing: 5-10 minutes
|
||||
- **Total: 30-40 minutes**
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Use incremental updates** for regular maintenance
|
||||
2. **Rebuild only when necessary** (corruption, major changes)
|
||||
3. **Generate metadata incrementally** to avoid OpenAI costs
|
||||
4. **Monitor template count** to verify updates working
|
||||
5. **Keep database backed up** before major operations
|
||||
|
||||
## Next Steps
|
||||
|
||||
After updating templates:
|
||||
1. Test template search: `npm run test:templates`
|
||||
2. Verify MCP tools work: Test in Claude Desktop
|
||||
3. Check statistics in database
|
||||
4. Commit changes if desired (database changes)
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- `MEMORY_N8N_UPDATE.md` - Updating n8n dependencies
|
||||
- `CLAUDE.md` - Project overview and architecture
|
||||
- `README.md` - User documentation
|
||||
@@ -5,7 +5,7 @@
|
||||
[](https://www.npmjs.com/package/n8n-mcp)
|
||||
[](https://codecov.io/gh/czlonkowski/n8n-mcp)
|
||||
[](https://github.com/czlonkowski/n8n-mcp/actions)
|
||||
[](https://github.com/n8n-io/n8n)
|
||||
[](https://github.com/n8n-io/n8n)
|
||||
[](https://github.com/czlonkowski/n8n-mcp/pkgs/container/n8n-mcp)
|
||||
[](https://railway.com/deploy/n8n-mcp?referralCode=n8n-mcp)
|
||||
|
||||
|
||||
BIN
data/nodes.db
BIN
data/nodes.db
Binary file not shown.
1809
package-lock.json
generated
1809
package-lock.json
generated
File diff suppressed because it is too large
Load Diff
11
package.json
11
package.json
@@ -1,6 +1,6 @@
|
||||
{
|
||||
"name": "n8n-mcp",
|
||||
"version": "2.14.2",
|
||||
"version": "2.14.3",
|
||||
"description": "Integration between n8n workflow automation and Model Context Protocol (MCP)",
|
||||
"main": "dist/index.js",
|
||||
"bin": {
|
||||
@@ -37,6 +37,7 @@
|
||||
"update:n8n": "node scripts/update-n8n-deps.js",
|
||||
"update:n8n:check": "node scripts/update-n8n-deps.js --dry-run",
|
||||
"fetch:templates": "node dist/scripts/fetch-templates.js",
|
||||
"fetch:templates:update": "node dist/scripts/fetch-templates.js --update",
|
||||
"fetch:templates:robust": "node dist/scripts/fetch-templates-robust.js",
|
||||
"prebuild:fts5": "npx tsx scripts/prebuild-fts5.ts",
|
||||
"test:templates": "node dist/scripts/test-templates.js",
|
||||
@@ -128,14 +129,14 @@
|
||||
},
|
||||
"dependencies": {
|
||||
"@modelcontextprotocol/sdk": "^1.13.2",
|
||||
"@n8n/n8n-nodes-langchain": "^1.111.1",
|
||||
"@n8n/n8n-nodes-langchain": "^1.112.2",
|
||||
"@supabase/supabase-js": "^2.57.4",
|
||||
"dotenv": "^16.5.0",
|
||||
"express": "^5.1.0",
|
||||
"lru-cache": "^11.2.1",
|
||||
"n8n": "^1.112.3",
|
||||
"n8n-core": "^1.111.0",
|
||||
"n8n-workflow": "^1.109.0",
|
||||
"n8n": "^1.113.3",
|
||||
"n8n-core": "^1.112.1",
|
||||
"n8n-workflow": "^1.110.0",
|
||||
"openai": "^4.77.0",
|
||||
"sql.js": "^1.13.0",
|
||||
"uuid": "^10.0.0",
|
||||
|
||||
@@ -2,32 +2,50 @@
|
||||
import { createDatabaseAdapter } from '../database/database-adapter';
|
||||
import { logger } from '../utils/logger';
|
||||
import { TemplateSanitizer } from '../utils/template-sanitizer';
|
||||
import { gunzipSync, gzipSync } from 'zlib';
|
||||
|
||||
async function sanitizeTemplates() {
|
||||
console.log('🧹 Sanitizing workflow templates in database...\n');
|
||||
|
||||
|
||||
const db = await createDatabaseAdapter('./data/nodes.db');
|
||||
const sanitizer = new TemplateSanitizer();
|
||||
|
||||
|
||||
try {
|
||||
// Get all templates
|
||||
const templates = db.prepare('SELECT id, name, workflow_json FROM templates').all() as any[];
|
||||
// Get all templates - check both old and new format
|
||||
const templates = db.prepare('SELECT id, name, workflow_json, workflow_json_compressed FROM templates').all() as any[];
|
||||
console.log(`Found ${templates.length} templates to check\n`);
|
||||
|
||||
|
||||
let sanitizedCount = 0;
|
||||
const problematicTemplates: any[] = [];
|
||||
|
||||
|
||||
for (const template of templates) {
|
||||
if (!template.workflow_json) {
|
||||
continue; // Skip templates without workflow data
|
||||
let originalWorkflow: any = null;
|
||||
let useCompressed = false;
|
||||
|
||||
// Try compressed format first (newer format)
|
||||
if (template.workflow_json_compressed) {
|
||||
try {
|
||||
const buffer = Buffer.from(template.workflow_json_compressed, 'base64');
|
||||
const decompressed = gunzipSync(buffer).toString('utf-8');
|
||||
originalWorkflow = JSON.parse(decompressed);
|
||||
useCompressed = true;
|
||||
} catch (e) {
|
||||
console.log(`⚠️ Failed to decompress template ${template.id}, trying uncompressed`);
|
||||
}
|
||||
}
|
||||
|
||||
let originalWorkflow;
|
||||
try {
|
||||
originalWorkflow = JSON.parse(template.workflow_json);
|
||||
} catch (e) {
|
||||
console.log(`⚠️ Skipping template ${template.id}: Invalid JSON`);
|
||||
continue;
|
||||
// Fall back to uncompressed format (deprecated)
|
||||
if (!originalWorkflow && template.workflow_json) {
|
||||
try {
|
||||
originalWorkflow = JSON.parse(template.workflow_json);
|
||||
} catch (e) {
|
||||
console.log(`⚠️ Skipping template ${template.id}: Invalid JSON in both formats`);
|
||||
continue;
|
||||
}
|
||||
}
|
||||
|
||||
if (!originalWorkflow) {
|
||||
continue; // Skip templates without workflow data
|
||||
}
|
||||
|
||||
const { sanitized: sanitizedWorkflow, wasModified } = sanitizer.sanitizeWorkflow(originalWorkflow);
|
||||
@@ -35,18 +53,24 @@ async function sanitizeTemplates() {
|
||||
if (wasModified) {
|
||||
// Get detected tokens for reporting
|
||||
const detectedTokens = sanitizer.detectTokens(originalWorkflow);
|
||||
|
||||
// Update the template with sanitized version
|
||||
const stmt = db.prepare('UPDATE templates SET workflow_json = ? WHERE id = ?');
|
||||
stmt.run(JSON.stringify(sanitizedWorkflow), template.id);
|
||||
|
||||
|
||||
// Update the template with sanitized version in the same format
|
||||
if (useCompressed) {
|
||||
const compressed = gzipSync(JSON.stringify(sanitizedWorkflow)).toString('base64');
|
||||
const stmt = db.prepare('UPDATE templates SET workflow_json_compressed = ? WHERE id = ?');
|
||||
stmt.run(compressed, template.id);
|
||||
} else {
|
||||
const stmt = db.prepare('UPDATE templates SET workflow_json = ? WHERE id = ?');
|
||||
stmt.run(JSON.stringify(sanitizedWorkflow), template.id);
|
||||
}
|
||||
|
||||
sanitizedCount++;
|
||||
problematicTemplates.push({
|
||||
id: template.id,
|
||||
name: template.name,
|
||||
tokens: detectedTokens
|
||||
});
|
||||
|
||||
|
||||
console.log(`✅ Sanitized template ${template.id}: ${template.name}`);
|
||||
detectedTokens.forEach(token => {
|
||||
console.log(` - Found: ${token.substring(0, 20)}...`);
|
||||
|
||||
@@ -258,85 +258,132 @@ export class BatchProcessor {
|
||||
}
|
||||
|
||||
/**
|
||||
* Monitor batch job with exponential backoff
|
||||
* Monitor batch job with fixed 1-minute polling interval
|
||||
*/
|
||||
private async monitorBatchJob(batchId: string): Promise<any> {
|
||||
// Start with shorter wait times for better UX
|
||||
const waitTimes = [30, 60, 120, 300, 600, 900, 1800]; // Progressive wait times in seconds
|
||||
let waitIndex = 0;
|
||||
const pollInterval = 60; // Check every 60 seconds (1 minute)
|
||||
let attempts = 0;
|
||||
const maxAttempts = 100; // Safety limit
|
||||
const maxAttempts = 120; // 120 minutes max (2 hours)
|
||||
const startTime = Date.now();
|
||||
let lastStatus = '';
|
||||
|
||||
|
||||
while (attempts < maxAttempts) {
|
||||
const batchJob = await this.client.batches.retrieve(batchId);
|
||||
|
||||
// Only log if status changed
|
||||
const elapsedMinutes = Math.floor((Date.now() - startTime) / 60000);
|
||||
|
||||
// Log status on every check (not just on change)
|
||||
const statusSymbol = batchJob.status === 'in_progress' ? '⚙️' :
|
||||
batchJob.status === 'finalizing' ? '📦' :
|
||||
batchJob.status === 'validating' ? '🔍' :
|
||||
batchJob.status === 'completed' ? '✅' :
|
||||
batchJob.status === 'failed' ? '❌' : '⏳';
|
||||
|
||||
console.log(` ${statusSymbol} Batch ${batchId.slice(-8)}: ${batchJob.status} (${elapsedMinutes} min, check ${attempts + 1})`);
|
||||
|
||||
if (batchJob.status !== lastStatus) {
|
||||
const elapsedMinutes = Math.floor((Date.now() - startTime) / 60000);
|
||||
const statusSymbol = batchJob.status === 'in_progress' ? '⚙️' :
|
||||
batchJob.status === 'finalizing' ? '📦' :
|
||||
batchJob.status === 'validating' ? '🔍' : '⏳';
|
||||
|
||||
console.log(` ${statusSymbol} Batch ${batchId.slice(-8)}: ${batchJob.status} (${elapsedMinutes} min)`);
|
||||
logger.info(`Batch ${batchId} status changed: ${lastStatus} -> ${batchJob.status}`);
|
||||
lastStatus = batchJob.status;
|
||||
}
|
||||
|
||||
logger.debug(`Batch ${batchId} status: ${batchJob.status} (attempt ${attempts + 1})`);
|
||||
|
||||
|
||||
if (batchJob.status === 'completed') {
|
||||
const elapsedMinutes = Math.floor((Date.now() - startTime) / 60000);
|
||||
console.log(` ✅ Batch ${batchId.slice(-8)} completed in ${elapsedMinutes} minutes`);
|
||||
console.log(` ✅ Batch ${batchId.slice(-8)} completed successfully in ${elapsedMinutes} minutes`);
|
||||
logger.info(`Batch job ${batchId} completed successfully`);
|
||||
return batchJob;
|
||||
}
|
||||
|
||||
|
||||
if (['failed', 'expired', 'cancelled'].includes(batchJob.status)) {
|
||||
logger.error(`Batch job ${batchId} failed with status: ${batchJob.status}`);
|
||||
throw new Error(`Batch job failed with status: ${batchJob.status}`);
|
||||
}
|
||||
|
||||
// Wait before next check
|
||||
const waitTime = waitTimes[Math.min(waitIndex, waitTimes.length - 1)];
|
||||
logger.debug(`Waiting ${waitTime} seconds before next check...`);
|
||||
await this.sleep(waitTime * 1000);
|
||||
|
||||
waitIndex = Math.min(waitIndex + 1, waitTimes.length - 1);
|
||||
|
||||
// Wait before next check (always 1 minute)
|
||||
logger.debug(`Waiting ${pollInterval} seconds before next check...`);
|
||||
await this.sleep(pollInterval * 1000);
|
||||
|
||||
attempts++;
|
||||
}
|
||||
|
||||
throw new Error(`Batch job monitoring timed out after ${maxAttempts} attempts`);
|
||||
|
||||
throw new Error(`Batch job monitoring timed out after ${maxAttempts} minutes`);
|
||||
}
|
||||
|
||||
/**
|
||||
* Retrieve and parse results
|
||||
*/
|
||||
private async retrieveResults(batchJob: any): Promise<MetadataResult[]> {
|
||||
if (!batchJob.output_file_id) {
|
||||
throw new Error('No output file available for batch job');
|
||||
}
|
||||
|
||||
// Download result file
|
||||
const fileResponse = await this.client.files.content(batchJob.output_file_id);
|
||||
const fileContent = await fileResponse.text();
|
||||
|
||||
// Parse JSONL results
|
||||
const results: MetadataResult[] = [];
|
||||
const lines = fileContent.trim().split('\n');
|
||||
|
||||
for (const line of lines) {
|
||||
if (!line) continue;
|
||||
|
||||
|
||||
// Check if we have an output file (successful results)
|
||||
if (batchJob.output_file_id) {
|
||||
const fileResponse = await this.client.files.content(batchJob.output_file_id);
|
||||
const fileContent = await fileResponse.text();
|
||||
|
||||
const lines = fileContent.trim().split('\n');
|
||||
for (const line of lines) {
|
||||
if (!line) continue;
|
||||
try {
|
||||
const result = JSON.parse(line);
|
||||
const parsed = this.generator.parseResult(result);
|
||||
results.push(parsed);
|
||||
} catch (error) {
|
||||
logger.error('Error parsing result line:', error);
|
||||
}
|
||||
}
|
||||
logger.info(`Retrieved ${results.length} successful results from batch job`);
|
||||
}
|
||||
|
||||
// Check if we have an error file (failed results)
|
||||
if (batchJob.error_file_id) {
|
||||
logger.warn(`Batch job has error file: ${batchJob.error_file_id}`);
|
||||
|
||||
try {
|
||||
const result = JSON.parse(line);
|
||||
const parsed = this.generator.parseResult(result);
|
||||
results.push(parsed);
|
||||
const errorResponse = await this.client.files.content(batchJob.error_file_id);
|
||||
const errorContent = await errorResponse.text();
|
||||
|
||||
// Save error file locally for debugging
|
||||
const errorFilePath = path.join(this.outputDir, `batch_${batchJob.id}_error.jsonl`);
|
||||
fs.writeFileSync(errorFilePath, errorContent);
|
||||
logger.warn(`Error file saved to: ${errorFilePath}`);
|
||||
|
||||
// Parse errors and create default metadata for failed templates
|
||||
const errorLines = errorContent.trim().split('\n');
|
||||
logger.warn(`Found ${errorLines.length} failed requests in error file`);
|
||||
|
||||
for (const line of errorLines) {
|
||||
if (!line) continue;
|
||||
try {
|
||||
const errorResult = JSON.parse(line);
|
||||
const templateId = parseInt(errorResult.custom_id?.replace('template-', '') || '0');
|
||||
|
||||
if (templateId > 0) {
|
||||
const errorMessage = errorResult.response?.body?.error?.message ||
|
||||
errorResult.error?.message ||
|
||||
'Unknown error';
|
||||
|
||||
logger.debug(`Template ${templateId} failed: ${errorMessage}`);
|
||||
|
||||
// Use getDefaultMetadata() from generator (it's private but accessible via bracket notation)
|
||||
const defaultMeta = (this.generator as any).getDefaultMetadata();
|
||||
results.push({
|
||||
templateId,
|
||||
metadata: defaultMeta,
|
||||
error: errorMessage
|
||||
});
|
||||
}
|
||||
} catch (parseError) {
|
||||
logger.error('Error parsing error line:', parseError);
|
||||
}
|
||||
}
|
||||
} catch (error) {
|
||||
logger.error('Error parsing result line:', error);
|
||||
logger.error('Failed to process error file:', error);
|
||||
}
|
||||
}
|
||||
|
||||
logger.info(`Retrieved ${results.length} results from batch job`);
|
||||
|
||||
// If we have no results at all, something is very wrong
|
||||
if (results.length === 0 && !batchJob.output_file_id && !batchJob.error_file_id) {
|
||||
throw new Error('No output file or error file available for batch job');
|
||||
}
|
||||
|
||||
logger.info(`Total results (successful + failed): ${results.length}`);
|
||||
return results;
|
||||
}
|
||||
|
||||
|
||||
@@ -34,7 +34,7 @@ export class MetadataGenerator {
|
||||
private client: OpenAI;
|
||||
private model: string;
|
||||
|
||||
constructor(apiKey: string, model: string = 'gpt-4o-mini') {
|
||||
constructor(apiKey: string, model: string = 'gpt-5-mini-2025-08-07') {
|
||||
this.client = new OpenAI({ apiKey });
|
||||
this.model = model;
|
||||
}
|
||||
@@ -131,8 +131,8 @@ export class MetadataGenerator {
|
||||
url: '/v1/chat/completions',
|
||||
body: {
|
||||
model: this.model,
|
||||
temperature: 0.3, // Lower temperature for more consistent structured outputs
|
||||
max_completion_tokens: 1000,
|
||||
// temperature removed - batch API only supports default (1.0) for this model
|
||||
max_completion_tokens: 3000,
|
||||
response_format: {
|
||||
type: 'json_schema',
|
||||
json_schema: this.getJsonSchema()
|
||||
@@ -288,8 +288,8 @@ export class MetadataGenerator {
|
||||
try {
|
||||
const completion = await this.client.chat.completions.create({
|
||||
model: this.model,
|
||||
temperature: 0.3, // Lower temperature for more consistent structured outputs
|
||||
max_completion_tokens: 1000,
|
||||
// temperature removed - not supported in batch API for this model
|
||||
max_completion_tokens: 3000,
|
||||
response_format: {
|
||||
type: 'json_schema',
|
||||
json_schema: this.getJsonSchema()
|
||||
|
||||
@@ -19,11 +19,17 @@ export const defaultSanitizerConfig: SanitizerConfig = {
|
||||
tokenPatterns: [
|
||||
/apify_api_[A-Za-z0-9]+/g,
|
||||
/sk-[A-Za-z0-9]+/g, // OpenAI tokens
|
||||
/pat[A-Za-z0-9_]{40,}/g, // Airtable Personal Access Tokens
|
||||
/ghp_[A-Za-z0-9]{36,}/g, // GitHub Personal Access Tokens
|
||||
/gho_[A-Za-z0-9]{36,}/g, // GitHub OAuth tokens
|
||||
/Bearer\s+[A-Za-z0-9\-._~+\/]+=*/g // Generic bearer tokens
|
||||
],
|
||||
replacements: new Map([
|
||||
['apify_api_', 'apify_api_YOUR_TOKEN_HERE'],
|
||||
['sk-', 'sk-YOUR_OPENAI_KEY_HERE'],
|
||||
['pat', 'patYOUR_AIRTABLE_TOKEN_HERE'],
|
||||
['ghp_', 'ghp_YOUR_GITHUB_TOKEN_HERE'],
|
||||
['gho_', 'gho_YOUR_GITHUB_TOKEN_HERE'],
|
||||
['Bearer ', 'Bearer YOUR_TOKEN_HERE']
|
||||
])
|
||||
};
|
||||
|
||||
@@ -71,7 +71,7 @@ describe('BatchProcessor', () => {
|
||||
|
||||
options = {
|
||||
apiKey: 'test-api-key',
|
||||
model: 'gpt-4o-mini',
|
||||
model: 'gpt-5-mini-2025-08-07',
|
||||
batchSize: 3,
|
||||
outputDir: './test-temp'
|
||||
};
|
||||
@@ -177,13 +177,38 @@ describe('BatchProcessor', () => {
|
||||
|
||||
it('should handle batch submission errors gracefully', async () => {
|
||||
mockClient.files.create.mockRejectedValue(new Error('Upload failed'));
|
||||
|
||||
|
||||
const results = await processor.processTemplates([mockTemplates[0]]);
|
||||
|
||||
|
||||
// Should not throw, should return empty results
|
||||
expect(results.size).toBe(0);
|
||||
});
|
||||
|
||||
it('should log submission errors to console and logger', async () => {
|
||||
const consoleErrorSpy = vi.spyOn(console, 'error');
|
||||
const { logger } = await import('../../../src/utils/logger');
|
||||
const loggerErrorSpy = vi.spyOn(logger, 'error');
|
||||
|
||||
mockClient.files.create.mockRejectedValue(new Error('Network error'));
|
||||
|
||||
await processor.processTemplates([mockTemplates[0]]);
|
||||
|
||||
// Should log error to console (actual format from line 95: " ❌ Batch N failed:", error)
|
||||
expect(consoleErrorSpy).toHaveBeenCalledWith(
|
||||
expect.stringContaining('Batch'),
|
||||
expect.objectContaining({ message: 'Network error' })
|
||||
);
|
||||
|
||||
// Should also log to logger (line 94)
|
||||
expect(loggerErrorSpy).toHaveBeenCalledWith(
|
||||
expect.stringMatching(/Error processing batch/),
|
||||
expect.objectContaining({ message: 'Network error' })
|
||||
);
|
||||
|
||||
consoleErrorSpy.mockRestore();
|
||||
loggerErrorSpy.mockRestore();
|
||||
});
|
||||
|
||||
// Skipping: Parallel batch processing creates unhandled promise rejections in tests
|
||||
// The error handling works in production but the parallel promise structure is
|
||||
// difficult to test cleanly without refactoring the implementation
|
||||
@@ -368,7 +393,7 @@ describe('BatchProcessor', () => {
|
||||
it('should download and parse results correctly', async () => {
|
||||
const batchJob = { output_file_id: 'output-123' };
|
||||
const fileContent = '{"custom_id": "template-1"}\n{"custom_id": "template-2"}';
|
||||
|
||||
|
||||
mockClient.files.content.mockResolvedValue({
|
||||
text: () => Promise.resolve(fileContent)
|
||||
});
|
||||
@@ -377,7 +402,7 @@ describe('BatchProcessor', () => {
|
||||
{ templateId: 1, metadata: { categories: ['test'] } },
|
||||
{ templateId: 2, metadata: { categories: ['test2'] } }
|
||||
];
|
||||
|
||||
|
||||
mockGenerator.parseResult.mockReturnValueOnce(mockResults[0])
|
||||
.mockReturnValueOnce(mockResults[1]);
|
||||
|
||||
@@ -389,17 +414,17 @@ describe('BatchProcessor', () => {
|
||||
});
|
||||
|
||||
it('should throw error when no output file available', async () => {
|
||||
const batchJob = { output_file_id: null };
|
||||
const batchJob = { output_file_id: null, error_file_id: null };
|
||||
|
||||
await expect(
|
||||
(processor as any).retrieveResults(batchJob)
|
||||
).rejects.toThrow('No output file available for batch job');
|
||||
).rejects.toThrow('No output file or error file available for batch job');
|
||||
});
|
||||
|
||||
it('should handle malformed result lines gracefully', async () => {
|
||||
const batchJob = { output_file_id: 'output-123' };
|
||||
const fileContent = '{"valid": "json"}\ninvalid json line\n{"another": "valid"}';
|
||||
|
||||
|
||||
mockClient.files.content.mockResolvedValue({
|
||||
text: () => Promise.resolve(fileContent)
|
||||
});
|
||||
@@ -422,6 +447,227 @@ describe('BatchProcessor', () => {
|
||||
(processor as any).retrieveResults(batchJob)
|
||||
).rejects.toThrow('Download failed');
|
||||
});
|
||||
|
||||
it('should process error file when present', async () => {
|
||||
const batchJob = {
|
||||
id: 'batch-123',
|
||||
output_file_id: 'output-123',
|
||||
error_file_id: 'error-456'
|
||||
};
|
||||
|
||||
const outputContent = '{"custom_id": "template-1"}';
|
||||
const errorContent = '{"custom_id": "template-2", "error": {"message": "Rate limit exceeded"}}\n{"custom_id": "template-3", "response": {"body": {"error": {"message": "Invalid request"}}}}';
|
||||
|
||||
mockClient.files.content
|
||||
.mockResolvedValueOnce({ text: () => Promise.resolve(outputContent) })
|
||||
.mockResolvedValueOnce({ text: () => Promise.resolve(errorContent) });
|
||||
|
||||
mockedFs.writeFileSync = vi.fn();
|
||||
|
||||
const successResult = { templateId: 1, metadata: { categories: ['success'] } };
|
||||
mockGenerator.parseResult.mockReturnValue(successResult);
|
||||
|
||||
// Mock getDefaultMetadata
|
||||
const defaultMetadata = {
|
||||
categories: ['General'],
|
||||
complexity: 'medium',
|
||||
estimatedSetupMinutes: 15,
|
||||
useCases: [],
|
||||
requiredServices: [],
|
||||
targetAudience: []
|
||||
};
|
||||
(processor as any).generator.getDefaultMetadata = vi.fn().mockReturnValue(defaultMetadata);
|
||||
|
||||
const results = await (processor as any).retrieveResults(batchJob);
|
||||
|
||||
// Should have 1 successful + 2 failed results
|
||||
expect(results).toHaveLength(3);
|
||||
expect(mockClient.files.content).toHaveBeenCalledWith('output-123');
|
||||
expect(mockClient.files.content).toHaveBeenCalledWith('error-456');
|
||||
expect(mockedFs.writeFileSync).toHaveBeenCalled();
|
||||
|
||||
// Check error file was saved
|
||||
const savedPath = (mockedFs.writeFileSync as any).mock.calls[0][0];
|
||||
expect(savedPath).toContain('batch_batch-123_error.jsonl');
|
||||
});
|
||||
|
||||
it('should handle error file with empty lines', async () => {
|
||||
const batchJob = {
|
||||
id: 'batch-789',
|
||||
error_file_id: 'error-789'
|
||||
};
|
||||
|
||||
const errorContent = '\n{"custom_id": "template-1", "error": {"message": "Failed"}}\n\n{"custom_id": "template-2", "error": {"message": "Error"}}\n';
|
||||
|
||||
mockClient.files.content.mockResolvedValue({
|
||||
text: () => Promise.resolve(errorContent)
|
||||
});
|
||||
|
||||
mockedFs.writeFileSync = vi.fn();
|
||||
|
||||
const defaultMetadata = {
|
||||
categories: ['General'],
|
||||
complexity: 'medium',
|
||||
estimatedSetupMinutes: 15,
|
||||
useCases: [],
|
||||
requiredServices: [],
|
||||
targetAudience: []
|
||||
};
|
||||
(processor as any).generator.getDefaultMetadata = vi.fn().mockReturnValue(defaultMetadata);
|
||||
|
||||
const results = await (processor as any).retrieveResults(batchJob);
|
||||
|
||||
// Should skip empty lines and process only valid ones
|
||||
expect(results).toHaveLength(2);
|
||||
expect(results[0].templateId).toBe(1);
|
||||
expect(results[0].error).toBe('Failed');
|
||||
expect(results[1].templateId).toBe(2);
|
||||
expect(results[1].error).toBe('Error');
|
||||
});
|
||||
|
||||
it('should assign default metadata to failed templates', async () => {
|
||||
const batchJob = {
|
||||
error_file_id: 'error-456'
|
||||
};
|
||||
|
||||
const errorContent = '{"custom_id": "template-42", "error": {"message": "Timeout"}}';
|
||||
|
||||
mockClient.files.content.mockResolvedValue({
|
||||
text: () => Promise.resolve(errorContent)
|
||||
});
|
||||
|
||||
mockedFs.writeFileSync = vi.fn();
|
||||
|
||||
const defaultMetadata = {
|
||||
categories: ['General'],
|
||||
complexity: 'medium',
|
||||
estimatedSetupMinutes: 15,
|
||||
useCases: ['General automation'],
|
||||
requiredServices: [],
|
||||
targetAudience: ['Developers']
|
||||
};
|
||||
(processor as any).generator.getDefaultMetadata = vi.fn().mockReturnValue(defaultMetadata);
|
||||
|
||||
const results = await (processor as any).retrieveResults(batchJob);
|
||||
|
||||
expect(results).toHaveLength(1);
|
||||
expect(results[0].templateId).toBe(42);
|
||||
expect(results[0].metadata).toEqual(defaultMetadata);
|
||||
expect(results[0].error).toBe('Timeout');
|
||||
});
|
||||
|
||||
it('should handle malformed error lines gracefully', async () => {
|
||||
const batchJob = {
|
||||
error_file_id: 'error-999'
|
||||
};
|
||||
|
||||
const errorContent = '{"custom_id": "template-1", "error": {"message": "Valid error"}}\ninvalid json\n{"invalid": "no custom_id"}\n{"custom_id": "template-2", "error": {"message": "Another valid"}}';
|
||||
|
||||
mockClient.files.content.mockResolvedValue({
|
||||
text: () => Promise.resolve(errorContent)
|
||||
});
|
||||
|
||||
mockedFs.writeFileSync = vi.fn();
|
||||
|
||||
const defaultMetadata = { categories: ['General'] };
|
||||
(processor as any).generator.getDefaultMetadata = vi.fn().mockReturnValue(defaultMetadata);
|
||||
|
||||
const results = await (processor as any).retrieveResults(batchJob);
|
||||
|
||||
// Should only process valid error lines with template IDs
|
||||
expect(results).toHaveLength(2);
|
||||
expect(results[0].templateId).toBe(1);
|
||||
expect(results[1].templateId).toBe(2);
|
||||
});
|
||||
|
||||
it('should extract error message from response body', async () => {
|
||||
const batchJob = {
|
||||
error_file_id: 'error-123'
|
||||
};
|
||||
|
||||
const errorContent = '{"custom_id": "template-5", "response": {"body": {"error": {"message": "API error from response body"}}}}';
|
||||
|
||||
mockClient.files.content.mockResolvedValue({
|
||||
text: () => Promise.resolve(errorContent)
|
||||
});
|
||||
|
||||
mockedFs.writeFileSync = vi.fn();
|
||||
|
||||
const defaultMetadata = { categories: ['General'] };
|
||||
(processor as any).generator.getDefaultMetadata = vi.fn().mockReturnValue(defaultMetadata);
|
||||
|
||||
const results = await (processor as any).retrieveResults(batchJob);
|
||||
|
||||
expect(results).toHaveLength(1);
|
||||
expect(results[0].error).toBe('API error from response body');
|
||||
});
|
||||
|
||||
it('should use unknown error when no error message found', async () => {
|
||||
const batchJob = {
|
||||
error_file_id: 'error-000'
|
||||
};
|
||||
|
||||
const errorContent = '{"custom_id": "template-10"}';
|
||||
|
||||
mockClient.files.content.mockResolvedValue({
|
||||
text: () => Promise.resolve(errorContent)
|
||||
});
|
||||
|
||||
mockedFs.writeFileSync = vi.fn();
|
||||
|
||||
const defaultMetadata = { categories: ['General'] };
|
||||
(processor as any).generator.getDefaultMetadata = vi.fn().mockReturnValue(defaultMetadata);
|
||||
|
||||
const results = await (processor as any).retrieveResults(batchJob);
|
||||
|
||||
expect(results).toHaveLength(1);
|
||||
expect(results[0].error).toBe('Unknown error');
|
||||
});
|
||||
|
||||
it('should handle error file download failure gracefully', async () => {
|
||||
const batchJob = {
|
||||
output_file_id: 'output-123',
|
||||
error_file_id: 'error-failed'
|
||||
};
|
||||
|
||||
const outputContent = '{"custom_id": "template-1"}';
|
||||
|
||||
mockClient.files.content
|
||||
.mockResolvedValueOnce({ text: () => Promise.resolve(outputContent) })
|
||||
.mockRejectedValueOnce(new Error('Error file download failed'));
|
||||
|
||||
const successResult = { templateId: 1, metadata: { categories: ['success'] } };
|
||||
mockGenerator.parseResult.mockReturnValue(successResult);
|
||||
|
||||
const results = await (processor as any).retrieveResults(batchJob);
|
||||
|
||||
// Should still return successful results even if error file fails
|
||||
expect(results).toHaveLength(1);
|
||||
expect(results[0].templateId).toBe(1);
|
||||
});
|
||||
|
||||
it('should skip templates with invalid or zero ID in error file', async () => {
|
||||
const batchJob = {
|
||||
error_file_id: 'error-invalid'
|
||||
};
|
||||
|
||||
const errorContent = '{"custom_id": "template-0", "error": {"message": "Zero ID"}}\n{"custom_id": "invalid-id", "error": {"message": "Invalid"}}\n{"custom_id": "template-5", "error": {"message": "Valid ID"}}';
|
||||
|
||||
mockClient.files.content.mockResolvedValue({
|
||||
text: () => Promise.resolve(errorContent)
|
||||
});
|
||||
|
||||
mockedFs.writeFileSync = vi.fn();
|
||||
|
||||
const defaultMetadata = { categories: ['General'] };
|
||||
(processor as any).generator.getDefaultMetadata = vi.fn().mockReturnValue(defaultMetadata);
|
||||
|
||||
const results = await (processor as any).retrieveResults(batchJob);
|
||||
|
||||
// Should only include template with valid ID > 0
|
||||
expect(results).toHaveLength(1);
|
||||
expect(results[0].templateId).toBe(5);
|
||||
});
|
||||
});
|
||||
|
||||
describe('cleanup', () => {
|
||||
@@ -526,7 +772,7 @@ describe('BatchProcessor', () => {
|
||||
mockClient.files.create.mockRejectedValue(new Error('Upload failed'));
|
||||
|
||||
const submitBatch = (processor as any).submitBatch.bind(processor);
|
||||
|
||||
|
||||
await expect(
|
||||
submitBatch(templates, 'error_test')
|
||||
).rejects.toThrow('Upload failed');
|
||||
@@ -544,7 +790,7 @@ describe('BatchProcessor', () => {
|
||||
|
||||
// Mock successful processing
|
||||
mockClient.files.create.mockResolvedValue({ id: 'file-123' });
|
||||
const completedJob = {
|
||||
const completedJob = {
|
||||
id: 'batch-123',
|
||||
status: 'completed',
|
||||
output_file_id: 'output-123'
|
||||
@@ -565,4 +811,391 @@ describe('BatchProcessor', () => {
|
||||
expect(mockClient.batches.create).toHaveBeenCalled();
|
||||
});
|
||||
});
|
||||
|
||||
describe('submitBatch', () => {
|
||||
it('should clean up input file immediately after upload', async () => {
|
||||
const templates = [{ templateId: 1, name: 'Test', nodes: ['node1'] }];
|
||||
|
||||
mockClient.files.create.mockResolvedValue({ id: 'file-123' });
|
||||
const completedJob = {
|
||||
id: 'batch-123',
|
||||
status: 'completed',
|
||||
output_file_id: 'output-123'
|
||||
};
|
||||
mockClient.batches.create.mockResolvedValue(completedJob);
|
||||
mockClient.batches.retrieve.mockResolvedValue(completedJob);
|
||||
|
||||
// Mock sleep to speed up test
|
||||
(processor as any).sleep = vi.fn().mockResolvedValue(undefined);
|
||||
|
||||
const promise = (processor as any).submitBatch(templates, 'test_batch');
|
||||
|
||||
// Wait a bit for synchronous cleanup
|
||||
await new Promise(resolve => setTimeout(resolve, 10));
|
||||
|
||||
// Input file should be deleted immediately
|
||||
expect(mockedFs.unlinkSync).toHaveBeenCalled();
|
||||
|
||||
await promise;
|
||||
});
|
||||
|
||||
it('should clean up OpenAI files after batch completion', async () => {
|
||||
const templates = [{ templateId: 1, name: 'Test', nodes: ['node1'] }];
|
||||
|
||||
mockClient.files.create.mockResolvedValue({ id: 'file-upload-123' });
|
||||
const completedJob = {
|
||||
id: 'batch-123',
|
||||
status: 'completed',
|
||||
output_file_id: 'output-123'
|
||||
};
|
||||
mockClient.batches.create.mockResolvedValue(completedJob);
|
||||
mockClient.batches.retrieve.mockResolvedValue(completedJob);
|
||||
|
||||
// Mock sleep to speed up test
|
||||
(processor as any).sleep = vi.fn().mockResolvedValue(undefined);
|
||||
|
||||
await (processor as any).submitBatch(templates, 'cleanup_test');
|
||||
|
||||
// Wait for promise chain to complete
|
||||
await new Promise(resolve => setTimeout(resolve, 50));
|
||||
|
||||
// Should have attempted to delete the input file
|
||||
expect(mockClient.files.del).toHaveBeenCalledWith('file-upload-123');
|
||||
});
|
||||
|
||||
it('should handle cleanup errors gracefully', async () => {
|
||||
const templates = [{ templateId: 1, name: 'Test', nodes: ['node1'] }];
|
||||
|
||||
mockClient.files.create.mockResolvedValue({ id: 'file-123' });
|
||||
mockClient.files.del.mockRejectedValue(new Error('Delete failed'));
|
||||
const completedJob = {
|
||||
id: 'batch-123',
|
||||
status: 'completed'
|
||||
};
|
||||
mockClient.batches.create.mockResolvedValue(completedJob);
|
||||
mockClient.batches.retrieve.mockResolvedValue(completedJob);
|
||||
|
||||
// Mock sleep to speed up test
|
||||
(processor as any).sleep = vi.fn().mockResolvedValue(undefined);
|
||||
|
||||
// Should not throw even if cleanup fails
|
||||
await expect(
|
||||
(processor as any).submitBatch(templates, 'error_cleanup')
|
||||
).resolves.toBeDefined();
|
||||
});
|
||||
|
||||
it('should handle local file cleanup errors silently', async () => {
|
||||
const templates = [{ templateId: 1, name: 'Test', nodes: ['node1'] }];
|
||||
|
||||
mockedFs.unlinkSync = vi.fn().mockImplementation(() => {
|
||||
throw new Error('Cannot delete file');
|
||||
});
|
||||
|
||||
mockClient.files.create.mockResolvedValue({ id: 'file-123' });
|
||||
const completedJob = {
|
||||
id: 'batch-123',
|
||||
status: 'completed'
|
||||
};
|
||||
mockClient.batches.create.mockResolvedValue(completedJob);
|
||||
mockClient.batches.retrieve.mockResolvedValue(completedJob);
|
||||
|
||||
// Mock sleep to speed up test
|
||||
(processor as any).sleep = vi.fn().mockResolvedValue(undefined);
|
||||
|
||||
// Should not throw even if local cleanup fails
|
||||
await expect(
|
||||
(processor as any).submitBatch(templates, 'local_cleanup_error')
|
||||
).resolves.toBeDefined();
|
||||
});
|
||||
});
|
||||
|
||||
describe('progress callback', () => {
|
||||
it('should call progress callback during batch submission', async () => {
|
||||
const templates = [
|
||||
{ templateId: 1, name: 'T1', nodes: ['node1'] },
|
||||
{ templateId: 2, name: 'T2', nodes: ['node2'] },
|
||||
{ templateId: 3, name: 'T3', nodes: ['node3'] },
|
||||
{ templateId: 4, name: 'T4', nodes: ['node4'] }
|
||||
];
|
||||
|
||||
mockClient.files.create.mockResolvedValue({ id: 'file-123' });
|
||||
const completedJob = {
|
||||
id: 'batch-123',
|
||||
status: 'completed',
|
||||
output_file_id: 'output-123'
|
||||
};
|
||||
mockClient.batches.create.mockResolvedValue(completedJob);
|
||||
mockClient.batches.retrieve.mockResolvedValue(completedJob);
|
||||
mockClient.files.content.mockResolvedValue({
|
||||
text: () => Promise.resolve('{"custom_id": "template-1"}')
|
||||
});
|
||||
mockGenerator.parseResult.mockReturnValue({
|
||||
templateId: 1,
|
||||
metadata: { categories: ['test'] }
|
||||
});
|
||||
|
||||
const progressCallback = vi.fn();
|
||||
|
||||
await processor.processTemplates(templates, progressCallback);
|
||||
|
||||
// Should be called during submission and retrieval
|
||||
expect(progressCallback).toHaveBeenCalled();
|
||||
expect(progressCallback.mock.calls.some((call: any) =>
|
||||
call[0].includes('Submitting')
|
||||
)).toBe(true);
|
||||
});
|
||||
|
||||
it('should work without progress callback', async () => {
|
||||
const templates = [{ templateId: 1, name: 'T1', nodes: ['node1'] }];
|
||||
|
||||
mockClient.files.create.mockResolvedValue({ id: 'file-123' });
|
||||
const completedJob = {
|
||||
id: 'batch-123',
|
||||
status: 'completed',
|
||||
output_file_id: 'output-123'
|
||||
};
|
||||
mockClient.batches.create.mockResolvedValue(completedJob);
|
||||
mockClient.batches.retrieve.mockResolvedValue(completedJob);
|
||||
mockClient.files.content.mockResolvedValue({
|
||||
text: () => Promise.resolve('{"custom_id": "template-1"}')
|
||||
});
|
||||
mockGenerator.parseResult.mockReturnValue({
|
||||
templateId: 1,
|
||||
metadata: { categories: ['test'] }
|
||||
});
|
||||
|
||||
// Should not throw without callback
|
||||
await expect(
|
||||
processor.processTemplates(templates)
|
||||
).resolves.toBeDefined();
|
||||
});
|
||||
|
||||
it('should call progress callback with correct parameters', async () => {
|
||||
const templates = [
|
||||
{ templateId: 1, name: 'T1', nodes: ['node1'] },
|
||||
{ templateId: 2, name: 'T2', nodes: ['node2'] }
|
||||
];
|
||||
|
||||
mockClient.files.create.mockResolvedValue({ id: 'file-123' });
|
||||
const completedJob = {
|
||||
id: 'batch-123',
|
||||
status: 'completed',
|
||||
output_file_id: 'output-123'
|
||||
};
|
||||
mockClient.batches.create.mockResolvedValue(completedJob);
|
||||
mockClient.batches.retrieve.mockResolvedValue(completedJob);
|
||||
mockClient.files.content.mockResolvedValue({
|
||||
text: () => Promise.resolve('{"custom_id": "template-1"}')
|
||||
});
|
||||
mockGenerator.parseResult.mockReturnValue({
|
||||
templateId: 1,
|
||||
metadata: { categories: ['test'] }
|
||||
});
|
||||
|
||||
const progressCallback = vi.fn();
|
||||
|
||||
await processor.processTemplates(templates, progressCallback);
|
||||
|
||||
// Check that callback was called with proper arguments
|
||||
const submissionCall = progressCallback.mock.calls.find((call: any) =>
|
||||
call[0].includes('Submitting')
|
||||
);
|
||||
expect(submissionCall).toBeDefined();
|
||||
if (submissionCall) {
|
||||
expect(submissionCall[1]).toBeGreaterThanOrEqual(0);
|
||||
expect(submissionCall[2]).toBe(2);
|
||||
}
|
||||
});
|
||||
});
|
||||
|
||||
describe('batch result merging', () => {
|
||||
it('should merge results from multiple batches', async () => {
|
||||
const templates = Array.from({ length: 6 }, (_, i) => ({
|
||||
templateId: i + 1,
|
||||
name: `T${i + 1}`,
|
||||
nodes: ['node']
|
||||
}));
|
||||
|
||||
mockClient.files.create.mockResolvedValue({ id: 'file-123' });
|
||||
|
||||
// Create different completed jobs for each batch
|
||||
let batchCounter = 0;
|
||||
mockClient.batches.create.mockImplementation(() => {
|
||||
batchCounter++;
|
||||
return Promise.resolve({
|
||||
id: `batch-${batchCounter}`,
|
||||
status: 'completed',
|
||||
output_file_id: `output-${batchCounter}`
|
||||
});
|
||||
});
|
||||
|
||||
mockClient.batches.retrieve.mockImplementation((id: string) => {
|
||||
return Promise.resolve({
|
||||
id,
|
||||
status: 'completed',
|
||||
output_file_id: `output-${id.split('-')[1]}`
|
||||
});
|
||||
});
|
||||
|
||||
let fileCounter = 0;
|
||||
mockClient.files.content.mockImplementation(() => {
|
||||
fileCounter++;
|
||||
return Promise.resolve({
|
||||
text: () => Promise.resolve(`{"custom_id": "template-${fileCounter}"}`)
|
||||
});
|
||||
});
|
||||
|
||||
mockGenerator.parseResult.mockImplementation((result: any) => {
|
||||
const id = parseInt(result.custom_id.split('-')[1]);
|
||||
return {
|
||||
templateId: id,
|
||||
metadata: { categories: [`batch-${Math.ceil(id / 3)}`] }
|
||||
};
|
||||
});
|
||||
|
||||
const results = await processor.processTemplates(templates);
|
||||
|
||||
// Should have results from both batches (6 templates, batchSize=3)
|
||||
expect(results.size).toBeGreaterThan(0);
|
||||
expect(mockClient.batches.create).toHaveBeenCalledTimes(2);
|
||||
});
|
||||
|
||||
it('should handle empty batch results', async () => {
|
||||
const templates = [
|
||||
{ templateId: 1, name: 'T1', nodes: ['node'] },
|
||||
{ templateId: 2, name: 'T2', nodes: ['node'] }
|
||||
];
|
||||
|
||||
mockClient.files.create.mockResolvedValue({ id: 'file-123' });
|
||||
const completedJob = {
|
||||
id: 'batch-123',
|
||||
status: 'completed',
|
||||
output_file_id: 'output-123'
|
||||
};
|
||||
mockClient.batches.create.mockResolvedValue(completedJob);
|
||||
mockClient.batches.retrieve.mockResolvedValue(completedJob);
|
||||
|
||||
// Return empty content
|
||||
mockClient.files.content.mockResolvedValue({
|
||||
text: () => Promise.resolve('')
|
||||
});
|
||||
|
||||
const results = await processor.processTemplates(templates);
|
||||
|
||||
// Should handle empty results gracefully
|
||||
expect(results.size).toBe(0);
|
||||
});
|
||||
});
|
||||
|
||||
describe('sleep', () => {
|
||||
it('should delay for specified milliseconds', async () => {
|
||||
const start = Date.now();
|
||||
await (processor as any).sleep(100);
|
||||
const elapsed = Date.now() - start;
|
||||
|
||||
expect(elapsed).toBeGreaterThanOrEqual(95);
|
||||
expect(elapsed).toBeLessThan(150);
|
||||
});
|
||||
});
|
||||
|
||||
describe('processBatch (legacy method)', () => {
|
||||
it('should process a single batch synchronously', async () => {
|
||||
const templates = [
|
||||
{ templateId: 1, name: 'Test1', nodes: ['node1'] },
|
||||
{ templateId: 2, name: 'Test2', nodes: ['node2'] }
|
||||
];
|
||||
|
||||
mockClient.files.create.mockResolvedValue({ id: 'file-abc' });
|
||||
const completedJob = {
|
||||
id: 'batch-xyz',
|
||||
status: 'completed',
|
||||
output_file_id: 'output-xyz'
|
||||
};
|
||||
mockClient.batches.create.mockResolvedValue(completedJob);
|
||||
mockClient.batches.retrieve.mockResolvedValue(completedJob);
|
||||
|
||||
const fileContent = '{"custom_id": "template-1"}\n{"custom_id": "template-2"}';
|
||||
mockClient.files.content.mockResolvedValue({
|
||||
text: () => Promise.resolve(fileContent)
|
||||
});
|
||||
|
||||
const mockResults = [
|
||||
{ templateId: 1, metadata: { categories: ['test1'] } },
|
||||
{ templateId: 2, metadata: { categories: ['test2'] } }
|
||||
];
|
||||
mockGenerator.parseResult.mockReturnValueOnce(mockResults[0])
|
||||
.mockReturnValueOnce(mockResults[1]);
|
||||
|
||||
// Mock sleep to speed up test
|
||||
(processor as any).sleep = vi.fn().mockResolvedValue(undefined);
|
||||
|
||||
const results = await (processor as any).processBatch(templates, 'legacy_test');
|
||||
|
||||
expect(results).toHaveLength(2);
|
||||
expect(results[0].templateId).toBe(1);
|
||||
expect(results[1].templateId).toBe(2);
|
||||
expect(mockClient.batches.create).toHaveBeenCalled();
|
||||
});
|
||||
|
||||
it('should clean up files after processing', async () => {
|
||||
const templates = [{ templateId: 1, name: 'Test', nodes: ['node1'] }];
|
||||
|
||||
mockClient.files.create.mockResolvedValue({ id: 'file-clean' });
|
||||
const completedJob = {
|
||||
id: 'batch-clean',
|
||||
status: 'completed',
|
||||
output_file_id: 'output-clean'
|
||||
};
|
||||
mockClient.batches.create.mockResolvedValue(completedJob);
|
||||
mockClient.batches.retrieve.mockResolvedValue(completedJob);
|
||||
mockClient.files.content.mockResolvedValue({
|
||||
text: () => Promise.resolve('{"custom_id": "template-1"}')
|
||||
});
|
||||
mockGenerator.parseResult.mockReturnValue({
|
||||
templateId: 1,
|
||||
metadata: { categories: ['test'] }
|
||||
});
|
||||
|
||||
// Mock sleep to speed up test
|
||||
(processor as any).sleep = vi.fn().mockResolvedValue(undefined);
|
||||
|
||||
await (processor as any).processBatch(templates, 'cleanup_test');
|
||||
|
||||
// Should clean up all files
|
||||
expect(mockedFs.unlinkSync).toHaveBeenCalled();
|
||||
expect(mockClient.files.del).toHaveBeenCalledWith('file-clean');
|
||||
expect(mockClient.files.del).toHaveBeenCalledWith('output-clean');
|
||||
});
|
||||
|
||||
it('should clean up local file on error', async () => {
|
||||
const templates = [{ templateId: 1, name: 'Test', nodes: ['node1'] }];
|
||||
|
||||
mockClient.files.create.mockRejectedValue(new Error('Upload failed'));
|
||||
|
||||
await expect(
|
||||
(processor as any).processBatch(templates, 'error_test')
|
||||
).rejects.toThrow('Upload failed');
|
||||
|
||||
// Should clean up local file even on error
|
||||
expect(mockedFs.unlinkSync).toHaveBeenCalled();
|
||||
});
|
||||
|
||||
it('should handle batch job monitoring errors', async () => {
|
||||
const templates = [{ templateId: 1, name: 'Test', nodes: ['node1'] }];
|
||||
|
||||
mockClient.files.create.mockResolvedValue({ id: 'file-123' });
|
||||
mockClient.batches.create.mockResolvedValue({ id: 'batch-123' });
|
||||
mockClient.batches.retrieve.mockResolvedValue({
|
||||
id: 'batch-123',
|
||||
status: 'failed'
|
||||
});
|
||||
|
||||
await expect(
|
||||
(processor as any).processBatch(templates, 'failed_batch')
|
||||
).rejects.toThrow('Batch job failed with status: failed');
|
||||
|
||||
// Should still attempt cleanup
|
||||
expect(mockedFs.unlinkSync).toHaveBeenCalled();
|
||||
});
|
||||
});
|
||||
});
|
||||
@@ -18,7 +18,7 @@ describe('MetadataGenerator', () => {
|
||||
let generator: MetadataGenerator;
|
||||
|
||||
beforeEach(() => {
|
||||
generator = new MetadataGenerator('test-api-key', 'gpt-4o-mini');
|
||||
generator = new MetadataGenerator('test-api-key', 'gpt-5-mini-2025-08-07');
|
||||
});
|
||||
|
||||
describe('createBatchRequest', () => {
|
||||
@@ -35,7 +35,7 @@ describe('MetadataGenerator', () => {
|
||||
expect(request.custom_id).toBe('template-123');
|
||||
expect(request.method).toBe('POST');
|
||||
expect(request.url).toBe('/v1/chat/completions');
|
||||
expect(request.body.model).toBe('gpt-4o-mini');
|
||||
expect(request.body.model).toBe('gpt-5-mini-2025-08-07');
|
||||
expect(request.body.response_format.type).toBe('json_schema');
|
||||
expect(request.body.response_format.json_schema.strict).toBe(true);
|
||||
expect(request.body.messages).toHaveLength(2);
|
||||
@@ -217,7 +217,7 @@ describe('MetadataGenerator', () => {
|
||||
// but should not cause any injection in our code
|
||||
expect(userMessage).toContain('<script>alert("xss")</script>');
|
||||
expect(userMessage).toContain('javascript:alert(1)');
|
||||
expect(request.body.model).toBe('gpt-4o-mini');
|
||||
expect(request.body.model).toBe('gpt-5-mini-2025-08-07');
|
||||
});
|
||||
|
||||
it('should handle extremely long template names', () => {
|
||||
|
||||
Reference in New Issue
Block a user