Compare commits


8 Commits

Author SHA1 Message Date
Romuald Członkowski
48df87f76c Merge pull request #239 from czlonkowski/chore/update-n8n-dependencies
chore: update n8n to v1.113.3 and enhance template system
2025-09-30 12:05:25 +02:00
czlonkowski
540c5270c6 test: increase batch-processor coverage to 98.87%
- Add 19 new test cases covering error file processing
- Test default metadata assignment for failed templates
- Add file cleanup and error handling tests
- Test progress callback functionality
- Add batch result merging tests
- Test legacy processBatch method

Coverage improved from 51.51% to 98.87%

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-30 11:49:08 +02:00
czlonkowski
6210378687 test: update batch processor test for new error message
- Update error message expectation to match enhanced error handling
- Fixes CI test failure after error handling improvements

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-30 11:29:43 +02:00
czlonkowski
8c2b1cfbbe fix: sanitize API tokens from database templates
- Update sanitization script to handle compressed workflows
- Add decompression/recompression support for workflow_json_compressed
- Sanitized 24 templates containing OpenAI and Apify API tokens
- Database now clean of exposed API keys

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-30 11:04:15 +02:00
czlonkowski
d862f4961d feat: enhance template sanitization and prevent secret leaks
- Add Airtable PAT and GitHub token patterns to template sanitizer
- Add batch error files to .gitignore (may contain API tokens)
- Document sanitization requirement in MEMORY_TEMPLATE_UPDATE.md
- Prevents accidental secret commits during template updates

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-30 10:57:14 +02:00
czlonkowski
2057f98e76 fix: improve batch job monitoring with 1-minute polling
- Change from exponential backoff to fixed 1-minute polling interval
- Log status on EVERY check (not just on status change)
- Show check number and elapsed time in each log
- Increase max timeout to 120 minutes (was 100 attempts with variable times)
- Add better status symbols for completed/failed states

This fixes the issue where batches completed on OpenAI's side but monitoring
appeared to hang because it was waiting too long between checks.

Note: Error files with API tokens are now excluded from commits for security.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-30 10:46:28 +02:00
czlonkowski
fff47f9f9d feat: add incremental template updates and fix metadata generation
Template Updates:
- Add npm script for incremental template fetch (fetch:templates:update)
- Create MEMORY_TEMPLATE_UPDATE.md with comprehensive documentation
- Update 48 new templates (2598 → 2646 total)
- Latest template now from September 24, 2025

Metadata Generation Fixes:
- Update model from gpt-4o-mini to gpt-5-mini-2025-08-07
- Remove temperature parameter (not supported in batch API)
- Increase max_completion_tokens from 1000 to 3000
- Add comprehensive error file handling to batch-processor
- Process failed requests and assign default metadata
- Save error files for debugging (temp/batch/)

Test Updates:
- Update all test files to use gpt-5-mini-2025-08-07 model
- 3 test assertions updated in metadata-generator.test.ts
- 1 test option updated in batch-processor.test.ts

Documentation:
- Add troubleshooting section for metadata generation
- Include error handling examples
- Document incremental vs full rebuild modes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-30 09:59:42 +02:00
czlonkowski
87cc84f593 chore: update n8n to v1.113.3
- Updated n8n from 1.112.3 to 1.113.3
- Updated n8n-core from 1.111.0 to 1.112.1
- Updated n8n-workflow from 1.109.0 to 1.110.0
- Updated @n8n/n8n-nodes-langchain from 1.111.1 to 1.112.2
- Rebuilt node database with 536 nodes
- Bumped version to 2.14.3
- Updated n8n version badge in README
- All validation tests passing

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-29 23:35:03 +02:00
12 changed files with 2803 additions and 243 deletions

.gitignore (4 changes)

@@ -89,6 +89,10 @@ docker-compose.override.yml
temp/
tmp/
# Batch processing error files (may contain API tokens from templates)
docs/batch_*.jsonl
**/batch_*_error.jsonl
# Database files
# Database files - nodes.db is now tracked directly
# data/*.db

MEMORY_TEMPLATE_UPDATE.md (new file, 336 lines)

@@ -0,0 +1,336 @@
# Template Update Process - Quick Reference
## Overview
The n8n-mcp project maintains a database of workflow templates from n8n.io. This guide explains how to update the template database incrementally without rebuilding from scratch.
## Current Database State
As of the last update:
- **2,598 templates** in database
- Templates from the last 12 months
- Latest template: September 12, 2025
## Quick Commands
### Incremental Update (Recommended)
```bash
# Build if needed
npm run build
# Fetch only NEW templates (5-10 minutes)
npm run fetch:templates:update
```
### Full Rebuild (Rare)
```bash
# Rebuild entire database from scratch (30-40 minutes)
npm run fetch:templates
```
## How It Works
### Incremental Update Mode (`--update`)
The incremental update is **smart and efficient**:
1. **Loads existing template IDs** from database (~2,598 templates)
2. **Fetches template list** from n8n.io API (all templates from last 12 months)
3. **Filters** to find only NEW templates not in database
4. **Fetches details** for new templates only (saves time and API calls)
5. **Saves** new templates to database (existing ones untouched)
6. **Rebuilds FTS5** search index for new templates
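A minimal TypeScript sketch of this flow (the `TemplateDb` and `TemplateApi` interfaces are illustrative assumptions, not the project's actual types):
```typescript
interface TemplateStub { id: number }
interface TemplateDb {
  getAllTemplateIds(): Promise<number[]>;
  insertTemplate(detail: unknown): Promise<void>;
  rebuildFts5Index(): Promise<void>;
}
interface TemplateApi {
  listTemplates(opts: { months: number }): Promise<TemplateStub[]>;
  getTemplateDetail(id: number): Promise<unknown>;
}

async function incrementalUpdate(db: TemplateDb, api: TemplateApi): Promise<number> {
  const existing = new Set(await db.getAllTemplateIds()); // 1. load existing IDs
  const remote = await api.listTemplates({ months: 12 }); // 2. fetch template list
  const fresh = remote.filter(t => !existing.has(t.id));  // 3. keep only new IDs
  for (const stub of fresh) {
    const detail = await api.getTemplateDetail(stub.id);  // 4. fetch details for new templates only
    await db.insertTemplate(detail);                      // 5. save; existing rows untouched
  }
  await db.rebuildFts5Index();                            // 6. refresh the FTS5 search index
  return fresh.length;
}
```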
### Key Benefits
- **Non-destructive**: All existing templates preserved
- **Fast**: Only fetches new templates (5-10 min vs 30-40 min)
- **API friendly**: Reduces load on n8n.io API
- **Safe**: Preserves AI-generated metadata
- **Smart**: Automatically skips duplicates
## Performance Comparison
| Mode | Templates Fetched | Time | Use Case |
|------|------------------|------|----------|
| **Update** | Only new (~50-200) | 5-10 min | Regular updates |
| **Rebuild** | All (~8000+) | 30-40 min | Initial setup or corruption |
## Command Options
### Basic Update
```bash
npm run fetch:templates:update
```
### Full Rebuild
```bash
npm run fetch:templates
```
### With Metadata Generation
```bash
# Update templates and generate AI metadata
npm run fetch:templates -- --update --generate-metadata
# Or just generate metadata for existing templates
npm run fetch:templates -- --metadata-only
```
### Help
```bash
npm run fetch:templates -- --help
```
## Update Frequency
Recommended update schedule:
- **Weekly**: Run incremental update to get latest templates
- **Monthly**: Review database statistics
- **As needed**: Rebuild only if database corruption suspected
## Template Filtering
The fetcher automatically filters templates:
- **Includes**: Templates from the last 12 months
- **Includes**: Templates with >10 views
- **Excludes**: Templates with ≤10 views (too niche)
- **Excludes**: Templates older than 12 months
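A hedged sketch of that filter as a predicate (field names are assumptions, not the fetcher's real ones):
```typescript
// Approximate 12 months as 365 days for the cutoff.
const TWELVE_MONTHS_MS = 365 * 24 * 60 * 60 * 1000;

function shouldInclude(t: { views: number; createdAt: string }, now = Date.now()): boolean {
  const recentEnough = now - Date.parse(t.createdAt) <= TWELVE_MONTHS_MS;
  return recentEnough && t.views > 10; // templates with <=10 views are excluded as too niche
}
```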
## Workflow
### Regular Update Workflow
```bash
# 1. Check current state
sqlite3 data/nodes.db "SELECT COUNT(*) FROM templates"
# 2. Build project (if code changed)
npm run build
# 3. Run incremental update
npm run fetch:templates:update
# 4. Verify new templates added
sqlite3 data/nodes.db "SELECT COUNT(*) FROM templates"
```
### After n8n Dependency Update
When you update n8n dependencies, templates remain compatible:
```bash
# 1. Update n8n (from MEMORY_N8N_UPDATE.md)
npm run update:all
# 2. Fetch new templates incrementally
npm run fetch:templates:update
# 3. Check how many templates were added
sqlite3 data/nodes.db "SELECT COUNT(*) FROM templates"
# 4. Generate AI metadata for new templates (optional, requires OPENAI_API_KEY)
npm run fetch:templates -- --metadata-only
# 5. IMPORTANT: Sanitize templates before pushing database
npm run build
npm run sanitize:templates
```
Templates are independent of n8n version - they're just workflow JSON data.
**CRITICAL**: Always run `npm run sanitize:templates` before pushing the database to remove API tokens from template workflows.
**Note**: New templates fetched via `--update` mode will NOT have AI-generated metadata by default. You need to run `--metadata-only` separately to generate metadata for templates that don't have it yet.
## Troubleshooting
### No New Templates Found
This is normal! It means:
- All recent templates are already in your database
- n8n.io hasn't published many new templates recently
- Your database is up to date
```bash
📊 Update mode: 0 new templates to fetch (skipping 2598 existing)
✅ All templates already have metadata
```
### API Rate Limiting
If you hit rate limits:
- The fetcher includes built-in delays (150ms between requests)
- Wait a few minutes and try again
- Use `--update` mode instead of full rebuild
### Database Corruption
If you suspect corruption:
```bash
# Full rebuild from scratch
npm run fetch:templates
# This will:
# - Drop and recreate templates table
# - Fetch all templates fresh
# - Rebuild search indexes
```
## Database Schema
Templates are stored with:
- Basic info (id, name, description, author, views, created_at)
- Node types used (JSON array)
- Complete workflow (gzip compressed, base64 encoded)
- AI-generated metadata (optional, requires OpenAI API key)
- FTS5 search index for fast text search
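Because the workflow is gzip-compressed and base64-encoded, decoding and re-encoding it looks like this (a sketch consistent with the sanitization script changes in this compare):
```typescript
import { gunzipSync, gzipSync } from 'zlib';

// Decode a workflow stored in the workflow_json_compressed column.
function decodeWorkflow(compressed: string): unknown {
  const buffer = Buffer.from(compressed, 'base64');
  return JSON.parse(gunzipSync(buffer).toString('utf-8'));
}

// Re-encode a (possibly modified) workflow in the same format.
function encodeWorkflow(workflow: unknown): string {
  return gzipSync(JSON.stringify(workflow)).toString('base64');
}
```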
## Metadata Generation
Generate AI metadata for templates:
```bash
# Requires OPENAI_API_KEY in .env
export OPENAI_API_KEY="sk-..."
# Generate for templates without metadata (recommended after incremental update)
npm run fetch:templates -- --metadata-only
# Generate during template fetch (slower, but automatic)
npm run fetch:templates:update -- --generate-metadata
```
**Important**: Incremental updates (`--update`) do NOT generate metadata by default. After running `npm run fetch:templates:update`, you'll have new templates without metadata. Run `--metadata-only` separately to generate metadata for them.
### Check Metadata Coverage
```bash
# See how many templates have metadata
sqlite3 data/nodes.db "SELECT
COUNT(*) as total,
SUM(CASE WHEN metadata_json IS NOT NULL THEN 1 ELSE 0 END) as with_metadata,
SUM(CASE WHEN metadata_json IS NULL THEN 1 ELSE 0 END) as without_metadata
FROM templates"
# See recent templates without metadata
sqlite3 data/nodes.db "SELECT id, name, created_at
FROM templates
WHERE metadata_json IS NULL
ORDER BY created_at DESC
LIMIT 10"
```
Metadata includes:
- Categories
- Complexity level (simple/medium/complex)
- Use cases
- Estimated setup time
- Required services
- Key features
- Target audience
### Metadata Generation Troubleshooting
If metadata generation fails:
1. **Check error file**: Errors are saved to `temp/batch/batch_*_error.jsonl`
2. **Common issues**:
- `"Unsupported value: 'temperature'"` - Model doesn't support custom temperature
- `"Invalid request"` - Check OPENAI_API_KEY is valid
- Model availability issues
3. **Model**: Uses `gpt-5-mini-2025-08-07` by default
4. **Token limit**: 3000 tokens per request for detailed metadata
The system will automatically:
- Process error files and assign default metadata to failed templates
- Save error details for debugging
- Continue processing even if some templates fail
**Example error handling**:
```bash
# If you see: "No output file available for batch job"
# Check: temp/batch/batch_*_error.jsonl for error details
# The system now automatically processes errors and generates default metadata
```
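A sketch of how one line of such an error file is interpreted, mirroring the batch-processor changes in this compare (the `BatchErrorLine` shape is an assumption based on the fields the code reads):
```typescript
interface BatchErrorLine {
  custom_id?: string; // e.g. "template-42"
  error?: { message?: string };
  response?: { body?: { error?: { message?: string } } };
}

function extractFailure(line: string): { templateId: number; message: string } | null {
  const parsed = JSON.parse(line) as BatchErrorLine;
  const templateId = parseInt(parsed.custom_id?.replace('template-', '') ?? '0', 10);
  if (templateId <= 0) return null; // skip entries with invalid or zero IDs
  const message =
    parsed.response?.body?.error?.message ??
    parsed.error?.message ??
    'Unknown error';
  return { templateId, message };
}
```
Failed templates then receive default metadata so the rest of the batch can proceed.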
## Environment Variables
Optional configuration:
```bash
# OpenAI for metadata generation
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-5-mini-2025-08-07 # Default model
OPENAI_BATCH_SIZE=50 # Batch size for metadata generation
# Metadata generation limits
METADATA_LIMIT=100 # Max templates to process (0 = all)
```
## Statistics
After update, check stats:
```bash
# Template count
sqlite3 data/nodes.db "SELECT COUNT(*) FROM templates"
# Most recent template
sqlite3 data/nodes.db "SELECT MAX(created_at) FROM templates"
# Templates by view count
sqlite3 data/nodes.db "SELECT COUNT(*),
CASE
WHEN views < 50 THEN '<50'
WHEN views < 100 THEN '50-100'
WHEN views < 500 THEN '100-500'
ELSE '500+'
END as view_range
FROM templates GROUP BY view_range"
```
## Integration with n8n-mcp
Templates are available through MCP tools:
- `list_templates`: List all templates
- `get_template`: Get specific template with workflow
- `search_templates`: Search by keyword
- `list_node_templates`: Templates using specific nodes
- `get_templates_for_task`: Templates for common tasks
- `search_templates_by_metadata`: Advanced filtering
See `npm run test:templates` for usage examples.
## Time Estimates
Typical incremental update:
- Loading existing IDs: 1-2 seconds
- Fetching template list: 2-3 minutes
- Filtering new templates: instant
- Fetching details for 100 new templates: ~15 seconds (0.15s each)
- Saving and indexing: 5-10 seconds
- **Total: 3-5 minutes**
Full rebuild:
- Fetching 8000+ templates: 25-30 minutes
- Saving and indexing: 5-10 minutes
- **Total: 30-40 minutes**
## Best Practices
1. **Use incremental updates** for regular maintenance
2. **Rebuild only when necessary** (corruption, major changes)
3. **Generate metadata incrementally** to avoid OpenAI costs
4. **Monitor template count** to verify updates working
5. **Keep database backed up** before major operations
## Next Steps
After updating templates:
1. Test template search: `npm run test:templates`
2. Verify MCP tools work: Test in Claude Desktop
3. Check statistics in database
4. Commit the updated database file if desired
## Related Documentation
- `MEMORY_N8N_UPDATE.md` - Updating n8n dependencies
- `CLAUDE.md` - Project overview and architecture
- `README.md` - User documentation

README.md

@@ -5,7 +5,7 @@
[![npm version](https://img.shields.io/npm/v/n8n-mcp.svg)](https://www.npmjs.com/package/n8n-mcp)
[![codecov](https://codecov.io/gh/czlonkowski/n8n-mcp/graph/badge.svg?token=YOUR_TOKEN)](https://codecov.io/gh/czlonkowski/n8n-mcp)
[![Tests](https://img.shields.io/badge/tests-2883%20passing-brightgreen.svg)](https://github.com/czlonkowski/n8n-mcp/actions)
[![n8n version](https://img.shields.io/badge/n8n-^1.112.3-orange.svg)](https://github.com/n8n-io/n8n)
[![n8n version](https://img.shields.io/badge/n8n-^1.113.3-orange.svg)](https://github.com/n8n-io/n8n)
[![Docker](https://img.shields.io/badge/docker-ghcr.io%2Fczlonkowski%2Fn8n--mcp-green.svg)](https://github.com/czlonkowski/n8n-mcp/pkgs/container/n8n-mcp)
[![Deploy on Railway](https://railway.com/button.svg)](https://railway.com/deploy/n8n-mcp?referralCode=n8n-mcp)

data/nodes.db (binary file not shown)

package-lock.json (generated, 1809 changes)

File diff suppressed because it is too large

package.json

@@ -1,6 +1,6 @@
{
"name": "n8n-mcp",
"version": "2.14.2",
"version": "2.14.3",
"description": "Integration between n8n workflow automation and Model Context Protocol (MCP)",
"main": "dist/index.js",
"bin": {
@@ -37,6 +37,7 @@
"update:n8n": "node scripts/update-n8n-deps.js",
"update:n8n:check": "node scripts/update-n8n-deps.js --dry-run",
"fetch:templates": "node dist/scripts/fetch-templates.js",
"fetch:templates:update": "node dist/scripts/fetch-templates.js --update",
"fetch:templates:robust": "node dist/scripts/fetch-templates-robust.js",
"prebuild:fts5": "npx tsx scripts/prebuild-fts5.ts",
"test:templates": "node dist/scripts/test-templates.js",
@@ -128,14 +129,14 @@
},
"dependencies": {
"@modelcontextprotocol/sdk": "^1.13.2",
"@n8n/n8n-nodes-langchain": "^1.111.1",
"@n8n/n8n-nodes-langchain": "^1.112.2",
"@supabase/supabase-js": "^2.57.4",
"dotenv": "^16.5.0",
"express": "^5.1.0",
"lru-cache": "^11.2.1",
"n8n": "^1.112.3",
"n8n-core": "^1.111.0",
"n8n-workflow": "^1.109.0",
"n8n": "^1.113.3",
"n8n-core": "^1.112.1",
"n8n-workflow": "^1.110.0",
"openai": "^4.77.0",
"sql.js": "^1.13.0",
"uuid": "^10.0.0",

sanitize-templates.ts

@@ -2,32 +2,50 @@
import { createDatabaseAdapter } from '../database/database-adapter';
import { logger } from '../utils/logger';
import { TemplateSanitizer } from '../utils/template-sanitizer';
import { gunzipSync, gzipSync } from 'zlib';
async function sanitizeTemplates() {
console.log('🧹 Sanitizing workflow templates in database...\n');
const db = await createDatabaseAdapter('./data/nodes.db');
const sanitizer = new TemplateSanitizer();
try {
// Get all templates
const templates = db.prepare('SELECT id, name, workflow_json FROM templates').all() as any[];
// Get all templates - check both old and new format
const templates = db.prepare('SELECT id, name, workflow_json, workflow_json_compressed FROM templates').all() as any[];
console.log(`Found ${templates.length} templates to check\n`);
let sanitizedCount = 0;
const problematicTemplates: any[] = [];
for (const template of templates) {
if (!template.workflow_json) {
continue; // Skip templates without workflow data
let originalWorkflow: any = null;
let useCompressed = false;
// Try compressed format first (newer format)
if (template.workflow_json_compressed) {
try {
const buffer = Buffer.from(template.workflow_json_compressed, 'base64');
const decompressed = gunzipSync(buffer).toString('utf-8');
originalWorkflow = JSON.parse(decompressed);
useCompressed = true;
} catch (e) {
console.log(`⚠️ Failed to decompress template ${template.id}, trying uncompressed`);
}
}
let originalWorkflow;
try {
originalWorkflow = JSON.parse(template.workflow_json);
} catch (e) {
console.log(`⚠️ Skipping template ${template.id}: Invalid JSON`);
continue;
// Fall back to uncompressed format (deprecated)
if (!originalWorkflow && template.workflow_json) {
try {
originalWorkflow = JSON.parse(template.workflow_json);
} catch (e) {
console.log(`⚠️ Skipping template ${template.id}: Invalid JSON in both formats`);
continue;
}
}
if (!originalWorkflow) {
continue; // Skip templates without workflow data
}
const { sanitized: sanitizedWorkflow, wasModified } = sanitizer.sanitizeWorkflow(originalWorkflow);
@@ -35,18 +53,24 @@ async function sanitizeTemplates() {
if (wasModified) {
// Get detected tokens for reporting
const detectedTokens = sanitizer.detectTokens(originalWorkflow);
// Update the template with sanitized version
const stmt = db.prepare('UPDATE templates SET workflow_json = ? WHERE id = ?');
stmt.run(JSON.stringify(sanitizedWorkflow), template.id);
// Update the template with sanitized version in the same format
if (useCompressed) {
const compressed = gzipSync(JSON.stringify(sanitizedWorkflow)).toString('base64');
const stmt = db.prepare('UPDATE templates SET workflow_json_compressed = ? WHERE id = ?');
stmt.run(compressed, template.id);
} else {
const stmt = db.prepare('UPDATE templates SET workflow_json = ? WHERE id = ?');
stmt.run(JSON.stringify(sanitizedWorkflow), template.id);
}
sanitizedCount++;
problematicTemplates.push({
id: template.id,
name: template.name,
tokens: detectedTokens
});
console.log(`✅ Sanitized template ${template.id}: ${template.name}`);
detectedTokens.forEach(token => {
console.log(` - Found: ${token.substring(0, 20)}...`);

batch-processor.ts

@@ -258,85 +258,132 @@ export class BatchProcessor {
}
/**
* Monitor batch job with exponential backoff
* Monitor batch job with fixed 1-minute polling interval
*/
private async monitorBatchJob(batchId: string): Promise<any> {
// Start with shorter wait times for better UX
const waitTimes = [30, 60, 120, 300, 600, 900, 1800]; // Progressive wait times in seconds
let waitIndex = 0;
const pollInterval = 60; // Check every 60 seconds (1 minute)
let attempts = 0;
const maxAttempts = 100; // Safety limit
const maxAttempts = 120; // 120 minutes max (2 hours)
const startTime = Date.now();
let lastStatus = '';
while (attempts < maxAttempts) {
const batchJob = await this.client.batches.retrieve(batchId);
// Only log if status changed
const elapsedMinutes = Math.floor((Date.now() - startTime) / 60000);
// Log status on every check (not just on change)
const statusSymbol = batchJob.status === 'in_progress' ? '⚙️' :
batchJob.status === 'finalizing' ? '📦' :
batchJob.status === 'validating' ? '🔍' :
batchJob.status === 'completed' ? '✅' :
batchJob.status === 'failed' ? '❌' : '⏳';
console.log(` ${statusSymbol} Batch ${batchId.slice(-8)}: ${batchJob.status} (${elapsedMinutes} min, check ${attempts + 1})`);
if (batchJob.status !== lastStatus) {
const elapsedMinutes = Math.floor((Date.now() - startTime) / 60000);
const statusSymbol = batchJob.status === 'in_progress' ? '⚙️' :
batchJob.status === 'finalizing' ? '📦' :
batchJob.status === 'validating' ? '🔍' : '⏳';
console.log(` ${statusSymbol} Batch ${batchId.slice(-8)}: ${batchJob.status} (${elapsedMinutes} min)`);
logger.info(`Batch ${batchId} status changed: ${lastStatus} -> ${batchJob.status}`);
lastStatus = batchJob.status;
}
logger.debug(`Batch ${batchId} status: ${batchJob.status} (attempt ${attempts + 1})`);
if (batchJob.status === 'completed') {
const elapsedMinutes = Math.floor((Date.now() - startTime) / 60000);
console.log(` ✅ Batch ${batchId.slice(-8)} completed in ${elapsedMinutes} minutes`);
console.log(` ✅ Batch ${batchId.slice(-8)} completed successfully in ${elapsedMinutes} minutes`);
logger.info(`Batch job ${batchId} completed successfully`);
return batchJob;
}
if (['failed', 'expired', 'cancelled'].includes(batchJob.status)) {
logger.error(`Batch job ${batchId} failed with status: ${batchJob.status}`);
throw new Error(`Batch job failed with status: ${batchJob.status}`);
}
// Wait before next check
const waitTime = waitTimes[Math.min(waitIndex, waitTimes.length - 1)];
logger.debug(`Waiting ${waitTime} seconds before next check...`);
await this.sleep(waitTime * 1000);
waitIndex = Math.min(waitIndex + 1, waitTimes.length - 1);
// Wait before next check (always 1 minute)
logger.debug(`Waiting ${pollInterval} seconds before next check...`);
await this.sleep(pollInterval * 1000);
attempts++;
}
throw new Error(`Batch job monitoring timed out after ${maxAttempts} attempts`);
throw new Error(`Batch job monitoring timed out after ${maxAttempts} minutes`);
}
/**
* Retrieve and parse results
*/
private async retrieveResults(batchJob: any): Promise<MetadataResult[]> {
if (!batchJob.output_file_id) {
throw new Error('No output file available for batch job');
}
// Download result file
const fileResponse = await this.client.files.content(batchJob.output_file_id);
const fileContent = await fileResponse.text();
// Parse JSONL results
const results: MetadataResult[] = [];
const lines = fileContent.trim().split('\n');
for (const line of lines) {
if (!line) continue;
// Check if we have an output file (successful results)
if (batchJob.output_file_id) {
const fileResponse = await this.client.files.content(batchJob.output_file_id);
const fileContent = await fileResponse.text();
const lines = fileContent.trim().split('\n');
for (const line of lines) {
if (!line) continue;
try {
const result = JSON.parse(line);
const parsed = this.generator.parseResult(result);
results.push(parsed);
} catch (error) {
logger.error('Error parsing result line:', error);
}
}
logger.info(`Retrieved ${results.length} successful results from batch job`);
}
// Check if we have an error file (failed results)
if (batchJob.error_file_id) {
logger.warn(`Batch job has error file: ${batchJob.error_file_id}`);
try {
const result = JSON.parse(line);
const parsed = this.generator.parseResult(result);
results.push(parsed);
const errorResponse = await this.client.files.content(batchJob.error_file_id);
const errorContent = await errorResponse.text();
// Save error file locally for debugging
const errorFilePath = path.join(this.outputDir, `batch_${batchJob.id}_error.jsonl`);
fs.writeFileSync(errorFilePath, errorContent);
logger.warn(`Error file saved to: ${errorFilePath}`);
// Parse errors and create default metadata for failed templates
const errorLines = errorContent.trim().split('\n');
logger.warn(`Found ${errorLines.length} failed requests in error file`);
for (const line of errorLines) {
if (!line) continue;
try {
const errorResult = JSON.parse(line);
const templateId = parseInt(errorResult.custom_id?.replace('template-', '') || '0');
if (templateId > 0) {
const errorMessage = errorResult.response?.body?.error?.message ||
errorResult.error?.message ||
'Unknown error';
logger.debug(`Template ${templateId} failed: ${errorMessage}`);
// Use getDefaultMetadata() from generator (it's private but accessible via bracket notation)
const defaultMeta = (this.generator as any).getDefaultMetadata();
results.push({
templateId,
metadata: defaultMeta,
error: errorMessage
});
}
} catch (parseError) {
logger.error('Error parsing error line:', parseError);
}
}
} catch (error) {
logger.error('Error parsing result line:', error);
logger.error('Failed to process error file:', error);
}
}
logger.info(`Retrieved ${results.length} results from batch job`);
// If we have no results at all, something is very wrong
if (results.length === 0 && !batchJob.output_file_id && !batchJob.error_file_id) {
throw new Error('No output file or error file available for batch job');
}
logger.info(`Total results (successful + failed): ${results.length}`);
return results;
}

metadata-generator.ts

@@ -34,7 +34,7 @@ export class MetadataGenerator {
private client: OpenAI;
private model: string;
constructor(apiKey: string, model: string = 'gpt-4o-mini') {
constructor(apiKey: string, model: string = 'gpt-5-mini-2025-08-07') {
this.client = new OpenAI({ apiKey });
this.model = model;
}
@@ -131,8 +131,8 @@ export class MetadataGenerator {
url: '/v1/chat/completions',
body: {
model: this.model,
temperature: 0.3, // Lower temperature for more consistent structured outputs
max_completion_tokens: 1000,
// temperature removed - batch API only supports default (1.0) for this model
max_completion_tokens: 3000,
response_format: {
type: 'json_schema',
json_schema: this.getJsonSchema()
@@ -288,8 +288,8 @@ export class MetadataGenerator {
try {
const completion = await this.client.chat.completions.create({
model: this.model,
temperature: 0.3, // Lower temperature for more consistent structured outputs
max_completion_tokens: 1000,
// temperature removed - not supported in batch API for this model
max_completion_tokens: 3000,
response_format: {
type: 'json_schema',
json_schema: this.getJsonSchema()

template-sanitizer.ts

@@ -19,11 +19,17 @@ export const defaultSanitizerConfig: SanitizerConfig = {
tokenPatterns: [
/apify_api_[A-Za-z0-9]+/g,
/sk-[A-Za-z0-9]+/g, // OpenAI tokens
/pat[A-Za-z0-9_]{40,}/g, // Airtable Personal Access Tokens
/ghp_[A-Za-z0-9]{36,}/g, // GitHub Personal Access Tokens
/gho_[A-Za-z0-9]{36,}/g, // GitHub OAuth tokens
/Bearer\s+[A-Za-z0-9\-._~+\/]+=*/g // Generic bearer tokens
],
replacements: new Map([
['apify_api_', 'apify_api_YOUR_TOKEN_HERE'],
['sk-', 'sk-YOUR_OPENAI_KEY_HERE'],
['pat', 'patYOUR_AIRTABLE_TOKEN_HERE'],
['ghp_', 'ghp_YOUR_GITHUB_TOKEN_HERE'],
['gho_', 'gho_YOUR_GITHUB_TOKEN_HERE'],
['Bearer ', 'Bearer YOUR_TOKEN_HERE']
])
};
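For context, a minimal sketch of applying patterns like these to a serialized workflow (the real TemplateSanitizer walks the workflow object; plain regex replacement over the JSON string is shown only to illustrate the idea):
```typescript
function sanitizeJson(workflowJson: string): { sanitized: string; wasModified: boolean } {
  const sanitized = workflowJson
    .replace(/apify_api_[A-Za-z0-9]+/g, 'apify_api_YOUR_TOKEN_HERE')
    .replace(/sk-[A-Za-z0-9]+/g, 'sk-YOUR_OPENAI_KEY_HERE')
    .replace(/pat[A-Za-z0-9_]{40,}/g, 'patYOUR_AIRTABLE_TOKEN_HERE')
    .replace(/ghp_[A-Za-z0-9]{36,}/g, 'ghp_YOUR_GITHUB_TOKEN_HERE')
    .replace(/gho_[A-Za-z0-9]{36,}/g, 'gho_YOUR_GITHUB_TOKEN_HERE');
  return { sanitized, wasModified: sanitized !== workflowJson };
}
```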

batch-processor.test.ts

@@ -71,7 +71,7 @@ describe('BatchProcessor', () => {
options = {
apiKey: 'test-api-key',
model: 'gpt-4o-mini',
model: 'gpt-5-mini-2025-08-07',
batchSize: 3,
outputDir: './test-temp'
};
@@ -177,13 +177,38 @@ describe('BatchProcessor', () => {
it('should handle batch submission errors gracefully', async () => {
mockClient.files.create.mockRejectedValue(new Error('Upload failed'));
const results = await processor.processTemplates([mockTemplates[0]]);
// Should not throw, should return empty results
expect(results.size).toBe(0);
});
it('should log submission errors to console and logger', async () => {
const consoleErrorSpy = vi.spyOn(console, 'error');
const { logger } = await import('../../../src/utils/logger');
const loggerErrorSpy = vi.spyOn(logger, 'error');
mockClient.files.create.mockRejectedValue(new Error('Network error'));
await processor.processTemplates([mockTemplates[0]]);
// Should log error to console (actual format from line 95: " ❌ Batch N failed:", error)
expect(consoleErrorSpy).toHaveBeenCalledWith(
expect.stringContaining('Batch'),
expect.objectContaining({ message: 'Network error' })
);
// Should also log to logger (line 94)
expect(loggerErrorSpy).toHaveBeenCalledWith(
expect.stringMatching(/Error processing batch/),
expect.objectContaining({ message: 'Network error' })
);
consoleErrorSpy.mockRestore();
loggerErrorSpy.mockRestore();
});
// Skipping: Parallel batch processing creates unhandled promise rejections in tests
// The error handling works in production but the parallel promise structure is
// difficult to test cleanly without refactoring the implementation
@@ -368,7 +393,7 @@ describe('BatchProcessor', () => {
it('should download and parse results correctly', async () => {
const batchJob = { output_file_id: 'output-123' };
const fileContent = '{"custom_id": "template-1"}\n{"custom_id": "template-2"}';
mockClient.files.content.mockResolvedValue({
text: () => Promise.resolve(fileContent)
});
@@ -377,7 +402,7 @@ describe('BatchProcessor', () => {
{ templateId: 1, metadata: { categories: ['test'] } },
{ templateId: 2, metadata: { categories: ['test2'] } }
];
mockGenerator.parseResult.mockReturnValueOnce(mockResults[0])
.mockReturnValueOnce(mockResults[1]);
@@ -389,17 +414,17 @@ describe('BatchProcessor', () => {
});
it('should throw error when no output file available', async () => {
const batchJob = { output_file_id: null };
const batchJob = { output_file_id: null, error_file_id: null };
await expect(
(processor as any).retrieveResults(batchJob)
).rejects.toThrow('No output file available for batch job');
).rejects.toThrow('No output file or error file available for batch job');
});
it('should handle malformed result lines gracefully', async () => {
const batchJob = { output_file_id: 'output-123' };
const fileContent = '{"valid": "json"}\ninvalid json line\n{"another": "valid"}';
mockClient.files.content.mockResolvedValue({
text: () => Promise.resolve(fileContent)
});
@@ -422,6 +447,227 @@ describe('BatchProcessor', () => {
(processor as any).retrieveResults(batchJob)
).rejects.toThrow('Download failed');
});
it('should process error file when present', async () => {
const batchJob = {
id: 'batch-123',
output_file_id: 'output-123',
error_file_id: 'error-456'
};
const outputContent = '{"custom_id": "template-1"}';
const errorContent = '{"custom_id": "template-2", "error": {"message": "Rate limit exceeded"}}\n{"custom_id": "template-3", "response": {"body": {"error": {"message": "Invalid request"}}}}';
mockClient.files.content
.mockResolvedValueOnce({ text: () => Promise.resolve(outputContent) })
.mockResolvedValueOnce({ text: () => Promise.resolve(errorContent) });
mockedFs.writeFileSync = vi.fn();
const successResult = { templateId: 1, metadata: { categories: ['success'] } };
mockGenerator.parseResult.mockReturnValue(successResult);
// Mock getDefaultMetadata
const defaultMetadata = {
categories: ['General'],
complexity: 'medium',
estimatedSetupMinutes: 15,
useCases: [],
requiredServices: [],
targetAudience: []
};
(processor as any).generator.getDefaultMetadata = vi.fn().mockReturnValue(defaultMetadata);
const results = await (processor as any).retrieveResults(batchJob);
// Should have 1 successful + 2 failed results
expect(results).toHaveLength(3);
expect(mockClient.files.content).toHaveBeenCalledWith('output-123');
expect(mockClient.files.content).toHaveBeenCalledWith('error-456');
expect(mockedFs.writeFileSync).toHaveBeenCalled();
// Check error file was saved
const savedPath = (mockedFs.writeFileSync as any).mock.calls[0][0];
expect(savedPath).toContain('batch_batch-123_error.jsonl');
});
it('should handle error file with empty lines', async () => {
const batchJob = {
id: 'batch-789',
error_file_id: 'error-789'
};
const errorContent = '\n{"custom_id": "template-1", "error": {"message": "Failed"}}\n\n{"custom_id": "template-2", "error": {"message": "Error"}}\n';
mockClient.files.content.mockResolvedValue({
text: () => Promise.resolve(errorContent)
});
mockedFs.writeFileSync = vi.fn();
const defaultMetadata = {
categories: ['General'],
complexity: 'medium',
estimatedSetupMinutes: 15,
useCases: [],
requiredServices: [],
targetAudience: []
};
(processor as any).generator.getDefaultMetadata = vi.fn().mockReturnValue(defaultMetadata);
const results = await (processor as any).retrieveResults(batchJob);
// Should skip empty lines and process only valid ones
expect(results).toHaveLength(2);
expect(results[0].templateId).toBe(1);
expect(results[0].error).toBe('Failed');
expect(results[1].templateId).toBe(2);
expect(results[1].error).toBe('Error');
});
it('should assign default metadata to failed templates', async () => {
const batchJob = {
error_file_id: 'error-456'
};
const errorContent = '{"custom_id": "template-42", "error": {"message": "Timeout"}}';
mockClient.files.content.mockResolvedValue({
text: () => Promise.resolve(errorContent)
});
mockedFs.writeFileSync = vi.fn();
const defaultMetadata = {
categories: ['General'],
complexity: 'medium',
estimatedSetupMinutes: 15,
useCases: ['General automation'],
requiredServices: [],
targetAudience: ['Developers']
};
(processor as any).generator.getDefaultMetadata = vi.fn().mockReturnValue(defaultMetadata);
const results = await (processor as any).retrieveResults(batchJob);
expect(results).toHaveLength(1);
expect(results[0].templateId).toBe(42);
expect(results[0].metadata).toEqual(defaultMetadata);
expect(results[0].error).toBe('Timeout');
});
it('should handle malformed error lines gracefully', async () => {
const batchJob = {
error_file_id: 'error-999'
};
const errorContent = '{"custom_id": "template-1", "error": {"message": "Valid error"}}\ninvalid json\n{"invalid": "no custom_id"}\n{"custom_id": "template-2", "error": {"message": "Another valid"}}';
mockClient.files.content.mockResolvedValue({
text: () => Promise.resolve(errorContent)
});
mockedFs.writeFileSync = vi.fn();
const defaultMetadata = { categories: ['General'] };
(processor as any).generator.getDefaultMetadata = vi.fn().mockReturnValue(defaultMetadata);
const results = await (processor as any).retrieveResults(batchJob);
// Should only process valid error lines with template IDs
expect(results).toHaveLength(2);
expect(results[0].templateId).toBe(1);
expect(results[1].templateId).toBe(2);
});
it('should extract error message from response body', async () => {
const batchJob = {
error_file_id: 'error-123'
};
const errorContent = '{"custom_id": "template-5", "response": {"body": {"error": {"message": "API error from response body"}}}}';
mockClient.files.content.mockResolvedValue({
text: () => Promise.resolve(errorContent)
});
mockedFs.writeFileSync = vi.fn();
const defaultMetadata = { categories: ['General'] };
(processor as any).generator.getDefaultMetadata = vi.fn().mockReturnValue(defaultMetadata);
const results = await (processor as any).retrieveResults(batchJob);
expect(results).toHaveLength(1);
expect(results[0].error).toBe('API error from response body');
});
it('should use unknown error when no error message found', async () => {
const batchJob = {
error_file_id: 'error-000'
};
const errorContent = '{"custom_id": "template-10"}';
mockClient.files.content.mockResolvedValue({
text: () => Promise.resolve(errorContent)
});
mockedFs.writeFileSync = vi.fn();
const defaultMetadata = { categories: ['General'] };
(processor as any).generator.getDefaultMetadata = vi.fn().mockReturnValue(defaultMetadata);
const results = await (processor as any).retrieveResults(batchJob);
expect(results).toHaveLength(1);
expect(results[0].error).toBe('Unknown error');
});
it('should handle error file download failure gracefully', async () => {
const batchJob = {
output_file_id: 'output-123',
error_file_id: 'error-failed'
};
const outputContent = '{"custom_id": "template-1"}';
mockClient.files.content
.mockResolvedValueOnce({ text: () => Promise.resolve(outputContent) })
.mockRejectedValueOnce(new Error('Error file download failed'));
const successResult = { templateId: 1, metadata: { categories: ['success'] } };
mockGenerator.parseResult.mockReturnValue(successResult);
const results = await (processor as any).retrieveResults(batchJob);
// Should still return successful results even if error file fails
expect(results).toHaveLength(1);
expect(results[0].templateId).toBe(1);
});
it('should skip templates with invalid or zero ID in error file', async () => {
const batchJob = {
error_file_id: 'error-invalid'
};
const errorContent = '{"custom_id": "template-0", "error": {"message": "Zero ID"}}\n{"custom_id": "invalid-id", "error": {"message": "Invalid"}}\n{"custom_id": "template-5", "error": {"message": "Valid ID"}}';
mockClient.files.content.mockResolvedValue({
text: () => Promise.resolve(errorContent)
});
mockedFs.writeFileSync = vi.fn();
const defaultMetadata = { categories: ['General'] };
(processor as any).generator.getDefaultMetadata = vi.fn().mockReturnValue(defaultMetadata);
const results = await (processor as any).retrieveResults(batchJob);
// Should only include template with valid ID > 0
expect(results).toHaveLength(1);
expect(results[0].templateId).toBe(5);
});
});
describe('cleanup', () => {
@@ -526,7 +772,7 @@ describe('BatchProcessor', () => {
mockClient.files.create.mockRejectedValue(new Error('Upload failed'));
const submitBatch = (processor as any).submitBatch.bind(processor);
await expect(
submitBatch(templates, 'error_test')
).rejects.toThrow('Upload failed');
@@ -544,7 +790,7 @@ describe('BatchProcessor', () => {
// Mock successful processing
mockClient.files.create.mockResolvedValue({ id: 'file-123' });
const completedJob = {
const completedJob = {
id: 'batch-123',
status: 'completed',
output_file_id: 'output-123'
@@ -565,4 +811,391 @@ describe('BatchProcessor', () => {
expect(mockClient.batches.create).toHaveBeenCalled();
});
});
describe('submitBatch', () => {
it('should clean up input file immediately after upload', async () => {
const templates = [{ templateId: 1, name: 'Test', nodes: ['node1'] }];
mockClient.files.create.mockResolvedValue({ id: 'file-123' });
const completedJob = {
id: 'batch-123',
status: 'completed',
output_file_id: 'output-123'
};
mockClient.batches.create.mockResolvedValue(completedJob);
mockClient.batches.retrieve.mockResolvedValue(completedJob);
// Mock sleep to speed up test
(processor as any).sleep = vi.fn().mockResolvedValue(undefined);
const promise = (processor as any).submitBatch(templates, 'test_batch');
// Wait a bit for synchronous cleanup
await new Promise(resolve => setTimeout(resolve, 10));
// Input file should be deleted immediately
expect(mockedFs.unlinkSync).toHaveBeenCalled();
await promise;
});
it('should clean up OpenAI files after batch completion', async () => {
const templates = [{ templateId: 1, name: 'Test', nodes: ['node1'] }];
mockClient.files.create.mockResolvedValue({ id: 'file-upload-123' });
const completedJob = {
id: 'batch-123',
status: 'completed',
output_file_id: 'output-123'
};
mockClient.batches.create.mockResolvedValue(completedJob);
mockClient.batches.retrieve.mockResolvedValue(completedJob);
// Mock sleep to speed up test
(processor as any).sleep = vi.fn().mockResolvedValue(undefined);
await (processor as any).submitBatch(templates, 'cleanup_test');
// Wait for promise chain to complete
await new Promise(resolve => setTimeout(resolve, 50));
// Should have attempted to delete the input file
expect(mockClient.files.del).toHaveBeenCalledWith('file-upload-123');
});
it('should handle cleanup errors gracefully', async () => {
const templates = [{ templateId: 1, name: 'Test', nodes: ['node1'] }];
mockClient.files.create.mockResolvedValue({ id: 'file-123' });
mockClient.files.del.mockRejectedValue(new Error('Delete failed'));
const completedJob = {
id: 'batch-123',
status: 'completed'
};
mockClient.batches.create.mockResolvedValue(completedJob);
mockClient.batches.retrieve.mockResolvedValue(completedJob);
// Mock sleep to speed up test
(processor as any).sleep = vi.fn().mockResolvedValue(undefined);
// Should not throw even if cleanup fails
await expect(
(processor as any).submitBatch(templates, 'error_cleanup')
).resolves.toBeDefined();
});
it('should handle local file cleanup errors silently', async () => {
const templates = [{ templateId: 1, name: 'Test', nodes: ['node1'] }];
mockedFs.unlinkSync = vi.fn().mockImplementation(() => {
throw new Error('Cannot delete file');
});
mockClient.files.create.mockResolvedValue({ id: 'file-123' });
const completedJob = {
id: 'batch-123',
status: 'completed'
};
mockClient.batches.create.mockResolvedValue(completedJob);
mockClient.batches.retrieve.mockResolvedValue(completedJob);
// Mock sleep to speed up test
(processor as any).sleep = vi.fn().mockResolvedValue(undefined);
// Should not throw even if local cleanup fails
await expect(
(processor as any).submitBatch(templates, 'local_cleanup_error')
).resolves.toBeDefined();
});
});
describe('progress callback', () => {
it('should call progress callback during batch submission', async () => {
const templates = [
{ templateId: 1, name: 'T1', nodes: ['node1'] },
{ templateId: 2, name: 'T2', nodes: ['node2'] },
{ templateId: 3, name: 'T3', nodes: ['node3'] },
{ templateId: 4, name: 'T4', nodes: ['node4'] }
];
mockClient.files.create.mockResolvedValue({ id: 'file-123' });
const completedJob = {
id: 'batch-123',
status: 'completed',
output_file_id: 'output-123'
};
mockClient.batches.create.mockResolvedValue(completedJob);
mockClient.batches.retrieve.mockResolvedValue(completedJob);
mockClient.files.content.mockResolvedValue({
text: () => Promise.resolve('{"custom_id": "template-1"}')
});
mockGenerator.parseResult.mockReturnValue({
templateId: 1,
metadata: { categories: ['test'] }
});
const progressCallback = vi.fn();
await processor.processTemplates(templates, progressCallback);
// Should be called during submission and retrieval
expect(progressCallback).toHaveBeenCalled();
expect(progressCallback.mock.calls.some((call: any) =>
call[0].includes('Submitting')
)).toBe(true);
});
it('should work without progress callback', async () => {
const templates = [{ templateId: 1, name: 'T1', nodes: ['node1'] }];
mockClient.files.create.mockResolvedValue({ id: 'file-123' });
const completedJob = {
id: 'batch-123',
status: 'completed',
output_file_id: 'output-123'
};
mockClient.batches.create.mockResolvedValue(completedJob);
mockClient.batches.retrieve.mockResolvedValue(completedJob);
mockClient.files.content.mockResolvedValue({
text: () => Promise.resolve('{"custom_id": "template-1"}')
});
mockGenerator.parseResult.mockReturnValue({
templateId: 1,
metadata: { categories: ['test'] }
});
// Should not throw without callback
await expect(
processor.processTemplates(templates)
).resolves.toBeDefined();
});
it('should call progress callback with correct parameters', async () => {
const templates = [
{ templateId: 1, name: 'T1', nodes: ['node1'] },
{ templateId: 2, name: 'T2', nodes: ['node2'] }
];
mockClient.files.create.mockResolvedValue({ id: 'file-123' });
const completedJob = {
id: 'batch-123',
status: 'completed',
output_file_id: 'output-123'
};
mockClient.batches.create.mockResolvedValue(completedJob);
mockClient.batches.retrieve.mockResolvedValue(completedJob);
mockClient.files.content.mockResolvedValue({
text: () => Promise.resolve('{"custom_id": "template-1"}')
});
mockGenerator.parseResult.mockReturnValue({
templateId: 1,
metadata: { categories: ['test'] }
});
const progressCallback = vi.fn();
await processor.processTemplates(templates, progressCallback);
// Check that callback was called with proper arguments
const submissionCall = progressCallback.mock.calls.find((call: any) =>
call[0].includes('Submitting')
);
expect(submissionCall).toBeDefined();
if (submissionCall) {
expect(submissionCall[1]).toBeGreaterThanOrEqual(0);
expect(submissionCall[2]).toBe(2);
}
});
});
describe('batch result merging', () => {
it('should merge results from multiple batches', async () => {
const templates = Array.from({ length: 6 }, (_, i) => ({
templateId: i + 1,
name: `T${i + 1}`,
nodes: ['node']
}));
mockClient.files.create.mockResolvedValue({ id: 'file-123' });
// Create different completed jobs for each batch
let batchCounter = 0;
mockClient.batches.create.mockImplementation(() => {
batchCounter++;
return Promise.resolve({
id: `batch-${batchCounter}`,
status: 'completed',
output_file_id: `output-${batchCounter}`
});
});
mockClient.batches.retrieve.mockImplementation((id: string) => {
return Promise.resolve({
id,
status: 'completed',
output_file_id: `output-${id.split('-')[1]}`
});
});
let fileCounter = 0;
mockClient.files.content.mockImplementation(() => {
fileCounter++;
return Promise.resolve({
text: () => Promise.resolve(`{"custom_id": "template-${fileCounter}"}`)
});
});
mockGenerator.parseResult.mockImplementation((result: any) => {
const id = parseInt(result.custom_id.split('-')[1]);
return {
templateId: id,
metadata: { categories: [`batch-${Math.ceil(id / 3)}`] }
};
});
const results = await processor.processTemplates(templates);
// Should have results from both batches (6 templates, batchSize=3)
expect(results.size).toBeGreaterThan(0);
expect(mockClient.batches.create).toHaveBeenCalledTimes(2);
});
it('should handle empty batch results', async () => {
const templates = [
{ templateId: 1, name: 'T1', nodes: ['node'] },
{ templateId: 2, name: 'T2', nodes: ['node'] }
];
mockClient.files.create.mockResolvedValue({ id: 'file-123' });
const completedJob = {
id: 'batch-123',
status: 'completed',
output_file_id: 'output-123'
};
mockClient.batches.create.mockResolvedValue(completedJob);
mockClient.batches.retrieve.mockResolvedValue(completedJob);
// Return empty content
mockClient.files.content.mockResolvedValue({
text: () => Promise.resolve('')
});
const results = await processor.processTemplates(templates);
// Should handle empty results gracefully
expect(results.size).toBe(0);
});
});
describe('sleep', () => {
it('should delay for specified milliseconds', async () => {
const start = Date.now();
await (processor as any).sleep(100);
const elapsed = Date.now() - start;
expect(elapsed).toBeGreaterThanOrEqual(95);
expect(elapsed).toBeLessThan(150);
});
});
describe('processBatch (legacy method)', () => {
it('should process a single batch synchronously', async () => {
const templates = [
{ templateId: 1, name: 'Test1', nodes: ['node1'] },
{ templateId: 2, name: 'Test2', nodes: ['node2'] }
];
mockClient.files.create.mockResolvedValue({ id: 'file-abc' });
const completedJob = {
id: 'batch-xyz',
status: 'completed',
output_file_id: 'output-xyz'
};
mockClient.batches.create.mockResolvedValue(completedJob);
mockClient.batches.retrieve.mockResolvedValue(completedJob);
const fileContent = '{"custom_id": "template-1"}\n{"custom_id": "template-2"}';
mockClient.files.content.mockResolvedValue({
text: () => Promise.resolve(fileContent)
});
const mockResults = [
{ templateId: 1, metadata: { categories: ['test1'] } },
{ templateId: 2, metadata: { categories: ['test2'] } }
];
mockGenerator.parseResult.mockReturnValueOnce(mockResults[0])
.mockReturnValueOnce(mockResults[1]);
// Mock sleep to speed up test
(processor as any).sleep = vi.fn().mockResolvedValue(undefined);
const results = await (processor as any).processBatch(templates, 'legacy_test');
expect(results).toHaveLength(2);
expect(results[0].templateId).toBe(1);
expect(results[1].templateId).toBe(2);
expect(mockClient.batches.create).toHaveBeenCalled();
});
it('should clean up files after processing', async () => {
const templates = [{ templateId: 1, name: 'Test', nodes: ['node1'] }];
mockClient.files.create.mockResolvedValue({ id: 'file-clean' });
const completedJob = {
id: 'batch-clean',
status: 'completed',
output_file_id: 'output-clean'
};
mockClient.batches.create.mockResolvedValue(completedJob);
mockClient.batches.retrieve.mockResolvedValue(completedJob);
mockClient.files.content.mockResolvedValue({
text: () => Promise.resolve('{"custom_id": "template-1"}')
});
mockGenerator.parseResult.mockReturnValue({
templateId: 1,
metadata: { categories: ['test'] }
});
// Mock sleep to speed up test
(processor as any).sleep = vi.fn().mockResolvedValue(undefined);
await (processor as any).processBatch(templates, 'cleanup_test');
// Should clean up all files
expect(mockedFs.unlinkSync).toHaveBeenCalled();
expect(mockClient.files.del).toHaveBeenCalledWith('file-clean');
expect(mockClient.files.del).toHaveBeenCalledWith('output-clean');
});
it('should clean up local file on error', async () => {
const templates = [{ templateId: 1, name: 'Test', nodes: ['node1'] }];
mockClient.files.create.mockRejectedValue(new Error('Upload failed'));
await expect(
(processor as any).processBatch(templates, 'error_test')
).rejects.toThrow('Upload failed');
// Should clean up local file even on error
expect(mockedFs.unlinkSync).toHaveBeenCalled();
});
it('should handle batch job monitoring errors', async () => {
const templates = [{ templateId: 1, name: 'Test', nodes: ['node1'] }];
mockClient.files.create.mockResolvedValue({ id: 'file-123' });
mockClient.batches.create.mockResolvedValue({ id: 'batch-123' });
mockClient.batches.retrieve.mockResolvedValue({
id: 'batch-123',
status: 'failed'
});
await expect(
(processor as any).processBatch(templates, 'failed_batch')
).rejects.toThrow('Batch job failed with status: failed');
// Should still attempt cleanup
expect(mockedFs.unlinkSync).toHaveBeenCalled();
});
});
});

metadata-generator.test.ts

@@ -18,7 +18,7 @@ describe('MetadataGenerator', () => {
let generator: MetadataGenerator;
beforeEach(() => {
generator = new MetadataGenerator('test-api-key', 'gpt-4o-mini');
generator = new MetadataGenerator('test-api-key', 'gpt-5-mini-2025-08-07');
});
describe('createBatchRequest', () => {
@@ -35,7 +35,7 @@ describe('MetadataGenerator', () => {
expect(request.custom_id).toBe('template-123');
expect(request.method).toBe('POST');
expect(request.url).toBe('/v1/chat/completions');
expect(request.body.model).toBe('gpt-4o-mini');
expect(request.body.model).toBe('gpt-5-mini-2025-08-07');
expect(request.body.response_format.type).toBe('json_schema');
expect(request.body.response_format.json_schema.strict).toBe(true);
expect(request.body.messages).toHaveLength(2);
@@ -217,7 +217,7 @@ describe('MetadataGenerator', () => {
// but should not cause any injection in our code
expect(userMessage).toContain('<script>alert("xss")</script>');
expect(userMessage).toContain('javascript:alert(1)');
expect(request.body.model).toBe('gpt-4o-mini');
expect(request.body.model).toBe('gpt-5-mini-2025-08-07');
});
it('should handle extremely long template names', () => {