Mirror of https://github.com/czlonkowski/n8n-mcp.git, synced 2026-02-06 13:33:11 +00:00
fix: critical telemetry improvements for data quality and security (#421)
* fix: critical telemetry improvements for data quality and security

Fixed three critical issues in workflow mutation telemetry:

1. Fixed Inconsistent Sanitization (Security Critical)
   - Problem: 30% of workflows unsanitized, exposing credentials/tokens
   - Solution: Use robust WorkflowSanitizer.sanitizeWorkflowRaw()
   - Impact: 100% sanitization with 17 sensitive patterns redacted
   - Files: workflow-sanitizer.ts, mutation-tracker.ts

2. Enabled Validation Data Capture (Data Quality)
   - Problem: Zero validation metrics captured (all NULL)
   - Solution: Add pre/post mutation validation with WorkflowValidator
   - Impact: Measure mutation quality, track error resolution
   - Non-blocking validation that captures errors/warnings
   - Files: handlers-workflow-diff.ts

3. Improved Intent Capture (Data Quality)
   - Problem: 92.62% generic "Partial workflow update" intents
   - Solution: Enhanced docs + automatic intent inference
   - Impact: Meaningful intents auto-generated from operations
   - Files: n8n-update-partial-workflow.ts, handlers-workflow-diff.ts

Expected Results:
- 100% sanitization coverage (up from 70%)
- 100% validation capture (up from 0%)
- 50%+ meaningful intents (up from 7.33%)

Version bumped to 2.22.17

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Conceived by Romuald Członkowski - https://www.aiadvisors.pl/en
Co-Authored-By: Claude <noreply@anthropic.com>

* perf: implement validator instance caching to avoid redundant initialization

- Add a module-level cached WorkflowValidator instance
- Create a getValidator() helper to reuse the validator across mutations
- Update pre/post mutation validation to use the cached instance
- Avoids redundant NodeSimilarityService initialization on every mutation

Conceived by Romuald Członkowski - https://www.aiadvisors.pl/en
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

* fix: restore backward-compatible sanitization with context preservation

Fixed CI test failures by updating WorkflowSanitizer to use pattern-specific placeholders while maintaining backward compatibility.

Changes:
- Convert SENSITIVE_PATTERNS to PatternDefinition objects with specific placeholders
- Update sanitizeString() to preserve context (Bearer prefix, URL paths)
- Refactor sanitizeObject() to handle sensitive fields and URL fields differently
- Remove overly greedy field patterns that conflicted with token patterns

Pattern-specific placeholders:
- [REDACTED_URL_WITH_AUTH] for URLs with credentials
- [REDACTED_TOKEN] for long tokens (32+ chars)
- [REDACTED_APIKEY] for OpenAI-style keys
- Bearer [REDACTED] for Bearer tokens (preserves the "Bearer " prefix)
- [REDACTED] for generic sensitive fields

Test Results:
- All 13 mutation-tracker tests passing
- URL with auth: preserves the path after credentials
- Long tokens: properly detected and marked
- OpenAI keys: correctly identified
- Bearer tokens: prefix preserved
- Sensitive field names: generic redaction for non-URL fields

Fixes #421 CI failures

Conceived by Romuald Członkowski - https://www.aiadvisors.pl/en
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

* fix: prevent double-redaction in workflow sanitizer

Added a safeguard that stops pattern matching once a placeholder is detected, preventing token patterns from matching text inside placeholders like [REDACTED_URL_WITH_AUTH]. Also expanded the database URL pattern to match full URLs including port and path, and updated test expectations to match context-preserving sanitization.

Fixes:
- Database URLs now properly sanitized to [REDACTED_URL_WITH_AUTH]
- Prevents the [[REDACTED]] double-redaction issue
- All 25 workflow-sanitizer tests passing
- No regression in mutation-tracker tests

Conceived by Romuald Członkowski - www.aiadvisors.pl/en

---------

Co-authored-by: Claude <noreply@anthropic.com>
Committed by: GitHub
Parent: 99c5907b71
Commit: 597bd290b6
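
The validator caching introduced in the "perf" commit above can be pictured with a minimal sketch. WorkflowValidator, NodeSimilarityService, and getValidator() are named in the commit message, but their import paths, constructor arguments, and validation API are not shown in this diff, so the class below is only a stand-in with an assumed validateWorkflow() method.

```typescript
// Minimal sketch (not the repo's actual code) of the module-level validator cache.
// WorkflowValidator here is a stand-in; the real constructor and validation signature
// are assumptions, not confirmed by the diff.
type ValidationResult = { valid: boolean; errors: unknown[]; warnings: unknown[] };

class WorkflowValidator {
  // Constructing the real validator initializes NodeSimilarityService, which is the
  // expensive step the cache is meant to avoid repeating.
  async validateWorkflow(workflow: object): Promise<ValidationResult> {
    return { valid: true, errors: [], warnings: [] };
  }
}

let cachedValidator: WorkflowValidator | null = null;

function getValidator(): WorkflowValidator {
  // Lazily create the validator once, then reuse it for every subsequent mutation.
  if (!cachedValidator) {
    cachedValidator = new WorkflowValidator();
  }
  return cachedValidator;
}

// Non-blocking pre/post mutation validation: results feed telemetry, and failures are
// swallowed so validation can never block the workflow mutation itself.
async function captureValidation(workflow: object): Promise<ValidationResult | undefined> {
  try {
    return await getValidator().validateWorkflow(workflow);
  } catch {
    return undefined;
  }
}
```

Calling captureValidation() before and after applying the diff operations yields the pre/post error and warning counts the telemetry records, without making the mutation depend on validation success.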
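
The intent-capture improvement relies on inferring an intent from the diff operations when the caller only supplies the generic default. The sketch below is hypothetical: the operation shape and the inferIntent() name are illustrative assumptions; the real operation schema lives in n8n-update-partial-workflow.ts and the inference in handlers-workflow-diff.ts.

```typescript
// Hypothetical sketch of automatic intent inference from diff operations.
// The operation type strings below are illustrative, not the repo's exact schema.
interface DiffOperation {
  type: string;      // e.g. 'addNode', 'removeNode', 'updateNode', 'addConnection'
  nodeName?: string; // present for node-level operations
}

function inferIntent(userIntent: string | undefined, operations: DiffOperation[]): string {
  // Keep any intent the caller actually wrote.
  if (userIntent && userIntent.trim() !== '' && userIntent !== 'Partial workflow update') {
    return userIntent;
  }

  // Otherwise summarize the operations, e.g. "addNode: Slack, updateNode: HTTP Request".
  const summary = operations
    .slice(0, 5)
    .map(op => (op.nodeName ? `${op.type}: ${op.nodeName}` : op.type))
    .join(', ');

  return summary.length > 0 ? summary : 'Partial workflow update';
}
```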
@@ -41,8 +41,8 @@ export class MutationTracker {
     }

     // Sanitize workflows to remove credentials and sensitive data
-    const workflowBefore = this.sanitizeFullWorkflow(data.workflowBefore);
-    const workflowAfter = this.sanitizeFullWorkflow(data.workflowAfter);
+    const workflowBefore = WorkflowSanitizer.sanitizeWorkflowRaw(data.workflowBefore);
+    const workflowAfter = WorkflowSanitizer.sanitizeWorkflowRaw(data.workflowAfter);

     // Sanitize user intent
     const sanitizedIntent = intentSanitizer.sanitize(data.userIntent);
@@ -200,98 +200,6 @@ export class MutationTracker {
     return metrics;
   }

-  /**
-   * Sanitize a full workflow while preserving structure
-   * Removes credentials and sensitive data but keeps all nodes, connections, parameters
-   */
-  private sanitizeFullWorkflow(workflow: any): any {
-    if (!workflow) return workflow;
-
-    // Deep clone to avoid modifying original
-    const sanitized = JSON.parse(JSON.stringify(workflow));
-
-    // Remove sensitive workflow-level fields
-    delete sanitized.credentials;
-    delete sanitized.sharedWorkflows;
-    delete sanitized.ownedBy;
-    delete sanitized.createdBy;
-    delete sanitized.updatedBy;
-
-    // Sanitize each node
-    if (sanitized.nodes && Array.isArray(sanitized.nodes)) {
-      sanitized.nodes = sanitized.nodes.map((node: any) => {
-        const sanitizedNode = { ...node };
-
-        // Remove credentials field
-        delete sanitizedNode.credentials;
-
-        // Sanitize parameters if present
-        if (sanitizedNode.parameters && typeof sanitizedNode.parameters === 'object') {
-          sanitizedNode.parameters = this.sanitizeParameters(sanitizedNode.parameters);
-        }
-
-        return sanitizedNode;
-      });
-    }
-
-    return sanitized;
-  }
-
-  /**
-   * Recursively sanitize parameters object
-   */
-  private sanitizeParameters(params: any): any {
-    if (!params || typeof params !== 'object') return params;
-
-    const sensitiveKeys = [
-      'apiKey', 'api_key', 'token', 'secret', 'password', 'credential',
-      'auth', 'authorization', 'privateKey', 'accessToken', 'refreshToken'
-    ];
-
-    const sanitized: any = Array.isArray(params) ? [] : {};
-
-    for (const [key, value] of Object.entries(params)) {
-      const lowerKey = key.toLowerCase();
-
-      // Check if key is sensitive
-      if (sensitiveKeys.some(sk => lowerKey.includes(sk.toLowerCase()))) {
-        sanitized[key] = '[REDACTED]';
-      } else if (typeof value === 'object' && value !== null) {
-        // Recursively sanitize nested objects
-        sanitized[key] = this.sanitizeParameters(value);
-      } else if (typeof value === 'string') {
-        // Sanitize string values that might contain sensitive data
-        sanitized[key] = this.sanitizeStringValue(value);
-      } else {
-        sanitized[key] = value;
-      }
-    }
-
-    return sanitized;
-  }
-
-  /**
-   * Sanitize string values that might contain sensitive data
-   */
-  private sanitizeStringValue(value: string): string {
-    if (!value || typeof value !== 'string') return value;
-
-    let sanitized = value;
-
-    // Redact URLs with authentication
-    sanitized = sanitized.replace(/https?:\/\/[^:]+:[^@]+@[^\s/]+/g, '[REDACTED_URL_WITH_AUTH]');
-
-    // Redact long API keys/tokens (20+ alphanumeric chars)
-    sanitized = sanitized.replace(/\b[A-Za-z0-9_-]{32,}\b/g, '[REDACTED_TOKEN]');
-
-    // Redact OpenAI-style keys
-    sanitized = sanitized.replace(/\bsk-[A-Za-z0-9]{32,}\b/g, '[REDACTED_APIKEY]');
-
-    // Redact Bearer tokens
-    sanitized = sanitized.replace(/Bearer\s+[^\s]+/gi, 'Bearer [REDACTED]');
-
-    return sanitized;
-  }
-
   /**
    * Calculate validation improvement metrics
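
The helpers removed above were superseded by WorkflowSanitizer's pattern-definition approach described in the commit message. The sketch below illustrates that idea together with the double-redaction safeguard; the PatternDefinition shape, the exact regexes, and the guard mechanics are simplified assumptions rather than the code in workflow-sanitizer.ts (which carries 17 patterns).

```typescript
// Simplified sketch of pattern-specific placeholders plus a double-redaction safeguard.
// The regexes and the guard below are illustrative assumptions, not the repo's code.
interface PatternDefinition {
  pattern: RegExp;
  placeholder: string;
}

const SENSITIVE_PATTERNS: PatternDefinition[] = [
  // URLs with embedded credentials; the host is redacted, the path after it is preserved
  { pattern: /https?:\/\/[^:\s]+:[^@\s]+@[^\s/]+/g, placeholder: '[REDACTED_URL_WITH_AUTH]' },
  // OpenAI-style keys, checked before the generic long-token rule
  { pattern: /\bsk-[A-Za-z0-9]{32,}\b/g, placeholder: '[REDACTED_APIKEY]' },
  // Bearer tokens; the replacement keeps the "Bearer " prefix
  { pattern: /Bearer\s+\S+/gi, placeholder: 'Bearer [REDACTED]' },
  // Generic long tokens (32+ chars)
  { pattern: /\b[A-Za-z0-9_-]{32,}\b/g, placeholder: '[REDACTED_TOKEN]' },
];

function sanitizeString(value: string): string {
  let sanitized = value;
  for (const { pattern, placeholder } of SENSITIVE_PATTERNS) {
    // Double-redaction safeguard: if a later pattern matches text that already contains a
    // placeholder (e.g. "Bearer [REDACTED_URL_WITH_AUTH]"), leave it untouched instead of
    // wrapping it again as [[REDACTED]].
    sanitized = sanitized.replace(pattern, (match) =>
      match.includes('REDACTED') ? match : placeholder
    );
  }
  return sanitized;
}
```

Field-level redaction of sensitive non-URL fields to the plain [REDACTED] placeholder is handled separately in sanitizeObject(), per the commit message, and is omitted from this sketch.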