fix: add structural hash tracking for workflow mutations (#422)

* feat: add structural hashes and success tracking for workflow mutations

Enables cross-referencing workflow_mutations with telemetry_workflows by adding structural hashes (nodeTypes + connections) alongside existing full hashes.
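
As a rough illustration of the two hash types, the sketch below hashes a canonicalized workflow twice: once in full, and once using only the sorted node types plus the connections map. It is not the code from mutation-tracker.ts. The `Workflow` shape, the key-sorting canonicalization, the use of SHA-256 and the names `fullHash`/`structuralHash` are assumptions made for the example.

```typescript
import { createHash } from "node:crypto";

// Hypothetical minimal shapes; real n8n workflows carry many more fields.
interface WorkflowNode {
  name: string;
  type: string;
  parameters?: Record<string, unknown>;
  position?: [number, number];
}

interface Workflow {
  nodes: WorkflowNode[];
  connections: Record<string, unknown>;
}

// Recursively sort object keys so that key order never affects the hash.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(obj).sort()) out[key] = canonicalize(obj[key]);
    return out;
  }
  return value;
}

function sha256(value: unknown): string {
  return createHash("sha256").update(JSON.stringify(canonicalize(value))).digest("hex");
}

// Full hash: covers every field, so any parameter or position edit changes it.
function fullHash(wf: Workflow): string {
  return sha256(wf);
}

// Structural hash: only the sorted node types and the connections map.
function structuralHash(wf: Workflow): string {
  const nodeTypes = wf.nodes.map((n) => n.type).sort();
  return sha256({ nodeTypes, connections: wf.connections });
}
```

With this design, editing a node parameter changes the full hash but leaves the structural hash untouched, which is the property that makes structure-based cross-referencing possible.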

**Database Changes:**
- Added workflow_structure_hash_before/after columns
- Added is_truly_successful computed column
- Created 3 analytics views: successful_mutations, mutation_training_data, mutations_with_workflow_quality
- Created 2 helper functions: get_mutation_success_rate_by_intent(), get_mutation_crossref_stats()
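
The commit does not spell out how is_truly_successful is computed. The sketch below assumes one plausible rule (the operation succeeded, introduced no new validation errors, and did not make validation worse) and aggregates it per intent classification, roughly in the spirit of get_mutation_success_rate_by_intent(). The rule, the `MutationRow` type and both function names are hypothetical; only the field names mirror columns of the workflow_mutations table.

```typescript
// Hypothetical subset of a workflow_mutations row, in camelCase.
interface MutationRow {
  intentClassification: string | null;
  mutationSuccess: boolean;
  validationImproved: boolean | null; // null when validation data is unavailable
  errorsIntroduced: number;
}

// ASSUMPTION: illustrative rule only; the real computed column may differ.
function isTrulySuccessful(m: MutationRow): boolean {
  return m.mutationSuccess && m.errorsIntroduced === 0 && m.validationImproved !== false;
}

// Share of truly successful mutations per intent (null intent grouped as "unknown").
function successRateByIntent(rows: MutationRow[]): Record<string, number> {
  const totals: Record<string, { ok: number; all: number }> = {};
  for (const row of rows) {
    const intent = row.intentClassification ?? "unknown";
    let t = totals[intent];
    if (t === undefined) {
      t = { ok: 0, all: 0 };
      totals[intent] = t;
    }
    t.all += 1;
    if (isTrulySuccessful(row)) t.ok += 1;
  }
  const rates: Record<string, number> = {};
  for (const intent of Object.keys(totals)) {
    rates[intent] = totals[intent].ok / totals[intent].all;
  }
  return rates;
}
```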

**Code Changes:**
- Updated mutation-tracker.ts to generate both hash types
- Updated mutation-types.ts with new fields
- New camelCase fields are converted to snake_case columns automatically by the existing toSnakeCase() function
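
The existing toSnakeCase() helper is not part of this diff. Assuming it performs a plain camelCase-to-snake_case rename of record keys, an equivalent sketch looks like this (`camelToSnake` and `keysToSnakeCase` are hypothetical names, not the project's API):

```typescript
// Insert "_" before each uppercase letter and lowercase it: userId -> user_id.
function camelToSnake(key: string): string {
  return key.replace(/[A-Z]/g, (c) => "_" + c.toLowerCase());
}

// Shallow rename of a record's top-level keys; values are copied unchanged.
function keysToSnakeCase(record: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(record)) out[camelToSnake(key)] = record[key];
  return out;
}
```

For example, `keysToSnakeCase({ workflowStructureHashBefore: "abc" })` yields `{ workflow_structure_hash_before: "abc" }`, matching the column names listed above. This sketch only renames top-level keys; nested objects are left as-is.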

**Testing:**
- Added 5 new unit tests for structural hash generation
- All 17 tests passing

**Tooling:**
- Created backfill script to populate hashes for existing 1,499 mutations
- Created comprehensive documentation (STRUCTURAL_HASHES.md)

**Impact:**
- Before: 0% cross-reference match rate
- After: Expected 60-70% match rate (post-backfill)
- Unlocks quality impact analysis, training data curation, and mutation pattern insights
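
The cross-reference match rate can be read as the share of mutations whose before or after structural hash appears among the structural hashes of workflows in telemetry_workflows. A minimal sketch under that assumption (the `MutationHashes` type and `crossRefMatchRate` function are hypothetical, and the telemetry hashes are assumed to be already loaded into a set):

```typescript
// Hypothetical subset of a mutation record; hashes stay null until backfilled.
interface MutationHashes {
  workflowStructureHashBefore: string | null;
  workflowStructureHashAfter: string | null;
}

// Fraction of mutations whose before- or after-structure hash matches a known workflow.
function crossRefMatchRate(mutations: MutationHashes[], workflowHashes: ReadonlySet<string>): number {
  if (mutations.length === 0) return 0;
  let matched = 0;
  for (const m of mutations) {
    const hashes = [m.workflowStructureHashBefore, m.workflowStructureHashAfter];
    if (hashes.some((h) => h !== null && workflowHashes.has(h))) matched += 1;
  }
  return matched / mutations.length;
}
```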

Conceived by Romuald Członkowski - www.aiadvisors.pl/en

* fix: correct test operation types for structural hash tests

Fixed TypeScript errors in the mutation-tracker tests by adding the required
'updates' parameter to updateNode operations. Used 'as any' casts on the test
operations to maintain backward compatibility while the tests are being updated.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Conceived by Romuald Członkowski - https://www.aiadvisors.pl/en

* chore: remove documentation files from tracking

Removed internal documentation files from version control:
- Telemetry implementation docs
- Implementation roadmap
- Disabled tools analysis docs

These files are for internal reference only.

* chore: remove telemetry documentation files from tracking

Removed all telemetry analysis and documentation files from root directory.
These files are for internal reference only and should not be in version control.

Files removed:
- TELEMETRY_ANALYSIS*.md
- TELEMETRY_MUTATION_SPEC.md
- TELEMETRY_*_DATASET.md
- VALIDATION_ANALYSIS*.md

* chore: bump version to 2.22.18 and update CHANGELOG

Version 2.22.18 adds structural hash tracking for workflow mutations,
enabling cross-referencing with workflow quality data and automated
success detection.

Key changes:
- Added workflowStructureHashBefore/After fields
- Added isTrulySuccessful computed field
- Enhanced mutation tracking with structural hashes
- All tests passing (17/17)

* chore: remove migration and documentation files from PR

Removed internal database migration files and documentation from
version control:
- docs/migrations/
- docs/telemetry/

Updated CHANGELOG to remove database migration references.

Romuald Członkowski
2025-11-14 13:57:54 +01:00
committed by GitHub
parent 597bd290b6
commit 1bbfaabbc2
23 changed files with 484 additions and 7900 deletions

@@ -1,165 +0,0 @@
-- Migration: Create workflow_mutations table for tracking partial update operations
-- Purpose: Capture workflow transformation data to improve partial updates tooling
-- Date: 2025-01-12
-- Create workflow_mutations table
CREATE TABLE IF NOT EXISTS workflow_mutations (
-- Primary key
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
-- User identification (anonymized)
user_id TEXT NOT NULL,
session_id TEXT NOT NULL,
-- Workflow snapshots (compressed JSONB)
workflow_before JSONB NOT NULL,
workflow_after JSONB NOT NULL,
workflow_hash_before TEXT NOT NULL,
workflow_hash_after TEXT NOT NULL,
-- Intent capture
user_intent TEXT NOT NULL,
intent_classification TEXT,
tool_name TEXT NOT NULL CHECK (tool_name IN ('n8n_update_partial_workflow', 'n8n_update_full_workflow')),
-- Operations performed
operations JSONB NOT NULL,
operation_count INTEGER NOT NULL CHECK (operation_count >= 0),
operation_types TEXT[] NOT NULL,
-- Validation metrics
validation_before JSONB,
validation_after JSONB,
validation_improved BOOLEAN,
errors_resolved INTEGER DEFAULT 0 CHECK (errors_resolved >= 0),
errors_introduced INTEGER DEFAULT 0 CHECK (errors_introduced >= 0),
-- Change metrics
nodes_added INTEGER DEFAULT 0 CHECK (nodes_added >= 0),
nodes_removed INTEGER DEFAULT 0 CHECK (nodes_removed >= 0),
nodes_modified INTEGER DEFAULT 0 CHECK (nodes_modified >= 0),
connections_added INTEGER DEFAULT 0 CHECK (connections_added >= 0),
connections_removed INTEGER DEFAULT 0 CHECK (connections_removed >= 0),
properties_changed INTEGER DEFAULT 0 CHECK (properties_changed >= 0),
-- Outcome tracking
mutation_success BOOLEAN NOT NULL,
mutation_error TEXT,
-- Performance metrics
duration_ms INTEGER CHECK (duration_ms >= 0),
-- Timestamps
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Create indexes for efficient querying
-- Primary indexes for filtering
CREATE INDEX IF NOT EXISTS idx_workflow_mutations_user_id
ON workflow_mutations(user_id);
CREATE INDEX IF NOT EXISTS idx_workflow_mutations_session_id
ON workflow_mutations(session_id);
CREATE INDEX IF NOT EXISTS idx_workflow_mutations_created_at
ON workflow_mutations(created_at DESC);
-- Intent and classification indexes
CREATE INDEX IF NOT EXISTS idx_workflow_mutations_intent_classification
ON workflow_mutations(intent_classification)
WHERE intent_classification IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_workflow_mutations_tool_name
ON workflow_mutations(tool_name);
-- Operation analysis indexes
CREATE INDEX IF NOT EXISTS idx_workflow_mutations_operation_types
ON workflow_mutations USING GIN(operation_types);
CREATE INDEX IF NOT EXISTS idx_workflow_mutations_operation_count
ON workflow_mutations(operation_count);
-- Outcome indexes
CREATE INDEX IF NOT EXISTS idx_workflow_mutations_success
ON workflow_mutations(mutation_success);
CREATE INDEX IF NOT EXISTS idx_workflow_mutations_validation_improved
ON workflow_mutations(validation_improved)
WHERE validation_improved IS NOT NULL;
-- Change metrics indexes
CREATE INDEX IF NOT EXISTS idx_workflow_mutations_nodes_added
ON workflow_mutations(nodes_added)
WHERE nodes_added > 0;
CREATE INDEX IF NOT EXISTS idx_workflow_mutations_nodes_modified
ON workflow_mutations(nodes_modified)
WHERE nodes_modified > 0;
-- Hash indexes for deduplication
CREATE INDEX IF NOT EXISTS idx_workflow_mutations_hash_before
ON workflow_mutations(workflow_hash_before);
CREATE INDEX IF NOT EXISTS idx_workflow_mutations_hash_after
ON workflow_mutations(workflow_hash_after);
-- Composite indexes for common queries
-- Find successful mutations by intent classification
CREATE INDEX IF NOT EXISTS idx_workflow_mutations_success_classification
ON workflow_mutations(mutation_success, intent_classification)
WHERE intent_classification IS NOT NULL;
-- Find mutations that improved validation
CREATE INDEX IF NOT EXISTS idx_workflow_mutations_validation_success
ON workflow_mutations(validation_improved, mutation_success)
WHERE validation_improved IS TRUE;
-- Find mutations by user and time range
CREATE INDEX IF NOT EXISTS idx_workflow_mutations_user_time
ON workflow_mutations(user_id, created_at DESC);
-- Find mutations with significant changes (expression index)
CREATE INDEX IF NOT EXISTS idx_workflow_mutations_significant_changes
ON workflow_mutations((nodes_added + nodes_removed + nodes_modified))
WHERE (nodes_added + nodes_removed + nodes_modified) > 0;
-- Comments for documentation
COMMENT ON TABLE workflow_mutations IS
'Tracks workflow mutations from partial update operations to analyze transformation patterns and improve tooling';
COMMENT ON COLUMN workflow_mutations.workflow_before IS
'Complete workflow JSON before mutation (sanitized, credentials removed)';
COMMENT ON COLUMN workflow_mutations.workflow_after IS
'Complete workflow JSON after mutation (sanitized, credentials removed)';
COMMENT ON COLUMN workflow_mutations.user_intent IS
'User instruction or intent for the workflow change (sanitized for PII)';
COMMENT ON COLUMN workflow_mutations.intent_classification IS
'Classified pattern: add_functionality, modify_configuration, rewire_logic, fix_validation, cleanup, unknown';
COMMENT ON COLUMN workflow_mutations.operations IS
'Array of diff operations performed (addNode, updateNode, addConnection, etc.)';
COMMENT ON COLUMN workflow_mutations.validation_improved IS
'Whether the mutation reduced validation errors (NULL if validation data unavailable)';
-- Row-level security
ALTER TABLE workflow_mutations ENABLE ROW LEVEL SECURITY;
-- Create policy for anonymous inserts (required for telemetry)
CREATE POLICY "Allow anonymous inserts"
ON workflow_mutations
FOR INSERT
TO anon
WITH CHECK (true);
-- Create policy for authenticated reads (for analysis)
CREATE POLICY "Allow authenticated reads"
ON workflow_mutations
FOR SELECT
TO authenticated
USING (true);