# n8n-MCP Telemetry Database Pruning Strategy **Analysis Date:** 2025-10-10 **Current Database Size:** 265 MB (telemetry_events: 199 MB, telemetry_workflows: 66 MB) **Free Tier Limit:** 500 MB **Projected 4-Week Size:** 609 MB (exceeds limit by 109 MB) --- ## Executive Summary **Critical Finding:** At current growth rate (56.75% of data from last 7 days), we will exceed the 500 MB free tier limit in approximately 2 weeks. Implementing a 7-day retention policy can immediately save 36.5 MB (37.6%) and prevent database overflow. **Key Insights:** - 641,487 event records consuming 199 MB - 17,247 workflow records consuming 66 MB - Daily growth rate: ~7-8 MB/day for events - 43.25% of data is older than 7 days but provides diminishing value **Immediate Action Required:** Implement automated pruning to maintain database under 500 MB. --- ## 1. Current State Assessment ### Database Size and Distribution | Table | Rows | Current Size | Growth Rate | Bytes/Row | |-------|------|--------------|-------------|-----------| | telemetry_events | 641,487 | 199 MB | 56.66% from last 7d | 325 | | telemetry_workflows | 17,247 | 66 MB | 60.09% from last 7d | 4,013 | | **TOTAL** | **658,734** | **265 MB** | **56.75% from last 7d** | **403** | ### Event Type Distribution | Event Type | Count | % of Total | Storage | Avg Props Size | Oldest Event | |------------|-------|-----------|---------|----------------|--------------| | tool_sequence | 362,170 | 56.4% | 67 MB | 194 bytes | 2025-09-26 | | tool_used | 191,659 | 29.9% | 14 MB | 77 bytes | 2025-09-26 | | validation_details | 36,266 | 5.7% | 11 MB | 329 bytes | 2025-09-26 | | workflow_created | 23,151 | 3.6% | 2.6 MB | 115 bytes | 2025-09-26 | | session_start | 12,575 | 2.0% | 1.2 MB | 101 bytes | 2025-09-26 | | workflow_validation_failed | 9,739 | 1.5% | 314 KB | 33 bytes | 2025-09-26 | | error_occurred | 4,935 | 0.8% | 626 KB | 130 bytes | 2025-09-26 | | search_query | 974 | 0.2% | 106 KB | 112 bytes | 2025-09-26 | | Other | 18 | <0.1% | 5 KB | Various | Recent | ### Growth Pattern Analysis **Daily Data Accumulation (Last 15 Days):** | Date | Events/Day | Daily Size | Cumulative Size | |------|-----------|------------|-----------------| | 2025-10-10 | 28,457 | 4.3 MB | 97 MB | | 2025-10-09 | 54,717 | 8.2 MB | 93 MB | | 2025-10-08 | 52,901 | 7.9 MB | 85 MB | | 2025-10-07 | 52,538 | 8.1 MB | 77 MB | | 2025-10-06 | 51,401 | 7.8 MB | 69 MB | | 2025-10-05 | 50,528 | 7.9 MB | 61 MB | **Average Daily Growth:** ~7.7 MB/day **Weekly Growth:** ~54 MB/week **Projected to hit 500 MB limit:** ~17 days (late October 2025) ### Workflow Data Distribution | Complexity | Count | % | Avg Nodes | Avg JSON Size | Estimated Size | |-----------|-------|---|-----------|---------------|----------------| | Simple | 12,923 | 77.6% | 5.48 | 2,122 bytes | 20 MB | | Medium | 3,708 | 22.3% | 13.93 | 4,458 bytes | 12 MB | | Complex | 616 | 0.1% | 26.62 | 7,909 bytes | 3.2 MB | **Key Finding:** No duplicate workflow hashes found - each workflow is unique (good data quality). --- ## 2. Data Value Classification ### TIER 1: Critical - Keep Indefinitely **Error Patterns (error_occurred)** - **Why:** Essential for identifying systemic issues and regression detection - **Volume:** 4,935 events (626 KB) - **Recommendation:** Keep all errors with aggregated summaries for older data - **Retention:** Detailed errors 30 days, aggregated stats indefinitely **Tool Usage Statistics (Aggregated)** - **Why:** Product analytics and feature prioritization - **Recommendation:** Aggregate daily/weekly summaries after 14 days - **Keep:** Summary tables with tool usage counts, success rates, avg duration ### TIER 2: High Value - Keep 30 Days **Validation Details (validation_details)** - **Current:** 36,266 events, 11 MB, avg 329 bytes - **Why:** Important for understanding validation issues during current development cycle - **Value Period:** 30 days (covers current version development) - **After 30d:** Aggregate to summary stats (validation success rate by node type) **Workflow Creation Patterns (workflow_created)** - **Current:** 23,151 events, 2.6 MB - **Why:** Track feature adoption and workflow patterns - **Value Period:** 30 days for detailed analysis - **After 30d:** Keep aggregated metrics only ### TIER 3: Medium Value - Keep 14 Days **Session Data (session_start)** - **Current:** 12,575 events, 1.2 MB - **Why:** User engagement tracking - **Value Period:** 14 days sufficient for engagement analysis - **Pruning Impact:** 497 KB saved (40% reduction) **Workflow Validation Failures (workflow_validation_failed)** - **Current:** 9,739 events, 314 KB - **Why:** Tracks validation patterns but less detailed than validation_details - **Value Period:** 14 days - **Pruning Impact:** 170 KB saved (54% reduction) ### TIER 4: Short-Term Value - Keep 7 Days **Tool Sequences (tool_sequence)** - **Current:** 362,170 events, 67 MB (largest table!) - **Why:** Tracks multi-tool workflows but extremely high volume - **Value Period:** 7 days for recent pattern analysis - **Pruning Impact:** 29 MB saved (43% reduction) - HIGHEST IMPACT - **Rationale:** Tool usage patterns stabilize quickly; older sequences provide diminishing returns **Tool Usage Events (tool_used)** - **Current:** 191,659 events, 14 MB - **Why:** Individual tool executions - can be aggregated - **Value Period:** 7 days detailed, then aggregate - **Pruning Impact:** 6.2 MB saved (44% reduction) **Search Queries (search_query)** - **Current:** 974 events, 106 KB - **Why:** Low volume, useful for understanding search patterns - **Value Period:** 7 days sufficient - **Pruning Impact:** Minimal (~1 KB) ### TIER 5: Ephemeral - Keep 3 Days **Diagnostic/Health Checks (diagnostic_completed, health_check_completed)** - **Current:** 17 events, ~2.5 KB - **Why:** Operational health checks, only current state matters - **Value Period:** 3 days - **Pruning Impact:** Negligible but good hygiene ### Workflow Data Retention Strategy **telemetry_workflows Table (66 MB):** - **Simple workflows (5-6 nodes):** Keep 7 days → Save 11 MB - **Medium workflows (13-14 nodes):** Keep 14 days → Save 6.7 MB - **Complex workflows (26+ nodes):** Keep 30 days → Save 1.9 MB - **Total Workflow Savings:** 19.6 MB with tiered retention **Rationale:** Complex workflows are rarer and more valuable for understanding advanced use cases. --- ## 3. Pruning Recommendations with Space Savings ### Strategy A: Conservative 14-Day Retention (Recommended for Initial Implementation) | Action | Records Deleted | Space Saved | Risk Level | |--------|----------------|-------------|------------| | Delete tool_sequence > 14d | 0 | 0 MB | None - all recent | | Delete tool_used > 14d | 0 | 0 MB | None - all recent | | Delete validation_details > 14d | 4,259 | 1.2 MB | Low | | Delete session_start > 14d | 0 | 0 MB | None - all recent | | Delete workflows > 14d | 1 | <1 KB | None | | **TOTAL** | **4,260** | **1.2 MB** | **Low** | **Assessment:** Minimal immediate impact but data is too recent. Not sufficient to prevent overflow. ### Strategy B: Aggressive 7-Day Retention (RECOMMENDED) | Action | Records Deleted | Space Saved | Risk Level | |--------|----------------|-------------|------------| | Delete tool_sequence > 7d | 155,389 | 29 MB | Low - pattern data | | Delete tool_used > 7d | 82,827 | 6.2 MB | Low - usage metrics | | Delete validation_details > 7d | 17,465 | 5.4 MB | Medium - debugging data | | Delete workflow_created > 7d | 9,106 | 1.0 MB | Low - creation events | | Delete session_start > 7d | 5,664 | 497 KB | Low - session data | | Delete error_occurred > 7d | 2,321 | 206 KB | Medium - error history | | Delete workflow_validation_failed > 7d | 5,269 | 170 KB | Low - validation events | | Delete workflows > 7d (simple) | 5,146 | 11 MB | Low - simple workflows | | Delete workflows > 7d (medium) | 1,506 | 6.7 MB | Medium - medium workflows | | Delete workflows > 7d (complex) | 231 | 1.9 MB | High - complex workflows | | **TOTAL** | **284,924** | **62.1 MB** | **Medium** | **New Database Size:** 265 MB - 62.1 MB = **202.9 MB (76.6% of limit)** **Buffer:** 297 MB remaining (~38 days at current growth rate) ### Strategy C: Hybrid Tiered Retention (OPTIMAL LONG-TERM) | Event Type | Retention Period | Records Deleted | Space Saved | |-----------|------------------|----------------|-------------| | tool_sequence | 7 days | 155,389 | 29 MB | | tool_used | 7 days | 82,827 | 6.2 MB | | validation_details | 14 days | 4,259 | 1.2 MB | | workflow_created | 14 days | 3 | <1 KB | | session_start | 7 days | 5,664 | 497 KB | | error_occurred | 30 days (keep all) | 0 | 0 MB | | workflow_validation_failed | 7 days | 5,269 | 170 KB | | search_query | 7 days | 10 | 1 KB | | Workflows (simple) | 7 days | 5,146 | 11 MB | | Workflows (medium) | 14 days | 0 | 0 MB | | Workflows (complex) | 30 days (keep all) | 0 | 0 MB | | **TOTAL** | **Various** | **258,567** | **48.1 MB** | **New Database Size:** 265 MB - 48.1 MB = **216.9 MB (82% of limit)** **Buffer:** 283 MB remaining (~36 days at current growth rate) --- ## 4. Additional Optimization Opportunities ### Optimization 1: Properties Field Compression **Finding:** validation_details events have bloated properties (avg 329 bytes, max 9 KB) ```sql -- Identify large validation_details records SELECT id, user_id, created_at, pg_column_size(properties) as size_bytes FROM telemetry_events WHERE event = 'validation_details' AND pg_column_size(properties) > 1000 ORDER BY size_bytes DESC; -- Result: 417 records > 1KB, 2 records > 5KB ``` **Recommendation:** Truncate verbose error messages in validation_details after 7 days - Keep error types and counts - Remove full stack traces and detailed messages - Estimated savings: 2-3 MB ### Optimization 2: Remove Redundant tool_sequence Data **Finding:** tool_sequence properties contain mostly null values ```sql -- Analysis shows all tool_sequence.properties->>'tools' are null -- 362,170 records storing null in properties field ``` **Recommendation:** 1. Investigate why tool_sequence properties are empty 2. If by design, reduce properties field size or use a flag 3. Potential savings: 10-15 MB if properties field is eliminated ### Optimization 3: Workflow Deduplication by Hash **Finding:** No duplicate workflow_hash values found (good!) **Recommendation:** Continue using workflow_hash for future deduplication if needed. No action required. ### Optimization 4: Dead Row Cleanup **Finding:** telemetry_workflows has 1,591 dead rows (9.5% overhead) ```sql -- Run VACUUM to reclaim space VACUUM FULL telemetry_workflows; -- Expected savings: ~6-7 MB ``` **Recommendation:** Schedule weekly VACUUM operations ### Optimization 5: Index Optimization **Current indexes consume space but improve query performance** ```sql -- Check index sizes SELECT schemaname, tablename, indexname, pg_size_pretty(pg_relation_size(indexrelid)) as index_size FROM pg_stat_user_indexes WHERE schemaname = 'public' ORDER BY pg_relation_size(indexrelid) DESC; ``` **Recommendation:** Review if all indexes are necessary after pruning strategy is implemented --- ## 5. Implementation Strategy ### Phase 1: Immediate Emergency Pruning (Day 1) **Goal:** Free up 60+ MB immediately to prevent overflow ```sql -- EMERGENCY PRUNING: Delete data older than 7 days BEGIN; -- Backup count before deletion SELECT event, COUNT(*) FILTER (WHERE created_at < NOW() - INTERVAL '7 days') as to_delete FROM telemetry_events GROUP BY event; -- Delete old events DELETE FROM telemetry_events WHERE created_at < NOW() - INTERVAL '7 days'; -- Expected: ~278,051 rows deleted, ~36.5 MB saved -- Delete old simple workflows DELETE FROM telemetry_workflows WHERE created_at < NOW() - INTERVAL '7 days' AND complexity = 'simple'; -- Expected: ~5,146 rows deleted, ~11 MB saved -- Verify new size SELECT schemaname, relname, pg_size_pretty(pg_total_relation_size(schemaname||'.'||relname)) AS size FROM pg_stat_user_tables WHERE schemaname = 'public'; COMMIT; -- Clean up dead rows VACUUM FULL telemetry_events; VACUUM FULL telemetry_workflows; ``` **Expected Result:** Database size reduced to ~210-220 MB (55-60% buffer remaining) ### Phase 2: Implement Automated Retention Policy (Week 1) **Create a scheduled Supabase Edge Function or pg_cron job** ```sql -- Create retention policy function CREATE OR REPLACE FUNCTION apply_retention_policy() RETURNS void AS $$ BEGIN -- Tier 4: 7-day retention for high-volume events DELETE FROM telemetry_events WHERE created_at < NOW() - INTERVAL '7 days' AND event IN ('tool_sequence', 'tool_used', 'session_start', 'workflow_validation_failed', 'search_query'); -- Tier 3: 14-day retention for medium-value events DELETE FROM telemetry_events WHERE created_at < NOW() - INTERVAL '14 days' AND event IN ('validation_details', 'workflow_created'); -- Tier 1: 30-day retention for errors (keep longer) DELETE FROM telemetry_events WHERE created_at < NOW() - INTERVAL '30 days' AND event = 'error_occurred'; -- Workflow retention by complexity DELETE FROM telemetry_workflows WHERE created_at < NOW() - INTERVAL '7 days' AND complexity = 'simple'; DELETE FROM telemetry_workflows WHERE created_at < NOW() - INTERVAL '14 days' AND complexity = 'medium'; DELETE FROM telemetry_workflows WHERE created_at < NOW() - INTERVAL '30 days' AND complexity = 'complex'; -- Cleanup VACUUM telemetry_events; VACUUM telemetry_workflows; END; $$ LANGUAGE plpgsql; -- Schedule daily execution (using pg_cron extension) SELECT cron.schedule('retention-policy', '0 2 * * *', 'SELECT apply_retention_policy()'); ``` ### Phase 3: Create Aggregation Tables (Week 2) **Preserve insights while deleting raw data** ```sql -- Daily tool usage summary CREATE TABLE IF NOT EXISTS telemetry_daily_tool_stats ( date DATE NOT NULL, tool TEXT NOT NULL, usage_count INTEGER NOT NULL, unique_users INTEGER NOT NULL, avg_duration_ms NUMERIC, error_count INTEGER DEFAULT 0, created_at TIMESTAMPTZ DEFAULT NOW(), PRIMARY KEY (date, tool) ); -- Daily validation summary CREATE TABLE IF NOT EXISTS telemetry_daily_validation_stats ( date DATE NOT NULL, node_type TEXT, total_validations INTEGER NOT NULL, failed_validations INTEGER NOT NULL, success_rate NUMERIC, common_errors JSONB, created_at TIMESTAMPTZ DEFAULT NOW(), PRIMARY KEY (date, node_type) ); -- Aggregate function to run before pruning CREATE OR REPLACE FUNCTION aggregate_before_pruning() RETURNS void AS $$ BEGIN -- Aggregate tool usage for data about to be deleted INSERT INTO telemetry_daily_tool_stats (date, tool, usage_count, unique_users, avg_duration_ms) SELECT DATE(created_at) as date, properties->>'tool' as tool, COUNT(*) as usage_count, COUNT(DISTINCT user_id) as unique_users, AVG((properties->>'duration')::numeric) as avg_duration_ms FROM telemetry_events WHERE event = 'tool_used' AND created_at < NOW() - INTERVAL '7 days' AND created_at >= NOW() - INTERVAL '8 days' GROUP BY DATE(created_at), properties->>'tool' ON CONFLICT (date, tool) DO NOTHING; -- Aggregate validation stats INSERT INTO telemetry_daily_validation_stats (date, node_type, total_validations, failed_validations) SELECT DATE(created_at) as date, properties->>'nodeType' as node_type, COUNT(*) as total_validations, COUNT(*) FILTER (WHERE properties->>'valid' = 'false') as failed_validations FROM telemetry_events WHERE event = 'validation_details' AND created_at < NOW() - INTERVAL '14 days' AND created_at >= NOW() - INTERVAL '15 days' GROUP BY DATE(created_at), properties->>'nodeType' ON CONFLICT (date, node_type) DO NOTHING; END; $$ LANGUAGE plpgsql; -- Update cron job to aggregate before pruning SELECT cron.schedule('aggregate-then-prune', '0 2 * * *', 'SELECT aggregate_before_pruning(); SELECT apply_retention_policy();'); ``` ### Phase 4: Monitoring and Alerting (Week 2) **Create size monitoring function** ```sql CREATE OR REPLACE FUNCTION check_database_size() RETURNS TABLE( total_size_mb NUMERIC, limit_mb NUMERIC, percent_used NUMERIC, days_until_full NUMERIC ) AS $$ DECLARE current_size_bytes BIGINT; growth_rate_bytes_per_day NUMERIC; BEGIN -- Get current size SELECT SUM(pg_total_relation_size(schemaname||'.'||relname)) INTO current_size_bytes FROM pg_stat_user_tables WHERE schemaname = 'public'; -- Calculate 7-day growth rate SELECT (COUNT(*) FILTER (WHERE created_at >= NOW() - INTERVAL '7 days')) * AVG(pg_column_size(properties)) * (1.0/7) INTO growth_rate_bytes_per_day FROM telemetry_events; RETURN QUERY SELECT ROUND((current_size_bytes / 1024.0 / 1024.0)::numeric, 2) as total_size_mb, 500.0 as limit_mb, ROUND((current_size_bytes / 1024.0 / 1024.0 / 500.0 * 100)::numeric, 2) as percent_used, ROUND((((500.0 * 1024 * 1024) - current_size_bytes) / NULLIF(growth_rate_bytes_per_day, 0))::numeric, 1) as days_until_full; END; $$ LANGUAGE plpgsql; -- Alert function (integrate with external monitoring) CREATE OR REPLACE FUNCTION alert_if_size_critical() RETURNS void AS $$ DECLARE size_pct NUMERIC; BEGIN SELECT percent_used INTO size_pct FROM check_database_size(); IF size_pct > 90 THEN -- Log critical alert INSERT INTO telemetry_events (user_id, event, properties) VALUES ('system', 'database_size_critical', json_build_object('percent_used', size_pct, 'timestamp', NOW())::jsonb); END IF; END; $$ LANGUAGE plpgsql; ``` --- ## 6. Priority Order for Implementation ### Priority 1: URGENT (Day 1) 1. **Execute Emergency Pruning** - Delete data older than 7 days - Impact: 47.5 MB saved immediately - Risk: Low - data already analyzed - SQL: Provided in Phase 1 ### Priority 2: HIGH (Week 1) 2. **Implement Automated Retention Policy** - Impact: Prevents future overflow - Risk: Low with proper testing - Implementation: Phase 2 function 3. **Run VACUUM FULL** - Impact: 6-7 MB reclaimed from dead rows - Risk: Low but locks tables briefly - Command: `VACUUM FULL telemetry_workflows;` ### Priority 3: MEDIUM (Week 2) 4. **Create Aggregation Tables** - Impact: Preserves insights, enables longer-term pruning - Risk: Low - additive only - Implementation: Phase 3 tables and functions 5. **Implement Monitoring** - Impact: Prevents future surprises - Risk: None - Implementation: Phase 4 monitoring functions ### Priority 4: LOW (Month 1) 6. **Optimize Properties Fields** - Impact: 2-3 MB additional savings - Risk: Medium - requires code changes - Action: Truncate verbose error messages 7. **Investigate tool_sequence null properties** - Impact: 10-15 MB potential savings - Risk: Medium - requires application changes - Action: Code review and optimization --- ## 7. Risk Assessment ### Strategy B (7-Day Retention): Risks and Mitigations | Risk | Likelihood | Impact | Mitigation | |------|-----------|---------|------------| | Loss of debugging data for old issues | Medium | Medium | Keep error_occurred for 30 days; aggregate validation stats | | Unable to analyze long-term trends | Low | Low | Implement aggregation tables before pruning | | Accidental deletion of critical data | Low | High | Test on staging; implement backups; add rollback capability | | Performance impact during deletion | Medium | Low | Run during off-peak hours (2 AM UTC) | | VACUUM locks table briefly | Low | Low | Schedule during low-usage window | ### Strategy C (Hybrid Tiered): Risks and Mitigations | Risk | Likelihood | Impact | Mitigation | |------|-----------|---------|------------| | Complex logic leads to bugs | Medium | Medium | Thorough testing; monitoring; gradual rollout | | Different retention per event type confusing | Low | Low | Document clearly; add comments in code | | Tiered approach still insufficient | Low | High | Monitor growth; adjust retention if needed | --- ## 8. Monitoring Metrics ### Key Metrics to Track Post-Implementation 1. **Database Size Trend** ```sql SELECT * FROM check_database_size(); ``` - Target: Stay under 300 MB (60% of limit) - Alert threshold: 90% (450 MB) 2. **Daily Growth Rate** ```sql SELECT DATE(created_at) as date, COUNT(*) as events, pg_size_pretty(SUM(pg_column_size(properties))::bigint) as daily_size FROM telemetry_events WHERE created_at >= NOW() - INTERVAL '7 days' GROUP BY DATE(created_at) ORDER BY date DESC; ``` - Target: < 8 MB/day average - Alert threshold: > 12 MB/day sustained 3. **Retention Policy Execution** ```sql -- Add logging to retention policy function CREATE TABLE retention_policy_log ( executed_at TIMESTAMPTZ DEFAULT NOW(), events_deleted INTEGER, workflows_deleted INTEGER, space_reclaimed_mb NUMERIC ); ``` - Monitor: Daily successful execution - Alert: If job fails or deletes 0 rows unexpectedly 4. **Data Availability Check** ```sql -- Ensure sufficient data for analysis SELECT event, COUNT(*) as available_records, MIN(created_at) as oldest_record, MAX(created_at) as newest_record FROM telemetry_events GROUP BY event; ``` - Target: 7 days of data always available - Alert: If oldest_record > 8 days ago (retention policy failing) --- ## 9. Recommended Action Plan ### Immediate Actions (Today) **Step 1:** Execute emergency pruning ```sql -- Backup first (optional but recommended) -- Create a copy of current stats CREATE TABLE telemetry_events_stats_backup AS SELECT event, COUNT(*), MIN(created_at), MAX(created_at) FROM telemetry_events GROUP BY event; -- Execute pruning DELETE FROM telemetry_events WHERE created_at < NOW() - INTERVAL '7 days'; DELETE FROM telemetry_workflows WHERE created_at < NOW() - INTERVAL '7 days' AND complexity = 'simple'; VACUUM FULL telemetry_events; VACUUM FULL telemetry_workflows; ``` **Step 2:** Verify results ```sql SELECT * FROM check_database_size(); ``` **Expected outcome:** Database size ~210-220 MB (58-60% buffer remaining) ### Week 1 Actions **Step 3:** Implement automated retention policy - Create retention policy function (Phase 2 code) - Test function on staging/development environment - Schedule daily execution via pg_cron **Step 4:** Set up monitoring - Create monitoring functions (Phase 4 code) - Configure alerts for size thresholds - Document escalation procedures ### Week 2 Actions **Step 5:** Create aggregation tables - Implement summary tables (Phase 3 code) - Backfill historical aggregations if needed - Update retention policy to aggregate before pruning **Step 6:** Optimize and tune - Review query performance post-pruning - Adjust retention periods if needed based on actual usage - Document any issues or improvements ### Monthly Maintenance **Step 7:** Regular review - Monthly review of database growth trends - Quarterly review of retention policy effectiveness - Adjust retention periods based on product needs --- ## 10. SQL Execution Scripts ### Script 1: Emergency Pruning (Run First) ```sql -- ============================================ -- EMERGENCY PRUNING SCRIPT -- Expected savings: ~50 MB -- Execution time: 2-5 minutes -- ============================================ BEGIN; -- Create backup of current state CREATE TABLE IF NOT EXISTS pruning_audit ( executed_at TIMESTAMPTZ DEFAULT NOW(), action TEXT, records_affected INTEGER, size_before_mb NUMERIC, size_after_mb NUMERIC ); -- Record size before INSERT INTO pruning_audit (action, size_before_mb) SELECT 'before_pruning', pg_total_relation_size('telemetry_events')::numeric / 1024 / 1024; -- Delete old events (keep last 7 days) WITH deleted AS ( DELETE FROM telemetry_events WHERE created_at < NOW() - INTERVAL '7 days' RETURNING * ) INSERT INTO pruning_audit (action, records_affected) SELECT 'delete_events_7d', COUNT(*) FROM deleted; -- Delete old simple workflows (keep last 7 days) WITH deleted AS ( DELETE FROM telemetry_workflows WHERE created_at < NOW() - INTERVAL '7 days' AND complexity = 'simple' RETURNING * ) INSERT INTO pruning_audit (action, records_affected) SELECT 'delete_workflows_simple_7d', COUNT(*) FROM deleted; -- Record size after UPDATE pruning_audit SET size_after_mb = pg_total_relation_size('telemetry_events')::numeric / 1024 / 1024 WHERE action = 'before_pruning'; COMMIT; -- Cleanup dead space VACUUM FULL telemetry_events; VACUUM FULL telemetry_workflows; -- Verify results SELECT * FROM pruning_audit ORDER BY executed_at DESC LIMIT 5; SELECT * FROM check_database_size(); ``` ### Script 2: Create Retention Policy (Run After Testing) ```sql -- ============================================ -- AUTOMATED RETENTION POLICY -- Schedule: Daily at 2 AM UTC -- ============================================ CREATE OR REPLACE FUNCTION apply_retention_policy() RETURNS TABLE( action TEXT, records_deleted INTEGER, execution_time_ms INTEGER ) AS $$ DECLARE start_time TIMESTAMPTZ; end_time TIMESTAMPTZ; deleted_count INTEGER; BEGIN -- Tier 4: 7-day retention (high volume, low long-term value) start_time := clock_timestamp(); DELETE FROM telemetry_events WHERE created_at < NOW() - INTERVAL '7 days' AND event IN ('tool_sequence', 'tool_used', 'session_start', 'workflow_validation_failed', 'search_query'); GET DIAGNOSTICS deleted_count = ROW_COUNT; end_time := clock_timestamp(); action := 'delete_tier4_7d'; records_deleted := deleted_count; execution_time_ms := EXTRACT(MILLISECONDS FROM (end_time - start_time))::INTEGER; RETURN NEXT; -- Tier 3: 14-day retention (medium value) start_time := clock_timestamp(); DELETE FROM telemetry_events WHERE created_at < NOW() - INTERVAL '14 days' AND event IN ('validation_details', 'workflow_created'); GET DIAGNOSTICS deleted_count = ROW_COUNT; end_time := clock_timestamp(); action := 'delete_tier3_14d'; records_deleted := deleted_count; execution_time_ms := EXTRACT(MILLISECONDS FROM (end_time - start_time))::INTEGER; RETURN NEXT; -- Tier 1: 30-day retention (errors - keep longer) start_time := clock_timestamp(); DELETE FROM telemetry_events WHERE created_at < NOW() - INTERVAL '30 days' AND event = 'error_occurred'; GET DIAGNOSTICS deleted_count = ROW_COUNT; end_time := clock_timestamp(); action := 'delete_errors_30d'; records_deleted := deleted_count; execution_time_ms := EXTRACT(MILLISECONDS FROM (end_time - start_time))::INTEGER; RETURN NEXT; -- Workflow pruning by complexity start_time := clock_timestamp(); DELETE FROM telemetry_workflows WHERE created_at < NOW() - INTERVAL '7 days' AND complexity = 'simple'; GET DIAGNOSTICS deleted_count = ROW_COUNT; end_time := clock_timestamp(); action := 'delete_workflows_simple_7d'; records_deleted := deleted_count; execution_time_ms := EXTRACT(MILLISECONDS FROM (end_time - start_time))::INTEGER; RETURN NEXT; start_time := clock_timestamp(); DELETE FROM telemetry_workflows WHERE created_at < NOW() - INTERVAL '14 days' AND complexity = 'medium'; GET DIAGNOSTICS deleted_count = ROW_COUNT; end_time := clock_timestamp(); action := 'delete_workflows_medium_14d'; records_deleted := deleted_count; execution_time_ms := EXTRACT(MILLISECONDS FROM (end_time - start_time))::INTEGER; RETURN NEXT; start_time := clock_timestamp(); DELETE FROM telemetry_workflows WHERE created_at < NOW() - INTERVAL '30 days' AND complexity = 'complex'; GET DIAGNOSTICS deleted_count = ROW_COUNT; end_time := clock_timestamp(); action := 'delete_workflows_complex_30d'; records_deleted := deleted_count; execution_time_ms := EXTRACT(MILLISECONDS FROM (end_time - start_time))::INTEGER; RETURN NEXT; -- Vacuum to reclaim space start_time := clock_timestamp(); VACUUM telemetry_events; VACUUM telemetry_workflows; end_time := clock_timestamp(); action := 'vacuum_tables'; records_deleted := 0; execution_time_ms := EXTRACT(MILLISECONDS FROM (end_time - start_time))::INTEGER; RETURN NEXT; END; $$ LANGUAGE plpgsql; -- Test the function (dry run - won't schedule yet) SELECT * FROM apply_retention_policy(); -- After testing, schedule with pg_cron -- Requires pg_cron extension: CREATE EXTENSION IF NOT EXISTS pg_cron; -- SELECT cron.schedule('retention-policy', '0 2 * * *', 'SELECT apply_retention_policy()'); ``` ### Script 3: Create Monitoring Dashboard ```sql -- ============================================ -- MONITORING QUERIES -- Run these regularly to track database health -- ============================================ -- Query 1: Current database size and projections SELECT 'Current Size' as metric, pg_size_pretty(SUM(pg_total_relation_size(schemaname||'.'||relname))) as value FROM pg_stat_user_tables WHERE schemaname = 'public' UNION ALL SELECT 'Free Tier Limit' as metric, '500 MB' as value UNION ALL SELECT 'Percent Used' as metric, CONCAT( ROUND( (SUM(pg_total_relation_size(schemaname||'.'||relname))::numeric / (500.0 * 1024 * 1024) * 100), 2 ), '%' ) as value FROM pg_stat_user_tables WHERE schemaname = 'public'; -- Query 2: Data age distribution SELECT event, COUNT(*) as total_records, MIN(created_at) as oldest_record, MAX(created_at) as newest_record, ROUND(EXTRACT(EPOCH FROM (MAX(created_at) - MIN(created_at))) / 86400, 2) as age_days FROM telemetry_events GROUP BY event ORDER BY total_records DESC; -- Query 3: Daily growth tracking (last 7 days) SELECT DATE(created_at) as date, COUNT(*) as daily_events, pg_size_pretty(SUM(pg_column_size(properties))::bigint) as daily_data_size, COUNT(DISTINCT user_id) as active_users FROM telemetry_events WHERE created_at >= NOW() - INTERVAL '7 days' GROUP BY DATE(created_at) ORDER BY date DESC; -- Query 4: Retention policy effectiveness SELECT DATE(executed_at) as execution_date, action, records_deleted, execution_time_ms FROM ( SELECT * FROM apply_retention_policy() ) AS policy_run ORDER BY execution_date DESC; ``` --- ## Conclusion **Immediate Action Required:** Implement Strategy B (7-day retention) immediately to avoid database overflow within 2 weeks. **Long-Term Strategy:** Transition to Strategy C (Hybrid Tiered Retention) with automated aggregation to balance data preservation with storage constraints. **Expected Outcomes:** - Immediate: 50+ MB saved (26% reduction) - Ongoing: Database stabilized at 200-220 MB (40-44% of limit) - Buffer: 30-40 days before limit with current growth rate - Risk: Low with proper testing and monitoring **Success Metrics:** 1. Database size < 300 MB consistently 2. 7+ days of detailed event data always available 3. No impact on product analytics capabilities 4. Automated retention policy runs daily without errors --- **Analysis completed:** 2025-10-10 **Next review date:** 2025-11-10 (monthly check) **Escalation:** If database exceeds 400 MB, consider upgrading to paid tier or implementing more aggressive pruning