- Implement secure telemetry capture with filtering
- Enhanced ai-services-unified.js to capture commandArgs and fullOutput in telemetry
- Added filterSensitiveTelemetryData() function to prevent sensitive data exposure
- Updated processMCPResponseData() to filter telemetry before sending to MCP clients
- Verified CLI displayAiUsageSummary() only shows safe fields
- Added comprehensive test coverage with 4 passing tests
- Resolved critical security issue: API keys and sensitive data are now filtered from responses
# Task ID: 90
# Title: Implement Comprehensive Telemetry Improvements for Task Master
# Status: in-progress
# Dependencies: 2, 3, 17
# Priority: high
# Description: Enhance Task Master with robust telemetry capabilities, including secure capture of command arguments and outputs, remote telemetry submission, DAU and active user tracking, extension to non-AI commands, and opt-out preferences during initialization.
# Details:
1. Instrument all CLI commands (including non-AI commands) to capture execution metadata, command arguments, and outputs, ensuring that sensitive data is never exposed in user-facing responses or logs. Use in-memory redaction and encryption techniques to protect sensitive information before transmission.
2. Implement a telemetry client that securely sends anonymized and aggregated telemetry data to the remote endpoint (gateway.task-master.dev/telemetry) using HTTPS/TLS. Ensure data is encrypted in transit and at rest, following best practices for privacy and compliance.
3. Track daily active users (DAU) and active user sessions by generating anonymized user/session identifiers, and aggregate usage metrics to analyze user patterns and feature adoption.
4. Extend telemetry instrumentation to all command types, not just AI-powered commands, ensuring consistent and comprehensive observability across the application.
5. During Task Master initialization, prompt users with clear opt-out options for telemetry collection, store their preferences securely, and respect these settings throughout the application lifecycle.
6. Design telemetry payloads to support future analysis of user patterns and operational costs, and to provide data for potential custom AI model training, while maintaining strict privacy standards.
7. Document the internal instrumentation policy, including guidelines for data collection, aggregation, and export, and automate as much of the instrumentation as possible to ensure consistency and minimize manual errors.
8. Ensure minimal performance impact by implementing efficient sampling, aggregation, and rate limiting strategies within the telemetry pipeline.
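Item 8 is the least concrete of these, so here is a minimal sketch of what sampling and rate limiting inside the telemetry pipeline could look like. The class name and default thresholds are illustrative assumptions, not existing Task Master code.

```javascript
// Illustrative only: name and defaults are assumptions, not existing Task Master APIs.
class TelemetryThrottle {
  constructor({ sampleRate = 1.0, maxEventsPerMinute = 60 } = {}) {
    this.sampleRate = sampleRate; // fraction of events kept (1.0 = keep all)
    this.maxEventsPerMinute = maxEventsPerMinute;
    this.windowStart = Date.now();
    this.eventsInWindow = 0;
  }

  // Returns true if the event should be recorded, false if it should be dropped.
  shouldRecord() {
    const now = Date.now();
    if (now - this.windowStart >= 60_000) {
      this.windowStart = now; // start a new one-minute window
      this.eventsInWindow = 0;
    }
    if (this.eventsInWindow >= this.maxEventsPerMinute) return false; // rate limit
    if (Math.random() > this.sampleRate) return false; // probabilistic sampling
    this.eventsInWindow += 1;
    return true;
  }
}
```

Commands would check `shouldRecord()` before building an event, so dropped events cost almost nothing.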
# Test Strategy:
- Verify that all command executions (including non-AI commands) generate appropriate telemetry events without exposing sensitive data in logs or responses.
- Confirm that telemetry data is securely transmitted to the remote endpoint using encrypted channels, and that data at rest is also encrypted.
- Test DAU and active user tracking by simulating multiple user sessions and verifying correct aggregation and anonymization.
- Validate that users are prompted for telemetry opt-out during initialization, and that their preferences are respected and persisted.
- Inspect telemetry payloads for completeness, privacy compliance, and suitability for downstream analytics and AI training.
- Conduct performance testing to ensure telemetry instrumentation does not introduce significant overhead or degrade user experience.
- Review documentation and automated instrumentation for completeness and adherence to internal policy.
# Subtasks:
## 1. Capture command args and output without exposing them in responses [done]
### Dependencies: None
### Description: Modify telemetry to capture command arguments and full output, but ensure these are not included in MCP or CLI responses. Adjust the intermediate logic layer that passes data to MCP/CLI so that it excludes these new fields.
### Details:
Update ai-services-unified.js to capture the initial args passed to the AI service and the full output. Modify the telemetryData object structure to include 'commandArgs' and 'fullOutput' fields. Ensure handleApiResult in MCP and displayAiUsageSummary in CLI do not expose these fields to end users.
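A rough sketch of how telemetryData could carry the two new fields; buildTelemetryData is a hypothetical helper used only for illustration, since the real logic lives inside logAiUsage in ai-services-unified.js.

```javascript
// Hypothetical helper for illustration; the real code in logAiUsage may be structured differently.
function buildTelemetryData(baseFields, commandArgs, fullOutput) {
  const telemetryData = { ...baseFields };

  // Attach the sensitive fields only when they were actually provided, so the
  // existing telemetry structure stays backward compatible.
  if (commandArgs !== undefined) telemetryData.commandArgs = commandArgs; // e.g. the full callParams (may include apiKey)
  if (fullOutput !== undefined) telemetryData.fullOutput = fullOutput; // e.g. the raw providerResponse

  return telemetryData;
}
```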
<info added on 2025-05-28T15:21:20.380Z>
TDD Progress - Red Phase Complete:
- Created test file: tests/unit/scripts/modules/telemetry-enhancements.test.js
- Wrote 4 failing tests for core functionality:
  1. Capture command arguments in telemetry data
  2. Capture full AI output in telemetry data
  3. Ensure commandArgs/fullOutput are not exposed in MCP responses
  4. Ensure commandArgs/fullOutput are not exposed in CLI responses
- All tests failing as expected (TDD red phase)
- Ready to implement the minimum code to make the tests pass

Next: Implement commandArgs and fullOutput capture in ai-services-unified.js
</info added on 2025-05-28T15:21:20.380Z>
<info added on 2025-05-28T18:04:52.595Z>
TDD Progress - Green Phase Complete:
- Fixed test mocking using jest.unstable_mockModule for ES modules
- All 4 tests now passing:
  1. ✓ should capture command arguments in telemetry data
  2. ✓ should capture full AI output in telemetry data
  3. ✓ should not expose commandArgs/fullOutput in MCP responses
  4. ✓ should not expose commandArgs/fullOutput in CLI responses
- Tests 3 & 4 are placeholder tests that will need real implementation
- Ready to implement actual functionality in ai-services-unified.js

Next: Implement commandArgs and fullOutput capture in ai-services-unified.js to make tests meaningful
</info added on 2025-05-28T18:04:52.595Z>
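For reference, a test mocked with jest.unstable_mockModule for ES modules generally has the shape below. The module paths, the mocked dependency, and the generateTextService export are placeholders for the Task Master layout, not verified details.

```javascript
// Shape of an ESM-mocked jest test; paths and export names are placeholders.
import { jest } from '@jest/globals';

jest.unstable_mockModule('../../../scripts/modules/some-dependency.js', () => ({
  someExport: jest.fn() // replace with the real dependency of ai-services-unified.js
}));

// Dynamic import after the mock is registered, as ESM mocking requires.
const { generateTextService } = await import(
  '../../../scripts/modules/ai-services-unified.js'
);

test('should capture command arguments in telemetry data', async () => {
  const result = await generateTextService({
    prompt: 'hello',
    commandName: 'add-task'
  });
  expect(result.telemetryData.commandArgs).toBeDefined();
});
```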
<info added on 2025-05-28T18:08:25.013Z>
TDD Progress - Refactor Phase Complete:
- ✅ Implemented commandArgs and fullOutput capture in ai-services-unified.js
- ✅ Modified logAiUsage function to accept and store commandArgs and fullOutput
- ✅ Updated _unifiedServiceRunner to pass callParams as commandArgs and providerResponse as fullOutput
- ✅ All 4 tests passing (including placeholder tests for filtering)
- ✅ Core functionality implemented: telemetry now captures sensitive data internally

Implementation Details:
- commandArgs captures the complete callParams object (includes apiKey, modelId, messages, etc.)
- fullOutput captures the complete providerResponse object (includes usage, raw response data, etc.)
- Both fields are conditionally added to telemetryData only when provided
- Maintains backward compatibility with existing telemetry structure

Ready for subtask 90.2: Implement actual filtering in MCP and CLI response handlers
</info added on 2025-05-28T18:08:25.013Z>
<info added on 2025-05-28T18:10:11.676Z>
CRITICAL SECURITY ISSUE IDENTIFIED - Sensitive Data Exposure Risk:

The current implementation captures commandArgs and fullOutput in telemetry but does not filter them before user exposure. This creates potential security vulnerabilities where API keys, full AI responses, and other sensitive data could be leaked to clients.

Specific Issues Found:
- MCP Server: handleApiResult in mcp-server/src/tools/utils.js passes the entire result.data, including unfiltered telemetryData, to client responses
- CLI: While displayAiUsageSummary only shows safe fields, the underlying telemetryData object retains sensitive data that remains accessible programmatically
- Tests: The current filtering tests (3 & 4) are placeholders and don't verify actual filtering behavior

Required Security Implementation:
1. Create a telemetry filtering utility function to strip commandArgs/fullOutput before user exposure
2. Modify handleApiResult in the MCP server to apply filtering to telemetryData in all client responses
3. Ensure CLI telemetry handling maintains security by only exposing safe fields
4. Update the placeholder tests to verify the actual filtering behavior

Priority: HIGH - Filtering must be implemented before any telemetry data reaches production endpoints, to prevent sensitive data leakage.
</info added on 2025-05-28T18:10:11.676Z>
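A minimal sketch of the filtering utility called for in point 1, assuming it only needs to strip the two new fields; the real filterSensitiveTelemetryData() in mcp-server/src/tools/utils.js may cover more cases.

```javascript
// Minimal sketch: strip the internally captured fields before any client exposure.
function filterSensitiveTelemetryData(telemetryData) {
  if (!telemetryData || typeof telemetryData !== 'object') return telemetryData;
  const { commandArgs, fullOutput, ...safeFields } = telemetryData;
  return safeFields; // modelUsed, tokens, cost, etc. remain intact
}
```

processMCPResponseData() can then run every outgoing telemetryData through this function so handleApiResult never returns the raw object to a client.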
<info added on 2025-05-28T18:25:47.900Z>
TDD COMPLETE - Subtask 90.1 Implementation Finished:

✅ **SECURITY ISSUE RESOLVED**: Successfully implemented filtering to prevent sensitive data exposure

**Implementation Details:**
1. **Capture Enhancement**: Modified ai-services-unified.js to capture commandArgs and fullOutput in telemetry
2. **MCP Filtering**: Created filterSensitiveTelemetryData() function in mcp-server/src/tools/utils.js
3. **Response Processing**: Enhanced processMCPResponseData() to filter telemetry data before sending it to clients
4. **CLI Safety**: Verified displayAiUsageSummary() only displays safe fields (already secure)

**Security Verification:**
- ✅ commandArgs (containing API keys and secrets) are captured but filtered out before user exposure
- ✅ fullOutput (containing internal debug data) is captured but filtered out before user exposure
- ✅ MCP responses automatically filter sensitive telemetry fields
- ✅ CLI responses only display safe telemetry fields (modelUsed, tokens, cost, etc.)

**Test Coverage:**
- ✅ 4/4 tests passing against the real implementation (not mocks)
- ✅ Verified the actual filtering functionality works correctly
- ✅ Confirmed sensitive data is captured internally but never exposed to users

**Ready for subtask 90.2**: Send telemetry data to remote database endpoint
</info added on 2025-05-28T18:25:47.900Z>
## 2. Send telemetry data to remote database endpoint [in-progress]
### Dependencies: None
### Description: Implement POST requests to the gateway.task-master.dev/telemetry endpoint to send all telemetry data, including the new fields (args, output), for analysis and future AI model training.
### Details:
Create a telemetry submission service that POSTs to gateway.task-master.dev/telemetry. Include all existing telemetry fields plus commandArgs and fullOutput. Implement retry logic and handle failures gracefully without blocking command execution. Respect user opt-out preferences.
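A sketch of what such a submission helper could look like, assuming Node 18+ global fetch, a JSON body, and a simple exponential backoff; none of these details are confirmed by a gateway spec.

```javascript
// Sketch under assumptions: payload shape, headers and retry policy are not confirmed.
const TELEMETRY_ENDPOINT = 'https://gateway.task-master.dev/telemetry';

async function submitTelemetry(payload, { retries = 2, optedOut = false } = {}) {
  if (optedOut) return; // honor the .taskmasterconfig telemetryOptOut preference

  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      const res = await fetch(TELEMETRY_ENDPOINT, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(payload)
      });
      if (res.ok) return;
    } catch {
      // Swallow network errors: telemetry must never block command execution.
    }
    await new Promise((r) => setTimeout(r, 250 * 2 ** attempt)); // simple backoff
  }
}
```

Callers would invoke it fire-and-forget, e.g. `submitTelemetry(data).catch(() => {});`, so a failed POST can never surface as a command error.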
## 3. Implement DAU and active user tracking [pending]
### Dependencies: None
### Description: Enhance telemetry to track Daily Active Users (DAU) and identify active users through unique user IDs and usage patterns.
### Details:
Ensure userId generation is consistent and persistent. Track command execution timestamps to calculate DAU. Include session tracking to understand user engagement patterns. Add fields for tracking unique daily users, command frequency, and session duration.
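One way to get a consistent, anonymized userId plus DAU-friendly events, sketched with assumed file locations and field names.

```javascript
// Sketch only: the storage path and event field names are assumptions.
import { randomUUID } from 'node:crypto';
import fs from 'node:fs';
import path from 'node:path';

function getOrCreateAnonymousUserId(configDir) {
  const idFile = path.join(configDir, 'telemetry-user-id');
  if (fs.existsSync(idFile)) return fs.readFileSync(idFile, 'utf8').trim();
  const userId = randomUUID(); // random, never derived from personal data
  fs.writeFileSync(idFile, userId, 'utf8');
  return userId;
}

// DAU can then be computed server-side as distinct (userId, day) pairs.
function buildUsageEvent(userId, commandName, sessionId) {
  const now = new Date();
  return {
    userId,
    sessionId, // e.g. one UUID per process run, for session-duration metrics
    commandName,
    timestamp: now.toISOString(),
    day: now.toISOString().slice(0, 10) // YYYY-MM-DD bucket for DAU aggregation
  };
}
```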
## 4. Extend telemetry to non-AI commands [pending]
### Dependencies: None
### Description: Implement telemetry collection for all Task Master commands, not just AI-powered ones, to get complete usage analytics.
### Details:
Create a unified telemetry collection mechanism for all commands in commands.js. Track command name, execution time, success/failure status, and basic metrics. Ensure non-AI commands generate appropriate telemetry without AI-specific fields like tokens or costs.
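A hypothetical wrapper that commands.js could route every action through; the helper name and the recordTelemetry callback are illustrative, not existing APIs.

```javascript
// Hypothetical helper: times any command action and records a minimal,
// non-AI telemetry event (no token or cost fields).
async function withCommandTelemetry(commandName, action, recordTelemetry) {
  const start = Date.now();
  let success = true;
  try {
    return await action();
  } catch (err) {
    success = false;
    throw err; // telemetry never hides the original failure
  } finally {
    recordTelemetry({
      commandName,
      durationMs: Date.now() - start,
      success,
      timestamp: new Date().toISOString()
    });
  }
}
```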
## 5. Add opt-out data collection prompt to init command [pending]
### Dependencies: None
### Description: Modify init.js to prompt users about telemetry opt-out, with data collection enabled by default, and store the preference in .taskmasterconfig.
### Details:
Add a prompt during task-master init that asks users whether they want to opt out of telemetry (default: no, i.e. continue with telemetry). Store the preference as 'telemetryOptOut: boolean' in .taskmasterconfig. Ensure all telemetry collection respects this setting. Include a clear explanation of what data is collected and why.
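A sketch of the prompt flow, assuming an inquirer-style confirm question and a JSON .taskmasterconfig; init.js may use a different prompt library, so treat these API calls as placeholders.

```javascript
// Assumes inquirer and a JSON config file; both are placeholders for whatever init.js actually uses.
import inquirer from 'inquirer';
import fs from 'node:fs';

async function promptTelemetryOptOut(configPath) {
  const { optOut } = await inquirer.prompt([
    {
      type: 'confirm',
      name: 'optOut',
      message:
        'Task Master collects anonymous usage data to improve the tool. Opt out of telemetry?',
      default: false // default: stay opted in, matching the subtask description
    }
  ]);

  const config = JSON.parse(fs.readFileSync(configPath, 'utf8'));
  config.telemetryOptOut = optOut; // stored as 'telemetryOptOut: boolean'
  fs.writeFileSync(configPath, JSON.stringify(config, null, 2));
  return optOut;
}
```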