We want to refine how Task Master handles AI model token limits to be more precise, by:

1. Distinguishing between `maxInputTokens` and `maxOutputTokens` in the configuration.
2. Dynamically adjusting the `maxOutputTokens` for an API call based on the actual prompt length, to stay within the model's total context window (or respecting separate input/output limits if the API and model support that).
3. Ensuring that `ai-services-unified.js` uses these more granular limits.

This is a good improvement for both cost control and preventing errors. Here's a plan:

**Phase 1: Configuration and Core Logic Updates**

1. **Update `.taskmasterconfig` Structure:**
   - I'll modify the `models` section in your `.taskmasterconfig`. For each role (`main`, `research`, `fallback`), `maxTokens` will be replaced with `maxInputTokens` and `maxOutputTokens`.
   - We'll need to decide on reasonable default values for these new fields. We can look at the current `maxTokens` and the model's known limits to make an initial guess.
2. **Update `MODEL_MAP` in `ai-services-unified.js`:**
   - This array already stores cost data. We need to ensure it also stores the _absolute_ maximum input and output tokens for each model listed (e.g., `model_max_input_tokens`, `model_max_output_tokens`). If these fields are not present, they will need to be added. The values in `.taskmasterconfig` will then represent user-defined operational limits, which should ideally be validated against these absolute maximums.
3. **Update `config-manager.js`:**
   - Getter functions like `getParametersForRole` will be updated to fetch `maxInputTokens` and `maxOutputTokens` instead of the singular `maxTokens`.
   - New getters might be needed if we want to access the model's absolute limits directly from `MODEL_MAP` via `config-manager.js`.
4. **Update `ai-services-unified.js` (`_unifiedServiceRunner`):**
   - **Token Counting:** This is a crucial step. Before an API call, we need to estimate the token count of the combined `systemPrompt` and `userPrompt`.
     - The Vercel AI SDK or the individual provider SDKs might offer utilities for this. For example, some SDKs expose a `tokenizer` or a way to count tokens for a given string.
     - If a direct utility isn't available through the Vercel SDK for the specific provider, we might need to use a library like `tiktoken` for OpenAI/Anthropic models or investigate provider-specific tokenization. This could be complex, as tokenization varies between models.
     - For now, let's assume we can get a reasonable estimate.
   - **Dynamic Output Token Calculation & Validation:**
     - Retrieve `configured_max_input_tokens` and `configured_max_output_tokens` from `config-manager.js` for the current role.
     - Retrieve `model_absolute_max_input_tokens` and `model_absolute_max_output_tokens` from `MODEL_MAP`.
     - Let `prompt_tokens = countTokens(systemPrompt + userPrompt)`.
     - **Input Validation:** If `prompt_tokens > configured_max_input_tokens` OR `prompt_tokens > model_absolute_max_input_tokens`, throw an error. (We might also consider auto-truncation strategies in the future, but an error is safer to start.)
     - **Output Calculation:** The `max_tokens` parameter sent to the API (which typically refers to the _maximum number of tokens to generate_) should be `configured_max_output_tokens`. We also need to ensure this doesn't violate the model's absolute output limit.
     - Some models have a _total_ context window (input + output).
       If this is the case for a model, the `max_tokens` parameter for the API call might need to be `min(configured_max_output_tokens, model_absolute_total_tokens - prompt_tokens)`. However, many modern APIs handle separate input and output limits, which simplifies this. We'll aim to pass `configured_max_output_tokens` as the API's `max_tokens` parameter, assuming the input validation has passed.
     - **API Call Parameter:** Update the `callParams` to use the correctly determined `max_tokens` for generation (which will be our `configured_max_output_tokens`).

**Phase 2: Error Handling and Refinements**

1. **Enhanced Error Handling:** Provide clear error messages if prompt tokens exceed input limits, or if API calls fail due to token issues despite our calculations.
2. **Validation:** Add validation (perhaps in `config-manager.js` or when `task-master models --setup` is run) to ensure `configured_max_input_tokens` and `configured_max_output_tokens` in `.taskmasterconfig` do not exceed the `model_absolute_max_input_tokens` and `model_absolute_max_output_tokens` from `MODEL_MAP`.
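To make that validation step concrete, here's a minimal sketch of the kind of check I have in mind. The helper name `validateRoleTokenLimits` and the exact shape of the `MODEL_MAP` entry are placeholders for illustration, not existing Task Master code:

```javascript
// Hypothetical check, run from config-manager.js or `task-master models --setup`:
// compares the user-configured limits for one role against the model's absolute
// limits from MODEL_MAP and returns human-readable problems (empty array = OK).
function validateRoleTokenLimits(roleName, roleConfig, modelEntry) {
  const issues = [];
  if (
    modelEntry?.maxInputTokens &&
    roleConfig.maxInputTokens > modelEntry.maxInputTokens
  ) {
    issues.push(
      `${roleName}: maxInputTokens (${roleConfig.maxInputTokens}) exceeds the model's absolute max input tokens (${modelEntry.maxInputTokens}).`
    );
  }
  if (
    modelEntry?.maxOutputTokens &&
    roleConfig.maxOutputTokens > modelEntry.maxOutputTokens
  ) {
    issues.push(
      `${roleName}: maxOutputTokens (${roleConfig.maxOutputTokens}) exceeds the model's absolute max output tokens (${modelEntry.maxOutputTokens}).`
    );
  }
  return issues;
}
```

Running a check like this at config-load or setup time would surface misconfiguration before any API call is made.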
Let's start by inspecting `scripts/modules/config-manager.js` to see how parameters and `MODEL_MAP` are currently handled. This will inform the changes needed for `.taskmasterconfig` and `ai-services-unified.js`. I will now read `scripts/modules/config-manager.js`.

Okay, I've reviewed `scripts/modules/config-manager.js`. Here's what I've gathered and how it impacts the plan:

- **`MODEL_MAP` Source:** `MODEL_MAP` is loaded from `supported-models.json`. This is excellent, as it means we can update the model capabilities (including new token limits) in a structured JSON file.
- **Configuration Loading:** `_loadAndValidateConfig` handles loading `.taskmasterconfig` and merging it with `DEFAULTS`.
- **Getters:** There are specific getters for `maxTokens` for each role (e.g., `getMainMaxTokens`). We will need to update these.
- **`getParametersForRole`:** This function already exists and is a good candidate for returning the new `maxInputTokens` and `maxOutputTokens`.

Here's the refined plan:

**Phase 1: Configuration and Data Structures**

1. **Update `supported-models.json`:**
   - For each model entry in `supported-models.json`, we need to add `maxInputTokens` and `maxOutputTokens` fields. These should reflect the _absolute maximums_ for that specific model.
   - Example entry change:

     ```json
     // Before
     {
       "id": "claude-3-7-sonnet-20250219",
       "name": "Claude 3.7 Sonnet (Preview)",
       "context_window": 200000, // This might be total context
       "cost_per_1m_tokens": { "input": 3, "output": 15, "currency": "USD" }
     }

     // After (assuming 200k is total, and we make a reasonable split or find official numbers)
     {
       "id": "claude-3-7-sonnet-20250219",
       "name": "Claude 3.7 Sonnet (Preview)",
       "maxInputTokens": 190000, // Example
       "maxOutputTokens": 8192, // Example, often smaller for generation
       "cost_per_1m_tokens": { "input": 3, "output": 15, "currency": "USD" }
     }
     ```

   - I will need to find the official input/output token limits for the models currently in your `MODEL_MAP`. If you have this information handy, it would speed things up. Otherwise, I can make educated guesses or search for them.
2. **Update `.taskmasterconfig` (Defaults and User File):**
   - In `scripts/modules/config-manager.js`, modify the `DEFAULTS` object. For each role (`main`, `research`, `fallback`), replace `maxTokens` with:
     - `maxInputTokens`: A sensible default (e.g., a large portion of the model's capability, but user-configurable).
     - `maxOutputTokens`: A sensible default for generation (e.g., 4096 or 8192).
   - You will then need to manually update your existing `.taskmasterconfig` file to reflect this new structure. I can provide the snippet for you to paste.
3. **Update `config-manager.js`:**
   - Modify `getParametersForRole(role, explicitRoot = null)`:
     - It currently fetches `maxTokens` and `temperature`.
     - Update it to fetch `maxInputTokens`, `maxOutputTokens`, and `temperature` from the loaded config for the given role.
   - Remove the role-specific `getMaxTokens` functions (e.g., `getMainMaxTokens`, `getResearchMaxTokens`, `getFallbackMaxTokens`). `getParametersForRole` will be the central way to get these.
   - (Optional, for later validation) Consider adding a new function `getModelCapabilities(providerName, modelId)` that reads from `MODEL_MAP` to return the absolute `maxInputTokens` and `maxOutputTokens` for a given model. This would be useful for validating the user's settings in `.taskmasterconfig`.
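As a rough illustration of items 2 and 3, here's roughly where I expect `config-manager.js` to land. The numeric defaults are placeholders to be tuned per model, the real `DEFAULTS` object carries more per-role fields (provider, model ID, etc.) than shown, and `_loadAndValidateConfig` is the existing loader noted above with its exact signature assumed:

```javascript
// Sketch only: per-role token defaults after replacing maxTokens.
const DEFAULTS = {
  models: {
    main: { maxInputTokens: 100000, maxOutputTokens: 8192, temperature: 0.2 },
    research: { maxInputTokens: 100000, maxOutputTokens: 8192, temperature: 0.1 },
    fallback: { maxInputTokens: 100000, maxOutputTokens: 4096, temperature: 0.2 }
  }
};

// The single accessor callers use for token limits and temperature.
function getParametersForRole(role, explicitRoot = null) {
  const config = _loadAndValidateConfig(explicitRoot); // existing loader (signature assumed)
  const roleConfig = config?.models?.[role] ?? DEFAULTS.models[role];
  return {
    maxInputTokens: roleConfig.maxInputTokens,
    maxOutputTokens: roleConfig.maxOutputTokens,
    temperature: roleConfig.temperature
  };
}
```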
**Phase 2: Core Logic in `ai-services-unified.js`**

1. **Token Counting (`_unifiedServiceRunner`):**
   - This is the most complex part. We need a reliable way to count tokens for the prompts.
   - **Strategy 1 (Ideal):** Leverage the Vercel AI SDK. The SDK might provide a way to get a tokenizer for the active model or a utility function. We'll need to investigate its capabilities.
   - **Strategy 2 (Fallback):** Use a library like `tiktoken` for models compatible with OpenAI's tokenization (many are, including some Anthropic models). For other models, we might need provider-specific tokenizers or make estimations (less ideal).
   - **Initial Approach:** Let's try to find a Vercel AI SDK utility first. If not, we'll start with `tiktoken` as a common case and acknowledge that other models might need specific handling later.
   - The function `_unifiedServiceRunner` will call this token counting utility:

     ```javascript
     // Placeholder for token counting
     function countTokens(text, modelId /* or providerName */) {
       // ... implementation using SDK or library ...
       // This is a simplification; actual tokenization is model-specific.
       // For now, a rough character-based estimate (about 3.5 characters per token)
       // serves as a placeholder if a proper tokenizer isn't immediately available.
       if (!text) return 0;
       return Math.ceil(text.length / 3.5); // Very rough general estimate
     }

     const promptTokens = countTokens(systemPrompt) + countTokens(prompt);
     ```

2. **Dynamic Output Token Calculation & Validation (`_unifiedServiceRunner`):**
   - Fetch parameters using the updated `getParametersForRole` from `config-manager.js`:

     ```javascript
     const roleParams = getParametersForRole(currentRole, effectiveProjectRoot);
     // roleParams will now contain { maxInputTokens, maxOutputTokens, temperature }
     ```

   - Fetch the model's absolute limits from `MODEL_MAP` (we'll need a helper in `config-manager.js` or do it directly in `ai-services-unified.js` for now):

     ```javascript
     // Simplified; ideally use a robust getter from config-manager
     const modelInfo = MODEL_MAP[providerName?.toLowerCase()]?.find(
       (m) => m.id === modelId
     );
     const modelAbsoluteMaxInput = modelInfo?.maxInputTokens || Infinity;
     const modelAbsoluteMaxOutput =
       modelInfo?.maxOutputTokens || roleParams.maxOutputTokens; // Fallback to configured if not in map
     ```

   - **Input Validation:**

     ```javascript
     if (promptTokens > roleParams.maxInputTokens) {
       throw new Error(
         `Prompt (${promptTokens} tokens) exceeds configured max input tokens (${roleParams.maxInputTokens}) for role '${currentRole}'.`
       );
     }
     if (promptTokens > modelAbsoluteMaxInput) {
       throw new Error(
         `Prompt (${promptTokens} tokens) exceeds model's absolute max input tokens (${modelAbsoluteMaxInput}) for ${modelId}.`
       );
     }
     ```

   - **API Call Parameter (`max_tokens` for generation):** The `max_tokens` parameter passed to the provider's API call should be `roleParams.maxOutputTokens`. We must also ensure this doesn't exceed the model's absolute output capability. (For models that only publish a total context window, see the sketch after this plan.)

     ```javascript
     const apiMaxOutputTokens = Math.min(
       roleParams.maxOutputTokens,
       modelAbsoluteMaxOutput
     );

     const callParams = {
       apiKey,
       modelId,
       // Use apiMaxOutputTokens for the actual API call
       maxTokens: apiMaxOutputTokens, // This usually means max tokens to GENERATE
       temperature: roleParams.temperature,
       messages,
       baseUrl,
       ...(serviceType === 'generateObject' && { schema, objectName }),
       ...restApiParams
     };
     ```

3. **Update `logAiUsage`:**
   - The cost calculation in `logAiUsage` seems correct as it already uses `inputCost` and `outputCost` per 1M tokens. This part should remain compatible.

**Next Steps:**

1. **Modify `supported-models.json`:** I need the content of this file to propose the new `maxInputTokens` and `maxOutputTokens` fields. Please provide it.
2. **Decide on Tokenizer Strategy:** What are your thoughts on the token counting? Are you aware of any utilities within the Vercel AI SDK for this, or should we plan to integrate `tiktoken` initially?

Once I have the content of `supported-models.json`, I can draft the changes for it and then proceed to `config-manager.js` and `.taskmasterconfig`.
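One case the snippets above don't show is the one raised in the original plan: models that only publish a _total_ context window (input + output). If we store that value (the `contextWindowTokens` field proposed below), the output cap can additionally be clamped by the remaining context. A minimal sketch, reusing `roleParams`, `modelInfo`, `modelAbsoluteMaxOutput`, and `promptTokens` from above:

```javascript
// Clamp the generation budget by whatever context remains after the prompt.
// Only relevant for models where input and output share one context window.
let apiMaxOutputTokens = Math.min(
  roleParams.maxOutputTokens,
  modelAbsoluteMaxOutput
);
if (modelInfo?.contextWindowTokens) {
  const remainingContext = modelInfo.contextWindowTokens - promptTokens;
  apiMaxOutputTokens = Math.max(1, Math.min(apiMaxOutputTokens, remainingContext));
}
```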
Okay, I have the contents of `supported-models.json`. It seems many models have a `max_tokens` field, which I assume is the _total context window_. Some don't have it, and for others (like OpenAI `gpt-4o`), the listed `max_tokens` (16384) is known to be its _output_ token limit when using the Chat Completions API, while the context window is much larger (128k). This highlights the complexity: `max_tokens` in `supported-models.json` is used inconsistently.

**Revised Plan for `supported-models.json` and Token Definitions:**

To bring clarity, we'll introduce two new fields and ensure their meaning is consistent:

- `contextWindowTokens`: The total number of tokens the model can process (input + output). This would replace the current ambiguous `max_tokens`.
- `maxOutputTokens`: The maximum number of tokens the model can _generate_ in a single response. This is often smaller than the total context window, especially for larger models.

If a model _only_ specifies a total context window, we'll have to make a reasonable assumption for `maxOutputTokens` (e.g., 4096 or 8192, or a fraction of the total context window). If it only specifies an output token limit (like some OpenAI models in certain API modes), we'll need to find its total context window.
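To illustrate that fallback policy, a small normalization pass over the regenerated entries could fill in whatever is missing. The helper name, the 8192 default, and the `needsContextWindowLookup` flag are purely illustrative:

```javascript
// Hypothetical one-off helper used while regenerating supported-models.json:
// makes sure each entry ends up with both contextWindowTokens and maxOutputTokens.
function normalizeTokenFields(model) {
  const normalized = { ...model };
  if (normalized.contextWindowTokens && !normalized.maxOutputTokens) {
    // Only a total context window is documented: assume a conservative output cap.
    normalized.maxOutputTokens = Math.min(8192, normalized.contextWindowTokens);
  }
  if (!normalized.contextWindowTokens && normalized.maxOutputTokens) {
    // Only an output limit is documented: the real context window has to come
    // from the provider's docs, so flag the entry instead of guessing a number.
    normalized.needsContextWindowLookup = true;
  }
  return normalized;
}
```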
**Updated `supported-models.json` Structure (Example):**

```json
// For a model like Anthropic Claude 3.7 Sonnet (Preview)
{
  "id": "claude-3-7-sonnet-20250219",
  "swe_score": 0.623,
  "cost_per_1m_tokens": { "input": 3.0, "output": 15.0 },
  "allowed_roles": ["main", "fallback"],
  "contextWindowTokens": 200000, // From Anthropic's documentation
  "maxOutputTokens": 8192 // Anthropic default, user can override in .taskmasterconfig
}

// For a model like OpenAI GPT-4o
{
  "id": "gpt-4o",
  "swe_score": 0.332,
  "cost_per_1m_tokens": { "input": 2.5, "output": 10.0 },
  "allowed_roles": ["main", "fallback"],
  "contextWindowTokens": 128000, // Official context window
  "maxOutputTokens": 16384 // Max output for chat completions is 16k for gpt-4o (older gpt-4 variants were 4k/8k),
                           // while the model card mentions 128k total. The API's max_tokens parameter sets this output cap.
}
```

I will proceed to generate the updated `supported-models.json` content. This will be a significant change. I will make my best effort to find the correct `contextWindowTokens` and `maxOutputTokens` for each model. If official numbers are ambiguous or not readily available, I'll use sensible defaults and add a comment.

**Regarding Tokenizer Strategy:**

- **Vercel AI SDK:** The `ai` package (Vercel AI SDK) itself is a lightweight wrapper. Tokenization is usually handled by the underlying provider-specific SDKs (e.g., `@anthropic-ai/sdk`, `openai`). The Vercel SDK doesn't provide a universal tokenizer.
- **Provider SDKs:**
  - `@anthropic-ai/sdk`: Does not expose a public tokenizer directly. Anthropic recommends counting based on their guidelines (roughly 3.5 chars/token for English).
  - `openai` (for Node.js): The `gpt-3-encoder` or `tiktoken` libraries are commonly used. `tiktoken` is more current.
- **`tiktoken`:** This is a good general-purpose library for OpenAI models and those that use compatible tokenization (many models fine-tuned from Llama, etc., might be close). It supports various encodings.
- **Recommendation:**
  1. Start by integrating `tiktoken`. We can select the encoding based on the provider or even the specific model if needed (e.g., `cl100k_base` for gpt-4 and gpt-3.5-turbo).
  2. For providers where `tiktoken` is not accurate (e.g., Google's Gemini, Perplexity if they have proprietary tokenizers), we might have to rely on character-based estimations (e.g., `length / N`) as a rougher guide or investigate whether their specific SDKs offer token counting.
  3. The Vercel AI SDK's `experimental_streamObject` and other functions _might_ internally handle some aspects of token counting for request preparation, but we need explicit counting for our logic _before_ the call.

For now, let's assume we'll add a `countTokens` utility that uses `tiktoken` for OpenAI and Anthropic (Anthropic's guidance is character-based, but `cl100k_base` is often a decent proxy for rough estimates, though not perfect), and a character-based fallback for others.
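Here's a minimal sketch of that `countTokens` utility. It assumes the WASM `tiktoken` npm package (which exposes `get_encoding`, with encoders providing `encode()` and `free()`); the provider routing and the 3.5 chars/token fallback mirror the approach above and give estimates, not exact counts, for non-OpenAI models:

```javascript
// Sketch of the proposed countTokens utility (tiktoken for OpenAI/Anthropic,
// character-based fallback for everything else).
import { get_encoding } from 'tiktoken';

function countTokens(text, providerName) {
  if (!text) return 0;
  const provider = providerName?.toLowerCase();
  if (provider === 'openai' || provider === 'anthropic') {
    // cl100k_base is exact for gpt-4 / gpt-3.5-turbo and only a rough proxy for Anthropic.
    const encoder = get_encoding('cl100k_base');
    try {
      return encoder.encode(text).length;
    } finally {
      encoder.free(); // the WASM-backed encoder must be released explicitly
    }
  }
  // Fallback: assume roughly 3.5 characters per token.
  return Math.ceil(text.length / 3.5);
}

// Usage inside _unifiedServiceRunner:
// const promptTokens =
//   countTokens(systemPrompt, providerName) + countTokens(prompt, providerName);
```

In practice the encoder should probably be created once and cached rather than instantiated per call.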
I will now generate the proposed changes for `scripts/modules/supported-models.json`. This will be a large diff. After this, we'll move to:

1. Updating `.taskmasterconfig` defaults in `config-manager.js` and providing you the snippet for your user file.
2. Updating `config-manager.js` getters.
3. Implementing the logic in `ai-services-unified.js`.