We want to refine how Task Master handles AI model token limits to be more precise, by:
- Distinguishing between `maxInputTokens` and `maxOutputTokens` in the configuration.
- Dynamically adjusting the `maxOutputTokens` for an API call based on the actual prompt length, to stay within the model's total context window (or respecting separate input/output limits if the API and model support that).
- Ensuring that `ai-services-unified.js` uses these more granular limits.
This is a good improvement for both cost control and preventing errors.
Here's a plan:
Phase 1: Configuration and Core Logic Updates
- Update `.taskmasterconfig` structure:
  - I'll modify the `models` section in your `.taskmasterconfig`. For each role (`main`, `research`, `fallback`), `maxTokens` will be replaced with `maxInputTokens` and `maxOutputTokens`.
  - We'll need to decide on reasonable default values for these new fields. We can look at the current `maxTokens` and the model's known limits to make an initial guess.
- Update `MODEL_MAP` in `ai-services-unified.js`:
  - This array already stores cost data. We need to ensure it also stores the absolute maximum input and output tokens for each model listed (e.g., `model_max_input_tokens`, `model_max_output_tokens`). If these fields are not present, they will need to be added. The values in `.taskmasterconfig` will then represent user-defined operational limits, which should ideally be validated against these absolute maximums.
- Update `config-manager.js`:
  - Getter functions like `getParametersForRole` will be updated to fetch `maxInputTokens` and `maxOutputTokens` instead of the singular `maxTokens`.
  - New getters might be needed if we want to access the model's absolute limits directly from `MODEL_MAP` via `config-manager.js`.
- Update `ai-services-unified.js` (`_unifiedServiceRunner`):
  - Token Counting: This is a crucial step. Before an API call, we need to estimate the token count of the combined `systemPrompt` and `userPrompt`.
    - The Vercel AI SDK or the individual provider SDKs might offer utilities for this. For example, some SDKs expose a `tokenizer` or a way to count tokens for a given string.
    - If a direct utility isn't available through the Vercel SDK for the specific provider, we might need to use a library like `tiktoken` for OpenAI/Anthropic models or investigate provider-specific tokenization. This could be complex, as tokenization varies between models.
    - For now, let's assume we can get a reasonable estimate.
  - Dynamic Output Token Calculation & Validation:
    - Retrieve `configured_max_input_tokens` and `configured_max_output_tokens` from `config-manager.js` for the current role.
    - Retrieve `model_absolute_max_input_tokens` and `model_absolute_max_output_tokens` from `MODEL_MAP`.
    - Let `prompt_tokens = countTokens(systemPrompt + userPrompt)`.
    - Input Validation: If `prompt_tokens > configured_max_input_tokens` OR `prompt_tokens > model_absolute_max_input_tokens`, throw an error. (We might also consider auto-truncation strategies in the future, but an error is safer to start.)
    - Output Calculation: The `max_tokens` parameter sent to the API (which typically refers to the maximum number of tokens to generate) should be `configured_max_output_tokens`. We also need to ensure this doesn't violate the model's absolute output limit.
    - Some models have a total context window (input + output). If this is the case for a model, the `max_tokens` parameter for the API call might need to be `min(configured_max_output_tokens, model_absolute_total_tokens - prompt_tokens)` (see the sketch after this list). However, many modern APIs handle separate input and output limits, simplifying this. We'll aim to pass `configured_max_output_tokens` as the API's `max_tokens` parameter, assuming the input validation has passed.
  - API Call Parameter: Update the `callParams` to use the correctly determined `max_tokens` for generation (which will be our `configured_max_output_tokens`).
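For a model that only exposes a single shared context window, the clamp described above could look like this minimal sketch (the function and variable names are illustrative, not existing Task Master code):

```javascript
// Clamp the generation budget for models that share one total context window.
// All names here are illustrative; the real values come from config-manager.js
// and MODEL_MAP as described in the plan above.
function resolveApiMaxOutputTokens(
  configuredMaxOutputTokens,
  modelTotalContextTokens,
  promptTokens
) {
  const remainingWindow = modelTotalContextTokens - promptTokens;
  if (remainingWindow <= 0) {
    throw new Error('Prompt already fills the model context window.');
  }
  return Math.min(configuredMaxOutputTokens, remainingWindow);
}
```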
Phase 2: Error Handling and Refinements
- Enhanced Error Handling: Provide clear error messages if prompt tokens exceed input limits, or if API calls fail due to token issues despite our calculations.
- Validation: Add validation (perhaps in `config-manager.js` or when `task-master models --setup` is run) to ensure `configured_max_input_tokens` and `configured_max_output_tokens` in `.taskmasterconfig` do not exceed the `model_absolute_max_input_tokens` and `model_absolute_max_output_tokens` from `MODEL_MAP` (sketched below).
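A minimal sketch of that check, assuming a hypothetical helper that receives the role's configured limits and the model's absolute limits (none of these names exist yet):

```javascript
// Hypothetical validation helper: compare user-configured limits from
// .taskmasterconfig against the model's absolute limits from MODEL_MAP.
function validateTokenLimits(role, roleConfig, modelCapabilities) {
  const problems = [];
  if (roleConfig.maxInputTokens > modelCapabilities.maxInputTokens) {
    problems.push(
      `${role}: maxInputTokens ${roleConfig.maxInputTokens} exceeds model limit ${modelCapabilities.maxInputTokens}`
    );
  }
  if (roleConfig.maxOutputTokens > modelCapabilities.maxOutputTokens) {
    problems.push(
      `${role}: maxOutputTokens ${roleConfig.maxOutputTokens} exceeds model limit ${modelCapabilities.maxOutputTokens}`
    );
  }
  return problems; // caller can warn or throw during `task-master models --setup`
}
```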
Let's start by inspecting scripts/modules/config-manager.js to see how parameters and MODEL_MAP are currently handled. This will inform the changes needed for .taskmasterconfig and ai-services-unified.js.
I will now read scripts/modules/config-manager.js.
Okay, I've reviewed scripts/modules/config-manager.js. Here's what I've gathered and how it impacts the plan:
- `MODEL_MAP` Source: `MODEL_MAP` is loaded from `supported-models.json`. This is excellent, as it means we can update the model capabilities (including new token limits) in a structured JSON file.
- Configuration Loading: `_loadAndValidateConfig` handles loading `.taskmasterconfig` and merging it with `DEFAULTS`.
- Getters: There are specific getters for `maxTokens` for each role (e.g., `getMainMaxTokens`). We will need to update these.
- `getParametersForRole`: This function already exists and is a good candidate for returning the new `maxInputTokens` and `maxOutputTokens`.
Here's the refined plan:
Phase 1: Configuration and Data Structures
- Update `supported-models.json`:
  - For each model entry in `supported-models.json`, we need to add `maxInputTokens` and `maxOutputTokens` fields. These should reflect the absolute maximums for that specific model.
  - Example entry change:

    ```json
    // Before
    {
      "id": "claude-3-7-sonnet-20250219",
      "name": "Claude 3.7 Sonnet (Preview)",
      "context_window": 200000, // This might be total context
      "cost_per_1m_tokens": { "input": 3, "output": 15, "currency": "USD" }
    }

    // After (assuming 200k is total, and we make a reasonable split or find official numbers)
    {
      "id": "claude-3-7-sonnet-20250219",
      "name": "Claude 3.7 Sonnet (Preview)",
      "maxInputTokens": 190000, // Example
      "maxOutputTokens": 8192, // Example, often smaller for generation
      "cost_per_1m_tokens": { "input": 3, "output": 15, "currency": "USD" }
    }
    ```

  - I will need to find the official input/output token limits for the models currently in your `MODEL_MAP`. If you have this information handy, it would speed things up. Otherwise, I can make educated guesses or search for them.
- Update `.taskmasterconfig` (defaults and user file):
  - In `scripts/modules/config-manager.js`, modify the `DEFAULTS` object. For each role (`main`, `research`, `fallback`), replace `maxTokens` with:
    - `maxInputTokens`: a sensible default (e.g., a large portion of the model's capability, but user-configurable).
    - `maxOutputTokens`: a sensible default for generation (e.g., 4096 or 8192).
  - You will then need to manually update your existing `.taskmasterconfig` file to reflect this new structure. I can provide the snippet for you to paste (the intended shape is sketched after this list).
- Update `config-manager.js`:
  - Modify `getParametersForRole(role, explicitRoot = null)`:
    - It currently fetches `maxTokens` and `temperature`.
    - Update it to fetch `maxInputTokens`, `maxOutputTokens`, and `temperature` from the loaded config for the given role (see the sketch after this list).
  - Remove the role-specific `getMaxTokens` functions (e.g., `getMainMaxTokens`, `getResearchMaxTokens`, `getFallbackMaxTokens`). `getParametersForRole` will be the central way to get these.
  - (Optional, for later validation) Consider adding a new function `getModelCapabilities(providerName, modelId)` that reads from `MODEL_MAP` to return the absolute `maxInputTokens` and `maxOutputTokens` for a given model. This would be useful for validating the user's settings in `.taskmasterconfig`.
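As a rough sketch of where the last two items are headed, here is one possible shape for the per-role defaults and the updated getters. Everything below is illustrative: `getConfig`, the `models` key layout, and the default values are assumptions about the existing code, not its actual contents.

```javascript
// Illustrative per-role defaults after replacing maxTokens (values are placeholders).
const DEFAULTS = {
  models: {
    main: { maxInputTokens: 100000, maxOutputTokens: 8192, temperature: 0.2 },
    research: { maxInputTokens: 100000, maxOutputTokens: 8192, temperature: 0.1 },
    fallback: { maxInputTokens: 100000, maxOutputTokens: 4096, temperature: 0.2 }
  }
};

// Sketch only: assumes a getConfig(explicitRoot) loader and MODEL_MAP as described above.
function getParametersForRole(role, explicitRoot = null) {
  const config = getConfig(explicitRoot);
  const roleConfig = config?.models?.[role] ?? DEFAULTS.models[role] ?? {};
  return {
    maxInputTokens: roleConfig.maxInputTokens,
    maxOutputTokens: roleConfig.maxOutputTokens,
    temperature: roleConfig.temperature
  };
}

// Optional helper for validating user settings against absolute model limits.
function getModelCapabilities(providerName, modelId) {
  const model = MODEL_MAP[providerName?.toLowerCase()]?.find(
    (m) => m.id === modelId
  );
  return {
    maxInputTokens: model?.maxInputTokens ?? Infinity,
    maxOutputTokens: model?.maxOutputTokens ?? Infinity
  };
}
```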
Phase 2: Core Logic in ai-services-unified.js
- Token Counting (`_unifiedServiceRunner`):
  - This is the most complex part. We need a reliable way to count tokens for the prompts.
    - Strategy 1 (Ideal): Leverage the Vercel AI SDK. The SDK might provide a way to get a tokenizer for the active model or a utility function. We'll need to investigate its capabilities.
    - Strategy 2 (Fallback): Use a library like `tiktoken` for models compatible with OpenAI's tokenization (many are, including some Anthropic models). For other models, we might need provider-specific tokenizers or make estimations (less ideal).
    - Initial Approach: Let's try to find a Vercel AI SDK utility first. If not, we'll start with `tiktoken` as a common case and acknowledge that other models might need specific handling later.
  - The function `_unifiedServiceRunner` will call this token counting utility:

    ```javascript
    // Placeholder for token counting
    function countTokens(text, modelId /* or providerName */) {
      // ... implementation using SDK or library ...
      // This is a simplification; actual tokenization is model-specific.
      // For now, a rough estimate might be (text.length / 4) as a placeholder
      // if a proper tokenizer isn't immediately available.
      if (!text) return 0;
      return Math.ceil(text.length / 3.5); // Very rough general estimate
    }

    const promptTokens = countTokens(systemPrompt) + countTokens(prompt);
    ```
- Dynamic Output Token Calculation & Validation (`_unifiedServiceRunner`):
  - Fetch parameters using the updated `getParametersForRole` from `config-manager.js`:

    ```javascript
    const roleParams = getParametersForRole(currentRole, effectiveProjectRoot);
    // roleParams will now contain { maxInputTokens, maxOutputTokens, temperature }
    ```

  - Fetch the model's absolute limits from `MODEL_MAP` (we'll need a helper in `config-manager.js` or do it directly in `ai-services-unified.js` for now):

    ```javascript
    // Simplified; ideally use a robust getter from config-manager
    const modelInfo = MODEL_MAP[providerName?.toLowerCase()]?.find(
      (m) => m.id === modelId
    );
    const modelAbsoluteMaxInput = modelInfo?.maxInputTokens || Infinity;
    const modelAbsoluteMaxOutput =
      modelInfo?.maxOutputTokens || roleParams.maxOutputTokens; // Fallback to configured if not in map
    ```

  - Input Validation:

    ```javascript
    if (promptTokens > roleParams.maxInputTokens) {
      throw new Error(
        `Prompt (${promptTokens} tokens) exceeds configured max input tokens (${roleParams.maxInputTokens}) for role '${currentRole}'.`
      );
    }
    if (promptTokens > modelAbsoluteMaxInput) {
      throw new Error(
        `Prompt (${promptTokens} tokens) exceeds model's absolute max input tokens (${modelAbsoluteMaxInput}) for ${modelId}.`
      );
    }
    ```

  - API Call Parameter (`max_tokens` for generation): The `max_tokens` parameter passed to the provider's API call should be `roleParams.maxOutputTokens`. We must also ensure this doesn't exceed the model's absolute output capability.

    ```javascript
    const apiMaxOutputTokens = Math.min(
      roleParams.maxOutputTokens,
      modelAbsoluteMaxOutput
    );

    const callParams = {
      apiKey,
      modelId,
      // Use apiMaxOutputTokens for the actual API call
      maxTokens: apiMaxOutputTokens, // This usually means max tokens to GENERATE
      temperature: roleParams.temperature,
      messages,
      baseUrl,
      ...(serviceType === 'generateObject' && { schema, objectName }),
      ...restApiParams
    };
    ```
- Update `logAiUsage`:
  - The cost calculation in `logAiUsage` seems correct, as it already uses `inputCost` and `outputCost` per 1M tokens. This part should remain compatible (see the worked example below).
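For reference, the cost math that `logAiUsage` is expected to keep using boils down to a per-million-token rate on each side; the numbers below are purely illustrative:

```javascript
// Illustrative cost calculation (rates mirror the cost_per_1m_tokens fields).
const inputCostPer1M = 3.0; // USD per 1M input tokens (example)
const outputCostPer1M = 15.0; // USD per 1M output tokens (example)
const inputTokens = 12000;
const outputTokens = 1500;

const totalCost =
  (inputTokens / 1_000_000) * inputCostPer1M +
  (outputTokens / 1_000_000) * outputCostPer1M;
// => 0.036 + 0.0225 = 0.0585 USD
```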
Next Steps:
- Modify `supported-models.json`: I need the content of this file to propose the new `maxInputTokens` and `maxOutputTokens` fields. Please provide it.
- Decide on Tokenizer Strategy: What are your thoughts on the token counting? Are you aware of any utilities within the Vercel AI SDK for this, or should we plan to integrate `tiktoken` initially?
Once I have the content of supported-models.json, I can draft the changes for it and then proceed to config-manager.js and .taskmasterconfig.
Okay, I have the contents of supported-models.json.
It seems many models have a max_tokens field, which I assume is the total context window. Some don't have it, and for others (like OpenAI gpt-4o), the listed max_tokens (16384) is known to be its output token limit when using the Chat Completions API, while the context window is much larger (128k). This highlights the complexity: max_tokens in supported-models.json is used inconsistently.
Revised Plan for supported-models.json and Token Definitions:
To bring clarity, we'll introduce two new fields and ensure their meaning is consistent:
- `contextWindowTokens`: The total number of tokens the model can process (input + output). This would replace the current, ambiguous `max_tokens`.
- `maxOutputTokens`: The maximum number of tokens the model can generate in a single response. This is often smaller than the total context window, especially for larger models.
If a model only specifies a total context window, we'll have to make a reasonable assumption for maxOutputTokens (e.g., 4096 or 8192, or a fraction of the total context window). If it only specifies an output token limit (like some OpenAI models in certain API modes), we'll need to find its total context window.
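When only a total context window is documented, the effective input budget simply falls out of whatever output reservation we assume; a tiny sketch of that arithmetic (values illustrative):

```javascript
// Derive an input ceiling when a model only documents a total context window.
const contextWindowTokens = 200000; // documented total (input + output)
const assumedMaxOutputTokens = 8192; // our default assumption for generation
const derivedMaxInputTokens = contextWindowTokens - assumedMaxOutputTokens; // 191808
```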
Updated supported-models.json Structure (Example):
```json
// For a model like Anthropic Claude 3.7 Sonnet (Preview)
{
  "id": "claude-3-7-sonnet-20250219",
  "swe_score": 0.623,
  "cost_per_1m_tokens": { "input": 3.0, "output": 15.0 },
  "allowed_roles": ["main", "fallback"],
  "contextWindowTokens": 200000, // From Anthropic's documentation
  "maxOutputTokens": 8192 // Anthropic default, user can override in .taskmasterconfig
}

// For a model like OpenAI GPT-4o
{
  "id": "gpt-4o",
  "swe_score": 0.332,
  "cost_per_1m_tokens": { "input": 2.5, "output": 10.0 },
  "allowed_roles": ["main", "fallback"],
  "contextWindowTokens": 128000, // Official context window
  "maxOutputTokens": 16384 // Max output for chat completions is 16k for gpt-4o (used to be 4k/8k for older gpt-4),
  // but the model card mentions 128k total. The API call parameter for `max_tokens` sets this output cap.
}
```
I will proceed to generate the updated supported-models.json content. This will be a significant change. I will make my best effort to find the correct contextWindowTokens and maxOutputTokens for each model. If official numbers are ambiguous or not readily available, I'll use sensible defaults and add a comment.
Regarding Tokenizer Strategy:
- Vercel AI SDK: The `ai` package (Vercel AI SDK) itself is a lightweight wrapper. Tokenization is usually handled by the underlying provider-specific SDKs (e.g., `@anthropic-ai/sdk`, `openai`). The Vercel SDK doesn't provide a universal tokenizer.
- Provider SDKs:
  - `@anthropic-ai/sdk`: Does not expose a public tokenizer directly. Anthropic recommends counting based on their guidelines (roughly 3.5 chars/token for English).
  - `openai` (for Node.js): The `gpt-3-encoder` or `tiktoken` libraries are commonly used; `tiktoken` is more current.
- `tiktoken`: This is a good general-purpose library for OpenAI models and those that use compatible tokenization (many models fine-tuned from Llama, etc., might be close). It supports various encodings.
- Recommendation:
  - Start by integrating `tiktoken`. We can select the encoding based on the provider or even the specific model if needed (e.g., `cl100k_base` for gpt-4, gpt-3.5-turbo, DALL-E models).
  - For providers where `tiktoken` is not accurate (e.g., Google's Gemini, or Perplexity if they have proprietary tokenizers), we might have to rely on character-based estimations (e.g., `length / N`) as a rougher guide, or investigate whether their specific SDKs offer token counting.
  - The Vercel AI SDK's `experimental_streamObject` and other functions might internally handle some aspects of token counting for request preparation, but we need explicit counting for our logic before the call.
For now, let's assume we'll add a countTokens utility that uses tiktoken for OpenAI and Anthropic (as Anthropic's guidance is character-based but cl100k_base is often a decent proxy for their tokenization for rough estimates, though not perfect), and a character-based fallback for others.
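Concretely, such a utility might look like the sketch below. It assumes the `tiktoken` npm package with its `get_encoding()` API and an explicit `free()` call for the WASM-backed encoder; the provider routing and the 3.5 chars/token fallback ratio are working assumptions, not final choices.

```javascript
import { get_encoding } from 'tiktoken';

// Rough token counter: exact-ish for OpenAI-compatible tokenization,
// character-based estimate for everything else.
function countTokens(text, providerName) {
  if (!text) return 0;
  const provider = providerName?.toLowerCase();
  if (provider === 'openai' || provider === 'anthropic') {
    // cl100k_base is exact for recent OpenAI chat models and only a rough
    // proxy for Anthropic models.
    const enc = get_encoding('cl100k_base');
    try {
      return enc.encode(text).length;
    } finally {
      enc.free(); // release the WASM-backed encoder
    }
  }
  // Character-based fallback for providers without a known tokenizer.
  return Math.ceil(text.length / 3.5);
}

// Usage inside _unifiedServiceRunner (systemPrompt, prompt, providerName
// come from that scope):
// const promptTokens =
//   countTokens(systemPrompt, providerName) + countTokens(prompt, providerName);
```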
I will now generate the proposed changes for scripts/modules/supported-models.json. This will be a large diff.
After this, we'll move to:
- Updating `.taskmasterconfig` defaults in `config-manager.js` and providing you the snippet for your user file.
- Updating `config-manager.js` getters.
- Implementing the logic in `ai-services-unified.js`.