# Task ID: 81
# Title: Implement Separate Context Window and Output Token Limits
# Status: pending
# Dependencies: None
# Priority: high
# Description: Replace the ambiguous MAX_TOKENS configuration with separate contextWindowTokens and maxOutputTokens fields to properly handle model token limits and enable dynamic token allocation.
# Details:
Currently, the MAX_TOKENS configuration entry is ambiguous and doesn't properly differentiate between:

1. Context window tokens (total input + output capacity)
2. Maximum output tokens (generation limit)

This causes issues where:

- The system can't properly validate prompt lengths against model capabilities
- Output token allocation is not optimized based on input length
- Different models with different token architectures are handled inconsistently

This epic will implement a comprehensive solution that:

- Updates supported-models.json with accurate contextWindowTokens and maxOutputTokens for each model
- Modifies config-manager.js to use separate maxInputTokens and maxOutputTokens in role configurations
- Implements a token counting utility for accurate prompt measurement
- Updates ai-services-unified.js to dynamically calculate available output tokens
- Provides migration guidance and validation for existing configurations
- Adds comprehensive error handling and validation throughout the system

The end result will be more precise token management, better cost control, and reduced likelihood of hitting model context limits.
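The core allocation rule the epic converges on can be stated in a few lines. A minimal sketch (names are illustrative; the concrete wiring is specified in the subtasks below):

```javascript
// Output budget = configured cap, shrunk by whatever the prompt already
// consumed of the model's combined context window, and by the model's own
// hard output limit.
function computeAvailableOutputTokens(promptTokens, roleMaxOutputTokens, model) {
  let available = roleMaxOutputTokens;
  if (model.contextWindowTokens) {
    available = Math.min(available, model.contextWindowTokens - promptTokens);
  }
  if (model.maxOutputTokens) {
    available = Math.min(available, model.maxOutputTokens);
  }
  return Math.max(available, 0);
}
```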

# Test Strategy:
1. Verify all models have accurate token limit data from official documentation
2. Test dynamic token allocation with various prompt lengths
3. Ensure backward compatibility with existing .taskmasterconfig files
4. Validate error messages are clear and actionable
5. Test with multiple AI providers to ensure consistent behavior
6. Performance test the token counting utility with large prompts (see the test sketch below)
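As a sketch of item 6, a Jest performance test might look like the following (the module path matches subtask 3; the 500 ms budget is an arbitrary placeholder to tune for CI hardware):

```javascript
const { countTokens } = require('../scripts/modules/token-counter');

test('token counting stays fast on large prompts', () => {
  const largePrompt = 'lorem ipsum '.repeat(50000); // ~600 KB of text
  const start = Date.now();
  const count = countTokens(largePrompt, 'openai', 'gpt-4o');
  expect(count).toBeGreaterThan(0);
  expect(Date.now() - start).toBeLessThan(500); // placeholder budget
});
```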

# Subtasks:

## 1. Update supported-models.json with token limit fields [pending]

### Dependencies: None

### Description: Modify the supported-models.json file to include contextWindowTokens and maxOutputTokens fields for each model, replacing the ambiguous max_tokens field.

### Details:
For each model entry in supported-models.json:

1. Add `contextWindowTokens` field representing the total context window (input + output tokens)
2. Add `maxOutputTokens` field representing the maximum tokens the model can generate
3. Remove or deprecate the ambiguous `max_tokens` field if present (a check-script sketch follows the example entry)

Research and populate accurate values for each model from official documentation:

- For OpenAI models (e.g., gpt-4o): contextWindowTokens=128000, maxOutputTokens=16384
- For Anthropic models (e.g., Claude 3.7): contextWindowTokens=200000, maxOutputTokens=8192
- For other providers, find official documentation or use reasonable defaults

Example entry:

```json
{
  "id": "claude-3-7-sonnet-20250219",
  "swe_score": 0.623,
  "cost_per_1m_tokens": { "input": 3.0, "output": 15.0 },
  "allowed_roles": ["main", "fallback"],
  "contextWindowTokens": 200000,
  "maxOutputTokens": 8192
}
```
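A one-off check script can make step 3 mechanical. A sketch, assuming supported-models.json maps provider names to arrays of entries shaped like the example above:

```javascript
// check-models.js - flag entries that still carry max_tokens or miss the new fields
const models = require('./supported-models.json');

for (const [provider, entries] of Object.entries(models)) {
  for (const model of entries) {
    if ('max_tokens' in model) {
      console.warn(`${provider}/${model.id}: still has deprecated max_tokens`);
    }
    for (const field of ['contextWindowTokens', 'maxOutputTokens']) {
      if (!Number.isInteger(model[field]) || model[field] <= 0) {
        console.warn(`${provider}/${model.id}: missing or invalid ${field}`);
      }
    }
  }
}
```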

## 2. Update config-manager.js defaults and getters [pending]

### Dependencies: None

### Description: Modify the config-manager.js module to replace maxTokens with maxInputTokens and maxOutputTokens in the DEFAULTS object and update related getter functions.

### Details:
1. Update the `DEFAULTS` object in config-manager.js:

```javascript
const DEFAULTS = {
  // ... existing defaults
  main: {
    // Replace maxTokens with these two fields
    maxInputTokens: 16000, // Example default
    maxOutputTokens: 4000, // Example default
    temperature: 0.7
    // ... other fields
  },
  research: {
    maxInputTokens: 16000,
    maxOutputTokens: 4000,
    temperature: 0.7
    // ... other fields
  },
  fallback: {
    maxInputTokens: 8000,
    maxOutputTokens: 2000,
    temperature: 0.7
    // ... other fields
  }
  // ... rest of DEFAULTS
};
```

2. Update the `getParametersForRole` function to return the new fields:

```javascript
function getParametersForRole(role, explicitRoot = null) {
  const config = _getConfig(explicitRoot);
  return {
    maxInputTokens: config[role]?.maxInputTokens,
    maxOutputTokens: config[role]?.maxOutputTokens,
    temperature: config[role]?.temperature
    // ... any other parameters
  };
}
```

3. Add a new function to get model capabilities:

```javascript
function getModelCapabilities(providerName, modelId) {
  const models = MODEL_MAP[providerName?.toLowerCase()];
  const model = models?.find(m => m.id === modelId);
  return {
    contextWindowTokens: model?.contextWindowTokens,
    maxOutputTokens: model?.maxOutputTokens
  };
}
```

4. Deprecate or update the role-specific maxTokens getters:

```javascript
// Either remove these or update them to return maxInputTokens
function getMainMaxTokens(explicitRoot = null) {
  console.warn('getMainMaxTokens is deprecated. Use getParametersForRole("main") instead.');
  return getParametersForRole('main', explicitRoot).maxInputTokens;
}
// Same for getResearchMaxTokens and getFallbackMaxTokens
```

5. Export the new functions:

```javascript
module.exports = {
  // ... existing exports
  getParametersForRole,
  getModelCapabilities
};
```
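A quick usage sketch of how a caller might combine the two getters (role and model names are illustrative):

```javascript
const { getParametersForRole, getModelCapabilities } = require('./config-manager');

const roleParams = getParametersForRole('main');
const capabilities = getModelCapabilities('anthropic', 'claude-3-7-sonnet-20250219');

// A sane configuration keeps the configured caps inside the model's hard limits
if (roleParams.maxInputTokens + roleParams.maxOutputTokens > capabilities.contextWindowTokens) {
  console.warn('Configured token budgets exceed the model context window');
}
```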

## 3. Implement token counting utility [pending]

### Dependencies: None

### Description: Create a utility function to count tokens for prompts based on the model being used, primarily using tiktoken for OpenAI and Anthropic models with character-based fallbacks for other providers.

### Details:
1. Install the tiktoken package:

```bash
npm install tiktoken
```

2. Create a new file `scripts/modules/token-counter.js`:

```javascript
const tiktoken = require('tiktoken');

/**
 * Count tokens for a given text and model
 * @param {string} text - The text to count tokens for
 * @param {string} provider - The AI provider (e.g., 'openai', 'anthropic')
 * @param {string} modelId - The model ID
 * @returns {number} - Estimated token count
 */
function countTokens(text, provider, modelId) {
  if (!text) return 0;

  // Convert to lowercase for case-insensitive matching
  const providerLower = provider?.toLowerCase();

  try {
    // OpenAI models
    if (providerLower === 'openai') {
      // encoding_for_model throws for unrecognized model IDs, so fall back
      // to cl100k_base, which most OpenAI chat models use
      let encoding;
      try {
        encoding = tiktoken.encoding_for_model(modelId);
      } catch (e) {
        encoding = tiktoken.get_encoding('cl100k_base');
      }
      const count = encoding.encode(text).length;
      encoding.free(); // tiktoken encodings are WASM-backed and must be freed
      return count;
    }

    // Anthropic models - use cl100k_base as an approximation,
    // or follow Anthropic's guidance
    if (providerLower === 'anthropic') {
      try {
        // Try cl100k_base as a reasonable approximation
        const encoding = tiktoken.get_encoding('cl100k_base');
        const count = encoding.encode(text).length;
        encoding.free();
        return count;
      } catch (e) {
        // Fall back to Anthropic's character-based estimation
        return Math.ceil(text.length / 3.5); // ~3.5 chars per token for English
      }
    }

    // For other providers, use character-based estimation as a fallback;
    // different providers may have different tokenization schemes
    return Math.ceil(text.length / 4); // General fallback estimate
  } catch (error) {
    console.warn(`Token counting error: ${error.message}. Using character-based estimate.`);
    return Math.ceil(text.length / 4); // Fallback if tiktoken fails
  }
}

module.exports = { countTokens };
```

3. Add tests for the token counter in `tests/token-counter.test.js`:

```javascript
const { countTokens } = require('../scripts/modules/token-counter');

describe('Token Counter', () => {
  test('counts tokens for OpenAI models', () => {
    const text = 'Hello, world! This is a test.';
    const count = countTokens(text, 'openai', 'gpt-4');
    expect(count).toBeGreaterThan(0);
    expect(typeof count).toBe('number');
  });

  test('counts tokens for Anthropic models', () => {
    const text = 'Hello, world! This is a test.';
    const count = countTokens(text, 'anthropic', 'claude-3-7-sonnet-20250219');
    expect(count).toBeGreaterThan(0);
    expect(typeof count).toBe('number');
  });

  test('handles empty text', () => {
    expect(countTokens('', 'openai', 'gpt-4')).toBe(0);
    expect(countTokens(null, 'openai', 'gpt-4')).toBe(0);
  });
});
```
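If the performance test from the test strategy shows per-call encoder construction dominating, one possible variant is a module-level cached encoder (a sketch; it trades the explicit free() for a single long-lived encoding, which is acceptable for bounded CLI runs):

```javascript
const { get_encoding } = require('tiktoken');

// One shared encoder, created once per process instead of once per call
const sharedEncoding = get_encoding('cl100k_base');

function fastApproxTokenCount(text) {
  return text ? sharedEncoding.encode(text).length : 0;
}
```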

## 4. Update ai-services-unified.js for dynamic token limits [pending]

### Dependencies: None

### Description: Modify the _unifiedServiceRunner function in ai-services-unified.js to use the new token counting utility and dynamically adjust output token limits based on input length.

### Details:
1. Import the token counter in `ai-services-unified.js`:

```javascript
const { countTokens } = require('./token-counter');
const { getParametersForRole, getModelCapabilities } = require('./config-manager');
```

2. Update the `_unifiedServiceRunner` function to implement dynamic token limit adjustment:

```javascript
async function _unifiedServiceRunner({
  serviceType,
  provider,
  modelId,
  systemPrompt,
  prompt,
  temperature,
  currentRole,
  effectiveProjectRoot
  // ... other parameters
}) {
  // Get role parameters with the new token limits
  const roleParams = getParametersForRole(currentRole, effectiveProjectRoot);

  // Get model capabilities
  const modelCapabilities = getModelCapabilities(provider, modelId);

  // Count tokens in the prompts
  const systemPromptTokens = countTokens(systemPrompt, provider, modelId);
  const userPromptTokens = countTokens(prompt, provider, modelId);
  const totalPromptTokens = systemPromptTokens + userPromptTokens;

  // Validate against input token limits
  if (totalPromptTokens > roleParams.maxInputTokens) {
    throw new Error(
      `Prompt (${totalPromptTokens} tokens) exceeds configured max input tokens (${roleParams.maxInputTokens}) for role '${currentRole}'.`
    );
  }

  // Validate against the model's absolute context window
  if (modelCapabilities.contextWindowTokens && totalPromptTokens > modelCapabilities.contextWindowTokens) {
    throw new Error(
      `Prompt (${totalPromptTokens} tokens) exceeds model's context window (${modelCapabilities.contextWindowTokens}) for ${modelId}.`
    );
  }

  // Calculate available output tokens.
  // If the model has a combined context window, subtract the input tokens.
  let availableOutputTokens = roleParams.maxOutputTokens;

  // If the model has a context window constraint, ensure we don't exceed it
  if (modelCapabilities.contextWindowTokens) {
    const remainingContextTokens = modelCapabilities.contextWindowTokens - totalPromptTokens;
    availableOutputTokens = Math.min(availableOutputTokens, remainingContextTokens);
  }

  // Also respect the model's absolute max output limit
  if (modelCapabilities.maxOutputTokens) {
    availableOutputTokens = Math.min(availableOutputTokens, modelCapabilities.maxOutputTokens);
  }

  // Prepare API call parameters (apiKey, messages, baseUrl, schema, objectName
  // and restApiParams come from the surrounding function and are elided here)
  const callParams = {
    apiKey,
    modelId,
    maxTokens: availableOutputTokens, // Use the dynamically calculated output limit
    temperature: roleParams.temperature,
    messages,
    baseUrl,
    ...(serviceType === 'generateObject' && { schema, objectName }),
    ...restApiParams
  };

  // Log token usage information
  console.debug(`Token usage: ${totalPromptTokens} input tokens, ${availableOutputTokens} max output tokens`);

  // Rest of the function remains the same...
}
```

3. Update the error handling to provide clear messages about token limits:

```javascript
try {
  // Existing code...
} catch (error) {
  if (error.message.includes('tokens')) {
    // Token-related errors should be clearly identified
    console.error(`Token limit error: ${error.message}`);
  }
  throw error;
}
```

## 5. Update .taskmasterconfig schema and user guide [pending]

### Dependencies: None

### Description: Create a migration guide for users to update their .taskmasterconfig files and document the new token limit configuration options.

### Details:
1. Create a migration script or guide for users to update their existing `.taskmasterconfig` files (a script sketch follows the example):

```javascript
// Example migration snippet for .taskmasterconfig
{
  "main": {
    // Before:
    // "maxTokens": 16000,

    // After:
    "maxInputTokens": 16000,
    "maxOutputTokens": 4000,
    "temperature": 0.7
  },
  "research": {
    "maxInputTokens": 16000,
    "maxOutputTokens": 4000,
    "temperature": 0.7
  },
  "fallback": {
    "maxInputTokens": 8000,
    "maxOutputTokens": 2000,
    "temperature": 0.7
  }
}
```
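A hedged sketch of what the automated migration could look like (assumes `.taskmasterconfig` is plain JSON; the one-quarter split for maxOutputTokens is a placeholder consistent with the guidance below):

```javascript
// migrate-config.js - rewrite the old maxTokens into the two new fields
const fs = require('fs');

const path = '.taskmasterconfig';
const config = JSON.parse(fs.readFileSync(path, 'utf8'));

for (const role of ['main', 'research', 'fallback']) {
  const roleConfig = config[role];
  if (roleConfig && roleConfig.maxTokens !== undefined) {
    roleConfig.maxInputTokens = roleConfig.maxTokens; // keep the old budget as the input cap
    roleConfig.maxOutputTokens = Math.floor(roleConfig.maxTokens / 4); // placeholder split
    delete roleConfig.maxTokens;
  }
}

fs.writeFileSync(path, JSON.stringify(config, null, 2));
```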

2. Update the user documentation to explain the new token limit fields:

````markdown
# Token Limit Configuration

Task Master now provides more granular control over token limits with separate settings for input and output tokens:

- `maxInputTokens`: Maximum number of tokens allowed in the input prompt (system prompt + user prompt)
- `maxOutputTokens`: Maximum number of tokens the model should generate in its response

## Benefits

- More precise control over token usage
- Better cost management
- Reduced likelihood of hitting model context limits
- Dynamic adjustment to maximize output space based on input length

## Migration from Previous Versions

If you're upgrading from a previous version, you'll need to update your `.taskmasterconfig` file:

1. Replace the single `maxTokens` field with separate `maxInputTokens` and `maxOutputTokens` fields
2. Recommended starting values:
   - Set `maxInputTokens` to your previous `maxTokens` value
   - Set `maxOutputTokens` to approximately 1/4 of your model's context window

## Example Configuration

```json
{
  "main": {
    "maxInputTokens": 16000,
    "maxOutputTokens": 4000,
    "temperature": 0.7
  }
}
```
````

3. Update the schema validation in `config-manager.js` to validate the new fields:

```javascript
function _validateConfig(config) {
  // ... existing validation

  // Validate token limits for each role
  ['main', 'research', 'fallback'].forEach(role => {
    if (config[role]) {
      // Check if the old maxTokens is present and warn about migration
      if (config[role].maxTokens !== undefined) {
        console.warn(`Warning: 'maxTokens' in ${role} role is deprecated. Please use 'maxInputTokens' and 'maxOutputTokens' instead.`);
      }

      // Validate the new token limit fields
      if (config[role].maxInputTokens !== undefined && (!Number.isInteger(config[role].maxInputTokens) || config[role].maxInputTokens <= 0)) {
        throw new Error(`Invalid maxInputTokens for ${role} role: must be a positive integer`);
      }

      if (config[role].maxOutputTokens !== undefined && (!Number.isInteger(config[role].maxOutputTokens) || config[role].maxOutputTokens <= 0)) {
        throw new Error(`Invalid maxOutputTokens for ${role} role: must be a positive integer`);
      }
    }
  });

  return config;
}
```

## 6. Implement validation and error handling [pending]

### Dependencies: None

### Description: Add comprehensive validation and error handling for token limits throughout the system, including helpful error messages and graceful fallbacks.

### Details:
1. Add validation when loading models in `config-manager.js`:

```javascript
function _validateModelMap(modelMap) {
  // Validate each provider's models
  Object.entries(modelMap).forEach(([provider, models]) => {
    models.forEach(model => {
      // Check for required token limit fields
      if (!model.contextWindowTokens) {
        console.warn(`Warning: Model ${model.id} from ${provider} is missing contextWindowTokens field`);
      }
      if (!model.maxOutputTokens) {
        console.warn(`Warning: Model ${model.id} from ${provider} is missing maxOutputTokens field`);
      }
    });
  });
  return modelMap;
}
```

2. Add validation when setting up a model in the CLI:

```javascript
function validateModelConfig(modelConfig, modelCapabilities) {
  const issues = [];

  // Check if input tokens exceed the model's context window
  if (modelConfig.maxInputTokens > modelCapabilities.contextWindowTokens) {
    issues.push(`maxInputTokens (${modelConfig.maxInputTokens}) exceeds model's context window (${modelCapabilities.contextWindowTokens})`);
  }

  // Check if output tokens exceed the model's maximum
  if (modelConfig.maxOutputTokens > modelCapabilities.maxOutputTokens) {
    issues.push(`maxOutputTokens (${modelConfig.maxOutputTokens}) exceeds model's maximum output tokens (${modelCapabilities.maxOutputTokens})`);
  }

  // Check if the combined tokens exceed the context window
  if (modelConfig.maxInputTokens + modelConfig.maxOutputTokens > modelCapabilities.contextWindowTokens) {
    issues.push(`Combined maxInputTokens and maxOutputTokens (${modelConfig.maxInputTokens + modelConfig.maxOutputTokens}) exceeds model's context window (${modelCapabilities.contextWindowTokens})`);
  }

  return issues;
}
```

3. Add graceful fallbacks in `ai-services-unified.js`:

```javascript
// Fallback for missing token limits
if (!roleParams.maxInputTokens) {
  console.warn(`Warning: maxInputTokens not specified for role '${currentRole}'. Using default value.`);
  roleParams.maxInputTokens = 8000; // Reasonable default
}

if (!roleParams.maxOutputTokens) {
  console.warn(`Warning: maxOutputTokens not specified for role '${currentRole}'. Using default value.`);
  roleParams.maxOutputTokens = 2000; // Reasonable default
}

// Fallback for missing model capabilities
if (!modelCapabilities.contextWindowTokens) {
  console.warn(`Warning: contextWindowTokens not specified for model ${modelId}. Using conservative estimate.`);
  modelCapabilities.contextWindowTokens = roleParams.maxInputTokens + roleParams.maxOutputTokens;
}

if (!modelCapabilities.maxOutputTokens) {
  console.warn(`Warning: maxOutputTokens not specified for model ${modelId}. Using role configuration.`);
  modelCapabilities.maxOutputTokens = roleParams.maxOutputTokens;
}
```

4. Add detailed logging for token usage:

```javascript
// calculateTokenCost is an assumed pricing helper (e.g., derived from the
// cost_per_1m_tokens data in supported-models.json)
function logTokenUsage(provider, modelId, inputTokens, outputTokens, availableOutputTokens, role) {
  const inputCost = calculateTokenCost(provider, modelId, 'input', inputTokens);
  const outputCost = calculateTokenCost(provider, modelId, 'output', outputTokens);

  console.info(`Token usage for ${role} role with ${provider}/${modelId}:`);
  console.info(`- Input: ${inputTokens.toLocaleString()} tokens ($${inputCost.toFixed(6)})`);
  console.info(`- Output: ${outputTokens.toLocaleString()} tokens ($${outputCost.toFixed(6)})`);
  console.info(`- Total cost: $${(inputCost + outputCost).toFixed(6)}`);
  console.info(`- Available output tokens: ${availableOutputTokens.toLocaleString()}`);
}
```

5. Add a helper function to suggest configuration improvements:

```javascript
function suggestTokenConfigImprovements(roleParams, modelCapabilities, promptTokens) {
  const suggestions = [];

  // If the prompt is using less than 50% of the allowed input
  if (promptTokens < roleParams.maxInputTokens * 0.5) {
    suggestions.push(`Consider reducing maxInputTokens from ${roleParams.maxInputTokens} to save on potential costs`);
  }

  // If output tokens are very limited due to a large input
  const availableOutput = Math.min(
    roleParams.maxOutputTokens,
    modelCapabilities.contextWindowTokens - promptTokens
  );

  if (availableOutput < roleParams.maxOutputTokens * 0.5) {
    suggestions.push(`Available output tokens (${availableOutput}) are significantly less than configured maxOutputTokens (${roleParams.maxOutputTokens}) due to large input`);
  }

  return suggestions;
}
```
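A possible call site for the last helper inside _unifiedServiceRunner (a sketch; where and at what log level to surface suggestions is an open choice):

```javascript
const suggestions = suggestTokenConfigImprovements(roleParams, modelCapabilities, totalPromptTokens);
for (const suggestion of suggestions) {
  console.info(`Config suggestion: ${suggestion}`);
}
```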