claude-task-master/context/chats/max-min-tokens.txt.md

We want to refine how Task Master handles AI model token limits to be more precise, by:

1.  Distinguishing between `maxInputTokens` and `maxOutputTokens` in the configuration.
2.  Dynamically adjusting the `maxOutputTokens` for an API call based on the actual prompt length, so the request stays within the model's total context window (or within separate input/output limits if the API and model support them).
3.  Ensuring that `ai-services-unified.js` uses these more granular limits.

This is a good improvement for both cost control and preventing errors.

Here's a plan:

**Phase 1: Configuration and Core Logic Updates**

1.  **Update `.taskmasterconfig` Structure:**
    - I'll modify the `models` section in your `.taskmasterconfig`. For each role (`main`, `research`, `fallback`), `maxTokens` will be replaced with `maxInputTokens` and `maxOutputTokens`.
      - We'll need to decide on reasonable default values for these new fields. We can look at the current `maxTokens` and the model's known limits to make an initial guess.
2.  **Update `MODEL_MAP` in `ai-services-unified.js`:**
    - This array already stores cost data. We need to ensure it also stores the _absolute_ maximum input and output tokens for each model listed (e.g., `model_max_input_tokens`, `model_max_output_tokens`). If these fields are not present, they will need to be added. The values in `.taskmasterconfig` will then represent user-defined operational limits, which should ideally be validated against these absolute maximums.
3.  **Update `config-manager.js`:**
    - Getter functions like `getParametersForRole` will be updated to fetch `maxInputTokens` and `maxOutputTokens` instead of the singular `maxTokens`.
    - New getters might be needed if we want to access the model's absolute limits directly from `MODEL_MAP` via `config-manager.js`.
4.  **Update `ai-services-unified.js` (`_unifiedServiceRunner`):**
    - **Token Counting:** This is a crucial step. Before an API call, we need to estimate the token count of the combined `systemPrompt` and `userPrompt`.
      - The Vercel AI SDK or the individual provider SDKs might offer utilities for this. For example, some SDKs expose a `tokenizer` or a way to count tokens for a given string.
      - If a direct utility isn't available through the Vercel SDK for the specific provider, we might need to use a library like `tiktoken` for OpenAI models and investigate provider-specific tokenization for the rest. Anthropic models, for example, do not use OpenAI's encodings, so `tiktoken` would only give an approximation there. This could be complex as tokenization varies between models.
      - For now, let's assume we can get a reasonable estimate.
    - **Dynamic Output Token Calculation & Validation:**
      - Retrieve `configured_max_input_tokens` and `configured_max_output_tokens` from `config-manager.js` for the current role.
      - Retrieve `model_absolute_max_input_tokens` and `model_absolute_max_output_tokens` from `MODEL_MAP`.
      - Let `prompt_tokens = countTokens(systemPrompt + userPrompt)`.
      - **Input Validation:** If `prompt_tokens > configured_max_input_tokens` OR `prompt_tokens > model_absolute_max_input_tokens`, throw an error. (We might also consider auto-truncation strategies in the future, but an error is safer to start).
      - **Output Calculation:** The `max_tokens` parameter sent to the API (which typically refers to the _max number of tokens to generate_) should be `configured_max_output_tokens`. We also need to ensure this doesn't violate the model's absolute output limit.
      - Some models have a _total_ context window (input + output). If this is the case for a model, the `max_tokens` parameter for the API call might need to be `min(configured_max_output_tokens, model_absolute_total_tokens - prompt_tokens)`. However, many modern APIs handle separate input and output limits, simplifying this. We'll aim to pass `configured_max_output_tokens` as the API's `max_tokens` parameter, assuming the input validation has passed.
    - **API Call Parameter:** Update the `callParams` to use the correctly determined `max_tokens` for generation (which will be our `configured_max_output_tokens`).
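
For a model that only publishes a total context window, the clamp described above could be sketched roughly as follows. The helper name `resolveMaxOutputTokens` and all of its parameter names are illustrative assumptions, not existing Task Master code.

```javascript
// Hypothetical helper: pick the max_tokens value to send to the provider.
// Limits that are not supplied default to Infinity, so they never win.
function resolveMaxOutputTokens({
	promptTokens,
	configuredMaxOutputTokens,
	modelMaxOutputTokens = Infinity,
	contextWindowTokens = Infinity
}) {
	// Room left in a shared (input + output) context window, if there is one.
	const remaining = contextWindowTokens - promptTokens;
	if (remaining <= 0) {
		throw new Error(
			`Prompt (${promptTokens} tokens) leaves no room for output in a ${contextWindowTokens}-token context window.`
		);
	}
	// The tightest of the three limits wins.
	return Math.min(configuredMaxOutputTokens, modelMaxOutputTokens, remaining);
}

// A 198,000-token prompt in a 200,000-token window leaves 2,000 tokens.
console.log(
	resolveMaxOutputTokens({
		promptTokens: 198000,
		configuredMaxOutputTokens: 8192,
		contextWindowTokens: 200000
	})
); // 2000
```

When the API enforces separate input and output limits, `contextWindowTokens` can simply be omitted and the result falls back to the smaller of the configured and model output limits.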

**Phase 2: Error Handling and Refinements**

1.  **Enhanced Error Handling:** Provide clear error messages if prompt tokens exceed input limits, or if API calls fail due to token issues despite our calculations.
2.  **Validation:** Add validation (perhaps in `config-manager.js` or when `task-master models --setup` is run) to ensure `configured_max_input_tokens` and `configured_max_output_tokens` in `.taskmasterconfig` do not exceed the `model_absolute_max_input_tokens` and `model_absolute_max_output_tokens` from `MODEL_MAP`.
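
The validation step could look something like the sketch below. `validateRoleLimits` is a hypothetical name, and the two plain objects stand in for a role's entry in `.taskmasterconfig` and a model's entry in `MODEL_MAP`.

```javascript
// Hypothetical validator: compares user-configured limits with the model's
// absolute limits and returns a list of human-readable problems.
function validateRoleLimits(role, configured, modelLimits) {
	const problems = [];
	const checks = [
		['maxInputTokens', 'input'],
		['maxOutputTokens', 'output']
	];
	for (const [field, label] of checks) {
		const value = configured[field];
		const ceiling = modelLimits[field];
		if (!Number.isInteger(value) || value <= 0) {
			problems.push(
				`${role}.${field} must be a positive integer (got ${value}).`
			);
		} else if (Number.isFinite(ceiling) && value > ceiling) {
			// Only compare when the model actually declares a ceiling.
			problems.push(
				`${role}.${field} (${value}) exceeds the model's ${label} limit (${ceiling}).`
			);
		}
	}
	return problems;
}

// Example values only: the configured output limit is above the model's.
const problems = validateRoleLimits(
	'main',
	{ maxInputTokens: 100000, maxOutputTokens: 64000 },
	{ maxInputTokens: 190000, maxOutputTokens: 8192 }
);
console.log(problems); // one entry, about maxOutputTokens
```

Returning a list rather than throwing on the first problem lets `task-master models --setup` report every issue in one pass.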

Let's start by inspecting `scripts/modules/config-manager.js` to see how parameters and `MODEL_MAP` are currently handled. This will inform the changes needed for `.taskmasterconfig` and `ai-services-unified.js`.

I will now read `scripts/modules/config-manager.js`.

Okay, I've reviewed `scripts/modules/config-manager.js`. Here's what I've gathered and how it impacts the plan:

- **`MODEL_MAP` Source:** `MODEL_MAP` is loaded from `supported-models.json`. This is excellent, as it means we can update the model capabilities (including new token limits) in a structured JSON file.
- **Configuration Loading:** `_loadAndValidateConfig` handles loading `.taskmasterconfig` and merging it with `DEFAULTS`.
- **Getters:** There are specific getters for `maxTokens` for each role (e.g., `getMainMaxTokens`). We will need to update these.
- **`getParametersForRole`:** This function already exists and is a good candidate for returning the new `maxInputTokens` and `maxOutputTokens`.

Here's the refined plan:

**Phase 1: Configuration and Data Structures**

1.  **Update `supported-models.json`:**

    - For each model entry in `supported-models.json`, we need to add `maxInputTokens` and `maxOutputTokens` fields. These should reflect the _absolute maximums_ for that specific model.
    - Example entry change:

      ```json
      // Before
      {
          "id": "claude-3-7-sonnet-20250219",
          "name": "Claude 3.7 Sonnet (Preview)",
          "context_window": 200000, // This might be total context
          "cost_per_1m_tokens": { "input": 3, "output": 15, "currency": "USD" }
      }

      // After (assuming 200k is total, and we make a reasonable split or find official numbers)
      {
          "id": "claude-3-7-sonnet-20250219",
          "name": "Claude 3.7 Sonnet (Preview)",
          "maxInputTokens": 190000, // Example
          "maxOutputTokens": 8192,  // Example, often smaller for generation
          "cost_per_1m_tokens": { "input": 3, "output": 15, "currency": "USD" }
      }
      ```

    - I will need to find the official input/output token limits for the models currently in your `MODEL_MAP`. If you have this information handy, it would speed things up. Otherwise, I can make educated guesses or search for them.

2.  **Update `.taskmasterconfig` (Defaults and User File):**
    - In `scripts/modules/config-manager.js`, modify the `DEFAULTS` object. For each role (`main`, `research`, `fallback`), replace `maxTokens` with:
      - `maxInputTokens`: A sensible default (e.g., a large portion of the model's capability, but user-configurable).
      - `maxOutputTokens`: A sensible default for generation (e.g., 4096 or 8192).
    - You will then need to manually update your existing `.taskmasterconfig` file to reflect this new structure. I can provide the snippet for you to paste.
3.  **Update `config-manager.js`:**
    - Modify `getParametersForRole(role, explicitRoot = null)`:
      - It currently fetches `maxTokens` and `temperature`.
      - Update it to fetch `maxInputTokens`, `maxOutputTokens`, and `temperature` from the loaded config for the given role.
    - Remove the role-specific `getMaxTokens` functions (e.g., `getMainMaxTokens`, `getResearchMaxTokens`, `getFallbackMaxTokens`). The `getParametersForRole` will be the central way to get these.
    - (Optional, for later validation) Consider adding a new function `getModelCapabilities(providerName, modelId)` that reads from `MODEL_MAP` to return the absolute `maxInputTokens` and `maxOutputTokens` for a given model. This would be useful for validating the user's settings in `.taskmasterconfig`.
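
As a rough sketch of the two getters, assuming the config and model map have already been loaded into plain objects (the stand-in data below is illustrative, and the real `getParametersForRole` also takes `explicitRoot`, which is omitted here):

```javascript
// Stand-in data; in Task Master these would come from .taskmasterconfig
// and supported-models.json. All values are examples only.
const config = {
	models: {
		main: {
			provider: 'anthropic',
			modelId: 'claude-3-7-sonnet-20250219',
			maxInputTokens: 100000,
			maxOutputTokens: 8192,
			temperature: 0.2
		}
	}
};
const MODEL_MAP = {
	anthropic: [
		{
			id: 'claude-3-7-sonnet-20250219',
			maxInputTokens: 190000,
			maxOutputTokens: 8192
		}
	]
};

// Returns the operational limits the user configured for a role.
function getParametersForRole(role) {
	const roleConfig = config.models[role];
	if (!roleConfig) throw new Error(`Unknown role '${role}'.`);
	const { maxInputTokens, maxOutputTokens, temperature } = roleConfig;
	return { maxInputTokens, maxOutputTokens, temperature };
}

// Returns the model's absolute limits, or null if the model is not listed.
function getModelCapabilities(providerName, modelId) {
	const models = MODEL_MAP[providerName?.toLowerCase()] ?? [];
	const entry = models.find((m) => m.id === modelId);
	if (!entry) return null;
	return {
		maxInputTokens: entry.maxInputTokens,
		maxOutputTokens: entry.maxOutputTokens
	};
}

console.log(getParametersForRole('main'));
console.log(getModelCapabilities('Anthropic', 'claude-3-7-sonnet-20250219'));
```

Returning `null` for an unknown model leaves it to the caller to decide whether a missing entry is an error or just means "no known ceiling".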

**Phase 2: Core Logic in `ai-services-unified.js`**

1.  **Token Counting (`_unifiedServiceRunner`):**

    - This is the most complex part. We need a reliable way to count tokens for the prompts.
      - **Strategy 1 (Ideal):** Leverage Vercel AI SDK. The SDK might provide a way to get a tokenizer for the active model or a utility function. We'll need to investigate its capabilities.
      - **Strategy 2 (Fallback):** Use a library like `tiktoken` for models that use OpenAI's tokenization. Other providers, Anthropic included, use their own tokenizers, so for them we would need provider-specific token counting or an estimate (less ideal).
      - **Initial Approach:** Let's try to find a Vercel AI SDK utility first. If not, we'll start with `tiktoken` as a common case and acknowledge that other models might need specific handling later.
    - The function `_unifiedServiceRunner` will call this token counting utility:

      ```javascript
      // Placeholder for token counting
      function countTokens(text, modelId /* or providerName */) {
      	// ... implementation using SDK or library ...
      	// This is a simplification; actual tokenization is model-specific.
      	// Until a proper tokenizer is wired in, fall back to a rough
      	// character-based estimate of about 3.5 characters per token.
      	if (!text) return 0;
      	return Math.ceil(text.length / 3.5); // Very rough general estimate
      }

      const promptTokens = countTokens(systemPrompt) + countTokens(prompt);
      ```

2.  **Dynamic Output Token Calculation & Validation (`_unifiedServiceRunner`):**

    - Fetch parameters using the updated `getParametersForRole` from `config-manager.js`:
      ```javascript
      const roleParams = getParametersForRole(
      	currentRole,
      	effectiveProjectRoot
      );
      // roleParams will now contain { maxInputTokens, maxOutputTokens, temperature }
      ```
    - Fetch model's absolute limits from `MODEL_MAP` (we'll need a helper in `config-manager.js` or do it directly in `ai-services-unified.js` for now):
      ```javascript
      // Simplified; ideally use a robust getter from config-manager
      const modelInfo = MODEL_MAP[providerName?.toLowerCase()]?.find(
      	(m) => m.id === modelId
      );
      const modelAbsoluteMaxInput = modelInfo?.maxInputTokens ?? Infinity;
      const modelAbsoluteMaxOutput =
      	modelInfo?.maxOutputTokens ?? roleParams.maxOutputTokens; // Fall back to the configured value if not in the map
      ```
    - **Input Validation:**
      ```javascript
      if (promptTokens > roleParams.maxInputTokens) {
      	throw new Error(
      		`Prompt (${promptTokens} tokens) exceeds configured max input tokens (${roleParams.maxInputTokens}) for role '${currentRole}'.`
      	);
      }
      if (promptTokens > modelAbsoluteMaxInput) {
      	throw new Error(
      		`Prompt (${promptTokens} tokens) exceeds model's absolute max input tokens (${modelAbsoluteMaxInput}) for ${modelId}.`
      	);
      }
      ```
    - **API Call Parameter (`max_tokens` for generation):**
      The `max_tokens` parameter passed to the provider's API call should be `roleParams.maxOutputTokens`. We must also ensure this doesn't exceed the model's absolute output capability.

      ```javascript
      const apiMaxOutputTokens = Math.min(
      	roleParams.maxOutputTokens,
      	modelAbsoluteMaxOutput
      );

      const callParams = {
      	apiKey,
      	modelId,
      	// Use apiMaxOutputTokens for the actual API call
      	maxTokens: apiMaxOutputTokens, // This usually means max tokens to GENERATE
      	temperature: roleParams.temperature,
      	messages,
      	baseUrl,
      	...(serviceType === 'generateObject' && { schema, objectName }),
      	...restApiParams
      };
      ```

3.  **Update `logAiUsage`:**
    - The cost calculation in `logAiUsage` seems correct as it already uses `inputCost` and `outputCost` per 1M tokens. This part should remain compatible.
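
For reference, the per-1M-token arithmetic amounts to something like the sketch below. This is not the actual `logAiUsage` code; `estimateCost` is a hypothetical helper that only illustrates why splitting the limits leaves the cost calculation untouched: it depends on the tokens actually used, not on the configured maximums.

```javascript
// Hypothetical cost helper; rates are USD per 1M tokens, in the same shape
// as the cost_per_1m_tokens entries in supported-models.json.
function estimateCost(inputTokens, outputTokens, { input, output }) {
	return (
		(inputTokens / 1_000_000) * input + (outputTokens / 1_000_000) * output
	);
}

// 12,000 prompt tokens and 2,000 generated tokens at $3 / $15 per 1M tokens:
// 0.036 for input plus 0.030 for output.
const cost = estimateCost(12000, 2000, { input: 3, output: 15 });
console.log(cost.toFixed(3)); // 0.066
```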

**Next Steps:**

1.  **Modify `supported-models.json`:** I need the content of this file to propose the new `maxInputTokens` and `maxOutputTokens` fields. Please provide it.
2.  **Decide on Tokenizer Strategy:** What are your thoughts on the token counting? Are you aware of any utilities within Vercel AI SDK for this, or should we plan to integrate `tiktoken` initially?

Once I have the content of `supported-models.json`, I can draft the changes for it and then proceed to `config-manager.js` and `.taskmasterconfig`.

Okay, I have the contents of `supported-models.json`.

It seems many models have a `max_tokens` field, which I assume is the _total context window_. Some don't have it, and for others (like OpenAI `gpt-4o`), the listed `max_tokens` (16384) is known to be its _output_ token limit when using the Chat Completions API, while the context window is much larger (128k). This highlights the complexity: `max_tokens` in `supported-models.json` is used inconsistently.

**Revised Plan for `supported-models.json` and Token Definitions:**

To bring clarity, we'll introduce two new fields and ensure their meaning is consistent:

- `contextWindowTokens`: The total number of tokens the model can process (input + output). This would replace the current ambiguous `max_tokens`.
- `maxOutputTokens`: The maximum number of tokens the model can _generate_ in a single response. This is often smaller than the total context window, especially for larger models.

If a model _only_ specifies a total context window, we'll have to make a reasonable assumption for `maxOutputTokens` (e.g., 4096 or 8192, or a fraction of the total context window). If it only specifies an output token limit (like some OpenAI models in certain API modes), we'll need to find its total context window.
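
One way to apply this rule while migrating entries is sketched below. `normalizeModelEntry` and `DEFAULT_MAX_OUTPUT` are hypothetical names, and the 4096 fallback is an assumption rather than an official limit for any model.

```javascript
// Hypothetical migration helper for one supported-models.json entry.
// DEFAULT_MAX_OUTPUT is an assumed fallback, not an official limit.
const DEFAULT_MAX_OUTPUT = 4096;

function normalizeModelEntry(entry, overrides = {}) {
	// Drop the ambiguous legacy field and keep everything else.
	const { max_tokens: legacyMaxTokens, ...rest } = entry;
	// Documented numbers passed as overrides take priority; otherwise the
	// legacy value is treated as the total context window.
	const contextWindowTokens =
		overrides.contextWindowTokens ?? legacyMaxTokens ?? null;
	let maxOutputTokens = overrides.maxOutputTokens ?? DEFAULT_MAX_OUTPUT;
	// Generation can never exceed the total window.
	if (contextWindowTokens !== null) {
		maxOutputTokens = Math.min(maxOutputTokens, contextWindowTokens);
	}
	return { ...rest, contextWindowTokens, maxOutputTokens };
}

// Only a total window is known: the output limit falls back to the default.
console.log(normalizeModelEntry({ id: 'example-model', max_tokens: 200000 }));

// gpt-4o: the legacy 16384 was really the output cap, so both values are
// supplied explicitly.
console.log(
	normalizeModelEntry(
		{ id: 'gpt-4o', max_tokens: 16384 },
		{ contextWindowTokens: 128000, maxOutputTokens: 16384 }
	)
);
```

Entries that fall back to the default should still be flagged for manual review, since the assumed value may be well below or above what the model really allows.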

**Updated `supported-models.json` Structure (Example):**

```json
// For a model like Anthropic Claude 3.7 Sonnet (Preview)
{
    "id": "claude-3-7-sonnet-20250219",
    "swe_score": 0.623,
    "cost_per_1m_tokens": { "input": 3.0, "output": 15.0 },
    "allowed_roles": ["main", "fallback"],
    "contextWindowTokens": 200000, // From Anthropic's documentation
    "maxOutputTokens": 8192     // Example value; confirm against Anthropic's documentation. The user sets the operational limit in .taskmasterconfig
}

// For a model like OpenAI GPT-4o
{
    "id": "gpt-4o",
    "swe_score": 0.332,
    "cost_per_1m_tokens": { "input": 2.5, "output": 10.0 },
    "allowed_roles": ["main", "fallback"],
    "contextWindowTokens": 128000, // Official context window
    "maxOutputTokens": 16384    // Output cap for chat completions with gpt-4o (older gpt-4 models allowed 4k/8k).
                                // The API's `max_tokens` parameter limits output only; the 128k window covers input + output.
}
```

I will proceed to generate the updated `supported-models.json` content. This will be a significant change. I will make my best effort to find the correct `contextWindowTokens` and `maxOutputTokens` for each model. If official numbers are ambiguous or not readily available, I'll use sensible defaults and add a comment.

**Regarding Tokenizer Strategy:**

- **Vercel AI SDK:** The `ai` package (Vercel AI SDK) itself is a lightweight wrapper. Tokenization is usually handled by the underlying provider-specific SDKs (e.g., `@anthropic-ai/sdk`, `openai`). The Vercel SDK doesn't provide a universal tokenizer.
- **Provider SDKs:**
  - `@anthropic-ai/sdk`: Does not ship a local tokenizer for current Claude models. Anthropic's API offers a token-counting endpoint for exact counts, at the cost of an extra network request; otherwise a character-based estimate (roughly 3.5 chars/token for English) is a rough guide.
  - `openai` (for Node.js): The SDK does not bundle a tokenizer; the separate `gpt-3-encoder` or `tiktoken` libraries are commonly used. `tiktoken` is more current.
- **`tiktoken`:** This is a good general-purpose library for OpenAI models and supports their various encodings. For models with different tokenizers (Anthropic, Llama-family, Gemini), its counts are only an approximation.
- **Recommendation:**
  1.  Start by integrating `tiktoken`. We can select the encoding based on the provider or even the specific model if needed (e.g., `cl100k_base` for gpt-4 and gpt-3.5-turbo, `o200k_base` for gpt-4o).
  2.  For providers where `tiktoken` is not accurate (e.g., Google's Gemini, Perplexity if they have proprietary tokenizers), we might have to rely on character-based estimations (e.g., `length / N`) as a rougher guide or investigate if their specific SDKs offer token counting.
  3.  The Vercel AI SDK's `experimental_streamObject` and other functions _might_ internally handle some aspects of token counting for request preparation, but we need explicit counting for our logic _before_ the call.

For now, let's assume we'll add a `countTokens` utility that uses `tiktoken` for OpenAI and, as a rough proxy only, for Anthropic (`cl100k_base` is not Anthropic's tokenizer, so its counts are estimates and should leave a safety margin), with a character-based fallback for others.
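
A minimal sketch of that utility, with the tokenizer left pluggable so nothing here depends on a particular library's API. `createTokenCounter` and the `tokenizers` map are hypothetical, and the characters-per-token ratios are rough assumptions for English text.

```javascript
// Character-per-token ratios are rough assumptions for English text.
const CHARS_PER_TOKEN = { anthropic: 3.5, default: 4 };

// `tokenizers` maps a provider name to a function (text) => token count.
// A real tokenizer (for example one built on `tiktoken`) could be plugged
// in here; none is bundled in this sketch.
function createTokenCounter(tokenizers = {}) {
	return function countTokens(text, providerName = 'default') {
		if (!text) return 0;
		const provider = providerName.toLowerCase();
		const exact = tokenizers[provider];
		if (exact) return exact(text);
		// No tokenizer registered: fall back to a character-based estimate.
		const ratio = CHARS_PER_TOKEN[provider] ?? CHARS_PER_TOKEN.default;
		return Math.ceil(text.length / ratio);
	};
}

// Demo with a fake whitespace "tokenizer" standing in for a real one.
const countTokens = createTokenCounter({
	openai: (text) => text.split(/\s+/).filter(Boolean).length
});

console.log(countTokens('Break the task into subtasks.', 'openai')); // 5
console.log(countTokens('Break the task into subtasks.', 'anthropic')); // 9
console.log(countTokens('', 'openai')); // 0
```

Because estimates can undercount, it is probably worth keeping some headroom between the estimated prompt size and `maxInputTokens` rather than treating the estimate as exact.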

I will now generate the proposed changes for `scripts/modules/supported-models.json`. This will be a large diff.

After this, we'll move to:

1.  Updating `.taskmasterconfig` defaults in `config-manager.js` and providing you the snippet for your user file.
2.  Updating `config-manager.js` getters.
3.  Implementing the logic in `ai-services-unified.js`.
