support glm-4.6 thinking

This commit is contained in:
musistudio
2025-11-11 23:32:53 +08:00
parent d98ab64d2d
commit ab03390e2f
4 changed files with 177 additions and 2 deletions

View File

@@ -11,9 +11,10 @@
> GLM CODING PLAN is a subscription service designed for AI coding, starting at just $3/month. It provides access to their flagship GLM-4.6 model across 10+ popular AI coding tools (Claude Code, Cline, Roo Code, etc.), offering developers top-tier, fast, and stable coding experiences.
> [Get 10% OFF GLM CODING PLAN](https://z.ai/subscribe?ic=8JVLJQFSKB)
> A powerful tool to route Claude Code requests to different models and customize any request.
> [GLM-4.6 Supports Reasoning and Interleaved Thinking](blog/en/glm-4.6-supports-reasoning.md)
![](blog/images/claude-code.png)
## ✨ Features
@@ -509,6 +510,7 @@ This setup allows for interesting automations, like running tasks during off-pea
- [Project Motivation and How It Works](blog/en/project-motivation-and-how-it-works.md)
- [Maybe We Can Do More with the Router](blog/en/maybe-we-can-do-more-with-the-route.md)
- [GLM-4.6 Supports Reasoning and Interleaved Thinking](blog/en/glm-4.6-supports-reasoning.md)
## ❤️ Support & Sponsoring

View File

@@ -12,6 +12,8 @@
> A powerful tool to route Claude Code requests to different models and customize any request.
> [GLM-4.6 Supports Reasoning and Chain-of-Thought Feedback](blog/zh/GLM-4.6支持思考及思维链回传.md)
![](blog/images/claude-code.png)

View File

@@ -0,0 +1,88 @@
# GLM-4.6 Supports Reasoning and Interleaved Thinking
## Enabling Reasoning in Claude Code with GLM-4.6
Starting from version 4.5, GLM has supported Claude Code. I've been following its progress closely, and many users have reported that reasoning could not be enabled within Claude Code. Recently, thanks to sponsorship from Zhipu, I decided to investigate this issue in depth. According to the [official documentation](https://docs.z.ai/api-reference/llm/chat-completion), the `/chat/completions` endpoint has reasoning enabled by default, but the model itself decides whether to think:
```
thinking.type enum<string> default:enabled
Whether to enable the chain of thought(When enabled, GLM-4.6, GLM-4.5 and others will automatically determine whether to think, while GLM-4.5V will think compulsorily), default: enabled
Available options: enabled, disabled
```
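For reference, a direct call to the endpoint with this `thinking` parameter might look like the sketch below; the base URL, model name, and environment variable are illustrative assumptions, so check the documentation above for the values that apply to your account:
```javascript
// Minimal sketch of a direct /chat/completions call with the thinking
// parameter; the base URL and model name are assumptions for illustration.
async function askWithThinking(question) {
  const res = await fetch("https://api.z.ai/api/paas/v4/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.ZAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "glm-4.6",
      // "enabled" lets the model decide whether to think; "disabled" turns it off.
      thinking: { type: "enabled" },
      messages: [{ role: "user", content: question }],
    }),
  });
  const data = await res.json();
  // When the model does think, the reasoning typically comes back in a
  // separate reasoning_content field alongside the normal content.
  return data.choices[0].message;
}
```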
Within Claude Code, however, the heavy system prompt interferes with GLM's internal judgment, so the model rarely chooses to think.
We therefore need to explicitly guide the model into believing reasoning is required. Since claude-code-router functions as a proxy, the only feasible approach is modifying prompts or parameters.
Initially, I tried removing Claude Code's system prompt entirely; the model did start reasoning, but that broke Claude Code's workflow.
So instead, I used prompt injection to clearly instruct the model to think step by step.
```javascript
// transformer.ts
import { UnifiedChatRequest } from "../types/llm";
import { Transformer } from "../types/transformer";

// Instruction injected into the conversation to nudge GLM into reasoning.
const REASONING_PROMPT =
  "You are an expert reasoning model.\nAlways think step by step before answering. Even if the problem seems simple, always write down your reasoning process explicitly.\nNever skip your chain of thought.\nUse the following output format:\n<reasoning_content>(Write your full detailed thinking here.)</reasoning_content>\n\nWrite your final conclusion here.";

export class ForceReasoningTransformer implements Transformer {
  name = "forcereasoning";

  async transformRequestIn(
    request: UnifiedChatRequest
  ): Promise<UnifiedChatRequest> {
    // Append the instruction to the system prompt.
    const systemMessage = request.messages.find(
      (item) => item.role === "system"
    );
    if (systemMessage && Array.isArray(systemMessage.content)) {
      systemMessage.content.push({ type: "text", text: REASONING_PROMPT });
    }

    // Repeat it in the latest user message so it sits right next to the query.
    const lastMessage = request.messages[request.messages.length - 1];
    if (lastMessage.role === "user" && Array.isArray(lastMessage.content)) {
      lastMessage.content.push({ type: "text", text: REASONING_PROMPT });
    }

    // After a tool result, add a fresh user message so the model keeps
    // reasoning between tool calls.
    if (lastMessage.role === "tool") {
      request.messages.push({
        role: "user",
        content: [{ type: "text", text: REASONING_PROMPT }],
      });
    }

    return request;
  }
}
```
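Note that the same instruction is injected at three points: appended to the system prompt, appended to the latest user message, and added as a fresh user message whenever the last message is a tool result. That last branch is what nudges the model to keep reasoning between tool calls rather than only on the first turn, which is exactly the interleaved-thinking scenario discussed below.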
Why use `<reasoning_content>` instead of the `<think>` tag? Two reasons:
1. Using the `<think>` tag doesn't reliably trigger reasoning, likely because the model was trained on data where `<think>` had special behavior.
2. If we use `<think>`, the reasoning output is split into a separate field, which leads directly to the chain-of-thought feedback problem discussed below.
## Chain-of-Thought Feedback
Recently, Minimax released `Minimax-m2`, along with [an article](https://www.minimaxi.com/news/why-is-interleaved-thinking-important-for-m2) explaining interleaved thinking.
While the idea isn't entirely new, it's a good opportunity to analyze it.
Why do we need interleaved thinking at all?
Minimax's article points out that the Chat Completion API does not support passing reasoning content between requests.
We know ChatGPT was the first to support reasoning, but OpenAI initially didn't expose the chain of thought to users.
Therefore, the Chat Completion API didn't need to support it; even the dedicated CoT field was first introduced by DeepSeek.
Do we really need explicit CoT fields? What happens if we don't have them? Will it affect reasoning?
Inspecting [sglang's source code](https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/parser/reasoning_parser.py) shows that reasoning content is naturally emitted within the message under specific markers.
If we don't split it out, the next round of conversation will naturally include it.
Thus, the only reason we need interleaved thinking is that we separated the reasoning content from the normal messages.
With fewer than 40 lines of code above, I implemented a simple exploration of enabling reasoning and chain-of-thought feedback for GLM-4.5/4.6.
(It's only this simple because I haven't implemented any parsing logic yet; you could easily modify the transformer to split the reasoning out of the response and merge it back into the next request, improving how Claude Code's frontend displays it. A rough sketch of that idea follows below.)
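As an illustration of that idea, the split on the response path and the merge on the request path could look like the hypothetical helpers below; the function names are mine, and only the `<reasoning_content>` convention comes from the prompt above:
```javascript
// Hypothetical helpers for splitting reasoning out of a response and
// merging it back into the next request; names are illustrative only.
const REASONING_RE = /<reasoning_content>([\s\S]*?)<\/reasoning_content>/;

// Response path: pull the reasoning out so the client can render it as
// "thinking" instead of plain text.
function splitReasoning(text) {
  const match = text.match(REASONING_RE);
  if (!match) return { reasoning: null, content: text };
  return {
    reasoning: match[1].trim(),
    content: text.replace(REASONING_RE, "").trim(),
  };
}

// Request path: wrap the stored reasoning back around the assistant's
// previous answer so the model still sees its own chain of thought.
function mergeReasoning(reasoning, content) {
  if (!reasoning) return content;
  return `<reasoning_content>${reasoning}</reasoning_content>\n\n${content}`;
}
```
With helpers like these, Claude Code's frontend could receive the reasoning as a proper thinking block while the model still sees its own chain of thought inline on the next turn, which is all interleaved thinking really requires.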
If you have better ideas, feel free to reach out; I'd love to discuss further.

View File

@@ -0,0 +1,83 @@
# GLM-4.6 Supports Reasoning and Chain-of-Thought Feedback
## Enabling Reasoning for GLM-4.6 in Claude Code
GLM has supported Claude Code since version 4.5. I have been following it closely, and many users reported that reasoning could not be enabled in Claude Code. Having recently received sponsorship from Zhipu, I set out to investigate.
First, according to the [official documentation](https://docs.bigmodel.cn/api-reference/%E6%A8%A1%E5%9E%8B-api/%E5%AF%B9%E8%AF%9D%E8%A1%A5%E5%85%A8), the `/chat/completions` endpoint has reasoning enabled by default, but the model itself decides whether to think:
```
thinking object
Only GLM-4.5 and later models support this parameter. Controls whether the model enables its chain of thought.
thinking.type enum<string> default:enabled
Whether to enable the chain of thought (for GLM-4.6 and GLM-4.5 the model decides whether to think; GLM-4.5V always thinks), default: enabled.
Available options: enabled, disabled
```
Claude Code's heavy prompt interference severely disrupts GLM's own judgment mechanism, so the model rarely thinks. We therefore need to guide the model into believing it should think. But since `claude-code-router` acts as a proxy, all it can do is modify prompts and parameters.
At first I tried simply deleting Claude Code's system prompt; the model did think, but Claude Code could no longer function. So we need prompt injection that explicitly tells the model to think.
```javascript
// transformer.ts
import { UnifiedChatRequest } from "../types/llm";
import { Transformer } from "../types/transformer";

// Instruction injected into the conversation to nudge GLM into reasoning.
const REASONING_PROMPT =
  "You are an expert reasoning model.\nAlways think step by step before answering. Even if the problem seems simple, always write down your reasoning process explicitly.\nNever skip your chain of thought.\nUse the following output format:\n<reasoning_content>(Write your full detailed thinking here.)</reasoning_content>\n\nWrite your final conclusion here.";

export class ForceReasoningTransformer implements Transformer {
  name = "forcereasoning";

  async transformRequestIn(
    request: UnifiedChatRequest
  ): Promise<UnifiedChatRequest> {
    // Append the instruction to the system prompt.
    const systemMessage = request.messages.find(
      (item) => item.role === "system"
    );
    if (systemMessage && Array.isArray(systemMessage.content)) {
      systemMessage.content.push({ type: "text", text: REASONING_PROMPT });
    }

    // Repeat it in the latest user message so it sits right next to the query.
    const lastMessage = request.messages[request.messages.length - 1];
    if (lastMessage.role === "user" && Array.isArray(lastMessage.content)) {
      lastMessage.content.push({ type: "text", text: REASONING_PROMPT });
    }

    // After a tool result, add a fresh user message so the model keeps
    // reasoning between tool calls.
    if (lastMessage.role === "tool") {
      request.messages.push({
        role: "user",
        content: [{ type: "text", text: REASONING_PROMPT }],
      });
    }

    return request;
  }
}
```
As for why the model is asked to put its thinking inside a `reasoning_content` tag rather than a `think` tag, there are two reasons:
1. Using the `think` tag directly does not activate reasoning well, presumably because the `think` tag was treated specially in the model's training data.
2. If the `think` tag is used, the model's reasoning is split into a separate field, which leads to the chain-of-thought feedback problem discussed next.
## Chain-of-Thought Feedback
Recently, Minimax released Minimax-m2 and, alongside it, published [an article](https://www.minimaxi.com/news/why-is-interleaved-thinking-important-for-m2) introducing interleaved thinking. There is nothing new under the sun, though, so this is a good chance to dig into it.
1. First, why do we need to pass the chain of thought back at all?
Minimax's article says that the Chat Completion API does not support passing reasoning content in follow-up requests. We know ChatGPT was the first to support reasoning, but OpenAI initially did not expose the chain of thought to users, so the Chat Completion API had no need to support anything related to it. Even the CoT field was first added to the Chat Completion API by DeepSeek.
2. Do we really need these fields?
What happens without them? Does it affect the model's thinking? Looking at [sglang's source code](https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/parser/reasoning_parser.py), we can see that chain-of-thought information is originally emitted within the message under specific markers; if we do not split it out, the next round of conversation naturally contains it. So the only reason chain-of-thought feedback is needed is that we split the reasoning content out of the model's messages.
With the fewer than 40 lines of code above, I completed a simple exploration of enabling reasoning and chain-of-thought feedback for GLM-4.5/4.6. (It is only this simple because I have not had time to implement the splitting; you could split the reasoning in the transformer when handling the response and merge it back when sending the request, which would suit Claude Code's frontend display better.) If you have any better ideas, feel free to contact me.