Reasoning Parser#

SGLang can separate reasoning content from the "normal" content produced by reasoning models such as DeepSeek R1.

Supported Models & Parsers#

Model	Reasoning tags	Parser	Notes
DeepSeek‑R1 series	`<think>` … `</think>`	`deepseek-r1`	Supports all variants (R1, R1-0528, R1-Distill)
DeepSeek‑V3 series	`<think>` … `</think>`	`deepseek-v3`	Includes DeepSeek‑V3.2. Supports the `thinking` parameter
Standard Qwen3 models	`<think>` … `</think>`	`qwen3`	Supports the `enable_thinking` parameter
Qwen3-Thinking models	`<think>` … `</think>`	`qwen3` or `qwen3-thinking`	Always generates thinking content
Kimi models	`◁think▷` … `◁/think▷`	`kimi`	Uses special thinking delimiters
GPT OSS	`<|channel|>analysis<|message|>` … `<|end|>`	`gpt-oss`	N/A

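Model-Specific Behaviors#

DeepSeek-R1 series:

DeepSeek-R1: emits no <think> start tag; the output jumps straight into the thinking content
DeepSeek-R1-0528: generates both the <think> start tag and the </think> end tag
Both are handled by the same deepseek-r1 parser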
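
The two DeepSeek-R1 output shapes described above can be handled by one splitting routine. The following is a minimal sketch, not SGLang's actual parser: the function name `split_r1_output` is hypothetical, and it assumes the complete output is available as a single string.

```python
THINK_START = "<think>"
THINK_END = "</think>"


def split_r1_output(text: str) -> tuple[str, str]:
    """Split raw model output into (reasoning, answer).

    Hypothetical helper for illustration; not part of SGLang.
    """
    stripped = text.lstrip()
    # R1-0528 style output begins with a start tag; original R1 output does not.
    if stripped.startswith(THINK_START):
        stripped = stripped[len(THINK_START):]
    # Everything before the end tag is reasoning; the rest is the answer.
    reasoning, sep, answer = stripped.partition(THINK_END)
    if not sep:
        # No end tag found: treat the whole output as unfinished reasoning.
        return stripped.strip(), ""
    return reasoning.strip(), answer.strip()


# Both shapes yield the same (reasoning, answer) pair.
print(split_r1_output("1 + 3 is 4.</think>The answer is 4."))
print(split_r1_output("<think>1 + 3 is 4.</think>The answer is 4."))
```

Treating a missing start tag as "already reasoning" is what lets a single routine cover both variants.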

DeepSeek-V3 series:

DeepSeek-V3.1/V3.2: hybrid models that support both thinking and non-thinking modes; use the deepseek-v3 parser together with the thinking parameter (note: not enable_thinking)

Qwen3 series:

Standard Qwen3 (e.g., Qwen3-2507): uses the qwen3 parser and supports enable_thinking in the chat template
Qwen3-Thinking (e.g., Qwen3-235B-A22B-Thinking-2507): uses the qwen3 or qwen3-thinking parser and always thinks
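
Because the DeepSeek-V3 and Qwen3 families name their thinking switch differently, it can help to build the request options in one place. The sketch below assumes that chat-template arguments are forwarded through a `chat_template_kwargs` field in `extra_body`; the helper name `thinking_extra_body` is hypothetical.

```python
# Chat-template switch per parser, as described above.
THINKING_SWITCH = {
    "deepseek-v3": "thinking",
    "qwen3": "enable_thinking",
}


def thinking_extra_body(parser: str, think: bool) -> dict:
    """Build an extra_body dict that toggles thinking for the given parser.

    Assumption: the server reads template arguments from
    "chat_template_kwargs"; adjust if your deployment differs.
    """
    if parser not in THINKING_SWITCH:
        raise ValueError(f"no thinking switch known for parser {parser!r}")
    return {
        "chat_template_kwargs": {THINKING_SWITCH[parser]: think},
        "separate_reasoning": True,
    }


print(thinking_extra_body("deepseek-v3", True))
print(thinking_extra_body("qwen3", False))
```

The returned dict would be passed as `extra_body=...` to `client.chat.completions.create`.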

Kimi:

Kimi: uses the special ◁think▷ and ◁/think▷ tags

GPT OSS:

GPT OSS: uses the special <|channel|>analysis<|message|> and <|end|> tags
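
Since each parser differs mainly in its delimiter pair, the tags listed above can be kept in a lookup table. This is an illustrative sketch only: `REASONING_TAGS` and `extract_reasoning` are hypothetical names, and the function removes just the reasoning span, leaving any other control tokens in the remaining text.

```python
# Delimiter pairs taken from the table of supported parsers.
REASONING_TAGS = {
    "deepseek-r1": ("<think>", "</think>"),
    "deepseek-v3": ("<think>", "</think>"),
    "qwen3": ("<think>", "</think>"),
    "kimi": ("◁think▷", "◁/think▷"),
    "gpt-oss": ("<|channel|>analysis<|message|>", "<|end|>"),
}


def extract_reasoning(text: str, parser: str) -> tuple[str, str]:
    """Return (reasoning, remaining_text) using the parser's delimiters."""
    start, end = REASONING_TAGS[parser]
    begin = text.find(start)
    if begin == -1:
        # No start tag: nothing is treated as reasoning in this sketch.
        return "", text.strip()
    body_start = begin + len(start)
    finish = text.find(end, body_start)
    if finish == -1:
        # Start tag without an end tag: the rest is unfinished reasoning.
        return text[body_start:].strip(), text[:begin].strip()
    reasoning = text[body_start:finish]
    remaining = text[:begin] + text[finish + len(end):]
    return reasoning.strip(), remaining.strip()


print(extract_reasoning("◁think▷Add the numbers.◁/think▷4", "kimi"))
```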

Usage#

Launching the Server#

Specify the --reasoning-parser option.

[ ]:

import requests
from openai import OpenAI
from sglang.test.doc_patch import launch_server_cmd
from sglang.utils import wait_for_server, print_highlight, terminate_process

server_process, port = launch_server_cmd(
    "python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-R1-Distill-Qwen-7B --host 0.0.0.0 --reasoning-parser deepseek-r1 --log-level warning"
)

wait_for_server(f"http://localhost:{port}")

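Note that --reasoning-parser selects the parser used to interpret responses.

OpenAI-Compatible API#

The OpenAI-compatible API follows the DeepSeek API design established with the release of DeepSeek-R1:

reasoning_content: the content of the chain of thought (CoT).
content: the content of the final answer.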

[ ]:

# Initialize an OpenAI-style client
client = OpenAI(api_key="None", base_url=f"http://localhost:{port}/v1")
model_name = client.models.list().data[0].id

messages = [
    {
        "role": "user",
        "content": "What is 1+3?",
    }
]

Non-Streaming Request#

[ ]:

response_non_stream = client.chat.completions.create(
    model=model_name,
    messages=messages,
    temperature=0.6,
    top_p=0.95,
    stream=False,  # Non-streaming
    extra_body={"separate_reasoning": True},
)
print_highlight("==== Reasoning ====")
print_highlight(response_non_stream.choices[0].message.reasoning_content)

print_highlight("==== Text ====")
print_highlight(response_non_stream.choices[0].message.content)

Streaming Request#

[ ]:

response_stream = client.chat.completions.create(
    model=model_name,
    messages=messages,
    temperature=0.6,
    top_p=0.95,
    stream=True,  # Streaming
    extra_body={"separate_reasoning": True},
)

reasoning_content = ""
content = ""
for chunk in response_stream:
    if chunk.choices[0].delta.content:
        content += chunk.choices[0].delta.content
    if chunk.choices[0].delta.reasoning_content:
        reasoning_content += chunk.choices[0].delta.reasoning_content

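print_highlight("==== Reasoning ====")
print_highlight(reasoning_content)

print_highlight("==== Text ====")
print_highlight(content)

Optionally, you can buffer the reasoning content and receive it all at once, either in the last reasoning chunk or in the first chunk after the reasoning content.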

[ ]:

response_stream = client.chat.completions.create(
    model=model_name,
    messages=messages,
    temperature=0.6,
    top_p=0.95,
    stream=True,  # Streaming
    extra_body={"separate_reasoning": True, "stream_reasoning": False},
)

reasoning_content = ""
content = ""
for chunk in response_stream:
    if chunk.choices[0].delta.content:
        content += chunk.choices[0].delta.content
    if chunk.choices[0].delta.reasoning_content:
        reasoning_content += chunk.choices[0].delta.reasoning_content

print_highlight("==== Reasoning ====")
print_highlight(reasoning_content)

print_highlight("==== Text ====")
print_highlight(content)
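
To make the effect of buffered reasoning concrete, the sketch below reproduces it on the client side over a list of already-received deltas. It only illustrates the observable behavior (reasoning held back and released with the first content chunk); it is not how the server implements the option, and `buffer_reasoning` is a hypothetical helper.

```python
from typing import Iterable, Iterator, Tuple

# One streamed delta, modeled as (reasoning_delta, content_delta).
Delta = Tuple[str, str]


def buffer_reasoning(deltas: Iterable[Delta]) -> Iterator[Delta]:
    """Hold back reasoning deltas and emit them as a single piece."""
    pending = ""
    for reasoning, content in deltas:
        pending += reasoning
        if content:
            # First content chunk after the reasoning: flush the buffer with it.
            yield pending, content
            pending = ""
    if pending:
        # The stream ended while the model was still reasoning.
        yield pending, ""


stream = [("1 + 3 ", ""), ("is 4.", ""), ("", "The answer "), ("", "is 4.")]
for reasoning, content in buffer_reasoning(stream):
    print(repr(reasoning), repr(content))
```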

Reasoning separation is enabled by default when a reasoning parser is specified. To disable it, set the separate_reasoning option to False in the request.

[ ]:

response_non_stream = client.chat.completions.create(
    model=model_name,
    messages=messages,
    temperature=0.6,
    top_p=0.95,
    stream=False,  # Non-streaming
    extra_body={"separate_reasoning": False},
)

print_highlight("==== Original Output ====")
print_highlight(response_non_stream.choices[0].message.content)

SGLang Native API#

[ ]:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
# Render the chat messages into a single prompt string
prompt_text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, return_dict=False
)

gen_url = f"http://localhost:{port}/generate"
gen_data = {
    "text": prompt_text,
    "sampling_params": {
        "skip_special_tokens": False,
        "max_new_tokens": 1024,
        "temperature": 0.6,
        "top_p": 0.95,
    },
}
gen_response = requests.post(gen_url, json=gen_data).json()["text"]

print_highlight("==== Original Output ====")
print_highlight(gen_response)

parse_url = f"http://localhost:{port}/separate_reasoning"
separate_reasoning_data = {
    "text": gen_response,
    "reasoning_parser": "deepseek-r1",
}
separate_reasoning_response_json = requests.post(
    parse_url, json=separate_reasoning_data
).json()
print_highlight("==== Reasoning ====")
print_highlight(separate_reasoning_response_json["reasoning_text"])
print_highlight("==== Text ====")
print_highlight(separate_reasoning_response_json["text"])

[ ]:

terminate_process(server_process)

Offline Engine API#

[ ]:

import sglang as sgl
from transformers import AutoTokenizer
from sglang.srt.parser.reasoning_parser import ReasoningParser
from sglang.utils import print_highlight

llm = sgl.Engine(model_path="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
# Render the chat messages (defined earlier) into a single prompt string
prompt_text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, return_dict=False
)
sampling_params = {
    "max_new_tokens": 1024,
    "skip_special_tokens": False,
    "temperature": 0.6,
    "top_p": 0.95,
}
result = llm.generate(prompt=prompt_text, sampling_params=sampling_params)

generated_text = result["text"]  # Assume there is only one prompt

print_highlight("==== Original Output ====")
print_highlight(generated_text)

# Split the raw output into reasoning and final answer
parser = ReasoningParser("deepseek-r1")
reasoning_text, text = parser.parse_non_stream(generated_text)
print_highlight("==== Reasoning ====")
print_highlight(reasoning_text)
print_highlight("==== Text ====")
print_highlight(text)

[ ]:

llm.shutdown()

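Supporting New Reasoning Model Schemas#

For future reasoning models, you can implement the reasoning parser as a subclass of BaseReasoningFormatDetector in python/sglang/srt/parser/reasoning_parser.py and register it for the new reasoning model schema accordingly.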
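
The subclass-and-register pattern might look roughly like the sketch below. To stay self-contained it uses a stand-in base class, `DetectorSketch`, rather than the real BaseReasoningFormatDetector, whose constructor and methods may differ; the `<reason>` tags, the `my-model` parser name, and every class and function name here are hypothetical.

```python
from typing import Dict, Tuple, Type


class DetectorSketch:
    """Hypothetical stand-in for a base detector class (not the SGLang class)."""

    def __init__(self, start_tag: str, end_tag: str, force_reasoning: bool = False):
        self.start_tag = start_tag
        self.end_tag = end_tag
        # force_reasoning: treat output as reasoning even without a start tag.
        self.force_reasoning = force_reasoning

    def parse(self, text: str) -> Tuple[str, str]:
        """Return (reasoning_text, normal_text) for a complete output."""
        has_start = self.start_tag in text
        if not (has_start or self.force_reasoning):
            return "", text.strip()
        # Remove the first start tag, if any, then split on the end tag.
        body = text.replace(self.start_tag, "", 1)
        reasoning, _, normal = body.partition(self.end_tag)
        return reasoning.strip(), normal.strip()


class MyModelDetector(DetectorSketch):
    """Hypothetical model that wraps reasoning in <reason> ... </reason>."""

    def __init__(self) -> None:
        super().__init__("<reason>", "</reason>", force_reasoning=False)


# Registry mapping a parser name to its detector class.
DETECTORS: Dict[str, Type[DetectorSketch]] = {"my-model": MyModelDetector}


def parse_with(parser_name: str, text: str) -> Tuple[str, str]:
    """Look up the detector for parser_name and parse the text with it."""
    return DETECTORS[parser_name]().parse(text)


print(parse_with("my-model", "<reason>1 + 3 = 4</reason>The answer is 4."))
```

Keeping the tag pair in the subclass constructor means a new model family usually needs only a few lines plus one registry entry.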
