# Tool Parser

This guide demonstrates how to use SGLang's [function calling](https://platform.openai.com/docs/guides/function-calling) functionality.

## Currently Supported Parsers

| Parser | Supported Models | Notes |
|---|---|---|
| deepseekv3 | DeepSeek-v3 (e.g. deepseek-ai/DeepSeek-V3-0324) | Recommend adding `--chat-template ./examples/chat_template/tool_chat_template_deepseekv3.jinja` to the launch command. |
| deepseekv31 | DeepSeek-V3.1 and DeepSeek-V3.2 (e.g. deepseek-ai/DeepSeek-V3.1, deepseek-ai/DeepSeek-V3.2-Exp) | Recommend adding `--chat-template ./examples/chat_template/tool_chat_template_deepseekv31.jinja` to the launch command (use `..deepseekv32.jinja` for DeepSeek-V3.2). |
| glm | GLM series (e.g. zai-org/GLM-4.6) | |
| gpt-oss | GPT-OSS (e.g. openai/gpt-oss-120b, openai/gpt-oss-20b, lmsys/gpt-oss-120b-bf16, lmsys/gpt-oss-20b-bf16) | The gpt-oss tool parser filters out analysis-channel events and keeps only normal text. This can leave the content empty when the explanation lives in the analysis channel. The workaround is to complete the tool-call cycle by returning the tool result as a `role="tool"` message so the model can generate final content. |
| kimi_k2 | moonshotai/Kimi-K2-Instruct | |
| llama3 | Llama 3.1 / 3.2 / 3.3 (e.g. meta-llama/Llama-3.1-8B-Instruct, meta-llama/Llama-3.2-1B-Instruct, meta-llama/Llama-3.3-70B-Instruct) | |
| llama4 | Llama 4 (e.g. meta-llama/Llama-4-Scout-17B-16E-Instruct) | |
| mistral | Mistral (e.g. mistralai/Mistral-7B-Instruct-v0.3, mistralai/Mistral-Nemo-Instruct-2407, mistralai/Mistral-7B-v0.3) | |
| pythonic | Llama-3.2 / Llama-3.3 / Llama-4 | The model outputs function calls as Python code. Requires `--tool-call-parser pythonic`; a specific chat template is recommended. |
| qwen | Qwen series (e.g. Qwen/Qwen3-Next-80B-A3B-Instruct, Qwen/Qwen3-VL-30B-A3B-Thinking), excluding Qwen3-Coder | |
| qwen3_coder | Qwen3-Coder (e.g. Qwen/Qwen3-Coder-30B-A3B-Instruct) | |
| step3 | Step-3 | |

## OpenAI Compatible API

### Launch a Server

```python
import json
from sglang.test.doc_patch import launch_server_cmd
from sglang.utils import wait_for_server, print_highlight, terminate_process
from openai import OpenAI

server_process, port = launch_server_cmd(
    "python3 -m sglang.launch_server --model-path Qwen/Qwen2.5-7B-Instruct --tool-call-parser qwen25 --host 0.0.0.0 --log-level warning"  # qwen25
)
wait_for_server(f"http://localhost:{port}")
```

Note that `--tool-call-parser` defines the parser used to interpret responses.

### Define Tools for Function Call

Below is a Python snippet that shows how to define a tool as a dictionary. The dictionary includes the tool name, a description, and the parameters it accepts.

```python
# Define tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city to find the weather for, e.g. 'San Francisco'",
                    },
                    "state": {
                        "type": "string",
                        "description": "the two-letter abbreviation for the state that the city is"
                        " in, e.g. 'CA' which would mean 'California'",
                    },
                    "unit": {
                        "type": "string",
                        "description": "The unit to fetch the temperature in",
                        "enum": ["celsius", "fahrenheit"],
                    },
                },
                "required": ["city", "state", "unit"],
            },
        },
    }
]
```

### Define Messages

```python
def get_messages():
    return [
        {
            "role": "user",
            "content": "What's the weather like in Boston today? Output a reasoning before act, then use the tools to help you.",
        }
    ]


messages = get_messages()
```

### Initialize the Client

```python
# Initialize an OpenAI-style client
client = OpenAI(api_key="None", base_url=f"http://0.0.0.0:{port}/v1")
model_name = client.models.list().data[0].id
```

### Non-Streaming Request

```python
# Non-streaming mode test
response_non_stream = client.chat.completions.create(
    model=model_name,
    messages=messages,
    temperature=0,
    top_p=0.95,
    max_tokens=1024,
    stream=False,  # Non-streaming
    tools=tools,
)
print_highlight("Non-stream response:")
print_highlight(response_non_stream)
print_highlight("==== content ====")
print_highlight(response_non_stream.choices[0].message.content)
print_highlight("==== tool_calls ====")
print_highlight(response_non_stream.choices[0].message.tool_calls)
```

#### Handle Tools

When the engine determines it should call a particular tool, it returns the arguments (or partial arguments) in the response. You can parse these arguments and invoke the tool accordingly later.

```python
name_non_stream = response_non_stream.choices[0].message.tool_calls[0].function.name
arguments_non_stream = (
    response_non_stream.choices[0].message.tool_calls[0].function.arguments
)

print_highlight(f"Final non-stream function call name: {name_non_stream}")
print_highlight(f"Final non-stream function call arguments: {arguments_non_stream}")
```

### Streaming Request

```python
# Streaming mode test
print_highlight("Streaming response:")
response_stream = client.chat.completions.create(
    model=model_name,
    messages=messages,
    temperature=0,
    top_p=0.95,
    max_tokens=1024,
    stream=True,  # Enable streaming
    tools=tools,
)

texts = ""
tool_calls = []
name = ""
arguments = ""
for chunk in response_stream:
    if chunk.choices[0].delta.content:
        texts += chunk.choices[0].delta.content
    if chunk.choices[0].delta.tool_calls:
        tool_calls.append(chunk.choices[0].delta.tool_calls[0])
print_highlight("==== Text ====")
print_highlight(texts)

print_highlight("==== Tool Call ====")
for tool_call in tool_calls:
    print_highlight(tool_call)
```

#### Handle Tools

When the engine determines it should call a particular tool, it returns the arguments (or partial arguments) in the response. You can parse these arguments and invoke the tool accordingly later.

```python
# Parse and combine function call arguments
arguments = []
for tool_call in tool_calls:
    if tool_call.function.name:
        print_highlight(f"Streamed function call name: {tool_call.function.name}")

    if tool_call.function.arguments:
        arguments.append(tool_call.function.arguments)

# Combine all fragments into a single JSON string
full_arguments = "".join(arguments)
print_highlight(f"Streamed function call arguments: {full_arguments}")
```

### Define a Tool Function

```python
# This is a demo; define real functions according to your use case.
def get_current_weather(city: str, state: str, unit: str):
    return (
        f"The weather in {city}, {state} is 85 degrees {unit}. It is "
        "partly cloudy, with highs in the 90's."
    )


available_tools = {"get_current_weather": get_current_weather}
```

### Execute the Tool

```python
messages.append(response_non_stream.choices[0].message)

# Call the corresponding tool function
tool_call = messages[-1].tool_calls[0]
tool_name = tool_call.function.name
tool_to_call = available_tools[tool_name]
result = tool_to_call(**(json.loads(tool_call.function.arguments)))
print_highlight(f"Function call result: {result}")
# messages.append({"role": "tool", "content": result, "name": tool_name})
messages.append(
    {
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": str(result),
        "name": tool_name,
    }
)

print_highlight(f"Updated message history: {messages}")
```

### Send Results Back to the Model

```python
final_response = client.chat.completions.create(
    model=model_name,
    messages=messages,
    temperature=0,
    top_p=0.95,
    stream=False,
    tools=tools,
)
print_highlight("Non-stream response:")
print_highlight(final_response)

print_highlight("==== Text ====")
print_highlight(final_response.choices[0].message.content)
```

## Native API and SGLang Runtime (SRT)

```python
from transformers import AutoTokenizer
import requests

# Generate a response
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

messages = get_messages()

input = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, tools=tools, return_dict=False
)

gen_url = f"http://localhost:{port}/generate"
gen_data = {
    "text": input,
    "sampling_params": {
        "skip_special_tokens": False,
        "max_new_tokens": 1024,
        "temperature": 0,
        "top_p": 0.95,
    },
}
gen_response = requests.post(gen_url, json=gen_data).json()["text"]
print_highlight("==== Response ====")
print_highlight(gen_response)

# Parse the response
parse_url = f"http://localhost:{port}/parse_function_call"

function_call_input = {
    "text": gen_response,
    "tool_call_parser": "qwen25",
    "tools": tools,
}

function_call_response = requests.post(parse_url, json=function_call_input)
function_call_response_json = function_call_response.json()

print_highlight("==== Text ====")
print(function_call_response_json["normal_text"])
print_highlight("==== Calls ====")
print("function name: ", function_call_response_json["calls"][0]["name"])
print("function arguments: ", function_call_response_json["calls"][0]["parameters"])
```

```python
terminate_process(server_process)
```

## Offline Engine API

```python
import sglang as sgl
from sglang.srt.function_call.function_call_parser import FunctionCallParser
from sglang.srt.managers.io_struct import Tool, Function

llm = sgl.Engine(model_path="Qwen/Qwen2.5-7B-Instruct")
tokenizer = llm.tokenizer_manager.tokenizer
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, tools=tools, return_dict=False
)

# Note: for the gpt-oss tool parser, add "no_stop_trim": True
# so that the tool-call token <call> is not trimmed.

sampling_params = {
    "max_new_tokens": 1024,
    "temperature": 0,
    "top_p": 0.95,
    "skip_special_tokens": False,
}

# 1) Offline generation
result = llm.generate(input_ids=input_ids, sampling_params=sampling_params)
generated_text = result["text"]  # Assume there is only one prompt

print_highlight("=== Offline Engine Output Text ===")
print_highlight(generated_text)


# 2) Parse with FunctionCallParser
def convert_dict_to_tool(tool_dict: dict) -> Tool:
    function_dict = tool_dict.get("function", {})
    return Tool(
        type=tool_dict.get("type", "function"),
        function=Function(
            name=function_dict.get("name"),
            description=function_dict.get("description"),
            parameters=function_dict.get("parameters"),
        ),
    )


tools = [convert_dict_to_tool(raw_tool) for raw_tool in tools]

parser = FunctionCallParser(tools=tools, tool_call_parser="qwen25")
normal_text, calls = parser.parse_non_stream(generated_text)

print_highlight("=== Parsing Result ===")
print("Normal text portion:", normal_text)
print_highlight("Function call portion:")
for call in calls:
    # call: ToolCallItem
    print_highlight(f"  - tool name: {call.name}")
    print_highlight(f"    parameters: {call.parameters}")

# 3) If needed, apply additional logic to the parsed functions, e.g. automatically
# call the corresponding function to obtain its return value.
```

```python
llm.shutdown()
```

## Tool Choice Mode

SGLang supports OpenAI's `tool_choice` parameter to control when and which tool the model calls. This feature is implemented using EBNF (Extended Backus-Naur Form) grammar to ensure reliable tool-calling behavior.

### Supported Tool Choice Options

- `tool_choice="required"`: Forces the model to call at least one tool
- `tool_choice={"type": "function", "function": {"name": "specific_function"}}`: Forces the model to call a specific function

### Backend Compatibility

Tool choice is fully supported by the Xgrammar backend, which is the default grammar backend (`--grammar-backend xgrammar`). However, it may not be fully supported by other backends such as outlines.

### Example: Required Tool Choice

```python
from openai import OpenAI
from sglang.utils import wait_for_server, print_highlight, terminate_process
from sglang.test.doc_patch import launch_server_cmd

# Launch a new server session for the tool choice examples
server_process_tool_choice, port_tool_choice = launch_server_cmd(
    "python3 -m sglang.launch_server --model-path Qwen/Qwen2.5-7B-Instruct --tool-call-parser qwen25 --host 0.0.0.0 --log-level warning"
)
wait_for_server(f"http://localhost:{port_tool_choice}")

# Initialize a client for the tool choice examples
client_tool_choice = OpenAI(
    api_key="None", base_url=f"http://0.0.0.0:{port_tool_choice}/v1"
)
model_name_tool_choice = client_tool_choice.models.list().data[0].id

# Example with tool_choice="required" - forces the model to call a tool
messages_required = [
    {"role": "user", "content": "Hello, what is the capital of France?"}
]

# Define tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city to find the weather for, e.g. 'San Francisco'",
                    },
                    "unit": {
                        "type": "string",
                        "description": "The unit to fetch the temperature in",
                        "enum": ["celsius", "fahrenheit"],
                    },
                },
                "required": ["city", "unit"],
            },
        },
    }
]

response_required = client_tool_choice.chat.completions.create(
    model=model_name_tool_choice,
    messages=messages_required,
    temperature=0,
    max_tokens=1024,
    tools=tools,
    tool_choice="required",  # Force the model to call a tool
)

print_highlight("Response with tool_choice='required':")
print("Content:", response_required.choices[0].message.content)
print("Tool calls:", response_required.choices[0].message.tool_calls)
```

### Example: Specific Function Choice

```python
# Example with a specific function choice - forces the model to call a specific function
messages_specific = [
    {"role": "user", "content": "What are the most attractive places in France?"}
]

response_specific = client_tool_choice.chat.completions.create(
    model=model_name_tool_choice,
    messages=messages_specific,
    temperature=0,
    max_tokens=1024,
    tools=tools,
    tool_choice={
        "type": "function",
        "function": {"name": "get_current_weather"},
    },  # Force the model to call the specific get_current_weather function
)

print_highlight("Response with specific function choice:")
print("Content:", response_specific.choices[0].message.content)
print("Tool calls:", response_specific.choices[0].message.tool_calls)

if response_specific.choices[0].message.tool_calls:
    tool_call = response_specific.choices[0].message.tool_calls[0]
    print_highlight(f"Called function: {tool_call.function.name}")
    print_highlight(f"Arguments: {tool_call.function.arguments}")
```

```python
terminate_process(server_process_tool_choice)
```

## Pythonic Tool Call Format (Llama-3.2 / Llama-3.3 / Llama-4)

Some Llama models (such as Llama-3.2-1B, Llama-3.2-3B, Llama-3.3-70B, and Llama-4) support a "pythonic" tool call format, where the model outputs function calls as Python code, e.g.:

```python
[get_current_weather(city="San Francisco", state="CA", unit="celsius")]
```

- The output is a Python list of function calls, with arguments as Python literals (not JSON).
- Multiple tool calls can be returned in the same list:

```python
[get_current_weather(city="San Francisco", state="CA", unit="celsius"),
 get_current_weather(city="New York", state="NY", unit="fahrenheit")]
```

For more information, refer to Meta's documentation on [Zero shot function calling](https://github.com/meta-llama/llama-models/blob/main/models/llama4/prompt_format.md#zero-shot-function-calling---system-message).

Note that this feature is still under development on Blackwell.

### How to enable
- Launch the server with `--tool-call-parser pythonic`
- You may also specify an improved `--chat-template` for the model (e.g. `--chat-template=examples/chat_template/tool_chat_template_llama4_pythonic.jinja`). This is recommended because the model expects a special prompt format to reliably produce valid pythonic tool call output. The template ensures that the prompt structure (e.g. special tokens, message boundaries like `<|eom|>`, and function call delimiters) matches what the model was trained or fine-tuned on. Without the correct chat template, tool calls may fail or produce inconsistent results.

#### Forcing Pythonic Tool Call Output Without a Chat Template
If you don't want to specify a chat template, you must give the model extremely explicit instructions in your messages to force pythonic output. For example, for Llama-3.2-1B-Instruct, you need:

```python
import openai

server_process, port = launch_server_cmd(
    "python3 -m sglang.launch_server --model-path meta-llama/Llama-3.2-1B-Instruct --tool-call-parser pythonic --tp 1 --log-level warning"  # llama-3.2-1b-instruct
)
wait_for_server(f"http://localhost:{port}")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a given location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The name of the city or location.",
                    }
                },
                "required": ["location"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_tourist_attractions",
            "description": "Get a list of top tourist attractions for a given city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The name of the city to find attractions for.",
                    }
                },
                "required": ["city"],
            },
        },
    },
]


def get_messages():
    return [
        {
            "role": "system",
            "content": (
                "You are a travel assistant. "
                "When asked to call functions, ALWAYS respond ONLY with a python list of function calls, "
                "using this format: [func_name1(param1=value1, param2=value2), func_name2(param=value)]. "
                "Do NOT use JSON, do NOT use variables, do NOT use any other format. "
                "Here is an example:\n"
                '[get_weather(location="Paris"), get_tourist_attractions(city="Paris")]'
            ),
        },
        {
            "role": "user",
            "content": (
                "I'm planning a trip to Tokyo next week. What's the weather like and what are some top tourist attractions? "
                "Propose parallel tool calls at once, using the python list of function calls format as shown above."
            ),
        },
    ]


messages = get_messages()

client = openai.Client(base_url=f"http://localhost:{port}/v1", api_key="xxxxxx")
model_name = client.models.list().data[0].id


response_non_stream = client.chat.completions.create(
    model=model_name,
    messages=messages,
    temperature=0,
    top_p=0.9,
    stream=False,  # Non-streaming
    tools=tools,
)
print_highlight("Non-stream response:")
print_highlight(response_non_stream)

response_stream = client.chat.completions.create(
    model=model_name,
    messages=messages,
    temperature=0,
    top_p=0.9,
    stream=True,
    tools=tools,
)
texts = ""
tool_calls = []
name = ""
arguments = ""

for chunk in response_stream:
    if chunk.choices[0].delta.content:
        texts += chunk.choices[0].delta.content
    if chunk.choices[0].delta.tool_calls:
        tool_calls.append(chunk.choices[0].delta.tool_calls[0])

print_highlight("Streaming Response:")
print_highlight("==== Text ====")
print_highlight(texts)

print_highlight("==== Tool Call ====")
for tool_call in tool_calls:
    print_highlight(tool_call)

terminate_process(server_process)
```

> Note:
> If the model was heavily fine-tuned primarily on JSON, it may still default to JSON. If you are not using a chat template, prompt engineering (including examples) is the only way to increase the odds of pythonic output.

## How to support a new model?
1. Update the TOOLS_TAG_LIST in sglang/srt/function_call_parser.py with the model's tool tags. Currently supported tags include:
```
TOOLS_TAG_LIST = [
    "<|plugin|>",
    "<function=",
    "
```