SGLang Frontend Language#
The SGLang frontend language can be used to define simple and easy prompts in a convenient, structured way.
Launch A Server#
Launch the server in your terminal and wait for it to initialize.
[ ]:
from sglang import assistant_begin, assistant_end
from sglang import assistant, function, gen, system, user
from sglang import image
from sglang import RuntimeEndpoint
from sglang.lang.api import set_default_backend
from sglang.srt.utils import load_image
from sglang.test.doc_patch import launch_server_cmd
from sglang.utils import print_highlight, terminate_process, wait_for_server
server_process, port = launch_server_cmd(
    "python -m sglang.launch_server --model-path Qwen/Qwen2.5-7B-Instruct --host 0.0.0.0 --log-level warning"
)
wait_for_server(f"http://localhost:{port}")
print(f"Server started on http://localhost:{port}")
Set the default backend. Note: Besides the local server, you can also use OpenAI or other API endpoints.
[ ]:
set_default_backend(RuntimeEndpoint(f"http://localhost:{port}"))
Basic Usage#
The simplest way of using the SGLang frontend language is a question-answer dialog between a user and an assistant.
[ ]:
@function
def basic_qa(s, question):
    s += system("You are a helpful assistant that can answer questions.")
    s += user(question)
    s += assistant(gen("answer", max_tokens=512))
[ ]:
state = basic_qa("List 3 countries and their capitals.")
print_highlight(state["answer"])
Multi-Turn Dialog#
The SGLang frontend language can also be used to define multi-turn dialogs.
[ ]:
@function
def multi_turn_qa(s):
    s += system("You are a helpful assistant that can answer questions.")
    s += user("Please give me a list of 3 countries and their capitals.")
    s += assistant(gen("first_answer", max_tokens=512))
    s += user("Please give me another list of 3 countries and their capitals.")
    s += assistant(gen("second_answer", max_tokens=512))
    return s
state = multi_turn_qa()
print_highlight(state["first_answer"])
print_highlight(state["second_answer"])
Control Flow#
You may use any Python code within the function to define more complex control flows.
[ ]:
@function
def tool_use(s, question):
    s += assistant(
        "To answer this question: "
        + question
        + ". I need to use a "
        + gen("tool", choices=["calculator", "search engine"])
        + ". "
    )

    if s["tool"] == "calculator":
        s += assistant("The math expression is: " + gen("expression"))
    elif s["tool"] == "search engine":
        s += assistant("The key word to search is: " + gen("word"))
state = tool_use("What is 2 * 2?")
print_highlight(state["tool"])
print_highlight(state["expression"])
Parallelism#
Use fork to launch parallel prompts. Because sgl.gen is non-blocking, the for loop below issues two generation calls in parallel.
[ ]:
@function
def tip_suggestion(s):
    s += assistant(
        "Here are two tips for staying healthy: "
        "1. Balanced Diet. 2. Regular Exercise.\n\n"
    )

    forks = s.fork(2)
    for i, f in enumerate(forks):
        f += assistant(
            f"Now, expand tip {i+1} into a paragraph:\n"
            + gen("detailed_tip", max_tokens=256, stop="\n\n")
        )

    s += assistant("Tip 1:" + forks[0]["detailed_tip"] + "\n")
    s += assistant("Tip 2:" + forks[1]["detailed_tip"] + "\n")
    s += assistant(
        "To summarize the above two tips, I can say:\n" + gen("summary", max_tokens=512)
    )
state = tip_suggestion()
print_highlight(state["summary"])
Constrained Decoding#
Use regex to specify a regular expression as a decoding constraint. This is only supported for local models.
[ ]:
@function
def regular_expression_gen(s):
    s += user("What is the IP address of the Google DNS servers?")
    s += assistant(
        gen(
            "answer",
            temperature=0,
            regex=r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)",
        )
    )
state = regular_expression_gen()
print_highlight(state["answer"])
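As a quick sanity check that needs no server, the IPv4 pattern can be exercised locally with Python's re module. The pattern is reproduced below with the separator dot escaped as \. so it matches only a literal dot (an unescaped . would accept any character between octets):

```python
import re

# IPv4 octet pattern: 250-255, 200-249, or 0-199 (with optional leading 0/1).
ip_regex = r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)"

# A constrained decoder emitting this regex can only produce full matches.
print(re.fullmatch(ip_regex, "8.8.8.8") is not None)    # True: valid address
print(re.fullmatch(ip_regex, "999.1.1.1") is not None)  # False: octet out of range
```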
Use regex to define a JSON decoding schema.
[ ]:
character_regex = (
    r"""\{\n"""
    + r"""    "name": "[\w\d\s]{1,16}",\n"""
    + r"""    "house": "(Gryffindor|Slytherin|Ravenclaw|Hufflepuff)",\n"""
    + r"""    "blood status": "(Pure-blood|Half-blood|Muggle-born)",\n"""
    + r"""    "occupation": "(student|teacher|auror|ministry of magic|death eater|order of the phoenix)",\n"""
    + r"""    "wand": \{\n"""
    + r"""        "wood": "[\w\d\s]{1,16}",\n"""
    + r"""        "core": "[\w\d\s]{1,16}",\n"""
    + r"""        "length": [0-9]{1,2}\.[0-9]{0,2}\n"""
    + r"""    \},\n"""
    + r"""    "alive": "(Alive|Deceased)",\n"""
    + r"""    "patronus": "[\w\d\s]{1,16}",\n"""
    + r"""    "bogart": "[\w\d\s]{1,16}"\n"""
    + r"""\}"""
)
@function
def character_gen(s, name):
    s += user(
        f"{name} is a character in Harry Potter. Please fill in the following information about this character."
    )
    s += assistant(gen("json_output", max_tokens=256, regex=character_regex))
state = character_gen("Harry Potter")
print_highlight(state["json_output"])
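To see what a string matching this schema looks like without running the model, the regex can be checked locally against a hand-written sample. The field values below are illustrative, not model output, and the pattern is redefined here so the snippet is self-contained:

```python
import re

# Same JSON schema regex as above (4-space indentation inside the object,
# 8 spaces inside the nested "wand" object).
character_regex = (
    r"""\{\n"""
    + r"""    "name": "[\w\d\s]{1,16}",\n"""
    + r"""    "house": "(Gryffindor|Slytherin|Ravenclaw|Hufflepuff)",\n"""
    + r"""    "blood status": "(Pure-blood|Half-blood|Muggle-born)",\n"""
    + r"""    "occupation": "(student|teacher|auror|ministry of magic|death eater|order of the phoenix)",\n"""
    + r"""    "wand": \{\n"""
    + r"""        "wood": "[\w\d\s]{1,16}",\n"""
    + r"""        "core": "[\w\d\s]{1,16}",\n"""
    + r"""        "length": [0-9]{1,2}\.[0-9]{0,2}\n"""
    + r"""    \},\n"""
    + r"""    "alive": "(Alive|Deceased)",\n"""
    + r"""    "patronus": "[\w\d\s]{1,16}",\n"""
    + r"""    "bogart": "[\w\d\s]{1,16}"\n"""
    + r"""\}"""
)

# A hand-written sample in exactly the shape the regex enforces.
sample = """{
    "name": "Harry Potter",
    "house": "Gryffindor",
    "blood status": "Half-blood",
    "occupation": "student",
    "wand": {
        "wood": "holly",
        "core": "phoenix feather",
        "length": 11.0
    },
    "alive": "Alive",
    "patronus": "stag",
    "bogart": "dementor"
}"""

print(re.fullmatch(character_regex, sample) is not None)  # True
```

Because every enum-like field ("house", "occupation", etc.) is an explicit alternation, any value outside those choices makes the whole match fail, which is what forces the constrained decoder to stay inside the schema.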
Batching#
Use run_batch to run a batch of prompts.
[ ]:
@function
def text_qa(s, question):
    s += user(question)
    s += assistant(gen("answer", stop="\n"))
states = text_qa.run_batch(
    [
        {"question": "What is the capital of the United Kingdom?"},
        {"question": "What is the capital of France?"},
        {"question": "What is the capital of Japan?"},
    ],
    progress_bar=True,
)
for i, state in enumerate(states):
    print_highlight(f"Answer {i+1}: {state['answer']}")
Streaming#
Use stream to stream the output to the user.
[ ]:
@function
def text_qa(s, question):
    s += user(question)
    s += assistant(gen("answer", stop="\n"))
state = text_qa.run(
    question="What is the capital of France?", temperature=0.1, stream=True
)

for out in state.text_iter():
    print(out, end="", flush=True)
Complex Prompts#
You may use {system|user|assistant}_{begin|end} to define complex prompts.
[ ]:
@function
def chat_example(s):
    s += system("You are a helpful assistant.")
    # Same as: s += s.system("You are a helpful assistant.")

    with s.user():
        s += "Question: What is the capital of France?"

    s += assistant_begin()
    s += "Answer: " + gen("answer", max_tokens=100, stop="\n")
    s += assistant_end()
state = chat_example()
print_highlight(state["answer"])
[ ]:
terminate_process(server_process)
Multi-modal Generation#
You may use the SGLang frontend language to define multi-modal prompts. See here for supported models.
[ ]:
server_process, port = launch_server_cmd(
    "python -m sglang.launch_server --model-path Qwen/Qwen2.5-VL-7B-Instruct --host 0.0.0.0 --log-level warning"
)
wait_for_server(f"http://localhost:{port}")
print(f"Server started on http://localhost:{port}")
[ ]:
set_default_backend(RuntimeEndpoint(f"http://localhost:{port}"))
Ask a question about an image.
[ ]:
@function
def image_qa(s, image_file, question):
    s += user(image(image_file) + question)
    s += assistant(gen("answer", max_tokens=256))
image_url = "https://github.com/sgl-project/sglang/blob/main/examples/assets/example_image.png?raw=true"
image_bytes, _ = load_image(image_url)
state = image_qa(image_bytes, "What is in the image?")
print_highlight(state["answer"])
[ ]:
terminate_process(server_process)