Set "stream": true on a chat completions request to receive the response incrementally as Server-Sent Events (SSE), instead of waiting for the full completion. This is ideal for chat UIs where you want to render tokens as they arrive.
Example
from openai import OpenAI

# Point the OpenAI SDK at the API's base URL.
client = OpenAI(base_url="https://api.deepshi.ai/v1", api_key="sk-bf-...")

stream = client.chat.completions.create(
    model="deepshi-3.0",
    messages=[{"role": "user", "content": "Write a haiku about the sea."}],
    stream=True,  # return an iterator of chunks instead of one response
)

# Print each text fragment as soon as it arrives.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
Each event is a data: line containing a partial chunk. Tokens arrive in choices[0].delta.content. The stream ends with a data: [DONE] sentinel:
data: {"choices":[{"delta":{"content":"Wide"}}]}
data: {"choices":[{"delta":{"content":" blue"}}]}
data: [DONE]
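If you read the event stream without an SDK, the wire format above can be handled with a small parser. The following is a minimal sketch, not the official client: the helper name iter_content is hypothetical, and it assumes each event arrives as a single data: line, as in the sample above.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from raw SSE lines until the [DONE] sentinel.

    Hypothetical helper: assumes one JSON payload per `data:` line.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        text = (choices[0].get("delta") or {}).get("content")
        if text:
            yield text


# The sample events shown above, with blank lines between events.
sample = [
    'data: {"choices":[{"delta":{"content":"Wide"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":" blue"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_content(sample)))  # prints "Wide blue"
```

In practice, lines would come from a streaming HTTP response body rather than a list; the parsing logic stays the same.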
Streaming works the same way for tool calls: the arguments arrive
incrementally in delta.tool_calls. Accumulate the fragments until
finish_reason is tool_calls, then parse the completed arguments.
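One way to accumulate streamed tool calls is sketched below. It operates on chunks already decoded to plain dicts and assumes the OpenAI-style delta shape, in which each tool call fragment carries an index, an optional id, and function.name / function.arguments strings; those field names are an assumption here, not confirmed by this page. The helper collect_tool_calls and the get_weather tool are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, List


def collect_tool_calls(chunks: Iterable[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Merge streamed tool call fragments into complete calls.

    Hypothetical helper: assumes OpenAI-style fields (index, id,
    function.name, function.arguments) inside delta.tool_calls.
    """
    calls: Dict[int, Dict[str, str]] = {}
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue
        choice = choices[0]
        for part in (choice.get("delta") or {}).get("tool_calls") or []:
            # Fragments of the same call share an index.
            call = calls.setdefault(
                part.get("index", 0), {"id": "", "name": "", "arguments": ""}
            )
            fn = part.get("function") or {}
            call["id"] = part.get("id") or call["id"]
            call["name"] = fn.get("name") or call["name"]
            # Argument fragments are string pieces; concatenate them in order.
            call["arguments"] += fn.get("arguments") or ""
        if choice.get("finish_reason") == "tool_calls":
            break  # all fragments have arrived
    return [calls[i] for i in sorted(calls)]


# Illustrative chunks for a made-up get_weather tool.
chunks = [
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "id": "call_1",
         "function": {"name": "get_weather", "arguments": ""}}]},
        "finish_reason": None}]},
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": '{"city": '}}]},
        "finish_reason": None}]},
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": '"Oslo"}'}}]},
        "finish_reason": None}]},
    {"choices": [{"delta": {}, "finish_reason": "tool_calls"}]},
]

tool_calls = collect_tool_calls(chunks)
# Parse the arguments only after the stream has finished.
args = json.loads(tool_calls[0]["arguments"])
print(tool_calls[0]["name"], args)
```

The arguments string is only parsed after finish_reason is tool_calls, because an individual fragment is generally not valid JSON on its own.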