Set "stream": true on a chat completions request to receive the response incrementally as Server-Sent Events (SSE), instead of waiting for the full completion. This is ideal for chat UIs where you want to render tokens as they arrive.

Example

from openai import OpenAI

client = OpenAI(base_url="https://api.deepshi.ai/v1", api_key="sk-bf-...")

stream = client.chat.completions.create(
    model="deepshi-3.0",
    messages=[{"role": "user", "content": "Write a haiku about the sea."}],
    stream=True,
)

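for chunk in stream:
    # Some chunks may carry an empty choices list (e.g. a usage-only chunk).
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    # Role-only and final chunks have no content, so skip empty deltas.
    if delta:
        print(delta, end="", flush=True)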

The stream format

Each event is a data: line carrying a partial JSON chunk, and events are separated by a blank line. Tokens arrive in choices[0].delta.content. The stream ends with a data: [DONE] sentinel:
data: {"choices":[{"delta":{"content":"Wide"}}]}

data: {"choices":[{"delta":{"content":" blue"}}]}

data: [DONE]
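If you consume the stream without the SDK, the events above can be parsed line by line. The following is a minimal sketch, assuming the input is an iterable of already-decoded text lines (for example, from an HTTP client's line iterator). The function name iter_sse_content is hypothetical, and the sketch only handles data: fields.

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from raw SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        # Blank lines separate events; other SSE fields are ignored in this sketch.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # chunk without choices carries no tokens
        content = (choices[0].get("delta") or {}).get("content")
        if content:
            yield content


if __name__ == "__main__":
    sample = [
        'data: {"choices":[{"delta":{"content":"Wide"}}]}',
        "",
        'data: {"choices":[{"delta":{"content":" blue"}}]}',
        "",
        "data: [DONE]",
    ]
    print("".join(iter_sse_content(sample)))  # prints "Wide blue"
```

Yielding fragments rather than returning a single string lets a UI render each token as soon as its event is parsed.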
Streaming works the same way for tool calls: the arguments arrive incrementally in choices[0].delta.tool_calls. Accumulate the fragments until choices[0].finish_reason is "tool_calls".
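The accumulation step might look like the sketch below, operating on chunks already parsed into dicts. It assumes the OpenAI-compatible chunk shape: each entry in delta.tool_calls carries an index, the first fragment for that index carries id and function.name, and function.arguments arrives as string fragments of a JSON document. The function accumulate_tool_calls, the tool name get_weather and the sample chunks are hypothetical.

```python
import json
from typing import Any


def accumulate_tool_calls(chunks: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Merge streamed tool-call deltas into complete tool calls (assumed chunk shape)."""
    calls: dict[int, dict[str, Any]] = {}
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue
        choice = choices[0]
        for tc in (choice.get("delta") or {}).get("tool_calls") or []:
            # Fragments for the same call share an index; create its slot on first sight.
            slot = calls.setdefault(tc["index"], {"id": None, "name": "", "arguments": ""})
            if tc.get("id"):
                slot["id"] = tc["id"]
            fn = tc.get("function") or {}
            if fn.get("name"):
                slot["name"] = fn["name"]
            # Argument text is concatenated in arrival order.
            slot["arguments"] += fn.get("arguments") or ""
        if choice.get("finish_reason") == "tool_calls":
            break  # all tool calls are complete
    result = []
    for index in sorted(calls):
        call = calls[index]
        # Parse the arguments only once the full JSON string has been received.
        result.append({
            "id": call["id"],
            "name": call["name"],
            "arguments": json.loads(call["arguments"] or "{}"),
        })
    return result


# Hypothetical chunks for a single call to a hypothetical get_weather tool.
chunks = [
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "id": "call_1", "function": {"name": "get_weather", "arguments": ""}}]}}]},
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": '{"city": '}}]}}]},
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": '"Oslo"}'}}]}}]},
    {"choices": [{"delta": {}, "finish_reason": "tool_calls"}]},
]
print(accumulate_tool_calls(chunks))
# prints [{'id': 'call_1', 'name': 'get_weather', 'arguments': {'city': 'Oslo'}}]
```

Keying the slots by index rather than by position in the list keeps parallel tool calls separate even when their fragments interleave.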