Some models think through a problem before giving a final answer, which makes them stronger at math, code, and logic-heavy tasks. Models with the reasoning capability (for example Deepshi 3.0 and most frontier models) can return their thinking alongside the answer. See Text models for the list of models that support reasoning.

Reading the reasoning trace

On a reasoning model, the assistant message can carry extra fields next to content:
  • message.reasoning: the reasoning text.
  • message.reasoning_details: a structured array of reasoning segments.
from openai import OpenAI

client = OpenAI(base_url="https://api.deepshi.ai/v1", api_key="sk-bf-...")

resp = client.chat.completions.create(
    model="deepshi-3.0",
    messages=[{"role": "user", "content": "What is 15% of 240?"}],
    max_tokens=2048,
)

msg = resp.choices[0].message
print(getattr(msg, "reasoning", None))   # the thinking trace (may be None)
print(msg.content)                        # the final answer
If you only want the final answer, read message.content and ignore the reasoning fields.

Give reasoning models enough tokens

Reasoning and the visible answer share the same generation budget. If you set max_tokens too low, a reasoning model can spend its whole budget thinking and return empty content with finish_reason: "length". The request still succeeds with HTTP 200; it is not an error.
{
  "choices": [{ "index": 0, "finish_reason": "length", "message": { "role": "assistant", "content": "" } }]
}
To avoid this, give reasoning models a generous max_tokens so there is room for both the thinking and the answer.
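If you cannot pick a budget up front, one option is to detect the empty-answer case shown above and retry with a larger budget. The sketch below is a minimal illustration, not part of the API: the helper names `needs_more_tokens` and `complete_with_retry`, the doubling strategy, and the 16384 ceiling are all hypothetical choices. It works on the plain JSON shape of a chat completion, and `send` is any function you supply that makes the request with a given max_tokens.

```python
from typing import Callable


def needs_more_tokens(response: dict) -> bool:
    """True if the model hit max_tokens before producing any visible answer."""
    choice = response["choices"][0]
    content = choice["message"].get("content") or ""  # content may be "" or None
    return choice["finish_reason"] == "length" and content.strip() == ""


def complete_with_retry(
    send: Callable[[int], dict],
    max_tokens: int = 2048,
    ceiling: int = 16384,  # hypothetical upper bound; pick one that fits your cost limits
) -> dict:
    """Call send(max_tokens), doubling the budget while the answer comes back empty.

    Returns the last response, which may still be truncated if the ceiling is reached.
    """
    while True:
        response = send(max_tokens)
        if not needs_more_tokens(response) or max_tokens >= ceiling:
            return response
        max_tokens = min(max_tokens * 2, ceiling)
```

With the SDK client from the earlier example, `send` could be something like `lambda n: client.chat.completions.create(model="deepshi-3.0", messages=messages, max_tokens=n).model_dump()`. Keep in mind that every retry is billed, so a single generous max_tokens is usually cheaper than repeated attempts.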

Streaming

Reasoning works with streaming too. Set "stream": true and read choices[].delta. See the Streaming guide.
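When streaming, the reasoning and the answer arrive as many small deltas that you join yourself. The sketch below assumes, by analogy with message.reasoning, that reasoning text arrives in a `reasoning` field on each delta; that field name, and the helper name `collect_stream`, are assumptions, so check the Streaming guide for the exact chunk shape. It operates on chunks as plain dicts and assumes a single choice per request (n = 1).

```python
from typing import Iterable


def collect_stream(chunks: Iterable[dict]) -> tuple[str, str]:
    """Join streamed deltas into (reasoning, content).

    ASSUMPTION: reasoning text is sent in delta["reasoning"], mirroring the
    non-streaming message.reasoning field. Assumes one choice per request (n=1).
    """
    reasoning_parts: list[str] = []
    content_parts: list[str] = []
    for chunk in chunks:
        # The final chunk may carry only usage data and an empty choices list.
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            if delta.get("reasoning"):
                reasoning_parts.append(delta["reasoning"])
            if delta.get("content"):
                content_parts.append(delta["content"])
    return "".join(reasoning_parts), "".join(content_parts)
```

With the OpenAI SDK, you could feed it `(c.model_dump() for c in client.chat.completions.create(..., stream=True))`. In a real application you would typically print each delta as it arrives rather than only joining them at the end.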

Best practices

  • Allow ample max_tokens on reasoning models (room for thinking and the answer).
  • Use a reasoning model for math, code, and multi-step problems; a non-reasoning model is faster and cheaper for simple tasks.
  • Both reasoning and answer tokens count toward usage.completion_tokens and your cost.