Text & chat - Deepshi API

The chat completions endpoint is the main way to generate text with the Deepshi API. It’s fully OpenAI-compatible, so your existing prompts, SDKs, and tooling work unchanged.

Endpoint: POST https://api.deepshi.ai/v1/chat/completions

Basic request

Send a list of messages. Each message has a role (system, user, or assistant) and content.

curl https://api.deepshi.ai/v1/chat/completions \
  -H "Authorization: Bearer $DEEPSHI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepshi-3.0",
    "messages": [
      { "role": "system", "content": "You are a concise assistant." },
      { "role": "user", "content": "Explain backpropagation in two sentences." }
    ]
  }'
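
The same request can also be built from Python with only the standard library. The sketch below mirrors the curl example; `build_chat_request` and `send` are illustrative helper names rather than part of any Deepshi SDK, and the key is read from the same `DEEPSHI_API_KEY` environment variable.

```python
import json
import os
import urllib.request

API_URL = "https://api.deepshi.ai/v1/chat/completions"


def build_chat_request(messages, model="deepshi-3.0", api_key=None):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    # Fall back to the environment variable used in the curl example.
    key = api_key or os.environ.get("DEEPSHI_API_KEY", "")
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

Calling `send(build_chat_request(messages))` performs the request. Assuming an OpenAI-style response body, the reply text would be found at `["choices"][0]["message"]["content"]`.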

Common parameters

Parameter	Type	Description
`model`	string	The model id to use, e.g. `deepshi-2.0` or `deepshi-3.0`. See Models.
`messages`	array	The conversation so far. Required.
`temperature`	number	Sampling randomness, typically `0`–`2`. Lower is more deterministic.
`top_p`	number	Nucleus sampling cutoff. Set either this or `temperature`, not both.
`max_tokens`	integer	Maximum tokens to generate in the response.
`stop`	string or array	Sequences that stop generation.
`stream`	boolean	Stream tokens as Server-Sent Events. See Streaming.
`seed`	integer	Best-effort deterministic sampling for repeatable output.
`tools`	array	Function/tool definitions the model may call. See Tool calling.
`response_format`	object	Set to `{ "type": "json_object" }` to force valid JSON output (model-dependent).

Supported parameters vary by model. Unsupported fields are safely ignored rather than rejected. Each model lists its supported_sampling_parameters in the models catalog.
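
As a rough illustration of how these parameters combine, the sketch below assembles a request body and applies two client-side checks taken from the table: `temperature` and `top_p` are not sent together, and `temperature` stays within `0`–`2`. The `build_payload` helper is hypothetical, and the checks are a convenience on the caller's side, not behaviour of the API itself.

```python
def build_payload(model, messages, **params):
    """Assemble a chat completions request body from optional parameters."""
    if not messages:
        raise ValueError("messages is required")
    # Drop None values so unset options are not sent at all.
    options = {k: v for k, v in params.items() if v is not None}
    if "temperature" in options and "top_p" in options:
        raise ValueError("set temperature or top_p, not both")
    if "temperature" in options and not 0 <= options["temperature"] <= 2:
        raise ValueError("temperature is expected to be between 0 and 2")
    return {"model": model, "messages": messages, **options}


# Example: low-randomness JSON output with a token cap and a fixed seed.
payload = build_payload(
    "deepshi-3.0",
    [{"role": "user", "content": "List three primes as JSON."}],
    temperature=0,
    max_tokens=200,
    seed=42,
    response_format={"type": "json_object"},
)
```

The resulting `payload` dictionary can be serialized with `json.dumps` and sent as the request body.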

Multi-turn conversations

The API is stateless, so it doesn’t remember previous calls. To continue a conversation, send the full message history each time and append the model’s previous reply as an assistant message:

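import os
from openai import OpenAI  # assumes the OpenAI Python SDK, v1 or later

# Point the SDK at the Deepshi endpoint instead of the default base URL.
client = OpenAI(base_url="https://api.deepshi.ai/v1", api_key=os.environ["DEEPSHI_API_KEY"])

messages = [{"role": "user", "content": "What's the capital of France?"}]
resp = client.chat.completions.create(model="deepshi-3.0", messages=messages)

messages.append(resp.choices[0].message)              # the assistant's reply
messages.append({"role": "user", "content": "And its population?"})

resp = client.chat.completions.create(model="deepshi-3.0", messages=messages)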
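
Because the caller owns the history, it can help to wrap that bookkeeping in a small class. The following is a minimal sketch, not an official helper: `Conversation` is a hypothetical name, and the `send` callable stands in for whatever function actually calls the API and returns the reply text.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class Conversation:
    """Keeps the full message history and resends it on every turn."""

    def __init__(self, send: Callable[[List[Message]], str], system: str = ""):
        self._send = send
        self.messages: List[Message] = []
        if system:
            self.messages.append({"role": "system", "content": system})

    def ask(self, text: str) -> str:
        # Build the history for this turn without touching the stored one yet.
        pending = self.messages + [{"role": "user", "content": text}]
        reply = self._send(pending)
        # Commit both turns only after the call succeeded.
        self.messages = pending + [{"role": "assistant", "content": reply}]
        return reply
```

With the SDK example above, `send` could be a function that calls `client.chat.completions.create(model="deepshi-3.0", messages=msgs)` and returns `choices[0].message.content`. Committing the turn only after a successful call keeps a failed request from leaving an unanswered user message in the history.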

Reasoning models

Models like Deepshi 3.0 reason step by step before answering. They accept the same request shape, but you should set a higher max_tokens and expect higher latency on hard problems. See Reasoning models and Text models for which models support reasoning.
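
One way to act on this advice is to scale the token budget and the client-side timeout together when targeting a reasoning model. The sketch below is only illustrative: `request_options`, the base values, and the multiplier are assumptions chosen for the example, not documented Deepshi limits.

```python
def request_options(reasoning, base_max_tokens=512, base_timeout_s=30.0):
    """Return max_tokens and a client-side timeout, scaled up for reasoning models."""
    # Illustrative multiplier only; tune it against your own workloads.
    factor = 4 if reasoning else 1
    return {
        "max_tokens": base_max_tokens * factor,
        "timeout": base_timeout_s * factor,
    }
```

With the OpenAI Python SDK, both values can be passed per request, for example `client.chat.completions.create(model="deepshi-3.0", messages=messages, **request_options(True))`, since `create` accepts a `timeout` argument alongside the API parameters.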

Streaming

Set "stream": true to receive tokens incrementally as Server-Sent Events. See the Streaming guide for a full example.
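
To give a feel for what arrives on the wire, the sketch below extracts text from a stream of Server-Sent Event lines. It assumes the OpenAI-style chunk format (a `data:` prefix, JSON chunks carrying `choices[].delta.content`, and a final `data: [DONE]` line); that format is inferred from the API's stated OpenAI compatibility, so check it against the Streaming guide. `iter_deltas` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separators, comments, and non-data fields.
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Canned lines standing in for a real response stream.
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
text = "".join(iter_deltas(sample))
```

In practice the lines would come from the HTTP response body, decoded as UTF-8 and read line by line, and each fragment would be shown to the user as soon as it arrives.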

Next steps

Tool calling

Reasoning