The chat completions endpoint is the main way to generate text with the Deepshi API. It’s fully OpenAI-compatible, so your existing prompts, SDKs, and tooling work unchanged.

Endpoint: POST https://api.deepshi.ai/v1/chat/completions

Basic request

Send a list of messages. Each message has a role (system, user, or assistant) and content.
curl https://api.deepshi.ai/v1/chat/completions \
  -H "Authorization: Bearer $DEEPSHI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepshi-3.0",
    "messages": [
      { "role": "system", "content": "You are a concise assistant." },
      { "role": "user", "content": "Explain backpropagation in two sentences." }
    ]
  }'
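
The same request can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: the helper names `build_chat_request` and `send_chat_request` are hypothetical, and reading the reply from `choices[0].message.content` assumes the response follows the OpenAI response shape.

```python
import json
import urllib.request

API_URL = "https://api.deepshi.ai/v1/chat/completions"


def build_chat_request(api_key, model, messages):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request):
    """Send the request and return the assistant's text.

    Assumption: the response body uses the OpenAI-style layout,
    with the reply under choices[0].message.content.
    """
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]
```

To run it, pass the key from the `DEEPSHI_API_KEY` environment variable, the model id, and the same two messages as the curl example to `build_chat_request`, then hand the result to `send_chat_request`.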

Common parameters

| Parameter | Type | Description |
| --- | --- | --- |
| model | string | The model id to use, e.g. deepshi-2.0 or deepshi-3.0. See Models. |
| messages | array | The conversation so far. Required. |
| temperature | number | Sampling randomness, typically 0 to 2. Lower is more deterministic. |
| top_p | number | Nucleus sampling cutoff. Use instead of temperature, not both. |
| max_tokens | integer | Maximum tokens to generate in the response. |
| stop | string or array | Sequences that stop generation. |
| stream | boolean | Stream tokens as Server-Sent Events. See Streaming. |
| seed | integer | Best-effort deterministic sampling for repeatable output. |
| tools | array | Function/tool definitions the model may call. See Tool calling. |
| response_format | object | Set to { "type": "json_object" } to force valid JSON output (model-dependent). |
Supported parameters vary by model. Unsupported fields are safely ignored rather than rejected. Each model lists its supported_sampling_parameters in the models catalog.
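
As a rough sketch of how these parameters combine into a request body, the hypothetical helper below includes only the fields that were explicitly set. Rejecting `temperature` together with `top_p` is a client-side choice made here to follow the table's advice; it is not documented API behavior.

```python
def build_payload(model, messages, *, temperature=None, top_p=None,
                  max_tokens=None, stop=None, seed=None, json_mode=False):
    """Assemble a request body, leaving out every parameter that was not set."""
    # Client-side guard (an assumption of this sketch, not an API rule):
    # the table advises using temperature or top_p, not both.
    if temperature is not None and top_p is not None:
        raise ValueError("set temperature or top_p, not both")

    payload = {"model": model, "messages": messages}
    optional = {
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
        "stop": stop,
        "seed": seed,
    }
    # Keep only the parameters the caller actually provided.
    payload.update({name: value for name, value in optional.items() if value is not None})

    if json_mode:
        # Model-dependent, per the table above.
        payload["response_format"] = {"type": "json_object"}
    return payload
```

Leaving unset parameters out of the body keeps requests small and lets each model fall back to its own defaults.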

Multi-turn conversations

The API is stateless, so it doesn’t remember previous calls. To continue a conversation, send the full message history each time and append the model’s previous reply as an assistant message:
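import os
from openai import OpenAI

# Assumption: the OpenAI SDK is pointed at the Deepshi base URL, taken from the endpoint above.
client = OpenAI(base_url="https://api.deepshi.ai/v1", api_key=os.environ["DEEPSHI_API_KEY"])

messages = [{"role": "user", "content": "What's the capital of France?"}]
resp = client.chat.completions.create(model="deepshi-3.0", messages=messages)

messages.append(resp.choices[0].message)              # the assistant's reply
messages.append({"role": "user", "content": "And its population?"})

resp = client.chat.completions.create(model="deepshi-3.0", messages=messages)
print(resp.choices[0].message.content)                # answer to the follow-up question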
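
Because the full history has to be resent on every call, it can help to keep that bookkeeping in one place. The `Conversation` class below is a hypothetical sketch: `send` is any callable you supply that takes the message list and returns the assistant's text, so the class itself makes no assumptions about the transport.

```python
class Conversation:
    """Keeps the message history and resends all of it on every turn."""

    def __init__(self, send, system=None):
        # send: callable(list_of_messages) -> assistant reply text (supplied by the caller)
        self.send = send
        self.messages = []
        if system is not None:
            self.messages.append({"role": "system", "content": system})

    def ask(self, text):
        """Append the user turn, send the full history, and record the reply."""
        self.messages.append({"role": "user", "content": text})
        reply = self.send(list(self.messages))  # pass a copy so send cannot mutate history
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

With an OpenAI-compatible SDK client, `send` could be a small function that calls `client.chat.completions.create(model="deepshi-3.0", messages=msgs)` and returns `choices[0].message.content`.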

Reasoning models

Models like Deepshi 3.0 reason step by step before answering. They accept the same request shape; set a higher max_tokens than you would for a non-reasoning model, and expect higher latency on hard problems. See Reasoning models and Text models for which models support reasoning.

Streaming

Set "stream": true to receive tokens incrementally as Server-Sent Events. See the Streaming guide for a full example.
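
To give a feel for what a client does with the event stream, here is a minimal sketch of a parser for already-decoded lines. The `data:` prefix is standard Server-Sent Events framing; the `[DONE]` sentinel and the `choices[].delta.content` chunk layout are assumptions carried over from the OpenAI streaming format, and the function name `iter_stream_text` is hypothetical.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from an iterable of decoded SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed OpenAI-style end-of-stream marker
            break
        chunk = json.loads(data)
        # Assumed OpenAI-style chunk: text arrives under choices[].delta.content.
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

Joining the yielded fragments reconstructs the full reply, while printing each one as it arrives gives the incremental effect.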

Next steps

Tool calling: Let the model call your functions.

Reasoning: Use models that think before they answer.