Deepshi is pay-as-you-go. You buy credits, and each request costs a fraction of a cent based on the tokens it uses. No subscriptions, no seats, no minimums. Failed requests (4xx) are never billed.

See live pricing

Up-to-date per-model rates live on deepshi.ai. They’re the source of truth and can change over time.

How a request is priced

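Pricing is per model, in USD per 1M tokens, with separate rates for input (prompt) and output (completion) tokens. Because rates are quoted per 1M tokens, the token-weighted sum is divided by 1,000,000:
cost = (prompt_tokens × input_rate + completion_tokens × output_rate) / 1,000,000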
Prompt tokens served from cache (reported under usage.prompt_tokens_details.cached_tokens) are billed at the model’s discounted cache-read rate, where one is offered.
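As a rough illustration of the pricing rule, the sketch below estimates a request's cost from token counts and per-1M rates. The function name, its parameters, and the rates in the usage note are hypothetical, not real Deepshi prices. It also assumes that cached tokens are counted inside prompt_tokens, and that a model without a cache-read rate bills cached tokens at the normal input rate. The authoritative figure is always the cost reported in the response.

```python
from typing import Optional

TOKENS_PER_MILLION = 1_000_000


def estimate_cost(
    prompt_tokens: int,
    completion_tokens: int,
    input_rate: float,
    output_rate: float,
    cached_tokens: int = 0,
    cache_read_rate: Optional[float] = None,
) -> float:
    """Estimate the USD cost of one request. All rates are USD per 1M tokens."""
    # Assumption: with no cache-read rate, cached tokens cost the input rate.
    if cache_read_rate is None:
        cache_read_rate = input_rate
    # Assumption: cached tokens are a subset of prompt_tokens.
    uncached_tokens = prompt_tokens - cached_tokens
    weighted = (
        uncached_tokens * input_rate
        + cached_tokens * cache_read_rate
        + completion_tokens * output_rate
    )
    return weighted / TOKENS_PER_MILLION
```

With made-up rates of 2.00 USD (input) and 10.00 USD (output) per 1M tokens, estimate_cost(22, 8, 2.0, 10.0) evaluates (22 × 2 + 8 × 10) / 1,000,000, which is about 0.000124 USD.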

You never have to guess the cost

Every successful response reports its exact cost in usage.cost.total_cost (USD). That amount is deducted from your balance:
"usage": {
  "prompt_tokens": 22,
  "completion_tokens": 8,
  "total_tokens": 30,
  "cost": { "total_cost": 0.000135 }
}
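A minimal sketch of reading that field from a parsed response body follows. The helper name response_cost is hypothetical, and the sample body only mirrors the usage object shown above. The lookup is defensive so that a body without a cost object yields None instead of raising.

```python
import json
from typing import Optional


def response_cost(body: dict) -> Optional[float]:
    """Return usage.cost.total_cost in USD, or None if it is not present."""
    usage = body.get("usage") or {}
    cost = usage.get("cost") or {}
    value = cost.get("total_cost")
    return float(value) if value is not None else None


# Sample body containing only the usage object from the example above.
sample_body = json.loads(
    """
    {
      "usage": {
        "prompt_tokens": 22,
        "completion_tokens": 8,
        "total_tokens": 30,
        "cost": { "total_cost": 0.000135 }
      }
    }
    """
)

print(response_cost(sample_body))
```

Summing the returned values across requests gives a running spend total without any price table on the client side.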
When streaming, add "stream_options": {"include_usage": true} to get a final chunk with the same usage (including cost). Standard OpenAI SDKs ignore the extra cost field, so it doesn’t break compatibility.
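The sketch below shows one way to pick the usage object out of a streamed response. It assumes the stream arrives as OpenAI-style server-sent events ("data: {...}" lines ending with "data: [DONE]"), which this page does not spell out. The function name and the example lines are hypothetical.

```python
import json
from typing import Iterable, Optional


def usage_from_stream(lines: Iterable[str]) -> Optional[dict]:
    """Return the last usage object seen in a stream of SSE lines, if any."""
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage


# Hypothetical stream: one content chunk, then the final usage chunk.
example_lines = [
    'data: {"choices": [{"delta": {"content": "Hi"}}]}',
    'data: {"choices": [], "usage": {"prompt_tokens": 22, '
    '"completion_tokens": 8, "total_tokens": 30, '
    '"cost": {"total_cost": 0.000135}}}',
    "data: [DONE]",
]

final_usage = usage_from_stream(example_lines)
print(final_usage["cost"]["total_cost"])
```

Parsing the raw lines keeps the cost field visible even where an SDK's typed response objects do not expose it.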

Credits & billing

How your balance, top-ups, and running out of credits work.

Text models

Context windows and capabilities for every model.