Torrify Pro pricing analysis

Status: ✅ Implemented via PRO Gateway (January 2026)

Overview

This document outlines the cost analysis and decision framework used to implement the Torrify PRO tier. The PRO tier routes requests through a managed gateway to optimize costs via prompt caching and provide curated model access.

Implemented Solution: PRO Gateway

  • Gateway URL: https://the-gatekeeper-production.up.railway.app
  • Authentication: License keys (managed via LemonSqueezy)
  • Features:
    • Prompt caching for supported OpenRouter models
    • Streaming support (SSE passthrough)
    • Cost tracking and rate limiting (gateway-side)
    • Curated model list in UI

Source of truth for pricing

OpenRouter prices vary by model and change over time. Use the Models API to pull live pricing and avoid hardcoding:

  • Models API: https://openrouter.ai/api/v1/models
  • Pricing fields (per-token, USD): pricing.prompt, pricing.completion, pricing.request
  • OpenRouter pay-as-you-go platform fee: 5.5% (per OpenRouter pricing page)
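A minimal sketch of pulling live pricing. It assumes the Models API response shape `{"data": [{"id": ..., "pricing": {...}}]}` with pricing values serialized as strings; verify the field shapes against the current API before relying on this:

```python
import json
import urllib.request

MODELS_URL = "https://openrouter.ai/api/v1/models"

def fetch_models():
    """Pull the live model list; avoids hardcoding prices that drift over time."""
    with urllib.request.urlopen(MODELS_URL) as resp:
        return json.load(resp)["data"]

def per_token_pricing(model):
    """Extract per-token USD rates from one model entry (values arrive as strings)."""
    p = model["pricing"]
    return {
        "prompt": float(p["prompt"]),
        "completion": float(p["completion"]),
        "request": float(p.get("request") or 0),
    }
```

Refresh this on a schedule (or at gateway startup) rather than baking prices into the client.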

Candidate model shortlist (current PRO UI)

From the current PRO model list in src/components/SettingsModal.tsx:

  • anthropic/claude-3.7-sonnet:thinking
  • openai/gpt-5.2-codex
  • anthropic/claude-sonnet-4.5
  • google/gemini-2.5-pro
  • openai/gpt-5.1-codex-max
  • deepseek/deepseek-v3.2
  • qwen/qwen3-coder
  • x-ai/grok-code-fast-1
  • openai/gpt-5-mini
  • google/gemini-2.5-flash

Cost model

Per-request cost formula

Let:

  • P = prompt tokens
  • C = completion tokens
  • Rp = pricing.prompt (USD per token)
  • Rc = pricing.completion (USD per token)
  • Rr = pricing.request (USD per request)
  • F = OpenRouter platform fee (0.055)

Then:

base_cost = (P * Rp) + (C * Rc) + Rr
total_cost = base_cost * (1 + F)
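The formula above as a small helper, with rates in USD per token to match the Models API fields:

```python
def total_cost(p, c, rp, rc, rr=0.0, fee=0.055):
    """(P * Rp) + (C * Rc) + Rr, then the 5.5% OpenRouter platform fee on top."""
    base = (p * rp) + (c * rc) + rr
    return base * (1 + fee)

# Standard profile on a $3.00 / $15.00 per-1M model:
# total_cost(500, 250, 3.00e-6, 15.00e-6) ≈ $0.00554 per request
```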

Usage profiles (baseline scenarios)

Use these as standard scenarios for comparing models. Update with telemetry later.

| Profile | Input tokens (P) | Output tokens (C) | Description | Notes |
| --- | --- | --- | --- | --- |
| Quick edit | 300 | 150 | Small changes, short replies | Good for simple refactors |
| Standard (current) | 500 | 250 | Typical chat/code assist | Used in pricing table |
| Deep assist | 1200 | 600 | Longer reasoning + code | For complex CAD tasks |
| Image-assisted | 700 | 300 | Uses 1–2 images | Add image pricing if model charges per image |

Scaling note: costs scale linearly with P/C for a fixed model. For example, “Deep assist” is 2.4× the tokens of “Standard.”

Pricing comparison table (filled from Models API)

Assumptions:

  • Avg request size: 500 input tokens + 250 output tokens
  • OpenRouter platform fee: 5.5%
  • Prices shown per 1M tokens are from https://openrouter.ai/api/v1/models

| Model | Prompt $/1M | Completion $/1M | Est $/req (500/250, w/ fee) | Est reqs per $20 |
| --- | --- | --- | --- | --- |
| anthropic/claude-3.7-sonnet:thinking | 3.00 | 15.00 | 0.0055 | 3,600 |
| openai/gpt-5.2-codex | 1.75 | 14.00 | 0.0046 | 4,334 |
| anthropic/claude-sonnet-4.5 | 3.00 | 15.00 | 0.0055 | 3,600 |
| google/gemini-2.5-pro | 1.25 | 10.00 | 0.0033 | 6,061 |
| openai/gpt-5.1-codex-max | 1.25 | 10.00 | 0.0033 | 6,061 |
| deepseek/deepseek-v3.2 | 0.25 | 0.38 | 0.00023 | 86,000 |
| qwen/qwen3-coder | 0.22 | 0.95 | 0.00037 | 54,600 |
| x-ai/grok-code-fast-1 | 0.20 | 1.50 | 0.00050 | 39,900 |
| openai/gpt-5-mini | 0.25 | 2.00 | 0.00066 | 30,300 |
| google/gemini-2.5-flash | 0.30 | 2.50 | 0.00082 | 24,500 |

Tip: The Models API returns per-token prices. Convert to per-1M tokens for readability.
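The conversion and the "reqs per $20" sizing used in the table, as helpers:

```python
def per_million(rate_per_token):
    """Models API rates are per token; scale to $ per 1M for readability."""
    return rate_per_token * 1_000_000

def requests_per_budget(cost_per_request, budget=20.0):
    """Whole requests a fixed monthly budget covers at a given per-request cost."""
    return int(budget // cost_per_request)
```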

Cost projections by usage profile

Assumptions for “not everyday” usage:

  • Quick edit: 15 requests/user/month (3 sessions × 5 requests)
  • Standard: 48 requests/user/month (6 sessions × 8 requests)
  • Deep assist: 24 requests/user/month (4 sessions × 6 requests)
  • Image-assisted: 10 requests/user/month (2 sessions × 5 requests)

Notes:

  • Costs below use the token estimates in each profile and include the 5.5% platform fee.
  • Image fees are not included (add model-specific image pricing where applicable).
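The cohort figures in the tables below are just the per-request cost (with fee) times monthly requests times users; a sketch that reproduces one cell:

```python
FEE = 0.055  # OpenRouter platform fee

def monthly_spend(prompt_per_1m, completion_per_1m, p, c, reqs_per_month, users):
    """Cohort spend = per-request cost (incl. fee) x requests/user/month x users."""
    per_req = (p * prompt_per_1m + c * completion_per_1m) / 1_000_000 * (1 + FEE)
    return per_req * reqs_per_month * users

# Standard profile (500/250, 48 req/mo), $3.00/$15.00 per-1M model, 100 users:
# monthly_spend(3.00, 15.00, 500, 250, 48, 100) ≈ $26.59
```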

Quick edit (P=300, C=150, 15 req/user/mo)

| Model | Est $/req | $/user/mo | $/10 users | $/100 users | $/1000 users |
| --- | --- | --- | --- | --- | --- |
| anthropic/claude-3.7-sonnet:thinking | 0.0033 | 0.050 | 0.498 | 4.98 | 49.85 |
| openai/gpt-5.2-codex | 0.0028 | 0.042 | 0.415 | 4.15 | 41.54 |
| anthropic/claude-sonnet-4.5 | 0.0033 | 0.050 | 0.498 | 4.98 | 49.85 |
| google/gemini-2.5-pro | 0.0020 | 0.030 | 0.297 | 2.97 | 29.67 |
| openai/gpt-5.1-codex-max | 0.0020 | 0.030 | 0.297 | 2.97 | 29.67 |
| deepseek/deepseek-v3.2 | 0.0001 | 0.0021 | 0.021 | 0.209 | 2.09 |
| qwen/qwen3-coder | 0.0002 | 0.0033 | 0.033 | 0.330 | 3.30 |
| x-ai/grok-code-fast-1 | 0.0003 | 0.0045 | 0.045 | 0.451 | 4.51 |
| openai/gpt-5-mini | 0.0004 | 0.0059 | 0.059 | 0.593 | 5.93 |
| google/gemini-2.5-flash | 0.0005 | 0.0074 | 0.074 | 0.736 | 7.36 |

Standard (P=500, C=250, 48 req/user/mo)

| Model | Est $/req | $/user/mo | $/10 users | $/100 users | $/1000 users |
| --- | --- | --- | --- | --- | --- |
| anthropic/claude-3.7-sonnet:thinking | 0.0055 | 0.266 | 2.66 | 26.59 | 265.86 |
| openai/gpt-5.2-codex | 0.0046 | 0.222 | 2.22 | 22.16 | 221.55 |
| anthropic/claude-sonnet-4.5 | 0.0055 | 0.266 | 2.66 | 26.59 | 265.86 |
| google/gemini-2.5-pro | 0.0033 | 0.158 | 1.58 | 15.82 | 158.25 |
| openai/gpt-5.1-codex-max | 0.0033 | 0.158 | 1.58 | 15.82 | 158.25 |
| deepseek/deepseek-v3.2 | 0.0002 | 0.011 | 0.111 | 1.11 | 11.14 |
| qwen/qwen3-coder | 0.0004 | 0.018 | 0.176 | 1.76 | 17.60 |
| x-ai/grok-code-fast-1 | 0.0005 | 0.024 | 0.241 | 2.41 | 24.05 |
| openai/gpt-5-mini | 0.0007 | 0.032 | 0.317 | 3.16 | 31.65 |
| google/gemini-2.5-flash | 0.0008 | 0.039 | 0.392 | 3.92 | 39.25 |

Deep assist (P=1200, C=600, 24 req/user/mo)

| Model | Est $/req | $/user/mo | $/10 users | $/100 users | $/1000 users |
| --- | --- | --- | --- | --- | --- |
| anthropic/claude-3.7-sonnet:thinking | 0.013 | 0.319 | 3.19 | 31.90 | 319.03 |
| openai/gpt-5.2-codex | 0.011 | 0.266 | 2.66 | 26.59 | 265.86 |
| anthropic/claude-sonnet-4.5 | 0.013 | 0.319 | 3.19 | 31.90 | 319.03 |
| google/gemini-2.5-pro | 0.0079 | 0.190 | 1.90 | 18.99 | 189.90 |
| openai/gpt-5.1-codex-max | 0.0079 | 0.190 | 1.90 | 18.99 | 189.90 |
| deepseek/deepseek-v3.2 | 0.0006 | 0.013 | 0.134 | 1.34 | 13.37 |
| qwen/qwen3-coder | 0.0009 | 0.021 | 0.211 | 2.11 | 21.12 |
| x-ai/grok-code-fast-1 | 0.0012 | 0.029 | 0.289 | 2.89 | 28.86 |
| openai/gpt-5-mini | 0.0016 | 0.038 | 0.380 | 3.80 | 37.98 |
| google/gemini-2.5-flash | 0.0020 | 0.047 | 0.471 | 4.71 | 47.10 |

Image-assisted (P=700, C=300, 10 req/user/mo)

| Model | Est $/req | $/user/mo | $/10 users | $/100 users | $/1000 users |
| --- | --- | --- | --- | --- | --- |
| anthropic/claude-3.7-sonnet:thinking | 0.0070 | 0.070 | 0.696 | 6.96 | 69.63 |
| openai/gpt-5.2-codex | 0.0057 | 0.057 | 0.572 | 5.72 | 57.23 |
| anthropic/claude-sonnet-4.5 | 0.0070 | 0.070 | 0.696 | 6.96 | 69.63 |
| google/gemini-2.5-pro | 0.0041 | 0.041 | 0.409 | 4.09 | 40.88 |
| openai/gpt-5.1-codex-max | 0.0041 | 0.041 | 0.409 | 4.09 | 40.88 |
| deepseek/deepseek-v3.2 | 0.0003 | 0.0030 | 0.030 | 0.305 | 3.05 |
| qwen/qwen3-coder | 0.0005 | 0.0046 | 0.046 | 0.463 | 4.63 |
| x-ai/grok-code-fast-1 | 0.0006 | 0.0062 | 0.062 | 0.622 | 6.22 |
| openai/gpt-5-mini | 0.0008 | 0.0082 | 0.082 | 0.818 | 8.18 |
| google/gemini-2.5-flash | 0.0010 | 0.010 | 0.101 | 1.01 | 10.13 |

Default model selection criteria

Choose the default based on:

  • Cost: lowest total cost under typical P/C.
  • CAD/code quality: correctness and minimal hallucinations for CAD DSL/Python.
  • Latency: faster model for interactive feel.
  • Prompt caching support: models with cache support reduce repeated context costs.

Recommendation approach:

  1. Pick 2 finalists: one premium-quality, one cost-efficient.
  2. Run the same 10–20 internal CAD tasks and score quality.
  3. Use cost/quality ratio to pick default.
  4. Keep the higher-quality option as a selectable upgrade in PRO.
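Step 3's cost/quality ratio could look like the sketch below; the quality scores are hypothetical placeholders that would come from your internal CAD benchmark:

```python
def pick_default(candidates):
    """candidates: (model_id, cost_per_request_usd, quality_score) tuples.

    Lowest cost per unit of quality wins. Quality scores (0-1) come from
    scoring the same internal CAD task set across finalists.
    """
    return min(candidates, key=lambda m: m[1] / m[2])[0]
```

A floor on acceptable quality (e.g. reject anything below a minimum score before ranking) is worth adding so a very cheap but weak model cannot win on ratio alone.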

Free usage policy (no user key)

Inputs to decide

  • B = monthly budget for free usage (USD)
  • U = estimated free users/month
  • R = average requests per free user (if unlimited)
  • P/C per request from usage profile

Budget-based limit

Compute max free requests:

cost_per_request = total_cost(P, C, model)
max_free_requests = floor(B / cost_per_request)

Convert to a per-user free allowance:

free_requests_per_user = floor(max_free_requests / U)
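The two formulas above in code; `budget_usd`, `cost_per_request`, and `users` are whatever estimates you plug in:

```python
import math

def free_allowance(budget_usd, cost_per_request, users):
    """Split a monthly free-tier budget B across U users, in whole requests each."""
    max_free_requests = math.floor(budget_usd / cost_per_request)
    return math.floor(max_free_requests / users)

# e.g. $50/month budget, ~$0.0055/request, 500 free users → 18 requests/user/month
```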

$20/month PRO reference point

Using the estimates above, the “requests per $20” column provides a first-pass cap if you want $20/month to roughly cover a full PRO user at average usage (500/250 tokens). This is not a recommendation, just a sizing reference.

First-pass protection (before per-user limits)

  • Global cap: overall daily/weekly budget ceiling.
  • Anonymous cap: per-device daily requests and token ceilings.
  • Model fallback: if budget is exhausted, switch to a cheaper model or disable PRO.
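A sketch of the fallback logic, assuming the gateway already tracks daily spend; the 80% soft threshold is an illustrative choice, not a measured value:

```python
def route_request(spend_today, daily_budget, primary, fallback, soft=0.8):
    """Budget-aware routing: cheaper model near the cap, nothing once it's hit."""
    if spend_today >= daily_budget:
        return None                 # budget exhausted: disable PRO requests
    if spend_today >= soft * daily_budget:
        return fallback             # soft threshold: switch to the cheaper model
    return primary
```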

Suggested starting points (fill once prices are known)

  • Default model: choose the lowest-cost model that still passes quality checks on CAD tasks.
  • Free usage: start with a small, fixed request count per device/day and adjust once telemetry is available.

Implementation considerations

  • The gateway enforces hard limits and tracks token usage per request.
  • It stores per-request cost and aggregate daily/monthly spend.
  • The UI surfaces “free requests remaining” and “budget exhausted” status messages.

Next steps

  1. Pull current pricing from the Models API and fill the table.
  2. Pick baseline P/C for each workflow.
  3. Compute cost per request per model.
  4. Select default model and free usage limits.
