Alloovium

Core Concepts

Rate Limits

Alloovium runs a per-credential token bucket. Each capability has a cost in tokens; each tier has a fill rate and a maximum burst capacity. Stay within your bucket and you'll never see a 429.

How the bucket works

Every API key and OAuth credential has its own token bucket. Tokens refill continuously at the tier's per-minute rate, up to the tier's burst capacity. When a request arrives, the capability's cost is deducted from the bucket. If the bucket does not have enough tokens, the request is rejected with 429 and must be retried later.
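
The refill-and-deduct cycle described above can be sketched as a small client-side model. This is an illustration for reasoning about your own traffic, not Alloovium's server implementation; the class name, its fields, and the explicit `now` argument are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Illustrative client-side model of one credential's bucket (hypothetical)."""
    rate_per_minute: float   # tier fill rate
    burst: float             # tier burst capacity
    tokens: float = -1.0     # current balance; -1 means "start full"
    updated_at: float = 0.0  # time of the last refill, in seconds

    def __post_init__(self):
        if self.tokens < 0:
            self.tokens = self.burst

    def _refill(self, now: float) -> None:
        # Continuous refill, never exceeding the burst capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_per_minute / 60.0)
        self.updated_at = now

    def try_consume(self, cost: float, now: float) -> bool:
        """Deduct `cost` if the bucket can afford it; False models a 429."""
        self._refill(now)
        if self.tokens < cost:
            return False
        self.tokens -= cost
        return True


bucket = TokenBucket(rate_per_minute=60, burst=120)  # standard tier
accepted = sum(bucket.try_consume(10, now=0.0) for _ in range(13))
print(accepted)                          # 12: twelve cost-10 calls fit in a 120-token burst
print(bucket.try_consume(10, now=5.0))   # False: only 5 tokens have refilled after 5 s
print(bucket.try_consume(10, now=10.0))  # True: 10 tokens are available after 10 s
```

Passing the clock in as `now` keeps the model deterministic, which makes it easy to replay a planned call schedule before running it against the API.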

Token-bucket, not fixed window

A fixed-window limit of 60/min would allow 120 calls in quick succession at a window boundary: 60 at the end of one window and 60 more at the start of the next. A token bucket with rate 60 and burst 120 smooths this out: you can burst to 120 instantly, but sustained traffic averages 60/minute.
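
To see how the burst allowance fades into the sustained rate, here is a small back-of-the-envelope calculation. It assumes the bucket starts full and that you spend tokens as fast as they arrive; the function name is made up for this example.

```python
def max_tokens_spendable(rate_per_minute: float, burst: float, seconds: float) -> float:
    """Upper bound on tokens spendable in a window that starts with a full bucket."""
    return burst + rate_per_minute * seconds / 60.0


# Standard tier: rate 60, burst 120.
for minutes in (1, 10, 60):
    total = max_tokens_spendable(60, 120, minutes * 60)
    print(minutes, total, round(total / minutes, 1))
# 1 180.0 180.0
# 10 720.0 72.0
# 60 3720.0 62.0
```

The one-off burst dominates a single minute, but over an hour the average approaches the 60/minute fill rate.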

Tiers

Tier        Tokens / minute  Burst capacity  Typical use
free        10               20              Hobby projects and prototyping
standard    60               120             Most SaaS integrations
pro         300              600             High-volume automations and ETL
enterprise  1500             3000            Platform integrations, custom SLAs

The tier is set on each API key at creation time and can be upgraded from the dashboard. OAuth credentials inherit the tier of the user's tenant plan.
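
As a quick worked example, the tier table can be turned into a lookup to estimate how long an empty bucket takes to refill. The `TIERS` mapping and `seconds_to_full` helper are hypothetical names for this sketch; the numbers come from the table above.

```python
# (tokens per minute, burst capacity) for each tier, copied from the tier table.
TIERS = {
    "free": (10, 20),
    "standard": (60, 120),
    "pro": (300, 600),
    "enterprise": (1500, 3000),
}


def seconds_to_full(tier: str, tokens_now: float = 0.0) -> float:
    """Seconds of idle time needed to go from `tokens_now` to a full bucket."""
    rate_per_minute, burst = TIERS[tier]
    return max(0.0, burst - tokens_now) * 60.0 / rate_per_minute


for name in TIERS:
    print(name, seconds_to_full(name))  # each tier prints 120.0
```

Because every tier's burst capacity is twice its per-minute rate, a fully drained bucket takes about two minutes to refill on any tier.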

Capability costs

Cheap reads cost 1 token. Searches and structured queries cost 5. Chat and expensive LLM calls cost 10. Workflow runs and template fills cost 25. Use this table to budget:

Capability                 Cost
meta.whoami                1
vault.list_projects        1
vault.get_project          1
vault.create_project       2
vault.list_documents       1
vault.get_document         1
vault.upload_document      10
vault.search               5
chat.list_conversations    1
chat.get_conversation      1
chat.ask                   10
chat.ask_stream            10
templates.start_fill       25
templates.get_fill_status  1
workflows.list             1
workflows.run              25
workflows.get_run_status   1
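
One way to budget with this table is to total the cost of a planned sequence of calls before running it. The sketch below copies the costs from the table; `plan_cost`, `runs_per_minute`, and the sample job are illustrative and not part of any Alloovium SDK.

```python
# Token cost per capability, copied from the table above.
COSTS = {
    "meta.whoami": 1,
    "vault.list_projects": 1,
    "vault.get_project": 1,
    "vault.create_project": 2,
    "vault.list_documents": 1,
    "vault.get_document": 1,
    "vault.upload_document": 10,
    "vault.search": 5,
    "chat.list_conversations": 1,
    "chat.get_conversation": 1,
    "chat.ask": 10,
    "chat.ask_stream": 10,
    "templates.start_fill": 25,
    "templates.get_fill_status": 1,
    "workflows.list": 1,
    "workflows.run": 25,
    "workflows.get_run_status": 1,
}


def plan_cost(calls: dict) -> int:
    """Total tokens for a job described as {capability: number_of_calls}."""
    return sum(COSTS[name] * count for name, count in calls.items())


def runs_per_minute(calls: dict, rate_per_minute: int) -> int:
    """How many times the job fits into one minute of sustained refill."""
    return rate_per_minute // plan_cost(calls)


# Hypothetical job: one search, three document reads, one chat question.
job = {"vault.search": 1, "vault.get_document": 3, "chat.ask": 1}
print(plan_cost(job))            # 18
print(runs_per_minute(job, 60))  # 3 on the standard tier
```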

Response headers

Every response — success or rate-limited — carries these headers:

Header                 Meaning
X-RateLimit-Limit      Burst capacity of your bucket
X-RateLimit-Remaining  Tokens left after this call was accounted for
X-RateLimit-Reset      Unix timestamp when the bucket is expected to be full
X-RateLimit-Cost       Tokens this particular call deducted
Retry-After            Seconds until you can retry (only on 429)

Watch X-RateLimit-Remaining proactively and back off before you hit zero; you will get cleaner behavior than reacting to 429s.
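
A proactive client can estimate a pause from the headers alone. The sketch below assumes the bucket refills linearly between now and the X-RateLimit-Reset timestamp, which follows from the continuous-refill description but is still an inference; `seconds_to_wait` is a hypothetical helper, and `headers` is any mapping of header names to string values.

```python
import time


def seconds_to_wait(headers, next_cost, now=None):
    """Estimate how long to pause so a call costing `next_cost` tokens fits."""
    now = time.time() if now is None else now
    limit = float(headers["X-RateLimit-Limit"])
    remaining = float(headers["X-RateLimit-Remaining"])
    reset_at = float(headers["X-RateLimit-Reset"])

    if remaining >= next_cost:
        return 0.0  # enough tokens already

    missing_to_full = limit - remaining
    seconds_until_full = max(0.0, reset_at - now)
    if missing_to_full <= 0 or seconds_until_full == 0:
        return 0.0  # bucket is (about to be) full

    # Assumption: linear refill, so rate = tokens missing / time until full.
    refill_per_second = missing_to_full / seconds_until_full
    return (next_cost - remaining) / refill_per_second


sample = {
    "X-RateLimit-Limit": "120",
    "X-RateLimit-Remaining": "4",
    "X-RateLimit-Reset": "1116",
}
print(seconds_to_wait(sample, next_cost=10, now=1000.0))  # 6.0
```

Sleeping for the returned value before a chat.ask call (cost 10) avoids spending a round trip on a request that would be rejected.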

Hitting the limit

When you run out of tokens, the API returns 429 with this envelope:

json
{
  "type": "https://api.alloovium.com/errors/rate_limited",
  "title": "Rate limit exceeded",
  "status": 429,
  "detail": "Rate limit exceeded. Retry in 12 seconds.",
  "code": "rate_limited",
  "retry_after_seconds": 12
}

The Retry-After header carries the same value. Respect it. Do not retry immediately in a tight loop: it will just extend the backoff and waste your quota.
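
Since the wait time appears both in the header and in the body, a client can read whichever is present. The following is a minimal sketch; the helper name, its arguments, and the one-second default are assumptions, while the field names come from the envelope above.

```python
import json


def retry_delay_seconds(status, headers, body_text, default=1.0):
    """Seconds to wait after a 429, preferring the header over the body."""
    if status != 429:
        return 0.0

    header_value = headers.get("Retry-After")
    if header_value is not None:
        try:
            return max(0.0, float(header_value))
        except ValueError:
            pass  # not a plain number; fall through to the body

    try:
        body = json.loads(body_text)
        return max(0.0, float(body["retry_after_seconds"]))
    except (ValueError, KeyError, TypeError):
        return default  # neither source was usable


envelope = json.dumps({
    "type": "https://api.alloovium.com/errors/rate_limited",
    "title": "Rate limit exceeded",
    "status": 429,
    "detail": "Rate limit exceeded. Retry in 12 seconds.",
    "code": "rate_limited",
    "retry_after_seconds": 12,
})
print(retry_delay_seconds(429, {"Retry-After": "12"}, envelope))  # 12.0
print(retry_delay_seconds(429, {}, envelope))                     # 12.0
```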

Retry strategy

The recommended strategy for any client:

  1. On 429, sleep for Retry-After seconds and retry.
  2. On 5xx, retry with exponential backoff — start at 1s, double up to 30s, add 10% jitter. Cap at five attempts.
  3. On every retry, keep the same Idempotency-Key header so the API can replay the original response instead of executing twice. See Idempotency.
  4. Give up after repeated 4xx non-429 errors — those are not transient.

Example retry loop (pseudocode)

python
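import random
import time

import httpx

API_KEY = "YOUR_API_KEY"  # placeholder: load this from your own configuration


def call_with_retry(client: httpx.Client, method, url, *, body=None, idem_key=None, max_retries=5):
    headers = {"Authorization": f"Bearer {API_KEY}"}
    if idem_key:
        # Reuse the same key on every attempt so a retry is replayed, not re-executed.
        headers["Idempotency-Key"] = idem_key

    for attempt in range(max_retries):
        resp = client.request(method, url, json=body, headers=headers)
        if resp.status_code < 400:
            return resp.json()
        if resp.status_code == 429:
            # Wait as long as the server asks; fall back to 1s if the header is missing.
            time.sleep(float(resp.headers.get("Retry-After", "1")))
            continue
        if 500 <= resp.status_code < 600:
            # Exponential backoff: 1s, 2s, 4s, ... capped at 30s, plus up to 10% jitter.
            delay = min(30, 2 ** attempt)
            time.sleep(delay + random.uniform(0, 0.1 * delay))
            continue
        # 4xx other than 429: not transient, so do not retry.
        resp.raise_for_status()
    raise RuntimeError("max retries exhausted")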

Quota engineering tips

  • Batch with search (cost 5) instead of multiple get calls (cost 1 each) when you need content from more than five documents, since that is where a single search becomes cheaper.
  • For workflows, poll status every 2–5 seconds — do not tight-loop.
  • If you're fan-out ingesting, run upload calls concurrently up to the number your remaining tokens cover (each upload costs 10), then rest until the bucket refills.
  • Use whoami (cost 1) as a health probe — not the more expensive endpoints.
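
The fan-out tip can be sketched as a simple wave planner: upload as many documents as the current balance affords, then sleep until enough tokens have refilled for the next wave. The function below is a hypothetical planning aid; it deliberately ignores tokens that refill while a wave is in flight, so it errs on the cautious side.

```python
UPLOAD_COST = 10  # vault.upload_document, from the capability cost table


def plan_upload_waves(num_docs, remaining, rate_per_minute, burst):
    """Return [(wave_size, sleep_seconds_before_wave), ...] for a bulk upload."""
    if burst < UPLOAD_COST:
        raise ValueError("burst capacity cannot cover a single upload")

    waves = []
    tokens = float(remaining)
    left = num_docs
    while left > 0:
        sleep_s = 0.0
        if tokens < UPLOAD_COST:
            # Rest until the bucket is full, or holds enough for what is left.
            target = min(burst, left * UPLOAD_COST)
            sleep_s = (target - tokens) * 60.0 / rate_per_minute
            tokens = target
        size = min(left, int(tokens // UPLOAD_COST))
        waves.append((size, sleep_s))
        tokens -= size * UPLOAD_COST
        left -= size
    return waves


# Standard tier with a full bucket: 30 documents to ingest.
plan = plan_upload_waves(30, remaining=120, rate_per_minute=60, burst=120)
print(plan)  # [(12, 0.0), (12, 120.0), (6, 60.0)]
```

In practice you would feed `remaining` from the latest X-RateLimit-Remaining value rather than assuming a full bucket.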