Rate Limits
Alloovium runs a per-credential token bucket. Each capability has a cost in tokens; each tier has a fill rate and a maximum burst capacity. Stay within your bucket and you'll never see a 429.
How the bucket works
Every API key and OAuth credential has its own token bucket. Tokens refill continuously at the tier's per-minute rate, up to the tier's burst capacity. When a request arrives, the capability's cost is deducted from the bucket. If the bucket does not have enough tokens, the request is rejected with HTTP 429 and must be retried later.
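The refill-and-deduct behavior described above can be modeled on the client side. The following is a minimal sketch, not the server's implementation: the `TokenBucket` class and its field names are hypothetical, and it assumes a rejected request deducts nothing.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Illustrative client-side model of a per-credential bucket (hypothetical)."""

    rate_per_minute: float    # tier fill rate, e.g. 60 for "standard"
    capacity: float           # tier burst capacity, e.g. 120 for "standard"
    tokens: float = 0.0       # current balance
    last_refill: float = 0.0  # timestamp in seconds of the last refill

    def refill(self, now: float) -> None:
        # Tokens accrue continuously and are capped at the burst capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_minute / 60.0)
        self.last_refill = now

    def try_consume(self, cost: float, now: float) -> bool:
        """Deduct `cost` if the balance allows it; False stands in for a 429."""
        self.refill(now)
        if self.tokens < cost:
            return False  # assumption: a rejected request deducts nothing
        self.tokens -= cost
        return True


def demo() -> list[bool]:
    # A full "standard" bucket (120 tokens) pays for 12 chat.ask calls at 10 tokens each.
    bucket = TokenBucket(rate_per_minute=60, capacity=120, tokens=120, last_refill=0.0)
    results = [bucket.try_consume(10, now=0.0) for _ in range(13)]
    # Ten seconds later, 10 tokens have refilled, so one more call succeeds.
    results.append(bucket.try_consume(10, now=10.0))
    return results
```

In this model the thirteenth back-to-back call is rejected, and a call made ten seconds later succeeds because the standard tier refills one token per second.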
Token-bucket, not fixed window
Tiers
| Tier | Tokens / minute | Burst capacity | Typical use |
|---|---|---|---|
| free | 10 | 20 | Hobby projects and prototyping |
| standard | 60 | 120 | Most SaaS integrations |
| pro | 300 | 600 | High-volume automations and ETL |
| enterprise | 1500 | 3000 | Platform integrations, custom SLAs |
The tier is set on each API key at creation time and can be upgraded from the dashboard. OAuth credentials inherit the tier of the user's tenant plan.
Capability costs
Cheap reads cost 1 token. Creating a project costs 2. Searches and structured queries cost 5. Chat calls, other expensive LLM calls, and document uploads cost 10. Workflow runs and template fills cost 25. Use this table to budget:
| Capability | Cost |
|---|---|
| meta.whoami | 1 |
| vault.list_projects | 1 |
| vault.get_project | 1 |
| vault.create_project | 2 |
| vault.list_documents | 1 |
| vault.get_document | 1 |
| vault.upload_document | 10 |
| vault.search | 5 |
| chat.list_conversations | 1 |
| chat.get_conversation | 1 |
| chat.ask | 10 |
| chat.ask_stream | 10 |
| templates.start_fill | 25 |
| templates.get_fill_status | 1 |
| workflows.list | 1 |
| workflows.run | 25 |
| workflows.get_run_status | 1 |
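To turn the two tables into a budget, it can help to compute how many calls a tier sustains per minute and how many fit in one burst. The sketch below copies the numbers from the tables above; the function names are hypothetical helpers, not part of any Alloovium SDK.

```python
# (tokens per minute, burst capacity), copied from the Tiers table.
TIERS = {
    "free": (10, 20),
    "standard": (60, 120),
    "pro": (300, 600),
    "enterprise": (1500, 3000),
}

# Token cost per capability, copied from the Capability costs table.
COSTS = {
    "meta.whoami": 1,
    "vault.list_projects": 1,
    "vault.get_project": 1,
    "vault.create_project": 2,
    "vault.list_documents": 1,
    "vault.get_document": 1,
    "vault.upload_document": 10,
    "vault.search": 5,
    "chat.list_conversations": 1,
    "chat.get_conversation": 1,
    "chat.ask": 10,
    "chat.ask_stream": 10,
    "templates.start_fill": 25,
    "templates.get_fill_status": 1,
    "workflows.list": 1,
    "workflows.run": 25,
    "workflows.get_run_status": 1,
}


def sustained_calls_per_minute(tier: str, capability: str) -> float:
    """Calls per minute the fill rate alone can pay for."""
    rate, _burst = TIERS[tier]
    return rate / COSTS[capability]


def burst_calls(tier: str, capability: str) -> int:
    """Back-to-back calls a completely full bucket can pay for."""
    _rate, burst = TIERS[tier]
    return burst // COSTS[capability]


def seconds_to_refill(tier: str) -> float:
    """Time for an empty bucket to refill to burst capacity."""
    rate, burst = TIERS[tier]
    return burst / rate * 60
```

For example, `sustained_calls_per_minute("standard", "chat.ask")` gives 6.0, and every tier refills from empty in 120 seconds. Taking the tables at face value, `burst_calls("free", "workflows.run")` is 0, because the free tier's capacity of 20 is below the 25-token cost. Check with the dashboard or support if you need those capabilities on the free tier.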
Response headers
Every response — success or rate-limited — carries these headers:
| Header | Meaning |
|---|---|
| X-RateLimit-Limit | Burst capacity of your bucket |
| X-RateLimit-Remaining | Tokens left after this call was accounted for |
| X-RateLimit-Reset | Unix timestamp when the bucket is expected to be full |
| X-RateLimit-Cost | Tokens this particular call deducted |
| Retry-After | Seconds until you can retry (only on 429) |
Watch `X-RateLimit-Remaining` proactively and back off before you hit zero; you will get cleaner behavior than reacting to 429s.
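One way to back off proactively is to estimate, from the headers of the previous response, how long to pause before the next call. This is a rough sketch under two assumptions: the bucket refills linearly until the `X-RateLimit-Reset` time, and the header names are looked up exactly as spelled in the table (an HTTP client's case-insensitive header mapping also works). The function `pause_before_next_call` is a hypothetical helper.

```python
import time
from typing import Mapping, Optional


def pause_before_next_call(
    headers: Mapping[str, str],
    next_cost: int,
    now: Optional[float] = None,
) -> float:
    """Estimate seconds to wait so a call costing `next_cost` tokens can be paid for."""
    remaining = float(headers["X-RateLimit-Remaining"])
    limit = float(headers["X-RateLimit-Limit"])
    reset_at = float(headers["X-RateLimit-Reset"])
    now = time.time() if now is None else now

    if remaining >= next_cost:
        return 0.0  # enough tokens already; no pause needed

    seconds_until_full = max(0.0, reset_at - now)
    missing_to_full = limit - remaining
    if missing_to_full <= 0 or seconds_until_full == 0:
        return 0.0  # the bucket should already be full again

    # Assumption: tokens refill at a constant rate between now and the reset time.
    refill_per_second = missing_to_full / seconds_until_full
    return (next_cost - remaining) / refill_per_second
```

With a capacity of 120, 4 tokens remaining, and a reset 116 seconds away, the estimate for a 10-token call is a 6-second pause.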
Hitting the limit
When you run out of tokens the API returns HTTP 429 with this envelope:
```json
{
  "type": "https://api.alloovium.com/errors/rate_limited",
  "title": "Rate limit exceeded",
  "status": 429,
  "detail": "Rate limit exceeded. Retry in 12 seconds.",
  "code": "rate_limited",
  "retry_after_seconds": 12
}
```
The `Retry-After` header carries the same value. Respect it. Do not retry immediately in a tight loop; it will just extend the backoff and waste your quota.
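A client can read the delay from either place. The sketch below prefers the `Retry-After` header and falls back to the `retry_after_seconds` field of the envelope. The function name and the 1-second default are assumptions made for illustration.

```python
import json
from typing import Mapping


def retry_delay_seconds(
    status: int,
    headers: Mapping[str, str],
    body: str,
    default: float = 1.0,
) -> float:
    """Return how long to wait after a 429, or 0.0 for any other status."""
    if status != 429:
        return 0.0

    # Prefer the header when it is present and numeric.
    header_value = headers.get("Retry-After")
    if header_value is not None:
        try:
            return float(header_value)
        except ValueError:
            pass  # fall through to the JSON envelope

    # Fall back to the error envelope shown above.
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return default
    if isinstance(payload, dict) and payload.get("code") == "rate_limited":
        value = payload.get("retry_after_seconds")
        if isinstance(value, (int, float)):
            return float(value)
    return default
```

Falling back to a small default rather than zero keeps the client from retrying in a tight loop when neither value can be parsed.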
Retry strategy
The recommended strategy for any client:
- On 429, sleep for `Retry-After` seconds and retry.
- On 5xx, retry with exponential backoff: start at 1s, double up to 30s, add 10% jitter. Cap at five attempts.
- On every retry, keep the same `Idempotency-Key` header so the API can replay the original response instead of executing twice. See Idempotency.
- Give up on 4xx errors other than 429; those are not transient.
Example retry loop (pseudocode)
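```python
import random
import time

import httpx  # pass an httpx.Client as `client`

API_KEY = "YOUR_API_KEY"  # placeholder; load the real key from your secret store


def call_with_retry(client, method, url, *, body=None, idem_key=None, max_retries=5):
    headers = {"Authorization": f"Bearer {API_KEY}"}
    if idem_key:
        # Reuse the same key on every attempt so the API can replay the original response.
        headers["Idempotency-Key"] = idem_key

    for attempt in range(max_retries):
        resp = client.request(method, url, json=body, headers=headers)
        if resp.status_code < 400:
            return resp.json()
        if resp.status_code == 429:
            # Sleep for the server-provided delay; fall back to 1s if the header is missing.
            time.sleep(float(resp.headers.get("Retry-After", "1")))
            continue
        if 500 <= resp.status_code < 600:
            # Exponential backoff: 1s, 2s, 4s, ... capped at 30s, plus up to 10% jitter.
            delay = min(30, 2 ** attempt)
            time.sleep(delay + random.uniform(0, 0.1 * delay))
            continue
        # 4xx other than 429: not transient, so raise instead of retrying.
        resp.raise_for_status()
    raise RuntimeError("max retries exhausted")
```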
Quota engineering tips
- Batch with `vault.search` (cost 5) instead of multiple `vault.get_document` calls (cost 1 each) when you need content from more than five documents.
- For workflows, poll `workflows.get_run_status` every 2–5 seconds; do not tight-loop.
- If you're fan-out ingesting, run `vault.upload_document` calls concurrently only up to the number your current token balance can pay for (10 tokens each), then rest until the bucket refills.
- Use `meta.whoami` (cost 1) as a health probe, not the more expensive endpoints.
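The fan-out tip can be sketched as a batch planner. The concurrency formula is not given on this page, so the sketch assumes the simplest reading: a batch may hold as many uploads as the current balance can pay for, and the client then waits for a full refill before the next batch. `plan_upload_batches` is a hypothetical helper.

```python
def plan_upload_batches(
    pending: int,
    remaining_tokens: int,
    burst_capacity: int,
    cost: int = 10,  # vault.upload_document cost from the table above
) -> list[int]:
    """Split `pending` uploads into batch sizes the bucket can pay for."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    if burst_capacity < cost:
        raise ValueError("burst capacity is smaller than the cost of one upload")

    batches: list[int] = []
    available = remaining_tokens
    while pending > 0:
        size = min(pending, available // cost)
        if size > 0:
            batches.append(size)
            pending -= size
        # Assumption: rest until the bucket is completely full before the next batch.
        available = burst_capacity
    return batches
```

For 30 pending uploads on the standard tier with 45 tokens left, this yields batches of 4, 12, 12, and 2. Waiting for a completely full bucket is conservative; a client that tracks `X-RateLimit-Remaining` could start the next batch sooner.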