Jarvis SDK uses a hybrid rate-limiting system (in-memory and database-backed) to ensure fair usage and platform stability. Limits scale with your plan.
| Plan | Requests/min | Executions/mo | Concurrent | Overage |
|---|---|---|---|---|
| Free | 60 | 1,000 | 5 | Paused until next month |
| Pro $29/mo | 300 | 50,000 | 25 | $0.001/execution |
| Business $299/mo | 1,000 | 500,000 | 100 | $0.001/execution |
| Enterprise Custom | 5,000 | Unlimited | 500 | Custom pricing |
Rate limits reset every 60 seconds (sliding window). Execution quotas reset at the start of each billing cycle.
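The sliding-window behavior described above can be sketched as follows. This is an illustrative model of how a rolling 60-second window admits requests, not the platform's actual implementation:

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Illustrative sliding-window limiter: at most `limit` requests
    in any rolling `window`-second period."""

    def __init__(self, limit: int, window: float = 60.0):
        self.limit = limit
        self.window = window
        self.timestamps = deque()  # monotonic times of admitted requests

    def allow(self) -> bool:
        now = time.monotonic()
        # Drop requests that have aged out of the rolling window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False


limiter = SlidingWindowLimiter(limit=3, window=60.0)
results = [limiter.allow() for _ in range(4)]
print(results)  # first 3 admitted, 4th rejected
```

Because the window slides rather than resetting on a fixed boundary, a burst right before a reset cannot be immediately followed by another full burst.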
Every API response includes these headers so your agent can self-throttle:
| Header | Type | Description |
|---|---|---|
| X-RateLimit-Limit | number | Maximum requests allowed per window |
| X-RateLimit-Remaining | number | Requests remaining in current window |
| X-RateLimit-Reset | unix timestamp | When the current window resets |
| X-Execution-Remaining | number | Monthly execution quota remaining |
| Retry-After | seconds | Seconds to wait (only on 429 responses) |
Example response headers:

```http
HTTP/1.1 200 OK
X-RateLimit-Limit: 300
X-RateLimit-Remaining: 287
X-RateLimit-Reset: 1711929600
X-Execution-Remaining: 49823
X-Request-Id: req_abc123
```
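A minimal self-throttling check built on these headers might look like the sketch below. The header names come from the table above; the threshold of 10 remaining requests is an arbitrary example:

```python
import time


def throttle_if_needed(headers: dict, threshold: int = 10) -> float:
    """Return seconds to sleep before the next request, based on
    rate-limit headers. Waits until the window reset when the
    remaining budget falls below `threshold`."""
    remaining = int(headers.get("X-RateLimit-Remaining", threshold))
    if remaining >= threshold:
        return 0.0  # plenty of budget left, no need to slow down
    reset_at = int(headers.get("X-RateLimit-Reset", 0))  # unix timestamp
    return max(0.0, reset_at - time.time())


# Example: 287 requests remaining, so no wait is needed.
wait = throttle_if_needed({"X-RateLimit-Remaining": "287"})
print(wait)  # 0.0
```

Calling this after each response and sleeping for the returned duration keeps an agent under the limit without ever hitting a 429.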
When rate limited, the API returns 429 Too Many Requests:
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 12
Content-Type: application/json

{
  "error": "rate_limit_exceeded",
  "message": "Rate limit exceeded. Retry after 12 seconds.",
  "retry_after": 12,
  "limit": 300,
  "reset_at": "2026-03-20T15:31:00Z"
}
```

**Important for agents:** Always read the `Retry-After` header and wait that many seconds before retrying. Do not implement fixed-delay retries; the header value reflects the actual window state.
These endpoints do not count toward your rate limit or execution quota. Use them freely for discovery and monitoring:
| Endpoint | Purpose |
|---|---|
| GET /api/v1/health | Platform health check |
| GET /api/v1/modules | Browse module catalog |
| GET /api/v1/catalog/search | Search modules by keyword |
| GET /api/v1/directory | Browse full 1,600+ entry directory |
| GET /.well-known/agent.json | A2A Agent Card |
| GET /api/llms.txt | LLM-friendly documentation |
| GET /openapi.json | OpenAPI 3.1 specification |
| GET /api/v1/trust/{name} | Module trust scores |
- **Batch execution** — `POST /api/v1/batch` runs up to 10 modules in parallel but counts as a single rate-limit hit, and one execution per module.
- **Chain execution** — `POST /api/v1/chain` pipes output through multiple modules in a single request. One rate-limit hit, one execution per step.
- **Agent memory** — `POST /api/v1/memory` stores frequently-used results. Read from memory instead of re-executing modules.
- **Self-throttle with headers** — Read `X-RateLimit-Remaining` and slow down when approaching zero. Proactive throttling avoids 429s entirely.
- **Use the arm endpoint** — Instead of discovering modules one-by-one, use `POST /api/v1/agent/arm` to get your entire toolkit in a single call.
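As a sketch of the batch pattern above: one HTTP request (one rate-limit hit) carrying several module executions. The request body shape below is an assumption for illustration only; consult the batch endpoint's reference for the actual schema:

```python
# Hypothetical batch payload: the exact schema is an assumption here,
# as are the module names ("weather", "fx").
payload = {
    "requests": [
        {"module": "weather", "action": "current", "input": {"city": "Oslo"}},
        {"module": "fx", "action": "convert", "input": {"from": "USD", "to": "EUR"}},
    ]
}

# Sending (requires a valid API key):
# requests.post("https://jarvissdk.com/api/v1/batch",
#               headers={"x-api-key": "jsk_your_key"}, json=payload)

# Accounting: the whole batch is a single rate-limit hit, but each
# entry still consumes one execution from the monthly quota.
rate_limit_hits = 1
executions = len(payload["requests"])
print(rate_limit_hits, executions)  # 1 2
```

This is why batching helps with the per-minute limit but does nothing for the monthly execution quota, as the troubleshooting table below also notes.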
TypeScript — Exponential Backoff

```typescript
async function executeWithRetry(
  module: string,
  action: string,
  input: Record<string, unknown>,
  maxRetries = 3
) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await fetch(
      `https://jarvissdk.com/api/v1/modules/${module}/execute`,
      {
        method: "POST",
        headers: {
          "x-api-key": "jsk_your_key",
          "Content-Type": "application/json",
        },
        body: JSON.stringify({ action, input }),
      }
    );
    if (res.status !== 429) return res.json();
    // Prefer the server's Retry-After value; fall back to exponential
    // backoff (1s, 2s, 4s, ...) if the header is missing.
    const retryAfter = Number(res.headers.get("Retry-After")) || 2 ** attempt;
    console.log(`Rate limited. Waiting ${retryAfter}s (attempt ${attempt + 1})`);
    await new Promise((r) => setTimeout(r, retryAfter * 1000));
  }
  throw new Error("Max retries exceeded");
}
```

Python — Exponential Backoff
```python
import time

import requests


def execute_with_retry(module: str, action: str, input: dict, max_retries=3):
    url = f"https://jarvissdk.com/api/v1/modules/{module}/execute"
    headers = {"x-api-key": "jsk_your_key"}
    for attempt in range(max_retries + 1):
        resp = requests.post(url, headers=headers,
                             json={"action": action, "input": input})
        if resp.status_code != 429:
            return resp.json()
        # Prefer the server's Retry-After value; fall back to exponential
        # backoff (1s, 2s, 4s, ...) if the header is missing.
        retry_after = int(resp.headers.get("Retry-After", 2 ** attempt))
        print(f"Rate limited. Waiting {retry_after}s (attempt {attempt + 1})")
        time.sleep(retry_after)
    raise Exception("Max retries exceeded")
```

| Symptom | Cause | Fix |
|---|---|---|
| Getting 429 on every request | Rate limit window exhausted or API key on Free plan | Check X-RateLimit-Remaining header. Upgrade plan or wait for window reset. |
| Execution quota exceeded mid-month | Monthly execution limit hit (Free: 1K, Pro: 50K) | Upgrade plan or enable overage billing on Pro/Business ($0.001/exec). |
| Batch requests still rate limited | Batch counts as 1 rate-limit hit but N executions against quota | Batch reduces rate-limit pressure, not execution quota. Monitor both. |
| Different rate limits than documented | Enterprise plans have custom limits set during onboarding | Check X-RateLimit-Limit header for your actual limit. |
| Rate limited despite low usage | Multiple API keys sharing the same tenant rate limit pool | All keys under one tenant share the same pool. Check total tenant usage. |
**Do MCP protocol requests count against rate limits?**
Yes. MCP JSON-RPC calls to `POST /api/mcp` count as one request per JSON-RPC method call. Tool executions within MCP also count against your execution quota.

**Can I get higher rate limits without Enterprise?**
The Business plan (1,000 req/min) handles most production workloads. Contact sales for custom limits between Business and Enterprise.

**What counts as an "execution"?**
Each module action execution counts as one execution. Catalog browsing, search, health checks, and discovery endpoints are free and unlimited.

**Are rate limits per API key or per tenant?**
Per tenant. All API keys under the same tenant share one rate-limit pool. This prevents circumventing limits by creating multiple keys.