API Rate Limits & Best Practices
Understand rate limits, handle transient errors gracefully, and optimize your API usage for reliability and performance.
Rate limits protect the platform and ensure fair usage. They are expressed as requests per time window and may vary by endpoint, API key type, or account tier. This page explains common limit headers, how to detect and react to throttling, and practical strategies to reduce request volume.
Many APIs expose quota information in response headers. Typical headers you may see:
- X-RateLimit-Limit: total allowed requests in the window
- X-RateLimit-Remaining: requests left in the current window
- X-RateLimit-Reset: epoch seconds when the window resets
- Retry-After: seconds to wait before retrying after a 429
```http
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 120
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1612540800
Retry-After: 30
```
The server returns HTTP 429 Too Many Requests when you exceed a rate limit. For transient issues, you may see 502, 503, or 504. Always inspect response headers and bodies to guide retry decisions.
- On 429, read Retry-After (if provided) or compute the wait from X-RateLimit-Reset; see the sketch after this list.
- On 5xx, treat the error as transient and consider a retry with exponential backoff and jitter.
- Log and alert on repeated 429/5xx to spot spikes or bad client behavior.
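A minimal sketch of that decision logic, assuming an axios-style error object (the err.response shape is an assumption; adapt the header access to your HTTP client). It prefers Retry-After and falls back to X-RateLimit-Reset:

```javascript
// Compute how long to wait before retrying a 429/5xx response.
// Assumes an axios-style error object; adapt the header access to your client.
function retryDelayMs(err, fallbackMs = 1000) {
  const headers = err.response?.headers ?? {};

  // Prefer Retry-After (this sketch only handles the seconds form, not the HTTP-date form).
  const retryAfter = parseInt(headers['retry-after'] ?? '0', 10);
  if (retryAfter > 0) return retryAfter * 1000;

  // Fall back to X-RateLimit-Reset (epoch seconds when the window resets).
  const resetEpoch = parseInt(headers['x-ratelimit-reset'] ?? '0', 10);
  if (resetEpoch > 0) {
    const untilReset = resetEpoch * 1000 - Date.now();
    if (untilReset > 0) return untilReset;
  }

  return fallbackMs; // no server hint: use a default backoff step
}
```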
Use exponential backoff with jitter to spread retries and avoid thundering-herd problems. Example pattern:
```
// Pseudo: exponential backoff with full jitter
base = 500ms
max  = 60000ms
attempt = 0
while attempt < maxRetries:
    try request()
    if success: break
    wait = random_between(0, min(max, base * 2^attempt))
    sleep(wait)
    attempt += 1
```
For 429 responses prefer honoring Retry-After if present. For requests that must not be repeated (non-idempotent), avoid automatic retries — instead bubble the error to the caller.
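For illustration, here is a minimal JavaScript sketch of the full-jitter pattern above; doRequest is a placeholder for your own request function, and the base/cap values are the same illustrative numbers as the pseudocode:

```javascript
// Full-jitter backoff sketch. `doRequest` is a placeholder for your own
// request function and should throw on failure.
async function withFullJitterBackoff(doRequest, maxRetries = 5) {
  const baseMs = 500;
  const capMs = 60000;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await doRequest();
    } catch (err) {
      if (attempt === maxRetries - 1) throw err; // retries exhausted
      const cap = Math.min(capMs, baseMs * 2 ** attempt);
      const waitMs = Math.random() * cap; // full jitter: uniform in [0, cap)
      await new Promise((resolve) => setTimeout(resolve, waitMs));
    }
  }
}
```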
For endpoints that create or modify resources, support idempotency keys so clients can safely retry without creating duplicates. Use a stable idempotency key per logical operation.
```
# Example: create order with idempotency
POST /v1/orders
Idempotency-Key: order_2026-02-01_host123
```

If the client retries the request with the same Idempotency-Key, the server returns the original result.
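A minimal client-side sketch of the same idea, assuming the /v1/orders endpoint from the example above and an axios client; the createOrder helper and key format are illustrative:

```javascript
import axios from 'axios';
import { randomUUID } from 'node:crypto';

// Illustrative helper: one stable Idempotency-Key per logical operation.
async function createOrder(order, apiKey) {
  const idempotencyKey = `order_${randomUUID()}`; // reuse this key for every retry
  const res = await axios({
    method: 'post',
    url: 'https://api.runash.in/v1/orders', // endpoint from the example above
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Idempotency-Key': idempotencyKey,
    },
    data: order,
  });
  return res.data;
}
```

Routing the same config (including the same Idempotency-Key) through a retry wrapper such as requestWithRetry below keeps retries duplicate-free.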
- Batch requests where the API supports bulk endpoints rather than many small requests.
- Cache responses (with TTL) for read-heavy endpoints to reduce repeated calls.
- Use conditional requests (ETags / If-None-Match) when retrieving resources that change infrequently.
- Debounce UI actions that trigger network calls (search/autocomplete) and only send final queries.
- Use webhooks or server-to-server events instead of polling where available.
- Implement request queuing and smoothing: accept requests and process at a controlled rate.
- Use circuit breakers to fail fast and avoid wasting downstream capacity when the API is degraded.
- Apply client-side rate limiting (for example, a token bucket) to enforce fair per-user quota usage; see the sketch after this list.
- Expose metrics: request rate, 429 count, 5xx count, retry count — alert on abnormal increases.
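A minimal token-bucket sketch for client-side throttling; the capacity and refill rate below are illustrative and should be tuned to your actual quota:

```javascript
// Illustrative token-bucket limiter; capacity/refill values are placeholders.
class TokenBucket {
  constructor(capacity = 120, refillPerSecond = 2) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }

  // Wait until a token is available, then consume it.
  async take() {
    for (;;) {
      const now = Date.now();
      const elapsedSeconds = (now - this.lastRefill) / 1000;
      this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
      this.lastRefill = now;
      if (this.tokens >= 1) {
        this.tokens -= 1;
        return;
      }
      // Sleep roughly long enough for one token to refill before re-checking.
      await new Promise((resolve) => setTimeout(resolve, 1000 / this.refillPerSecond));
    }
  }
}

// Usage: const bucket = new TokenBucket(); then `await bucket.take();` before each API call.
```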
For long-running operations, prefer asynchronous patterns: submit a job, poll a status endpoint at a sensible interval, or receive a webhook when the job completes. This avoids holding blocking requests open and consuming quota.
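A minimal polling sketch under those assumptions; the /v1/jobs endpoints and the id/status response fields are hypothetical placeholders, not documented RunAsh endpoints:

```javascript
import axios from 'axios';

// Submit a long-running job, then poll its status at a sensible interval.
// The /v1/jobs endpoints and response fields are hypothetical placeholders.
async function submitAndWait(payload, apiKey, pollIntervalMs = 5000) {
  const headers = { Authorization: `Bearer ${apiKey}` };

  const submitted = await axios.post('https://api.runash.in/v1/jobs', payload, { headers });
  const jobId = submitted.data.id;

  for (;;) {
    await new Promise((resolve) => setTimeout(resolve, pollIntervalMs));
    const { data } = await axios.get(`https://api.runash.in/v1/jobs/${jobId}`, { headers });
    if (data.status === 'completed') return data;
    if (data.status === 'failed') throw new Error(`Job ${jobId} failed`);
  }
}
```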
Track these signals to discover and remediate rate-limit related issues quickly:
- 429 rate and trend
- Retry counts and exponential backoff saturation
- Request latency and p95/p99 tail latencies
- Error budget consumption for critical endpoints
```javascript
import axios from 'axios';

async function requestWithRetry(config, maxRetries = 5) {
  let attempt = 0;
  while (true) {
    try {
      const res = await axios(config);
      return res.data;
    } catch (err) {
      attempt++;
      const status = err.response?.status;
      const retryAfter = parseInt(err.response?.headers?.['retry-after'] || '0', 10);
      // Don't retry non-transient errors or if exceeded attempts
      if (![429, 502, 503, 504].includes(status) || attempt > maxRetries) {
        throw err;
      }
      const wait = retryAfter > 0 ? retryAfter * 1000 : Math.min(60000, 500 * 2 ** attempt);
      await new Promise((r) => setTimeout(r, wait));
    }
  }
}
```

```bash
# On 429 response, read Retry-After header and wait that many seconds before retrying
curl -X POST https://api.runash.in/v1/some-endpoint \
  -H "Authorization: Bearer sk_..."
```
Plan quotas for heavy usage: request higher limits for trusted workflows, use separate keys for high-volume systems, and partition workloads by priority. Discuss enterprise quotas with the RunAsh team if you anticipate sustained high throughput.