API Rate Limits & Best Practices

Understand rate limits, handle transient errors gracefully, and optimize your API usage for reliability and performance.

Introduction

Rate limits protect the platform and ensure fair usage. They are expressed as requests per time window and may vary by endpoint, API key type, or account tier. This page explains common limit headers, how to detect and react to throttling, and practical strategies to reduce request volume.

Standard rate-limit headers

Many APIs expose quota information in response headers. Typical headers you may see:

  • X-RateLimit-Limit — total allowed requests in the window
  • X-RateLimit-Remaining — requests left in the current window
  • X-RateLimit-Reset — epoch seconds when the window resets
  • Retry-After — seconds to wait before retrying after a 429

Example headers
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 120
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1612540800
Retry-After: 30
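
As a minimal sketch, assuming a fetch-style Response object (one whose headers expose a get() method), the headers above can be read into a structured object. The helper name parseRateLimitHeaders is ours, not part of the API; adapt the lookup for your HTTP client:

// Sketch: read the rate-limit headers above into a structured object.
// Assumes a fetch-style Response (headers.get()); adapt for your HTTP client.
function parseRateLimitHeaders(response) {
  const num = (name) => {
    const value = response.headers.get(name);
    return value === null ? null : Number(value);
  };
  return {
    limit: num('X-RateLimit-Limit'),         // total requests allowed in the window
    remaining: num('X-RateLimit-Remaining'), // requests left in the current window
    resetAt: num('X-RateLimit-Reset'),       // epoch seconds when the window resets
    retryAfterSec: num('Retry-After'),       // seconds to wait before retrying a 429
  };
}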

Detecting rate limits & transient errors

The server returns HTTP 429 Too Many Requests when you exceed a rate limit. For transient server-side issues, you may see 502, 503, or 504. Always inspect response headers and bodies to guide retry decisions.

Recommended checks:
  • On 429, read Retry-After (if provided) or compute the wait from X-RateLimit-Reset (see the sketch after this list).
  • On 5xx, treat as transient and consider a retry with exponential backoff and jitter.
  • Log and alert on repeated 429/5xx to spot spikes or bad client behavior.
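
Putting these checks together, one possible sketch of the retry decision (reusing the hypothetical parseRateLimitHeaders helper from the previous section; the status-code list mirrors the checks above) looks like this:

// Sketch: classify a response as retryable and compute how long to wait.
// Reuses parseRateLimitHeaders from the previous section; names are illustrative.
function retryDecision(response) {
  const transient = [429, 502, 503, 504].includes(response.status);
  if (!transient) return { retry: false, waitMs: 0 };

  const { retryAfterSec, resetAt } = parseRateLimitHeaders(response);
  if (retryAfterSec !== null) return { retry: true, waitMs: retryAfterSec * 1000 };
  if (resetAt !== null) {
    // Wait until the window resets (epoch seconds -> ms), never a negative duration.
    return { retry: true, waitMs: Math.max(resetAt * 1000 - Date.now(), 0) };
  }
  // No hint from the server: fall back to exponential backoff (next section).
  return { retry: true, waitMs: null };
}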

Retry strategies (safe defaults)

Use exponential backoff with jitter to spread retries and avoid thundering-herd problems. Example pattern:

// Pseudo: exponential backoff with full jitter
base = 500ms
cap = 60000ms
attempt = 0

while attempt < maxRetries:
  try request() -> if success: break
  attempt += 1
  // Full jitter: pick a random wait up to the capped exponential bound
  wait = random_between(0, min(cap, base * 2^attempt))
  sleep(wait)

For 429 responses, prefer to honor Retry-After when it is present. For requests that must not be repeated (non-idempotent operations), avoid automatic retries; instead, bubble the error up to the caller.

Idempotency & safe retries

For endpoints that create or modify resources, support idempotency keys so clients can safely retry without creating duplicates. Use a stable idempotency key per logical operation.

# Example: create order with idempotency
POST /v1/orders
Idempotency-Key: order_2026-02-01_host123

If the client retries the request with the same Idempotency-Key, the server returns the original result.
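
As a sketch, assuming an axios client and the /v1/orders endpoint shown above (the key format, base URL, and payload shape are illustrative), a client might attach the key like this:

import axios from 'axios';
import { randomUUID } from 'node:crypto';

// Sketch: create an order with an idempotency key (endpoint and payload are illustrative).
// Generate the key once per logical operation and reuse it on every retry of that operation.
async function createOrder(order, idempotencyKey = randomUUID()) {
  const res = await axios.post('https://api.runash.in/v1/orders', order, {
    headers: {
      Authorization: 'Bearer <your-api-key>',
      'Idempotency-Key': idempotencyKey,
    },
  });
  return res.data;
}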

Client-side best practices to reduce requests
  • Batch requests where the API supports bulk endpoints rather than many small requests.
  • Cache responses (with TTL) for read-heavy endpoints to reduce repeated calls.
  • Use conditional requests (ETags / If-None-Match) when retrieving resources that change infrequently (see the sketch after this list).
  • Debounce UI actions that trigger network calls (search/autocomplete) and only send final queries.
  • Use webhooks or server-to-server events instead of polling where available.
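
As a sketch of conditional requests, assuming the endpoint returns an ETag header and honors If-None-Match (a 304 response means the cached copy is still valid; the cache here is a plain in-memory Map):

import axios from 'axios';

// Sketch: cache a resource and revalidate it with If-None-Match.
// URLs are supplied by the caller; the server must return an ETag for this to help.
const cache = new Map(); // url -> { etag, data }

async function getWithEtag(url, headers = {}) {
  const cached = cache.get(url);
  const res = await axios.get(url, {
    headers: {
      ...headers,
      ...(cached ? { 'If-None-Match': cached.etag } : {}),
    },
    // Treat 304 Not Modified as success so axios does not throw.
    validateStatus: (s) => (s >= 200 && s < 300) || s === 304,
  });
  if (res.status === 304) return cached.data; // unchanged: reuse the cached copy
  cache.set(url, { etag: res.headers['etag'], data: res.data });
  return res.data;
}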

Server-side patterns for resiliency
  • Implement request queuing and smoothing: accept requests and process at a controlled rate.
  • Use circuit breakers to fail fast and avoid wasting downstream capacity when the API is degraded.
  • Apply client-side throttling (for example, a token bucket) so each client stays within its fair share of the quota (see the sketch after this list).
  • Expose metrics: request rate, 429 count, 5xx count, retry count — alert on abnormal increases.
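
A minimal sketch of such a client-side throttle, using a token bucket (the rate and burst values below are placeholders, not platform limits):

// Sketch: token-bucket throttle that smooths outgoing requests.
// ratePerSec and burst are illustrative; tune them to your actual quota.
class TokenBucket {
  constructor(ratePerSec, burst) {
    this.ratePerSec = ratePerSec;
    this.capacity = burst;
    this.tokens = burst;
    this.lastRefill = Date.now();
  }

  // Resolves once a token is available, then consumes it.
  async take() {
    for (;;) {
      const now = Date.now();
      this.tokens = Math.min(
        this.capacity,
        this.tokens + ((now - this.lastRefill) / 1000) * this.ratePerSec
      );
      this.lastRefill = now;
      if (this.tokens >= 1) {
        this.tokens -= 1;
        return;
      }
      // Not enough tokens yet: wait roughly until the next one accrues.
      await new Promise((r) => setTimeout(r, ((1 - this.tokens) / this.ratePerSec) * 1000));
    }
  }
}

const bucket = new TokenBucket(2, 5); // ~2 requests/second, bursts of up to 5
// Usage: await bucket.take() before each API call.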

Handling long-running operations

For operations that take a long time to complete, prefer asynchronous patterns: submit a job, poll a status endpoint at a sensible interval, or receive a webhook when the job completes. This avoids holding long blocking requests open and burning quota on tight polling loops.
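
A sketch of the submit-then-poll pattern (the /v1/jobs endpoints, status field names, and polling interval below are assumptions for illustration, not the documented API):

import axios from 'axios';

// Sketch: submit a long-running job, then poll its status at a modest interval.
async function runJob(payload, { intervalMs = 5000, timeoutMs = 10 * 60 * 1000 } = {}) {
  const auth = { Authorization: 'Bearer <your-api-key>' };
  const { data: job } = await axios.post('https://api.runash.in/v1/jobs', payload, { headers: auth });

  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    await new Promise((r) => setTimeout(r, intervalMs));
    const { data: status } = await axios.get(`https://api.runash.in/v1/jobs/${job.id}`, { headers: auth });
    if (status.state === 'succeeded') return status.result;
    if (status.state === 'failed') throw new Error(`Job ${job.id} failed`);
  }
  throw new Error(`Timed out waiting for job ${job.id}`);
}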

Monitoring & alerting

Track these signals to discover and remediate rate-limit-related issues quickly; a minimal counting sketch follows the list:

  • 429 rate and trend
  • Retry counts and exponential backoff saturation
  • Request latency and p95/p99 tail latencies
  • Error budget consumption for critical endpoints
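
A bare-bones sketch of counting the first few of these signals on the client (in practice, export them to your metrics system rather than keeping them in memory):

// Sketch: in-memory counters for request outcomes; wire these into your metrics backend.
const metrics = { requests: 0, status429: 0, status5xx: 0, retries: 0 };

function recordResponse(status, { retried = false } = {}) {
  metrics.requests += 1;
  if (status === 429) metrics.status429 += 1;
  if (status >= 500 && status <= 599) metrics.status5xx += 1;
  if (retried) metrics.retries += 1;
}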

Code examples

JavaScript (axios) — simple retry with Retry-After handling
import axios from 'axios';

async function requestWithRetry(config, maxRetries = 5) {
  let attempt = 0;
  while (true) {
    try {
      const res = await axios(config);
      return res.data;
    } catch (err) {
      attempt++;
      const status = err.response?.status;
      const retryAfter = parseInt(err.response?.headers?.['retry-after'] || '0', 10);

      // Don't retry non-transient errors or if exceeded attempts
      if (![429, 502, 503, 504].includes(status) || attempt > maxRetries) {
        throw err;
      }

      // Honor Retry-After when present; otherwise back off exponentially, capped at 60s.
      const wait = retryAfter > 0 ? retryAfter * 1000 : Math.min(60000, 500 * 2 ** attempt);
      await new Promise((r) => setTimeout(r, wait));
    }
  }
}

cURL — respect Retry-After (manual)
# Include response headers (-i) so you can read Retry-After on a 429,
# then wait that many seconds before retrying
curl -i -X POST https://api.runash.in/v1/some-endpoint -H "Authorization: Bearer sk_..."

Governance & quota planning

Plan quotas for heavy usage: request higher limits for trusted workflows, use separate keys for high-volume systems, and partition workloads by priority. Discuss enterprise quotas with the RunAsh team if you anticipate sustained high throughput.

Summary: honor server headers, prefer idempotency for retry-safe operations, use exponential backoff with jitter, batch and cache where possible, and monitor key metrics. If you'd like, we can review your client behavior and suggest optimizations.
