Rate Limits

Learn about rate limits, how they work, and best practices for handling rate limit responses in your application.

Overview

Rate limits help ensure fair usage of the API and protect against abuse. The Scoped Memory API implements rate limiting to maintain service quality for all users.

Rate Limit Headers

Every API response includes rate limit information in the response headers:

Rate Limit Headers
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 999
X-RateLimit-Reset: 1642680000

  • X-RateLimit-Limit: The maximum number of requests allowed in the current time window
  • X-RateLimit-Remaining: The number of requests remaining in the current time window
  • X-RateLimit-Reset: Unix timestamp at which the rate limit window resets

Rate Limit Exceeded

When you exceed the rate limit, the API returns a 429 Too Many Requests response:

Rate Limit Error Response
{ "detail": "Rate limit exceeded. Please try again later." }

The 429 response also includes the rate limit headers, so you can read X-RateLimit-Reset to find out when the current window resets.
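As a minimal sketch of how a client might use that reset header, the helper below converts X-RateLimit-Reset into a wait time in seconds. The function name `seconds_until_reset` and the `default` fallback are illustrative assumptions, not part of the API; the only thing taken from the documentation above is that the header carries a Unix timestamp.

```python
import time


def seconds_until_reset(headers, now=None, default=1.0):
    """Return how many seconds remain until the rate limit window resets.

    `headers` is any mapping of header names to values (a plain dict is
    case-sensitive, so the key must match exactly). Falls back to
    `default` when the header is missing or not a number.
    """
    if now is None:
        now = time.time()
    raw = headers.get("X-RateLimit-Reset")
    try:
        reset_at = float(raw)
    except (TypeError, ValueError):
        return default
    # Never return a negative wait if the window has already reset.
    return max(0.0, reset_at - now)
```

Passing `now` explicitly keeps the helper easy to test; in application code you would typically call `time.sleep(seconds_until_reset(response.headers))` before retrying.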

Handling Rate Limits

Implement exponential backoff when you receive a 429 response. Here's how to handle rate limits gracefully:

Rate Limit Handling with Retry
import time

from memory_scope import MemoryScopeClient
from memory_scope.exceptions import RateLimitError

client = MemoryScopeClient(api_key="your-api-key")


def make_request_with_retry(max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.read_memory(
                user_id="user123",
                scope="preferences",
                domain="food",
                purpose="generate food recommendations",
            )
        except RateLimitError:
            if attempt == max_retries - 1:
                # Max retries reached: re-raise the original RateLimitError
                raise
            # Exponential backoff: wait 2^attempt seconds (1, 2, 4, ...)
            wait_time = 2 ** attempt
            print(f"Rate limit exceeded. Waiting {wait_time} seconds...")
            time.sleep(wait_time)
        # Any other exception propagates to the caller unchanged
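A common variant of this pattern adds random jitter and a cap to the delay, so that many clients hitting the limit at once do not all retry at the same moment. The sketch below is one possible way to do that and is not taken from the Scoped Memory client library: `backoff_delays`, `call_with_backoff`, and the `is_rate_limited` predicate are hypothetical names, and here `max_retries` counts retries after the first attempt.

```python
import random
import time


def backoff_delays(max_retries=3, base=1.0, cap=30.0):
    """Yield one delay per retry, drawn uniformly from [0, min(cap, base * 2^attempt)]."""
    for attempt in range(max_retries):
        window = min(cap, base * (2 ** attempt))
        yield random.uniform(0, window)


def call_with_backoff(func, is_rate_limited, max_retries=3, sleep=time.sleep):
    """Call func(), retrying with jittered backoff while is_rate_limited(exc) is true.

    Hypothetical helper: `is_rate_limited` decides which exceptions count
    as a rate limit (for example, an isinstance check against the
    client's rate limit exception).
    """
    delays = backoff_delays(max_retries)
    while True:
        try:
            return func()
        except Exception as exc:
            if not is_rate_limited(exc):
                raise  # Unrelated error: do not retry
            delay = next(delays, None)
            if delay is None:
                raise  # Retries exhausted: surface the last error
            sleep(delay)
```

With the client from the example above, this could be called as `call_with_backoff(lambda: client.read_memory(...), lambda exc: isinstance(exc, RateLimitError))`.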

Monitoring Rate Limits

Monitor rate limit headers to track your usage and avoid hitting limits:

Monitoring Rate Limits
import requests

response = requests.post(
    "https://api.memoryscope.dev/memory/read",
    headers={
        "X-API-Key": "your-api-key",
        "Content-Type": "application/json",
    },
    json={...},  # Replace {...} with your request body
)

# Check rate limit headers
rate_limit = response.headers.get("X-RateLimit-Limit")
remaining = response.headers.get("X-RateLimit-Remaining")
reset_time = response.headers.get("X-RateLimit-Reset")

print(f"Rate limit: {rate_limit}")
print(f"Remaining: {remaining}")
print(f"Resets at: {reset_time}")

# Alert when approaching the limit (skip the check if the header is missing)
if remaining is not None and int(remaining) < 100:
    print("Warning: Approaching rate limit!")
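Beyond printing a warning, the same headers can drive proactive pacing. The sketch below spreads the remaining quota evenly over the time left in the window; `pacing_delay` is a hypothetical helper, and the even-spacing policy is an assumption about what a client might want rather than anything the API prescribes.

```python
import time


def pacing_delay(headers, now=None):
    """Seconds to wait before the next request so the remaining quota lasts until reset."""
    if now is None:
        now = time.time()
    try:
        remaining = int(headers["X-RateLimit-Remaining"])
        reset_at = float(headers["X-RateLimit-Reset"])
    except (KeyError, TypeError, ValueError):
        return 0.0  # No usable headers: do not throttle
    window_left = max(0.0, reset_at - now)
    if remaining <= 0:
        return window_left  # Quota exhausted: wait for the reset
    return window_left / remaining
```

For example, with 100 requests remaining and 100 seconds until the reset, the helper suggests waiting one second between requests.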

Best Practices
  • Implement Exponential Backoff: When you receive a 429 response, wait before retrying. Use exponential backoff to avoid overwhelming the API.
  • Monitor Rate Limit Headers: Check rate limit headers in responses to track your usage and avoid hitting limits.
  • Cache Responses: Cache API responses when appropriate to reduce the number of requests you make.
  • Batch Operations: When possible, batch multiple operations into fewer requests.
  • Use Webhooks: If available, use webhooks instead of polling to reduce API calls.
  • Plan for Limits: Design your application to handle rate limits gracefully. Don't assume unlimited requests.
  • Upgrade Plan if Needed: If you consistently hit rate limits, consider upgrading your plan for higher limits.
  • Handle Gracefully: Always handle 429 responses gracefully. Don't retry immediately without waiting.
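To illustrate the caching recommendation, here is a minimal in-memory cache with a time-to-live. The `TTLCache` class and its `get_or_fetch` method are hypothetical and not part of any Scoped Memory library; a suitable TTL depends on how stale your application can tolerate the data being.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl=60.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_fetch(self, key, fetch):
        """Return the cached value for `key`, calling `fetch()` only on a miss or expiry."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = fetch()
        self._entries[key] = (now + self.ttl, value)
        return value
```

A caller could key the cache on the request parameters, for example `cache.get_or_fetch(("user123", "preferences", "food"), lambda: make_request_with_retry())`, so repeated reads within the TTL do not count against the rate limit.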