Performance Optimization
Learn how to optimize your API usage for better performance, lower latency, and reduced costs.
Performance Tips
These optimization techniques will help you build faster, more efficient applications while reducing API costs.
Use max_age_days to Filter Stale Data
Filtering out stale memories reduces response size and improves relevance. Only request data that's actually useful for your use case.
Using max_age_days
# Good: Only get recent preferences (last 30 days)
result = client.read_memory(
    user_id="user123",
    scope="preferences",
    domain="food",
    purpose="generate food recommendations",
    max_age_days=30  # Filter stale data
)

# Bad: Getting all preferences (including very old ones)
result = client.read_memory(
    user_id="user123",
    scope="preferences",
    domain="food",
    purpose="generate food recommendations"
    # No max_age_days - gets everything
)
Cache Responses Appropriately
Cache API responses to reduce the number of requests, but be mindful of revocation tokens and data freshness.
Response Caching
import time

# Simple in-process cache with a TTL
cache = {}
CACHE_TTL = 300  # 5 minutes

def get_cached_memory(user_id, scope, domain, purpose):
    cache_key = f"{user_id}:{scope}:{domain}:{purpose}"

    # Check cache
    if cache_key in cache:
        cached_data, timestamp = cache[cache_key]
        if time.time() - timestamp < CACHE_TTL:
            return cached_data

    # Cache miss - fetch from API
    result = client.read_memory(
        user_id=user_id,
        scope=scope,
        domain=domain,
        purpose=purpose
    )

    # Store in cache
    cache[cache_key] = (result, time.time())
    return result
Important: If a user revokes access, invalidate the cache for that user's data. Don't serve stale cached data after revocation.
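As a minimal sketch of that invalidation, the helper below drops every cached entry for a user from the cache dict defined above. How your application learns about a revocation (a webhook, polling, etc.) is an assumption here, not part of this API; call the helper from whatever notification path you have.
def invalidate_user_cache(user_id):
    # Cache keys are "user_id:scope:domain:purpose", so match on the user_id prefix
    stale_keys = [key for key in cache if key.startswith(f"{user_id}:")]
    for key in stale_keys:
        del cache[key]

# Call this as soon as you learn the user revoked access
invalidate_user_cache("user123")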
Batch Operations When Possible
When you need to read multiple scopes or domains, consider batching operations or using continue operations to reduce API calls.
Batching Operations
# Good: Read once, use continue for related reads
result = client.read_memory(
    user_id="user123",
    scope="preferences",
    domain="food",
    purpose="generate recommendations"
)

# Later, continue reading with the same token (no new grant)
food_prefs = client.read_memory_continue(
    revocation_token=result.revocation_token
)

# Bad: Making separate read requests
food_prefs = client.read_memory(user_id="user123", scope="preferences", domain="food", ...)
music_prefs = client.read_memory(user_id="user123", scope="preferences", domain="music", ...)
# Each creates a new grant - less efficient
Use Appropriate Scopes and Domains
Using the right scope and domain helps the API optimize merging and reduces unnecessary data processing.
Scope and Domain Selection
# Good: Specific scope and domain
client.create_memory(
    user_id="user123",
    scope="preferences",  # Correct scope
    domain="food",        # Specific domain
    value_json={"likes": ["pizza"]}
)

# Bad: Too generic
client.create_memory(
    user_id="user123",
    scope="preferences",
    domain=None,  # No domain - less organized
    value_json={"food_likes": ["pizza"], "music_likes": ["jazz"]}  # Mixed data
)
Optimize Memory Creation
Create memories efficiently by using appropriate TTL values and avoiding unnecessary duplicates.
- Use TTL for temporary data: Set ttl_days for data that should expire automatically
- Don't worry about duplicates: The API handles deduplication automatically
- Batch creates when possible: Create multiple memories in parallel if your use case allows (a sketch follows this list)
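As a rough sketch of the parallel-create case, the snippet below fans out several create_memory calls through a thread pool. The specific memories created and the assumption that the client is safe to call from multiple threads are illustrative, not guarantees of this API.
from concurrent.futures import ThreadPoolExecutor

# Hypothetical batch of memories to create for one user
memories_to_create = [
    {"domain": "food", "value_json": {"likes": ["pizza"]}},
    {"domain": "music", "value_json": {"likes": ["jazz"]}},
]

# Fan the create calls out in parallel (assumes the client is thread-safe)
with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [
        executor.submit(
            client.create_memory,
            user_id="user123",
            scope="preferences",
            domain=memory["domain"],
            source="explicit_user_input",
            value_json=memory["value_json"],
        )
        for memory in memories_to_create
    ]
    results = [future.result() for future in futures]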
Memory Creation Optimization
# Good: Set TTL for temporary preferences
client.create_memory(
    user_id="user123",
    scope="preferences",
    domain="food",
    source="explicit_user_input",
    ttl_days=7,  # Expires in 7 days
    value_json={"likes": ["seasonal_special"]}
)

# The API handles deduplication, so you can create multiple memories
# They'll be merged automatically when read
Monitor and Optimize
Monitor your API usage to identify optimization opportunities:
- Track API call frequency and patterns
- Monitor response times
- Identify unnecessary API calls
- Review cache hit rates (a simple way to track them is sketched after this list)
- Check rate limit usage
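One way to make cache hit rates measurable is to wrap the get_cached_memory helper from the Response Caching example with a pair of counters, as sketched below. It reuses cache, CACHE_TTL, time, and get_cached_memory from that example; the counter names and hit-rate helper are illustrative, not part of the API.
cache_stats = {"hits": 0, "misses": 0}

def get_cached_memory_with_stats(user_id, scope, domain, purpose):
    cache_key = f"{user_id}:{scope}:{domain}:{purpose}"
    entry = cache.get(cache_key)
    # Count whether this lookup will be served from cache or from the API
    if entry is not None and time.time() - entry[1] < CACHE_TTL:
        cache_stats["hits"] += 1
    else:
        cache_stats["misses"] += 1
    return get_cached_memory(user_id, scope, domain, purpose)

def cache_hit_rate():
    total = cache_stats["hits"] + cache_stats["misses"]
    return cache_stats["hits"] / total if total else 0.0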
Monitoring API Calls
import time
import logging
logger = logging.getLogger(__name__)
def read_with_monitoring(user_id, scope, domain, purpose):
    start_time = time.time()
    try:
        result = client.read_memory(
            user_id=user_id,
            scope=scope,
            domain=domain,
            purpose=purpose
        )
        elapsed = time.time() - start_time
        logger.info(f"API call took {elapsed:.2f}s")
        return result
    except Exception as e:
        elapsed = time.time() - start_time
        logger.error(f"API call failed after {elapsed:.2f}s: {e}")
        raise
Best Practices Summary
- Use max_age_days: Filter stale data to improve performance and relevance
- Cache responses: Reduce API calls by caching with appropriate TTL
- Use continue operations: Re-read data without creating new grants
- Choose right scopes/domains: Better organization improves API efficiency
- Set TTL appropriately: Let temporary data expire automatically
- Monitor usage: Track API calls to identify optimization opportunities
- Avoid unnecessary calls: Don't read data you don't need
- Handle rate limits: Implement exponential backoff to avoid hitting limits (see the sketch below)
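As a minimal sketch of that backoff, the wrapper below retries a read with exponentially growing delays plus jitter. Which exception actually signals a rate limit depends on the client library, so the broad except clause here is a placeholder assumption; narrow it to the client's rate-limit error in real code.
import random
import time

def read_with_backoff(max_retries=5, **read_kwargs):
    for attempt in range(max_retries):
        try:
            return client.read_memory(**read_kwargs)
        except Exception:  # Assumption: replace with the client's rate-limit error
            if attempt == max_retries - 1:
                raise
            # Exponential backoff with jitter: ~1s, ~2s, ~4s, ~8s, ...
            delay = (2 ** attempt) + random.uniform(0, 1)
            time.sleep(delay)

prefs = read_with_backoff(
    user_id="user123",
    scope="preferences",
    domain="food",
    purpose="generate food recommendations",
)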
Related Documentation