How do I scale MatCraft for large workloads?

MatCraft is designed to scale horizontally. Here is how to handle increasing numbers of users, campaigns, and evaluations.

Scaling the API Layer

The FastAPI backend is stateless — all session state is stored in the database and Redis. To handle more concurrent users:

```bash
# Scale API instances behind a load balancer
docker compose up -d --scale api=4
```

Each API instance can handle approximately 500 concurrent connections (including WebSocket). For 100 concurrent dashboard users, 2-3 instances are sufficient.

Key considerations:

  • Use sticky sessions (session affinity) at the load balancer so each WebSocket connection stays pinned to a single instance.
  • All instances share the same database and Redis, so no state synchronization is needed (illustrated in the sketch below).
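
In practice, statelessness means any instance can serve any request. Below is a minimal sketch of the pattern, assuming the redis-py client and a hypothetical `session:{id}` key scheme rather than MatCraft's actual session model:

```python
# Sketch only: hypothetical session lookup showing why API
# instances need no local state. Assumes redis-py and a
# "session:{id}" key scheme, which may differ from MatCraft's.
import json

import redis
from fastapi import FastAPI, HTTPException

app = FastAPI()
r = redis.Redis(host="redis", port=6379, decode_responses=True)

@app.get("/session/{session_id}")
def get_session(session_id: str):
    # Any replica can answer: the state lives in Redis, not in
    # process memory, so the load balancer can pick freely.
    raw = r.get(f"session:{session_id}")
    if raw is None:
        raise HTTPException(status_code=404, detail="unknown session")
    return json.loads(raw)
```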

Scaling Workers

Workers are the primary bottleneck for campaign throughput. Each worker process handles one task at a time:

```bash
# Scale horizontally (more worker containers)
docker compose up -d --scale worker=8

# Scale vertically (more processes per worker)
celery -A materia.tasks worker --concurrency=8
```

Rules of thumb:

  • Each campaign iteration occupies one worker process for 2-30 seconds.
  • For N campaigns running concurrently, you need approximately N/2 worker processes, since campaigns spend part of their time waiting between iterations. For example, 16 concurrent campaigns need roughly 8 worker processes.
  • CPU-bound tasks (CMA-ES, surrogate training): Set concurrency equal to the number of CPU cores.
  • GPU-bound tasks (large surrogate training): Set concurrency to 1 per GPU.
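
These rules assume a worker picks up a new task only when it is actually free. A short sketch of Celery settings that enforce this for long-running tasks follows; the broker URL and values are illustrative, not MatCraft's shipped configuration:

```python
# Illustrative Celery settings for long-running (2-30 s) tasks.
from celery import Celery

# Broker URL is an example; use your deployment's Redis address.
app = Celery("materia.tasks", broker="redis://redis:6379/0")

# Fetch one task at a time so a busy worker does not hoard tasks
# that an idle worker could start immediately.
app.conf.worker_prefetch_multiplier = 1
# Acknowledge after completion so a task is re-queued if its
# worker dies mid-run.
app.conf.task_acks_late = True
```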

Priority Queues

For mixed workloads, use priority queues to ensure interactive requests are not blocked by batch jobs:

```python
# In task definitions
@app.task(queue="high_priority")
def train_surrogate(campaign_id: str):
    ...

@app.task(queue="low_priority")
def export_results(campaign_id: str):
    ...
```

```bash
# Dedicated workers per queue
celery -A materia.tasks worker --queues=high_priority --concurrency=4
celery -A materia.tasks worker --queues=low_priority --concurrency=2
```
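
Queue assignment can also be centralized with Celery's task_routes setting instead of the per-task `queue=` argument; the dotted paths below assume the tasks above live in materia/tasks.py:

```python
# Central routing table, equivalent to tagging each task at its
# definition. Continues the module above, where `app` is the
# Celery application object.
app.conf.task_routes = {
    "materia.tasks.train_surrogate": {"queue": "high_priority"},
    "materia.tasks.export_results": {"queue": "low_priority"},
}
```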

Scaling the Database

PostgreSQL is typically the last component to become a bottleneck. Strategies for scaling:

  1. Connection pooling: Use PgBouncer between the application and PostgreSQL. This reduces the number of direct database connections from (API instances + worker processes) × connections per process to a manageable pool; see the sketch after this list.
  2. Read replicas: Route read-heavy dashboard queries (listing campaigns, viewing results) to a read replica; write operations (creating candidates, storing measurements) go to the primary. This is also shown in the sketch below.
  3. Partitioning: For very large deployments (>1M candidates), partition the candidates table by campaign ID:
```sql
CREATE TABLE candidates (
    id UUID NOT NULL,
    campaign_id UUID NOT NULL,
    ...
    PRIMARY KEY (id, campaign_id)  -- partition key must be part of the PK
) PARTITION BY HASH (campaign_id);

-- One table per hash bucket; repeat for remainders 1 through 3
CREATE TABLE candidates_p0 PARTITION OF candidates
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
```
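
For the first two strategies, here is a minimal sketch of the application side, assuming SQLAlchemy, hypothetical DSNs, and a `campaigns` table; it illustrates the routing idea, not MatCraft's actual database layer:

```python
# Sketch only: route writes through PgBouncer, reads to a replica.
from sqlalchemy import create_engine, text
from sqlalchemy.pool import NullPool

# With PgBouncer in transaction-pooling mode, let it do the
# pooling: NullPool avoids stacking a second pool on top.
primary = create_engine(
    "postgresql+psycopg2://matcraft@pgbouncer:6432/matcraft",
    poolclass=NullPool,
)
# Read-heavy dashboard queries go straight to the replica.
replica = create_engine(
    "postgresql+psycopg2://matcraft@replica:5432/matcraft",
    poolclass=NullPool,
)

def list_campaigns():
    # Pure read, so the replica can serve it.
    with replica.connect() as conn:
        return conn.execute(text("SELECT id, name FROM campaigns")).all()
```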

Auto-Scaling

On Kubernetes or ECS, configure horizontal pod auto-scaling based on:

  • API pods: Scale on CPU utilization (target: 60%) or request rate.
  • Worker pods: Scale on Redis queue depth (target: <10 pending tasks per worker).
```yaml
# Kubernetes HPA example
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: matcraft-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: matcraft-worker
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: External
      external:
        metric:
          name: redis_queue_depth
        target:
          type: AverageValue
          averageValue: "10"
```
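
Kubernetes does not expose redis_queue_depth on its own: an exporter has to publish it, and an adapter (for example prometheus-adapter) has to surface it as an External metric. A minimal sketch of such an exporter, assuming the Redis broker's default layout where each Celery queue is a Redis list named after the queue, and the prometheus_client library; the exporter itself is hypothetical:

```python
# Hypothetical queue-depth exporter for the HPA metric above.
import time

import redis
from prometheus_client import Gauge, start_http_server

QUEUES = ["celery", "high_priority", "low_priority"]
depth = Gauge("redis_queue_depth", "Pending Celery tasks", ["queue"])
r = redis.Redis(host="redis", port=6379)

if __name__ == "__main__":
    start_http_server(9100)  # serves /metrics for Prometheus to scrape
    while True:
        for q in QUEUES:
            # Each Celery queue on a Redis broker is a list; LLEN
            # gives the number of pending tasks.
            depth.labels(queue=q).set(r.llen(q))
        time.sleep(15)
```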
