
How does the Celery task queue work?

Technical
celery
redis
task-queue
async

MatCraft uses Celery with Redis as the message broker to run optimization tasks asynchronously. This architecture keeps the API responsive while computationally intensive surrogate training and optimization run in the background.

Why Celery?

Materials optimization involves CPU-intensive tasks that can take seconds to minutes per iteration:

  • Surrogate training: Training an MLP on hundreds to thousands of data points (1-30 seconds depending on architecture and dataset size).
  • CMA-ES optimization: Running hundreds of generations to optimize the acquisition function (1-10 seconds).
  • Evaluation: Running physics models or external simulations (seconds to hours).

Running these in the API request-response cycle would cause timeouts. Celery offloads them to dedicated worker processes.

Task Flow

API Server                    Redis                     Celery Worker
    │                           │                           │
    │  POST /campaigns/run      │                           │
    │──────────────────────────>│                           │
    │  task_id = "abc-123"      │                           │
    │<──────────────────────────│                           │
    │                           │  pick up task             │
    │                           │──────────────────────────>│
    │                           │                           │ Train surrogate
    │                           │  publish progress event   │
    │                           │<──────────────────────────│
    │  WebSocket push to client │                           │
    │<──────────────────────────│                           │
    │                           │                           │ Run CMA-ES
    │                           │  publish progress event   │
    │                           │<──────────────────────────│
    │  WebSocket push to client │                           │
    │<──────────────────────────│                           │
    │                           │  task complete            │
    │                           │<──────────────────────────│

Configuration

Worker configuration lives in a Celery config module (values such as the broker URL can also be supplied via environment variables):

python
# celery_config.py
broker_url = "redis://localhost:6379/0"
result_backend = "redis://localhost:6379/1"
task_serializer = "json"
result_serializer = "json"
accept_content = ["json"]
task_time_limit = 3600        # Hard limit: 1 hour per task
task_soft_time_limit = 3000   # Soft limit: 50 minutes (raises SoftTimeLimitExceeded)
worker_prefetch_multiplier = 1  # One task at a time per worker process
worker_concurrency = 4         # 4 parallel worker processes

Running Workers

bash
# Single worker with 4 processes
celery -A materia.tasks worker --loglevel=info --concurrency=4

# GPU worker for surrogate training (single process, GPU-bound)
celery -A materia.tasks worker --loglevel=info --concurrency=1 \
  --queues=gpu_training -n gpu-worker@%h

# Priority queues
celery -A materia.tasks worker --loglevel=info \
  --queues=high_priority,default,low_priority
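For tasks to land on the queues these workers consume, the config module needs routing rules. A hypothetical `task_routes` mapping; the task names mirror the task table in this FAQ, but the exact routing is an assumption, not confirmed project configuration:

```python
# celery_config.py (continued) -- route tasks to the queues workers consume.
# Task names mirror the task table below; the routing itself is illustrative.
task_routes = {
    "train_surrogate": {"queue": "gpu_training"},
    "export_results": {"queue": "low_priority"},
    "cleanup_old_checkpoints": {"queue": "low_priority"},
    # Everything else falls through to the default queue.
}
task_default_queue = "default"
```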

Task Types

| Task | Queue | Typical Duration | Priority |
|------|-------|------------------|----------|
| train_surrogate | default (or gpu_training) | 1-30s | High |
| run_cmaes_optimization | default | 1-10s | High |
| evaluate_candidates | default | 0.1s - 1h | Medium |
| compute_pareto_front | default | 0.1-5s | Low |
| export_results | low_priority | 1-60s | Low |
| cleanup_old_checkpoints | low_priority | 1-10s | Low |

Monitoring

Use Flower for a web-based Celery monitoring dashboard:

bash
pip install flower
celery -A materia.tasks flower --port=5555

This provides real-time visibility into task queues, worker status, task history, and performance metrics at http://localhost:5555.

Scaling Workers

Add more workers to handle higher campaign throughput:

bash
# Run 3 worker instances
docker compose up -d --scale worker=3

Each worker independently pulls tasks from Redis, so scaling is straightforward. Shared state lives in Redis (queues and results) and the PostgreSQL database; the workers themselves are stateless.
