
What are the database requirements?

Tags: Technical, database, postgresql, storage

MatCraft uses PostgreSQL as its primary data store. Here are the requirements, configuration recommendations, and scaling considerations.

Minimum Requirements

| Requirement        | Minimum         | Recommended     |
|--------------------|-----------------|-----------------|
| PostgreSQL version | 14              | 16              |
| Storage            | 1 GB            | 10 GB+          |
| RAM                | 256 MB (shared) | 1 GB+ dedicated |
| Connections        | 20              | 100+            |
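Before deploying, it is worth confirming the server meets the version floor above. As a minimal sketch, assuming you can obtain the server's version string (for example via `psql -tAc "SHOW server_version"`), a small helper can check it; `pg_major` and `meets_minimum` are hypothetical names, not part of MatCraft:

```python
def pg_major(version_string: str) -> int:
    """Extract the major version from a server_version string, e.g. "16.3" -> 16."""
    return int(version_string.strip().split()[0].split(".")[0])

def meets_minimum(version_string: str, minimum: int = 14) -> bool:
    """Check a server_version string against the minimum supported major version."""
    return pg_major(version_string) >= minimum
```

This also handles the longer strings some builds report, such as `"16.3 (Debian 16.3-1.pgdg120+1)"`.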

Local Development

For local development, the simplest setup is Docker:

```bash
docker run -d \
  --name matcraft-db \
  -e POSTGRES_DB=matcraft \
  -e POSTGRES_USER=matcraft \
  -e POSTGRES_PASSWORD=matcraft \
  -p 5432:5432 \
  -v matcraft-pgdata:/var/lib/postgresql/data \
  postgres:16-alpine
```

Alternatively, the core library can use SQLite for single-user, local-only usage (no web dashboard):

```bash
# SQLite mode (core library only, no server)
export MATCRAFT_DATABASE_URL="sqlite:///./matcraft.db"
materia campaign run --config my_material.yaml
```
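The choice between backends comes down to the URL scheme. A minimal sketch of how such dispatch can work, using only the standard library (`backend_for` is a hypothetical helper for illustration, not MatCraft's API):

```python
from urllib.parse import urlparse

def backend_for(url: str) -> str:
    """Decide which database backend a URL points at, based on its scheme."""
    # SQLAlchemy-style URLs may carry a driver suffix, e.g. "postgresql+psycopg"
    scheme = urlparse(url).scheme.split("+")[0]
    if scheme == "sqlite":
        return "sqlite"
    if scheme == "postgresql":
        return "postgresql"
    raise ValueError(f"unsupported database scheme: {scheme!r}")
```

For example, `backend_for("sqlite:///./matcraft.db")` selects the single-user SQLite path, while a `postgresql://` URL selects the server-backed path.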

Connection String

Configure the database URL in your .env file:

```
DATABASE_URL=postgresql://user:password@host:5432/matcraft
```

For connection pooling in production, use PgBouncer or your cloud provider's built-in connection pooler:

```
DATABASE_URL=postgresql://user:password@pgbouncer:6432/matcraft?pgbouncer=true
```
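Because the URL embeds credentials, avoid printing it verbatim in logs or error messages. A minimal redaction sketch using only the standard library (`redact_db_url` is a hypothetical helper, not something MatCraft ships):

```python
from urllib.parse import urlsplit, urlunsplit

def redact_db_url(url: str) -> str:
    """Mask the password component so the URL is safe to print in logs."""
    parts = urlsplit(url)
    if parts.password is None:
        return url  # nothing to hide (e.g. sqlite URLs)
    netloc = f"{parts.username}:***@{parts.hostname or ''}"
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))
```

For example, the pooled URL above would log as `postgresql://user:***@pgbouncer:6432/matcraft?pgbouncer=true`.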

Schema Management

MatCraft uses Alembic for database migrations:

```bash
# Apply all pending migrations
materia db upgrade

# Check current migration version
materia db current

# Generate a new migration after model changes (development)
materia db revision --autogenerate -m "add campaign tags"
```

Key Tables

| Table                 | Purpose                  | Growth Rate            |
|-----------------------|--------------------------|------------------------|
| materials             | Material definitions     | Slow (10s-100s)        |
| campaigns             | Campaign metadata        | Slow (10s-100s)        |
| iterations            | Per-iteration state      | Moderate (100s-1000s)  |
| candidates            | Proposed compositions    | Fast (1000s-10000s)    |
| measurements          | Evaluation results       | Fast (1000s-10000s)    |
| surrogate_checkpoints | Serialized model weights | Moderate (MBs per checkpoint) |
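These growth rates can feed a back-of-envelope storage estimate when sizing disks. A rough sketch, where the average row size and the ~40% overhead factor (tuple headers, indexes, bloat) are assumptions you should replace with measurements from your own deployment:

```python
def estimate_table_mb(rows: int, avg_row_bytes: int, overhead: float = 1.4) -> float:
    """Rough on-disk size in MB: row payload plus ~40% for headers, indexes, and bloat."""
    return rows * avg_row_bytes * overhead / 1_000_000
```

For example, 10,000 candidate rows at an assumed 500 bytes each come to roughly 7 MB, which makes clear that the surrogate_checkpoints table (MBs per checkpoint) will usually dominate storage, not the row-count tables.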

Production Recommendations

- Backups: Enable automated daily backups with point-in-time recovery. For AWS RDS, this is enabled by default.
- Monitoring: Track connection pool usage, query latency (p95), and disk usage. Alert when connections exceed 80% of the pool or disk usage exceeds 80%.
- Indexing: MatCraft creates necessary indexes during migration. For very large deployments (>100k candidates), consider adding partial indexes on the candidates table for active campaigns.
- Vacuuming: Ensure autovacuum is enabled (it is by default in PostgreSQL). The candidates table sees frequent inserts and benefits from regular vacuuming.
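A partial index like the one suggested above can be expressed as a small DDL-building sketch. The column names (`campaign_id`, a `status` flag for active campaigns) are assumptions about the schema, not documented MatCraft columns; `CREATE INDEX CONCURRENTLY` is standard PostgreSQL and avoids blocking writes while the index builds:

```python
def partial_index_ddl(table: str, column: str, predicate: str, name: str = "") -> str:
    """Build a PostgreSQL partial-index statement (CREATE INDEX ... WHERE ...)."""
    name = name or f"ix_{table}_{column}_partial"
    return (
        f"CREATE INDEX CONCURRENTLY IF NOT EXISTS {name} "
        f"ON {table} ({column}) WHERE {predicate};"
    )

# Hypothetical example: index only candidates belonging to active campaigns
ddl = partial_index_ddl("candidates", "campaign_id", "status = 'active'")
```

Run the resulting statement manually (outside a transaction, as `CONCURRENTLY` requires) rather than through a migration, so it can be tuned per deployment.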

Managed Database Services

Tested and recommended managed PostgreSQL services:

- AWS RDS: Use db.t3.medium for small teams, db.r6g.large for production.
- Google Cloud SQL: Standard tier with 1+ vCPU.
- Azure Database for PostgreSQL: Flexible Server, General Purpose tier.
- Supabase: Works well for small deployments.
