What are the best practices for cloud deployment?

Technical
deployment
cloud
production
security

Deploying MatCraft to production requires attention to security, reliability, and performance. This guide covers infrastructure sizing, a security checklist, environment variables, health checks, backups, and monitoring.

Infrastructure Sizing

Small Team (1-5 users, <10 concurrent campaigns)

| Component | Specification |
|-----------|---------------|
| API Server | 1 instance, 2 vCPU, 4 GB RAM |
| Workers | 2 instances, 2 vCPU, 4 GB RAM each |
| Database | Managed PostgreSQL, 2 vCPU, 4 GB RAM, 50 GB storage |
| Redis | Managed Redis, 1 GB RAM |
| Estimated cost | $200-400/month (AWS) |

Medium Team (5-20 users, 10-50 concurrent campaigns)

| Component | Specification |
|-----------|---------------|
| API Server | 2 instances behind ALB, 2 vCPU, 8 GB RAM each |
| Workers | 4-8 instances, 4 vCPU, 8 GB RAM each |
| Database | Managed PostgreSQL, 4 vCPU, 16 GB RAM, 200 GB storage, read replica |
| Redis | Managed Redis, 2 GB RAM, cluster mode |
| Estimated cost | $800-1,500/month (AWS) |

Security Checklist

  • [ ] TLS everywhere: Use ACM (AWS) or Let's Encrypt for TLS certificates. Enforce HTTPS redirection.
  • [ ] Strong secrets: Generate a cryptographically random SECRET_KEY (at least 32 bytes). Never reuse across environments.
  • [ ] Database credentials: Store in AWS Secrets Manager and inject them at runtime. Do not hardcode them in config files, committed .env files, or container images.
  • [ ] Network isolation: Place the database and Redis in a private subnet. Only the API servers and workers should have direct access.
  • [ ] WAF: Enable AWS WAF or Cloudflare in front of the API to block common attacks.
  • [ ] CORS: Configure allowed origins to match your frontend domain only.
  • [ ] Rate limiting: The API has built-in rate limiting, but consider adding an external rate limiter at the load balancer level for DDoS protection.
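
For the "Strong secrets" item, one way to produce a suitable value is Python's standard `secrets` module. This is a minimal sketch; the helper name `generate_secret_key` is illustrative and not part of MatCraft.

```python
import secrets


def generate_secret_key(num_bytes: int = 32) -> str:
    """Return a hex-encoded key drawn from the operating system's CSPRNG."""
    if num_bytes < 32:
        raise ValueError("use at least 32 bytes of randomness")
    # token_hex(n) returns 2*n hex characters encoding n random bytes.
    return secrets.token_hex(num_bytes)
```

Call it once per environment (for example `print(generate_secret_key())`) and store the result in your secret manager, so that staging and production never share a key.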

Environment Variables

```bash
# Required
DATABASE_URL=postgresql://user:pass@rds-endpoint:5432/matcraft
REDIS_URL=redis://elasticache-endpoint:6379/0
SECRET_KEY=<cryptographically-random-32-byte-hex-string>
MATCRAFT_ENV=production

# Optional
CORS_ORIGINS=https://matcraft.yourcompany.com
LOG_LEVEL=info
SENTRY_DSN=https://abc@sentry.io/123   # Error tracking
```
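
A pre-flight check can catch a missing or malformed variable before the service starts. The sketch below is an assumption about how you might validate the variables listed above in a deploy script; `check_environment` is a hypothetical helper, and whether MatCraft performs its own validation at startup is not stated here.

```python
import os
from urllib.parse import urlparse

REQUIRED = ("DATABASE_URL", "REDIS_URL", "SECRET_KEY", "MATCRAFT_ENV")


def check_environment(env=None) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]

    db_url = env.get("DATABASE_URL", "")
    if db_url and urlparse(db_url).scheme not in ("postgresql", "postgres"):
        problems.append("DATABASE_URL should use the postgresql:// scheme")

    redis_url = env.get("REDIS_URL", "")
    if redis_url and urlparse(redis_url).scheme not in ("redis", "rediss"):
        problems.append("REDIS_URL should use the redis:// or rediss:// scheme")

    # 32 random bytes, hex-encoded, is 64 characters.
    key = env.get("SECRET_KEY", "")
    if key and len(key) < 64:
        problems.append("SECRET_KEY is shorter than 64 hex characters")

    # A wildcard origin defeats the CORS item in the security checklist.
    if "*" in env.get("CORS_ORIGINS", ""):
        problems.append("CORS_ORIGINS should not contain a wildcard")

    return problems
```

In a deploy script, you could exit with a nonzero status whenever the returned list is not empty.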

Health Checks

Configure your load balancer to check these endpoints:

  • API: GET /health — returns 200 if the API is running and the database is reachable.
  • Worker: Celery workers report health via the celery inspect ping command. Set up a sidecar that exposes this as an HTTP endpoint.
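
The worker sidecar mentioned above could look roughly like the following, using only the Python standard library. This is a sketch under several assumptions: `matcraft.worker` is a placeholder for your actual Celery app module, port 8080 and the `/health` path are arbitrary choices, and the check relies on `celery inspect ping` exiting with a nonzero status when no worker replies.

```python
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumption: "matcraft.worker" is a placeholder for your Celery app module.
PING_CMD = ["celery", "-A", "matcraft.worker", "inspect", "ping"]


def workers_alive(run=subprocess.run) -> bool:
    """Return True if the ping command exits with status 0."""
    try:
        result = run(PING_CMD, capture_output=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        # celery binary missing, or the command hung past the timeout.
        return False
    return result.returncode == 0


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_response(404)
            self.end_headers()
            return
        healthy = workers_alive()
        self.send_response(200 if healthy else 503)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"ok\n" if healthy else b"unhealthy\n")


def serve(port: int = 8080) -> None:
    """Block forever, answering health probes on the given port."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Call `serve()` from the sidecar's entrypoint. Each probe spawns a subprocess, so keep the probe interval modest (tens of seconds rather than every second).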

Backup Strategy

  • Database: Automated daily snapshots with 7-day retention. Enable point-in-time recovery (PITR) for granular restoration.
  • Redis: Redis data is ephemeral (task queue). No backups needed — tasks will be re-queued if Redis restarts.
  • Surrogate checkpoints: Stored in the database. Covered by database backups.

Monitoring and Alerting

Set up alerts for:

| Metric | Warning Threshold | Critical Threshold |
|--------|-------------------|--------------------|
| API error rate (5xx) | >1% | >5% |
| API latency (p95) | >500ms | >2000ms |
| Worker queue depth | >50 tasks | >200 tasks |
| Database connections | >80% pool | >95% pool |
| Database disk usage | >70% | >85% |
| Redis memory | >70% | >85% |

Use CloudWatch (AWS), Prometheus + Grafana (self-managed), or Datadog for monitoring.
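
If you evaluate alerts in your own tooling rather than in the monitoring product, the thresholds in the table can be encoded directly. The sketch below is illustrative only: the metric keys and the `classify` function are hypothetical names, and values exactly at a threshold are treated as not exceeding it, matching the ">" in the table.

```python
# Thresholds copied from the alert table: metric -> (warning, critical).
THRESHOLDS = {
    "api_error_rate_pct": (1.0, 5.0),
    "api_latency_p95_ms": (500.0, 2000.0),
    "worker_queue_depth": (50.0, 200.0),
    "db_connections_pct": (80.0, 95.0),
    "db_disk_usage_pct": (70.0, 85.0),
    "redis_memory_pct": (70.0, 85.0),
}


def classify(metric: str, value: float) -> str:
    """Map a metric reading to 'ok', 'warning', or 'critical'."""
    warning, critical = THRESHOLDS[metric]
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "ok"
```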
