Yes. MatCraft supports GPU acceleration for surrogate model training, which can provide a 5-20x speedup for large datasets and complex model architectures.
GPU acceleration is most beneficial when:

- training on large datasets (roughly 1,000 points or more; see the benchmarks below)
- using large or deep surrogate architectures
- running many campaigns concurrently
Install the GPU-enabled version:
```bash
pip install matcraft[gpu]
```

This installs PyTorch with CUDA support. Verify GPU availability:
```python
import torch

print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU: {torch.cuda.get_device_name(0)}")
```

Enable GPU training in your campaign config:
```yaml
surrogate:
  type: mlp
  device: cuda          # "cpu" or "cuda" (default: auto-detect)
  hidden_layers: [256, 256, 128]
  epochs: 500
  batch_size: 256       # larger batch sizes benefit more from GPU
```

Or in Python:
```python
from matcraft.surrogate import MLPSurrogate

surrogate = MLPSurrogate(
    hidden_layers=[256, 256, 128],
    device="cuda",
    batch_size=256,
)
```

Use the NVIDIA Container Toolkit:
```bash
# Install the NVIDIA Container Toolkit first
# (see: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)

# Run with GPU support
docker compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
```

The GPU compose override:
```yaml
# docker-compose.gpu.yml
services:
  worker:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    environment:
      MATCRAFT_DEVICE: cuda
```

Supported GPUs:

| GPU | Status | Notes |
|-----|--------|-------|
| NVIDIA A100 | Fully supported | Best performance; recommended for production |
| NVIDIA V100 | Fully supported | Good performance |
| NVIDIA T4 | Fully supported | Cost-effective for cloud (AWS g4dn instances) |
| NVIDIA RTX 3090/4090 | Supported | Good for local development |
| NVIDIA RTX 3060/4060 | Supported | Adequate for small to medium models |
| AMD GPUs (ROCm) | Experimental | Requires the PyTorch ROCm build |
| Apple M1/M2/M3 (MPS) | Experimental | Use `device: mps` |
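For the experimental ROCm and MPS backends, it can help to probe what PyTorch actually sees before enabling them. A minimal sketch (assumes PyTorch is installed; it degrades to CPU-only if the import fails, and is not part of MatCraft's API):

```python
def available_backends():
    """Return a device -> availability map as reported by PyTorch.

    Falls back to CPU-only when PyTorch is not importable.
    """
    backends = {"cpu": True, "cuda": False, "mps": False}
    try:
        import torch
        # Note: on ROCm builds of PyTorch, AMD GPUs report through "cuda".
        backends["cuda"] = torch.cuda.is_available()
        # MPS (Apple Silicon) support landed in PyTorch 1.12+.
        mps = getattr(torch.backends, "mps", None)
        backends["mps"] = bool(mps and mps.is_available())
    except ImportError:
        pass
    return backends

print(available_backends())
```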
Training time for an MLP surrogate with [256, 256, 128] architecture, 200 epochs:
| Dataset size | CPU (i7-12700) | GPU (RTX 3090) | Speedup |
|--------------|----------------|----------------|---------|
| 100 points | 2.1s | 1.8s | 1.2x |
| 1,000 points | 8.3s | 1.9s | 4.4x |
| 10,000 points | 45.2s | 3.1s | 14.6x |
| 100,000 points | 412s | 21.4s | 19.3x |
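To reproduce these numbers on your own hardware, a small timing harness is enough. A sketch (the commented-out `MLPSurrogate` usage follows the Python API shown earlier; your data loading is up to you):

```python
import time

def time_fit(fit, X, y, repeats=3):
    """Return the best-of-N wall-clock time (seconds) for fit(X, y).

    Best-of-N reduces noise from warm-up and background load.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fit(X, y)
        best = min(best, time.perf_counter() - start)
    return best

# Example with the surrogate API shown above:
# cpu = MLPSurrogate(device="cpu")
# gpu = MLPSurrogate(device="cuda")
# print(time_fit(gpu.fit, X, y), "vs", time_fit(cpu.fit, X, y))
```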
For typical active learning campaigns with <1,000 data points, CPU training is fast enough. GPU becomes valuable for large-scale campaigns or when running many campaigns concurrently.
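If you script campaigns yourself, you can mirror the `device: auto-detect` default with a small helper (a sketch of the assumed behavior, not MatCraft's actual implementation):

```python
def resolve_device(requested: str = "auto") -> str:
    """Resolve "auto" to "cuda" when a usable GPU is present, else "cpu"."""
    if requested != "auto":
        return requested  # honor an explicit "cpu"/"cuda" choice
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    return "cpu"
```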
Recommended cloud instances for GPU workers:
- **AWS**: g4dn.xlarge (T4, $0.52/hr) for cost-effective training
- **GCP**: n1-standard-4 with an attached T4 ($0.35/hr + $0.35/hr GPU)
- **Azure**: Standard_NC4as_T4_v3 (T4, $0.53/hr)
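To put these hourly prices in context against the benchmark times above, per-fit cost is tiny even on paid GPUs. A quick calculation:

```python
def cost_per_fit(hourly_usd: float, fit_seconds: float) -> float:
    """Dollar cost of one training run at a given hourly instance price."""
    return hourly_usd * fit_seconds / 3600

# A 100,000-point fit (21.4s on GPU per the benchmarks) on a $0.52/hr g4dn.xlarge:
print(f"${cost_per_fit(0.52, 21.4):.4f} per fit")  # ≈ $0.0031
```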