Yes. MatCraft supports GPU acceleration for surrogate model training, which can provide a 5-20x speedup for large datasets and complex model architectures.
GPU acceleration is most beneficial when:

- training on large datasets (roughly 1,000 points or more; see the benchmarks below)
- using large or deep surrogate architectures
- running many campaigns concurrently
Install the GPU-enabled version:
```bash
pip install matcraft[gpu]
```

This installs PyTorch with CUDA support. Verify GPU availability:
```python
import torch

print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU: {torch.cuda.get_device_name(0)}")
```

Enable GPU training in your campaign config:
```yaml
surrogate:
  type: mlp
  device: cuda          # "cpu" or "cuda" (default: auto-detect)
  hidden_layers: [256, 256, 128]
  epochs: 500
  batch_size: 256       # larger batch sizes benefit more from GPU
```

Or in Python:
```python
from matcraft.surrogate import MLPSurrogate

surrogate = MLPSurrogate(
    hidden_layers=[256, 256, 128],
    device="cuda",
    batch_size=256,
)
```

Use the NVIDIA Container Toolkit:
```bash
# Install the NVIDIA Container Toolkit first
# (see: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)

# Run with GPU support
docker compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
```

The GPU compose override:
```yaml
# docker-compose.gpu.yml
services:
  worker:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    environment:
      MATCRAFT_DEVICE: cuda
```

Supported GPUs:

| GPU | Status | Notes |
|-----|--------|-------|
| NVIDIA A100 | Fully supported | Best performance; recommended for production |
| NVIDIA V100 | Fully supported | Good performance |
| NVIDIA T4 | Fully supported | Cost-effective for cloud (AWS g4dn instances) |
| NVIDIA RTX 3090/4090 | Supported | Good for local development |
| NVIDIA RTX 3060/4060 | Supported | Adequate for small to medium models |
| AMD GPUs (ROCm) | Experimental | Requires the PyTorch ROCm build |
| Apple M1/M2/M3 (MPS) | Experimental | Use `device: mps` |
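For the experimental ROCm and MPS backends, it can help to probe what PyTorch actually sees before enabling them. A minimal sketch (assumes PyTorch is installed; it degrades to CPU-only if the import fails, and is not part of MatCraft's API):

```python
def available_backends():
    """Return a device -> availability map as reported by PyTorch.

    Falls back to CPU-only when PyTorch is not importable.
    """
    backends = {"cpu": True, "cuda": False, "mps": False}
    try:
        import torch
        # Note: on ROCm builds of PyTorch, AMD GPUs report through "cuda".
        backends["cuda"] = torch.cuda.is_available()
        # MPS (Apple Silicon) support landed in PyTorch 1.12+.
        mps = getattr(torch.backends, "mps", None)
        backends["mps"] = bool(mps and mps.is_available())
    except ImportError:
        pass
    return backends

print(available_backends())
```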
Training time for an MLP surrogate with [256, 256, 128] architecture, 200 epochs:
| Dataset size | CPU (i7-12700) | GPU (RTX 3090) | Speedup |
|--------------|----------------|----------------|---------|
| 100 points | 2.1s | 1.8s | 1.2x |
| 1,000 points | 8.3s | 1.9s | 4.4x |
| 10,000 points | 45.2s | 3.1s | 14.6x |
| 100,000 points | 412s | 21.4s | 19.3x |
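To reproduce these numbers on your own hardware, a small timing harness is enough. A sketch (the commented-out `MLPSurrogate` usage follows the Python API shown earlier; your data loading is up to you):

```python
import time

def time_fit(fit, X, y, repeats=3):
    """Return the best-of-N wall-clock time (seconds) for fit(X, y).

    Best-of-N reduces noise from warm-up and background load.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fit(X, y)
        best = min(best, time.perf_counter() - start)
    return best

# Example with the surrogate API shown above:
# cpu = MLPSurrogate(device="cpu")
# gpu = MLPSurrogate(device="cuda")
# print(time_fit(gpu.fit, X, y), "vs", time_fit(cpu.fit, X, y))
```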
For typical active learning campaigns with <1,000 data points, CPU training is fast enough. GPU becomes valuable for large-scale campaigns or when running many campaigns concurrently.
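If you script campaigns yourself, you can mirror the `device: auto-detect` default with a small helper (a sketch of the assumed behavior, not MatCraft's actual implementation):

```python
def resolve_device(requested: str = "auto") -> str:
    """Resolve "auto" to "cuda" when a usable GPU is present, else "cpu"."""
    if requested != "auto":
        return requested  # honor an explicit "cpu"/"cuda" choice
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    return "cpu"
```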
Recommended cloud instances for GPU workers:
- **AWS**: g4dn.xlarge (T4, $0.52/hr) for cost-effective training
- **GCP**: n1-standard-4 with an attached T4 ($0.35/hr + $0.35/hr GPU)
- **Azure**: Standard_NC4as_T4_v3 (T4, $0.53/hr)
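To put these hourly prices in context against the benchmark times above, per-fit cost is tiny even on paid GPUs. A quick calculation:

```python
def cost_per_fit(hourly_usd: float, fit_seconds: float) -> float:
    """Dollar cost of one training run at a given hourly instance price."""
    return hourly_usd * fit_seconds / 3600

# A 100,000-point fit (21.4s on GPU per the benchmarks) on a $0.52/hr g4dn.xlarge:
print(f"${cost_per_fit(0.52, 21.4):.4f} per fit")  # ≈ $0.0031
```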