Hyperparameters
Tuning guide for optimizer, surrogate, and active learning hyperparameters.
MatCraft's optimization pipeline has three groups of hyperparameters: the CMA-ES optimizer, the MLP surrogate model, and the active learning loop. This guide explains each parameter and provides tuning recommendations.
CMA-ES Hyperparameters
| Parameter | Default | Range | Effect |
|-----------|---------|-------|--------|
| sigma0 | 0.3 | 0.05–0.5 | Initial step size. Larger values explore more broadly at the start. |
| population_size | auto | 10–200 | Candidates per CMA-ES generation. Larger populations improve coverage but slow each generation. |
sigma0
The initial step size controls how far from the mean CMA-ES samples in its first generation. On normalized [0,1] parameter space:
- sigma0 = 0.5: Covers the entire space. Good for unknown landscapes.
- sigma0 = 0.3: Default. Balances exploration and exploitation.
- sigma0 = 0.1: Focuses near the initial mean. Good for refining a known good region.
```yaml
optimizer:
  sigma0: 0.4  # More exploration
```

population_size
CMA-ES internally manages a population of candidates. The default size follows the formula 4 + floor(3 * ln(n)), where n is the number of parameters. For 10 parameters, this gives 4 + floor(6.9) = 10.
Increase the population size for:
- High-dimensional problems (> 20 parameters)
- Highly multi-modal landscapes
- When the optimizer gets stuck in local optima
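The default sizing rule above can be sketched in a few lines (the function name is illustrative, not part of MatCraft's API):

```python
import math

def default_population_size(n_params: int) -> int:
    """CMA-ES default population size: 4 + floor(3 * ln(n))."""
    return 4 + math.floor(3 * math.log(n_params))

# 10 parameters -> 10 candidates per generation
print(default_population_size(10))
# Higher-dimensional problems get larger defaults automatically,
# but for the cases listed above you may still want to override it.
print(default_population_size(30))
```

Note how slowly the default grows with dimension; this is why multi-modal or high-dimensional problems often benefit from an explicit, larger population_size.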
Surrogate Hyperparameters
| Parameter | Default | Range | Effect |
|-----------|---------|-------|--------|
| hidden_layers | [128, 64] | – | Network architecture. More layers = more capacity. |
| learning_rate | 0.001 | 1e-4–0.01 | Adam optimizer learning rate. |
| epochs | 200 | 50–1000 | Maximum training epochs per iteration. |
| dropout | 0.1 | 0.0–0.5 | Regularization and uncertainty estimation. |
| activation | relu | relu, tanh, silu | Activation function. |
Sizing the Network
The surrogate should be sized relative to the data available:
| Evaluations | Recommended Architecture |
|-------------|--------------------------|
| < 50 | [32, 16] |
| 50–200 | [64, 32] |
| 200–500 | [128, 64] (default) |
| 500+ | [256, 128, 64] |
Overly large networks on small datasets will overfit, producing overconfident predictions that mislead the acquisition function.
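The sizing table translates directly into a lookup helper; this is an illustrative sketch, not a MatCraft function:

```python
def suggest_architecture(n_evaluations: int) -> list:
    """Map the evaluation budget to the hidden-layer sizes
    recommended in the table above."""
    if n_evaluations < 50:
        return [32, 16]
    elif n_evaluations < 200:
        return [64, 32]
    elif n_evaluations < 500:
        return [128, 64]          # the default architecture
    else:
        return [256, 128, 64]

print(suggest_architecture(300))  # -> [128, 64]
```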
Learning Rate
- Too high (> 0.01): Training loss oscillates; surrogate predictions are noisy.
- Too low (< 1e-4): Training is slow; may not converge within the epoch budget.
- Default (0.001): Works well for most problems.
If training loss plateaus early, try reducing the learning rate. If training loss is noisy, reduce it further.
Dropout
Dropout serves dual purposes in MatCraft:
- Regularization: Prevents overfitting on small datasets.
- Uncertainty estimation: MC Dropout at inference time provides prediction uncertainty for the acquisition function.
Higher dropout (0.2—0.3) is better for small datasets. Lower dropout (0.05—0.1) is better for large datasets. Setting dropout to 0.0 disables uncertainty estimation, which degrades acquisition function quality.
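To make the MC Dropout mechanism concrete, here is a minimal NumPy sketch: a toy network with fixed weights stands in for the trained surrogate, dropout stays active at inference, and the spread across stochastic forward passes gives the uncertainty. All names here are illustrative, not MatCraft internals.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy one-hidden-layer MLP with fixed random weights
# (a stand-in for the trained surrogate).
W1 = rng.normal(size=(4, 16))
W2 = rng.normal(size=(16, 1))

def forward(x, p=0.1):
    """One stochastic forward pass with dropout kept ON at inference."""
    h = np.maximum(x @ W1, 0.0)          # ReLU hidden layer
    mask = rng.random(h.shape) >= p      # Bernoulli dropout mask
    h = h * mask / (1.0 - p)             # inverted-dropout scaling
    return (h @ W2).squeeze(-1)

def mc_dropout_predict(x, n_samples=200, p=0.1):
    """Mean prediction and predictive std over repeated stochastic passes."""
    preds = np.stack([forward(x, p) for _ in range(n_samples)])
    return preds.mean(axis=0), preds.std(axis=0)

x = rng.random((5, 4))                   # 5 candidate points, 4 parameters
mu, sigma = mc_dropout_predict(x)
```

With p=0.0 every pass is identical and sigma collapses to zero, which is exactly why disabling dropout degrades the acquisition function.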
Active Learning Hyperparameters
| Parameter | Default | Range | Effect |
|-----------|---------|-------|--------|
| batch_size | 10 | 1–100 | Candidates per active learning iteration. |
| initial_samples | batch_size | 10–100 | LHS samples before first surrogate training. |
| acquisition | expected_improvement | – | Acquisition function type. |
| exploration_weight | 0.01 | 0.001–0.1 | Exploration bonus in acquisition function. |
| convergence_patience | 5 | 2–20 | Iterations of stagnation before stopping. |
| convergence_threshold | 0.001 | 1e-4–0.01 | Minimum improvement to count as progress. |
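The two convergence parameters work together as a stagnation counter. A minimal sketch of how such a stopping rule typically behaves (assuming maximization; this is not MatCraft's exact implementation):

```python
def should_stop(best_history, patience=5, threshold=0.001):
    """Stop when the best objective has improved by less than `threshold`
    for `patience` consecutive iterations."""
    if len(best_history) <= patience:
        return False
    stagnant = 0
    # Compare each of the last `patience` iterations to its predecessor.
    for prev, curr in zip(best_history[-patience - 1:], best_history[-patience:]):
        if curr - prev < threshold:
            stagnant += 1
    return stagnant >= patience

# Tiny improvements below the threshold eventually trigger a stop:
print(should_stop([1.0, 1.5, 1.50001, 1.50002, 1.50002, 1.50003, 1.50003]))
```

Raising convergence_patience or lowering convergence_threshold makes the loop run longer before declaring convergence, as in the production configuration below.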
batch_size
Larger batches:
- Enable parallel evaluation (important for experiments).
- Reduce the number of surrogate retraining cycles.
- May be less sample-efficient (more redundant evaluations per batch).
Smaller batches:
- Maximize sample efficiency (surrogate is retrained more frequently).
- Better for sequential evaluation pipelines.
A good rule of thumb: set batch_size to the number of evaluations you can run in parallel.
exploration_weight
Controls the exploration-exploitation balance in the acquisition function:
- 0.001: Strongly exploitative. Best for smooth, low-noise landscapes.
- 0.01: Default. Balanced.
- 0.1: Strongly exploratory. Best for noisy or highly multi-modal landscapes.
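To see how the weight enters the acquisition function, here is a standard expected-improvement formula with an exploration offset (often written xi); this is the textbook form, a sketch rather than MatCraft's exact code:

```python
import math

def expected_improvement(mu, sigma, best, xi=0.01):
    """Expected improvement for maximization with exploration weight xi:
    EI = (mu - best - xi) * Phi(z) + sigma * phi(z),
    where z = (mu - best - xi) / sigma."""
    if sigma <= 0.0:
        return max(mu - best - xi, 0.0)
    z = (mu - best - xi) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # Phi(z)
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # phi(z)
    return (mu - best - xi) * cdf + sigma * pdf

# At the current best (mu == best), EI is driven purely by uncertainty:
print(expected_improvement(mu=1.0, sigma=0.3, best=1.0, xi=0.0))
```

Larger xi penalizes candidates whose predicted mean barely exceeds the incumbent, so high-uncertainty regions win more often, which is the exploratory behavior described above.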
Recommended Configurations
Quick Screening Run
```yaml
optimizer:
  budget: 100
  batch_size: 20
  sigma0: 0.5
surrogate_config:
  hidden_layers: [32, 16]
  epochs: 100
```

Production Campaign
```yaml
optimizer:
  budget: 500
  batch_size: 15
  sigma0: 0.3
  seed: 42
surrogate_config:
  hidden_layers: [128, 64]
  epochs: 300
active_learning:
  convergence_patience: 8
  convergence_threshold: 0.0005
```

High-Dimensional Search
```yaml
optimizer:
  budget: 1000
  batch_size: 25
  sigma0: 0.4
  population_size: 100
surrogate_config:
  hidden_layers: [256, 128, 64]
  epochs: 500
  dropout: 0.15
```