Active Learning
The iterative sample-train-acquire loop that drives efficient materials discovery.
Active learning is the core optimization paradigm in MatCraft. Rather than evaluating materials randomly or on a grid, the active learning loop intelligently selects the most informative candidates to evaluate, dramatically reducing the number of expensive evaluations needed to find optimal solutions.
The Active Learning Loop
+------------------+
| Initial Sample |
| (LHS, n=batch) |
+--------+---------+
|
v
+-----------+-----------+
| Evaluate Candidates |<-------+
| (Domain Plugin) | |
+-----------+-----------+ |
| |
v |
+-----------+-----------+ |
| Train Surrogate | |
| (MLP on all data) | |
+-----------+-----------+ |
| |
v |
+-----------+-----------+ |
| Acquire Next Batch | |
| (CMA-ES + EI) |--------+
+-----------+-----------+
|
Converged?
Yes -> Stop

Phase 1: Initial Sampling
The loop begins with Latin Hypercube Sampling (LHS) to generate a diverse set of initial candidates. LHS ensures uniform coverage of the parameter space with far fewer samples than a grid:
- A 5-dimensional space with 10 levels per dimension would need 100,000 grid points.
- LHS achieves comparable coverage with just 10–20 samples.
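LHS itself is straightforward to reproduce outside MatCraft. A minimal sketch using SciPy's `qmc` module (the dimension count and bounds here are illustrative, not MatCraft's defaults):

```python
from scipy.stats import qmc

# Latin Hypercube sample in the unit cube, then scale to parameter bounds.
sampler = qmc.LatinHypercube(d=5, seed=42)
unit_sample = sampler.random(n=20)        # shape (20, 5), values in [0, 1)

# Illustrative per-dimension bounds for a 5-parameter problem.
lower = [0.0, 0.0, 0.0, 0.0, 0.0]
upper = [1.0, 10.0, 5.0, 2.0, 100.0]
candidates = qmc.scale(unit_sample, lower, upper)
```

Each of the 20 rows is one candidate, and every dimension is stratified so no region of the space is left unsampled.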
The number of initial samples equals the batch_size by default but can be overridden:
optimizer:
  active_learning:
    initial_samples: 30

Phase 2: Evaluation
Each candidate is evaluated by the domain plugin's evaluation function. This is the expensive step — it may involve physics simulations, DFT calculations, or even real experiments. The evaluation returns objective values for each candidate.
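Conceptually, the evaluation contract is a callable from a batch of parameter vectors to objective vectors. The sketch below is an illustrative stand-in, not MatCraft's actual plugin API; `toy_evaluator` and its two objectives are invented for demonstration:

```python
from typing import Callable, Sequence

# Hypothetical evaluator signature: a batch of parameter vectors in,
# a list of per-candidate objective vectors out.
Evaluator = Callable[[Sequence[Sequence[float]]], list]

def toy_evaluator(batch):
    """A stand-in 'simulation' returning two objectives per candidate."""
    results = []
    for x in batch:
        permeability = sum(xi ** 2 for xi in x)   # objective 1 (cheap proxy)
        selectivity = -abs(x[0] - x[1])           # objective 2 (cheap proxy)
        results.append([permeability, selectivity])
    return results
```

In a real campaign this callable would wrap a physics simulation, a DFT workflow, or a lab queue, which is why minimizing the number of calls matters.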
Phase 3: Surrogate Training
After each batch of evaluations, the surrogate model is retrained from scratch on all available (parameters, objectives) data. Using all data (not just the latest batch) ensures the surrogate captures the full landscape learned so far.
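The retrain-from-scratch step can be sketched as follows, using scikit-learn's `MLPRegressor` as a stand-in for MatCraft's surrogate; the hyperparameters are illustrative, not MatCraft's defaults:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def retrain_surrogate(X_all, y_all, seed=0):
    """Fit a fresh MLP on every (parameters, objective) pair seen so far.

    Re-fitting from scratch, rather than fine-tuning on only the newest
    batch, avoids forgetting earlier regions of the landscape.
    """
    model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                         random_state=seed)
    model.fit(np.asarray(X_all), np.asarray(y_all))
    return model
```

Because each batch adds only tens of points, the full retrain stays cheap relative to the evaluations themselves.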
Phase 4: Acquisition
The acquisition function scores unevaluated candidates based on how likely they are to improve the current best solution. MatCraft uses CMA-ES to maximize the acquisition function over the surrogate model, proposing a batch of new candidates.
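Full CMA-ES additionally adapts a covariance matrix and step size; as a simplified stand-in, a plain Gaussian evolution strategy maximizing an acquisition function over the surrogate looks like this (all names and settings below are illustrative):

```python
import numpy as np

def maximize_acquisition(acq, x0, sigma=0.3, popsize=16, iters=50, seed=0):
    """Simplified Gaussian evolution strategy (a stand-in for CMA-ES).

    `acq` scores a candidate using only the surrogate, so thousands of
    calls are cheap; higher scores are better.
    """
    rng = np.random.default_rng(seed)
    mean = np.asarray(x0, dtype=float)
    best_x, best_val = mean.copy(), acq(mean)
    for _ in range(iters):
        # Sample a population around the current mean.
        pop = mean + sigma * rng.standard_normal((popsize, mean.size))
        scores = np.array([acq(x) for x in pop])
        # Recombine the top quarter into the next mean.
        elite = pop[np.argsort(scores)[-popsize // 4:]]
        mean = elite.mean(axis=0)
        if scores.max() > best_val:
            best_val = scores.max()
            best_x = pop[np.argmax(scores)].copy()
    return best_x, best_val
```

The key point is that the inner optimizer never touches the expensive evaluator; it only queries the surrogate-based acquisition.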
Expected Improvement (EI)
The default acquisition function is Expected Improvement:
EI(x) = E[max(0, f(x) - f_best)]

where f(x) is the surrogate's predicted objective at x and f_best is the best objective value observed so far. EI balances:
- Exploitation: Candidates near known good regions (high predicted value).
- Exploration: Candidates in uncertain regions (high predicted variance).
The exploration-exploitation trade-off is controlled by the exploration_weight parameter:
optimizer:
  active_learning:
    acquisition: expected_improvement
    exploration_weight: 0.01

Higher values of exploration_weight encourage more exploration of uncertain regions.
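Assuming the surrogate exposes a Gaussian predictive mean and standard deviation (for example via an ensemble), EI has a well-known closed form in which an exploration weight enters as an offset xi. This is a common convention in Bayesian optimization libraries; MatCraft's exact implementation may differ:

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best, exploration_weight=0.01):
    """Closed-form EI for a Gaussian predictive distribution (maximization).

    mu, sigma: predictive mean and standard deviation at a candidate x.
    exploration_weight (xi) raises the improvement threshold, so larger
    values favour uncertain regions over known good ones.
    """
    sigma = np.maximum(sigma, 1e-12)          # avoid division by zero
    z = (mu - f_best - exploration_weight) / sigma
    return (mu - f_best - exploration_weight) * norm.cdf(z) + sigma * norm.pdf(z)
```

Note that with sigma near zero the score collapses to the plain predicted improvement, while a large sigma keeps EI positive even when mu is below f_best.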
Phase 5: Convergence Check
After each iteration, MatCraft checks whether the optimizer has converged:
- Hypervolume improvement: If the hypervolume indicator improves by less than convergence_threshold for convergence_patience consecutive iterations, the campaign is considered converged.
- Budget exhaustion: If the total number of evaluations reaches the budget, the campaign stops regardless of convergence.
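The hypervolume-based check can be sketched as a small helper over the history of hypervolume values (illustrative logic only; MatCraft's internal bookkeeping may differ):

```python
def converged(hv_history, threshold=0.001, patience=5):
    """Return True once hypervolume has improved by less than `threshold`
    for `patience` consecutive iterations.

    hv_history: hypervolume indicator after each completed iteration.
    """
    if len(hv_history) <= patience:
        return False                      # not enough history yet
    recent = hv_history[-(patience + 1):]
    improvements = [b - a for a, b in zip(recent, recent[1:])]
    return all(imp < threshold for imp in improvements)
```

A single large jump resets the stall count, since every one of the last `patience` improvements must fall below the threshold.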
optimizer:
  budget: 300
  active_learning:
    convergence_patience: 5
    convergence_threshold: 0.001

Batch vs. Sequential Acquisition
MatCraft acquires candidates in batches rather than one at a time. This enables parallel evaluation and is more practical for experimental workflows. Within each batch, candidates are selected using a diversity-aware strategy to avoid redundant proposals:
- Select the top EI candidate.
- For each subsequent slot in the batch, select the candidate with highest EI that is at least a minimum distance from already-selected candidates.
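A minimal sketch of this greedy, diversity-aware selection (the Euclidean metric and the min_dist value below are assumptions for illustration, not MatCraft's documented defaults):

```python
import numpy as np

def select_batch(candidates, ei_scores, batch_size, min_dist=0.1):
    """Greedy diversity-aware batch selection.

    Take the top-EI candidate, then repeatedly take the highest-EI
    candidate at least `min_dist` (Euclidean) from everything chosen.
    """
    order = np.argsort(ei_scores)[::-1]       # best EI first
    chosen = [order[0]]
    for idx in order[1:]:
        if len(chosen) == batch_size:
            break
        dists = np.linalg.norm(candidates[chosen] - candidates[idx], axis=1)
        if dists.min() >= min_dist:
            chosen.append(idx)
    return candidates[chosen], np.array(chosen)
```

Near-duplicates of an already-selected point are skipped even when their EI is high, which spreads the batch across distinct regions of the space.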
Sample Efficiency
Active learning is typically 5–10x more sample-efficient than random search. For a water membrane optimization with 5 parameters:
| Strategy | Evaluations to find 90% of Pareto front |
|----------|------------------------------------------|
| Random search | ~1000 |
| Grid search | ~3000 |
| Active learning (MatCraft) | ~150 |
Programmatic Access
from materia.active_learning.loop import ActiveLearningLoop

loop = ActiveLearningLoop(
    spec=material_spec,
    evaluator=domain_plugin,
    surrogate=mlp_surrogate,
    optimizer=cmaes_optimizer,
)

# Run the full loop
results = loop.run()

# Or step through manually
loop.initialize()
while not loop.converged:
    loop.step()

See Also
- Convergence for convergence detection details
- CMA-ES for the optimizer used in the acquisition step
- MLP Surrogate for the surrogate model