
What is active learning and how does MatCraft use it?

Optimization
active-learning
acquisition
exploration

Active learning is a machine learning strategy where the model actively selects which data points to learn from next, rather than passively receiving a fixed dataset. In MatCraft, active learning drives the optimization loop by choosing the most informative compositions to evaluate, minimizing the total number of expensive experiments needed.

The Active Learning Loop

MatCraft's active learning loop follows this cycle:

Seed Data -> Train Surrogate -> Acquisition Function -> Select Candidates
    ^                                                        |
    |                                                        v
    +-------------- Evaluate & Add Data <------- Top-K Candidates

  1. Train surrogate: Fit the MLP on all available data.
  2. Generate candidates: Use CMA-ES to optimize the acquisition function, producing a large set of promising candidates.
  3. Rank by acquisition: Score each candidate using the acquisition function, which balances predicted performance (exploitation) with prediction uncertainty (exploration).
  4. Select top-K: Choose the top batch_size candidates for evaluation.
  5. Evaluate: Run the candidates through your physics model or simulation, or flag them for experimental validation.
  6. Add data: Incorporate the new measurements and repeat.
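The steps above can be sketched end to end. Everything in this sketch is illustrative, not MatCraft's internals: a simple distance-weighted predictor stands in for the MLP surrogate, random search stands in for CMA-ES, UCB is used as the acquisition function, and the objective is a synthetic 5D function playing the role of an expensive experiment.

```python
import numpy as np

rng = np.random.default_rng(0)

def objective(x):
    # Synthetic "experiment": best value at the center of the unit cube
    return -np.sum((x - 0.5) ** 2)

def predict(X_train, y_train, x):
    # Toy surrogate: distance-weighted mean prediction, with distance to
    # the nearest training point as a crude uncertainty estimate
    d = np.linalg.norm(X_train - x, axis=1) + 1e-9
    w = 1.0 / d
    mu = np.dot(w, y_train) / w.sum()
    sigma = d.min()
    return mu, sigma

def ucb(X_train, y_train, x, kappa=1.0):
    mu, sigma = predict(X_train, y_train, x)
    return mu + kappa * sigma

# 1. Seed data: 10 random 5D compositions
X = rng.random((10, 5))
y = np.array([objective(x) for x in X])

batch_size = 5
for it in range(10):
    # 2-3. Generate candidates and score them with the acquisition function
    #      (random candidates stand in for CMA-ES optimization here)
    cands = rng.random((200, 5))
    scores = np.array([ucb(X, y, c) for c in cands])
    # 4. Select the top-K candidates
    top = cands[np.argsort(scores)[-batch_size:]]
    # 5-6. Evaluate them and add the new data, then repeat
    X = np.vstack([X, top])
    y = np.concatenate([y, [objective(x) for x in top]])

print(round(float(y.max()), 3))
```

Note that the acquisition score, not the raw prediction, drives selection: a candidate far from all training data gets a large uncertainty bonus, which is what pulls the loop into unexplored regions.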

Acquisition Functions

MatCraft supports several acquisition functions:

  • Expected Improvement (EI): The default. Measures the expected amount by which a candidate improves over the current best. Good general-purpose choice.
  • Upper Confidence Bound (UCB): Adds a weighted uncertainty bonus to the predicted value. The exploration_weight parameter (kappa) controls the exploration-exploitation trade-off.
  • Probability of Improvement (PI): Measures the probability that a candidate beats the current best. More conservative than EI.
  • Thompson Sampling: Samples from the surrogate's posterior distribution. Naturally balances exploration and exploitation.
The acquisition function is selected in the config, for example:

```yaml
acquisition:
  type: expected_improvement
  exploration_weight: 0.1  # Only used for UCB; ignored for EI
```
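The first three acquisition functions listed above have standard closed forms for a maximization problem with a Gaussian predictive distribution N(mu, sigma^2). A minimal sketch (function names are illustrative, not MatCraft's API; `kappa` corresponds to the `exploration_weight` parameter):

```python
import math

def _phi(z):
    # Standard normal pdf
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def _Phi(z):
    # Standard normal cdf
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

def expected_improvement(mu, sigma, best):
    # Expected amount by which this candidate beats the current best
    if sigma == 0.0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    return (mu - best) * _Phi(z) + sigma * _phi(z)

def upper_confidence_bound(mu, sigma, kappa=0.1):
    # Predicted value plus a weighted uncertainty bonus
    return mu + kappa * sigma

def probability_of_improvement(mu, sigma, best):
    # Probability this candidate beats the current best
    if sigma == 0.0:
        return float(mu > best)
    return _Phi((mu - best) / sigma)

# A candidate predicted slightly below the current best can still score
# well under EI if its uncertainty is high:
print(round(expected_improvement(mu=0.9, sigma=0.3, best=1.0), 4))
```

This is where the exploration-exploitation trade-off becomes concrete: PI only rewards the chance of beating the best, while EI also weighs the magnitude of the improvement, which is why PI tends to be the more conservative of the two.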

Why Active Learning Matters

Without active learning, you might need 500+ experiments to find a near-optimal composition in a 5D space. With active learning, MatCraft typically finds competitive solutions in 50-100 total evaluations (10-20 seed + 5-15 iterations of batch size 5). This represents a 5-10x reduction in experimental cost, which translates directly to saved time and money in a lab setting.

Convergence

The loop terminates when the convergence criterion is met (e.g., no improvement for N consecutive iterations) or the maximum iteration count is reached.
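A "no improvement for N consecutive iterations" rule like the one mentioned above can be sketched as follows (the function name, `patience` window, and tolerance are assumptions for illustration):

```python
def should_stop(history, patience=5, tol=1e-6):
    """history: best objective value observed after each iteration."""
    if len(history) <= patience:
        return False
    # Stop if the last `patience` iterations failed to beat the best
    # value seen before them by more than `tol`
    recent_best = max(history[-patience:])
    earlier_best = max(history[:-patience])
    return recent_best <= earlier_best + tol

print(should_stop([1.0, 1.2, 1.3, 1.3, 1.3, 1.3, 1.3, 1.3, 1.3], patience=5))  # → True
```

A small tolerance matters in practice: noisy evaluations can produce tiny spurious "improvements" that would otherwise keep the loop running indefinitely.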
