Project · 2025

Project Synthesis

Autonomous material discovery

Internal preview

Motivation

Why this project exists

Materials discovery is bottlenecked by characterization, not by hypothesis generation. Modern computational chemistry can propose far more candidate compounds than human-paced laboratories can synthesize, characterize, and validate. Closing the loop requires automation across the entire pipeline.

Synthesis is our closed-loop materials-discovery pipeline. It hypothesizes candidate structures, simulates their properties, schedules synthesis on automated hardware, characterizes the result, and feeds outcomes back into the hypothesis generator. We are exploring novel superconductor and battery-material families. We are not claiming any specific result that has not been independently replicated.

Approach

How we work on it

Hypothesis generation under cost constraints

A foundation-model-based hypothesizer proposes candidate compounds prioritized by simulated property gain divided by estimated synthesis cost. The cost model is itself learned from prior synthesis runs.
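The prioritization rule above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the `Candidate` fields, the `priority` and `rank` helpers, and the numbers are all hypothetical, and the learned cost model is stubbed as a precomputed `estimated_cost` value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str              # hypothetical identifier
    simulated_gain: float  # predicted property improvement (arbitrary units)
    estimated_cost: float  # stand-in for the learned cost model's output


def priority(c: Candidate) -> float:
    """Simulated property gain per unit of estimated synthesis cost."""
    if c.estimated_cost <= 0:
        raise ValueError("estimated cost must be positive")
    return c.simulated_gain / c.estimated_cost


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Return candidates with the highest gain-per-cost first."""
    return sorted(candidates, key=priority, reverse=True)


# Illustrative values only.
candidates = [
    Candidate("A", simulated_gain=4.0, estimated_cost=8.0),
    Candidate("B", simulated_gain=3.0, estimated_cost=2.0),
    Candidate("C", simulated_gain=1.0, estimated_cost=1.0),
]
ranked = rank(candidates)
print([c.name for c in ranked])
```

In this toy set, candidate A has the largest simulated gain but ranks last because its estimated cost is high, which is the trade-off the ratio is meant to capture.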

Automated synthesis and characterization

A modular robotic synthesizer is paired with a multi-modal characterization stack (XRD, an ARPES adapter, and a susceptometer). Sample tracking is end-to-end: every characterization datum links back to the hypothesis lineage that produced it.
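One way to picture end-to-end sample tracking is as a chain of records in which each datum points to a sample and each sample points to a hypothesis. The sketch below assumes such a schema; the record types, field names, and the `lineage` helper are hypothetical and are not taken from the pipeline itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hypothesis:
    hypothesis_id: str
    parent_id: Optional[str]  # earlier hypothesis this one was derived from


@dataclass(frozen=True)
class Sample:
    sample_id: str
    hypothesis_id: str  # hypothesis that led to this synthesis run


@dataclass(frozen=True)
class Datum:
    datum_id: str
    sample_id: str
    modality: str  # e.g. "XRD"


def lineage(datum: Datum,
            samples: dict[str, Sample],
            hypotheses: dict[str, Hypothesis]) -> list[str]:
    """Walk from a datum to its sample, then up the hypothesis chain."""
    chain: list[str] = []
    seen: set[str] = set()
    current: Optional[str] = samples[datum.sample_id].hypothesis_id
    while current is not None:
        if current in seen:
            raise ValueError("cycle in hypothesis lineage")
        seen.add(current)
        chain.append(current)
        current = hypotheses[current].parent_id
    return chain


# Illustrative records only.
hypotheses = {
    "H1": Hypothesis("H1", parent_id=None),
    "H2": Hypothesis("H2", parent_id="H1"),
}
samples = {"S1": Sample("S1", hypothesis_id="H2")}
datum = Datum("D1", sample_id="S1", modality="XRD")
trace = lineage(datum, samples, hypotheses)
print(trace)
```

Here the XRD datum D1 resolves to hypothesis H2 and then to its parent H1, so a measurement can always be traced back to the proposal that motivated it.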

Closed-loop active learning

The pipeline runs as a Bayesian optimization loop with explicit uncertainty quantification on the surrogate property model. The loop also schedules dedicated runs to reduce uncertainty in regions of the design space the surrogate is least confident about.
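The two kinds of run described above can be illustrated with a sketch of the selection step alone. It assumes the surrogate has already produced a predictive mean and standard deviation for each candidate. The upper-confidence-bound rule, the `kappa` parameter, and all names and numbers are assumptions chosen for illustration; the source does not say which acquisition function the pipeline uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    candidate: str
    mean: float  # surrogate's predicted property value
    std: float   # surrogate's predictive standard deviation


def ucb(p: Prediction, kappa: float = 2.0) -> float:
    """Upper confidence bound: reward high predictions and high uncertainty."""
    return p.mean + kappa * p.std


def pick_optimization_run(preds: list[Prediction], kappa: float = 2.0) -> Prediction:
    """Candidate the optimization loop would synthesize next."""
    return max(preds, key=lambda p: ucb(p, kappa))


def pick_uncertainty_run(preds: list[Prediction]) -> Prediction:
    """Dedicated run aimed at the region the surrogate knows least about."""
    return max(preds, key=lambda p: p.std)


# Illustrative surrogate outputs only.
preds = [
    Prediction("X", mean=1.0, std=0.1),
    Prediction("Y", mean=0.8, std=0.3),
    Prediction("Z", mean=0.2, std=0.5),
]
next_run = pick_optimization_run(preds)
probe_run = pick_uncertainty_run(preds)
print(next_run.candidate, probe_run.candidate)
```

With these values the optimization run goes to Y, which balances a good prediction against moderate uncertainty, while the dedicated uncertainty run goes to Z, which is unpromising on the mean but poorly understood.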

Progress

Where the work stands

Shipped · Q3 2025
Initial pipeline integration
End-to-end demo with three closed-loop iterations on a benchmark battery-material family.
In progress · Q2 2026
Characterization throughput scale-up
New characterization hardware intended to halve cycle time per candidate.
Planned · Q4 2026
Open release of pipeline framework
Public release of the orchestration layer and a curated benchmark.

Open questions

What we are still figuring out

01
Characterization-noise modeling
How do we faithfully model the systematic noise of a multi-modal characterization stack so the surrogate model is not trained on artifacts?
02
Cost-aware exploration
When synthesis cost varies by 100× across the design space, naive Bayesian optimization concentrates in the cheap region. Cost-aware acquisition is harder than it sounds.
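A small numerical sketch shows why this is delicate. It uses expected improvement divided by cost raised to a power `alpha`, a common heuristic that is assumed here for illustration and is not claimed to be the project's method. The function names, the exponent, and the two candidates are hypothetical.

```python
import math


def expected_improvement(mean: float, std: float, best: float) -> float:
    """Expected improvement over `best` for a Gaussian prediction (maximization)."""
    if std <= 0:
        return max(mean - best, 0.0)
    z = (mean - best) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best) * cdf + std * pdf


def cost_aware_score(mean: float, std: float, best: float,
                     cost: float, alpha: float = 1.0) -> float:
    """EI divided by cost**alpha; alpha=0 ignores cost, alpha=1 is EI per unit cost."""
    return expected_improvement(mean, std, best) / cost ** alpha


# Two illustrative candidates whose synthesis costs differ by 100x.
BEST_SO_FAR = 1.0
CANDIDATES = {
    "cheap": {"mean": 0.9, "std": 0.1, "cost": 1.0},
    "expensive": {"mean": 1.5, "std": 0.5, "cost": 100.0},
}


def pick(alpha: float) -> str:
    """Name of the candidate with the highest cost-aware score."""
    return max(
        CANDIDATES,
        key=lambda k: cost_aware_score(best=BEST_SO_FAR, alpha=alpha, **CANDIDATES[k]),
    )


for a in (0.0, 0.5, 1.0):
    print(a, pick(a))
```

With `alpha=1.0` the cheap candidate wins even though its expected improvement is far smaller, while `alpha=0.0` and `alpha=0.5` both select the expensive one. The choice therefore hinges on how strongly cost is weighted, and nothing in the acquisition function itself says what that weight should be.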
03
Replication and provenance
For any positive result, the standard for replication should be high. We are designing the pipeline so external groups can replicate from the ledger.