Pipeline 2 enables full de novo protein scaffold generation. Learn how RFdiffusion and ProteinMPNN are integrated into Foldexa, how the self-consistency test works, and how researchers can design entirely new protein frameworks from scratch.
Azamat Armanuly
CEO & Bioengineer, KAIST
De novo protein design — creating proteins with no evolutionary precedent — is one of the most ambitious challenges in structural biology. With Foldexa Pipeline 2, it is now accessible to any researcher with a browser.
Classical protein engineering modifies naturally occurring proteins. De novo design starts from a blank slate: you define what a protein should do (bind a specific epitope, form a stable trimer, present a catalytic motif) and the algorithm generates a backbone and sequence that achieve it. This unlocks protein functions that evolution never explored.
RFdiffusion, developed by the Baker Lab at the University of Washington, is a denoising diffusion probabilistic model fine-tuned from RoseTTAFold. It operates on protein backbone coordinates (N, Cα, C, O atoms), learning the joint distribution of residue positions in protein structures. Given a noisy backbone, it iteratively denoises toward physically realistic protein geometries.
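To make the denoising idea concrete, here is a toy sketch of the forward (noising) process that a diffusion model learns to reverse. It assumes a generic DDPM-style linear beta schedule applied to Cα coordinates only. The function names, schedule values, and three-residue backbone are illustrative assumptions, not RFdiffusion's actual code, which also perturbs residue orientations.

```python
import math
import random


def alpha_bar(t, num_steps, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal variance left after t noising steps.

    Uses a linear beta schedule (an assumption made for illustration).
    """
    prod = 1.0
    for s in range(t):
        beta = beta_start + (beta_end - beta_start) * s / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def add_noise(coords, t, num_steps, rng):
    """Sample noisy coordinates at step t from clean coordinates.

    x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * gaussian_noise
    """
    a = alpha_bar(t, num_steps)
    keep, noise = math.sqrt(a), math.sqrt(1.0 - a)
    return [
        tuple(keep * c + noise * rng.gauss(0.0, 1.0) for c in xyz)
        for xyz in coords
    ]


# Hypothetical three-residue C-alpha trace, spaced 3.8 Å apart.
backbone = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
rng = random.Random(0)

for t in (0, 100, 1000):
    noisy = add_noise(backbone, t, 1000, rng)
    print(t, round(alpha_bar(t, 1000), 4), noisy[1])
```

At step 0 the coordinates are unchanged. As t grows, the retained signal shrinks and the backbone approaches pure noise, which is the starting point the trained network denoises from.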
Critically, RFdiffusion supports motif scaffolding: you can hold a target motif fixed (e.g., the epitope contact residues from a viral antigen) and diffuse only the surrounding scaffold. This is the key capability we use in Pipeline 2.
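In the open-source RFdiffusion release, a fixed motif and its flanking scaffold are described with a "contig" string such as `[10-40/A163-181/10-40]`: a range of new residues, then the motif residues copied from chain A of the input structure, then another range of new residues. The sketch below builds such a string for the motif used later in this post. The helpers `build_contig` and `motif_length` are hypothetical, and how Foldexa translates its own parameters into RFdiffusion inputs is not described here, so treat this as an illustration only.

```python
def build_contig(chain, motif_start, motif_end, n_flank, c_flank):
    """Return an RFdiffusion-style contig string.

    n_flank and c_flank are (min, max) ranges for the number of new
    residues to generate before and after the fixed motif.
    """
    if motif_end < motif_start:
        raise ValueError("motif_end must be >= motif_start")
    n_lo, n_hi = n_flank
    c_lo, c_hi = c_flank
    return f"[{n_lo}-{n_hi}/{chain}{motif_start}-{motif_end}/{c_lo}-{c_hi}]"


def motif_length(motif_start, motif_end):
    """Number of residues in an inclusive residue range."""
    return motif_end - motif_start + 1


# Motif A45-A67 inside a 120-residue scaffold (values from this post).
total_length = 120
free = total_length - motif_length(45, 67)  # residues left for the scaffold
contig = build_contig("A", 45, 67, (20, free - 20), (20, free - 20))
print(contig, "free residues:", free)
```

In RFdiffusion itself, the flank ranges are sampled per design, and the total length can be pinned separately (the release documents a `contigmap.length` option for this), so the two flanks always sum to the free residue count.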
The self-consistency test is the most important filter in Pipeline 2. It asks: if we take the ProteinMPNN sequence and predict its structure with AlphaFold2, does it fold back into the RFdiffusion backbone?
Why Self-Consistency Matters
A backbone can pass RFdiffusion's geometry checks and still fail as a design if its ProteinMPNN sequence does not fold back into it. Self-consistency is a computational proxy for expressing the protein: if the predicted structure of the sequence matches the intended backbone, we have strong evidence the design is viable before any wet-lab work.
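The comparison behind the test can be sketched as an RMSD between the designed backbone and the predicted one. The example below assumes both structures are given as Cα coordinates in Å that have already been superimposed (a real pipeline would first align them, for example with the Kabsch algorithm). The function names are hypothetical, and which atoms Foldexa actually compares is not stated in this post.

```python
import math


def ca_rmsd(designed, predicted):
    """RMSD in Å between two equal-length lists of (x, y, z) coordinates.

    Assumes the two structures are already superimposed.
    """
    if not designed or len(designed) != len(predicted):
        raise ValueError("structures must be non-empty and equal in length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(designed, predicted)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(designed))


def is_self_consistent(designed, predicted, cutoff=1.5):
    """True if the predicted fold is within `cutoff` Å of the design."""
    return ca_rmsd(designed, predicted) < cutoff


# Toy data: the prediction is the design shifted by 1 Å along z.
design = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
prediction = [(x, y, z + 1.0) for x, y, z in design]
print(ca_rmsd(design, prediction), is_self_consistent(design, prediction))
```

The 1.5 Å default mirrors the cutoff used in the filter in this post.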
```python
import foldexa

client = foldexa.Client(api_key="your_key")

# Submit de novo scaffold job
job = client.jobs.create(
    pipeline="rfdiffusion-scaffold",
    motif_pdb="6VXX",
    motif_residues="A45-A67",
    scaffold_length=120,
    num_designs=100,
    sequences_per_design=3,
)

# Wait for completion
result = client.jobs.wait(job.id)

# Filter by self-consistency RMSD (Å) and Foldexa Score
top = [
    d for d in result.designs
    if d.self_consistency_rmsd < 1.5
    and d.foldexa_score >= 0.80
]
print(f"{len(top)} designs passed all filters")
```

We benchmarked Pipeline 2 on three viral epitope scaffolding tasks: RSV site II (F protein), the SARS-CoV-2 RBD ACE2-binding interface, and the influenza HA stem. For each, we generated 100 designs and measured self-consistency and scaffold quality.
Self-Consistency Pass Rate: 73% (designs with RMSD < 1.5 Å, mean across 3 targets)
Mean Backbone RMSD: 0.9 Å (AF2 vs. RFdiffusion, for passing designs)
Motif RMSD: 0.4 Å (fixed motif conservation after scaffolding)
S-tier Yield: 18% (Foldexa Score ≥ 0.85, among all designs)
Practical Tip
Set num_designs=100 for exploratory campaigns and 200+ for high-value targets. With 100 designs and a 73% self-consistency pass rate, you can expect about 73 candidates to advance to ranking, which is plenty for wet-lab prioritization.
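The same arithmetic can be turned around to size a campaign. The sketch below assumes the benchmark rates quoted above carry over to your target, which may not hold for harder motifs. The helper names are hypothetical.

```python
import math


def expected_candidates(num_designs, pass_rate):
    """Expected number of designs that pass a filter with the given rate."""
    return num_designs * pass_rate


def designs_needed(target_candidates, pass_rate):
    """Smallest num_designs expected to yield at least target_candidates."""
    if not 0.0 < pass_rate <= 1.0:
        raise ValueError("pass_rate must be in (0, 1]")
    return math.ceil(target_candidates / pass_rate)


# Rates taken from the benchmark in this post.
self_consistency_rate = 0.73
s_tier_rate = 0.18

print(expected_candidates(100, self_consistency_rate))  # self-consistent designs
print(expected_candidates(100, s_tier_rate))            # S-tier designs
print(designs_needed(30, s_tier_rate))                  # designs for ~30 S-tier hits
```

These are expectations, not guarantees: individual runs will vary around them, so it is reasonable to add some margin when the number of wet-lab slots is fixed.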