De Novo Scaffold Design with RFdiffusion

Pipeline 2 enables full de novo protein scaffold generation. Learn how RFdiffusion and ProteinMPNN are integrated into Foldexa, how the self-consistency test works, and how researchers can design entirely new protein frameworks from scratch.

Azamat Armanuly

CEO & Bioengineer, KAIST

Mar 10, 2026

De novo protein design — creating proteins with no evolutionary precedent — is one of the most ambitious challenges in structural biology. With Foldexa Pipeline 2, it is now accessible to any researcher with a browser.

What Is De Novo Protein Design?

Classical protein engineering modifies naturally occurring proteins. De novo design starts from a blank slate: you define what a protein should do (bind a specific epitope, form a stable trimer, thread a catalytic motif) and the algorithm generates a backbone and sequence that achieve it. This unlocks protein functions that evolution never explored.

RFdiffusion: Backbone Generation

RFdiffusion, developed by the Baker Lab at the University of Washington, is a denoising diffusion probabilistic model fine-tuned from RoseTTAFold. It operates on the protein backbone, representing each residue as a rigid frame (a Cα position plus the orientation defined by its N, Cα, and C atoms), and learns the distribution of these frames across known protein structures. Starting from random noise, it iteratively denoises toward physically realistic backbone geometries.

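Critically, RFdiffusion supports motif scaffolding: you can hold a target motif fixed (e.g., the epitope contact residues from a viral antigen) and generate only the surrounding scaffold. This is distinct from what RFdiffusion calls partial diffusion, which partially re-noises an existing structure to produce variants of it. Motif scaffolding is the key capability we use in Pipeline 2.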
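For readers who know the open-source RFdiffusion command line, the fixed-motif idea can be sketched as a contig specification. The snippet below is a minimal illustration, not Foldexa's internal implementation: the `contigmap.contigs` and `contigmap.length` override syntax follows the public RFdiffusion README, while the helper `build_contig` and its `min_flank` parameter are hypothetical names introduced here.

```python
def build_contig(chain: str, start: int, end: int,
                 total_length: int, min_flank: int = 10) -> list[str]:
    """Build RFdiffusion-style overrides that scaffold one fixed motif segment.

    Hypothetical helper: how Foldexa maps its parameters onto RFdiffusion
    is an assumption, only the override syntax comes from RFdiffusion.
    """
    motif_len = end - start + 1
    if motif_len <= 0:
        raise ValueError("motif end must not precede motif start")
    free = total_length - motif_len  # residues left for the new scaffold
    if free < 2 * min_flank:
        raise ValueError("total_length leaves too little room for both flanks")
    # Each flank may take between min_flank and (free - min_flank) residues;
    # contigmap.length pins the total so every design has total_length residues.
    flank = f"{min_flank}-{free - min_flank}"
    return [
        f"contigmap.contigs=[{flank}/{chain}{start}-{end}/{flank}]",
        f"contigmap.length={total_length}-{total_length}",
    ]


# Motif A45-67 (23 residues) inside a 120-residue scaffold
overrides = build_contig("A", 45, 67, 120)
print(overrides)
```

The motif segment (`A45-67`) is copied from the input structure, while the numeric ranges on either side are the newly generated scaffold lengths.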

Foldexa Pipeline 2 Workflow

  1. Define the target motif: upload the target PDB and specify the residues to scaffold (e.g., chain A residues 45–67)
  2. Configure scaffold parameters: total length (60–200 residues), secondary structure bias, symmetry (monomer, dimer, trimer)
  3. RFdiffusion generates N backbone structures (default N=100) conditioned on the fixed motif
  4. ProteinMPNN designs amino acid sequences for each backbone (3 sequences per backbone)
  5. AlphaFold2 predicts the structure of each sequence and compares it to the RFdiffusion backbone
  6. Self-consistency filter: keep designs where AF2 RMSD < 1.5 Å to the RFdiffusion backbone
  7. Final ranking by Foldexa Score; top candidates returned with full structure files
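The parameter ranges in the steps above can be captured in a small validated configuration object, which also makes the downstream workload explicit. This is a sketch under stated assumptions: `ScaffoldConfig` and its field names are hypothetical and not part of the Foldexa client; only the ranges and defaults come from the workflow.

```python
from dataclasses import dataclass

ALLOWED_SYMMETRY = ("monomer", "dimer", "trimer")


@dataclass(frozen=True)
class ScaffoldConfig:
    """Hypothetical container for the step 2-4 parameters."""
    scaffold_length: int = 120
    symmetry: str = "monomer"
    num_designs: int = 100         # backbones from RFdiffusion (step 3 default)
    sequences_per_design: int = 3  # ProteinMPNN sequences per backbone (step 4)

    def __post_init__(self) -> None:
        if not 60 <= self.scaffold_length <= 200:
            raise ValueError("scaffold_length must be between 60 and 200 residues")
        if self.symmetry not in ALLOWED_SYMMETRY:
            raise ValueError(f"symmetry must be one of {ALLOWED_SYMMETRY}")
        if self.num_designs < 1 or self.sequences_per_design < 1:
            raise ValueError("num_designs and sequences_per_design must be positive")

    @property
    def num_predictions(self) -> int:
        """Structure predictions needed in step 5: one per designed sequence."""
        return self.num_designs * self.sequences_per_design


config = ScaffoldConfig()
print(config.num_predictions)  # 300
```

With the defaults, 100 backbones times 3 sequences gives 300 sequences for AlphaFold2 to fold, which is why the structure prediction step tends to dominate the run.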

The Self-Consistency Test

The self-consistency test is the most important filter in Pipeline 2. It asks: if we take the ProteinMPNN sequence and predict its structure with AlphaFold2, does it fold back into the RFdiffusion backbone?

Why Self-Consistency Matters

A backbone that passes RFdiffusion's geometry checks but whose designed sequence doesn't fold into that backbone is a failure mode. Self-consistency is a computational proxy for expressing the protein: if the predicted structure of the sequence matches the intended backbone, we have strong evidence the design is viable before any wet-lab work.
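To make the comparison concrete, here is a minimal sketch of a self-consistency RMSD calculation using NumPy. It assumes matched coordinates (for example, one Cα atom per residue) and an optimal rigid superposition via the Kabsch algorithm. The article does not specify which atoms or alignment method Foldexa uses, so treat both choices as assumptions; `kabsch_rmsd` is a name introduced here for illustration.

```python
import numpy as np


def kabsch_rmsd(designed: np.ndarray, predicted: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    if designed.shape != predicted.shape or designed.ndim != 2 or designed.shape[1] != 3:
        raise ValueError("inputs must both have shape (N, 3)")
    # Remove translation by centering each set on its centroid.
    p = predicted - predicted.mean(axis=0)
    q = designed - designed.mean(axis=0)
    # Kabsch: the SVD of the covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Guard against an improper rotation (a reflection, det = -1).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    p_aligned = p @ rotation.T
    return float(np.sqrt(((p_aligned - q) ** 2).sum(axis=1).mean()))


# Toy check: a rigidly rotated and translated copy should give RMSD ~ 0.
rng = np.random.default_rng(0)
backbone = rng.normal(size=(50, 3))
theta = np.pi / 3
rot_z = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
moved = backbone @ rot_z.T + np.array([5.0, -2.0, 1.0])
rmsd = kabsch_rmsd(backbone, moved)
passes = rmsd < 1.5  # the cutoff used by the self-consistency filter
```

Superposing before measuring matters: AlphaFold2 places its prediction in an arbitrary frame, so a raw coordinate difference would be meaningless even for a perfect design.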

Submit a job, wait for completion, and filter by self-consistency
import foldexa

client = foldexa.Client(api_key="your_key")

# Submit de novo scaffold job
job = client.jobs.create(
    pipeline="rfdiffusion-scaffold",
    motif_pdb="6VXX",
    motif_residues="A45-A67",
    scaffold_length=120,
    num_designs=100,
    sequences_per_design=3,
)

# Wait for completion
result = client.jobs.wait(job.id)

# Keep designs that pass the self-consistency RMSD cutoff and a Foldexa Score threshold
top = [
    d for d in result.designs
    if d.self_consistency_rmsd < 1.5
    and d.foldexa_score >= 0.80
]
print(f"{len(top)} designs passed all filters")

Benchmark: Viral Epitope Scaffolding

We benchmarked Pipeline 2 on three viral epitope scaffolding tasks: RSV site II (F protein), SARS-CoV-2 RBD ACE2-binding interface, and Influenza HA stem. For each, we generated 100 designs and measured self-consistency and scaffold quality.

Self-Consistency Pass Rate: 73% (designs with RMSD < 1.5 Å, mean across 3 targets)

Mean Backbone RMSD: 0.9 Å (AF2 vs. RFdiffusion, for passing designs)

Motif RMSD: 0.4 Å (fixed motif conservation after scaffolding)

S-tier Yield: 18% (Foldexa Score ≥ 0.85, among all designs)

Practical Tip

Set num_designs=100 for exploratory campaigns and 200 or more for high-value targets. With 100 designs and a 73% self-consistency pass rate, you can expect about 73 candidates to advance to ranking, which is plenty for wet-lab prioritization.
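The arithmetic behind this estimate can be sketched with a simple binomial model. This assumes each design passes independently and that the benchmark pass rate carries over to your target; neither is guaranteed, so read the output as a rough planning number. The function name `expected_passes` is introduced here for illustration.

```python
import math


def expected_passes(num_designs: int, pass_rate: float) -> tuple[float, float]:
    """Mean and standard deviation of passing designs under a binomial model."""
    if num_designs < 0 or not 0.0 <= pass_rate <= 1.0:
        raise ValueError("num_designs must be >= 0 and pass_rate within [0, 1]")
    mean = num_designs * pass_rate
    std = math.sqrt(num_designs * pass_rate * (1.0 - pass_rate))
    return mean, std


# Exploratory campaign vs. high-value target, at the benchmark pass rate
mean_100, std_100 = expected_passes(100, 0.73)
mean_200, std_200 = expected_passes(200, 0.73)
print(f"100 designs: {mean_100:.0f} +/- {std_100:.1f} expected to pass")
print(f"200 designs: {mean_200:.0f} +/- {std_200:.1f} expected to pass")
```

Doubling the number of designs doubles the expected number of passing candidates, while the spread grows only with the square root, so larger campaigns give more predictable yields.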
