PDB-Struct: A Comprehensive Benchmark for Structure-based Protein Design
Structure-based protein design has attracted increasing interest, with
numerous methods being introduced in recent years. However, a universally
accepted method for evaluation has not been established, since wet-lab
validation can be too time-consuming for the development of new algorithms,
while in silico validation with recovery and perplexity metrics is
efficient but may not precisely reflect true foldability. To address this gap,
we introduce two novel metrics: a refoldability-based metric, which leverages
high-accuracy protein structure prediction models as a proxy for wet-lab
experiments, and a stability-based metric, which assesses whether models can
assign high likelihoods to experimentally stable proteins. We curate datasets
from high-quality CATH protein data, high-throughput
designed proteins, and mega-scale mutagenesis experiments, and in
doing so, present a benchmark that evaluates both
recent and previously uncompared protein design methods. Experimental results
indicate that ByProt, ProteinMPNN, and ESM-IF perform exceptionally well on our
benchmark, while ESM-Design and AF-Design fall short on the refoldability
metric. We also show that while some methods exhibit high sequence recovery,
they do not perform as well on our new benchmark. Our proposed benchmark paves
the way for a fair and comprehensive evaluation of protein design methods in
the future. Code is available at https://github.com/WANG-CR/PDB-Struct.
Comment: 13 pages