In antibody engineering, an essential task is to design novel antibodies whose
paratopes bind to the correct epitopes of a specific antigen. Understanding an
antibody's structure and paratope can provide a mechanistic understanding of its
function. Therefore, predicting antibody structure from sequence alone has long
been a highly valuable problem for de novo antibody design. AlphaFold2, a
breakthrough in structural biology, predicts protein structure from the protein
sequence together with coevolutionary multiple sequence alignments (MSAs), which
are computationally expensive to construct.
However, AlphaFold2's computational cost and unsatisfactory prediction accuracy
on antibodies, especially on their complementarity-determining regions (CDRs),
limit its application in industrial high-throughput drug design. To learn an
informative representation of antibodies, we pretrained a deep antibody language
model (ALM), a transformer, on curated sequences from the Observed Antibody
Space database. We also developed a novel
model named xTrimoABFold that predicts antibody structure from sequence using
the pretrained ALM together with efficient Evoformer and structure modules. The
model was trained end-to-end on antibody structures from the PDB by minimizing a
combined loss consisting of a domain-specific focal loss on the CDRs and the
frame-aligned point loss. xTrimoABFold outperforms AlphaFold2 and
state-of-the-art protein language model-based methods such as OmegaFold,
HelixFold-Single, and IgFold by a large margin (more than 30% improvement in
RMSD) while running 151 times faster than AlphaFold2. To the best of our
knowledge, xTrimoABFold
achieves state-of-the-art antibody structure prediction. Its improvements in
both accuracy and efficiency make it a valuable tool for de novo antibody design
and could enable further advances in immuno-theory.
Comment: 14 pages, 5 figures
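The training objective pairs AlphaFold2's frame-aligned point error (FAPE) with a domain-specific focal loss on the CDRs. The abstract does not define that focal term, so the Python sketch below is illustrative only. It computes a simplified FAPE, using a 10 Å clamp and a 10 Å length scale as in AlphaFold2, and adds a hypothetical focal-style CDR term that up-weights CDR points with larger error by (d/clamp)^gamma. The names cdr_focal_term, cdr_mask, gamma, and lambda_cdr are assumptions introduced here, not xTrimoABFold's actual definitions.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]
Frame = Tuple[Mat3, Vec3]  # (rotation R, translation t): local point p maps to R p + t


def apply(frame: Frame, x: Sequence[float]) -> Vec3:
    """Map a local point into global coordinates: R x + t."""
    rot, t = frame
    return tuple(sum(rot[i][k] * x[k] for k in range(3)) + t[i] for i in range(3))


def compose(outer: Frame, inner: Frame) -> Frame:
    """Apply inner, then outer: rotation R_o R_i, translation R_o t_i + t_o."""
    (ro, _), (ri, ti) = outer, inner
    rot = tuple(tuple(sum(ro[i][k] * ri[k][j] for k in range(3)) for j in range(3))
                for i in range(3))
    return rot, apply(outer, ti)


def to_local(frame: Frame, x: Sequence[float]) -> Vec3:
    """Express a global point in the frame's local coordinates: R^T (x - t)."""
    rot, t = frame
    d = [x[k] - t[k] for k in range(3)]
    return tuple(sum(rot[k][i] * d[k] for k in range(3)) for i in range(3))


def pairwise_errors(pred_frames, pred_pos, true_frames, true_pos,
                    eps: float = 1e-4) -> List[List[float]]:
    """d_ij = sqrt(|local_i(pred x_j) - local_i(true x_j)|^2 + eps) for frames i, points j."""
    errs = []
    for fp, ft in zip(pred_frames, true_frames):
        row = []
        for xp, xt in zip(pred_pos, true_pos):
            a, b = to_local(fp, xp), to_local(ft, xt)
            row.append(math.sqrt(sum((a[k] - b[k]) ** 2 for k in range(3)) + eps))
        errs.append(row)
    return errs


def fape(errs, clamp: float = 10.0, z: float = 10.0) -> float:
    """Mean of clamped frame-aligned errors over all (frame, point) pairs, divided by z."""
    vals = [min(d, clamp) for row in errs for d in row]
    return sum(vals) / (len(vals) * z)


def cdr_focal_term(errs, cdr_mask, gamma: float = 2.0,
                   clamp: float = 10.0, z: float = 10.0) -> float:
    """HYPOTHETICAL: clamped errors on CDR points, weighted by (d / clamp) ** gamma."""
    vals = []
    for row in errs:
        for d, in_cdr in zip(row, cdr_mask):
            if in_cdr:
                dc = min(d, clamp)
                vals.append((dc / clamp) ** gamma * dc / z)
    return sum(vals) / len(vals) if vals else 0.0


def combined_loss(pred_frames, pred_pos, true_frames, true_pos,
                  cdr_mask, lambda_cdr: float = 1.0) -> float:
    """Simplified FAPE over all points plus a weighted, hypothetical CDR focal term."""
    errs = pairwise_errors(pred_frames, pred_pos, true_frames, true_pos)
    return fape(errs) + lambda_cdr * cdr_focal_term(errs, cdr_mask)
```

Because every point is compared in local frames, moving the whole prediction by a single global rotation and translation (for example with compose and apply) leaves the loss unchanged, so FAPE needs no superposition step. The same error placed on a CDR point costs more than on a non-CDR point, which is the intended effect of the extra term.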
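The abstract reports accuracy as RMSD but does not state which atoms or which alignment are used. A common choice is the RMSD between predicted and experimental coordinates (for example, Cα atoms) after optimal rigid superposition. The sketch below assumes that definition. It computes the value with Horn's quaternion key matrix and Newton iteration on its characteristic polynomial (the QCP approach), which needs only the standard library. This is a generic metric, not the paper's evaluation code.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def _centered(coords: Sequence[Point]) -> List[List[float]]:
    """Subtract the centroid so that only rotation remains to be optimized."""
    n = len(coords)
    c = [sum(p[k] for p in coords) / n for k in range(3)]
    return [[p[k] - c[k] for k in range(3)] for p in coords]


def _det3(m) -> float:
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _det4(m) -> float:
    # Laplace expansion along the first row.
    return sum((-1) ** j * m[0][j]
               * _det3([[m[r][c] for c in range(4) if c != j] for r in range(1, 4)])
               for j in range(4))


def superposed_rmsd(a: Sequence[Point], b: Sequence[Point]) -> float:
    """RMSD between two equal-length coordinate sets after optimal rigid superposition."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    a, b = _centered(a), _centered(b)
    n = len(a)
    # Correlation matrix S[i][j] = sum_k a_k[i] * b_k[j].
    s = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Horn's symmetric key matrix: its largest eigenvalue is max over rotations R
    # of sum_k a_k . (R b_k).
    k = [[sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
         [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
         [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
         [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz]]
    e0 = sum(x * x for p in a for x in p) + sum(x * x for p in b for x in p)
    # Characteristic polynomial lam^4 + c2 lam^2 + c1 lam + c0 (trace of k is zero).
    c2 = -2.0 * sum(v * v for row in s for v in row)
    c1 = -8.0 * _det3(s)
    c0 = _det4(k)
    # Newton's method from the upper bound e0 / 2 descends to the largest root.
    lam = e0 / 2.0
    for _ in range(100):
        f = ((lam * lam + c2) * lam + c1) * lam + c0
        df = (4.0 * lam * lam + 2.0 * c2) * lam + c1
        if df == 0.0:
            break
        step = f / df
        lam -= step
        if abs(step) < 1e-12 * max(1.0, abs(lam)):
            break
    # Minimal sum of squared deviations is e0 - 2 * lam_max.
    return math.sqrt(max(0.0, (e0 - 2.0 * lam) / n))
```

For example, a structure and a rigidly rotated and translated copy of it give an RMSD of essentially zero, while displacing a single atom gives a positive value. Only the largest eigenvalue is needed for the RMSD itself, which keeps the sketch short. Recovering the rotation would additionally require its eigenvector.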