2 research outputs found
TaxDiff: Taxonomic-Guided Diffusion Model for Protein Sequence Generation
Designing protein sequences with specific biological functions and structural
stability is crucial in biology and chemistry. Generative models already
demonstrated their capabilities for reliable protein design. However, previous
models are limited to the unconditional generation of protein sequences and
lack the controllable generation ability that is vital to biological tasks. In
this work, we propose TaxDiff, a taxonomic-guided diffusion model for
controllable protein sequence generation that combines biological species
information with the generative capabilities of diffusion models to generate
structurally stable proteins within the sequence space. Specifically, taxonomic
control information is inserted into each layer of the transformer block to
achieve fine-grained control. The combination of global and local attention
ensures the sequence consistency and structural foldability of
taxonomic-specific proteins. Extensive experiments demonstrate that TaxDiff can
consistently achieve better performance on multiple protein sequence generation
benchmarks in both taxonomic-guided controllable generation and unconditional
generation. Remarkably, the sequences generated by TaxDiff even surpass those
produced by direct-structure-generation models in terms of confidence based on
predicted structures and require only a quarter of the time of models based on
the diffusion model. The code for generating proteins and training new versions
of TaxDiff is available at:https://github.com/Linzy19/TaxDiff
HPC-Atlas: Computationally Constructing A Comprehensive Atlas of Human Protein Complexes
A fundamental principle of biology is that proteins tend to form complexes to play important roles in the core functions of cells. For a complete understanding of human cellular functions, it is crucial to have a comprehensive atlas of human protein complexes. Unfortunately, we still lack such a comprehensive atlas of experimentally validated protein complexes, which prevents us from gaining a complete understanding of the compositions and functions of human protein complexes, as well as the underlying biological mechanisms. To fill this gap, we built Human Protein Complexes Atlas (HPC-Atlas), as far as we know, the most accurate and comprehensive atlas of human protein complexes available to date. We integrated two latest protein interaction networks, and developed a novel computational method to identify nearly 9000 protein complexes, including many previously uncharacterized complexes. Compared with the existing methods, our method achieved outstanding performance on both testing and independent datasets. Furthermore, with HPC-Atlas we identified 751 severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)-affected human protein complexes, and 456 multifunctional proteins that contain many potential moonlighting proteins. These results suggest that HPC-Atlas can serve as not only a computing framework to effectively identify biologically meaningful protein complexes by integrating multiple protein data sources, but also a valuable resource for exploring new biological findings. The HPC-Atlas webserver is freely available at http://www.yulpan.top/HPC-Atlas