Leveraging Machine Learning Models for Peptide-Protein Interaction Prediction
Peptides play a pivotal role in a wide range of biological activities by
participating in up to 40% of protein-protein interactions in cellular processes.
They also demonstrate remarkable specificity and efficacy, making them
promising candidates for drug development. However, predicting peptide-protein
complexes with traditional computational approaches, such as docking and
molecular dynamics simulations, remains a challenge due to the high
computational cost, the flexible nature of peptides, and the limited structural
information on peptide-protein complexes. In recent years, the surge of
available biological data has given rise to the development of an increasing
number of machine learning models for predicting peptide-protein interactions.
These models offer efficient solutions to address the challenges associated
with traditional computational approaches. Furthermore, they offer enhanced
accuracy, robustness, and interpretability in their predictive outcomes. This
review presents a comprehensive overview of machine learning and deep learning
models that have emerged in recent years for the prediction of peptide-protein
interactions.
Comment: 46 pages, 10 figures
PCfun: a hybrid computational framework for systematic characterization of protein complex function
In molecular biology, it is a general assumption that the ensemble of expressed molecules, their activities and interactions determine biological function, cellular states and phenotypes. Stable protein complexes (macromolecular machines) are, in turn, the key functional entities mediating and modulating most biological processes. Although identifying protein complexes and their subunit composition can now be done inexpensively and at scale, determining their function remains challenging and labor-intensive. This study describes Protein Complex Function predictor (PCfun), the first computational framework for the systematic annotation of protein complex functions using Gene Ontology (GO) terms. PCfun is built upon a word embedding trained with natural language processing techniques on 1 million open access PubMed Central articles. Specifically, PCfun leverages two approaches for accurately identifying protein complex function: (i) an unsupervised approach that obtains the nearest neighbor (NN) GO term word vectors for a protein complex query vector and (ii) a supervised approach using Random Forest (RF) models trained specifically for recovering the GO terms of protein complex queries described in the CORUM protein complex database. PCfun consolidates both approaches by performing a hypergeometric statistical test to enrich the top NN GO terms within the child terms of the GO terms predicted by the RF models. The documentation and implementation of the PCfun package are available at https://github.com/sharmavaruns/PCfun. We anticipate that PCfun will serve as a useful tool and novel paradigm for the large-scale characterization of protein complex function.
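The two ingredients named in this abstract, nearest-neighbor lookup over GO term vectors and a hypergeometric enrichment test, can be illustrated with a minimal sketch. This is not the PCfun implementation: the function names, the toy two-dimensional vectors, the `GO:A` to `GO:D` identifiers, and the way the test parameters are mapped (population = all GO terms, successes = child terms of the RF-predicted terms, draws = top NN terms, observed = their overlap) are all assumptions made for illustration.

```python
from math import comb, sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_terms(query, term_vectors, k):
    """Return the k term names whose vectors are most similar to the query."""
    ranked = sorted(term_vectors,
                    key=lambda t: cosine(query, term_vectors[t]),
                    reverse=True)
    return ranked[:k]

def hypergeom_pvalue(population, successes, draws, observed):
    """Upper tail P(X >= observed) of a hypergeometric distribution."""
    total = comb(population, draws)
    upper = min(successes, draws)
    tail = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(observed, upper + 1))
    return tail / total

# Hypothetical toy embedding: four GO-like terms in two dimensions.
term_vectors = {
    "GO:A": [1.0, 0.0],
    "GO:B": [0.9, 0.1],
    "GO:C": [0.0, 1.0],
    "GO:D": [-1.0, 0.0],
}
query = [1.0, 0.05]                      # stand-in for a complex query vector
top = nearest_terms(query, term_vectors, k=2)

rf_children = {"GO:A", "GO:B"}           # assumed child terms of RF predictions
overlap = len(rf_children.intersection(top))
p = hypergeom_pvalue(len(term_vectors), len(rf_children), len(top), overlap)
print(top, p)
```

A small p-value here would mean the top NN terms fall within the RF-predicted subtree more often than expected by chance; the real package may define the populations differently.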
SyNDock: N Rigid Protein Docking via Learnable Group Synchronization
The regulation of various cellular processes heavily relies on the protein
complexes within a living cell, necessitating a comprehensive understanding of
their three-dimensional structures to elucidate the underlying mechanisms.
While neural docking techniques have exhibited promising outcomes in binary
protein docking, the application of advanced neural architectures to multimeric
protein docking remains uncertain. This study introduces SyNDock, an automated
framework that swiftly assembles precise multimeric complexes within seconds,
showcasing performance that can potentially surpass or be on par with recent
advanced approaches. SyNDock possesses several appealing advantages not present
in previous approaches. Firstly, SyNDock formulates multimeric protein docking
as a problem of learning global transformations to holistically depict the
placement of chain units of a complex, enabling a learning-centric solution.
Secondly, SyNDock proposes a trainable two-step SE(3) algorithm, involving
initial pairwise transformation and confidence estimation, followed by global
transformation synchronization. This enables effective learning for assembling
the complex in a globally consistent manner. Lastly, extensive experiments
conducted on our proposed benchmark dataset demonstrate that SyNDock
outperforms existing docking software in crucial performance metrics, including
accuracy and runtime. For instance, it achieves a 4.5% improvement in
performance and a remarkable millionfold acceleration in speed.
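The idea of turning pairwise transformations with confidences into globally consistent placements can be illustrated with a much simpler, non-learned analogue. The sketch below is not SyNDock's SE(3) algorithm: it handles planar rotation angles only, and the function name, input layout, and lazy power iteration scheme are assumptions chosen for illustration. Each pairwise measurement is taken to approximate `theta_i - theta_j`, and chain 0 is fixed as the reference because global placements are only defined up to a common rotation.

```python
import cmath
from math import sqrt

def synchronize_angles(n, pairwise, iters=100):
    """Recover n global angles from confidence-weighted pairwise differences.

    pairwise maps (i, j) to (angle_ij, confidence), where angle_ij is an
    estimate of theta_i - theta_j. Returns angles relative to chain 0.
    Assumes the measurement graph is connected.
    """
    # Hermitian matrix of weighted relative rotations, plus node degrees.
    H = [[0j] * n for _ in range(n)]
    deg = [0.0] * n
    for (i, j), (angle, w) in pairwise.items():
        H[i][j] = w * cmath.exp(1j * angle)
        H[j][i] = w * cmath.exp(-1j * angle)
        deg[i] += w
        deg[j] += w

    # Lazy power iteration: each chain averages its own estimate with the
    # confidence-weighted predictions made by its neighbours.
    v = [1 + 0j] * n
    for _ in range(iters):
        v = [0.5 * v[r] + 0.5 * sum(H[r][c] * v[c] for c in range(n)) / deg[r]
             for r in range(n)]
        norm = sqrt(sum(abs(x) ** 2 for x in v))
        v = [x / norm for x in v]

    # Remove the global gauge by expressing every angle relative to chain 0.
    return [cmath.phase(x / v[0]) for x in v]

# Noise-free toy example with hypothetical ground-truth angles 0, 0.5, -1.2.
pairwise = {
    (0, 1): (-0.5, 1.0),
    (0, 2): (1.2, 0.5),
    (1, 2): (1.7, 2.0),
}
angles = synchronize_angles(3, pairwise)
print(angles)
```

With noisy measurements the same iteration returns a consensus that favours high-confidence pairs, which is the role the abstract assigns to confidence estimation before synchronization.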
Mass & secondary structure propensity of amino acids explain their mutability and evolutionary replacements
Why is an amino acid replacement in a protein accepted during evolution? The answer given by bioinformatics relies on the frequency of change of each amino acid by another one and the propensity of each to remain unchanged. We propose that these replacement rules are recoverable from the secondary structural trends of amino acids. A distance measure between high-resolution Ramachandran distributions reveals that structurally similar residues coincide with those found in substitution matrices such as BLOSUM: Asn ↔ Asp, Phe ↔ Tyr, Lys ↔ Arg, Gln ↔ Glu, Ile ↔ Val, Met → Leu; with Ala, Cys, His, Gly, Ser, Pro, and Thr as structurally idiosyncratic residues. We also found a high average correlation ($\overline{R} = 0.85$) between thirty amino acid mutability scales and the mutational inertia ($I_X$), which measures the energetic cost weighted by the number of observations at the most probable amino acid conformation. These results indicate that amino acid substitutions follow two optimally efficient principles: (a) amino acid interchangeability privileges secondary structural similarity, and (b) amino acid mutability depends directly on biosynthetic energy cost and inversely on frequency. These two principles are the underlying rules governing the observed amino acid substitutions. © 2017 The Author(s)