24,275 research outputs found
TopologyNet: Topology based deep convolutional neural networks for biomolecular property predictions
Although deep learning approaches have had tremendous success in image, video
and audio processing, computer vision, and speech recognition, their
applications to three-dimensional (3D) biomolecular structural data sets have
been hindered by the entangled geometric complexity and biological complexity.
We introduce topology, i.e., element specific persistent homology (ESPH), to
untangle geometric complexity and biological complexity. ESPH represents 3D
complex geometry by one-dimensional (1D) topological invariants and retains
crucial biological information via a multichannel image representation. It is
able to reveal hidden structure-function relationships in biomolecules. We
further integrate ESPH and convolutional neural networks to construct a
multichannel topological neural network (TopologyNet) for the predictions of
protein-ligand binding affinities and protein stability changes upon mutation.
To overcome the limitations to deep learning arising from small and noisy
training sets, we present a multitask topological convolutional neural network
(MT-TCNN). We demonstrate that the present TopologyNet architectures outperform
other state-of-the-art methods in the predictions of protein-ligand binding
affinities, globular protein mutation impacts, and membrane protein mutation
impacts.Comment: 20 pages, 8 figures, 5 table
Functional Diversity and Structural Disorder in the Human Ubiquitination Pathway
The ubiquitin-proteasome system plays a central role in cellular regulation and protein quality control (PQC). The system is built as a pyramid of increasing complexity, with two E1 (ubiquitin activating), few dozen E2 (ubiquitin conjugating) and several hundred E3 (ubiquitin ligase) enzymes. By collecting and analyzing E3 sequences from the KEGG BRITE database and literature, we assembled a coherent dataset of 563 human E3s and analyzed their various physical features. We found an increase in structural disorder of the system with multiple disorder predictors (IUPred - E1: 5.97%, E2: 17.74%, E3: 20.03%). E3s that can bind E2 and substrate simultaneously (single subunit E3, ssE3) have significantly higher disorder (22.98%) than E3s in which E2 binding (multi RING-finger, mRF, 0.62%), scaffolding (6.01%) and substrate binding (adaptor/substrate recognition subunits, 17.33%) functions are separated. In ssE3s, the disorder was localized in the substrate/adaptor binding domains, whereas the E2-binding RING/HECT-domains were structured. To demonstrate the involvement of disorder in E3 function, we applied normal modes and molecular dynamics analyses to show how a disordered and highly flexible linker in human CBL (an E3 that acts as a regulator of several tyrosine kinase-mediated signalling pathways) facilitates long-range conformational changes bringing substrate and E2-binding domains towards each other and thus assisting in ubiquitin transfer. E3s with multiple interaction partners (as evidenced by data in STRING) also possess elevated levels of disorder (hubs, 22.90% vs. non-hubs, 18.36%). Furthermore, a search in PDB uncovered 21 distinct human E3 interactions, in 7 of which the disordered region of E3s undergoes induced folding (or mutual induced folding) in the presence of the partner. In conclusion, our data highlights the primary role of structural disorder in the functions of E3 ligases that manifests itself in the substrate/adaptor binding functions as well as the mechanism of ubiquitin transfer by long-range conformational transitions. © 2013 Bhowmick et al
Kernel-based machine learning protocol for predicting DNA-binding proteins
DNA-binding proteins (DNA-BPs) play a pivotal role in various intra- and extra-cellular activities ranging from DNA replication to gene expression control. Attempts have been made to identify DNA-BPs based on their sequence and structural information with moderate accuracy. Here we develop a machine learning protocol for the prediction of DNA-BPs where the classifier is Support Vector Machines (SVMs). Information used for classification is derived from characteristics that include surface and overall composition, overall charge and positive potential patches on the protein surface. In total 121 DNA-BPs and 238 non-binding proteins are used to build and evaluate the protocol. In self-consistency, accuracy value of 100% has been achieved. For cross-validation (CV) optimization over entire dataset, we report an accuracy of 90%. Using leave 1-pair holdout evaluation, the accuracy of 86.3% has been achieved. When we restrict the dataset to less than 20% sequence identity amongst the proteins, the holdout accuracy is achieved at 85.8%. Furthermore, seven DNA-BPs with unbounded structures are all correctly predicted. The current performances are better than results published previously. The higher accuracy value achieved here originates from two factors: the ability of the SVM to handle features that demonstrate a wide range of discriminatory power and, a different definition of the positive patch. Since our protocol does not lean on sequence or structural homology, it can be used to identify or predict proteins with DNA-binding function(s) regardless of their homology to the known ones
Recommended from our members
Adaptations of Escherichia coli strains to oxidative stress are reflected in properties of their structural proteomes.
BACKGROUND:The reconstruction of metabolic networks and the three-dimensional coverage of protein structures have reached the genome-scale in the widely studied Escherichia coli K-12 MG1655 strain. The combination of the two leads to the formation of a structural systems biology framework, which we have used to analyze differences between the reactive oxygen species (ROS) sensitivity of the proteomes of sequenced strains of E. coli. As proteins are one of the main targets of oxidative damage, understanding how the genetic changes of different strains of a species relates to its oxidative environment can reveal hypotheses as to why these variations arise and suggest directions of future experimental work. RESULTS:Creating a reference structural proteome for E. coli allows us to comprehensively map genetic changes in 1764 different strains to their locations on 4118 3D protein structures. We use metabolic modeling to predict basal ROS production levels (ROStype) for 695 of these strains, finding that strains with both higher and lower basal levels tend to enrich their proteomes with antioxidative properties, and speculate as to why that is. We computationally assess a strain's sensitivity to an oxidative environment, based on known chemical mechanisms of oxidative damage to protein groups, defined by their localization and functionality. Two general groups - metalloproteins and periplasmic proteins - show enrichment of their antioxidative properties between the 695 strains with a predicted ROStype as well as 116 strains with an assigned pathotype. Specifically, proteins that a) utilize a molybdenum ion as a cofactor and b) are involved in the biogenesis of fimbriae show intriguing protective properties to resist oxidative damage. Overall, these findings indicate that a strain's sensitivity to oxidative damage can be elucidated from the structural proteome, though future experimental work is needed to validate our model assumptions and findings. CONCLUSION:We thus demonstrate that structural systems biology enables a proteome-wide, computational assessment of changes to atomic-level physicochemical properties and of oxidative damage mechanisms for multiple strains in a species. This integrative approach opens new avenues to study adaptation to a particular environment based on physiological properties predicted from sequence alone
Two polymorphisms facilitate differences in plasticity between two chicken major histocompatibility complex class I proteins
Major histocompatibility complex class I molecules (MHC I) present peptides to cytotoxic T-cells at the surface of almost all nucleated cells. The function of MHC I molecules is to select high affinity peptides from a large intracellular pool and they are assisted in this process by co-factor molecules, notably tapasin. In contrast to mammals, MHC homozygous chickens express a single MHC I gene locus, termed BF2, which is hypothesised to have co-evolved with the highly polymorphic tapasin within stable haplotypes. The BF2 molecules of the B15 and B19 haplotypes have recently been shown to differ in their interactions with tapasin and in their peptide selection properties. This study investigated whether these observations might be explained by differences in the protein plasticity that is encoded into the MHC I structure by primary sequence polymorphisms. Furthermore, we aimed to demonstrate the utility of a complimentary modelling approach to the understanding of complex experimental data. Combining mechanistic molecular dynamics simulations and the primary sequence based technique of statistical coupling analysis, we show how two of the eight polymorphisms between BF2*15:01 and BF2*19:01 facilitate differences in plasticity. We show that BF2*15:01 is intrinsically more plastic than BF2*19:01, exploring more conformations in the absence of peptide. We identify a protein sector of contiguous residues connecting the membrane bound ?3 domain and the heavy chain peptide binding site. This sector contains two of the eight polymorphic residues. One is residue 22 in the peptide binding domain and the other 220 is in the ?3 domain, a putative tapasin binding site. These observations are in correspondence with the experimentally observed functional differences of these molecules and suggest a mechanism for how modulation of MHC I plasticity by tapasin catalyses peptide selection allosterically
Development of Integrated Machine Learning and Data Science Approaches for the Prediction of Cancer Mutation and Autonomous Drug Discovery of Anti-Cancer Therapeutic Agents
Few technological ideas have captivated the minds of biochemical researchers to the degree that machine learning (ML) and artificial intelligence (AI) have. Over the last few years, advances in the ML field have driven the design of new computational systems that improve with experience and are able to model increasingly complex chemical and biological phenomena. In this dissertation, we capitalize on these achievements and use machine learning to study drug receptor sites and design drugs to target these sites. First, we analyze the significance of various single nucleotide variations and assess their rate of contribution to cancer. Following that, we used a portfolio of machine learning and data science approaches to design new drugs to target protein kinase inhibitors. We show that these techniques exhibit strong promise in aiding cancer research and drug discovery
- …