5,919 research outputs found

    Knob-socket Investigation of Stability and Specificity in Alpha-helical Secondary and Quaternary Packing Structure

    The novel knob-socket (KS) model provides a construct to interpret and analyze the direct contributions of amino acid residues to the stability of α-helical protein structures. Based on residue preferences derived from a set of protein structures, the KS construct characterizes intra- and inter-helical packing into regular patterns of simple motifs. The KS model was used in the de novo design of an α-helical homodimer, KSα1.1. Using site-directed mutagenesis, KSα1.1 point mutants were designed to selectively increase and decrease stability by relating KS propensities with changes to α-helical structure. This study suggests that the sockets from the KS model can be used as a measure of α-helical structure and stability. The KS model was also used to investigate coiled-coil specificity in bZIP proteins. Identifying and characterizing the interactions that determine the dimerization specificity between bZIP proteins is crucial to better understanding disease formation and proliferation, as well as to developing drugs or therapeutics to combat these diseases. Knob-Socket mapping methods identified Asn residues at the a positions within the helices, and these residues were determined to be crucial factors in coiled-coil specificity. Site-directed mutagenesis was conducted to investigate the role of the Asn residues, as well as the role played by the neighboring residues at the g and b positions. The results indicate that the Asn at the a position defines coiled-coil specificity, and that the Knob-Socket model can be used to determine bZIP protein quaternary interactions.

    Characterization of Protein Folding Pathways and Structural Stability

    Proteins are large, flexible molecules with an extremely large number of potential conformations. Proteins expressed in cells traverse available conformations to reach a consistent, thermodynamically stable, biologically active structure through a process known as protein folding. The atomic composition of the protein, defined by a sequence of amino acid residues encoded in DNA as a gene, determines the protein folding pathway and ultimate native structure of the protein molecule. Understanding the relationship between the sequence of amino acids and the resulting protein structure has been a central challenge in protein research for decades. To fill this knowledge gap, we test the hypothesis that the distribution of conformers observed for a short protein sequence across all known protein structures reflects that sequence's intrinsic structural properties. Qualitative and quantitative predictions based on our model are tested against experimental data for protein stability and folding pathways. Replica-exchange Monte Carlo simulations, data mining of the Worldwide Protein Data Bank (wwPDB), analysis of published protein stability data, thermodynamic and kinetic folding experiments, and X-ray crystallography were used to characterize the structural properties of amino acid sequences. The role of turn sequences in guiding the protein folding process was extensively characterized by the combined methods. Turn composition, structural preferences, and cooperation with neighboring residues determined whether a turn had an active, passive, or counter-active role in a protein's folding process. Proline-rich turns, NPSNP and KPSDP, from the two-helix bundles found in bacterial type III secretion system needle proteins form native-like structure early in the folding process. Each of these turns is flanked by sequences with very high helix propensity that, when oriented by the turn, can actively nucleate the hydrophobic core of the protein. 
The hydrophobic turn, MGYE, from the three-helix bundle UBA(1) also forms native-like structure early in the folding process. This turn structure places the Met (M) and Tyr (Y) residues together, nucleating the hydrophobic core of UBA(1). These two residues can then stabilize the adjacent helices to form a Helix-Turn-Helix structure. The second, proline-containing turn in UBA(1), ASYNNP, forms non-native structure early in the folding process. This turn restructures late in the folding process when the third helix docks to the previous Helix-Turn-Helix structure. Each of the active turns characterized (NPSNP, KPSDP, and MGYE) directs the folding process by nucleating the protein's hydrophobic core. A general-purpose computational method to model the local structural properties of protein sequences was developed from data mined from the wwPDB. Turn mechanisms can be rapidly characterized using the tool, EmCAST, in conjunction with a PDB structure of the protein of interest. The impact of surface mutations on protein stability can also be scored by EmCAST. Models and calculations were extensively validated against experimental data for multiple protein and peptide systems. Calculations for stabilizing mutations at well-structured positions in UBA(1) produced a near-perfect correlation with experimental measurements (R² = 0.97). A user-friendly web interface to the software was developed to share the method with other protein researchers. Our model provides key insights into the protein sequence/structure relationship that can be used to characterize protein surface stability, identify regions with dynamic structure, and predict protein folding intermediates.

    From parasite genomes to one healthy world: Are we having fun yet?

    In 1990, the Human Genome Sequencing Project was established. This laid the groundwork for the explosion of sequence data that has since followed. As a result of this effort, the first complete genome of an animal, Caenorhabditis elegans, was published in 1998. The sequence of Drosophila melanogaster was made available in March 2000, and in the following year working drafts of the human genome were generated, with the completed sequence (92%) being released in 2003. Recent advancements and next-generation technologies have made sequencing commonplace and have infiltrated every aspect of biological research, including parasitology. To date, sequencing of 32 apicomplexa and 24 nematode genomes is either in progress or near completion, and over 600k nematode EST and 200k apicomplexa EST submissions fill the databases. However, the winds have shifted and efforts are now refocusing on how best to store, mine and apply these data to problem solving. Herein we tend not to summarize existing X-omics datasets or present new technological advances that promise future benefits. Rather, the information to follow condenses up-to-date applications of existing technologies to problem solving as it relates to parasite research. Advancements in non-parasite systems are also presented with the proviso that applications to parasite research are in the making.

    Machine Learning in Enzyme Engineering

    Enzyme engineering plays a central role in developing efficient biocatalysts for biotechnology, biomedicine, and life sciences. Apart from classical rational design and directed evolution approaches, machine learning methods have been increasingly applied to find patterns in data that help predict protein structures, improve enzyme stability, solubility, and function, predict substrate specificity, and guide rational protein design. In this Perspective, we analyze the state of the art in databases and methods used for training and validating predictors in enzyme engineering. We discuss current limitations and challenges which the community is facing and recent advancements in experimental and theoretical methods that have the potential to address those challenges. We also present our view on possible future directions for developing the applications to the design of efficient biocatalysts

    Mapping the proteome with data-driven methods: A cycle of measurement, modeling, hypothesis generation, and engineering

    The living cell exhibits emergence of complex behavior and its modeling requires a systemic, integrative approach if we are to thoroughly understand and harness it. The work in this thesis has had the narrower aim of quantitatively characterizing and mapping the proteome using data-driven methods, as proteins perform most functional and structural roles within the cell. Covered are the different parts of the cycle from improving quantification methods, to deriving protein features relying on their primary structure, predicting the protein content solely from sequence data, and, finally, to developing theoretical protein engineering tools, leading back to experiment. High-throughput mass spectrometry platforms provide detailed snapshots of a cell's protein content, which can be mined towards understanding how the phenotype arises from genotype and the interplay between the various properties of the constituent proteins. However, these large and dense data present an increased analysis challenge and current methods capture only a small fraction of signal. The first part of my work has involved tackling these issues with the implementation of a GPU-accelerated and distributed signal decomposition pipeline, making factorization of large proteomics scans feasible and efficient. The pipeline yields individual analyte signals spanning the majority of acquired signal, enabling high precision quantification and further analytical tasks. Having such detailed snapshots of the proteome enables a multitude of undertakings. One application has been to use a deep neural network model to learn the amino acid sequence determinants of temperature adaptation, in the form of reusable deep model features. More generally, systemic quantities may be predicted from the information encoded in sequence by evolutionary pressure. 
Two studies taking inspiration from natural language processing have sought to learn the grammars behind the languages of expression, in one case predicting mRNA levels from DNA sequence, and in the other protein abundance from amino acid sequence. These two models helped build a quantitative understanding of the central dogma and, furthermore, in combination yielded an improved predictor of protein amount. Finally, a mathematical framework relying on the embedded space of a deep model has been constructed to assist guided mutation of proteins towards optimizing their abundance

    RNA structure analysis: algorithms and applications

    In this doctoral thesis, efficient algorithms for aligning RNA secondary structures and mining unknown RNA motifs are presented. As the major contribution, a structure alignment algorithm, which combines both primary and secondary structure information, can find the optimal alignment between two given structures, where one of them is either a pattern structure of a known motif or a real query structure and the other is a subject structure. Motivated by widely used algorithms for RNA folding, the proposed algorithm decomposes an RNA secondary structure into a set of atomic structural components that can be further organized in a tree model to capture the structural particularities. The novel structure alignment algorithm is implemented using dynamic programming techniques coupled with position-independent scoring matrices. The algorithm can find the optimal global and local alignments between two RNA secondary structures in quadratic time. When applied to searching a structure database, the algorithm can find similar RNA substructures and therefore can be used to identify functional RNA motifs. The algorithm has also been extended to handle position-dependent scoring matrices for the purpose of aligning multiple structures. All algorithms have been implemented in a package under the name RSmatch and applied to searching an mRNA UTR structure database and mining RNA motifs. The experimental results showed high efficiency and effectiveness of the proposed techniques.

    INTEGRATIVE ANALYSIS OF OMICS DATA IN ADULT GLIOMA AND OTHER TCGA CANCERS TO GUIDE PRECISION MEDICINE

    Transcriptomic profiling and gene expression signatures have been widely applied as effective approaches for enhancing the molecular classification, diagnosis, prognosis or prediction of therapeutic response towards personalized therapy for cancer patients. Thanks to modern genome-wide profiling technology, scientists are able to build engines that leverage massive genomic variations and integrate them with clinical data to identify “at risk” individuals for the sake of prevention, diagnosis and therapeutic interventions. In my graduate work for my Ph.D. thesis, I have investigated genomic sequencing data mining to comprehensively characterise molecular classifications and aberrant genomic events associated with clinical prognosis and treatment response, applying high-dimensional omics data to promote the understanding of gene signatures and somatic molecular alterations contributing to cancer progression and clinical outcomes. Following this motivation, my dissertation has been focused on the following three topics in translational genomics. 1) Characterization of transcriptomic plasticity and its association with the tumor microenvironment in glioblastoma (GBM). I have integrated transcriptomic, genomic, protein and clinical data to increase the accuracy of GBM classification, and to identify the association between the GBM mesenchymal subtype and reduced tumor purity, accompanied by increased presence of tumor-associated microglia. 
I then identified the intrinsic tumor bulk, but not the corresponding neurosphere cells, as the sole source of microglia through both transcriptional and protein-level analysis using a panel of sphere-forming glioma cultures and their parent GBM samples. Furthermore, I have demonstrated through longitudinal analysis of paired primary and recurrent GBM samples my hypothesis that the phenotypic alterations of GBM subtypes are not due to an intrinsic proneural-to-mesenchymal transition in tumor cells; rather, they are intertwined with an increased level of microglia upon disease recurrence. Collectively, I have elucidated the critical role of the tumor microenvironment (microglia and macrophages from the central nervous system) in intra-tumor heterogeneity and in the accurate classification of GBM patients based on transcriptomic profiling, which will not only have a significant clinical impact but also pave the way for preclinical cancer research. 2) Identification of prognostic gene signatures that stratify adult diffuse glioma patients harboring 1p/19q co-deletions. I have compared multiple statistical methods and derived a gene signature significantly associated with survival by applying a machine learning algorithm. I have then identified inflammatory response and acetylation activity as associated with malignant progression of 1p/19q co-deleted glioma. In addition, I showed that this signature translates to other types of adult diffuse glioma, suggesting its universality in the pathobiology of other glioma subsets. My efforts on integrative data analysis of this highly curated data set using optimized statistical models will reflect the pending update to the WHO classification system of tumors in the central nervous system (CNS). 3) Comprehensive characterization of somatic fusion transcripts in Pan-Cancers. I have identified a panel of novel fusion transcripts across all of TCGA cancer types through transcriptomic profiling. 
I have then predicted fusion proteins with kinase activity and hub function of pathway network based on the annotation of genetically mobile domains and functional domain architectures. I have evaluated a panel of in-frame gene fusions as potential driver mutations based on the network fusion centrality hypothesis. I have also characterised the emerging complexity of genetic architecture in fusion transcripts by integrating genomic structure and somatic variants and delineating the distinct genomic patterns of fusion events across different cancer types. Overall, my exploration of the pathogenetic impact and clinical relevance of candidate gene fusions has provided fundamental insights into the management of a subset of cancer patients by predicting the oncogenic signalling and specific drug targets encoded by these fusion genes. Taken together, the translational genomic research I have conducted during my Ph.D. study will shed new light on precision medicine and contribute to the cancer research community. The novel classification concept, gene signature and fusion transcripts I have identified will address several hotly debated issues in translational genomics, such as complex interactions between tumor bulks and their adjacent microenvironments, prognostic markers for clinical diagnostics and personalized therapy, and distinct patterns of genomic structure alterations and oncogenic events in different cancer types, thereby facilitating our understanding of genomic alterations and moving us towards the development of precision medicine.

    Burial level change defines a high energetic relevance for protein binding interfaces

    Protein-protein interfaces defined through atomic contact or solvent accessibility change are widely adopted in structural biology studies. However, these definitions cannot precisely capture energetically important regions at protein interfaces. The burial depth of an atom in a protein is related to the atom's energy. This work investigates how closely the change in burial level of an atom/residue upon complexation is related to the binding. Burial level change is different from burial level itself. An atom deeply buried in a monomer, with a high burial level, may not change its burial level after an interaction and may therefore have little burial level change. We hypothesize that an interface is a region of residues all undergoing burial level changes after interaction. By this definition, an interface can be decomposed into an onion-like structure according to the extent of burial level change. We found that our defined interfaces cover energetically important residues more precisely, and that the binding free energy of an interface is distributed progressively from the outermost layer to the core. These observations are used to predict binding hot spots. Our approach's F-measure performance on a benchmark dataset of alanine mutagenesis residues is superior or similar to that of complicated energy modeling or machine learning approaches.

    Bioinformatics Techniques for Studying Drug Resistance In HIV and Staphylococcus Aureus

    The worldwide HIV/AIDS pandemic has been partly controlled and treated by antivirals targeting HIV protease, integrase and reverse transcriptase; however, drug resistance has become a serious problem. HIV-1 drug resistance to protease inhibitors evolves by mutations in the PR gene. The resistance mutations can alter protease catalytic activity, inhibitor binding, and stability. Different machine learning algorithms (restricted Boltzmann machines, clustering, etc.) have been shown to be effective tools for classification of genomic and resistance data. Application of a restricted Boltzmann machine produced highly accurate and robust classification of HIV protease resistance. These algorithms can also be used to compare resistance profiles of different protease inhibitors. HIV drug resistance has also been studied by enzyme kinetics and X-ray crystallography. A triple mutant HIV-1 protease with resistance mutations V32I, I47V and V82I has been used as a model for the active site of HIV-2 protease. The effects of four investigational antiviral inhibitors were measured for the triple mutant. The tested compounds had significantly worse inhibition of the triple mutant, with Ki values of 17-40 nM compared to 2-10 pM for wild type protease. The crystal structure of the triple mutant in complex with GRL01111 was solved and showed few changes in protease interactions with the inhibitor. These new inhibitors are not expected to be effective for HIV-2 protease or HIV-1 protease with changes V32I, I47V and V82I. Methicillin-resistant Staphylococcus aureus (MRSA) is an opportunistic pathogen that causes hospital and community-acquired infections. Antibiotic resistance occurs because of a newly acquired low-affinity penicillin-binding protein (PBP2a). Transcriptome analysis was performed to determine how MuM (mutated PBP2 gene) responds to spermine and how Mu50 (wild type) responds to spermine and spermine–β-lactam synergy. 
Exogenous spermine and oxacillin were found to alter several significant gene expression patterns associated with major biochemical pathways (iron, sigB regulon) in MRSA with mutant PBP2 protein.