9 research outputs found

    Identification and localization of Tospovirus genus-wide conserved residues in 3D models of the nucleocapsid and the silencing suppressor proteins

    Background: Tospoviruses (genus Tospovirus, family Peribunyaviridae, order Bunyavirales) cause significant losses to a wide range of agronomic and horticultural crops worldwide. Identification and characterization of specific sequences and motifs that are critical for virus infection and pathogenicity could provide useful insights and targets for engineering virus resistance that is potentially both broad spectrum and durable. Tomato spotted wilt virus (TSWV), the most prolific member of the group, was used to better understand the structure-function relationships of the nucleocapsid gene (N) and the silencing suppressor gene (NSs), coded by the TSWV small RNA. Methods: Using a global collection of orthotospoviral sequences, we identified several amino acids that were conserved across the genus and determined the potential location of these conserved amino acid motifs in the two proteins. We used state-of-the-art 3D modeling algorithms, MULTICOM-CLUSTER, MULTICOM-CONSTRUCT, MULTICOM-NOVEL, I-TASSER, ROSETTA and CONFOLD, to predict the secondary and tertiary structures of the N and NSs proteins. Results: We identified nine amino acid residues in the N protein among 31 known tospoviral species, and ten amino acid residues in the NSs protein among 27 tospoviral species, that were conserved across the genus. For the N protein, all three algorithms gave nearly identical tertiary models. While the conserved residues were distributed throughout the protein on a linear scale, at the tertiary level three residues were consistently located in the coil in all the models. For the NSs protein models, there was no agreement among the three algorithms. However, with respect to the localization of the conserved motifs, G was consistently located in the coil, while H was localized in the coil in three models. Conclusions: This is the first report predicting the 3D structure of any tospoviral NSs protein, and it revealed a consistent location for two of the ten conserved residues. The modelers used gave accurate predictions for the N protein, allowing the localization of the conserved residues. These results form the basis for further work on the structure-function relationships of tospoviral proteins and could be useful in developing novel virus control strategies targeting the conserved residues.
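
The genus-wide conservation analysis summarized in this abstract can be pictured as a column scan over a multiple sequence alignment. What follows is a minimal sketch, assuming the sequences are already aligned to equal length. The function name `conserved_columns` and the toy sequences are hypothetical and are not taken from the paper, which does not state how conservation was computed.

```python
def conserved_columns(aligned_seqs, gap="-"):
    """Return (index, residue) for every alignment column in which all
    sequences carry the same non-gap residue."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    # zip(*aligned_seqs) walks the alignment column by column.
    for index, column in enumerate(zip(*aligned_seqs)):
        residues = set(column)
        if len(residues) == 1 and gap not in residues:
            conserved.append((index, column[0]))
    return conserved


# Toy alignment (hypothetical sequences, for illustration only).
toy_alignment = [
    "MSKV-LTG",
    "MAKVRLSG",
    "MTKV-LNG",
]
print(conserved_columns(toy_alignment))
# -> [(0, 'M'), (2, 'K'), (3, 'V'), (5, 'L'), (7, 'G')]
```

Column indices are 0-based alignment positions. Mapping them back to residue numbers in a specific protein would require skipping that sequence's gap characters.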

    Combining Cryo-EM Density Map and Residue Contact for Protein Secondary Structure Topologies

    Although atomic structures have been determined directly from cryo-EM density maps with high resolutions, current structure determination methods for medium resolution (5 to 10 Å) cryo-EM maps are limited by the availability of structure templates. Secondary structure traces are lines detected from a cryo-EM density map for α-helices and β-strands of a protein. A topology of secondary structures defines the mapping between a set of sequence segments and a set of traces of secondary structures in three-dimensional space. In order to enhance accuracy in ranking secondary structure topologies, we explored a method that combines three sources of information: a set of sequence segments in 1D, a set of amino acid contact pairs in 2D, and a set of traces in 3D at the secondary structure level. A test of fourteen cases shows that the accuracy of predicted secondary structures is critical for deriving topologies. The use of significant long-range contact pairs is most effective at enriching the rank of the maximum-match topology for proteins with a large number of secondary structures, if the secondary structure prediction is fairly accurate. It was observed that the enrichment depends on the quality of initial topology candidates in this approach. We provide detailed analysis in various cases to show the potential and challenge when combining three sources of information.
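
The idea of a topology as a mapping from sequence segments to 3D traces, ranked with the help of contact pairs, can be illustrated with a small brute-force sketch. This is not the authors' method: it ignores trace direction, represents each trace by a single hypothetical center point, and scores a mapping simply by counting predicted contact pairs whose assigned traces lie within an assumed distance cutoff. All names and values (`rank_topologies`, `cutoff`, the toy data) are illustrative assumptions.

```python
from itertools import permutations
from math import dist


def rank_topologies(seg_types, trace_types, trace_centers, contacts, cutoff=10.0):
    """Enumerate type-consistent segment-to-trace mappings and rank them.

    seg_types / trace_types: lists of "H" (helix) or "E" (strand).
    trace_centers: one (x, y, z) point per trace, in angstroms.
    contacts: pairs of segment indices predicted to be in contact.
    Returns a list of (score, mapping), best first, where mapping[s] is
    the trace index assigned to segment s.
    """
    ranked = []
    for mapping in permutations(range(len(seg_types))):
        # A helix segment may only map to a helix trace, and likewise for strands.
        if any(seg_types[s] != trace_types[t] for s, t in enumerate(mapping)):
            continue
        # Count contact pairs that land on spatially close traces.
        score = sum(
            1
            for a, b in contacts
            if dist(trace_centers[mapping[a]], trace_centers[mapping[b]]) <= cutoff
        )
        ranked.append((score, mapping))
    ranked.sort(key=lambda item: -item[0])  # stable sort keeps enumeration order on ties
    return ranked


# Toy case (hypothetical): two helices and two strands.
seg_types = ["H", "E", "E", "H"]
trace_types = ["E", "H", "H", "E"]
trace_centers = [(0, 0, 0), (5, 0, 0), (5, 9, 0), (40, 0, 0)]
contacts = [(0, 1), (0, 3)]

ranked = rank_topologies(seg_types, trace_types, trace_centers, contacts)
print(ranked[0])  # -> (2, (1, 0, 3, 2))
```

Exhaustive enumeration grows factorially with the number of secondary structures, so a sketch like this is only practical for very small cases.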

    Improved computational methods of protein sequence alignment, model selection and tertiary structure prediction

    Protein sequence and profile alignment is used in most bioinformatics tasks, such as protein structure modeling, function prediction, and phylogenetic analysis. We designed a new algorithm, MSACompro, to incorporate predicted secondary structure, relative solvent accessibility, and residue-residue contact information into multiple protein sequence alignment. Our experiments showed that it improved multiple sequence alignment accuracy over most existing methods that do not use structural information, and performed comparably, with slightly lower scores, to the method that uses structural features and additional homologous sequences. We also developed HHpacom, a new profile-profile pairwise alignment method that integrates secondary structure, solvent accessibility, torsion angle and inferred residue pair coupling information. The evaluation showed that the secondary structure, relative solvent accessibility and torsion angle information significantly improved the alignment accuracy in comparison with the state-of-the-art methods HHsearch and HHsuite. The evolutionary constraint information did help in some cases, especially for alignments of short proteins, typically 100 to 500 residues. Protein model selection is also a key step in protein tertiary structure prediction. We developed two SVM model quality assessment methods that take query-template alignment as input. The assessment results illustrated that this could help improve model selection, protein structure prediction and many other bioinformatics problems. Moreover, we also developed a protein tertiary structure prediction pipeline, many components of which were built into our group's MULTICOM system. MULTICOM performed well in the CASP10 (Critical Assessment of Techniques for Protein Structure Prediction) competition.

    Mutational analysis of Kabuki Syndrome patients and functional dissection of KMT2D mutations

    The discovery of histone methyltransferase KMT2D and demethylase KDM6A genetic alterations in Kabuki Syndrome (KS) expanded and highlighted the role of histone modifiers in causing congenital anomalies and intellectual disability syndromes. KS is a rare autosomal dominant condition characterized by facial features, various organ malformations, postnatal growth deficiency, and intellectual disability. Since 2011 we have performed mutational screening of our KS cohort, which now includes 505 KS patients, by Sanger sequencing and MLPA of KMT2D, followed by KDM6A analysis in those patients who were KMT2D-negative. Among these patients, we identified 196/505 (39%) with KMT2D variants and 208 different KMT2D variants, of which 37/208 (18%) had never been described before. The majority of KS patients carry nonsense and splice-site variants, suggesting loss of function, and therefore haploinsufficiency, as the likely mechanism for the KS phenotype. RT-PCR and direct sequencing on cDNA from Kabuki patients carrying KMT2D splice-site variants demonstrated that these cause aberrant splicing of the corresponding transcript, resulting in a truncated, non-functional protein. Molecular assays also showed that KMT2D mRNAs bearing a premature stop codon are degraded by nonsense-mediated mRNA decay, contributing to KMT2D protein haploinsufficiency. We hypothesized that KS patients may benefit from a readthrough therapy that mediates translational suppression of nonsense variants, restoring physiological levels of endogenous KMT2D protein. Fourteen KMT2D nonsense variants were tested for their response to readthrough treatment through an in vitro dual reporter luciferase vector system, identifying 11/14 variants that displayed high levels of readthrough in response to gentamicin treatment. In our cohort we identified three new cases with mosaic variants in the KMT2D gene: single-nucleotide changes resulting in two already reported nonsense variants, c.13450C=/>T (p.R4484X) and c.15061C=/>T (p.R5021X), and a new frameshift variant, c.3596_3597=/del (p.L1199HfsX7). Moreover, relevant for diagnostic and counselling purposes, we implemented a number of bioinformatics tools to assess the pathogenicity of 69 KMT2D missense variants found overall in our cohort of 505 KS patients, and for 14 of them we adopted a combination of biochemical and cellular approaches to investigate their role and characterize their functional impact in the pathogenesis of the disease. We found 9/14 missense variants showing altered H3K4 methylation activity. We additionally assessed the impact on complex formation with the WRAD protein complex and found that the reduced methyltransferase activity could be a consequence of a lack of interaction.

    Methoden zur Vorhersage von komplexen biomolekularen Strukturen (Methods for the Prediction of Complex Biomolecular Structures)

    The first high-resolution structure of a protein was solved in 1958 by John Kendrew and Max Perutz. Since then, experimental structure determination has been an important part of biological research. However, determining the structures of biomolecular complexes is very difficult. These structures are nevertheless immensely important for understanding many biological phenomena at the molecular level. For this reason, a field of research has developed that uses computer-aided modeling to predict biomolecular structures. The aim of this doctoral thesis was to develop methods for predicting complex biomolecular structures. These methods are based on three different approaches. The first method was developed for proteins that consist of several domains. It uses existing structures of the individual domains together with experimental data that capture geometric relations between the domains, and it makes it possible to study conformational changes caused by external influences, such as the addition of a substrate. As a case study, the conformation of the flexible two-domain protein peptidylprolyl cis/trans isomerase NIMA-interacting 1 (Pin1) was investigated, along with its change in response to the addition of the substrate polyethylene glycol (PEG). The second method is based on the recent technique Direct Coupling Analysis (DCA), which makes it possible to predict geometric contacts between amino acids from a multiple sequence alignment (MSA). DCA uses a correction to avoid sampling bias caused by the selection of sequences for the MSA. The optimization presented here enables a more robust prediction of the geometric contacts. The optimized method was applied to the analysis of the Human Immunodeficiency Virus-1 envelope protein (HIV-1 Env). The last method was developed to predict binding regions of the negatively charged heparan sulfate on proteins. For this purpose we developed a model based on electrostatic interaction. The case studies here are various heparan sulfate-binding proteins, such as the chemokine CCL3 and the Hedgehog proteins. Overall, it is shown that for different kinds of biomolecular structures and complexes, modern computer-aided methods provide insights that are consistent with experiments.
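
The sampling-bias correction mentioned for DCA is commonly implemented as sequence reweighting: each sequence in the MSA is down-weighted by the number of sequences that are nearly identical to it. The sketch below shows that widely used scheme only. It is not the optimized variant developed in the thesis, and the 0.8 identity threshold, the function name `sequence_weights` and the toy alignment are assumptions made for illustration.

```python
def sequence_weights(msa, identity_threshold=0.8):
    """Weight each aligned sequence by 1 / (number of sequences, itself
    included, whose fractional identity to it reaches the threshold)."""
    weights = []
    for a in msa:
        neighbours = sum(
            1
            for b in msa
            if sum(x == y for x, y in zip(a, b)) / len(a) >= identity_threshold
        )
        weights.append(1.0 / neighbours)
    return weights


# Toy alignment (hypothetical): three near-duplicates and one distant sequence.
toy_msa = [
    "ACDEFGHIKL",
    "ACDEFGHIKV",
    "ACDEFGHIKL",
    "MNPQRSTVWY",
]
weights = sequence_weights(toy_msa)
effective_count = sum(weights)  # effective number of independent sequences
print(weights, effective_count)
```

In this toy case the three near-duplicates share one unit of weight between them, so the four sequences count as roughly two independent observations when residue frequencies are estimated.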

    Machine Learning based Protein Sequence to (un)Structure Mapping and Interaction Prediction

    Proteins are the fundamental macromolecules within a cell that carry out most of the biological functions. The computational study of protein structure and function, using machine learning and data analytics, is essential to advancing life-science research, given the fast-growing volume of biological data and the complexity of analyzing it to discover meaningful insights. Mapping a protein’s primary sequence need not be limited to its structure: we extend it to the disordered component, known as Intrinsically Disordered Proteins or Regions (IDPs/IDRs), and hence to the dynamics involved, which helps explain complex interactions within a cell that are otherwise obscured. The objective of this dissertation is to develop effective machine learning based tools to predict disordered proteins, their properties and dynamics, and their interaction paradigms by systematically mining and analyzing large-scale biological data. In this dissertation, we propose a robust framework to predict disordered proteins given only sequence information, using an optimized SVM with RBF kernel. Through appropriate reasoning, we highlight the structure-like behavior of IDPs in disease-associated complexes. Further, we develop a fast and effective predictor of Accessible Surface Area (ASA) of protein residues, a useful structural property that defines a protein’s exposure to partners, using regularized regression with a 3rd-degree polynomial kernel function and a genetic algorithm. As a key outcome of this research, we then introduce a novel method to extract position specific energy (PSEE) of protein residues by modeling the pairwise thermodynamic interactions and hydrophobic effect. PSEE is found to be an effective feature in identifying the enthalpy gain of the folded state of a protein and, otherwise, the neutral state of unstructured proteins. Moreover, we study the transient peptide-protein interactions that involve the induced folding of short peptides, which bind to an appropriate partner through disorder-to-order conformational changes. Using the stacked generalization ensemble technique, a suite of predictors is developed to identify, from protein sequence, the residue patterns of Peptide-Recognition Domains that can recognize and bind to peptide motifs and to phospho-peptides with post-translational modifications (PTMs) of amino acids, which are responsible for critical human diseases. The biologically relevant case studies involved demonstrate possibilities of discovering new knowledge using the developed tools.

    The MULTICOM toolbox for protein structure prediction

    Background: As genome sequencing is becoming routine in biomedical research, the total number of protein sequences is increasing exponentially, recently reaching over 108 million. However, only a tiny portion of these proteins (i.e. ~75,000 or < 0.07%) have solved tertiary structures determined by experimental techniques. The gap between protein sequence and structure continues to enlarge rapidly as the throughput of genome sequencing techniques is much higher than that of protein structure determination techniques. Computational software tools for predicting protein structure and structural features from protein sequences are crucial to make use of this vast repository of protein resources. Results: To meet the need, we have developed a comprehensive MULTICOM toolbox consisting of a set of protein structure and structural feature prediction tools. These tools include secondary structure prediction, solvent accessibility prediction, disorder region prediction, domain boundary prediction, contact map prediction, disulfide bond prediction, beta-sheet topology prediction, fold recognition, multiple template combination and alignment, template-based tertiary structure modeling, protein model quality assessment, and mutation stability prediction. Conclusions: These tools have been rigorously tested by many users in the last several years and/or during the last three rounds of the Critical Assessment of Techniques for Protein Structure Prediction (CASP7-9) from 2006 to 2010, achieving state-of-the-art or near performance. In order to facilitate bioinformatics research and technological development in the field, we have made the MULTICOM toolbox freely available as web services and/or software packages for academic use and scientific research. It is available at http://sysbio.rnet.missouri.edu/multicom_toolbox/.