1,030 research outputs found

    Comprehensive analysis of the HEPN superfamily: identification of novel roles in intra-genomic conflicts, defense, pathogenesis and RNA processing

    BACKGROUND: The major role of enzymatic toxins that target nucleic acids in biological conflicts at all levels has become increasingly apparent thanks in large part to the advances of comparative genomics. Typically, toxins evolve rapidly hampering the identification of these proteins by sequence analysis. Here we analyze an unexpectedly widespread superfamily of toxin domains most of which possess RNase activity. RESULTS: The HEPN superfamily is comprised of all α-helical domains that were first identified as being associated with DNA polymerase β-type nucleotidyltransferases in prokaryotes and animal Sacsin proteins. Using sensitive sequence and structure comparison methods, we vastly extend the HEPN superfamily by identifying numerous novel families and by detecting diverged HEPN domains in several known protein families. The new HEPN families include the RNase LS and LsoA catalytic domains, KEN domains (e.g. RNaseL and Ire1) and the RNase domains of RloC and PrrC. The majority of HEPN domains contain conserved motifs that constitute a metal-independent endoRNase active site. Some HEPN domains lacking this motif probably function as non-catalytic RNA-binding domains, such as in the case of the mannitol repressor MtlR. Our analysis shows that HEPN domains function as toxins that are shared by numerous systems implicated in intra-genomic, inter-genomic and intra-organismal conflicts across the three domains of cellular life. In prokaryotes HEPN domains are essential components of numerous toxin-antitoxin (TA) and abortive infection (Abi) systems and in addition are tightly associated with many restriction-modification (R-M) and CRISPR-Cas systems, and occasionally with other defense systems such as Pgl and Ter. We present evidence of multiple modes of action of HEPN domains in these systems, which include direct attack on viral RNAs (e.g. LsoA and RNase LS) in conjunction with other RNase domains (e.g. a novel RNase H fold domain, NamA), suicidal or dormancy-inducing attack on self RNAs (RM systems and possibly CRISPR-Cas systems), and suicidal attack coupled with direct interaction with phage components (Abi systems). These findings are compatible with the hypothesis on coupling of pathogen-targeting (immunity) and self-directed (programmed cell death and dormancy induction) responses in the evolution of robust antiviral strategies. We propose that altruistic cell suicide mediated by HEPN domains and other functionally similar RNases was essential for the evolution of kin and group selection and cell cooperation. HEPN domains were repeatedly acquired by eukaryotes and incorporated into several core functions such as endonucleolytic processing of the 5.8S-25S/28S rRNA precursor (Las1), a novel ER membrane-associated RNA degradation system (C6orf70), sensing of unprocessed transcripts at the nuclear periphery (Swt1). Multiple lines of evidence suggest that, similar to prokaryotes, HEPN proteins were recruited to antiviral, antitransposon, apoptotic systems or RNA-level response to unfolded proteins (Sacsin and KEN domains) in several groups of eukaryotes. CONCLUSIONS: Extensive sequence and structure comparisons reveal unexpectedly broad presence of the HEPN domain in an enormous variety of defense and stress response systems across the tree of life. In addition, HEPN domains have been recruited to perform essential functions, in particular in eukaryotic rRNA processing. These findings are expected to stimulate experiments that could shed light on diverse cellular processes across the three domains of life. REVIEWERS: This article was reviewed by Martijn Huynen, Igor Zhulin and Nick Grishi

    Integrated mining of feature spaces for bioinformatics domain discovery

    One of the major challenges in the field of bioinformatics is the elucidation of protein folding for the functional annotation of proteins. The factors that govern protein folding include the chemical, physical, and environmental conditions of the protein's surroundings, which can be measured and exploited for computational discovery purposes. These conditions enable the protein to transform from a sequence of amino acids to a globular three-dimensional structure. Information concerning the folded state of a protein has significant potential to explain biochemical pathways and their involvement in disorders and diseases. This information impacts the ways in which genetic diseases are characterized and cured and in which designer drugs are created. With the exponential growth of protein databases and the limitations of experimental protein structure determination, sophisticated computational methods have been developed and applied to search for, detect, and compare protein homology. Most computational tools developed for protein structure prediction are primarily based on sequence similarity searches. These approaches have improved the prediction accuracy of high sequence similarity proteins but have failed to perform well with proteins of low sequence similarity. Data mining offers unique algorithmic computational approaches that have been used widely in the development of automatic protein structure classification and prediction. In this dissertation, we present a novel approach for the integration of physico-chemical properties and effective feature extraction techniques for the classification of proteins. Our approaches overcome one of the major obstacles of data mining in protein databases, the encapsulation of different hydrophobicity residue properties into a much reduced feature space that possesses high degrees of specificity and sensitivity in protein structure classification. 
We have developed three unique computational algorithms for coherent feature extraction on selected scale properties of the protein sequence. When plagued by the problem of the unequal cardinality of proteins, our proposed integration scheme effectively handles the varied sizes of proteins and scales well with increasing dimensionality of these sequences. We also detail a two-fold methodology for protein functional annotation. First, we exhibit our success in creating an algorithm that provides a means to integrate multiple physico-chemical properties in the form of a multi-layered abstract feature space, with each layer corresponding to a physico-chemical property. Second, we discuss a wavelet-based segmentation approach that efficiently detects regions of property conservation across all layers of the created feature space. Finally, we present a unique graph-theory based algorithmic framework for the identification of conserved hydrophobic residue interaction patterns using identified scales of hydrophobicity. We report that these discriminatory features are specific to a family of proteins, which consist of conserved hydrophobic residues that are then used for structural classification. We also present our rigorously tested validation schemes, which report significant degrees of accuracy to show that homologous proteins exhibit the conservation of physico-chemical properties along the protein backbone. We conclude our discussion by summarizing our results and contributions and by listing our goals for future research

    Structural Investigation of Binding Events in Proteins

    Understanding the biophysical properties that describe protein binding events has allowed for the advancement of drug discovery through structure-based drug design and in silico methodology. The accuracy of these in silico methods depends entirely on the parameters that we determine for them. Many of these parameters are derived from the structural information we have obtained as a community and therein resides the importance of integrity of the quality of this structural data. First, the curation and contents of the Binding MOAD database are extensively described. This database serves as a repository of 25,759 high-quality, ligand-bound X-ray protein crystal structures complemented by 9138 hand-curated binding affinity data for as many of those ligands as appropriate. The newly implemented extended binding site feature is presented, establishing more robust definitions of ligand binding sites than those provided by other databases. Finally, the contents of Binding MOAD are compared to similar databases, establishing the value of our dataset and which purposes it best serves. Second, a robust dataset of 305 unique protein sequences with at least two ligand-bound and two ligand-free structures for each unique protein is cultivated from Binding MOAD and the PDB. Protein flexibility is assessed using C-alpha RMSD for backbone motion and chi-1 angles to quantify side-chain motions. We establish that there is no statistically significant difference between the available conformational space for the backbones or the side chains of unbound proteins when compared to their bound structures. Examining the change in occupied conformational space upon ligand binding reveals a statistically significant increase in backbone conformational space of minuscule magnitude, but a significant increase of side-chain conformational space. To quantify the conformational space available to the side chains, flexibility profiles are established for each amino acid. 
We found no correlation between backbone and side-chain flexibility. Parallels are then made to common practices in flexible docking techniques. Six binding-site prediction algorithms are then benchmarked on a derivation of the previously established dataset of 305 proteins. We assessed the performance of ligand-bound vs ligand-free structures with these methods and concluded that five of the six methods showed no preference for either structure type. The remaining method, Fpocket, showed decreased performance for ligand-free structures. There was a staggering amount of inconsistency in performance with the methods; different structures of the exact same protein could achieve wildly different rates of success with the same method. The performance of individual structures for all six methods indicated that success and failure rates were seemingly random. Finally, we establish no correlation between the performance of the same structures with different methods, or the performance of the structures with structure resolution, Cruickshank DPI, or number of unresolved residues in their binding sites. Last, we examine the chemical and physical properties of protein-protein interactions (PPIs) with regard to their geometric location in the interface. First, we found that the relative elevation changes of the protein interface landscapes demonstrate that these interfaces are not as flat as previously described. Second, the hollows of druggable PPI interfaces are more sharply shaped and nonpolar in nature, and the protrusions of these druggable PPI interfaces are very polar in character. Last, no correlations exist between the binding affinity describing the subunits of a PPI and other physical and chemical parameters that we measured. PhD, Medicinal Chemistry, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/145943/1/jordanjc_1.pd

    Mizzou engineer, volume 7, number 3

    Identification of the Type Eleven Secretion System (T11SS) and Characterization of T11SS-dependent Effector Proteins

    Host-associated microbes live in dangerous environments as a result of host immune killing, nutrient provisioning, and physiological conditions. Bacteria have evolved a host of surface and secreted proteins to help interact with this host environment and overcome nutrient limitation. The studies included within this dissertation describe the identification of a novel bacterial secretion system which has evolved to transport these symbiosis-mediating proteins. This system, termed the type eleven secretion system (T11SS), is present throughout the Gram-negative phylum Proteobacteria, including many human pathogens such as Neisseria meningitidis, Acinetobacter baumannii, Haemophilus haemolyticus, and Proteus vulgaris. Furthermore, these studies describe how novel cargo proteins of this secretion system were identified and characterized using molecular biology and physicochemical techniques. Chapter 1 establishes the importance of nematode model systems in researching symbiosis, highlighting how research in entomopathogenic nematodes identified the first T11SS. Chapters 2 and 3 use a T11SS-dependent hemophore named hemophilin and its transporter protein to demonstrate T11SS secretion and its mechanisms of cargo specificity. Chapter 3 also explores the role of hemophilin within the nematode symbiont X. nematophila in surviving heme starvation and facilitating nematode fitness. Chapter 4 demonstrates that the lipidated symbiosis factor NilC is surface exposed by the T11SS NilB and uses a combination of metabolomics, proteomics, and lectin library analysis to describe the role of NilC in colonization. Chapter 5 describes a protocol for bioinformatically controlling genome co-occurrence analyses and utilizes this technique to demonstrate significant co-occurrence of T11SS with metal uptake pathways, single carbon metabolism, and mobile genetic elements. 
Additionally, this protocol allowed prediction of 141 T11SS-dependent cargo falling into 10 distinct architectures, including never before seen T11SS-dependent adhesins and glycoproteins. Finally, Chapter 6 summarizes our findings and contextualizes how the T11SS plays essential roles in host-microbe association in mutualistic bacteria and pathogenic bacteria alike

    Graph-Based Approaches to Protein Structure Comparison - From Local to Global Similarity

    The comparative analysis of protein structure data is a central aspect of structural bioinformatics. Drawing upon structural information allows the inference of function for unknown proteins even in cases where no apparent homology can be found on the sequence level. Regarding the function of an enzyme, the overall fold topology might be less important than the specific structural conformation of the catalytic site or the surface region of a protein, where the interaction with other molecules, such as binding partners, substrates and ligands occurs. Thus, a comparison of these regions is especially interesting for functional inference, since structural constraints imposed by the demands of the catalyzed biochemical function make them more likely to exhibit structural similarity. Moreover, the comparative analysis of protein binding sites is of special interest in pharmaceutical chemistry, in order to predict cross-reactivities and gain a deeper understanding of the catalysis mechanism. From an algorithmic point of view, the comparison of structured data, or, more generally, complex objects, can be attempted based on different methodological principles. Global methods aim at comparing structures as a whole, while local methods transfer the problem to multiple comparisons of local substructures. In the context of protein structure analysis, it is not a priori clear which strategy is more suitable. In this thesis, several conceptually different algorithmic approaches have been developed, based on local, global and semi-global strategies, for the task of comparing protein structure data, more specifically protein binding pockets. The use of graphs for the modeling of protein structure data has a long-standing tradition in structural bioinformatics. Recently, graphs have been used to model the geometric constraints of protein binding sites. 
The algorithms developed in this thesis are based on this modeling concept, hence, from a computer scientist's point of view, they can also be regarded as global, local and semi-global approaches to graph comparison. The developed algorithms were mainly designed on the premise to allow for a more approximate comparison of protein binding sites, in order to account for the molecular flexibility of the protein structures. A main motivation was to allow for the detection of more remote similarities, which are not apparent by using more rigid methods. Subsequently, the developed approaches were applied to different problems typically encountered in the field of structural bioinformatics in order to assess and compare their performance and suitability for different problems. Each of the approaches developed during this work was capable of improving upon the performance of existing methods in the field. Another major aspect in the experiments was the question, which methodological concept, local, global or a combination of both, offers the most benefits for the specific task of protein binding site comparison, a question that is addressed throughout this thesis