361 research outputs found

    A tRNA world

    Knowledge about the kinetics of chemical reactions in cells is important for an understanding of signaling pathways and regulation. Even though there are many kinetic measurements of in vitro reactions in the literature, methods for in vivo measurements are sparse. With the help of Temperature Oscillation Optical Lock-in (TOOL) microscopy we measure the kinetics of DNA hybridization inside cells and detect significant acceleration or deceleration compared to in vitro measurements, dependent on the DNA sample. The differences cannot be explained by molecular crowding effects. Only models that take into account the background interactions with genomic DNA and RNA, as well as the activity of single-stranded and double-stranded binding proteins, can be fitted to the data. The results imply that the biological relevance of kinetic rates measured in vitro has to be reassessed carefully. The RNA world hypothesis predicts catalytic molecules based on RNA, for example early replicators, as precursors of modern biology. But how can a pool of appropriate RNA molecules arise under early Earth conditions? In a Gillespie model, we observe the length distribution, secondary structure and sequences of a pool of RNA molecules in porous rocks such as those found near sites of volcanic activity. We assume a monomer influx, a length-dependent outflux, a random, non-templated polymerisation and a degradation that is much stronger for single-stranded than for double-stranded RNA. After equilibrium is reached, the pool is populated with many hairpin-like structures due to the selection pressure for hybridized strands that can be bricks for RNA machines. Once sequence motifs and their complements appear in the reactor, they protect each other and are present longer than statistically expected. This "protection by hybridization" has the same fingerprint as a weak replication. 
As a consequence, the pool does not cover the full sequence space but includes more similar sequences, which is an important condition for chemical reactions. Replication of genetic information by RNA molecules is considered to be a key process in the beginning of evolution. It is so crucial that traces of this early replication are expected to be present in key processes of modern biology. We present a replication scheme based on hairpins derived from the sequence of tRNA that replicates the genetic information about a succession of sequence snippets. The replication is driven by temperature oscillations as they occur naturally inside porous rocks in the presence of temperature gradients, and is independent of external chemical energy sources. It is selective for correct information and shows exponential growth rates with doubling times in the range of seconds to minutes and is thereby the fastest early replicator in the literature. The replication scheme can naturally be expanded to longer successions by using double hairpins derived from full tRNA sequences by only a few mutations. By charging double hairpins with amino acids or peptides, the proposed replication bridges the gap from the RNA world to modern biology by offering a rudimentary translation mechanism that sorts amino acids into chains according to genetic information
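
The Gillespie model mentioned in this abstract can be illustrated with a minimal sketch. It is not the author's model: it tracks strand lengths only and ignores sequences, secondary structure and the protection of double-stranded RNA. The function name, all rate constants and the assumption that outflux scales as 1/length are hypothetical choices made for illustration.

```python
import random
from collections import Counter

def gillespie_rna_pool(t_end, k_in=10.0, k_out=0.1, k_lig=0.01, k_cut=0.05, seed=0):
    """Direct-method Gillespie simulation of strand lengths (hypothetical rates)."""
    rng = random.Random(seed)
    pool = []  # one entry per strand, holding its length in monomers
    t = 0.0
    while True:
        n = len(pool)
        bonds = sum(pool) - n  # total number of backbone bonds in the pool
        rates = [
            k_in,                                # monomer influx
            k_out * sum(1.0 / L for L in pool),  # outflux, ASSUMED proportional to 1/length
            k_lig * n * (n - 1) / 2,             # random, non-templated ligation of a pair
            k_cut * bonds,                       # degradation: cleavage of a random bond
        ]
        t += rng.expovariate(sum(rates))         # waiting time to the next event
        if t > t_end:
            break
        event = rng.choices(range(4), weights=rates)[0]
        if event == 0:
            pool.append(1)
        elif event == 1:
            i = rng.choices(range(n), weights=[1.0 / L for L in pool])[0]
            pool.pop(i)
        elif event == 2:
            i, j = rng.sample(range(n), 2)
            merged = pool[i] + pool[j]
            for idx in sorted((i, j), reverse=True):
                pool.pop(idx)
            pool.append(merged)
        else:
            i = rng.choices(range(n), weights=[L - 1 for L in pool])[0]
            L = pool.pop(i)
            cut = rng.randint(1, L - 1)          # position of the cleaved bond
            pool.extend([cut, L - cut])
    return Counter(pool)                         # length -> number of strands

length_distribution = gillespie_rna_pool(t_end=20.0)
```

Reproducing the hairpin enrichment described in the abstract would require tracking sequences and making the cleavage rate depend on hybridization, which this sketch deliberately leaves out.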

    Navigating the Extremes of Biological Datasets for Reliable Structural Inference and Design

    Structural biologists currently confront serious challenges in the effective interpretation of experimental data due to two contradictory situations: a severe lack of structural data for certain classes of proteins, and an incredible abundance of data for other classes. The challenge with small data sets is how to extract sufficient information to draw meaningful conclusions, while the challenge with large data sets is how to curate, categorize, and search the data to allow for its meaningful interpretation and application to scientific problems. Here, we develop computational strategies to address both sparse and abundant data sets. In the category of sparse data sets, we focus our attention on the problem of transmembrane (TM) protein structure determination. As X-ray crystallography and NMR data is notoriously difficult to obtain for TM proteins, we develop a novel algorithm which uses low-resolution data from protein cross-linking or scanning mutagenesis studies to produce models of TM helix oligomers and show that our method produces models with an accuracy on par with X-ray crystallography or NMR for a test set of known TM proteins. Turning to instances of data abundance, we examine how to mine the vast stores of protein structural data in the Protein Data Bank (PDB) to aid in the design of proteins with novel binding properties. We show how the identification of an anion binding motif in an antibody structure allowed us to develop a phosphate binding module that can be used to produce novel antibodies to phosphorylated peptides - creating antibodies to 7 novel phospho-peptides to illustrate the utility of our approach. We then describe a general strategy for designing binders to a target protein epitope based upon recapitulating protein interaction geometries which are over-represented in the PDB. 
We follow this by using data describing the transition probabilities of amino acids to develop a novel set of degenerate codons to create more efficient gene libraries. We conclude by describing a novel, real-time, all-atom structural search engine, giving researchers the ability to quickly search known protein structures for a motif of interest and providing a new interactive paradigm of protein design
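
To make the idea of a degenerate codon concrete, the following sketch expands an IUPAC degenerate codon into the concrete codons it represents and tallies the amino acids they encode under the standard genetic code. The helper names are hypothetical, and the `NNK` codon used below is a widely used library codon chosen as an illustration; it is not taken from the thesis, whose own codon set is not given in the abstract.

```python
from collections import Counter
from itertools import product

# IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def expand(degenerate_codon):
    """List every concrete codon matched by a degenerate codon."""
    choices = (IUPAC[ch] for ch in degenerate_codon.upper())
    return ["".join(p) for p in product(*choices)]

def encoded_amino_acids(degenerate_codon):
    """Count how many concrete codons encode each amino acid ('*' marks a stop)."""
    return Counter(CODON_TABLE[c] for c in expand(degenerate_codon))

usage = encoded_amino_acids("NNK")
```

A library is more efficient when few codons are spent on stops or on over-represented amino acids, which is the kind of trade-off a tally like `usage` exposes.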

    Novel bioinformatics programs for taxonomical classification and functional analysis of the whole genome sequencing data of arbuscular mycorrhizal fungi

    Summary [TITLE] Taxonomic classification and position-specific functional analysis of genomic sequences of arbuscular mycorrhizal fungi and their associated microorganisms [PROBLEM AND CONCEPTUAL FRAMEWORK] Arbuscular mycorrhizal fungi (AMF) are obligate symbionts of the roots of most vascular plants. AMF belong to the phylum Glomeromycota and are considered a primitive fungal lineage that has retained the coenocytic structure of hyphae and the production of multinucleated asexual spores. Numerous studies have shown that several microorganisms are associated with AMF mycelia, both on the surface of hyphae and spores and inside them. Sequencing the genomes of AMF grown in vivo is a considerable challenge because the result is a metagenome consisting of the genome of the AMF itself and the genomes of its associated microbes. Consequently, identifying the taxonomic origin of each sequence is an extremely arduous task. In my project, I developed two new bioinformatics programs that classify sequences by taxonomic group and identify their functions. I created a database of 444 genomes of species belonging to 54 genera. These bacterial and fungal species were chosen on the basis of their abundance in soils. [METHODOLOGY] The bioinformatics program uses the reference table of microorganisms and statistical methods for the taxonomic classification of sequences. Synonymous codon tables were then created from the secondary structures (SS) in the Protein Data Bank (PDB) for coding sequences (CDS), and from composition motifs for non-coding sequences (non-CDS). Each table has three levels: amino acid characteristics, usage of the corresponding synonymous amino acids, and usage of the corresponding synonymous codons. Compared with existing methods, which use global average substitution rates regardless of the specific properties of amino acids in different structures, my program provides high-resolution classification of short sequences (150-300 bp), because the synonymous codon usage biases were extracted from about 8,000 amino acid trimers specific to secondary structure subunits, with amino acid substitutions taken into account within each specific trimer. For functional analysis, the program dynamically creates comparative data for 54 microbial genera based on their synonymous codon usage biases for the three-codon DNA 9-mers matched in a query sequence. The program applies principal component analysis based on the correlation matrix, in conjunction with k-means clustering, to the comparative data. [OUTCOMES] The rates of correct prediction of CDS and non-CDS were 50 to 71% for bacteria and 65 to 73% for fungi, respectively. For AMF, 49% of CDS and 72% of non-CDS were correctly classified. The program allows us to estimate the approximate abundances of the microbial communities associated with AMF. The results of the functional analysis can provide information on important molecular interaction sites involved in sequence diversification and gene evolution. The programs are freely available at www.fungalsesame.org.
    Keywords: sesame; sesame PS function; amino acid characteristics; three codon DNA 9-mers; secondary structure; taxonomic classification; position-specific functional analysis; genetic code; comparative study; mitochondrial genome
    Abstract Arbuscular Mycorrhizal Fungi (AMF) are obligate plant-root symbionts belonging to the phylum Glomeromycota. They form coenocytic hyphae and reproduce through large multinucleated asexual spores. Numerous studies have shown that AMF interact closely or loosely with a myriad of microorganisms, particularly bacteria and fungi that live on the surface of or inside their mycelia and spores. Whole genome sequencing (WGS) data of AMF grown in vivo (typically in the roots of a host plant in a pot filled with soil) contain a large number of sequences from microorganisms inhabiting their spores along with their own genome sequences, resulting in a metagenome. The goal of my study was to develop bioinformatics programs for taxonomical classification and for functional analysis of the WGS data of the AMF. In the area of metagenomics, there are mainly two approaches for taxonomical classification: similarity-based (i.e., homology search) and composition-based (i.e., k-mers) methods. The similarity-based method depends solely on bioinformatics sequence databases and homology search programs such as BLAST. It may not be suitable for ancient fungi such as AMF, because bioinformatics databases represent only a small fraction of the diversity of existing microorganisms, and gene prediction programs are highly biased towards intensively studied microorganisms. 
Considering that AMF have high inter- and intra-genome variation, in addition to coenocytic and multi-genomic characteristics, probably due to their adaptation via various kinds of symbioses, the composition-based method alone is not an effective solution for AMF, because it relies on base composition biases and focuses on taxonomical classification of prokaryotic organisms. In the first project, I developed a novel bioinformatics program, called SeSaMe (Spore associated Symbiotic Microbes), for taxonomical classification of the WGS data of the AMF. I selected microorganisms that were dominant in the soil environment and grouped them into 54 genera, which were used as references. I created a reference sequence database with a variable called Three codon DNA 9-mer. These were created based on a large number of structure files from the Protein Data Bank (PDB): approx. 224,000 Three codon DNA 9-mers encoding subunits of protein secondary structures. Based on the reference sequence database, I created genus-specific usage databases containing codon usage and amino acid usage per taxonomic rank (genus). The program distinguishes between coding sequence (CDS) and non-CDS, detects an open reading frame, and classifies a query sequence into one of the 54 genera used as references. The developed program enables us to estimate relative abundances of taxonomic groups and to assess symbiotic roles of taxonomic groups associated with AMF. The program can be applied to other microorganisms as well as soil metagenome data. The program has applications in applied environmental microbiology. The developed program is available free of charge at www.fungalsesame.org. In the second project, I developed another bioinformatics program, called SeSaMe PS Function, for position specific functional analysis of the WGS data of the AMF. AMF may contain a large portion of genes with unknown functions for which we may not be able to find homologues in existing sequence databases. 
While existing motif annotation programs rely on sequence alignment and have limitations for inferring functionality of novel genes, the developed program identifies potentially important interaction sites that are structurally and functionally distinctive from other subsequences, within a query sequence with exploratory data analysis. The program identifies matching Three codon DNA 9-mers in a query sequence, and dynamically creates comparative dataset of 54 genera, based on codon usage bias information retrieved from the genus specific usage databases. The program applies correlation Principal Component Analysis in conjunction with K-means clustering method to the comparative dataset. The program identifies outliers; Three codon DNA 9-mers, assigned into a cluster with a single member or with only a few members, are often outliers with important structures that may play roles in molecular interaction. In the third project, I developed a novel bioinformatics program called Posts (POsition Specific genetic code Tables) that assigns a codon into an amino acid group according to the codon position. The standard genetic code table may be more readily applicable to the genes whose genetic codes comply with the standard biological coding rules obtained from model organisms grown under laboratory condition. However, it may be insufficient for studying evolutions of genetic codes that may provide important information about codon properties. The mainstream hypotheses of genetic code origin suggested that codon position played important roles in the evolution of genetic codes. As a case study, we investigated irregular codons in 187 mitochondrial genomes of plants, lichen-forming fungi, endophytic fungi, and AMF. Each column of the Post contains 16 codons and the amino acids encoded by these are called an amino acid characteristics group (A.A. Char Group). Based on A.A. Char Group, an irregular codon can be classified into within-column type or trans-column type. 
The majority of the identified irregular codons belonged to the within-column type. The Post may offer new perspectives on codon property and codon assignment. The developed program is freely available at www.codon.kr. Taken together, the developed programs, the SeSaMe, the SeSaMe PS Function, and the Post, provide important research tools for advancing our knowledge of AMF genomics and for studying their symbiotic relations with associated microorganisms. Keywords: Sesame; Spore associated Symbiotic Microbes; Symbiosis; Sesame PS function; Arbuscular mycorrhizal fungi; Three codon DNA 9-mer; Amino acid characteristics; Secondary structure; Taxonomical classification; Position specific functional analysis; Position specific genetic code tables; Post; Comparative study; Mitochondrial genome
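
The Three codon DNA 9-mer idea from the abstract above can be sketched as follows. This is an illustration only, not the SeSaMe implementation: the function names are hypothetical, and the choices to slide the window one codon at a time and to read the query in frame 0 are assumptions. The sketch lists the 9-mers in a coding sequence together with the amino acid trimer each encodes, and tallies synonymous codon usage per amino acid.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def split_codons(cds):
    """Split a coding sequence into in-frame codons, dropping a trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]

def three_codon_9mers(cds):
    """Yield (9-mer, amino acid trimer) pairs, sliding one codon at a time (assumed)."""
    codons = split_codons(cds)
    for i in range(len(codons) - 2):
        trio = codons[i:i + 3]
        # 'X' marks codons containing characters other than A, C, G, T
        yield "".join(trio), "".join(CODON_TABLE.get(c, "X") for c in trio)

def codon_usage(cds):
    """Map each amino acid to a Counter of the synonymous codons used for it."""
    usage = {}
    for codon in split_codons(cds):
        aa = CODON_TABLE.get(codon, "X")
        usage.setdefault(aa, Counter())[codon] += 1
    return usage

pairs = list(three_codon_9mers("ATGGCTGCAAAATAA"))
```

Per-genus tables built from counts like those returned by `codon_usage` are the kind of input on which a correlation-based PCA and k-means clustering could then be run.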

    Biochemical analysis of translational recording driven by 2A peptide

    PhD. 2A/2A-like peptides are short sequences (20-30 amino acids) encoded predominantly within open reading frames (ORFs) of RNA viruses. They drive a non-canonical translation event in which the nascent chain is released from the ribosome at a sense (proline) codon, followed by continued translation to generate a separate downstream protein, initiated from the same proline codon. The aim of this study is to investigate the role of ribosomal factors in the 2A reaction in Saccharomyces cerevisiae cells. The results showed that reduced activity of eRF1/3 inhibits the 2A reaction. This inhibition did not strongly correlate with the effect that mutations have on termination at stop codons. In particular, several mutations within the NIKS motif, which is essential for stop codon recognition, had minimal effect on the 2A reaction. To confirm these results, we developed a new reporter of 2A activity, in which the green fluorescent protein (GFP) sequence was split by a 2A sequence between residues 157 and 158. This reporter was used to confirm the effects of eRF1 mutations, previously assessed by immunoprecipitation, and the results, observed by flow cytometry, were consistent regarding the role of eRFs in the 2A reaction. In summary, these observations provide evidence supporting recruitment of eRFs to the ribosome to drive the non-canonical termination event that releases the first part of the 2A reaction. The Ministry of Higher Education and the University of Mosul/IRA

    Computational Analyses of mRNA Ribosome Loading in Arabidopsis Thaliana

    Translation of mRNA into protein is a critical step in gene expression, but the principles guiding its regulation at the genome level are not completely understood. Translation can be quantified at a genome scale by measuring the ribosome loading of mRNA—the extent to which mRNA is associated with ribosomes. In this dissertation, I present investigations into how genome-wide ribosome loading is controlled in Arabidopsis thaliana. In chapter 1, I give an overview of regulation of ribosome loading and translation. In chapter 2, I present research demonstrating for the first time that genome-wide ribosome loading in plants is partially controlled by the circadian clock. In chapter 3, I present a study of a computational model that describes how various biochemical steps control ribosome loading. And in chapter 4, I conclude by briefly summarizing the dissertation as a whole and discussing future perspectives

    Minimal models of evolution: germline fitness effects of cancer mutations and stochastic tunneling under strong recombination

    At a time when data on the genetic make-up of organisms are available in abundance, the theory of evolution is of immediate importance to answer key questions of biology: How can one explain the variation seen in the DNA of different organisms and species? What are the effects of changes in the DNA on the function of cells? What are the driving mechanisms of diseases with a genetic component such as cancer? Minimal mathematical models of evolution provide a basis for the interpretation of DNA data. The explanations they offer are concrete and testable, their assumptions and limitations explicit. The application and further development of minimal evolution models is the main theme of this work. In the first part, the functional effects of mutations found in cancer cells are analyzed from the perspective of germline evolution. This is the process that produced the DNA of organisms as we see it today. Mutations have an effect on the fitness of healthy cells. This impact can be estimated from the variation seen in the sequences of protein domains. It is found that this evolutionarily informed conservation score is useful for identifying cancer driver genes, especially if they are tumor suppressor genes. The relevance of this fitness scale for cancer mutations is demonstrated on a data set of mutations in protein kinase genes. This analysis is followed by an application of Hidden Markov Models (HMMs) to the detection of signals of positive selection in cancer mutation data. Cancer as an evolutionary process of cells is markedly different from the process of germline evolution. Cancer-specific selection can be seen in genes whose activity or lack thereof is essential for the progress of cancer. These cancer genes exhibit an increased rate of amino acid changing mutations, beyond the level expected by chance. The identification of these genes is a statistical task for which HMMs are shown to be most suitable. 
Finally, an extended mathematical model of evolution is analyzed which describes the adaptation of a sexually reproducing population to a global fitness maximum via compensatory mutations. In a two-locus/two-allele model, the compound effects of mutation, selection, genetic drift, recombination and sign epistasis lead to the interesting situation of adaptation via the crossing of a fitness valley in genotype space. This bottleneck can be overcome by rare large fluctuations in the allele frequencies overcoming the effect of recombinatorial reshuffling. The relevant time scales are derived for a parameter regime that includes large recombination

    Transcript Mapping in Human Cytomegalovirus Strain AD169

    Human cytomegalovirus (HCMV) is of considerable medical importance, with infection in utero being a major health risk for the developing foetus, causing a variety of neonatal abnormalities including deafness, physical abnormality and mental retardation. HCMV also poses a life-threatening risk to immunosuppressed individuals such as allograft recipients and HIV-infected people. HCMV is responsible for the blindness due to retinitis that can affect some AIDS patients. The gene content of HCMV is less well understood than that of any other human herpesvirus. This reflects the large size and complexity of the genome, and also the lack of a laboratory strain with the full genetic complement of wild type virus. The complete DNA sequence (229,354 bp) of HCMV strain AD 169 was published in 1990, and the genome was predicted to contain 208 protein-coding open reading frames (ORFs). This is not likely to be an accurate estimate of the actual number of genes, as the criteria employed to identify coding regions were necessarily arbitrary and applied without the benefit of comparisons with other betaherpesviruses. Moreover, HCMV strain Toledo and other low passage isolates were later found to possess a 15 kbp genome segment absent from AD 169. Recently, the gene content of HCMV has been revised by comparison to the chimpanzee cytomegalovirus (CCMV) sequence (241,087 bp), and the number of protein-coding genes in AD 169 is now estimated at 145, several of which are novel. It is anticipated that this picture of the gene content of HCMV will be improved further. The HCMV genome contains a set of 41 conserved herpesvirus-common 'core' genes, which are arranged in blocks that maintain relative position and orientation in different herpesviruses and reflect evolution from a common ancestor. The majority of genes are not spliced and overall the genome has relatively few polyadenylation signals. 
At the outset of this project, 12 HCMV genes had been shown experimentally to be spliced, and more spliced genes probably remained to be identified. Ten different families of related genes (RL11, US6, US22, OCR, UL25, UL82, UL146, US1, US12 and US22) have been recognised in HCMV that appear to have been generated by gene duplication events. The US22 gene family contains 13 distantly related members (UL23, UL24, UL26, UL36, UL43, US22, US23, US24, US26, TRS1 and IRS1) sharing one or more of four conserved amino acid sequence motifs. Three members of this family (UL36, TRS1 and IRS1) have been reported as exhibiting transcriptional trans-activating properties in transient transfection assays, indicating that US22 genes are likely to be regulatory proteins. Moreover, since each of the sequenced betaherpesviruses contains a similar number of US22 genes, it is anticipated that these genes provide important functions during virus replication. Although the AD 169 genome was sequenced over ten years ago, the products of a large number of HCMV genes have not been identified, and the assignment of gene function is largely based on sequence similarity to homologous genes in herpes simplex virus type 1 (HSV-1). Transcript mapping data are also fragmentary. The purpose of this study was to evaluate transcription of a selection of AD 169 genes, including several that are conserved in CCMV and some that appear unlikely to encode functional proteins because the HCMV ORFs are not conserved in CCMV. Primary use was made of northern blotting, RT-PCR and RACE techniques, employing RNA isolated from infected human fibroblasts. Three groups of genes were analysed: the 13 members of the US22 gene family; the 14 ORFs in TRL and 30 adjacent ORFs at the left end of UL; and the novel spliced genes UL128 and UL131A. Transcripts were detected by northern blotting for nine members of the US22 family, and 5'- and 3'-ends were identified for eight. 
Failure to obtain data for the other members analysed was probably due to transcription at low levels. RNAs were identified for most ORFs in TRL and the adjacent part of UL. Most of the 5'-ends are located 20-30 bp downstream from TATA elements, and all the 3'-ends are located 20-24 bp downstream from polyadenylation signals. The 5'-ends of two genes (UL18 and US24) appeared to be located downstream from the first ATG codon in the relevant ORF. Transcripts were detected for five ORFs in TRL and one in UL that appear unlikely to encode proteins. Certain ORFs in TRL and UL have more than one 5'-end, suggesting that they are transcribed in a complex manner

    New algorithms and methods for protein and DNA sequence comparison

    Development of a strategy for genetic transformation of plant mitochondria
