7,302 research outputs found

    Searching for alternate RNA structures in genomic sequences

    We introduce the concept of RNA multi-structures, a formal-grammar-based framework specifically designed to model a set of alternate RNA secondary structures. Such alternate structures can be a set of suboptimal foldings, distinct stable folding states, or variants within an RNA family. We provide several such examples and propose an efficient algorithm to search for RNA multi-structures within a genomic sequence.

    Bridging the synaptic gap: neuroligins and neurexin I in Apis mellifera

    Vertebrate studies show neuroligins and neurexins are binding partners in a trans-synaptic cell adhesion complex, implicated in human autism and mental retardation disorders. Here we report a genetic analysis of homologous proteins in the honey bee. As in humans, the honey bee has five large (31-246 kb, up to 12 exons each) neuroligin genes, three of which are tightly clustered. RNA analysis of the neuroligin-3 gene reveals five alternatively spliced transcripts, generated through alternative use of exons encoding the cholinesterase-like domain. Whereas vertebrates have three neurexins, the bee has just one gene, named neurexin I (400 kb, 28 exons). However, alternative isoforms of bee neurexin I are generated by differential use of 12 splice sites, mostly located in regions encoding LNS subdomains. Some of the splice variants of bee neurexin I resemble the vertebrate alpha- and beta-neurexins, although in vertebrates these forms are generated by alternative promoters. Novel splicing variations in the 3' region generate transcripts encoding alternative trans-membrane and PDZ domains. Another 3' splicing variation predicts soluble neurexin I isoforms. Neurexin I and neuroligin expression was found in brain tissue, with expression present throughout development, and in most cases significantly up-regulated in adults. Transcripts of neurexin I and one neuroligin tested were abundant in mushroom bodies, a higher order processing centre in the bee brain. We show neuroligins and neurexins comprise a highly conserved molecular system with likely similar functional roles in insects as in vertebrates, and with scope in the honey bee to generate substantial functional diversity through alternative splicing. Our study provides important prerequisite data for using the bee as a model for vertebrate synaptic development. Funding: Australian National University PhD Scholarship Award to Sunita Biswas.

    A large-scale proteogenomics study of apicomplexan pathogens-Toxoplasma gondii and Neospora caninum

    Proteomics data can supplement genome annotation efforts, for example by confirming gene models or correcting gene annotation errors. Here, we present a large-scale proteogenomics study of two important apicomplexan pathogens: Toxoplasma gondii and Neospora caninum. We queried proteomics data against a panel of official and alternate gene models generated directly from RNA-Seq data, using several newly generated and some previously published MS datasets for this meta-analysis. We identified a total of 201 996 and 39 953 peptide-spectrum matches for T. gondii and N. caninum, respectively, at a 1% peptide FDR threshold. This equated to the identification of 30 494 distinct peptide sequences and 2921 proteins (matches to official gene models) for T. gondii, and 8911 peptides/1273 proteins for N. caninum following stringent protein-level thresholding. We have also identified 289 and 140 loci for T. gondii and N. caninum, respectively, which mapped to RNA-Seq-derived gene models used in our analysis and are apparently absent from the official annotation (release 10 from EuPathDB) of these species. We present several examples in our study where the RNA-Seq evidence can help correct the current gene model and discover potential new genes.

    A Computational Pipeline for High-Throughput Discovery of cis-Regulatory Noncoding RNA in Prokaryotes

    Noncoding RNAs (ncRNAs) are important functional RNAs that do not code for proteins. We present a highly efficient computational pipeline for discovering cis-regulatory ncRNA motifs de novo. The pipeline differs from previous methods in that it is structure-oriented, does not require a multiple-sequence alignment as input, and is capable of detecting RNA motifs with low sequence conservation. We also integrate RNA motif prediction with RNA homolog search, which improves the quality of the RNA motifs significantly. Here, we report the results of applying this pipeline to Firmicute bacteria. Our top-ranking motifs include most known Firmicute elements found in the RNA family database (Rfam). Comparing our motif models with Rfam's hand-curated motif models, we achieve high accuracy in both membership prediction and base-pair–level secondary structure prediction (at least 75% average sensitivity and specificity on both tasks). Of the ncRNA candidates not in Rfam, we find compelling evidence that some are functional, and we analyze several potential ribosomal protein leaders in depth.

    Tetrahymena Genome Database (TGD): a new genomic resource for Tetrahymena thermophila research

    We have developed a web-based resource (available at ) for researchers studying the model ciliate organism Tetrahymena thermophila. Employing the underlying database structure and programming of the Saccharomyces Genome Database, the Tetrahymena Genome Database (TGD) integrates the wealth of knowledge generated by the Tetrahymena research community about genome structure, genes and gene products with the newly sequenced macronuclear genome determined by The Institute for Genomic Research (TIGR). TGD provides information curated from the literature about each published gene, including a standardized gene name, a link to the genomic locus in our graphical genome browser, gene product annotations utilizing the Gene Ontology, links to published literature about the gene and more. TGD also displays automatic annotations generated for the gene models predicted by TIGR. A variety of tools are available at TGD for searching the Tetrahymena genome, its literature and information about members of the research community.

    Comparative genomics of Australian isolates of the wheat stem rust pathogen Puccinia graminis f. sp. tritici reveals extensive polymorphism in candidate effector genes

    The wheat stem rust fungus Puccinia graminis f. sp. tritici (Pgt) is one of the most destructive pathogens of wheat. In this study, a draft genome was built for a founder Australian Pgt isolate of pathotype (pt.) 21-0 (collected in 1954) by next generation DNA sequencing. A combination of reference-based assembly, using the genome of the previously sequenced American Pgt isolate CDL 75-36-700-3 (p7a), and de novo assembly was performed, resulting in a 92 Mbp reference genome for Pgt isolate 21-0. Approximately 13 Mbp of de novo assembled sequence in this genome is not present in the p7a reference assembly. This novel sequence is not specific to 21-0, as it is also present in three other Pgt rust isolates of independent origin. The new reference genome was subsequently used to build a pan-genome based on five Australian Pgt isolates. Transcriptomes from germinated urediniospores and haustoria were separately assembled for pt. 21-0, and comparison of gene expression profiles showed differential expression in ∼10% of the genes each in germinated spores and haustoria. A total of 1,924 secreted proteins were predicted from the 21-0 transcriptome, of which 520 were classified as haustorial secreted proteins (HSPs). Comparison of 21-0 with two presumed clonal field derivatives of this lineage (collected in 1982 and 1984) that had evolved virulence on four additional resistance genes (Sr5, Sr11, Sr27, SrSatu) identified mutations in 25 HSP effector candidates. Some of these mutations could explain their novel virulence phenotypes. The authors wish to thank the Two Blades Foundation for financial support. Part of this work was supported through access to facilities managed by Bioplatforms Australia and funded by the Australian Government National Collaborative Research Infrastructure Strategy and Education Investment Fund Super Science Initiative.

    DNA ANALYSIS USING GRAMMATICAL INFERENCE

    An accurate language definition capable of distinguishing between coding and non-coding DNA has important applications and analytical significance in computational biology. The method proposed here uses positive-sample grammatical inference and statistical information to infer languages for coding DNA. An algorithm is proposed that searches for an optimal subset of input sequences for the inference of regular grammars by optimizing a relevant accuracy metric. The algorithm does not guarantee finding the optimal subset; however, testing shows improvement in accuracy and performance over the basis algorithm. Testing shows that the inferred languages for components of DNA are consistently accurate. Using the proposed algorithm, languages are inferred for coding DNA with an average conditional probability over 80%. This reveals that languages for components of DNA can be inferred and are useful independent of the process that created them. These languages can then be analyzed or used for other tasks in computational biology. To illustrate potential applications of regular grammars for DNA components, an inferred language for exon sequences is applied as post-processing to Hidden Markov exon prediction to reduce the number of wrong exons detected and improve the specificity of the model significantly.