34 research outputs found

    Open reading frames provide a rich pool of potential natural antisense transcripts in fungal genomes

    Natural antisense transcripts are reported from all kingdoms of life, and several recent reports of genome-wide screens indicate that they are widely distributed. These transcripts seem to be involved in various biological functions and may govern the expression of their respective sense partners. Very little, however, is known about the degree of evolutionary conservation of antisense transcripts. Furthermore, none of the earlier analyses has studied whether antisense relationships are strictly pairwise or part of more complex relationships. Here we present a systematic screen for cis- and trans-located antisense transcripts based on open reading frames (ORFs) from five fungal species. The relative number of ORFs involved in antisense relationships varies greatly between the five species. In addition, other significant differences are found between the species, such as the mean length of the antisense region. The majority of trans-located antisense transcripts are found to be involved in complex relationships, resulting in highly connected networks. The analysis of the degree of evolutionary conservation of antisense transcripts shows that most antisense transcripts have no ortholog in any other species. An annotation of antisense transcripts based on Gene Ontology points to common terms and shows that the proteins encoded by genes involved in antisense relationships preferentially localize to the nucleus, with common functions in the regulation or maintenance of nucleic acids.

    Learning Channel Importance for High Content Imaging with Interpretable Deep Input Channel Mixing

    Uncovering novel drug candidates for treating complex diseases remains one of the most challenging tasks in early discovery research. To tackle this challenge, biopharma research established a standardized high content imaging protocol that tags different cellular compartments per image channel. In order to judge the experimental outcome, the scientist requires knowledge about the channel importance with respect to a certain phenotype for decoding the underlying biology. In contrast to traditional image analysis approaches, such experiments are nowadays preferably analyzed by deep learning based approaches which, however, lack crucial information about the channel importance. To overcome this limitation, we present a novel approach which utilizes multi-spectral information of high content images to interpret a certain aspect of cellular biology. To this end, we base our method on image blending concepts with alpha compositing for an arbitrary number of channels. More specifically, we introduce DCMIX, a lightweight, scalable and end-to-end trainable mixing layer which enables interpretable predictions in high content imaging while retaining the benefits of deep learning based methods. We employ an extensive set of experiments on both MNIST and RXRX1 datasets, demonstrating that DCMIX learns the biologically relevant channel importance without sacrificing prediction performance. Comment: Accepted @ DAGM German Conference on Pattern Recognition (GCPR) 202
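
The blending idea in the abstract above can be illustrated with a small sketch. It is not the DCMIX layer itself: the abstract does not give the layer's equations, so the softmax normalisation, the function names `softmax` and `mix_channels`, and the plain weighted-sum blend are all assumptions made for illustration. The point is only that one weight per channel, normalised to sum to one, both produces the mixed image and can be read off as a channel importance.

```python
import math


def softmax(logits):
    """Turn arbitrary per-channel scores into positive weights that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_channels(image, logits):
    """Blend a multi-channel image into one channel (hypothetical helper).

    image  : list of channels, each a 2D list of floats with identical shape
    logits : one score per channel (learned in a real model, fixed here)
    Returns the blended 2D image and the per-channel weights.
    """
    weights = softmax(logits)
    height, width = len(image[0]), len(image[0][0])
    mixed = [
        [
            sum(weights[c] * image[c][y][x] for c in range(len(image)))
            for x in range(width)
        ]
        for y in range(height)
    ]
    return mixed, weights


# Two 1x1 "channels": equal scores give an even blend.
mixed, weights = mix_channels([[[1.0]], [[3.0]]], [0.0, 0.0])
print(mixed, weights)
```

In a trainable setting the scores would be parameters updated by backpropagation; here they are passed in by hand so the example stays self-contained.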

    Comparative analysis of structured RNAs in S. cerevisiae indicates a multitude of different functions

    Background: Non-coding RNAs (ncRNAs) are an emerging focus for both computational analysis and experimental research, resulting in a growing number of novel, non-protein coding transcripts with often unknown functions. Whole genome screens in higher eukaryotes, for example, provided evidence for a surprisingly large number of ncRNAs. To supplement these searches, we performed a computational analysis of seven yeast species and searched for new ncRNAs and RNA motifs. Results: A comparative analysis of the genomes of seven yeast species yielded roughly 2800 genomic loci that showed the hallmarks of evolutionarily conserved RNA secondary structures. A total of 74% of these regions overlapped with annotated non-coding or coding genes in yeast. Coding sequences that carry predicted structured RNA elements belong to a limited number of groups with common functions, suggesting that these RNA elements are involved in post-transcriptional regulation and/or cellular localization. About 700 conserved RNA structures were found outside annotated coding sequences and known ncRNA genes. Many of these predicted elements overlapped with UTR regions of particular classes of protein coding genes. In addition, a number of RNA elements overlapped with previously characterized antisense transcripts. Transcription of about 120 predicted elements located in promoter regions and other, previously un-annotated, intergenic regions was supported by tiling array experiments, ESTs, or SAGE data. Conclusion: Our computational predictions strongly suggest that yeasts harbor a substantial pool of several hundred novel ncRNAs. In addition, we describe a large number of RNA structures in coding sequences and also within antisense transcripts that were previously characterized using tiling arrays.

    Genomic organization of eukaryotic tRNAs

    BACKGROUND: Surprisingly little is known about the organization and distribution of tRNA genes and tRNA-related sequences on a genome-wide scale. While tRNA gene complements are usually reported in passing as part of genome annotation efforts, and peculiar features such as the tandem arrangements of tRNA genes in Entamoeba histolytica have been described in some detail, systematic comparative studies are rare and mostly restricted to bacteria. We therefore set out to survey the genomic arrangement of tRNA genes and pseudogenes in a wide range of eukaryotes to identify common patterns and taxon-specific peculiarities. RESULTS: In line with previous reports, we find that tRNA complements evolve rapidly and tRNA gene and pseudogene locations are subject to rapid turnover. At phylum level, the distributions of the numbers of tRNA genes and pseudogenes are very broad, with standard deviations on the order of the mean. Even among closely related species we observe dramatic changes in local organization. For instance, 65% and 87% of the tRNA genes and pseudogenes are located in genomic clusters in zebrafish and stickleback, respectively, while such arrangements are relatively rare in the other three sequenced teleost fish genomes. Among basal metazoa, Trichoplax adhaerens has hardly any duplicated tRNA gene, while the sea anemone Nematostella vectensis boasts more than 17000 tRNA genes and pseudogenes. Dramatic variations are observed even within the eutherian mammals. Higher primates, for instance, have 616 +/- 120 tRNA genes and pseudogenes of which 17% to 36% are arranged in clusters, while the genome of the bushbaby Otolemur garnettii has 45225 tRNA genes and pseudogenes of which only 5.6% appear in clusters. In contrast, the distribution is surprisingly uniform across plant genomes. Consistent with this variability, syntenic conservation of tRNA genes and pseudogenes is also poor in general, with turnover rates comparable to those of unconstrained sequence elements. Despite this large variation in abundance in Eukarya we observe a significant correlation between the number of tRNA genes, tRNA pseudogenes, and genome size. CONCLUSIONS: The genomic organization of tRNA genes and pseudogenes shows complex lineage-specific patterns characterized by an extensive variability that is in striking contrast to the extreme levels of sequence conservation of the tRNAs themselves. The comprehensive analysis of the genomic organization of tRNA genes and pseudogenes in Eukarya provides a basis for further studies into the interplay of tRNA gene arrangements and genome organization in general.

    Non-coding RNA annotation of the genome of Trichoplax adhaerens

    A detailed annotation of non-protein coding RNAs is typically missing in initial releases of newly sequenced genomes. Here we report on a comprehensive ncRNA annotation of the genome of Trichoplax adhaerens, the presumably most basal metazoan whose genome has been published to date. Since BLAST identified only a small fraction of the best-conserved ncRNAs (in particular rRNAs, tRNAs and some snRNAs), we developed a semi-global dynamic programming tool, GotohScan, to increase the sensitivity of the homology search. It successfully identified the full complement of major and minor spliceosomal snRNAs, the genes for RNase P and MRP RNAs, the SRP RNA, as well as several small nucleolar RNAs. We did not find any microRNA candidates homologous to known eumetazoan sequences. Interestingly, most ncRNAs, including the pol-III transcripts, appear as single-copy genes or with very small copy numbers in the Trichoplax genome.
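
To make "semi-global dynamic programming" concrete, the sketch below aligns a whole query against any stretch of a longer target, so that unaligned target sequence before and after the hit costs nothing. It is an illustration of the general technique, not GotohScan: the function name `semiglobal_score`, the scoring values, and the use of a single linear gap penalty (rather than the affine gaps usually associated with Gotoh's algorithm) are simplifying assumptions.

```python
def semiglobal_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best score for aligning ALL of `query` to SOME substring of `target`.

    Returns (score, end), where `end` is the 1-based target position at
    which the best alignment ends. Illustrative scoring values only.
    """
    # Row 0: an empty query may start anywhere in the target at no cost.
    prev = [0] * (len(target) + 1)
    for i in range(1, len(query) + 1):
        # Column 0: query prefix aligned against nothing, so all gaps.
        curr = [prev[0] + gap]
        for j in range(1, len(target) + 1):
            same = query[i - 1] == target[j - 1]
            diag = prev[j - 1] + (match if same else mismatch)
            up = prev[j] + gap        # query character aligned to a gap
            left = curr[j - 1] + gap  # target character aligned to a gap
            curr.append(max(diag, up, left))
        prev = curr
    # Free end gap: take the best cell anywhere in the last row.
    best_end = max(range(len(target) + 1), key=lambda j: prev[j])
    return prev[best_end], best_end


print(semiglobal_score("ACGT", "TTACGTTT"))  # exact hit inside the target
print(semiglobal_score("ACGT", "GGACTTGG"))  # hit with one mismatch
```

Compared with a local alignment, this formulation forces the complete query into the alignment, which is the property that makes it suitable for finding full-length homologs of short RNAs in a genome.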

    Sequence–structure relationships in yeast mRNAs

    It is generally accepted that functionally important RNA structure is more conserved than sequence due to compensatory mutations that may alter the sequence without disrupting the structure. For small RNA molecules sequence–structure relationships are relatively well understood. However, structural bioinformatics of mRNAs is still in its infancy due to a virtual absence of experimental data. This report presents the first quantitative assessment of sequence–structure divergence in the coding regions of mRNA molecules based on recently published transcriptome-wide experimental determination of their base pairing patterns. Structural resemblance in paralogous mRNA pairs quickly drops as sequence identity decreases from 100% to 85–90%. Structures of mRNAs sharing sequence identity below roughly 85% are essentially uncorrelated. This outcome is in dramatic contrast to small functional non-coding RNAs where sequence and structure divergence are correlated at very low levels of sequence similarity. The fact that very similar mRNA sequences can have vastly different secondary structures may imply that the particular global shape of base paired elements in coding regions does not play a major role in modulating gene expression and translation efficiency. Apparently, the need to maintain stable three-dimensional structures of encoded proteins places a much higher evolutionary pressure on mRNA sequences than on their RNA structures.

    Non-Coding RNA Prediction and Verification in Saccharomyces cerevisiae

    Non-coding RNAs (ncRNAs) play an important and varied role in cellular function. A significant amount of research has been devoted to computational prediction of these genes from genomic sequence, but the ability to do so has remained elusive due to a lack of apparent genomic features. In this work, thermodynamic stability of ncRNA structural elements, as summarized in a Z-score, is used to predict ncRNA in the yeast Saccharomyces cerevisiae. This analysis was coupled with comparative genomics to search for ncRNA genes on chromosome six of S. cerevisiae and S. bayanus. Sets of positive and negative control genes were evaluated to determine the efficacy of thermodynamic stability for discriminating ncRNA from background sequence. The effect of window sizes and step sizes on the sensitivity of ncRNA identification was also explored. Non-coding RNA gene candidates, common to both S. cerevisiae and S. bayanus, were verified using northern blot analysis, rapid amplification of cDNA ends (RACE), and publicly available cDNA library data. Four ncRNA transcripts are well supported by experimental data (RUF10, RUF11, RUF12, RUF13), while one additional putative ncRNA transcript is well supported but the data are not entirely conclusive. Six candidates appear to be structural elements in 5′ or 3′ untranslated regions of annotated protein-coding genes. This work shows that thermodynamic stability, coupled with comparative genomics, can be used to predict ncRNA with significant structural elements.
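
The abstract above does not spell out its Z-score, so the sketch below uses the common definition as an assumption: the folding energy of the native sequence is compared with the mean and standard deviation of energies from shuffled versions of the same sequence, and a strongly negative value suggests unusual stability. The names `stability_z_score`, `sliding_windows` and `toy_energy` are hypothetical. `toy_energy` is a stand-in that only counts complementary bases at mirrored positions; a real analysis would call an RNA folding program instead, and would often use dinucleotide-preserving shuffles rather than the simple shuffle shown here.

```python
import random
import statistics

PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def toy_energy(seq):
    """Placeholder 'energy': -1 per complementary pair at mirrored positions."""
    half = len(seq) // 2
    return -sum(1 for i in range(half) if (seq[i], seq[-1 - i]) in PAIRS)


def stability_z_score(seq, energy_fn, n_shuffles=100, seed=0):
    """Z = (E_native - mean(E_shuffled)) / stdev(E_shuffled)."""
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    native = energy_fn(seq)
    background = []
    for _ in range(n_shuffles):
        chars = list(seq)
        rng.shuffle(chars)  # mononucleotide shuffle (a simplification)
        background.append(energy_fn("".join(chars)))
    sigma = statistics.stdev(background)
    if sigma == 0:
        return 0.0  # no spread in the background, score is undefined
    return (native - statistics.mean(background)) / sigma


def sliding_windows(seq, window, step):
    """Yield (start, subsequence) for fixed-size windows moved by `step`."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, seq[start:start + window]


hairpin_like = "GGGGAAAACCCC"
print(toy_energy(hairpin_like), stability_z_score(hairpin_like, toy_energy))
for start, sub in sliding_windows("ACGUACGUACGU", window=8, step=2):
    print(start, sub)
```

Window size and step size trade sensitivity against cost: a window much larger than the structured element dilutes its signal, while a large step can skip over it entirely.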

    A comparative genome-wide study of ncRNAs in trypanosomatids

    Background: Recent studies have provided extensive evidence for multitudes of non-coding RNA (ncRNA) transcripts in a wide range of eukaryotic genomes. ncRNAs are emerging as key players in multiple layers of cellular regulation. With the availability of many whole genome sequences, comparative analysis has become a powerful tool to identify ncRNA molecules. In this study, we performed a systematic genome-wide in silico screen to search for novel small ncRNAs in the genome of Trypanosoma brucei using techniques of comparative genomics. Results: In this study, we identified by comparative genomics, and validated by experimental analysis, several novel ncRNAs that are conserved across multiple trypanosomatid genomes. When tested on known ncRNAs, our procedure was capable of finding almost half of the known repertoire through homology over six genomes, and about two-thirds of the known sequences were found in at least four genomes. After filtering, 72 conserved unannotated sequences in at least four genomes were found, 29 of which, ranging in size from 30 to 392 nts, were conserved in all six genomes. Fifty of the 72 candidates in the final set were chosen for experimental validation. Eighteen of the 50 (36%) were shown to be expressed, and for 11 of them a distinct expression product was detected, suggesting that they are short ncRNAs. Using functional experimental assays, five of the candidates were shown to be novel H/ACA and C/D snoRNAs; these included three sequences that appear as singletons in the genome, unlike previously identified snoRNA molecules that are found in clusters. The other candidates appear to be novel ncRNA molecules, and their function is, as yet, unknown. Conclusions: Using comparative genomic techniques, we predicted 72 sequences as ncRNA candidates in T. brucei. The expression of 50 candidates was tested in laboratory experiments. This resulted in the discovery of 11 novel short ncRNAs in procyclic stage T. brucei, which have homologues in the other trypanosomatids. A few of these molecules are snoRNAs, but most of them are novel ncRNA molecules. Based on this study, our analysis suggests that the total number of ncRNAs in trypanosomatids is in the range of several hundred.

    A Cytosine Methyltransferase Homologue Is Essential for Sexual Development in Aspergillus nidulans

    Background: The genome defense processes RIP (repeat-induced point mutation) in the filamentous fungus Neurospora crassa, and MIP (methylation induced premeiotically) in the fungus Ascobolus immersus depend on proteins with DNA methyltransferase (DMT) domains. Nevertheless, these proteins, RID and Masc1, respectively, have not been demonstrated to have DMT activity. We discovered a close homologue in Aspergillus nidulans, a fungus thought to have no methylation and no genome defense system comparable to RIP or MIP. Principal Findings: We report the cloning and characterization of the DNA methyltransferase homologue A (dmtA) gene from Aspergillus nidulans. We found that the dmtA locus encodes both a sense (dmtA) and an anti-sense transcript (tmdA). Both transcripts are expressed in vegetative, conidial and sexual tissues. We determined that dmtA, but not tmdA, is required for early sexual development and formation of viable ascospores. We also tested if DNA methylation accumulated in any of the dmtA/tmdA mutants we constructed, and found that in both asexual and sexual tissues, these mutants, just like wild-type strains, appear devoid of DNA methylation. Conclusions/Significance: Our results demonstrate that a DMT homologue closely related to proteins implicated in RIP and MIP has an essential developmental function in a fungus that appears to lack both DNA methylation and RIP or MIP. It remains formally possible that DmtA is a bona fide DMT, responsible for trace, undetected DNA methylation that i