86 research outputs found

    Comparative analysis of structured RNAs in S. cerevisiae indicates a multitude of different functions

    Background: Non-coding RNAs (ncRNAs) are an emerging focus for both computational analysis and experimental research, resulting in a growing number of novel, non-protein coding transcripts with often unknown functions. Whole genome screens in higher eukaryotes, for example, provided evidence for a surprisingly large number of ncRNAs. To supplement these searches, we performed a computational analysis of seven yeast species and searched for new ncRNAs and RNA motifs. Results: A comparative analysis of the genomes of seven yeast species yielded roughly 2800 genomic loci that showed the hallmarks of evolutionarily conserved RNA secondary structures. A total of 74% of these regions overlapped with annotated non-coding or coding genes in yeast. Coding sequences that carry predicted structured RNA elements belong to a limited number of groups with common functions, suggesting that these RNA elements are involved in post-transcriptional regulation and/or cellular localization. About 700 conserved RNA structures were found outside annotated coding sequences and known ncRNA genes. Many of these predicted elements overlapped with UTR regions of particular classes of protein coding genes. In addition, a number of RNA elements overlapped with previously characterized antisense transcripts. Transcription of about 120 predicted elements located in promoter regions and other, previously un-annotated, intergenic regions was supported by tiling array experiments, ESTs, or SAGE data. Conclusion: Our computational predictions strongly suggest that yeasts harbor a substantial pool of several hundred novel ncRNAs. In addition, we describe a large number of RNA structures in coding sequences and also within antisense transcripts that were previously characterized using tiling arrays.

    Multiple sequence alignments of partially coding nucleic acid sequences

    BACKGROUND: High quality sequence alignments of RNA and DNA sequences are an important prerequisite for the comparative analysis of genomic sequence data. Nucleic acid sequences, however, exhibit a much larger sequence heterogeneity compared to their encoded protein sequences due to the redundancy of the genetic code. It is desirable, therefore, to make use of the amino acid sequence when aligning coding nucleic acid sequences. In many cases, however, only a part of the sequence of interest is translated. On the other hand, overlapping reading frames may encode multiple alternative proteins, possibly with intermittent non-coding parts. Examples are, in particular, RNA virus genomes. RESULTS: The standard scoring scheme for nucleic acid alignments can be extended to incorporate simultaneously information on translation products in one or more reading frames. Here we present a multiple alignment tool, codaln, that implements a combined nucleic acid plus amino acid scoring model for pairwise and progressive multiple alignments that allows arbitrary weighting for almost all scoring parameters. Resource requirements of codaln are comparable with those of standard tools such as ClustalW. CONCLUSION: We demonstrate the applicability of codaln to various biologically relevant types of sequences (bacteriophage Levivirus and Vertebrate Hox clusters) and show that the combination of nucleic acid and amino acid sequence information leads to improved alignments. These, in turn, increase the performance of analysis tools that depend strictly on good input alignments such as methods for detecting conserved RNA secondary structure elements
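
    The idea of extending a nucleic acid score with information on the translation product can be illustrated with a minimal sketch. It is not codaln's actual scoring model: the function name, the match/mismatch values, the weight `aa_weight`, and the use of the standard genetic code in a single reading frame are all assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def combined_codon_score(codon_a, codon_b, aa_weight=2.0, match=1.0, mismatch=-1.0):
    """Score two aligned codons using nucleotide and amino acid identity.

    Hypothetical scheme: per-nucleotide match/mismatch scores are summed,
    then a weighted term is added for whether the two codons encode the
    same amino acid.
    """
    nt_score = sum(match if x == y else mismatch for x, y in zip(codon_a, codon_b))
    aa_score = match if CODON_TABLE[codon_a] == CODON_TABLE[codon_b] else mismatch
    return nt_score + aa_weight * aa_score


# CTG and TTA differ at two of three positions but both encode leucine.
print(combined_codon_score("CTG", "TTA", aa_weight=0.0))  # nucleotide term only
print(combined_codon_score("CTG", "TTA", aa_weight=2.0))  # with amino acid term
```

    With the amino acid term switched on, the synonymous pair scores higher than it does on nucleotides alone, which is the effect a combined model is meant to capture.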

    Conserved RNA secondary structures in viral genomes: a survey

    The genomes of RNA viruses often carry conserved RNA structures that perform vital functions during the life cycle of the virus. Such structures can be detected using a combination of structure prediction and co-variation analysis. Here we present results from pilot studies on a variety of viral families performed during bioinformatics computer lab courses in past years

    Interleukin-6-dependent survival of multiple myeloma cells involves the Stat3-mediated induction of micro-RNA-21 through a highly conserved enhancer

    Signal transducer and activator of transcription 3 (Stat3) is implicated in the pathogenesis of many malignancies and essential for IL-6–dependent survival and growth of multiple myeloma cells. Here, we demonstrate that the gene encoding oncogenic microRNA-21 (miR-21) is controlled by an upstream enhancer containing 2 Stat3 binding sites strictly conserved since the first observed evolutionary appearance of miR-21 and Stat3. MiR-21 induction by IL-6 was strictly Stat3 dependent. Ectopically raising miR-21 expression in myeloma cells in the absence of IL-6 significantly reduced their apoptosis levels. These data provide strong evidence that miR-21 induction contributes to the oncogenic potential of Stat3

    Absolute quantification of cohesin, CTCF and their regulators in human cells.

    The organisation of mammalian genomes into loops and topologically associating domains (TADs) contributes to chromatin structure, gene expression and recombination. TADs and many loops are formed by cohesin and positioned by CTCF. In proliferating cells, cohesin also mediates sister chromatid cohesion, which is essential for chromosome segregation. Current models of chromatin folding and cohesion are based on assumptions of how many cohesin and CTCF molecules organise the genome. Here we have measured absolute copy numbers and dynamics of cohesin, CTCF, NIPBL, WAPL and sororin by mass spectrometry, fluorescence-correlation spectroscopy and fluorescence recovery after photobleaching in HeLa cells. In G1-phase, there are ~250,000 nuclear cohesin complexes, of which ~160,000 are chromatin-bound. Comparison with chromatin immunoprecipitation-sequencing data implies that some genomic cohesin and CTCF enrichment sites are unoccupied in single cells at any one time. We discuss the implications of these findings for how cohesin can contribute to genome organisation and cohesion.

    Can comprehensive background knowledge be incorporated into substitution models to improve phylogenetic analyses? A case study on major arthropod relationships

    Background: Whenever different data sets arrive at conflicting phylogenetic hypotheses, only testable causal explanations of sources of errors in at least one of the data sets allow us to critically choose among the conflicting hypotheses of relationships. The large (28S) and small (18S) subunit rRNAs are among the most popular markers for studies of deep phylogenies. However, some nodes supported by these data are suspected of being artifacts caused by peculiarities of the evolution of these molecules. Arthropod phylogeny is an especially controversial subject dotted with conflicting hypotheses which are dependent on data set and method of reconstruction. We assume that phylogenetic analyses based on these genes can be improved further by i) enlarging the taxon sample, ii) employing more realistic models of sequence evolution incorporating non-stationary substitution processes, and iii) considering covariation and pairing of sites in rRNA genes. Results: We analyzed a large set of arthropod sequences, applied new tools for quality control of data prior to tree reconstruction, and increased the biological realism of substitution models. Although the split-decomposition network indicated a high noise content in the data set, our measures were able to both improve the analyses and give causal explanations for some incongruities reported in analyses of rRNA sequences. However, misleading effects did not completely disappear. Conclusion: Analyses of data sets that result in ambiguous phylogenetic hypotheses demand methods that not only filter stochastic noise but also differentiate phylogenetic signal from systematic biases. Such methods can only rely on our findings regarding the evolution of the analyzed data. Analyses of independent data sets are then crucial to test the plausibility of the results. Our approach can easily be extended to genomic data as well, whereby layers of quality assessment are set up that are applicable to phylogenetic reconstructions in general.

    ESCO1 and CTCF enable formation of long chromatin loops by protecting cohesinSTAG1 from WAPL.

    Eukaryotic genomes are folded into loops. It is thought that these are formed by cohesin complexes via extrusion, either until loop expansion is arrested by CTCF or until cohesin is removed from DNA by WAPL. Although WAPL limits cohesin's chromatin residence time to minutes, it has been reported that some loops exist for hours. How these loops can persist is unknown. We show that during G1-phase, mammalian cells contain acetylated cohesinSTAG1 which binds chromatin for hours, whereas cohesinSTAG2 binds chromatin for minutes. Our results indicate that CTCF and the acetyltransferase ESCO1 protect a subset of cohesinSTAG1 complexes from WAPL, thereby enabling formation of long and presumably long-lived loops, and that ESCO1, like CTCF, contributes to boundary formation in chromatin looping. Our data are consistent with a model of nested loop extrusion, in which acetylated cohesinSTAG1 forms stable loops between CTCF sites, demarcating the boundaries of more transient cohesinSTAG2 extrusion activity.

    Improving the accuracy of predicting secondary structure for aligned RNA sequences

    Considerable attention has been focused on predicting the secondary structure for aligned RNA sequences since it is useful not only for improving the limiting accuracy of conventional secondary structure prediction but also for finding non-coding RNAs in genomic sequences. Although there exist many algorithms of predicting secondary structure for aligned RNA sequences, further improvement of the accuracy is still awaited. In this article, toward improving the accuracy, a theoretical classification of state-of-the-art algorithms of predicting secondary structure for aligned RNA sequences is presented. The classification is based on the viewpoint of maximum expected accuracy (MEA), which has been successfully applied in various problems in bioinformatics. The classification reveals several disadvantages of the current algorithms but we propose an improvement of a previously introduced algorithm (CentroidAlifold). Finally, computational experiments strongly support the theoretical classification and indicate that the improved CentroidAlifold substantially outperforms other algorithms

    Accurate and efficient reconstruction of deep phylogenies from structured RNAs

    Ribosomal RNA (rRNA) genes are probably the most frequently used data source in phylogenetic reconstruction. Individual columns of rRNA alignments are not independent as a consequence of their highly conserved secondary structures. Unless explicitly taken into account, these correlations can distort the phylogenetic signal and/or lead to gross overestimates of tree stability. Maximum likelihood and Bayesian approaches are of course amenable to using RNA-specific substitution models that treat conserved base pairs appropriately, but require accurate secondary structure models as input. So far, however, no accurate and easy-to-use tool has been available for computing structure-aware alignments and consensus structures that can deal with the large rRNAs. The RNAsalsa approach is designed to fill this gap. Capitalizing on the improved accuracy of pairwise consensus structures and informed by a priori knowledge of group-specific structural constraints, the tool provides both alignments and consensus structures that are of sufficient accuracy for routine phylogenetic analysis based on RNA-specific substitution models. The power of the approach is demonstrated using two rRNA data sets: a mitochondrial rRNA set of 26 Mammalia, and a collection of 28S nuclear rRNAs representative of the five major echinoderm groups.

    MACSE: Multiple Alignment of Coding SEquences Accounting for Frameshifts and Stop Codons

    Until now the most efficient solution to align nucleotide sequences containing open reading frames was to use indirect procedures that align amino acid translation before reporting the inferred gap positions at the codon level. There are two important pitfalls with this approach. Firstly, any premature stop codon impedes using such a strategy. Secondly, each sequence is translated with the same reading frame from beginning to end, so that the presence of a single additional nucleotide leads to both aberrant translation and alignment
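
    The indirect procedure described above, in which gap positions inferred from an amino acid alignment are reported at the codon level, can be sketched as follows. This is a minimal illustration of that baseline and not MACSE itself; the function name and the error handling are assumptions.

```python
def codon_gaps_from_protein(aligned_protein: str, nucleotides: str) -> str:
    """Thread an ungapped coding sequence onto its aligned protein translation.

    Each residue in the protein alignment row consumes one codon, and each
    gap character "-" becomes a three-nucleotide gap "---".
    """
    n_residues = sum(1 for c in aligned_protein if c != "-")
    # A single extra or missing nucleotide breaks the one-codon-per-residue
    # assumption, which is the second pitfall named in the abstract.
    if len(nucleotides) % 3 != 0 or len(nucleotides) // 3 != n_residues:
        raise ValueError("nucleotide length does not match the protein row")
    codons = iter(nucleotides[i:i + 3] for i in range(0, len(nucleotides), 3))
    return "".join("---" if c == "-" else next(codons) for c in aligned_protein)


print(codon_gaps_from_protein("M-K", "ATGAAA"))
```

    Because the sketch assumes one fixed reading frame from beginning to end, a sequence with one additional nucleotide is rejected outright rather than aligned.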