4 research outputs found

    R2R - software to speed the depiction of aesthetic consensus RNA secondary structures

    Background: With continuing identification of novel structured noncoding RNAs, there is an increasing need to create schematic diagrams showing the consensus features of these molecules. RNA structural diagrams are typically made either with general-purpose drawing programs like Adobe Illustrator, or with automated or interactive programs specific to RNA. Unfortunately, the use of applications like Illustrator is extremely time consuming, while existing RNA-specific programs produce figures that are useful, but usually not of the same aesthetic quality as those produced at great cost in Illustrator. Additionally, most existing RNA-specific applications are designed for drawing single RNA molecules, not consensus diagrams. Results: We created R2R, a computer program that facilitates the generation of aesthetic and readable drawings of RNA consensus diagrams in a fraction of the time required with general-purpose drawing programs. Since the inference of a consensus RNA structure typically requires a multiple-sequence alignment, the R2R user annotates the alignment with commands directing the layout and annotation of the RNA. R2R creates SVG or PDF output that can be imported into Adobe Illustrator, Inkscape or CorelDRAW. R2R can be used to create consensus sequence and secondary structure models for novel RNA structures or to revise models when new representatives for known RNA classes become available. Although R2R does not currently have a graphical user interface, it has proven useful in our efforts to create 100 schematic models of distinct noncoding RNA classes. Conclusions: R2R makes it possible to obtain high-quality drawings of the consensus sequence and structural models of many diverse RNA structures with a more practical amount of effort. R2R software is available at http://breaker.research.yale.edu/R2R and as an Additional file.

    A transcriptional sketch of a primary human breast cancer by 454 deep sequencing

    Background: The cancer transcriptome is difficult to explore due to the heterogeneity of quantitative and qualitative changes in gene expression linked to the disease status. An increasing number of "unconventional" transcripts, such as novel isoforms, non-coding RNAs, somatic gene fusions and deletions have been associated with the tumoral state. Massively parallel sequencing techniques provide a framework for exploring the transcriptional complexity inherent to cancer with a limited laboratory and financial effort. We developed a deep sequencing and bioinformatics analysis protocol to investigate the molecular composition of a breast cancer poly(A)+ transcriptome. This method utilizes a cDNA library normalization step to diminish the representation of highly expressed transcripts and biology-oriented bioinformatic analyses to facilitate detection of rare and novel transcripts. Results: We analyzed over 132,000 Roche 454 high-confidence deep sequencing reads from a primary human lobular breast cancer tissue specimen, and detected a range of unusual transcriptional events that were subsequently validated by RT-PCR in eight additional primary human breast cancer samples. We identified and validated one deletion, two novel ncRNAs (one intergenic and one intragenic), ten previously unknown or rare transcript isoforms and a novel gene fusion specific to a single primary tissue sample. We also explored the non-protein-coding portion of the breast cancer transcriptome, identifying thousands of novel non-coding transcripts and more than three hundred reads corresponding to the non-coding RNA MALAT1, which is highly expressed in many human carcinomas.
Conclusion: Our results demonstrate that combining 454 deep sequencing with a normalization step and careful bioinformatic analysis facilitates the discovery and quantification of rare transcripts or ncRNAs, and can be used as a qualitative tool to characterize transcriptome complexity, revealing many hitherto unknown transcripts, splice isoforms, gene fusion events and ncRNAs, even at a relatively low sequence sampling.

    Visual Analysis of Form and Function in Computational Biology

    In recent years, the amount of available data in the field of computational biology has steadily increased. In order to be able to analyze these data, various algorithms have been developed by bioinformaticians to process them efficiently. Moreover, computational models were developed to predict, for instance, biological relationships of species. Furthermore, the prediction of properties like the structure of certain biological molecules is modeled by complex algorithms. Despite these advances in handling such complicated tasks with automated workflows and a huge variety of freely available tools, the expert still needs to supervise the data analysis pipeline, inspecting the quality of both the input data and the results. Additionally, choosing appropriate parameters of a model is quite involved. Visual support puts the expert into the data analysis loop by providing visual encodings of the data and the analysis results together with interaction facilities. In order to meet the requirements of the experts, the visualizations usually have to be adapted for the application purpose or completely new representations have to be developed. Furthermore, it is necessary to combine these visualizations with the algorithms of the experts to prepare the data. These in-situ visualizations are needed due to the amount of data handled within the analysis pipeline in this domain. In this thesis, algorithms and visualizations are presented that were developed in two different research areas of computational biology. On the one hand, the multi-replicate peak-caller Sierra Platinum was developed, which is capable of predicting significant regions of histone modifications occurring in genomes based on experimentally generated input data. This algorithm can use several input data sets simultaneously to calculate statistically meaningful results. Multiple quality measurements and visualizations were integrated into the data analysis pipeline to support the analyst.
Based on these in-situ visualizations, the analyst can modify the parameters of the algorithm to obtain the best results for a given input data set. Furthermore, Sierra Platinum and related algorithms were benchmarked against an artificial data set to evaluate the performance under specific conditions of the input data set, e.g., low read quality or undersequenced data. It turned out that Sierra Platinum achieved the best results in every test scenario. Additionally, the performance of Sierra Platinum was evaluated with experimental data, confirming existing knowledge. It should be noted that the results of the other algorithms seemed to contradict this knowledge. On the other hand, this thesis describes two new visualizations for RNA secondary structures. First, the interactive dot plot viewer iDotter is described, which is able to visualize RNA secondary structure predictions as a web service. Several interaction techniques were implemented that support the analyst in exploring RNA secondary structure dot plots. iDotter provides an API to share or archive annotated dot plots. Additionally, the API enables the embedding of iDotter in existing data analysis pipelines. Second, the algorithm RNApuzzler is presented, which generates (outer-)planar graph drawings for all RNA secondary structure predictions. Previously presented algorithms did not always produce crossing-free graphs. First, several drawing constraints were derived from the literature. Based on these, the algorithm RNAturtle was developed, which did not always produce planar drawings. Therefore, some drawing constraints were relaxed and additional drawing constraints were established. Building on these modified constraints, RNApuzzler was developed. It takes the drawing generated by RNAturtle as an input and resolves the possible intersections of the graph. Due to the resolving mechanism, modified loops can become very large during the intersection resolving step. Therefore, an optimization was developed.
During a post-processing step, the radii of the heavily modified loops are reduced to a minimum. Based on the constraints and the intersection resolving mechanism, it can be shown that RNApuzzler is able to produce planar drawings for any RNA secondary structure. Finally, the results of RNApuzzler are compared to other algorithms.
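
As a rough illustration of the kind of input such drawing algorithms work from, the sketch below converts a secondary structure into a list of base pairs. It assumes the widely used dot-bracket notation, which the abstract itself does not mention, and the function name `parse_dot_bracket` is hypothetical. It is not part of RNAturtle or RNApuzzler. Because the pairs of a pseudoknot-free structure are properly nested, a simple stack is enough to recover them.

```python
def parse_dot_bracket(structure):
    """Return sorted (i, j) base pairs of a dot-bracket string (0-based).

    Assumption: only '(', ')' and '.' are allowed, i.e. no pseudoknots.
    """
    open_positions = []  # stack of indices of unmatched '('
    pairs = []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(i)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {i}")
            # The most recent '(' closes first, so pairs are nested.
            pairs.append((open_positions.pop(), i))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {i}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    # A hairpin with a two-pair stem and a two-nucleotide loop.
    print(parse_dot_bracket("((..))"))
```

The nesting property recovered here is what makes a crossing-free drawing possible in principle. Laying out the loops and stems without overlaps is the harder problem the thesis addresses.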

    Analysis of Interacting Nucleic Acids in Dilute Solutions

    Motivated by the growing demand for analysis tools for diverse natural and engineered DNA and RNA systems, we develop a general theory and set of computational algorithms to perform thermodynamic analysis of dilute reactive solutions and then apply these techniques to interacting nucleic acids. The theory correctly accounts for the effects of indistinguishability in partition function calculations for complexes of interacting strands. With partition functions in hand, the unique complex concentrations corresponding to thermodynamic equilibrium are obtained by solving a convex programming problem. Partition function and concentration information can then be used to calculate equilibrium base-pairing observables corresponding to experimentally measurable properties. The underlying physics and mathematical formulation of these problems lead to an interesting blend of approaches, including ideas from graph theory, group theory, dynamic programming, combinatorics, convex optimization, and Lagrange duality. To make these analysis tools available to researchers worldwide, we present NUPACK, a web-based software suite for thermodynamic analysis of nucleic acids. Its efficacy is demonstrated in example calculations and the results are shown to be in agreement with experiment. Finally, the thermodynamic properties of a DNA-based triggered self-assembly device [1] are analyzed using NUPACK and extensions of its tools. The computational results complement experimental studies, exposing novel properties about the system and dictating further research.
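
To make the idea of equilibrium complex concentrations concrete, here is a minimal sketch for the simplest case: two strands A and B forming one complex AB. It is not NUPACK's algorithm and does not use NUPACK's API. Instead of the convex program described in the abstract, it applies plain bisection to the mass-action and conservation equations. The association constant `k_assoc` is an assumed input. The function name is hypothetical.

```python
def dimer_equilibrium(a0, b0, k_assoc, iterations=200):
    """Equilibrium ([A], [B], [AB]) for A + B <-> AB in dilute solution.

    a0, b0  : total strand concentrations (mol/L), assumed inputs
    k_assoc : association constant [AB] / ([A][B]) in L/mol, assumed input
    """
    if a0 < 0 or b0 < 0 or k_assoc < 0:
        raise ValueError("concentrations and k_assoc must be non-negative")
    # [AB] = x must satisfy k_assoc * (a0 - x) * (b0 - x) - x = 0.
    # On [0, min(a0, b0)] the left-hand side decreases strictly from
    # k_assoc * a0 * b0 >= 0 to -min(a0, b0) <= 0, so the root is unique.
    low, high = 0.0, min(a0, b0)
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        residual = k_assoc * (a0 - mid) * (b0 - mid) - mid
        if residual > 0:
            low = mid   # too little complex formed so far
        else:
            high = mid  # too much complex formed
    ab = 0.5 * (low + high)
    return a0 - ab, b0 - ab, ab


if __name__ == "__main__":
    a, b, ab = dimer_equilibrium(1e-6, 1e-6, 1e6)
    print(f"[A]={a:.3e}  [B]={b:.3e}  [AB]={ab:.3e}")
```

The root is unique here for the same reason the abstract gives for the general case: the equilibrium of a dilute solution is the single minimizer of a convex free-energy function. With many strands and complexes a one-dimensional search no longer suffices, which is where a convex solver comes in.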