
    RNA FRABASE version 1.0: an engine with a database to search for the three-dimensional fragments within RNA structures

    The RNA FRABASE is a web-accessible engine with a relational database that automatically searches for user-defined 3D RNA fragments within a set of RNA structures. It is a new tool for searching and analysing RNA structures, aimed at 3D structure modelling. The user inputs RNA sequence(s) and/or secondary structure(s) given in ‘dot-bracket’ notation. The algorithm searching for the requested 3D RNA fragments is very efficient. As of August 2007, the database contains: (i) RNA sequences and secondary structures, in ‘dot-bracket’ notation, derived from 1065 Protein Data Bank (PDB)-deposited RNA structures and their complexes, (ii) a collection of atom coordinates of unmodified and modified nucleotide residues occurring in RNA structures, (iii) calculated RNA torsion angles and sugar pucker parameters and (iv) information about base pairs. Advanced queries can be filtered by modified residue content, experimental method used and limits on conformational parameters. The output list of query-matching RNA fragments gives access to their coordinates in PDB-format files, ready for direct download and visualization, along with conformational parameters and information about base pairs. The RNA FRABASE is updated automatically every month and is freely accessible at http://rnafrabase.ibch.poznan.pl (mirror at http://cerber.cs.put.poznan.pl/rnadb).
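
    The ‘dot-bracket’ notation mentioned above encodes a secondary structure as one character per residue: a dot for an unpaired residue and a matching pair of parentheses for a base pair. The following is a minimal sketch of how such a string can be turned into a list of base pairs. It is not taken from RNA FRABASE; the function name `dot_bracket_pairs` is hypothetical, and only the plain `(`, `)` and `.` symbols are handled (no pseudoknot brackets).

```python
def dot_bracket_pairs(structure):
    """Return the base pairs of a dot-bracket string as sorted 0-based (i, j) tuples.

    Hypothetical helper for illustration only; handles '(', ')' and '.'.
    """
    stack = []   # positions of '(' still waiting for a partner
    pairs = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


# A hairpin: two stacked base pairs closing a three-residue loop.
print(dot_bracket_pairs("((...))"))
```

    A stack is enough here because nested parentheses close in the reverse order in which they open.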

    A method for aligning RNA secondary structures and its application to RNA motif detection

    BACKGROUND: Alignment of RNA secondary structures is important in studying functional RNA motifs. In recent years, much progress has been made in RNA motif finding and structure alignment. However, existing tools either require a large number of prealigned structures or suffer from high time complexities. This makes it difficult for the tools to process RNAs whose prealigned structures are unavailable or process very large RNA structure databases. RESULTS: We present here an efficient tool called RSmatch for aligning RNA secondary structures and for motif detection. Motivated by widely used algorithms for RNA folding, we decompose an RNA secondary structure into a set of atomic structure components that are further organized by a tree model to capture the structural particularities. RSmatch can find the optimal global or local alignment between two RNA secondary structures using two scoring matrices, one for single-stranded regions and the other for double-stranded regions. The time complexity of RSmatch is O(mn) where m is the size of the query structure and n that of the subject structure. When applied to searching a structure database, RSmatch can find similar RNA substructures, and is capable of conducting multiple structure alignment and iterative database search. Therefore it can be used to identify functional RNA motifs. The accuracy of RSmatch is tested by experiments using a number of known RNA structures, including simple stem-loops and complex structures containing junctions. CONCLUSION: With respect to computing efficiency and accuracy, RSmatch compares favorably with other tools for RNA structure alignment and motif detection. This tool shall be useful to researchers interested in comparing RNA structures obtained from wet lab experiments or RNA folding programs, particularly when the size of the structure dataset is large

    New algorithms and methods for protein and DNA sequence comparison


    SAMMate: a GUI tool for processing short read alignments in SAM/BAM format

    BACKGROUND: Next Generation Sequencing (NGS) technology generates tens of millions of short reads for each DNA/RNA sample. A key step in NGS data analysis is the short read alignment of the generated sequences to a reference genome. Although storing alignment information in the Sequence Alignment/Map (SAM) or Binary SAM (BAM) format is now standard, biomedical researchers still have difficulty accessing this information. RESULTS: We have developed a Graphical User Interface (GUI) software tool named SAMMate. SAMMate allows biomedical researchers to quickly process SAM/BAM files and is compatible with both single-end and paired-end sequencing technologies. SAMMate also automates some standard procedures in DNA-seq and RNA-seq data analysis. Using either standard or customized annotation files, SAMMate allows users to accurately calculate the short read coverage of genomic intervals. In particular, for RNA-seq data SAMMate can accurately calculate the gene expression abundance scores for customized genomic intervals using short reads originating from both exons and exon-exon junctions. Furthermore, SAMMate can quickly calculate a whole-genome signal map at base-wise resolution allowing researchers to solve an array of bioinformatics problems. Finally, SAMMate can export both a wiggle file for alignment visualization in the UCSC genome browser and an alignment statistics report. The biological impact of these features is demonstrated via several case studies that predict miRNA targets using short read alignment information files. CONCLUSIONS: With just a few mouse clicks, SAMMate will provide biomedical researchers easy access to important alignment information stored in SAM/BAM files. Our software is constantly updated and will greatly facilitate the downstream analysis of NGS data. Both the source code and the GUI executable are freely available under the GNU General Public License at http://sammate.sourceforge.net.
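
    To make the idea of a base-wise signal map and interval coverage concrete, here is a minimal sketch. It does not reproduce SAMMate's implementation: the function names, the representation of aligned reads as half-open 0-based `(start, end)` intervals, and the difference-array approach are all assumptions made for illustration.

```python
def basewise_coverage(reads, length):
    """Per-base read depth over a reference of the given length.

    `reads` is a list of half-open, 0-based (start, end) intervals
    (an assumed simplification of real alignment records).
    """
    diff = [0] * (length + 1)          # difference array: +1 at start, -1 at end
    for start, end in reads:
        start, end = max(start, 0), min(end, length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    coverage, depth = [], 0
    for delta in diff[:length]:        # running sum turns differences into depth
        depth += delta
        coverage.append(depth)
    return coverage


def mean_coverage(coverage, start, end):
    """Average depth over the half-open interval [start, end)."""
    return sum(coverage[start:end]) / (end - start)


cov = basewise_coverage([(0, 4), (2, 6)], length=8)
print(cov)                        # per-base depth
print(mean_coverage(cov, 2, 6))   # average depth over one interval
```

    The difference array keeps the cost proportional to the number of reads plus the reference length, rather than to the total number of aligned bases.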

    RNA structure analysis : algorithms and applications

    In this doctoral thesis, efficient algorithms for aligning RNA secondary structures and mining unknown RNA motifs are presented. The major contribution is a structure alignment algorithm that combines primary and secondary structure information to find the optimal alignment between two given structures: one is either a pattern structure of a known motif or a real query structure, and the other is a subject structure. Motivated by widely used algorithms for RNA folding, the proposed algorithm decomposes an RNA secondary structure into a set of atomic structural components that can be further organized in a tree model to capture the structural particularities. The novel structure alignment algorithm is implemented using dynamic programming techniques coupled with position-independent scoring matrices. The algorithm can find the optimal global and local alignments between two RNA secondary structures in quadratic time. When applied to searching a structure database, the algorithm can find similar RNA substructures and can therefore be used to identify functional RNA motifs. The algorithm has also been extended to handle position-dependent scoring matrices for the purpose of aligning multiple structures. All algorithms have been implemented in a package named RSmatch and applied to searching an mRNA UTR structure database and mining RNA motifs. The experimental results showed the high efficiency and effectiveness of the proposed techniques.

    Why High-Performance Modelling and Simulation for Big Data Applications Matters

    Modelling and Simulation (M&S) offer adequate abstractions to manage the complexity of analysing big data in scientific and engineering domains. Unfortunately, big data problems are often not easily amenable to efficient and effective use of High Performance Computing (HPC) facilities and technologies. Furthermore, M&S communities typically lack the detailed expertise required to exploit the full potential of HPC solutions, while HPC specialists may not be fully aware of specific modelling and simulation requirements and applications. The COST Action IC1406 High-Performance Modelling and Simulation for Big Data Applications has created a strategic framework to foster interaction between M&S experts from various application domains on the one hand and HPC experts on the other hand to develop effective solutions for big data applications. One of the tangible outcomes of the COST Action is a collection of case studies from various computing domains. Each case study brought together both HPC and M&S experts, bearing witness to the effective cross-pollination facilitated by the COST Action. In this introductory article we argue why joining forces between M&S and HPC communities is both timely in the big data era and crucial for success in many application domains. Moreover, we provide an overview of the state of the art in the various research areas concerned.

    RNA motif search with data-driven element ordering

    BACKGROUND: In this paper, we study the problem of RNA motif search in long genomic sequences. This approach uses a combination of sequence and structure constraints to uncover new distant homologs of known functional RNAs. The problem is NP-hard and is traditionally solved by backtracking algorithms. RESULTS: We have designed a new algorithm for RNA motif search and implemented a new motif search tool RNArobo. The tool enhances the RNAbob descriptor language, allowing insertions in helices, which enables better characterization of ribozymes and aptamers. A typical RNA motif consists of multiple elements and the running time of the algorithm is highly dependent on their ordering. By approaching the element ordering problem in a principled way, we demonstrate more than 100-fold speedup of the search for complex motifs compared to previously published tools. CONCLUSIONS: We have developed a new method for RNA motif search that allows for a significant speedup of the search of complex motifs that include pseudoknots. Such speed improvements are crucial at a time when the rate of DNA sequencing outpaces growth in computing. RNArobo is available at http://compbio.fmph.uniba.sk/rnarobo. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1074-x) contains supplementary material, which is available to authorized users

    Development of a stochastic simulator for biological systems based on Calculus of Looping Sequences.

    Molecular Biology produces a huge amount of data concerning the behavior of the single constituents of living organisms. Nevertheless, this reductionist view is not sufficient to gain a deep comprehension of how such components interact at the system level, generating the set of complex behaviors we observe in nature. This is the main motivation for the rise of one of the most interesting and recent applications of computer science: Computational Systems Biology, a new science integrating experimental activity and mathematical modeling in order to study the organization principles and the dynamic behavior of biological systems. Among the formalisms that have either been applied to or been inspired by biological systems are automata-based models, rewrite systems, and process calculi. Here we consider a formalism based on term rewriting, called the Calculus of Looping Sequences (CLS), that aims to model chemical and biological systems. In order to quantitatively simulate biological systems, a stochastic extension of CLS has been developed; it makes it possible to express rule schemata with the notational simplicity of term rewriting and has some semantic means that are common in process calculi. In this thesis we study the implementation of a stochastic simulator for the CLS formalism. We propose an extension of Gillespie's stochastic simulation algorithm that handles rule schemata with rate functions, and we present an efficient bottom-up, pre-processing-based CLS pattern matching algorithm. A simulator implementing the ideas introduced in this thesis has been developed in F#, a multi-paradigm programming language for the .NET framework modeled on OCaml. Although F# is a research project still under continuous development, it has product-quality performance. It seamlessly merges the object-oriented, functional and imperative programming paradigms, making it possible to exploit the performance, the portability and the tools of the .NET framework.
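
    For orientation, the following is a minimal sketch of Gillespie's standard direct method, the algorithm that the thesis extends. It is written in Python rather than F#, operates on plain species counts rather than CLS terms, and does not include the rule-schema or pattern-matching machinery described above. The function name `gillespie` and the representation of a reaction as a `(rate_function, update)` pair are assumptions made for this illustration.

```python
import bisect
import itertools
import random


def gillespie(state, reactions, t_max, rng):
    """Simulate one trajectory with Gillespie's direct method.

    state:     dict mapping species name -> molecule count
    reactions: list of (rate_fn, update) pairs (an assumed encoding), where
               rate_fn(state) returns the propensity and update maps
               species name -> change in count when the reaction fires
    Returns a list of (time, state snapshot) tuples.
    """
    state = dict(state)
    t = 0.0
    trajectory = [(t, dict(state))]
    while True:
        propensities = [rate_fn(state) for rate_fn, _ in reactions]
        cumulative = list(itertools.accumulate(propensities))
        total = cumulative[-1] if cumulative else 0.0
        if total <= 0:
            break                          # no reaction can fire any more
        t += rng.expovariate(total)        # waiting time ~ Exp(total propensity)
        if t > t_max:
            break
        # Pick reaction k with probability propensities[k] / total.
        k = bisect.bisect_right(cumulative, rng.random() * total)
        for species, delta in reactions[k][1].items():
            state[species] += delta
        trajectory.append((t, dict(state)))
    return trajectory


# Example: irreversible conversion A -> B with propensity 0.5 * [A].
reactions = [(lambda s: 0.5 * s["A"], {"A": -1, "B": +1})]
trajectory = gillespie({"A": 10, "B": 0}, reactions, t_max=1e9, rng=random.Random(0))
print(trajectory[-1][1])   # final state
```

    Passing the random generator in explicitly makes a run reproducible for a given seed.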