9 research outputs found

    Synchronous visual analysis and editing of RNA sequence and secondary structure alignments using 4SALE

    Background: The function of a noncoding RNA sequence is mainly determined by its secondary structure; a family of noncoding RNA sequences is therefore much more conserved at the structural level than at the sequence level. Understanding the function of noncoding RNA sequence families requires two things: a hand-crafted or hand-improved alignment and detailed analyses of the secondary structures. Several tools are available to help with these tasks, but all of them are specialized and focus on only one aspect, either editing the alignment or plotting the secondary structure. The problem is that both tasks need to be performed simultaneously. Findings: 4SALE is designed to handle sequence and secondary structure information of RNAs synchronously. By including a completely new method for simultaneously visualizing and editing RNA sequences and secondary structure information, 4SALE makes it much easier to improve alignments and to understand RNA sequence and secondary structure evolution. Conclusion: 4SALE is a further step toward handling RNA sequence and secondary structure information simultaneously. It provides a completely new way of visually monitoring different structural aspects while editing the alignment. The software is freely available and distributed from its website at http://4sale.bioapps.biozentrum.uni-wuerzburg.de/

    4SALE – A tool for synchronous RNA sequence and secondary structure alignment and editing

    BACKGROUND: In sequence analysis, the multiple alignment forms the foundation of all subsequent analyses. Errors in an alignment can strongly influence these analyses and can therefore lead to wrong predictions. Hand-crafted and hand-improved alignments are necessary and have become common practice. For RNA sequences, the primary sequence as well as a secondary structure consensus is often well known, e.g., the cloverleaf structure of tRNA. Recently, some alignment editors have been proposed that are able to include and model both kinds of information. However, with the advent of a large number of reliable RNA sequences together with their solved secondary structures (available from, e.g., the ITS2 Database), we are faced with the problem of handling sequences and their associated secondary structures synchronously. RESULTS: 4SALE fills this gap. The application allows fast synchronous alignment of sequences and secondary structures for large data sets and, for the first time, synchronous manual editing of aligned sequences and their secondary structures. This study describes an algorithm for the synchronous alignment of sequences and their associated secondary structures as well as the main features of 4SALE used for further analyses and editing. 4SALE provides an optimal and unique starting point for every RNA sequence and structure analysis. CONCLUSION: 4SALE, which provides a user-friendly and intuitive interface, is a comprehensive toolbox for RNA analysis based on sequence and secondary structure information. The program connects sequence and structure databases such as the ITS2 Database to phylogeny programs such as the CBCAnalyzer. 4SALE is written in Java and is therefore platform independent. The software is freely available and distributed from the website a
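The idea of editing a sequence and its secondary structure synchronously can be illustrated with a minimal Python sketch. It is not 4SALE's implementation or algorithm: the function names (`pair_columns`, `insert_gap`, `unpair`) are hypothetical, and the dot-bracket structure notation is an assumption made for illustration. Each alignment column stores the nucleotide together with its structure symbol, so a gap inserted during editing always lands in both rows at once.

```python
from typing import List, Tuple

# One alignment column: (nucleotide, structure symbol in dot-bracket notation).
Column = Tuple[str, str]

# A gap column carries a gap in both the sequence row and the structure row.
GAP: Column = ("-", "-")


def pair_columns(sequence: str, structure: str) -> List[Column]:
    """Zip a sequence and its structure string into a list of columns."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    return list(zip(sequence, structure))


def insert_gap(columns: List[Column], index: int) -> List[Column]:
    """Return a new column list with a gap column inserted before `index`."""
    return columns[:index] + [GAP] + columns[index:]


def unpair(columns: List[Column]) -> Tuple[str, str]:
    """Split the columns back into a sequence string and a structure string."""
    sequence = "".join(nucleotide for nucleotide, _ in columns)
    structure = "".join(symbol for _, symbol in columns)
    return sequence, structure


if __name__ == "__main__":
    columns = pair_columns("GGGAAACCC", "(((...)))")
    sequence, structure = unpair(insert_gap(columns, 3))
    print(sequence)   # GGG-AAACCC
    print(structure)  # (((-...)))
```

Because the two rows are never edited separately, they cannot drift out of register; a real editor would additionally have to keep base-pair partners consistent, which this sketch does not attempt.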

    Unsupervised Meta-Analysis on Diverse Gene Expression Datasets Allows Insight into Gene Function and Regulation

    Over the past years, microarray databases have grown rapidly in size. While they offer a wealth of data, it remains challenging to integrate data arising from different studies. Here we propose an unsupervised approach to large-scale meta-analysis of Arabidopsis thaliana whole-genome expression datasets to gain additional insights into the function and regulation of genes. Applying kernel principal component analysis and hierarchical clustering, we found three major groups of experimental contrasts sharing a common biological trait. Genes associated with two of these clusters are known to play an important role in indole-3-acetic acid (IAA) mediated plant growth and development or in pathogen defense. Novel functions could be assigned to genes, including a cluster of serine/threonine kinases that carry two uncharacterized domains (DUF26) in their receptor part implicated in host defense. With the approach shown here, hidden interrelations between genes regulated under different conditions can be unraveled.

    XML schemas for common bioinformatic data types and their application in workflow systems

    Background: Today, there is a growing need in bioinformatics to combine available software tools into chains, thus building complex applications from existing single-task tools. To create such workflows, the tools involved have to be able to work with each other's data; therefore, a common set of well-defined data formats is needed. Unfortunately, current bioinformatic tools use a great variety of heterogeneous formats. Results: Acknowledging the need for common formats, the Helmholtz Open BioInformatics Technology network (HOBIT) identified several basic data types used in bioinformatics, developed appropriate format descriptions formally defined by XML schemas, and incorporated them in a Java library (BioDOM). These schemas currently cover sequence, sequence alignment, RNA secondary structure and RNA secondary structure alignment formats in a form that is independent of any specific program, thus enabling seamless interoperation of different tools. All XML formats are available at http://bioschemas.sourceforge.net; the BioDOM library can be obtained at http://biodom.sourceforge.net. Conclusion: The HOBIT XML schemas and the BioDOM library simplify adding XML support to newly created and existing bioinformatic tools, enabling these tools to interoperate seamlessly in workflow scenarios.
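To make the idea of a program-independent XML representation concrete, the following Python sketch converts FASTA text to a small XML document and back. It is only an illustration under stated assumptions: it does not use BioDOM, and the element and attribute names (`sequences`, `sequence`, `id`) are hypothetical and do not follow the actual HOBIT schemas. Only the Python standard library is used.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {identifier: residues}, keeping input order."""
    records: Dict[str, str] = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The identifier is the first word of the header line;
            # a bare ">" gets a placeholder name.
            words = line[1:].split()
            current = words[0] if words else "unnamed"
            records[current] = ""
        elif current is not None:
            records[current] += line
    return records


def fasta_to_xml(text: str) -> str:
    """Serialize FASTA records as XML (hypothetical element names)."""
    root = ET.Element("sequences")
    for seq_id, residues in parse_fasta(text).items():
        entry = ET.SubElement(root, "sequence", attrib={"id": seq_id})
        entry.text = residues
    return ET.tostring(root, encoding="unicode")


def xml_to_fasta(xml_text: str) -> str:
    """Convert the XML produced by fasta_to_xml back to FASTA text."""
    root = ET.fromstring(xml_text)
    return "".join(
        f">{entry.get('id')}\n{entry.text or ''}\n"
        for entry in root.findall("sequence")
    )


if __name__ == "__main__":
    fasta = ">seq1 example\nACGU\nGG\n>seq2\nUUU\n"
    xml_text = fasta_to_xml(fasta)
    print(xml_text)
    print(xml_to_fasta(xml_text), end="")
```

The round trip drops header descriptions and line wrapping, which shows why a shared, formally specified schema matters: every tool in a workflow needs to agree on exactly which information the common format carries.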

    Detecting species-site dependencies in large multiple sequence alignments

    Multiple sequence alignments (MSAs) are one of the most important sources of information in sequence analysis. Many methods have been proposed to detect, extract and visualize their most significant properties. While site-specific methods such as sequence logos successfully visualize site conservation, and sequence-based methods such as clustering approaches detect relationships between sequences, both types of methods fail to reveal informational elements of MSAs at the level of sequence–site interactions, i.e. to find clusters of sequences together with the sites responsible for their clustering, which together account for a high fraction of the overall information of the MSA. To fill this gap, we present a method that combines the Fisher score-based embedding of sequences from a profile hidden Markov model (pHMM) with correspondence analysis. This method is capable of detecting and visualizing group-specific or conflicting signals in an MSA and allows for a detailed explorative investigation of alignments of any size tractable by pHMMs. Applications of our method are exemplified on an alignment of the Neisseria surface antigen LP2086, where it is used to detect sites of recombinatory horizontal gene transfer, and on the vitamin K epoxide reductase family, where it is used to distinguish between evolutionary and functional signals.

    XML schemas for common bioinformatic data types and their application in workflow systems-1

    Copyright information: Taken from "XML schemas for common bioinformatic data types and their application in workflow systems", http://www.biomedcentral.com/1471-2105/7/490. BMC Bioinformatics 2006;7:490. Published online 6 Nov 2006. PMCID: PMC2001303.
    ameters) as input and returns the result data as an EBIApplicationResult XML document. The input data can originate from a file (containing sequence information as SequenceML) or from an external data source like the SOAPDB webservice (which returns sequence information in FASTA format). Using BioDOM as a converter between the different data formats, it is quite easy to add another data source. The e2g webservice is a workflow itself and also uses webservice technology to mask repeats (using the RepeatMasker webservice) and match the input sequence data against huge EST databases (using the vmatch webservice). The match result is filtered (depending on input parameters) and returned as an EBIApplicationResult document.

    XML schemas for common bioinformatic data types and their application in workflow systems-0

    Copyright information: Taken from "XML schemas for common bioinformatic data types and their application in workflow systems", http://www.biomedcentral.com/1471-2105/7/490. BMC Bioinformatics 2006;7:490. Published online 6 Nov 2006. PMCID: PMC2001303.
    SequenceML object is generated (line 12); afterwards, a FASTA-formatted file is appended (line 14). Some of the possibilities for further processing, as shown in the comments, are given in lines 16 to 20.