144 research outputs found

    inPHAP: Interactive visualization of genotype and phased haplotype data

    Get PDF
    Background: To understand individual genomes it is necessary to look at the variations that lead to changes in phenotype and possibly to disease. However, genotype information alone is often not sufficient and additional knowledge regarding the phase of the variation is needed to make correct interpretations. Interactive visualizations, that allow the user to explore the data in various ways, can be of great assistance in the process of making well informed decisions. But, currently there is a lack for visualizations that are able to deal with phased haplotype data. Results: We present inPHAP, an interactive visualization tool for genotype and phased haplotype data. inPHAP features a variety of interaction possibilities such as zooming, sorting, filtering and aggregation of rows in order to explore patterns hidden in large genetic data sets. As a proof of concept, we apply inPHAP to the phased haplotype data set of Phase 1 of the 1000 Genomes Project. Thereby, inPHAP's ability to show genetic variations on the population as well as on the individuals level is demonstrated for several disease related loci. Conclusions: As of today, inPHAP is the only visual analytical tool that allows the user to explore unphased and phased haplotype data interactively. Due to its highly scalable design, inPHAP can be applied to large datasets with up to 100 GB of data, enabling users to visualize even large scale input data. inPHAP closes the gap between common visualization tools for unphased genotype data and introduces several new features, such as the visualization of phased data.Comment: BioVis 2014 conferenc

    Open reading frames provide a rich pool of potential natural antisense transcripts in fungal genomes

    Get PDF
    Natural antisense transcripts are reported from all kingdoms of life and several recent reports of genomewide screens indicate that they are widely distributed. These transcripts seem to be involved in various biological functions and may govern the expression of their respective sense partner. Very little, however, is known about the degree of evolutionary conservation of antisense transcripts. Furthermore, none of the earlier analyses has studied whether antisense relationships are solely dual or involved in more complex relationships. Here we present a systematic screen for cis- and trans-located antisense transcripts based on open reading frames (ORFs) from five fungal species. The relative number of ORFs involved in antisense relationships varies greatly between the five species. In addition, other significant differences are found between the species, such as the mean length of the antisense region. The majority of trans-located antisense transcripts is found to be involved in complex relationships, resulting in highly connected networks. The analysis of the degree of evolutionary conservation of antisense transcripts shows that most antisense transcripts have no ortholog in any other species. An annotation of antisense transcripts based on Gene Ontology directs to common terms and shows that proteins of genes involved in antisense relationships preferentially localize to the nucleus with common functions in the regulation or maintenance of nucleic acids

    Mayday - integrative analytics for expression data

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>DNA Microarrays have become the standard method for large scale analyses of gene expression and epigenomics. The increasing complexity and inherent noisiness of the generated data makes visual data exploration ever more important. Fast deployment of new methods as well as a combination of predefined, easy to apply methods with programmer's access to the data are important requirements for any analysis framework. Mayday is an open source platform with emphasis on visual data exploration and analysis. Many built-in methods for clustering, machine learning and classification are provided for dissecting complex datasets. Plugins can easily be written to extend Mayday's functionality in a large number of ways. As Java program, Mayday is platform-independent and can be used as Java WebStart application without any installation. Mayday can import data from several file formats, database connectivity is included for efficient data organization. Numerous interactive visualization tools, including box plots, profile plots, principal component plots and a heatmap are available, can be enhanced with metadata and exported as publication quality vector files.</p> <p>Results</p> <p>We have rewritten large parts of Mayday's core to make it more efficient and ready for future developments. Among the large number of new plugins are an automated processing framework, dynamic filtering, new and efficient clustering methods, a machine learning module and database connectivity. Extensive manual data analysis can be done using an inbuilt R terminal and an integrated SQL querying interface. Our visualization framework has become more powerful, new plot types have been added and existing plots improved.</p> <p>Conclusions</p> <p>We present a major extension of Mayday, a very versatile open-source framework for efficient micro array data analysis designed for biologists and bioinformaticians. Most everyday tasks are already covered. The large number of available plugins as well as the extension possibilities using compiled plugins and ad-hoc scripting allow for the rapid adaption of Mayday also to very specialized data exploration. Mayday is available at <url>http://microarray-analysis.org</url>.</p

    DIALIGN P: Fast pair-wise and multiple sequence alignment using parallel processors

    Get PDF
    BACKGROUND: Parallel computing is frequently used to speed up computationally expensive tasks in Bioinformatics. RESULTS: Herein, a parallel version of the multi-alignment program DIALIGN is introduced. We propose two ways of dividing the program into independent sub-routines that can be run on different processors: (a) pair-wise sequence alignments that are used as a first step to multiple alignment account for most of the CPU time in DIALIGN. Since alignments of different sequence pairs are completely independent of each other, they can be distributed to multiple processors without any effect on the resulting output alignments. (b) For alignments of large genomic sequences, we use a heuristics by splitting up sequences into sub-sequences based on a previously introduced anchored alignment procedure. For our test sequences, this combined approach reduces the program running time of DIALIGN by up to 97%. CONCLUSIONS: By distributing sub-routines to multiple processors, the running time of DIALIGN can be crucially improved. With these improvements, it is possible to apply the program in large-scale genomics and proteomics projects that were previously beyond its scope

    Visual Analysis of Microarray Data from Bioinformatics Applications

    Get PDF
    We present a new application designed for the visual exploration of microarray data.It is based on an extension and adaption of parallel coordinates to support the visual exploration of large and high-dimensional datasets. In particular, we investigate the visual analysis of gene-expression data as generated by microarray experiments. We combine refined visual exploration with statistical methods to a visual analytics approach, which proved to be particularly successful in this application domain. We will demonstrate the usefulness on several multidimensional gene-expression datasets from different bioinformatics applications

    Comparative analysis of structured RNAs in S. cerevisiae indicates a multitude of different functions

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Non-coding RNAs (ncRNAs) are an emerging focus for both computational analysis and experimental research, resulting in a growing number of novel, non-protein coding transcripts with often unknown functions. Whole genome screens in higher eukaryotes, for example, provided evidence for a surprisingly large number of ncRNAs. To supplement these searches, we performed a computational analysis of seven yeast species and searched for new ncRNAs and RNA motifs.</p> <p>Results</p> <p>A comparative analysis of the genomes of seven yeast species yielded roughly 2800 genomic loci that showed the hallmarks of evolutionary conserved RNA secondary structures. A total of 74% of these regions overlapped with annotated non-coding or coding genes in yeast. Coding sequences that carry predicted structured RNA elements belong to a limited number of groups with common functions, suggesting that these RNA elements are involved in post-transcriptional regulation and/or cellular localization. About 700 conserved RNA structures were found outside annotated coding sequences and known ncRNA genes. Many of these predicted elements overlapped with UTR regions of particular classes of protein coding genes. In addition, a number of RNA elements overlapped with previously characterized antisense transcripts. Transcription of about 120 predicted elements located in promoter regions and other, previously un-annotated, intergenic regions was supported by tiling array experiments, ESTs, or SAGE data.</p> <p>Conclusion</p> <p>Our computational predictions strongly suggest that yeasts harbor a substantial pool of several hundred novel ncRNAs. In addition, we describe a large number of RNA structures in coding sequences and also within antisense transcripts that were previously characterized using tiling arrays.</p

    iHAT: interactive Hierarchical Aggregation Table for Genetic Association Data

    Get PDF
    In the search for single-nucleotide polymorphisms which influence the observable phenotype, genome wide association studies have become an important technique for the identification of associations between genotype and phenotype of a diverse set of sequence-based data. We present a methodology for the visual assessment of single-nucleotide polymorphisms using interactive hierarchical aggregation techniques combined with methods known from traditional sequence browsers and cluster heatmaps. Our tool, the interactive Hierarchical Aggregation Table (iHAT), facilitates the visualization of multiple sequence alignments, associated metadata, and hierarchical clusterings. Different color maps and aggregation strategies as well as filtering options support the user in finding correlations between sequences and metadata. Similar to other visualizations such as parallel coordinates or heatmaps, iHAT relies on the human pattern-recognition ability for spotting patterns that might indicate correlation or anticorrelation. We demonstrate iHAT using artificial and real-world datasets for DNA and protein association studies as well as expression Quantitative Trait Locus data

    Prequips—an extensible software platform for integration, visualization and analysis of LC-MS/MS proteomics data

    Get PDF
    Summary: We describe an integrative software platform, Prequips, for comparative proteomics-based systems biology analysis that: (i) integrates all information generated from mass spectrometry (MS)-based proteomics as well as from basic proteomics data analysis tools, (ii) visualizes such information for various proteomic analyses via graphical interfaces and (iii) links peptide and protein abundances to external tools often used in systems biology studies. Availability: http://prequips.sourceforge.net Contact: [email protected]

    High-Resolution Transcriptome Maps Reveal Strain-Specific Regulatory Features of Multiple Campylobacter jejuni Isolates

    Get PDF
    Campylobacter jejuni is currently the leading cause of bacterial gastroenteritis in humans. Comparison of multiple Campylobacter strains revealed a high genetic and phenotypic diversity. However, little is known about differences in transcriptome organization, gene expression, and small RNA (sRNA) repertoires. Here we present the first comparative primary transcriptome analysis based on the differential RNA–seq (dRNA–seq) of four C. jejuni isolates. Our approach includes a novel, generic method for the automated annotation of transcriptional start sites (TSS), which allowed us to provide genome-wide promoter maps in the analyzed strains. These global TSS maps are refined through the integration of a SuperGenome approach that allows for a comparative TSS annotation by mapping RNA–seq data of multiple strains into a common coordinate system derived from a whole-genome alignment. Considering the steadily increasing amount of RNA–seq studies, our automated TSS annotation will not only facilitate transcriptome annotation for a wider range of pro- and eukaryotes but can also be adapted for the analysis among different growth or stress conditions. Our comparative dRNA–seq analysis revealed conservation of most TSS, but also single-nucleotide-polymorphisms (SNP) in promoter regions, which lead to strain-specific transcriptional output. Furthermore, we identified strain-specific sRNA repertoires that could contribute to differential gene regulation among strains. In addition, we identified a novel minimal CRISPR-system in Campylobacter of the type-II CRISPR subtype, which relies on the host factor RNase III and a trans-encoded sRNA for maturation of crRNAs. This minimal system of Campylobacter, which seems active in only some strains, employs a unique maturation pathway, since the crRNAs are transcribed from individual promoters in the upstream repeats and thereby minimize the requirements for the maturation machinery. Overall, our study provides new insights into strain-specific transcriptome organization and sRNAs, and reveals genes that could modulate phenotypic variation among strains despite high conservation at the DNA level
    • …