1,494 research outputs found

    Visualization of biological data: Infrastructure, design and application

    Visualization is an important component of biological data analysis. Ideally, visual methods are tightly integrated with analysis methods, so that data from different intermediate stages of the analysis can be plotted seamlessly. Bioconductor provides a substantial analysis platform but only limited tools for genomic data visualization. Visual tools for genomic data, e.g. GenomeView, IGV, and IGB, are largely detached from the analysis engine. This research fills this gap by developing visualization methods that are integrated into the Bioconductor suite. There are three main components of the research:
    * New visual tools for genomic data that utilize the latest research in visualization.
    * Infrastructure development to support the visual tools and the analysis of other types of biological data.
    * Application of the visualization methods to the analysis of RNA-seq and DNA-seq data.

    DNA microarray integromics analysis platform

    Background: The study of interactions between molecules belonging to different biochemical families (such as lipids and nucleic acids) requires specialized data analysis methods. This article describes the DNA Microarray Integromics Analysis Platform, a unique web application that focuses on computational integration and analysis of "multi-omics" data. Our tool supports a range of complex analyses, including - among others - low- and high-level analyses of DNA microarray data, integrated analysis of transcriptomics and lipidomics data and the ability to infer miRNA-mRNA interactions. Results: We demonstrate the characteristics and benefits of the DNA Microarray Integromics Analysis Platform using two different test cases. The first test case involves the analysis of the nutrimouse dataset, which contains measurements of the expression of genes involved in nutritional problems and the concentrations of hepatic fatty acids. The second test case involves the analysis of miRNA-mRNA interactions in polysaccharide-stimulated human dermal fibroblasts infected with porcine endogenous retroviruses. Conclusions: The DNA Microarray Integromics Analysis Platform is a web-based graphical user interface for "multi-omics" data management and analysis. Its intuitive nature and wide range of available workflows make it an effective tool for molecular biology research. The platform is hosted at https://lifescience.plgrid.pl

    Network-based analysis of gene expression data

    The methods of molecular biology for the quantitative measurement of gene expression have undergone rapid development in the past two decades. High-throughput assays based on microarray and RNA-seq technology now enable whole-genome studies in which several thousand genes can be measured at a time. However, this has also imposed serious challenges on data storage and analysis, which are the subject of the young but rapidly developing field of computational biology. Explaining observations made on such a large scale requires suitable and accordingly scaled models of gene regulation. Detailed models, as available for single genes, need to be extended and assembled into larger networks of regulatory interactions between genes and gene products. Incorporating such networks into methods for data analysis is crucial to identify the molecular mechanisms that drive the observed expression. As methods for this purpose emerge in parallel to each other and without a known standard of truth, results need to be critically checked in a competitive setup and in the context of the rich literature corpus that is available. This work is centered on and contributes to the following subjects, each of which represents an important and distinct research topic in the field of computational biology: (i) construction of realistic gene regulatory network models; (ii) detection of subnetworks that are significantly altered in the data under investigation; and (iii) systematic biological interpretation of detected subnetworks. For the construction of regulatory networks, I review existing methods with a focus on curation and inference approaches. I first describe how literature curation can be used to construct a regulatory network for a specific process, using the well-studied diauxic shift in yeast as an example. In particular, I address the question of how a detailed understanding, as available for the regulation of single genes, can be scaled up to the level of larger systems. I subsequently inspect methods for large-scale network inference, showing that they are significantly skewed towards master regulators. A recalibration strategy is introduced and applied, yielding an improved genome-wide regulatory network for yeast. To detect significantly altered subnetworks, I introduce GGEA as a method for network-based enrichment analysis. The key idea is to score regulatory interactions within functional gene sets for consistency with the observed expression. Compared to other recently published methods, GGEA yields results that consistently and coherently align expression changes with known regulation types and that are thus easier to explain. I also suggest and discuss several significant enhancements to the original method that improve its applicability, outcome and runtime. For the systematic detection and interpretation of subnetworks, I have developed the EnrichmentBrowser software package. It implements several state-of-the-art methods besides GGEA and allows results to be combined and explored across methods. As part of the Bioconductor repository, the package provides unified access to the different methods and thus greatly simplifies their use for biologists. Extensions to this framework that support the automation of biological interpretation routines are also presented. In conclusion, this work contributes substantially to the research field of network-based analysis of gene expression data with respect to regulatory network construction, subnetwork detection, and their biological interpretation. This also includes recent developments as well as areas of ongoing research, which are discussed in the context of current and future questions arising from the new generation of genomic data.
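
    The key idea stated above, scoring regulatory interactions within a gene set for consistency with the observed expression, can be illustrated with a deliberately simplified sketch. This is not the published GGEA scoring scheme: the edge encoding (+1 for activation, -1 for inhibition), the weighting by the smaller absolute log fold change, and all gene names and values are assumptions made only for illustration.

```python
# Simplified illustration only; the actual GGEA method is more elaborate.
# Assumed edge encoding: +1 = activation, -1 = inhibition.

def edge_consistency(reg_fc, tgt_fc, edge_type):
    """Score in [-1, 1]; positive when the log fold changes of regulator
    and target agree with the regulation type of the edge."""
    if reg_fc == 0 or tgt_fc == 0:
        return 0.0
    same_direction = 1 if (reg_fc > 0) == (tgt_fc > 0) else -1
    # Weight by the smaller absolute change, capped at 1 (an assumption).
    weight = min(abs(reg_fc), abs(tgt_fc), 1.0)
    return edge_type * same_direction * weight


def gene_set_score(edges, log_fc, gene_set):
    """Mean consistency over edges with both endpoints in gene_set."""
    scores = [
        edge_consistency(log_fc[reg], log_fc[tgt], kind)
        for reg, tgt, kind in edges
        if reg in gene_set and tgt in gene_set
        and reg in log_fc and tgt in log_fc
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy network and expression changes.
edges = [("TF1", "G1", +1), ("TF1", "G2", -1), ("TF2", "G3", +1)]
log_fc = {"TF1": 2.0, "G1": 1.5, "G2": -0.5, "TF2": 1.0, "G3": -1.0}
gene_set = {"TF1", "G1", "G2", "TF2", "G3"}
score = gene_set_score(edges, log_fc, gene_set)
```

    In this toy example the first two edges are consistent with the data and the third is not, so the set receives a small positive score. Judging whether such a score is significant would additionally require a null distribution, for example from permuting sample labels, which the sketch omits.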

    Computational workflow for the detection and analysis of potentially Z-DNA-forming sequences using Bioconductor

    Dissertation (master's), Universidade de Brasília, Instituto de Ciências Biológicas, Departamento de Biologia Celular, Programa de Pós-Graduação em Biologia Molecular, 2012. Z-DNA is an alternative conformation of the DNA molecule implicated in the regulation of gene expression. However, the exact role of this structure in cell metabolism is not yet fully understood. This work presents a novel Z-DNA analysis workflow that employs the R software environment to investigate potential Z-DNA-forming regions (ZDRs) throughout genomes. It combines the thermodynamic analysis of the well-known software Z-Catcher with the biological data manipulation capabilities of several Bioconductor packages. The methodology was applied to human chromosome 14 as a case study. With that, a correlation was established between ZDRs and transcription start sites (TSSs), which is in agreement with previous reports. In addition, the workflow showed that ZDRs positioned inside genes tend to occur in intronic rather than exonic sequences and that ZDRs upstream of TSSs may correlate positively with up-regulation of RNA polymerase activity.
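
    One simple way to look at the relationship between ZDRs and TSSs is to measure how many ZDRs fall close to a TSS. The sketch below is not the dissertation's R/Bioconductor workflow: it is a minimal Python illustration, and the 1000 bp window, the use of ZDR midpoints, the strand-agnostic distance, and all coordinates are assumptions.

```python
from bisect import bisect_left


def distance_to_nearest_tss(position, sorted_tss):
    """Absolute distance from position to the closest TSS coordinate.
    sorted_tss must be a non-empty, ascending list."""
    i = bisect_left(sorted_tss, position)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_tss[max(i - 1, 0): i + 1]
    return min(abs(position - tss) for tss in candidates)


def fraction_near_tss(zdrs, tss_positions, window=1000):
    """Fraction of ZDRs whose midpoint lies within `window` bp of a TSS."""
    if not zdrs:
        return 0.0
    sorted_tss = sorted(tss_positions)
    near = 0
    for start, end in zdrs:
        midpoint = (start + end) // 2
        if distance_to_nearest_tss(midpoint, sorted_tss) <= window:
            near += 1
    return near / len(zdrs)


# Hypothetical coordinates on a single chromosome.
tss_positions = [10_000, 50_000, 120_000]
zdrs = [(9_500, 9_520), (30_000, 30_040), (50_800, 50_830), (119_000, 119_010)]
fraction = fraction_near_tss(zdrs, tss_positions)
```

    A real analysis would also account for strand, so that upstream and downstream ZDRs can be told apart, and would compare the observed fraction against a background expectation.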

    BiofilmGeneSet: Leveraging Multi-Omics Data Mining and ICA To Discover Biofilm Stage Genes of Interest from Condition-Specific Expression Dataset

    Biofilm formation occurs in the attachment, colony, maturation, and dispersion stages. Understanding the molecular basis of every stage of this process is essential to developing efficient diagnostic devices and effective antibiofilm agents. Gene expression data provide molecular insight into both static and temporal biofilm development. The most widely used analytic techniques for biofilm gene expression data are clustering and network inference algorithms, which group genes with similar expression across the samples. However, these methods are inherently deficient. First, they do not capture genes expressed in only a subset of the samples; such subsets might be unique to a developmental stage, for example. Second, they assign each gene to exactly one class. This also leads to loss of information, because gene expression is combinatorial and a gene product can participate to varying degrees in several pathways at once. In this study, I developed an analysis framework, referred to as BiofilmGeneSet, to classify genes that contribute significantly to biofilm developmental stages. I applied the JADE algorithm to expression data (X) to extract statistically independent expression modules (S) and their module activity (A). Next, Pearson correlation coefficients between the module activity and the expression profile were computed to determine significant modules. BioNERO, an all-in-one Bioconductor package for comprehensive and easy biological network reconstruction, was applied to the same data to evaluate the performance of this workflow. Of the 15 independent expression modules, modules 14, 11, and 4 were significantly associated with the attachment, colony, and maturation stages. The significance of this work can be summarized as follows: (i) a new data mining and gene expression classification framework with high accuracy compared to weighted gene co-expression network methods for problem-based gene set identification; (ii) a new gene set as a potential biomarker for each biofilm development stage; (iii) the generality of the framework, which allows gene sets to be found for several other related biological events such as quorum sensing, EPS, and antibiotic resistance; (iv) a relevant functional annotation that will guide scientists in designing experiments to validate the newly discovered marker gene sets.
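
    The correlation step described above can be sketched as follows. The sketch assumes the decomposition has already been computed (JADE itself is not implemented here) and that each module's activity is a vector with one value per sample. Correlating that vector with a 0/1 stage indicator is an assumption about what the per-sample profile looks like, and the module names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / sqrt(var_x * var_y)


def rank_modules(activity, profile):
    """Rank modules by absolute correlation of their activity with profile.

    activity: dict mapping module name -> per-sample activity values.
    profile:  per-sample values, e.g. 1 for samples from one stage, else 0.
    """
    scored = {name: pearson(values, profile) for name, values in activity.items()}
    return sorted(scored.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical activities for six samples; the first three are attachment stage.
activity = {
    "module_1": [0.9, 1.1, 1.0, -1.0, -0.9, -1.1],
    "module_2": [0.2, -0.1, 0.1, 0.0, 0.1, -0.2],
}
attachment = [1, 1, 1, 0, 0, 0]
ranking = rank_modules(activity, attachment)
```

    Here `module_1` tracks the attachment samples closely and ranks first. Deciding which correlations count as significant would need a threshold or a test with multiple-testing correction, which is left out.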

    SEESAW: detecting isoform-level allelic imbalance accounting for inferential uncertainty.

    Detecting allelic imbalance at the isoform level requires accounting for inferential uncertainty caused by multi-mapping of RNA-seq reads. Our proposed method, SEESAW, uses Salmon and Swish to offer analysis at various levels of resolution: at the gene level, at the isoform level, and for isoforms aggregated into groups by transcription start site. The aggregation strategies strengthen the signal for transcripts with high uncertainty. The SEESAW suite of methods is shown to have higher power than other allelic imbalance methods when there is isoform-level allelic imbalance. We also introduce a new test for detecting imbalance that varies across a covariate, such as time.
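
    The aggregation of isoforms into groups by transcription start site can be pictured with a small sketch. It does not use the Salmon or Swish interfaces and does not model inferential replicates: it only sums hypothetical reference and alternative allele counts over isoforms that share a (gene, TSS) group. All identifiers and counts are made up for illustration.

```python
from collections import defaultdict


def aggregate_by_tss(isoform_counts, isoform_to_group):
    """Sum (ref, alt) allelic counts over isoforms in the same (gene, TSS) group."""
    grouped = defaultdict(lambda: [0.0, 0.0])
    for isoform, (ref, alt) in isoform_counts.items():
        group = isoform_to_group[isoform]
        grouped[group][0] += ref
        grouped[group][1] += alt
    return {group: tuple(counts) for group, counts in grouped.items()}


def allelic_ratio(ref, alt):
    """Share of counts from the reference allele, or None without counts."""
    total = ref + alt
    return ref / total if total > 0 else None


# Hypothetical allelic counts per isoform: (reference, alternative).
isoform_counts = {"tx1": (30.0, 10.0), "tx2": (45.0, 15.0), "tx3": (5.0, 20.0)}
# Hypothetical mapping of each isoform to its gene and TSS coordinate.
isoform_to_group = {
    "tx1": ("geneA", 1000),
    "tx2": ("geneA", 1000),
    "tx3": ("geneA", 5200),
}
groups = aggregate_by_tss(isoform_counts, isoform_to_group)
ratios = {group: allelic_ratio(*counts) for group, counts in groups.items()}
```

    Pooling `tx1` and `tx2` gives one group with more counts than either isoform alone, while `tx3`, which starts elsewhere, keeps its opposite imbalance instead of being averaged away at the gene level.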

    Metabarcoding protocol: Analysis of Bacteria (including Cyanobacteria) using the 16S rRNA gene and a DADA2 pipeline (Version 1)

    This protocol has been prepared as part of the Interreg Alpine Space project Eco-AlpsWater (ASP569) - Innovative Ecological Assessment and Water Management Strategy for the Protection of Ecosystem Services in Alpine Lakes and Rivers, Activity A.T1.3, Deliverable D.T1.3.2 – 1, https://www.alpine-space.eu/projects/eco-alpswater/en/hom