    A low-latency, big database system and browser for storage, querying and visualization of 3D genomic data

    Recent releases of genome three-dimensional (3D) structures have the potential to transform our understanding of genomes. Nonetheless, storage technology and visualization tools need to evolve to offer the scientific community fast and convenient access to these data. We simultaneously introduce a database system to store and query 3D genomic data (3DBG) and a 3D genome browser to visualize and explore 3D genome structures (3DGB). We benchmark 3DBG against state-of-the-art systems and demonstrate that it is faster than previous solutions and, importantly, scales gracefully with the size of the data. We also illustrate the usefulness of our 3D genome Web browser for exploring human genome structures. The 3D genome browser is available at http://3dgb.cs.mcgill.c

    The GWIPS-viz browser

    GWIPS-viz is a publicly available browser that provides Genome Wide Information on Protein Synthesis through the visualization of ribosome profiling data. Ribosome profiling (Ribo-seq) is a high-throughput technique that isolates fragments of messenger RNA protected by the ribosome. Aligning the ribosome-protected fragments, or footprint sequences, to the corresponding reference genome and visualizing them with GWIPS-viz gives unique insights into the genome loci that are expressed as potentially translated RNA. The GWIPS-viz browser hosts both Ribo-seq data and corresponding mRNA-seq data from publicly available studies across a number of genomes, avoiding the need for computational processing on the user side. Since its initial publication in 2014, more than 1885 tracks have been produced across 24 genomes. This unit describes the navigation of the GWIPS-viz genome browser, the uploading of custom tracks, and the downloading of the Ribo-seq/mRNA-seq alignment data.

    Epiviz: Integrative Visual Analysis Software for Genomics

    Computational and visual data analysis for genomics has traditionally involved a combination of tools and resources, of which the most ubiquitous consist of genome browsers, focused mainly on integrative visualization of large numbers of big datasets, and computational environments, focused on data modeling of a small number of moderately sized datasets. Workflows that involve the integration and exploration of multiple heterogeneous data sources, small and large, public and user specific have been poorly addressed by these tools. Commonly, the data visualized in these tools is the output of analyses performed in powerful computing environments like R/Bioconductor or Python. Two essential aspects of data analysis are usually treated as distinct, in spite of being part of the same exploratory process: algorithmic analysis and interactive visualization. In current technologies these are not integrated within one tool, but rather, one precedes the other. Recent technological advances in web-based data visualization have made it possible for interactive visualization tools to tightly integrate with powerful algorithmic tools, without being restricted to one such tool in particular. We introduce Epiviz (http://epiviz.cbcb.umd.edu), an integrative visualization tool that bridges the gap between the two types of tools, simplifying genomic data analysis workflows. Epiviz is the first genomics interactive visualization tool to provide tight-knit integration with computational and statistical modeling and data analysis. 
We discuss three ways in which Epiviz advances the field of genomic data analysis: 1) it brings code to interactive visualizations at various levels; 2) it takes the first steps toward collaborative data analysis by incorporating user plugins from source control providers and by allowing analysis states to be shared among the scientific community; 3) it combines established analysis features that have never before been available simultaneously in a visualization tool for genomics. Epiviz can be used in multiple branches of genomics data analysis for various types of datasets, of which we detail two: functional genomics data, aligned to a continuous coordinate such as the genome, and metagenomics data, organized according to volatile hierarchical coordinate spaces. We also present security implications of the current design, performance benchmarks, a series of limitations, and future research steps.

    Epiviz: a view inside the design of an integrated visual analysis software for genomics

    Computational and visual data analysis for genomics has traditionally involved a combination of tools and resources, of which the most ubiquitous consist of genome browsers, focused mainly on integrative visualization of large numbers of big datasets, and computational environments, focused on data modeling of a small number of moderately sized datasets. Workflows that involve the integration and exploration of multiple heterogeneous data sources, small and large, public and user-specific, have been poorly addressed by these tools. In our previous work, we introduced Epiviz, which bridges the gap between the two types of tools, simplifying these workflows. In this paper we expand on the design decisions behind Epiviz and introduce a series of new advanced features that further support the type of interactive exploratory workflow we have targeted. We discuss three ways in which Epiviz advances the field of genomic data analysis: 1) it brings code to interactive visualizations at various levels; 2) it takes the first steps toward collaborative data analysis by incorporating user plugins from source control providers and by allowing analysis states to be shared among the scientific community; 3) it combines established analysis features that have never before been available simultaneously in a genome browser. In our discussion section, we present security implications of the current design, as well as a series of limitations and future research steps. Since many of the design choices of Epiviz are novel in genomics data analysis, this paper serves both as a record of our own approaches and lessons learned and as a starting point for future efforts in the same direction for the genomics community. https://doi.org/10.1186/1471-2105-16-S11-S

    SGLT1 is required for the survival of triple-negative breast cancer cells via potentiation of EGFR activity.

    Sodium/glucose cotransporter 1 (SGLT1), an essential active glucose transport protein that helps maintain high intracellular glucose levels, was previously shown to interact with epidermal growth factor receptor (EGFR); the SGLT1-EGFR interaction maintains intracellular glucose levels to promote survival of cancer cells. Here, we explore the role of SGLT1 in triple-negative breast cancer (TNBC), which is the most aggressive type of breast cancer. We performed TCGA analysis coupled with in vitro experiments in TNBC cell lines as well as in vivo xenografts established in the mammary fat pad of female nude mice. Tissue microarrays of TNBC patients with information on clinical-pathological parameters were also used to investigate the expression and function of SGLT1 in TNBC. We show that high levels of SGLT1 are associated with greater tumour size in TNBC. Knockdown of SGLT1 compromises cell growth in vitro and in vivo. We further demonstrate that SGLT1 depletion results in decreased levels of phospho-EGFR and, as a result, the activity of downstream signalling pathways (such as AKT and ERK) is inhibited. Hence, targeting SGLT1 itself or the EGFR-SGLT1 interaction may provide novel therapeutics against TNBC.

    Genome visualisation and user studies in biologist-computer interaction

    We surveyed a number of genome visualisation tools used in biomedical research. We recognised that none of the tools shows all the relevant data that geneticists looking for candidate disease genes would like to see. The biological researchers we collaborate with would like to view integrated data from a variety of sources and be able to see both data overviews and details. In response to this need, we developed a new visualisation tool, VisGenome, which allows users to add their own data or data downloaded from other sources, such as Ensembl. VisGenome visualises single and comparative representations of the rat, mouse, and human chromosomes, and can easily be used for other genomes. In the context of VisGenome development we made the following research contributions. We developed a new algorithm (CartoonPlus) which allows users to see different kinds of data in cartoon scaling depending on a selected basis. Also, two user studies were conducted: an initial quantitative user study and a mixed paradigm user study. The first study showed that neither Ensembl nor VisGenome fulfils all user requirements or can be regarded as user-friendly, as users make a significant number of mistakes during data navigation. To help users navigate their data easily, we improved existing visualisation techniques in VisGenome and added a new technique, CartoonPlus. To verify whether this solution was useful, we conducted a second user study. We saw that users became more familiar with the tool and found new ways to use the application on its own and in connection with other tools. They frequently used CartoonPlus, which allowed them to see small regions of their data in a way that was not possible before.

    Exploratory visualizations and statistical analysis of large, heterogeneous epigenetic datasets

    Epigenetic marks, such as DNA methylation and histone modifications, are important regulatory mechanisms that allow a single genomic sequence to give rise to a complex multicellular organism. When studying mechanisms of epigenetic regulation, the analyses depend on the experimental technologies and the available data. Recent advancements in sequencing technologies allow for the efficient extraction of genome-wide maps of epigenetic marks. A number of large-scale mapping projects, such as ENCODE and IHEC, intensively produce data for different tissues and cell cultures. The increasing quantity of data highlights a major bottleneck in bioinformatic research, namely the lack of bioinformatic tools for analyzing these data. To date, there are bioinformatics tools for detailed (mostly visual) inspection of single genomic loci, allowing biologists to focus research on regions of interest. Efficient tools for manipulation and analysis of the data have also been published, but they often require computer science skills. Furthermore, the available tools provide solutions only to already well-formulated biological questions. What is missing, in our opinion, are tools (or pipelines of tools) to explore the data interactively, in a process that would help a trained biologist recognize interesting aspects and pursue them further until concrete hypotheses are formulated. A possible solution stems from the best practices in the fields of information retrieval and exploratory search. In this thesis, I propose EpiExplorer, a paradigm for integration of state-of-the-art information retrieval methods and indexing structures, applied to offer instant interactive exploration of large epigenetic datasets. The algorithms we use were developed for semi-structured text data, but we apply them to bioinformatic data through a textual mapping of biological properties. 
We demonstrate the power of EpiExplorer in a series of studies that address interesting biological problems. We also present in this manuscript EpiGRAPH, a bioinformatic software that we developed with colleagues. EpiGRAPH helps identify and model significant biological associations among epigenetic and genetic properties for sets of regions. Using EpiExplorer and EpiGRAPH, independently or in a pipeline, provides the bioinformatic community with access to large databases of annotations, allows for exploratory visualizations or statistical analysis, and facilitates reproduction and sharing of results.
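
    The idea of mapping biological properties to text, as mentioned in the abstract above, can be illustrated with a minimal sketch. It is not EpiExplorer's implementation: the token scheme (`chr:` words, decile bins such as `H3K4me3:d8`), the function names, and the sample regions are all hypothetical, and a plain inverted index stands in for whatever indexing structures the thesis actually uses.

```python
from collections import defaultdict


def region_to_words(region):
    """Turn one region's properties into text tokens (hypothetical scheme)."""
    words = [f"chr:{region['chrom']}"]
    for mark, ratio in region["overlap"].items():
        # Bin the overlap ratio (0.0 to 1.0) into deciles so that a
        # numeric property becomes a discrete, searchable word.
        decile = min(int(ratio * 10), 9)
        words.append(f"{mark}:d{decile}")
    return words


def build_index(regions):
    """Build an inverted index: word -> set of region ids containing it."""
    index = defaultdict(set)
    for region_id, region in regions.items():
        for word in region_to_words(region):
            index[word].add(region_id)
    return index


def query(index, words):
    """Return the ids of regions whose text contains every query word."""
    matches = [index.get(word, set()) for word in words]
    return set.intersection(*matches) if matches else set()


# Made-up example regions; the values carry no biological meaning.
regions = {
    "r1": {"chrom": "chr1", "overlap": {"H3K4me3": 0.85, "DNAmeth": 0.05}},
    "r2": {"chrom": "chr1", "overlap": {"H3K4me3": 0.12, "DNAmeth": 0.91}},
    "r3": {"chrom": "chr2", "overlap": {"H3K4me3": 0.88, "DNAmeth": 0.02}},
}
index = build_index(regions)
marked_unmethylated = query(index, ["H3K4me3:d8", "DNAmeth:d0"])
```

    Once each region is a small bag of words, a question such as "regions with high H3K4me3 overlap and low DNA methylation" becomes an ordinary multi-word text query, which is what makes text search engines applicable to this kind of data.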

    Patched Together: cis-Regulatory Logic of the Hedgehog Response.

    Understanding the processes that control how we develop from a fertilized embryo to a functional adult is paramount for treating the diseases that result when these processes are disrupted at any stage of life. My dissertation investigates the cis-regulatory logic underlying how cell signaling pathways utilize the genome to create and maintain the wide variety of cell types and tissues needed for proper development and survival. Surprisingly few cell signaling pathways are used throughout embryonic development; I have chosen to focus on Hedgehog (Hh) signaling, a pathway used in such diverse cellular contexts as digit specification, brain development, lung function, and reproductive maintenance. Disruption of this pathway results in developmental defects and cancer. To treat these diseases more effectively, it is essential to understand the mechanisms by which Hh signaling functions. One relatively unexplored mechanism of Hh function is how its signal is transduced at the level of DNA, specifically through the regulation of gene expression. In this thesis, I explore the mechanisms that mediate tissue-specific, Hh-dependent gene regulation, and uncover an ancient cis-regulatory logic shared between flies and mice that has significant implications for the maintenance and evolution of cellular communication. I experimentally demonstrate that multiple enhancer elements, which control tissue-specific gene expression, rely on sub-optimal DNA sequences for binding of GLI proteins, the transcriptional effectors of Hh signaling. These sequences are essential to control gene expression in response to Hh and can influence the function of the pathway in a variety of cellular contexts. I also characterize several new transcriptional regulators of Hh signaling and introduce new tools to the field that allow for in-depth analysis of the regulatory landscape of Hh target genes at any stage of development. 
My work presented here addresses a significant gap in our knowledge of how the Hh signaling pathway functions at the cis-regulatory level and describes a framework by which new advances can be made on this topic in the future.
PhD, Cellular & Molecular Biology, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/135811/1/dslorber_1.pd