
    Addressing the unmet need for visualizing Conditional Random Fields in Biological Data

    Background: The biological world is replete with phenomena that appear to be ideally modeled and analyzed by one archetypal statistical framework: the Graphical Probabilistic Model (GPM). The structure of GPMs is a uniquely good match for biological problems that range from aligning sequences to modeling the genome-to-phenome relationship. The fundamental questions that GPMs address involve making decisions based on a complex web of interacting factors. Unfortunately, while GPMs ideally fit many questions in biology, they are not an easy solution to apply. Building a GPM is not a simple task for an end user. Moreover, applying GPMs is also impeded by the insidious fact that the complex web of interacting factors inherent to a problem might be easy to define yet intractable to compute upon. Discussion: We propose that the visualization sciences can contribute to many domains of the bio-sciences by developing tools to address archetypal representation and user interaction issues in GPMs, and in particular in a variety of GPM called a Conditional Random Field (CRF). CRFs bring additional power, and additional complexity, because the CRF dependency network can be conditioned on the query data. Conclusions: In this manuscript we examine the shared features of several biological problems that are amenable to modeling with CRFs, highlight the challenges that existing visualization and visual analytics paradigms induce for these data, and document an experimental solution called StickWRLD which, while leaving room for improvement, has been successfully applied in several biological research projects. Comment: BioVis 2014 conference

    Emergence of the erythroid lineage from multipotent hematopoiesis [preprint]

    Red cell formation begins with the hematopoietic stem cell, but the manner by which it gives rise to erythroid progenitors, and their subsequent developmental path, remain unclear. Here we combined single-cell transcriptomics of murine hematopoietic tissues with fate potential assays to infer a continuous yet hierarchical structure for the hematopoietic network. We define the erythroid differentiation trajectory as it emerges from multipotency and diverges from 6 other blood lineages. With the aid of a new flow-cytometric sorting strategy, we validated predicted cell fate potentials at the single-cell level, revealing a coupling between erythroid and basophil/mast cell fates. We uncovered novel growth factor receptor regulators of the erythroid trajectory, including the proinflammatory IL-17RA, found to be a strong erythroid stimulator, and identified a global hematopoietic response to stress erythropoiesis. We further identified transcriptional and high-purity FACS gates for the complete isolation of all classically-defined erythroid burst-forming (BFU-e) and colony-forming (CFU-e) progenitors, finding that they express a dedicated transcriptional program, distinct from that of terminally-differentiating erythroblasts. Intriguingly, profound remodeling of the cell cycle is intimately entwined with CFU-e developmental progression and with a sharp transcriptional switch that extinguishes the CFU-e stage and activates terminal differentiation. Underlying these results, our work showcases the utility of theoretic approaches linking transcriptomic data to predictive fate models, providing key insights into lineage development in vivo.

    StickWRLD as an Interactive Visual Pre-Filter for Canceromics-Centric Expression Quantitative Trait Locus Data

    As datasets increase in complexity, the time required for analysis (both computational and human domain-expert) increases. One of the significant impediments introduced by such burgeoning data is the difficulty in knowing what features to include or exclude from statistical models. Simple tables of summary statistics rarely provide an adequate picture of the patterns and details of the dataset to enable researchers to make well-informed decisions about the adequacy of the models they are constructing. We have developed a tool, StickWRLD, which allows the user to visually browse through their data, displaying all possible correlations. By allowing the user to dynamically modify the retention parameters (both P and the residual, r), StickWRLD allows the user to identify significant correlations and disregard potential correlations that do not meet those same criteria, effectively filtering through all possible correlations quickly and identifying possible relationships of interest for further analysis. In this study, we applied StickWRLD to a semi-synthetic dataset constructed from two published human datasets. In addition to detecting high-probability correlations in this dataset, we were able to quickly identify gene–SNP correlations that would have gone undetected using more traditional approaches due to issues of low penetrance.