8 research outputs found

    Visual Encodings for Networks with Multiple Edge Types

    This paper reports on a formal user study on visual encodings of networks with multiple edge types in adjacency matrices. Our tasks and conditions were inspired by real problems in computational biology. We focus on encodings in adjacency matrices, selecting four designs from a potentially huge design space of visual encodings. We then settle on three visual variables to evaluate in a crowdsourcing study with 159 participants: orientation, position and colour. The best encodings were integrated into a visual analytics tool for inferring dynamic Bayesian networks and evaluated by computational biologists for additional evidence. We found that the encodings performed differently depending on the task; however, colour was found to help in all tasks except when trying to find the edge with the largest number of edge types. Orientation generally outperformed position in all of our tasks.

    Visualisation Support for Biological Bayesian Network Inference

    Extracting valuable information from the visualisation of biological data and turning it into a network model is the main challenge addressed in this thesis. Biological networks are mathematical models that describe biological entities as nodes and their relationships as edges. Because they describe patterns of relationships, networks can show how a biological system works as a whole. However, network inference is a challenging optimisation problem that is impossible to resolve computationally in polynomial time. Therefore, computational biologists (i.e. modellers) combine clustering and heuristic search algorithms with their tacit knowledge to infer networks. Visualisation can play an important role in supporting them in their network inference workflow. The main research question is: “How can visualisation support modellers in their workflow to infer networks from biological data?” To answer this question, it was necessary to collaborate with computational biologists to understand the challenges in their workflow and form research questions. Following the nested model methodology helped to characterise the domain problem, abstract data and tasks, design effective visualisation components and implement efficient algorithms. Those steps correspond to the four levels of the nested model for collaborating with domain experts to design effective visualisations. We found that visualisation can support modellers in three steps of their workflow: (a) to select variables, (b) to infer a consensus network and (c) to incorporate information about its dynamics. To select variables (a), modellers first apply a hierarchical clustering algorithm which produces a dendrogram (i.e. a tree structure). Then they select a similarity threshold (height) to cut the tree so that branches correspond to clusters. However, applying a single-height similarity threshold is not effective for clustering heterogeneous multidimensional data because clusters may exist at different heights. 
The research question is: Q1 “How to provide visual support for the effective hierarchical clustering of many multidimensional variables?” To answer this question, MLCut, a novel visualisation tool, was developed to enable the application of multiple similarity thresholds. Users can interact with a representation of the dendrogram, which is coordinated with a view of the original multidimensional data, select branches of the tree at different heights and explore different clustering scenarios. Using MLCut in two case studies has shown that this method provides transparency in the clustering process and enables the effective allocation of variables into clusters. Selected variables and clusters constitute nodes in the inferred network. In the second step (b), modellers apply heuristic search algorithms which sample a solution space consisting of all possible networks. The result of each execution of the algorithm is a collection of high-scoring Bayesian networks. The task is to guide the heuristic search and help construct a consensus network. However, this is challenging because many network results contain different scores produced by different executions of the algorithm. The research question is: Q2 “How to support the visual analysis of heuristic search results, to infer representative models for biological systems?” BayesPiles, a novel interactive visual analytics tool, was developed and evaluated in three case studies to help modellers explore, combine and compare results, to understand the structure of the solution space and to construct a consensus network. As part of the third step (c), when the biological data contain measurements over time, heuristics can also infer information about the dynamics of the interactions encoded as different types of edges in the inferred networks. However, representing such multivariate networks is a challenging visualisation problem. 
The research question is: Q3 “How to effectively represent information related to the dynamics of biological systems, encoded in the edges of inferred networks?” To help modellers explore their results and to answer Q3, a human-centred crowdsourcing experiment took place to evaluate the effectiveness of four visual encodings for multiple edge types in matrices. The design of the tested encodings combines three visual variables: position, orientation, and colour. The study showed that orientation outperforms position and that colour is helpful in most tasks. The results informed an extension to the design of BayesPiles, which modellers evaluated while exploring dynamic Bayesian networks. The feedback of most participants confirmed the results of the crowdsourcing experiment. This thesis focuses on the investigation, design, and application of visualisation approaches for gaining insights from biological data to infer network models. It shows how visualisation can help modellers in their workflow to select variables, to construct representative network models and to explore their different types of interactions, contributing to a better understanding of how biological processes within living organisms work.

    MLCut : exploring Multi-Level Cuts in dendrograms for biological data

    Choosing a single similarity threshold for cutting dendrograms is not sufficient for performing hierarchical clustering analysis of heterogeneous data sets. In addition, alternative automated or semi-automated methods that cut dendrograms in multiple levels make assumptions about the data in hand. In an attempt to help the user to find patterns in the data and resolve ambiguities in cluster assignments, we developed MLCut: a tool that provides visual support for exploring dendrograms of heterogeneous data sets in different levels of detail. The interactive exploration of the dendrogram is coordinated with a representation of the original data, shown as parallel coordinates. The tool supports three analysis steps. Firstly, a single-height similarity threshold can be applied using a dynamic slider to identify the main clusters. Secondly, a distinctiveness threshold can be applied using a second dynamic slider to identify “weak-edges” that indicate heterogeneity within clusters. Thirdly, the user can drill down to further explore the dendrogram structure - always in relation to the original data - and cut the branches of the tree at multiple levels. Interactive drill-down is supported using mouse events such as hovering, pointing and clicking on elements of the dendrogram. Two prototypes of this tool have been developed in collaboration with a group of biologists for analysing their own data sets. We found that enabling the users to cut the tree at multiple levels, while viewing the effect in the original data, is a promising method for clustering which could lead to scientific discoveries.
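
    The difference between a single-height cut and a multi-level cut can be illustrated with a minimal sketch. This is not MLCut's implementation: the `Node` class, the `cut` function, the `local_cuts` parameter and the toy tree are all hypothetical, and the sketch assumes that each internal node stores the height at which its children were merged.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical dendrogram node: leaves have no children and height 0."""
    name: str
    height: float = 0.0
    children: List["Node"] = field(default_factory=list)

    def leaves(self) -> List[str]:
        # A leaf is its own single-element cluster.
        if not self.children:
            return [self.name]
        return [leaf for child in self.children for leaf in child.leaves()]


def cut(node: Node, threshold: float,
        local_cuts: Optional[Dict[str, float]] = None) -> List[List[str]]:
    """Cut the tree at `threshold`; `local_cuts` (an assumption of this
    sketch) overrides the threshold inside the named subtrees."""
    local_cuts = local_cuts or {}
    # A branch-specific threshold replaces the inherited one from here down.
    threshold = local_cuts.get(node.name, threshold)
    # Subtrees merged at or below the threshold stay together as one cluster.
    if not node.children or node.height <= threshold:
        return [node.leaves()]
    clusters: List[List[str]] = []
    for child in node.children:
        clusters.extend(cut(child, threshold, local_cuts))
    return clusters


# Toy dendrogram: (a, b) merge at 1.0; (c, d) at 2.0; ((c, d), e) at 3.0; root at 5.0.
ab = Node("ab", 1.0, [Node("a"), Node("b")])
cd = Node("cd", 2.0, [Node("c"), Node("d")])
cde = Node("cde", 3.0, [cd, Node("e")])
root = Node("root", 5.0, [ab, cde])

single = cut(root, 4.0)               # [['a', 'b'], ['c', 'd', 'e']]
multi = cut(root, 4.0, {"cde": 2.5})  # [['a', 'b'], ['c', 'd'], ['e']]
print(single)
print(multi)
```

    With one global threshold the heterogeneous branch `cde` stays a single cluster; lowering the threshold for that branch alone splits it without affecting `ab`.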

    TetraploidSNPMap: Software for Linkage Analysis and QTL Mapping in Autotetraploid Populations Using SNP Dosage Data

    An earlier software application of ours, TetraploidMap for Windows, enabled linkage analysis and quantitative trait locus interval mapping to be carried out in an experimental cross of an autotetraploid species, using both dominant markers such as amplified fragment length polymorphisms and codominant markers such as simple sequence repeats. The size was limited to 800 markers, and quantitative trait locus mapping was conducted for each parent separately due to the difficulties in obtaining a reliable consensus map for the 2 parents. Modern genotyping technologies now give rise to datasets of thousands of single nucleotide polymorphisms, and these can be scored in autotetraploid species as single nucleotide polymorphism dosages, distinguishing among the heterozygotes AAAB, AABB, and ABBB, rather than simply using the presence or absence of an allele. The dosage data is more informative about recombination and leads to higher density linkage maps. The current program, TetraploidSNPMap, makes full use of the dosage data, and has new facilities for displaying the clustering of single nucleotide polymorphisms, rapid ordering of large numbers of single nucleotide polymorphisms using a multidimensional scaling analysis, and phase calling. It also has new routines for quantitative trait locus mapping based on a hidden Markov model, which use the dosage data to model the effects of alleles from both parents simultaneously. A Windows-based interface facilitates data entry and exploration. It is distributed with a detailed user guide. TetraploidSNPMap is freely available from our GitHub repository

    BayesPiles: Visualisation Support for Bayesian Network Structure Learning

    We address the problem of exploring, combining and comparing large collections of scored, directed networks for understanding inferred Bayesian networks used in biology. In this field, heuristic algorithms explore the space of possible network solutions, sampling this space based on algorithm parameters and a network score that encodes the statistical fit to the data. The goal of the analyst is to guide the heuristic search and decide how to determine a final consensus network structure, usually by selecting the top-scoring network or constructing the consensus network from a collection of high-scoring networks. BayesPiles, our visualisation tool, helps with understanding the structure of the solution space and supporting the construction of a final consensus network that is representative of the underlying dataset. BayesPiles builds upon and extends MultiPiles to meet our domain requirements. We developed BayesPiles in conjunction with computational biologists who have used this tool on datasets used in their research. The biologists found our solution provides them with new insights and helps them achieve results that are representative of the underlying data.
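
    One simple way to construct a consensus network from a collection of high-scoring networks is to keep the directed edges that recur among the top-scoring results. The sketch below is an illustration only, not the BayesPiles method: the function name `consensus_network`, the `top_k` and `min_support` parameters, the example scores, and the assumption that a higher score means a better fit are all hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple

Edge = Tuple[str, str]                    # directed edge (parent, child)
ScoredNetwork = Tuple[float, Set[Edge]]   # (score, edge set)


def consensus_network(scored_networks: List[ScoredNetwork],
                      top_k: int, min_support: float) -> Set[Edge]:
    """Keep edges present in at least `min_support` of the `top_k` best networks.

    Assumes a higher score indicates a better fit to the data.
    """
    ranked = sorted(scored_networks, key=lambda pair: pair[0], reverse=True)[:top_k]
    if not ranked:
        return set()
    # Each network is a set, so an edge is counted at most once per network.
    counts = Counter(edge for _, edges in ranked for edge in edges)
    return {edge for edge, n in counts.items() if n / len(ranked) >= min_support}


# Made-up search results: three similar high-scoring networks and one outlier.
networks: List[ScoredNetwork] = [
    (-10.0, {("A", "B"), ("B", "C")}),
    (-11.0, {("A", "B"), ("C", "B")}),
    (-12.5, {("A", "B"), ("B", "C"), ("A", "C")}),
    (-30.0, {("C", "A")}),
]

consensus = consensus_network(networks, top_k=3, min_support=0.6)
print(sorted(consensus))  # [('A', 'B'), ('B', 'C')]
```

    Selecting the single top-scoring network, the other option the abstract mentions, corresponds to `top_k=1` in this sketch.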
