245 research outputs found

    Interactive Visual Self-service Data Classification Approach to Democratize Machine Learning

    Machine learning algorithms often produce models that both end users and developers regard as complex black boxes. Such algorithms fail to explain the model in terms of the domain they are designed for. The proposed Iterative Visual Logical Classifier (IVLC) is an interpretable machine learning algorithm that allows end users to design a model and classify data with more confidence and without compromising accuracy. Such a technique is especially helpful when dealing with sensitive and crucial data, such as cancer data in the medical domain, where the cost of errors is high. With the help of the proposed interactive and lossless multidimensional visualization, end users can identify patterns in the data on which they can base explainable decisions. Such options would not be possible in black-box machine learning methodologies. The interpretable IVLC algorithm is supported by the Interactive Shifted Paired Coordinates Software System (SPCVis), a lossless multidimensional data visualization system with interactive features. The interactive approach gives end users the flexibility to perform data classification as a self-service, without having to rely on a machine learning expert. Interactive pattern discovery becomes challenging when dealing with large datasets with hundreds of dimensions/features. To overcome this problem, an automated classification approach that combines a new Coordinate Order Optimizer (COO) algorithm with a Genetic Algorithm (GA) is proposed. The COO algorithm automatically generates the coordinate pair sequences that best represent the data separation, and the GA helps optimize the proposed IVLC algorithm by automatically generating the areas for data classification. The feasibility of the approach is shown by experiments on benchmark datasets covering both the interactive and automated processes used for data classification.

    NeatMap - non-clustering heat map alternatives in R

    Background: The clustered heat map is the most popular means of visualizing genomic data. It compactly displays a large amount of data in an intuitive format that facilitates the detection of hidden structures and relations in the data. However, it is hampered by its use of cluster analysis which does not always respect the intrinsic relations in the data, often requiring non-standardized reordering of rows/columns to be performed post-clustering. This sometimes leads to uninformative and/or misleading conclusions. Often it is more informative to use dimension-reduction algorithms (such as Principal Component Analysis and Multi-Dimensional Scaling) which respect the topology inherent in the data. Yet, despite their proven utility in the analysis of biological data, they are not as widely used. This is at least partially due to the lack of user-friendly visualization methods with the visceral impact of the heat map. Results: NeatMap is an R package designed to meet this need. NeatMap offers a variety of novel plots (in 2 and 3 dimensions) to be used in conjunction with these dimension-reduction techniques. Like the heat map, but unlike traditional displays of such results, it allows the entire dataset to be displayed while visualizing relations between elements. It also allows superimposition of cluster analysis results for mutual validation. NeatMap is shown to be more informative than the traditional heat map with the help of two well-known microarray datasets. Conclusions: NeatMap thus preserves many of the strengths of the clustered heat map while addressing some of its deficiencies. It is hoped that NeatMap will spur the adoption of non-clustering dimension-reduction algorithms.

    Visualising Mutually Non-dominating Solution Sets in Many-objective Optimisation

    Copyright © 2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. As many-objective optimization algorithms mature, the problem owner is faced with visualizing and understanding a set of mutually nondominating solutions in a high-dimensional space. We review existing methods and present new techniques to address this problem. The often arbitrary ordering of rows and columns renders the well-known heatmap visualization unclear; we address this common problem by using spectral seriation to rearrange the solutions and objectives and thus enhance the clarity of the heatmap. A multiobjective evolutionary optimizer is used to further enhance the simultaneous visualization of solutions in objective and parameter space. Two methods for visualizing multiobjective solutions in the plane are introduced. First, we use RadViz and exploit interpretations of barycentric coordinates for convex polygons and simplices to map a mutually nondominating set to the interior of a regular convex polygon in the plane, providing an intuitive representation of the solutions and objectives. Second, we introduce a new measure of the similarity of solutions, the dominance distance, which captures the order relations between solutions. This metric provides an embedding in Euclidean space, which is shown to yield coherent visualizations in two dimensions. The methods are illustrated on standard test problems and data from a benchmark many-objective problem.
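The RadViz mapping mentioned in the abstract above can be illustrated with a minimal sketch. It uses the standard RadViz construction (one anchor per objective at the vertices of a regular polygon on the unit circle, each solution placed at the weighted average of the anchors) and is not taken from the paper; the function names `normalise` and `radviz` are hypothetical, and the min-max scaling of each objective to [0, 1] is an assumption.

```python
import math


def normalise(solutions):
    """Min-max scale each objective (column) to [0, 1]; an assumed preprocessing step."""
    columns = list(zip(*solutions))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in solutions
    ]


def radviz(point):
    """Map one non-negative m-dimensional point into the regular m-gon.

    Anchor j sits at angle 2*pi*j/m on the unit circle; the point lands at
    the average of the anchors weighted by its coordinate values.
    """
    if any(v < 0 for v in point):
        raise ValueError("RadViz weights must be non-negative")
    m = len(point)
    total = sum(point)
    if total == 0:
        return (0.0, 0.0)  # no pull from any anchor: place at the centre
    x = sum(v * math.cos(2 * math.pi * j / m) for j, v in enumerate(point)) / total
    y = sum(v * math.sin(2 * math.pi * j / m) for j, v in enumerate(point)) / total
    return (x, y)


if __name__ == "__main__":
    # Three hypothetical solutions of a three-objective problem.
    solutions = [(1.0, 5.0, 9.0), (4.0, 4.0, 4.0), (9.0, 2.0, 1.0)]
    for row in normalise(solutions):
        print(radviz(row))
```

Because the weights are non-negative and sum to a positive total, every mapped point is a convex combination of the anchors and therefore lies inside the polygon, which is the property the abstract relies on.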

    Visualisation of Pareto Front Approximation: A Short Survey and Empirical Comparisons

    This is the author accepted manuscript. The final version is available from IEEE via the DOI in this record. Visualisation is an effective way to facilitate the analysis and understanding of multivariate data. In the context of multi-objective optimisation, compared with quantitative performance metrics, visualisation is, in principle, able to give a decision maker better insight into Pareto front approximation sets (e.g. the distribution of solutions, the geometric characteristics of the Pareto front approximation) and thus to facilitate decision-making (e.g. the exploration of trade-off relationships, the knee region or the region of interest). In this paper, we review some currently prevalent visualisation techniques according to how the data is represented. To better understand the pros and cons of different visualisation techniques, we empirically compare six representative visualisation techniques for the exploratory analysis of different Pareto front approximation sets obtained by four state-of-the-art evolutionary multi-objective optimisation algorithms on the classic DTLZ benchmark test problems. From the empirical results, we find that visual comparisons also follow the No-Free-Lunch theorem: no single visualisation technique is able to provide a comprehensive understanding of the characteristics of a Pareto front approximation set. In other words, a specific type of visualisation technique is only good at exploring a particular aspect of the data. Royal Societ
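To make the notion of a Pareto front approximation set concrete, here is a minimal sketch of Pareto dominance and a brute-force non-dominated filter. It assumes every objective is minimised; the names `dominates` and `non_dominated` are hypothetical and are not drawn from the paper or from any particular optimisation library.

```python
def dominates(a, b):
    """Return True if a Pareto-dominates b, assuming all objectives are minimised.

    a dominates b when a is no worse in every objective and strictly
    better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points):
    """Keep the points that no other point dominates (O(n^2) comparisons)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical two-objective values.
    candidates = [(1, 4), (2, 2), (4, 1), (3, 3), (2, 5)]
    print(non_dominated(candidates))
```

The quadratic filter is adequate for the few hundred points typically plotted; larger sets would call for a faster non-dominated sorting scheme.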

    Visualisation Support for Biological Bayesian Network Inference

    Extracting valuable information from the visualisation of biological data and turning it into a network model is the main challenge addressed in this thesis. Biological networks are mathematical models that describe biological entities as nodes and their relationships as edges. Because they describe patterns of relationships, networks can show how a biological system works as a whole. However, network inference is a challenging optimisation problem that cannot be solved computationally in polynomial time. Therefore, computational biologists (i.e. modellers) combine clustering and heuristic search algorithms with their tacit knowledge to infer networks. Visualisation can play an important role in supporting them in their network inference workflow. The main research question is: “How can visualisation support modellers in their workflow to infer networks from biological data?” To answer this question, it was necessary to collaborate with computational biologists to understand the challenges in their workflow and form research questions. Following the nested model methodology helped to characterise the domain problem, abstract data and tasks, design effective visualisation components and implement efficient algorithms. Those steps correspond to the four levels of the nested model for collaborating with domain experts to design effective visualisations. We found that visualisation can support modellers in three steps of their workflow: (a) to select variables, (b) to infer a consensus network and (c) to incorporate information about its dynamics. To select variables (a), modellers first apply a hierarchical clustering algorithm which produces a dendrogram (i.e. a tree structure). Then they select a similarity threshold (height) to cut the tree so that branches correspond to clusters. However, applying a single-height similarity threshold is not effective for clustering heterogeneous multidimensional data because clusters may exist at different heights.
The research question is: Q1 “How to provide visual support for the effective hierarchical clustering of many multidimensional variables?” To answer this question, MLCut, a novel visualisation tool, was developed to enable the application of multiple similarity thresholds. Users can interact with a representation of the dendrogram, which is coordinated with a view of the original multidimensional data, select branches of the tree at different heights and explore different clustering scenarios. Using MLCut in two case studies has shown that this method provides transparency in the clustering process and enables the effective allocation of variables into clusters. Selected variables and clusters constitute nodes in the inferred network. In the second step (b), modellers apply heuristic search algorithms which sample a solution space consisting of all possible networks. The result of each execution of the algorithm is a collection of high-scoring Bayesian networks. The task is to guide the heuristic search and help construct a consensus network. However, this is challenging because many network results contain different scores produced by different executions of the algorithm. The research question is: Q2 “How to support the visual analysis of heuristic search results, to infer representative models for biological systems?” BayesPiles, a novel interactive visual analytics tool, was developed and evaluated in three case studies to support modellers in exploring, combining and comparing results, understanding the structure of the solution space and constructing a consensus network. As part of the third step (c), when the biological data contain measurements over time, heuristics can also infer information about the dynamics of the interactions, encoded as different types of edges in the inferred networks. However, representing such multivariate networks is a challenging visualisation problem.
The research question is: Q3 “How to effectively represent information related to the dynamics of biological systems, encoded in the edges of inferred networks?” To help modellers explore their results and to answer Q3, a human-centred crowdsourcing experiment took place to evaluate the effectiveness of four visual encodings for multiple edge types in matrices. The design of the tested encodings combines three visual variables: position, orientation, and colour. The study showed that orientation outperforms position and that colour is helpful in most tasks. The results informed an extension to the design of BayesPiles, which modellers evaluated by exploring dynamic Bayesian networks. The feedback of most participants confirmed the results of the crowdsourcing experiment. This thesis focuses on the investigation, design, and application of visualisation approaches for gaining insights from biological data to infer network models. It shows how visualisation can help modellers in their workflow to select variables, to construct representative network models and to explore their different types of interactions, contributing to a better understanding of how biological processes within living organisms work.
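The limitation of a single-height cut described in step (a) of the abstract above can be illustrated with a small sketch. It assumes single-linkage clustering with Euclidean distance, for which cutting the dendrogram at height h is equivalent to taking the connected components of the graph that links every pair of points at distance at most h. The thesis does not state which linkage the modellers use, and `cut_single_linkage` is a hypothetical name; this is not MLCut.

```python
import math


def cut_single_linkage(points, height):
    """Cluster labels from cutting a single-linkage dendrogram at `height`.

    Equivalent to the connected components of the graph that joins every
    pair of points whose Euclidean distance is <= height (union-find).
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= height:
                parent[find(i)] = find(j)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(points))]


if __name__ == "__main__":
    # Hypothetical data: one tight group and one loose group.
    data = [(0.0,), (0.1,), (0.2,), (5.0,), (6.0,), (7.0,)]
    print(cut_single_linkage(data, 0.5))  # loose group falls apart
    print(cut_single_linkage(data, 1.5))  # both groups recovered
```

In this toy data the tight group is already complete at height 0.2, whereas the loose group only merges at height 1.0. A threshold below 1.0 fragments the loose group, and with more heterogeneous data no single threshold suits every cluster, which is the motivation the abstract gives for selecting branches at different heights.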

    A Neural Circuit for Angular Velocity Computation

    In one of the most remarkable feats of motor control in the animal world, some Diptera, such as the housefly, can accurately execute corrective flight maneuvers in tens of milliseconds. These reflexive movements are achieved by the halteres, gyroscopic force sensors, in conjunction with rapidly tunable wing steering muscles. Specifically, the mechanosensory campaniform sensilla located at the base of the halteres transduce and transform rotation-induced gyroscopic forces into information about the angular velocity of the fly's body. But how exactly does the fly's neural architecture generate the angular velocity from the lateral strain forces on the left and right halteres? To explore potential algorithms, we built a neuromechanical model of the rotation detection circuit. We propose a neurobiologically plausible method by which the fly could accurately separate and measure the three-dimensional components of an imposed angular velocity. Our model assumes a single sign-inverting synapse and formally resembles some models of directional selectivity in the retina. Using multidimensional error analysis, we demonstrate the robustness of our model under a variety of input conditions. Our analysis reveals the maximum information available to the fly given its physical architecture and the mathematics governing the rotation-induced forces at the haltere's end knob.