16 research outputs found

    Probabilistic topographic information visualisation

    Get PDF
    The focus of this thesis is the extension of topographic visualisation mappings to allow for the incorporation of uncertainty. Few visualisation algorithms in the literature are capable of mapping uncertain data with fewer able to represent observation uncertainties in visualisations. As such, modifications are made to NeuroScale, Locally Linear Embedding, Isomap and Laplacian Eigenmaps to incorporate uncertainty in the observation and visualisation spaces. The proposed mappings are then called Normally-distributed NeuroScale (N-NS), T-distributed NeuroScale (T-NS), Probabilistic LLE (PLLE), Probabilistic Isomap (PIso) and Probabilistic Weighted Neighbourhood Mapping (PWNM). These algorithms generate a probabilistic visualisation space with each latent visualised point transformed to a multivariate Gaussian or T-distribution, using a feed-forward RBF network. Two types of uncertainty are then characterised dependent on the data and mapping procedure. Data dependent uncertainty is the inherent observation uncertainty. Whereas, mapping uncertainty is defined by the Fisher Information of a visualised distribution. This indicates how well the data has been interpolated, offering a level of ‘surprise’ for each observation. These new probabilistic mappings are tested on three datasets of vectorial observations and three datasets of real world time series observations for anomaly detection. In order to visualise the time series data, a method for analysing observed signals and noise distributions, Residual Modelling, is introduced. The performance of the new algorithms on the tested datasets is compared qualitatively with the latent space generated by the Gaussian Process Latent Variable Model (GPLVM). A quantitative comparison using existing evaluation measures from the literature allows performance of each mapping function to be compared. Finally, the mapping uncertainty measure is combined with NeuroScale to build a deep learning classifier, the Cascading RBF. This new structure is tested on the MNist dataset achieving world record performance whilst avoiding the flaws seen in other Deep Learning Machines

    Topographic visual analytics of multibeam dynamic SONAR data

    Get PDF
    This paper considers the problem of low-dimensional visualisation of very high dimensional information sources for the purpose of situation awareness in the maritime environment. In response to the requirement for human decision support aids to reduce information overload (and specifically, data amenable to inter-point relative similarity measures) appropriate to the below-water maritime domain, we are investigating a preliminary prototype topographic visualisation model. The focus of the current paper is on the mathematical problem of exploiting a relative dissimilarity representation of signals in a visual informatics mapping model, driven by real-world sonar systems. A realistic noise model is explored and incorporated into non-linear and topographic visualisation algorithms building on the approach of [9]. Concepts are illustrated using a real world dataset of 32 hydrophones monitoring a shallow-water environment in which targets are present and dynamic

    Predictive gene lists for breast cancer prognosis: A topographic visualisation study

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>The controversy surrounding the non-uniqueness of predictive gene lists (PGL) of small selected subsets of genes from very large potential candidates as available in DNA microarray experiments is now widely acknowledged <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Many of these studies have focused on constructing discriminative semi-parametric models and as such are also subject to the issue of random correlations of sparse model selection in high dimensional spaces. In this work we outline a different approach based around an unsupervised patient-specific nonlinear topographic projection in predictive gene lists.</p> <p>Methods</p> <p>We construct nonlinear topographic projection maps based on inter-patient gene-list relative dissimilarities. The Neuroscale, the Stochastic Neighbor Embedding(SNE) and the Locally Linear Embedding(LLE) techniques have been used to construct two-dimensional projective visualisation plots of 70 dimensional PGLs per patient, classifiers are also constructed to identify the prognosis indicator of each patient using the resulting projections from those visualisation techniques and investigate whether <it>a-posteriori </it>two prognosis groups are separable on the evidence of the gene lists.</p> <p>A literature-proposed predictive gene list for breast cancer is benchmarked against a separate gene list using the above methods. Generalisation ability is investigated by using the mapping capability of Neuroscale to visualise the follow-up study, but based on the projections derived from the original dataset.</p> <p>Results</p> <p>The results indicate that small subsets of patient-specific PGLs have insufficient prognostic dissimilarity to permit a distinction between two prognosis patients. Uncertainty and diversity across multiple gene expressions prevents unambiguous or even confident patient grouping. Comparative projections across different PGLs provide similar results.</p> <p>Conclusion</p> <p>The random correlation effect to an arbitrary outcome induced by small subset selection from very high dimensional interrelated gene expression profiles leads to an outcome with associated uncertainty. This continuum and uncertainty precludes any attempts at constructing discriminative classifiers.</p> <p>However a patient's gene expression profile could possibly be used in treatment planning, based on knowledge of other patients' responses.</p> <p>We conclude that many of the patients involved in such medical studies are <it>intrinsically unclassifiable </it>on the basis of provided PGL evidence. This additional category of 'unclassifiable' should be accommodated within medical decision support systems if serious errors and unnecessary adjuvant therapy are to be avoided.</p

    A Decision Support System to Ease Operator Overload in Multibeam Passive Sonar

    Get PDF
    Creating human-informative signal processing systems for the underwater acoustic environment that do not generate operator cognitive saturation and overload is a major challenge. To alleviate cognitive operator overload, we present a visual analytics methodology in which multiple beam-formed sonar returns are mapped to an optimized 2-D visual representation, which preserves the relevant data structure. This representation alerts the operator as to which beams are likely to contain anomalous information by modeling a latent distribution of information for each beam. Sonar operators therefore focus their attention only on the surprising events. In addition to the principled visualization of high-dimensional uncertain data, the system quantifies anomalous information using a Fisher Information measure. Central to this process is the novel use of both signal and noise observation modeling to characterize the sensor information. A demonstration of detecting exceptionally low signal-to-noise ratio targets embedded in real-world 33-beam passive sonar data is presented

    Discovering properties of new DNA-binding activity of proteins

    Get PDF
    Protein-DNA interactions are an essential feature in the genetic activities of life, and the ability to predict and manipulate such interactions has applications in a wide range of fields. This Thesis presents the methods of modelling the properties of protein-DNA interactions. In particular, it investigates the methods of visualising and predicting the specificity of DNA-binding Cys2His2 zinc finger interaction. The Cys2His2 zinc finger proteins interact via their individual fingers to base pair subsites on the target DNA. Four key residue positions on the a- helix of the zinc fingers make non-covalent interactions with the DNA with sequence specificity. Mutating these key residues generates combinatorial possibilities that could potentially bind to any DNA segment of interest. Many attempts have been made to predict the binding interaction using structural and chemical information, but with only limited success. The most important contribution of the thesis is that the developed model allows for the binding properties of a given protein-DNA binding to be visualised in relation to other protein-DNA combinations without having to explicitly physically model the specific protein molecule and specific DNA sequence. To prove this, various databases were generated, including a synthetic database which includes all possible combinations of the DNA-binding Cys2His2 zinc finger interactions. NeuroScale, a topographic visualisation technique, is exploited to represent the geometric structures of the protein-DNA interactions by measuring dissimilarity between the data points. In order to verify the effect of visualisation on understanding the binding properties of the DNA-binding Cys2His2 zinc finger interaction, various prediction models are constructed by using both the high dimensional original data and the represented data in low dimensional feature space. Finally, novel data sets are studied through the selected visualisation models based on the experimental DNA-zinc finger protein database. The result of the NeuroScale projection shows that different dissimilarity representations give distinctive structural groupings, but clustering in biologically-interesting ways. This method can be used to forecast the physiochemical properties of the novel proteins which may be beneficial for therapeutic purposes involving genome targeting in general

    Γ-stochastic neighbour embedding for feed-forward data visualization

    Get PDF
    t-distributed Stochastic Neighbour Embedding (t-SNE) is one of the most popular nonlinear dimension reduction techniques used in multiple application domains. In this paper we propose a variation on the embedding neighbourhood distribution, resulting in Γ-SNE, which can construct a feed-forward mapping using an RBF network. We compare the visualizations generated by Γ-SNE with those of t-SNE and provide empirical evidence suggesting the network is capable of robust interpolation and automatic weight regularization

    Dimensionality reduction in complex models

    Get PDF
    As a part of the Managing Uncertainty in Complex Models (MUCM) project, research at Aston University will develop methods for dimensionality reduction of the input and/or output spaces of models, as seen within the emulator framework. Towards this end this report describes a framework for generating toy datasets, whose underlying structure is understood, to facilitate early investigations of dimensionality reduction methods and to gain a deeper understanding of the algorithms employed, both in terms of how effective they are for given types of models / situations, and also their speed in applications and how this scales with various factors. The framework, which allows the evaluation of both screening and projection approaches to dimensionality reduction, is described. We also describe the screening and projection methods currently under consideration and present some preliminary results. The aim of this draft of the report is to solicit feedback from the project team on the dataset generation framework, the methods we propose to use, and suggestions for extensions that should be considered

    Visualising Mutually Non-dominating Solution Sets in Many-objective Optimisation

    Get PDF
    Copyright © 2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.As many-objective optimization algorithms mature, the problem owner is faced with visualizing and understanding a set of mutually nondominating solutions in a high dimensional space. We review existing methods and present new techniques to address this problem. We address a common problem with the well-known heatmap visualization, since the often arbitrary ordering of rows and columns renders the heatmap unclear, by using spectral seriation to rearrange the solutions and objectives and thus enhance the clarity of the heatmap. A multiobjective evolutionary optimizer is used to further enhance the simultaneous visualization of solutions in objective and parameter space. Two methods for visualizing multiobjective solutions in the plane are introduced. First, we use RadViz and exploit interpretations of barycentric coordinates for convex polygons and simplices to map a mutually nondominating set to the interior of a regular convex polygon in the plane, providing an intuitive representation of the solutions and objectives. Second, we introduce a new measure of the similarity of solutions—the dominance distance—which captures the order relations between solutions. This metric provides an embedding in Euclidean space, which is shown to yield coherent visualizations in two dimensions. The methods are illustrated on standard test problems and data from a benchmark many-objective problem

    Nonparametric Guidance of Autoencoder Representations using Label Information

    Get PDF
    While unsupervised learning has long been useful for density modeling, exploratory data analysis and visualization, it has become increasingly important for discovering features that will later be used for discriminative tasks. Discriminative algorithms often work best with highly-informative features; remarkably, such features can often be learned without the labels. One particularly effective way to perform such unsupervised learning has been to use autoencoder neural networks, which find latent representations that are constrained but nevertheless informative for reconstruction. However, pure unsupervised learning with autoencoders can find representations that may or may not be useful for the ultimate discriminative task. It is a continuing challenge to guide the training of an autoencoder so that it finds features which will be useful for predicting labels. Similarly, we often have a priori information regarding what statistical variation will be irrelevant to the ultimate discriminative task, and we would like to be able to use this for guidance as well. Although a typical strategy would be to include a parametric discriminative model as part of the autoencoder training, here we propose a nonparametric approach that uses a Gaussian process to guide the representation. By using a nonparametric model, we can ensure that a useful discriminative function exists for a given set of features, without explicitly instantiating it. We demonstrate the superiority of this guidance mechanism on four data sets, including a real-world application to rehabilitation research. We also show how our proposed approach can learn to explicitly ignore statistically significant covariate information that is label-irrelevant, by evaluating on the small NORB image recognition problem in which pose and lighting labels are available.Engineering and Applied Science
    corecore