18 research outputs found

    Projection-Based Clustering through Self-Organization and Swarm Intelligence: Combining Cluster Analysis with the Visualization of High-Dimensional Data

    Cluster Analysis; Dimensionality Reduction; Swarm Intelligence; Visualization; Unsupervised Machine Learning; Data Science; Knowledge Discovery; 3D Printing; Self-Organization; Emergence; Game Theory; Advanced Analytics; High-Dimensional Data; Multivariate Data; Analysis of Structured Dat

    The ubiquitous self-organizing map for non-stationary data streams

    Cartogram representations of self-organizing virtual geographies

    Model interpretability is a problem for multivariate data in general and, very specifically, for dimensionality reduction techniques as applied to data visualization. The problem is even bigger for nonlinear dimensionality reduction (NLDR) methods, for which interpretability limitations are inherent. Data visualization is a key process for knowledge extraction from data that helps us to gain insights into the observed data structure through graphical representations and metaphors. NLDR techniques provide flexible visual insight, but the locally varying representation distortion they generate makes interpretation far from intuitive. For some NLDR models, indirect quantitative measures of this mapping distortion can be calculated explicitly and used as part of an interpretative post-processing of the results. In this Master's thesis, we apply a cartogram method, inspired by techniques of geographic representation, to data visualization using NLDR models. In particular, we show how this method allows the measured distortion to be reintroduced into the visual maps of several self-organizing clustering methods. The main capabilities and limitations of the cartogram visualization of multivariate data using standard and hierarchical self-organizing models were investigated in some detail with artificial data as well as with real information stemming from a neuro-oncology problem that involves the discrimination of human brain tumor types, a problem for which knowledge discovery techniques in general, and data visualization in particular, should be useful tools.

    A Data Science-Based Analysis Points at Distinct Patterns of Lipid Mediator Plasma Concentrations in Patients With Dementia

    Based on accumulating evidence of a role of lipid signaling in many physiological and pathophysiological processes including psychiatric diseases, the present data-driven analysis was designed to gather information needed to develop a prospective biomarker, using a targeted lipidomics approach covering different lipid mediators. Using unsupervised methods of data structure detection, implemented as hierarchical clustering, emergent self-organizing maps of neuronal networks, and principal component analysis, a cluster structure was found in the input data space comprising plasma concentrations of d = 35 different lipid markers of various classes acquired in n = 94 subjects with the clinical diagnoses depression, bipolar disorder, ADHD, dementia, or in healthy controls. The structure separated patients with dementia from the other clinical groups, indicating that dementia is associated with a distinct pattern of lipid mediator plasma concentrations, possibly providing a basis for a future biomarker. This hypothesis was subsequently assessed using supervised machine-learning methods, implemented as random forests or principal component analysis followed by computed ABC analysis used for feature selection, and as random forests, k-nearest neighbors, support vector machines, multilayer perceptron, and naïve Bayesian classifiers to estimate whether the selected lipid mediators provide sufficient information to establish the diagnosis of dementia at a higher accuracy than by guessing. This succeeded using a set of d = 7 markers comprising GluCerC16:0, Cer24:0, Cer20:0, Cer16:0, Cer24:1, C16 sphinganine, and LacCerC16:0, at an accuracy of 77%. By contrast, using random lipid markers reduced the diagnostic accuracy to values of 65% or less, whereas training the algorithms with randomly permuted data was followed by complete failure to diagnose dementia, emphasizing that the selected lipid mediators display a particular pattern in this disease, possibly qualifying as biomarkers.

    Exploratory data analysis using self-organising maps defined in up to three dimensions

    The SOM is an artificial neural network based on an unsupervised learning process that performs a nonlinear mapping of high-dimensional input data onto an ordered and structured array of nodes, designated as the SOM output space. Being simultaneously a quantization algorithm and a projection algorithm, the SOM is able to summarize and map the data, allowing its visualization. Because it is very difficult or even impossible to visualize a SOM defined with more than two dimensions using the most common visualization methods, the SOM output space is generally a regular two-dimensional grid of nodes. However, there are no theoretical problems in generating SOMs with higher-dimensional output spaces. In this thesis we present evidence that the SOM output space defined in up to three dimensions can be used successfully for the exploratory analysis of spatial data, two-way data and three-way data. Despite the differences between the methods proposed to visualize each group of data, the approach adopted is commonly based on the projection of colour codes, which are obtained from the output space of 3D SOMs, onto some specific two-dimensional surface, where data can be represented according to its own characteristics. This approach is, in some cases, also complemented with the simultaneous use of SOMs defined in one and two dimensions, so that patterns in data can be properly revealed. The results obtained by using this visualization strategy indicate not only the benefits of using the SOM defined in up to three dimensions but also show the relevance of the combined and simultaneous use of different models of the SOM in exploratory data analysis.