601 research outputs found

    Exploratory data analysis and clustering of multivariate spatial hydrogeological data by means of GEO3DSOM, a variant of Kohonen's Self-Organizing Map

    Full text link
    The use of unsupervised artificial neural network techniques like the self-organizing map (SOM) algorithm has proven to be a useful tool in exploratory data analysis and clustering of multivariate data sets. In this study a variant of the SOM-algorithm is proposed, the GEO3DSOM, capable of explicitly incorporating three-dimensional spatial knowledge into the algorithm. The performance of the GEO3DSOM is compared to the performance of the standard SOM in analyzing an artificial data set and a hydrochemical data set. The hydrochemical data set consists of 131 groundwater samples collected in two detritic, phreatic, Cenozoic aquifers in Central Belgium. Both techniques succeed very well in providing more insight in the groundwater quality data set, visualizing the relationships between variables, highlighting the main differences between groups of samples and pointing out anomalous wells and well screens. The GEO3DSOM however has the advantage to provide an increased resolution while still maintaining a good generalization of the data set

    A Novel Framework for Interactive Visualization and Analysis of Hyperspectral Image Data

    Get PDF

    ColorPhylo: A Color Code to Accurately Display Taxonomic Classifications

    Get PDF
    Color may be very useful to visualise complex data. As far as taxonomy is concerned, color may help observing various species’ characteristics in correlation with classification. However, choosing the number of subclasses to display is often a complex task: on the one hand, assigning a limited number of colors to taxa of interest hides the structure imbedded in the subtrees of the taxonomy; on the other hand, differentiating a high number of taxa by giving them specific colors, without considering the underlying taxonomy, may lead to unreadable results since relationships between displayed taxa would not be supported by the color code. In the present paper, an automatic color coding scheme is proposed to visualise the levels of taxonomic relationships displayed as overlay on any kind of data plot. To achieve this goal, a dimensionality reduction method allows displaying taxonomic “distances” onto a Euclidean two-dimensional space. The resulting map is projected onto a 2D color space (the Hue, Saturation, Brightness colorimetric space with brightness set to 1). Proximity in the taxonomic classification corresponds to proximity on the map and is therefore materialised by color proximity. As a result, each species is related to a color code showing its position in the taxonomic tree. The so called ColorPhylo displays taxonomic relationships intuitively and can be combined with any biological result. A Matlab version of ColorPhylo is available at http://sy.lespi.free.fr/ColorPhylo-homepage.html. Meanwhile, an ad-hoc distance in case of taxonomy with unknown edge lengths is proposed

    CARMAweb: comprehensive R- and bioconductor-based web service for microarray data analysis

    Get PDF
    CARMAweb (Comprehensive R-based Microarray Analysis web service) is a web application designed for the analysis of microarray data. CARMAweb performs data preprocessing (background correction, quality control and normalization), detection of differentially expressed genes, cluster analysis, dimension reduction and visualization, classification, and Gene Ontology-term analysis. This web application accepts raw data from a variety of imaging software tools for the most widely used microarray platforms: Affymetrix GeneChips, spotted two-color microarrays and Applied Biosystems (ABI) microarrays. R and packages from the Bioconductor project are used as an analytical engine in combination with the R function Sweave, which allows automatic generation of analysis reports. These report files contain all R commands used to perform the analysis and guarantee therefore a maximum transparency and reproducibility for each analysis. The web application is implemented in Java based on the latest J2EE (Java 2 Enterprise Edition) software technology. CARMAweb is freely available at

    Data exploration process based on the self-organizing map

    Get PDF
    With the advances in computer technology, the amount of data that is obtained from various sources and stored in electronic media is growing at exponential rates. Data mining is a research area which answers to the challange of analysing this data in order to find useful information contained therein. The Self-Organizing Map (SOM) is one of the methods used in data mining. It quantizes the training data into a representative set of prototype vectors and maps them on a low-dimensional grid. The SOM is a prominent tool in the initial exploratory phase in data mining. The thesis consists of an introduction and ten publications. In the publications, the validity of SOM-based data exploration methods has been investigated and various enhancements to them have been proposed. In the introduction, these methods are presented as parts of the data mining process, and they are compared with other data exploration methods with similar aims. The work makes two primary contributions. Firstly, it has been shown that the SOM provides a versatile platform on top of which various data exploration methods can be efficiently constructed. New methods and measures for visualization of data, clustering, cluster characterization, and quantization have been proposed. The SOM algorithm and the proposed methods and measures have been implemented as a set of Matlab routines in the SOM Toolbox software library. Secondly, a framework for SOM-based data exploration of table-format data - both single tables and hierarchically organized tables - has been constructed. The framework divides exploratory data analysis into several sub-tasks, most notably the analysis of samples and the analysis of variables. The analysis methods are applied autonomously and their results are provided in a report describing the most important properties of the data manifold. In such a framework, the attention of the data miner can be directed more towards the actual data exploration task, rather than on the application of the analysis methods. Because of the highly iterative nature of the data exploration, the automation of routine analysis tasks can reduce the time needed by the data exploration process considerably.reviewe

    Visualization of clusters in geo-referenced data using three-dimensional self-organizing maps

    Get PDF
    Dissertação apresentada como requisito parcial para obtenção do grau de Mestre em Estatística e Gestão de InformaçãoThe Self-Organizing Map (SOM) is an artificial neural network that performs simultaneously vector quantization and vector projection. Due to this characteristic, the SOM is an effective method for clustering analysis via visualization. The SOM can be visualized through the output space, generally a regular two-dimensional grid of nodes, and through the input space, emphasizing the vector quantization process. Among all the strategies for visualizing the SOM, we are particularly interested in those that allow dealing with spatial dependency, linking the SOM to the geographic visualization with color. One possible approach, commonly used, is the cartographic representation of data with label colors defined from the output space of a two-dimensional SOM. However, in the particular case of geo-referenced data, it is possible to consider the use of a three-dimensional SOM for this purpose, thus adding one more dimension in the analysis. In this dissertation is presented a method for clustering geo-referenced data that integrates the visualization of both perspectives of a three dimensional SOM: linking its output space to the cartographic representation through a ordered set of colors; and exploring the use of frontiers among geo-referenced elements, computed according to the distances in the input space between their Best Matching Units

    Clinflow:an interactive application for processing and exploring clinical data

    Get PDF
    Abstract. Clinical data is the most valuable resource in healthcare development, but it also comes with many challenges. When clinical researchers are required to combine medical expertise with statistical and programming knowledge, the need for data analysis tools arises. The aim of this thesis was to design ClinFlow, an application for clinical data processing and visualization based on user needs. R language and Shiny framework were selected for creating this tool. The goal was to give the means for the clinical researcher to conduct data analysis in an interactive environment, with no need for statistical programming knowledge. A case study using data from The Finnish Type 1 Diabetes Prediction and Prevention (DIPP) study was conducted to demonstrate the feasibility of this application. The initial results achieved in this case study support the previous research of the DIPP study. ClinFlow shows potential for becoming a useful data analysis tool for clinical research
    • …
    corecore