1,476 research outputs found

    How to use the Kohonen algorithm to simultaneously analyse individuals in a survey

    Full text link
    The Kohonen algorithm (SOM, Kohonen,1984, 1995) is a very powerful tool for data analysis. It was originally designed to model organized connections between some biological neural networks. It was also immediately considered as a very good algorithm to realize vectorial quantization, and at the same time pertinent classification, with nice properties for visualization. If the individuals are described by quantitative variables (ratios, frequencies, measurements, amounts, etc.), the straightforward application of the original algorithm leads to build code vectors and to associate to each of them the class of all the individuals which are more similar to this code-vector than to the others. But, in case of individuals described by categorical (qualitative) variables having a finite number of modalities (like in a survey), it is necessary to define a specific algorithm. In this paper, we present a new algorithm inspired by the SOM algorithm, which provides a simultaneous classification of the individuals and of their modalities.Comment: Special issue ESANN 0

    SOM-based algorithms for qualitative variables

    Full text link
    It is well known that the SOM algorithm achieves a clustering of data which can be interpreted as an extension of Principal Component Analysis, because of its topology-preserving property. But the SOM algorithm can only process real-valued data. In previous papers, we have proposed several methods based on the SOM algorithm to analyze categorical data, which is the case in survey data. In this paper, we present these methods in a unified manner. The first one (Kohonen Multiple Correspondence Analysis, KMCA) deals only with the modalities, while the two others (Kohonen Multiple Correspondence Analysis with individuals, KMCA\_ind, Kohonen algorithm on DISJonctive table, KDISJ) can take into account the individuals, and the modalities simultaneously.Comment: Special Issue apr\`{e}s WSOM 03 \`{a} Kitakiush

    Can Self-Organizing Maps accurately predict photometric redshifts?

    Full text link
    We present an unsupervised machine learning approach that can be employed for estimating photometric redshifts. The proposed method is based on a vector quantization approach called Self--Organizing Mapping (SOM). A variety of photometrically derived input values were utilized from the Sloan Digital Sky Survey's Main Galaxy Sample, Luminous Red Galaxy, and Quasar samples along with the PHAT0 data set from the PHoto-z Accuracy Testing project. Regression results obtained with this new approach were evaluated in terms of root mean square error (RMSE) to estimate the accuracy of the photometric redshift estimates. The results demonstrate competitive RMSE and outlier percentages when compared with several other popular approaches such as Artificial Neural Networks and Gaussian Process Regression. SOM RMSE--results (using Δ\Deltaz=zphot_{phot}--zspec_{spec}) for the Main Galaxy Sample are 0.023, for the Luminous Red Galaxy sample 0.027, Quasars are 0.418, and PHAT0 synthetic data are 0.022. The results demonstrate that there are non--unique solutions for estimating SOM RMSEs. Further research is needed in order to find more robust estimation techniques using SOMs, but the results herein are a positive indication of their capabilities when compared with other well-known methods.Comment: 5 pages, 3 figures, submitted to PAS

    Efficient estimators : the use of neural networks to construct pseudo panels

    Get PDF
    Pseudo panels constituted with repeated cross-sections are good substitutes to true panel data. But individuals grouped in a cohort are not the same for successive periods, and it results in a measurement error and inconsistent estimators. The solution is to constitute cohorts of large numbers of individuals but as homogeneous as possible. This paper explains a new way to do this: by using a self-organizing map, whose properties are well suited to achieve these objectives. It is applied to a set of Canadian surveys, in order to estimate income elasticities for 18 consumption functions..Pseudo panels ; self-organizing maps;

    Self-Organising Networks for Classification: developing Applications to Science Analysis for Astroparticle Physics

    Full text link
    Physics analysis in astroparticle experiments requires the capability of recognizing new phenomena; in order to establish what is new, it is important to develop tools for automatic classification, able to compare the final result with data from different detectors. A typical example is the problem of Gamma Ray Burst detection, classification, and possible association to known sources: for this task physicists will need in the next years tools to associate data from optical databases, from satellite experiments (EGRET, GLAST), and from Cherenkov telescopes (MAGIC, HESS, CANGAROO, VERITAS)

    Knowledge Extraction from Survey Data using Neural Networks

    Get PDF
    Surveys are an important tool for researchers. Survey attributes are typically discrete data measured on a Likert scale. Collected responses from the survey contain an enormous amount of data. It is increasingly important to develop powerful means for clustering such data and knowledge extraction that could help in decision-making. The process of clustering becomes complex if the number of survey attributes is large. Another major issue in Likert-Scale data is the uniqueness of tuples. A large number of unique tuples may result in a large number of patterns and that may increase the complexity of the knowledge extraction process. Also, the outcome from the knowledge extraction process may not be satisfactory. The main focus of this research is to propose a method to solve the clustering problem of Likert-scale survey data and to propose an efficient knowledge extraction methodology that can work even if the number of unique patterns is large. The proposed method uses an unsupervised neural network for clustering, and an extended version of the conjunctive rule extraction algorithm has been proposed to extract knowledge in the form of rules. In order to verify the effectiveness of the proposed method, it is applied to two sets of Likert scale survey data, and results show that the proposed method produces rule sets that are comprehensive and concise without affecting the accuracy of the classifier

    Mapping the Galaxy Color-Redshift Relation: Optimal Photometric Redshift Calibration Strategies for Cosmology Surveys

    Get PDF
    Calibrating the photometric redshifts of >10^9 galaxies for upcoming weak lensing cosmology experiments is a major challenge for the astrophysics community. The path to obtaining the required spectroscopic redshifts for training and calibration is daunting, given the anticipated depths of the surveys and the difficulty in obtaining secure redshifts for some faint galaxy populations. Here we present an analysis of the problem based on the self-organizing map, a method of mapping the distribution of data in a high-dimensional space and projecting it onto a lower-dimensional representation. We apply this method to existing photometric data from the COSMOS survey selected to approximate the anticipated Euclid weak lensing sample, enabling us to robustly map the empirical distribution of galaxies in the multidimensional color space defined by the expected Euclid filters. Mapping this multicolor distribution lets us determine where - in galaxy color space - redshifts from current spectroscopic surveys exist and where they are systematically missing. Crucially, the method lets us determine whether a spectroscopic training sample is representative of the full photometric space occupied by the galaxies in a survey. We explore optimal sampling techniques and estimate the additional spectroscopy needed to map out the color-redshift relation, finding that sampling the galaxy distribution in color space in a systematic way can efficiently meet the calibration requirements. While the analysis presented here focuses on the Euclid survey, similar analysis can be applied to other surveys facing the same calibration challenge, such as DES, LSST, and WFIRST.Comment: ApJ accepted, 17 pages, 10 figure

    Evaluating a Self-Organizing Map for Clustering and Visualizing Optimum Currency Area Criteria

    Get PDF
    Optimum currency area (OCA) theory attempts to define the geographical region in which it would maximize economic efficiency to have a single currency. In this paper, the focus is on prospective and current members of the Economic and Monetary Union. For this task, a self-organizing neural network, the Self-organizing map (SOM), is combined with hierarchical clustering for a two-level approach to clustering and visualizing OCA criteria. The output of the SOM is a topologically preserved two-dimensional grid. The final models are evaluated based on both clustering tendencies and accuracy measures. Thereafter, the two-dimensional grid of the chosen model is used for visual assessment of the OCA criteria, while its clustering results are projected onto a geographic map.Self-organizing maps, Optimum Currency Area, projection, clustering, geospatial visualization
    • 

    corecore