How to use the Kohonen algorithm to simultaneously analyse individuals in a survey
The Kohonen algorithm (SOM, Kohonen, 1984, 1995) is a very powerful tool for
data analysis. It was originally designed to model organized connections
between some biological neural networks. It was also immediately recognized as
a very good algorithm for vector quantization and, at the same time, for
pertinent classification, with nice properties for visualization. If the
individuals are described by quantitative variables (ratios, frequencies,
measurements, amounts, etc.), the straightforward application of the original
algorithm builds code vectors and associates with each of them the
class of all the individuals that are more similar to this code vector than to
the others. But when individuals are described by categorical (qualitative)
variables having a finite number of modalities (as in a survey), it is
necessary to define a specific algorithm. In this paper, we present a new
algorithm inspired by the SOM algorithm, which provides a simultaneous
classification of the individuals and of their modalities.
Comment: Special issue ESANN 0
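For the quantitative case described in this abstract, the following is a minimal sketch of the classic online SOM update, not the authors' new algorithm for categorical data. The function names (`train_som`, `best_matching_unit`, `classify`), the linear decay schedules, and the default parameters are illustrative assumptions.

```python
import math
import random


def best_matching_unit(codes, x):
    """Return the grid position whose code vector is closest to x (squared Euclidean)."""
    return min(codes, key=lambda u: sum((a - b) ** 2 for a, b in zip(codes[u], x)))


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a small rectangular SOM with the standard online update rule.

    The linear decay of the learning rate and neighbourhood radius is an
    assumption made for this sketch; other schedules are common.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 if sigma0 is not None else max(rows, cols) / 2.0
    # One code vector per grid unit, initialised from randomly chosen samples.
    codes = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                   # learning rate shrinks over time
            sigma = max(sigma0 * (1.0 - frac), 0.5)   # neighbourhood radius shrinks too
            bmu = best_matching_unit(codes, x)
            for unit, w in codes.items():
                d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))  # Gaussian neighbourhood weight
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return codes


def classify(codes, data):
    """Group sample indices by the unit whose code vector is nearest."""
    classes = {}
    for i, x in enumerate(data):
        classes.setdefault(best_matching_unit(codes, x), []).append(i)
    return classes
```

Each individual ends up in the class of its nearest code vector, which is the association the abstract describes for quantitative variables.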
SOM-based algorithms for qualitative variables
It is well known that the SOM algorithm achieves a clustering of data which
can be interpreted as an extension of Principal Component Analysis, because of
its topology-preserving property. But the SOM algorithm can only process
real-valued data. In previous papers, we have proposed several methods based on
the SOM algorithm to analyze categorical data, such as those found in survey
data. In this paper, we present these methods in a unified manner. The first
one (Kohonen Multiple Correspondence Analysis, KMCA) deals only with the
modalities, while the two others (Kohonen Multiple Correspondence Analysis with
individuals, KMCA_ind; Kohonen algorithm on DISJonctive table, KDISJ) can take
into account the individuals and the modalities simultaneously.
Comment: Special issue following WSOM 03 in Kitakiush
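The name KDISJ refers to a disjunctive table. As a minimal sketch of that data structure only (not of the KDISJ or KMCA algorithms, which the abstract does not detail), categorical survey answers can be coded as a complete disjunctive table with one 0/1 column per modality. The function name and the record layout (one dict per individual) are assumptions for illustration.

```python
def disjunctive_table(records, variables):
    """Code categorical answers as a complete disjunctive (one-hot) table.

    records   -- list of dicts, one per individual (hypothetical layout)
    variables -- names of the categorical variables to encode, in column order
    Returns the list of (variable, modality) column labels and the 0/1 rows.
    """
    # One column per modality actually observed for each variable.
    modalities = [(v, m) for v in variables for m in sorted({r[v] for r in records})]
    table = [[1 if r[v] == m else 0 for (v, m) in modalities] for r in records]
    return modalities, table
```

Every row then sums to the number of variables, since each individual takes exactly one modality per variable.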
Can Self-Organizing Maps accurately predict photometric redshifts?
We present an unsupervised machine learning approach that can be employed for
estimating photometric redshifts. The proposed method is based on a vector
quantization approach called Self-Organizing Mapping (SOM). A variety of
photometrically derived input values were utilized from the Sloan Digital Sky
Survey's Main Galaxy Sample, Luminous Red Galaxy, and Quasar samples along with
the PHAT0 data set from the PHoto-z Accuracy Testing project. Regression
results obtained with this new approach were evaluated in terms of root mean
square error (RMSE) to estimate the accuracy of the photometric redshift
estimates. The results demonstrate competitive RMSE and outlier percentages
when compared with several other popular approaches such as Artificial Neural
Networks and Gaussian Process Regression. SOM RMSE results (using
$\Delta z = z_{\rm phot} - z_{\rm spec}$) for the Main Galaxy Sample are 0.023, for the
Luminous Red Galaxy sample 0.027, for Quasars 0.418, and for PHAT0 synthetic data
0.022. The results demonstrate that there are non-unique solutions for
estimating SOM RMSEs. Further research is needed in order to find more robust
estimation techniques using SOMs, but the results herein are a positive
indication of their capabilities when compared with other well-known methods.
Comment: 5 pages, 3 figures, submitted to PAS
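The RMSE figure of merit quoted in this abstract can be sketched as follows, assuming the residual is the photometric estimate minus the spectroscopic reference redshift. The function name and argument names are illustrative and not taken from the paper.

```python
import math


def redshift_rmse(z_phot, z_spec):
    """Root mean square error of photometric redshift estimates.

    z_phot -- estimated (photometric) redshifts
    z_spec -- reference (spectroscopic) redshifts, same length and order
    """
    if len(z_phot) != len(z_spec) or not z_phot:
        raise ValueError("inputs must be non-empty and of equal length")
    # Residual per object: estimate minus reference (assumed sign convention).
    residuals = [p - s for p, s in zip(z_phot, z_spec)]
    return math.sqrt(sum(d * d for d in residuals) / len(residuals))
```

The sign convention does not affect the RMSE itself, because the residuals are squared.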
Efficient estimators : the use of neural networks to construct pseudo panels
Pseudo panels constructed from repeated cross-sections are good substitutes for true panel data. But the individuals grouped in a cohort are not the same across successive periods, which results in measurement error and inconsistent estimators. The solution is to form cohorts that contain large numbers of individuals but are as homogeneous as possible. This paper explains a new way to do this: by using a self-organizing map, whose properties are well suited to achieving these objectives. It is applied to a set of Canadian surveys in order to estimate income elasticities for 18 consumption functions.
Keywords: pseudo panels; self-organizing maps
Self-Organising Networks for Classification: developing Applications to Science Analysis for Astroparticle Physics
Physics analysis in astroparticle experiments requires the capability of
recognizing new phenomena; in order to establish what is new, it is important
to develop tools for automatic classification, able to compare the final result
with data from different detectors. A typical example is the problem of Gamma
Ray Burst detection, classification, and possible association with known sources:
for this task, physicists will need tools in the coming years to associate data
from optical databases, from satellite experiments (EGRET, GLAST), and from
Cherenkov telescopes (MAGIC, HESS, CANGAROO, VERITAS).
Knowledge Extraction from Survey Data using Neural Networks
Surveys are an important tool for researchers. Survey attributes are typically discrete data measured on a Likert scale. Collected responses from the survey contain an enormous amount of data. It is increasingly important to develop powerful means for clustering such data and for knowledge extraction that could help in decision-making. The process of clustering becomes complex if the number of survey attributes is large. Another major issue in Likert-scale data is the uniqueness of tuples. A large number of unique tuples may result in a large number of patterns, which may increase the complexity of the knowledge extraction process. Also, the outcome of the knowledge extraction process may not be satisfactory. The main focus of this research is to propose a method to solve the clustering problem of Likert-scale survey data and to propose an efficient knowledge extraction methodology that can work even if the number of unique patterns is large. The proposed method uses an unsupervised neural network for clustering, and an extended version of the conjunctive rule extraction algorithm is proposed to extract knowledge in the form of rules. In order to verify the effectiveness of the proposed method, it is applied to two sets of Likert-scale survey data, and the results show that the proposed method produces rule sets that are comprehensive and concise without affecting the accuracy of the classifier.
Mapping the Galaxy Color-Redshift Relation: Optimal Photometric Redshift Calibration Strategies for Cosmology Surveys
Calibrating the photometric redshifts of >10^9 galaxies for upcoming weak
lensing cosmology experiments is a major challenge for the astrophysics
community. The path to obtaining the required spectroscopic redshifts for
training and calibration is daunting, given the anticipated depths of the
surveys and the difficulty in obtaining secure redshifts for some faint galaxy
populations. Here we present an analysis of the problem based on the
self-organizing map, a method of mapping the distribution of data in a
high-dimensional space and projecting it onto a lower-dimensional
representation. We apply this method to existing photometric data from the
COSMOS survey selected to approximate the anticipated Euclid weak lensing
sample, enabling us to robustly map the empirical distribution of galaxies in
the multidimensional color space defined by the expected Euclid filters.
Mapping this multicolor distribution lets us determine where, in galaxy color
space, redshifts from current spectroscopic surveys exist and where they are
systematically missing. Crucially, the method lets us determine whether a
spectroscopic training sample is representative of the full photometric space
occupied by the galaxies in a survey. We explore optimal sampling techniques
and estimate the additional spectroscopy needed to map out the color-redshift
relation, finding that sampling the galaxy distribution in color space in a
systematic way can efficiently meet the calibration requirements. While the
analysis presented here focuses on the Euclid survey, similar analysis can be
applied to other surveys facing the same calibration challenge, such as DES,
LSST, and WFIRST.
Comment: ApJ accepted, 17 pages, 10 figure
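The representativeness check described in this abstract can be sketched roughly as follows, assuming each galaxy has already been assigned to a cell of a trained self-organizing map. The function name, the cell labels, and the summary statistic (share of photometric galaxies lying in cells that contain at least one spectroscopic redshift) are illustrative assumptions rather than the paper's procedure.

```python
from collections import Counter


def spectroscopic_coverage(phot_cells, spec_cells):
    """Compare the map cells occupied by photometric and spectroscopic samples.

    phot_cells -- map cell of each galaxy in the photometric sample
    spec_cells -- map cell of each galaxy with a spectroscopic redshift
    Returns (cells occupied photometrically but lacking spectroscopy,
             fraction of photometric galaxies in cells that have spectroscopy).
    """
    if not phot_cells:
        raise ValueError("photometric sample is empty")
    phot_counts = Counter(phot_cells)
    spec_set = set(spec_cells)
    # Cells where the survey has galaxies but no spectroscopic calibration.
    missing = sorted(c for c in phot_counts if c not in spec_set)
    covered = sum(n for c, n in phot_counts.items() if c in spec_set)
    return missing, covered / len(phot_cells)
```

The cells returned in the first element are the regions of color space where additional spectroscopy would be needed under these assumptions.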
Evaluating a Self-Organizing Map for Clustering and Visualizing Optimum Currency Area Criteria
Optimum currency area (OCA) theory attempts to define the geographical region in which it would maximize economic efficiency to have a single currency. In this paper, the focus is on prospective and current members of the Economic and Monetary Union. For this task, a self-organizing neural network, the Self-organizing map (SOM), is combined with hierarchical clustering for a two-level approach to clustering and visualizing OCA criteria. The output of the SOM is a topologically preserved two-dimensional grid. The final models are evaluated based on both clustering tendencies and accuracy measures. Thereafter, the two-dimensional grid of the chosen model is used for visual assessment of the OCA criteria, while its clustering results are projected onto a geographic map.
Keywords: self-organizing maps, Optimum Currency Area, projection, clustering, geospatial visualization