Methods of conceptual clustering and their relation to numerical taxonomy
Artificial Intelligence (AI) methods for machine learning can be viewed as forms of exploratory data analysis, even though they differ markedly from the statistical methods generally connoted by the term. The distinction between methods of machine learning and statistical data analysis is primarily due to differences in the way techniques of each type represent data and structure within data. That is, methods of machine learning are strongly biased toward symbolic (as opposed to numeric) data representations. We explore this difference within a limited context, devoting the bulk of our paper to the explication of conceptual clustering, an extension to the statistically based methods of numerical taxonomy. In conceptual clustering the formation of object clusters is dependent on the quality of 'higher-level' characterizations, termed concepts, of the clusters. The form of concepts used by existing conceptual clustering systems (sets of necessary and sufficient conditions) is described in some detail. This is followed by descriptions of several conceptual clustering techniques, along with sample output. We conclude with a discussion of how alternative concept representations might enhance the effectiveness of future conceptual clustering systems.
Selection of Input Parameters for Multivariate Classifiers in Proactive Machine Health Monitoring by Clustering Envelope Spectrum Harmonics
In condition monitoring (CM) signal analysis, the inherent problem of key characteristics being masked by noise can be addressed by analysis of the signal envelope. Envelope analysis of vibration signals is effective in extracting useful information for diagnosing different faults. However, the number of envelope features is generally too large to be effectively incorporated in system models. In this paper, a novel method of extracting the pertinent information from such signals, based on multivariate statistical techniques, is developed which substantially reduces the number of input parameters required for data classification models. This was achieved by clustering possible model variables into a number of homogeneous groups to ascertain levels of interdependency. Representatives from each of the groups were selected for their power to discriminate between the categorical classes. The techniques established were applied to a reciprocating compressor rig, wherein the target was identifying machine states with respect to operational health through comparison of signal outputs for healthy and faulty systems. The technique allowed near-perfect fault classification. In addition, methods for identifying separable classes are investigated through profiling techniques, illustrated using Andrews' Fourier curves.
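The profiling technique named at the end of this abstract can be illustrated with a minimal sketch. It uses the standard textbook form of Andrews' curve, f_x(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + ..., and is not taken from the paper itself. The function names `andrews_curve` and `andrews_profile` and the sample count are assumptions made for illustration.

```python
import math


def andrews_curve(x, t):
    """Evaluate the standard Andrews curve for one observation x at angle t.

    Assumed form (textbook definition, not the paper's code):
    f_x(t) = x[0]/sqrt(2) + x[1]*sin(t) + x[2]*cos(t) + x[3]*sin(2t) + ...
    """
    value = x[0] / math.sqrt(2.0)
    for i, xi in enumerate(x[1:], start=1):
        k = (i + 1) // 2  # harmonic number: 1, 1, 2, 2, 3, 3, ...
        if i % 2 == 1:
            value += xi * math.sin(k * t)  # odd positions use sine terms
        else:
            value += xi * math.cos(k * t)  # even positions use cosine terms
    return value


def andrews_profile(x, n_points=101):
    """Sample the curve on an evenly spaced grid over [-pi, pi]."""
    step = 2.0 * math.pi / (n_points - 1)
    return [andrews_curve(x, -math.pi + j * step) for j in range(n_points)]
```

Each observation (for example, a vector of selected envelope features) becomes one curve. Observations from the same class tend to produce similar curves, so plotting the profiles of healthy and faulty runs together gives a visual check of whether the classes are separable.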
Point process-based modeling of multiple debris flow landslides using INLA: an application to the 2009 Messina disaster
We develop a stochastic modeling approach based on spatial point processes of
log-Gaussian Cox type for a collection of around 5000 landslide events provoked
by a precipitation trigger in Sicily, Italy. Through the embedding into a
hierarchical Bayesian estimation framework, we can use the Integrated Nested
Laplace Approximation methodology to make inference and obtain the posterior
estimates. Several mapping units are useful to partition a given study area in
landslide prediction studies. These units hierarchically subdivide the
geographic space from the highest grid-based resolution to the stronger
morphodynamic-oriented slope units. Here we integrate both mapping units into a
single hierarchical model, by treating the landslide triggering locations as a
random point pattern. This approach diverges fundamentally from the unanimously
used presence-absence structure for areal units since we focus on modeling the
expected landslide count jointly within the two mapping units. Predicting this
landslide intensity provides more detailed and complete information as compared
to the classically used susceptibility mapping approach based on relative
probabilities. To illustrate the model's versatility, we compute absolute
probability maps of landslide occurrences and check its predictive power over
space. While the landslide community typically produces spatial predictive
models for landslides only in the sense that covariates are spatially
distributed, no actual spatial dependence has been explicitly integrated so far
for landslide susceptibility. Our novel approach features a spatial latent
effect defined at the slope unit level, allowing us to assess the spatial
influence that remains unexplained by the covariates in the model.
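The link between a predicted landslide intensity and an absolute occurrence probability can be sketched as follows. This is an illustration under stated assumptions, not the authors' INLA workflow: it assumes that, conditional on the fitted log-intensity, the count in a mapping unit is Poisson with mean equal to the intensity integrated over the unit, so that the probability of at least one landslide is 1 - exp(-mean). The function names, the grid-cell approximation of the integral, and the example values are hypothetical.

```python
import math


def expected_count(log_intensities, cell_area):
    """Approximate the integral of the intensity over a mapping unit.

    log_intensities: hypothetical fitted log-intensity values, one per grid
    cell inside the unit (covariate effects plus the spatial latent effect).
    cell_area: area of one grid cell, in the units the intensity is expressed in.
    """
    return sum(math.exp(eta) * cell_area for eta in log_intensities)


def occurrence_probability(mean_count):
    """P(at least one event) for a Poisson count with the given mean."""
    # -expm1(-m) equals 1 - exp(-m) but is more accurate for small m.
    return -math.expm1(-mean_count)


# Hypothetical slope unit covered by three grid cells of area 0.5 each.
unit_mean = expected_count([-1.0, -0.5, 0.0], cell_area=0.5)
unit_probability = occurrence_probability(unit_mean)
```

Unlike a relative susceptibility score, the mean count also distinguishes units expected to host one landslide from units expected to host many.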
Finding Groups in Large Data Sets
This paper aims to give an overview of methods to find groups in large data sets, such as household expenditure survey data. These methods are grouped in three: cluster analysis, dimension reduction and basic explorative methods. The emphasis is put on a critical analysis and potential drawbacks, especially of inputs that have to be provided by the researcher. These may impose some structure not present in the data, thus defeating the purpose of revealing intrinsic patterns. In general, the more elaborate methods, such as cluster analysis, are delicate to apply, especially in the context of social sciences. Often, it may be best to limit oneself to more transparent approaches such as comparisons of basic statistics.