90,720 research outputs found
Recommended from our members
A niching memetic algorithm for simultaneous clustering and feature selection
Clustering is inherently a difficult task, and is made even more difficult when the selection of relevant features is also an issue. In this paper we propose an approach for simultaneous clustering and feature selection using a niching memetic algorithm. Our approach (which we call NMA_CFS) makes feature selection an integral part of the global clustering search procedure and attempts to overcome the problem of identifying less promising locally optimal solutions in both clustering and feature selection, without making any a priori assumption about the number of clusters. Within the NMA_CFS procedure, a variable composite representation is devised to encode both feature selection and cluster centers with different numbers of clusters. Further, local search operations are introduced to refine feature selection and cluster centers encoded in the chromosomes. Finally, a niching method is integrated to preserve the population diversity and prevent premature convergence. In an experimental evaluation we demonstrate the effectiveness of the proposed approach and compare it with other related approaches, using both synthetic and real data
Comparison Between Supervised and Unsupervised Classifications of Neuronal Cell Types: A Case Study
In the study of neural circuits, it becomes essential to discern the different neuronal cell types that build the circuit. Traditionally, neuronal cell types have been classified using qualitative descriptors. More recently, several attempts have been made to classify neurons quantitatively, using unsupervised clustering methods. While useful, these algorithms do not take advantage of previous information known to the investigator, which could improve the classification task. For neocortical GABAergic interneurons, the problem to discern among different cell types is particularly difficult and better methods are needed to perform objective classifications. Here we explore the use of supervised classification algorithms to classify neurons based on their morphological features, using a database of 128 pyramidal cells and 199 interneurons from mouse neocortex. To evaluate the performance of different algorithms we used, as a “benchmark,” the test to automatically distinguish between pyramidal cells and interneurons, defining “ground truth” by the presence or absence of an apical dendrite. We compared hierarchical clustering with a battery of different supervised classification algorithms, finding that supervised classifications outperformed hierarchical clustering. In addition, the selection of subsets of distinguishing features enhanced the classification accuracy for both sets of algorithms. The analysis of selected variables indicates that dendritic features were most useful to distinguish pyramidal cells from interneurons when compared with somatic and axonal morphological variables. We conclude that supervised classification algorithms are better matched to the general problem of distinguishing neuronal cell types when some information on these cell groups, in our case being pyramidal or interneuron, is known a priori. As a spin-off of this methodological study, we provide several methods to automatically distinguish neocortical pyramidal cells from interneurons, based on their morphologies
Exploring dependence between categorical variables: benefits and limitations of using variable selection within Bayesian clustering in relation to log-linear modelling with interaction terms
This manuscript is concerned with relating two approaches that can be used to
explore complex dependence structures between categorical variables, namely
Bayesian partitioning of the covariate space incorporating a variable selection
procedure that highlights the covariates that drive the clustering, and
log-linear modelling with interaction terms. We derive theoretical results on
this relation and discuss if they can be employed to assist log-linear model
determination, demonstrating advantages and limitations with simulated and real
data sets. The main advantage concerns sparse contingency tables. Inferences
from clustering can potentially reduce the number of covariates considered and,
subsequently, the number of competing log-linear models, making the exploration
of the model space feasible. Variable selection within clustering can inform on
marginal independence in general, thus allowing for a more efficient
exploration of the log-linear model space. However, we show that the clustering
structure is not informative on the existence of interactions in a consistent
manner. This work is of interest to those who utilize log-linear models, as
well as practitioners such as epidemiologists that use clustering models to
reduce the dimensionality in the data and to reveal interesting patterns on how
covariates combine.Comment: Preprin
Identifying smart design attributes for Industry 4.0 customization using a clustering Genetic Algorithm
Industry 4.0 aims at achieving mass customization at a
mass production cost. A key component to realizing this is accurate
prediction of customer needs and wants, which is however a
challenging issue due to the lack of smart analytics tools. This
paper investigates this issue in depth and then develops a predictive
analytic framework for integrating cloud computing, big data
analysis, business informatics, communication technologies, and
digital industrial production systems. Computational intelligence
in the form of a cluster k-means approach is used to manage
relevant big data for feeding potential customer needs and wants
to smart designs for targeted productivity and customized mass
production. The identification of patterns from big data is achieved
with cluster k-means and with the selection of optimal attributes
using genetic algorithms. A car customization case study shows
how it may be applied and where to assign new clusters with
growing knowledge of customer needs and wants. This approach
offer a number of features suitable to smart design in realizing
Industry 4.0
Attribute Identification and Predictive Customisation Using Fuzzy Clustering and Genetic Search for Industry 4.0 Environments
Today´s factory involves more services and customisation. A paradigm shift is towards “Industry 4.0” (i4) aiming at realising mass customisation at a mass production cost. However, there is a lack of tools for customer informatics. This paper addresses this issue and develops a predictive analytics framework integrating big data analysis and business informatics, using Computational Intelligence (CI). In particular, a fuzzy c-means is used for pattern recognition, as well as managing relevant big data for feeding potential customer needs and wants for improved productivity at the design stage for customised mass production. The selection of patterns from big data is performed using a genetic algorithm with fuzzy c-means, which helps with clustering and selection of optimal attributes. The case study shows that fuzzy c-means are able to assign new clusters with growing knowledge of customer needs and wants. The dataset has three types of entities: specification of various characteristics, assigned insurance risk rating, and normalised losses in use compared with other cars. The fuzzy c-means tool offers a number of features suitable for smart designs for an i4 environment
Self-adaptive GA, quantitative semantic similarity measures and ontology-based text clustering
As the common clustering algorithms use vector space model (VSM) to represent document, the conceptual relationships between related terms which do not co-occur literally are ignored. A genetic algorithm-based clustering technique, named GA clustering, in conjunction with ontology is proposed in this article to overcome this problem. In general, the ontology measures can be partitioned into two categories: thesaurus-based methods and corpus-based methods. We take advantage of the hierarchical structure and the broad coverage taxonomy of Wordnet as the thesaurus-based ontology. However, the corpus-based method is rather complicated to handle in practical application. We propose a transformed latent semantic analysis (LSA) model as the corpus-based method in this paper. Moreover, two hybrid strategies, the combinations of the various similarity measures, are implemented in the clustering experiments. The results show that our GA clustering algorithm, in conjunction with the thesaurus-based and the LSA-based method, apparently outperforms that with other similarity measures. Moreover, the superiority of the GA clustering algorithm proposed over the commonly used k-means algorithm and the standard GA is demonstrated by the improvements of the clustering performance
Application of artificial neural network in market segmentation: A review on recent trends
Despite the significance of Artificial Neural Network (ANN) algorithm to
market segmentation, there is a need of a comprehensive literature review and a
classification system for it towards identification of future trend of market
segmentation research. The present work is the first identifiable academic
literature review of the application of neural network based techniques to
segmentation. Our study has provided an academic database of literature between
the periods of 2000-2010 and proposed a classification scheme for the articles.
One thousands (1000) articles have been identified, and around 100 relevant
selected articles have been subsequently reviewed and classified based on the
major focus of each paper. Findings of this study indicated that the research
area of ANN based applications are receiving most research attention and self
organizing map based applications are second in position to be used in
segmentation. The commonly used models for market segmentation are data mining,
intelligent system etc. Our analysis furnishes a roadmap to guide future
research and aid knowledge accretion and establishment pertaining to the
application of ANN based techniques in market segmentation. Thus the present
work will significantly contribute to both the industry and academic research
in business and marketing as a sustainable valuable knowledge source of market
segmentation with the future trend of ANN application in segmentation.Comment: 24 pages, 7 figures,3 Table
- …