Incorporating deep visual features into multiobjective based multi-view search results clustering
This paper explores the use of multi-view learning for search result clustering. A web-snippet can be represented using multiple views. Apart from the textual view, cued by both semantic and syntactic information, a complementary view extracted from the images contained in the web-snippets is also utilized in the current framework. A single consensus partitioning is finally obtained after consulting these two individual views through a multi-objective clustering technique. Several objective functions are optimized simultaneously using AMOSA, including the values of a cluster quality measure, which evaluates the goodness of the partitionings obtained using different views, and an agreement-disagreement index, which quantifies the amount of oneness among multiple views in generating partitionings. In order to detect the number of clusters automatically, variable-length solutions and a vast range of permutation operators are introduced in the clustering process. Finally, a set of alternative partitionings is obtained on the final Pareto front by the proposed multi-view, multi-objective technique. Experimental results of the proposed approach on several benchmark test datasets, with respect to different performance metrics, establish the power of visual and text-based views in achieving better search result clustering.
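The Pareto front referred to in this abstract is the set of non-dominated candidate partitionings. The following is a minimal sketch of that filtering step only, assuming each candidate is summarized by a tuple of objective values that are all to be maximized. The function names and the example scores are hypothetical illustrations, not taken from the paper, and AMOSA itself is not reproduced here.

```python
def dominates(a, b):
    """Return True if score tuple `a` Pareto-dominates `b` (maximization).

    `a` dominates `b` when it is at least as good on every objective
    and strictly better on at least one.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(scores):
    """Keep only the score tuples that no other tuple dominates."""
    return [s for s in scores if not any(dominates(t, s) for t in scores)]


# Hypothetical (cluster quality, view agreement) scores for five partitionings.
candidates = [(0.8, 0.2), (0.6, 0.6), (0.5, 0.5), (0.2, 0.9), (0.1, 0.1)]
front = pareto_front(candidates)
# (0.5, 0.5) and (0.1, 0.1) are dominated by (0.6, 0.6), so three remain.
```

Each surviving tuple represents a different trade-off between the objectives, which is why the technique returns a set of alternative partitionings rather than a single one.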
Multi-Objective Differential Evolution for Automatic Clustering with Application to Micro-Array Data Analysis
This paper applies the Differential Evolution (DE) algorithm to the task of automatic fuzzy clustering in a Multi-objective Optimization (MO) framework. It compares the performance of two multi-objective variants of DE on the fuzzy clustering problem, where two conflicting fuzzy validity indices are simultaneously optimized. The resultant Pareto optimal set of solutions from each algorithm consists of a number of non-dominated solutions, from which the user can choose the most promising ones according to the problem specifications. A real-coded representation of the search variables, accommodating a variable number of cluster centers, is used for DE. The performance of the multi-objective DE variants has also been contrasted with that of two of the best-known schemes of MO clustering, namely the Non Dominated Sorting Genetic Algorithm (NSGA II) and Multi-Objective Clustering with an unknown number of Clusters K (MOCK). Experimental results using six artificial and four real-life datasets of varying complexity indicate that DE holds immense promise as a candidate algorithm for devising MO clustering schemes.
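For readers unfamiliar with DE, the sketch below shows how one trial vector is generated under the classic DE/rand/1/bin scheme on a real-coded vector. It is an illustration under assumptions: the abstract does not say which mutation strategy the two multi-objective variants use, and the function name, the defaults `f=0.8` and `cr=0.9`, and the fixed-length encoding are hypothetical choices. The variable number of cluster centers and the fuzzy validity indices are not modeled.

```python
import random


def de_trial_vector(population, target_idx, f=0.8, cr=0.9, rng=random):
    """Build one DE/rand/1/bin trial vector for population[target_idx].

    population: list of equal-length lists of floats (needs at least 4 members).
    f: differential weight applied to the difference vector.
    cr: crossover rate, the per-coordinate chance of taking the mutant value.
    rng: the `random` module or a `random.Random` instance (for reproducibility).
    """
    target = population[target_idx]
    dim = len(target)

    # Pick three distinct members, all different from the target.
    others = [i for i in range(len(population)) if i != target_idx]
    r1, r2, r3 = rng.sample(others, 3)
    a, b, c = population[r1], population[r2], population[r3]

    # Mutation: base vector plus a scaled difference of two other vectors.
    mutant = [a[j] + f * (b[j] - c[j]) for j in range(dim)]

    # Binomial crossover: coordinate j_rand always comes from the mutant,
    # so the trial vector can differ from the target in at least one place.
    j_rand = rng.randrange(dim)
    return [
        mutant[j] if (rng.random() < cr or j == j_rand) else target[j]
        for j in range(dim)
    ]
```

In a clustering setting, each vector would encode candidate cluster centers, and the trial vector would replace the target only if it is not worse under the chosen objectives. That selection step is omitted here.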
Analyzing the Impact of Genetic Parameters on Gene Grouping Genetic Algorithm and Clustering Genetic Algorithm
Genetic Algorithms are stochastic, randomized procedures used to solve search and optimization problems. Researchers have introduced many multi-population and multi-objective Genetic Algorithms to achieve improved performance. Gene Grouping Genetic Algorithm (GGGA) and Clustering Genetic Algorithm (CGA) are two such algorithms that have been shown to perform better than the Standard Genetic Algorithm (SGA). This paper compares the performance of both algorithms while varying the genetic parameters. The results show that GGGA provides good solutions, even to large-sized problems, in reasonable computation time compared to CGA and SGA. Keywords: Stochastic, randomized, multi-population, Gene Grouping Genetic Algorithm, Clustering Genetic Algorithm
An immune algorithm based fuzzy predictive modeling mechanism using variable length coding and multi-objective optimization allied to engineering materials processing
In this paper, a systematic multi-objective fuzzy
modeling approach is proposed, which can be regarded
as a three-stage modeling procedure. In the first stage, an
evolutionary based clustering algorithm is developed to
extract an initial fuzzy rule base from the data. Based on
this model, a back-propagation algorithm with momentum
terms is used to refine the initial fuzzy model. The refined
model is then used to seed the initial population of an
immune inspired multi-objective optimization algorithm
in the third stage to obtain a set of fuzzy models with
improved transparency. To tackle the problem of
simultaneously optimizing the structure and parameters, a
variable length coding scheme is adopted to improve the
efficiency of the search. The proposed modeling approach
is applied to a real data set from the steel industry.
Results show that the proposed approach is capable of
eliciting not only accurate but also transparent fuzzy
models.
Improving Clustering Methods By Exploiting Richness Of Text Data
Clustering is an unsupervised machine learning technique that involves discovering clusters (groups) of similar objects in unlabeled data, and it is generally considered an NP-hard problem. Clustering methods are widely used in a variety of disciplines for analyzing different types of data, and a small improvement in a clustering method can have a ripple effect, advancing research in multiple fields.
Clustering any type of data is challenging and there are many open research questions. The clustering problem is exacerbated in the case of text data because of the additional challenges such as issues in capturing semantics of a document, handling rich features of text data and dealing with the well known problem of the curse of dimensionality.
In this thesis, we investigate the limitations of existing text clustering methods and address these limitations by providing five new text clustering methods: Query Sense Clustering (QSC), Dirichlet Weighted K-means (DWKM), Multi-View Multi-Objective Evolutionary Algorithm (MMOEA), Multi-objective Document Clustering (MDC) and Multi-Objective Multi-View Ensemble Clustering (MOMVEC). These five new methods show that text clustering methods that use rich features can outperform existing state-of-the-art text clustering methods.
The first new text clustering method QSC exploits user queries (one of the rich features in text data) to generate better quality clusters and cluster labels.
The second text clustering method DWKM uses probability based weighting scheme to formulate a semantically weighted distance measure to improve the clustering results.
The third text clustering method MMOEA is based on a multi-objective evolutionary algorithm. MMOEA exploits rich features to generate a diverse set of candidate clustering solutions, and forms a better clustering solution using a cluster-oriented approach.
The fourth and fifth text clustering methods, MDC and MOMVEC, address the limitations of MMOEA. MDC and MOMVEC differ in the implementation of their multi-objective evolutionary approaches.
All five methods are compared with existing state-of-the-art methods. The results of the comparisons show that the newly developed text clustering methods outperform existing methods, achieving up to 16% improvement for some comparisons. In general, almost all of the newly developed clustering algorithms showed statistically significant improvements over existing methods.
The key ideas of the thesis highlight that exploiting user queries improves Search Result Clustering (SRC); utilizing rich features in weighting schemes and distance measures improves soft subspace clustering; utilizing multiple views and a multi-objective cluster-oriented method improves clustering ensemble methods; and better evolutionary operators and objective functions improve multi-objective evolutionary clustering ensemble methods.
The new text clustering methods introduced in this thesis can be widely applied in various domains that involve analysis of text data. The contributions of this thesis, which include five new text clustering methods, will help not only researchers in the data mining field but also a wide range of researchers in other fields.
Hierarchical maximum likelihood clustering approach
Objective: In this work, we focused on developing a clustering approach for biological data. In many biological analyses, such as multi-omics data analysis and genome-wide association studies (GWAS) analysis, it is crucial to find groups of data belonging to subtypes of diseases or tumors.
Methods: Conventionally, the k-means clustering algorithm is overwhelmingly applied in many areas, including the biological sciences. There are, however, several alternative clustering algorithms that can be applied, including support vector clustering. In this paper, taking into consideration the nature of biological data, we propose a maximum likelihood clustering scheme based on a hierarchical framework.
Results: This method can perform clustering even when the data belonging to different groups overlap. It can also perform clustering when the number of samples is lower than the data dimensionality.
Conclusion: The proposed scheme does not require selecting initial settings to begin the search process. In addition, it does not require computing the first and second derivatives of the likelihood function, as many other maximum likelihood based methods do.
Significance: This algorithm uses distribution and centroid information to cluster a sample and was applied to biological data. A Matlab implementation of this method can be downloaded from the web-link
http://www.riken.jp/en/research/labs/ims/med_sci_math/
- …