15,669 research outputs found
Recommended from our members
Parallel swarm intelligence strategies for large-scale clustering based on MapReduce with application to epigenetics of aging
Clustering is an important technique for data analysis and knowledge discovery. In the context of big data, it becomes a challenging issue due to the huge amount of data recently collected making conventional clustering algorithms inappropriate. The use of swarm intelligence algorithms has shown promising results when applied to data clustering of moderate size due to their decentralized and self-organized behavior. However, these algorithms exhibit limited capabilities when large data sets are involved. In this paper, we developed a decentralized distributed big data clustering solution using three swarm intelligence algorithms according to MapReduce framework. The developed framework allows cooperation between the three algorithms namely particle swarm optimization, ant colony optimization and artificial bees colony to achieve largely scalable data partitioning through a migration strategy. This latter reaps advantage of the combined exploration and exploitation capabilities of these algorithms to foster diversity. The framework is tested using amazon elastic map-reduce service (EMR) deploying up to 192 computer nodes and 30 gigabytes of data. Parallel metrics such as speed-up, size-up and scale-up are used to measure the elasticity and scalability of the framework. Our results are compared with their counterparts big data clustering results and show a significant improvement in terms of time and convergence to good quality solution. The developed model has been applied to epigenetics data clustering according to methylation features in CpG islands, gene body, and gene promoter in order to study the epigenetics impact on aging. Experimental results reveal that DNA-methylation changes slightly and not aberrantly with aging corroborating previous studies
A new method of fuzzy clustering by using the combination of the firefly algorithm and the particle swarm optimization algorithm
Abstract: Fuzzy clustering algorithm is one of the data mining methods that is applied in different fields. According to the fuzzy clustering algorithm, each object is allocated to the clusters regarding its percentage of belonging to each of the clusters. Finding the cluster centers is one of the main objectives of the clustering and it is possible to apply the swarm intelligence methods in order to accurately find the cluster centers. The swarm intelligence algorithms have separately been applied for clustering and they received the optimized solution. In order to solve the problems of the fuzzy clustering algorithm, the combined method based on the firefly algorithm and the particle swarm optimization algorithm is applied. In order to determine the validity, the suggested method is tested on four standard data collections received from the valid site of UCI. The simulation results shows that the combination of two algorithms do the clustering more accurately than applying each of them separately
Recommended from our members
A Clustering System for Dynamic Data Streams Based on Metaheuristic Optimisation
open access articleThis article presents the Optimised Stream clustering algorithm (OpStream), a novel approach to cluster dynamic data streams. The proposed system displays desirable features, such as a low number of parameters and good scalability capabilities to both high-dimensional data and numbers of clusters in the dataset, and it is based on a hybrid structure using deterministic clustering methods and stochastic optimisation approaches to optimally centre the clusters. Similar to other state-of-the-art methods available in the literature, it uses “microclusters” and other established techniques, such as density based clustering. Unlike other methods, it makes use of metaheuristic optimisation to maximise performances during the initialisation phase, which precedes the classic online phase. Experimental results show that OpStream outperforms the state-of-the-art methods in several cases, and it is always competitive against other comparison algorithms regardless of the chosen optimisation method. Three variants of OpStream, each coming with a different optimisation algorithm, are presented in this study. A thorough sensitive analysis is performed by using the best variant to point out OpStream’s robustness to noise and resiliency to parameter changes
Binary Particle Swarm Optimization based Biclustering of Web usage Data
Web mining is the nontrivial process to discover valid, novel, potentially
useful knowledge from web data using the data mining techniques or methods. It
may give information that is useful for improving the services offered by web
portals and information access and retrieval tools. With the rapid development
of biclustering, more researchers have applied the biclustering technique to
different fields in recent years. When biclustering approach is applied to the
web usage data it automatically captures the hidden browsing patterns from it
in the form of biclusters. In this work, swarm intelligent technique is
combined with biclustering approach to propose an algorithm called Binary
Particle Swarm Optimization (BPSO) based Biclustering for Web Usage Data. The
main objective of this algorithm is to retrieve the global optimal bicluster
from the web usage data. These biclusters contain relationships between web
users and web pages which are useful for the E-Commerce applications like web
advertising and marketing. Experiments are conducted on real dataset to prove
the efficiency of the proposed algorithms
- …