Parallel Hierarchical Affinity Propagation with MapReduce

Author: Haber Rana
Mijatovic Nenad
Peter Adrian M.
Rose Dillon Mark
Rouly Jean Michel
Publication date: 28/03/2014

The accelerated growth of the Internet and social media is generating voluminous quantities of data (on zettabyte scales). Chief among the requirements for manipulating and extracting actionable intelligence from such large data volumes is the need for scalable, performance-conscious analytics algorithms. To address this need directly, we propose a novel MapReduce implementation of the exemplar-based clustering algorithm known as Affinity Propagation. Our parallelization strategy extends to the multilevel Hierarchical Affinity Propagation algorithm and enables tiered aggregation of unstructured data with minimal free parameters, in principle requiring only a similarity measure between data points. We detail the linear run-time complexity of our approach, which overcomes the limiting quadratic complexity of the original algorithm. Experimental validation of our clustering methodology on a variety of synthetic and real data sets (e.g. images and point data) demonstrates that it is competitive with other state-of-the-art MapReduce clustering techniques.

arXiv.org e-Print Archive

Crossref
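For readers unfamiliar with Affinity Propagation, the following is a minimal serial sketch of the standard message-passing updates (responsibilities and availabilities) that the abstract above proposes to parallelize. It is not the authors' MapReduce or hierarchical implementation; the function name `affinity_propagation`, the damping factor and the iteration count are illustrative assumptions. Each sweep touches all n² point pairs, which is the quadratic cost the abstract refers to.

```python
def affinity_propagation(S, damping=0.5, iterations=200):
    """Serial Affinity Propagation on an n x n similarity matrix S (list of lists).

    S[k][k] is the preference: larger values make point k more likely to
    become an exemplar. Returns, for each point, the index of its exemplar.
    Illustrative sketch only, not the MapReduce version from the paper.
    """
    n = len(S)
    if n < 2:
        raise ValueError("need at least two points")
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i, k) = s(i, k) - max_{k' != k} [a(i, k') + s(i, k')]
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=lambda k: scores[k])
            second = max(scores[k] for k in range(n) if k != best)
            for k in range(n):
                rival = second if k == best else scores[best]
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # a(k, k) = sum_{i' != k} max(0, r(i', k))
        # a(i, k) = min(0, r(k, k) + sum_{i' not in {i, k}} max(0, r(i', k)))
        for k in range(n):
            support = sum(max(0.0, R[i][k]) for i in range(n) if i != k)
            for i in range(n):
                if i == k:
                    new = support
                else:
                    new = min(0.0, R[k][k] + support - max(0.0, R[i][k]))
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # Each point is assigned to the k that maximizes a(i, k) + r(i, k).
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]
```

As a usage example, for the points 0, 1, 2, 10, 11, 12 with similarity -(x_i - x_k)² and every preference set to -81 (the median off-diagonal similarity), the sketch should separate the two groups of three points. Raising the preferences tends to produce more exemplars; lowering them tends to produce fewer.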

Monitoring conceptual development with text mining technologies: CONSPECT

Author: Buelow Katja
Haley Debra
Wild Fridolin
Publication date: 01/10/2010

This paper evaluates CONSPECT, a service that analyses the state of a learner's conceptual development. It combines two technologies, Latent Semantic Analysis (LSA) to analyse text and Network Analysis (NA) to provide visualisations, into a technique called Meaningful Interaction Analysis (MIA). CONSPECT was designed to help both online learners and their tutors monitor conceptual development. This paper reports on the validation experiments undertaken to determine how well LSA matches first-year medical students in clustering concepts and in annotating text. The validation used several techniques, including card sorting and Likert scales. CONSPECT produces results of almost 'peer' quality; what remains to be tested is whether it improves with more advanced learners. One of the experiments showed an average correlation of 0.7 between humans and CONSPECT.

Open Research Online (The Open University)
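The CONSPECT abstract above reports an average correlation of 0.7 between humans and CONSPECT without naming the coefficient. Assuming Pearson's r computed over paired similarity judgments, one such correlation could be computed as below; the function name `pearson_r` is hypothetical and the ratings are invented for illustration, not data from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of ratings."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / sqrt(var_x * var_y)


# Hypothetical similarity ratings for the same four concept pairs.
human = [1, 2, 3, 4]
system = [1, 3, 2, 4]
print(pearson_r(human, system))  # 0.8
```

An average such as the one reported would then be the mean of this coefficient over several participants or texts, if that is indeed how it was aggregated.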

Visualization and clustering for SNMP intrusion detection

Author: Corchado Emilio
Herrero Alvaro
Sánchez Raúl
Publication venue: Informa UK Limited
Publication date: 01/10/2013

Accurate intrusion detection is still an open challenge. The present work takes a step toward that goal by studying the combination of clustering and visualization techniques. To that end, the mobile visualization connectionist agent-based intrusion detection system (MOVICAB-IDS), previously proposed as a hybrid intelligent IDS based on visualization techniques, is extended with an automatic response capability based on clustering methods. To check the validity of the proposed clustering extension, it has been applied to the identification of different anomalous situations related to the Simple Network Management Protocol (SNMP) using real-life data sets. Different ways of applying neural projection and clustering techniques are studied in the present article. The experimental validation shows that the proposed techniques could be compatible with, and consequently applied to, a continuous network flow for intrusion detection.

Funded by the Spanish Ministry of Economy and Competitiveness (ref. TIN2010-21272-C02-01, funded by the European Regional Development Fund) and by Junta de Castilla y Leon (ref. SA405A12-2).

Repositorio Institucional de la Universidad de Burgos
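The SNMP abstract above adds automatic response to MOVICAB-IDS through clustering, without naming the clustering method. Purely as an assumed illustration of how a clustering step can flag anomalous traffic, the sketch below summarises each traffic window by a few numeric features, clusters known-normal windows, and flags any new window whose distance to its nearest centroid exceeds a threshold. The function names, the features, and the thresholding rule are hypothetical and are not taken from the paper.

```python
from math import dist


def nearest_centroid_distance(point, centroids):
    """Distance from a feature vector to the closest cluster centroid."""
    return min(dist(point, c) for c in centroids)


def flag_anomalies(windows, centroids, threshold):
    """Return indices of windows lying farther than `threshold` from every centroid.

    `windows` are feature vectors (e.g. hypothetical per-window counts of
    SNMP requests and distinct target ports); `centroids` would come from
    clustering traffic assumed to be normal.
    """
    return [i for i, w in enumerate(windows)
            if nearest_centroid_distance(w, centroids) > threshold]


# Hypothetical centroids of normal traffic and three new windows.
normal_centroids = [(10.0, 2.0), (40.0, 3.0)]
new_windows = [(11.0, 2.0), (39.0, 4.0), (300.0, 50.0)]
print(flag_anomalies(new_windows, normal_centroids, threshold=15.0))  # [2]
```

In practice the threshold would have to be calibrated on the distances observed for normal traffic; the value here is arbitrary.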

Deep Learning vs Spectral Clustering into an active clustering with pairwise constraints propagation

Author: Benoit Alexandre
Ionescu Bogdan
Lambert Patrick
Voiron Nicolas
Publication venue: HAL CCSD
Publication date: 15/06/2016

In our data-driven world, categorization is of major importance in helping end users and decision makers understand information structures. Supervised learning techniques rely on annotated samples that are often difficult to obtain, and training often overfits. Unsupervised clustering techniques, on the other hand, study the structure of the data without any training data. Given the difficulty of the task, supervised learning often outperforms unsupervised learning. A compromise, called semi-supervised learning, is to use partial knowledge, selected in a smart way, to boost performance while minimizing learning costs. In this setting, Spectral Clustering has proved to be an efficient method. Deep Learning has also outperformed several state-of-the-art classification approaches, so it is worth testing in our context. In this paper, we first introduce Deep Learning into an active semi-supervised clustering process and compare it with Spectral Clustering. Second, we introduce constraint propagation and demonstrate how it maximizes partitioning quality while reducing annotation costs. Experimental validation is conducted on two different real datasets. Results show the potential of the clustering methods.

Hal - Université Grenoble Alpes

HAL Université de Savoie
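The abstract above does not specify how its pairwise constraints are propagated. As one simple, assumed illustration (not necessarily the authors' scheme), constraints can be propagated through transitivity: must-link is an equivalence relation, so its transitive closure can be found with a union-find structure, and a cannot-link between two points then applies to every pair drawn from their two must-link groups. The function name `propagate_constraints` is hypothetical.

```python
def propagate_constraints(n, must_link, cannot_link):
    """Expand must-link / cannot-link pairs over points 0..n-1 by transitivity.

    Returns (must, cannot) as sets of sorted index pairs. Raises ValueError
    if a cannot-link joins two points that are (transitively) must-linked.
    """
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in must_link:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    groups = {}
    for x in range(n):
        groups.setdefault(find(x), []).append(x)

    # Every pair inside a must-link group is must-linked.
    must = {(i, j) for members in groups.values()
            for i in members for j in members if i < j}

    # A cannot-link between two points extends to their whole groups.
    cannot = set()
    for a, b in cannot_link:
        ra, rb = find(a), find(b)
        if ra == rb:
            raise ValueError(f"inconsistent constraints between {a} and {b}")
        for i in groups[ra]:
            for j in groups[rb]:
                cannot.add((min(i, j), max(i, j)))
    return must, cannot
```

For example, with must-links (0, 1) and (1, 2) and a single cannot-link (2, 3), the sketch infers the must-link (0, 2) and the cannot-links (0, 3) and (1, 3), so one annotation yields several constraints for free.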

Fuzzy and non-fuzzy approaches for digital image classification

Author: Diykh Mohammed
Li Yan
Publication venue: Asian Research Publishing Network (A R P N)
Publication date: 28/02/2016

This paper classifies different digital images using two types of clustering algorithms: fuzzy and non-fuzzy methods. For the performance comparisons, we apply four clustering algorithms, two of the fuzzy type and two of the non-fuzzy (partitional) type. The automatic partitional clustering algorithm and the partitional k-means algorithm are chosen as examples of non-fuzzy clustering techniques, while the automatic fuzzy algorithm and the fuzzy C-means clustering algorithm are taken as examples of fuzzy clustering techniques. The four algorithms are evaluated on three types of image databases (human face images, handwritten digits and natural scenes) using the following comparison criteria: dataset size, number of clusters, execution time, classification accuracy and k-fold cross-validation. The experimental results demonstrate that the non-fuzzy algorithms achieve higher accuracy than the fuzzy algorithms, especially when dealing with large data sizes and different types of images.

University of Southern Queensland ePrints
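To make the fuzzy/non-fuzzy distinction in the abstract above concrete, the sketch below contrasts the hard assignment step of k-means with the standard fuzzy C-means membership update for fixed cluster centres, u_ij = 1 / sum over k of (d(x_j, c_i) / d(x_j, c_k))^(2/(m-1)). The fuzzifier m = 2 and the helper names are assumptions, and the "automatic" variants evaluated in the paper are not reproduced here.

```python
from math import dist


def kmeans_assign(points, centers):
    """Hard (crisp) assignment: each point belongs only to its nearest centre."""
    return [min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            for p in points]


def fcm_memberships(points, centers, m=2.0):
    """Fuzzy C-means memberships: row j gives point j's degree of belonging to each centre."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        d = [dist(p, c) for c in centers]
        if 0.0 in d:
            # The point sits on a centre: full membership there
            # (split evenly if several centres coincide with it).
            hits = [1.0 if di == 0.0 else 0.0 for di in d]
            total = sum(hits)
            memberships.append([h / total for h in hits])
        else:
            memberships.append([1.0 / sum((d[i] / dk) ** exponent for dk in d)
                                for i in range(len(d))])
    return memberships


points = [(1.0,), (3.0,)]
centers = [(0.0,), (4.0,)]
print(kmeans_assign(points, centers))    # [0, 1]
print(fcm_memberships(points, centers))  # approximately [[0.9, 0.1], [0.1, 0.9]]
```

Each membership row sums to 1; k-means keeps only the largest entry, which is the source of the crisp/fuzzy difference the paper compares.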

Analysis of FMRI Exams Through Unsupervised Learning and Evaluation Index

Author: Martinelli Samuele
Publication venue: Italy
Publication date: 01/01/2020

In the last few years, time series clustering has grown significantly and has proven effective in providing useful information in various application domains. This growing interest is the result of the effort made by the scientific community in the field of temporal data mining. For these reasons, the first phase of the thesis focused on studying data obtained from fMRI exams carried out in task-based and resting-state mode, using and comparing different clustering algorithms: the crisp algorithms Self-Organizing Map (SOM), Growing Neural Gas (GNG) and Neural Gas (NG), and a fuzzy algorithm, Fuzzy C-Means (FCM). The clustering results were evaluated with the Davies-Bouldin index (DBI). Clustering evaluation is the second topic of this thesis. Specific techniques exist for assessing the validity of a clustering, but none of them is yet established for the study of fMRI exams, and the evaluation of evaluation techniques is itself still an open research field. Eight clustering validation indices (CVIs) applied to fMRI data clustering are analysed: the Pakhira-Bandyopadhyay-Maulik index (crisp and fuzzy), the Fukuyama-Sugeno index, the Rezaee-Lelieveldt-Reiber index, the Wang-Sun-Jiang index, the Xie-Beni index, the Davies-Bouldin index and the soft Davies-Bouldin index. The indices themselves are then evaluated by introducing new metrics that account for their sub-optimal performance. Finally, a new methodology for evaluating CVIs, based on an ANFIS model, is introduced.

Archivio istituzionale della ricerca - Università dell'Insubria
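The Davies-Bouldin index used in the thesis above averages, over the K clusters, the worst ratio of within-cluster scatter to between-centroid separation: DB = (1/K) * sum over i of max over j != i of (s_i + s_j) / d(c_i, c_j), where s_i is the scatter of cluster i and c_i its centroid. Lower values indicate more compact, better-separated clusters. The sketch below assumes Euclidean distance and measures scatter as the mean distance to the centroid (one common choice); it is not taken from the thesis, and the function name `davies_bouldin` is hypothetical.

```python
from math import dist


def davies_bouldin(points, labels):
    """Davies-Bouldin index for points (tuples) with one cluster label per point."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    centroids, scatter = {}, {}
    for label, pts in clusters.items():
        dim = len(pts[0])
        c = tuple(sum(p[d] for p in pts) / len(pts) for d in range(dim))
        centroids[label] = c
        # Scatter: mean Euclidean distance from the cluster's points to its centroid.
        scatter[label] = sum(dist(p, c) for p in pts) / len(pts)

    keys = list(clusters)
    total = 0.0
    for i in keys:
        # Worst (largest) similarity ratio against any other cluster.
        # Note: identical centroids would cause a division by zero.
        total += max((scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
                     for j in keys if j != i)
    return total / len(keys)


pts = [(0, 0), (2, 0), (10, 0), (12, 0)]
print(davies_bouldin(pts, [0, 0, 1, 1]))  # 0.2 (compact, well separated)
print(davies_bouldin(pts, [0, 1, 0, 1]))  # 5.0 (poor partition)
```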