Search CORE

5,666 research outputs found

A MapReduce Algorithm for Minimum Vertex Cover Problems and Its Randomization

Author: Kinjo Daiki
Nakamura Morikazu
Yoshida Takeo
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 25/03/2021
Field of study

MapReduce is a programming paradigm for large-scale distributed information processing. This paper proposes a MapReduce algorithm for the minimum vertex cover problem, which is known to be NP-hard. The MapReduce algorithm can efficiently obtain a minimal vertex cover in a small number of rounds. We show the effectiveness of the algorithm, through experimental evaluation and comparison with exact and approximate algorithms that it demonstrates high quality in a small number of MapReduce rounds. We also confirm from experimentation that the algorithm has good scalability, allowing high-quality solutions under restricted computation times due to increased graph size. Moreover, we extend our algorithm to randomized one to obtain good expected approximate ratio

Feature selection in high-dimensional dataset using MapReduce

Author: Bontempi Gianluca
Borgne Yann-Aël Le
Reggiani Claudio
Publication venue
Publication date: 07/09/2017
Field of study

This paper describes a distributed MapReduce implementation of the minimum Redundancy Maximum Relevance algorithm, a popular feature selection method in bioinformatics and network inference problems. The proposed approach handles both tall/narrow and wide/short datasets. We further provide an open source implementation based on Hadoop/Spark, and illustrate its scalability on datasets involving millions of observations or features

arXiv.org e-Print Archive

DI-fusion

Parallel Hierarchical Affinity Propagation with MapReduce

Author: Haber Rana
Mijatovic Nenad
Peter Adrian M.
Rose Dillon Mark
Rouly Jean Michel
Publication venue
Publication date: 28/03/2014
Field of study

The accelerated evolution and explosion of the Internet and social media is generating voluminous quantities of data (on zettabyte scales). Paramount amongst the desires to manipulate and extract actionable intelligence from vast big data volumes is the need for scalable, performance-conscious analytics algorithms. To directly address this need, we propose a novel MapReduce implementation of the exemplar-based clustering algorithm known as Affinity Propagation. Our parallelization strategy extends to the multilevel Hierarchical Affinity Propagation algorithm and enables tiered aggregation of unstructured data with minimal free parameters, in principle requiring only a similarity measure between data points. We detail the linear run-time complexity of our approach, overcoming the limiting quadratic complexity of the original algorithm. Experimental validation of our clustering methodology on a variety of synthetic and real data sets (e.g. images and point data) demonstrates our competitiveness against other state-of-the-art MapReduce clustering techniques

arXiv.org e-Print Archive

An ontology enhanced parallel SVM for scalable spam filter training

Author: Bauer
Blanco
Blanzieri
Blei
Breiman
Cao
Caruana
Chawla
Colas
Cristianini
Dean
Do
Gansterer
Godwin Caruana
Graf
Hall
Huang
Kearns
Kim
Maozhen Li
Mei
Platt
Suykens
Taura
Vapnik
Wang
Woodsend
Yang Liu
Zanghirati
Zhang
Publication venue: 'Elsevier BV'
Publication date: 01/05/2013
Field of study

This is the post-print version of the final paper published in Neurocomputing. The published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright @ 2013 Elsevier B.V.Spam, under a variety of shapes and forms, continues to inflict increased damage. Varying approaches including Support Vector Machine (SVM) techniques have been proposed for spam filter training and classification. However, SVM training is a computationally intensive process. This paper presents a MapReduce based parallel SVM algorithm for scalable spam filter training. By distributing, processing and optimizing the subsets of the training data across multiple participating computer nodes, the parallel SVM reduces the training time significantly. Ontology semantics are employed to minimize the impact of accuracy degradation when distributing the training data among a number of SVM classifiers. Experimental results show that ontology based augmentation improves the accuracy level of the parallel SVM beyond the original sequential counterpart

Brunel University Research Archive