Performance Analysis of a Scalable Naïve Bayes Classifier on MapReduce and Beyond MapReduce

Abstract

Many real-world domains generate big data from diverse sources: large volumes of high-velocity, complex, and variable data. Big data becomes a challenge when it is difficult to process and to extract knowledge from using traditional analysis tools. Scalable machine learning algorithms are therefore needed to process such data. Recently, the Hadoop MapReduce framework has been adopted for parallel computing. However, MapReduce may not fit most real-world data applications. For large-scale machine learning on distributed systems, Spark has become a much more viable option beyond MapReduce. Although both are Apache-hosted data analytics frameworks, their performance varies significantly depending on the use case they are applied to. This paper analyzes the performance of a scalable Naïve Bayes classifier (SNB) implemented on MapReduce and on Beyond MapReduce over different real-world datasets. The comparison results show that SNB on Beyond MapReduce requires less processing time than SNB on MapReduce for efficient big data classification.
