Defining Big Data
ABSTRACT
As Big Data becomes better understood, there is a need for a comprehensive definition of Big Data to support work in fields such as data quality for Big Data. Existing definitions take one of three approaches: they define Big Data by comparison with existing, usually relational, definitions; they define it in terms of data characteristics; or they combine data characteristics with the Big Data environment. In this paper we examine existing definitions of Big Data and discuss the strengths and limitations of each approach, with particular reference to issues of data quality in Big Data. We identify the issues raised by incomplete or inconsistent definitions. We propose an alternative definition and relate it to our work on quality in Big Data.
Performance evaluation of Apache Mahout for mining large datasets
The main purpose of this project is to evaluate the performance of the Apache Mahout library, which contains data mining algorithms for data processing, using a Twitter dataset. Performance is evaluated in terms of processing time, memory usage, I/O performance and algorithmic accuracy.