
    Defining Big Data

    As Big Data becomes better understood, there is a need for a comprehensive definition of Big Data to support work in fields such as data quality for Big Data. Existing definitions take one of three approaches: they define Big Data by comparison with existing, usually relational, definitions; they define it in terms of data characteristics; or they combine data characteristics with the Big Data environment. In this paper we examine existing definitions of Big Data and discuss the strengths and limitations of the different approaches, with particular reference to issues related to data quality in Big Data. We identify the issues presented by incomplete or inconsistent definitions. We propose an alternative definition and relate this definition to our work on quality in Big Data.

    Performance evaluation of Apache Mahout for mining large datasets

    The main purpose of this project is to evaluate the performance of the Apache Mahout library, which contains data mining algorithms for data processing, using a Twitter dataset. Performance is evaluated in terms of processing time, memory usage, I/O performance and algorithmic accuracy.