    BAC: A bagged associative classifier for big data frameworks

    Big Data frameworks allow powerful distributed computations, extending the results achievable on a single machine. In this work, we present a novel distributed associative classifier, named BAC, based on ensemble techniques. Ensembles are a popular approach that builds several models on different subsets of the original dataset and then combines their votes into a single classification outcome. Preliminary experiments on Apache Spark show that the proposed ensemble classifier achieves quality comparable to the single-machine version on popular real-world datasets and overcomes its scalability limits on large synthetic datasets.
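
    The ensemble idea in this abstract (several models trained on different subsets, then a vote) can be illustrated with a minimal single-machine sketch. This is not BAC's implementation: the base learner `train_one_rule` is a hypothetical stand-in for an associative rule miner, the subsets are assumed to be bootstrap samples, and all function names are invented for illustration.

```python
import random
from collections import Counter, defaultdict

def train_one_rule(sample):
    """Hypothetical base learner (not BAC's rule miner): maps each value
    of feature 0 to the label it most often co-occurs with."""
    by_value = defaultdict(Counter)
    overall = Counter()
    for features, label in sample:
        by_value[features[0]][label] += 1
        overall[label] += 1
    default = overall.most_common(1)[0][0]  # fallback for unseen values
    rules = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
    return lambda features: rules.get(features[0], default)

def train_bagged(rows, n_models, seed=0):
    """Train n_models base learners, each on a bootstrap sample
    (same size as rows, drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(rows) for _ in rows]
        models.append(train_one_rule(sample))
    return models

def predict(models, features):
    """Majority vote over the predictions of all models."""
    votes = Counter(model(features) for model in models)
    return votes.most_common(1)[0][0]

# Toy data: feature value "a" always goes with "yes", "b" with "no".
rows = [(("a",), "yes")] * 10 + [(("b",), "no")] * 10
models = train_bagged(rows, n_models=5)
print(predict(models, ("a",)), predict(models, ("b",)))
```

    In a distributed setting each bootstrap sample would presumably be processed on a separate partition; that part is omitted here.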

    A scalable approach to fuzzy rough nearest neighbour classification with ordered weighted averaging operators

    Fuzzy rough sets have been successfully applied in classification tasks, in particular in combination with ordered weighted averaging (OWA) operators. There has been a lot of research into adapting algorithms for use with Big Data through parallelisation, but no concrete strategy exists for designing a Big Data classifier based on fuzzy rough sets. Existing Big Data approaches use fuzzy rough sets for feature and prototype selection, and have often not involved very large datasets. We fill this gap by presenting the first Big Data extension of an algorithm that uses fuzzy rough sets directly to classify test instances: a distributed implementation of FRNN-OWA in Apache Spark. Through a series of systematic tests on generated datasets, we demonstrate that it can achieve a speedup effectively equal to the number of computing cores used, meaning that it can scale to arbitrarily large datasets.

    Methodology for knowledge extraction from mobility big data

    The spread of mobile devices with several sensors, together with mobile communication, provides huge volumes of real-time data (big data) about users’ mobility habits, which should be correctly analysed to extract useful knowledge. In our research we explore a data mining approach based on a Naïve Bayes (NB) classifier applied to different sources of big data. To achieve this goal, we propose a methodology based on four processes that collects data and merges different data sources into pre-defined data classes. We can apply this methodology to different big data sources and extract a diversity of knowledge that can be applied to the development of dedicated applications and decision processes in the area of intelligent transportation systems, such as route advice, CO2 emissions reduction through fuel savings, and provision of smart advice for public transportation usage.