
BAC: A bagged associative classifier for big data frameworks

Authors: Apiletti Daniele, Garza Paolo, Venturini Luca
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Big Data frameworks allow powerful distributed computations that extend the results achievable on a single machine. In this work, we present a novel distributed associative classifier, named BAC, based on ensemble techniques. Ensembles are a popular approach that builds several models on different subsets of the original dataset and then lets them vote to produce a single classification outcome. Experiments on Apache Spark and preliminary results show that the proposed ensemble classifier achieves quality comparable to the single-machine version on popular real-world datasets and overcomes the single-machine version's scalability limits on large synthetic datasets.
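The bagging-plus-voting idea in the abstract can be illustrated with a toy, single-machine sketch. It is not BAC itself: it assumes bootstrap sampling for the "different subsets", mines only single-item class association rules (item → label) with hypothetical `min_support`/`min_confidence` thresholds, and omits the Spark distribution entirely. All function names are hypothetical.

```python
import random
from collections import Counter


def mine_rules(records, min_support=0.1, min_confidence=0.6):
    """Mine single-item class association rules (item -> label).

    records: list of (frozenset_of_items, label) pairs.
    Returns (rules sorted by confidence then support, default majority label).
    """
    n = len(records)
    item_count = Counter()
    item_label_count = Counter()
    for items, label in records:
        for item in items:
            item_count[item] += 1
            item_label_count[(item, label)] += 1
    rules = []
    for (item, label), count in item_label_count.items():
        support = count / n
        confidence = count / item_count[item]
        if support >= min_support and confidence >= min_confidence:
            rules.append((confidence, support, item, label))
    # Strongest rules first: highest confidence, then highest support
    rules.sort(key=lambda r: (r[0], r[1]), reverse=True)
    default = Counter(label for _, label in records).most_common(1)[0][0]
    return rules, default


def classify(model, items):
    """Apply the first (strongest) matching rule, else the default label."""
    rules, default = model
    for _, _, item, label in rules:
        if item in items:
            return label
    return default


def train_bagged(records, n_models=5, seed=0):
    """Train one rule model per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(records) for _ in records]
        models.append(mine_rules(sample))
    return models


def predict_bagged(models, items):
    """Majority vote over the predictions of all ensemble members."""
    votes = Counter(classify(model, items) for model in models)
    return votes.most_common(1)[0][0]
```

In a distributed setting, each bootstrap sample (and its rule mining) would be an independent task, which is what makes this kind of ensemble a natural fit for frameworks such as Spark.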

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

A scalable approach to fuzzy rough nearest neighbour classification with ordered weighted averaging operators

Authors: Cornelis Chris, Lenz Oliver Urs, Peralta Daniel
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

Fuzzy rough sets have been successfully applied in classification tasks, in particular in combination with OWA operators. Much research has gone into adapting algorithms to Big Data through parallelisation, but no concrete strategy exists for designing a Big Data classifier based on fuzzy rough sets. Existing Big Data approaches use fuzzy rough sets for feature and prototype selection, and have often not involved very large datasets. We fill this gap by presenting the first Big Data extension of an algorithm that uses fuzzy rough sets directly to classify test instances: a distributed implementation of FRNN-OWA in Apache Spark. Through a series of systematic tests involving generated datasets, we demonstrate that it can achieve a speedup effectively equal to the number of computing cores used, meaning that it can scale to arbitrarily large datasets.
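To make the FRNN-OWA decision rule more concrete, here is a minimal single-machine sketch. It is not the authors' Spark implementation, and several details are assumptions: a mean per-attribute similarity based on attribute ranges, linearly decreasing (additive) OWA weights, the Łukasiewicz implicator for the lower approximation (which reduces to 1 − similarity for non-members), and a simplified aggregation over class members only (upper) and non-members only (lower). Function names are hypothetical.

```python
def additive_weights(n):
    """Linearly decreasing OWA weights w_i = 2(n - i + 1) / (n(n + 1)); they sum to 1."""
    return [2 * (n - i + 1) / (n * (n + 1)) for i in range(1, n + 1)]


def owa(values, soft_max):
    """OWA soft max (largest values weighted most) or soft min (smallest weighted most)."""
    ordered = sorted(values, reverse=soft_max)
    return sum(w * v for w, v in zip(additive_weights(len(ordered)), ordered))


def similarity(a, b, ranges):
    """Mean per-attribute similarity 1 - |a_j - b_j| / range_j, each term clipped at 0."""
    terms = [max(0.0, 1.0 - abs(x - y) / r) for x, y, r in zip(a, b, ranges)]
    return sum(terms) / len(terms)


def frnn_owa_predict(X, labels, query):
    """Return the label whose (upper + lower) / 2 membership score is highest."""
    # Attribute ranges from the training data; constant attributes get range 1 to avoid /0
    ranges = [(max(col) - min(col)) or 1.0 for col in zip(*X)]
    sims = [similarity(row, query, ranges) for row in X]
    best_label, best_score = None, float("-inf")
    for label in sorted(set(labels)):
        inside = [s for s, l in zip(sims, labels) if l == label]
        outside = [1.0 - s for s, l in zip(sims, labels) if l != label]
        upper = owa(inside, soft_max=True)                        # soft max over class members
        lower = owa(outside, soft_max=False) if outside else 1.0  # soft min over non-members
        score = (upper + lower) / 2
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Because each query's score depends only on its similarities to the training instances, test instances can be scored independently, which is consistent with the near-linear speedup reported in the abstract.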

Ghent University Academic Bibliography

Methodology for knowledge extraction from mobility big data

Authors: Afonso José A., Afonso João L., Ferreira João C., Monteiro Vítor Duarte Fernandes
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

The spread of mobile devices with several sensors, together with mobile communication, provides huge volumes of real-time data (big data) about users’ mobility habits, which must be analysed correctly to extract useful knowledge. In our research we explore a data mining approach based on a Naïve Bayes (NB) classifier applied to different sources of big data. To achieve this goal, we propose a methodology based on four processes that collect data and merge different data sources into pre-defined data classes. This methodology can be applied to different big data sources to extract a diversity of knowledge for the development of dedicated applications and decision processes in the area of intelligent transportation systems, such as route advice, CO2 emissions reduction through fuel savings, and smart advice on public transportation usage.
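The abstract names a Naïve Bayes classifier but not its exact variant. As a hedged illustration, the sketch below shows a categorical Naïve Bayes with Laplace smoothing (`alpha`) operating on records that have already been merged into discrete classes. The class name, the smoothing choice, and the example features (time of day, weather, transport mode) are hypothetical and not taken from the paper.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes over categorical features with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.n = len(labels)
        self.feature_counts = defaultdict(Counter)  # (label, feature index) -> value counts
        self.feature_values = defaultdict(set)      # feature index -> distinct values seen
        for row, label in zip(rows, labels):
            for j, value in enumerate(row):
                self.feature_counts[(label, j)][value] += 1
                self.feature_values[j].add(value)
        return self

    def log_posterior(self, row, label):
        """Unnormalised log P(label) + sum_j log P(x_j | label)."""
        lp = math.log(self.class_counts[label] / self.n)
        for j, value in enumerate(row):
            count = self.feature_counts[(label, j)][value]
            k = len(self.feature_values[j])
            lp += math.log((count + self.alpha) / (self.class_counts[label] + self.alpha * k))
        return lp

    def predict(self, row):
        return max(self.class_counts, key=lambda c: self.log_posterior(row, c))


# Hypothetical usage: (time of day, weather) -> transport mode
rows = [("morning", "rain"), ("morning", "dry"), ("evening", "dry"), ("evening", "dry")]
labels = ["bus", "bus", "walk", "walk"]
model = CategoricalNaiveBayes().fit(rows, labels)
print(model.predict(("morning", "rain")))
```

Working in log space avoids numerical underflow when many features are multiplied, and the smoothing keeps unseen feature values from zeroing out a class.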

Repositório Científico do Instituto Politécnico de Lisboa

Universidade do Minho: RepositoriUM