Search CORE

54 research outputs found

Homogeneous and heterogeneous distributed classification for pocket data mining

Author: Aldridge P.
Bramer M.
Gaber M. M.
Liu H.
May D.
Stahl Frederic
Yu P. S.
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2012
Field of study

Pocket Data Mining (PDM) describes the full process of analysing data streams in mobile ad hoc distributed environments. Advances in mobile devices like smart phones and tablet computers have made it possible for a wide range of applications to run in such an environment. In this paper, we propose the adoption of data stream classification techniques for PDM. Evident by a thorough experimental study, it has been proved that running heterogeneous/different, or homogeneous/similar data stream classification techniques over vertically partitioned data (data partitioned according to the feature space) results in comparable performance to batch and centralised learning techniques

Central Archive at the University of Reading

Portsmouth University Research Portal (Pure)

Pocket Data Mining: the next generation in predictive analytics

Author: Gaber M.
Publication venue
Publication date: 19/04/2012
Field of study

Portsmouth University Research Portal (Pure)

Reservoir of Diverse Adaptive Learners and Stacking Fast Hoeffding Drift Detection Methods for Evolving Data Streams

Author: Paquet Eric
Pesaranghader Ali
Viktor Herna
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 07/09/2017
Field of study

The last decade has seen a surge of interest in adaptive learning algorithms for data stream classification, with applications ranging from predicting ozone level peaks, learning stock market indicators, to detecting computer security violations. In addition, a number of methods have been developed to detect concept drifts in these streams. Consider a scenario where we have a number of classifiers with diverse learning styles and different drift detectors. Intuitively, the current 'best' (classifier, detector) pair is application dependent and may change as a result of the stream evolution. Our research builds on this observation. We introduce the \mbox{Tornado} framework that implements a reservoir of diverse classifiers, together with a variety of drift detection algorithms. In our framework, all (classifier, detector) pairs proceed, in parallel, to construct models against the evolving data streams. At any point in time, we select the pair which currently yields the best performance. We further incorporate two novel stacking-based drift detection methods, namely the \mbox{FHDDMS} and \mbox{FHDDMS}_{add} approaches. The experimental evaluation confirms that the current 'best' (classifier, detector) pair is not only heavily dependent on the characteristics of the stream, but also that this selection evolves as the stream flows. Further, our \mbox{FHDDMS} variants detect concept drifts accurately in a timely fashion while outperforming the state-of-the-art.Comment: 42 pages, and 14 figure

arXiv.org e-Print Archive

NRC Publications Archive

Random Prism: An Alternative to Random Forests.

Author: Bramer Max
Stahl Frederic
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011
Field of study

Ensemble learning techniques generate multiple classifiers, so called base classifiers, whose combined classification results are used in order to increase the overall classification accuracy. In most ensemble classifiers the base classifiers are based on the Top Down Induction of Decision Trees (TDIDT) approach. However, an alternative approach for the induction of rule based classifiers is the Prism family of algorithms. Prism algorithms produce modular classification rules that do not necessarily fit into a decision tree structure. Prism classification rulesets achieve a comparable and sometimes higher classification accuracy compared with decision tree classifiers, if the data is noisy and large. Yet Prism still suffers from overfitting on noisy and large datasets. In practice ensemble techniques tend to reduce the overfitting, however there exists no ensemble learner for modular classification rule inducers such as the Prism family of algorithms. This article describes the first development of an ensemble learner based on the Prism family of algorithms in order to enhance Prism’s classification accuracy by reducing overfitting

Central Archive at the University of Reading

Portsmouth University Research Portal (Pure)

Bournemouth University Research Online

Mining Data Streams using Option Trees

Author: Reddy Dr. P. Chenna
Yusuf B. Reshma
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 31/07/2012
Field of study

Many organizations today have more than very large databases. The databases also grow without limit at a rate of several million records per day. Data streams are ubiquitous and have become an important research topic in the last two decades. Mining these continuous data streams brings unique opportunities, but also new challenges. For their predictive nonparametric analysis, Hoeffding-based trees are often a method of choice, which offers a possibility of any-time predictions. Although one of their main problems is the delay in learning progress due to the presence of equally discriminative attributes. Options are a natural way to deal with this problem. In this paper, Option trees which build upon regular trees is presented by adding splitting options in the internal nodes to improve accuracy, stability and reduce ambiguity. Adaptive Hoeffding option tree algorithm is reviewed and results based on accuracy and processing speed of algorithm under various memory limits is presented. The accuracy of Hoeffding Option tree is compared with Hoeffding trees and adaptive Hoeffding option tree under circumstantial conditions. Keywords: data stream, hoeffding trees, option trees, adaptive hoeffding option trees, large database

International Institute for Science, Technology and Education (IISTE): E-Journals

Activity recognition from smartphone sensing data

Author: Lopes Alexandre de Oliveira
Publication venue
Publication date: 01/01/2012
Field of study

Tese de mestrado integrado. Engenharia Informática e Computação. Faculdade de Engenharia. Universidade do Porto. 201

Repositório Aberto da Universidade do Porto

Online Machine Learning Algorithms Review and Comparison in Healthcare

Author: Jagirdar Nikhil Mukund
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 15/12/2018
Field of study

Currently, the healthcare industry uses Big Data for essential patient care information. Electronic Health Records (EHR) store massive data and are continuously updated with information such as laboratory results, medication, and clinical events. There are various methods by which healthcare data is generated and collected, including databases, healthcare websites, mobile applications, wearable technologies, and sensors. The continuous flow of data will improve healthcare service, medical diagnostic research and, ultimately, patient care. Thus, it is important to implement advanced data analysis techniques to obtain more precise prediction results.Machine Learning (ML) has acquired an important place in Big Healthcare Data (BHD). ML has the capability to run predictive analysis, detect patterns or red flags, and connect dots to enhance personalized treatment plans. Because predictive models have dependent and independent variables, ML algorithms perform mathematical calculations to find the best suitable mathematical equations to predict dependent variables using a given set of independent variables. These model performances depend on datasets and response, or dependent, variable types such as binary or multi-class, supervised or unsupervised.The current research analyzed incremental, or streaming or online, algorithm performance with offline or batch learning (these terms are used interchangeably) using performance measures such as accuracy, model complexity, and time consumption. Batch learning algorithms are provided with the specific dataset, which always constrains the size of the dataset depending on memory consumption. In the case of incremental algorithms, data arrive sequentially, which is determined by hyperparameter optimization such as chunk size, tree split, or hoeffding bond. The model complexity of an incremental learning algorithm is based on a number of parameters, which in turn determine memory consumption

University of Tennessee, Knoxville: Trace

Faster Hoeffding Racing: Bernstein Races via Jackknife Estimates

Author: A. Antos
B. Efron
C. McDiarmid
E. Even-Dar
J.-Y. Audibert
J.-Y. Audibert
J.M. Steele
L. Paninski
M. Arcones
R. Jin
S. Boucheron
S.N. Bernstein
T. Peel
W. Hoeffding
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013
Field of study

Crossref

On Making Feasible Smartphone-Based Human Activity Recognition

Author: Francisco Miguel de Lamares Martins Barbosa
Publication venue
Publication date: 12/07/2019
Field of study

Repositório Aberto da Universidade do Porto

Secure and Usable User-in-a-Context Continuous Authentication in Smartphones Leveraging Non-Assisted Sensors

Author: Fuentes García-Romero de Tejada José María de
González Manzano Lorena
Ribagorda Garnacho Arturo
Publication venue: 'MDPI AG'
Publication date: 01/01/2018
Field of study

Smartphones are equipped with a set of sensors that describe the environment (e.g., GPS, noise, etc.) and their current status and usage (e.g., battery consumption, accelerometer readings, etc.). Several works have already addressed how to leverage such data for user-in-a-context continuous authentication, i.e., determining if the porting user is the authorized one and resides in his regular physical environment. This can be useful for an early reaction against robbery or impersonation. However, most previous works depend on assisted sensors, i.e., they rely upon immutable elements (e.g., cell towers, satellites, magnetism), thus being ineffective in their absence. Moreover, they focus on accuracy aspects, neglecting usability ones. For this purpose, in this paper, we explore the use of four non-assisted sensors, namely battery, transmitted data, ambient light and noise. Our approach leverages data stream mining techniques and offers a tunable security-usability trade-off. We assess the accuracy, immediacy, usability and readiness of the proposal. Results on 50 users over 24 months show that battery readings alone achieve 97.05% of accuracy and 81.35% for audio, light and battery all together. Moreover, when usability is at stake, robbery is detected in 100 s for the case of battery and in 250 s when audio, light and battery are applied. Remarkably, these figures are obtained with moderate training and storage needs, thus making the approach suitable for current devices.This work has been partially supported by MINECO grants TIN2013-46469-R (SPINY), TIN2016-79095-C2-2-R (SMOG-DEV); CAM grant S2013/ICE-3095 (CIBERDINE), co-funded with European FEDER funds

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo