54 research outputs found

    Homogeneous and heterogeneous distributed classification for pocket data mining

    Get PDF
    Pocket Data Mining (PDM) describes the full process of analysing data streams in mobile ad hoc distributed environments. Advances in mobile devices like smart phones and tablet computers have made it possible for a wide range of applications to run in such an environment. In this paper, we propose the adoption of data stream classification techniques for PDM. Evident by a thorough experimental study, it has been proved that running heterogeneous/different, or homogeneous/similar data stream classification techniques over vertically partitioned data (data partitioned according to the feature space) results in comparable performance to batch and centralised learning techniques

    Reservoir of Diverse Adaptive Learners and Stacking Fast Hoeffding Drift Detection Methods for Evolving Data Streams

    Full text link
    The last decade has seen a surge of interest in adaptive learning algorithms for data stream classification, with applications ranging from predicting ozone level peaks, learning stock market indicators, to detecting computer security violations. In addition, a number of methods have been developed to detect concept drifts in these streams. Consider a scenario where we have a number of classifiers with diverse learning styles and different drift detectors. Intuitively, the current 'best' (classifier, detector) pair is application dependent and may change as a result of the stream evolution. Our research builds on this observation. We introduce the \mbox{Tornado} framework that implements a reservoir of diverse classifiers, together with a variety of drift detection algorithms. In our framework, all (classifier, detector) pairs proceed, in parallel, to construct models against the evolving data streams. At any point in time, we select the pair which currently yields the best performance. We further incorporate two novel stacking-based drift detection methods, namely the \mbox{FHDDMS} and \mbox{FHDDMS}_{add} approaches. The experimental evaluation confirms that the current 'best' (classifier, detector) pair is not only heavily dependent on the characteristics of the stream, but also that this selection evolves as the stream flows. Further, our \mbox{FHDDMS} variants detect concept drifts accurately in a timely fashion while outperforming the state-of-the-art.Comment: 42 pages, and 14 figure

    Random Prism: An Alternative to Random Forests.

    Get PDF
    Ensemble learning techniques generate multiple classifiers, so called base classifiers, whose combined classification results are used in order to increase the overall classification accuracy. In most ensemble classifiers the base classifiers are based on the Top Down Induction of Decision Trees (TDIDT) approach. However, an alternative approach for the induction of rule based classifiers is the Prism family of algorithms. Prism algorithms produce modular classification rules that do not necessarily fit into a decision tree structure. Prism classification rulesets achieve a comparable and sometimes higher classification accuracy compared with decision tree classifiers, if the data is noisy and large. Yet Prism still suffers from overfitting on noisy and large datasets. In practice ensemble techniques tend to reduce the overfitting, however there exists no ensemble learner for modular classification rule inducers such as the Prism family of algorithms. This article describes the first development of an ensemble learner based on the Prism family of algorithms in order to enhance Prism’s classification accuracy by reducing overfitting

    Mining Data Streams using Option Trees

    Get PDF
    Many organizations today have more than very large databases. The databases also grow without limit at a rate of several million records per day. Data streams are ubiquitous and have become an important research topic in the last two decades. Mining these continuous data streams brings unique opportunities, but also new challenges. For their predictive nonparametric analysis, Hoeffding-based trees are often a method of choice, which offers a possibility of any-time predictions. Although one of their main problems is the delay in learning progress due to the presence of equally discriminative attributes. Options are a natural way to deal with this problem. In this paper, Option trees which build upon regular trees is presented by adding splitting options in the internal nodes to improve accuracy, stability and reduce ambiguity. Adaptive Hoeffding option tree algorithm is reviewed and results based on accuracy and processing speed of algorithm under various memory limits is presented. The accuracy of Hoeffding Option tree is compared with Hoeffding trees and adaptive Hoeffding  option tree under circumstantial conditions. Keywords: data stream, hoeffding trees, option trees, adaptive hoeffding option trees, large database

    Activity recognition from smartphone sensing data

    Get PDF
    Tese de mestrado integrado. Engenharia Informática e Computação. Faculdade de Engenharia. Universidade do Porto. 201

    Online Machine Learning Algorithms Review and Comparison in Healthcare

    Get PDF
    Currently, the healthcare industry uses Big Data for essential patient care information. Electronic Health Records (EHR) store massive data and are continuously updated with information such as laboratory results, medication, and clinical events. There are various methods by which healthcare data is generated and collected, including databases, healthcare websites, mobile applications, wearable technologies, and sensors. The continuous flow of data will improve healthcare service, medical diagnostic research and, ultimately, patient care. Thus, it is important to implement advanced data analysis techniques to obtain more precise prediction results.Machine Learning (ML) has acquired an important place in Big Healthcare Data (BHD). ML has the capability to run predictive analysis, detect patterns or red flags, and connect dots to enhance personalized treatment plans. Because predictive models have dependent and independent variables, ML algorithms perform mathematical calculations to find the best suitable mathematical equations to predict dependent variables using a given set of independent variables. These model performances depend on datasets and response, or dependent, variable types such as binary or multi-class, supervised or unsupervised.The current research analyzed incremental, or streaming or online, algorithm performance with offline or batch learning (these terms are used interchangeably) using performance measures such as accuracy, model complexity, and time consumption. Batch learning algorithms are provided with the specific dataset, which always constrains the size of the dataset depending on memory consumption. In the case of incremental algorithms, data arrive sequentially, which is determined by hyperparameter optimization such as chunk size, tree split, or hoeffding bond. The model complexity of an incremental learning algorithm is based on a number of parameters, which in turn determine memory consumption

    Secure and Usable User-in-a-Context Continuous Authentication in Smartphones Leveraging Non-Assisted Sensors

    Get PDF
    Smartphones are equipped with a set of sensors that describe the environment (e.g., GPS, noise, etc.) and their current status and usage (e.g., battery consumption, accelerometer readings, etc.). Several works have already addressed how to leverage such data for user-in-a-context continuous authentication, i.e., determining if the porting user is the authorized one and resides in his regular physical environment. This can be useful for an early reaction against robbery or impersonation. However, most previous works depend on assisted sensors, i.e., they rely upon immutable elements (e.g., cell towers, satellites, magnetism), thus being ineffective in their absence. Moreover, they focus on accuracy aspects, neglecting usability ones. For this purpose, in this paper, we explore the use of four non-assisted sensors, namely battery, transmitted data, ambient light and noise. Our approach leverages data stream mining techniques and offers a tunable security-usability trade-off. We assess the accuracy, immediacy, usability and readiness of the proposal. Results on 50 users over 24 months show that battery readings alone achieve 97.05% of accuracy and 81.35% for audio, light and battery all together. Moreover, when usability is at stake, robbery is detected in 100 s for the case of battery and in 250 s when audio, light and battery are applied. Remarkably, these figures are obtained with moderate training and storage needs, thus making the approach suitable for current devices.This work has been partially supported by MINECO grants TIN2013-46469-R (SPINY), TIN2016-79095-C2-2-R (SMOG-DEV); CAM grant S2013/ICE-3095 (CIBERDINE), co-funded with European FEDER funds
    corecore