324 research outputs found

Tie-breaking in Hoeffding trees

Author: Holmes Geoffrey
Pfahringer Bernhard
Richard Kirkby
Publication venue: ECML/PKDD
Publication date: 01/01/2005

A thorough examination of the performance of Hoeffding trees, state-of-the-art in classification for data streams, on a range of datasets reveals that tie breaking, an essential but supposedly rare procedure, is employed much more than expected. Testing with a lightweight method for handling continuous attributes, we find that the excessive invocation of tie breaking causes performance to degrade significantly on complex and noisy data. Investigating ways to reduce the number of tie breaks, we propose an adaptive method that overcomes the problem while not significantly affecting performance on simpler datasets.
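For readers unfamiliar with the procedure the abstract refers to, the following is a minimal sketch of the standard Hoeffding-bound split test with a tie-breaking threshold, as commonly described for Hoeffding trees. It is not the adaptive method proposed in the paper. The function names, the default `delta` and `tau` values, and the use of information gain with range log2(number of classes) are assumptions made for illustration.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def split_decision(best_gain: float, second_gain: float, n: int,
                   num_classes: int = 2, delta: float = 1e-7,
                   tau: float = 0.05) -> str:
    """Decide what a leaf should do after seeing n examples.

    delta and tau are illustrative defaults (assumptions), not values
    taken from the abstract above.
    """
    # For information gain, the range R is log2(number of classes).
    eps = hoeffding_bound(math.log2(num_classes), delta, n)
    if best_gain - second_gain > eps:
        return "split"            # best attribute is a confident winner
    if eps < tau:
        return "tie-break split"  # candidates too close to separate; split anyway
    return "wait"                 # keep collecting examples


print(split_decision(0.30, 0.10, n=1000))  # clear winner
print(split_decision(0.30, 0.29, n=1000))  # near tie, bound still wide
print(split_decision(0.30, 0.29, n=5000))  # near tie, bound below tau
```

In this sketch a tie break fires whenever the bound shrinks below `tau` before one attribute clearly wins, which is the situation the abstract reports as occurring more often than expected.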

Application of Advanced Early Warning Systems with Adaptive Protection

Author: Ashrafi Frank
Babski-Reeves Kari
Blumstein Carl
Centeno Virgilio
Cibulka Lloyd
King Roger
Madani Vahid
Thorp James
Publication venue: eScholarship, University of California
Publication date: 01/01/2015

This project developed and field-tested two methods of Adaptive Protection systems utilizing synchrophasor data. One method detects conditions of system stress that can lead to unintended relay operation, and initiates a supervisory signal to modify relay response in real time to avoid false trips. The second method detects the possibility of false trips of impedance relays as stable system swings “encroach” on the relays’ impedance zones, and produces an early warning so that relay engineers can re-evaluate relay settings. In addition, real-time synchrophasor data produced by this project was used to develop advanced visualization techniques for display of synchrophasor data to utility operators and engineers.

eScholarship - University of California

Activity recognition from smartphone sensing data

Author: Lopes Alexandre de Oliveira
Publication date: 01/01/2012

Integrated master's thesis. Informatics and Computing Engineering. Faculty of Engineering (Faculdade de Engenharia). Universidade do Porto. 201

Reservoir of Diverse Adaptive Learners and Stacking Fast Hoeffding Drift Detection Methods for Evolving Data Streams

Author: Paquet Eric
Pesaranghader Ali
Viktor Herna
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 07/09/2017

The last decade has seen a surge of interest in adaptive learning algorithms for data stream classification, with applications ranging from predicting ozone level peaks and learning stock market indicators to detecting computer security violations. In addition, a number of methods have been developed to detect concept drifts in these streams. Consider a scenario where we have a number of classifiers with diverse learning styles and different drift detectors. Intuitively, the current 'best' (classifier, detector) pair is application dependent and may change as a result of the stream evolution. Our research builds on this observation. We introduce the Tornado framework that implements a reservoir of diverse classifiers, together with a variety of drift detection algorithms. In our framework, all (classifier, detector) pairs proceed, in parallel, to construct models against the evolving data streams. At any point in time, we select the pair which currently yields the best performance. We further incorporate two novel stacking-based drift detection methods, namely the FHDDMS and FHDDMS_add approaches. The experimental evaluation confirms that the current 'best' (classifier, detector) pair is not only heavily dependent on the characteristics of the stream, but also that this selection evolves as the stream flows. Further, our FHDDMS variants detect concept drifts accurately in a timely fashion while outperforming the state-of-the-art. Comment: 42 pages and 14 figures

arXiv.org e-Print Archive

Scalable real-time classification of data streams with concept drift

Author: Frederic Stahl
João Bártolo Gomes
Mark Tennant
Omer Rana
Publication venue: 'Elsevier BV'
Publication date: 01/10/2017

Inducing adaptive predictive models in real-time from high throughput data streams is one of the most challenging areas of Big Data Analytics. The fact that data streams may contain concept drifts (changes of the pattern encoded in the stream over time) and are unbounded imposes unique challenges in comparison with predictive data mining from batch data. Several real-time predictive data stream algorithms exist; however, most approaches are not naturally parallel and are thus limited in their scalability. This paper highlights the Micro-Cluster Nearest Neighbour (MC-NN) data stream classifier. MC-NN is based on statistical summaries of the data stream and a nearest neighbour approach, which makes MC-NN naturally parallel. In its serial version, MC-NN is able to handle data streams: the data does not need to reside in memory and is processed incrementally. MC-NN is also able to adapt to concept drifts. This paper provides an empirical study on the serial algorithm’s speed, adaptivity and accuracy. Furthermore, this paper discusses the new parallel implementation of MC-NN and its parallel properties, and provides an empirical scalability study.

Degree-based goodness-of-fit tests for heterogeneous random graph models : independent and exchangeable cases

Author: Latouche Pierre
Ouadah Sarah
Robin Stéphane
Publication date: 29/07/2019

The degrees are a classical and relevant way to study the topology of a network. They can be used to assess the goodness-of-fit for a given random graph model. In this paper we introduce goodness-of-fit tests for two classes of models. First, we consider the case of independent graph models such as the heterogeneous Erdős-Rényi model in which the edges have different connection probabilities. Second, we consider a generic model for exchangeable random graphs called the W-graph. The stochastic block model and the expected degree distribution model fall within this framework. We prove the asymptotic normality of the degree mean square under these independent and exchangeable models and derive formal tests. We study the power of the proposed tests and we prove the asymptotic normality under specific sparsity regimes. The tests are illustrated on real networks from social sciences and ecology, and their performances are assessed via a simulation study.

arXiv.org e-Print Archive

Hal-Diderot

Symmetric Rank Covariances: a Generalised Framework for Nonparametric Measures of Dependence

Author: Drton Mathias
Meinshausen Nicolai
Weihs Luca
Publication date: 18/08/2017

The need to test whether two random vectors are independent has spawned a large number of competing measures of dependence. We are interested in nonparametric measures that are invariant under strictly increasing transformations, such as Kendall's tau, Hoeffding's D, and the more recently discovered Bergsma-Dassios sign covariance. Each of these measures exhibits symmetries that are not readily apparent from their definitions. Making these symmetries explicit, we define a new class of multivariate nonparametric measures of dependence that we refer to as Symmetric Rank Covariances. This new class generalises all of the above measures and leads naturally to multivariate extensions of the Bergsma-Dassios sign covariance. Symmetric Rank Covariances may be estimated unbiasedly using U-statistics for which we prove results on computational efficiency and large-sample behavior. The algorithms we develop for their computation include, to the best of our knowledge, the first efficient algorithms for the well-known Hoeffding's D statistic in the multivariate setting.
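As a small illustration of the invariance property mentioned in this abstract, the sketch below computes Kendall's tau (the tau-a variant, assuming no ties) with a naive pairwise loop and checks that it is unchanged when each variable is passed through a strictly increasing transformation. The function name and the sample data are hypothetical, and this O(n²) loop is not one of the efficient algorithms the paper develops.

```python
import math
from itertools import combinations


def kendall_tau(xs, ys):
    """Naive O(n^2) Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical sample: 8 concordant and 2 discordant pairs out of 10.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

tau_raw = kendall_tau(xs, ys)

# Strictly increasing transformations preserve the ordering of each
# variable, so every pair keeps its concordant/discordant status.
tau_transformed = kendall_tau([math.exp(x) for x in xs], [y ** 3 for y in ys])

print(tau_raw, tau_transformed)
```

Because the statistic depends only on the relative order within each variable, any rank-preserving rescaling leaves it unchanged.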

arXiv.org e-Print Archive

Repository for Publications and Research Data

Hoeffding Tree Algorithms for Anomaly Detection in Streaming Datasets: A Survey

Author: Biswal Biswajit
Muallem Asmah
Pan Jan W.
Shetty Sachin
Zhao Juan
Publication venue: ODU Digital Commons
Publication date: 01/01/2017

This survey aims to deliver an extensive and well-constructed overview of using machine learning for the problem of detecting anomalies in streaming datasets. The objective is to assess the effectiveness of Hoeffding Trees as a machine learning solution for detecting anomalies in streaming cyber datasets. In this survey we group the existing research on Hoeffding Trees that is relevant to this type of study into three categories: distributed Hoeffding Trees, ensembles of Hoeffding Trees, and existing techniques that use Hoeffding Trees for anomaly detection. These categories are referred to as compositions within this paper and were selected based on their relation to streaming data and the flexibility of their techniques for use within different domains of streaming data. We discuss how the techniques of the research works within these compositions can be combined to address the anomaly detection problem in streaming cyber datasets. The goal is to show how a combination of techniques from different compositions can solve a prominent problem, anomaly detection.