
    Batch-Incremental Learning for Mining Data Streams

    The data stream model for data mining places harsh restrictions on a learning algorithm. First, a model must be induced incrementally. Second, processing time for instances must keep up with their speed of arrival. Third, a model may only use a constant amount of memory, and must be ready for prediction at any point in time. We attempt to overcome these restrictions by presenting a data stream classification algorithm where the data is split into a stream of disjoint batches. Single batches of data can be processed one after the other by any standard non-incremental learning algorithm. Our approach uses ensembles of decision trees. These tree ensembles are iteratively merged into a single interpretable model of constant maximal size. Using benchmark datasets, the algorithm is evaluated for accuracy against state-of-the-art algorithms that make use of the entire dataset.
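    As a rough illustration of the batch-incremental idea (not the paper's tree-merging procedure), the sketch below buffers the stream into disjoint batches, fits a standard non-incremental decision tree on each completed batch, and keeps a size-bounded ensemble that votes at prediction time. The class name, batch size, depth limit, and majority vote are illustrative assumptions.

```python
# Minimal sketch of batch-incremental classification: each disjoint batch is
# fitted by a standard batch learner (a scikit-learn decision tree), and a
# bounded ensemble of such trees votes on predictions.
from collections import deque
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class BatchIncrementalClassifier:
    def __init__(self, batch_size=500, max_trees=10):
        self.batch_size = batch_size          # size of each disjoint batch
        self.trees = deque(maxlen=max_trees)  # constant memory: oldest tree is dropped
        self._X, self._y = [], []             # buffer for the current batch

    def partial_fit(self, x, y):
        """Buffer one instance; train a new tree whenever a batch completes."""
        self._X.append(x)
        self._y.append(y)
        if len(self._X) >= self.batch_size:
            tree = DecisionTreeClassifier(max_depth=8)
            tree.fit(np.asarray(self._X), np.asarray(self._y))
            self.trees.append(tree)
            self._X, self._y = [], []

    def predict(self, X):
        """Majority vote over the current ensemble; assumes non-negative integer labels."""
        if not self.trees:
            return np.zeros(len(X), dtype=int)
        votes = np.stack([t.predict(X) for t in self.trees])
        return np.apply_along_axis(
            lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)
```

    Dropping the oldest tree keeps memory constant; the paper instead merges the ensemble into a single interpretable model, which is not reproduced here.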

    Private Incremental Regression

    Data is continuously generated by modern data sources, and a recent challenge in machine learning has been to develop techniques that perform well in an incremental (streaming) setting. In this paper, we investigate the problem of private machine learning, where, as is common in practice, the data is not given at once, but rather arrives incrementally over time. We introduce the problems of private incremental ERM and private incremental regression, where the general goal is to always maintain a good empirical risk minimizer for the history observed under differential privacy. Our first contribution is a generic transformation of private batch ERM mechanisms into private incremental ERM mechanisms, based on a simple idea of invoking the private batch ERM procedure at some regular time intervals. We take this construction as a baseline for comparison. We then provide two mechanisms for the private incremental regression problem. Our first mechanism is based on privately constructing a noisy incremental gradient function, which is then used in a modified projected gradient procedure at every timestep. This mechanism has an excess empirical risk of $\approx\sqrt{d}$, where $d$ is the dimensionality of the data. While the results of [Bassily et al. 2014] show this bound is tight in the worst case, we show that certain geometric properties of the input and constraint set can be used to derive significantly better results for certain interesting regression problems. Comment: To appear in PODS 201
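    For intuition only, here is a minimal sketch in the flavour of the noisy-projected-gradient mechanism: each arriving point contributes a squared-loss gradient that is perturbed with Gaussian noise before a projected step onto an l2 ball. The noise scale, step size, and radius are placeholder values, and the careful aggregation the paper uses to construct the private incremental gradient function is not reproduced.

```python
# Sketch of a noisy projected-gradient update for streaming least-squares regression.
import numpy as np

def project_l2_ball(theta, radius):
    """Project theta back onto the l2 ball of the given radius (the constraint set)."""
    norm = np.linalg.norm(theta)
    return theta if norm <= radius else theta * (radius / norm)

def private_incremental_regression(stream, d, radius=1.0, lr=0.05,
                                   noise_scale=0.5, seed=0):
    """stream yields (x, y) pairs with x of dimension d; yields an estimate per step."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(d)
    for x, y in stream:
        grad = (x @ theta - y) * x                                  # squared-loss gradient
        noisy_grad = grad + rng.normal(scale=noise_scale, size=d)   # placeholder noise
        theta = project_l2_ball(theta - lr * noisy_grad, radius)    # projected step
        yield theta.copy()
```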

    A Note on Batch and Incremental Learnability

    According to Gold's criterion of identification in the limit, a learner, presented with data about a concept, is allowed to make a finite number of incorrect hypotheses before converging to a correct hypothesis. If, on the other hand, the learner is allowed to make only one conjecture, which has to be correct, the resulting criterion of success is known as finite identification. Identification in the limit may be viewed as an idealized model for incremental learning, whereas finite identification may be viewed as an idealized model for batch learning. The present paper establishes the surprising fact that the collections of recursively enumerable languages that can be finitely identified (batch learned in the ideal case) from both positive and negative data can also be identified in the limit (incrementally learned in the ideal case) from only positive data. It is often difficult to extract insights about practical learning systems from abstract theorems in inductive inference. However, this result may be seen as carrying a moral for the design of learning systems, as it yields, in the ideal case of no inaccuracies, an algorithm for converting batch systems that learn from both positive and negative data into incremental systems that learn from only positive data without any loss in learning power. This is achieved by the incremental system simulating the batch system in incremental fashion and using the heuristic of a “localized closed-world assumption” to generate negative data.
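    A toy sketch of that conversion, under simplifying assumptions (the domain is the natural numbers, and batch_learner is a placeholder callable): an incremental learner that sees only positive examples simulates a batch learner expecting both positive and negative data by treating every element of a local window of the domain that has not appeared as positive as negative.

```python
def incremental_from_batch(batch_learner):
    """Wrap a batch learner (expecting positive and negative data) as an
    incremental learner that is fed one positive example at a time."""
    positives = set()

    def observe(x):
        positives.add(x)
        bound = max(positives)                         # localize to the data seen so far
        negatives = set(range(bound + 1)) - positives  # closed-world within the window
        return batch_learner(frozenset(positives), frozenset(negatives))

    return observe

# Illustrative use with a trivial batch learner that memorizes the positive set:
learn = incremental_from_batch(lambda pos, neg: set(pos))
for example in (3, 1, 7):
    hypothesis = learn(example)
```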

    Incremental Learning of Nonparametric Bayesian Mixture Models

    Clustering is a fundamental task in many vision applications. To date, most clustering algorithms work in a batch setting and training examples must be gathered in a large group before learning can begin. Here we explore incremental clustering, in which data can arrive continuously. We present a novel incremental model-based clustering algorithm based on nonparametric Bayesian methods, which we call Memory Bounded Variational Dirichlet Process (MB-VDP). The number of clusters is determined flexibly by the data, and the approach can be used to automatically discover object categories. The computational requirements for producing model updates are bounded and do not grow with the amount of data processed. The technique is well suited to very large datasets, and we show that our approach outperforms existing online alternatives for learning nonparametric Bayesian mixture models.
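    The sketch below is a rough, memory-bounded illustration in the spirit of incremental Dirichlet-process mixture learning, not the MB-VDP algorithm itself: each chunk is fitted together with a small set of summary points carried over from earlier chunks, so memory stays bounded while a variational Dirichlet process chooses the number of active clusters. The function name, chunk handling, and compression rule are assumptions.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def incremental_dp_clustering(chunks, max_components=20, keep_per_chunk=20,
                              weight_floor=1e-3):
    """chunks: iterable of (n_i, d) arrays, each with at least max_components rows."""
    summaries = None          # compressed memory of earlier chunks (cluster means)
    model = None
    for chunk in chunks:
        data = chunk if summaries is None else np.vstack([summaries, chunk])
        model = BayesianGaussianMixture(
            n_components=max_components,
            weight_concentration_prior_type="dirichlet_process",
        ).fit(data)
        # Compress the past: keep only the means of clusters with noticeable weight,
        # so memory stays bounded regardless of how much data has been processed.
        active = model.weights_ > weight_floor
        summaries = model.means_[active][:keep_per_chunk]
    return model
```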

    Batch and incremental learning of decision trees
