
    Stochastic Streams: Sample Complexity vs. Space Complexity

    We address the trade-off between the computational resources needed to process a large data set and the number of samples available from the data set. Specifically, we consider the following abstraction: we receive a potentially infinite stream of IID samples from some unknown distribution D and are tasked with computing some function f(D). If the stream is observed for time t, how much memory s is required to estimate f(D)? We refer to t as the sample complexity and s as the space complexity. The main focus of this paper is investigating the trade-off between space and sample complexity. We study this trade-off for several canonical problems from the data stream model: estimating the collision probability (i.e., the second moment of a distribution), deciding if a graph is connected, and approximating the dimension of an unknown subspace. Our results are based on techniques for simulating classical sampling procedures in this model, emulating random walks given a sequence of IID samples, and leveraging a characterization that relates communication-bounded protocols to statistical query algorithms.
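    As a concrete illustration of the first problem above, the sketch below (my own, not from the paper; the sampler and the toy distribution are invented for the example) estimates the collision probability of an unknown distribution from an IID stream using constant memory: it consumes the stream in disjoint pairs and averages the collision indicators, trading a large number of samples for essentially no space.

        import random

        def estimate_collision_probability(stream, num_pairs):
            # Estimate the collision probability sum_i p_i^2 of an unknown
            # distribution using O(1) memory: consume the IID stream in
            # disjoint pairs and average the collision indicators.
            collisions = 0
            for _ in range(num_pairs):
                x = next(stream)
                y = next(stream)
                collisions += (x == y)
            return collisions / num_pairs

        def sampler(probs):
            # Infinite IID stream from a toy discrete distribution.
            symbols, weights = list(probs), list(probs.values())
            while True:
                yield random.choices(symbols, weights)[0]

        probs = {"a": 0.5, "b": 0.3, "c": 0.1, "d": 0.1}
        true_value = sum(p * p for p in probs.values())              # 0.36
        estimate = estimate_collision_probability(sampler(probs), 200_000)
        print(f"true={true_value:.3f}  estimate={estimate:.3f}")

    Each pair gives an unbiased Bernoulli estimate of the second moment, so accuracy improves with the number of pairs while the memory footprint stays constant.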

    Boosting decision stumps for dynamic feature selection on data streams

    Feature selection identifies which features of a dataset are relevant to the learning task. It is widely used to improve computation times, reduce computational requirements, decrease the impact of the curse of dimensionality, and enhance the generalization of classifiers. In data streams, classifiers benefit from all of the above, but more importantly from the fact that the relevant subset of features may drift over time. In this paper, we propose a novel dynamic feature selection method for data streams called Adaptive Boosting for Feature Selection (ABFS). ABFS chains decision stumps and drift detectors and, as a result, identifies with reasonable success which features are relevant to the learning task as the stream progresses. In addition to our proposed algorithm, we bring feature selection-specific metrics from batch learning to streaming scenarios and evaluate ABFS against these metrics in both synthetic and real-world scenarios. The results show that ABFS improves the classification rates of different types of learners and improves computational resource usage.
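    To make stream-time feature selection concrete, the sketch below is a simplified illustration of my own, not the ABFS algorithm; names such as OnlineStump and all thresholds are invented. It keeps one online decision stump per feature, flags features whose stumps beat the majority-class baseline, and crudely resets a stump whose accuracy collapses, standing in for a proper drift detector such as ADWIN.

        import numpy as np

        class OnlineStump:
            # One decision stump per feature, maintained online: the split point
            # is the running mean of the feature, and class counts are kept on
            # each side of the split. Prequential accuracy is tracked.
            def __init__(self):
                self.n, self.mean = 0, 0.0
                self.counts = np.zeros((2, 2))      # [side of split, class]
                self.seen, self.correct = 0, 0

            def predict(self, x):
                side = int(x > self.mean)
                return int(self.counts[side, 1] >= self.counts[side, 0])

            def update(self, x, y):
                self.correct += int(self.predict(x) == y)   # test-then-train
                self.seen += 1
                self.n += 1
                self.mean += (x - self.mean) / self.n       # running split point
                self.counts[int(x > self.mean), y] += 1

            def accuracy(self):
                return self.correct / max(self.seen, 1)

            def reset(self):                                # crude drift reaction
                self.__init__()

        def select_features(stream, n_features, min_gain=0.05, check_every=1000):
            # Flag a feature as relevant when its stump beats the majority-class
            # baseline by `min_gain`; periodically reset stumps whose accuracy
            # collapsed, standing in for a real drift detector.
            stumps = [OnlineStump() for _ in range(n_features)]
            class_counts = np.zeros(2)
            for t, (x, y) in enumerate(stream, 1):
                class_counts[y] += 1
                for j, stump in enumerate(stumps):
                    stump.update(x[j], y)
                if t % check_every == 0:
                    baseline = class_counts.max() / class_counts.sum()
                    for stump in stumps:
                        if stump.seen > 200 and stump.accuracy() < baseline - 0.1:
                            stump.reset()
            baseline = class_counts.max() / class_counts.sum()
            return [j for j, s in enumerate(stumps) if s.accuracy() > baseline + min_gain]

        # Toy usage: only feature 0 carries signal, so it should be the one selected.
        rng = np.random.default_rng(0)
        def toy_stream(n=20_000):
            for _ in range(n):
                x = rng.normal(size=5)
                yield x, int(x[0] > 0)

        print(select_features(toy_stream(), n_features=5))  # expected: [0]

    ABFS itself chains boosted stumps with dedicated drift detectors rather than this crude reset rule; the sketch only shows the general shape of the problem.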

    Ensemble learning with discrete classifiers on small devices

    Machine learning has become an integral part of everyday life, ranging from AI-powered search queries to (partially) autonomous driving. Many of the advances in machine learning and its applications have been possible due to increases in computation power, i.e., by reducing manufacturing sizes while maintaining or even increasing energy consumption. However, 2-3 nm manufacturing is within reach, making further miniaturization increasingly difficult, while thermal design power limits are simultaneously being reached, rendering entire parts of the chip useless for certain computational loads. In this thesis, we investigate discrete classifier ensembles as a resource-efficient alternative that can be deployed to small devices requiring only small amounts of energy. Discrete classifiers are classifiers that can be applied -- and oftentimes also trained -- without the need for costly floating-point operations. Hence, they are ideally suited for deployment to small devices with limited resources. The disadvantage of discrete classifiers is that their predictive performance often lags behind that of their floating-point siblings. Here, combining multiple discrete classifiers into an ensemble can help to improve predictive performance while keeping resource consumption manageable. This thesis studies discrete classifier ensembles from a theoretical, an algorithmic, and a practical point of view. In the theoretical investigation, the bias-variance decomposition and the double-descent phenomenon are examined. The bias-variance decomposition of the mean squared error is revisited and generalized to an arbitrary twice-differentiable loss function, which serves as a guiding tool throughout the thesis. Similarly, the double-descent phenomenon is -- for the first time -- studied comprehensively in the context of tree ensembles and specifically random forests. Contrary to established literature, the experiments in this thesis indicate that there is no double descent in random forests. While the training of ensembles is well studied in the literature, deployment to small devices is often neglected, and the training of ensembles on small devices has received little attention so far. Hence, the algorithmic part of this thesis focuses on the deployment of discrete classifiers and the training of ensembles on small devices. First, a novel combination of ensemble pruning (i.e., removing classifiers from the ensemble) and ensemble refinement (i.e., re-training classifiers in the ensemble) is presented, which uses a novel proximal gradient descent algorithm to minimize a combined loss function. The resulting algorithm removes unnecessary classifiers from an already trained ensemble while at the same time improving the performance of the remaining classifiers. Second, this algorithm is extended to the more challenging setting of online learning, in which the algorithm receives training examples one by one. The resulting shrub ensembles algorithm allows the training of ensembles in an online fashion while maintaining a strictly bounded memory consumption. It outperforms existing state-of-the-art algorithms under resource constraints and offers competitive performance in the general case. Last, this thesis studies the deployment of decision tree ensembles to small devices by optimizing their memory layout. The key insight here is that decision trees have a probabilistic inference time, because different observations can take different paths from the root to a leaf. By estimating the probability of visiting a particular node in the tree, one can place that node favorably in memory to improve caching behavior and thus increase inference performance without changing the model. Finally, several real-world applications of tree ensembles and Binarized Neural Networks are presented.
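    To make the memory-layout idea concrete, here is a small sketch of my own (not the thesis' implementation; the Node structure and function names are invented): estimate per-node visit counts from sample inputs, then flatten the tree with a depth-first traversal that always follows the more frequently visited child first, so that nodes on hot root-to-leaf paths end up adjacent in memory.

        from dataclasses import dataclass
        from typing import List, Optional

        @dataclass
        class Node:
            feature: int = -1                 # -1 marks a leaf
            threshold: float = 0.0
            left: Optional["Node"] = None
            right: Optional["Node"] = None
            prediction: int = 0
            visits: int = 0                   # filled in by estimate_visits

        def estimate_visits(root, X):
            # Count how often each node is visited when classifying the sample X.
            for x in X:
                node = root
                while node is not None:
                    node.visits += 1
                    if node.feature < 0:      # reached a leaf
                        break
                    node = node.left if x[node.feature] <= node.threshold else node.right

        def hot_path_layout(root):
            # Flatten the tree so that, at every split, the more frequently visited
            # child comes directly after its parent (greedy depth-first order).
            # Nodes on probable root-to-leaf paths become contiguous in memory,
            # which tends to improve cache behaviour without changing the model.
            layout: List[Node] = []
            def dfs(node):
                if node is None:
                    return
                layout.append(node)
                children = [c for c in (node.left, node.right) if c is not None]
                children.sort(key=lambda c: c.visits, reverse=True)
                for child in children:
                    dfs(child)
            dfs(root)
            return layout                     # list index = position in the flat array

        # Example: a one-split tree where the right leaf is visited more often.
        left_leaf, right_leaf = Node(prediction=0), Node(prediction=1)
        root = Node(feature=0, threshold=0.5, left=left_leaf, right=right_leaf)
        estimate_visits(root, [[0.2], [0.9], [0.95]])
        print([n.visits for n in hot_path_layout(root)])      # [3, 2, 1]

    A real deployment would then rewrite the child links as offsets into this flat array before emitting the model for the target device.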

    New Directions in Online Learning: Boosting, Partial Information, and Non-Stationarity

    Online learning, where a learning algorithm fits a model on-the-fly with streaming data, has become an important research area in machine learning. Batch learning, where the entire data set has to be available to the learning algorithm, is not always a suitable paradigm for the big data era. It is increasingly common in many practical situations, such as online ads prediction or control of self-driving cars, that data instances naturally arrive in a sequential manner. In these situations, researchers want to update their model in an online fashion. This dissertation pursues several topics at the frontier of online learning research. In Chapter 2 and Chapter 3, the journey starts with online boosting. Online boosting studies how to combine multiple online weak learners to get a stronger learner. Chapter 2 considers online multi-class classification problems. Chapter 3 focuses on the more challenging multi-label ranking problem where there are multiple correct labels and the learner outputs a ranking of labels based on their relevance. In both chapters, an optimal algorithm and an adaptive algorithm are proposed. The optimal algorithms require a minimal number of weak learners to attain the desired accuracy. The adaptive algorithms are practically more useful since they do not require a priori knowledge about the strength of weak learners and are more computationally efficient. The adaptive algorithms are not statistically optimal but they still come with reasonable performance guarantees. The empirical results on real data sets support the theoretical findings, and the proposed boosting algorithms outperform existing competitors on benchmark data sets. Chapter 4 considers the partial information setting, where the learner does not receive the true labels. Partial feedback is common in practice as obtaining complete feedback can be costly. The chapter revisits the boosting algorithms that are presented in Chapter 2 and Chapter 3 and extends them to work with partial information feedback. Despite the learner receiving much less information, comparable performance guarantees can be made. Later, in Chapter 5 and Chapter 6, we move on to another interesting area in online learning called restless bandit problems. Unlike the classical (stochastic) multi-armed bandit problems where the reward distributions are unknown but stationary, in restless bandit problems the distributions can change over time. This extra layer of complexity allows us to study more complicated models, but the analysis becomes even more difficult. In restless bandit problems, it is assumed that each arm has a state that evolves according to an unknown Markov process, and the reward distribution depends on the arm's current state. This setting can be thought of as a sub-class of reinforcement learning, and the partial observability inherent in this problem makes the analysis very challenging. The well-known Thompson Sampling algorithm is analyzed and a Bayesian regret bound for it is derived. Chapter 5 considers the episodic case where the system periodically resets. Chapter 6 extends the analysis to the more challenging non-episodic (i.e., infinite time horizon) case. In both settings, Thompson Sampling algorithms (with slight modifications) enjoy sub-linear regret bounds, and the empirical results on simulated data support this fact. The experiments also suggest the possibility that the algorithm can be used in the frequentist setting even though the theoretical bounds are only shown for the Bayesian regret.
    PhD, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/155110/1/yhjung_1.pd
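    For readers unfamiliar with the algorithm analyzed in Chapters 5 and 6, the sketch below shows Thompson Sampling in its simplest stationary Bernoulli-bandit form (my own illustration; the restless setting in the dissertation additionally has hidden, Markov-evolving arm states): sample one draw from each arm's posterior, play the arm whose draw is largest, and update that arm's posterior with the observed reward.

        import numpy as np

        def thompson_sampling(true_means, horizon=10_000, seed=0):
            # Thompson Sampling for a stationary Bernoulli bandit with Beta(1, 1)
            # priors: draw one sample per arm from its posterior, play the arm
            # whose draw is largest, and update that arm's posterior.
            rng = np.random.default_rng(seed)
            k = len(true_means)
            successes = np.ones(k)            # Beta alpha parameters
            failures = np.ones(k)             # Beta beta parameters
            best = max(true_means)
            regret = 0.0
            for _ in range(horizon):
                theta = rng.beta(successes, failures)    # one posterior draw per arm
                arm = int(np.argmax(theta))
                reward = float(rng.random() < true_means[arm])
                successes[arm] += reward
                failures[arm] += 1.0 - reward
                regret += best - true_means[arm]
            return regret

        print(thompson_sampling([0.30, 0.50, 0.55]))      # cumulative pseudo-regret

    The restless analysis in the dissertation keeps the same sample-then-act principle but must additionally reason about the hidden arm states, which is what makes its regret analysis substantially harder.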