
    Time-Varying Gaussian Process Bandit Optimization

    We consider the sequential Bayesian optimization problem with bandit feedback, adopting a formulation that allows the reward function to vary with time. We model the reward function using a Gaussian process whose evolution obeys a simple Markov model. We introduce two natural extensions of the classical Gaussian process upper confidence bound (GP-UCB) algorithm. The first, R-GP-UCB, resets GP-UCB at regular intervals. The second, TV-GP-UCB, instead forgets about old data in a smooth fashion. Our main contribution comprises novel regret bounds for these algorithms, providing an explicit characterization of the trade-off between the time horizon and the rate at which the function varies. We illustrate the performance of the algorithms on both synthetic and real data, and we find the gradual forgetting of TV-GP-UCB to perform favorably compared to the sharp resetting of R-GP-UCB. Moreover, both algorithms significantly outperform classical GP-UCB, which treats stale and fresh data equally. Comment: To appear in AISTATS 201
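The contrast between sharp resetting and smooth forgetting can be sketched with a toy time-decayed kernel: scaling a spatial kernel by (1 - eps)^(|t - t'|/2) makes temporally distant observations nearly uncorrelated, so the GP posterior gradually discounts stale data. The function names, the plain RBF spatial kernel, and the hyperparameters below are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def time_decayed_kernel(X, T, eps=0.01, lengthscale=0.2):
    """RBF kernel over inputs X, scaled by (1 - eps)^(|t - t'| / 2) over
    observation times T, so old data smoothly loses influence."""
    sq = (X[:, None] - X[None, :]) ** 2
    k_space = np.exp(-0.5 * sq / lengthscale ** 2)
    k_time = (1.0 - eps) ** (np.abs(T[:, None] - T[None, :]) / 2.0)
    return k_space * k_time

def ucb_choice(X_obs, T_obs, y_obs, candidates, t_now,
               beta=2.0, noise=1e-2, eps=0.01):
    """Pick the candidate maximizing mean + beta * std under the
    time-decayed GP posterior (an UCB acquisition rule)."""
    X = np.r_[X_obs, candidates]
    T = np.r_[T_obs, np.full(len(candidates), t_now)]
    K = time_decayed_kernel(X, T, eps)
    n = len(X_obs)
    K_oo = K[:n, :n] + noise * np.eye(n)   # observed-observed block
    K_co = K[n:, :n]                       # candidate-observed block
    mu = K_co @ np.linalg.solve(K_oo, y_obs)
    var = K[n:, n:].diagonal() - np.einsum(
        'ij,ji->i', K_co, np.linalg.solve(K_oo, K_co.T))
    var = np.clip(var, 0.0, None)
    return candidates[np.argmax(mu + beta * np.sqrt(var))]
```

R-GP-UCB would instead keep the stationary kernel and simply discard all past observations at fixed intervals.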

    What Are Investors Afraid of? Finding the Big Bad Wolf

    The aim of financial institutions and regulators is to find an effective way to measure the risk profile of different segments of investors. Both economists and psychologists have developed several methodologies to elicit and assess individual risk attitude, but these are not perfect and show several drawbacks when used in practice. Thanks to a unique database of around 15,000 investors, this paper combines survey-based evidence with revealed preferences based upon observed asset allocation. This paper confirms some results known in the literature, such as gender and age differences in risk-taking. Moreover, the behavioral clustering approach used for the analysis is useful in an inferential framework. The segments built from the questionnaire make it possible to "forecast" the individual risk attitude described by individual choices in terms of asset allocation. Loss aversion per se is a relevant variable in explaining financial risk-taking. Barbara Alemanni; Pierpaolo Uberti

    Black-Box Parallelization for Machine Learning

    The landscape of machine learning applications is changing rapidly: large centralized datasets are replaced by high volume, high velocity data streams generated by a vast number of geographically distributed, loosely connected devices, such as mobile phones, smart sensors, autonomous vehicles or industrial machines. Current learning approaches centralize the data and process it in parallel in a cluster or computing center. This has three major disadvantages: (i) it does not scale well with the number of data-generating devices, since their growth exceeds that of computing centers, (ii) the communication costs for centralizing the data are prohibitive in many applications, and (iii) it requires sharing potentially privacy-sensitive data. Pushing computation towards the data-generating devices alleviates these problems and makes it possible to employ their otherwise unused computing power. However, current parallel learning approaches are designed for tightly integrated systems with low latency and high bandwidth, not for loosely connected distributed devices. Therefore, I propose a new paradigm for parallelization that treats the learning algorithm as a black box, training local models on distributed devices and aggregating them into a single strong one. Since this requires only exchanging models instead of actual data, the approach is highly scalable, communication-efficient, and privacy-preserving. Following this paradigm, this thesis develops black-box parallelizations for two broad classes of learning algorithms. One approach can be applied to incremental learning algorithms, i.e., those that improve a model in iterations. Based on the utility of aggregations, it schedules communication dynamically, adapting it to the hardness of the learning problem. In practice, this leads to a reduction in communication by orders of magnitude. It is analyzed for (i) online learning, in particular in the context of in-stream learning, where optimal regret can be guaranteed, and (ii) batch learning based on empirical risk minimization, where optimal convergence can be guaranteed. The other approach is applicable to non-incremental algorithms as well. It uses a novel aggregation method based on the Radon point that achieves provably high model quality with only a single aggregation. This is achieved in polylogarithmic runtime on quasi-polynomially many processors. This relates parallel machine learning to Nick's class of parallel decision problems and is a step towards answering a fundamental open problem about the abilities and limitations of efficient parallel learning algorithms. An empirical study on real distributed systems confirms the potential of the approaches in realistic application scenarios.
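The exchange-models-not-data idea can be illustrated with a minimal sketch: each simulated device fits its own linear model on local data, and only the fitted parameter vectors are shared and combined. Plain averaging stands in for the aggregation step here (the thesis also studies a Radon-point aggregator); the gradient-descent learner and all names are hypothetical.

```python
import numpy as np

def train_local(X, y, epochs=50, lr=0.1):
    """Stand-in for the black-box learner: full-batch gradient descent
    for a linear model, run independently on one device's local data."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def aggregate(models):
    """Simplest aggregation: average the local parameter vectors into
    one global model. Only models cross the network, never raw data."""
    return np.mean(models, axis=0)

# Five simulated devices, each with its own private sample.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
local_models = []
for _ in range(5):
    X = rng.normal(size=(100, 2))
    y = X @ w_true + 0.1 * rng.normal(size=100)
    local_models.append(train_local(X, y))
w_global = aggregate(local_models)
```

The dynamic-communication variant described above would additionally decide, per round, whether an aggregation is worth its communication cost.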

    Regret Guarantees for Online Receding Horizon Learning Control

    We address the problem of controlling an unknown linear dynamical system with general cost functions and affine constraints on the control input through online learning. Our goal is to develop an algorithm that minimizes the regret, defined as the difference between the cumulative cost incurred by the algorithm and that of a receding horizon controller (RHC) that has full knowledge of the system and state and satisfies the control input constraints. Such a performance metric is harder than minimizing the regret w.r.t. the best linear feedback controller commonly adopted in the literature, because the linear controllers might be sub-optimal or violate the constraints throughout. By exploring the conditions under which sub-linear regret is guaranteed, we propose an online receding horizon controller that learns the unknown system parameter from the sequential observations, along with the necessary perturbation for exploration. We show that the proposed controller's performance is upper bounded by $\tilde{\mathcal{O}}(T^{3/4})$ for both regret and cumulative constraint violation when the controller has preview of the cost functions for an interval that doubles in size from one interval to the next. We also show that an improved upper bound of $\tilde{\mathcal{O}}(T^{2/3})$ can be achieved for both regret and cumulative constraint violation when the controller has full preview of the cost functions. Comment: arXiv admin note: text overlap with arXiv:2010.1132
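The two quantities being bounded can be made concrete with a small helper: regret accumulates the learner's excess cost over the RHC baseline, and constraint violation accumulates how far each control input strays outside its feasible set. The box constraint |u_t| <= u_max below is a hypothetical stand-in for the paper's general affine constraints.

```python
import numpy as np

def regret_and_violation(costs_alg, costs_rhc, inputs_alg, u_max):
    """Cumulative regret w.r.t. the constraint-satisfying RHC baseline,
    and cumulative constraint violation for the inputs the learner
    actually applied (box constraint |u| <= u_max as an illustration)."""
    costs_alg = np.asarray(costs_alg, dtype=float)
    costs_rhc = np.asarray(costs_rhc, dtype=float)
    inputs_alg = np.asarray(inputs_alg, dtype=float)
    regret = np.cumsum(costs_alg - costs_rhc)
    violation = np.cumsum(np.maximum(np.abs(inputs_alg) - u_max, 0.0))
    return regret, violation
```

The paper's bounds say both of these sequences grow no faster than $\tilde{\mathcal{O}}(T^{3/4})$ (or $\tilde{\mathcal{O}}(T^{2/3})$ with full cost preview).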

    A Reinforcement Learning-Based User-Assisted Caching Strategy for Dynamic Content Library in Small Cell Networks

    This paper studies the problem of joint edge cache placement and content delivery in cache-enabled small cell networks in the presence of spatio-temporal content dynamics unknown a priori. The small base stations (SBSs) satisfy users' content requests either directly from their local caches, or by retrieving from other SBSs' caches or from the content server. In contrast to previous approaches that assume a static content library at the server, this paper considers a more realistic non-stationary content library, where new contents may emerge over time at different locations. To keep track of spatio-temporal content dynamics, we propose that the new contents cached at users can be exploited by the SBSs to timely update their flexible cache memories in addition to their routine off-peak main cache updates from the content server. To take into account the variations in traffic demands as well as the limited caching space at the SBSs, a user-assisted caching strategy is proposed based on reinforcement learning principles to progressively optimize the caching policy with the target of maximizing the weighted network utility in the long run. Simulation results verify the superior performance of the proposed caching strategy against various benchmark designs.
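A minimal, hypothetical stand-in for such a learned caching policy is an epsilon-greedy update: the cache keeps running utility estimates per content and periodically refreshes its limited slots, exploring occasionally so newly emerging contents can be discovered. This is an illustrative sketch of the bandit-style trade-off, not the paper's algorithm; all class and parameter names are invented.

```python
import random
from collections import defaultdict

class EpsilonGreedyCache:
    """Keeps a per-content running utility estimate and refreshes a
    limited cache greedily, exploring with probability eps so new
    contents in a non-stationary library can enter the cache."""
    def __init__(self, capacity, eps=0.1, seed=0):
        self.capacity = capacity
        self.eps = eps
        self.q = defaultdict(float)     # estimated utility per content
        self.counts = defaultdict(int)  # requests seen per content
        self.cache = set()
        self.rng = random.Random(seed)

    def request(self, content):
        """Serve one request; update the running utility estimate.
        Every request signals demand (+1); cache hits add extra reward."""
        hit = content in self.cache
        reward = (1.0 if hit else 0.0) + 1.0
        self.counts[content] += 1
        self.q[content] += (reward - self.q[content]) / self.counts[content]
        return hit

    def update_cache(self, catalog):
        """Routine cache refresh: explore a random subset with prob. eps,
        otherwise keep the highest-utility contents."""
        if self.rng.random() < self.eps:
            k = min(self.capacity, len(catalog))
            self.cache = set(self.rng.sample(catalog, k))
        else:
            ranked = sorted(catalog, key=lambda c: self.q[c], reverse=True)
            self.cache = set(ranked[:self.capacity])
```

The paper's user-assisted twist would additionally let contents observed in users' caches enter `catalog` between off-peak server updates.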