2,148 research outputs found

    Online Network Source Optimization with Graph-Kernel MAB

    Full text link
    We propose Grab-UCB, a graph-kernel multi-armed bandit algorithm that learns online the optimal source placement in large-scale networks, such that the reward obtained from a priori unknown network processes is maximized. The uncertainty calls for online learning, which however suffers from the curse of dimensionality. To achieve sample efficiency, we describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations. This enables a data-efficient learning framework whose learning rate scales with the dimension of the spectral representation model instead of that of the network. Grab-UCB is an online sequential decision strategy that learns the parameters of the spectral representation while optimizing the action strategy. We derive performance guarantees that depend on network parameters, which in turn shape the learning curve of the sequential decision strategy. We also introduce a computationally lighter solver, Grab-arm-Light, an algorithm that walks along the edges of the polytope representing the objective function. Simulation results show that the proposed online learning algorithm outperforms baseline offline methods, which typically separate the learning phase from the testing one. The results confirm the theoretical findings and further highlight the gains of the proposed online learning strategy in terms of cumulative regret, sample efficiency and computational complexity.
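    To make the spectral-bandit idea concrete, here is a minimal LinUCB-style sketch in which candidate source nodes are represented by the first few Laplacian eigenvectors and a ridge estimate of the spectral coefficients drives an upper-confidence choice. The function names, the number of spectral features k, the exploration weight alpha, and the reward callback are illustrative assumptions, not the authors' Grab-UCB or Grab-arm-Light.

```python
# Minimal sketch of a LinUCB-style bandit in a graph spectral domain
# (an illustration of the general idea, not the paper's algorithm).
import numpy as np

def spectral_features(L, k):
    """First k Laplacian eigenvectors as a low-dimensional arm representation."""
    _, U = np.linalg.eigh(L)
    return U[:, :k]                       # shape: (n_nodes, k)

def run_spectral_ucb(L, reward_fn, k=10, horizon=500, alpha=1.0, lam=1.0):
    X = spectral_features(L, k)           # arm features: one row per candidate source node
    A = lam * np.eye(k)                   # regularized Gram matrix
    b = np.zeros(k)
    for t in range(horizon):
        theta = np.linalg.solve(A, b)     # ridge estimate of the spectral coefficients
        A_inv = np.linalg.inv(A)
        # UCB score: estimated reward plus an exploration bonus per node
        ucb = X @ theta + alpha * np.sqrt(np.einsum('ij,jk,ik->i', X, A_inv, X))
        arm = int(np.argmax(ucb))         # pick the most promising source node
        r = reward_fn(arm)                # observe the (noisy) network reward
        A += np.outer(X[arm], X[arm])
        b += r * X[arm]
    return np.linalg.solve(A, b)          # final coefficient estimate
```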

    Toward Data Efficient Online Sequential Learning

    Get PDF
    Can machines optimally take sequential decisions over time? For decades, researchers have been seeking an answer to this question, with the ultimate goal of unlocking the potential of artificial general intelligence (AGI) for a better and more sustainable society. Many sectors would be boosted by machines able to take efficient sequential decisions over time: think of real-world applications such as personalized systems in entertainment (content recommendation) but also in healthcare (personalized therapy), smart cities (traffic control, flooding prevention), robotics (control and planning), etc. However, letting machines take proper decisions in real life is a highly challenging task. This is caused by the uncertainty behind such decisions (uncertainty on the actual reward, on the context, on the environment, etc.). A viable solution is to learn by experience (i.e., by trial and error), letting the machine uncover the uncertainty while taking decisions and refining its strategy accordingly. However, such refinement is usually highly data-hungry (data-inefficient), requiring a large amount of application-specific data and leading to very slow learning processes, hence very slow convergence to optimal strategies (curse of dimensionality). Luckily, data is usually intrinsically structured. Identifying and exploiting such structure substantially improves the data-efficiency of sequential learning algorithms. This is the key hypothesis underpinning the research in this thesis, in which novel structural learning methodologies are proposed for decision-making problems such as Recommendation Systems (RS), Multi-Armed Bandits (MAB) and Reinforcement Learning (RL), with the ultimate goal of making the learning process more data-efficient. Specifically, we tackle this goal from the perspective of modelling the problem structure as graphs, embedding tools from graph signal processing into decision learning theory. As a first step, we study the application of graph-clustering techniques to RS, in which the curse of dimensionality is addressed by grouping data into clusters via graph clustering. Next, we exploit spectral graph structure for MAB problems, which represent online learning problems. A key challenge is to learn the unknown bandit vector sequentially. Exploiting a smoothness prior (i.e., the bandit vector is smooth on a given underlying graph), we study the Laplacian-regularized estimator and provide both empirical evidence and theoretical analysis of the benefits of exploiting the graph structure in MABs. Then, we focus on the theoretical understanding of the Laplacian-regularized estimator. To this end, we derive a theoretical upper bound on the estimation error, which illustrates how the alignment between the data and the graph structure, as well as the graph spectrum, affects the estimation accuracy. We then move to RL problems, focusing on the specific problem of learning a proper representation of the state-action space (the representation learning problem). Motivated by the fact that a good representation should be informative of the value function, we seek a learning algorithm able to preserve continuity between the value function and the representation space. Showing that state values are intrinsically correlated with the state-transition dynamics and the diffusion of the reward on the MDP graph, we build a new loss function based on a newly defined diffusion distance and propose a novel method to learn state representations with this desirable property. In summary, in this thesis we address, both theoretically and empirically, important online sequential learning problems by leveraging the intrinsic data structure, showing the gains of the proposed solutions toward more data-efficient sequential learning strategies.
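    As a concrete reference point for the Laplacian-regularized estimator studied in this thesis, the sketch below solves the standard ridge-type problem min_theta ||X theta - y||^2 + lam * theta^T L theta in closed form. The interface and the small diagonal jitter eps are assumptions for illustration, not the thesis' implementation.

```python
# Minimal sketch of a Laplacian-regularized least-squares estimator
# (a standard closed form, not the thesis' exact estimator or code).
import numpy as np

def laplacian_regularized_estimate(X, y, L, lam=1.0, eps=1e-8):
    """Solve min_theta ||X @ theta - y||**2 + lam * theta @ L @ theta."""
    d = X.shape[1]
    # The penalty theta @ L @ theta is small when theta varies smoothly over
    # the graph, which encodes the smoothness prior on the bandit vector.
    A = X.T @ X + lam * L + eps * np.eye(d)   # eps keeps the system well conditioned
    return np.linalg.solve(A, X.T @ y)
```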

    Online and Statistical Learning in Networks

    Get PDF
    Learning, prediction and identification have been main topics of interest in science and engineering for many years. Common to all these problems is an agent that receives data to perform prediction and identification procedures. The agent might process the data individually, or might interact in a network of agents. The goal of this thesis is to address problems that lie at the interface of statistical processing of data, online learning and network science, with a focus on developing distributed algorithms. These problems have widespread applications in several domains of systems engineering and computer science. Whether individually or in a group, the main task of the agent is to understand how to treat data to infer the unknown parameters of the problem. To this end, the first part of this thesis addresses statistical processing of data. We start with the problem of distributed detection in multi-agent networks. In contrast to the existing literature, which focuses on asymptotic learning, we provide a finite-time analysis using a notion of Kullback-Leibler cost. We derive bounds on the cost in terms of network size, spectral gap and relative entropy of the data distribution. Next, we turn to an inverse-type problem where the network structure is unknown and the outputs of a dynamics (e.g., consensus dynamics) are given. We propose several network reconstruction algorithms that measure the network response to inputs. Our algorithm reconstructs the Boolean structure (i.e., existence and directions of links) of a directed network from a series of dynamical responses. The second part of the thesis centers on online learning, where data is received in a sequential fashion. As an example of collaborative learning, we consider the stochastic multi-armed bandit problem in a multi-player network. Players explore a pool of arms with payoffs generated from player-dependent distributions. Pulling an arm, each player only observes a noisy payoff of the chosen arm. The goal is to maximize a global welfare or to find the best global arm. Hence, players exchange information locally to benefit from side observations. We develop a distributed online algorithm with logarithmic regret with respect to the best global arm, and generalize our results to the case where the availability of arms varies over time. We then return to individual online learning, where one learner plays against an adversary. We develop a fully adaptive algorithm that takes advantage of regularity in the sequence of observations, retains worst-case performance guarantees, and performs well against complex benchmarks. Our method competes with dynamic benchmarks, with a regret guarantee that scales with the regularity of the sequence of cost functions and comparators. Notably, the regret bound adapts to the smaller complexity measure in the problem environment.
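    For the collaborative multi-player bandit setting described above, the sketch below shows one simple way players could mix their local reward estimates with their neighbours through a gossip (consensus) matrix while each runs a UCB rule. The update rule, the gossip matrix W, and the Gaussian rewards are illustrative assumptions, not the thesis' distributed algorithm or its regret analysis.

```python
# Minimal sketch of UCB players sharing estimates via gossip averaging
# (an illustration of the general setting, not the thesis' method).
import numpy as np

def gossip_ucb(means, W, horizon=1000, rng=None):
    """means: (n_players, n_arms) true arm means; W: row-stochastic gossip matrix."""
    rng = rng or np.random.default_rng(0)
    n_players, n_arms = means.shape
    est = np.zeros((n_players, n_arms))     # each player's estimate of arm quality
    counts = np.ones((n_players, n_arms))   # start at 1 to avoid division by zero
    for t in range(1, horizon + 1):
        bonus = np.sqrt(2 * np.log(t) / counts)
        arms = np.argmax(est + bonus, axis=1)            # each player picks an arm
        rewards = rng.normal(means[np.arange(n_players), arms], 1.0)
        for p, (a, r) in enumerate(zip(arms, rewards)):
            counts[p, a] += 1
            est[p, a] += (r - est[p, a]) / counts[p, a]  # local running mean
        est = W @ est                                    # gossip step: mix with neighbours
    return est
```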

    On nonparametric and semiparametric testing for multivariate linear time series

    Full text link
    We formulate nonparametric and semiparametric hypothesis testing of multivariate stationary linear time series in a unified fashion and propose new test statistics based on estimators of the spectral density matrix. The limiting distributions of these test statistics under the null hypotheses are always normal, and the tests can be implemented easily in practice. If the null hypotheses are false, the test statistics diverge to infinity as the sample size grows, so the tests are consistent against any alternative. The approach can be applied to various null hypotheses, such as independence between the component series, equality of the autocovariance functions or the autocorrelation functions of the component series, separability of the covariance matrix function, and time reversibility. Furthermore, a null hypothesis with a nonlinear constraint, such as conditional independence between two series, can be tested in the same way. Comment: Published at http://dx.doi.org/10.1214/08-AOS610 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
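    Since the proposed test statistics are built from estimators of the spectral density matrix, the sketch below computes a basic smoothed-periodogram estimate of that matrix. The flat Daniell-type kernel, the bandwidth, and the circular smoothing are illustrative choices, not the paper's estimator.

```python
# Minimal sketch of a smoothed-periodogram estimate of the spectral density
# matrix of a multivariate series (an illustration, not the paper's estimator).
import numpy as np

def spectral_density_matrix(X, bandwidth=5):
    """X: (T, d) multivariate series. Returns a (T, d, d) smoothed periodogram."""
    T, d = X.shape
    F = np.fft.fft(X - X.mean(axis=0), axis=0)                    # per-component DFT
    per = np.einsum('tj,tk->tjk', F, F.conj()) / (2 * np.pi * T)  # raw periodogram matrices
    kernel = np.ones(2 * bandwidth + 1) / (2 * bandwidth + 1)     # flat (Daniell-type) smoother
    S = np.empty_like(per)
    for j in range(d):
        for k in range(d):
            # circular smoothing over the frequency index
            S[:, j, k] = np.convolve(np.tile(per[:, j, k], 3), kernel, mode='same')[T:2 * T]
    return S
```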

    Criticality in Formal Languages and Statistical Physics

    Full text link
    We show that the mutual information between two symbols, as a function of the number of symbols between the two, decays exponentially in any probabilistic regular grammar, but can decay like a power law for a context-free grammar. This result about formal languages is closely related to a well-known result in classical statistical mechanics that there are no phase transitions in dimensions fewer than two. It is also related to the emergence of power-law correlations in turbulence and cosmological inflation through recursive generative processes. We elucidate these physics connections and comment on potential applications of our results to machine learning tasks like training artificial recurrent neural networks. Along the way, we introduce a useful quantity which we dub the rational mutual information and discuss generalizations of our claims involving more complicated Bayesian networks. Comment: Replaced to match final published version. Discussion improved, references added.
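    The exponential decay for probabilistic regular grammars can be checked numerically in the Markov-chain (hidden-state) picture: the sketch below computes the mutual information between symbols separated by d steps in a small stationary chain. The example transition matrix is an illustration, not the paper's construction.

```python
# Minimal sketch: mutual information between two symbols d steps apart in a
# stationary Markov chain (illustrating the decay, not the paper's derivation).
import numpy as np

def mutual_information_at_distance(P, d):
    """P: row-stochastic transition matrix; returns I(X_0; X_d) under the stationary law."""
    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    pi = pi / pi.sum()                      # stationary distribution
    Pd = np.linalg.matrix_power(P, d)       # d-step transition probabilities
    joint = pi[:, None] * Pd                # P(X_0 = i, X_d = j)
    indep = np.outer(pi, pi)                # product of marginals
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / indep[mask])))

# Example: for this 2-state chain the mutual information decays exponentially with d.
P = np.array([[0.9, 0.1], [0.2, 0.8]])
print([round(mutual_information_at_distance(P, d), 6) for d in (1, 2, 4, 8)])
```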