312 research outputs found

    On incorporating the paradigms of discretization and Bayesian estimation to create a new family of pursuit learning automata

    Get PDF
    There are currently two fundamental paradigms that have been used to enhance the convergence speed of Learning Automata (LA). The first involves the concept of utilizing the estimates of the reward probabilities, while the second involves discretizing the probability space in which the LA operates. This paper demonstrates how both of these can be simultaneously utilized, and in particular, by using the family of Bayesian estimates that have been proven to have distinct advantages over their maximum likelihood counterparts. The success of LA-based estimator algorithms over the classical, Linear Reward-Inaction (LRI)-like schemes, can be explained by their ability to pursue the actions with the highest reward probability estimates. Without access to reward probability estimates, it makes sense for schemes like the LRI to first make large exploring steps, and then to gradually turn exploration into exploitation by making progressively smaller learning steps. However, this behavior becomes counter-intuitive when pursuing actions based on their estimated reward probabilities. Learning should then ideally proceed in progressively larger steps, as the reward probability estimates turn more accurate. This paper introduces a new estimator algorithm, the Discretized Bayesian Pursuit Algorithm (DBPA), that achieves this by incorporating both the above paradigms. The DBPA is implemented by linearly discretizing the action probability space of the Bayesian Pursuit Algorithm (BPA) (Zhang et al. in IEA-AIE 2011, Springer, New York, pp. 608-620, 2011). The key innovation of this paper is that the linear discrete updating rules mitigate the counter-intuitive behavior of the corresponding linear continuous updating rules, by augmenting them with the reward probability estimates. Extensive experimental results show the superiority of DBPA over previous estimator algorithms. Indeed, the DBPA is probably the fastest reported LA to date. Apart from the rigorous experimental demonstration of the strength of the DBPA, the paper also briefly records the proofs of why the BPA and the DBPA are ε{lunate}-optimal in stationary environments

    The design of absorbing Bayesian pursuit algorithms and the formal analyses of their ε-optimality

    Get PDF
    The fundamental phenomenon that has been used to enhance the convergence speed of learning automata (LA) is that of incorporating the running maximum likelihood (ML) estimates of the action reward probabilities into the probability updating rules for selecting the actions. The frontiers of this field have been recently expanded by replacing the ML estimates with their corresponding Bayesian counterparts that incorporate the properties of the conjugate priors. These constitute the Bayesian pursuit algorithm (BPA), and the discretized Bayesian pursuit algorithm. Although these algorithms have been designed and efficiently implemented, and are, arguably, the fastest and most accurate LA reported in the literature, the proofs of their ϵϵ-optimal convergence has been unsolved. This is precisely the intent of this paper. In this paper, we present a single unifying analysis by which the proofs of both the continuous and discretized schemes are proven. We emphasize that unlike the ML-based pursuit schemes, the Bayesian schemes have to not only consider the estimates themselves but also the distributional forms of their conjugate posteriors and their higher order moments—all of which render the proofs to be particularly challenging. As far as we know, apart from the results themselves, the methodologies of this proof have been unreported in the literature—they are both pioneering and novel

    On the Theory and Applications of Hierarchical Learning Automata and Object Migration Automata

    Get PDF
    Paper III, IV and VIII are excluded due to copyright.The paradigm of Artificial Intelligence (AI) and the sub-group of Machine Learning (ML) have attracted exponential interest in our society in recent years. The domain of ML contains numerous methods, and it is desirable (and in one sense, mandatory) that these methods are applicable and valuable to real-life challenges. Learning Automata (LA) is an intriguing and classical direction within ML. In LA, non-human agents can find optimal solutions to various problems through the concept of learning. The LA instances learn through Agent-Environment interactions, where advantageous behavior is rewarded, and disadvantageous behavior is penalized. Consequently, the agent eventually, and hopefully, learns the optimal action from a set of actions. LA has served as a foundation for Reinforcement Learning (RL), and the field of LA has been studied for decades. However, many improvements can still be made to render these algorithms to be even more convenient and effective. In this dissertation, we record our research contributions to the design and applications within the field of LA. Our research includes novel improvements to the domain of hierarchical LA, major advancements to the family of Object Migration Automata (OMA) algorithms, and a novel application of LA, where it was utilized to solve challenges in a mobile radio communication system. More specifically, we introduced the novel Hierarchical Discrete Pursuit Automaton (HDPA), which significantly improved the state of the art in terms of effectiveness for problems with high accuracy requirements, and we mathematically proved its ϵ-optimality. In addition, we proposed the Action Distribution Enhanced (ADE) approach to hierarchical LA schemes which significantly reduced the number of iterations required before the machines reached convergence. The existing schemes in the OMA paradigm, which are able to solve partitioning problems, could only solve problems with equally-sized partitions. Therefore, we proposed two novel methods that could handle unequally-sized partitions. In addition, we rigorously summarized the OMA domain, outlined its potential benefits to society, and listed further development cases for future researchers in the field. With regard to applications, we proposed an OMA-based approach to the grouping and power allocation in Non-orthogonal Multiple Access (NOMA) systems, demonstrating the applicability of the OMA and its advantage in solving fairly complicated stochastic problems. The details of these contributions and their published scientific impacts will be summarized in this dissertation, before we present some of the research contributions in their entirety.publishedVersio

    Towards Thompson Sampling for Complex Bayesian Reasoning

    Get PDF
    Paper III, IV, and VI are not available as a part of the dissertation due to the copyright.Thompson Sampling (TS) is a state-of-art algorithm for bandit problems set in a Bayesian framework. Both the theoretical foundation and the empirical efficiency of TS is wellexplored for plain bandit problems. However, the Bayesian underpinning of TS means that TS could potentially be applied to other, more complex, problems as well, beyond the bandit problem, if suitable Bayesian structures can be found. The objective of this thesis is the development and analysis of TS-based schemes for more complex optimization problems, founded on Bayesian reasoning. We address several complex optimization problems where the previous state-of-art relies on a relatively myopic perspective on the problem. These includes stochastic searching on the line, the Goore game, the knapsack problem, travel time estimation, and equipartitioning. Instead of employing Bayesian reasoning to obtain a solution, they rely on carefully engineered rules. In all brevity, we recast each of these optimization problems in a Bayesian framework, introducing dedicated TS based solution schemes. For all of the addressed problems, the results show that besides being more effective, the TS based approaches we introduce are also capable of solving more adverse versions of the problems, such as dealing with stochastic liars.publishedVersio

    Intelligent Learning Automata-based Strategies Applied to Personalized Service Provisioning in Pervasive Environments

    Get PDF
    Doktorgradsavhandling i informasjons- og kommunikasjonsteknologi, Universitetet i Agder, Grimstad, 201

    On Novel Variants of the Hierarchical Stochastic Searching on the Line

    Get PDF
    Master's thesis Mechatronics MAS500 - University of Agder, 2012Konfidensiell til / confidential until 01.07.201

    D4.2 Intelligent D-Band wireless systems and networks initial designs

    Get PDF
    This deliverable gives the results of the ARIADNE project's Task 4.2: Machine Learning based network intelligence. It presents the work conducted on various aspects of network management to deliver system level, qualitative solutions that leverage diverse machine learning techniques. The different chapters present system level, simulation and algorithmic models based on multi-agent reinforcement learning, deep reinforcement learning, learning automata for complex event forecasting, system level model for proactive handovers and resource allocation, model-driven deep learning-based channel estimation and feedbacks as well as strategies for deployment of machine learning based solutions. In short, the D4.2 provides results on promising AI and ML based methods along with their limitations and potentials that have been investigated in the ARIADNE project

    A Robotic System for Learning Visually-Driven Grasp Planning (Dissertation Proposal)

    Get PDF
    We use findings in machine learning, developmental psychology, and neurophysiology to guide a robotic learning system\u27s level of representation both for actions and for percepts. Visually-driven grasping is chosen as the experimental task since it has general applicability and it has been extensively researched from several perspectives. An implementation of a robotic system with a gripper, compliant instrumented wrist, arm and vision is used to test these ideas. Several sensorimotor primitives (vision segmentation and manipulatory reflexes) are implemented in this system and may be thought of as the innate perceptual and motor abilities of the system. Applying empirical learning techniques to real situations brings up such important issues as observation sparsity in high-dimensional spaces, arbitrary underlying functional forms of the reinforcement distribution and robustness to noise in exemplars. The well-established technique of non-parametric projection pursuit regression (PPR) is used to accomplish reinforcement learning by searching for projections of high-dimensional data sets that capture task invariants. We also pursue the following problem: how can we use human expertise and insight into grasping to train a system to select both appropriate hand preshapes and approaches for a wide variety of objects, and then have it verify and refine its skills through trial and error. To accomplish this learning we propose a new class of Density Adaptive reinforcement learning algorithms. These algorithms use statistical tests to identify possibly interesting regions of the attribute space in which the dynamics of the task change. They automatically concentrate the building of high resolution descriptions of the reinforcement in those areas, and build low resolution representations in regions that are either not populated in the given task or are highly uniform in outcome. Additionally, the use of any learning process generally implies failures along the way. Therefore, the mechanics of the untrained robotic system must be able to tolerate mistakes during learning and not damage itself. We address this by the use of an instrumented, compliant robot wrist that controls impact forces
    • …
    corecore