    Empirical Bernstein Inequalities for U-Statistics

    We present original empirical Bernstein inequalities for U-statistics with bounded symmetric kernels q. They are expressed with respect to empirical estimates of either the variance of q or the conditional variance that appears in the Bernstein-type inequality for U-statistics derived by Arcones. Our result subsumes other existing empirical Bernstein inequalities, as it reduces to them when U-statistics of order 1 are considered. In addition, it is based on a rather direct argument using two applications of the same (non-empirical) Bernstein inequality for U-statistics. We discuss potential applications of our new inequalities, especially in the realm of learning ranking/scoring functions. In the process, we exhibit an efficient procedure to compute the variance estimates for the special case of bipartite ranking that rests on a sorting argument. We also argue that our results may provide test set bounds and particularly interesting empirical racing algorithms for the problem of online learning of scoring functions.
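    The sketch below is a minimal, illustrative implementation (not the paper's sorting-based procedure, which is more efficient) of the quantities the bounds are stated in terms of: an order-2 U-statistic with a bounded symmetric kernel q, the empirical variance of q, and the empirical variance of the first Hoeffding projection h(Z_i). The concordance kernel and the placeholder scorer are assumptions made for the ranking example.

```python
import numpy as np

def u_stat_with_variance_estimates(z, q):
    """Order-2 U-statistic with symmetric kernel q, plus the empirical
    variance of q and the empirical variance of the Hoeffding projection
    h(Z_i) = mean_{j != i} q(Z_i, Z_j) (the "conditional" variance)."""
    n = len(z)
    Q = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                Q[i, j] = q(z[i], z[j])
    off_diag = Q[~np.eye(n, dtype=bool)]   # all q(Z_i, Z_j), i != j
    u_n = off_diag.mean()                  # the U-statistic itself
    var_q = off_diag.var()                 # empirical variance of the kernel
    h = Q.sum(axis=1) / (n - 1)            # projection h(Z_i)
    var_cond = h.var()                     # empirical conditional variance
    return u_n, var_q, var_cond

# Toy bipartite-ranking-style use: concordance of a hypothetical scorer s
# with binary labels; q is bounded in [0, 1] and symmetric in its arguments.
s = lambda x: x                            # placeholder scoring function
q = lambda zi, zj: float((s(zi[0]) - s(zj[0])) * (zi[1] - zj[1]) > 0)

rng = np.random.default_rng(0)
x = rng.normal(size=40)
y = (x + rng.normal(size=40) > 0).astype(float)
print(u_stat_with_variance_estimates(list(zip(x, y)), q))
```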

    A Transfer Learning Approach for UAV Path Design with Connectivity Outage Constraint

    The connectivity-aware path design is crucial in the effective deployment of autonomous Unmanned Aerial Vehicles (UAVs). Recently, Reinforcement Learning (RL) algorithms have become a popular approach to solving this type of complex problem, but they suffer from slow convergence. In this paper, we propose a Transfer Learning (TL) approach, where we use a teacher policy previously trained in an old domain to boost the path learning of the agent in the new domain. As the exploration process and the training continue, the agent refines the path design in the new domain based on subsequent interactions with the environment. We evaluate our approach considering an old domain at sub-6 GHz and a new domain at millimeter wave (mmWave). The teacher path policy, previously trained at sub-6 GHz, is the solution to a connectivity-aware path problem that we formulate as a constrained Markov Decision Process (CMDP). We employ a Lyapunov-based model-free Deep Q-Network (DQN) to solve the path design at sub-6 GHz that guarantees connectivity constraint satisfaction. We empirically demonstrate the effectiveness of our approach for different urban environment scenarios. The results demonstrate that our proposed approach is capable of reducing the training time considerably at mmWave. Comment: 14 pages, 8 figures, journal paper.
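    A minimal sketch of the warm-start idea, with assumed network sizes and a hypothetical checkpoint name that are not from the paper: the Q-network trained in the sub-6 GHz domain initializes the student agent for the mmWave domain, which then keeps refining its policy through its own epsilon-greedy interactions.

```python
import copy
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Q-network mapping a UAV state to Q-values over movement actions.
    The state and action dimensions below are illustrative assumptions."""
    def __init__(self, state_dim=4, n_actions=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, state):
        return self.net(state)

teacher = QNet()
# teacher.load_state_dict(torch.load("sub6ghz_teacher.pt"))  # hypothetical checkpoint

student = copy.deepcopy(teacher)                 # warm start for the mmWave domain
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)

def act(state, eps=0.1):
    """Epsilon-greedy action in the new domain, starting from teacher knowledge."""
    if torch.rand(1).item() < eps:
        return torch.randint(0, 6, (1,)).item()
    with torch.no_grad():
        return student(state).argmax().item()

print(act(torch.zeros(4)))
```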

    Developing Train Station Parking Algorithms: New Frameworks Based on Fuzzy Reinforcement Learning

    Train station parking (TSP) accuracy is important for enhancing the efficiency of train operation and the safety of passengers in urban rail transit. However, TSP is always subject to a series of uncertain factors such as extreme weather and uncertain rail track resistance. To increase the parking accuracy, robustness, and self-learning ability, we propose new train station parking frameworks that use reinforcement learning (RL) theory combined with information from balises. Three algorithms were developed, namely a stochastic optimal selection algorithm (SOSA), a Q-learning algorithm (QLA), and a fuzzy-function-based Q-learning algorithm (FQLA), in order to reduce the parking error in urban rail transit. Meanwhile, five braking rates are adopted as the action vector of the three algorithms, and several statistical indices are developed to evaluate parking errors. Simulation results based on real-world data show that the parking errors of the three algorithms are all within ±30 cm, which meets the requirement of urban rail transit. Document type: Article.
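    A minimal sketch of the tabular Q-learning variant (QLA) under assumptions: the state is a discretized (distance-to-stop, speed) pair updated at balises, the action is one of five fixed braking rates, and the reward penalizes the stopping error. All numbers are illustrative, not the paper's parameters.

```python
import numpy as np

BRAKING_RATES = [-0.4, -0.6, -0.8, -1.0, -1.2]   # m/s^2, assumed action vector
N_DIST, N_SPEED = 50, 20                          # assumed state discretization

Q = np.zeros((N_DIST, N_SPEED, len(BRAKING_RATES)))
alpha, gamma, eps = 0.1, 0.99, 0.1

def choose_braking_rate(s):
    """Epsilon-greedy choice over the five braking-rate actions."""
    a = np.random.randint(len(BRAKING_RATES)) if np.random.rand() < eps \
        else int(Q[s].argmax())
    return a, BRAKING_RATES[a]

def q_update(s, a, r, s_next, done):
    """One tabular Q-learning step; s and s_next are (distance_bin, speed_bin)."""
    target = r if done else r + gamma * Q[s_next].max()
    Q[s][a] += alpha * (target - Q[s][a])

# toy usage: a single transition with an assumed reward of -|parking error|
s = (10, 5)
a, rate = choose_braking_rate(s)
q_update(s, a, r=-0.25, s_next=(8, 4), done=False)
```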

    SMIX(λ): Enhancing Centralized Value Functions for Cooperative Multi-Agent Reinforcement Learning

    Learning a stable and generalizable centralized value function (CVF) is a crucial but challenging task in multi-agent reinforcement learning (MARL), as it has to deal with the issue that the joint action space increases exponentially with the number of agents in such scenarios. This paper proposes an approach, named SMIX(λ), to address the issue using an efficient off-policy centralized training method within a flexible learner search space. As importance sampling for such off-policy training is both computationally costly and numerically unstable, we propose to use the λ-return as a proxy to compute the TD error. With this new loss objective, we adopt a modified QMIX network structure as the base to train our model. By further connecting it with the Q(λ) approach from a unified expectation-correction viewpoint, we show that the proposed SMIX(λ) is equivalent to Q(λ) and hence shares its convergence properties, while not suffering from the aforementioned curse of dimensionality inherent in MARL. Experiments on the StarCraft Multi-Agent Challenge (SMAC) benchmark demonstrate that our approach not only outperforms several state-of-the-art MARL methods by a large margin, but can also be used as a general tool to improve the overall performance of other CTDE-type algorithms by enhancing their CVFs.
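    A minimal sketch of the λ-return used as the regression target in place of an importance-sampled off-policy correction (illustrative only, not the SMIX(λ) training code): G_t = r_t + γ[(1 − λ)V(s_{t+1}) + λ G_{t+1}], bootstrapping from the value estimate of the state after the rollout.

```python
import numpy as np

def lambda_returns(rewards, values, bootstrap_value, gamma=0.99, lam=0.8):
    """Backward recursion G_t = r_t + gamma * ((1-lam) * V(s_{t+1}) + lam * G_{t+1}).

    rewards[t] is the team reward at step t, values[t] is the centralized value
    estimate of s_t, and bootstrap_value is the estimate for the state after the
    final step (0 if the episode terminated).
    """
    T = len(rewards)
    returns = np.zeros(T)
    g = bootstrap_value
    for t in reversed(range(T)):
        v_next = values[t + 1] if t + 1 < T else bootstrap_value
        g = rewards[t] + gamma * ((1.0 - lam) * v_next + lam * g)
        returns[t] = g
    return returns

# toy rollout: the targets returned here would regress the centralized value head
r = np.array([0.0, 0.0, 1.0])
v = np.array([0.2, 0.5, 0.9])
print(lambda_returns(r, v, bootstrap_value=0.0))
```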

    Sample Complexity of Offline Reinforcement Learning with Deep ReLU Networks

    We study the statistical theory of offline reinforcement learning (RL) with deep ReLU network function approximation. We analyze a variant of the fitted-Q iteration (FQI) algorithm under a new dynamic condition that we call Besov dynamic closure, which encompasses the conditions from prior analyses for deep neural network function approximation. Under Besov dynamic closure, we prove that the FQI-type algorithm enjoys a sample complexity of $\tilde{\mathcal{O}}\left( \kappa^{1 + d/\alpha} \cdot \epsilon^{-2 - 2d/\alpha} \right)$, where $\kappa$ is a distribution shift measure, $d$ is the dimensionality of the state-action space, $\alpha$ is the (possibly fractional) smoothness parameter of the underlying MDP, and $\epsilon$ is a user-specified precision. This is an improvement over the sample complexity of $\tilde{\mathcal{O}}\left( K \cdot \kappa^{2 + d/\alpha} \cdot \epsilon^{-2 - d/\alpha} \right)$ in the prior result [Yang et al., 2019], where $K$ is an algorithmic iteration number which is arbitrarily large in practice. Importantly, our sample complexity is obtained under the new general dynamic condition and a data-dependent structure where the latter is either ignored in prior algorithms or improperly handled by prior analyses. This is the first comprehensive analysis for offline RL with deep ReLU network function approximation under a general setting. Comment: A short version was published in the ICML Workshop on Reinforcement Learning Theory, 2021.
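    For a feel of what the improvement means, the snippet below plugs one illustrative (assumed, not from the paper) parameter choice into the two bounds; the gain comes from removing the factor K and lowering the exponent on κ, at the price of a larger exponent on ϵ.

```python
# Illustrative arithmetic only; kappa, d, alpha, eps and K are assumed values,
# not quantities reported in the paper.
kappa, d, alpha, eps, K = 2.0, 10, 5.0, 0.1, 1000

new_bound   = kappa ** (1 + d / alpha) * eps ** (-2 - 2 * d / alpha)
prior_bound = K * kappa ** (2 + d / alpha) * eps ** (-2 - d / alpha)

# For this choice the removal of K dominates: new ~ 8.0e6 vs prior ~ 1.6e8.
print(f"new bound ~ {new_bound:.3g}, prior bound ~ {prior_bound:.3g}")
```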

    Oppositional Reinforcement Learning with Applications

    Machine intelligence techniques contribute to solving real-world problems. Reinforcement learning (RL) is one of the machine intelligence techniques with several characteristics that make it suitable for applications in which a model of the environment is not available to the agent. In real-world applications, intelligent agents generally face a very large state space, which limits the usability of reinforcement learning. The condition for convergence of reinforcement learning requires that each state-action pair be visited infinitely often, a condition that is practically impossible to satisfy in many situations. The goal of this work is to propose a class of new techniques to overcome this problem for off-policy, step-by-step (incremental), and model-free reinforcement learning with discrete state and action spaces. The focus of this research is on using the design characteristics of the RL agent to improve its performance in terms of running time while maintaining an acceptable level of accuracy.

    One way of improving the performance of intelligent agents is to use a model of the environment. In this work, a special type of knowledge about the agent's actions is employed to improve its performance, because in many applications the model of the environment may be known only partially or not at all. The concept of opposition is employed in the framework of reinforcement learning to achieve this goal. One of the components of an RL agent is the action; for each action we define an associated opposite action. Actions and opposite actions are implemented in the framework of reinforcement learning to update the value function, resulting in faster convergence. At the beginning of this research, the concept of opposition is incorporated into the components of reinforcement learning (states, actions, and reinforcement signal), which results in the oppositional target domain estimation algorithm, OTE. OTE reduces the search and navigation area and accelerates the search for a target. The OTE algorithm is limited to applications in which a model of the environment is provided to the agent. Hence, further investigation is conducted to extend the concept of opposition to model-free reinforcement learning algorithms. This extension yields several algorithms that apply the concept of opposition to the Q(λ) technique.

    The design of a reinforcement learning agent depends on the application. The emphasis of this research is on the characteristics of the actions; hence, the primary challenge of this work is the design and incorporation of opposite actions in the framework of RL agents. Three different applications, namely grid navigation, an elevator control problem, and image thresholding, are implemented to address this challenge in the context of different applications. The design challenges and some solutions to overcome the problems and improve the algorithms are also investigated.

    The opposition-based Q(λ) algorithms are tested for the applications mentioned earlier. The general idea behind them is that in Q-value updating, the agent updates the value of an action in a given state. If the agent also knows the value of the opposite action in that state, it can update two Q-values at the same time, without taking the opposite action and causing an explicit transition to the opposite state. This accelerates the learning process in general and the exploration phase in particular. Several algorithms are outlined in this work. OQ(λ) is introduced to accelerate the Q(λ) algorithm in discrete state spaces. The NOQ(λ) method is an extension of OQ(λ) that operates in a broader range of non-deterministic environments. The update of the opposition trace in OQ(λ) depends on the next state of the opposite action (which is generally not taken by the agent); this limits the technique to deterministic environments, because the next state must be known to the agent. NOQ(λ) updates the opposition trace without requiring the next state of the opposite action to be known. The results show an improvement in running time for the proposed algorithms compared to the standard Q(λ) technique.
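    A minimal sketch of the core double-update idea, under assumptions (a grid-navigation setting with a geometrically defined opposite action; the eligibility traces and opposition trace of the full OQ(λ)/NOQ(λ) algorithms are omitted): one observed transition updates both Q(s, a) and Q(s, opposite(a)).

```python
import numpy as np

N_STATES = 25                                     # assumed 5x5 grid
ACTIONS = ["up", "down", "left", "right"]
OPPOSITE = {"up": "down", "down": "up", "left": "right", "right": "left"}
A = {name: i for i, name in enumerate(ACTIONS)}

Q = np.zeros((N_STATES, len(ACTIONS)))
alpha, gamma = 0.1, 0.95

def double_update(s, a, r, s_next, r_opp, s_opp_next):
    """Update the taken action and its opposite from a single interaction.

    r_opp and s_opp_next are *estimated* (e.g. from the grid geometry), not the
    result of actually executing the opposite action in the environment.
    """
    td = r + gamma * Q[s_next].max() - Q[s, A[a]]
    Q[s, A[a]] += alpha * td

    a_opp = OPPOSITE[a]
    td_opp = r_opp + gamma * Q[s_opp_next].max() - Q[s, A[a_opp]]
    Q[s, A[a_opp]] += alpha * td_opp

# toy usage: moving "up" from state 12 also informs the value of "down" there
double_update(s=12, a="up", r=-1.0, s_next=7, r_opp=-1.0, s_opp_next=17)
```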

    Storage System Management Using Reinforcement Learning Techniques and Nonlinear Models

    In this thesis, modeling and optimization in the field of storage management under stochastic conditions are investigated using two different methodologies: Simulation Optimization Techniques (SOT), which are usually categorized in the area of Reinforcement Learning (RL), and Nonlinear Modeling Techniques (NMT). For the first set of methods, simulation plays a fundamental role in evaluating the control policy: learning techniques are used to deliver sub-optimal policies at the end of a learning process. These iterative methods use the interaction of agents with the stochastic environment through taking actions and observing different states. To converge to the steady-state condition, where policies and value functions no longer change significantly as learning continues, all or at least the most important states must be visited sufficiently often. This might be prohibitively time-consuming for large-scale problems. To make these techniques more efficient, both in computation time and in the robustness of the resulting policies, the idea of Opposition-Based Learning (OBL, Type I and Type II) is employed to modify/extend popular RL techniques, including Q-learning, Q(λ), Sarsa, and Sarsa(λ). Several new algorithms are developed using this idea. It is also illustrated that function approximation techniques such as neural networks can contribute to the learning process. State-of-the-art implementations usually consider the maximization of the expected value of the accumulated reward; extending these techniques to consider risk and solving some well-known control problems are important contributions of this thesis. Furthermore, the nonlinear model for reservoir management using indicator functions and a randomized policy, introduced by Fletcher and Ponnambalam, is extended to stochastic releases in multi-reservoir systems. In this extension, two different approaches for defining the release policies are proposed. In addition, the main restriction of assuming a normal distribution for inflow is relaxed by using a beta-equivalent general distribution. A five-reservoir case study from India is used to demonstrate the benefits of these new developments. Using a warehouse management problem as an example, the application of the proposed method to other storage management problems is outlined.
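    A minimal sketch of one of the building blocks named above, tabular Sarsa(λ) with accumulating eligibility traces, applied to an assumed single-reservoir toy setting (discretized storage level as the state, discretized release as the action); the opposition-based variants would additionally update the entries of "opposite" releases, in the spirit of the previous sketch.

```python
import numpy as np

N_LEVELS, N_RELEASES = 20, 5                # assumed discretization
Q = np.zeros((N_LEVELS, N_RELEASES))
E = np.zeros_like(Q)                        # eligibility traces
alpha, gamma, lam, eps = 0.1, 0.98, 0.9, 0.1

def release_policy(level):
    """Epsilon-greedy choice of a discretized release for a storage level."""
    if np.random.rand() < eps:
        return np.random.randint(N_RELEASES)
    return int(Q[level].argmax())

def sarsa_lambda_step(s, a, r, s_next, a_next):
    """One Sarsa(lambda) update, spread over recently visited (level, release) pairs."""
    global Q, E
    delta = r + gamma * Q[s_next, a_next] - Q[s, a]
    E[s, a] += 1.0                          # accumulating trace
    Q += alpha * delta * E
    E *= gamma * lam                        # decay all traces

# toy usage: one transition with an assumed reward (e.g. value of released water)
s = 10; a = release_policy(s)
s_next = 9; a_next = release_policy(s_next)
sarsa_lambda_step(s, a, r=1.5, s_next=s_next, a_next=a_next)
```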