36 research outputs found

    Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs

    Trust region policy optimization (TRPO) is a popular and empirically successful policy search algorithm in Reinforcement Learning (RL) in which a surrogate problem that restricts consecutive policies to be 'close' to one another is iteratively solved. Nevertheless, TRPO has been considered a heuristic algorithm inspired by Conservative Policy Iteration (CPI). We show that the adaptive scaling mechanism used in TRPO is in fact the natural "RL version" of traditional trust-region methods from convex analysis. We first analyze TRPO in the planning setting, in which we have access to the model and the entire state space. Then, we consider sample-based TRPO and establish a $\tilde O(1/\sqrt{N})$ convergence rate to the global optimum. Importantly, the adaptive scaling mechanism allows us to analyze TRPO in regularized MDPs, for which we prove fast rates of $\tilde O(1/N)$, much like results in convex optimization. This is the first result in RL of better rates when regularizing the instantaneous cost or reward. Comment: Published at AAAI-2020, 58 pages

    Using Wikipedia to boost collaborative filtering techniques

    One important challenge in the field of recommender systems is the sparsity of available data. This problem limits the ability of recommender systems to provide accurate predictions of user ratings. We overcome this problem by using the publicly available user-generated information contained in Wikipedia. We identify similarities between items by mapping them to Wikipedia pages and finding similarities in the text and commonalities in the links and categories of each page. These similarities can be used in the recommendation process and improve ranking predictions. We find that this method is most effective in cases where ratings are extremely sparse or nonexistent. Preliminary experimental results on the MovieLens dataset are encouraging.

    Diffusive and Ballistic Transport in Ultra-thin InSb Nanowire Devices Using a Few-layer-Graphene-AlOx Gate

    Quantum devices based on InSb nanowires (NWs) are a prime candidate system for realizing and exploring topologically-protected quantum states and for electrically-controlled spin-based qubits. The influence of disorder on achieving reliable topological regimes has been studied theoretically, highlighting the importance of optimizing both growth and nanofabrication. In this work we investigate both aspects. We developed InSb nanowires with ultra-thin diameters, as well as a new gating approach, involving few-layer graphene (FLG) and Atomic Layer Deposition (ALD)-grown AlOx. Low-temperature electronic transport measurements of these devices reveal conductance plateaus and Fabry-Pérot interference, evidencing phase-coherent transport in the regime of few quantum modes. The approaches developed in this work could help mitigate the role of material and fabrication-induced disorder in semiconductor-based quantum devices. Comment: 14 pages, 5 figures