5,346 research outputs found

    Towards A Unified Policy Abstraction Theory and Representation Learning Approach in Markov Decision Processes

    Full text link
    Lying at the heart of intelligent decision-making systems, how a policy is represented and optimized is a fundamental problem. The root challenge is the large scale and high complexity of the policy space, which exacerbates the difficulty of policy learning, especially in real-world scenarios. Towards a desirable surrogate policy space, policy representation in a low-dimensional latent space has recently shown its potential to improve both the evaluation and optimization of policies. The key question in these studies is by what criterion we should abstract the policy space to obtain the desired compression and generalization. However, both the theory of policy abstraction and the methodology of policy representation learning are understudied in the literature. In this work, we make a first effort to fill this vacancy. First, we propose a unified policy abstraction theory containing three types of policy abstraction associated with policy features at different levels. Then, we generalize them to three policy metrics that quantify the distance (i.e., similarity) of policies, for more convenient use in learning policy representations. Further, we propose a policy representation learning approach based on deep metric learning. For the empirical study, we investigate the efficacy of the proposed policy metrics and representations in characterizing policy difference and conveying policy generalization, respectively. Our experiments cover both policy optimization and evaluation problems, including trust-region policy optimization (TRPO), diversity-guided evolution strategy (DGES), and off-policy evaluation (OPE). Somewhat naturally, the experimental results indicate that there is no universally optimal abstraction for all downstream learning problems, while the influence-irrelevance policy abstraction can be a generally preferred choice. Comment: Preprint version
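
    A minimal sketch of the general idea, assuming PyTorch (an illustrative reading of "policy representation learning based on deep metric learning", not the authors' implementation): embed each policy in a low-dimensional latent space and train the embedding so that pairwise latent distances match a chosen policy metric. The encoder architecture, the use of flattened policy features as input, and the random metric matrix below are all assumptions for illustration.

```python
import torch
import torch.nn as nn

class PolicyEncoder(nn.Module):
    """Maps a policy feature vector to a low-dimensional latent code."""
    def __init__(self, in_dim: int, latent_dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def metric_learning_loss(encoder, policies, metric_matrix):
    """Match pairwise latent distances to precomputed policy-metric distances.

    policies:      (n, in_dim) tensor, one feature row per policy.
    metric_matrix: (n, n) tensor of policy-metric distances, e.g. an average
                   divergence between action distributions on shared states
                   (assumed here; the paper defines three such metrics).
    """
    z = encoder(policies)                 # (n, latent_dim) embeddings
    latent_dist = torch.cdist(z, z, p=2)  # pairwise Euclidean distances
    return ((latent_dist - metric_matrix) ** 2).mean()

# Usage with dummy data: 32 policies, 64 features each.
encoder = PolicyEncoder(in_dim=64)
policies = torch.randn(32, 64)
metric = torch.rand(32, 32)
metric = (metric + metric.T) / 2   # a distance matrix is symmetric...
metric.fill_diagonal_(0)           # ...with zero self-distance
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    metric_learning_loss(encoder, policies, metric).backward()
    opt.step()
```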

    The Preparation and Bioactivity Research of Agaro-Oligosaccharides

    Get PDF
    Agaro-oligosaccharides were obtained by hydrolyzing agar with hydrochloric acid, citric acid, and a cationic exchange resin (solid acid). The FT-IR and NMR data showed that the hydrolysate has the structure of agaro-oligomers. The orthogonal matrix method was applied to optimize the preparation conditions, based on an α-naphthylamine end-labeling HPLC analysis method. The optimal route to oligosaccharides with different degrees of polymerization (DP) was solid-acid degradation, which gave a high yield and avoided the solution neutralization step. Agaro-oligosaccharides of high purity were then obtained by activated-carbon column isolation. Furthermore, the antioxidant and α-glucosidase inhibitory activities of three fractions were also investigated. The results indicated that the 8% ethanol-eluted fraction showed the highest activity against α-glucosidase, with an IC50 of 8.84 mg/mL, while the 25% ethanol-eluted fraction possessed excellent antioxidant ability.
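
    The orthogonal matrix (Taguchi-style) design mentioned above can be made concrete with a small sketch; the factor names and yields below are placeholders, since the abstract does not specify them. An L9(3^4) array tests four three-level factors in 9 runs instead of the full 3^4 = 81 combinations, and the best level of each factor is chosen by comparing mean responses.

```python
L9 = [  # standard L9(3^4) orthogonal array, levels coded 0/1/2
    (0, 0, 0, 0), (0, 1, 1, 1), (0, 2, 2, 2),
    (1, 0, 1, 2), (1, 1, 2, 0), (1, 2, 0, 1),
    (2, 0, 2, 1), (2, 1, 0, 2), (2, 2, 1, 0),
]
# Hypothetical factors; the study's actual factors are not given here.
factors = ["acid concentration", "temperature", "time", "resin dosage"]

def mean_effect(results, factor_idx, level):
    """Average response over the runs where a factor sits at a level."""
    vals = [y for row, y in zip(L9, results) if row[factor_idx] == level]
    return sum(vals) / len(vals)

# Usage: given 9 measured yields (dummy data), pick each factor's best level.
results = [0.42, 0.55, 0.48, 0.61, 0.50, 0.44, 0.58, 0.47, 0.53]
for i, name in enumerate(factors):
    best = max(range(3), key=lambda lv: mean_effect(results, i, lv))
    print(f"{name}: best level {best}")
```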

    The Fixed Points of Solutions of Some q-Difference Equations

    Get PDF
    The purpose of this paper is to investigate the fixed points of solutions $f(z)$ of some $q$-difference equations and obtain some results about the exponents of convergence of fixed points of $f(z)$ and $f(q^{j}z)$ $(j \in \mathbb{N}^{+})$, $q$-differences $\Delta_{q}f(z) = f(qz) - f(z)$, and $q$-divided differences $\Delta_{q}f(z)/f(z)$.
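
    For readers unfamiliar with the notation, the objects in this abstract can be restated in standard q-difference form (standard definitions assumed, not quoted from the paper):

```latex
% Standard notation assumed: f meromorphic, q a nonzero complex constant.
% The q-difference and q-divided difference of f are
\[
  \Delta_q f(z) = f(qz) - f(z),
  \qquad
  \frac{\Delta_q f(z)}{f(z)} = \frac{f(qz) - f(z)}{f(z)},
\]
% and the exponents of convergence in question count the fixed points,
% i.e. the zeros of
\[
  f(z) - z \quad\text{and}\quad f(q^{j}z) - z \qquad (j \in \mathbb{N}^{+}).
\]
```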

    Reinforced Lin-Kernighan-Helsgaun Algorithms for the Traveling Salesman Problems

    Full text link
    The TSP is a classical NP-hard combinatorial optimization problem with many practical variants. LKH is one of the state-of-the-art local search algorithms for the TSP, and LKH-3 is a powerful extension of LKH that can solve many TSP variants. Both LKH and LKH-3 associate a candidate set with each city to improve efficiency, and have two different methods, the α-measure and POPMUSIC, to decide the candidate sets. In this work, we first propose a Variable Strategy Reinforced LKH (VSR-LKH) algorithm for the TSP, which incorporates three reinforcement learning methods (Q-learning, Sarsa, and Monte Carlo) with LKH. We further propose a new algorithm called VSR-LKH-3 that combines the variable-strategy reinforcement learning method with LKH-3 for typical TSP variants, including the TSP with time windows (TSPTW) and the colored TSP (CTSP). The proposed algorithms replace the inflexible traversal operations in LKH and LKH-3 and let the algorithms learn to make a choice at each search step by reinforcement learning. Both LKH and LKH-3, with either the α-measure or POPMUSIC, can be significantly improved by our methods. Extensive experiments on 236 widely used TSP benchmarks with up to 85,900 cities demonstrate the excellent performance of VSR-LKH, and VSR-LKH-3 also significantly outperforms the state-of-the-art heuristics for the TSPTW and CTSP. Comment: arXiv admin note: text overlap with arXiv:2107.0687
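
    A minimal sketch of the core idea, under stated assumptions: instead of traversing a city's candidate set in a fixed order, learn Q-values over (city, candidate) pairs and choose candidates epsilon-greedily. The class name, the epsilon-greedy scheme, and the reward (tour-length gain of the resulting move) are illustrative assumptions, not the authors' code.

```python
import random
from collections import defaultdict

class CandidateSelector:
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)  # Q-value per (city, candidate) pair
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, city, candidates):
        """Epsilon-greedy choice among the city's candidate set."""
        if random.random() < self.epsilon:
            return random.choice(candidates)
        return max(candidates, key=lambda c: self.q[(city, c)])

    def update(self, city, cand, reward, next_city, next_candidates):
        """One Q-learning backup after evaluating the resulting move."""
        best_next = max((self.q[(next_city, c)] for c in next_candidates),
                        default=0.0)
        key = (city, cand)
        self.q[key] += self.alpha * (reward + self.gamma * best_next
                                     - self.q[key])

# Usage: reward = tour-length improvement of the move that 'cand' enabled.
sel = CandidateSelector()
cand = sel.choose(city=3, candidates=[7, 12, 5])
sel.update(3, cand, reward=2.4, next_city=cand, next_candidates=[1, 9])
```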

    On the Deficiencies of Some Differential-Difference Polynomials

    Get PDF
    The characteristic functions of differential-difference polynomials are investigated. The result can be viewed as a differential-difference analogue of the classical Valiron-Mokhon'ko theorem in some sense, and it is applied to investigate the deficiencies of some homogeneous and nonhomogeneous differential-difference polynomials. Some special differential-difference polynomials are also investigated, and these results on value distribution can be viewed as differential-difference analogues of some classical results of Hayman and Yang. Examples are given at the end of the paper to illustrate our results.
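
    As a concrete illustration (the example polynomial is assumed for exposition, not taken from the paper):

```latex
% A differential-difference polynomial in a meromorphic f mixes
% derivatives and shifts, e.g.
\[
  P(z, f) = f(z)^{2} f'(z)\, f(z + c) + f(z + 2c)\, f''(z),
  \qquad c \in \mathbb{C} \setminus \{0\}.
\]
% The classical Valiron--Mokhon'ko theorem states that for a rational
% function R(z, f) in f of total degree d with small-function coefficients,
\[
  T\bigl(r, R(z, f)\bigr) = d\, T(r, f) + S(r, f),
\]
% and the paper establishes analogues of this identity with R replaced by
% such mixed polynomials in the shifts and derivatives of f.
```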