Towards A Unified Policy Abstraction Theory and Representation Learning Approach in Markov Decision Processes
Lying at the heart of intelligent decision-making systems, how a policy is
represented and optimized is a fundamental problem. The root challenge is the
large scale and high complexity of the policy space, which exacerbates the
difficulty of policy learning, especially in real-world scenarios. Towards a
desirable surrogate policy space, policy representation in a low-dimensional
latent space has recently shown its potential in improving both the evaluation
and the optimization of policies. The key question in these studies is by what
criterion the policy space should be abstracted to achieve the desired
compression and generalization. However, both the theory of policy abstraction
and the methodology of policy representation learning are understudied in the
literature. In this work, we make a first effort to fill this gap. First, we
propose a unified policy abstraction theory, containing three types of policy
abstraction associated with policy features at different levels. Then, we
generalize them to three policy metrics that quantify the distance (i.e.,
similarity) of policies, for more convenient use in policy representation
learning. Further, we propose a policy representation learning approach based
on deep metric learning. For the empirical study, we investigate the efficacy
of the proposed policy metrics and representations in characterizing policy
difference and conveying policy generalization, respectively. Our experiments
cover both policy optimization and evaluation problems, including trust-region
policy optimization (TRPO), diversity-guided evolution strategy (DGES), and
off-policy evaluation (OPE). Somewhat naturally, the experimental results
indicate that there is no universally optimal abstraction for all downstream
learning problems, while the influence-irrelevance policy abstraction can be a
generally preferred choice.
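As a minimal illustration of the deep-metric-learning idea behind the proposed approach (this is a generic sketch, not the paper's implementation; the embeddings, margin, and policy-metric labels are hypothetical), a triplet-style loss pulls policies that a policy metric judges similar together in the latent space and pushes dissimilar ones apart:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss on policy embeddings: encourage the
    anchor-positive distance to be smaller than the
    anchor-negative distance by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)  # distance to a similar policy
    d_neg = np.linalg.norm(anchor - negative)  # distance to a dissimilar policy
    return max(0.0, d_pos - d_neg + margin)

# Toy 3-D policy embeddings (hypothetical values).
a = np.array([0.0, 0.0, 0.0])
p = np.array([0.1, 0.0, 0.0])   # near the anchor: small policy-metric distance
n = np.array([2.0, 2.0, 2.0])   # far from the anchor: large policy-metric distance
print(triplet_loss(a, p, n))    # 0.0 once d_neg exceeds d_pos + margin
```

In practice the positive/negative labels would come from one of the three policy metrics, and the embeddings from a learned encoder rather than fixed vectors.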
The Preparation and Bioactivity Research of Agaro-Oligosaccharides
Agaro-oligosaccharides were obtained by hydrolyzing agar with hydrochloric acid, citric acid, and a cationic exchange resin (solid acid). FT-IR and NMR data showed that the hydrolysate has the structure of agaro-oligomers. An orthogonal matrix method was applied to optimize the preparation conditions, based on an α-naphthylamine end-labeled HPLC analysis method. The optimal route to oligosaccharides with different degrees of polymerization (DP) was solid-acid degradation, which gave high yields and avoided a neutralization step. Agaro-oligosaccharides of high purity were subsequently obtained by activated-carbon column isolation. Furthermore, the antioxidant and α-glucosidase inhibitory activities of three fractions were investigated. The results indicated that the 8 % ethanol-eluted fraction showed the highest activity against α-glucosidase, with an IC50 of 8.84 mg/mL, while the 25 % ethanol-eluted fraction possessed excellent antioxidant ability.
The Fixed Points of Solutions of Some q-Difference Equations
The purpose of this paper is to investigate the fixed points of solutions f(z) of some q-difference equations and to obtain results on the exponents of convergence of the fixed points of f(z) and f(q^j z) (j ∈ N+), the q-differences Δ_q f(z) = f(qz) − f(z), and the q-divided differences Δ_q f(z)/f(z).
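The operators above can be checked numerically; as a small sketch (illustrative only, with f(z) = z² chosen as an example), the q-difference of z² satisfies Δ_q z² = (qz)² − z² = (q² − 1)z², so its q-divided difference is the constant q² − 1:

```python
def Delta_q(f, q, z):
    """q-difference operator: Δ_q f(z) = f(qz) − f(z)."""
    return f(q * z) - f(z)

f = lambda z: z ** 2
q, z = 2.0, 3.0

# Δ_q z^2 = (qz)^2 − z^2 = (q^2 − 1) z^2 = 3 · 9 = 27
print(Delta_q(f, q, z))         # 27.0
# q-divided difference Δ_q f(z)/f(z) = q^2 − 1
print(Delta_q(f, q, z) / f(z))  # 3.0
```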
Reinforced Lin-Kernighan-Helsgaun Algorithms for the Traveling Salesman Problems
TSP is a classical NP-hard combinatorial optimization problem with many
practical variants. LKH is one of the state-of-the-art local search algorithms
for the TSP. LKH-3 is a powerful extension of LKH that can solve many TSP
variants. Both LKH and LKH-3 associate a candidate set to each city to improve
the efficiency, and have two different methods, -measure and POPMUSIC,
to decide the candidate sets. In this work, we first propose a Variable
Strategy Reinforced LKH (VSR-LKH) algorithm, which incorporates three
reinforcement learning methods (Q-learning, Sarsa, Monte Carlo) with LKH, for
the TSP. We further propose a new algorithm called VSR-LKH-3 that combines the
variable strategy reinforcement learning method with LKH-3 for typical TSP
variants, including the TSP with time windows (TSPTW) and Colored TSP (CTSP).
The proposed algorithms replace the inflexible traversal operations in LKH and
LKH-3 and let the algorithms learn to make a choice at each search step by
reinforcement learning. Both LKH and LKH-3, with either the α-measure or
POPMUSIC, can be significantly improved by our methods. Extensive experiments
on 236 widely-used TSP benchmarks with up to 85,900 cities demonstrate the
excellent performance of VSR-LKH. VSR-LKH-3 also significantly outperforms the
state-of-the-art heuristics for TSPTW and CTSP.
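The core idea of replacing a fixed traversal order with a learned choice can be sketched generically (this is not the VSR-LKH implementation; in the actual algorithm the states, rewards, and actions are defined on k-opt moves over the candidate sets, and the names below are hypothetical):

```python
import random

def choose_city(q_values, candidates, epsilon=0.1):
    """Epsilon-greedy selection over a candidate set: with probability
    epsilon explore a random candidate, otherwise exploit the candidate
    city with the highest learned Q-value."""
    if random.random() < epsilon:
        return random.choice(candidates)
    return max(candidates, key=lambda c: q_values.get(c, 0.0))

def q_update(q_values, city, reward, next_best, alpha=0.1, gamma=0.9):
    """One Q-learning step: Q(c) ← Q(c) + α (r + γ·maxQ' − Q(c))."""
    old = q_values.get(city, 0.0)
    q_values[city] = old + alpha * (reward + gamma * next_best - old)

q = {}
q_update(q, 7, reward=2.0, next_best=0.0)
print(q[7])  # 0.2
```

The "variable strategy" aspect of VSR-LKH switches among Q-learning, Sarsa, and Monte Carlo updates during search; the snippet shows only the Q-learning form.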
On the Deficiencies of Some Differential-Difference Polynomials
The characteristic functions of differential-difference polynomials are investigated; the result can be viewed as a differential-difference analogue of the classic Valiron–Mokhon'ko theorem in some sense, and it is applied to investigate the deficiencies of some homogeneous and nonhomogeneous differential-difference polynomials. Some special differential-difference polynomials are also investigated, and the results on their value distribution can be viewed as differential-difference analogues of classic results of Hayman and Yang. Examples are given at the end of the paper to illustrate the results.