89 research outputs found

Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a Finite Horizon

Author: Hambly Ben
Xu Renyuan
Yang Huining
Publication date: 01/01/2021

We explore reinforcement learning methods for finding the optimal policy in the linear quadratic regulator (LQR) problem. In particular, we consider the convergence of policy gradient methods in the setting of known and unknown parameters. We are able to produce a global linear convergence guarantee for this approach in the setting of finite time horizon and stochastic state dynamics under weak assumptions. The convergence of a projected policy gradient method is also established in order to handle problems with constraints. We illustrate the performance of the algorithm with two examples. The first example is the optimal liquidation of a holding in an asset. We show results for the case where we assume a model for the underlying dynamics and where we apply the method to the data directly. The empirical evidence suggests that the policy gradient method can learn the global optimal solution for a larger class of stochastic systems containing the LQR framework and that it is more robust with respect to model mis-specification when compared to a model-based approach. The second example is an LQR system in a higher dimensional setting with synthetic data. Comment: 49 pages, 9 figures

arXiv.org e-Print Archive

Oxford University Research Archive

Policy gradient methods find the Nash equilibrium in N-player general-sum linear-quadratic games

Author: Hambly Benjamin
Xu Renyuan
Yang Huining
Publication venue: Journal of Machine Learning Research
Publication date: 01/04/2023

We consider a general-sum N-player linear-quadratic game with stochastic dynamics over a finite horizon and prove the global convergence of the natural policy gradient method to the Nash equilibrium. In order to prove convergence of the method we require a certain amount of noise in the system. We give a condition, essentially a lower bound on the covariance of the noise in terms of the model parameters, in order to guarantee convergence. We illustrate our results with numerical experiments to show that even in situations where the policy gradient method may not converge in the deterministic setting, the addition of noise leads to convergence

Oxford University Research Archive

Photo-Induced Depolymerisation: Recent Advances and Future Challenges

Author: Boyer Cyrille
Chen Tao
Chu Yingying
Li Jingquan
Wang Huining
Xu Jiangtao
Publication venue: WILEY-V C H VERLAG GMBH
Publication date: 11/09/2019

Facing the growing environmental issues provoked by the use of nondegradable polymers in many fields (for example, packing, building, and clothing), tremendous efforts have been made to explore photodegradable materials to alleviate the increase in plastic pollution. Photodegradable materials would exploit significant advantages presented by the use of light, such as abundance, safety and the ability to easily tune intensity and wavelength. In particular, photo-induced depolymerisation has received increasing attention, which could enable polymers to degrade to their original monomers or small molecules under certain photoirradiation conditions. Most importantly, the obtained molecules or monomers via photo-induced depolymerisation could be conveniently recycled or further transformed to other high-value-added products, which is of great benefit for environmental protection. This Review summarizes recent advances in the growing field of photo-induced depolymerisation and also considers future challenges that must be addressed. It aims to encourage new researchers to enter this flourishing area and presents a brief guide to the field

UNSWorks

Correlation model between mesostructure and gradation of asphalt mixture based on statistical method

Author: Bo Liu
Chao Xing
Dawei Wang
Huining Xu
Kai Zhang
Yiqiu Tan
Publication venue: American Institute of Mathematical Sciences (AIMS)
Publication date: 01/01/2023

Asphalt mixture has complex gradation and mesostructure. Accurate prediction of the relationship between gradation and mesostructure is of great significance for establishing numerical simulation models of mesostructure and for image-based gradation detection. In this paper, featurization, stepwise regression, and econometric hypothesis tests are used to establish the prediction models. First, asphalt mixtures with 64 kinds of gradation are scanned by computed tomography (CT) to obtain mesostructure images; then a series of mesostructure parameters of voids and aggregates are put forward. On this basis, the relationship model between gradation and mesostructure is established and verified by featurization and statistical modeling. The results show that for predicting the passing percentage of the 4.75 mm sieve and the mean value of average distance between aggregate centroids for 9.5–4.75 mm aggregates, the prediction error of passing percentage is acceptable. This illustrates that the relationship model between gradation and mesostructure established by the statistical method is effective, and that it is significant for material design and testing under big-data conditions in the future

Directory of Open Access Journals

Application of multidisciplinary analysis to gene expression.

Author: Andries Erik (University of New Mexico, Albuquerque, NM)
Ar Kerem (University of New Mexico, Albuquerque, NM)
Cowie Jim R. (New Mexico State University, Las Cruces, NM)
Davidson George S.
Fields Chris (New Mexico State University, Las Cruces, NM)
Haaland David Michael
Helman Paul (University of New Mexico, Albuquerque, NM)
Kang Huining (University of New Mexico, Albuquerque, NM)
Martin Shawn Bryan
Mosquera-Caro Monica P. (University of New Mexico, Albuquerque, NM)
Murphy Maurice H. (University of New Mexico, Albuquerque, NM)
Potter Jeffrey (University of New Mexico, Albuquerque, NM)
Sibirtsev Valeriy (New Mexico State University, Las Cruces, NM)
Wang Xuefel (University of New Mexico, Albuquerque, NM)
Willman Cheryl L. (University of New Mexico, Albuquerque, NM)
Xu Yuexian (University of New Mexico, Albuquerque, NM)
Publication venue: Sandia National Laboratories
Publication date: 01/01/2004

Molecular analysis of cancer, at the genomic level, could lead to individualized patient diagnostics and treatments. The developments to follow will signal a significant paradigm shift in the clinical management of human cancer. Despite our initial hopes, however, it seems that simple analysis of microarray data cannot elucidate clinically significant gene functions and mechanisms. Extracting biological information from microarray data requires a complicated path involving multidisciplinary teams of biomedical researchers, computer scientists, mathematicians, statisticians, and computational linguists. The integration of the diverse outputs of each team is the limiting factor in the progress to discover candidate genes and pathways associated with the molecular biology of cancer. Specifically, one must deal with sets of significant genes identified by each method and extract whatever useful information may be found by comparing these different gene lists. Here we present our experience with such comparisons, and share methods developed in the analysis of an infant leukemia cohort studied on Affymetrix HG-U95A arrays. In particular, spatial gene clustering, hyper-dimensional projections, and computational linguistics were used to compare different gene lists. In spatial gene clustering, different gene lists are grouped together and visualized on a three-dimensional expression map, where genes with similar expressions are co-located. In another approach, projections from gene expression space onto a sphere clarify how groups of genes can jointly have more predictive power than groups of individually selected genes. Finally, online literature is automatically rearranged to present information about genes common to multiple groups, or to contrast the differences between the lists. The combination of these methods has improved our understanding of infant leukemia. 
While the complicated reality of the biology dashed our initial, optimistic hopes for simple answers from microarrays, we have made progress by combining very different analytic approaches

UNT Digital Library

Investigation of the pulse dynamics in fiber lasers

Author: Xu Huining
Publication date: 01/01/2014

Fiber lasers are now used in many fields, including material processing, telecommunications, spectroscopy, medicine and directed energy weapons [1]. One of the most commonly used fiber laser designs is the mode-locked fiber laser, which can generate ultra-short pulses via active or passive mode locking. In this paper we use a mode-locked fiber laser with the nonlinear polarization rotation technique to carry out numerical simulations and experimental measurements. Ultra-short pulses generated in optical fiber have attracted much attention in recent years. The self-similar pulse is a kind of ultra-short pulse, and it is a solution of the nonlinear Schrödinger equation. In our numerical simulation we use the split-step method to solve the nonlinear Schrödinger equation and generate a pulse with the self-similarity property. In the theory part we introduce some basic concepts and parameters of nonlinear fiber optics. In the simulation part we use Matlab to simulate pulse transmission inside the mode-locked fiber laser. From the numerical results we can identify some specific characteristics of the self-similar pulse. Lastly, we use experiments to verify and measure the spectrum and waveform of a mode-locked fiber laser. Bachelor of Engineering

DR-NTU (Digital Repository of NTU)

Study on the Parametric of Polynomial Motion Law of the Symmetrical Cam Follower Lifting Profile

Author: Wang Huining
Xu Fang
Zhou Zhigang
Publication venue: Editorial Office of Journal of Mechanical Transmission
Publication date: 01/01/2016

To address the problem of the large maximum quasi-velocity and quasi-acceleration of a polynomial motion law that considers only the boundary conditions, the lift profile of the cam follower is divided into two single convex spline curves. The sensitive parameters that affect the acceleration curve, and the control point at the spline boundary that can control the acceleration curve, are found by studying the quasi-displacement curve, the one-mapping quasi-velocity graph and the quadratic-mapping quasi-acceleration graph. A symbolic-graphic combination method is used to establish the parametric model of the polynomial motion law of the lift profile of the symmetrical cam follower, and the optimal standard displacement equations under different conditions are obtained from this model. Taking the optimal standard displacement equation as the motion law of the cam follower lift curve, designers can obtain a cam curve that meets the requirement of minimum follower lift acceleration knowing only the pushing angle and lift range

Directory of Open Access Journals

Equilibrium customers strategies in the Markovian working vacation queue with setup times

Author: Huining Wang
Shuo Wang
Xiuli Xu
Publication venue: Inderscience Publishers
Publication date: 01/01/2019

Crossref

Recent advances in reinforcement learning in finance

Author: Hambly Benjamin
Xu Renyuan
Yang Huining
Publication venue: 'Royal College of Obstetricians & Gynaecologists (RCOG)'
Publication date: 27/02/2023

The rapid changes in the finance industry due to the increasing amount of data have revolutionized the techniques on data processing and data analysis and brought new theoretical and computational challenges. In contrast to classical stochastic control theory and other analytical approaches for solving financial decision-making problems that heavily rely on model assumptions, new developments from reinforcement learning (RL) are able to make full use of the large amount of financial data with fewer model assumptions and to improve decisions in complex financial environments. This survey paper aims to review the recent developments and use of RL approaches in finance. We give an introduction to Markov decision processes, which is the setting for many of the commonly used RL approaches. Various algorithms are then introduced with a focus on value-based and policy-based methods that do not require any model assumptions. Connections are made with neural networks to extend the framework to encompass deep RL algorithms. We then discuss in detail the application of these RL algorithms in a variety of decision-making problems in finance, including optimal execution, portfolio optimization, option pricing and hedging, market making, smart order routing, and robo-advising. Our survey concludes by pointing out a few possible future directions for research

arXiv.org e-Print Archive

Oxford University Research Archive

Effects of freeze-thaw cycles on fatigue performance of asphalt mixture and development of fatigue-freeze-thaw (FFT) uniform equation

Author: Fan Zepeng
Tan Yiqiu
Xiao Jiazhe
Xu Huining
Publication venue: Elsevier BV
Publication date: 01/01/2020

Publikationsserver der RWTH Aachen University