Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a Finite Horizon
We explore reinforcement learning methods for finding the optimal policy in
the linear quadratic regulator (LQR) problem. In particular, we consider the
convergence of policy gradient methods in the setting of known and unknown
parameters. We are able to produce a global linear convergence guarantee for
this approach in the setting of finite time horizon and stochastic state
dynamics under weak assumptions. The convergence of a projected policy gradient
method is also established in order to handle problems with constraints. We
illustrate the performance of the algorithm with two examples. The first
example is the optimal liquidation of a holding in an asset. We show results
for the case where we assume a model for the underlying dynamics and where we
apply the method to the data directly. The empirical evidence suggests that the
policy gradient method can learn the global optimal solution for a larger class
of stochastic systems containing the LQR framework and that it is more robust
with respect to model mis-specification when compared to a model-based
approach. The second example is an LQR system in a higher dimensional setting
with synthetic data.
Comment: 49 pages, 9 figures
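As a rough illustration of the idea in the abstract above, the following sketch runs exact (known-parameter) policy gradient descent on a scalar finite-horizon LQR problem with additive noise and compares the learned feedback gains with the Riccati solution. It is a minimal sketch under stated assumptions, not the authors' algorithm: the scalar dynamics x_{t+1} = a x_t + b u_t + w_t, the policy u_t = -K_t x_t, the function names, the parameter values, the step size, and the iteration count are all hypothetical choices made for this example.

```python
import numpy as np


def riccati_gains(a, b, q, r, q_T, T):
    """Optimal gains K_t for u_t = -K_t x_t, from the backward Riccati recursion."""
    P = q_T
    K = np.zeros(T)
    for t in reversed(range(T)):
        K[t] = a * b * P / (r + b * b * P)
        P = q + r * K[t] ** 2 + (a - b * K[t]) ** 2 * P
    return K


def exact_gradient(K, a, b, q, r, q_T, var0, noise_var):
    """Gradient of the expected cost with respect to each gain K_t (model known)."""
    T = len(K)
    # Backward pass: cost-to-go coefficients P_t of the current policy.
    P = np.zeros(T + 1)
    P[T] = q_T
    for t in reversed(range(T)):
        P[t] = q + r * K[t] ** 2 + (a - b * K[t]) ** 2 * P[t + 1]
    # Forward pass: second moments S_t = E[x_t^2] under the current policy.
    S = np.zeros(T)
    S[0] = var0
    for t in range(T - 1):
        S[t + 1] = (a - b * K[t]) ** 2 * S[t] + noise_var
    # dC/dK_t = 2 * E_t * S_t, with E_t = (r + b^2 P_{t+1}) K_t - a b P_{t+1}.
    E = (r + b * b * P[1:]) * K - a * b * P[1:]
    return 2.0 * E * S


def policy_gradient(a, b, q, r, q_T, T, var0, noise_var, lr=0.02, steps=5000):
    """Plain gradient descent on the gains, started from K = 0 (assumed settings)."""
    K = np.zeros(T)
    for _ in range(steps):
        K = K - lr * exact_gradient(K, a, b, q, r, q_T, var0, noise_var)
    return K


if __name__ == "__main__":
    params = dict(a=1.0, b=1.0, q=1.0, r=1.0, q_T=1.0, T=5)
    K_pg = policy_gradient(var0=1.0, noise_var=0.25, **params)
    print("policy gradient gains:", K_pg)
    print("Riccati gains:        ", riccati_gains(**params))
```

Because the second moments S_t are strictly positive, the gradient vanishes only where every E_t is zero, which is exactly the Riccati gain. In this toy case the only stationary point is therefore the global optimum.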
Policy gradient methods find the Nash equilibrium in N-player general-sum linear-quadratic games
We consider a general-sum N-player linear-quadratic game with stochastic dynamics over a finite horizon and prove the global convergence of the natural policy gradient method to the Nash equilibrium. To prove convergence of the method, we require a certain amount of noise in the system. We give a condition, essentially a lower bound on the covariance of the noise in terms of the model parameters, that guarantees convergence. We illustrate our results with numerical experiments, which show that even in situations where the policy gradient method may not converge in the deterministic setting, the addition of noise leads to convergence.
Photo-Induced Depolymerisation: Recent Advances and Future Challenges
In response to the growing environmental problems caused by the use of nondegradable polymers in many fields (for example, packaging, building, and clothing), tremendous efforts have been made to explore photodegradable materials to alleviate the increase in plastic pollution. Photodegradable materials exploit the significant advantages of light, such as its abundance, its safety, and the ease with which its intensity and wavelength can be tuned. In particular, photo-induced depolymerisation, which can enable polymers to degrade to their original monomers or to small molecules under certain photoirradiation conditions, has received increasing attention. Most importantly, the molecules or monomers obtained via photo-induced depolymerisation can be conveniently recycled or further transformed into other high-value-added products, which is of great benefit for environmental protection. This Review summarizes recent advances in the growing field of photo-induced depolymerisation and also considers future challenges that must be addressed. It aims to encourage new researchers to enter this flourishing area and presents a brief guide to the field.
Correlation model between mesostructure and gradation of asphalt mixture based on statistical method
Asphalt mixture has complex gradation and mesostructure. Accurate prediction of the relationship between gradation and mesostructure is of great significance for establishing mesostructure numerical simulation models and for image-based gradation detection. In this paper, featurization, stepwise regression, and econometric hypothesis tests are used to establish the prediction models. First, asphalt mixtures with 64 kinds of gradation are scanned by computed tomography (CT) to obtain mesostructure images. Then a series of mesostructure parameters for voids and aggregates is proposed. On this basis, the relationship model between gradation and mesostructure is established and verified by featurization and statistical modeling. The results show that, for predicting the passing percentage of the 4.75 mm sieve and the mean value of the average distance between aggregate centroids for 9.5–4.75 mm aggregates, the prediction error of the passing percentage is acceptable. This illustrates that the relationship model between gradation and mesostructure established by the statistical method is effective, and that it is significant for material design and testing under big-data conditions in the future.
Application of multidisciplinary analysis to gene expression.
Molecular analysis of cancer, at the genomic level, could lead to individualized patient diagnostics and treatments. The developments to follow will signal a significant paradigm shift in the clinical management of human cancer. Despite our initial hopes, however, it seems that simple analysis of microarray data cannot elucidate clinically significant gene functions and mechanisms. Extracting biological information from microarray data requires a complicated path involving multidisciplinary teams of biomedical researchers, computer scientists, mathematicians, statisticians, and computational linguists. The integration of the diverse outputs of each team is the limiting factor in the progress to discover candidate genes and pathways associated with the molecular biology of cancer. Specifically, one must deal with sets of significant genes identified by each method and extract whatever useful information may be found by comparing these different gene lists. Here we present our experience with such comparisons, and share methods developed in the analysis of an infant leukemia cohort studied on Affymetrix HG-U95A arrays. In particular, spatial gene clustering, hyper-dimensional projections, and computational linguistics were used to compare different gene lists. In spatial gene clustering, different gene lists are grouped together and visualized on a three-dimensional expression map, where genes with similar expressions are co-located. In another approach, projections from gene expression space onto a sphere clarify how groups of genes can jointly have more predictive power than groups of individually selected genes. Finally, online literature is automatically rearranged to present information about genes common to multiple groups, or to contrast the differences between the lists. The combination of these methods has improved our understanding of infant leukemia. 
While the complicated reality of the biology dashed our initial, optimistic hopes for simple answers from microarrays, we have made progress by combining very different analytic approaches.
Investigation of the pulse dynamics in fiber lasers
Fiber lasers are now used in many fields, including material processing,
telecommunications, spectroscopy, medicine, and directed-energy weapons [1].
One of the most commonly used fiber laser designs is the mode-locked fiber laser,
which can generate ultra-short pulses through active or passive mode locking. In this
paper we use a mode-locked fiber laser based on the nonlinear polarization rotation
technique for numerical simulations and experimental measurements.
Ultra-short pulses generated in optical fiber have attracted much attention
in recent years. The self-similar pulse is one kind of ultra-short pulse. It is a
solution of the nonlinear Schrödinger equation. In our numerical simulation we
use the split-step method to solve the nonlinear Schrödinger equation and generate a
pulse with the self-similarity property.
In the theory part we introduce the basic parameters of nonlinear fiber optics.
In the simulation part we use Matlab to simulate pulse propagation inside
the mode-locked fiber laser. From the numerical results we identify some
specific characteristics of the self-similar pulse. Lastly, we use experiments to
verify and measure the spectrum and waveform of a mode-locked fiber laser.
Bachelor of Engineering
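The split-step method mentioned in the abstract above can be sketched as follows. This is a minimal illustration in Python (the abstract itself uses Matlab), and it rests on assumptions that are not taken from the source: the normalized lossless equation i u_z + (1/2) u_tt + |u|^2 u = 0, a symmetric (half-linear, nonlinear, half-linear) splitting, and a fundamental soliton sech(t) as the test input rather than a self-similar pulse in a laser cavity with gain. The function name and grid parameters are hypothetical.

```python
import numpy as np


def split_step_nlse(u0, dt, dz, n_steps):
    """Propagate u0 over n_steps * dz with the symmetric split-step Fourier method.

    Assumed model: i u_z + 0.5 u_tt + |u|^2 u = 0 on a periodic time grid.
    """
    # Angular frequencies matching the FFT ordering of the time grid.
    k = 2.0 * np.pi * np.fft.fftfreq(u0.size, d=dt)
    # Linear (dispersion) operator over half a step: exp(-i k^2/2 * dz/2).
    half_linear = np.exp(-0.5j * k ** 2 * (dz / 2.0))
    u = u0.astype(complex)
    for _ in range(n_steps):
        u = np.fft.ifft(half_linear * np.fft.fft(u))   # half dispersion step
        u = u * np.exp(1j * np.abs(u) ** 2 * dz)       # full nonlinear step
        u = np.fft.ifft(half_linear * np.fft.fft(u))   # half dispersion step
    return u


if __name__ == "__main__":
    t = np.linspace(-20.0, 20.0, 1024, endpoint=False)
    dt = t[1] - t[0]
    u0 = 1.0 / np.cosh(t)  # fundamental soliton of the assumed equation
    u = split_step_nlse(u0, dt, dz=0.01, n_steps=200)
    print("max change in |u|:", np.max(np.abs(np.abs(u) - np.abs(u0))))
```

Both sub-steps only multiply by unit-modulus phase factors, so the discrete pulse energy is conserved up to rounding. That makes a convenient sanity check before adding gain, loss, or other cavity elements.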
Study on the Parametric of Polynomial Motion Law of the Symmetrical Cam Follower Lifting Profile
To address the problem of excessive maximum quasi-velocity and quasi-acceleration in polynomial motion laws that consider only the boundary conditions, the lift profile of the cam follower is divided into two single convex spline curves. By studying the quasi-displacement curve, its first-order mapping (the quasi-velocity graph), and its second-order mapping (the quasi-acceleration graph), the sensitive parameters that affect the acceleration curve, and the control point at the spline boundary that can control it, are identified. A combined symbolic-graphic method is used to establish a parametric model of the polynomial motion law for the lift profile of the symmetrical cam follower, and the optimal standard displacement equations under different conditions are obtained from this model. Taking the optimal standard displacement equation as the motion law of the cam follower lift curve, designers can obtain a cam curve that meets the requirement of minimum follower lift acceleration knowing only the pushing angle and the lift range.
Recent advances in reinforcement learning in finance
The rapid changes in the finance industry due to the increasing amount of data have revolutionized the techniques for data processing and data analysis and brought new theoretical and
computational challenges. In contrast to classical stochastic control theory and other analytical approaches for solving financial decision-making problems that rely heavily on model assumptions,
new developments from reinforcement learning (RL) are able to make full use of the large amount
of financial data with fewer model assumptions and to improve decisions in complex financial environments. This survey paper aims to review the recent developments and use of RL approaches in
finance. We give an introduction to Markov decision processes, which are the setting for many of the
commonly used RL approaches. Various algorithms are then introduced with a focus on value-based
and policy-based methods that do not require any model assumptions. Connections are made with
neural networks to extend the framework to encompass deep RL algorithms. We then discuss in
detail the application of these RL algorithms in a variety of decision-making problems in finance,
including optimal execution, portfolio optimization, option pricing and hedging, market making,
smart order routing, and robo-advising. Our survey concludes by pointing out a few possible future
directions for research.