
    Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a Finite Horizon

    We explore reinforcement learning methods for finding the optimal policy in the linear quadratic regulator (LQR) problem. In particular, we consider the convergence of policy gradient methods in the settings of known and unknown parameters. We establish a global linear convergence guarantee for this approach over a finite time horizon with stochastic state dynamics, under weak assumptions. We also establish the convergence of a projected policy gradient method, which handles problems with constraints. We illustrate the performance of the algorithm with two examples. The first is the optimal liquidation of a holding in an asset; we show results both when a model for the underlying dynamics is assumed and when the method is applied directly to data. The empirical evidence suggests that the policy gradient method can learn the globally optimal solution for a larger class of stochastic systems containing the LQR framework, and that it is more robust to model misspecification than a model-based approach. The second example is an LQR system in a higher-dimensional setting with synthetic data.
    Comment: 49 pages, 9 figures
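    To make the policy gradient idea concrete, here is a minimal Python sketch for a scalar, known-parameter version of the problem. Everything in it is an illustrative assumption rather than the paper's algorithm: dynamics x_{t+1} = a x_t + b u_t + w_t, time-varying linear feedback u_t = -k_t x_t, quadratic running and terminal costs, and hypothetical helper names (`riccati_gains`, `cost_and_grad`, `policy_gradient`). The expected cost depends on the gains only through the second moments s_t = E[x_t^2], so one forward pass over s_t and one backward pass over the cost-to-go give the exact gradient, and plain gradient descent should recover the Riccati gains.

```python
"""Sketch: exact policy gradient for a scalar finite-horizon noisy LQR.

Assumed model (illustrative, not taken from the paper):
    x_{t+1} = a*x_t + b*u_t + w_t,  E[w_t] = 0, Var[w_t] = sigma2
    u_t = -k_t * x_t                (time-varying linear feedback)
    J(k) = E[sum_t (q*x_t^2 + r*u_t^2) + q_T*x_T^2],  E[x_0^2] = s0
"""


def riccati_gains(a, b, q, r, q_T, T):
    """Optimal gains from the backward Riccati recursion (benchmark)."""
    P = q_T
    gains = [0.0] * T
    for t in reversed(range(T)):
        k = a * b * P / (r + b * b * P)
        gains[t] = k
        P = q + r * k * k + (a - b * k) ** 2 * P
    return gains


def cost_and_grad(gains, a, b, q, r, q_T, sigma2, s0):
    """Expected cost J(k) and its exact gradient dJ/dk_t."""
    T = len(gains)
    # Forward pass: second moments s_t = E[x_t^2] under the current policy.
    s = [s0]
    for k in gains:
        s.append((a - b * k) ** 2 * s[-1] + sigma2)
    cost = sum((q + r * k * k) * s[t] for t, k in enumerate(gains)) + q_T * s[T]
    # Backward pass: V holds dJ/ds_{t+1}, the cost-to-go coefficient.
    V = q_T
    grad = [0.0] * T
    for t in reversed(range(T)):
        k = gains[t]
        grad[t] = 2.0 * s[t] * (r * k - b * (a - b * k) * V)
        V = q + r * k * k + (a - b * k) ** 2 * V
    return cost, grad


def policy_gradient(a, b, q, r, q_T, sigma2, s0, T, lr=0.05, iters=5000):
    """Plain gradient descent on the gains, starting from k_t = 0."""
    gains = [0.0] * T
    for _ in range(iters):
        _, grad = cost_and_grad(gains, a, b, q, r, q_T, sigma2, s0)
        gains = [k - lr * g for k, g in zip(gains, grad)]
    return gains


params = dict(a=1.0, b=0.5, q=1.0, r=1.0, q_T=1.0)
k_pg = policy_gradient(**params, sigma2=0.1, s0=1.0, T=5)
k_star = riccati_gains(**params, T=5)
print("policy gradient:", [round(k, 4) for k in k_pg])
print("Riccati:        ", [round(k, 4) for k in k_star])
```

    In this scalar sketch the noise variance does not change the optimal gains, but it keeps every s_t strictly positive, so no stage's gradient vanishes identically.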

    Policy gradient methods find the Nash equilibrium in N-player general-sum linear-quadratic games

    We consider a general-sum N-player linear-quadratic game with stochastic dynamics over a finite horizon and prove the global convergence of the natural policy gradient method to the Nash equilibrium. The proof requires a certain amount of noise in the system: we give a condition, essentially a lower bound on the noise covariance in terms of the model parameters, that guarantees convergence. We illustrate our results with numerical experiments showing that, even in situations where the policy gradient method may not converge in the deterministic setting, adding noise leads to convergence.
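    The natural policy gradient update can be sketched in the same scalar style for two players. This is an assumption-laden illustration, not the paper's multi-dimensional setting: player i uses u_i = -k_{i,t} x_t, the state follows x_{t+1} = a x_t + b_1 u_1 + b_2 u_2 + w_t, and each player pays its own state and control costs. In the scalar case the natural gradient is the ordinary gradient divided by the state second moment s_t = E[x_t^2], which stays strictly positive here. The helper names are hypothetical; the benchmark solves each stage's pair of first-order conditions for the feedback Nash gains.

```python
"""Sketch: natural policy gradient in a scalar two-player LQ game (illustrative only)."""


def nash_gains(a, b, q, r, q_T, T):
    """Feedback Nash gains from the backward recursion (benchmark).

    b, q, r, q_T are (player 1, player 2) tuples; player i plays u_i = -K[i][t] * x_t.
    """
    V = list(q_T)
    K = [[0.0] * T for _ in range(2)]
    for t in reversed(range(T)):
        # First-order conditions, linear in (k1, k2):
        #   (r_i + b_i^2 V_i) k_i + b_i b_j V_i k_j = a b_i V_i
        m11, m12 = r[0] + b[0] ** 2 * V[0], b[0] * b[1] * V[0]
        m21, m22 = b[0] * b[1] * V[1], r[1] + b[1] ** 2 * V[1]
        c1, c2 = a * b[0] * V[0], a * b[1] * V[1]
        det = m11 * m22 - m12 * m21
        K[0][t] = (c1 * m22 - m12 * c2) / det
        K[1][t] = (m11 * c2 - m21 * c1) / det
        closed = a - b[0] * K[0][t] - b[1] * K[1][t]
        V = [q[i] + r[i] * K[i][t] ** 2 + closed ** 2 * V[i] for i in range(2)]
    return K


def natural_gradients(K, a, b, q, r, q_T, sigma2, s0):
    """Each player's gradient of its own cost, preconditioned by s_t = E[x_t^2]."""
    T = len(K[0])
    s = [s0]
    for t in range(T):
        closed = a - b[0] * K[0][t] - b[1] * K[1][t]
        s.append(closed ** 2 * s[-1] + sigma2)
    grads = [[0.0] * T for _ in range(2)]
    for i in range(2):
        V = q_T[i]  # player i's cost-to-go coefficient at t + 1
        for t in reversed(range(T)):
            closed = a - b[0] * K[0][t] - b[1] * K[1][t]
            vanilla = 2.0 * s[t] * (r[i] * K[i][t] - b[i] * closed * V)
            grads[i][t] = vanilla / s[t]  # natural gradient in the scalar case
            V = q[i] + r[i] * K[i][t] ** 2 + closed ** 2 * V
    return grads


def natural_policy_gradient(a, b, q, r, q_T, sigma2, s0, T, lr=0.1, iters=2000):
    K = [[0.0] * T for _ in range(2)]
    for _ in range(iters):
        G = natural_gradients(K, a, b, q, r, q_T, sigma2, s0)
        # Simultaneous play: both players update at once on their own costs.
        K = [[k - lr * g for k, g in zip(K[i], G[i])] for i in range(2)]
    return K


game = dict(a=1.0, b=(0.5, 0.3), q=(1.0, 0.5), r=(1.0, 1.0), q_T=(1.0, 0.5))
K_npg = natural_policy_gradient(**game, sigma2=0.1, s0=1.0, T=4)
K_nash = nash_gains(**game, T=4)
print("NPG: ", [[round(k, 4) for k in row] for row in K_npg])
print("Nash:", [[round(k, 4) for k in row] for row in K_nash])
```

    In this weakly coupled example the simultaneous updates settle on the benchmark gains; that is a property of the chosen parameters, not a general guarantee for games.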

    Photo-Induced Depolymerisation: Recent Advances and Future Challenges

    Facing growing environmental problems caused by the use of nondegradable polymers in many fields (for example, packaging, building, and clothing), researchers have made tremendous efforts to develop photodegradable materials that could curb the growth of plastic pollution. Photodegradable materials can exploit the significant advantages of light, such as its abundance, its safety, and the ease with which its intensity and wavelength can be tuned. In particular, photo-induced depolymerisation, which can degrade polymers to their original monomers or to small molecules under suitable irradiation conditions, has received increasing attention. Most importantly, the monomers or small molecules obtained in this way can be conveniently recycled or further transformed into other high-value-added products, which greatly benefits environmental protection. This Review summarizes recent advances in the growing field of photo-induced depolymerisation and considers the future challenges that must be addressed. It aims to encourage new researchers to enter this flourishing area and offers a brief guide to the field.

    Correlation model between mesostructure and gradation of asphalt mixture based on statistical method

    Asphalt mixture has a complex gradation and mesostructure. Accurately predicting the relationship between gradation and mesostructure is important both for building numerical simulation models of the mesostructure and for image-based gradation detection. In this paper, featurization, stepwise regression and econometric hypothesis tests are used to establish the prediction models. First, asphalt mixtures with 64 different gradations are scanned by computed tomography (CT) to obtain mesostructure images; then a series of mesostructure parameters describing voids and aggregates is proposed. On this basis, a model relating gradation to mesostructure is established and verified by featurization and statistical modeling. The results show that, for predicting the passing percentage of the 4.75 mm sieve and the mean of the average distance between aggregate centroids for 9.5–4.75 mm aggregates, the prediction error of the passing percentage is acceptable. This indicates that the statistically established model relating gradation to mesostructure is effective and that it is significant for material design and testing under future big-data conditions.
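    Forward stepwise regression, one of the tools named above, can be sketched as follows. This is a simplified stand-in under stated assumptions: the data are synthetic with hypothetical feature names (not the paper's CT-derived parameters), the fit is ordinary least squares with an intercept, and the stopping rule (stop once adding a feature reduces the residual sum of squares by less than a fraction `min_gain`) is a hypothetical substitute for the significance tests a full stepwise procedure would use.

```python
import random


def ols_rss(columns, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus `columns`."""
    n = len(y)
    X = [[1.0] * n] + columns
    p = len(X)
    # Normal equations (X^T X) beta = X^T y.
    A = [[sum(u[k] * v[k] for k in range(n)) for v in X] for u in X]
    c = [sum(u[k] * y[k] for k in range(n)) for u in X]
    # Gaussian elimination with partial pivoting.
    for i in range(p):
        piv = max(range(i, p), key=lambda row: abs(A[row][i]))
        A[i], A[piv] = A[piv], A[i]
        c[i], c[piv] = c[piv], c[i]
        for row in range(i + 1, p):
            f = A[row][i] / A[i][i]
            for j in range(i, p):
                A[row][j] -= f * A[i][j]
            c[row] -= f * c[i]
    # Back substitution.
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (c[i] - sum(A[i][j] * beta[j] for j in range(i + 1, p))) / A[i][i]
    fitted = [sum(beta[j] * X[j][k] for j in range(p)) for k in range(n)]
    return sum((yk - fk) ** 2 for yk, fk in zip(y, fitted))


def forward_stepwise(features, y, min_gain=0.2):
    """Greedy forward selection: add the feature that most reduces the RSS,
    stopping when the relative reduction drops below `min_gain` (hypothetical rule)."""
    selected = []
    current = ols_rss([], y)
    while len(selected) < len(features):
        best_name, best_rss = None, current
        for name, col in features.items():
            if name in selected:
                continue
            rss = ols_rss([features[s] for s in selected] + [col], y)
            if rss < best_rss:
                best_name, best_rss = name, rss
        if best_name is None or current - best_rss < min_gain * current:
            break
        selected.append(best_name)
        current = best_rss
    return selected


rng = random.Random(0)
n = 60
features = {name: [rng.gauss(0.0, 1.0) for _ in range(n)] for name in ("x1", "x2", "x3")}
# Synthetic response: depends on x1 and x2 only; x3 is pure noise.
y = [0.5 + 2.0 * a - 1.5 * b + rng.gauss(0.0, 0.1)
     for a, b in zip(features["x1"], features["x2"])]
selected = forward_stepwise(features, y)
print("selected features:", selected)
```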

    Investigation of the pulse dynamics in fiber lasers

    Fiber lasers are now used in many fields, including material processing, telecommunications, spectroscopy, medicine and directed-energy weapons [1]. One of the most commonly used fiber laser designs is the mode-locked fiber laser, which can generate ultra-short pulses through active or passive mode locking. In this paper we use a mode-locked fiber laser based on nonlinear polarization rotation for numerical simulations and experimental measurements. Ultra-short pulses generated in optical fiber have attracted much attention in recent years. A self-similar pulse is one kind of ultra-short pulse; it is a solution of the nonlinear Schrödinger equation. In our numerical simulation we use the split-step method to solve the nonlinear Schrödinger equation and generate a pulse with the self-similarity property. In the theory part we introduce some basic parameters and concepts of nonlinear fiber optics. In the simulation part we use Matlab to simulate pulse propagation inside the mode-locked fiber laser, and from the numerical results we identify some specific characteristics of the self-similar pulse. Lastly, we use experiments to verify and measure the spectrum and waveform of a mode-locked fiber laser.
    Bachelor of Engineering
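    The split-step method mentioned above alternates between the dispersive part of the nonlinear Schrödinger equation, solved exactly in the Fourier domain, and the nonlinear part, solved exactly in the time domain. The following Python sketch (the thesis itself used Matlab) assumes the normalized equation i A_z + (1/2) A_TT + |A|^2 A = 0 and uses a symmetric split with a small hand-written radix-2 FFT. It does not reproduce the self-similar pulse, which needs gain and normal dispersion; instead it checks the solver on the fundamental soliton A = sech(T), whose shape should be preserved during propagation.

```python
import cmath
import math


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1.0 if inverse else -1.0
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(x):
    n = len(x)
    return [v / n for v in fft(x, inverse=True)]


def split_step(A, dt, dz, steps):
    """Symmetric split-step Fourier solver for i A_z + A_TT / 2 + |A|^2 A = 0."""
    n = len(A)
    omega = [2.0 * math.pi * (k if k < n // 2 else k - n) / (n * dt) for k in range(n)]
    # Exact dispersion propagator over dz/2: A_hat *= exp(-i * omega^2 * (dz/2) / 2).
    half = [cmath.exp(-0.25j * w * w * dz) for w in omega]
    for _ in range(steps):
        A = ifft([h * a for h, a in zip(half, fft(A))])  # half dispersion step
        A = [a * cmath.exp(1j * abs(a) ** 2 * dz) for a in A]  # full nonlinear step
        A = ifft([h * a for h, a in zip(half, fft(A))])  # half dispersion step
    return A


n, t_max = 128, 16.0
dt = 2 * t_max / n
T = [-t_max + j * dt for j in range(n)]
A0 = [complex(1.0 / math.cosh(t)) for t in T]  # fundamental soliton at z = 0
A1 = split_step(A0, dt, dz=0.01, steps=100)  # propagate to z = 1
err = max(abs(abs(a1) - abs(a0)) for a0, a1 in zip(A0, A1))
print(f"max change in |A| after z = 1: {err:.2e}")
```

    Both sub-steps are unitary, so the pulse energy sum(|A|^2) is conserved up to rounding, which is a useful sanity check before adding gain or other terms.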

    Study on the Parametric of Polynomial Motion Law of the Symmetrical Cam Follower Lifting Profile

    To address the large peak quasi-velocity and quasi-acceleration of polynomial motion laws that consider only the boundary conditions, the lift profile of the cam follower is divided into two single-convex spline curves. By studying the quasi-displacement curve, its first-order mapped quasi-velocity graph and its second-order mapped quasi-acceleration graph, the sensitive parameters that affect the acceleration curve and the control points at the spline boundary that can control it are identified. A combined symbolic-graphic method is used to establish a parametric model of the polynomial motion law for the lift profile of a symmetrical cam follower, and this model yields the optimal standard displacement equations under different conditions. Taking an optimal standard displacement equation as the motion law of the follower's lift curve, designers can obtain a cam curve that meets the requirement of minimum follower lift acceleration knowing only the pushing angle and the lift range.
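    For reference, the classic 3-4-5 polynomial is a lift law fixed purely by the boundary conditions (zero quasi-velocity and quasi-acceleration at both ends of the lift); we assume it is representative of the boundary-condition-only laws the abstract refers to. With normalized cam angle tau in [0, 1] and lift h, s(tau) = h(10 tau^3 - 15 tau^4 + 6 tau^5). The Python sketch below evaluates its quasi-velocity and quasi-acceleration and locates their peaks, 1.875h and 10h/sqrt(3) (about 5.77h). The function name `motion_345` is hypothetical, and the sketch does not implement the paper's spline-based parametric model.

```python
import math


def motion_345(tau, h=1.0):
    """3-4-5 polynomial lift law: displacement, quasi-velocity ds/dtau and
    quasi-acceleration d2s/dtau2 at normalized cam angle tau in [0, 1]."""
    s = h * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)
    v = h * (30 * tau**2 - 60 * tau**3 + 30 * tau**4)
    a = h * (60 * tau - 180 * tau**2 + 120 * tau**3)
    return s, v, a


# Boundary conditions: s(0) = 0, s(1) = h, with zero v and a at both ends.
for tau in (0.0, 1.0):
    print(tau, motion_345(tau))

grid = [i / 1000 for i in range(1001)]
v_max = max(motion_345(t)[1] for t in grid)
a_max = max(motion_345(t)[2] for t in grid)
print(f"peak quasi-velocity     ~ {v_max:.4f} h (exact 1.875 h)")
print(f"peak quasi-acceleration ~ {a_max:.4f} h (exact 10/sqrt(3) h = {10 / math.sqrt(3):.4f} h)")
```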

    Recent advances in reinforcement learning in finance

    The rapid changes in the finance industry driven by the increasing amount of data have revolutionized data processing and data analysis techniques and brought new theoretical and computational challenges. In contrast to classical stochastic control theory and other analytical approaches to financial decision-making problems, which rely heavily on model assumptions, new developments in reinforcement learning (RL) can make full use of large amounts of financial data with fewer model assumptions and improve decisions in complex financial environments. This survey paper reviews recent developments in, and uses of, RL approaches in finance. We give an introduction to Markov decision processes, which are the setting for many commonly used RL approaches. Various algorithms are then introduced, with a focus on value-based and policy-based methods that do not require any model assumptions. Connections are made with neural networks to extend the framework to deep RL algorithms. We then discuss in detail the application of these RL algorithms to a variety of decision-making problems in finance, including optimal execution, portfolio optimization, option pricing and hedging, market making, smart order routing, and robo-advising. The survey concludes by pointing out a few possible directions for future research.
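    As a minimal example of a value-based method that needs no model of the dynamics, the following Python sketch runs tabular Q-learning on a hypothetical two-state, two-action MDP. The agent only queries the transition table as a black-box simulator, explores with uniformly random actions, and moves Q(s, a) toward r + gamma * max_a' Q(s', a'). All states, rewards and parameter values are illustrative assumptions, not examples from the survey.

```python
import random

# Hypothetical toy MDP: TRANSITIONS[state][action] = (next_state, reward).
TRANSITIONS = {
    0: {0: (0, 1.0), 1: (1, 0.0)},
    1: {0: (1, 2.0), 1: (0, 0.0)},
}


def q_learning(transitions, gamma=0.9, alpha=0.5, steps=20000, seed=0):
    rng = random.Random(seed)
    Q = {s: {a: 0.0 for a in acts} for s, acts in transitions.items()}
    state = 0
    for _ in range(steps):
        action = rng.choice(list(transitions[state]))  # uniform exploration (off-policy)
        next_state, reward = transitions[state][action]  # queried as a black-box simulator
        target = reward + gamma * max(Q[next_state].values())
        Q[state][action] += alpha * (target - Q[state][action])
        state = next_state
    return Q


Q = q_learning(TRANSITIONS)
policy = {s: max(q, key=q.get) for s, q in Q.items()}
print(Q, policy)
```

    With gamma = 0.9 the optimal values are Q(0, 1) = 18 and Q(1, 0) = 20, so the greedy policy moves from state 0 to state 1 and then stays there.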