1,003 research outputs found

    Performance modelling for system-level design

    xii+208 pp.; 24c

    Shape-constrained Estimation of Value Functions

    We present a fully nonparametric method to estimate the value function, via simulation, in the context of expected infinite-horizon discounted rewards for Markov chains. Estimating such value functions plays an important role in approximate dynamic programming and applied probability in general. We incorporate "soft information" into the estimation algorithm, such as knowledge of convexity, monotonicity, or Lipschitz constants. In the presence of such information, a nonparametric estimator for the value function can be computed that is provably consistent as the simulated time horizon tends to infinity. As an application, we implement our method on price tolling agreement contracts in energy markets.
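The idea of combining simulated returns with a shape constraint can be illustrated with a minimal sketch. This is not the estimator from the paper: it assumes a toy reflecting random walk, a reward equal to the state index, a truncated horizon, and monotonicity as the only "soft information", which is enforced by an ordinary least-squares isotonic projection (pool adjacent violators). All function names and parameters are hypothetical.

```python
import random


def simulate_return(start, step, reward, gamma, horizon, rng):
    """One truncated discounted return of the chain started at `start`."""
    state, total, discount = start, 0.0, 1.0
    for _ in range(horizon):
        total += discount * reward(state)
        discount *= gamma
        state = step(state, rng)
    return total


def isotonic_fit(values):
    """Least-squares projection onto nondecreasing sequences (pool adjacent violators)."""
    blocks = []  # each block stores [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block mean exceeds the current one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def estimate_monotone_value(n_states=6, gamma=0.9, horizon=200, n_paths=50, seed=0):
    """Raw Monte Carlo value estimates per state and their monotone projection."""
    rng = random.Random(seed)

    def step(s, rng):
        # Assumed toy dynamics: lazy random walk reflected at both ends.
        return min(max(s + rng.choice([-1, 0, 1]), 0), n_states - 1)

    def reward(s):
        # Assumed toy reward: increasing in the state, so the true value is monotone.
        return float(s)

    raw = [
        sum(simulate_return(s, step, reward, gamma, horizon, rng)
            for _ in range(n_paths)) / n_paths
        for s in range(n_states)
    ]
    return raw, isotonic_fit(raw)
```

Calling `estimate_monotone_value()` returns the noisy per-state averages together with their nondecreasing projection; the projection never changes the total of the estimates, it only pools neighbouring states whose raw estimates violate the assumed ordering.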

    Automatic regenerative simulation via non-reversible simulated tempering

    Simulated Tempering (ST) is an MCMC algorithm for complex target distributions that operates on a path between the target and a more amenable reference distribution. Crucially, if the reference enables i.i.d. sampling, ST is regenerative and can be parallelized across independent tours. However, the difficulty of tuning ST has hindered its widespread adoption. In this work, we develop a simple nonreversible ST (NRST) algorithm, a general theoretical analysis of ST, and an automated tuning procedure for ST. A core contribution that arises from the analysis is a novel performance metric -- Tour Effectiveness (TE) -- that controls the asymptotic variance of estimates from ST for bounded test functions. We use the TE to show that NRST dominates its reversible counterpart. We then develop an automated tuning procedure for NRST algorithms that targets the TE while minimizing computational cost. This procedure enables straightforward integration of NRST into existing probabilistic programming languages. We provide extensive experimental evidence that our tuning scheme improves the performance and robustness of NRST algorithms on a diverse set of probabilistic models.

    On Reward Structures of Markov Decision Processes

    A Markov decision process can be parameterized by a transition kernel and a reward function. Both play essential roles in the study of reinforcement learning as evidenced by their presence in the Bellman equations. In our inquiry of various kinds of "costs" associated with reinforcement learning inspired by the demands in robotic applications, rewards are central to understanding the structure of a Markov decision process and reward-centric notions can elucidate important concepts in reinforcement learning. Specifically, we study the sample complexity of policy evaluation and develop a novel estimator with an instance-specific error bound of $\tilde{O}(\sqrt{\frac{\tau_s}{n}})$ for estimating a single state value. Under the online regret minimization setting, we refine the transition-based MDP constant, diameter, into a reward-based constant, maximum expected hitting cost, and with it, provide a theoretical explanation for how a well-known technique, potential-based reward shaping, could accelerate learning with expert knowledge. In an attempt to study safe reinforcement learning, we model hazardous environments with irrecoverability and propose a quantitative notion of safe learning via reset efficiency. In this setting, we modify a classic algorithm to account for resets, achieving promising preliminary numerical results. Lastly, for MDPs with multiple reward functions, we develop a planning algorithm that computationally efficiently finds Pareto-optimal stochastic policies. Comment: This PhD thesis draws heavily from arXiv:1907.02114 and arXiv:2002.06299; minor edit
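Potential-based reward shaping, mentioned in the abstract above, replaces each reward r with r + gamma * Phi(s') - Phi(s) for some potential function Phi over states. The sketch below is only an illustration of that standard definition, not code from the thesis; the trajectory, the potential, and the function names are hypothetical. It shows the telescoping property: along a trajectory of length T, the shaped discounted return differs from the original one by gamma^T * Phi(s_T) - Phi(s_0), a term that does not depend on the actions taken in between.

```python
def shaped_reward(r, s, s_next, potential, gamma):
    """Potential-based shaping: r + gamma * Phi(s') - Phi(s)."""
    return r + gamma * potential(s_next) - potential(s)


def shape_trajectory(states, rewards, potential, gamma):
    """Shape every reward of a trajectory; `states` has one more entry than `rewards`."""
    return [
        shaped_reward(r, s, s_next, potential, gamma)
        for r, s, s_next in zip(rewards, states, states[1:])
    ]


def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over the trajectory."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


if __name__ == "__main__":
    gamma = 0.9
    states = [0, 1, 2, 3]        # hypothetical visited states
    rewards = [0.0, 0.0, 1.0]    # sparse reward, paid only on the last step
    phi = float                  # hypothetical potential: Phi(s) = s

    shaped = shape_trajectory(states, rewards, phi, gamma)
    original = discounted_return(rewards, gamma)
    offset = gamma ** len(rewards) * phi(states[-1]) - phi(states[0])
    # The two printed numbers agree up to floating-point rounding.
    print(discounted_return(shaped, gamma), original + offset)
```

Because the offset depends only on the start and end states, shaping of this form densifies the reward signal without changing which policies are optimal in the usual discounted setting.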

    Twentieth conference on stochastic processes and their applications


    Advanced Range Estimation for Electric Busses with Physics Informed Machine Learning

    Given the growing focus on environmentally sustainable practices and the desire for cost-effective solutions, electric buses have caught the eye of many public transportation companies. To make electric buses an ideal addition to a fleet, they must complete required routes in all conditions, making accurate range estimation for these buses an invaluable tool. A current approach for range estimation is to develop energy-based models of components and integrate them in a larger model that predicts the overall battery power draw, estimating the remaining range available. Such an analytical model is limited by the variety of extraneous variables affecting the system (traffic, temperature, passenger count), individual components which are difficult to model accurately, as well as finite access to required data and parameters for calibration and verification. In this context, the proposed research aims to improve the state of the art of range estimation for electric vehicles by combining data-driven machine learning techniques with physics-based analysis (PBA). This combined model is applied to a case study of the regenerative braking in electric buses. First, a feedforward neural network model was trained to estimate regenerative braking based on available experimental data, then this network was integrated into a physics-based bus model. This implementation was then used to assess how well the combined model accounts for various lapses in data quality, and how much the overall accuracy improves relative to a strictly analytical model. The combined model resulted in a clear improvement of the regenerative braking modeling, and therefore an improvement in the analytical modeling of the electric bus. No embargo. Academic Major: Mechanical Engineering