1,003 research outputs found
Performance modelling for system-level design
xii + 208 pp.; 24 cm
Shape-constrained Estimation of Value Functions
We present a fully nonparametric method to estimate the value function, via
simulation, in the context of expected infinite-horizon discounted rewards for
Markov chains. Estimating such value functions plays an important role in
approximate dynamic programming and applied probability in general. We
incorporate "soft information" into the estimation algorithm, such as knowledge
of convexity, monotonicity, or Lipschitz constants. In the presence of such
information, a nonparametric estimator for the value function can be computed
that is provably consistent as the simulated time horizon tends to infinity. As
an application, we implement our method on price tolling agreement contracts in
energy markets.
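To make the estimation target concrete, here is a minimal sketch of plain Monte Carlo estimation of an infinite-horizon discounted value for a small finite Markov chain. It is not the shape-constrained estimator described in the abstract. The two-state chain, the reward vector, the truncation horizon, and all function names are assumptions chosen for illustration.

```python
import random

def simulate_discounted_reward(P, r, start, gamma, horizon, rng):
    """One truncated sample of sum_t gamma^t * r[X_t], starting from X_0 = start."""
    state, total, discount = start, 0.0, 1.0
    for _ in range(horizon):
        total += discount * r[state]
        discount *= gamma
        # Draw the next state from row `state` of the transition matrix.
        state = rng.choices(range(len(P)), weights=P[state])[0]
    return total

def estimate_value(P, r, start, gamma, horizon=150, n_paths=2000, seed=0):
    """Average many independent truncated discounted-reward samples."""
    rng = random.Random(seed)
    samples = [simulate_discounted_reward(P, r, start, gamma, horizon, rng)
               for _ in range(n_paths)]
    return sum(samples) / n_paths

def exact_value(P, r, gamma, sweeps=500):
    """Reference solution by iterating the fixed point V = r + gamma * P V."""
    n = len(r)
    V = [0.0] * n
    for _ in range(sweeps):
        V = [r[i] + gamma * sum(P[i][j] * V[j] for j in range(n)) for i in range(n)]
    return V

# Hypothetical two-state chain: state 0 pays reward 1, state 1 pays nothing.
P = [[0.9, 0.1], [0.5, 0.5]]
r = [1.0, 0.0]
gamma = 0.9

estimate = estimate_value(P, r, start=0, gamma=gamma)
exact = exact_value(P, r, gamma)
print(f"Monte Carlo: {estimate:.3f}, fixed-point reference: {exact[0]:.3f}")
```

Truncating each path at a finite horizon introduces a bias of order gamma raised to the horizon, which is negligible here. A shape-constrained method would additionally project such noisy estimates onto a class of convex, monotone, or Lipschitz functions.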
Automatic regenerative simulation via non-reversible simulated tempering
Simulated Tempering (ST) is an MCMC algorithm for complex target
distributions that operates on a path between the target and a more amenable
reference distribution. Crucially, if the reference enables i.i.d. sampling, ST
is regenerative and can be parallelized across independent tours. However, the
difficulty of tuning ST has hindered its widespread adoption. In this work, we
develop a simple nonreversible ST (NRST) algorithm, a general theoretical
analysis of ST, and an automated tuning procedure for ST. A core contribution
that arises from the analysis is a novel performance metric -- Tour
Effectiveness (TE) -- that controls the asymptotic variance of estimates from
ST for bounded test functions. We use the TE to show that NRST dominates its
reversible counterpart. We then develop an automated tuning procedure for NRST
algorithms that targets the TE while minimizing computational cost. This
procedure enables straightforward integration of NRST into existing
probabilistic programming languages. We provide extensive experimental evidence
that our tuning scheme improves the performance and robustness of NRST
algorithms on a diverse set of probabilistic models.
On Reward Structures of Markov Decision Processes
A Markov decision process can be parameterized by a transition kernel and a
reward function. Both play essential roles in the study of reinforcement
learning, as evidenced by their presence in the Bellman equations. In our
inquiry into various kinds of "costs" associated with reinforcement learning,
inspired by the demands of robotic applications, rewards are central to
understanding the structure of a Markov decision process and reward-centric
notions can elucidate important concepts in reinforcement learning.
Specifically, we study the sample complexity of policy evaluation and develop
a novel estimator with an instance-specific error bound
for estimating a single state value. Under
the online regret minimization setting, we refine the transition-based MDP
constant, diameter, into a reward-based constant, maximum expected hitting
cost, and with it, provide a theoretical explanation for how a well-known
technique, potential-based reward shaping, could accelerate learning with
expert knowledge. In an attempt to study safe reinforcement learning, we model
hazardous environments with irrecoverability and propose a quantitative notion
of safe learning via reset efficiency. In this setting, we modify a classic
algorithm to account for resets, achieving promising preliminary numerical
results. Lastly, for MDPs with multiple reward functions, we develop a planning
algorithm that finds Pareto-optimal stochastic policies in a computationally
efficient way.
Comment: This PhD thesis draws heavily from arXiv:1907.02114 and
arXiv:2002.06299; minor edit
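As a small illustration of the potential-based reward shaping mentioned in this abstract, the sketch below adds the standard shaping term gamma * Phi(s') - Phi(s) to each reward and checks that the discounted return of a trajectory changes only by a telescoping boundary term. The trajectory, the potential values, and the function names are hypothetical. The thesis's hitting-cost analysis is not reproduced here.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma^t * r_t over a finite trajectory."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

def shape_rewards(states, rewards, potential, gamma):
    """Add the shaping term gamma * Phi(s') - Phi(s) to each transition reward.

    `states` holds len(rewards) + 1 entries: s_0, ..., s_T.
    """
    return [r + gamma * potential[s_next] - potential[s]
            for r, s, s_next in zip(rewards, states[:-1], states[1:])]

# Hypothetical trajectory and potential function (illustrative values only).
states = [0, 1, 2, 1, 3]
rewards = [0.0, 0.0, 1.0, 5.0]
potential = {0: 0.0, 1: 1.0, 2: 2.0, 3: 0.5}
gamma = 0.9

original = discounted_return(rewards, gamma)
shaped = discounted_return(shape_rewards(states, rewards, potential, gamma), gamma)

# The shaping terms telescope, leaving only the two boundary potentials.
T = len(rewards)
boundary = gamma ** T * potential[states[-1]] - potential[states[0]]
print(shaped - original, boundary)
```

Because the difference depends only on the start and end states, shaping of this form changes how quickly good behavior is rewarded without changing which policies are optimal.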
Periodic Little's law
In this dissertation, we develop the theory of the periodic Little's law (PLL) and discuss one of its applications. As an extension of the famous Little's law, the PLL applies to queueing systems in which the underlying processes are strictly or asymptotically periodic. In the first part, we give a sample-path version, a steady-state stochastic version, and a central-limit-theorem version of the PLL. We also discuss closely related issues, such as sufficient conditions for the central-limit-theorem version of the PLL and weak convergence in a countably infinite-dimensional vector space, which is unconventional in queueing theory.
The PLL provides a way to estimate the occupancy level indirectly. As an example application, we show how to construct a real-time predictor for the occupancy level inspired by the PLL; this predictor has better forecasting performance than the direct estimators.
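For orientation, the classical Little's law that the PLL extends can be checked on a finite sample path. The sketch below computes the time-average number in system L, the arrival rate lambda, and the mean time in system W, and confirms L = lambda * W when every customer arrives and departs inside the observation window. It illustrates only the classical, non-periodic relation. The arrival and departure times and the function name are hypothetical.

```python
def little_law_averages(arrivals, departures, T):
    """Return (L, lam, W) for customers observed over the window [0, T].

    Assumes each customer's arrival and departure both fall inside [0, T].
    """
    n = len(arrivals)
    lam = n / T                                                # arrival rate
    W = sum(d - a for a, d in zip(arrivals, departures)) / n   # mean time in system

    # Integrate the number-in-system process by sweeping through the events.
    events = sorted([(a, +1) for a in arrivals] + [(d, -1) for d in departures])
    area, count, prev = 0.0, 0, 0.0
    for t, delta in events:
        area += count * (t - prev)
        count += delta
        prev = t
    L = area / T                                               # time-average occupancy
    return L, lam, W

# Hypothetical sample path with four customers.
arrivals = [0.0, 1.0, 1.5, 4.0]
departures = [2.0, 3.0, 3.5, 5.0]
L, lam, W = little_law_averages(arrivals, departures, T=6.0)
print(L, lam * W)
```

The equality is exact here because the area under the occupancy curve equals the sum of the individual times in system. With customers still present at the window edges, the relation holds only approximately, which is where indirect estimation needs care.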
Advanced Range Estimation for Electric Busses with Physics Informed Machine Learning
Given the growing focus on environmentally sustainable practices and the desire for cost-effective solutions, electric buses have caught the eye of many public transportation companies. To be an ideal addition to a fleet, electric buses must complete required routes in all conditions, making accurate range estimation for these buses an invaluable tool. A current approach to range estimation is to develop energy-based models of components and integrate them into a larger model that predicts the overall battery power draw, from which the remaining range is estimated. Such an analytical model is limited by the variety of extraneous variables affecting the system (traffic, temperature, passenger count), by individual components that are difficult to model accurately, and by limited access to the data and parameters required for calibration and verification. In this context, the proposed research aims to improve the state of the art of range estimation for electric vehicles by combining data-driven machine learning techniques with physics-based analysis (PBA). This combined model is applied to a case study of regenerative braking in electric buses. First, a feedforward neural network model was trained to estimate regenerative braking based on available experimental data; then this network was integrated into a physics-based bus model. This implementation was then used to assess how well the combined model accounts for various lapses in data quality, and how much the overall accuracy improves over a strictly analytical model. The combined model resulted in a clear improvement in the regenerative braking modeling, and therefore an improvement in the analytical modeling of the electric bus.
No embargo
Academic Major: Mechanical Engineering
Data-driven Decisions in Service Systems
This thesis makes contributions to help provide data-driven (or evidence-based) decision support to service systems, especially hospitals. Three selected topics are presented.
First, we discuss how Little's Law, which relates average limits and expected values of stationary distributions, can be applied to service systems data that are collected over a finite time interval. To make inferences based on the indirect estimator of average waiting times, we propose methods for estimating confidence intervals and for adjusting estimates to reduce bias. We show our new methods are effective using simulations and data from a US bank call center.
Second, we address important issues that need to be taken into account when testing whether real arrival data can be modeled by nonhomogeneous Poisson processes (NHPPs). We apply our method to data from a US bank call center and a hospital emergency department and demonstrate that their arrivals come from NHPPs.
Lastly, we discuss an approach to standardize the Intensive Care Unit admission process, which currently lacks well-defined criteria. Using data from nearly 200,000 hospitalizations, we discuss how to quantify the impact of Intensive Care Unit admission on individual patients' clinical outcomes. We then use this quantified impact and a stylized model to discuss optimal admission policies. We use simulation to compare the performance of our proposed optimal policies with the current admission policy, and show that the gain can be significant.