Peak Estimation of Time Delay Systems using Occupation Measures
This work proposes a method to compute the maximum value obtained by a state
function along trajectories of a Delay Differential Equation (DDE). An example
of this task is finding the maximum number of infected people in an epidemic
model with a nonzero incubation period. The variables of this peak estimation
problem include the stopping time and the original history (restricted to a
class of admissible histories). The original nonconvex DDE peak estimation
problem is approximated by an infinite-dimensional Linear Program (LP) in
occupation measures, inspired by existing measure-based methods in peak
estimation and optimal control. This LP is approximated from above by a
sequence of Semidefinite Programs (SDPs) through the moment-Sum of Squares
(SOS) hierarchy. The effectiveness of this scheme in providing peak estimates
for DDEs is demonstrated on the provided examples.
Comment: 34 pages, 14 figures, 3 tables
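The peak estimation task itself is easy to illustrate by simulation: integrating one admissible history forward and recording the trajectory maximum gives a lower bound against which the moment-SOS bounds can be compared. A minimal sketch, using a toy delayed SIR model with hypothetical parameter values (not the paper's examples):

```python
from collections import deque

def simulate_dde_peak(beta=0.5, gamma=0.2, tau=2.0, dt=0.01, T=60.0):
    """Euler integration of a toy SIR-type model with incubation delay tau:
        S'(t) = -beta * S(t) * I(t - tau)
        I'(t) =  beta * S(t) * I(t - tau) - gamma * I(t)
    Returns the maximum of I(t) along the trajectory: sampling a single
    admissible history can only under-estimate the true peak."""
    lag = int(round(tau / dt))                 # delay expressed in steps
    hist = deque([0.01] * (lag + 1))           # constant history I = 0.01 on [-tau, 0]
    S, I = 0.99, 0.01
    peak = I
    for _ in range(int(T / dt)):
        I_delayed = hist[0]                    # I(t - tau)
        new_inf = beta * S * I_delayed
        S += dt * (-new_inf)
        I += dt * (new_inf - gamma * I)
        hist.popleft()
        hist.append(I)
        peak = max(peak, I)
    return peak

peak = simulate_dde_peak()
```

With these illustrative rates the epidemic takes off and the recorded maximum of I(t) is the "maximum number of infected people" quantity the paper bounds from above.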
Technology for Low Resolution Space Based RSO Detection and Characterisation
Space Situational Awareness (SSA) refers to all activities to detect, identify and track objects in Earth orbit. SSA is critical to all current and future space activities and protects space assets by providing access control, conjunction warnings, and monitoring of the status of active satellites. Currently, SSA methods and infrastructure are not sufficient to account for the proliferation of space debris. In response to the need for better SSA, there have been many different areas of research looking to improve it, most of them requiring dedicated ground- or space-based infrastructure. In this thesis, a novel approach for the characterisation of RSOs (Resident Space Objects) from passive low-resolution space-based sensors is presented, along with all the background work performed to enable this novel method. Low-resolution space-based sensors are common on current satellites; with so many of these sensors already in space, using them passively to detect RSOs can greatly augment SSA without expensive infrastructure or long lead times. One of the largest hurdles to overcome for research in this area is the lack of publicly available labelled data with which to test and confirm results. To overcome this hurdle, a simulation software package, ORBITALS, was created. To verify and validate the ORBITALS simulator, it was compared with images from the Fast Auroral Imager, one of the only publicly available sources of low-resolution space-based images with auxiliary data. During the development of the ORBITALS simulator it was found that generating these simulated images is computationally intensive when propagating the entire space catalogue. To overcome this, the currently used propagation method, Simplified General Perturbations 4 (SGP4), was upgraded to run in parallel, reducing the computational time required to propagate entire catalogues of RSOs.
From the results it was found that the standard facet model with particle swarm optimisation performed best, estimating an RSO's attitude with 0.66 degrees RMSE accuracy across a sequence, and ~1% MAPE accuracy for the optical properties. This accomplished the thesis goal of demonstrating the feasibility of low-resolution passive RSO characterisation from space-based platforms in a simulated environment.
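Particle swarm optimisation, as used above for attitude fitting, is a generic derivative-free search. A minimal sketch, minimizing a toy objective that merely stands in for the facet-model residual (all parameter values here are illustrative, not the thesis's):

```python
import random

def pso(f, dim=2, n_particles=30, iters=200, lo=-5.0, hi=5.0,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimiser: each particle remembers its own
    best position, and the swarm shares a global best that pulls everyone."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Toy objective standing in for the facet-model fitting residual.
best, best_val = pso(lambda p: sum(x * x for x in p))
```

In the thesis setting, f would instead score how well a candidate attitude reproduces the observed light curve under the facet model.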
Reinforcement learning in large state action spaces
Reinforcement learning (RL) is a promising framework for training intelligent agents that learn to optimize long-term utility by directly interacting with the environment. Creating RL methods that scale to large state-action spaces is a critical problem for ensuring real-world deployment of RL systems. However, several challenges limit the applicability of RL to large-scale settings. These include difficulties with exploration, low sample efficiency, computational intractability, task constraints like decentralization, and a lack of guarantees about important properties like performance, generalization and robustness in potentially unseen scenarios.
This thesis is motivated towards bridging the aforementioned gap. We propose several principled algorithms and frameworks for studying and addressing the above challenges in RL. The proposed methods cover a wide range of RL settings (single- and multi-agent systems (MAS) with all the variations in the latter, prediction and control, model-based and model-free methods, value-based and policy-based methods). In this work we propose the first results on several different problems: e.g., tensorization of the Bellman equation, which allows exponential sample efficiency gains (Chapter 4); provable suboptimality arising from structural constraints in MAS (Chapter 3); combinatorial generalization results in cooperative MAS (Chapter 5); generalization results on observation shifts (Chapter 7); and learning deterministic policies in a probabilistic RL framework (Chapter 6). Our algorithms exhibit provably enhanced performance and sample efficiency along with better scalability. Additionally, we shed light on generalization aspects of the agents under different frameworks. These properties have been driven by the use of several advanced tools (e.g., statistical machine learning, state abstraction, variational inference, tensor theory).
In summary, the contributions in this thesis significantly advance progress towards making RL agents ready for large-scale, real-world applications.
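As a point of reference for the value-based methods discussed above, the classical Bellman optimality backup on a toy two-state MDP (a generic illustration, not one of the thesis's algorithms) looks like:

```python
def value_iteration(P, R, gamma=0.9, tol=1e-10):
    """Tabular value iteration: repeatedly apply the Bellman optimality
    backup V(s) <- max_a [ R[s][a] + gamma * sum_t P[s][a][t] * V(t) ]
    until the sup-norm change falls below tol."""
    n = len(P)
    V = [0.0] * n
    while True:
        V_new = [max(R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n))
                     for a in range(len(P[s])))
                 for s in range(n)]
        if max(abs(V_new[s] - V[s]) for s in range(n)) < tol:
            return V_new
        V = V_new

# Toy 2-state, 2-action MDP (hypothetical, deterministic transitions).
P = [[[1.0, 0.0], [0.0, 1.0]],   # state 0: action 0 stays, action 1 moves to 1
     [[1.0, 0.0], [0.0, 1.0]]]   # state 1: action 0 moves to 0, action 1 stays
R = [[0.0, 0.0],                 # state 0: no reward
     [0.0, 1.0]]                 # state 1: reward 1 for staying via action 1
V = value_iteration(P, R)
```

Here the optimal values satisfy V(1) = 1/(1 - 0.9) = 10 and V(0) = 0.9 * V(1) = 9; the sample-efficiency and scalability questions in the thesis arise precisely when this table becomes too large to enumerate.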
Learning in Repeated Multi-Unit Pay-As-Bid Auctions
Motivated by Carbon Emissions Trading Schemes, Treasury Auctions, and
Procurement Auctions, which all involve the auctioning of homogeneous multiple
units, we consider the problem of learning how to bid in repeated multi-unit
pay-as-bid auctions. In each of these auctions, a large number of (identical)
items are to be allocated to the largest submitted bids, where the price of
each of the winning bids is equal to the bid itself. The problem of learning
how to bid in pay-as-bid auctions is challenging due to the combinatorial
nature of the action space. We overcome this challenge by focusing on the
offline setting, where the bidder optimizes their vector of bids while only
having access to the past submitted bids by other bidders. We show that the
optimal solution to the offline problem can be obtained using a polynomial time
dynamic programming (DP) scheme. We leverage the structure of the DP scheme to
design online learning algorithms with polynomial time and space complexity
under full information and bandit feedback settings. We achieve upper bounds
on regret in both settings, expressed in terms of the number of units demanded
by the bidder, the total number of auctions, and the size of the discretized
bid space. We accompany these results with a regret lower bound, which matches
the linear dependency in the number of demanded units. Our numerical results
suggest that when all agents behave according to our proposed no-regret
learning algorithms, the resulting market dynamics mainly converge to a
welfare-maximizing equilibrium where bidders submit uniform bids. Lastly, our
experiments demonstrate that the pay-as-bid auction consistently generates
significantly higher revenue than its popular alternative, the uniform price
auction.
Comment: 51 pages, 12 figures
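To make the offline problem concrete, consider a simplified single-auction instance: K identical units, known competitor bids, a discretized bid grid, and ties lost by the bidder (a sketch of the setting, not the paper's general DP). The best response reduces to enumerating how many units to win, which can be cross-checked by brute force over all non-increasing bid vectors:

```python
from itertools import combinations_with_replacement

def best_bids(values, comp_bids, K, grid):
    """Offline best response in one K-unit pay-as-bid auction. To win
    exactly m units, each winning bid must strictly beat the m-th lowest
    of the K currently-winning competitor bids, so it is optimal to bid
    the smallest grid point above that cutoff. Assumes values sorted
    descending and at least K competitor bids."""
    comp = sorted(comp_bids, reverse=True)
    best_util, best_m, best_bid = 0.0, 0, None
    for m in range(1, min(len(values), K) + 1):
        cutoff = comp[K - m]                 # m-th lowest winning competitor bid
        wins = [b for b in grid if b > cutoff]
        if not wins:
            continue
        b = min(wins)                        # cheapest winning grid bid
        util = sum(values[:m]) - m * b       # pay-as-bid: pay your own bid
        if util > best_util:
            best_util, best_m, best_bid = util, m, b
    return best_util, [best_bid] * best_m + [0.0] * (len(values) - best_m)

def brute_force(values, comp_bids, K, grid):
    """Check: enumerate every non-increasing bid vector on the grid."""
    best = 0.0
    for vec in combinations_with_replacement(sorted(grid, reverse=True), len(values)):
        allb = [(b, 0) for b in comp_bids] + [(b, 1) for b in vec]
        allb.sort(key=lambda t: (-t[0], t[1]))      # competitors win ties
        mine = sorted((b for b, who in allb[:K] if who == 1), reverse=True)
        best = max(best, sum(v - b for v, b in zip(values, mine)))
    return best

u_fast, vec = best_bids([10, 8, 5], [9, 6, 4, 3], K=3, grid=list(range(11)))
u_brute = brute_force([10, 8, 5], [9, 6, 4, 3], K=3, grid=list(range(11)))
```

In this instance winning a single unit at a bid of 5 (utility 10 - 5 = 5) beats winning two units at 7 each (utility 4); the paper's DP handles the general multi-auction learning problem on top of this structure.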
An American Knightmare: Joker, Fandom, and Malicious Movie Meaning-Making
This monograph concerns the long-standing communication problem of how individuals can identify and resist the influence of unethical public speakers. Scholarship on the issue of what Socrates and Plato called the “Evil Lover” – i.e., the ill-intended rhetor – began with the Greek philosophers but has carried into [post]Modern anxieties. For instance, the study of Nazi propaganda machines, and the rhetoric of Hitler himself, rejuvenated interest in the study of speech and communication in the U.S. and Europe. Whereas unscrupulous sophists used lectures and legal forums, and Hitler used a microphone, contemporary Evil Lovers primarily draw on new, internet-related tools to share their malicious influence. These new tools of influence are both more far-reaching and more subtle than the traditional practice of listening to a designated speaker at an overtly political event. Rhetorician Ashley Hinck has recently noted the ways that popular culture – communication about texts which are commonly accessible and shared – is now a significant site through which citizens learn moral and political values. Accordingly, the talk of internet influencers who interpret popular texts for other fans has the potential to constitute strong persuasive power regarding ethics and civic responsibility.
The present work identifies and responds to a particular case example of a popular culture text that has been recently, and frequently, leveraged in moral and civic discourses: Todd Phillips’ Joker. Specifically, this study takes a hermeneutic approach to understanding responses to Joker, especially those explicitly invoking political ideology, as a method of examining civic meaning-making. A special emphasis is placed on the online film criticisms of Joker from white nationalist movie fans, who clearly exemplify ways that media responses can be leveraged by unethical speakers (i.e., Evil Lovers) and subtly diffused. The study conveys that these racist movie fans can embed values related to “trolling,” incelism, and xenophobia into otherwise seemingly innocuous talk about film. While the sharing of such speech does not immediately mean its positive reception, this kind of communication nevertheless constitutes a new and understudied attack on democratic values such as justice and equity. The case of white nationalist movie fan film criticism therefore reflects a particular brand of communicative strategy for contemporary Evil Lovers in communicating unethical messages under the covert guise of mundane movie talk.
Natural Actor-Critic for Robust Reinforcement Learning with Function Approximation
We study robust reinforcement learning (RL) with the goal of determining a
well-performing policy that is robust against model mismatch between the
training simulator and the testing environment. Previous policy-based robust RL
algorithms mainly focus on the tabular setting under uncertainty sets that
facilitate robust policy evaluation, but are no longer tractable when the
number of states scales up. To this end, we propose two novel uncertainty set
formulations, one based on double sampling and the other on an integral
probability metric. Both make large-scale robust RL tractable even when one
only has access to a simulator. We propose a robust natural actor-critic (RNAC)
approach that incorporates the new uncertainty sets and employs function
approximation. We provide finite-time convergence guarantees for the proposed
RNAC algorithm to the optimal robust policy within the function approximation
error. Finally, we demonstrate the robust performance of the policy learned by
our proposed RNAC approach in multiple MuJoCo environments and a real-world
TurtleBot navigation task.
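The robust Bellman backup underlying such methods is easiest to see in the tabular case with a finite uncertainty set of transition models (a deliberate simplification: the paper's double-sampling and IPM-based sets are richer, and RNAC adds function approximation on top):

```python
def robust_value_iteration(models, R, gamma=0.9, tol=1e-10):
    """Robust value iteration over a finite uncertainty set U of
    transition models:
        V(s) <- max_a min_{P in U} [ R[s][a] + gamma * sum_t P[s][a][t] V(t) ]
    Nature adversarially picks the worst model for each state-action."""
    n = len(R)
    V = [0.0] * n
    while True:
        V_new = [max(min(R[s][a] + gamma * sum(P[s][a][t] * V[t]
                                               for t in range(n))
                         for P in models)
                     for a in range(len(R[s])))
                 for s in range(n)]
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return V_new
        V = V_new

# Hypothetical 2-state example: action 0 is "risky" (bonus reward 0.5 in
# state 0, but one model traps you there), action 1 is "safe".
P_nominal = [[[0.0, 1.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]
P_adverse = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]
R = [[0.5, 0.0],    # state 0: risky bonus vs. safe nothing
     [1.0, 1.0]]    # state 1: reward 1 under either action
V = robust_value_iteration([P_nominal, P_adverse], R)
```

The robust values come out as V(1) = 10 and V(0) = 9: the worst-case view makes the agent forgo the risky bonus, which is exactly the behavior a model-mismatch-robust policy should exhibit.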
Durability and Availability of Erasure-Coded Storage Systems with Concurrent Maintenance
The initial version of this document was written back in 2014 for the sole
purpose of providing the fundamentals of reliability theory and identifying
the theoretical machinery for predicting the durability/availability of
erasure-coded storage systems. Since the definition of a "system" is too
broad, we focus specifically on warm/cold storage systems where the data is
stored in a distributed fashion across different storage units, with or
without continuous operation. The contents of this document are dedicated to a
review of fundamentals, a few major improved stochastic models, and several
contributions of my work relevant to the field. One of the contributions of
this document is the introduction of the most general form of Markov models
for the estimation of mean time to failure. This work was later partially
published in IEEE Transactions on Reliability. Very good approximations of the
closed-form solutions for this general model are also investigated. Various
storage configurations under different policies are compared using such
advanced models. Later, in a subsequent chapter, we also consider
multi-dimensional Markov models to address detached drive-medium combinations
such as those found in optical disk and tape storage systems. It is not hard
to anticipate that such a system structure would most likely be part of future
DNA storage libraries. This work is partially published in Elsevier's
Reliability Engineering & System Safety. Topics that include simulation
modeling for more accurate estimation are included towards the end of the
document, noting the deficiencies of the simplified canonical as well as the
more complex Markov models, due mainly to the stationary and static nature of
Markovianity. Throughout the document, we focus on concurrently maintained
systems, although the discussion changes only slightly for systems repaired
one device at a time.
Comment: 58 pages, 20 figures, 9 tables. arXiv admin note: substantial text
overlap with arXiv:1911.0032
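The first-step analysis behind such Markov MTTF models can be sketched for the simplest birth-death case, where states count failed units and the chain absorbs once too many have failed; the classic mirrored-pair closed form (3λ+μ)/(2λ²) serves as a check (this is the textbook canonical model, not the document's general formulation):

```python
def mttf_birth_death(fail_rates, repair_rates):
    """Mean time to absorption for a birth-death Markov chain whose state
    counts failed units; state n = len(fail_rates) is absorbing (data loss).
    First-step analysis gives, with l = fail_rates[i], m = repair_rates[i]:
        (l + m) * t_i = 1 + l * t_{i+1} + m * t_{i-1}
    which forward substitution solves via d_i = t_i - t_{i+1}."""
    n = len(fail_rates)
    d = [0.0] * n
    d[0] = 1.0 / fail_rates[0]                       # state 0 has no repair
    for i in range(1, n):
        d[i] = (1.0 + repair_rates[i] * d[i - 1]) / fail_rates[i]
    return sum(d)                                    # t_0, since t_n = 0

# Mirrored pair (RAID-1 style): 2 drives, tolerate 1 failure.
# Illustrative rates: lam = per-drive failure rate, mu = repair rate.
lam, mu = 1e-4, 0.1
mttf = mttf_birth_death([2 * lam, lam], [0.0, mu])
```

For the mirrored pair this reproduces MTTF = (3λ + μ)/(2λ²); the document's "most general form" extends the same first-step idea to arbitrary state spaces where no such closed form exists.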
A Deep Neural Network Algorithm for Linear-Quadratic Portfolio Optimization with MGARCH and Small Transaction Costs
We analyze a fixed-point algorithm for reinforcement learning (RL) of optimal
portfolio mean-variance preferences in the setting of multivariate generalized
autoregressive conditional heteroskedasticity (MGARCH) with a small penalty on
trading. A numerical solution is obtained using a neural network (NN)
architecture within a recursive RL loop. A fixed-point theorem proves that the
NN approximation error has a big-O bound that we can reduce by increasing the
number of NN parameters. The functional form of the trading penalty has a
parameter that controls the magnitude of transaction costs. When this
parameter is small, we can implement an NN algorithm based on an expansion of
the solution in powers of the parameter. This expansion has a base term equal
to a myopic solution with an explicit form, and a first-order correction term
that we compute in the RL loop. Our expansion-based algorithm is stable,
allows for fast computation, and outputs a solution that shows positive
testing performance.
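The recursive loop is, at heart, a Banach fixed-point iteration: a contraction guarantees a unique solution and geometric convergence, which is what lets the NN approximation error be controlled. A scalar sketch with a toy contraction (standing in for the paper's operator):

```python
import math

def fixed_point(g, x0, tol=1e-12, max_iter=1000):
    """Banach-style fixed-point iteration x <- g(x). For a contraction
    with modulus q < 1 the iterates converge geometrically to the unique
    fixed point, and |x_{k+1} - x_k| < tol certifies proximity to it."""
    x = x0
    for _ in range(max_iter):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Toy contraction standing in for the paper's RL operator: g(x) = cos(x),
# whose unique fixed point is the Dottie number, roughly 0.739085.
root = fixed_point(math.cos, 0.0)
```

In the paper the same principle is applied in function space, with the NN parameterizing the iterate and the small-transaction-cost expansion supplying a good starting point.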
Essays on a study of statistical power in economics.
Knowing the statistical power of an empirical analysis after it is completed can be very useful. Among other things, it can help one determine whether a finding of statistical insignificance is due to a small effect size or insufficient statistical power. This thesis consists of five studies linked together by my attempts to study how best to calculate ex post statistical power. Chapter One provides an introduction and the background for this thesis. In Chapter Two, I detail what is meant by statistical power (including ex ante and ex post power) and why it is important to researchers. I also identify the various factors that affect statistical power.
Chapter Three explains why ex post power has a “bad reputation.” A common practice for calculating ex post power employs an inappropriate method known as “observed power.” “Observed power” uses the estimated effect size as the assumed true effect size and then calculates the associated power. Though widely used, this method has been demonstrated to produce biased estimates of statistical power (Yuan & Maxwell, 2005). I present two approaches for calculating ex post power suggested by researchers to avoid the problems of “observed power”.
Chapter Four begins by replicating a recent paper by Brown, Lambert and Wojan (BLW, 2019). BLW use a bootstrapping procedure to calculate ex post power and apply it to a benefit-cost analysis of a U.S. conservation program. I reproduce BLW’s results. I call their method for calculating ex post power BLW1. I then propose a variant of their method, which I call BLW2. I use Monte Carlo experiments to compare both methods in a simple data environment where there is no clustering.
In Chapter Five, I detail two more methods for calculating ex post power. The first procedure is taken from a blog post by David McKenzie and Owen Ozier (2019). I call this approach the SE-ES Method (for Standard Error – Effect Size). I then propose yet another variant of the BLW method, BLW3, which uses a wild-cluster bootstrap for handling clustered data. Chapter Five subjects all four methods to an extensive set of Monte Carlo experiments to assess their reliability in calculating ex post statistical power. I find that the SE-ES method is superior to the BLW methods (BLW1, BLW2 and BLW3) and has good overall performance.
Chapter Six applies the SE-ES method to a set of 23 development studies that were funded by the International Initiative for Impact Evaluation (3ie), a non-profit organization that supports research on ways to help the poor in low- and middle-income countries. I analyze the ex post power of these studies and explore factors that may be responsible for differences between ex ante and ex post statistical power. Chapter Seven concludes this thesis. It provides an overview of my chapters, as well as a summary of my main findings.
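As I understand the SE-ES idea, power is read directly off a hypothesized effect size and the regression's estimated standard error, via the normal approximation for a two-sided test; a sketch under that assumption (the exact McKenzie–Ozier formula may differ in details):

```python
from statistics import NormalDist

def se_es_power(effect_size, se, alpha=0.05):
    """Approximate ex post power of a two-sided z-test: the probability
    that an estimate distributed N(effect_size, se^2) lands outside the
    +/- z_crit * se acceptance region around zero."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)   # 1.96 for alpha = 0.05
    ratio = effect_size / se
    return nd.cdf(ratio - z_crit) + nd.cdf(-ratio - z_crit)

# Illustrative numbers (not from the 3ie studies): hypothesized effect
# 0.5 with standard error 0.2 gives power of roughly 0.71.
power = se_es_power(effect_size=0.5, se=0.2)
```

Unlike "observed power", nothing here plugs the estimated effect back in as the truth: the effect size is a hypothesized value chosen ex ante, which is what avoids the bias Yuan & Maxwell (2005) document.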