LIPIcs, Volume 251, ITCS 2023, Complete Volume
Bridging RL Theory and Practice with the Effective Horizon
Deep reinforcement learning (RL) works impressively in some environments and
fails catastrophically in others. Ideally, RL theory should be able to provide
an understanding of why this is, i.e. bounds predictive of practical
performance. Unfortunately, current theory does not quite have this ability. We
compare standard deep RL algorithms to prior sample complexity bounds by
introducing a new dataset, BRIDGE. It consists of 155 MDPs from common deep RL
benchmarks, along with their corresponding tabular representations, which
enables us to exactly compute instance-dependent bounds. We find that prior
bounds do not correlate well with when deep RL succeeds vs. fails, but discover
a surprising property that does. When actions with the highest Q-values under
the random policy also have the highest Q-values under the optimal policy, deep
RL tends to succeed; when they don't, deep RL tends to fail. We generalize this
property into a new complexity measure of an MDP that we call the effective
horizon, which roughly corresponds to how many steps of lookahead search are
needed in order to identify the next optimal action when leaf nodes are
evaluated with random rollouts. Using BRIDGE, we show that the effective
horizon-based bounds are more closely reflective of the empirical performance
of PPO and DQN than prior sample complexity bounds across four metrics. We also
show that, unlike existing bounds, the effective horizon can predict the
effects of using reward shaping or a pre-trained exploration policy.
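The property described in this abstract (the actions that look best under the random policy are also optimal) can be checked exactly on a small tabular MDP. The sketch below is only an illustration under stated assumptions: it uses a deterministic, finite-horizon toy MDP, and the function names and both example MDPs are hypothetical rather than taken from the paper or the BRIDGE dataset. It tests the simple Q-value alignment property, not the full effective-horizon measure.

```python
def q_values(transitions, rewards, horizon, policy):
    """Backward induction on a deterministic finite-horizon tabular MDP.

    transitions[s][a] is the next state, rewards[s][a] the reward.
    policy is "random" (uniform over actions) or "optimal".
    Returns q[t][s][a] for t = 0 .. horizon - 1.
    """
    n_states = len(transitions)
    n_actions = len(transitions[0])
    v_next = [0.0] * n_states  # value after the final step is zero
    q = [None] * horizon
    for t in reversed(range(horizon)):
        q_t = [
            [rewards[s][a] + v_next[transitions[s][a]] for a in range(n_actions)]
            for s in range(n_states)
        ]
        if policy == "random":
            v_next = [sum(row) / n_actions for row in q_t]
        else:
            v_next = [max(row) for row in q_t]
        q[t] = q_t
    return q


def argmax_set(row, tol=1e-9):
    """Indices of all actions whose value is within tol of the maximum."""
    best = max(row)
    return {a for a, value in enumerate(row) if best - value <= tol}


def random_greedy_is_optimal(transitions, rewards, horizon):
    """True if, at every (t, s), every action that maximizes Q under the
    random policy also maximizes Q under the optimal policy."""
    q_rand = q_values(transitions, rewards, horizon, "random")
    q_opt = q_values(transitions, rewards, horizon, "optimal")
    for t in range(horizon):
        for s in range(len(transitions)):
            if not argmax_set(q_rand[t][s]) <= argmax_set(q_opt[t][s]):
                return False
    return True


# Hypothetical "aligned" chain: action 1 moves right and pays 1 at once.
aligned_transitions = [[0, 1], [1, 2], [2, 2]]
aligned_rewards = [[0, 1], [0, 1], [0, 0]]

# Hypothetical "misaligned" MDP: action 0 in state 0 pays 1 and ends,
# while the reward of 3 needs three correct actions in a row.
trap_transitions = [[3, 1], [3, 2], [3, 3], [3, 3]]
trap_rewards = [[1, 0], [0, 0], [0, 3], [0, 0]]

aligned_result = random_greedy_is_optimal(aligned_transitions, aligned_rewards, 3)
trap_result = random_greedy_is_optimal(trap_transitions, trap_rewards, 3)
```

In the second MDP, random rollouts after the first step rarely reach the delayed reward, so the random-policy Q-values favor the small immediate reward even though the optimal policy does not. This is the kind of case in which the abstract reports that deep RL tends to fail.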
Learning and Control of Dynamical Systems
Despite the remarkable success of machine learning in various domains in recent years, our understanding of its fundamental limitations remains incomplete. This knowledge gap poses a grand challenge when deploying machine learning methods in critical decision-making tasks, where incorrect decisions can have catastrophic consequences. To effectively utilize these learning-based methods in such contexts, it is crucial to explicitly characterize their performance. Over the years, significant research efforts have been dedicated to learning and control of dynamical systems where the underlying dynamics are unknown or only partially known a priori, and must be inferred from collected data. However, many of these classical results have focused on asymptotic guarantees, providing limited insight into the amount of data required to achieve desired control performance while satisfying operational constraints such as safety and stability, especially in the presence of statistical noise.
In this thesis, we study the statistical complexity of learning and control of unknown dynamical systems. By utilizing recent advances in statistical learning theory, high-dimensional statistics, and control theoretic tools, we aim to establish a fundamental understanding of the number of samples required to achieve desired (i) accuracy in learning the unknown dynamics, (ii) performance in the control of the underlying system, and (iii) satisfaction of the operational constraints such as safety and stability. We provide finite-sample guarantees for these objectives and propose efficient learning and control algorithms that achieve the desired performance at these statistical limits in various dynamical systems. Our investigation covers a broad range of dynamical systems, starting from fully observable linear dynamical systems to partially observable linear dynamical systems, and ultimately, nonlinear systems.
We deploy our learning and control algorithms in various adaptive control tasks in real-world control systems and demonstrate their strong empirical performance along with their learning, robustness, and stability guarantees. In particular, we implement one of our proposed methods, Fourier Adaptive Learning and Control (FALCON), on an experimental aerodynamic testbed under extreme turbulent flow dynamics in a wind tunnel. The results show that FALCON achieves state-of-the-art stabilization performance and consistently outperforms conventional and other learning-based methods by at least 37%, despite using 8 times less data. The superior performance of FALCON arises from its physically and theoretically accurate modeling of the underlying nonlinear turbulent dynamics, which yields rigorous finite-sample learning and performance guarantees. These findings underscore the importance of characterizing the statistical complexity of learning and control of unknown dynamical systems.
Towards the reduction of greenhouse gas emissions: models and algorithms for ridesharing and carbon capture and storage
With the ratification of the Paris Agreement, countries committed to limiting global warming to well below 2, preferably to 1.5 degrees Celsius, compared to pre-industrial levels. To this end, anthropogenic greenhouse gas (GHG) emissions (such as CO2) must be reduced to reach net-zero carbon emissions by 2050. This ambitious target may be met by means of different GHG mitigation strategies, such as electrification, changes in consumer behavior, improving the energy efficiency of processes, using substitutes for fossil fuels (such as bioenergy or hydrogen), and carbon capture and storage (CCS). This thesis aims at contributing to two of these strategies: ridesharing (which belongs to the category of changes in consumer behavior) and carbon capture and storage. This thesis provides mathematical and optimization models and algorithms for the operational and tactical planning of ridesharing systems, and heuristics for the strategic planning of a carbon capture and storage network.
In ridesharing, emissions are reduced when individuals travel together instead of driving alone. In this context, this thesis provides novel mathematical models to represent ridesharing systems, ranging from two-stage stochastic assignment problems to two-stage stochastic set packing problems that can represent a wide variety of ridesharing systems. These models aid decision makers in their operational planning of rideshares, where drivers and riders have to be matched for ridesharing in the short term. Additionally, this thesis explores the tactical planning of ridesharing systems by comparing different modes of ridesharing operation and platform parameters (e.g., revenue share and penalties). Novel problem characteristics are studied, such as driver and rider uncertainty, rematching flexibility, and reservation of driver supply through booking fees and penalties. In particular, rematching flexibility may increase the efficiency of a ridesharing platform, and the reservation of driver supply through booking fees and penalties may increase user satisfaction through guaranteed compensation if a rideshare is not provided. Extensive computational experiments are conducted and managerial insights are given.
Despite the opportunity to reduce emissions through ridesharing and other mitigation strategies, global macroeconomic studies show that even if several GHG mitigation strategies are used simultaneously, achieving net-zero emissions by 2050 will likely not be possible without CCS. Here, CO2 is captured from emitter sites and transported to geological reservoirs, where it is injected for long-term storage. This thesis considers a multiperiod strategic planning problem for the optimization of a CCS value chain. This problem is a combined facility location and network design problem where a CCS infrastructure is planned for the coming decades. Due to the computational challenges associated with that problem, a slope scaling heuristic is introduced, which is capable of finding better solutions than a state-of-the-art general-purpose mathematical programming solver, in a fraction of the computation time. This heuristic has intensification and diversification phases, improved generation of feasible solutions through dynamic programming, and a final refining step based on a restricted model. Overall, the contributions of this thesis on ridesharing and CCS provide mathematical programming models, algorithms, and managerial insights that may help practitioners and stakeholders plan for net-zero emissions.
Bayesian Optimal Experimental Design for Constitutive Model Calibration
Computational simulation is increasingly relied upon for high-consequence
engineering decisions, and a foundational element to solid mechanics
simulations, such as finite element analysis (FEA), is a credible constitutive
or material model. Calibration of these complex models is an essential step;
however, the selection, calibration and validation of material models is often
a discrete, multi-stage process that is decoupled from material
characterization activities, which means the data collected does not always
align with the data that is needed. To address this issue, an integrated
workflow for delivering an enhanced characterization and calibration procedure
(Interlaced Characterization and Calibration (ICC)) is introduced. This
framework leverages Bayesian optimal experimental design (BOED) to select the
optimal load path for a cruciform specimen in order to collect the most
informative data for model calibration. The critical first piece of algorithm
development is to demonstrate the active experimental design for a fast model
with simulated data. For this demonstration, a material point simulator that
models a plane stress elastoplastic material subject to bi-axial loading was
chosen. The ICC framework is demonstrated on two exemplar problems in which
BOED is used to determine which load step to take, e.g., in which direction to
increment the strain, at each iteration of the characterization and calibration
cycle. Calibration results from data obtained by adaptively selecting the load
path within the ICC algorithm are compared to results from data generated under
two naive static load paths that were chosen a priori based on human intuition.
In these exemplar problems, data generated in an adaptive setting resulted in
calibrated model parameters with reduced measures of uncertainty compared to
the static settings.
Comment: 39 pages, 13 figures
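The design-selection step at the heart of BOED (pick the next experiment that is expected to be most informative, observe, update, repeat) can be sketched on a much simpler model than the elastoplastic simulator described in the abstract. The example below assumes a two-parameter linear model with Gaussian prior and Gaussian noise, for which the expected information gain of a design x has the closed form 0.5 * log(1 + x^T Sigma x / sigma^2). The function names, candidate "directions", and numbers are all hypothetical; this is a stand-in, not the ICC algorithm.

```python
import math


def _cov_times(cov, x):
    """Product of a 2x2 covariance matrix with a 2-vector."""
    return [
        cov[0][0] * x[0] + cov[0][1] * x[1],
        cov[1][0] * x[0] + cov[1][1] * x[1],
    ]


def expected_information_gain(cov, x, noise_var):
    """Mutual information between the parameters and one observation
    y = x . theta + noise, for a Gaussian prior with covariance cov."""
    sx = _cov_times(cov, x)
    predictive_var = x[0] * sx[0] + x[1] * sx[1]
    return 0.5 * math.log(1.0 + predictive_var / noise_var)


def posterior_covariance(cov, x, noise_var):
    """Gaussian conditioning update of the covariance after observing
    along x (rank-one update; relies on cov being symmetric)."""
    sx = _cov_times(cov, x)
    denom = noise_var + x[0] * sx[0] + x[1] * sx[1]
    return [[cov[i][j] - sx[i] * sx[j] / denom for j in range(2)] for i in range(2)]


def greedy_design(cov, candidates, noise_var, n_steps):
    """At each step choose the candidate with the largest expected
    information gain, then update the parameter covariance."""
    chosen = []
    for _ in range(n_steps):
        best = max(
            candidates,
            key=lambda x: expected_information_gain(cov, x, noise_var),
        )
        chosen.append(best)
        cov = posterior_covariance(cov, best, noise_var)
    return chosen, cov


# Hypothetical setup: the first parameter starts out far more uncertain.
prior_cov = [[4.0, 0.0], [0.0, 1.0]]
directions = [(1.0, 0.0), (0.0, 1.0)]
path, final_cov = greedy_design(prior_cov, directions, noise_var=1.0, n_steps=2)
```

In this linear-Gaussian case the posterior covariance does not depend on the observed values, so the whole path could be planned in advance. For a nonlinear constitutive model the information gain generally depends on the data seen so far, which is why the abstract's workflow selects the load step adaptively at each iteration.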
EMERGING APPLICATIONS IN THE MEASUREMENT OF BODY COMPOSITION AND THEIR RELATIONSHIPS TO DISEASE RISK
Ph.D
Macroeconomic Expectations and Noisy Memory
A large empirical literature has documented that people often react too much to recent information compared to the rational benchmark. In this thesis, I propose an explanation for overreaction based on the idea of limited memory. Using information-theoretic constraints, I formalize that past knowledge is recalled with random errors (hence the "noisy memory").
Since forecasts are not accurately based on past knowledge, revising one’s views more aggressively is optimal. While this mechanism explains overreaction in general, I focus on specific applications in the three chapters of this thesis. In the first two chapters, I explore how noisy memory impacts the learning of structural parameters, focusing in turn on learning about the mean and the variance of a stochastic process. In the third chapter, I study how noisy memory interacts with conventional information frictions.
How Reinforcement Learning can improve Video Games Development: Dreamer and P2E Algorithms in the SheepRL Framework
Artificial Intelligence (AI) in video games is a long-standing research area. It studies how to use AI technologies to achieve human-level performance when playing games. For years now, Reinforcement Learning (RL) algorithms have outperformed the best human players in most video games. For this reason, it is interesting to investigate whether RL can still be used in the video game industry or whether the relationship between RL and the video game industry should remain purely academic.
This work focuses on two primary objectives within the video game industry: (i) Testing and Debugging: how RL can be exploited to uncover latent bugs, assess game difficulty, and refine the design of the video game. (ii) Non-Playable Character (NPC) Creation and Generalization: is RL the best strategy for creating NPCs efficiently, or have RL algorithms become too advanced?
This thesis explores the feasibility of using the state-of-the-art Dreamer algorithm in automated testing and NPC creation for video games; in addition, it proposes SheepRL, a scalable open-source framework for running experiments in a distributed manner.
Leading a business school
Business schools are critical players in higher education, educating current and future leaders to make a difference in the world. Yet we know surprisingly little about the leaders of business schools. Leading a Business School demystifies this complex and dynamic role, offering international insights into deans’ dilemmas in different contexts and situations. It highlights the importance of deans creating challenging and supportive learning cultures to enhance business and management education, organizations and society more broadly.
Written by renowned experts on the role of the dean, Julie Davies, Howard Thomas, Eric Cornuel and Rolf D. Cremer, the book traces the historical evolution of the business school deanship, the current challenges and future sources of disruption. The leadership characteristics and styles of business school deans are presented based on an examination of different dimensions of their roles. These include issues of strategic positioning, such as financial viability, prestige, size, mission, age, location and programme portfolios, as well as the influences of rankings, sector accreditations, governance structures, networks and national policies on strategy implementation. Drawing on international case studies and deans’ development programmes globally, the authors explore constraints on deans’ autonomy, university and external relations, and how business school deans add value over the period of their tenures.
This candid and well-researched book is essential reading for aspiring business school leaders, those hiring and working with deans, and other higher education leaders.