Right Place, Right Time: Proactive Multi-Robot Task Allocation Under Spatiotemporal Uncertainty
For many multi-robot problems, tasks are announced during execution, where task announcement times and locations are uncertain. To synthesise multi-robot behaviour that is robust to early announcements and unexpected delays, multi-robot task allocation methods must explicitly model the stochastic processes that govern task announcement. In this paper, we model task announcement using continuous-time Markov chains which predict when and where tasks will be announced. We then present a task allocation framework which uses the continuous-time Markov chains to allocate tasks proactively, such that robots are near or at the task location upon its announcement. Our method seeks to minimise the expected total waiting duration for each task, i.e. the duration between task announcement and a robot beginning to service the task. Our framework can be applied to any multi-robot task allocation problem where robots complete spatiotemporal tasks which are announced stochastically. We demonstrate the efficacy of our approach in simulation, where we outperform baselines which do not allocate tasks proactively, or do not fully exploit our task announcement models.
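The announcement model described above can be sketched as a small continuous-time Markov chain simulation. The locations, exit rates, and transition probabilities below are illustrative placeholders, not values from the paper:

```python
import random

# Hypothetical CTMC over task locations: holding times are exponential with
# a per-state exit rate, and a task is "announced" whenever the chain jumps
# to a new location. All names and numbers here are invented for illustration.
RATES = {                # exit rate per location (1 / expected holding time)
    "depot": 0.5,
    "zone_A": 1.0,
    "zone_B": 0.8,
}
TRANSITIONS = {          # next-location probabilities
    "depot":  [("zone_A", 0.6), ("zone_B", 0.4)],
    "zone_A": [("depot", 0.5), ("zone_B", 0.5)],
    "zone_B": [("depot", 1.0)],
}

def simulate_announcements(start, horizon, rng):
    """Sample a sequence of (time, location) task announcements up to `horizon`."""
    t, state, events = 0.0, start, []
    while True:
        t += rng.expovariate(RATES[state])       # exponential holding time
        if t > horizon:
            return events
        locs, probs = zip(*TRANSITIONS[state])
        state = rng.choices(locs, weights=probs)[0]
        events.append((t, state))                # task announced at new location

rng = random.Random(0)
events = simulate_announcements("depot", horizon=10.0, rng=rng)
```

A proactive allocator in the spirit of the paper would use such sampled trajectories to estimate where the next announcement is likely to occur and stage idle robots there in advance.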
Exploring the calibration of cosmological probes used in gravitational-wave and multi-messenger astronomy
The field of gravitational wave astronomy has grown remarkably since the first direct detection of gravitational waves on 14th September 2015. The signal, originating from the merger of two black holes, was detected by the two US-based Advanced LIGO interferometers in Hanford (Washington State) and Livingston (Louisiana). The second observing run of the Advanced LIGO and Virgo detectors marked the first detection of a binary neutron star merger, along with its electromagnetic counterparts. The optical follow-up of the merger led to the first confirmed observations of a kilonova, an electromagnetic counterpart to binary neutron star and neutron star-black hole mergers whose existence was first predicted in the 1970s. Following the multimessenger observations of the binary neutron star merger GW170817, constraints were put on the rate of expansion of the Universe using both gravitational wave and electromagnetic data. These measurements could help us understand the current tension between early-Universe and late-Universe measurements of the Hubble constant H0. The use of gravitational wave signals for measuring the rate of expansion of the Universe was proposed by Schutz in 1986. Compact binary coalescences can be used as distance markers, a gravitational wave analogue to standard candles: "Standard Sirens". Measurements of the Hubble constant from standard sirens are independent of previous methods of constraining H0. Bright sirens are gravitational wave signals that are detected in coincidence with electromagnetic signatures. These "bright" gravitational wave sirens are powerful cosmological probes, allowing us to extract information on both the distance and the redshift of the source. It is therefore important to maximise these coincident detections, and to carefully calibrate the data extracted from any standard siren.
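At low redshift, the standard-siren idea reduces to Hubble's law: H0 ≈ cz/d_L, with the luminosity distance d_L read off the gravitational-wave amplitude and the redshift z taken from the electromagnetic counterpart. The numbers below are round illustrative inputs, not GW170817 measurements:

```python
# Leading-order (z << 1) standard-siren estimate of the Hubble constant.
# Inputs are illustrative placeholders, not real event data.
C_KM_S = 299_792.458                         # speed of light, km/s

def h0_low_redshift(z, lum_dist_mpc):
    """Hubble constant estimate in km/s/Mpc from Hubble's law H0 = c z / d_L."""
    return C_KM_S * z / lum_dist_mpc

h0 = h0_low_redshift(z=0.01, lum_dist_mpc=43.0)   # ≈ 69.7 km/s/Mpc
```

A full analysis marginalises over peculiar velocities, inclination-distance degeneracy, and selection effects, which is why careful detector calibration, as studied in this thesis, matters for the final H0 posterior.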
The work presented in this thesis can be divided into three main topics, all under the umbrella of maximising scientific returns from observations of compact binary coalescences. These three topics are: kilonova parameter estimation, cosmology with gravitational waves, and calibration of advanced gravitational wave detectors. We present work on inferring parameters from kilonova light curves. Ejecta parameters and information about the merging time of the progenitor are extracted from simulated kilonova light curves. We explore the consequence of neglecting some aspects of microphysics on the resulting parameter estimation. We also present new results on the inference of the Hubble constant through the application of a robust test of galaxy catalogue completeness to the current gravitational wave cosmology pipeline. We explore the impact of adopting a robust estimate of the apparent magnitude threshold mthr for the galaxy catalogues used in gravitational wave cosmology on the final inference of the Hubble constant H0 from standard sirens, and compare the results to those obtained when adopting a conservative estimate for mthr. Finally, we present the first results from the prototype of a Newtonian Calibrator at the LIGO Hanford detector. Calibrating the LIGO detectors is crucial to the extraction of the gravitational wave source parameters that are used in cosmology with standard sirens.
Optimal speed trajectory and energy management control for connected and automated vehicles
Enabled by the development of communication technologies such as vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I), connected and automated vehicles (CAVs) have emerged as a promising solution for improving urban mobility, safety, energy efficiency, and passenger comfort. This thesis proposes several control approaches for CAVs with electric powertrains, including hybrid electric vehicles (HEVs) and battery electric vehicles (BEVs), with the main objective of improving energy efficiency by optimising the vehicle speed trajectory and the energy management system. By type of vehicle control, these methods can be categorised into three main scenarios: optimal energy management for a single CAV (single-vehicle), an energy-optimal strategy for the vehicle-following scenario (two-vehicle), and optimal autonomous intersection management for CAVs (multiple-vehicle).
The first part of this thesis is devoted to the optimal energy management of a single automated series HEV with consideration of an engine start-stop system (SSS) under battery charge sustaining operation. A heuristic hysteresis power threshold strategy (HPTS) is proposed to optimise the fuel economy of an HEV with SSS and extra penalty fuel for engine restarts. By a systematic tuning process, the overall control performance of HPTS can be fully optimised for different vehicle parameters and driving cycles.
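The hysteresis idea behind a power-threshold strategy can be illustrated with a minimal engine on/off rule; the thresholds below are made-up placeholders, not the tuned HPTS values from the thesis:

```python
def hysteresis_engine_command(power_demand_kw, engine_on,
                              on_threshold_kw=20.0, off_threshold_kw=10.0):
    """Illustrative hysteresis rule (thresholds are hypothetical): start the
    engine above the upper threshold, stop it below the lower one, and
    otherwise hold the current state, avoiding frequent restarts that would
    incur the penalty fuel."""
    if power_demand_kw >= on_threshold_kw:
        return True
    if power_demand_kw <= off_threshold_kw:
        return False
    return engine_on          # inside the band: keep the previous state
```

The gap between the two thresholds is what the systematic tuning process would adjust per vehicle and driving cycle: a wider band means fewer restarts but a coarser match between engine power and demand.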
In the second part, two energy-optimal control strategies based on a model predictive control (MPC) framework are proposed for the vehicle following problem. To forecast the behaviour of the preceding vehicle, a neural network predictor is utilised and incorporated into a nonlinear MPC method, whose fuel and computational efficiency are verified through numerical comparisons against both a practical adaptive cruise control strategy and an impractical optimal control method. A robust MPC (RMPC) based on linear matrix inequalities (LMIs) is also utilised to deal with the uncertainties in V2V communication and with modelling errors. By conservative relaxation and approximation, the RMPC problem is formulated as a convex semi-definite program, and the simulation results demonstrate the robustness of the RMPC and the rapid computation enabled by convex optimisation.
The final part focuses on centralised and decentralised control frameworks at signal-free intersections, where the energy consumption and the crossing time of a group of CAVs are minimised. Their crossing order and velocity trajectories are optimised by convex second-order cone programs in a hierarchical scheme subject to safety constraints. It is shown that the centralised strategy with consideration of turning manoeuvres is effective and outperforms a benchmark solution invoking the widely used first-in-first-out policy. On the other hand, the decentralised method is proposed to further improve computational efficiency and enhance system robustness via a tube-based RMPC. The numerical examples of both frameworks highlight the importance of examining the trade-off between energy consumption and travel time, as small compromises in travel time could produce significant energy savings.
Automatic Control of General Anesthesia: New Developments and Clinical Experiments
General anesthesia is a state of pharmacologically induced, temporary and reversible coma. Its goal is to cause total loss of consciousness and suppress the perception of pain. It constitutes a fundamental aspect of modern medicine as it allows invasive surgical procedures to be performed without causing anxiety and pain to the patient. In the clinical practice of total intravenous anesthesia, these effects are generally obtained by the simultaneous administration of the hypnotic drug propofol and of the analgesic drug remifentanil.
The dosing of these drugs is managed by the anesthesiologist on the basis of pharmacological guidelines and by monitoring the patient's clinical response. Recent developments in physiological signal processing techniques have introduced the possibility to obtain quantitative indicators of the patient's anesthetic state. These indicators can be used as feedback signals for automatic anesthesia control systems. The development of these systems aims to provide a support tool for the anesthesiologist.
The work presented in this thesis has been carried out in the framework of the research project on the automatic control of anesthesia at the University of Brescia. The project is called ACTIVA (Automatic Control of Total IntraVenous Anesthesia) and is the result of the collaboration between the Research Group on Control Systems of the University of Brescia and the Anesthesia and Intensive Care Unit 2 of the Spedali Civili di Brescia. The objective of the ACTIVA project consists of the theoretical development, implementation, and clinical validation of innovative control strategies for the automatic control of total intravenous anesthesia. In detail, this thesis initially presents the experimental results obtained with control structures based on PID and event-based PID controllers for the administration of propofol and remifentanil. The theoretical development and clinical validation of model predictive control strategies are then presented. Next, the results of a simulation study regarding an innovative control solution that allows the anesthesiologist to explicitly adjust the balance between propofol and remifentanil are given. Finally, the theoretical developments and the related simulation studies concerning personalized control solutions for the induction and maintenance phases of anesthesia are explained.
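As a rough illustration of the feedback principle involved (not the clinically validated ACTIVA controllers), a textbook PID loop could map a measured depth-of-hypnosis index to a drug infusion rate; all gains, units, and the setpoint below are hypothetical:

```python
class PidController:
    """Reverse-acting textbook PID sketch: a larger hypnosis index (more
    awake) should increase the infusion, so the error is measurement -
    setpoint. Gains and timing are illustrative, not clinical values."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = None

    def update(self, setpoint, measurement):
        err = measurement - setpoint
        self.integral += err * self.dt
        deriv = 0.0 if self.prev_err is None else (err - self.prev_err) / self.dt
        self.prev_err = err
        rate = self.kp * err + self.ki * self.integral + self.kd * deriv
        return max(rate, 0.0)        # an infusion rate cannot be negative

pid = PidController(kp=0.05, ki=0.01, kd=0.0, dt=5.0)   # hypothetical gains
rate = pid.update(setpoint=50.0, measurement=80.0)       # index 80, target 50
```

In a real closed-loop anesthesia system this basic scheme is wrapped in safety constraints, actuator limits, and clinician supervision, which is precisely what the clinical validation in the project addresses.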
Reinforcement Learning Curricula as Interpolations between Task Distributions
In the last decade, the increased availability of powerful computing machinery has led to an increasingly widespread application of machine learning methods. Machine learning has been particularly successful when large models, typically neural networks with an ever-increasing number of parameters, can leverage vast data to make predictions.
While reinforcement learning (RL) has been no exception from this development, a distinguishing feature of RL is its well-known exploration-exploitation trade-off, whose optimal solution – while possible to model as a partially observable Markov decision process – evades computation in all but the simplest problems. Consequently, it seems unsurprising that notable demonstrations of reinforcement learning, such as the RL-based Go agent (AlphaGo) by DeepMind beating the professional Go player Lee Sedol, relied both on the availability of massive computing capabilities and on specific forms of regularization that facilitate learning. In the case of AlphaGo, this regularization came in the form of self-play, enabling learning by interacting with gradually more proficient opponents.
In this thesis, we develop techniques that, similarly to the concept of self-play of AlphaGo, improve the learning performance of RL agents by training on sequences of increasingly complex tasks. These task sequences are typically called curricula and are known to side-step problems such as slow learning or convergence to poor behavior that may occur when directly learning in complicated tasks. The algorithms we develop in this thesis create curricula by minimizing distances or divergences between probability distributions of learning tasks, generating interpolations between an initial distribution of easy learning tasks and a target task distribution. Apart from improving the learning performance of RL agents in experiments, developing methods that realize curricula as interpolations between task distributions results in a nuanced picture of key aspects of successful reinforcement learning curricula.
In Chapter 1, we start this thesis by introducing required reinforcement learning notation and then motivating curriculum reinforcement learning from the perspective of continuation methods for non-linear optimization. Similar to curricula for reinforcement learning agents, continuation methods have been used in non-linear optimization to solve challenging optimization problems. This similarity provides an intuition about the effect of the curricula we aim to generate and their limits.
In Chapter 2, we transfer the concept of self-paced learning, initially proposed in the supervised learning community, to the problem of RL, showing that an automated curriculum generation for RL agents can be motivated by a regularized RL objective. This regularized RL objective implies generating a curriculum as a sequence of task distributions that trade off the expected agent performance against similarity to a specified distribution of target tasks. This view on curriculum RL contrasts existing approaches, as it motivates curricula via a regularized RL objective instead of generating them from a set of assumptions about an optimal curriculum. In experiments, we show that an approximate implementation of the aforementioned curriculum – that restricts the interpolating task distribution to a Gaussian – results in improved learning performance compared to regular reinforcement learning, matching or surpassing the performance of existing curriculum-based methods.
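The notion of a curriculum as a sequence of Gaussian task distributions approaching the target distribution can be sketched as follows; this is a plain parameter interpolation under an assumed 1-D Gaussian model, not the thesis's performance-regularised update:

```python
import math

def kl_gauss(mu_p, var_p, mu_q, var_q):
    """KL( N(mu_p, var_p) || N(mu_q, var_q) ) for 1-D Gaussians."""
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def curriculum(mu0, var0, mu_t, var_t, steps):
    """Interpolate the task-sampling Gaussian from an easy initial
    distribution to the target distribution; the divergence to the target
    shrinks along the sequence. All parameters here are illustrative."""
    seq = []
    for k in range(steps + 1):
        a = k / steps
        seq.append(((1 - a) * mu0 + a * mu_t,
                    (1 - a) * var0 + a * var_t))
    return seq

# Easy tasks around 0 with high variance, hard target tasks around 5.
stages = curriculum(mu0=0.0, var0=1.0, mu_t=5.0, var_t=0.25, steps=5)
kls = [kl_gauss(mu, var, 5.0, 0.25) for mu, var in stages]
```

The self-paced method in the chapter differs in that each step is chosen by optimising expected agent performance regularised by the divergence to the target, rather than by a fixed interpolation schedule.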
Subsequently, Chapter 3 builds on the intuition of curricula as sequences of interpolating task distributions established in Chapter 2. Motivated by using more flexible task distribution representations, we show how parametric assumptions play a crucial role in the empirical success of the previous approach and subsequently uncover key ingredients that enable the generation of meaningful curricula without assuming a parametric model of the task distributions. One major ingredient is an explicit notion of task similarity via a distance function between two Markov decision processes. We turn to optimal transport theory, allowing for flexible particle-based representations of the task distributions while properly considering the newly introduced metric structure of the task space. Combined with other improvements to our first method, such as a more aggressive restriction of the curriculum to tasks that are not too hard for the agent, the resulting approach delivers consistently high learning performance in multiple experiments.
In the final Chapter 4, we apply the refined method of Chapter 3 to a trajectory-tracking task, in which we task an RL agent to follow a three-dimensional reference trajectory with the tip of an inverted pendulum mounted on a Barrett Whole Arm Manipulator. The access to only positional information results in a partially observable system that, paired with its inherent instability, underactuation, and non-trivial kinematic structure, presents a challenge for modern reinforcement learning algorithms, which we tackle via curricula. The technically infinite-dimensional task space of target trajectories allows us to probe the developed curriculum learning method for flaws that have not surfaced in the rather low-dimensional experiments of the previous chapters. Through an improved optimization scheme that better respects the non-Euclidean structure of target trajectories, we reliably generate curricula of trajectories to be tracked, resulting in faster and more robust learning compared to an RL baseline that does not exploit this form of structured learning. The learned policy matches the performance of an optimal control baseline on the real system, demonstrating the potential of curriculum RL to learn state estimation and control for non-linear tracking tasks jointly.
In summary, this thesis introduces a perspective on reinforcement learning curricula as interpolations between task distributions. The methods developed under this perspective enjoy a precise formulation as optimization problems and deliver empirical benefits throughout experiments. Building upon this precise formulation may allow future work to advance the formal understanding of reinforcement learning curricula and, with that, enable the solution of challenging decision-making and control problems with reinforcement learning.
Disability-free life expectancy of Italian older adults: trends, inequalities, and applications
Italy's ageing population may pose challenges to the sustainability of the country's socioeconomic and healthcare systems. This depends on the (un)healthy ageing process. The disability status of mid-to-older adults is a crucial determinant of individuals' autonomy and participation in society. Disability-free life expectancy (DFLE) is an important metric for assessing the health and disability risks of the population in a summary indicator, net of the age structure. Demographic changes also affect intergenerational relationships, and in Italy, where grandparents play a significant role in caregiving, it is crucial to study their health evolution. This thesis aims, first, to detect the long-term trend of DFLE in Italy and to analyse the drivers of its change in terms of disability-specific mortality and the dynamics of disability onset and recovery; second, to shed light on gender, socioeconomic, and territorial inequalities in DFLE (and their intersections) and the factors driving these inequalities in terms of differences in mortality and disability risks; and third, to analyse the trend in the length of life lived as grandparents free from disability and to understand how it is influenced by the evolution of age-specific survival and grandparenthood-disability prevalence. The thesis applies different demographic and statistical methods to different cross-sectional and longitudinal data and provides DFLE estimates, trends and applications for mid-to-older Italian men and women. The findings show that while DFLE at mid-to-older ages has increased, it has not always progressed as favourably as life expectancy. The greatest contribution to DFLE changes comes from changes in the transitions into and out of disability. There are notable differences in DFLE at older ages within the country, between genders and educational groups. Women have a life expectancy advantage, but their health disadvantage counterbalances it.
The disadvantage in DFLE accumulates over education and region of residence, resulting in the higher educated living in northern regions having more than double the DFLE of the lower educated living in southern regions. Health differences are also the major contributors to educational differences in DFLE. Italian grandmothers and grandfathers are gaining years of shared lifetime with their grandchildren in good functional health. Women can expect to live more years as disability-free grandmothers than men, but their share of disability-free grandmother years over total years as grandmothers is lower than that for men. The increase in disability-free grandparenthood years is primarily led by improved survival and health conditions and, for men, by the postponement of grandparenthood to older ages.
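DFLE summary indicators of the kind discussed above are conventionally computed with the Sullivan method, which weights life-table person-years by the age-specific proportion free of disability. The abridged life-table numbers below are invented for illustration, not the thesis's estimates:

```python
# Illustrative Sullivan-method computation of disability-free life
# expectancy (DFLE) at age 65. All inputs are made-up placeholder values.
ages        = [65, 70, 75, 80, 85]                    # 5-year age groups
person_yrs  = [4.6e3, 4.2e3, 3.5e3, 2.4e3, 2.0e3]     # L(x): years lived in interval
free_prop   = [0.90, 0.82, 0.70, 0.55, 0.35]          # share disability-free
survivors65 = 1_000                                    # l(65): survivors at age 65

def sullivan_dfle(L, pi, l_x):
    """DFLE(x) = sum_i L_i * pi_i / l(x): person-years weighted by the
    disability-free proportion, divided by survivors at the index age."""
    return sum(Li * pi_i for Li, pi_i in zip(L, pi)) / l_x

dfle65 = sullivan_dfle(person_yrs, free_prop, survivors65)
le65   = sum(person_yrs) / survivors65    # ordinary life expectancy at 65
```

By construction DFLE never exceeds ordinary life expectancy, and the gap between the two is the expected years lived with disability, which is the quantity behind the gender and educational comparisons reported in the thesis.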