LIPIcs, Volume 251, ITCS 2023, Complete Volume
Peak Estimation of Time Delay Systems using Occupation Measures
This work proposes a method to compute the maximum value obtained by a state
function along trajectories of a Delay Differential Equation (DDE). An example
of this task is finding the maximum number of infected people in an epidemic
model with a nonzero incubation period. The variables of this peak estimation
problem include the stopping time and the original history (restricted to a
class of admissible histories). The original nonconvex DDE peak estimation
problem is approximated by an infinite-dimensional Linear Program (LP) in
occupation measures, inspired by existing measure-based methods in peak
estimation and optimal control. This LP is approximated from above by a
sequence of Semidefinite Programs (SDPs) through the moment-Sum of Squares
(SOS) hierarchy. Effectiveness of this scheme in providing peak estimates for
DDEs is demonstrated with provided examples. Comment: 34 pages, 14 figures, 3 tables
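For intuition, the peak of a state function along a single DDE trajectory can be estimated by direct simulation. The sketch below is not the occupation-measure method described in the abstract: it integrates one trajectory with forward Euler and therefore gives only a sampled lower bound on the peak, whereas the SDP hierarchy bounds it from above. The test equation x'(t) = -x(t - 1), the constant history x = 1, the state function p(x) = -x, and the names `simulate_dde_peak`, `rhs`, `tau`, `horizon`, and `dt` are all illustrative assumptions.

```python
def simulate_dde_peak(rhs, history_value, p, tau=1.0, horizon=10.0, dt=1e-3):
    """Forward-Euler simulation of x'(t) = rhs(x(t), x(t - tau)).

    Returns (peak, t_peak): the largest sampled value of p(x(t)) on
    [0, horizon] and the time at which it was attained.
    """
    n_delay = int(round(tau / dt))
    n_steps = int(round(horizon / dt))
    # Samples of x on [-tau, 0]; a constant history is an assumption of this sketch.
    xs = [history_value] * (n_delay + 1)
    peak, t_peak = p(xs[-1]), 0.0
    for k in range(n_steps):
        x_now = xs[-1]               # x(t_k)
        x_delayed = xs[-1 - n_delay]  # x(t_k - tau)
        x_next = x_now + dt * rhs(x_now, x_delayed)
        xs.append(x_next)
        value = p(x_next)
        if value > peak:
            peak, t_peak = value, (k + 1) * dt
    return peak, t_peak


# Toy problem (assumed): x'(t) = -x(t - 1), x = 1 on [-1, 0], p(x) = -x.
peak, t_peak = simulate_dde_peak(lambda x, xd: -xd, 1.0, lambda x: -x)
```

For this toy problem the method of steps gives an exact peak of 0.5 at t = 2, so the simulated values should land close to that. A single simulated trajectory says nothing about other admissible histories, which is the gap the measure-based upper bounds are meant to close.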
Temperament & Character account for brain functional connectivity at rest: A diathesis-stress model of functional dysregulation in psychosis
The human brain's resting-state functional connectivity (rsFC) provides stable trait-like measures of differences in the perceptual, cognitive, emotional, and social functioning of individuals. The rsFC of the prefrontal cortex is hypothesized to mediate a person's rational self-government, as is also measured by personality, so we tested whether its connectivity networks account for vulnerability to psychosis and related personality configurations. Young adults were recruited as outpatients or controls from the same communities around psychiatric clinics. Healthy controls (n = 30) and clinically stable outpatients with bipolar disorder (n = 35) or schizophrenia (n = 27) were diagnosed by structured interviews, and then were assessed with standardized protocols of the Human Connectome Project. Data-driven clustering identified five groups of patients with distinct patterns of rsFC regardless of diagnosis. These groups were distinguished by rsFC networks that regulate specific biopsychosocial aspects of psychosis: sensory hypersensitivity, negative emotional balance, impaired attentional control, avolition, and social mistrust. The rsFC group differences were validated by independent measures of white matter microstructure, personality, and clinical features not used to identify the subjects. We confirmed that each connectivity group was organized by differential collaborative interactions among six prefrontal and eight other automatically-coactivated networks. The temperament and character traits of the members of these groups strongly accounted for the differences in rsFC between groups, indicating that configurations of rsFC are internal representations of personality organization. These representations involve weakly self-regulated emotional drives of fear, irrational desire, and mistrust, which predispose to psychopathology. However, stable outpatients with different diagnoses (bipolar or schizophrenic psychoses) were highly similar in rsFC and personality. This supports a diathesis-stress model in which different complex adaptive systems regulate predisposition (which is similar in stable outpatients despite diagnosis) and stress-induced clinical dysfunction (which differs by diagnosis).
Real-time simulations of transmon systems with time-dependent Hamiltonian models
In this thesis we study aspects of Hamiltonian models which can affect the
time evolution of transmon systems. We model the time evolution of various
systems as a unitary real-time process by numerically solving the
time-dependent Schrödinger equation (TDSE). We denote the corresponding
computer models as non-ideal gate-based quantum computer (NIGQC) models since
transmons are usually used as transmon qubits in superconducting prototype
gate-based quantum computers (PGQCs). We first review the ideal gate-based
quantum computer (IGQC) model and provide a distinction between the IGQC, PGQCs
and the NIGQC models we consider in this thesis. Then, we derive the circuit
Hamiltonians which generate the dynamics of fixed-frequency and flux-tunable
transmons. Furthermore, we also provide clear and concise derivations of
effective Hamiltonians for both types of transmons. We use the circuit and
effective Hamiltonians we derived to define two many-particle Hamiltonians,
namely a circuit and an associated effective Hamiltonian. The interactions
between the different subsystems are modelled as dipole-dipole interactions.
Next, we develop two product-formula algorithms which solve the TDSE for the
Hamiltonians we defined. Afterwards, we use these algorithms to investigate how
various frequently applied assumptions affect the time evolution of transmon
systems modelled with the many-particle effective Hamiltonian when a control
pulse is applied. Here we also compare the time evolutions generated by the
effective and circuit Hamiltonian. We find that the assumptions we investigate
can substantially affect the time evolution of the probability amplitudes we
model. Next, we investigate how susceptible gate-error quantifiers are to
assumptions which make up the NIGQC model. We find that the assumptions we
consider clearly affect gate-error quantifiers like the diamond distance and
the average infidelity. Comment: Dissertation, 203 pages, RWTH Aachen University, 2023. This
dissertation includes and extends the results of arXiv:2201.02402 and
arXiv:2211.1101
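A product-formula solver of the kind mentioned in this abstract can be illustrated on a much smaller problem. The sketch below applies a second-order (Strang) splitting to a time-independent two-level Hamiltonian H = a*sigma_z + b*sigma_x with hbar = 1. It is not the thesis's algorithm, which targets time-dependent many-transmon Hamiltonians; the Hamiltonian, the parameters `a` and `b`, and the function names `strang_step` and `evolve` are assumptions chosen for illustration.

```python
import cmath
import math


def strang_step(state, a, b, dt):
    """One second-order product-formula step for H = a*sigma_z + b*sigma_x."""
    c0, c1 = state
    # Half step with the diagonal factor exp(-i * a * sigma_z * dt / 2).
    half = cmath.exp(-1j * a * dt / 2)
    c0, c1 = half * c0, half.conjugate() * c1
    # Full step with exp(-i * b * sigma_x * dt) = cos(b dt) I - i sin(b dt) sigma_x.
    c, s = math.cos(b * dt), math.sin(b * dt)
    c0, c1 = c * c0 - 1j * s * c1, -1j * s * c0 + c * c1
    # Second half step with the diagonal factor.
    c0, c1 = half * c0, half.conjugate() * c1
    return c0, c1


def evolve(state, a, b, total_time, n_steps):
    """Apply n_steps Strang steps to propagate the state over total_time."""
    dt = total_time / n_steps
    for _ in range(n_steps):
        state = strang_step(state, a, b, dt)
    return state


# Start in |0> and evolve for one time unit (parameter values are arbitrary).
c0, c1 = evolve((1 + 0j, 0j), a=1.0, b=0.5, total_time=1.0, n_steps=1000)
p1 = abs(c1) ** 2
```

Each factor is unitary, so the norm of the state is preserved up to rounding. For this toy Hamiltonian the result can be checked against the closed form P1(t) = (b/Omega)^2 sin^2(Omega t) with Omega = sqrt(a^2 + b^2).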
Reinforcement Learning Curricula as Interpolations between Task Distributions
In the last decade, the increased availability of powerful computing machinery has led to an increasingly widespread application of machine learning methods. Machine learning has been particularly successful when large models, typically neural networks with an ever-increasing number of parameters, can leverage vast data to make predictions.
While reinforcement learning (RL) has been no exception to this development, a distinguishing feature of RL is its well-known exploration-exploitation trade-off, whose optimal solution, while possible to model as a partially observable Markov decision process, evades computation in all but the simplest problems. Consequently, it seems unsurprising that notable demonstrations of reinforcement learning, such as an RL-based Go agent (AlphaGo) by DeepMind beating the professional Go player Lee Sedol, relied both on the availability of massive computing capabilities and specific forms of regularization that facilitate learning. In the case of AlphaGo, this regularization came in the form of self-play, enabling learning by interacting with gradually more proficient opponents.
In this thesis, we develop techniques that, similarly to the concept of self-play of AlphaGo, improve the learning performance of RL agents by training on sequences of increasingly complex tasks. These task sequences are typically called curricula and are known to side-step problems such as slow learning or convergence to poor behavior that may occur when directly learning in complicated tasks. The algorithms we develop in this thesis create curricula by minimizing distances or divergences between probability distributions of learning tasks, generating interpolations between an initial distribution of easy learning tasks and a target task distribution. Apart from improving the learning performance of RL agents in experiments, developing methods that realize curricula as interpolations between task distributions results in a nuanced picture of key aspects of successful reinforcement learning curricula.
In Chapter 1, we start this thesis by introducing required reinforcement learning notation and then motivating curriculum reinforcement learning from the perspective of continuation methods for non-linear optimization. Similar to curricula for reinforcement learning agents, continuation methods have been used in non-linear optimization to solve challenging optimization problems. This similarity provides an intuition about the effect of the curricula we aim to generate and their limits.
In Chapter 2, we transfer the concept of self-paced learning, initially proposed in the supervised learning community, to the problem of RL, showing that an automated curriculum generation for RL agents can be motivated by a regularized RL objective. This regularized RL objective implies generating a curriculum as a sequence of task distributions that trade off the expected agent performance against similarity to a specified distribution of target tasks. This view on curriculum RL contrasts with existing approaches, as it motivates curricula via a regularized RL objective instead of generating them from a set of assumptions about an optimal curriculum. In experiments, we show that an approximate implementation of the aforementioned curriculum, one that restricts the interpolating task distribution to a Gaussian, results in improved learning performance compared to regular reinforcement learning, matching or surpassing the performance of existing curriculum-based methods.
Subsequently, Chapter 3 builds on the intuition of curricula as sequences of interpolating task distributions established in Chapter 2. Motivated by using more flexible task distribution representations, we show how parametric assumptions play a crucial role in the empirical success of the previous approach and subsequently uncover key ingredients that enable the generation of meaningful curricula without assuming a parametric model of the task distributions. One major ingredient is an explicit notion of task similarity via a distance function between two Markov Decision Processes. We turn towards optimal transport theory, allowing for flexible particle-based representations of the task distributions while properly considering the newly introduced metric structure of the task space. Combined with other improvements to our first method, such as a more aggressive restriction of the curriculum to tasks that are not too hard for the agent, the resulting approach delivers consistently high learning performance in multiple experiments.
In the final Chapter 4, we apply the refined method of Chapter 3 to a trajectory-tracking task, in which we task an RL agent to follow a three-dimensional reference trajectory with the tip of an inverted pendulum mounted on a Barrett Whole Arm Manipulator. The access to only positional information results in a partially observable system that, paired with its inherent instability, underactuation, and non-trivial kinematic structure, presents a challenge for modern reinforcement learning algorithms, which we tackle via curricula. The technically infinite-dimensional task space of target trajectories allows us to probe the developed curriculum learning method for flaws that have not surfaced in the rather low-dimensional experiments of the previous chapters. Through an improved optimization scheme that better respects the non-Euclidean structure of target trajectories, we reliably generate curricula of trajectories to be tracked, resulting in faster and more robust learning compared to an RL baseline that does not exploit this form of structured learning. The learned policy matches the performance of an optimal control baseline on the real system, demonstrating the potential of curriculum RL to learn state estimation and control for non-linear tracking tasks jointly.
In summary, this thesis introduces a perspective on reinforcement learning curricula as interpolations between task distributions. The methods developed under this perspective enjoy a precise formulation as optimization problems and deliver empirical benefits throughout experiments. Building upon this precise formulation may allow future work to advance the formal understanding of reinforcement learning curricula and, with that, enable the solution of challenging decision-making and control problems with reinforcement learning.
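The idea of a curriculum as an interpolation between task distributions can be made concrete in the simplest possible setting. The sketch below interpolates between two one-dimensional Gaussian task distributions along the Wasserstein-2 geodesic, for which both the mean and the standard deviation move linearly. It is only an illustration: the fixed, evenly spaced schedule, the `Gaussian1D` type, and the function names are assumptions, and the methods in the thesis choose each interpolation step by solving an optimization problem that accounts for agent performance.

```python
import math
from typing import NamedTuple


class Gaussian1D(NamedTuple):
    mean: float
    std: float


def w2_distance(p, q):
    """Wasserstein-2 distance between two one-dimensional Gaussians."""
    return math.hypot(p.mean - q.mean, p.std - q.std)


def w2_interpolate(p, q, t):
    """Point at fraction t in [0, 1] along the W2 geodesic from p to q."""
    return Gaussian1D((1 - t) * p.mean + t * q.mean,
                      (1 - t) * p.std + t * q.std)


def curriculum(easy, target, n_stages):
    """Evenly spaced task distributions from easy to target (n_stages >= 2)."""
    return [w2_interpolate(easy, target, i / (n_stages - 1))
            for i in range(n_stages)]


# Hypothetical easy and target task distributions over a scalar task parameter.
stages = curriculum(Gaussian1D(0.0, 1.0), Gaussian1D(4.0, 3.0), n_stages=3)
```

Along this geodesic the distance to the starting distribution grows linearly in t, which makes the stage spacing easy to reason about in this toy case.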
Temperament & Character account for brain functional connectivity at rest: A diathesis-stress model of functional dysregulation in psychosis
The online version contains supplementary material
available at https://doi.org/10.1038/s41380-023-02039-6
The human brain's resting-state functional connectivity (rsFC) provides stable trait-like measures of differences in the perceptual,
cognitive, emotional, and social functioning of individuals. The rsFC of the prefrontal cortex is hypothesized to mediate a person's
rational self-government, as is also measured by personality, so we tested whether its connectivity networks account for
vulnerability to psychosis and related personality configurations. Young adults were recruited as outpatients or controls from the
same communities around psychiatric clinics. Healthy controls (n = 30) and clinically stable outpatients with bipolar disorder
(n = 35) or schizophrenia (n = 27) were diagnosed by structured interviews, and then were assessed with standardized protocols of
the Human Connectome Project. Data-driven clustering identified five groups of patients with distinct patterns of rsFC regardless of
diagnosis. These groups were distinguished by rsFC networks that regulate specific biopsychosocial aspects of psychosis: sensory
hypersensitivity, negative emotional balance, impaired attentional control, avolition, and social mistrust. The rsFC group differences
were validated by independent measures of white matter microstructure, personality, and clinical features not used to identify the
subjects. We confirmed that each connectivity group was organized by differential collaborative interactions among six prefrontal
and eight other automatically-coactivated networks. The temperament and character traits of the members of these groups
strongly accounted for the differences in rsFC between groups, indicating that configurations of rsFC are internal representations of
personality organization. These representations involve weakly self-regulated emotional drives of fear, irrational desire, and
mistrust, which predispose to psychopathology. However, stable outpatients with different diagnoses (bipolar or schizophrenic
psychoses) were highly similar in rsFC and personality. This supports a diathesis-stress model in which different complex adaptive
systems regulate predisposition (which is similar in stable outpatients despite diagnosis) and stress-induced clinical dysfunction
(which differs by diagnosis).
EU FEDER grants through the Spanish Ministry of Science and Technology: PID2021-125017OB-I00, RTI2018-098983-B-I00, D43 TW011793-06A1
United States Department of Health & Human Services, National Institutes of Health (NIH), USA: R01-MH124060
Psychosis-Risk Outcomes Network: U01 MH12463
Least-cost distribution network tariff design in theory and practice
First published online: 31 December 2020
In this paper a game-theoretical model with self-interest pursuing consumers is introduced in order to assess how to design a least-cost distribution tariff under two constraints that regulators typically face. The first constraint is related to difficulties regarding the implementation of cost-reflective tariffs. In practice, so-called cost-reflective tariffs are only a proxy for the actual cost driver(s) in distribution grids. The second constraint has to do with fairness. There is a fear that active consumers investing in distributed energy resources (DER) might benefit at the expense of passive consumers. We find that both constraints have a significant impact on the least-cost network tariff design, and the results depend on the state of the grid. If most of the grid investments still have to be made, passive and active consumers can both benefit from cost-reflective tariffs, while this is not the case for passive consumers if the costs are mostly sunk.
Published version of EUI WP RSCAS, 2018/1
Safety and Liveness of Quantitative Automata
The safety-liveness dichotomy is a fundamental concept in formal languages which plays a key role in verification. Recently, this dichotomy has been lifted to quantitative properties, which are arbitrary functions from infinite words to partially-ordered domains. We look into harnessing the dichotomy for the specific classes of quantitative properties expressed by quantitative automata. These automata contain finitely many states and rational-valued transition weights, and their common value functions Inf, Sup, LimInf, LimSup, LimInfAvg, LimSupAvg, and DSum map infinite words into the totally-ordered domain of real numbers. In this automata-theoretic setting, we establish a connection between quantitative safety and topological continuity and provide an alternative characterization of quantitative safety and liveness in terms of their boolean counterparts. For all common value functions, we show how the safety closure of a quantitative automaton can be constructed in PTime, and we provide PSpace-complete checks of whether a given quantitative automaton is safe or live, with the exception of LimInfAvg and LimSupAvg automata, for which the safety check is in ExpSpace. Moreover, for deterministic Sup, LimInf, and LimSup automata, we give PTime decompositions into safe and live automata. These decompositions enable the separation of techniques for safety and liveness verification for quantitative specifications.
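The value functions named in this abstract are easy to evaluate on an ultimately periodic weight sequence, that is, a finite prefix followed by a cycle repeated forever, as produced by a single run of a deterministic automaton on a lasso-shaped word. The sketch below computes each value exactly with rationals. The function name `lasso_values`, the discount factor of 1/2 for DSum, and the sample weights are assumptions; the abstract does not fix a discount factor, and the sketch does not cover nondeterministic automata or the safety and liveness constructions of the paper.

```python
from fractions import Fraction


def lasso_values(prefix, cycle, discount=Fraction(1, 2)):
    """Values of the weight sequence prefix . cycle^omega under common value functions."""
    if not cycle:
        raise ValueError("cycle must be non-empty")
    lam = Fraction(discount)
    if not 0 < lam < 1:
        raise ValueError("discount must lie strictly between 0 and 1")
    prefix = [Fraction(w) for w in prefix]
    cycle = [Fraction(w) for w in cycle]
    # Discounted sum: finite prefix part plus a geometric series over the cycle.
    prefix_sum = sum((lam ** i * w for i, w in enumerate(prefix)), Fraction(0))
    cycle_sum = sum((lam ** j * w for j, w in enumerate(cycle)), Fraction(0))
    dsum = prefix_sum + lam ** len(prefix) * cycle_sum / (1 - lam ** len(cycle))
    # On a lasso, both limit averages equal the mean weight of the cycle.
    mean = sum(cycle, Fraction(0)) / len(cycle)
    return {
        "Inf": min(prefix + cycle),
        "Sup": max(prefix + cycle),
        "LimInf": min(cycle),   # only weights seen infinitely often matter
        "LimSup": max(cycle),
        "LimInfAvg": mean,
        "LimSupAvg": mean,
        "DSum": dsum,
    }


# Weight sequence 3, 1, (2, 4)^omega with discount factor 1/2.
values = lasso_values([3, 1], [2, 4])
```

Here Inf and Sup depend on the whole sequence, while the limit-based functions depend only on the cycle, which is one way to see why they behave differently with respect to finite prefixes.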
Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees
Actor-critic (AC) methods are widely used in reinforcement learning (RL) and
benefit from the flexibility of using any policy gradient method as the actor
and value-based method as the critic. The critic is usually trained by
minimizing the TD error, an objective that is potentially decorrelated with the
true goal of achieving a high reward with the actor. We address this mismatch
by designing a joint objective for training the actor and critic in a
decision-aware fashion. We use the proposed objective to design a generic AC
algorithm that can easily handle any function approximation. We explicitly
characterize the conditions under which the resulting algorithm guarantees
monotonic policy improvement, regardless of the choice of the policy and critic
parameterization. Instantiating the generic algorithm results in an actor that
involves maximizing a sequence of surrogate functions (similar to TRPO, PPO)
and a critic that involves minimizing a closely connected objective. Using
simple bandit examples, we provably establish the benefit of the proposed
critic objective over the standard squared error. Finally, we empirically
demonstrate the benefit of our decision-aware actor-critic framework on simple
RL problems. Comment: 44 pages
Time, movement, urban space. "Places of transit": an urban asynchrony?
The aim of the research project is to examine the spatial transformation of "places of transit" for migrants in 21st century Europe. The aim is to analyse the relationship between these places and the contemporary city, and how the latter is being transformed, or should be transformed, taking account, or not, of their existence. Transit places are part of a short timeframe, with a normatively limited duration; their existence, however, has an impact on the morphology of the city over a (more or less) long period. Rather than considering them as "out-of-the-way places" or as "high places" (usually in the media), let's start from the premise that, like the other elements of the urban jigsaw, they are places that (de)structure, organise and shape the city and the lives of those who live there. Characterised by a specific temporality, one in which exile becomes waiting, these places are "intervals" because they are as much perimeters obeying specific social and spatial dynamics as they are "pockets of time" covering ways of living in this territory that are out of sync or poorly synchronised with the "urban pulse". The question of time is therefore a key factor in understanding the relationship between these places. Bringing space and time together here amounts to "discovering" what constitutes a mesh, paradoxically not very visible and yet the matrix of urban space, where the nodes are called chronotopes (rhythms), urban continuities (duration) and spatial resistances (memory). This is my line of enquiry. Based on five case studies (Calais, Lampedusa, Lavrio, Lesbos, Amman/Zaatari), this thesis offers a viewpoint on the dynamics that govern the integration, or otherwise, of "places of transit" into a network that extends beyond them both spatially and temporally.