Anderson's orthogonality catastrophe
The topic of this thesis is a mathematical treatment of Anderson's orthogonality catastrophe. Named after P.W. Anderson, who studied the phenomenon in the late 1960s, the catastrophe is an intrinsic effect in Fermi gases. In his first work on the topic in [Phys. Rev. Lett. 18:1049--1051], Anderson studied a system of noninteracting fermions in three space dimensions and found the ground state to be asymptotically orthogonal to the ground state of the same system perturbed by a finite-range scattering potential.
More precisely, let $\Phi_L^N$ be the $N$-body ground state of the fermionic system in a $d$-dimensional box of length $L$, and let $\Psi_L^N$ be the ground state of the corresponding system in the presence of the additional finite-range potential. Then the catastrophe brings about the asymptotic vanishing of the overlap $S_L^N := \langle \Phi_L^N, \Psi_L^N \rangle$ of the $N$-body ground states $\Phi_L^N$ and $\Psi_L^N$. The asymptotics is in the thermodynamic limit $L \to \infty$, $N \to \infty$ with fixed density $\rho := N/L^d$.
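The vanishing of the overlap can be illustrated numerically: for noninteracting fermions the $N$-body ground states are Slater determinants, so their overlap equals the determinant of the matrix of one-particle orbital overlaps. The following minimal sketch uses a 1D tight-binding lattice with a single on-site impurity as an illustrative stand-in for the continuum setting of the thesis; the impurity strength and system sizes are arbitrary choices, not values from the source.

```python
import numpy as np

def hopping(L):
    # 1D tight-binding Laplacian with Dirichlet boundary conditions
    H = np.zeros((L, L))
    for i in range(L - 1):
        H[i, i + 1] = H[i + 1, i] = -1.0
    return H

def overlap_sq(L, N, strength=5.0):
    H0 = hopping(L)
    H1 = H0.copy()
    H1[L // 2, L // 2] += strength   # finite-range (here: rank-one) impurity
    _, U = np.linalg.eigh(H0)        # columns: one-particle orbitals, ascending energy
    _, V = np.linalg.eigh(H1)
    # The N-body ground states are Slater determinants of the N lowest orbitals;
    # their overlap is the determinant of the occupied-orbital Gram matrix.
    M = U[:, :N].T @ V[:, :N]
    return np.linalg.det(M) ** 2

# fixed density N/L = 1/4, growing box: the squared overlap shrinks with L
vals = [overlap_sq(L, L // 4) for L in (40, 80, 160)]
```

Doubling the box at fixed density visibly suppresses the squared overlap, which is the finite-size shadow of the algebraic decay $|S_L^N|^2 \lesssim L^{-\tilde{\gamma}}$ discussed below.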
In [Commun. Math. Phys. 329:979--998], the overlap has been bounded from above with an asymptotic bound of the form $|S_L^N|^2 \lesssim L^{-\tilde{\gamma}}$. The decay exponent $\tilde{\gamma}$ there corresponds to the one of Anderson in [Phys. Rev. Lett. 18:1049--1051]. Another publication by Anderson from the same year, [Phys. Rev. 164:352--359], contains the exact asymptotics with a bigger exponent $\gamma$.
This thesis features a step towards the exact asymptotics. We prove a bound with an exponent that corresponds in a certain sense to the one in [Phys. Rev. 164:352--359] and improves upon the one in [Commun. Math. Phys. 329:979--998]. We use the methods from [Commun. Math. Phys. 329:979--998], but treat every term in a series expansion of $S_L^N$ instead of only the first one. Treating the higher-order terms requires additional arguments, since the trace expressions that occur are no longer necessarily nonnegative, which complicates some of the estimates.
The main contents of this thesis will also be published in a forthcoming article co-authored with Martin Gebert, Peter Müller, and Peter Otte.
The exponent in the orthogonality catastrophe for Fermi gases
We quantify the asymptotic vanishing of the ground-state overlap of two non-interacting Fermi gases in $d$-dimensional Euclidean space in the thermodynamic limit. Given two one-particle Schr\"odinger operators in finite volume which differ by a compactly supported bounded potential, we prove a power-law upper bound on the ground-state overlap of the corresponding non-interacting $N$-particle systems. We interpret the decay exponent $\gamma$ in terms of scattering theory and find $\gamma = \pi^{-2} \|\arcsin|T_E/2|\|_{\mathrm{HS}}^2$, where $T_E$ is the transition matrix at the Fermi energy $E$. This exponent reduces to the one predicted by Anderson [Phys. Rev. 164, 352--359 (1967)] for the exact asymptotics in the special case of a repulsive point-like perturbation.
Comment: version to appear in J. Spectr. Theory; references updated
Learning with AMIGo: Adversarially Motivated Intrinsic Goals
A key challenge for reinforcement learning (RL) consists of learning in
environments with sparse extrinsic rewards. In contrast to current RL methods,
humans are able to learn new skills with little or no reward by using various
forms of intrinsic motivation. We propose AMIGo, a novel agent incorporating --
as a form of meta-learning -- a goal-generating teacher that proposes
Adversarially Motivated Intrinsic Goals to train a goal-conditioned "student"
policy in the absence of (or alongside) environment reward. Specifically,
through a simple but effective "constructively adversarial" objective, the
teacher learns to propose increasingly challenging -- yet achievable -- goals
that allow the student to learn general skills for acting in a new environment,
independent of the task to be solved. We show that our method generates a
natural curriculum of self-proposed goals which ultimately allows the agent to
solve challenging procedurally-generated tasks where other forms of intrinsic
motivation and state-of-the-art RL methods fail.
Comment: 18 pages, 6 figures, published at The Ninth International Conference on Learning Representations (2021)
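The "constructively adversarial" objective described above can be sketched as a simple reward rule: the teacher is rewarded only when the student reaches the proposed goal, and only when doing so takes at least a threshold number of steps, so goals stay challenging yet achievable. The reward constants and the threshold-update rule below are illustrative assumptions, not the paper's exact hyperparameters.

```python
def teacher_reward(reached, steps, t_star, alpha=1.0, beta=1.0):
    """Constructively adversarial signal for the goal-generating teacher:
    positive only if the student reached the goal after at least t_star steps
    (hard enough, yet achievable); negative if the goal was too easy or
    unreachable."""
    if reached and steps >= t_star:
        return alpha
    return -beta

def update_threshold(t_star, reached, increment=1):
    # Illustrative curriculum rule: raise the difficulty threshold
    # whenever the student succeeds, producing increasingly hard goals.
    return t_star + increment if reached else t_star
```

Iterating these two rules yields the natural curriculum of self-proposed goals that the abstract describes: as the student improves, only goals requiring longer solutions remain rewarding for the teacher.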
PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them
Open-domain Question Answering models which directly leverage question-answer
(QA) pairs, such as closed-book QA (CBQA) models and QA-pair retrievers, show
promise in terms of speed and memory compared to conventional models which
retrieve and read from text corpora. QA-pair retrievers also offer
interpretable answers, a high degree of control, and are trivial to update at
test time with new knowledge. However, these models lack the accuracy of
retrieve-and-read systems, as substantially less knowledge is covered by the
available QA-pairs relative to text corpora like Wikipedia. To facilitate
improved QA-pair models, we introduce Probably Asked Questions (PAQ), a very
large resource of 65M automatically-generated QA-pairs. We introduce a new
QA-pair retriever, RePAQ, to complement PAQ. We find that PAQ preempts and
caches test questions, enabling RePAQ to match the accuracy of recent
retrieve-and-read models, whilst being significantly faster. Using PAQ, we
train CBQA models which outperform comparable baselines by 5%, but trail RePAQ
by over 15%, indicating the effectiveness of explicit retrieval. RePAQ can be
configured for size (under 500MB) or speed (over 1K questions per second)
whilst retaining high accuracy. Lastly, we demonstrate RePAQ's strength at
selective QA, abstaining from answering when it is likely to be incorrect. This
enables RePAQ to "back-off" to a more expensive state-of-the-art model,
leading to a combined system which is both more accurate and 2x faster than the
state-of-the-art model alone.
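The selective-QA back-off described above amounts to a confidence-gated dispatch: answer with the cheap QA-pair retriever when it is confident, otherwise fall back to the expensive reader. A minimal sketch, with toy stand-ins for the RePAQ-style and retrieve-and-read-style models (the function names and threshold are illustrative, not from the paper):

```python
def answer_with_backoff(question, fast_model, slow_model, threshold=0.5):
    """Selective QA: return the fast model's answer if its confidence
    clears the threshold, else back off to the slow, more accurate model."""
    answer, confidence = fast_model(question)
    if confidence >= threshold:
        return answer, "fast"
    return slow_model(question), "slow"

# toy stand-ins: the fast model is confident only on questions it has cached
fast = lambda q: (("Paris", 0.9) if "France" in q else ("unknown", 0.1))
slow = lambda q: "42"

a1 = answer_with_backoff("What is the capital of France?", fast, slow)
a2 = answer_with_backoff("What is the meaning of life?", fast, slow)
```

Tuning the threshold trades speed for accuracy: a higher threshold routes more questions to the expensive model, which is how the combined system in the abstract gains accuracy while staying faster overall.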
Grounding Aleatoric Uncertainty in Unsupervised Environment Design
Adaptive curricula in reinforcement learning (RL) have proven effective for
producing policies robust to discrepancies between the train and test
environment. Recently, the Unsupervised Environment Design (UED) framework
generalized RL curricula to generating sequences of entire environments,
leading to new methods with robust minimax regret properties. Problematically,
in partially-observable or stochastic settings, optimal policies may depend on
the ground-truth distribution over aleatoric parameters of the environment in
the intended deployment setting, while curriculum learning necessarily shifts
the training distribution. We formalize this phenomenon as curriculum-induced
covariate shift (CICS), and describe how its occurrence in aleatoric parameters
can lead to suboptimal policies. Directly sampling these parameters from the
ground-truth distribution avoids the issue, but thwarts curriculum learning. We
propose SAMPLR, a minimax regret UED method that optimizes the ground-truth
utility function, even when the underlying training data is biased due to CICS.
We prove, and validate on challenging domains, that our approach preserves
optimality under the ground-truth distribution, while promoting robustness
across the full range of environment settings.
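Curriculum-induced covariate shift over an aleatoric parameter can be made concrete with a toy one-step decision problem (this example is an illustration of CICS, not of the SAMPLR algorithm itself): a policy that best-responds to the curriculum's shifted coin bias is suboptimal under the ground-truth bias at deployment.

```python
def expected_return(action, p_heads):
    # one-step game: betting "heads" pays 1 on heads, "tails" pays 1 on tails
    return p_heads if action == "heads" else 1.0 - p_heads

def greedy_policy(p_train):
    # best response to the training distribution over the aleatoric coin
    return max(("heads", "tails"), key=lambda a: expected_return(a, p_train))

p_true = 0.8   # ground-truth aleatoric parameter at deployment
p_curr = 0.2   # curriculum-shifted training distribution (the covariate shift)

pi_biased = greedy_policy(p_curr)    # optimal for the curriculum only
pi_correct = greedy_policy(p_true)   # optimal under the ground truth

# regret of the curriculum-trained policy, evaluated on the ground truth
regret = expected_return(pi_correct, p_true) - expected_return(pi_biased, p_true)
```

The nonzero regret is exactly the failure mode the abstract formalizes: the curriculum changed which action looks best, even though only the aleatoric parameter's distribution was shifted.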
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Large pre-trained language models have been shown to store factual knowledge
in their parameters, and achieve state-of-the-art results when fine-tuned on
downstream NLP tasks. However, their ability to access and precisely manipulate
knowledge is still limited, and hence on knowledge-intensive tasks, their
performance lags behind task-specific architectures. Additionally, providing
provenance for their decisions and updating their world knowledge remain open
research problems. Pre-trained models with a differentiable access mechanism to
explicit non-parametric memory can overcome this issue, but have so far been
only investigated for extractive downstream tasks. We explore a general-purpose
fine-tuning recipe for retrieval-augmented generation (RAG) -- models which
combine pre-trained parametric and non-parametric memory for language
generation. We introduce RAG models where the parametric memory is a
pre-trained seq2seq model and the non-parametric memory is a dense vector index
of Wikipedia, accessed with a pre-trained neural retriever. We compare two RAG
formulations, one which conditions on the same retrieved passages across the
whole generated sequence, the other can use different passages per token. We
fine-tune and evaluate our models on a wide range of knowledge-intensive NLP
tasks and set the state-of-the-art on three open domain QA tasks, outperforming
parametric seq2seq models and task-specific retrieve-and-extract architectures.
For language generation tasks, we find that RAG models generate more specific,
diverse and factual language than a state-of-the-art parametric-only seq2seq
baseline.
Comment: Accepted at NeurIPS 2020
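The two RAG formulations mentioned above differ only in where the marginalization over retrieved passages happens: conditioning the whole sequence on one passage sums over passages outside the product over tokens, while per-token conditioning sums inside it. A toy numerical sketch with made-up probabilities (two passages, two tokens) shows the two marginalizations genuinely differ:

```python
import math

# toy numbers: retrieval prior over two passages, per-token likelihoods
p_z = [0.6, 0.4]      # p(z | x) for passages z = 0, 1
p_tok = [             # p(y_i | x, z, y_<i) for two generated tokens
    [0.9, 0.8],       # under passage z = 0
    [0.2, 0.7],       # under passage z = 1
]

# "same passage for the whole sequence": sum over z of the full-sequence product
rag_sequence = sum(p_z[z] * p_tok[z][0] * p_tok[z][1] for z in range(2))

# "different passage per token": product over tokens of per-token mixtures
rag_token = math.prod(
    sum(p_z[z] * p_tok[z][i] for z in range(2)) for i in range(2)
)
```

Because the sum and the product do not commute, the two formulations assign different sequence likelihoods, which is why the paper evaluates them separately.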