We Need to Talk About Random Splits
Gorman and Bedrick (2019) argued for using random splits rather than standard
splits in NLP experiments. We argue that random splits, like standard splits,
lead to overly optimistic performance estimates. We can also split data in
biased or adversarial ways, e.g., training on short sentences and evaluating on
long ones. Biased sampling has been used in domain adaptation to simulate
real-world drift; this is known as the covariate shift assumption. In NLP,
however, even worst-case splits, maximizing bias, often under-estimate the
error observed on new samples of in-domain data, i.e., the data that models
should minimally generalize to at test time. This invalidates the covariate
shift assumption. Instead of using multiple random splits, future benchmarks
should ideally include multiple, independent test sets; if infeasible,
we argue that multiple biased splits lead to more realistic performance
estimates than multiple random splits.
Comment: Accepted at EACL 202
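The contrast between a random split and a biased split (training on short sentences, evaluating on long ones) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the whitespace-based length measure, and the 20% test fraction are all assumptions made for the example.

```python
import random


def random_split(sentences, test_fraction=0.2, seed=0):
    """Shuffle the data and hold out a random test_fraction of it."""
    rng = random.Random(seed)
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # First n_test shuffled items form the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


def length_biased_split(sentences, test_fraction=0.2):
    """Train on the shortest sentences and evaluate on the longest ones.

    Length is measured in whitespace-separated tokens (an assumption).
    """
    ordered = sorted(sentences, key=lambda s: len(s.split()))
    n_test = int(len(ordered) * test_fraction)
    cut = len(ordered) - n_test
    # Everything before the cut is training data; the longest items are held out.
    return ordered[:cut], ordered[cut:]


if __name__ == "__main__":
    # Toy corpus: sentence k has k tokens.
    corpus = [" ".join(["tok"] * k) for k in range(1, 11)]
    train, test = length_biased_split(corpus)
    print(len(train), len(test))
```

Under the biased split, every test sentence is at least as long as every training sentence, so the evaluation probes generalization to longer inputs rather than to a sample drawn from the same length distribution.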
Mediation and peace
This paper applies mechanism design to conflict resolution. We determine when and how unmediated communication and mediation reduce the ex ante probability of conflict in a game with asymmetric information. Mediation improves upon unmediated communication when the intensity of conflict is high, or when asymmetric
information is significant. The mediator improves upon unmediated communication by not precisely reporting information to conflicting parties, and, more specifically, by not
revealing to a player with probability one that the opponent is weak. Arbitrators
who can enforce settlements are no more effective than mediators who only make
non-binding recommendations.
On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law
Out-of-distribution (OOD) testing is increasingly popular for evaluating a
machine learning system's ability to generalize beyond the biases of a training
set. OOD benchmarks are designed to present a different joint distribution of
data and labels between training and test time. VQA-CP has become the standard
OOD benchmark for visual question answering, but we discovered three troubling
practices in its current use. First, most published methods rely on explicit
knowledge of the construction of the OOD splits. They often rely on
"inverting" the distribution of labels, e.g. answering mostly 'yes' when the
common training answer is 'no'. Second, the OOD test set is used for model
selection. Third, a model's in-domain performance is assessed after retraining
it on in-domain splits (VQA v2) that exhibit a more balanced distribution of
labels. These three practices defeat the objective of evaluating
generalization, and put into question the value of methods specifically
designed for this dataset. We show that embarrassingly-simple methods,
including one that generates answers at random, surpass the state of the art on
some question types. We provide short- and long-term solutions to avoid these
pitfalls and realize the benefits of OOD evaluation.
A random tunnel number one 3-manifold does not fiber over the circle
We address the question: how common is it for a 3-manifold to fiber over the
circle? One motivation for considering this is to give insight into the fairly
inscrutable Virtual Fibration Conjecture. For the special class of 3-manifolds
with tunnel number one, we provide compelling theoretical and experimental
evidence that fibering is a very rare property. Indeed, in various precise
senses it happens with probability 0. Our main theorem is that this is true for
a measured lamination model of random tunnel number one 3-manifolds.
The first ingredient is an algorithm of K Brown which can decide if a given
tunnel number one 3-manifold fibers over the circle. Following the lead of
Agol, Hass and W Thurston, we implement Brown's algorithm very efficiently by
working in the context of train tracks/interval exchanges. To analyze the
resulting algorithm, we generalize work of Kerckhoff to understand the dynamics
of splitting sequences of complete genus 2 interval exchanges. Combining all of
this with a "magic splitting sequence" and work of Mirzakhani proves the main
theorem.
The 3-manifold situation contrasts markedly with random 2-generator 1-relator
groups; in particular, we show that such groups "fiber" with probability
strictly between 0 and 1.
Comment: This is the version published by Geometry & Topology on 15 December
200
The roundtable: an abstract model of conversation dynamics
Is it possible to abstract a formal mechanism originating schisms and
governing the size evolution of social conversations? In this work a
constructive solution to such problem is proposed: an abstract model of a
generic N-party turn-taking conversation. The model develops from simple yet
realistic assumptions derived from experimental evidence, abstracts from
conversation content and semantics while including topological information, and
is driven by stochastic dynamics. We find that a single mechanism - namely the
dynamics of conversational party's individual fitness, as related to
conversation size - controls the development of the self-organized schisming
phenomenon. Potential generalizations of the model - including individual
traits and preferences, memory effects and more elaborated conversational
topologies - may find important applications also in other fields of research,
where dynamically-interacting and networked agents play a fundamental role.
Comment: 18 pages, 4 figures, to be published in Journal of Artificial
Societies and Social Simulation
Secure bit commitment from relativistic constraints
We investigate two-party cryptographic protocols that are secure under
assumptions motivated by physics, namely relativistic assumptions
(no-signalling) and quantum mechanics. In particular, we discuss the security
of bit commitment in so-called split models, i.e. models in which at least some
of the parties are not allowed to communicate during certain phases of the
protocol. We find the minimal splits that are necessary to evade the
Mayers-Lo-Chau no-go argument and present protocols that achieve security in
these split models. Furthermore, we introduce the notion of local versus global
command, a subtle issue that arises when the split committer is required to
delegate non-communicating agents to open the commitment. We argue that
classical protocols are insecure under global command in the split model we
consider. On the other hand, we provide a rigorous security proof in the global
command model for Kent's quantum protocol [Kent 2011, Unconditionally Secure
Bit Commitment by Transmitting Measurement Outcomes]. The proof employs two
fundamental principles of modern physics, the no-signalling property of
relativity and the uncertainty principle of quantum mechanics.
Comment: published version, IEEE format, 18 pages, 8 figures
Frames: A Corpus for Adding Memory to Goal-Oriented Dialogue Systems
This paper presents the Frames dataset (Frames is available at
http://datasets.maluuba.com/Frames), a corpus of 1369 human-human dialogues
with an average of 15 turns per dialogue. We developed this dataset to study
the role of memory in goal-oriented dialogue systems. Based on Frames, we
introduce a task called frame tracking, which extends state tracking to a
setting where several states are tracked simultaneously. We propose a baseline
model for this task. We show that Frames can also be used to study memory in
dialogue management and information presentation through natural language
generation.