    We Need to Talk About Random Splits

    Gorman and Bedrick (2019) argued for using random splits rather than standard splits in NLP experiments. We argue that random splits, like standard splits, lead to overly optimistic performance estimates. We can also split data in biased or adversarial ways, e.g., training on short sentences and evaluating on long ones. Biased sampling has been used in domain adaptation to simulate real-world drift; this is known as the covariate shift assumption. In NLP, however, even worst-case splits, maximizing bias, often under-estimate the error observed on new samples of in-domain data, i.e., the data that models should minimally generalize to at test time. This invalidates the covariate shift assumption. Instead of using multiple random splits, future benchmarks should ideally include multiple, independent test sets; if that is infeasible, we argue that multiple biased splits lead to more realistic performance estimates than multiple random splits. Comment: Accepted at EACL 202
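
The contrast between a random split and a biased split (training on short sentences, evaluating on long ones) can be sketched as follows. This is a minimal illustration, not the authors' experimental code: the function names, the whitespace-based length measure, and the `test_fraction` parameter are all assumptions made for the example.

```python
import random


def random_split(sentences, test_fraction=0.2, seed=0):
    """Shuffle the data and hold out a random fraction as the test set."""
    rng = random.Random(seed)
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def length_biased_split(sentences, test_fraction=0.2):
    """Train on the shortest sentences and hold out the longest ones.

    Length is measured in whitespace-separated tokens (an assumption
    made for this sketch).
    """
    ordered = sorted(sentences, key=lambda s: len(s.split()))
    n_test = int(len(ordered) * test_fraction)
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]


# Toy data: sentence i has i tokens.
toy_sentences = [" ".join(["w"] * i) for i in range(1, 11)]
biased_train, biased_test = length_biased_split(toy_sentences)
random_train, random_test = random_split(toy_sentences)
```

With the biased split, every held-out sentence is at least as long as every training sentence, whereas the random split mixes lengths across both sides.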

    Mediation and peace

    This paper applies mechanism design to conflict resolution. We determine when and how unmediated communication and mediation reduce the ex ante probability of conflict in a game with asymmetric information. Mediation improves upon unmediated communication when the intensity of conflict is high, or when asymmetric information is significant. The mediator improves upon unmediated communication by not precisely reporting information to conflicting parties and, more specifically, by not revealing to a player with probability one that the opponent is weak. Arbitrators who can enforce settlements are no more effective than mediators who only make non-binding recommendations.

    On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law

    Out-of-distribution (OOD) testing is increasingly popular for evaluating a machine learning system's ability to generalize beyond the biases of a training set. OOD benchmarks are designed to present a different joint distribution of data and labels between training and test time. VQA-CP has become the standard OOD benchmark for visual question answering, but we discovered three troubling practices in its current use. First, most published methods rely on explicit knowledge of the construction of the OOD splits. They often rely on "inverting" the distribution of labels, e.g. answering mostly 'yes' when the common training answer is 'no'. Second, the OOD test set is used for model selection. Third, a model's in-domain performance is assessed after retraining it on in-domain splits (VQA v2) that exhibit a more balanced distribution of labels. These three practices defeat the objective of evaluating generalization, and put into question the value of methods specifically designed for this dataset. We show that embarrassingly simple methods, including one that generates answers at random, surpass the state of the art on some question types. We provide short- and long-term solutions to avoid these pitfalls and realize the benefits of OOD evaluation.
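
The first practice, "inverting" the label distribution, can be illustrated with a toy sketch. It is not taken from the paper or from VQA-CP: the data, the question type name, and the helper functions are all made up for the example. It only shows why a predictor that ignores the input can look strong when the test split is known to reverse the training prior.

```python
from collections import Counter


def fit_answer_prior(train_pairs):
    """Count answers per question type from (question_type, answer) pairs."""
    counts = {}
    for qtype, answer in train_pairs:
        counts.setdefault(qtype, Counter())[answer] += 1
    return counts


def majority_answer(counts, qtype):
    """Predict the most frequent training answer for this question type."""
    return counts[qtype].most_common(1)[0][0]


def inverted_answer(counts, qtype):
    """Predict the least frequent training answer (exploits a known inverted split)."""
    return min(counts[qtype].items(), key=lambda kv: kv[1])[0]


def accuracy(predict, counts, test_pairs):
    """Fraction of test pairs whose answer matches the prior-only prediction."""
    correct = sum(predict(counts, qtype) == answer for qtype, answer in test_pairs)
    return correct / len(test_pairs)


# Toy data (hypothetical): training is mostly 'no', the OOD test set is mostly 'yes'.
train = [("is_there", "no")] * 8 + [("is_there", "yes")] * 2
test = [("is_there", "yes")] * 8 + [("is_there", "no")] * 2

counts = fit_answer_prior(train)
majority_acc = accuracy(majority_answer, counts, test)
inverted_acc = accuracy(inverted_answer, counts, test)
```

On this toy data the inverted predictor scores far higher than the majority predictor without ever looking at an image or question, which is the kind of shortcut the abstract warns about.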

    A random tunnel number one 3-manifold does not fiber over the circle

    We address the question: how common is it for a 3-manifold to fiber over the circle? One motivation for considering this is to give insight into the fairly inscrutable Virtual Fibration Conjecture. For the special class of 3-manifolds with tunnel number one, we provide compelling theoretical and experimental evidence that fibering is a very rare property. Indeed, in various precise senses it happens with probability 0. Our main theorem is that this is true for a measured lamination model of random tunnel number one 3-manifolds. The first ingredient is an algorithm of K Brown which can decide if a given tunnel number one 3-manifold fibers over the circle. Following the lead of Agol, Hass and W Thurston, we implement Brown's algorithm very efficiently by working in the context of train tracks/interval exchanges. To analyze the resulting algorithm, we generalize work of Kerckhoff to understand the dynamics of splitting sequences of complete genus 2 interval exchanges. Combining all of this with a "magic splitting sequence" and work of Mirzakhani proves the main theorem. The 3-manifold situation contrasts markedly with random 2-generator 1-relator groups; in particular, we show that such groups "fiber" with probability strictly between 0 and 1. Comment: This is the version published by Geometry & Topology on 15 December 200

    The roundtable: an abstract model of conversation dynamics

    Is it possible to abstract a formal mechanism that gives rise to schisms and governs the size evolution of social conversations? In this work a constructive solution to this problem is proposed: an abstract model of a generic N-party turn-taking conversation. The model develops from simple yet realistic assumptions derived from experimental evidence, abstracts from conversation content and semantics while including topological information, and is driven by stochastic dynamics. We find that a single mechanism, namely the dynamics of each conversational party's individual fitness as related to conversation size, controls the development of the self-organized schisming phenomenon. Potential generalizations of the model, including individual traits and preferences, memory effects and more elaborate conversational topologies, may also find important applications in other fields of research where dynamically interacting, networked agents play a fundamental role. Comment: 18 pages, 4 figures, to be published in Journal of Artificial Societies and Social Simulation

    Secure bit commitment from relativistic constraints

    We investigate two-party cryptographic protocols that are secure under assumptions motivated by physics, namely relativistic assumptions (no-signalling) and quantum mechanics. In particular, we discuss the security of bit commitment in so-called split models, i.e. models in which at least some of the parties are not allowed to communicate during certain phases of the protocol. We find the minimal splits that are necessary to evade the Mayers-Lo-Chau no-go argument and present protocols that achieve security in these split models. Furthermore, we introduce the notion of local versus global command, a subtle issue that arises when the split committer is required to delegate non-communicating agents to open the commitment. We argue that classical protocols are insecure under global command in the split model we consider. On the other hand, we provide a rigorous security proof in the global command model for Kent's quantum protocol [Kent 2011, Unconditionally Secure Bit Commitment by Transmitting Measurement Outcomes]. The proof employs two fundamental principles of modern physics, the no-signalling property of relativity and the uncertainty principle of quantum mechanics. Comment: published version, IEEE format, 18 pages, 8 figures

    Frames: A Corpus for Adding Memory to Goal-Oriented Dialogue Systems

    This paper presents the Frames dataset (Frames is available at http://datasets.maluuba.com/Frames), a corpus of 1369 human-human dialogues with an average of 15 turns per dialogue. We developed this dataset to study the role of memory in goal-oriented dialogue systems. Based on Frames, we introduce a task called frame tracking, which extends state tracking to a setting where several states are tracked simultaneously. We propose a baseline model for this task. We show that Frames can also be used to study memory in dialogue management and information presentation through natural language generation.