
    Training Datasets for Machine Reading Comprehension and Their Limitations

    Neural networks are a powerful model class for learning machine Reading Comprehension (RC), yet they crucially depend on the availability of suitable training datasets. In this thesis we describe methods for data collection, evaluate the performance of established models, and examine a number of model behaviours and dataset limitations. We first describe the creation of a data resource for the science exam QA domain, and compare existing models on the resulting dataset. The collected questions are plausible – non-experts can distinguish them from real exam questions with only 55% accuracy – and using them as additional training data leads to improved model scores on real science exam questions. Second, we describe and apply a distant supervision dataset construction method for multi-hop RC across documents. We identify and mitigate several dataset assembly pitfalls – a lack of unanswerable candidates, label imbalance, and spurious correlations between documents and particular candidates – which often leave shallow predictive cues for the answer. Furthermore, we demonstrate that selecting relevant document combinations is a critical performance bottleneck on the datasets created. We thus investigate Pseudo-Relevance Feedback, which improves on TF-IDF-based document combination selection in both retrieval metrics and answer accuracy. Third, we investigate model undersensitivity: model predictions do not change when given adversarially altered questions in SQuAD2.0 and NewsQA, even though they should. We characterise affected samples, and show that the phenomenon is related to a lack of structurally similar but unanswerable samples during training: data augmentation reduces the adversarial error rate, e.g. from 51.7% to 20.7% for a BERT model on SQuAD2.0, and also improves robustness in other settings.
Finally, we explore efficient formal model verification via Interval Bound Propagation (IBP) to measure and address model undersensitivity, and show that using an IBP-derived auxiliary loss can improve verification rates, e.g. from 2.8% to 18.4% on the SNLI test set.
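Interval Bound Propagation pushes a box of possible inputs through a network layer by layer, yielding guaranteed lower and upper bounds on every output. The sketch below is a minimal, generic illustration under assumptions not taken from the thesis: a small fully connected ReLU network given as plain weight lists, an L∞ box of radius `eps` around a continuous input, and a simple verification check. It does not reproduce the thesis's undersensitivity formulation or its auxiliary loss, and the names `affine_bounds`, `relu_bounds`, and `is_verified` are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def affine_bounds(W: Matrix, b: Vector, lower: Vector, upper: Vector) -> Tuple[Vector, Vector]:
    """Propagate the box [lower, upper] through y = W x + b (exact for one affine layer)."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        center, radius = bias, 0.0
        for w, lo, hi in zip(row, lower, upper):
            center += w * (lo + hi) / 2      # image of the box centre
            radius += abs(w) * (hi - lo) / 2  # |W| times the box half-width
        out_lo.append(center - radius)
        out_hi.append(center + radius)
    return out_lo, out_hi


def relu_bounds(lower: Vector, upper: Vector) -> Tuple[Vector, Vector]:
    """ReLU is monotone, so it can be applied to each bound separately."""
    return [max(0.0, v) for v in lower], [max(0.0, v) for v in upper]


def is_verified(layers: List[Tuple[Matrix, Vector]], x: Vector, eps: float, label: int) -> bool:
    """True if every input within L-inf distance eps of x provably keeps `label` as the argmax."""
    lower = [v - eps for v in x]
    upper = [v + eps for v in x]
    for i, (W, b) in enumerate(layers):
        lower, upper = affine_bounds(W, b, lower, upper)
        if i < len(layers) - 1:  # no ReLU after the final (logit) layer
            lower, upper = relu_bounds(lower, upper)
    # Sufficient (but loose) condition: worst-case true logit beats best-case rival logits.
    return all(lower[label] > upper[j] for j in range(len(upper)) if j != label)


# Hypothetical two-layer network with two inputs and two classes.
layers = [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
          ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0])]
print(is_verified(layers, [1.0, 0.0], 0.1, 0))  # True
print(is_verified(layers, [1.0, 0.0], 0.6, 0))  # False: the box is too large to certify
```

Comparing the true class's lower bound against each rival's upper bound separately is the simplest sound check; bounding the logit differences directly gives tighter certificates and is a common refinement.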

    State-of-the-art generalisation research in NLP: a taxonomy and review

    The ability to generalise well is one of the primary desiderata of natural language processing (NLP). Yet what 'good generalisation' entails and how it should be evaluated is not well understood, nor are there any common standards to evaluate it. In this paper, we aim to lay the groundwork for addressing both of these issues. We present a taxonomy for characterising and understanding generalisation research in NLP, use that taxonomy to present a comprehensive map of published generalisation studies, and make recommendations for which areas might deserve attention in the future. Our taxonomy is based on an extensive literature review of generalisation research, and contains five axes along which studies can differ: their main motivation, the type of generalisation they aim to solve, the type of data shift they consider, the source by which this data shift is obtained, and the locus of the shift within the modelling pipeline. We use our taxonomy to classify over 400 previous papers that test generalisation, for a total of more than 600 individual experiments. Considering the results of this review, we present an in-depth analysis of the current state of generalisation research in NLP, and make recommendations for the future. Along with this paper, we release a webpage where the results of our review can be dynamically explored, and which we intend to update as new NLP generalisation studies are published. With this work, we aim to take steps towards making state-of-the-art generalisation testing the new status quo in NLP. Comment: 35 pages of content + 53 pages of references.

    Using the Diametrical Model to Examine the Relationship Between the Autism and Psychosis Spectra

    Schizophrenia and autism spectrum disorders (SSD; ASD) share clinical features, although they are considered distinct. Some theories contrast social cognition in ASD and SSD. The reasoning of this thesis is based on dimensional models of personality spanning healthy to pathological variations. Within this framework, are some healthy autistic traits opposed to schizotypal ones on a mentalism continuum? And does this psychometric opposition correspond to a behavioural one, for instance in processing faces and gaze? First, we validated schizotypal and autistic trait questionnaires in French. Second, we identified shared and diametrical traits. Third, we conducted three experiments to measure face pareidolia-proneness. We expected greater pareidolia-proneness with higher positive schizotypy and lower autistic trait scores. Fourth, we assessed gaze direction discrimination and gaze cueing of attention. We expected greater sensitivity to gaze with higher positive schizotypy, but lower sensitivity with higher autistic traits. Psychometrically, we replicated oppositions between autistic mentalizing deficits and positive schizotypal traits. Although pareidolia-proneness was unrelated to personality, configural face processing was impaired with higher positive schizotypy, but preserved with smaller autistic mentalizing deficit scores. Also, gaze sensitivity was decreased in men with higher autistic mentalizing trait scores, but unrelated to positive schizotypy. Our results partially support an ASD-SSD opposition in social cognition, which remains to be confirmed by future studies. Pareidolia-proneness may be better measured using other strategies. Gaze direction attribution might better distinguish ASD from SSD. Comparing similar disorder-related phenotypes is a promising way to understand underlying aetiological mechanisms, notably through a transdiagnostic approach combining personality, cognitive styles, endophenotypes, and multidimensional or network models.
-- Schizophrenia and autism spectrum disorders (SSD; ASD) are clinically similar but categorically distinct. Some theories oppose social cognition in ASD and SSD. The reasoning of this thesis is based on dimensional models of personality as linking the normal and the pathological. Are some autistic traits therefore opposed to schizotypal traits? Does a psychometric opposition correspond to a behavioural opposition, i.e. in the processing of faces and gaze? First, we validated schizotypal and autistic personality questionnaires. Second, we identified shared and opposed traits. Third, we conducted three experiments on face pareidolia, which we expected to be associated with more positive schizotypy and fewer autistic traits. Fourth, we examined gaze direction discrimination and the redirection of attention by gaze, which we expected to be associated with more positive schizotypy and fewer autistic traits. At the psychometric level, we replicated the oppositions between autistic mentalizing-deficit traits and positive schizotypal traits. Although pareidolia and personality were unrelated, configural processing of facial information was impaired with more positive schizotypy, but preserved with more autistic mentalizing deficits. Also, gaze sensitivity was lower in men with more autistic mentalizing deficits, but unrelated to positive schizotypy. Our results partially support an ASD-SSD opposition in social cognition, to be confirmed by future studies. Pareidolia-proneness would benefit from being measured with other strategies. Gaze direction attribution could better distinguish ASD and SSD.
Comparing similar psychiatric phenotypes is a promising approach for understanding underlying aetiological mechanisms, notably through a transdiagnostic approach combining personality, cognitive styles, endophenotypes, and multidimensional or network models.