Training Datasets for Machine Reading Comprehension and Their Limitations
Neural networks are a powerful model class to learn machine Reading Comprehension (RC), yet they crucially depend on the availability of suitable training datasets. In this thesis we describe methods for data collection, evaluate the performance of established models, and examine a number of model behaviours and dataset limitations. We first describe the creation of a data resource for the science exam QA domain, and compare existing models on the resulting dataset. The collected questions are plausible – non-experts can distinguish them from real exam questions with 55% accuracy – and using them as additional training data leads to improved model scores on real science exam questions. Second, we describe and apply a distant supervision dataset construction method for multi-hop RC across documents. We identify and mitigate several dataset assembly pitfalls – a lack of unanswerable candidates, label imbalance, and spurious correlations between documents and particular candidates – which often leave shallow predictive cues for the answer. Furthermore we demonstrate that selecting relevant document combinations is a critical performance bottleneck on the datasets created. We thus investigate Pseudo-Relevance Feedback, which leads to improvements compared to TF-IDF-based document combination selection both in retrieval metrics and answer accuracy. Third, we investigate model undersensitivity: model predictions do not change when given adversarially altered questions in SQuAD2.0 and NewsQA, even though they should. We characterise affected samples, and show that the phenomenon is related to a lack of structurally similar but unanswerable samples during training: data augmentation reduces the adversarial error rate, e.g. from 51.7% to 20.7% for a BERT model on SQuAD2.0, and improves robustness also in other settings.
Finally we explore efficient formal model verification via Interval Bound Propagation (IBP) to measure and address model undersensitivity, and show that using an IBP-derived auxiliary loss can improve verification rates, e.g. from 2.8% to 18.4% on the SNLI test set.
State-of-the-art generalisation research in NLP: a taxonomy and review
The ability to generalise well is one of the primary desiderata of natural
language processing (NLP). Yet, what 'good generalisation' entails and how it
should be evaluated is not well understood, nor are there any common standards
to evaluate it. In this paper, we aim to lay the groundwork to improve both of
these issues. We present a taxonomy for characterising and understanding
generalisation research in NLP, we use that taxonomy to present a comprehensive
map of published generalisation studies, and we make recommendations for which
areas might deserve attention in the future. Our taxonomy is based on an
extensive literature review of generalisation research, and contains five axes
along which studies can differ: their main motivation, the type of
generalisation they aim to solve, the type of data shift they consider, the
source by which this data shift is obtained, and the locus of the shift within
the modelling pipeline. We use our taxonomy to classify over 400 previous
papers that test generalisation, for a total of more than 600 individual
experiments. Considering the results of this review, we present an in-depth
analysis of the current state of generalisation research in NLP, and make
recommendations for the future. Along with this paper, we release a webpage
where the results of our review can be dynamically explored, and which we
intend to update as new NLP generalisation studies are published. With this
work, we aim to make steps towards making state-of-the-art generalisation
testing the new status quo in NLP.
Comment: 35 pages of content + 53 pages of references
The effect of context on the performance of children with ADHD on a series of computerised tasks and games
This thesis examines context effects in relation to the performance of children with ADHD in test and 'real world' situations. There is a wealth of empirical research that illustrates poor performance of these children on a range of cognitive measures, particularly tasks that claim to measure executive function and inhibitory control. However, anecdotal reports have suggested that while playing computer games these children display abilities that contrast sharply with empirical findings. This contrast was the basis for a series of studies using computer games and computerised tasks to investigate the performance of children with ADHD across contexts.
The first investigation (Study 1), a questionnaire study, lent support to the anecdotal reports. Parents of children with ADHD confirmed that their children were able to sit still, concentrate, pay attention and achieve higher levels of success when playing computer games. In Study 2 parents of children with ADHD were asked to discuss the features of computer games they felt were most influential in contributing to their child's interest and performance. Observations made in Study 3 provided further confirmation that performance improves when children with ADHD play computer games; performance in terms of error making and on-task activity on a standardised test of inhibition and attention, the Conners' Continuous Performance Test II (CPT II), was significantly poorer than performance on a more game-like Pokemon version of the task and significantly different to the performance of typically developing children. Features of computer games that may have contributed to the observed improvements for children with ADHD were examined in four subsequent studies. These features included the addition of narrative, the addition of a points scoring system, the addition of character, auditory reinforcement and differing levels of response cost. Inhibitory performance on two commercially available games was also investigated (Study 8), and the performance of participants with ADHD was not significantly different to that of typically developing participants. The results raise questions about current understanding of the disorder and models of ADHD, stress the need for examining contextual sensitivity of children with apparently constitutional disorders such as ADHD, and have implications for methodological design and the contexts in which cognitive abilities are investigated.
Using the Diametrical Model to Examine the Relationship Between the Autism and Psychosis Spectra
Schizophrenia and autism spectrum disorders (SSD; ASD) share clinical features, although they are considered distinct. Some theories contrast social cognition in ASD and SSD. The reasoning for this thesis is based on dimensional models of personality spanning from healthy to pathological variations. Under this scenario, do some healthy autistic traits oppose schizotypic ones on a Mentalism continuum? Also, does this psychometric opposition correspond to a behavioural one, for instance in processing face and gaze? First, we validated schizotypic and autistic trait questionnaires in French. Second, we identified shared and diametrical traits. Third, we conducted 3 experiments to measure face pareidolia-proneness. We expected larger pareidolia-proneness with larger positive schizotypy and smaller autistic trait scores. Fourth, we assessed gaze direction discrimination and gaze cueing of attention. We expected larger sensitivity to gaze with larger positive schizotypy, but a smaller one with larger autistic traits. Psychometrically, we replicated oppositions between autistic mentalizing deficits and positive schizotypic traits. Although pareidolia-proneness was unrelated to personality, configural face processing was impaired with larger positive schizotypy, but preserved with smaller autistic mentalizing deficit scores. Also, gaze sensitivity was decreased in men with larger autistic mentalizing traits, but unassociated with positive schizotypy. Our results partially support ASD-SSD opposition in social cognition, to be further confirmed by future studies. Pareidolia-proneness may be better measured using other measurement strategies. Gaze direction attribution might better contrast ASD and SSD. Comparing similar disorder-related phenotypes is promising for understanding underlying aetiological mechanisms, notably using a transdiagnostic approach associating personality, cognitive styles, endophenotypes, and multidimensional or network models.
--
Schizophrenia and autism spectrum disorders (SSD; ASD) are clinically similar but categorically distinct. Some theories contrast social cognition in ASD and SSD. The reasoning of this thesis is based on dimensional models of personality that link the normal and the pathological. Accordingly, do certain autistic traits oppose schizotypic traits? Does a psychometric opposition correspond to a behavioural one, i.e. in the processing of faces and gaze? First, we validated the schizotypic and autistic personality questionnaires. Second, we identified shared and opposing traits. Third, we conducted 3 experiments on face pareidolia, which we expected to be associated with more positive schizotypy and fewer autistic traits. Fourth, we examined gaze direction discrimination and gaze cueing of attention, which we expected to be associated with more positive schizotypy and fewer autistic traits. At the psychometric level, we replicated the oppositions between autistic traits of deficient mentalizing and positive schizotypic traits. Although pareidolia and personality were unrelated, configural processing of facial information was impaired with more positive schizotypy, but preserved with more autistic mentalizing deficits. Also, gaze sensitivity was lower in men with more autistic mentalizing deficits, but unrelated to positive schizotypy. Our results partially support an ASD-SSD opposition in social cognition, to be confirmed by future studies. Pareidolia-proneness would benefit from being measured with other strategies. Gaze direction attribution might better distinguish ASD and SSD.
Comparing similar psychiatric phenotypes is a promising approach for understanding underlying aetiological mechanisms, notably through a transdiagnostic approach combining personality, cognitive styles, endophenotypes, and multidimensional or network models.