510 research outputs found

    Pareto Probing: Trading Off Accuracy for Complexity

    Full text link
    The question of how to probe contextual word representations for linguistic structure in a way that is both principled and useful has seen significant attention recently in the NLP literature. In our contribution to this discussion, we argue for a probe metric that reflects the fundamental trade-off between probe complexity and performance: the Pareto hypervolume. To measure complexity, we present a number of parametric and non-parametric metrics. Our experiments using Pareto hypervolume as an evaluation metric show that probes often do not conform to our expectations; e.g., why should the non-contextual fastText representations encode more morpho-syntactic information than the contextual BERT representations? These results suggest that common, simplistic probing tasks, such as part-of-speech labeling and dependency arc labeling, are inadequate for evaluating the linguistic structure encoded in contextual word representations. This leads us to propose full dependency parsing as a probing task. In support of our suggestion that harder probing tasks are necessary, our experiments with dependency parsing reveal a wide gap in syntactic knowledge between contextual and non-contextual representations.
    Comment: Tiago Pimentel and Naomi Saphra contributed equally to this work. Camera-ready version of the EMNLP 2020 publication. Code available at https://github.com/rycolab/pareto-probin
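
    A minimal sketch of the hypervolume idea above (not the authors' released code): treat each probe as a (complexity, accuracy) point, keep the Pareto front, and measure the area it dominates relative to a worst-case reference point. The reference point, the normalisation, and the toy numbers are assumptions made for this illustration.

        # Sketch: 2-D Pareto hypervolume over probe results (Python).
        # Lower complexity and higher accuracy are better; the reference
        # point (c_ref, 0.0) is an assumption of this example.

        def pareto_front(points):
            """Keep the points that no other point beats on both axes."""
            front = [(c, a) for c, a in points
                     if not any(c2 <= c and a2 >= a and (c2, a2) != (c, a)
                                for c2, a2 in points)]
            return sorted(front)  # by complexity; accuracy rises along the front

        def hypervolume(points, c_ref):
            """Area dominated by the accuracy-vs-complexity front."""
            front = pareto_front(points)
            area = 0.0
            for (c, a), (c_next, _) in zip(front, front[1:] + [(c_ref, 0.0)]):
                area += max(c_next - c, 0.0) * a
            return area

        # Toy probes as (complexity, accuracy) pairs; a larger hypervolume means
        # a better accuracy/complexity trade-off for the probed representation.
        probes = [(0.1, 0.62), (0.4, 0.78), (0.5, 0.70), (0.9, 0.80)]
        print(hypervolume(probes, c_ref=1.0))  # ~0.656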

    Shift-Reduce CCG Parsing with a Dependency Model

    Get PDF
    This paper presents the first dependency model for a shift-reduce CCG parser. Modelling dependencies is desirable for a number of reasons, including handling the “spurious” ambiguity of CCG; fitting well with the theory of CCG; and optimizing for structures which are evaluated at test time. We develop a novel training technique using a dependency oracle, in which all derivations are hidden. A challenge arises from the fact that the oracle needs to keep track of exponentially many gold-standard derivations, which is solved by integrating a packed parse forest with the beam-search decoder. Standard CCGBank tests show the model achieves labeled F-score improvements of up to 1.05 over three existing, competitive CCG parsing models.
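
    As background for the decoding setup the abstract refers to, the skeleton below shows generic beam-search shift-reduce parsing. It is only an illustration of the search procedure: the State representation, the placeholder score_fn, and the single REDUCE action are inventions of this sketch, not the paper's CCG parser, dependency oracle, or packed-forest training.

        # Generic beam-search shift-reduce skeleton (illustrative only).
        from dataclasses import dataclass

        @dataclass
        class State:
            buffer: tuple            # remaining input tokens
            stack: tuple = ()        # partially built constituents
            score: float = 0.0
            actions: tuple = ()      # action history, kept for inspection

        def legal_actions(state):
            acts = []
            if state.buffer:
                acts.append("SHIFT")
            if len(state.stack) >= 2:
                acts.append("REDUCE")  # a CCG parser would enumerate combinators here
            return acts

        def apply(state, action, action_score):
            if action == "SHIFT":
                return State(state.buffer[1:], state.stack + (state.buffer[0],),
                             state.score + action_score, state.actions + (action,))
            left, right = state.stack[-2], state.stack[-1]
            return State(state.buffer, state.stack[:-2] + ((left, right),),
                         state.score + action_score, state.actions + (action,))

        def beam_parse(tokens, score_fn, beam_size=4):
            beam = [State(tuple(tokens))]
            while any(s.buffer or len(s.stack) > 1 for s in beam):
                candidates = []
                for s in beam:
                    if not s.buffer and len(s.stack) == 1:
                        candidates.append(s)   # finished analysis; carry it along
                        continue
                    for a in legal_actions(s):
                        candidates.append(apply(s, a, score_fn(s, a)))
                beam = sorted(candidates, key=lambda s: s.score, reverse=True)[:beam_size]
            return beam[0]

        # Toy run with a constant scorer; a trained model would score (state, action).
        best = beam_parse("we saw ducks".split(), score_fn=lambda state, action: 0.0)
        print(best.actions)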

    The imperfect observer: Mind, machines, and materialism in the 21st century

    Get PDF
    The dualist/materialist debates about the nature of consciousness are based on the assumption that an entirely physical universe must ultimately be observable by humans (with infinitely advanced tools). Thus the dualists claim that anything unobservable must be non-physical, while the materialists argue that in theory nothing is unobservable. However, there may be fundamental limitations in the power of human observation, no matter how well aided, that greatly curtail our ability to know and observe even a fully physical universe. This paper presents arguments to support the model of an inherently limited observer and explores the consequences of this view.

    Applying Occam's Razor to Transformer-Based Dependency Parsing: What Works, What Doesn't, and What is Really Necessary

    Get PDF
    The introduction of pre-trained transformer-based contextualized word embeddings has led to considerable improvements in the accuracy of graph-based parsers for frameworks such as Universal Dependencies (UD). However, previous works differ in various dimensions, including their choice of pre-trained language models and whether they use LSTM layers. With the aims of disentangling the effects of these choices and identifying a simple yet widely applicable architecture, we introduce STEPS, a new modular graph-based dependency parser. Using STEPS, we perform a series of analyses on the UD corpora of a diverse set of languages. We find that the choice of pre-trained embeddings has by far the greatest impact on parser performance and identify XLM-R as a robust choice across the languages in our study. Adding LSTM layers provides no benefits when using transformer-based embeddings. A multi-task training setup outputting additional UD features may contort results. Taking these insights together, we propose a simple but widely applicable parser architecture and configuration, achieving new state-of-the-art results (in terms of LAS) for 10 out of 12 diverse languages.
    Comment: 14 pages, 1 figure; camera-ready version for IWPT 202
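
    Graph-based parsers of the kind described above typically score every head-dependent pair directly on top of the contextual embeddings. The biaffine scorer below is one common realization (an assumption of this sketch, not a detail taken from the abstract), written in PyTorch with placeholder dimensions and, in line with the paper's finding, no LSTM layers between the embeddings and the scorer.

        # Minimal biaffine arc scorer over transformer token embeddings (PyTorch).
        import torch
        import torch.nn as nn

        class BiaffineArcScorer(nn.Module):
            """Scores every (dependent, head) pair from contextual embeddings.
            Hidden sizes and the bias term are illustrative; the exact STEPS
            architecture may differ."""

            def __init__(self, emb_dim=768, arc_dim=512):
                super().__init__()
                self.head_mlp = nn.Sequential(nn.Linear(emb_dim, arc_dim), nn.ReLU())
                self.dep_mlp = nn.Sequential(nn.Linear(emb_dim, arc_dim), nn.ReLU())
                self.U = nn.Parameter(torch.randn(arc_dim, arc_dim) / arc_dim ** 0.5)
                self.head_bias = nn.Linear(arc_dim, 1, bias=False)

            def forward(self, embeddings):               # (batch, seq, emb_dim)
                heads = self.head_mlp(embeddings)        # (batch, seq, arc_dim)
                deps = self.dep_mlp(embeddings)          # (batch, seq, arc_dim)
                # scores[b, d, h] = deps[b, d] @ U @ heads[b, h] + bias(heads[b, h])
                scores = torch.einsum("bdi,ij,bhj->bdh", deps, self.U, heads)
                return scores + self.head_bias(heads).squeeze(-1).unsqueeze(1)

        # Toy usage with random stand-ins for XLM-R last-layer word embeddings.
        emb = torch.randn(2, 7, 768)
        print(BiaffineArcScorer()(emb).shape)            # torch.Size([2, 7, 7])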

    Unsupervised Distillation of Syntactic Information from Contextualized Word Representations

    Full text link
    Contextualized word representations, such as ELMo and BERT, were shown to perform well on various semantic and syntactic tasks. In this work, we tackle the task of unsupervised disentanglement between semantics and structure in neural language representations: we aim to learn a transformation of the contextualized vectors that discards the lexical semantics but keeps the structural information. To this end, we automatically generate groups of sentences which are structurally similar but semantically different, and use a metric-learning approach to learn a transformation that emphasizes the structural component encoded in the vectors. We demonstrate that our transformation clusters vectors in space by structural properties, rather than by lexical semantics. Finally, we demonstrate the utility of our distilled representations by showing that they outperform the original contextualized representations in a few-shot parsing setting.
    Comment: Accepted at BlackboxNLP@EMNLP202
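
    The abstract only says that a metric-learning objective pulls together vectors from structurally similar but lexically different sentences; the triplet-style setup below is one standard way to realize that idea and is not claimed to be the paper's exact method. All dimensions and names are illustrative.

        # Learning a structure-emphasizing projection with a triplet loss (PyTorch).
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class StructuralProjection(nn.Module):
            """Linear map over contextualized vectors, trained so that tokens in the
            same structural slot of structurally-similar sentences end up close."""

            def __init__(self, emb_dim=768, out_dim=128):
                super().__init__()
                self.proj = nn.Linear(emb_dim, out_dim, bias=False)

            def forward(self, x):
                return F.normalize(self.proj(x), dim=-1)

        def triplet_loss(model, anchor, positive, negative, margin=0.3):
            """anchor/positive: same slot in structurally-similar sentences;
            negative: a vector from a structurally different sentence."""
            a, p, n = model(anchor), model(positive), model(negative)
            d_ap = 1.0 - (a * p).sum(-1)      # cosine distances
            d_an = 1.0 - (a * n).sum(-1)
            return F.relu(d_ap - d_an + margin).mean()

        # One toy optimization step on random stand-ins for BERT/ELMo vectors.
        model = StructuralProjection()
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        a, p, n = torch.randn(32, 768), torch.randn(32, 768), torch.randn(32, 768)
        loss = triplet_loss(model, a, p, n)
        loss.backward()
        opt.step()
        print(float(loss))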

    Poverty and suicide research in low- and middle-income countries: systematic mapping of literature published in English and a proposed research agenda

    Get PDF
    Approximately 75% of suicides occur in low- and middle-income countries (LMICs), where rates of poverty are high. Evidence suggests a relationship between economic variables and suicidal behaviour. To plan effective suicide prevention interventions in LMICs, we need to understand the relationship between poverty and suicidal behaviour and how contextual factors may mediate this relationship. We conducted a systematic mapping of the English-language literature on poverty and suicidal behaviour in LMICs to provide an overview of what is known about this topic, highlight gaps in the literature, and consider the implications of current knowledge for research and policy. Eleven databases were searched using a combination of key words for suicidal ideation and behaviours, poverty, and LMICs to identify articles published in English between January 2004 and April 2014. Narrative analysis was performed for the 84 studies meeting inclusion criteria. Most English-language studies in this area come from South Asia and the Middle East and North Africa, with a relative dearth of studies from countries in Sub-Saharan Africa. Most of the available evidence comes from upper middle-income countries; only 6% of studies come from low-income countries. Most studies focused on poverty measures such as unemployment and economic status, while neglecting dimensions such as debt, relative and absolute poverty, and support from welfare systems. Most studies are conducted within a risk-factor paradigm and employ descriptive statistics, thus providing little insight into the nature of the relationship. More robust evidence is needed in this area, with theory-driven studies focussing on a wider range of poverty dimensions and employing more sophisticated statistical methods.