Pareto Probing: Trading Off Accuracy for Complexity
The question of how to probe contextual word representations for linguistic
structure in a way that is both principled and useful has seen significant
attention recently in the NLP literature. In our contribution to this
discussion, we argue for a probe metric that reflects the fundamental trade-off
between probe complexity and performance: the Pareto hypervolume. To measure
complexity, we present a number of parametric and non-parametric metrics. Our
experiments using Pareto hypervolume as an evaluation metric show that probes
often do not conform to our expectations---e.g., why should the non-contextual
fastText representations encode more morpho-syntactic information than the
contextual BERT representations? These results suggest that common, simplistic
probing tasks, such as part-of-speech labeling and dependency arc labeling, are
inadequate to evaluate the linguistic structure encoded in contextual word
representations. This leads us to propose full dependency parsing as a probing
task. In support of our suggestion that harder probing tasks are necessary, our
experiments with dependency parsing reveal a wide gap in syntactic knowledge
between contextual and non-contextual representations.
Comment: Tiago Pimentel and Naomi Saphra contributed equally to this work.
Camera-ready version of EMNLP 2020 publication. Code available in
https://github.com/rycolab/pareto-probin
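The abstract above names the Pareto hypervolume as its evaluation metric but does not spell out how it is computed. As an illustration only, here is a minimal two-dimensional sketch. It assumes each probe is summarized as a (complexity, accuracy) pair, that complexity has been normalized to [0, 1], and that the reference point is (1.0, 0.0). These choices, and the function names `pareto_front` and `hypervolume`, are assumptions made for this example and are not taken from the paper or its code.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (complexity, accuracy); lower complexity and higher accuracy are better


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by increasing complexity."""
    front: List[Point] = []
    best_acc = float("-inf")
    # Sort by complexity; on ties, visit the more accurate probe first.
    for complexity, acc in sorted(points, key=lambda p: (p[0], -p[1])):
        if acc > best_acc:  # strictly better than every cheaper probe
            front.append((complexity, acc))
            best_acc = acc
    return front


def hypervolume(points: List[Point], ref: Point = (1.0, 0.0)) -> float:
    """Area dominated by the Pareto front, bounded by the (assumed) reference point."""
    ref_c, ref_a = ref
    # Ignore probes that lie outside the box defined by the reference point.
    front = [(c, a) for c, a in pareto_front(points) if c <= ref_c and a >= ref_a]
    area = 0.0
    for i, (c, a) in enumerate(front):
        # Each front point contributes a rectangle up to the next point's complexity.
        next_c = front[i + 1][0] if i + 1 < len(front) else ref_c
        area += (next_c - c) * (a - ref_a)
    return area


probes = [(0.2, 0.5), (0.5, 0.8), (0.6, 0.7), (0.9, 0.9)]
print(pareto_front(probes))
print(hypervolume(probes))
```

In this toy input the probe at (0.6, 0.7) is dominated by the cheaper and more accurate probe at (0.5, 0.8), so it drops out of the front and does not contribute to the area. A larger hypervolume means the representation reaches high accuracy with low-complexity probes.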
Shift-Reduce CCG Parsing with a Dependency Model
This paper presents the first dependency model for a shift-reduce CCG parser. Modelling dependencies is desirable for a number of reasons, including handling the "spurious" ambiguity of CCG; fitting well with the theory of CCG; and optimizing for structures which are evaluated at test time. We develop a novel training technique using a dependency oracle, in which all derivations are hidden. A challenge arises from the fact that the oracle needs to keep track of exponentially many gold-standard derivations, which is solved by integrating a packed parse forest with the beam-search decoder. Standard CCGBank tests show the model achieves up to 1.05 labeled F-score improvements over three existing, competitive CCG parsing models.
The imperfect observer: Mind, machines, and materialism in the 21st century
The dualist/materialist debates about the nature of consciousness are based on the assumption that an entirely physical universe must ultimately be observable by humans (with infinitely advanced tools). Thus the dualists claim that anything unobservable must be non-physical, while the materialists argue that in theory nothing is unobservable. However, there may be fundamental limitations in the power of human observation, no matter how well aided, that greatly curtail our ability to know and observe even a fully physical universe. This paper presents arguments to support the model of an inherently limited observer and explores the consequences of this view.
Applying Occam's Razor to Transformer-Based Dependency Parsing: What Works, What Doesn't, and What is Really Necessary
The introduction of pre-trained transformer-based contextualized word
embeddings has led to considerable improvements in the accuracy of graph-based
parsers for frameworks such as Universal Dependencies (UD). However, previous
works differ in various dimensions, including their choice of pre-trained
language models and whether they use LSTM layers. With the aims of
disentangling the effects of these choices and identifying a simple yet widely
applicable architecture, we introduce STEPS, a new modular graph-based
dependency parser. Using STEPS, we perform a series of analyses on the UD
corpora of a diverse set of languages. We find that the choice of pre-trained
embeddings has by far the greatest impact on parser performance and identify
XLM-R as a robust choice across the languages in our study. Adding LSTM layers
provides no benefits when using transformer-based embeddings. A multi-task
training setup outputting additional UD features may contort results. Taking
these insights together, we propose a simple but widely applicable parser
architecture and configuration, achieving new state-of-the-art results (in
terms of LAS) for 10 out of 12 diverse languages.
Comment: 14 pages, 1 figure; camera-ready version for IWPT 202
Unsupervised Distillation of Syntactic Information from Contextualized Word Representations
Contextualized word representations, such as ELMo and BERT, were shown to
perform well on various semantic and syntactic tasks. In this work, we tackle
the task of unsupervised disentanglement between semantics and structure in
neural language representations: we aim to learn a transformation of the
contextualized vectors that discards the lexical semantics but keeps the
structural information. To this end, we automatically generate groups of
sentences which are structurally similar but semantically different, and use
a metric-learning approach to learn a transformation that emphasizes the
structural component that is encoded in the vectors. We demonstrate that our
transformation clusters vectors in space by structural properties, rather than
by lexical semantics. Finally, we demonstrate the utility of our distilled
representations by showing that they outperform the original contextualized
representations in a few-shot parsing setting.
Comment: Accepted in BlackboxNLP@EMNLP202
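The abstract above says only that a "metric-learning approach" is used and does not name the objective. As a hedged illustration, a triplet margin loss is one common metric-learning objective: it pulls an anchor vector towards a structurally similar example and pushes it away from a structurally different one. The sketch below assumes this loss, plain Python lists as vectors, and a margin of 1.0. None of these details are confirmed by the source, and the function names are hypothetical.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],  # assumed: same structure as the anchor, different words
    negative: Sequence[float],  # assumed: different structure from the anchor
    margin: float = 1.0,        # assumed value, chosen for illustration
) -> float:
    """Zero once the positive is closer to the anchor than the negative by at least the margin."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# The positive is already much closer than the negative, so nothing is penalized.
print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))
# Swapping the roles violates the margin and yields a positive loss.
print(triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0]))
```

In a learned setting the three vectors would be the outputs of the transformation being trained, and the loss would be averaged over many triplets drawn from the generated sentence groups.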
Poverty and suicide research in low- and middle-income countries: systematic mapping of literature published in English and a proposed research agenda
Approximately 75% of suicides occur in low- and middle-income countries (LMICs) where rates of poverty are high. Evidence suggests a relationship between economic variables and suicidal behaviour. To plan effective suicide prevention interventions in LMICs we need to understand the relationship between poverty and suicidal behaviour and how contextual factors may mediate this relationship. We conducted a systematic mapping of the English literature on poverty and suicidal behaviour in LMICs, to provide an overview of what is known about this topic, highlight gaps in the literature, and consider the implications of current knowledge for research and policy. Eleven databases were searched using a combination of key words for suicidal ideation and behaviours, poverty and LMICs to identify articles published in English between January 2004 and April 2014. Narrative analysis was performed for the 84 studies meeting inclusion criteria. Most English studies in this area come from South Asia and the Middle East and North Africa, with a relative dearth of studies from countries in Sub-Saharan Africa. Most of the available evidence comes from upper middle-income countries; only 6% of studies come from low-income countries. Most studies focused on poverty measures such as unemployment and economic status, while neglecting dimensions such as debt, relative and absolute poverty, and support from welfare systems. Most studies are conducted within a risk-factor paradigm and employ descriptive statistics, thus providing little insight into the nature of the relationship. More robust evidence is needed in this area, with theory-driven studies focussing on a wider range of poverty dimensions, and employing more sophisticated statistical methods.