Improved Relation Extraction with Feature-Rich Compositional Embedding Models
Compositional embedding models build a representation (or embedding) for a
linguistic structure based on its component word embeddings. We propose a
Feature-rich Compositional Embedding Model (FCM) for relation extraction that
is expressive, generalizes to new domains, and is easy to implement. The key
idea is to combine both (unlexicalized) hand-crafted features with learned word
embeddings. The model is able to directly tackle the difficulties met by
traditional compositional embedding models, such as handling arbitrary types
of sentence annotations and utilizing global information for composition. We
test the proposed model on two relation extraction tasks, and demonstrate that
our model outperforms both previous compositional models and traditional
feature-rich models on the ACE 2005 relation extraction task and the SemEval
2010 relation classification task. The combination of our model and a
log-linear classifier with hand-crafted features gives state-of-the-art
results.
Comment: 12 pages for EMNLP 201
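The core composition in a model of this kind, a sum over words of the outer product between a hand-crafted feature vector and a word embedding, scored against a per-label parameter matrix, can be sketched as follows. This is a minimal NumPy illustration; the function name, toy dimensions, and random inputs are ours, not from the paper:

```python
import numpy as np

def fcm_score(embeddings, features, label_tensor):
    """Score one relation label for a sentence.

    embeddings:   (n_words, d) word embedding matrix
    features:     (n_words, k) binary hand-crafted feature indicators per word
    label_tensor: (k, d) parameters for this label

    The substructure embedding is the sum over words of the outer
    product f_w (x) e_w; the score is its inner product with the
    label's parameter matrix.
    """
    # features.T @ embeddings equals the sum of per-word outer products: (k, d)
    composed = features.T @ embeddings
    return float(np.sum(label_tensor * composed))

rng = np.random.default_rng(0)
emb = rng.standard_normal((5, 4))                  # 5 words, 4-dim embeddings
feats = rng.integers(0, 2, (5, 3)).astype(float)   # 3 binary features per word
W = rng.standard_normal((3, 4))                    # one label's parameters
print(fcm_score(emb, feats, W))
```

Because the features multiply the embeddings rather than concatenating with them, the same hand-crafted feature can gate different embedding dimensions differently, which is what lets annotations and global cues interact with the learned representation.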
First Year GPA and Academic Service Use Among College Students With and Without ADHD
ADHD is a chronic neurodevelopmental disorder characterized by significant impairments in attention and behavioral inhibition, typically resulting in academic difficulties that persist into college (Weyandt & DuPaul, 2013). Although most colleges offer support services, students often do not utilize the services to which they are entitled or that are available to them (Chew et al., 2009). First, the current study examines differences in GPA using a rigorously defined, multi-site sample; it is the first study to do so. Second, the current study seeks to identify the predictors of academic performance specifically among college students with ADHD. Third, this study provides data regarding how often students with ADHD utilize academic support services. Finally, the current study investigates the academic outcomes of service use among students with and without ADHD during their first year at a four-year college. Results demonstrated significantly lower GPAs among a rigorously defined, multi-site sample of first-year college students with ADHD relative to students without ADHD. Second, this study indicated that traditional predictors of college success may be less meaningful for students with ADHD. Third, ADHD combined with other disorders, but not ADHD alone, predicted higher rates of service use relative to students without ADHD. Finally, the present results suggest that typically available academic services are not independently related to GPA among first-year college students with or without ADHD.
AdaFocal: Calibration-aware Adaptive Focal Loss
Much recent work has been devoted to the problem of ensuring that a neural
network's confidence scores match the true probability of being correct, i.e.
the calibration problem. Of note, it was found that training with focal loss
leads to better calibration than cross-entropy while achieving similar level of
accuracy \cite{mukhoti2020}. This success stems from focal loss regularizing
the entropy of the model's prediction (controlled by the parameter $\gamma$),
thereby reining in the model's overconfidence. Further improvement is expected
if $\gamma$ is selected independently for each training sample
(Sample-Dependent Focal Loss (FLSD-53) \cite{mukhoti2020}). However, FLSD-53 is
based on heuristics and does not generalize well. In this paper, we propose a
calibration-aware adaptive focal loss called AdaFocal that utilizes the
calibration properties of focal (and inverse-focal) loss and adaptively
modifies $\gamma_t$ for different groups of samples based on $\gamma_{t-1}$
from the previous step and the knowledge of model's under/over-confidence on
the validation set. We evaluate AdaFocal on various image recognition and one
NLP task, covering a wide variety of network architectures, to confirm the
improvement in calibration while achieving similar levels of accuracy.
Additionally, we show that models trained with AdaFocal achieve a significant
boost in out-of-distribution detection.
Comment: Accepted to NeurIPS 202
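For reference, the focal loss being adapted here is $\mathrm{FL}(p) = -(1-p)^{\gamma}\log p$ for the true-class probability $p$, with $\gamma = 0$ recovering cross-entropy. A minimal sketch of how $\gamma$ down-weights confident samples (the function and sample values are illustrative, not from the paper):

```python
import numpy as np

def focal_loss(p, gamma):
    """Focal loss for the true-class probability p: -(1-p)^gamma * log(p).

    gamma = 0 recovers cross-entropy; larger gamma shrinks the loss
    on well-classified (high-p) samples, which regularizes the
    entropy of the prediction and reins in overconfidence.
    """
    p = np.asarray(p, dtype=float)
    return -((1.0 - p) ** gamma) * np.log(p)

# A confident correct prediction (p = 0.9) is penalized far less as
# gamma grows; choosing gamma per sample group is the knob AdaFocal turns.
for g in (0.0, 2.0, 5.0):
    print(g, focal_loss(0.9, g))
```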
Learning Mutually Informed Representations for Characters and Subwords
Most pretrained language models rely on subword tokenization, which processes
text as a sequence of subword tokens. However, different granularities of text,
such as characters, subwords, and words, can contain different kinds of
information. Previous studies have shown that incorporating multiple input
granularities improves model generalization, yet very few of them output
useful representations for each granularity. In this paper, we introduce the
entanglement model, aiming to combine character and subword language models.
Inspired by vision-language models, our model treats characters and subwords as
separate modalities, and it generates mutually informed representations for
both granularities as output. We evaluate our model on text classification,
named entity recognition, and POS-tagging tasks. Notably, the entanglement
model outperforms its backbone language models, particularly in the presence of
noisy texts and low-resource languages. Furthermore, the entanglement model
even outperforms larger pretrained models on all English sequence labeling
tasks and classification tasks. Our anonymized code is available at
https://anonymous.4open.science/r/noisy-IE-A67
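Abstractly, treating characters and subwords as two modalities that inform each other can be pictured as cross-attention in each direction. The toy sketch below (plain NumPy, single-head, no learned projections) is our assumption-laden illustration of that idea, not the paper's actual architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, keys_values):
    """One direction of mutual informing: each position in `queries`
    attends over the other modality's states and pools them."""
    d = queries.shape[-1]
    attn = softmax(queries @ keys_values.T / np.sqrt(d))
    return attn @ keys_values

rng = np.random.default_rng(1)
char_h = rng.standard_normal((12, 8))   # 12 character states, hidden size 8
sub_h = rng.standard_normal((4, 8))     # 4 subword states

char_informed = cross_attend(char_h, sub_h)   # characters see subwords
sub_informed = cross_attend(sub_h, char_h)    # subwords see characters
print(char_informed.shape, sub_informed.shape)  # (12, 8) (4, 8)
```

Crucially, both directions are computed, so the model emits a useful representation at each granularity rather than collapsing everything to one token stream.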
He Said, She Said: Style Transfer for Shifting the Perspective of Dialogues
In this work, we define a new style transfer task: perspective shift, which
reframes a dialogue from informal first person to a formal third person
rephrasing of the text. This task requires challenging coreference resolution,
emotion attribution, and interpretation of informal text. We explore several
baseline approaches and discuss further directions on this task when applied to
short dialogues. As a sample application, we demonstrate that applying
perspective shifting to a dialogue summarization dataset (SAMSum) substantially
improves the zero-shot performance of extractive news summarization models on
this data. Additionally, supervised extractive models perform better when
trained on perspective shifted data than on the original dialogues. We release
our code publicly.
Comment: Findings of EMNLP 2022, 18 page
Graphical Models with Structured Factors, Neural Factors, and Approximation-aware Training
This thesis broadens the space of rich yet practical models for structured prediction. We introduce a general framework for modeling with four ingredients: (1) latent variables, (2) structural constraints, (3) learned (neural) feature representations of the inputs, and (4) training that takes the approximations made during inference into account. The thesis builds up to this framework through an empirical study of three NLP tasks: semantic role labeling, relation extraction, and dependency parsing -- obtaining state-of-the-art results on the former two. We apply the resulting graphical models with structured and neural factors, and approximation-aware learning, to jointly model part-of-speech tags, a syntactic dependency parse, and semantic roles in a low-resource setting where the syntax is unobserved. We present an alternative view of these models as neural networks with a topology inspired by inference on graphical models that encode our intuitions about the data.
Unlimiformer: Long-Range Transformers with Unlimited Length Input
Since the proposal of transformers, these models have been limited to bounded
input lengths, because of their need to attend to every token in the input. In
this work, we propose Unlimiformer: a general approach that wraps any existing
pretrained encoder-decoder transformer, and offloads the cross-attention
computation to a single k-nearest-neighbor (kNN) index, while the returned kNN
distances are the attention dot-product scores. This kNN index can be kept on
either the GPU or CPU memory and queried in sub-linear time; this way, we can
index practically unlimited input sequences, while every attention head in
every decoder layer retrieves its top-k keys, instead of attending to every
key. We evaluate Unlimiformer on several long-document and book-summarization
benchmarks, showing that it can process even 500k token-long inputs from the
BookSum dataset, without any input truncation at test time. We demonstrate that
Unlimiformer improves pretrained models such as BART and Longformer by
extending them to unlimited inputs without additional learned weights and
without modifying their code. We make our code and models publicly available at
https://github.com/abertsch72/unlimiformer
Comment: NeurIPS 202
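The retrieval trick can be illustrated with a brute-force stand-in for the kNN index: score all keys by dot product, keep only the top-k, and softmax over just those. The real system wraps a pretrained model's attention heads and uses an actual sub-linear index; this toy function is our simplified sketch of the mechanism:

```python
import numpy as np

def knn_cross_attention(query, keys, values, k=4):
    """Approximate cross-attention for one query vector.

    Instead of attending to all n keys, retrieve the top-k by
    dot-product score (standing in for a kNN index lookup, whose
    returned distances ARE the attention scores) and normalize
    only over those k.
    """
    scores = keys @ query                    # (n_keys,) dot-product scores
    topk = np.argpartition(scores, -k)[-k:]  # indices of the k best keys
    s = scores[topk]
    w = np.exp(s - s.max())                  # softmax over k scores only
    w /= w.sum()
    return w @ values[topk]                  # weighted sum of k values

rng = np.random.default_rng(2)
keys = rng.standard_normal((100_000, 16))    # a long encoded input
values = rng.standard_normal((100_000, 16))
q = rng.standard_normal(16)
out = knn_cross_attention(q, keys, values, k=8)
print(out.shape)  # (16,)
```

Because only k keys enter the softmax, the per-query cost no longer grows with input length, which is what lets the encoded input be "practically unlimited".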