104 research outputs found
Active Discriminative Text Representation Learning
We propose a new active learning (AL) method for text classification with
convolutional neural networks (CNNs). In AL, one selects the instances to be
manually labeled with the aim of maximizing model performance with minimal
effort. Neural models capitalize on word embeddings as representations
(features), tuning these to the task at hand. We argue that AL strategies for
multi-layered neural models should focus on selecting instances that most
affect the embedding space (i.e., induce discriminative word representations).
This is in contrast to traditional AL approaches (e.g., entropy-based
uncertainty sampling), which specify higher level objectives. We propose a
simple approach for sentence classification that selects instances containing
words whose embeddings are likely to be updated with the greatest magnitude,
thereby rapidly learning discriminative, task-specific embeddings. We extend
this approach to document classification by jointly considering: (1) the
expected changes to the constituent word representations; and (2) the model's
current overall uncertainty regarding the instance. The relative emphasis
placed on these criteria is governed by a stochastic process that favors
selecting instances likely to improve representations at the outset of
learning, and then shifts toward general uncertainty sampling as AL progresses.
Empirical results show that our method outperforms baseline AL approaches on
both sentence and document classification tasks. We also show that, as
expected, the method quickly learns discriminative word embeddings. To the best
of our knowledge, this is the first work on AL addressing neural models for
text classification.
Comment: This paper was accepted at AAAI 201
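The selection strategy described above could be sketched roughly as follows: score each unlabeled instance by the expected magnitude of the gradient on its word embeddings early in learning, then shift toward entropy-based uncertainty sampling via a stochastic gate. This is a minimal illustrative sketch, not the paper's exact formulation; the per-class gradient vectors, the 50-dimensional embeddings, and the particular gate schedule are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(probs):
    """Predictive entropy of a class-probability vector."""
    probs = np.clip(probs, 1e-12, 1.0)
    return -np.sum(probs * np.log(probs))

def expected_embedding_update(grads_per_class, probs):
    """Expected L2 norm of the embedding gradient, marginalized over the
    model's current label distribution (an expected-gradient-length score)."""
    return sum(p * np.linalg.norm(g) for p, g in zip(probs, grads_per_class))

def select_instance(pool, progress):
    """Pick one instance to label. A Bernoulli gate favors the
    embedding-update score when `progress` (fraction of AL rounds done)
    is small, and entropy-based uncertainty sampling as it grows."""
    if rng.random() > progress:
        scores = [expected_embedding_update(x["grads"], x["probs"]) for x in pool]
    else:
        scores = [entropy(x["probs"]) for x in pool]
    return int(np.argmax(scores))

# Toy pool: two classes, random per-class embedding gradients.
pool = [{"probs": np.array([0.7, 0.3]),
         "grads": [rng.normal(size=50), rng.normal(size=50)]}
        for _ in range(5)]
idx = select_instance(pool, progress=0.1)  # early in AL
```

In a real CNN setting the per-class gradients would be backpropagated embedding gradients under each hypothesized label, which is what makes the score favor instances that most reshape the representation space.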
Dating Texts without Explicit Temporal Cues
This paper tackles temporal resolution of documents, such as determining when
a document is about or when it was written, based only on its text. We apply
techniques from information retrieval that predict dates via language models
over a discretized timeline. Unlike most previous work, we rely solely
on temporal cues implicit in the text. We consider both document-likelihood and
divergence based techniques and several smoothing methods for both of them. Our
best model predicts the mid-point of individuals' lives with a median error of
22 years and a mean error of 36 years for Wikipedia biographies from 3800 B.C.
to the present
day. We also show that this approach works well when training on such
biographies and predicting dates both for non-biographical Wikipedia pages
about specific years (500 B.C. to 2010 A.D.) and for publication dates of short
stories (1798 to 2008). Together, our work shows that, even in the absence of
temporal extraction resources, it is possible to achieve remarkable temporal
locality across a diverse set of texts.
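A minimal sketch of the document-likelihood technique described above, assuming Laplace-smoothed unigram language models over a discretized timeline: train one language model per time period, then date a new document by the period under which it is most likely. The abstract mentions several smoothing methods and divergence-based variants; the add-alpha smoothing, period labels, and toy corpora here are illustrative assumptions.

```python
import math
from collections import Counter

def train_period_lms(docs_by_period, alpha=0.01):
    """One add-alpha-smoothed unigram LM per discretized time period."""
    counts, vocab = {}, set()
    for period, docs in docs_by_period.items():
        c = Counter()
        for doc in docs:
            c.update(doc.split())
        counts[period] = c
        vocab.update(c)
    V = len(vocab)
    return {period: {"c": c, "total": sum(c.values()), "alpha": alpha, "V": V}
            for period, c in counts.items()}

def log_likelihood(doc, lm):
    """Document log-likelihood under one period's smoothed unigram LM."""
    ll = 0.0
    for w in doc.split():
        p = (lm["c"][w] + lm["alpha"]) / (lm["total"] + lm["alpha"] * lm["V"])
        ll += math.log(p)
    return ll

def predict_period(doc, lms):
    """Document-likelihood dating: the period maximizing P(doc | period)."""
    return max(lms, key=lambda period: log_likelihood(doc, lms[period]))

lms = train_period_lms({
    1850: ["telegraph railway steamship", "railway telegraph empire"],
    1990: ["internet email website", "email internet software"],
})
period = predict_period("the telegraph and the railway", lms)  # → 1850
```

Divergence-based variants would instead compare the document's own term distribution to each period's model (e.g. via KL divergence) rather than scoring raw likelihood.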