Enhancing Sensitivity Classification with Semantic Features using Word Embeddings
Government documents must be reviewed to identify any sensitive information
they may contain, before they can be released to the public. However,
traditional paper-based sensitivity review processes are not practical for reviewing
born-digital documents. Therefore, there is a timely need for automatic sensitivity
classification techniques, to assist the digital sensitivity review process.
However, sensitivity is typically a product of the relations between combinations
of terms, such as who said what about whom; automatic sensitivity
classification is therefore a difficult task. Vector representations of terms, such as word
embeddings, have been shown to be effective at encoding latent term features
that preserve semantic relations between terms, which can also be beneficial to
sensitivity classification. In this work, we present a thorough evaluation of the
effectiveness of semantic word embedding features, along with term and grammatical
features, for sensitivity classification. On a test collection of government
documents containing real sensitivities, we show that extending text classification
with semantic features and additional term n-grams results in significant improvements
in classification effectiveness, correctly classifying 9.99% more sensitive
documents than the text classification baseline.
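Illustrative sketch (not taken from the paper): the abstract describes extending text classification with term n-grams and semantic word-embedding features. One simple way to build such a combined feature set is shown below, assuming whitespace tokenization, averaged word vectors as the semantic feature, and a hypothetical toy embedding table (`TOY_EMBEDDINGS`). The paper's actual feature design may differ.

```python
from collections import Counter

# Toy 2-dimensional embeddings; hypothetical values for illustration only.
TOY_EMBEDDINGS = {
    "minister": [0.9, 0.1],
    "criticised": [0.2, 0.8],
    "ambassador": [0.8, 0.2],
}

def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def mean_embedding(tokens, embeddings, dim):
    """Average the vectors of tokens that have an embedding; zeros if none do."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def featurize(text, embeddings, dim=2, max_n=2):
    """Build one feature dict: n-gram counts plus mean-embedding dimensions."""
    tokens = text.lower().split()
    features = Counter()
    for n in range(1, max_n + 1):
        for gram in ngrams(tokens, n):
            features["ngram=" + gram] += 1
    for d, value in enumerate(mean_embedding(tokens, embeddings, dim)):
        features["emb_%d" % d] = value
    return dict(features)

features = featurize("The minister criticised the ambassador", TOY_EMBEDDINGS)
```

The resulting dictionary could be passed to any classifier that accepts sparse named features; the choice of classifier is left open here.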
Improved Relation Extraction with Feature-Rich Compositional Embedding Models
Compositional embedding models build a representation (or embedding) for a
linguistic structure based on its component word embeddings. We propose a
Feature-rich Compositional Embedding Model (FCM) for relation extraction that
is expressive, generalizes to new domains, and is easy to implement. The key
idea is to combine (unlexicalized) hand-crafted features with learned word
embeddings. The model is able to directly tackle the difficulties met by
traditional compositional embedding models, such as handling arbitrary types
of sentence annotations and utilizing global information for composition. We
test the proposed model on two relation extraction tasks, and demonstrate that
our model outperforms both previous compositional models and traditional
feature-rich models on the ACE 2005 relation extraction task and the SemEval
2010 relation classification task. The combination of our model and a
log-linear classifier with hand-crafted features gives state-of-the-art
results.
Comment: 12 pages for EMNLP 201
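Illustrative sketch (not taken from the paper): the abstract says the model combines hand-crafted features with learned word embeddings but gives no formula. One plausible reading, assumed here, is to take the outer product of each word's feature vector and its embedding, sum the products over the sentence, and score a relation label with a weight matrix of the same shape. All names and values below are hypothetical.

```python
def outer(features, embedding):
    """Outer product of a feature vector and a word embedding."""
    return [[f * e for e in embedding] for f in features]

def compose(feature_vectors, word_vectors):
    """Sum the per-word outer products into one sentence-level matrix."""
    rows, cols = len(feature_vectors[0]), len(word_vectors[0])
    total = [[0.0] * cols for _ in range(rows)]
    for features, embedding in zip(feature_vectors, word_vectors):
        for r, row in enumerate(outer(features, embedding)):
            for c, value in enumerate(row):
                total[r][c] += value
    return total

def score(composed, label_weights):
    """Element-wise product of the composed matrix and a label's weights, summed."""
    return sum(
        composed[r][c] * label_weights[r][c]
        for r in range(len(composed))
        for c in range(len(composed[0]))
    )

# Three words, two binary hand-crafted features each, 2-dimensional embeddings.
feature_vectors = [[1, 0], [0, 1], [1, 1]]
word_vectors = [[0.5, 1.0], [2.0, 0.0], [1.0, 1.0]]
composed = compose(feature_vectors, word_vectors)

# Hypothetical weights for a single relation label.
label_score = score(composed, [[1.0, 0.0], [0.0, -1.0]])
```

In a trained model the label weights and the embeddings would be learned jointly, and the per-label scores would typically be normalized with a softmax; both steps are omitted in this sketch.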
Evaluating Word Embeddings in Multi-label Classification Using Fine-grained Name Typing
Embedding models typically associate each word with a single real-valued
vector, representing its different properties. Evaluation methods, therefore,
need to analyze the accuracy and completeness of these properties in
embeddings. This requires fine-grained analysis of embedding subspaces.
Multi-label classification is an appropriate way to do so. We propose a new
evaluation method for word embeddings based on multi-label classification given
a word embedding. The task we use is fine-grained name typing: given a large
corpus, find all types that a name can refer to based on the name embedding.
Given the scale of entities in knowledge bases, we can build datasets for this
task that are complementary to the current embedding evaluation datasets in that
they are very large, contain fine-grained classes, and allow the direct
evaluation of embeddings without confounding factors like sentence context.
Comment: 6 pages, The 3rd Workshop on Representation Learning for NLP
(RepL4NLP @ ACL2018
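Illustrative sketch (not taken from the paper): the abstract frames name typing as multi-label classification over a name embedding. A minimal version, assuming one independent logistic scorer per type and micro-averaged F1 as the evaluation measure, could look like this. The type names, weights, and threshold are hypothetical, and the paper may use a different classifier and metric.

```python
import math

def sigmoid(z):
    """Logistic function mapping a score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_types(embedding, type_weights, threshold=0.5):
    """Return every type whose independent logistic score meets the threshold."""
    predicted = set()
    for type_name, (weights, bias) in type_weights.items():
        z = sum(w * x for w, x in zip(weights, embedding)) + bias
        if sigmoid(z) >= threshold:
            predicted.add(type_name)
    return predicted

def micro_f1(gold_sets, predicted_sets):
    """Micro-averaged F1 over all (name, type) decisions."""
    tp = sum(len(g & p) for g, p in zip(gold_sets, predicted_sets))
    fp = sum(len(p - g) for g, p in zip(gold_sets, predicted_sets))
    fn = sum(len(g - p) for g, p in zip(gold_sets, predicted_sets))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical per-type parameters: (weight vector, bias).
TOY_TYPE_WEIGHTS = {
    "city": ([2.0, 0.0], -1.0),
    "person": ([0.0, 2.0], -1.0),
    "organization": ([-1.0, -1.0], 0.0),
}

name_embeddings = [[1.0, 1.0], [1.0, 0.0]]
gold = [{"city", "person"}, {"city", "organization"}]
predicted = [predict_types(e, TOY_TYPE_WEIGHTS) for e in name_embeddings]
f1 = micro_f1(gold, predicted)
```

Because each type is scored independently, a single name can receive several types at once, which is what distinguishes this setup from single-label classification.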