Document Filtering for Long-tail Entities
Filtering relevant documents with respect to entities is an essential task in
the context of knowledge base construction and maintenance. It entails
processing a time-ordered stream of documents that might be relevant to an
entity in order to select only those that contain vital information.
State-of-the-art approaches to document filtering for popular entities are
entity-dependent: they rely on and are also trained on the specifics of
differentiating features for each specific entity. Moreover, these approaches
tend to use so-called extrinsic information such as Wikipedia page views and
related entities, which is typically available only for popular head
entities. Entity-dependent approaches based on such signals are therefore
ill-suited as filtering methods for long-tail entities. In this paper we
propose a document filtering method for long-tail entities that is
entity-independent and thus also generalizes to unseen or rarely seen entities.
It is based on intrinsic features, i.e., features that are derived from the
documents in which the entities are mentioned. We propose a set of features
that capture informativeness, entity-saliency, and timeliness. In particular,
we introduce features based on entity aspect similarities, relation patterns,
and temporal expressions and combine these with standard features for document
filtering. Experiments following the TREC KBA 2014 setup on a publicly
available dataset show that our model is able to improve the filtering
performance for long-tail entities over several baselines. Results of applying
the model to unseen entities are promising, indicating that the model is able
to learn the general characteristics of a vital document. The overall
performance across all entities---i.e., not just long-tail entities---improves
upon the state-of-the-art without depending on any entity-specific training
data.
Comment: CIKM2016, Proceedings of the 25th ACM International Conference on
Information and Knowledge Management. 201
FewRel: A Large-Scale Supervised Few-Shot Relation Classification Dataset with State-of-the-Art Evaluation
We present a Few-Shot Relation Classification Dataset (FewRel), consisting of
70,000 sentences on 100 relations derived from Wikipedia and annotated by
crowdworkers. The relation of each sentence is first recognized by distant
supervision methods, and then filtered by crowdworkers. We adapt the most
recent state-of-the-art few-shot learning methods for relation classification
and conduct a thorough evaluation of these methods. Empirical results show that
even the most competitive few-shot learning models struggle on this task,
especially as compared with humans. We also show that a range of different
reasoning skills are needed to solve our task. These results indicate that
few-shot relation classification remains an open problem and still requires
further research. Our detailed analysis points to multiple directions for future
research. All details and resources about the dataset and baselines are
released on http://zhuhao.me/fewrel.
Comment: EMNLP 2018. The first four authors contribute equally. The order is
determined by dice rolling. Visit our website http://zhuhao.me/fewre
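Few-shot classification is commonly evaluated on N-way K-shot episodes: N relations are drawn, each with K labeled support sentences, and a model must label held-out query sentences from the same N relations. The sketch below illustrates that sampling step only. It is not the FewRel authors' code; the function name `sample_episode`, its parameters, and the toy data are assumptions made for illustration.

```python
import random

def sample_episode(data, n_way, k_shot, n_query, rng):
    """Draw one N-way K-shot episode from a {relation: [sentences]} mapping.

    Hypothetical helper: returns (support, query), each a list of
    (sentence, relation) pairs. Support and query sentences never overlap.
    """
    # Pick the N relations for this episode (sorted for reproducibility).
    relations = rng.sample(sorted(data), n_way)
    support, query = [], []
    for label in relations:
        # Draw K support + n_query query sentences without replacement.
        picked = rng.sample(data[label], k_shot + n_query)
        support += [(s, label) for s in picked[:k_shot]]
        query += [(s, label) for s in picked[k_shot:]]
    return support, query

# Toy stand-in data (assumed); not taken from the FewRel dataset.
toy = {f"rel_{i}": [f"sentence {i}-{j}" for j in range(10)] for i in range(8)}
support, query = sample_episode(toy, n_way=5, k_shot=1, n_query=2,
                                rng=random.Random(0))
```

With these settings, each episode holds 5 support pairs (one per relation) and 10 query pairs; a classifier would be scored on how many query sentences it assigns to the correct relation.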