44,394 research outputs found
Mining web data for competency management
We present CORDER (COmmunity Relation Discovery by named Entity Recognition) an un-supervised machine learning algorithm that exploits named entity recognition and co-occurrence data to associate individuals in an organization with their expertise and associates. We
discuss the problems associated with evaluating
unsupervised learners and report our initial evaluation
experiments
Relation Discovery from Web Data for Competency Management
This paper describes a technique for automatically discovering associations between people and expertise from an analysis of very large data sources (including web pages, blogs and emails), using a family of algorithms that perform accurate named-entity recognition, assign different weights to terms according to an analysis of document structure, and access distances between terms in a document. My contribution is to add a social networking approach called BuddyFinder which relies on associations within a large enterprise-wide "buddy list" to help delimit the search space and also to provide a form of 'social triangulation' whereby the system can discover documents from your colleagues that contain pertinent information about you. This work has been influential in the information retrieval community generally, as it is the basis of a landmark system that achieved overall first place in every category in the Enterprise Search Track of TREC2006
Document Filtering for Long-tail Entities
Filtering relevant documents with respect to entities is an essential task in
the context of knowledge base construction and maintenance. It entails
processing a time-ordered stream of documents that might be relevant to an
entity in order to select only those that contain vital information.
State-of-the-art approaches to document filtering for popular entities are
entity-dependent: they rely on and are also trained on the specifics of
differentiating features for each specific entity. Moreover, these approaches
tend to use so-called extrinsic information such as Wikipedia page views and
related entities which is typically only available only for popular head
entities. Entity-dependent approaches based on such signals are therefore
ill-suited as filtering methods for long-tail entities. In this paper we
propose a document filtering method for long-tail entities that is
entity-independent and thus also generalizes to unseen or rarely seen entities.
It is based on intrinsic features, i.e., features that are derived from the
documents in which the entities are mentioned. We propose a set of features
that capture informativeness, entity-saliency, and timeliness. In particular,
we introduce features based on entity aspect similarities, relation patterns,
and temporal expressions and combine these with standard features for document
filtering. Experiments following the TREC KBA 2014 setup on a publicly
available dataset show that our model is able to improve the filtering
performance for long-tail entities over several baselines. Results of applying
the model to unseen entities are promising, indicating that the model is able
to learn the general characteristics of a vital document. The overall
performance across all entities---i.e., not just long-tail entities---improves
upon the state-of-the-art without depending on any entity-specific training
data.Comment: CIKM2016, Proceedings of the 25th ACM International Conference on
Information and Knowledge Management. 201
- …