University of Twente at the TREC 2008 Enterprise Track: using the Global Web as an expertise evidence source
This paper describes the details of our participation in the expert search task of the TREC 2008 Enterprise track. This is the fourth (and the last) year of the TREC Enterprise Track and the second year the University of Twente (Database group) submitted runs for the expert finding task. In the methods that were used to produce these runs, we mostly rely on the predictive potential of those expertise evidence sources that are publicly available on the Global Web, but not hosted at the website of the organization under study (CSIRO). This paper describes the follow-up studies complementary to our recent research [8] that demonstrated how taking the web factor seriously significantly improves the performance of expert finding in the enterprise
University of Twente at the TREC 2007 Enterprise Track : modeling relevance propagation for the expert search task
This paper describes several approaches which we used for the expert search task of the TREC 2007 Enterprise track. We studied several methods of relevance propagation from documents to related candidate experts. Instead of one-step propagation from documents to directly related candidates, used by many systems in the previous years, we do not limit the relevance flow and disseminate it further through mutual documents-candidates connections. We model relevance propagation using random walk principles, or in formal terms, discrete Markov processes. We experiment with infinite and finite number of propagation steps. We also demonstrate how additional information, namely hyperlinks among documents, organizational structure of the enterprise and relevance feedback may be utilized by the presented techniques
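As a rough illustration of the multi-step propagation idea in the abstract above, the sketch below runs a finite random walk over a bipartite document-candidate graph. It is not the authors' model: the function name `propagate_relevance`, the `round_trips` parameter, and the uniform transition probabilities are all assumptions made for this example.

```python
from collections import defaultdict


def propagate_relevance(doc_scores, doc_candidates, round_trips=0):
    """Finite random walk on a bipartite document-candidate graph.

    doc_scores:     dict doc -> non-negative initial relevance score
    doc_candidates: dict doc -> list of candidates linked to that doc
    round_trips:    number of extra candidate -> document -> candidate
                    hops after the first document -> candidate step
                    (0 reproduces one-step propagation)
    Transitions are uniform over neighbours (an assumption of this sketch).
    """
    # Reverse index: candidate -> documents linked to it.
    cand_docs = defaultdict(list)
    for doc, cands in doc_candidates.items():
        for cand in cands:
            cand_docs[cand].append(doc)

    # Start the walk on documents that have at least one candidate.
    linked = {d: s for d, s in doc_scores.items() if doc_candidates.get(d)}
    total = sum(linked.values())
    if total == 0:
        return {}
    doc_mass = {d: s / total for d, s in linked.items()}

    def docs_to_candidates(mass):
        out = defaultdict(float)
        for doc, m in mass.items():
            cands = doc_candidates[doc]
            for cand in cands:
                out[cand] += m / len(cands)
        return out

    def candidates_to_docs(mass):
        out = defaultdict(float)
        for cand, m in mass.items():
            docs = cand_docs[cand]
            for doc in docs:
                out[doc] += m / len(docs)
        return out

    cand_mass = docs_to_candidates(doc_mass)
    for _ in range(round_trips):
        doc_mass = candidates_to_docs(cand_mass)
        cand_mass = docs_to_candidates(doc_mass)
    return dict(cand_mass)


if __name__ == "__main__":
    scores = {"d1": 0.6, "d2": 0.4}
    links = {"d1": ["alice"], "d2": ["alice", "bob"]}
    print(propagate_relevance(scores, links, round_trips=0))
    print(propagate_relevance(scores, links, round_trips=1))
```

With `round_trips=0` the walk stops after a single document-to-candidate step; larger values let relevance flow back through shared documents, which is the finite-step case. The infinite-step case would correspond to the stationary distribution of the walk and is not covered here.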
Using the Global Web as an Expertise Evidence Source
This paper describes the details of our participation in the expert search task of the TREC 2007 Enterprise track. The presented study demonstrates the predictive potential of the expertise evidence that can be found outside of
the organization. We discovered that combining the ranking built solely on the Enterprise data with the Global Web
based ranking may produce significant increases in performance. However, our main goal was to explore whether
this result can be further improved by using various quality measures to distinguish among web result items. While,
indeed, it was beneficial to use some of these measures, especially those measuring relevance of URL strings and titles,
it stayed unclear whether they are decisively important
Being Omnipresent To Be Almighty: The Importance of The Global Web Evidence for Organizational Expert Finding
Modern expert finding algorithms are developed under the
assumption that all possible expertise evidence for a person
is concentrated in a company that currently employs the
person. The evidence that can be acquired outside of an
enterprise is traditionally unnoticed. At the same time, the
Web is full of personal information which is sufficiently detailed to judge a person's skills and knowledge. In this work, we review various sources of expertise evidence outside of an organization and experiment with rankings built on the data acquired from six different sources, accessible through APIs of two major web search engines. We show that these rankings and their combinations are often more realistic and of higher quality than rankings built on organizational data only
Design Patterns for Fusion-Based Object Retrieval
We address the task of ranking objects (such as people, blogs, or verticals)
that, unlike documents, do not have direct term-based representations. To be
able to match them against keyword queries, evidence needs to be amassed from
documents that are associated with the given object. We present two design
patterns, i.e., general reusable retrieval strategies, which are able to
encompass most existing approaches from the past. One strategy combines
evidence on the term level (early fusion), while the other does it on the
document level (late fusion). We demonstrate the generality of these patterns
by applying them to three different object retrieval tasks: expert finding,
blog distillation, and vertical ranking.
Comment: Proceedings of the 39th European conference on Advances in
Information Retrieval (ECIR '17), 201
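The two fusion strategies named in the abstract above can be contrasted with a small sketch. This is a toy illustration under stated assumptions, not the paper's formulation: the scoring function `tf_score` (relative term frequency), the unweighted sum used for late fusion, and all function and variable names are hypothetical choices for this example.

```python
from collections import Counter


def tf_score(query_terms, term_counts):
    """Toy relevance score: share of tokens that match a query term."""
    total = sum(term_counts.values())
    if total == 0:
        return 0.0
    return sum(term_counts[t] for t in query_terms) / total


def early_fusion(query_terms, docs, associations):
    """Pool the terms of an object's documents, then score the pool once."""
    scores = {}
    for obj, doc_ids in associations.items():
        pooled = Counter()
        for doc_id in doc_ids:
            pooled.update(docs[doc_id])
        scores[obj] = tf_score(query_terms, pooled)
    return scores


def late_fusion(query_terms, docs, associations):
    """Score each document separately, then sum the scores per object."""
    doc_scores = {
        doc_id: tf_score(query_terms, Counter(terms))
        for doc_id, terms in docs.items()
    }
    return {
        obj: sum(doc_scores[doc_id] for doc_id in doc_ids)
        for obj, doc_ids in associations.items()
    }


if __name__ == "__main__":
    docs = {
        "d1": ["expert", "search", "search"],
        "d2": ["blog", "search"],
        "d3": ["blog"],
    }
    associations = {"alice": ["d1", "d2"], "bob": ["d3"]}
    print(early_fusion(["search"], docs, associations))
    print(late_fusion(["search"], docs, associations))
```

The only structural difference is where aggregation happens: before scoring (term level) or after scoring (document level). Real systems would replace `tf_score` with a proper retrieval model and weight the document-object associations.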
Integrating multiple document features in language models for expert finding
We argue that expert finding is sensitive to multiple document features in an organizational intranet. These document features include multiple levels of associations between experts and a query topic from sentence, paragraph, up to document levels, document authority information such as the PageRank, indegree, and URL length of documents, and internal document structures that indicate the experts' relationship with the content of documents. Our assumption is that expert finding can largely benefit from the incorporation of these document features. However, existing language modeling approaches for expert finding have not sufficiently taken into account these document features. We propose a novel language modeling approach, which integrates multiple document features, for expert finding. Our experiments on two large scale TREC Enterprise Track datasets, i.e., the W3C and CSIRO datasets, demonstrate that the natures of the two organizational intranets and two types of expert finding tasks, i.e., key contact finding for CSIRO and knowledgeable person finding for W3C, influence the effectiveness of different document features. Our work provides insights into which document features work for certain types of expert finding tasks, and helps design expert finding strategies that are effective for different scenarios. Our main contribution is to develop an effective formal method for modeling multiple document features in expert finding, and conduct a systematic investigation of their effects. It is worth noting that our novel approach achieves better results in terms of MAP than previous language model based approaches and the best automatic runs in both the TREC2006 and TREC2007 expert search tasks, respectively
The Open University at TREC 2007 Enterprise Track
The Multimedia and Information Systems group at the Knowledge Media Institute of the Open University participated in the Expert Search and Document Search tasks of the Enterprise Track in TREC 2007. In both the document and expert search tasks, we have studied the effect of anchor texts in addition to document contents, document authority, URL length, query expansion, and relevance feedback in improving search effectiveness. In the expert search task, we have continued using a two-stage language model consisting of a document relevance model and a co-occurrence model. The document relevance model is equivalent to our approach in the document search task. We have used our innovative multiple-window-based co-occurrence approach. The assumption is that there are multiple levels of associations between an expert and his/her expertise. Our experimental results show that the introduction of additional features in addition to document contents has improved the retrieval effectiveness
Unsupervised, Efficient and Semantic Expertise Retrieval
We introduce an unsupervised discriminative model for the task of retrieving
experts in online document collections. We exclusively employ textual evidence
and avoid explicit feature engineering by learning distributed word
representations in an unsupervised way. We compare our model to
state-of-the-art unsupervised statistical vector space and probabilistic
generative approaches. Our proposed log-linear model achieves the retrieval
performance levels of state-of-the-art document-centric methods with the low
inference cost of so-called profile-centric approaches. It yields a
statistically significant improved ranking over vector space and generative
models in most cases, matching the performance of supervised methods on various
benchmarks. That is, by using solely text we can do as well as methods that
work with external evidence and/or relevance feedback. A contrastive analysis
of rankings produced by discriminative and generative approaches shows that
they have complementary strengths due to the ability of the unsupervised
discriminative model to perform semantic matching.
Comment: WWW2016, Proceedings of the 25th International Conference on World
Wide Web. 201
Entity finding in a document collection using adaptive window sizes
Traditional search engines work by returning a list of documents in response to queries. However, such engines are often inadequate when the information need of the user involves entities. This issue has led to the development of entity-search, which unlike normal web search does not aim at returning documents but names of people, products, organisations, etc. Some of the most successful methods for identifying relevant entities were built around the idea of a proximity search. In this thesis, we present an adaptive, well-founded, general-purpose entity finding model. In contrast to the work of other researchers, where the size of the targeted part of the document (i.e., the window size) is fixed across the collection, our method uses a number of document features to calculate an adaptive window size for each document in the collection. We construct a new entity finding test collection called the ESSEX test collection for use in evaluating our method. This collection represents a university setting as the data was collected from the publicly accessible webpages of the University of Essex.
We test our method on five different datasets including the W3C Dataset, CERC Dataset, UvT/TU Datasets, ESSEX dataset and the ClueWeb09 entity finding collection. Our method provides a considerable improvement over various baseline models on all of these datasets. We also find that the document features considered for the calculation of the window size have differing impacts on the performance of the search. These impacts depend on the structure of the documents and the document language.
As users may have a variety of search requirements, we show that our method is adaptable to different applications, environments, types of named entities and document collections