3,779 research outputs found
University of Twente at the TREC 2007 Enterprise Track : modeling relevance propagation for the expert search task
This paper describes several approaches which we used for the expert search task of the TREC 2007 Enterprise track.
We studied several methods of relevance propagation from documents to related candidate experts. Instead of one-step propagation from documents to directly related candidates, used by many systems in the previous years, we do not limit the relevance flow and disseminate it further through mutual documents-candidates connections. We model relevance propagation using random walk principles, or in formal terms, discrete Markov processes. We experiment with infinite and finite number of propagation steps. We also demonstrate how additional information, namely hyperlinks among documents, organizational structure of the enterprise and relevance feedback may be utilized by the presented techniques
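A minimal sketch of the idea described above, not the authors' implementation: relevance mass starts on documents, then flows back and forth over document-candidate links as a random walk. The function name `propagate`, the `restart` parameter, and the toy data are assumptions made for illustration. A fixed `steps` value corresponds to the finite case; a large `steps` with a non-zero `restart` approximates the infinite case while keeping the walk tied to the query.

```python
from collections import defaultdict

def propagate(doc_scores, links, steps=1, restart=0.0):
    """Spread document relevance to candidates by a random walk.

    doc_scores: document id -> initial relevance score (hypothetical input)
    links:      list of (document id, candidate id) association pairs
    steps:      number of walk steps (odd values end on candidates when restart is 0)
    restart:    probability of jumping back to the initial distribution
    """
    # Build an undirected bipartite graph; tag nodes to keep ids distinct.
    neighbours = defaultdict(list)
    for doc, cand in links:
        neighbours[("d", doc)].append(("c", cand))
        neighbours[("c", cand)].append(("d", doc))

    # The initial distribution is the normalised document relevance.
    total = sum(doc_scores.values())
    start = {("d", d): s / total for d, s in doc_scores.items()}

    current = dict(start)
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, mass in current.items():
            outs = neighbours.get(node)
            if not outs:
                # Isolated nodes keep their own mass.
                nxt[node] += (1 - restart) * mass
                continue
            share = (1 - restart) * mass / len(outs)
            for n in outs:
                nxt[n] += share
        # Restart: return part of the mass to the query-dependent start.
        for node, mass in start.items():
            nxt[node] += restart * mass
        current = nxt

    # Report only the mass that sits on candidate nodes.
    return {name: m for (kind, name), m in current.items() if kind == "c"}


doc_scores = {"d1": 0.6, "d2": 0.4}
links = [("d1", "alice"), ("d2", "alice"), ("d2", "bob")]
one_step = propagate(doc_scores, links, steps=1)
three_steps = propagate(doc_scores, links, steps=3)
```

With `steps=1` this reduces to the one-step propagation the abstract contrasts against; more steps let relevance reach candidates through shared documents.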
Being Omnipresent To Be Almighty: The Importance of The Global Web Evidence for Organizational Expert Finding
Modern expert finding algorithms are developed under the
assumption that all possible expertise evidence for a person
is concentrated in a company that currently employs the
person. The evidence that can be acquired outside of an
enterprise is traditionally unnoticed. At the same time, the
Web is full of personal information which is sufficiently detailed to judge a person's skills and knowledge. In this work, we review various sources of expertise evidence outside of an organization and experiment with rankings built on the data acquired from six different sources, accessible through APIs of two major web search engines. We show that these rankings and their combinations are often more realistic and of higher quality than rankings built on organizational data only
Unsupervised, Efficient and Semantic Expertise Retrieval
We introduce an unsupervised discriminative model for the task of retrieving
experts in online document collections. We exclusively employ textual evidence
and avoid explicit feature engineering by learning distributed word
representations in an unsupervised way. We compare our model to
state-of-the-art unsupervised statistical vector space and probabilistic
generative approaches. Our proposed log-linear model achieves the retrieval
performance levels of state-of-the-art document-centric methods with the low
inference cost of so-called profile-centric approaches. It yields a
statistically significant improved ranking over vector space and generative
models in most cases, matching the performance of supervised methods on various
benchmarks. That is, by using solely text we can do as well as methods that
work with external evidence and/or relevance feedback. A contrastive analysis
of rankings produced by discriminative and generative approaches shows that
they have complementary strengths due to the ability of the unsupervised
discriminative model to perform semantic matching.
Comment: WWW2016, Proceedings of the 25th International Conference on World Wide Web. 201
Integrating multiple document features in language models for expert finding
We argue that expert finding is sensitive to multiple document features in an organizational intranet. These document features include multiple levels of associations between experts and a query topic from sentence, paragraph, up to document levels, document authority information such as the PageRank, indegree, and URL length of documents, and internal document structures that indicate the experts' relationship with the content of documents. Our assumption is that expert finding can largely benefit from the incorporation of these document features. However, existing language modeling approaches for expert finding have not sufficiently taken into account these document features. We propose a novel language modeling approach, which integrates multiple document features, for expert finding. Our experiments on two large scale TREC Enterprise Track datasets, i.e., the W3C and CSIRO datasets, demonstrate that the natures of the two organizational intranets and two types of expert finding tasks, i.e., key contact finding for CSIRO and knowledgeable person finding for W3C, influence the effectiveness of different document features. Our work provides insights into which document features work for certain types of expert finding tasks, and helps design expert finding strategies that are effective for different scenarios. Our main contribution is to develop an effective formal method for modeling multiple document features in expert finding, and conduct a systematic investigation of their effects. It is worth noting that our novel approach achieves better results in terms of MAP than previous language model based approaches and the best automatic runs in both the TREC2006 and TREC2007 expert search tasks, respectively
Design Patterns for Fusion-Based Object Retrieval
We address the task of ranking objects (such as people, blogs, or verticals)
that, unlike documents, do not have direct term-based representations. To be
able to match them against keyword queries, evidence needs to be amassed from
documents that are associated with the given object. We present two design
patterns, i.e., general reusable retrieval strategies, which are able to
encompass most existing approaches from the past. One strategy combines
evidence on the term level (early fusion), while the other does it on the
document level (late fusion). We demonstrate the generality of these patterns
by applying them to three different object retrieval tasks: expert finding,
blog distillation, and vertical ranking.
Comment: Proceedings of the 39th European conference on Advances in Information Retrieval (ECIR '17), 201
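The two fusion strategies named above can be contrasted with a small sketch. This is an illustration under stated assumptions, not the paper's formulation: `doc_score` is a deliberately simple relative term-frequency scorer standing in for a real retrieval model, and the function names and toy data are hypothetical.

```python
from collections import Counter

def doc_score(query_terms, doc_terms):
    """Toy scorer: relative frequency of the query terms in a term list."""
    counts = Counter(doc_terms)
    return sum(counts[t] for t in query_terms) / max(len(doc_terms), 1)

def early_fusion(query_terms, docs, associations):
    """Term-level fusion: pool the terms of an object's documents, then score once."""
    scores = {}
    for obj, doc_ids in associations.items():
        pooled = [term for d in doc_ids for term in docs[d]]
        scores[obj] = doc_score(query_terms, pooled)
    return scores

def late_fusion(query_terms, docs, associations):
    """Document-level fusion: score each document, then sum the scores per object."""
    scores = {}
    for obj, doc_ids in associations.items():
        scores[obj] = sum(doc_score(query_terms, docs[d]) for d in doc_ids)
    return scores


docs = {
    "d1": ["xml", "retrieval"],
    "d2": ["expert", "search", "xml", "xml"],
}
associations = {"alice": ["d1", "d2"], "bob": ["d2"]}
early = early_fusion(["xml"], docs, associations)
late = late_fusion(["xml"], docs, associations)
```

On this toy data early fusion ties the two objects, while late fusion favours the object with more matching documents, which shows how the level at which evidence is combined can change the ranking.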
Quantitative modelling of the human–Earth System a new kind of science?
The five grand challenges set out for Earth System Science by the International Council for Science in 2010 require a true fusion of social science, economics and natural science—a fusion that has not yet been achieved. In this paper we propose that constructing quantitative models of the dynamics of the human–Earth system can serve as a catalyst for this fusion. We confront well-known objections to modelling societal dynamics by drawing lessons from the development of natural science over the last four centuries and applying them to social and economic science. First, we pose three questions that require real integration of the three fields of science. They concern the coupling of physical planetary boundaries via social processes; the extension of the concept of planetary boundaries to the human–Earth System; and the possibly self-defeating nature of the United Nations' Millennium Development Goals. Second, we ask whether there are regularities or ‘attractors’ in the human–Earth System analogous to those that prompted the search for laws of nature. We nominate some candidates and discuss why we should observe them given that human actors with foresight and intentionality play a fundamental role in the human–Earth System. We conclude that, at sufficiently large time and space scales, social processes are predictable in some sense. Third, we canvass some essential mathematical techniques that this research fusion must incorporate, and we ask what kind of data would be needed to validate or falsify our models. Finally, we briefly review the state of the art in quantitative modelling of the human–Earth System today and highlight a gap between so-called integrated assessment models applied at regional and global scale, which could be filled by a new scale of model
Modeling document features for expert finding
We argue that expert finding is sensitive to multiple document features in an organization, and therefore, can benefit from the incorporation of these document features. We propose a unified language model, which integrates multiple document features, namely, multiple levels of associations, PageRank, indegree, internal document structure, and URL length. Our experiments on two TREC Enterprise Track collections, i.e., the W3C and CSIRO datasets, demonstrate that the natures of the two organizational intranets and two types of expert finding tasks, i.e., key contact finding for CSIRO and knowledgeable person finding for W3C, influence the effectiveness of different document features. Our work provides insights into which document features work for certain types of expert finding tasks, and helps design expert finding strategies that are effective for different scenarios
Enhancing Content-And-Structure Information Retrieval using a Native XML Database
Three approaches to content-and-structure XML retrieval are analysed in this
paper: first by using Zettair, a full-text information retrieval system; second
by using eXist, a native XML database, and third by using a hybrid XML
retrieval system that uses eXist to produce the final answers from likely
relevant articles retrieved by Zettair. INEX 2003 content-and-structure topics
can be classified in two categories: the first retrieving full articles as
final answers, and the second retrieving more specific elements within articles
as final answers. We show that for both topic categories our initial hybrid
system improves the retrieval effectiveness of a native XML database. For
ranking the final answer elements, we propose and evaluate a novel retrieval
model that utilises the structural relationships between the answer elements of
a native XML database and retrieves Coherent Retrieval Elements. The final
results of our experiments show that when the XML retrieval task focusses on
highly relevant elements our hybrid XML retrieval system with the Coherent
Retrieval Elements module is 1.8 times more effective than Zettair and 3 times
more effective than eXist, and yields an effective content-and-structure XML
retrieval
Sensor Search Techniques for Sensing as a Service Architecture for The Internet of Things
The Internet of Things (IoT) is part of the Internet of the future and will
comprise billions of intelligent communicating "things" or Internet Connected
Objects (ICO) which will have sensing, actuating, and data processing
capabilities. Each ICO will have one or more embedded sensors that will capture
potentially enormous amounts of data. The sensors and related data streams can
be clustered physically or virtually, which raises the challenge of searching
and selecting the right sensors for a query in an efficient and effective way.
This paper proposes a context-aware sensor search, selection and ranking model,
called CASSARAM, to address the challenge of efficiently selecting a subset of
relevant sensors out of a large set of sensors with similar functionality and
capabilities. CASSARAM takes into account user preferences and considers a
broad range of sensor characteristics, such as reliability, accuracy, location,
battery life, and many more. The paper highlights the importance of sensor
search, selection and ranking for the IoT, identifies important characteristics
of both sensors and data capture processes, and discusses how semantic and
quantitative reasoning can be combined together. This work also addresses
challenges such as efficient distributed sensor search and
relational-expression based filtering. CASSARAM testing and performance
evaluation results are presented and discussed.
Comment: IEEE sensors Journal, 2013. arXiv admin note: text overlap with arXiv:1303.244
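To make the notion of preference-driven sensor ranking concrete, here is a rough sketch that orders sensors by a weighted sum of their characteristics. The abstract does not specify CASSARAM's ranking formula, so the weighted sum is a plain stand-in, and `rank_sensors`, the property names, and all values are hypothetical.

```python
def rank_sensors(sensors, weights, top_k=2):
    """Return the ids of the top_k sensors by weighted characteristic score.

    sensors: sensor id -> {characteristic name: value in [0, 1]} (assumed scale)
    weights: characteristic name -> user preference weight
    """
    def score(props):
        # Characteristics a sensor does not report contribute nothing.
        return sum(w * props.get(name, 0.0) for name, w in weights.items())

    ranked = sorted(sensors.items(), key=lambda kv: score(kv[1]), reverse=True)
    return [sensor_id for sensor_id, _ in ranked[:top_k]]


sensors = {
    "s1": {"reliability": 0.9, "accuracy": 0.5},
    "s2": {"reliability": 0.6, "accuracy": 0.9},
    "s3": {"reliability": 0.2, "accuracy": 0.3},
}
prefer_reliability = rank_sensors(sensors, {"reliability": 0.7, "accuracy": 0.3})
prefer_accuracy = rank_sensors(sensors, {"reliability": 0.2, "accuracy": 0.8})
```

Changing the weights reorders the same sensors, which is the behaviour a user-preference model of this kind is meant to provide.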