
    University of Twente at the TREC 2008 Enterprise Track: using the Global Web as an expertise evidence source

    This paper describes the details of our participation in the expert search task of the TREC 2008 Enterprise track. This is the fourth (and last) year of the TREC Enterprise Track and the second year the University of Twente (Database group) submitted runs for the expert finding task. In the methods used to produce these runs, we rely mostly on the predictive potential of expertise evidence sources that are publicly available on the Global Web but not hosted at the website of the organization under study (CSIRO). This paper describes follow-up studies complementary to our recent research [8], which demonstrated how taking the web factor seriously significantly improves the performance of expert finding in the enterprise.

    Using the Global Web as an Expertise Evidence Source

    This paper describes the details of our participation in the expert search task of the TREC 2007 Enterprise track. The presented study demonstrates the predictive potential of expertise evidence that can be found outside of the organization. We discovered that combining the ranking built solely on the Enterprise data with the Global-Web-based ranking may produce significant increases in performance. However, our main goal was to explore whether this result can be further improved by using various quality measures to distinguish among web result items. While it was indeed beneficial to use some of these measures, especially those measuring the relevance of URL strings and titles, it remained unclear whether they are decisively important.
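
    The abstract describes interpolating an enterprise-only expert ranking with a Global-Web-based ranking and weighting web evidence by quality measures such as the relevance of URL strings and titles. The sketch below only illustrates that general idea with a simple rank-based vote and made-up weights; it is not the authors' exact formulation.

```python
# Illustrative sketch (not the authors' exact method): interpolate an
# enterprise-only expert ranking with web-based evidence, where each
# web result item votes for an expert and the vote is scaled by
# hypothetical quality measures (URL-string and title relevance).

def combine_expert_scores(enterprise_scores, web_items, alpha=0.5):
    """enterprise_scores: {expert_id: score} built from enterprise data only.
    web_items: iterable of (expert_id, rank, url_quality, title_quality)
               describing web results that mention the expert.
    alpha: interpolation weight between the two evidence sources."""
    web_scores = {}
    for expert, rank, url_q, title_q in web_items:
        # Rank-based vote scaled by quality measures assumed to lie in [0, 1].
        vote = (1.0 / (rank + 1)) * (0.5 + 0.25 * url_q + 0.25 * title_q)
        web_scores[expert] = web_scores.get(expert, 0.0) + vote

    def normalize(scores):
        top = max(scores.values(), default=0.0)
        return {k: v / top for k, v in scores.items()} if top > 0 else dict(scores)

    ent, web = normalize(enterprise_scores), normalize(web_scores)
    return {e: alpha * ent.get(e, 0.0) + (1 - alpha) * web.get(e, 0.0)
            for e in set(ent) | set(web)}
```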

    Using historical data to enhance rank aggregation

    Rank aggregation is a pervasive operation in IR technology. We hypothesize that the performance of score-based aggregation may be affected by artificial, usually meaningless deviations consistently occurring in the input score distributions, which distort the combined result when the individual biases differ from each other. We propose a score-based rank aggregation model where the source scores are normalized to a common distribution before being combined. Early experiments on available data from several TREC collections support our proposal.
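
    The proposal is to map each system's score distribution onto a common distribution before fusion. As a minimal sketch of that idea, the snippet below uses z-score normalization as a stand-in for the paper's distribution mapping, followed by a CombSUM-style sum; the normalization choice is an assumption, not the model actually proposed.

```python
# Minimal sketch of distribution-aware score fusion, assuming z-score
# normalization as a stand-in for mapping each system's scores onto a
# common reference distribution before a CombSUM-style combination.
import statistics

def normalize_run(run):
    """run: {doc_id: raw_score} from one retrieval system."""
    scores = list(run.values())
    mu = statistics.fmean(scores)
    sigma = statistics.pstdev(scores) or 1.0
    return {doc: (s - mu) / sigma for doc, s in run.items()}

def fuse(runs):
    """runs: list of {doc_id: raw_score} dicts; returns a fused ranking."""
    combined = {}
    for run in runs:
        for doc, z in normalize_run(run).items():
            combined[doc] = combined.get(doc, 0.0) + z
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

# Example: two systems with very different raw score ranges.
run_a = {"d1": 12.0, "d2": 9.5, "d3": 8.0}
run_b = {"d2": 0.91, "d3": 0.40, "d4": 0.39}
print(fuse([run_a, run_b]))
```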

    Applying Data Fusion Methods to Passage Retrieval in QAS

    On-line Metasearch, Pooling, and System Evaluation

    This thesis presents a unified method for the simultaneous solution of three problems in Information Retrieval: metasearch (the fusion of ranked lists returned by retrieval systems to elicit improved performance), efficient system evaluation (the accurate evaluation of retrieval systems with small numbers of relevance judgements), and pooling or "active sample selection" (the selection of documents for manual judgement in order to develop sample pools of high precision or pools suitable for assessing system quality). The thesis establishes a unified theoretical framework for addressing these three problems and naturally generalizes their solution to the on-line context by incorporating feedback in the form of relevance judgements. The algorithm, Rankhedge for on-line retrieval, metasearch and system evaluation, is the first to address these three problems simultaneously and also to generalize their solution to the on-line context. Optimality of the Rankhedge algorithm is developed via Bayesian and maximum entropy interpretations. Results of the algorithm prove to be significantly superior to previous methods when tested over a range of TREC (Text REtrieval Conference) data. In the absence of feedback, the technique equals or exceeds the performance of benchmark metasearch algorithms such as CombMNZ and Condorcet. The technique then dramatically improves on this performance during the on-line metasearch process. In addition, the technique generates pools of documents which include more relevant documents and produce more accurate system evaluations than previous techniques. The thesis includes an information-theoretic examination of the original Hedge algorithm as well as its adaptation to the context of ranked lists. The work also addresses the concept of information-theoretic similarity within the Rankhedge context and presents a method for decorrelating the predictor set to improve worst-case performance. Finally, an information-theoretically optimal method for probabilistic "active sampling" is presented, with possible application to a broad range of practical and theoretical contexts.
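
    The thesis builds on the Hedge algorithm applied to ranked lists: each retrieval system acts as an expert with a weight, documents are selected for judgement by weighted vote, and weights are updated multiplicatively as relevance feedback arrives. The sketch below illustrates that loop with an invented rank-based loss and an arbitrary beta; it is not the thesis's exact Rankhedge formulation.

```python
# Hedge-style sketch of on-line metasearch with relevance feedback.
# Each input system is an "expert"; its weight shrinks multiplicatively
# when its highly ranked documents turn out to be non-relevant.

def hedge_metasearch(ranked_lists, judge, rounds=10, beta=0.9):
    """ranked_lists: list of document-id lists, best first.
    judge: callable doc_id -> bool, simulating one relevance judgement.
    Returns (judged_docs_in_order, final_system_weights)."""
    n = len(ranked_lists)
    weights = [1.0 / n] * n
    judged, selected = {}, []

    def loss(system, doc):
        # Invented loss, for illustration: high when this system ranked a
        # non-relevant doc highly, or ranked a relevant doc low.
        lst = ranked_lists[system]
        pos = lst.index(doc) if doc in lst else len(lst)
        rank_score = 1.0 - pos / max(len(lst), 1)
        return rank_score if not judged[doc] else 1.0 - rank_score

    for _ in range(rounds):
        # Score unjudged documents by the weighted vote of all systems.
        scores = {}
        for w, lst in zip(weights, ranked_lists):
            for pos, doc in enumerate(lst):
                if doc not in judged:
                    scores[doc] = scores.get(doc, 0.0) + w * (1.0 - pos / len(lst))
        if not scores:
            break
        # Judge the current top document (pooling / active sample selection).
        doc = max(scores, key=scores.get)
        judged[doc] = judge(doc)
        selected.append(doc)
        # Multiplicative weight update, then renormalize.
        weights = [w * beta ** loss(i, doc) for i, w in enumerate(weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return selected, weights
```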

    Being Omnipresent To Be Almighty: The Importance of The Global Web Evidence for Organizational Expert Finding

    Modern expert finding algorithms are developed under the assumption that all possible expertise evidence for a person is concentrated in the company that currently employs the person. Evidence that can be acquired outside of the enterprise traditionally goes unnoticed. At the same time, the Web is full of personal information which is sufficiently detailed to judge a person's skills and knowledge. In this work, we review various sources of expertise evidence outside of an organization and experiment with rankings built on data acquired from six different sources, accessible through the APIs of two major web search engines. We show that these rankings and their combinations are often more realistic and of higher quality than rankings built on organizational data only.
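
    The experiments combine expert rankings built from several web sources reached through search-engine APIs. As a simple illustration of fusing such per-source rankings, the snippet below applies reciprocal rank fusion; this particular combination rule and the source names are stand-ins, not the combinations actually studied in the paper.

```python
# Illustrative fusion of per-source expert rankings (e.g., one ranking
# per web search API).  Reciprocal rank fusion is used here only as a
# simple stand-in for the combination strategies studied in the paper.

def reciprocal_rank_fusion(rankings, k=60):
    """rankings: dict source_name -> ordered list of candidate ids."""
    fused = {}
    for source, ranked in rankings.items():
        for pos, candidate in enumerate(ranked, start=1):
            fused[candidate] = fused.get(candidate, 0.0) + 1.0 / (k + pos)
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical per-source rankings for three candidate experts.
rankings = {
    "web_search":  ["alice", "bob", "carol"],
    "news_search": ["bob", "alice"],
    "blog_search": ["carol", "alice", "bob"],
}
print(reciprocal_rank_fusion(rankings))
```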

    PISA: A PERSONALIZED INFORMATION SEARCH ASSISTANT

    A common characteristic of most traditional search and retrieval systems is that they are oriented towards a generic user, often failing to connect people with what they are really looking for. In this paper we present PISA, a Personalized Information Search Assistant, which, rather than relying on the unrealistic assumption that the user will precisely specify what she is really looking for when searching, leverages implicit information about the user's interests. PISA is a desktop application that provides the user with a highly personalized information space where she can create, manage and organize folders (similarly to email programs) and organize documents retrieved by the system into her folders to best fit her needs. Furthermore, PISA offers different mechanisms to search the Web, and the possibility of personalizing result delivery and visualization. PISA learns user and folder profiles from the user's choices, and uses these profiles to improve retrieval effectiveness by selecting the relevant resources to query and filtering the results accordingly. A working prototype has also been developed, tested and evaluated. Preliminary user evaluation and experimental results are very promising, showing that the personalized search environment PISA provides considerably increases effectiveness and user satisfaction in the searching process.
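
    A minimal sketch, under assumed representations, of the profile-based filtering idea: a folder profile is treated as a term-weight vector learned from the documents the user files there, and incoming results are re-ranked by cosine similarity to that profile. Neither the vectors nor the update rule are PISA's actual implementation.

```python
# Sketch of profile-based result filtering in the spirit of a
# personalized search assistant.  The bag-of-words profile and the
# blending update rule are assumptions made for illustration.
import math
from collections import Counter

def update_profile(profile, document_terms, learning_rate=0.3):
    """Blend the terms of a newly filed document into the folder profile."""
    for term, freq in Counter(document_terms).items():
        profile[term] = (1 - learning_rate) * profile.get(term, 0.0) \
                        + learning_rate * freq
    return profile

def cosine(profile, document_terms):
    doc_vec = Counter(document_terms)
    dot = sum(profile.get(t, 0.0) * f for t, f in doc_vec.items())
    norm_p = math.sqrt(sum(v * v for v in profile.values()))
    norm_d = math.sqrt(sum(f * f for f in doc_vec.values()))
    return dot / (norm_p * norm_d) if norm_p and norm_d else 0.0

def rerank(results, profile):
    """results: list of (doc_id, terms); returns them ordered by profile match."""
    return sorted(results, key=lambda r: cosine(profile, r[1]), reverse=True)
```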

    Ranked feature fusion models for ad hoc retrieval
