5 research outputs found

    The State-of-the-arts in Focused Search

    The continuous influx of text data on the Web requires search engines to improve their ability to retrieve more specific information. The need for results relevant to a user’s topic of interest has moved beyond searching for domain-specific or type-specific documents to more focused results (e.g., document fragments or answers to a query). The introduction of XML provides a standard format for data representation, storage, and exchange, and it allows focused search to be carried out at different granularities of a structured document with XML markup. This report reviews the state of the art in focused search, particularly techniques for topic-specific document retrieval, passage retrieval, XML retrieval, and entity ranking. It concludes by highlighting open problems.

    Integrating multiple windows and document features for expert finding

    Expert finding is a key task in enterprise search and has recently attracted considerable attention from both the research and industry communities. Given a search topic, a prominent existing approach is to apply an information retrieval (IR) system to retrieve the top-ranking documents, which are then used to derive associations between experts and the search topic based on co-occurrences. However, we argue that expert finding is sensitive to several factors that current expert finding systems insufficiently address: (a) multiple levels of associations between experts and search topics, (b) document internal structure, and (c) document authority. We propose a novel approach that integrates these three aspects, together with a query expansion technique, in a two-stage model for expert finding. A systematic evaluation is conducted on TREC collections to test the performance of our approach as well as the effects of multiple windows, document features, and query expansion. The experimental results show that query expansion dramatically improves expert finding performance, with statistical significance. For three well-known IR models, with or without query expansion, document internal structure helps improve a single window-based approach, but without statistical significance, while our novel multiple window-based approach significantly improves on a single window-based approach both with and without document internal structure.
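The window-based co-occurrence idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the function names, the token-distance definition of a window, and the per-window weights are all assumptions made for illustration. The sketch counts how often query terms fall within a fixed token distance of a candidate expert's mention, then sums those counts over several window sizes.

```python
from collections import defaultdict


def window_cooccurrence(tokens, query_terms, candidates, window):
    """Count, per candidate, the query-term occurrences that lie within
    `window` tokens of a mention of that candidate (hypothetical helper)."""
    query_positions = [i for i, tok in enumerate(tokens) if tok in query_terms]
    counts = defaultdict(int)
    for i, tok in enumerate(tokens):
        if tok in candidates:
            counts[tok] += sum(1 for q in query_positions if abs(q - i) <= window)
    return counts


def multi_window_score(docs, query_terms, candidates, windows):
    """Combine co-occurrence counts from several window sizes.

    `windows` maps a window size (in tokens) to a weight. The weights are
    illustrative assumptions, not values taken from the paper.
    """
    scores = defaultdict(float)
    for tokens in docs:
        for size, weight in windows.items():
            counts = window_cooccurrence(tokens, query_terms, candidates, size)
            for candidate, count in counts.items():
                scores[candidate] += weight * count
    return dict(scores)


if __name__ == "__main__":
    docs = [
        ["alice", "works", "on", "xml", "retrieval"],
        ["bob", "likes", "tea", "and", "coffee", "not", "xml"],
    ]
    scores = multi_window_score(
        docs,
        query_terms={"xml", "retrieval"},
        candidates={"alice", "bob"},
        windows={3: 1.0, 10: 0.5},
    )
    print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

A small window rewards candidates mentioned close to the query terms, while a larger window captures looser, document-level associations. Weighting the two differently is one simple way to combine multiple levels of association.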

    Distributed Contact and Identity Management

    Contact management is a twofold problem involving a local and a global level, where the separation between the two is rather fuzzy. Locally, users need to deal with contact management, which refers to the need to store, organize, keep up to date, and find information that allows them to contact or reach other people, organizations, etc. Globally, users deal with identity management, which refers to peers having multiple identities (i.e., profiles) and needing to stay in control of them. In other words, they should be able to manage what information is shared and with whom. We believe many existing applications try to deal with this problem by looking only at the data level, without analyzing the underlying complexity. Our approach focuses on the complex social relations and interactions between users, identifying three main subproblems: (i) management of identity, (ii) search, and (iii) privacy. The solution we propose concentrates on the models that are needed to address these problems. In particular, we propose a Distributed Contact Management System (DCM System) that: models and represents the knowledge of peers about physical or abstract objects through the notion of entities, which can be of different types (e.g., locations, people, events, facilities, organizations, etc.) and are described by a set of attributes; by representing contacts as entities, allows peers to locally organize their contacts taking into consideration the semantics of the contacts' characteristics; and, by describing peers as entities, allows them to manage their different identities in the network by sharing different views of themselves (showing possibly different information) with different people.
    The contributions of this thesis are: (i) the definition of a reference architecture that deals with the diversity arising from the partial view that peers have of the world, (ii) an approach to searching for entities based on identifiers, (iii) an approach to searching for entities based on descriptions, and (iv) the definition of the DCM System, which instantiates the previously mentioned approaches and architecture to address concrete usage scenarios.

    A Supervised Learning Approach to Entity Search

    Département d'informatique et de recherche opérationnelle, Université de Montréal. Abstract: In this paper we address the problem of entity search. Expert search and time search are used as examples. In entity search, given a query and an entity type, a search system returns a ranked list of entities of that type (e.g., person names, time expressions) relevant to the query. Ranking is a key issue in entity search. In the literature, only expert search has been studied, and the use of co-occurrence has been proposed. In general, many features may be useful for ranking in entity search. We propose using a linear model to combine different features and employing a supervised learning approach to train the model. Experimental results on several data sets indicate that our method significantly outperforms the baseline method based solely on co-occurrences.
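As a rough illustration of combining several features in a linear model trained with supervision, the sketch below fits the weights with logistic regression by stochastic gradient descent and then ranks candidate entities by their linear score. The abstract does not say which learner or features the authors used, so the learner, the feature layout (a bias term, a co-occurrence count, and one extra binary feature), and all function names here are assumptions.

```python
import math


def score(weights, features):
    """Linear score: the weighted sum of an entity's feature values."""
    return sum(w * x for w, x in zip(weights, features))


def train(examples, epochs=200, lr=0.1):
    """Fit weights with logistic regression via stochastic gradient descent.

    `examples` is a list of (feature_vector, label) pairs, where label 1
    means the entity is relevant to the query and 0 means it is not.
    """
    n_features = len(examples[0][0])
    weights = [0.0] * n_features
    for _ in range(epochs):
        for features, label in examples:
            predicted = 1.0 / (1.0 + math.exp(-score(weights, features)))
            for j in range(n_features):
                weights[j] += lr * (label - predicted) * features[j]
    return weights


def rank(weights, candidates):
    """Order candidate entities by descending linear score.

    `candidates` maps an entity name to its feature vector.
    """
    return sorted(candidates, key=lambda e: score(weights, candidates[e]), reverse=True)


if __name__ == "__main__":
    # Hypothetical features: [bias, co-occurrence count, appears-in-title flag]
    training = [
        ([1.0, 3.0, 1.0], 1),
        ([1.0, 0.0, 0.0], 0),
        ([1.0, 2.0, 1.0], 1),
        ([1.0, 1.0, 0.0], 0),
    ]
    weights = train(training)
    print(rank(weights, {"a": [1.0, 3.0, 1.0], "b": [1.0, 0.0, 0.0]}))
```

A co-occurrence-only baseline corresponds to scoring with a single feature. The linear model extends it by learning how much weight each additional feature should receive.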