
    Coping with noise in a real-world weblog crawler and retrieval system

    In this paper we examine the effects of noise when creating a real-world weblog corpus for information retrieval. We focus on the DiffPost (Lee et al. 2008) approach to noise removal from blog pages, examining the difficulties encountered when crawling the blogosphere during the creation of a real-world corpus of blog pages. We introduce and evaluate a number of enhancements to the original DiffPost approach in order to increase the robustness of the algorithm. We then extend DiffPost by looking at the anchor-text to text ratio, and discover that the time interval between crawls is more important to the successful application of noise-removal algorithms within the blog context than any additional improvements to the removal algorithm itself.

    The right expert at the right time and place: From expertise identification to expertise selection

    We propose a unified and complete solution for expert finding in organizations, including not only expertise identification, but also expertise selection functionality. The latter two include the use of implicit and explicit preferences of users on meeting each other, as well as localization and planning as important auxiliary processes. We also propose a solution for privacy protection, which is urgently required in view of the huge amount of privacy-sensitive data involved. Various parts are elaborated elsewhere, and we look forward to a realization and usage of the proposed system as a whole.

    A derivational rephrasing experiment for question answering

    In Knowledge Management, variations in information expressions have proven a real challenge. In particular, classical semantic relations (e.g. synonymy) do not connect words with different parts of speech. The proposed method tries to address this issue. It consists of building a derivational resource from a morphological derivation tool together with derivational guidelines from a dictionary, in order to store only correct derivatives. This resource, combined with a syntactic parser, a semantic disambiguator and some derivational patterns, helps to reformulate an original sentence while keeping the initial meaning in a convincing manner. This approach has been evaluated in three different ways: the precision of the derivatives produced from a lemma; its ability to provide well-formed reformulations from an original sentence, preserving the initial meaning; and its impact on the results when coping with a real issue, i.e. a question answering task. The evaluation of this approach through a question answering system shows the pros and cons of this system, while foreshadowing some interesting future developments.