130 research outputs found
Coping with noise in a real-world weblog crawler and retrieval system
In this paper we examine the effects of noise when creating a real-world weblog corpus for information retrieval. We focus on the DiffPost (Lee et al. 2008) approach to noise removal from blog pages, examining the difficulties encountered when crawling the blogosphere during the creation of a real-world corpus of blog pages. We introduce and evaluate a number of enhancements to the original DiffPost approach in order to increase the robustness of the algorithm. We then extend DiffPost by looking at the anchor-text to text ratio, and dis- cover that the time-interval between crawls is more impor- tant to the successful application of noise-removal algorithms within the blog context, than any additional improvements to the removal algorithm itself
The right expert at the right time and place: From expertise identification to expertise selection
We propose a unified and complete solution for expert finding in organizations, including not only expertise identification, but also expertise selection functionality. The latter two include the use of implicit and explicit preferences of users on meeting each other, as well as localization and planning as important auxiliary processes. We also propose a solution for privacy protection, which is urgently required in view of the huge amount of privacy sensitive data involved. Various parts are elaborated elsewhere, and we look forward to a realization and usage of the proposed system as a whole
A derivational rephrasing experiment for question answering
In Knowledge Management, variations in information expressions have proven a
real challenge. In particular, classical semantic relations (e.g. synonymy) do
not connect words with different parts-of-speech. The method proposed tries to
address this issue. It consists in building a derivational resource from a
morphological derivation tool together with derivational guidelines from a
dictionary in order to store only correct derivatives. This resource, combined
with a syntactic parser, a semantic disambiguator and some derivational
patterns, helps to reformulate an original sentence while keeping the initial
meaning in a convincing manner This approach has been evaluated in three
different ways: the precision of the derivatives produced from a lemma; its
ability to provide well-formed reformulations from an original sentence,
preserving the initial meaning; its impact on the results coping with a real
issue, ie a question answering task . The evaluation of this approach through a
question answering system shows the pros and cons of this system, while
foreshadowing some interesting future developments
- …