Being Omnipresent To Be Almighty: The Importance of The Global Web Evidence for Organizational Expert Finding
Modern expert finding algorithms are developed under the assumption that all possible expertise evidence for a person is concentrated in the company that currently employs that person. Evidence that can be acquired outside the enterprise traditionally goes unnoticed. At the same time, the Web is full of personal information that is sufficiently detailed to judge a person's skills and knowledge. In this work, we review various sources of expertise evidence outside of an organization and experiment with rankings built on data acquired from six different sources, accessible through the APIs of two major web search engines. We show that these rankings and their combinations are often more realistic and of higher quality than rankings built on organizational data only.
CORE: a tool for collaborative ontology reuse and evaluation
Ontology evaluation can be defined as assessing the quality and the adequacy of an ontology for use in a specific context, for a specific goal. In this work, a tool for Collaborative Ontology Reuse and Evaluation (CORE) is presented. The system receives an informal description of a semantic domain and determines which ontologies, from an ontology repository, are the most appropriate to describe the given domain. For this task, the environment is divided into three main modules. The first component receives the problem description represented as a set of terms and allows the user to refine and enlarge it using WordNet. The second module applies multiple automatic criteria to evaluate the ontologies of the repository and determine which ones best fit the problem description. A ranked list of ontologies is returned for each criterion, and the lists are combined by means of rank fusion techniques over the selected criteria. A third component of the system uses manual user evaluations of the ontologies in order to incorporate a human, collaborative assessment of ontology quality.
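The per-criterion lists can be merged with any positional rank fusion technique. A minimal sketch using Borda counting, one common choice (CORE's exact fusion method is not specified in this abstract):

```python
# Illustrative Borda-count fusion of per-criterion ontology rankings.
# Each list awards (list length - rank position) points to every ontology.

def borda_fuse(rankings):
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for rank, item in enumerate(ranking):
            scores[item] = scores.get(item, 0) + (n - rank)
    return sorted(scores, key=lambda item: scores[item], reverse=True)

# Two hypothetical evaluation criteria rank three ontologies differently:
merged = borda_fuse([["wine", "food", "travel"], ["food", "wine", "travel"]])
```

Each criterion contributes points proportional to the position it assigns, so an ontology ranked consistently well across criteria beats one ranked first by a single criterion.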
Bilingual Lexicon Extraction from Comparable Corpora as Metasearch
In this article we present a novel way of looking at the problem of automatically acquiring pairs of translationally equivalent words from comparable corpora. We first present the standard and extended approaches traditionally dedicated to this task. We then reinterpret the extended method and motivate a novel model that reformulates this approach, inspired by metasearch engines in information retrieval. The empirical results show that the performance of our model is consistently better than the baseline obtained with the extended approach, and also competitive with the standard approach.
Online Forum Thread Retrieval using Pseudo Cluster Selection and Voting Techniques
Online forums facilitate knowledge seeking and sharing on the Web. However,
the shared knowledge is not fully utilized due to information overload. Thread
retrieval is one method to overcome information overload. In this paper, we
propose a model that combines two existing approaches: the Pseudo Cluster
Selection and the Voting Techniques. In both, a retrieval system first scores a
list of messages and then ranks threads by aggregating their scored messages.
They differ in what and how they aggregate. Pseudo cluster selection focuses on the input, while voting techniques focus on the aggregation method. Our combined models focus on both the input and the aggregation method. The results show that some combined models are statistically superior to the baseline methods.

Comment: The original publication is available at http://www.springerlink.com/. arXiv admin note: substantial text overlap with arXiv:1212.533
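The two-stage scheme described above (score messages first, then rank threads by aggregating message scores) can be sketched as follows; the top-k restriction stands in for pseudo cluster selection and the score sum for a CombSUM-style vote, though the paper's exact aggregation functions may differ:

```python
# Hedged sketch of thread ranking: keep each thread's top-k message scores
# (pseudo-cluster-style input selection), then sum them (a voting-style
# aggregation). Parameter names and k=3 are illustrative choices.
from collections import defaultdict

def rank_threads(scored_messages, k=3):
    """scored_messages: iterable of (thread_id, message_score) pairs."""
    per_thread = defaultdict(list)
    for thread_id, score in scored_messages:
        per_thread[thread_id].append(score)
    thread_scores = {
        t: sum(sorted(s, reverse=True)[:k])  # aggregate top-k messages only
        for t, s in per_thread.items()
    }
    return sorted(thread_scores, key=thread_scores.get, reverse=True)

# A thread with several moderately relevant messages can outrank one with a
# single strong message:
ranking = rank_threads([("t1", 0.9), ("t1", 0.1),
                        ("t2", 0.5), ("t2", 0.5), ("t2", 0.5), ("t2", 0.5)])
```

Varying what enters `per_thread` changes the input (the pseudo cluster selection side), while swapping the `sum` for another combiner changes the aggregation (the voting side), which is exactly the axis along which the combined models differ.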
On-line Metasearch, Pooling, and System Evaluation
This thesis presents a unified method for the simultaneous solution of three problems in Information Retrieval: metasearch (the fusion of ranked lists returned by retrieval systems to elicit improved performance), efficient system evaluation (the accurate evaluation of retrieval systems with small numbers of relevance judgements), and pooling or "active sample selection" (the selection of documents for manual judgement in order to develop sample pools of high precision or pools suitable for assessing system quality). The thesis establishes a unified theoretical framework for addressing these three problems and naturally generalizes their solution to the on-line context by incorporating feedback in the form of relevance judgements. The algorithm, Rankhedge for on-line retrieval, metasearch and system evaluation, is the first to address these three problems simultaneously and also to generalize their solution to the on-line context. Optimality of the Rankhedge algorithm is developed via Bayesian and maximum entropy interpretations. Results of the algorithm prove to be significantly superior to previous methods when tested over a range of TREC (Text REtrieval Conference) data. In the absence of feedback, the technique equals or exceeds the performance of benchmark metasearch algorithms such as CombMNZ and Condorcet. The technique then dramatically improves on this performance during the on-line metasearch process. In addition, the technique generates pools of documents which include more relevant documents and produce more accurate system evaluations than previous techniques. The thesis includes an information-theoretic examination of the original Hedge algorithm as well as its adaptation to the context of ranked lists. The work also addresses the concept of information-theoretic similarity within the Rankhedge context and presents a method for decorrelating the predictor set to improve worst case performance. Finally, an information-theoretically optimal method for probabilistic "active sampling" is presented, with possible application to a broad range of practical and theoretical contexts.
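The multiplicative-update idea behind the original Hedge algorithm (and thus behind Rankhedge) can be sketched as follows; the loss values and the beta parameter here are illustrative placeholders, not the thesis's actual definitions:

```python
# Sketch of a Hedge-style update: each input system (predictor) is
# down-weighted multiplicatively by its loss on the latest relevance
# judgement. beta=0.5 and the loss values below are assumptions.

def hedge_update(weights, losses, beta=0.5):
    """losses[i] in [0, 1]; returns renormalized predictor weights."""
    updated = [w * beta ** loss for w, loss in zip(weights, losses)]
    total = sum(updated)
    return [w / total for w in updated]

# A system that misranked the judged document (loss 1.0) loses weight to
# one that ranked it well (loss 0.0):
w = hedge_update([0.5, 0.5], [0.0, 1.0])
```

With each judgement the distribution concentrates on systems whose rankings agree with the feedback, which is how the on-line setting improves over static fusion such as CombMNZ.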
CORE: A tool for Collaborative Ontology Reuse and Evaluation
Also published online by CEUR Workshop Proceedings (CEUR-WS.org, ISSN 1613-0073), Proceedings of the Workshop on Social and Collaborative Construction of Structured Knowledge, CKC 2007.

In this work, we present an extension of CORE [8], a tool for
Collaborative Ontology Reuse and Evaluation. The system receives
an informal description of a specific semantic domain and
determines which ontologies from a repository are the most
appropriate to describe the given domain. For this task, the
environment is divided into three modules. The first component
receives the problem description as a set of terms, and allows the
user to refine and enlarge it using WordNet. The second module
applies multiple automatic criteria to evaluate the ontologies of the
repository, and determines which ones best fit the problem
description. A ranked list of ontologies is returned for each criterion,
and the lists are combined by means of rank fusion techniques.
Finally, the third component uses manual user evaluations in order
to incorporate a human, collaborative assessment of the ontologies.
The new version of the system incorporates several novelties, such as its implementation as a web application; the incorporation of an NLP module to manage the problem definitions; modifications to the automatic ontology retrieval strategies; and a collaborative framework for finding potentially relevant terms according to previous user queries. Finally, we present some early experiments on ontology retrieval and evaluation, showing the benefits of our system.

This research was supported by the Spanish Ministry of Science and Education (TIN2005-06885 and FPU program).
Using historical data to enhance rank aggregation
Rank aggregation is a pervasive operation in IR technology. We hypothesize that the performance of score-based aggregation may be affected by artificial, usually meaningless deviations consistently occurring in the input score distributions, which distort the combined result when the individual biases differ from each other. We propose a score-based rank aggregation model where the source scores are normalized to a common distribution before being combined. Early experiments on available data from several TREC collections are shown to support our proposal.
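A minimal sketch of the proposed pipeline, normalizing each run's scores before fusion; z-score normalization and CombSUM-style summation are used here as stand-ins, since the abstract does not name the target common distribution or the combiner:

```python
# Hedged sketch: map each system's scores onto a common distribution
# (z-scores here) before score-based fusion (CombSUM-style summation).
import statistics

def znorm(run):
    """run: dict mapping doc id -> raw retrieval score."""
    mu = statistics.mean(run.values())
    sigma = statistics.pstdev(run.values()) or 1.0  # guard constant runs
    return {doc: (s - mu) / sigma for doc, s in run.items()}

def fuse(runs):
    fused = {}
    for run in runs:
        for doc, s in znorm(run).items():
            fused[doc] = fused.get(doc, 0.0) + s
    return sorted(fused, key=fused.get, reverse=True)

# Raw score ranges differ wildly between the two runs; normalization keeps
# the first run's large magnitudes from dominating the sum:
order = fuse([{"d1": 10.0, "d2": 0.0}, {"d1": 0.2, "d3": 0.8}])
```

Without normalization the first run's 0-to-10 scale would swamp the second run's 0-to-1 scale; after mapping both to z-scores, each system's bias contributes on equal footing.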
Active Sampling for Large-scale Information Retrieval Evaluation
Evaluation is crucial in Information Retrieval. The development of models, tools and methods has significantly benefited from the availability of reusable test collections formed through a standardized and thoroughly tested methodology, known as the Cranfield paradigm. Constructing these collections requires obtaining relevance judgments for a pool of documents retrieved by the systems participating in an evaluation task, and thus involves immense human labor. To alleviate this effort, different methods for constructing collections have been proposed in the literature, falling under two broad categories: (a) sampling, and (b) active selection of documents. The former devises a smart sampling strategy by choosing only a subset of documents to be assessed and inferring evaluation measures on the basis of the obtained sample; the sampling distribution is fixed at the beginning of the process. The latter recognizes that the systems contributing documents to be judged vary in quality, and actively selects documents from good systems. The quality of systems is re-estimated every time a new document is judged. In this paper we seek to solve the problem of large-scale retrieval evaluation by combining the two approaches. We devise an active sampling method that avoids the bias of active selection methods towards good systems, and at the same time reduces the variance of current sampling approaches by placing a distribution over systems which varies as judgments become available. We validate the proposed method using TREC data and demonstrate the advantages of this new method compared to past approaches.
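The core loop described in the abstract, maintaining a distribution over systems that shifts as judgments arrive, might look as follows; the multiplicative weight update and its constants are assumptions for illustration, not the paper's actual estimator:

```python
# Illustrative active-sampling loop: draw a system from a weight
# distribution, judge its next unjudged document, and update the weights
# with the new judgment. The 1.1 / 0.9 update factors are assumptions.
import random

def active_sample(runs, qrels, steps, seed=0):
    """runs: list of ranked doc-id lists; qrels: doc id -> relevance."""
    rng = random.Random(seed)
    weights = [1.0] * len(runs)
    judged = set()
    for _ in range(steps):
        i = rng.choices(range(len(runs)), weights=weights)[0]
        doc = next((d for d in runs[i] if d not in judged), None)
        if doc is None:  # this system's pool is exhausted
            continue
        judged.add(doc)
        # Reward systems whose documents turn out to be relevant.
        weights[i] *= 1.1 if qrels.get(doc, 0) else 0.9
    return judged

judged = active_sample([["a", "b", "c"], ["b", "d"]],
                       {"a": 1, "d": 1}, steps=3)
```

Because documents are still drawn probabilistically from every system rather than only from the current best one, the selection avoids the hard bias of pure active selection while the evolving weights reduce the variance of fixed-distribution sampling.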