Creating a test collection to evaluate diversity in image retrieval
This paper describes the adaptation of an existing test collection
for image retrieval to enable diversity in the results set to be
measured. Previous research has shown that a more diverse set of
results often satisfies the needs of more users better than standard
document rankings. To enable diversity to be quantified, it is
necessary to classify images relevant to a given theme into one or
more sub-topics or clusters. We describe the challenges in
building (as far as we are aware) the first test collection for
evaluating diversity in image retrieval. This includes selecting
appropriate topics, creating sub-topics, and quantifying the overall
effectiveness of a retrieval system. A total of 39 topics were
augmented for cluster-based relevance and we also provide an
initial analysis of assessor agreement for grouping relevant
images into sub-topics or clusters.
Deep Investigation of Cross-Language Plagiarism Detection Methods
This paper is a deep investigation of cross-language plagiarism detection
methods on a recently introduced open dataset, which contains parallel and
comparable collections of documents with multiple characteristics (different
genres, languages and sizes of texts). We investigate cross-language plagiarism
detection methods for 6 language pairs on 2 granularities of text units in
order to draw robust conclusions on the best methods while deeply analyzing
correlations across document styles and languages.
Comment: Accepted to BUCC (10th Workshop on Building and Using Comparable
Corpora) colocated with ACL 201
Identity and Granularity of Events in Text
In this paper we describe a method to detect event descriptions in
different news articles and to model the semantics of events and their
components using RDF representations. We compare these descriptions to solve a
cross-document event coreference task. Our component approach to event
semantics defines identity and granularity of events at different levels. It
performs close to state-of-the-art approaches on the cross-document event
coreference task, while outperforming other works when assuming similar quality
of event detection. We demonstrate how granularity and identity are
interconnected and we discuss how semantic anomaly could be used to define
differences between coreference, subevent and topical relations.
Comment: Invited keynote speech by Piek Vossen at Cicling 201