Fixed versus Dynamic Co-Occurrence Windows in TextRank Term Weights for Information Retrieval
TextRank is a variant of PageRank typically used in graphs that represent
documents, where vertices denote terms and edges denote relations between
terms. Quite often the relation between terms is simple term co-occurrence
within a fixed window of k terms. The output of TextRank when applied
iteratively is a score for each vertex, i.e. a term weight, that can be used
for information retrieval (IR) just like conventional term frequency based term
weights. So far, when computing TextRank term weights over co-occurrence
graphs, the window of term co-occurrence is always fixed. This work departs
from this, and considers dynamically adjusted windows of term co-occurrence
that follow the document structure on a sentence- and paragraph-level. The
resulting TextRank term weights are used in a ranking function that re-ranks
1000 initially returned search results in order to improve the precision of the
ranking. Experiments with two IR collections show that adjusting the vicinity
of term co-occurrence when computing TextRank term weights can lead to gains in
early precision.
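The contrast between a fixed window and a structure-following window can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy document, the damping factor of 0.85, the fixed iteration count and the unweighted, undirected graph are all assumptions made for illustration, and the paragraph-level windows and the re-ranking function from the abstract are not reproduced.

```python
from collections import defaultdict
from itertools import combinations


def fixed_window_edges(tokens, k):
    """Link distinct terms that co-occur within k consecutive tokens."""
    edges = set()
    for i in range(len(tokens)):
        window = set(tokens[i:i + k])
        for a, b in combinations(window, 2):
            edges.add(frozenset((a, b)))
    return edges


def sentence_window_edges(sentences):
    """Link distinct terms that co-occur within the same sentence."""
    edges = set()
    for sentence in sentences:
        for a, b in combinations(set(sentence), 2):
            edges.add(frozenset((a, b)))
    return edges


def textrank(edges, damping=0.85, iterations=50):
    """PageRank-style scores on an undirected, unweighted term graph.

    damping and iterations are assumed defaults, not values from the abstract.
    """
    neighbours = defaultdict(set)
    for edge in edges:
        a, b = tuple(edge)
        neighbours[a].add(b)
        neighbours[b].add(a)
    scores = {term: 1.0 for term in neighbours}
    for _ in range(iterations):
        scores = {
            term: (1 - damping) + damping * sum(
                scores[n] / len(neighbours[n]) for n in neighbours[term]
            )
            for term in neighbours
        }
    return scores


# Toy document (made up): two sentences, already tokenised.
sentences = [["textrank", "scores", "terms"], ["terms", "weight", "documents"]]
tokens = [term for sentence in sentences for term in sentence]

fixed = textrank(fixed_window_edges(tokens, k=2))
dynamic = textrank(sentence_window_edges(sentences))
```

In this toy case the sentence-level window connects every pair of terms inside a sentence, so "terms", which appears in both sentences, becomes the best-connected vertex, while the window of two tokens yields a simple chain of terms.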
Closing the loop: assisting archival appraisal and information retrieval in one sweep
In this article, we examine the similarities between the concept of appraisal, a process that takes place within the archives, and the concept of relevance judgement, a process fundamental to the evaluation of information retrieval systems. More specifically, we revisit selection criteria proposed as a result of archival research and work within the digital curation communities, and compare them to relevance criteria as discussed within information retrieval's literature-based discovery. We illustrate how closely these criteria relate to each other and discuss how understanding the relationships between these disciplines could form a basis for proposing automated selection for archival processes and initiating multi-objective learning with respect to information retrieval.
A Study of Metrics of Distance and Correlation Between Ranked Lists for Compositionality Detection
Compositionality in language refers to how much the meaning of some phrase
can be decomposed into the meaning of its constituents and the way these
constituents are combined. Based on the premise that substitution by synonyms
is meaning-preserving, compositionality can be approximated as the semantic
similarity between a phrase and a version of that phrase where words have been
replaced by their synonyms. Different ways of representing such phrases exist
(e.g., vectors [1] or language models [2]), and the choice of representation
affects the measurement of semantic similarity.
We propose a new compositionality detection method that represents phrases as
ranked lists of term weights. Our method approximates the semantic similarity
between two ranked list representations using a range of well-known distance
and correlation metrics. In contrast to most state-of-the-art approaches in
compositionality detection, our method is completely unsupervised. Experiments
with a publicly available dataset of 1048 human-annotated phrases show that,
compared to strong supervised baselines, our approach provides superior
measurement of compositionality using any of the distance and correlation
metrics considered.
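Comparing two ranked lists of term weights can be sketched as follows. The abstract does not name the metrics it uses, so Kendall's tau (a correlation) and Spearman's footrule (a distance) are chosen here purely as familiar examples. The function names and the toy term weights are hypothetical, and both lists are assumed to contain the same terms without tied weights.

```python
from itertools import combinations


def ranks(weights):
    """Map each term to its rank position (0 = highest weight)."""
    ordered = sorted(weights, key=weights.get, reverse=True)
    return {term: position for position, term in enumerate(ordered)}


def kendall_tau(weights_a, weights_b):
    """Kendall's tau correlation over the terms shared by both lists."""
    ra, rb = ranks(weights_a), ranks(weights_b)
    shared = sorted(set(ra) & set(rb))
    concordant = discordant = 0
    for x, y in combinations(shared, 2):
        agreement = (ra[x] - ra[y]) * (rb[x] - rb[y])
        if agreement > 0:
            concordant += 1
        elif agreement < 0:
            discordant += 1
    pairs = len(shared) * (len(shared) - 1) // 2
    return (concordant - discordant) / pairs if pairs else 0.0


def spearman_footrule(weights_a, weights_b):
    """Spearman's footrule: sum of absolute rank displacements."""
    ra, rb = ranks(weights_a), ranks(weights_b)
    return sum(abs(ra[term] - rb[term]) for term in set(ra) & set(rb))


# Made-up term weights for a phrase and for its synonym-substituted version.
phrase = {"red": 0.9, "tape": 0.7, "bureaucracy": 0.4}
substituted = {"bureaucracy": 0.8, "red": 0.5, "tape": 0.3}

tau = kendall_tau(phrase, substituted)
distance = spearman_footrule(phrase, substituted)
```

Under the premise stated in the abstract, a high correlation (or a small distance) between the two rankings would suggest that the substitution preserved the meaning, that is, that the phrase is more compositional.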
Rhetorical relations for information retrieval
Typically, every part in most coherent text has some plausible reason for its
presence, some function that it performs to the overall semantics of the text.
Rhetorical relations, e.g. contrast, cause, explanation, describe how the parts
of a text are linked to each other. Knowledge about this so-called discourse
structure has been applied successfully to several natural language processing
tasks. This work studies the use of rhetorical relations for Information
Retrieval (IR): Is there a correlation between certain rhetorical relations and
retrieval performance? Can knowledge about a document's rhetorical relations be
useful to IR? We present a language model modification that considers
rhetorical relations when estimating the relevance of a document to a query.
Empirical evaluation of different versions of our model on TREC settings shows
that certain rhetorical relations can benefit retrieval effectiveness notably
(> 10% in mean average precision over a state-of-the-art baseline).
The Evolution of Web Search User Interfaces -- An Archaeological Analysis of Google Search Engine Result Pages
Web search engines have marked everyone's life by transforming how one
searches and accesses information. Search engines give special attention to the
user interface, especially search engine result pages (SERP). The well-known
"10 blue links" list has evolved into richer interfaces, often personalized
to the search query, the user, and other aspects. More than 20 years later, the
literature has not adequately portrayed this development. We present a study on
the evolution of SERP interfaces during the last two decades using Google
Search as a case study. We used the most searched queries by year to extract a
sample of SERP from the Internet Archive. Using this dataset, we analyzed how
SERP evolved in content, layout, design (e.g., color scheme, text styling,
graphics), navigation, and file size. We have also analyzed the user interface
design patterns associated with SERP elements. We found that SERP are becoming
more diverse in terms of elements, aggregating content from different verticals
and including more features that provide direct answers. This systematic
analysis portrays evolution trends in search engine user interfaces and, more
generally, web design. We expect this work will trigger other, more specific
studies that can take advantage of our dataset. Comment: 10 pages, Full Paper of CHIIR 202