74,116 research outputs found
Non-Compositional Term Dependence for Information Retrieval
Modelling term dependence in IR aims to identify co-occurring terms that are
too heavily dependent on each other to be treated as a bag of words, and to
adapt the indexing and ranking accordingly. Dependent terms are predominantly
identified using lexical frequency statistics, assuming that (a) if terms
co-occur often enough in some corpus, they are semantically dependent; (b) the
more often they co-occur, the more semantically dependent they are. This
assumption is not always correct: the frequency of co-occurring terms can be
separate from the strength of their semantic dependence. E.g. "red tape" might
be overall less frequent than "tape measure" in some corpus, but this does not
mean that "red"+"tape" are less dependent than "tape"+"measure". This is
especially the case for non-compositional phrases, i.e. phrases whose meaning
cannot be composed from the individual meanings of their terms (such as the
phrase "red tape" meaning bureaucracy). Motivated by this lack of distinction
between the frequency and strength of term dependence in IR, we present a
principled approach for handling term dependence in queries, using both lexical
frequency and semantic evidence. We focus on non-compositional phrases,
extending a recent unsupervised model for their detection [21] to IR. Our
approach, integrated into ranking using Markov Random Fields [31], yields
effectiveness gains over competitive TREC baselines, showing that there is
still room for improvement in the very well-studied area of term dependence in
IR
Design Patterns for Fusion-Based Object Retrieval
We address the task of ranking objects (such as people, blogs, or verticals)
that, unlike documents, do not have direct term-based representations. To be
able to match them against keyword queries, evidence needs to be amassed from
documents that are associated with the given object. We present two design
patterns, i.e., general reusable retrieval strategies, which are able to
encompass most existing approaches from the past. One strategy combines
evidence on the term level (early fusion), while the other does it on the
document level (late fusion). We demonstrate the generality of these patterns
by applying them to three different object retrieval tasks: expert finding,
blog distillation, and vertical ranking.Comment: Proceedings of the 39th European conference on Advances in
Information Retrieval (ECIR '17), 201
- …