Feature-Based Selection of Dependency Paths in Ad Hoc Information Retrieval

Abstract

Techniques that compare short text segments using dependency paths (or simply, paths) appear in a wide range of automated language processing applications, including question answering (QA). However, few models in ad hoc information retrieval (IR) use paths for document ranking because parsing a retrieval collection is prohibitively expensive. In this paper, we introduce a flexible notion of paths that describes chains of words on a dependency path. These chains, or catenae, are readily applied in standard IR models. Informative catenae are selected using supervised machine learning with linguistically informed features, and are compared both to non-linguistic terms and to catenae selected heuristically with filters derived from work on paths. Automatically selected catenae of 1-2 words deliver significant performance gains on three TREC collections.
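As a rough illustration of what catenae of 1-2 words might look like, the sketch below lists every single word and every directly linked head-dependent pair in a toy dependency tree. The function name `short_catenae`, the head-index encoding, and the example sentence are assumptions made for illustration only; they are not taken from the paper, and the supervised selection step described in the abstract is not shown.

```python
from typing import List, Tuple


def short_catenae(words: List[str], heads: List[int]) -> List[Tuple[str, ...]]:
    """Enumerate candidate chains of one or two words in a dependency tree.

    Assumed encoding (hypothetical, for illustration): heads[i] is the
    index of the head of words[i], or -1 if words[i] is the root.
    """
    # Every single word is a chain of length one.
    singles = [(w,) for w in words]
    # Every head-dependent edge gives a chain of length two (head first).
    pairs = [(words[h], words[i]) for i, h in enumerate(heads) if h >= 0]
    return singles + pairs


# Toy parse of "dogs chase cats": "chase" is the root and heads both nouns.
words = ["dogs", "chase", "cats"]
heads = [1, -1, 1]
catenae = short_catenae(words, heads)
```

Here "dogs" and "cats" are not directly linked in the tree, so the pair does not appear among the candidates, whereas a plain bag of adjacent or co-occurring word pairs could include it.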
