thesis

Interpreting and Answering Keyword Queries using Web Knowledge Bases

Abstract

Many keyword queries issued to Web search engines target information about real-world entities, and interpreting these queries over Web knowledge bases can allow a search system to provide exact answers to keyword queries. Such an ability provides a useful service to end users, as their information need can be directly addressed and they need not scour textual results for the desired information. However, not all keyword queries can be addressed by even the most comprehensive knowledge base, and therefore equally important is the problem of recognizing when a reference knowledge base is not capable of modelling the keyword query's intention. This may be due to lack of coverage of the knowledge base or lack of expressiveness in the underlying query representation formalism. This thesis presents an approach to computing structured representations of keyword queries over a reference knowledge base. Keyword queries are annotated with occurrences of semantic constructs by learning a sequential labelling model from an annotated Web query log. Frequent query structures are then mined from the query log and are used along with the annotations to map keyword queries into a structured representation over the vocabulary of a reference knowledge base. The proposed approach exploits coarse linguistic structure in keyword queries, and combines it with rich structured query representations of information needs. As an intermediate representation formalism, a novel query language is proposed that blends keyword search with structured query processing over large Web knowledge bases. The formalism for structured keyword queries combines the flexibility of keyword search with the expressiveness of structured queries. A solution to the resulting disambiguation problem caused by introducing keywords as primitives in a structured query language is presented.
Expressions in our proposed language are rewritten using the vocabulary of the knowledge base, and different possible rewritings are ranked based on their syntactic relationship to the keywords in the query as well as their semantic coherence in the underlying knowledge base. The problem of ranking knowledge base entities returned as a query result is also explored from the perspective of personalized result ranking. User interest models based on entity types are learned from a Web search session by cross-referencing clicks on URLs with known entity homepages. The user interest model is then used to effectively rerank answer lists for a given user. A methodology for evaluating entity-based search engines is also proposed and empirically evaluated.
