
    Unsupervised identification of synonymous query intent templates for attribute intents

    Among all web search queries, there is an important subset containing entity mentions. In these queries, users are most often interested in requesting some attribute of an entity, such as "Obama age" for the age intent, which we refer to as the attribute intent. In this work, we address the problem of identifying synonymous query intent templates for attribute intents. For example, "how old is [Person]" and "[Person]'s age" are both synonymous templates for the age intent. Successfully identifying synonymous query intent templates can not only improve the performance of existing query annotation approaches but also benefit applications such as instant answers and intent-based query suggestion. We propose a clustering framework with multiple kernel functions that jointly identifies synonymous query intent templates for a set of canonical templates. Signals from multiple sources of information are integrated into a kernel function between templates, and the weights of these signals are tuned in an unsupervised manner. Extensive experiments across multiple domains in Freebase demonstrate the effectiveness of our clustering framework for finding synonymous query intent templates for attribute intents.
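    The abstract describes combining several similarity signals into one kernel between templates and clustering with it. Below is a minimal, hypothetical sketch of that idea in Python; the two surface-similarity signals, their fixed weights, and the toy templates are illustrative assumptions, not the paper's implementation (the paper tunes the weights in an unsupervised manner and draws signals from multiple sources).

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Toy templates: three age-intent paraphrases and two birth-date paraphrases.
templates = ["how old is [Person]", "[Person]'s age", "[Person] age",
             "when was [Person] born", "[Person] birthday"]

def word_kernel(a, b):
    # Jaccard overlap of template tokens -- one possible surface signal.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def char_kernel(a, b):
    # Character trigram overlap -- a second, finer-grained surface signal.
    ga = {a[i:i + 3] for i in range(len(a) - 2)}
    gb = {b[i:i + 3] for i in range(len(b) - 2)}
    return len(ga & gb) / max(len(ga | gb), 1)

signals = [word_kernel, char_kernel]
weights = np.array([0.6, 0.4])   # fixed here; tuned unsupervised in the paper

# Combine all signals into a single kernel matrix between templates.
n = len(templates)
K = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        K[i, j] = sum(w * f(templates[i], templates[j])
                      for w, f in zip(weights, signals))

# Cluster templates with the combined kernel, using 1 - K as a distance
# (requires scikit-learn >= 1.2 for the `metric` keyword).
labels = AgglomerativeClustering(n_clusters=2, metric="precomputed",
                                 linkage="average").fit_predict(1 - K)
print(dict(zip(templates, labels)))
```

    Swapping in richer signals (e.g., click or session co-occurrence statistics) only requires appending another function to `signals` with a matching weight.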

    Named entity recognition and classification in search queries

    Named Entity Recognition and Classification is the task of extracting instances of entity classes, such as person, location, or company, from text. This task has recently been applied to web search queries in order to better understand their semantics; a search query consists of the linguistic units that users submit to a search engine to convey their search need. Discovering and analysing these units enables search engines to reveal and meet users' search intents, and recent research has therefore concentrated on analysing the constituent units of search queries. Because search queries are short, unstructured, and ambiguous, this thesis presents an approach to detecting and classifying named entities in which queries are augmented with text snippets from their search results. The thesis makes the following contributions:
    1. A novel method for detecting candidate named entities in search queries that utilises both query grammatical annotation and query segmentation.
    2. A novel method for classifying the detected candidate entities into a set of target entity classes using a seed expansion approach; the method represents the sets of contextual clues surrounding the entities in the snippets as vectors in a common vector space (sketched after this list).
    3. An exploratory analysis of three main categories of search refiners, namely nouns, verbs, and adjectives, that users often incorporate in entity-centric queries to further refine the entity-related search results.
    4. A taxonomy of named entities derived from a search engine query log.
    Using a large commercial query log, experimental evidence is provided that the work presented herein is competitive with existing research in entity recognition and classification in search queries.
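    Contribution 2 classifies a candidate entity by comparing the contextual clues around it in result snippets against clues gathered for seed entities of each class, all embedded in one common vector space. The following is a minimal sketch of that pattern, not the thesis implementation; the classes, toy contexts, and bag-of-words representation are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy contexts around seed entities of each class, standing in for the
# contextual clues harvested from search-result snippets.
seed_contexts = {
    "Person":   ["was born in", "is an actor who", "married the singer"],
    "Location": ["is a city in", "the capital of", "is located near"],
}

# Contexts observed around the candidate entity in its own result snippets.
candidate_contexts = ["was born in", "starred in the film"]

# Embed every context in one common bag-of-words vector space.
vectorizer = CountVectorizer()
vectorizer.fit([c for ctxs in seed_contexts.values() for c in ctxs]
               + candidate_contexts)

def profile(contexts):
    # One vector per entity/class: the summed bag-of-words of its contexts.
    return np.asarray(vectorizer.transform(contexts).sum(axis=0))

# Assign the class whose seed profile is most similar to the candidate's.
cand = profile(candidate_contexts)
scores = {cls: cosine_similarity(cand, profile(ctxs))[0, 0]
          for cls, ctxs in seed_contexts.items()}
print(max(scores, key=scores.get), scores)   # -> Person
```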

    Pretrained Transformers for Text Ranking: BERT and Beyond

    The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query. Although the most common formulation of text ranking is search, instances of the task can also be found in many natural language processing applications. This survey provides an overview of text ranking with neural network architectures known as transformers, of which BERT is the best-known example. The combination of transformers and self-supervised pretraining has been responsible for a paradigm shift in natural language processing (NLP), information retrieval (IR), and beyond. In this survey, we provide a synthesis of existing work as a single point of entry for practitioners who wish to gain a better understanding of how to apply transformers to text ranking problems and researchers who wish to pursue work in this area. We cover a wide range of modern techniques, grouped into two high-level categories: transformer models that perform reranking in multi-stage architectures and dense retrieval techniques that perform ranking directly. Two themes pervade our survey: techniques for handling long documents, beyond the typical sentence-by-sentence processing in NLP, and techniques for addressing the tradeoff between effectiveness (i.e., result quality) and efficiency (e.g., query latency, model and index size). Although transformer architectures and pretraining techniques are recent innovations, many aspects of how they are applied to text ranking are relatively well understood and represent mature techniques. However, many open research questions remain, and thus, in addition to laying out the foundations of pretrained transformers for text ranking, this survey also attempts to prognosticate where the field is heading.
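    As a concrete illustration of the multi-stage reranking pattern the survey covers, here is a minimal sketch using the sentence-transformers library with a public MS MARCO cross-encoder checkpoint; the query and the hard-coded candidate list stand in for a real first-stage retriever (e.g., BM25), and none of this is drawn from the survey itself.

```python
from sentence_transformers import CrossEncoder

query = "what is the capital of france"
candidates = [                      # stage-one output, hard-coded here
    "Paris is the capital and most populous city of France.",
    "France is a country in Western Europe.",
    "The Eiffel Tower is a landmark in Paris.",
]

# Stage two: a BERT-style cross-encoder scores each (query, document) pair.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, doc) for doc in candidates])

# Present candidates in descending order of relevance score.
for score, doc in sorted(zip(scores, candidates), reverse=True):
    print(f"{score:6.2f}  {doc}")
```

    Dense retrieval, the survey's other high-level category, would instead encode queries and documents into vectors separately and rank by vector similarity, trading the cross-encoder's joint modeling for much lower query latency.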