Extraction and evaluation of candidate named entities in search engine queries
Named Entity Recognition (NER) has recently been applied to search queries in order to better understand their semantics. We present a novel method for detecting candidate named entities (NEs) that combines grammatical annotation and query segmentation, aided by the top-n snippets from search engine results and a web n-gram model, to accurately identify NE boundaries. We then evaluate this method automatically, using DBpedia as a rich data source of NEs, with the aid of a small, representative random sample that is manually annotated. Finally, we analyse the types of named entities that often occur in a query log and, from this analysis, present a search-query-driven named entity taxonomy.
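As a rough illustration of how n-gram statistics can guide query segmentation, the sketch below picks the split of a query whose multi-word segments are most frequent. It is not the method of the paper: the `NGRAM_COUNTS` table is a hypothetical stand-in for a web n-gram model, and the length-weighted scoring in `segment_score` is an assumption chosen only to keep the example small.

```python
# Hypothetical n-gram counts standing in for a web n-gram model (assumed values).
NGRAM_COUNTS = {
    "new york": 900,
    "york hotels": 20,
    "new york hotels": 50,
}


def segment_score(words):
    """Illustrative score: count * n**n for multi-word segments, 0 for single words."""
    n = len(words)
    if n < 2:
        return 0
    return NGRAM_COUNTS.get(" ".join(words), 0) * (n ** n)


def best_segmentation(query):
    """Return the segmentation with the highest total score (dynamic programming)."""
    words = query.lower().split()
    n = len(words)
    # best[i] holds (score, segments) for the first i words of the query.
    best = [(0, [])] + [None] * n
    for end in range(1, n + 1):
        candidates = []
        for start in range(end):
            prev_score, prev_segments = best[start]
            segment = words[start:end]
            candidates.append(
                (prev_score + segment_score(segment), prev_segments + [" ".join(segment)])
            )
        best[end] = max(candidates, key=lambda c: c[0])
    return best[n][1]


print(best_segmentation("New York hotels"))
```

With the assumed counts, "new york" is kept together as one segment, which is the kind of boundary a candidate named entity would need.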
Named entity recognition and classification in search queries
Named Entity Recognition and Classification is the task of extracting from text instances of different entity classes, such as person, location, or company. This task has recently been applied to web search queries in order to better understand their semantics, where a search query consists of linguistic units that users submit to a search engine to convey their search need. Discovering and analysing the linguistic units comprising a search query enables search engines to reveal and meet users' search intents. As a result, recent research has concentrated on analysing the constituent units comprising search queries. However, search queries are short, unstructured, and ambiguous. This thesis therefore presents an approach to detecting and classifying named entities in which queries are augmented with the text snippets of their search results.
The thesis makes the following contributions:
1. A novel method for detecting candidate named entities in search queries, which utilises both query grammatical annotation and query segmentation.
2. A novel method to classify the detected candidate entities into a set of target entity classes by using a seed expansion approach; the method represents the sets of contextual clues surrounding the entities in the snippets as vectors in a common vector space.
3. An exploratory analysis of three main categories of search refiners (nouns, verbs, and adjectives) that users often incorporate in entity-centric queries in order to further refine the entity-related search results.
4. A taxonomy of named entities derived from a search engine query log.
Experiments on a large commercial query log provide evidence that the work presented herein is competitive with existing research in the field of entity recognition and classification in search queries.
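The classification idea in contribution 2 (contextual clues from snippets represented as vectors in a common space, with classes grown from seed entities) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the seed entities, the snippets, the bag-of-words representation, the cosine similarity measure, and the `threshold` value are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def context_vector(entity, snippets):
    """Bag-of-words vector of the words that co-occur with the entity in snippets."""
    vec = Counter()
    entity_words = set(entity.lower().split())
    for snippet in snippets:
        text = snippet.lower()
        if entity.lower() in text:
            vec.update(w for w in text.split() if w not in entity_words)
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_class_vectors(seeds):
    """Sum the context vectors of each class's seed entities into one class vector."""
    class_vectors = {}
    for label, entities in seeds.items():
        total = Counter()
        for entity, snippets in entities.items():
            total.update(context_vector(entity, snippets))
        class_vectors[label] = total
    return class_vectors


def classify(entity, snippets, class_vectors, threshold=0.3):
    """Return the most similar class, or None if no class reaches the threshold."""
    vec = context_vector(entity, snippets)
    label, score = max(
        ((lbl, cosine(vec, cv)) for lbl, cv in class_vectors.items()),
        key=lambda pair: pair[1],
    )
    return label if score >= threshold else None


# Hypothetical seeds and snippets, invented for illustration only.
SEEDS = {
    "film": {
        "inception": ["watch inception movie trailer and cast"],
        "titanic": ["titanic movie cast and trailer"],
    },
    "location": {
        "paris": ["paris hotels map and weather"],
        "london": ["london weather map and hotels"],
    },
}

CLASS_VECTORS = build_class_vectors(SEEDS)
print(classify("avatar", ["avatar movie trailer release"], CLASS_VECTORS))
```

In a seed expansion setting, entities accepted with high similarity could be added back to the seed set and the class vectors recomputed; that loop is omitted here to keep the sketch short.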