MNER-QG: An End-to-End MRC framework for Multimodal Named Entity Recognition with Query Grounding
Multimodal named entity recognition (MNER) is a critical step in information
extraction, which aims to detect entity spans and classify them to
corresponding entity types given a sentence-image pair. Existing methods either
(1) obtain named entities with coarse-grained visual clues from attention
mechanisms, or (2) first detect fine-grained visual regions with toolkits and
then recognize named entities. However, they suffer from improper alignment
between entity types and visual regions or error propagation in the two-stage
manner, which finally imports irrelevant visual information into texts. In this
paper, we propose a novel end-to-end framework named MNER-QG that can
simultaneously perform MRC-based multimodal named entity recognition and query
grounding. Specifically, with the assistance of queries, MNER-QG can provide
prior knowledge of entity types and visual regions, and further enhance
representations of both texts and images. To conduct the query grounding task,
we provide manual annotations and weak supervisions that are obtained via
training a highly flexible visual grounding model with transfer learning. We
conduct extensive experiments on two public MNER datasets, Twitter2015 and
Twitter2017. Experimental results show that MNER-QG outperforms the current
state-of-the-art models on the MNER task, and also improves the query grounding
performance.
Comment: 13 pages, 6 figures, published to AAA
Lexicon Infused Phrase Embeddings for Named Entity Resolution
Most state-of-the-art approaches for named-entity recognition (NER) use
semi-supervised information in the form of word clusters and lexicons. Recently,
neural network-based language models have been explored, as they generate, as a
byproduct, highly informative vector representations for words, known as word
embeddings. In this paper we present two contributions: a new form of learning
word embeddings that can leverage information from relevant lexicons to improve
the representations, and the first system to use neural word embeddings to
achieve state-of-the-art results on named-entity recognition in both CoNLL and
Ontonotes NER. Our system achieves an F1 score of 90.90 on the test set for
CoNLL 2003---significantly better than any previous system trained on public
data, and matching a system employing massive private industrial query-log
data.
Comment: Accepted in CoNLL 201
MIRACLE at GeoCLEF Query Parsing 2007: Extraction and Classification of Geographical Information
This paper describes the participation of the MIRACLE research consortium in the Query Parsing task of GeoCLEF 2007. Our system is composed of three main modules. First, the Named Geo-entity Identifier performs geo-entity identification and tagging, i.e., it extracts the “where” component of the geographical query, should there be any. This module is based on a gazetteer built from the Geonames geographical database and carries out a sequential process in three steps: geo-entity recognition, geo-entity selection and query tagging. Then, the Query Analyzer parses this tagged query to identify the “what” and “geo-relation” components by means of a rule-based grammar. Finally, a two-level multiclassifier first decides whether the query is indeed a geographical query and, if so, determines the query type according to the kind of information that the user is presumed to be looking for: map, yellow page or information. According to a strict evaluation criterion where a match must have all fields correct, our system reaches a precision of 42.8% and a recall of 56.6%, and our submission is ranked 1st out of 6 participants in the task. A detailed evaluation of the confusion matrices reveals that some extra effort must be invested in “user-oriented” disambiguation techniques to improve the first-level binary classifier for detecting geographical queries, as it is a key component for eliminating many false positives.
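The strict evaluation criterion mentioned above can be illustrated with a minimal sketch. It assumes, hypothetically, that each query parse is a dictionary of fields (or `None` when the query is judged non-geographical) and that a prediction counts as correct only when every field equals the gold parse; the field names and the function `strict_precision_recall` are illustrative and are not taken from the GeoCLEF evaluation scripts.

```python
from typing import Dict, List, Optional, Tuple

# A parse is a dict of fields; None means "not a geographical query".
Parse = Optional[Dict[str, str]]


def strict_precision_recall(gold: List[Parse], predicted: List[Parse]) -> Tuple[float, float]:
    """Precision and recall where a match requires all fields to be identical."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same queries")
    # A prediction is correct only if both sides are geographical and all fields agree.
    correct = sum(
        1 for g, p in zip(gold, predicted)
        if g is not None and p is not None and g == p
    )
    n_predicted = sum(1 for p in predicted if p is not None)
    n_gold = sum(1 for g in gold if g is not None)
    precision = correct / n_predicted if n_predicted else 0.0
    recall = correct / n_gold if n_gold else 0.0
    return precision, recall


# Hypothetical toy data: field names are assumptions made for illustration.
a = {"what": "hotels", "geo_relation": "in", "where": "madrid", "type": "yellow page"}
b = {"what": "map", "geo_relation": "of", "where": "paris", "type": "map"}
b_wrong = dict(b, where="paris, texas")  # one wrong field makes the whole parse wrong
c = {"what": "weather", "geo_relation": "in", "where": "oslo", "type": "information"}
d = {"what": "pizza", "geo_relation": "near", "where": "rome", "type": "yellow page"}

gold = [a, b, c, None, d]
predicted = [a, b_wrong, None, None, d]
precision, recall = strict_precision_recall(gold, predicted)
```

Under this reading, a single wrong field costs both precision and recall, which is one reason strict scores can look low even when most fields are extracted correctly.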
Named entity recognition and classification in search queries
Named Entity Recognition and Classification is the task of extracting from text, instances of
different entity classes such as person, location, or company. This task has recently been
applied to web search queries in order to better understand their semantics, where a search
query consists of linguistic units that users submit to a search engine to convey their search
need. Discovering and analysing the linguistic units comprising a search query enables search
engines to reveal and meet users' search intents. As a result, recent research has concentrated
on analysing the constituent units comprising search queries. However, since search queries
are short, unstructured, and ambiguous, an approach to detect and classify named entities is
presented in this thesis, in which queries are augmented with the text snippets of search results
for search queries.
The thesis makes the following contributions:
1. A novel method for detecting candidate named entities in search queries, which utilises
both query grammatical annotation and query segmentation.
2. A novel method to classify the detected candidate entities into a set of target entity
classes, by using a seed expansion approach; the method presented exploits the representation
of the sets of contextual clues surrounding the entities in the snippets as vectors
in a common vector space.
3. An exploratory analysis of three main categories of search refiners: nouns, verbs, and
adjectives, that users often incorporate in entity-centric queries in order to further refine
the entity-related search results.
4. A taxonomy of named entities derived from a search engine query log.
By using a large commercial query log, experimental evidence is provided that the work
presented herein is competitive with the existing research in the field of entity recognition and
classification in search queries.
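As a rough illustration of the second contribution, the sketch below represents the contextual clues around an entity as a bag-of-words vector built from snippets and assigns a candidate to the class whose seed vector is most similar under cosine similarity. This is a simplified, single-pass stand-in for the seed expansion approach, not the thesis's actual method; the function names, the whitespace tokenisation and the toy snippets are all assumptions.

```python
import math
from collections import Counter
from typing import Dict, List


def context_vector(entity: str, snippets: List[str]) -> Counter:
    """Bag-of-words over snippet tokens, excluding the entity's own tokens."""
    entity_tokens = set(entity.lower().split())
    vector: Counter = Counter()
    for snippet in snippets:
        for token in snippet.lower().split():
            if token not in entity_tokens:
                vector[token] += 1
    return vector


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(candidate: Counter, class_vectors: Dict[str, Counter]) -> str:
    """Return the class whose seed context vector is closest to the candidate."""
    return max(class_vectors, key=lambda name: cosine(candidate, class_vectors[name]))


# Hypothetical seeds and snippets, one seed entity per class.
class_vectors = {
    "location": context_vector("london", ["london is the capital city of england"]),
    "person": context_vector("tom hanks", ["tom hanks is an american actor born in california"]),
}
candidate = context_vector("paris", ["paris is the capital city of france"])
label = classify(candidate, class_vectors)
```

A full seed expansion would presumably add confidently classified entities back into the class vectors and iterate; that loop is omitted here to keep the sketch short.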
A Proof-of-Concept for Orthographic Named Entity Correction in Spanish Voice Queries
Proceedings of: 10th International Workshop on Adaptive Multimedia Retrieval. Took place October 24-25, 2012, in Copenhagen (Denmark).
Automatic speech recognition (ASR) systems are not able to recognize entities that are not present in their vocabulary. This paper considers the misrecognition of named entities in Spanish voice queries and introduces a proof-of-concept for named entity correction that offers alternatives to incorrectly recognized entities by retrieving phonetically similar entities from a dictionary. The system is domain-dependent, using sports news, especially football news, but it does not depend on the automatic speech recognition system used. The correction process exploits the query structure and its semantic information to detect where a named entity appears. The system then finds the most suitable alternative entity in a dictionary previously generated from the existing named entities.
This work has been partially supported by the Regional Government of
Madrid under the Research Network MA2VICMR (S2009/TIC-1542) and by the Spanish Center
for Industry Technological Development (CDTI, Ministry of Industry, Tourism and Trade)
through the BUSCAMEDIA Project (CEN-20091026).
End-to-End Entity Detection with Proposer and Regressor
Named entity recognition is a traditional task in natural language
processing. In particular, nested entity recognition receives extensive
attention for the widespread existence of the nesting scenario. The latest
research migrates the well-established paradigm of set prediction in object
detection to cope with entity nesting. However, the manual creation of query
vectors, which fail to adapt to the rich semantic information in the context,
limits these approaches. An end-to-end entity detection approach with proposer
and regressor is presented in this paper to tackle the issues. First, the
proposer utilizes the feature pyramid network to generate high-quality entity
proposals. Then, the regressor refines the proposals for generating the final
prediction. The model adopts an encoder-only architecture and thus obtains the
advantages of rich query semantics, high precision of entity
localization, and ease of model training. Moreover, we introduce novel
spatially modulated attention and progressive refinement for further
improvement. Extensive experiments demonstrate that our model achieves advanced
performance in flat and nested NER, achieving a new state-of-the-art F1 score
of 80.74 on the GENIA dataset and 72.38 on the WeiboNER dataset.
Arabic Information Retrieval: A Relevancy Assessment Survey
The paper presents research in Arabic Information Retrieval (IR). It surveys the impact of statistical and morphological analysis of Arabic text on improving Arabic IR relevancy. We investigated the contributions of Stemming, Indexing, Query Expansion, Text Summarization (TS), Text Translation, and Named Entity Recognition (NER) to enhancing the relevancy of Arabic IR. Our survey emphasizes the quantitative relevancy measurements provided in the surveyed publications. The paper shows that researchers achieved significant enhancements, especially in building accurate stemmers, with accuracy reaching 97%, and in measuring the impact of different indexing strategies. Query Expansion and Text Translation showed a positive effect on relevancy. However, other tasks such as NER and TS still need more research to establish their impact on Arabic IR.