1,115 research outputs found
A Factoid Question Answering System for Vietnamese
In this paper, we describe the development of an end-to-end factoid question
answering system for the Vietnamese language. This system combines both
statistical models and ontology-based methods in a chain of processing modules
to provide high-quality mappings from natural language text to entities. We
present the challenges in the development of such an intelligent user interface
for an isolating language like Vietnamese and show that techniques developed
for inflectional languages cannot be applied "as is". Our question answering
system can answer a wide range of general knowledge questions with promising
accuracy on a test set.Comment: In the proceedings of the HQA'18 workshop, The Web Conference
Companion, Lyon, Franc
Recommended from our members
Enriching videos with light semantics
This paper describes an ongoing prototypical framework to annotate and retrieve web videos with light semantics. The proposed framework reuses many existing vocabularies along with a video model. The knowledge is captured from three different information spaces (media content, context, document). We also describe ways to extract the semantic content descriptions from the existing usergenerated content using multiple approaches of linguistic processing and Named Entity Recognition, which are later identified with DBpedia resources to establish meanings for the tags. Finally, the implemented prototype is described with multiple search interfaces and retrieval processes. Evaluation on semantic enrichment shows a considerable (50% of videos) improvement in content description
Pair-Linking for Collective Entity Disambiguation: Two Could Be Better Than All
Collective entity disambiguation aims to jointly resolve multiple mentions by
linking them to their associated entities in a knowledge base. Previous works
are primarily based on the underlying assumption that entities within the same
document are highly related. However, the extend to which these mentioned
entities are actually connected in reality is rarely studied and therefore
raises interesting research questions. For the first time, we show that the
semantic relationships between the mentioned entities are in fact less dense
than expected. This could be attributed to several reasons such as noise, data
sparsity and knowledge base incompleteness. As a remedy, we introduce MINTREE,
a new tree-based objective for the entity disambiguation problem. The key
intuition behind MINTREE is the concept of coherence relaxation which utilizes
the weight of a minimum spanning tree to measure the coherence between
entities. Based on this new objective, we design a novel entity disambiguation
algorithms which we call Pair-Linking. Instead of considering all the given
mentions, Pair-Linking iteratively selects a pair with the highest confidence
at each step for decision making. Via extensive experiments, we show that our
approach is not only more accurate but also surprisingly faster than many
state-of-the-art collective linking algorithms
Entity Linking for Queries by Searching Wikipedia Sentences
We present a simple yet effective approach for linking entities in queries.
The key idea is to search sentences similar to a query from Wikipedia articles
and directly use the human-annotated entities in the similar sentences as
candidate entities for the query. Then, we employ a rich set of features, such
as link-probability, context-matching, word embeddings, and relatedness among
candidate entities as well as their related entities, to rank the candidates
under a regression based framework. The advantages of our approach lie in two
aspects, which contribute to the ranking process and final linking result.
First, it can greatly reduce the number of candidate entities by filtering out
irrelevant entities with the words in the query. Second, we can obtain the
query sensitive prior probability in addition to the static link-probability
derived from all Wikipedia articles. We conduct experiments on two benchmark
datasets on entity linking for queries, namely the ERD14 dataset and the GERDAQ
dataset. Experimental results show that our method outperforms state-of-the-art
systems and yields 75.0% in F1 on the ERD14 dataset and 56.9% on the GERDAQ
dataset
NEREA: Named entity recognition and disambiguation exploiting local document repositories
In this work, we describe the design, development, and deployment of NEREA (Named Entity Recognizer for spEcific Areas), an automatic Named Entity Recognizer and Disambiguation system, developed in collaboration with professional documentalists. The aim of NEREA is to keep accurate and current information about the entities mentioned in a local repository, and then support building appropriate infoboxes, setting out the main data of these entities. It achieves a high performance thanks to the use of classification resources belonging to the local database. With this aim, the system performs tasks of named entity recognition and disambiguation by using three types of knowledge bases: local classification resources, global databases like DBpedia, and its own catalog created by NEREA. The proposed method has been validated with two different datasets and its operation has been tested in English and Spanish. The working methodology is being applied in a real environment of a media with promising results
Named Entities in the QTLeap Corpus of Online Helpdesk Interactions
In this paper we present the annotation of a corpus with named entities that are classified into semantic types and disambiguated by linking them to their corresponding entry in the Portuguese DBpedia. This corpus, QTLeap Corpus, is a multilingual collection of question and answer pairs from a chat-based helpdesk service for Information and Communication Technologies. The resulting annotated corpus is a gold-standard named entity annotated lexical resource that is useful in supporting the training and evaluation of named entity annotation and disambiguation tools for Portuguese.info:eu-repo/semantics/publishedVersio
NEED4Tweet: a Twitterbot for tweets named entity extraction and disambiguation
In this demo paper, we present NEED4Tweet, a Twitterbot for named entity extraction (NEE) and disambiguation (NED) for Tweets. The straightforward application of state-of-the-art extraction and disambiguation approaches on informal text widely used in Tweets, typically results in significantly degraded performance due to the lack of formal structure; the lack of sufficient context required; and the seldom entities involved. In this paper, we introduce a novel framework that copes with the introduced challenges. We rely on contextual and semantic features more than syntactic features which are less informative. We believe that disambiguation can help to improve the extraction process. This mimics the way humans understand language
- …