Search CORE

1,115 research outputs found

A Factoid Question Answering System for Vietnamese

Author: Bui Duc-Thien
Le-Hong Phuong
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018
Field of study

In this paper, we describe the development of an end-to-end factoid question answering system for the Vietnamese language. This system combines both statistical models and ontology-based methods in a chain of processing modules to provide high-quality mappings from natural language text to entities. We present the challenges in the development of such an intelligent user interface for an isolating language like Vietnamese and show that techniques developed for inflectional languages cannot be applied "as is". Our question answering system can answer a wide range of general knowledge questions with promising accuracy on a test set.Comment: In the proceedings of the HQA'18 workshop, The Web Conference Companion, Lyon, Franc

arXiv.org e-Print Archive

Crossref

Recommended from our members

Enriching videos with light semantics

Author: Breslin John G.
Choudhury Smitashree
Publication venue
Publication date: 01/10/2010
Field of study

This paper describes an ongoing prototypical framework to annotate and retrieve web videos with light semantics. The proposed framework reuses many existing vocabularies along with a video model. The knowledge is captured from three different information spaces (media content, context, document). We also describe ways to extract the semantic content descriptions from the existing usergenerated content using multiple approaches of linguistic processing and Named Entity Recognition, which are later identified with DBpedia resources to establish meanings for the tags. Finally, the implemented prototype is described with multiple search interfaces and retrieval processes. Evaluation on semantic enrichment shows a considerable (50% of videos) improvement in content description

Open Research Online

Pair-Linking for Collective Entity Disambiguation: Two Could Be Better Than All

Author: Han Jialong
Li Chenliang
Phan Minh C.
Sun Aixin
Tay Yi
Publication venue
Publication date: 01/01/2018
Field of study

Collective entity disambiguation aims to jointly resolve multiple mentions by linking them to their associated entities in a knowledge base. Previous works are primarily based on the underlying assumption that entities within the same document are highly related. However, the extend to which these mentioned entities are actually connected in reality is rarely studied and therefore raises interesting research questions. For the first time, we show that the semantic relationships between the mentioned entities are in fact less dense than expected. This could be attributed to several reasons such as noise, data sparsity and knowledge base incompleteness. As a remedy, we introduce MINTREE, a new tree-based objective for the entity disambiguation problem. The key intuition behind MINTREE is the concept of coherence relaxation which utilizes the weight of a minimum spanning tree to measure the coherence between entities. Based on this new objective, we design a novel entity disambiguation algorithms which we call Pair-Linking. Instead of considering all the given mentions, Pair-Linking iteratively selects a pair with the highest confidence at each step for decision making. Via extensive experiments, we show that our approach is not only more accurate but also surprisingly faster than many state-of-the-art collective linking algorithms

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

Entity Linking for Queries by Searching Wikipedia Sentences

Author: Lv Weifeng
Ren Pengjie
Tan Chuanqi
Wei Furu
Zhou Ming
Publication venue
Publication date: 01/01/2017
Field of study

We present a simple yet effective approach for linking entities in queries. The key idea is to search sentences similar to a query from Wikipedia articles and directly use the human-annotated entities in the similar sentences as candidate entities for the query. Then, we employ a rich set of features, such as link-probability, context-matching, word embeddings, and relatedness among candidate entities as well as their related entities, to rank the candidates under a regression based framework. The advantages of our approach lie in two aspects, which contribute to the ranking process and final linking result. First, it can greatly reduce the number of candidate entities by filtering out irrelevant entities with the words in the query. Second, we can obtain the query sensitive prior probability in addition to the static link-probability derived from all Wikipedia articles. We conduct experiments on two benchmark datasets on entity linking for queries, namely the ERD14 dataset and the GERDAQ dataset. Experimental results show that our method outperforms state-of-the-art systems and yields 75.0% in F1 on the ERD14 dataset and 56.9% on the GERDAQ dataset

arXiv.org e-Print Archive

Crossref

NEREA: Named entity recognition and disambiguation exploiting local document repositories

Author: Bean A.
Cardiel O.
Garrido A.L.
Gañan A.
Ilarri S.
Sangiao S.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2021
Field of study

In this work, we describe the design, development, and deployment of NEREA (Named Entity Recognizer for spEcific Areas), an automatic Named Entity Recognizer and Disambiguation system, developed in collaboration with professional documentalists. The aim of NEREA is to keep accurate and current information about the entities mentioned in a local repository, and then support building appropriate infoboxes, setting out the main data of these entities. It achieves a high performance thanks to the use of classification resources belonging to the local database. With this aim, the system performs tasks of named entity recognition and disambiguation by using three types of knowledge bases: local classification resources, global databases like DBpedia, and its own catalog created by NEREA. The proposed method has been validated with two different datasets and its operation has been tested in English and Spanish. The working methodology is being applied in a real environment of a media with promising results

Repositorio Universidad de Zaragoza

Named Entities in the QTLeap Corpus of Online Helpdesk Interactions

Author: Amaral Diana
Branco António
Carvalho Rita de
Correia Catarina
Gomes Patrícia
Neale Steven
Pereira Rita
Querido Andreia
Rodrigues João
Silva João
Publication venue: 'Associacao Portuguesa de Linguistica'
Publication date: 01/01/2016
Field of study

In this paper we present the annotation of a corpus with named entities that are classified into semantic types and disambiguated by linking them to their corresponding entry in the Portuguese DBpedia. This corpus, QTLeap Corpus, is a multilingual collection of question and answer pairs from a chat-based helpdesk service for Information and Communication Technologies. The resulting annotated corpus is a gold-standard named entity annotated lexical resource that is useful in supporting the training and evaluation of named entity annotation and disambiguation tools for Portuguese.info:eu-repo/semantics/publishedVersio

Crossref

Universidade de Lisboa: Repositório.UL

NEED4Tweet: a Twitterbot for tweets named entity extraction and disambiguation

Author: Habib Mena B.
Keulen Maurice van
Publication venue: The Association for Computer Linguistics
Publication date: 01/07/2015
Field of study

In this demo paper, we present NEED4Tweet, a Twitterbot for named entity extraction (NEE) and disambiguation (NED) for Tweets. The straightforward application of state-of-the-art extraction and disambiguation approaches on informal text widely used in Tweets, typically results in significantly degraded performance due to the lack of formal structure; the lack of sufficient context required; and the seldom entities involved. In this paper, we introduce a novel framework that copes with the introduced challenges. We rely on contextual and semantic features more than syntactic features which are less informative. We believe that disambiguation can help to improve the extraction process. This mimics the way humans understand language

University of Twente Research Information