A Factoid Question Answering System for Vietnamese
In this paper, we describe the development of an end-to-end factoid question
answering system for the Vietnamese language. This system combines both
statistical models and ontology-based methods in a chain of processing modules
to provide high-quality mappings from natural language text to entities. We
present the challenges in the development of such an intelligent user interface
for an isolating language like Vietnamese and show that techniques developed
for inflectional languages cannot be applied "as is". Our question answering
system can answer a wide range of general knowledge questions with promising
accuracy on a test set.
Comment: In the proceedings of the HQA'18 workshop, The Web Conference
Companion, Lyon, France
The Importance of Automatic Syntactic Features in Vietnamese Named Entity Recognition
This paper presents a state-of-the-art system for Vietnamese Named Entity
Recognition (NER). By incorporating automatic syntactic features with word
embeddings as input for bidirectional Long Short-Term Memory (Bi-LSTM), our
system, although simpler than some deep learning architectures, achieves a much
better result for Vietnamese NER. The proposed method achieves an overall F1
score of 92.05% on the test set of an evaluation campaign, organized in late
2016 by the Vietnamese Language and Speech Processing (VLSP) community. Our
named entity recognition system outperforms the best previous systems for
Vietnamese NER by a large margin.
Comment: 7 pages, 9 tables, 3 figures, accepted to PACLIC 201
The Impact of Cross-Lingual Adjustment of Contextual Word Representations on Zero-Shot Transfer
Large multilingual language models such as mBERT or XLM-R enable zero-shot
cross-lingual transfer in various IR and NLP tasks. Cao et al. (2020) proposed
a data- and compute-efficient method for cross-lingual adjustment of mBERT that
uses a small parallel corpus to make embeddings of related words across
languages similar to each other. They showed it to be effective in NLI for five
European languages. In contrast, we experiment with a typologically diverse set
of languages (Spanish, Russian, Vietnamese, and Hindi) and extend their
original implementations to new tasks (XSR, NER, and QA) and an additional
training regime (continual learning). Our study reproduced gains in NLI for
four languages, showed improved NER, XSR, and cross-lingual QA results in three
languages (though some cross-lingual QA gains were not statistically
significant), while monolingual QA performance never improved and sometimes
degraded. Analysis of distances between contextualized embeddings of related
and unrelated words (across languages) showed that fine-tuning leads to
"forgetting" some of the cross-lingual alignment information. Based on this
observation, we further improved NLI performance using continual learning.
Comment: Presented at ECIR 202
Ripple Down Rules for Question Answering
Recent years have witnessed a new trend of building ontology-based question
answering systems. These systems use semantic web information to produce more
precise answers to users' queries. However, these systems are mostly designed
for English. In this paper, we introduce an ontology-based question answering
system named KbQAS which, to the best of our knowledge, is the first one made
for Vietnamese. KbQAS employs our question analysis approach that
systematically constructs a knowledge base of grammar rules to convert each
input question into an intermediate representation element. KbQAS then takes
the intermediate representation element with respect to a target ontology and
applies concept-matching techniques to return an answer. On a wide range of
Vietnamese questions, experimental results show that the performance of KbQAS
is promising with accuracies of 84.1% and 82.4% for analyzing input questions
and retrieving output answers, respectively. Furthermore, our question analysis
approach can easily be applied to new domains and new languages, thus saving
time and human effort.
Comment: V1: 21 pages, 7 figures, 10 tables. V2: 8 figures, 10 tables; shortened
section 2; changed sections 4.3 and 5.1.2. V3: Accepted for publication in the
Semantic Web journal. V4 (Author's manuscript): camera ready version,
available from the Semantic Web journal at
http://www.semantic-web-journal.ne