2,796 research outputs found
Leveraging Foreign Language Labeled Data for Aspect-Based Opinion Mining
Aspect-based opinion mining is the task of identifying sentiment at the
aspect level in opinionated text, which consists of two subtasks: aspect
category extraction and sentiment polarity classification. While aspect
category extraction aims to detect and categorize opinion targets such as
product features, sentiment polarity classification assigns a sentiment label,
i.e. positive, negative, or neutral, to each identified aspect. Supervised
learning methods have been shown to deliver better accuracy for this task but
they require labeled data, which is costly to obtain, especially for
resource-poor languages like Vietnamese. To address this problem, we present a
supervised aspect-based opinion mining method that utilizes labeled data from a
foreign language (English in this case), which is translated to Vietnamese by
an automated translation tool (Google Translate). Because aspects and opinions
in different languages may be expressed by different words, we propose using
word embeddings, in addition to other features, to reduce the vocabulary
difference between the original and translated texts, thus improving the
effectiveness of aspect category extraction and sentiment polarity
classification processes. We also introduce an annotated corpus of aspect
categories and sentiment polarities extracted from restaurant reviews in
Vietnamese, and conduct a series of experiments on the corpus. Experimental
results demonstrate the effectiveness of the proposed approach.
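The idea of using word embeddings to bridge vocabulary differences between original and machine-translated text can be sketched as a nearest-centroid aspect categorizer. Everything below is a minimal illustration: the words, the 4-dimensional vectors, and the two category seeds are all hypothetical stand-ins for real pretrained Vietnamese embeddings, not values from the paper.

```python
import numpy as np

# Toy embeddings standing in for pretrained vectors (hypothetical values).
EMB = {
    "pizza":   np.array([0.9, 0.1, 0.0, 0.0]),
    "noodles": np.array([0.8, 0.2, 0.0, 0.0]),
    "waiter":  np.array([0.0, 0.1, 0.9, 0.1]),
    "staff":   np.array([0.1, 0.0, 0.8, 0.2]),
}

# Aspect-category centroids built from a few seed words each.
CATEGORIES = {
    "FOOD":    (EMB["pizza"] + EMB["noodles"]) / 2,
    "SERVICE": (EMB["waiter"] + EMB["staff"]) / 2,
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def categorize(word):
    """Assign a word to the aspect category with the closest centroid.

    Because embeddings place synonyms and translations near each other,
    a word from translated text can match a category even when its exact
    surface form never appeared in the labeled data."""
    vec = EMB[word]
    return max(CATEGORIES, key=lambda c: cosine(vec, CATEGORIES[c]))
```

In the paper's setting these similarity features would be added alongside other features of a supervised classifier rather than used alone.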
Vietnamese Semantic Role Labelling
In this paper, we study semantic role labelling (SRL), a subtask of semantic
parsing of natural language sentences and its application for the Vietnamese
language. We present our effort in building Vietnamese PropBank, the first
Vietnamese SRL corpus and a software system for labelling semantic roles of
Vietnamese texts. In particular, we present a novel constituent extraction
algorithm in the argument candidate identification step which is more suitable
and more accurate than the common node-mapping method. In the machine learning
part, our system integrates distributed word features produced by two recent
unsupervised learning models in two learned statistical classifiers and makes
use of integer linear programming inference procedure to improve the accuracy.
The system is evaluated in a series of experiments and achieves a good result,
an F1 score of 74.77%. Our system, including the corpus and software, is
available as an open-source project for free research, and we believe that it
is a good baseline for the development of future Vietnamese SRL systems. Comment: Accepted to the VNU Journal of Science
ViCGCN: Graph Convolutional Network with Contextualized Language Models for Social Media Mining in Vietnamese
Social media processing is a fundamental task in natural language processing
with numerous applications. As Vietnamese social media and information science
have grown rapidly, the necessity of information-based mining on Vietnamese
social media has become crucial. However, state-of-the-art research faces
several significant drawbacks, including imbalanced and noisy data on social
media platforms; both issues need to be addressed in Vietnamese social media
texts. Graph Convolutional Networks
can address the problems of imbalanced and noisy data in text classification on
social media by taking advantage of the graph structure of the data. This study
presents a novel approach based on contextualized language model (PhoBERT) and
graph-based method (Graph Convolutional Networks). In particular, the proposed
approach, ViCGCN, jointly trains contextualized embeddings with Graph
Convolutional Networks (GCN), capturing more syntactic and semantic
dependencies to address those drawbacks. Extensive experiments on
various Vietnamese benchmark datasets were conducted to verify our approach.
The observation shows that applying GCN to BERTology models as the final layer
significantly improves performance. Moreover, the experiments demonstrate that
ViCGCN outperforms 13 powerful baseline models, including BERTology models,
fusion BERTology and GCN models, other baselines, and SOTA on three benchmark
social media datasets. Our proposed ViCGCN approach demonstrates a significant
improvement of up to 6.21%, 4.61%, and 2.63% over the best Contextualized
Language Models, including multilingual and monolingual, on three benchmark
datasets, UIT-VSMEC, UIT-ViCTSD, and UIT-VSFC, respectively. Additionally, our
integrated model ViCGCN achieves the best performance compared to other
BERTology models integrated with GCN.
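Applying a GCN as the final layer over contextual embeddings, as ViCGCN does, amounts to one pass of the standard normalized graph-convolution rule. The sketch below is a minimal NumPy version of that rule, assuming H holds PhoBERT-style node embeddings and A is a document/word adjacency matrix; the shapes and values are illustrative only.

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph-convolution layer over contextual embeddings.

    Implements the normalized propagation rule
        H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W)
    where H is (n_nodes, d_in), A is the (n_nodes, n_nodes) adjacency
    matrix, and W is a (d_in, d_out) weight matrix."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # symmetric normalization
    return np.maximum(0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)
```

In the full model, H would come from the contextualized language model and the layer would be trained jointly with it.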
Sentiment Analysis: State of the Art
We present the state of the art in sentiment analysis, covering the purpose of sentiment analysis, the levels at which it operates, and the processes that can be used to measure polarity and classify labels. We also include brief details about some sentiment analysis resources.
Helping each Other: A Framework for Customer-to-Customer Suggestion Mining using a Semi-supervised Deep Neural Network
Suggestion mining is increasingly becoming an important task along with
sentiment analysis. In today's cyberspace world, people not only express their
sentiments and dispositions towards some entities or services, but they also
spend considerable time sharing their experiences and advice to fellow
customers and the product/service providers with two-fold agenda: helping
fellow customers who are likely to share a similar experience, and motivating
the producer to bring specific changes in their offerings which would be more
appreciated by the customers. In our current work, we propose a hybrid deep
learning model to identify whether a review text contains any suggestion. The
model employs semi-supervised learning to leverage the useful information from
the large amount of unlabeled data. We evaluate the performance of our proposed
model on a benchmark customer review dataset comprising reviews from the
Hotel and Electronics domains. Our proposed approach achieves F-scores of
65.6% and 65.5% for the Hotel and Electronics review datasets, respectively.
These results are significantly better than those of the existing
state-of-the-art system. Comment: To appear in the proceedings of ICON 201
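The semi-supervised component described above, which leverages information from unlabeled data, can be sketched as a single self-training (pseudo-labeling) step. The function, the confidence threshold, and the toy predictor are illustrative assumptions, not the paper's actual training procedure.

```python
def pseudo_label(model_predict, unlabeled, threshold=0.9):
    """One self-training step for suggestion mining.

    Keeps unlabeled reviews the current model is confident about and
    returns them as extra (text, label) training pairs. `model_predict`
    maps a text to (label, confidence); the 0.9 threshold is an
    illustrative choice."""
    extra = []
    for text in unlabeled:
        label, confidence = model_predict(text)
        if confidence >= threshold:
            extra.append((text, label))
    return extra
```

The augmented pairs would then be mixed into the labeled set and the model retrained, repeating until the unlabeled pool yields no confident predictions.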
Multimodal Generative Models for Scalable Weakly-Supervised Learning
Multiple modalities often co-occur when describing natural phenomena.
Learning a joint representation of these modalities should yield deeper and
more useful representations. Previous generative approaches to multi-modal
input either do not learn a joint distribution or require additional
computation to handle missing data. Here, we introduce a multimodal variational
autoencoder (MVAE) that uses a product-of-experts inference network and a
sub-sampled training paradigm to solve the multi-modal inference problem.
Notably, our model shares parameters to efficiently learn under any combination
of missing modalities. We apply the MVAE on four datasets and match
state-of-the-art performance using many fewer parameters. In addition, we show
that the MVAE is directly applicable to weakly-supervised learning, and is
robust to incomplete supervision. We then consider two case studies: one of
learning image transformations (edge detection, colorization, and
segmentation) as a set of modalities, followed by one of machine translation
between two languages. We find appealing results across this range of tasks. Comment: To appear at NIPS 2018; 9 pages with supplement
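The product-of-experts inference network at the heart of the MVAE combines the Gaussian posterior of each observed modality into a single Gaussian by precision weighting. The sketch below shows that combination, assuming diagonal Gaussians and including the standard-normal prior as an extra expert; the function name and array layout are illustrative.

```python
import numpy as np

def product_of_experts(mus, logvars):
    """Combine per-modality diagonal-Gaussian posteriors into one Gaussian.

    mus, logvars: lists of 1-D arrays, one pair per observed modality.
    A standard-normal "prior expert" (mu=0, logvar=0) is always included,
    so the function also works when only one modality is present."""
    mus = np.vstack([np.zeros_like(mus[0])] + list(mus))
    logvars = np.vstack([np.zeros_like(logvars[0])] + list(logvars))
    precision = np.exp(-logvars)               # 1 / sigma^2 per expert
    var = 1.0 / precision.sum(axis=0)          # combined variance
    mu = var * (precision * mus).sum(axis=0)   # precision-weighted mean
    return mu, np.log(var)
```

Because experts for missing modalities are simply dropped from the product, the same network handles any combination of observed inputs, which is what lets the model share parameters under missing modalities.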
A Survey on Recent Advances in Named Entity Recognition from Deep Learning models
Named Entity Recognition (NER) is a key component in NLP systems for question
answering, information retrieval, relation extraction, etc. NER systems have
been studied and developed widely for decades, but accurate systems using deep
neural networks (NN) have only been introduced in the last few years. We
present a comprehensive survey of deep neural network architectures for NER,
and contrast them with previous approaches to NER based on feature engineering
and other supervised or semi-supervised learning algorithms. Our results
highlight the improvements achieved by neural networks, and show how
incorporating some of the lessons learned from past work on feature-based NER
systems can yield further improvements. Comment: Published at COLING 201
Simple and Effective Text Simplification Using Semantic and Neural Methods
Sentence splitting is a major simplification operator. Here we present a
simple and efficient splitting algorithm based on an automatic semantic parser.
After splitting, the text is amenable for further fine-tuned simplification
operations. In particular, we show that neural Machine Translation can be
effectively used in this situation. Previous applications of Machine Translation
to simplification suffer from a considerable disadvantage: they are
over-conservative, often failing to modify the source in any way. Splitting
based on semantic parsing, as proposed here, alleviates this issue. Extensive
automatic and human evaluation shows that the proposed method compares
favorably to the state-of-the-art in combined lexical and structural
simplification.
Construction of Vietnamese SentiWordNet by using Vietnamese Dictionary
SentiWordNet is an important lexical resource supporting sentiment analysis
in opinion mining applications. In this paper, we propose a novel approach to
construct a Vietnamese SentiWordNet (VSWN). SentiWordNet is typically generated
from WordNet in which each synset has numerical scores to indicate its opinion
polarities. Many previous studies obtained these scores by applying a machine
learning method to WordNet. However, a Vietnamese WordNet was unfortunately
not available at the time of this paper. Therefore, we propose a method to
construct VSWN from a Vietnamese dictionary, not from WordNet. We show the
effectiveness of the proposed method by generating a VSWN with 39,561 synsets
automatically. The method is experimentally tested on 266 synsets with respect
to positivity and negativity. It attains results competitive with the English
SentiWordNet, with differences of only 0.066 and 0.052 for the positivity and
negativity sets, respectively. Comment: accepted on April-9th-2014, best paper award
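Assigning positivity and negativity scores to dictionary entries can be illustrated with a much-simplified scheme: count sentiment seed words in an entry's gloss. This is only a toy stand-in for the machine-learning scoring used to build the actual VSWN; the function and seed sets are hypothetical.

```python
def score_synset(gloss_words, pos_seeds, neg_seeds):
    """Assign (positivity, negativity) scores to a dictionary entry.

    Counts positive and negative seed words appearing in the entry's
    gloss and normalizes the counts; entries whose gloss contains no
    seed words are treated as objective (0.0, 0.0)."""
    pos = sum(w in pos_seeds for w in gloss_words)
    neg = sum(w in neg_seeds for w in gloss_words)
    total = pos + neg
    if total == 0:
        return (0.0, 0.0)
    return (pos / total, neg / total)
```

A real pipeline would replace the seed-counting step with a trained classifier over gloss features, as in the English SentiWordNet construction.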
Arabic open information extraction system using dependency parsing
Arabic is a Semitic language distinguished by its rich morphological inflection and derivation. This special and complex nature makes extracting information from Arabic text difficult and leaves constant room for improvement. Open information extraction (OIE) systems have emerged and been used in various languages, especially English, but they have hardly been applied to Arabic. Accordingly, this paper introduces an OIE system that extracts relation tuples from Arabic web text by exploiting Arabic dependency parsing and carefully considering all possible textual relations. Based on clause-type propositions as extractable relations and on the grammatical functions of constituents, the identities of the corresponding clause types are established. The proposed system, named Arabic open information extraction (AOIE), can extract highly scalable Arabic text relations while remaining domain-independent. Rather than handling the problem with supervised strategies, the system relies on unsupervised extraction strategies. It has also been applied in several domains to avoid restricting extraction to a specific field. The results show that the system achieves high efficiency in extracting clauses from large amounts of text
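Extracting relation tuples from a dependency parse, as the AOIE system does, can be sketched over a flat token list. The sketch below is a toy stand-in for the clause-based extraction the paper describes: it only handles the simplest verbal-clause pattern, the token format is an assumption, and the example uses English glosses in place of Arabic surface forms.

```python
def extract_tuples(tokens):
    """Extract (subject, relation, object) tuples from a dependency parse.

    tokens: list of (word, head_index, deprel), where head_index is the
    position of the token's head (-1 for the root). The root verb is
    treated as the relation; its nsubj and obj dependents become the
    subject and object of the tuple."""
    tuples = []
    for i, (word, head, rel) in enumerate(tokens):
        if rel == "ROOT":
            subj = next((w for w, h, r in tokens if h == i and r == "nsubj"), None)
            obj = next((w for w, h, r in tokens if h == i and r == "obj"), None)
            if subj and obj:
                tuples.append((subj, word, obj))
    return tuples
```

A full system would add patterns for the other clause types (nominal clauses, prepositional complements, coordination) and normalize Arabic morphology before matching.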