2,796 research outputs found

    Leveraging Foreign Language Labeled Data for Aspect-Based Opinion Mining

    Full text link
    Aspect-based opinion mining is the task of identifying sentiment at the aspect level in opinionated text, which consists of two subtasks: aspect category extraction and sentiment polarity classification. While aspect category extraction aims to detect and categorize opinion targets such as product features, sentiment polarity classification assigns a sentiment label, i.e. positive, negative, or neutral, to each identified aspect. Supervised learning methods have been shown to deliver better accuracy for this task, but they require labeled data, which is costly to obtain, especially for resource-poor languages like Vietnamese. To address this problem, we present a supervised aspect-based opinion mining method that utilizes labeled data from a foreign language (English in this case), which is translated to Vietnamese by an automated translation tool (Google Translate). Because aspects and opinions in different languages may be expressed by different words, we propose using word embeddings, in addition to other features, to reduce the vocabulary difference between the original and translated texts, thus improving the effectiveness of the aspect category extraction and sentiment polarity classification processes. We also introduce an annotated corpus of aspect categories and sentiment polarities extracted from restaurant reviews in Vietnamese, and conduct a series of experiments on the corpus. Experimental results demonstrate the effectiveness of the proposed approach.
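
    A toy sketch of the embedding idea described above, assuming pre-trained Vietnamese word vectors (the words and the 3-dimensional vectors are invented for illustration): near-synonymous words from the original and machine-translated text receive similar vectors, so a classifier built on embedding features can generalise across the two vocabularies.

```python
import numpy as np

# Made-up word vectors; in practice these would come from embeddings
# trained on large Vietnamese corpora.
embeddings = {
    "ngon":      np.array([0.9, 0.1, 0.0]),   # "delicious" (original review)
    "tuyệt_vời": np.array([0.8, 0.2, 0.1]),   # "excellent" (translated text)
    "tệ":        np.array([0.0, 0.1, 0.9]),   # "bad"
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Near-synonyms from the two vocabularies land close together, while
# opposite-polarity words stay far apart.
print(cosine(embeddings["ngon"], embeddings["tuyệt_vời"]))  # high similarity
print(cosine(embeddings["ngon"], embeddings["tệ"]))          # low similarity
```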

    Vietnamese Semantic Role Labelling

    Full text link
    In this paper, we study semantic role labelling (SRL), a subtask of semantic parsing of natural language sentences, and its application to the Vietnamese language. We present our effort in building Vietnamese PropBank, the first Vietnamese SRL corpus, and a software system for labelling semantic roles in Vietnamese texts. In particular, we present a novel constituent extraction algorithm for the argument candidate identification step which is more suitable and more accurate than the common node-mapping method. In the machine learning part, our system integrates distributed word features produced by two recent unsupervised learning models into two learned statistical classifiers and makes use of an integer linear programming inference procedure to improve the accuracy. The system is evaluated in a series of experiments and achieves a good result, an F1 score of 74.77%. Our system, including the corpus and software, is available as an open-source project for free research, and we believe that it is a good baseline for the development of future Vietnamese SRL systems. Comment: Accepted to the VNU Journal of Science
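
    As a rough illustration of ILP-based inference for SRL (a sketch under assumptions, not the authors' system): select at most one role per candidate span and forbid overlapping spans while maximising the summed classifier scores. The spans, roles, and scores below are made up.

```python
import pulp

candidates = {                       # span -> {role: classifier score}
    (0, 2): {"Arg0": 0.9, "Arg1": 0.2},
    (1, 4): {"Arg1": 0.7},
    (3, 4): {"Arg1": 0.6, "ArgM_TMP": 0.3},
}

prob = pulp.LpProblem("srl_inference", pulp.LpMaximize)
x = {(s, r): pulp.LpVariable(f"x_{s[0]}_{s[1]}_{r}", cat="Binary")
     for s, roles in candidates.items() for r in roles}

# Objective: total score of the selected (span, role) assignments.
prob += pulp.lpSum(candidates[s][r] * var for (s, r), var in x.items())

# Each candidate span receives at most one role.
for s, roles in candidates.items():
    prob += pulp.lpSum(x[(s, r)] for r in roles) <= 1

# Two overlapping spans cannot both be selected.
spans = list(candidates)
for i, a in enumerate(spans):
    for b in spans[i + 1:]:
        if a[0] < b[1] and b[0] < a[1]:              # spans overlap
            prob += (pulp.lpSum(x[(a, r)] for r in candidates[a]) +
                     pulp.lpSum(x[(b, r)] for r in candidates[b])) <= 1

prob.solve()
print([(s, r) for (s, r), var in x.items() if var.value() == 1])
```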

    ViCGCN: Graph Convolutional Network with Contextualized Language Models for Social Media Mining in Vietnamese

    Full text link
    Social media processing is a fundamental task in natural language processing with numerous applications. As Vietnamese social media and information science have grown rapidly, information-based mining of Vietnamese social media has become crucial. However, state-of-the-art research faces several significant drawbacks, including imbalanced and noisy data on social media platforms. Imbalance and noise are two essential issues that must be addressed in Vietnamese social media texts. Graph Convolutional Networks can address the problems of imbalanced and noisy data in text classification on social media by taking advantage of the graph structure of the data. This study presents a novel approach based on a contextualized language model (PhoBERT) and a graph-based method (Graph Convolutional Networks). In particular, the proposed approach, ViCGCN, jointly trains contextualized embeddings with Graph Convolutional Networks (GCN) to capture more syntactic and semantic dependencies and address those drawbacks. Extensive experiments on various Vietnamese benchmark datasets were conducted to verify our approach. The results show that applying GCN to BERTology models as the final layer significantly improves performance. Moreover, the experiments demonstrate that ViCGCN outperforms 13 powerful baseline models, including BERTology models, fused BERTology and GCN models, other baselines, and the SOTA, on three benchmark social media datasets. Our proposed ViCGCN approach demonstrates a significant improvement of up to 6.21%, 4.61%, and 2.63% over the best contextualized language models, both multilingual and monolingual, on three benchmark datasets, UIT-VSMEC, UIT-ViCTSD, and UIT-VSFC, respectively. Additionally, our integrated model ViCGCN achieves the best performance compared with other BERTology models integrated with GCN.
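
    A minimal sketch of a graph convolution applied to contextualized node features (an illustration of the idea, not the released ViCGCN code; the toy graph and dimensions are invented):

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph convolution: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, adj, feats):
        a_hat = adj + torch.eye(adj.size(0))        # add self-loops
        deg = a_hat.sum(dim=1)
        d_inv_sqrt = torch.diag(deg.pow(-0.5))
        norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt  # symmetric normalisation
        return torch.relu(self.linear(norm_adj @ feats))

# Toy usage: 4 nodes with 768-dim features (e.g. contextualized embeddings
# from an encoder such as PhoBERT), connected in a small chain graph.
adj = torch.tensor([[0., 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
feats = torch.randn(4, 768)
layer = GCNLayer(768, 128)
print(layer(adj, feats).shape)                      # torch.Size([4, 128])
```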

    Sentiment Analysis: State of the Art

    Get PDF
    We present the state of the art in sentiment analysis, covering the purpose of sentiment analysis, its levels, and the processes that can be used to measure polarity and classify labels. Brief details about some sentiment analysis resources are also included.

    Helping each Other: A Framework for Customer-to-Customer Suggestion Mining using a Semi-supervised Deep Neural Network

    Full text link
    Suggestion mining is increasingly becoming an important task alongside sentiment analysis. In today's online world, people not only express their sentiments and dispositions towards entities or services, but also spend considerable time sharing their experiences and advice with fellow customers and the product/service providers, with a two-fold agenda: helping fellow customers who are likely to share a similar experience, and motivating the producer to make specific changes to their offerings that customers would appreciate more. In this work, we propose a hybrid deep learning model to identify whether a review text contains any suggestion. The model employs semi-supervised learning to leverage the useful information in a large amount of unlabeled data. We evaluate the performance of our proposed model on a benchmark customer review dataset comprising reviews from the Hotel and Electronics domains. Our proposed approach achieves F-scores of 65.6% and 65.5% for the Hotel and Electronics review datasets, respectively. These results are significantly better than the existing state-of-the-art system. Comment: To appear in the proceedings of ICON 201
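
    A minimal self-training sketch of the semi-supervised idea (illustrative only; the paper uses a hybrid deep model rather than the simple classifier below, and the review texts are invented):

```python
import numpy as np
from scipy.sparse import vstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny made-up data: 1 = the review contains a suggestion, 0 = it does not.
labelled = ["the pool should open earlier", "great location near the station"]
labels = [1, 0]
unlabelled = ["they could add more vegetarian options", "room was clean"]

vec = TfidfVectorizer()
X = vec.fit_transform(labelled + unlabelled)
X_lab, X_unlab = X[:len(labelled)], X[len(labelled):]

clf = LogisticRegression().fit(X_lab, labels)
for _ in range(3):                              # a few self-training rounds
    probs = clf.predict_proba(X_unlab)
    confident = probs.max(axis=1) > 0.6         # keep confident pseudo-labels
    if not confident.any():
        break
    pseudo = probs[confident].argmax(axis=1)
    X_train = vstack([X_lab, X_unlab[confident]])
    y_train = np.concatenate([labels, pseudo])
    clf = LogisticRegression().fit(X_train, y_train)

print(clf.predict(X_unlab))                     # predicted suggestion labels
```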

    Multimodal Generative Models for Scalable Weakly-Supervised Learning

    Full text link
    Multiple modalities often co-occur when describing natural phenomena. Learning a joint representation of these modalities should yield deeper and more useful representations. Previous generative approaches to multi-modal input either do not learn a joint distribution or require additional computation to handle missing data. Here, we introduce a multimodal variational autoencoder (MVAE) that uses a product-of-experts inference network and a sub-sampled training paradigm to solve the multi-modal inference problem. Notably, our model shares parameters to learn efficiently under any combination of missing modalities. We apply the MVAE to four datasets and match state-of-the-art performance using many fewer parameters. In addition, we show that the MVAE is directly applicable to weakly-supervised learning and is robust to incomplete supervision. We then consider two case studies: one of learning image transformations (edge detection, colorization, segmentation) as a set of modalities, followed by one of machine translation between two languages. We find appealing results across this range of tasks. Comment: To appear at NIPS 2018; 9 pages with supplement
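
    A short sketch of the product-of-experts combination at the heart of the MVAE (an illustration under assumptions, not the authors' code): each available modality contributes a Gaussian expert, and the experts are fused by precision weighting together with a standard-normal prior expert.

```python
import torch

def product_of_experts(mus, logvars, eps=1e-8):
    """Combine Gaussian experts (lists of (batch, latent_dim) tensors)."""
    # Prepend the prior expert N(0, I).
    mus = [torch.zeros_like(mus[0])] + list(mus)
    logvars = [torch.zeros_like(logvars[0])] + list(logvars)
    # Product of Gaussians: precisions add, means are precision-weighted.
    precisions = [1.0 / (lv.exp() + eps) for lv in logvars]
    joint_var = 1.0 / sum(precisions)
    joint_mu = joint_var * sum(m * p for m, p in zip(mus, precisions))
    return joint_mu, joint_var.log()

# Toy usage with two modalities present (e.g. image and text encoders).
mu_img, lv_img = torch.randn(4, 16), torch.randn(4, 16)
mu_txt, lv_txt = torch.randn(4, 16), torch.randn(4, 16)
mu, logvar = product_of_experts([mu_img, mu_txt], [lv_img, lv_txt])
print(mu.shape, logvar.shape)   # torch.Size([4, 16]) torch.Size([4, 16])
```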

    A Survey on Recent Advances in Named Entity Recognition from Deep Learning models

    Full text link
    Named Entity Recognition (NER) is a key component in NLP systems for question answering, information retrieval, relation extraction, etc. NER systems have been studied and developed widely for decades, but accurate systems using deep neural networks (NN) have only been introduced in the last few years. We present a comprehensive survey of deep neural network architectures for NER and contrast them with previous approaches based on feature engineering and other supervised or semi-supervised learning algorithms. Our results highlight the improvements achieved by neural networks and show how incorporating some of the lessons learned from past work on feature-based NER systems can yield further improvements. Comment: Published at COLING 201
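
    As a generic illustration of the neural architectures the survey covers (not any specific system), a minimal BiLSTM token tagger; the CRF output layer and character-level features common in the surveyed models are omitted for brevity.

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Embed tokens, run a bidirectional LSTM, score BIO tags per token."""
    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.out = nn.Linear(2 * hidden, num_tags)

    def forward(self, token_ids):
        h, _ = self.lstm(self.emb(token_ids))
        return self.out(h)                       # (batch, seq, num_tags)

# Toy usage: a batch of 2 sentences, 5 tokens each, 9 BIO tags.
model = BiLSTMTagger(vocab_size=1000, num_tags=9)
scores = model(torch.randint(0, 1000, (2, 5)))
print(scores.argmax(-1))                         # predicted tag ids
```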

    Simple and Effective Text Simplification Using Semantic and Neural Methods

    Full text link
    Sentence splitting is a major simplification operator. Here we present a simple and efficient splitting algorithm based on an automatic semantic parser. After splitting, the text is amenable to further fine-tuned simplification operations. In particular, we show that neural Machine Translation can be effectively used in this setting. Previous applications of Machine Translation to simplification suffer from a considerable disadvantage: they are over-conservative, often failing to modify the source in any way. Splitting based on semantic parsing, as proposed here, alleviates this issue. Extensive automatic and human evaluation shows that the proposed method compares favorably to the state of the art in combined lexical and structural simplification.
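
    The two-stage pipeline can be pictured with the skeleton below (hypothetical stubs: semantic_split stands in for the semantic-parser-based splitter and nmt_simplify for a trained neural MT simplifier; neither is the paper's implementation).

```python
from typing import List

def semantic_split(sentence: str) -> List[str]:
    """Placeholder: in the paper the split points come from a semantic parse;
    here the sentence is returned unchanged."""
    return [sentence]

def nmt_simplify(sentence: str) -> str:
    """Placeholder for a trained seq2seq simplification model."""
    return sentence

def simplify(text: str) -> str:
    pieces = []
    for sentence in text.split(". "):        # naive sentence segmentation
        for part in semantic_split(sentence):
            pieces.append(nmt_simplify(part))
    return ". ".join(pieces)

print(simplify("The festival, which drew large crowds, was held in spring."))
```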

    Construction of Vietnamese SentiWordNet by using Vietnamese Dictionary

    Full text link
    SentiWordNet is an important lexical resource supporting sentiment analysis in opinion mining applications. In this paper, we propose a novel approach to constructing a Vietnamese SentiWordNet (VSWN). A SentiWordNet is typically generated from WordNet, in which each synset has numerical scores indicating its opinion polarities, and many previous studies obtained these scores by applying a machine learning method to WordNet. Unfortunately, no Vietnamese WordNet was available at the time of writing. Therefore, we propose a method to construct a VSWN from a Vietnamese dictionary rather than from WordNet. We show the effectiveness of the proposed method by automatically generating a VSWN with 39,561 synsets. The method was experimentally tested on 266 synsets with respect to positivity and negativity. It attains results competitive with the English SentiWordNet, with differences of only 0.066 and 0.052 for the positivity and negativity sets, respectively. Comment: accepted on April 9th, 2014; best paper award
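
    A toy illustration of scoring a synset from its dictionary gloss (the seed lists and example gloss are invented, and this simple seed-counting heuristic merely stands in for the paper's actual scoring method):

```python
# Invented Vietnamese seed words; English glosses in the comments.
POS_SEEDS = {"tốt", "đẹp", "hay"}     # good, beautiful, nice
NEG_SEEDS = {"xấu", "tệ", "kém"}      # bad, awful, poor

def score_gloss(gloss):
    """Return (positivity, negativity) scores for a dictionary gloss."""
    tokens = gloss.lower().split()
    pos = sum(t in POS_SEEDS for t in tokens)
    neg = sum(t in NEG_SEEDS for t in tokens)
    total = pos + neg
    if total == 0:
        return 0.0, 0.0               # neutral / objective gloss
    return pos / total, neg / total

# Roughly: "of good quality, precious" -> (1.0, 0.0)
print(score_gloss("có phẩm chất tốt , đáng quý"))
```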

    Arabic open information extraction system using dependency parsing

    Get PDF
    Arabic is a Semitic language and one of the natural languages most distinguished by the richness of its morphology and derivation. This special and complex nature makes extracting information from Arabic difficult, and the tools for doing so always need improvement. Open information extraction (OIE) systems have emerged and been used for different languages, especially English; however, they have rarely been applied to Arabic. Accordingly, this paper introduces an OIE system that extracts relation tuples from Arabic web text by exploiting Arabic dependency parsing and considering all possible textual relations. Taking clause-type propositions as extractable relations and using the grammatical functions of constituents, the identities of the corresponding clause types are established. The proposed system, named Arabic Open Information Extraction (AOIE), can extract highly scalable Arabic text relations while remaining domain independent. Whereas many systems handle the problem with supervised strategies, the proposed system relies on unsupervised extraction strategies. The system has also been applied in several domains to avoid restricting information extraction to a specific field. The results show that the system achieves high efficiency in extracting clauses from large amounts of text.
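
    A minimal sketch of extracting (argument, relation, argument) tuples from a dependency parse (the token structure and relation labels below are assumed, Universal-Dependencies style, and the example is an English stand-in for a parsed Arabic clause; the AOIE clause-type rules are considerably richer).

```python
from dataclasses import dataclass

@dataclass
class Token:
    idx: int
    text: str
    head: int        # index of the head token, -1 for the root
    deprel: str      # dependency relation label

def extract_triples(tokens):
    """Extract (subject, verb, object) tuples around each root verb."""
    triples = []
    for tok in tokens:
        if tok.deprel == "root":
            subj = next((t.text for t in tokens
                         if t.head == tok.idx and t.deprel == "nsubj"), None)
            obj = next((t.text for t in tokens
                        if t.head == tok.idx and t.deprel == "obj"), None)
            if subj and obj:
                triples.append((subj, tok.text, obj))
    return triples

# Toy parse of "The student read the book."
sent = [Token(0, "student", 1, "nsubj"), Token(1, "read", -1, "root"),
        Token(2, "book", 1, "obj")]
print(extract_triples(sent))   # [('student', 'read', 'book')]
```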