50 research outputs found

    Revisiting Pre-Trained Models for Chinese Natural Language Processing

    Bidirectional Encoder Representations from Transformers (BERT) has shown substantial improvements across various NLP tasks, and successive variants have been proposed to further improve the performance of pre-trained language models. In this paper, we revisit Chinese pre-trained language models to examine their effectiveness in a non-English language and release a Chinese pre-trained language model series to the community. We also propose a simple but effective model called MacBERT, which improves upon RoBERTa in several ways, especially its masking strategy, which adopts MLM as correction (Mac). We carried out extensive experiments on eight Chinese NLP tasks to revisit the existing pre-trained language models as well as the proposed MacBERT. Experimental results show that MacBERT achieves state-of-the-art performance on many NLP tasks, and we also ablate details, with several findings that may help future research. Resources available: https://github.com/ymcui/MacBERT
    Comment: 12 pages, to appear at Findings of EMNLP 202
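The abstract names "MLM as correction" without spelling out the procedure. The sketch below assumes the general idea behind it: selected words are replaced with similar words rather than with an artificial [MASK] token, and the model learns to restore the originals. The similar-word table, the 15% selection rate, and the 80/10/10 split between similar-word, random-word, and unchanged tokens are illustrative assumptions, and whole-word or N-gram masking is omitted.

```python
import random

# Hypothetical similar-word table standing in for a real synonym resource.
SIMILAR = {
    "good": ["nice", "fine"],
    "movie": ["film"],
    "fast": ["quick"],
}


def mac_mask(tokens, vocab, rate=0.15, rng=None):
    """Corrupt tokens without a [MASK] symbol.

    Returns (corrupted, targets): targets[i] is the original token at each
    selected position and None elsewhere; a model would be trained to
    recover ("correct") the original tokens at the selected positions.
    """
    rng = rng or random.Random()
    corrupted = list(tokens)
    targets = [None] * len(tokens)
    n_select = min(len(tokens), max(1, round(rate * len(tokens))))
    for i in rng.sample(range(len(tokens)), n_select):
        targets[i] = tokens[i]
        p = rng.random()
        if p < 0.8:
            # Similar-word replacement; fall back to a random word if none is known.
            options = SIMILAR.get(tokens[i])
            corrupted[i] = rng.choice(options) if options else rng.choice(vocab)
        elif p < 0.9:
            corrupted[i] = rng.choice(vocab)  # random-word replacement
        # Otherwise (remaining 10%) the original token is kept unchanged.
    return corrupted, targets
```

Because the corrupted input never contains [MASK], the inputs seen during pre-training look like ordinary text, which is the mismatch this style of masking is meant to reduce.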

    Using sentiment analysis technique for analyzing Thai customer satisfaction from social media

    With the rapidly increasing number of Thai online customer reviews available on social media and websites, sentiment analysis, also called opinion mining, has become an important task in the past few years. This technique aims to analyze people's emotions, opinions, attitudes, and sentiments. Classical approaches to opinion mining represent reviews as a bag of words, since many words can be used to identify positive or negative feedback. These methods therefore work well for reviews in European languages, whose text is segmented into words. However, bag-of-words methods struggle with Thai customer reviews, which are unsegmented: Thai text is written as a long sequence of characters without word boundaries. To date, little research has been conducted on sentiment analysis of Thai customer reviews. This paper proposes a sentiment analysis technique for Thai customer reviews. The proposed technique integrates Thai word extraction with sentiment analysis techniques to mine Thai customers' opinions. To demonstrate the proposed technique, this paper presents experimental studies on analyzing Thai customer reviews from social media. The results show that the proposed method provides significant benefits for mining Thai customers' opinions from social media.
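The abstract does not say which word extraction method the paper uses. As a hedged illustration of why segmentation must come before a bag-of-words step, the sketch below swaps in dictionary-based forward longest matching, followed by a simple lexicon score with negation. The dictionary, polarity lexicon, and negator set are hypothetical toy data, not the paper's resources.

```python
# Hypothetical toy dictionary, polarity lexicon and negators (not from the paper).
DICTIONARY = {"อาหาร", "อร่อย", "มาก", "ไม่", "ดี", "แย่", "บริการ"}
POLARITY = {"อร่อย": 1, "ดี": 1, "แย่": -1}
NEGATORS = {"ไม่"}


def segment(text, dictionary=DICTIONARY):
    """Forward longest-match segmentation of text without word boundaries."""
    max_len = max(len(w) for w in dictionary)
    words, i = [], 0
    while i < len(text):
        # Try the longest dictionary word that starts at position i.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                words.append(text[i:j])
                i = j
                break
        else:
            # Unknown character: emit it on its own and move on.
            words.append(text[i])
            i += 1
    return words


def polarity(text):
    """Sum lexicon scores over segmented words, flipping the word after a negator."""
    score, negate = 0, False
    for word in segment(text):
        if word in NEGATORS:
            negate = True
            continue
        value = POLARITY.get(word, 0)
        score += -value if negate else value
        negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, "อาหารอร่อยมาก" ("the food is very delicious") segments into three words and scores as positive, while "อาหารไม่อร่อย" ("the food is not delicious") scores as negative.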

    Comparison of Machine Learning Classification Algorithms and Feature Selection for Sentiment Analysis of Movie Reviews

    Sentiment analysis is the process of determining whether the content of a text dataset is positive, negative, or neutral. Today, public opinion has become an important source for individuals deciding on a product. Classification algorithms such as Naïve Bayes (NB), Support Vector Machine (SVM), and Artificial Neural Network (ANN) have been proposed by many researchers for sentiment analysis of movie reviews. However, text sentiment classification suffers from the large number of attributes in a dataset. Feature selection can be used to remove less relevant attributes from the dataset. The feature selection algorithms used are information gain, chi-square, forward selection, and backward elimination. In the comparison of classification algorithms, SVM achieved the best results, with an accuracy of 81.10% and an AUC of 0.904. In the comparison of feature selection methods, information gain performed best, with an average accuracy of 84.57% and an average AUC of 0.899. Combining the best classification algorithm with the best feature selection algorithm yielded an accuracy of 81.50% and an AUC of 0.929, an improvement over the experiment using SVM without feature selection. Finally, testing the best feature selection algorithm for each classification algorithm showed that information gain gave the best results for NB, SVM, and ANN.
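Information gain, the best-performing feature selection method above, scores a term by how much knowing whether it occurs reduces the entropy of the class label: IG(t) = H(C) − [P(t)·H(C | t) + P(¬t)·H(C | ¬t)]. The sketch below is a minimal version for binary term presence over documents represented as sets of words; the toy documents and the top-k selection helper are assumptions, not the paper's setup.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the document'."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) + (len(without) / n) * entropy(without)
    return entropy(labels) - conditional


def select_top_k(docs, labels, k):
    """Keep the k terms with the highest IG (ties broken alphabetically)."""
    vocab = set().union(*docs)
    return sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))[:k]
```

On four toy reviews where "great" appears only in positive ones and "plot" appears equally in both classes, "great" gets an IG of 1 bit and "plot" gets 0, so only the former would survive a small k.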

    TESTING PORTER STEMMING IN A BOOK RANKING SYSTEM SCHEME BASED ON TESTIMONIALS USING SEMANTIC SIMILARITY

    Testimonial analysis is a form of opinion mining, or part of sentiment analysis, which can be defined as the process of automatically understanding, extracting, and processing textual data to obtain the sentiment contained in an opinion sentence. Testimonials about a product strongly influence purchasing decisions. The sentences in a testimonial may express a negative or a positive view. In this paper, sentiment analysis is performed to identify opinions, or the tendency of opinions, towards book products, that is, whether they tend to be negative or positive. The large impact and benefits of sentiment analysis have led to rapid growth in research and applications based on it. The first step of the methodology in this paper is preprocessing, which includes tokenizing and stemming the text; this is followed by building the corpus database and determining the ranking of the testimonial results. In this study, the Porter algorithm is used for stemming.
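The preprocessing step above (tokenizing, then stemming) can be sketched as follows. This is deliberately partial: it implements only Step 1a of the Porter stemmer (the plural-suffix rules), not the full multi-step algorithm, and the regex tokenizer that keeps lowercase ASCII letters only is an assumption.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (ASCII only)."""
    return re.findall(r"[a-z]+", text.lower())


def porter_step_1a(word):
    """Step 1a of the Porter stemmer: plural suffix rules, longest suffix first."""
    if word.endswith("sses"):
        return word[:-2]   # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # ponies -> poni
    if word.endswith("ss"):
        return word        # caress -> caress
    if word.endswith("s"):
        return word[:-1]   # cats -> cat
    return word


def preprocess(text):
    """Tokenize, then apply the (partial) stemmer to every token."""
    return [porter_step_1a(tok) for tok in tokenize(text)]
```

The remaining Porter steps (1b through 5) depend on the word's "measure" and vowel patterns and would be applied after this step in a full implementation.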

    Arabic opinion mining using combined classification approach

    In this paper, we present a combined approach that automatically extracts opinions from Arabic documents. Most research on opinion mining deals with English texts, and little work addresses Arabic text. Unlike for English, our experiments showed that using only one method on Arabic opinionated documents produces poor performance. We therefore used a combined approach consisting of three methods. First, a lexicon-based method is used to classify as many documents as possible. The documents it classifies are then used as the training set for a maximum entropy method, which subsequently classifies some of the remaining documents. Finally, a k-nearest neighbor method uses the documents classified by the lexicon-based and maximum entropy methods as its training set and classifies the rest of the documents. Our experiments showed that, on average, accuracy rose from about 50% when using only the lexicon-based method, to 60% when using the lexicon-based and maximum entropy methods together, to 80% when using all three methods combined.
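The three-stage cascade described above can be sketched as below, with each stage labeling only what it can and passing the rest on. Several details are assumptions: the lexicon is a hypothetical toy (English words stand in for Arabic ones), binary maximum entropy is implemented as logistic regression trained by gradient ascent, maximum entropy only "classifies some other documents" when its probability clears a hypothetical confidence threshold, and k-nearest neighbor uses Jaccard similarity with k = 3.

```python
import math
from collections import Counter

# Hypothetical toy lexicon; English words stand in for Arabic ones.
LEXICON = {"good": 1, "excellent": 1, "bad": -1, "terrible": -1}


def lexicon_label(doc):
    """Stage 1: label by summed lexicon score; None when the score is zero."""
    score = sum(LEXICON.get(w, 0) for w in doc)
    if score == 0:
        return None
    return 1 if score > 0 else 0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_maxent(docs, labels, epochs=200, lr=0.5):
    """Stage 2: binary maximum entropy (logistic regression), gradient ascent."""
    weights, bias = Counter(), 0.0
    for _ in range(epochs):
        for doc, y in zip(docs, labels):
            p = sigmoid(bias + sum(weights[w] for w in set(doc)))
            for w in set(doc):
                weights[w] += lr * (y - p)
            bias += lr * (y - p)
    return weights, bias


def maxent_prob(model, doc):
    weights, bias = model
    return sigmoid(bias + sum(weights[w] for w in set(doc)))


def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def knn_label(doc, train_docs, train_labels, k=3):
    """Stage 3: majority vote among the k most similar labeled documents."""
    ranked = sorted(zip(train_docs, train_labels), key=lambda p: -jaccard(doc, p[0]))
    return Counter(y for _, y in ranked[:k]).most_common(1)[0][0]


def cascade(docs, confidence=0.8, k=3):
    labels = [lexicon_label(d) for d in docs]
    done = [i for i, y in enumerate(labels) if y is not None]
    if len({labels[i] for i in done}) == 2:  # maxent needs both classes
        model = train_maxent([docs[i] for i in done], [labels[i] for i in done])
        for i, y in enumerate(labels):
            if y is None:
                p = maxent_prob(model, docs[i])
                if p >= confidence or p <= 1 - confidence:
                    labels[i] = int(p >= 0.5)
    # kNN trains only on stage-1 and stage-2 labels, then labels the rest.
    known = [i for i, y in enumerate(labels) if y is not None]
    for i, y in enumerate(labels):
        if y is None and known:
            labels[i] = knn_label(docs[i], [docs[j] for j in known], [labels[j] for j in known], k)
    return labels
```

Each stage only sees documents the earlier stages left unlabeled, which mirrors the abstract's point that no single method covered the Arabic documents well on its own.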