
    Multilingual opinion mining

    170 pp. A large amount of text is generated every day across different online media. Much of that text contains opinions about a multitude of entities, products, services, etc. Given the growing need for automated means to analyze, process, and exploit this information, sentiment analysis techniques have received a great deal of attention from industry and the scientific community over the last decade and a half. However, many of the techniques employed require supervised training on manually annotated examples, or other linguistic resources tied to a specific language or application domain. This limits the application of such techniques, since these resources and annotated examples are not easy to obtain. This thesis explores a series of methods for performing various automatic text analyses within the framework of sentiment analysis, including the automatic extraction of domain terms, opinion-bearing words, the sentiment polarity of those words (positive or negative), etc. Finally, a method is proposed and evaluated that combines continuous word embeddings and topic modelling inspired by Latent Dirichlet Allocation (LDA) to obtain an aspect-based sentiment analysis (ABSA) system that needs only a few seed words to process texts in a given language or domain. In this way, adapting to another language or domain is reduced to translating the corresponding seed words.

    Exploring Natural Language Processing and Sentence Embeddings for Sentiment Analysis of Online Restaurant Reviews

    This paper explores the application of Natural Language Processing (NLP) methods to sentiment analysis of restaurant reviews available online, for a sample of restaurants in the Algarve region. The primary objective was to develop an automated method that could efficiently extract and categorize relevant sentiments relating to five key attributes of customer satisfaction, namely food quality, service, ambience, price and the restaurant's location. The proposed method was compared against human classification benchmarks using the F1 score. The results showed that Universal Sentence Encoding (USE) was a suitable method for implementation due to its acceptable F1 score performance, ease of accessibility and reduced cost. The use of semantic embeddings can provide valuable insights from online reviews that could benefit restaurant management and, more generally, the data-driven decision-making processes of businesses in the gastronomic sector.
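The comparison against human labels described in this abstract rests on the F1 score, the harmonic mean of precision and recall. A minimal sketch of that computation for a single attribute follows. The label lists are invented for illustration and are not data from the paper, and `f1_score` is a hypothetical helper name rather than the authors' code.

```python
def f1_score(gold, predicted, positive_label):
    """F1 for one label, treating `gold` as the human reference annotation."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive_label and p == positive_label)
    fp = sum(1 for g, p in pairs if g != positive_label and p == positive_label)
    fn = sum(1 for g, p in pairs if g == positive_label and p != positive_label)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical attribute labels assigned to five review sentences.
human = ["food", "service", "price", "food", "location"]
model = ["food", "service", "food", "food", "price"]

score = f1_score(human, model, "food")
print(round(score, 3))
```

Scoring each of the five attributes separately and then averaging would give a macro-averaged figure. Whether the paper averages this way is not stated in the abstract.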

    Leveraging contextual embeddings and self-attention neural networks with bi-attention for sentiment analysis

    People express their opinions and views in different and often ambiguous ways, hence the meaning of their words is often not explicitly stated and frequently depends on the context. Therefore, it is difficult for machines to process and understand the information conveyed in human languages. This work addresses the problem of sentiment analysis (SA). We propose a simple yet comprehensive method which uses contextual embeddings and a self-attention mechanism to detect and classify sentiment. We perform experiments on reviews from different domains, as well as on languages from three different language families, including morphologically rich Polish and German. We show that our approach is on a par with state-of-the-art models or even outperforms them in several cases. Our work also demonstrates the superiority of models leveraging contextual embeddings. In sum, in this paper we make a step towards building a universal, multilingual sentiment classifier. Peer reviewed. Postprint (published version).

    Arabic Sentiment Analysis Based on Word Embeddings and Deep Learning

    Social media networks have grown exponentially over the last two decades, providing the opportunity for users of the internet to communicate and exchange ideas on a variety of topics. As a result, opinion mining plays a crucial role in analyzing user opinions and applying these to guide choices, making it one of the most popular areas of research in the field of natural language processing. Although several languages, including English, have been the subject of many studies, little work has been conducted on the Arabic language. The morphological complexities and various dialects of the language make semantic analysis particularly challenging. Moreover, the lack of accurate pre-processing tools and limited resources are constraining factors. This study was motivated by the accomplishments of deep learning algorithms and word embeddings in the field of English sentiment analysis. Extensive experiments were conducted based on supervised machine learning in which word embeddings were exploited to determine the sentiment of Arabic reviews. Three deep learning algorithms, convolutional neural networks (CNNs), long short-term memory (LSTM), and a hybrid CNN-LSTM, were introduced. The models used features learned by word embeddings such as Word2Vec and fastText rather than hand-crafted features. The models were tested using two benchmark Arabic datasets: Hotel Arabic Reviews Dataset (HARD) for hotel reviews and Large-Scale Arabic Book Reviews (LABR) for book reviews, with different setups. Comparative experiments utilized the three models with two word embeddings and different setups of the datasets. The main novelty of this study is to explore the effectiveness of using various word embeddings and different setups of benchmark datasets relating to balance, imbalance, and binary and multi-classification aspects. Findings showed that the best results were obtained in most cases when applying the fastText word embedding using the HARD 2-imbalance dataset for all three proposed models: CNN, LSTM, and CNN-LSTM. Further, the proposed CNN model outperformed the LSTM and CNN-LSTM models for the benchmark HARD dataset by achieving 94.69%, 94.63%, and 94.54% accuracy with fastText, respectively. Although the worst results were obtained for the LABR 3-imbalance dataset using both Word2Vec and fastText, they still outperformed other researchers’ state-of-the-art outcomes on the same dataset.

    Cross-lingual Argumentation Mining: Machine Translation (and a bit of Projection) is All You Need!

    Argumentation mining (AM) requires the identification of complex discourse structures and has lately been applied with success monolingually. In this work, we show that the existing resources are, however, not adequate for assessing cross-lingual AM, due to their heterogeneity or lack of complexity. We therefore create suitable parallel corpora by (human and machine) translating a popular AM dataset consisting of persuasive student essays into German, French, Spanish, and Chinese. We then compare (i) annotation projection and (ii) bilingual word embeddings based direct transfer strategies for cross-lingual AM, finding that the former performs considerably better and almost eliminates the loss from cross-lingual transfer. Moreover, we find that annotation projection works equally well when using either costly human or cheap machine translations. Our code and data are available at http://github.com/UKPLab/coling2018-xling_argument_mining. Comment: Accepted at Coling 201
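Annotation projection, as named in this abstract, transfers token-level labels from an annotated source sentence to its translation through word alignments. The sketch below shows the general idea only, not the authors' implementation. It assumes the alignment is already available as (source index, target index) pairs, and the labels, alignment, and `project_labels` helper are all invented for illustration.

```python
def project_labels(source_labels, alignment, target_length, default="O"):
    """Copy each source token's label to the target token it is aligned with.

    Target tokens with no aligned source token keep the `default` label.
    """
    target_labels = [default] * target_length
    for src_idx, tgt_idx in alignment:
        target_labels[tgt_idx] = source_labels[src_idx]
    return target_labels


# Hypothetical four-token source sentence with argument-component labels.
source_labels = ["O", "O", "Claim", "Claim"]
# Hypothetical word alignment; the first two tokens swap order in translation,
# and the fifth target token (index 4) has no source counterpart.
alignment = [(0, 1), (1, 0), (2, 2), (3, 3)]

projected = project_labels(source_labels, alignment, target_length=5)
print(projected)
```

A real pipeline would also have to resolve conflicts when several source tokens align to one target token and repair label spans broken by reordering. This sketch ignores both cases.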

    Sentiment Analysis of Afaan Oromoo Facebook Media Using Deep Learning Approach

    The rapid development and popularity of social media and social networks provide people with unprecedented opportunities to express and share their thoughts, views, opinions and feelings about almost anything through their personal webpages and blogs or using social network sites like Facebook, Twitter, and Blogger. This study focuses on sentiment analysis of social media content because automatically identifying and classifying opinions from social media posts can provide significant economic value and social benefits. The major problem with sentiment analysis of social media posts is that the content is extremely vast, fragmented, unorganized and unstructured. Nevertheless, many organizations and individuals are highly interested in knowing what other people think or feel about their services and products. Therefore, sentiment analysis has increasingly become a major area of research interest in the field of Natural Language Processing and Text Mining. In general, sentiment analysis is the process of automatically identifying and categorizing opinions in order to determine whether the writer's attitude towards a particular entity is positive or negative. To the best of the researcher’s knowledge, no deep learning approach has been applied to Afaan Oromoo sentiment analysis to identify people's opinions in social media content. Therefore, in this study, we investigated Convolutional Neural Network and Long Short Term Memory deep learning approaches for sentiment analysis of Afaan Oromoo social media content such as comments on Facebook posts. To this end, a total of 1452 comments were collected from the official Facebook page of the Oromo Democratic Party (ODP). After the data were collected, they were annotated manually. Preprocessing, normalization, tokenization, and stop word removal were then performed on the sentences. We used the Keras deep learning Python library to implement both deep learning algorithms. For both Long Short Term Memory and Convolutional Neural Network, we used word embeddings as features. We conducted our experiments on the selected classifiers, using an 80% training and 20% testing split. According to the experiments, the Convolutional Neural Network achieves an accuracy of 89% and the Long Short Term Memory achieves an accuracy of 87.6%. Even though the results are promising, there are still challenges. Keywords: Sentiment Analysis; Opinionated Afaan Oromoo facebook comments; Oromo Democratic Party Facebook page. DOI: 10.7176/NMMC/90-02. Publication date: May 31st 202

    Considerations about learning Word2Vec

    Despite the wide diffusion and use of embeddings generated through Word2Vec, there are still many open questions about the reasons for its results and about its real capabilities. In particular, to our knowledge, no author seems to have analysed in detail how learning may be affected by the various choices of hyperparameters. In this work, we try to shed some light on various issues focusing on a typical dataset. It is shown that the learning rate prevents the exact mapping of the co-occurrence matrix, that Word2Vec is unable to learn syntactic relationships, and that it does not suffer from the problem of overfitting. Furthermore, through the creation of an ad-hoc network, it is also shown how it is possible to improve Word2Vec directly on the analogies, obtaining very high accuracy without damaging the pre-existing embedding. This analogy-enhanced Word2Vec may be convenient in various NLP scenarios, but it is used here as an optimal starting point to evaluate the limits of Word2Vec.

    A FRAMEWORK FOR ARABIC SENTIMENT ANALYSIS USING MACHINE LEARNING CLASSIFIERS

    In recent years, the use of the Internet and of online comments expressed in natural language text has increased significantly. However, it is difficult for humans to read all these comments and classify them appropriately. Consequently, an automatic approach is required to classify the unstructured data. In this paper, we propose a framework for the Arabic language comprising three steps: pre-processing, feature extraction and machine learning classification. The main aim of the proposed framework is to exploit the combination of different Arabic linguistic features. We evaluate the framework using two benchmark Arabic tweets datasets (ASTD, ATA), which enable sentiment polarity detection in general Arabic and Jordanian dialects. Comparative simulation results show that machine learning classifiers such as Support Vector Machine (SVM), Naive Bayes, MultiLayer Perceptron (MLP) and Logistic Regression produce the best performance when using a combination of n-gram features from the Arabic tweets datasets. Finally, we evaluate the performance of our proposed framework using an Ensemble classifier approach, with promising results.
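The feature extraction step in this abstract relies on combinations of n-gram features. As a rough illustration of what such a combination looks like, the sketch below counts unigrams and bigrams over a whitespace-tokenized sentence. The tokenization, the English placeholder sentence, and the `ngram_counts` helper are assumptions made for readability; the framework itself targets Arabic tweets and its exact pre-processing is not given here.

```python
from collections import Counter


def ngram_counts(tokens, n_values=(1, 2)):
    """Count every contiguous n-gram of each size in `n_values`."""
    counts = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Placeholder sentence; a real pipeline would tokenize pre-processed Arabic text.
features = ngram_counts("the service was not good".split())
print(features["not good"], len(features))
```

The bigram "not good" carries polarity that the unigrams "not" and "good" lose when taken separately, which is one common motivation for combining n-gram sizes before passing the counts to a classifier.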