867 research outputs found

    On the Logistical Difficulties and Findings of Jopara Sentiment Analysis

    This paper addresses the problem of sentiment analysis for Jopara, a code-switching language between Guarani and Spanish. We first collect a corpus of Guarani-dominant tweets and discuss the difficulties of finding quality data for even relatively easy-to-annotate tasks, such as sentiment analysis. Then, we train a set of neural models, including pre-trained language models, and explore whether they perform better than traditional machine learning ones in this low-resource setup. Transformer architectures obtain the best results, despite not considering Guarani during pre-training, but traditional machine learning models perform nearly as well due to the low-resource nature of the problem. DV is supported by a 2020 Leonardo Grant for Researchers and Cultural Creators from the FBBVA. DV also receives funding from MINECO (ANSWER-ASAP, TIN2017-85160-C2-1-R), from Xunta de Galicia (ED431C 2020/11), and from Centro de Investigación de Galicia 'CITIC', funded by Xunta de Galicia and the European Union (European Regional Development Fund - Galicia 2014-2020 Program) through grant ED431G 2019/01. https://aclanthology.org/2021.calcs-

    Text pre-processing of multilingual for sentiment analysis based on social network data

    Sentiment analysis (SA) is an enduring area of research, especially in the field of text analysis. Text pre-processing is an important step in performing SA accurately. This paper presents a text processing model for SA that applies natural language processing techniques to Twitter data. The basic phases for machine learning are text collection, text cleaning, pre-processing, feature extraction, and categorization of the data according to the SA technique. Keeping the focus on Twitter data, the data is extracted in a domain-specific manner. In the data cleaning phase, noisy data, missing data, punctuation, tags and emoticons are considered. For pre-processing, tokenization is performed, followed by stop word removal (SWR). The article provides insight into the techniques used for text pre-processing and the impact of their presence on the dataset. After applying text pre-processing, the accuracy of the classification techniques improved and dimensionality was reduced. The proposed corpus can be used in market analysis, customer behaviour, polling analysis, and brand monitoring. The text pre-processing process can serve as a baseline for applying predictive analysis, machine learning and deep learning algorithms, which can be extended according to the problem definition.
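
    The cleaning, tokenization and stop word removal phases named in this abstract can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the stop-word list, the regular expressions and the function names (`clean`, `tokenize`, `remove_stop_words`, `preprocess`) are all assumptions made for the example.

```python
import re

# Tiny illustrative stop-word list (an assumption; the abstract does not give one).
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "or", "to", "of", "in", "on", "for", "it", "this"}

URL_RE = re.compile(r"https?://\S+")   # links
MENTION_RE = re.compile(r"@\w+")       # user tags such as @name
NON_WORD_RE = re.compile(r"[^\w\s]")   # punctuation, '#', emoticon characters


def clean(text):
    """Cleaning phase: drop links and user tags, then punctuation and emoticons."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text)  # '#happy' becomes ' happy'
    return text.lower()


def tokenize(text):
    """Tokenization phase: plain whitespace split (a deliberate simplification)."""
    return text.split()


def remove_stop_words(tokens):
    """Stop word removal (SWR) phase."""
    return [t for t in tokens if t not in STOP_WORDS]


def preprocess(tweet):
    """Run the three phases in order."""
    return remove_stop_words(tokenize(clean(tweet)))


print(preprocess("@shop The new phone is great!!! :) #happy https://t.co/x"))
```

    Dropping stop words and punctuation shrinks the vocabulary, which is one way the dimensionality reduction mentioned in the abstract can come about.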

    Sentiment Analysis in Marathi Language

    Sentiment analysis has become indispensable in the current era. The internet is growing day by day, and nowadays almost everything is online: we can shop, buy, and sell online. People can give feedback and opinions on the internet, and customers can compare products by analyzing product reviews. As more and more people from different age groups and language backgrounds become internet users, sentiment analysis is needed in regional languages. To date, most work on sentiment analysis has been done for English. For Indian languages, little research has been done except for a few languages. This paper focuses on performing sentiment analysis in one of the Indian languages, namely Marathi.

    Leveraging writing systems changes for deep learning based Chinese affective analysis

    Affective analysis of social media text is in great demand. Online text written in Chinese communities often contains mixed scripts including major text written in Chinese, an ideograph-based writing system, and minor text using Latin letters, an alphabet-based writing system. This phenomenon is referred to as writing systems changes (WSCs). Past studies have shown that WSCs often reflect unfiltered immediate affections. However, the use of WSCs poses more challenges in Natural Language Processing tasks because WSCs can break the syntax of the major text. In this work, we use WSCs as an effective feature in a hybrid deep learning model with an attention network. The WSC scripts are first identified by their encoding range. Then, the document representation of the text is learned through a Long Short-Term Memory model and the minor text is learned by a separate Convolutional Neural Network model. To further highlight the WSC components, an attention mechanism is adopted to re-weight the feature vector before the classification layer. Experiments show that the proposed hybrid deep learning method, which better incorporates WSC features, can further improve performance compared to state-of-the-art classification models. The experimental results indicate that WSCs can serve as effective information in affective analysis of social media text.

    Cross-Platform Text Mining and Natural Language Processing Interoperability - Proceedings of the LREC2016 conference

    No abstract available
