Cross-Lingual and Low-Resource Sentiment Analysis
Identifying sentiment in a low-resource language is essential for understanding opinions internationally and for responding to the urgent needs of locals affected by disaster incidents in different world regions. While tools and resources for recognizing sentiment in high-resource languages are plentiful, determining the most effective methods for achieving this task in a low-resource language which lacks annotated data is still an open research question. Most existing approaches for cross-lingual sentiment analysis to date have relied on high-resource machine translation systems, large amounts of parallel data, or resources only available for Indo-European languages.
This work presents methods, resources, and strategies for identifying sentiment cross-lingually in a low-resource language. We introduce a cross-lingual sentiment model which can be trained on a high-resource language and applied directly to a low-resource language. The model offers the feature of lexicalizing the training data using a bilingual dictionary, but can perform well without any translation into the target language.
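As a rough illustration of what lexicalizing training data with a bilingual dictionary could look like, the sketch below swaps each source-language token for a target-language translation when the dictionary has one and keeps the token otherwise. This is only one plausible reading of the idea, not the authors' implementation; the function names and the tiny English-to-Spanish dictionary are hypothetical.

```python
def lexicalize(tokens, bilingual_dict):
    """Replace each source token with its target-language translation, if any.

    Tokens missing from the dictionary are kept unchanged. Lookup is
    case-insensitive; this is an assumption of the sketch.
    """
    return [bilingual_dict.get(tok.lower(), tok) for tok in tokens]


def coverage(tokens, bilingual_dict):
    """Fraction of tokens that have a dictionary translation."""
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok.lower() in bilingual_dict)
    return hits / len(tokens)


# Hypothetical toy dictionary (English -> Spanish), for illustration only.
toy_dict = {"good": "bueno", "bad": "malo", "movie": "película"}

sentence = ["The", "movie", "was", "good"]
lexicalized = lexicalize(sentence, toy_dict)
```

A classifier trained on such partially translated sentences sees target-language words during training, which is the intuition behind the step. How well this works in practice depends on dictionary coverage.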
Through an extensive experimental analysis, evaluated on 17 target languages, we show that the model performs well with bilingual word vectors pre-trained on an appropriate translation corpus. We compare in-genre and in-domain parallel corpora, out-of-domain parallel corpora, in-domain comparable corpora, and monolingual corpora, and show that a relatively small, in-domain parallel corpus works best as a transfer medium if it is available. We describe the conditions under which other resources and embedding generation methods are successful, and these include our strategies for leveraging in-domain comparable corpora for cross-lingual sentiment analysis.
To enhance the ability of the cross-lingual model to identify sentiment in the target language, we present new feature representations for sentiment analysis that are incorporated in the cross-lingual model: bilingual sentiment embeddings that are used to create bilingual sentiment scores, and a method for updating the sentiment embeddings during training by lexicalization of the target language. This feature configuration works best for the largest number of target languages in both untargeted and targeted cross-lingual sentiment experiments.
The cross-lingual model is studied further by evaluating the role of the source language, which has traditionally been assumed to be English. We build cross-lingual models using 15 source languages, including two non-European and non-Indo-European source languages: Arabic and Chinese. We show that language families play an important role in the performance of the model, as does the morphological complexity of the source language.
In the last part of the work, we focus on sentiment analysis towards targets. We study Arabic as a representative morphologically complex language and develop models and morphological representation features for identifying entity targets and sentiment expressed towards them in Arabic open-domain text. Finally, we adapt our cross-lingual sentiment models for the detection of sentiment towards targets. Through cross-lingual experiments on Arabic and English, we demonstrate that our findings regarding resources, features, and language also hold true for the transfer of targeted sentiment.
Acoustic, Morphological, and Functional Aspects of 'yeah/ja' in Dutch, English and German
We explore different forms and functions of one of the most common feedback expressions in Dutch, English, and German, namely 'yeah/ja', which is known for its multi-functionality and ambiguous usage in dialog. For example, it can be used as a yes-answer, as a pure continuer, or as a way to show agreement. In addition, 'yeah/ja' can be used on its own, but it can also be combined with other particles to form multi-word expressions, especially in Dutch and German. We have found substantial differences on the morpho-lexical level between the three related languages, which add to the ambiguous character of 'yeah/ja'. An explorative analysis of the prosodic features of 'yeah/ja' has shown that mainly a higher intensity is used to signal speaker incipiency across the inspected languages.
Identifying sources of opinions with conditional random fields and extraction patterns
Journal Article. Recent systems have been developed for sentiment classification, opinion recognition, and opinion analysis (e.g., detecting polarity and strength). We pursue another aspect of opinion analysis: identifying the sources of opinions, emotions, and sentiments. We view this problem as an information extraction task and adopt a hybrid approach that combines Conditional Random Fields (Lafferty et al., 2001) and a variation of AutoSlog (Riloff, 1996a). While CRFs model source identification as a sequence tagging task, AutoSlog learns extraction patterns. Our results show that the combination of these two methods performs better than either one alone. The resulting system identifies opinion sources with 79.3% precision and 59.5% recall using a head noun matching measure, and 81.2% precision and 60.6% recall using an overlap measure.
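To make the idea of an overlap-based score concrete, here is a minimal sketch of precision and recall over token spans, where a predicted source span counts as correct if it shares at least one token with some gold span. The exact definition used in the paper is not given in the abstract, so this matching rule, the half-open `(start, end)` span convention, and the function names are assumptions.

```python
def overlaps(a, b):
    """True if half-open token spans a=(start, end) and b share a token."""
    return a[0] < b[1] and b[0] < a[1]


def overlap_precision_recall(predicted, gold):
    """Precision and recall under a lenient span-overlap criterion.

    Precision: share of predicted spans overlapping any gold span.
    Recall: share of gold spans overlapped by any predicted span.
    """
    correct_pred = sum(any(overlaps(p, g) for g in gold) for p in predicted)
    found_gold = sum(any(overlaps(g, p) for p in predicted) for g in gold)
    precision = correct_pred / len(predicted) if predicted else 0.0
    recall = found_gold / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical spans, for illustration only.
predicted_spans = [(0, 2), (5, 7), (10, 11)]
gold_spans = [(1, 3), (6, 8), (12, 14), (20, 22)]
precision, recall = overlap_precision_recall(predicted_spans, gold_spans)
```

A stricter criterion, such as requiring the head noun of the predicted span to match the gold head noun, would typically give scores no higher than the overlap criterion, which is consistent with the two sets of figures reported above.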
A survey on sentiment analysis in Urdu: A resource-poor language
© 2020. Background/introduction: The dawn of the internet opened the doors to the easy and widespread sharing of information on subject matters such as products, services, events and political opinions. While the volume of studies conducted on sentiment analysis is rapidly expanding, these studies mostly address the English language. The primary goal of this study is to present a state-of-the-art survey that identifies the progress and shortcomings of Urdu sentiment analysis and proposes rectifications. Methods: We describe the advancements made thus far in this area by categorising the studies along three dimensions, namely text pre-processing, lexical resources, and sentiment classification. The pre-processing operations include word segmentation, text cleaning, spell checking and part-of-speech tagging. An evaluation of sophisticated lexical resources, including corpora and lexicons, was carried out, and investigations were conducted on sentiment analysis constructs such as opinion words, modifiers, and negations. Results and conclusions: Performance is reported for each of the reviewed studies. The experimental results and proposals put forward in this paper provide the groundwork for further studies on Urdu sentiment analysis.
Analysis and Creation of Free Sentiment Analysis Programs
This paper analyzes free online programs for sentiment analysis which can, on the basis of their algorithm, give a positive, negative or neutral opinion of a text. The paper begins by presenting sentiment analysis programs and the techniques they use, such as Naive Bayes and Recurrent Neural Networks. The programs are divided into two categories for analysis. The first category consists of sentiment analysis programs which analyze texts written or copied inside the user interface. The second category consists of programs for analyzing opinions posted on social networks, blogs, and other media sites. Programs from both categories were chosen for this research on the basis of positive reviews on computer science portals and their popularity on web search engines such as Google and Bing. The accuracy of the programs from the first category was checked by inserting the same sentence from movie reviews and comparing the results. Their additional options have also been analyzed. For the second category of programs, it was determined which social networks, blogs, and other social media they cover on the internet. The purpose of this analysis was to check the overall quality and options that free sentiment analysis programs provide. An example of how to create one's own custom sentiment analyzer by using the available Python code and libraries found online is also given. Two simple programs were created using Python. The first program belongs to the first category of programs for analyzing an input text. This program serves as a pilot program for Croatian which gives only the basic analysis of sentences. The second program collects recent tweets from Twitter containing certain words and creates a pie chart based on the analysis of the results.
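For readers unfamiliar with the Naive Bayes technique mentioned above, the following is a minimal sketch of a multinomial Naive Bayes sentiment classifier with add-one smoothing, written with the Python standard library only. It is not the program described in the paper: the class name, the whitespace tokenizer, the two-label setup (no neutral class), and the four toy training sentences are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over whitespace tokens (illustrative sketch)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of texts
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.doc_counts[label] += 1
            for tok in text.lower().split():
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in text.lower().split():
                if tok not in self.vocab:
                    continue  # ignore tokens never seen in training
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data, for illustration only.
train_texts = [
    "a great and moving film",
    "wonderful acting and a great story",
    "a boring and terrible film",
    "awful plot and terrible acting",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
```

With this toy data, `model.predict("great story")` returns `"positive"` and `model.predict("terrible plot")` returns `"negative"`. A usable analyzer would need far more training text, proper tokenization, and for Croatian some handling of inflected word forms.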