    Opinion Mining on Non-English Short Text

    As the type and the number of such venues increase, automated analysis of sentiment in textual resources has become an essential data mining task. In this paper, we investigate the problem of mining opinions from a collection of informal short texts. Both the positive and the negative sentiment strength of texts are detected. We focus on a non-English language that has few resources for text mining. This approach can help improve sentiment analysis in languages for which no list of opinionated words exists. We propose a new method that projects the text into dense, low-dimensional feature vectors according to the sentiment strength of its words. We detect the mixture of positive and negative sentiments on a multi-variant scale. Empirical evaluation of the proposed framework on Turkish tweets shows that our approach achieves good results for opinion mining.
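
    The abstract does not spell out the projection, so the following is only a minimal sketch under stated assumptions: a hypothetical lexicon `LEXICON` maps words to signed strength scores in [-5, 5], and a hypothetical function `strength_vector` turns a short text into a small fixed-size vector that keeps positive and negative strength separate, so a mixed-sentiment tweet is not collapsed into a single score. The entries are illustrative English words, not the authors' Turkish resources.

```python
from collections import Counter

# Hypothetical sentiment-strength lexicon: word -> signed strength in [-5, 5].
# Illustrative English entries only; not the resources used in the paper.
LEXICON = {
    "love": 4, "great": 3, "good": 2, "ok": 1,
    "meh": -1, "bad": -2, "awful": -4, "hate": -5,
}


def strength_vector(text: str, max_strength: int = 5) -> list[float]:
    """Project a short text onto a dense, low-dimensional vector.

    Layout (2 * max_strength + 2 dimensions):
      first max_strength entries: share of tokens with positive strength 1..max
      next max_strength entries:  share of tokens with negative strength 1..max
      last two entries:           strongest positive / negative strength seen
    """
    tokens = text.lower().split()
    if not tokens:
        return [0.0] * (2 * max_strength + 2)
    pos, neg = Counter(), Counter()
    for tok in tokens:
        s = LEXICON.get(tok.strip(".,!?"), 0)
        if s > 0:
            pos[min(s, max_strength)] += 1
        elif s < 0:
            neg[min(-s, max_strength)] += 1
    n = len(tokens)
    vec = [pos[k] / n for k in range(1, max_strength + 1)]
    vec += [neg[k] / n for k in range(1, max_strength + 1)]
    vec.append(float(max(pos, default=0)))  # keys are strength levels
    vec.append(float(max(neg, default=0)))
    return vec


# A tweet with both strong praise and strong criticism keeps both signals.
vec = strength_vector("I love it but the ending was awful")
```

    Keeping the two polarities in separate dimensions is what lets a downstream model read off a mixture of positive and negative sentiment rather than a single net score.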

    Computational Sociolinguistics: A Survey

    Language is a social phenomenon and variation is inherent to its social nature. Recently, there has been a surge of interest within the computational linguistics (CL) community in the social dimension of language. In this article we present a survey of the emerging field of "Computational Sociolinguistics" that reflects this increased interest. We aim to provide a comprehensive overview of CL research on sociolinguistic themes, featuring topics such as the relation between language and social identity, language use in social interaction, and multilingual communication. Moreover, we demonstrate the potential for synergy between the research communities involved, by showing how the large-scale data-driven methods that are widely used in CL can complement existing sociolinguistic studies, and how sociolinguistics can inform and challenge the methods and assumptions employed in CL studies. We hope to convey the possible benefits of a closer collaboration between the two communities and conclude with a discussion of open challenges. Comment: To appear in Computational Linguistics. Accepted for publication: 18th February, 201

    Personality Analysis Using Classification on Turkish Tweets

    According to the psychology literature, there is a strong correlation between personality traits and the linguistic behavior of people. Due to the increase in computer-based communication, individuals express their personalities in written form on social media. Hence, social media has become a convenient resource for analyzing the relationship between personality traits and linguistic behavior. Although there is a vast number of studies on social media, only a small number of them focus on personality prediction. In this work, the authors aim to model the relationship between the social media messages of individuals and the Big Five personality traits as a supervised learning problem. They use Twitter posts and user statistics for the analysis. They investigate various approaches to user profile representation, explore several supervised learning techniques, and present comparative analysis results. The results confirm the findings of the psychology literature and show that computational analysis of tweets using supervised learning methods can be used to determine the personality of individuals.
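
    To make "user profile representation" concrete, here is a minimal sketch, not the authors' method: each user's tweets and follower statistics are reduced to a hypothetical three-feature vector (`profile`), and a nearest-centroid classifier, a deliberately simple stand-in for the supervised learners the abstract mentions, predicts a binary high/low label for one trait such as extraversion. The cue-word lists, the `User` fields, and the toy users are all assumptions for illustration.

```python
from dataclasses import dataclass
from math import dist

# Hypothetical cue-word lists; not taken from the paper.
SOCIAL_WORDS = {"we", "us", "friends", "party", "together"}
FIRST_PERSON = {"i", "me", "my", "mine"}


@dataclass
class User:
    tweets: list[str]
    followers: int
    following: int


def profile(user: User) -> list[float]:
    """Represent a user as three features, each in [0, 1]."""
    tokens = [t.lower() for tweet in user.tweets for t in tweet.split()]
    n = max(len(tokens), 1)
    social_rate = sum(t in SOCIAL_WORDS for t in tokens) / n
    self_rate = sum(t in FIRST_PERSON for t in tokens) / n
    total = user.followers + user.following
    follower_share = user.followers / total if total else 0.0
    return [social_rate, self_rate, follower_share]


def fit_centroids(profiles, labels):
    """Nearest-centroid training: average the profile vectors per label."""
    sums, counts = {}, {}
    for x, label in zip(profiles, labels):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


# Toy usage with invented users labelled high/low on a single trait.
users = [
    User(["party with friends tonight", "we had fun together"], 300, 100),
    User(["i stayed home", "my book and me"], 50, 150),
]
centroids = fit_centroids([profile(u) for u in users], ["high", "low"])
label = predict(centroids, profile(User(["friends party"], 200, 100)))  # "high"
```

    All features are bounded in [0, 1] so that no single statistic dominates the Euclidean distance; with richer, unbounded features one would standardize them first.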

    An Experimental Study on Sentiment Classification of Moroccan dialect texts in the web

    With the rapid growth in the use of social media websites, automatically obtaining users' feedback has become a crucial task for evaluating their tendencies and behaviors online. Despite this great availability of information and the increasing number of Arabic users, only a few studies have addressed Arabic dialects. The purpose of this paper is to study the opinions and emotions expressed in real Moroccan texts, specifically in YouTube comments, using well-known and commonly used sentiment analysis methods. We present our work on classifying Moroccan dialect comments with Machine Learning (ML) models, based on a YouTube Moroccan dialect dataset that we collected and manually annotated. Employing several text preprocessing and data representation techniques, we compare the classification results of the most commonly used supervised classifiers, k-nearest neighbors (KNN), Support Vector Machine (SVM), and Naive Bayes (NB), with those of deep learning (DL) classifiers such as Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). Experiments were performed on both raw and preprocessed data to show the importance of preprocessing. The experimental results show that DL models perform better on the Moroccan dialect than classical approaches, reaching an accuracy of 90%. Comment: 13 pages, 5 tables, 2 figures
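
    As a rough illustration of one of the classical baselines named above, the sketch below implements multinomial Naive Bayes with Laplace smoothing using only the standard library, together with a hypothetical `preprocess` step (lowercasing, URL removal, squeezing elongated letters). It is not the paper's pipeline, and the toy English comments merely stand in for the annotated Moroccan data, which is not reproduced here.

```python
import math
import re
from collections import Counter, defaultdict


def preprocess(text: str) -> list[str]:
    """Illustrative cleanup: lowercase, drop URLs, squeeze letter repeats."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)  # "soooo" -> "soo"
    # Letters only (Unicode-aware, so Arabic script is kept as well).
    return re.findall(r"[^\W\d_]+", text)


class MultinomialNB:
    def fit(self, docs, labels, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, y in zip(docs, labels):
            self.word_counts[y].update(preprocess(doc))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        self.totals = {y: sum(c.values()) for y, c in self.word_counts.items()}
        return self

    def log_scores(self, doc):
        """Log prior plus smoothed log likelihood of each known token."""
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for y in self.class_counts:
            s = math.log(self.class_counts[y] / n_docs)
            for w in preprocess(doc):
                if w not in self.vocab:
                    continue  # ignore words never seen in training
                num = self.word_counts[y][w] + self.alpha
                den = self.totals[y] + self.alpha * v
                s += math.log(num / den)
            scores[y] = s
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Toy usage with invented comments.
docs = ["great video, loved it", "loved this so much",
        "terrible and boring", "boring video, terrible"]
nb = MultinomialNB().fit(docs, ["pos", "pos", "neg", "neg"])
```

    Working in log space avoids underflow on long comments, and skipping out-of-vocabulary tokens is one simple choice for the noisy spelling typical of dialect text.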

    Neural Approaches to Relational Aspect-Based Sentiment Analysis. Exploring generalizations across words and languages

    Jebbara S. Neural Approaches to Relational Aspect-Based Sentiment Analysis. Exploring generalizations across words and languages. Bielefeld: Universität Bielefeld; 2020. Every day, vast amounts of unstructured textual data are shared online in digital form. Websites such as forums, social media sites, review sites, blogs, and comment sections offer platforms to express and discuss opinions and experiences. Understanding the opinions in these resources is valuable for businesses, e.g., to support market research and customer service, but also for individuals, who can benefit from the experiences and expertise of others. In this thesis, we approach the topic of opinion extraction and classification with neural network models. We regard this area of sentiment analysis as a relation extraction problem in which the sentiment of some opinion holder towards a certain aspect of a product, theme, or event needs to be extracted. In accordance with this framework, our main contributions are the following: 1. We propose a full system addressing all subtasks of relational sentiment analysis. 2. We investigate how semantic web resources can be leveraged in a neural-network-based model for the extraction of opinion targets and the classification of sentiment labels. Specifically, we experiment with enhancing pretrained word embeddings using the lexical resource WordNet. Furthermore, we enrich a purely text-based model with SenticNet concepts and observe an improvement in sentiment classification. 3. We examine how opinion targets can be automatically identified in noisy texts. Customer reviews, for instance, are prone to contain misspelled words and are difficult to process due to their domain-specific language. We integrate information about the character structure of a word into a sequence labeling system using character-level word embeddings and show their positive impact on the system's performance. We reveal encoded character patterns of the learned embeddings and give a nuanced view of the obtained performance differences. 4. Opinion target extraction usually relies on supervised learning approaches. We address the lack of available annotated data for specific languages by proposing a zero-shot cross-lingual approach for the extraction of opinion target expressions. We leverage multilingual word embeddings that share a common vector space across various languages and incorporate them into a convolutional neural network architecture. Our experiments with 5 languages give promising results: we can successfully train a model on annotated data in a source language and make accurate predictions in a target language without ever using any annotated samples in that target language.
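
    The key idea behind contribution 4 is that a classifier trained on source-language word vectors can be applied unchanged to target-language words when both live in one shared embedding space. The sketch below shows only that idea, under heavy simplifications: the 3-dimensional vectors in `EMB` are invented so that translations sit close together, and a per-token logistic regression stands in for the thesis's convolutional sequence labeler.

```python
import math

# Hypothetical 3-d "multilingual" embeddings in which translations lie close
# together. Real systems use pretrained aligned embeddings; these are invented.
EMB = {
    # English (source language, annotated)
    "battery": [0.9, 0.1, 0.0], "screen": [0.8, 0.2, 0.1],
    "the": [0.0, 0.9, 0.1], "is": [0.1, 0.8, 0.0], "great": [0.1, 0.1, 0.9],
    # German (target language, never annotated)
    "akku": [0.88, 0.12, 0.02], "bildschirm": [0.82, 0.18, 0.08],
    "der": [0.02, 0.88, 0.1], "ist": [0.1, 0.82, 0.02], "toll": [0.12, 0.1, 0.88],
}


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(tokens, labels, epochs=200, lr=0.5):
    """Per-token logistic regression (1 = opinion target) trained with SGD."""
    w, b = [0.0] * len(EMB[tokens[0]]), 0.0
    for _ in range(epochs):
        for tok, y in zip(tokens, labels):
            x = EMB[tok]
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of the log-loss with respect to the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def predict_targets(model, tokens, threshold=0.5):
    """Return the tokens the model tags as opinion targets."""
    w, b = model
    return [tok for tok in tokens
            if sigmoid(sum(wi * xi for wi, xi in zip(w, EMB[tok])) + b) > threshold]


# Train on English only, then tag German without any German labels.
src = "the battery is great the screen is great".split()
model = train(src, [0, 1, 0, 0, 0, 1, 0, 0])
german = predict_targets(model, "der akku ist toll".split())
```

    The transfer works here only because the invented German vectors were placed next to their English translations; with real embeddings, the quality of the cross-lingual alignment bounds how well this zero-shot setup can do.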

    Building Phrase Polarity Lexicons for Sentiment Analysis

    Many approaches to sentiment analysis benefit from polarity lexicons. Most polarity lexicons include a list of polar (positive/negative) words, and sentiment analysis systems use these lexicons to capture the occurrence of those words in text. Although polarity lexicons exist for many natural languages, most languages lack phrase polarity lexicons. Phrases play an important role in sentiment analysis because the polarity of a phrase cannot always be estimated from the polarity of its parts. In this work, a hybrid approach is proposed for building phrase polarity lexicons, and it is evaluated on Turkish as a low-resource language. The classification accuracies obtained in extracting phrases and classifying them as positive, negative, or neutral confirm the effectiveness of the proposed methodology.
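
    The abstract's point that a phrase's polarity is not always the sum of its parts can be shown with a lookup that prefers the longest phrase-lexicon match and falls back to word polarities. This is a minimal sketch of how a phrase lexicon would be used, not of how the paper builds one; both lexicons and their English entries are hypothetical.

```python
# Hypothetical lexicons: illustrative English entries, not the paper's Turkish ones.
WORD_POLARITY = {"bad": -1, "good": 1, "killer": -1, "happy": 1}
PHRASE_POLARITY = {("not", "bad"): 1, ("killer", "app"): 1, ("not", "good"): -1}
MAX_PHRASE_LEN = max(len(p) for p in PHRASE_POLARITY)


def polarity(text: str) -> int:
    """Sum polarities, preferring the longest phrase-lexicon match at each position."""
    tokens = text.lower().split()
    score, i = 0, 0
    while i < len(tokens):
        # Try phrases from longest to shortest (length >= 2).
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 1, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in PHRASE_POLARITY:
                score += PHRASE_POLARITY[phrase]
                i += n
                break
        else:
            # No phrase matched: fall back to the single-word lexicon.
            score += WORD_POLARITY.get(tokens[i], 0)
            i += 1
    return score


print(polarity("not bad at all"))  # 1, although "bad" alone is negative
```

    A word-only lexicon would score "not bad at all" as negative and "killer app" as negative; the phrase entries override the compositional guess exactly where it fails.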

    Analysis of Students Emotion for Twitter Data using Naïve Bayes and Non Linear Support Vector Machine Approaches

    Students' informal discussions on social media (e.g., Twitter, Facebook) shed light on their educational experiences: their opinions, feelings, and concerns about the learning process. Data from such environments can provide valuable knowledge about students' learning. Examining such data, however, can be challenging. The complexity of students' experiences reflected in social media content requires human analysis, yet the growing scale of the data demands automatic data analysis techniques. This work focuses on engineering students' posts on Twitter in order to understand the issues and problems in their educational experiences, and analysis is conducted on samples of tweets related to engineering students' college life. The proposed work explores engineering students' informal conversations on Twitter to understand the issues and problems students encounter in their learning experiences. Problems encountered by engineering students, such as heavy study load, lack of social engagement, and sleep deprivation, are used as labels. To classify tweets reflecting students' problems, multi-label classification algorithms are implemented. Non-linear Support Vector Machine, Naïve Bayes, and linear Support Vector Machine methods are used as multi-label classifiers, implemented and compared in terms of accuracy. The non-linear SVM achieved higher accuracy than the Naïve Bayes and linear SVM classifiers. The algorithms are used to train a detector of student problems from tweets. DOI: 10.17762/ijritcc2321-8169.150515
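
    One common way to turn single-label classifiers into the multi-label setting described above is binary relevance: train one independent yes/no classifier per label and return every label whose classifier fires. The sketch below assumes that decomposition (the abstract does not say which one was used) and swaps in a bag-of-words perceptron for the SVMs so it stays standard-library only; the label names come from the abstract, while the training tweets are invented.

```python
from collections import defaultdict

LABELS = ["heavy study load", "lack of social engagement", "sleep deprivation"]


def features(text: str) -> set[str]:
    """Binary bag-of-words features."""
    return set(text.lower().split())


class BinaryRelevancePerceptron:
    """One independent linear classifier per label (binary relevance).

    A perceptron stands in for the SVMs named in the abstract.
    """

    def __init__(self, labels, epochs=10):
        self.labels = labels
        self.epochs = epochs
        self.w = {lab: defaultdict(float) for lab in labels}
        self.b = {lab: 0.0 for lab in labels}

    def _score(self, lab, feats):
        return sum(self.w[lab][f] for f in feats) + self.b[lab]

    def fit(self, texts, label_sets):
        for _ in range(self.epochs):
            for text, gold in zip(texts, label_sets):
                feats = features(text)
                for lab in self.labels:
                    y = 1 if lab in gold else -1
                    if y * self._score(lab, feats) <= 0:  # mistake: update
                        for f in feats:
                            self.w[lab][f] += y
                        self.b[lab] += y
        return self

    def predict(self, text):
        feats = features(text)
        return {lab for lab in self.labels if self._score(lab, feats) > 0}


# Invented training tweets; one tweet may carry several labels.
texts = [
    "three exams and two projects due tomorrow",
    "no time to hang out with friends",
    "awake at 4am again finishing homework",
    "homework all weekend and i miss my friends",
]
gold = [
    {"heavy study load"},
    {"lack of social engagement"},
    {"sleep deprivation", "heavy study load"},
    {"heavy study load", "lack of social engagement"},
]
clf = BinaryRelevancePerceptron(LABELS).fit(texts, gold)
```

    Binary relevance ignores correlations between labels (for example, sleep deprivation co-occurring with heavy study load), which keeps it simple but may miss signal that label-aware methods could exploit.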