458 research outputs found

    Sentiment Analysis in Social Streams

    In this chapter, we review and discuss the state of the art on sentiment analysis in social streams (such as web forums, microblogging systems, and social networks), aiming to clarify how user opinions, affective states, and intended emotional effects are extracted from user-generated content, how they are modeled, and how they could finally be exploited. We explain why sentiment analysis tasks are more difficult for social streams than for other textual sources, and why they entail going beyond classic text-based opinion mining techniques. We show, for example, that social streams may use vocabularies and expressions that exist outside the mainstream of standard, formal languages, and may reflect complex dynamics in the opinions and sentiments expressed by individuals and communities.


    The widespread use of social media platforms such as Twitter, Instagram, Facebook, and LinkedIn has had a huge impact on daily human interactions and decision-making. Owing to Twitter's widespread acceptance, users can express their opinions and sentiments on nearly any issue, ranging from public opinion and products or services to specific groups of people. Sharing these opinions and sentiments results in a massive production of user content, known as tweets, which can be assessed to generate new knowledge. Corporate insights, government policy formation, decision-making, and brand identity monitoring all benefit from analyzing the opinions and sentiments expressed in these tweets. Even though several techniques have been created to analyze user sentiments from tweets, social media engagements include negation words and emoji elements that, if not properly pre-processed, result in misclassification. The majority of available pre-processing techniques rely on clean data and machine learning algorithms to annotate sentiment in unlabeled texts. In this study, we propose a text pre-processing approach that takes negation words and emoji characteristics into consideration by translating these features into single contextual words in tweets to minimize context loss. The proposed preprocessor was evaluated on benchmark Twitter datasets using four deep learning algorithms: Long Short-Term Memory (LSTM), Recurrent Neural Network (RNN), and Artificial Neural Network (ANN). The results showed that LSTM performed better than the approaches already discussed in the literature, with accuracies of 96.36%, 88.41%, and 95.39%. The findings also suggest that pre-processing information such as emoji and explicit word negations aids in the preservation of sentiment information. This appears to be the first study to classify sentiments in tweets while accounting for both explicit word negation conversion and emoji translation.
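The pre-processing idea described in this abstract (turning emoji and explicit negations into single contextual words) can be illustrated with a minimal Python sketch. The lookup tables `EMOJI_WORDS`, `NEGATORS`, and `ANTONYMS`, the `not_` fallback token, and the function name `preprocess` are all assumptions made for illustration; the abstract does not specify the study's actual resources or rules.

```python
import re

# Hypothetical, hand-made lookup tables for illustration only; the study's
# actual emoji and negation resources are not given in the abstract.
EMOJI_WORDS = {"😀": "happy", "😢": "sad", "😡": "angry", "👍": "good"}
NEGATORS = {"not", "no", "never"}
ANTONYMS = {"good": "bad", "happy": "unhappy", "like": "dislike"}


def preprocess(tweet: str) -> list[str]:
    """Replace emoji with words and fold 'negator + word' into one token."""
    text = tweet.lower()
    # Translate each known emoji into a plain word, padded with spaces.
    for emoji, word in EMOJI_WORDS.items():
        text = text.replace(emoji, f" {word} ")
    tokens = re.findall(r"[a-z']+", text)

    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in NEGATORS and i + 1 < len(tokens):
            nxt = tokens[i + 1]
            # Use a known antonym if there is one; otherwise join the pair
            # into a single token so the negation is not lost (assumed rule).
            out.append(ANTONYMS.get(nxt, f"not_{nxt}"))
            i += 2
        else:
            out.append(tok)
            i += 1
    return out


print(preprocess("I am not happy 😢"))
```

Folding the negator into the following word keeps the negation attached to the term it modifies, which is one plausible way to limit context loss before the tokens reach a classifier.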

    Sentiment Analysis on Twitter Data and Social Trends: The Case of Greek General Elections

    Sentiment analysis and opinion mining involve using natural language processing and various techniques (machine learning, lexicons) to identify and extract subjective information from text data. They are commonly used to determine the overall sentiment of a piece of text, such as whether it is positive, negative, or neutral.
The purpose of the present thesis is to analyze sentiment in Twitter data. More specifically, a lexicon-based approach was implemented to analyze sentiment in tweet texts related to the 2019 general elections in Greece. The tweets are in the Greek language and are classified as positive, negative, or neutral based on the overall sentiment they express. Sentiment analysis of the datasets, implemented in the Python programming language, yields insights and conclusions about the social trends that developed on pre-election Twitter in relation to the six (6) political parties that elected Members of Parliament (MPs) in the 2019 elections. The results are presented with visualizations built with the Tableau tool, to support a clearer and more complete understanding. In addition to the description of the implementation, the main challenges, limitations, and difficulties encountered in processing the Greek language are presented. Finally, the thesis points out aspects of sentiment analysis and opinion mining that need improvement, both in the proposed implementation and in other existing ones.
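A lexicon-based classifier of the kind this abstract describes can be sketched in a few lines of Python. The toy `LEXICON`, its scores, and the accent-stripping step are assumptions made for illustration; the thesis's actual Greek lexicon, normalization, and scoring rules are not given in the abstract.

```python
import re
import unicodedata

# Hypothetical toy lexicon keyed by lowercase, accent-free word forms.
# The thesis's real lexicon is not specified in the abstract.
LEXICON = {"καλο": 1, "τελειο": 1, "κακο": -1, "ντροπη": -1}


def normalize(text: str) -> str:
    """Lowercase the text and strip accents (combining marks)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def classify(tweet: str) -> str:
    """Sum the lexicon scores of all tokens and map the total to a label."""
    tokens = re.findall(r"\w+", normalize(tweet))
    score = sum(LEXICON.get(tok, 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("Τέλειο αποτέλεσμα!"))
```

Stripping accents before lookup is one simple way to cope with inconsistent accentuation in informal Greek text; a real system would also need to handle inflection, which this sketch ignores.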

    Second language pragmatic ability: Individual differences according to environment

    The aims of this paper are to review research literature on the role that the second language (L2) and foreign language (FL) environments actually play in the development of learners’ target language (TL) pragmatic ability, and also to speculate as to the extent to which individual factors can offset the advantages that learners may have by being in the L2 context while they are learning. The paper starts by defining pragmatics and by problematizing this definition. Then, attention is given to research literature dealing with the learning of pragmatics in an L2 context compared to an FL context. Next, studies on the role of pragmatic transfer are considered, with subsequent attention given to the literature on the incidence of pragmatic transfer in FL as opposed to L2 contexts. Finally, selected studies on the role of motivation in the development of pragmatic ability are examined. In the discussion section, a number of pedagogical suggestions are offered: the inclusion of pragmatics in teacher development, the use of authentic pragmatics materials, motivating learners to be more savvy about pragmatics, and supporting learners in accepting or challenging native-speaker norms. Suggestions as to further research in the field are also offered

    A study of the translation of sentiment in user-generated text

    A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy. Emotions are biological states of feeling that humans may verbally express to communicate their negative or positive mood, influence others, or even inflict harm. Although emotions such as anger, happiness, affection, or fear are supposedly universal experiences, the lingual realisation of the emotional experience may vary in subtle ways across different languages. For this reason, preserving the original sentiment of the source text has always been a challenging task that draws on a translator's competence and finesse. In the professional translation industry, an incorrect translation of the sentiment-carrying lexicon is considered a critical error, as it can be misleading or in some cases harmful, since it misses the fundamental aspect of the source text, i.e. the author's sentiment. Since the advent of Neural Machine Translation (NMT), there has been a tremendous improvement in the quality of automatic translation. This has led to extensive use of online NMT tools to translate User-Generated Text (UGT) such as reviews, tweets, and social media posts, where the main message is often the author's positive or negative attitude towards an entity. In such scenarios, the process of translating the user's sentiment is entirely automatic, with no human intervention for either post-editing or accuracy checking. However, NMT output still lacks accuracy in some low-resource languages and sometimes contains critical translation errors that not only distort the sentiment but at times flip the polarity of the source text to its exact opposite. In this thesis, we tackle the translation of sentiment in UGT by NMT systems from two perspectives: analytical and experimental.
First, the analytical approach introduces a list of linguistic features that can lead to a mistranslation of fine-grained emotions between different language pairs in the UGT domain. It also presents an error typology specific to Arabic UGT, illustrating the main linguistic phenomena that can cause mistranslation of sentiment polarity when Arabic UGT is translated into English by NMT systems. Second, the experimental approach attempts to improve the translation of sentiment by addressing some of the linguistic challenges identified in the analysis as causing mistranslation of sentiment at both the word level and the sentence level. At the word level, we propose a Transformer NMT model trained on a sentiment-oriented vector space model (VSM) of UGT data that is capable of translating the correct sentiment polarity of challenging contronyms. At the sentence level, we propose a semi-supervised approach to overcome the problem of translating sentiment expressed in dialectical language in UGT data. We take the translation of dialectical Arabic UGT into English as a case study. Our semi-supervised AR-EN NMT model shows improved performance over the online MT Twitter tool in translating dialectical Arabic UGT, not only in terms of translation quality but also in the preservation of the sentiment polarity of the source text. The experimental section also presents an empirical method to quantify the notion of sentiment transfer by an MT system and, more concretely, to modify automatic metrics such that their MT ranking comes closer to a human judgement of a poor or good translation of sentiment.

    Twitter Analysis to Predict the Satisfaction of Saudi Telecommunication Companies’ Customers

    The flexibility in mobile communications allows customers to quickly switch from one service provider to another, making customer churn one of the most critical challenges for the data and voice telecommunication service industry. In 2019, the percentage of post-paid telecommunication customers in Saudi Arabia decreased; this represents a great deal of customer dissatisfaction and subsequent corporate fiscal losses. Many studies correlate customer satisfaction with customer churn. Telecom companies have depended on historical customer data to measure customer churn. However, historical data does not reveal current customer satisfaction or the future likelihood of switching between telecom companies. Current methods of analysing churn rates are inadequate and face some issues, particularly in the Saudi market. This research was conducted to understand the relationship between customer satisfaction and customer churn, and how to use social media mining to measure customer satisfaction and predict customer churn. This research conducted a systematic review to address the problems of churn prediction models and their relation to Arabic Sentiment Analysis (ASA). The findings show that current churn models do not integrate structural data frameworks with real-time analytics to target customers in real time. In addition, the findings show that the specific issues in the existing churn prediction models in Saudi Arabia relate to the Arabic language itself, its complexity, and its lack of resources. As a result, I have constructed the first gold standard corpus of Saudi tweets related to telecom companies, comprising 20,000 manually annotated tweets. It has been generated as a dialect sentiment lexicon extracted from a larger Twitter dataset that I collected to capture text characteristics in social media. I developed a new ASA prediction model for telecommunication that fills the detected gaps in the ASA literature and fits the telecommunication field.
The proposed model proved its effectiveness for Arabic sentiment analysis and churn prediction. This is the first work to use Twitter mining to predict potential customer loss (churn) in Saudi telecom companies. Other fields, such as education, have different features, which makes applying the proposed model to them interesting, since it is based on text mining.

    Understanding misinformation on Twitter in the context of controversial issues

    Social media is slowly supplementing, or even replacing, traditional media outlets such as television, newspapers, and radio. However, social media presents some drawbacks when it comes to circulating information. These drawbacks include spreading false information, rumors, and fake news. At least three main factors create these drawbacks: The filter bubble effect, misinformation, and information overload. These factors make gathering accurate and credible information online very challenging, which in turn may affect public trust in online information. These issues are even more challenging when the issue under discussion is a controversial topic. In this thesis, four main controversial topics are studied, each of which comes from a different domain. This variation of domains can give a broad view of how misinformation is manifested in social media, and how it is manifested differently in different domains. This thesis aims to understand misinformation in the context of controversial issue discussions. This can be done through understanding how misinformation is manifested in social media as well as by understanding people’s opinions towards these controversial issues. In this thesis, three different aspects of a tweet are studied. These aspects are 1) the user sharing the information, 2) the information source shared, and 3) whether specific linguistic cues can help in assessing the credibility of information on social media. Finally, the web application tool TweetChecker is used to allow online users to have a more in-depth understanding of the discussions about five different controversial health issues. The results and recommendations of this study can be used to build solutions for the problem of trustworthiness of user-generated content on different social media platforms, especially for controversial issues

    Irony and Sarcasm Detection in Twitter: The Role of Affective Content

    Thesis by compendium. Social media platforms, like Twitter, offer a face-saving ability that allows users to express themselves employing figurative language devices such as irony to achieve different communication purposes. Dealing with this kind of content represents a big challenge for computational linguistics. Irony is closely associated with the indirect expression of feelings, emotions, and evaluations. Interest in detecting the presence of irony in social media texts has grown significantly in recent years. In this thesis, we introduce the problem of detecting irony in social media from a computational linguistics perspective. We propose to address this task by focusing, in particular, on the role of affective information for detecting the presence of this figurative language device. Attempting to take advantage of the subjective intrinsic value enclosed in ironic expressions, we present a novel model, called emotIDM, for detecting irony relying on a wide range of affective features. For characterising an ironic utterance, we used an extensive set of resources covering different facets of affect, from sentiment to finer-grained emotions. Results show that emotIDM has a competitive performance across the experiments carried out, validating the effectiveness of the proposed approach. Another objective of the thesis is to investigate the differences among tweets labeled with #irony and #sarcasm. Our aim is to contribute to a less investigated topic in computational linguistics, the separation between irony and sarcasm in social media, again with a special focus on affective features. We also studied a less explored hashtag: #not. We find data-driven arguments on the differences among tweets containing these hashtags, suggesting that the above-mentioned hashtags are used to refer to different figurative language devices.
We identify promising features based on affect-related phenomena for discriminating among different kinds of figurative language devices. We also analyse the role of polarity reversal in tweets containing ironic hashtags, observing that the impact of this phenomenon varies. In tweets labeled with #sarcasm there is often a full reversal, whereas in those tagged with #irony there is an attenuation of the polarity. We analyse the impact of irony and sarcasm on sentiment analysis, observing a drop in the performance of NLP systems developed for this task when irony is present. Therefore, we explored the possible use of our findings in irony detection for the development of an irony-aware sentiment analysis system, assuming that the identification of ironic content could help to improve the correct identification of sentiment polarity. To this aim, we incorporated emotIDM into a pipeline for determining the polarity of a given Twitter message. We compared our results with the state of the art determined by the "SemEval-2015 Task 11" shared task, demonstrating the relevance of considering affective information together with features alerting on the presence of irony for performing sentiment analysis of figurative language in this kind of social media text. To summarize, we demonstrated the usefulness of exploiting different facets of affective information for dealing with the presence of irony in Twitter.
Hernández Farias, DI. (2017). Irony and Sarcasm Detection in Twitter: The Role of Affective Content [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/90544
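The polarity observations reported in this abstract (full reversal for #sarcasm, attenuation for #irony) can be illustrated with a small Python sketch of an irony-aware adjustment step. This is not emotIDM: the thesis detects irony with a classifier built on affective features, whereas this sketch uses the hashtags themselves as a stand-in detector. The function name `adjust_polarity` and the attenuation factor of 0.5 are assumptions chosen for illustration.

```python
def adjust_polarity(base_score: float, tweet: str, attenuation: float = 0.5) -> float:
    """Adjust a base polarity score in [-1, 1] for ironic hashtags.

    Hypothetical rule set: '#sarcasm' flips the sign (full reversal) and
    '#irony' shrinks the magnitude (attenuation). The factor is assumed.
    """
    tags = {
        word.lower().rstrip(".,!?")
        for word in tweet.split()
        if word.startswith("#")
    }
    if "#sarcasm" in tags:
        return -base_score
    if "#irony" in tags:
        return base_score * attenuation
    return base_score


# A literal reading scores this tweet as positive; the adjustment reverses it.
print(adjust_polarity(0.8, "Great, another Monday #sarcasm"))
```

In a fuller pipeline, the hashtag check would be replaced by the output of an irony detector, and the base score would come from a conventional sentiment classifier.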