11,201 research outputs found

    Computational Sarcasm Analysis on Social Media: A Systematic Review

    Sarcasm can be defined as saying or writing the opposite of what one truly wants to express, usually to insult, irritate, or amuse someone. Because of the obscure nature of sarcasm in textual data, detecting it is difficult and of great interest to the sentiment analysis research community. Though the research in sarcasm detection spans more than a decade, some significant advancements have been made recently, including employing unsupervised pre-trained transformers in multimodal environments and integrating context to identify sarcasm. In this study, we aim to provide a brief overview of recent advancements and trends in computational sarcasm research for the English language. We describe relevant datasets, methodologies, trends, issues, challenges, and tasks relating to sarcasm that are beyond detection. Our study provides well-summarized tables of sarcasm datasets, sarcastic features and their extraction methods, and performance analysis of various approaches which can help researchers in related domains understand current state-of-the-art practices in sarcasm detection. Comment: 50 pages, 3 tables, submitted to 'Data Mining and Knowledge Discovery' for possible publication.

    Citation Function and Polarity Classification in Biomedical Papers

    The traditional reference evaluation method treats all citations equally. However, a citation can serve various functions. It may reflect the citing author's motivation as well as his/her true attitude towards the cited paper. Such information can be investigated through citation content analysis. This thesis develops an 8-category classification scheme for citation function and polarity to help understand what role a citation plays in a scientific paper. A biomedical citation corpus is annotated with this scheme and used in experiments with supervised machine learning methods. Several types of features that capture the characteristics of citation sentences are extracted with natural language processing techniques to serve as inputs to automatic classifiers. The importance of cue phrases in citation classification is also discussed.

    Sentiment analysis in geo social streams by using machine learning technique

    Dissertation submitted in partial fulfilment of the requirements for the degree of Master of Science in Geospatial Technologies. Massive amounts of sentiment-rich data are generated on social media in the form of tweets, status updates, blog posts, reviews, etc. Different people and organizations use this user-generated content for decision making. Symbolic (knowledge-base) approaches and machine learning techniques are the two main families of techniques used for analyzing sentiment in text. The rapid increase in the volume of sentiment-rich data on the web has resulted in increased interaction among researchers regarding sentiment analysis and opinion (Kaushik & Mishra, 2014). However, limited research has been conducted that considers location as another dimension along with the sentiment-rich data. In this work, we analyze the sentiments of Geotweets, tweets containing latitude and longitude coordinates, and visualize the results in the form of a map in real time. We collect tweets from Twitter using its Streaming API, filtered by English language and location (bounding box). For those tweets which don’t have geographic coordinates, we geocode them using a geocoder from GeoPy. TextBlob, an open source library in Python, was used to calculate the sentiments of Geotweets. Map visualization was implemented using Leaflet. Plugins for clusters, heat maps and real-time updates have been used in this visualization. The visualization gives insight into sentiment by location.
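
The filtering and scoring steps described in this abstract can be illustrated with a minimal sketch. It is not the dissertation's code: the `Tweet` record, the bounding-box tuple layout, and the 0.1 neutrality threshold are assumptions made for illustration. The polarity value is taken as an input in the range [-1, 1], which is the range TextBlob reports through `TextBlob(text).sentiment.polarity`; no Twitter, GeoPy, or Leaflet calls are made here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical layout: (min_lon, min_lat, max_lon, max_lat)
BBox = Tuple[float, float, float, float]


@dataclass
class Tweet:
    """Hypothetical minimal tweet record; coordinates may be missing."""
    text: str
    lat: Optional[float]
    lon: Optional[float]


def in_bounding_box(tweet: Tweet, bbox: BBox) -> bool:
    """Return True if the tweet carries coordinates that fall inside bbox."""
    if tweet.lat is None or tweet.lon is None:
        # Tweets without coordinates would need geocoding first.
        return False
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= tweet.lon <= max_lon and min_lat <= tweet.lat <= max_lat


def label_polarity(polarity: float, threshold: float = 0.1) -> str:
    """Map a polarity score in [-1, 1] to a coarse label.

    The threshold is an assumed value, not one taken from the dissertation.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"
```

Each tweet that passes `in_bounding_box` would then be scored and passed, with its coordinates and label, to the map layer.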

    Multi-task learning for aspect level semantic classification combining complex aspect target semantic enhancement and adaptive local focus

    Aspect-based sentiment analysis (ABSA) is a fine-grained and diverse task in natural language processing. Existing deep learning models for ABSA face the challenge of balancing the demand for finer granularity in sentiment analysis with the scarcity of training corpora for such granularity. To address this issue, we propose an enhanced BERT-based model for multi-dimensional aspect target semantic learning. Our model leverages BERT's pre-training and fine-tuning mechanisms, enabling it to capture rich semantic feature parameters. In addition, we propose a complex semantic enhancement mechanism for aspect targets to enrich and optimize fine-grained training corpora. Third, we combine the aspect recognition enhancement mechanism with a CRF model to achieve more robust and accurate entity recognition for aspect targets. Furthermore, we propose an adaptive local attention mechanism learning model to focus on sentiment elements around rich aspect target semantics. Finally, to address the varying contributions of each task in the joint training mechanism, we carefully optimize this training approach, allowing for a mutually beneficial training of multiple tasks. Experimental results on four Chinese and five English datasets demonstrate that our proposed mechanisms and methods effectively improve ABSA models, surpassing some of the latest models in multi-task and single-task scenarios

    A survey on deep learning in image polarity detection: Balancing generalization performances and computational costs

    Deep convolutional neural networks (CNNs) provide an effective tool to extract complex information from images. In the area of image polarity detection, CNNs are customarily utilized in combination with transfer learning techniques to tackle a major problem: the unavailability of large sets of labeled data. Thus, polarity predictors in general exploit a pre-trained CNN as the feature extractor that in turn feeds a classification unit. While the latter unit is trained from scratch, the pre-trained CNN is subject to fine-tuning. As a result, the specific CNN architecture employed as the feature extractor strongly affects the overall performance of the model. This paper analyses state-of-the-art literature on image polarity detection and identifies the most reliable CNN architectures. Moreover, the paper provides an experimental protocol that should allow assessing the role played by the baseline architecture in the polarity detection task. Performance is evaluated in terms of both generalization abilities and computational complexity. The latter attribute becomes critical as polarity predictors, in the era of social networks, might need to be updated within hours or even minutes. In this regard, the paper gives practical hints on the advantages and disadvantages of the examined architectures both in terms of generalization and computational cost

    Natural Language Processing: Emerging Neural Approaches and Applications

    This Special Issue highlights the most recent research being carried out in the NLP field and discusses related open issues, with a particular focus both on emerging approaches for language learning, understanding, production, and grounding, interactively or autonomously from data, in cognitive and neural systems, and on their potential or real applications in different domains.

    Application of Common Sense Computing for the Development of a Novel Knowledge-Based Opinion Mining Engine

    The ways people express their opinions and sentiments have radically changed in the past few years thanks to the advent of social networks, web communities, blogs, wikis and other online collaborative media. The distillation of knowledge from this huge amount of unstructured information can be a key factor for marketers who want to create an image or identity in the minds of their customers for their product, brand, or organisation. These online social data, however, remain hardly accessible to computers, as they are specifically meant for human consumption. The automatic analysis of online opinions, in fact, involves a deep understanding of natural language text by machines, from which we are still very far. Hitherto, online information retrieval has been mainly based on algorithms relying on the textual representation of web pages. Such algorithms are very good at retrieving texts, splitting them into parts, checking the spelling and counting their words. But when it comes to interpreting sentences and extracting meaningful information, their capabilities are known to be very limited. Existing approaches to opinion mining and sentiment analysis, in particular, can be grouped into three main categories: keyword spotting, in which text is classified into categories based on the presence of fairly unambiguous affect words; lexical affinity, which assigns arbitrary words a probabilistic affinity for a particular emotion; and statistical methods, which calculate the valence of affective keywords and word co-occurrence frequencies on the basis of a large training corpus. Early works aimed to classify entire documents as containing overall positive or negative polarity, or rating scores of reviews. Such systems were mainly based on supervised approaches relying on manually labelled samples, such as movie or product reviews where the opinionist’s overall positive or negative attitude was explicitly indicated. 
However, opinions and sentiments do not occur only at document level, nor are they limited to a single valence or target. Contrary or complementary attitudes toward the same topic or multiple topics can be present across the span of a document. In more recent works, text analysis granularity has been taken down to segment and sentence level, e.g., by using the presence of opinion-bearing lexical items (single words or n-grams) to detect subjective sentences, or by exploiting association rule mining for a feature-based analysis of product reviews. These approaches, however, are still far from being able to infer the cognitive and affective information associated with natural language as they mainly rely on knowledge bases that are still too limited to efficiently process text at sentence level. In this thesis, common sense computing techniques are further developed and applied to bridge the semantic gap between word-level natural language data and the concept-level opinions conveyed by these. In particular, the ensemble application of graph mining and multi-dimensionality reduction techniques on two common sense knowledge bases was exploited to develop a novel intelligent engine for open-domain opinion mining and sentiment analysis. The proposed approach, termed sentic computing, performs a clause-level semantic analysis of text, which allows the inference of both the conceptual and emotional information associated with natural language opinions and, hence, a more efficient passage from (unstructured) textual information to (structured) machine-processable data. The engine was tested on three different resources, namely a Twitter hashtag repository, a LiveJournal database and a PatientOpinion dataset, and its performance compared both with results obtained using standard sentiment analysis techniques and using different state-of-the-art knowledge bases such as Princeton’s WordNet, MIT’s ConceptNet and Microsoft’s Probase. 
Unlike most currently available opinion mining services, the developed engine does not base its analysis on a limited set of affect words and their co-occurrence frequencies, but rather on common sense concepts and the cognitive and affective valence conveyed by these. This allows the engine to be domain-independent and, hence, to be embedded in any opinion mining system for the development of intelligent applications in multiple fields such as Social Web, HCI and e-health. Looking ahead, the combined novel use of different knowledge bases and of common sense reasoning techniques for opinion mining proposed in this work will eventually pave the way for the development of more bio-inspired approaches to the design of natural language processing systems capable of handling knowledge, retrieving it when necessary, making analogies and learning from experience.
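
Keyword spotting, the first of the three baseline categories named in this abstract, can be sketched in a few lines. The sketch below is illustrative only: the tiny `AFFECT_WORDS` lexicon and the `keyword_spotting` function are hypothetical and are not part of the thesis or of any published resource.

```python
import re
from collections import Counter

# Hypothetical toy lexicon mapping affect words to emotion categories.
AFFECT_WORDS = {
    "happy": "joy",
    "delighted": "joy",
    "angry": "anger",
    "furious": "anger",
    "sad": "sadness",
    "miserable": "sadness",
}


def keyword_spotting(text: str) -> str:
    """Classify text by the most frequent emotion among its affect words."""
    # Lowercase and split into simple alphabetic tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(AFFECT_WORDS[t] for t in tokens if t in AFFECT_WORDS)
    if not counts:
        return "neutral"
    # Ties are resolved in favour of the category encountered first.
    return counts.most_common(1)[0][0]
```

Because only surface words are inspected, `keyword_spotting("I am not happy")` still returns `"joy"`, which illustrates the kind of limitation that motivates concept-level analysis.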

    Sentiment analysis in electronic negotiations

    The thesis analyzes the applicability of methods of Sentiment Analysis and Predictive Analytics to textual communication in electronic negotiation transcripts. In particular, the thesis focuses on examining whether an automatic classifier can predict the outcome of ongoing, asynchronous electronic negotiations with sufficient accuracy. When combined with the influencing factors leading to the specific classification decision, such a classification model could be incorporated into a Negotiation Support System in order to proactively intervene in ongoing negotiations it judges as likely to fail and then to give advice to the negotiators to prevent negotiation failure. To achieve this goal, an existing data set of electronic negotiations was used in a first study to create a Sentiment Lexicon, which tracks verbal indicators for utterances of positive and negative polarity, respectively. In a second study, this lexicon was combined with a simplified, feature-based representation of electronic negotiation transcripts, which was then used as training data for various machine learning classifiers in order to let them determine the outcome of the negotiations based on the transcripts. Here, complete negotiation transcripts were classified as well as partial transcripts in order to assess classification quality in ongoing negotiations. The third study of the thesis sought to refine the classification model with respect to sentence-based granularity. To this end, human coders classified negotiation sentences regarding their subjectivity and polarity. The results of this content analysis approach were then used to train sentence-level subjectivity and polarity classifiers. The fourth and final study analyzed different aggregation methods for these sentence-level classification results in order to support the classifiers on negotiation granularity. 
Different aggregation and classification models were discussed, applied to the negotiation data and subsequently evaluated. The results of the studies show that it is possible to a certain degree to use a sentiment-based representation of negotiation data to automatically determine negotiation outcomes. In combination with the sentence-based classification models, negotiation classification quality increased further. However, this improvement was only found to be significant for complete negotiation transcripts. If only partial transcripts are used, specifically to simulate an ongoing negotiation scenario, the models tend to behave more erratically and classification quality degrades. This result yields the assumption that polarized utterances (positive as well as negative) only carry unequivocal information (with respect to the outcome) towards the end of the negotiation. During the negotiation, the influence of these utterances becomes more ambiguous, hence decreasing classification accuracy on models using a representation based on sentiments. Regarding the original goal of the thesis, which is to provide a basic means to support ongoing negotiations, this means that supporting mechanisms employed by a Negotiation Support System should focus on moderation techniques and on resolving potential conflict situations. Approaches that could be used to employ further conflict diagnosis in interaction with the negotiators are given in the final chapter of the thesis, as well as a discussion of potential recommendations and advice the system could give and, lastly, approaches to visualize the classification data to the negotiators.
This thesis examined the applicability of Sentiment Analysis and Predictive Analytics methods to textual communication in electronic negotiations. In particular, it sought to determine whether an automated classification procedure can predict the negotiation outcome with sufficient accuracy in ongoing, asynchronously conducted electronic negotiations. Such a classification, combined with the influencing factors that led to it, could then be used within a negotiation support system to intervene proactively in the negotiation and, where appropriate, prevent an unsuccessful outcome. Based on an existing data set of electronic negotiations, a first study created a so-called sentiment lexicon, which collects indicators of positive and negative utterances. In a second study, this lexicon and a simplified, feature-based representation of the negotiation data served as the basis for training machine learning methods to determine the result of the negotiation from the textual data. The methods were applied to both complete and partial negotiation transcripts in order to determine classification quality in ongoing negotiations. A third study refined the learning procedure at the granularity of individual sentences. To this end, human coders rated sentences from negotiations with respect to subjectivity vs. objectivity and polarity (positive vs. negative). The results of this content analysis served as input for machine learning methods that automatically classify sentences along these two dimensions. In a final integration study, the sentence-level classification results were aggregated and used to support classification at the negotiation level. Various aggregation alternatives were carried out and evaluated.
The results of the individual studies show that, with some limitations, it is possible to predict the outcome of a negotiation using a sentiment-based representation of negotiation data. In particular, when the classification models are enriched with fine-grained information, prediction quality for individual models increases further and significantly. However, this holds only for transcripts of complete negotiations; if only partial transcripts are used, with the aim of predicting the result as early as possible, the models behave more erratically and accuracy degenerates. The assumption associated with this result is that polarized utterances (positive as well as negative) provide unambiguous information primarily towards the end of the negotiation; sentiments in the middle of the transcripts in particular appear to be rather detrimental to classification quality. For concrete proactive support measures that a negotiation support system can take at this point, this means primarily that, when the negotiation threatens to fail, such measures should aim at moderation and the resolution of possible conflict situations. To this end, the outlook of the thesis discusses in detail conceivable approaches to further conflict diagnosis in interaction with the users, approaches to recommendations and advice the system can give, and visualization approaches.
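
The idea of aggregating sentence-level subjectivity and polarity labels into a negotiation-level feature can be sketched as follows. The thesis abstract does not say which aggregation methods were used, so both functions below are assumptions chosen for illustration: a mean over subjective sentences, and a variant restricted to the final part of the transcript, motivated by the reported finding that polarized utterances are most informative towards the end. The `SentenceLabel` record and the 0.25 tail fraction are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SentenceLabel:
    """Hypothetical sentence-level classifier output."""
    subjective: bool
    polarity: int  # +1 positive, -1 negative; ignored for objective sentences


def mean_polarity(labels: List[SentenceLabel]) -> float:
    """Average polarity over subjective sentences; 0.0 if there are none."""
    scores = [s.polarity for s in labels if s.subjective]
    return sum(scores) / len(scores) if scores else 0.0


def tail_polarity(labels: List[SentenceLabel], tail_fraction: float = 0.25) -> float:
    """Average polarity over only the last tail_fraction of the transcript."""
    if not labels:
        return 0.0
    start = int(len(labels) * (1 - tail_fraction))
    return mean_polarity(labels[start:])
```

Either value could be appended to a transcript's feature vector before training a negotiation-level classifier; comparing the two on partial transcripts would be one way to probe the end-of-negotiation effect described above.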