285 research outputs found

    Sentiment Analysis in Short Texts (Ανάλυση Συναισθημάτων σε Κείμενα Μικρού Μήκους)

    This thesis addresses sentiment analysis of short (microblog) texts using class sequential rules (CSR). Short texts have particular characteristics that prevent older machine learning techniques from performing well, which is why this technique is presented for emotion classification. The goal is to process the texts, extract features from them, and classify each text into one of the emotion categories: joy, surprise, sadness, anger, or none (no emotion).
Initially there are two datasets of real-world microblog texts, a training set and a test set, which are processed and split into sentences. Using a lexicon, the two dominant emotions are extracted for each sentence. The lexicon contains useful words along with a label for each emotion category. Based on these emotions, the sentences are converted into sequences of emotion labels. CSR mining then produces the sequential rules, which are used to build the training features for a classifier. The classifier outputs the final emotion of each text. Finally, the success rate of the classifier is calculated.
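The lexicon step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the toy lexicon entries, the function names (`split_sentences`, `dominant_emotions`, `to_emotion_sequence`), the naive sentence splitter, and the alphabetical tie-breaking rule are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (word -> emotion label); the thesis's lexicon is not reproduced here.
LEXICON = {
    "happy": "joy", "great": "joy", "love": "joy",
    "wow": "surprise", "unexpected": "surprise",
    "sad": "sadness", "miss": "sadness",
    "hate": "anger", "angry": "anger",
}
NONE_LABEL = "none"  # assumed label for sentences with no lexicon hit


def split_sentences(text):
    """Naive sentence splitter on '.', '!' and '?' (an assumption)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def dominant_emotions(sentence, k=2):
    """Return up to k most frequent emotion labels found in the sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    counts = Counter(LEXICON[w] for w in words if w in LEXICON)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    labels = [label for label, _ in ranked[:k]]
    return labels or [NONE_LABEL]


def to_emotion_sequence(text):
    """Concatenate the dominant emotions of each sentence into one sequence."""
    sequence = []
    for sentence in split_sentences(text):
        sequence.extend(dominant_emotions(sentence))
    return sequence
```

Sequences produced this way would be the input to a sequential rule miner; the rule mining and classifier training are not shown here.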

    On the “Easy” Task of Evaluating Chinese Irony Detection


    Leveraging writing systems changes for deep learning based Chinese affective analysis

    Affective analysis of social media text is in great demand. Online text written in Chinese communities often contains mixed scripts: major text written in Chinese, an ideograph-based writing system, and minor text using Latin letters, an alphabet-based writing system. This phenomenon is referred to as writing systems changes (WSCs). Past studies have shown that WSCs often reflect unfiltered, immediate affections. However, WSCs pose additional challenges for Natural Language Processing tasks because they can break the syntax of the major text. In this work, we use WSCs as an effective feature in a hybrid deep learning model with an attention network. The WSC scripts are first identified by their encoding range. Then the document representation of the text is learned through a Long Short-Term Memory model, and the minor text is learned by a separate Convolutional Neural Network model. To further highlight the WSC components, an attention mechanism is adopted to re-weight the feature vector before the classification layer. Experiments show that the proposed hybrid deep learning method, which better incorporates WSC features, further improves performance compared to state-of-the-art classification models. The experimental results indicate that WSCs can serve as effective information in affective analysis of social media text.
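Identifying the minor-script segments "by their encoding range" could look roughly like the sketch below. The specific ranges are assumptions for illustration (the CJK Unified Ideographs block U+4E00 to U+9FFF for the major text, ASCII letters for the minor text), as are the function names `script_of` and `split_scripts`; the paper's exact ranges are not given in the abstract.

```python
def script_of(ch):
    """Classify one character by code point (assumed ranges, see lead-in)."""
    code = ord(ch)
    if 0x4E00 <= code <= 0x9FFF:  # CJK Unified Ideographs block
        return "han"
    if ch.isascii() and ch.isalpha():  # basic Latin letters
        return "latin"
    return "other"  # punctuation, digits, emoji, etc.


def split_scripts(text):
    """Return (major_text, minor_segments).

    major_text is the Chinese characters in order; minor_segments is the
    list of maximal runs of Latin letters found in the text.
    """
    major = []
    minor = []
    current = []
    for ch in text:
        kind = script_of(ch)
        if kind == "latin":
            current.append(ch)
            continue
        if current:  # a Latin run just ended
            minor.append("".join(current))
            current = []
        if kind == "han":
            major.append(ch)
    if current:  # flush a trailing Latin run
        minor.append("".join(current))
    return "".join(major), minor
```

In a setup like the one described, the two outputs would feed the separate sequence and convolutional encoders; those models are outside the scope of this sketch.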

    Mining Twitter Sequences of Product Opinions with Multi-Word Aspect Terms

    Social media platforms have opened doors to users' opinions and perceptions. Text remains the most popular means of communication on social media, despite the availability of other media (audio, video, and images). Twitter is one such microblogging platform, allowing people to express their thoughts within 280 characters per message. This freedom of expression has made it difficult to determine the polarity (positive, negative, or neutral) of tweets and posts. Given a corpus of microblog texts (e.g., "the new iPhone battery life is good, but camera quality is bad"), mining the aspects (e.g., battery life, camera quality) and opinions (e.g., good, bad) of these products is challenging due to the vast amount of data being generated. Aspect-Based Opinion Mining (ABOM) combines aspect extraction and opinion mining, allowing an enterprise to analyze the data in detail automatically, saving time and money. Existing systems such as Hate Crime Twitter Sentiment (HCTS) and Microblog Aspect Miner (MAM) have recently been proposed to perform ABOM on Twitter. These systems generally follow a four-step approach: obtaining microblog posts, identifying frequent nouns (candidate aspects), pruning the candidate aspects, and determining opinion polarity. However, they differ in how well they prune their candidate features. HCTS uses Apriori-based association rule mining to find the important aspects (single- and multi-word) of a given product. However, the Apriori-based system generates many candidate sequences, which yield redundant candidate aspects, and HCTS also fails to summarize the category of the aspects (Camera? Battery?). MAM follows a similar approach to HCTS for finding the relevant aspects, but it further clusters the frequent nouns (aspects) to obtain the relevant ones. However, it does not identify multi-word aspects or the aspect category of a product.
This thesis proposes a system called Microblog Aspect Sequence Miner (MASM) as an extension of Microblog Aspect Miner (MAM), replacing the Apriori algorithm with a modified frequent sequential pattern mining algorithm. The system uses sequential pattern mining for aspect extraction in ABOM. The sentiments of the tweets are unknown, so the approach is built in an unsupervised manner. The input posts are first classified to separate tweets that contain an opinion (subjective) from those that do not (objective). Then Part-of-Speech tags are extracted for the explicit aspects to identify the frequent nouns. The novel frequent pattern mining framework (CM-SPAM) is applied to segment the single- and multi-word aspects, generating fewer sequences than previous approaches. This prior knowledge allows a topic modeling framework (Latent Dirichlet Allocation) to determine a summary of the most common aspects (Aspect Category) and their sentiments for a product. The findings demonstrate that the MASM model shows promising performance in finding relevant aspects, with a reduced average vector size (cost of candidate/aspect generation) compared with MAM and HCTS on the Sanders Twitter corpus dataset. Experimental results with evaluation metrics of execution time, precision, recall, and F-measure indicate that the approach has higher recall and precision than the existing systems.
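To make the idea of multi-word aspect candidates concrete, here is a small sketch that counts runs of consecutive nouns across posts and keeps those meeting a minimum support. It is explicitly not CM-SPAM or the MASM system: it is a much simpler frequent contiguous-sequence counter, and it assumes posts are already POS-tagged with Penn Treebank style tags. The names `noun_runs` and `frequent_aspects` and the `min_support` default are hypothetical.

```python
from collections import Counter


def noun_runs(tagged_post):
    """Yield maximal runs of consecutive noun tokens as lowercase tuples.

    tagged_post is a list of (token, tag) pairs; any tag starting with
    "NN" is treated as a noun (an assumption about the tag set).
    """
    run = []
    for token, tag in tagged_post:
        if tag.startswith("NN"):
            run.append(token.lower())
        else:
            if run:
                yield tuple(run)
            run = []
    if run:
        yield tuple(run)


def frequent_aspects(tagged_posts, min_support=2):
    """Return {aspect: support} for noun runs found in >= min_support posts."""
    support = Counter()
    for post in tagged_posts:
        # Use a set so each candidate counts at most once per post.
        support.update(set(noun_runs(post)))
    return {" ".join(a): n for a, n in support.items() if n >= min_support}


posts = [
    [("The", "DT"), ("battery", "NN"), ("life", "NN"), ("is", "VBZ"), ("good", "JJ")],
    [("Battery", "NN"), ("life", "NN"), ("drains", "VBZ"), ("fast", "RB")],
    [("camera", "NN"), ("is", "VBZ"), ("bad", "JJ")],
]
aspects = frequent_aspects(posts)
```

With the toy posts above, only "battery life" appears in at least two posts, so it is the only candidate kept; a real sequential pattern miner would also handle non-contiguous patterns and pruning.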

    FINE-GRAINED EMOTION DETECTION IN MICROBLOG TEXT

    Automatic emotion detection in text is concerned with using natural language processing techniques to recognize emotions expressed in written discourse. Endowing computers with the ability to recognize emotions in a particular kind of text, microblogs, has important applications in sentiment analysis and affective computing. To build computational models that can recognize the emotions represented in tweets, we need to identify a set of suitable emotion categories. Prior work has mainly focused on building computational models for only a small set of six basic emotions (happiness, sadness, fear, anger, disgust, and surprise). This thesis describes a taxonomy of 28 emotion categories, an expansion of these six basic emotions, developed inductively from data. These 28 fine-grained emotion categories are representative of the range of emotions expressed in tweets, the microblog posts on Twitter. The ability of humans to recognize these fine-grained emotion categories is characterized using inter-annotator reliability measures based on annotations provided by expert and novice annotators. A set of 15,553 human-annotated tweets forms a gold standard corpus, EmoTweet-28. For each emotion category, we have extracted a set of linguistic cues (i.e., punctuation marks, emoticons, emojis, abbreviated forms, interjections, lemmas, hashtags and collocations) that can serve as salient indicators for that emotion category. We evaluated the performance of automatic classification techniques on the set of 28 emotion categories through a series of experiments using several classifier and feature combinations. Our results show that it is feasible to extend machine learning classification to fine-grained emotion detection in tweets (i.e., as many as 28 emotion categories) with results that are comparable to state-of-the-art classifiers that detect six to eight basic emotions in text.
Classifiers using features extracted from the linguistic cues associated with each category match or exceed the performance of conventional corpus-based and lexicon-based features for fine-grained emotion classification. This thesis makes an important theoretical contribution in the development of a taxonomy of emotion in text. In addition, this research makes several practical contributions, particularly in the creation of language resources (i.e., corpus and lexicon) and machine learning models for fine-grained emotion detection in text.