10,551 research outputs found

    Inferring Affective Meanings of Words from Word Embedding

    Affective lexicons are among the most important resources in affective computing for text. Manually constructed affective lexicons have limited scale and therefore limited use in practical systems. In this work, we propose a regression-based method to automatically infer multi-dimensional affective representations of words from their word embeddings, based on a set of seed words. The method exploits the rich semantic information captured by word embeddings to extract meanings in a specific semantic space. It rests on the assumption that different features in a word embedding contribute differently to a particular affective dimension, and that a particular embedding feature contributes differently to different affective dimensions. Evaluation on various affective lexicons shows that our method outperforms state-of-the-art methods on all the lexicons under different evaluation metrics, by large margins. We also explore different regression models and conclude that the Ridge regression model, the Bayesian Ridge regression model and Support Vector Regression with a linear kernel are the most suitable. Compared with other state-of-the-art methods, our method also has a computational advantage. Experiments on a sentiment analysis task show that the lexicons extended by our method achieve better results than publicly available sentiment lexicons on eight sentiment corpora. The extended lexicons are publicly available.
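The regression idea above can be sketched in a few lines. The snippet below is a toy illustration, not the paper's implementation: it fits a ridge-regularized linear map from made-up word embeddings of seed words to a single affective dimension (valence) by gradient descent, then scores an unseen embedding.

```python
# Toy sketch of inferring an affective dimension (here: valence) from
# word embeddings via ridge regression. Embeddings and seed scores are
# invented 2-d toy data, not real word vectors.

def ridge_fit(X, y, lam=0.1, lr=0.05, epochs=2000):
    """Fit w minimizing (1/n)*sum (x.w - y)^2 + lam*|w|^2 by gradient descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [2 * lam * wj for wj in w]          # penalty term gradient
        for xi, yi in zip(X, y):
            err = sum(a * b for a, b in zip(xi, w)) - yi
            for j in range(d):
                grad[j] += 2 * err * xi[j] / n     # squared-error gradient
        w = [wj - lr * g for wj, g in zip(w, grad)]
    return w

def predict(w, x):
    return sum(a * b for a, b in zip(x, w))

# Seed words: toy "embeddings" with known valence scores.
seeds = {
    "joy":   ([0.9, 0.1], 0.9),
    "grief": ([-0.8, 0.2], -0.8),
    "calm":  ([0.4, -0.5], 0.5),
    "rage":  ([-0.7, -0.6], -0.9),
}
X = [emb for emb, _ in seeds.values()]
y = [score for _, score in seeds.values()]
w = ridge_fit(X, y)

# Score an unseen word from its embedding alone; positive valence expected.
print(round(predict(w, [0.8, 0.0]), 2))
```

The same shape carries over directly to the Bayesian Ridge and linear-kernel SVR variants the paper compares; only the fitting objective changes.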

    Text-based Sentiment Analysis and Music Emotion Recognition

    Nowadays, with the expansion of social media, large amounts of user-generated texts like tweets, blog posts or product reviews are shared online. Sentiment polarity analysis of such texts has become highly attractive and is utilized in recommender systems, market predictions, business intelligence and more. Deep learning techniques have also become top performers on these tasks. There are, however, several problems that need to be solved for efficient use of deep neural networks in text mining and text polarity analysis. First of all, deep neural networks are data-hungry. They need to be fed with datasets that are big in size, cleaned and preprocessed, as well as properly labeled. Second, the modern natural language processing concept of word embeddings as a dense and distributed text feature representation solves the sparsity and dimensionality problems of the traditional bag-of-words model. Still, there are various uncertainties regarding the use of word vectors: should they be generated from the same dataset that is used to train the model, or is it better to source them from big and popular collections that work as generic text feature representations? Third, it is not easy for practitioners to find a simple and highly effective deep learning setup for various document lengths and types. Recurrent neural networks are weak with longer texts, and optimal convolution-pooling combinations are not easily conceived. It is thus convenient to have generic neural network architectures that are effective and can adapt to various texts, encapsulating much of the design complexity. This thesis addresses the above problems to provide methodological and practical insights for utilizing neural networks for sentiment analysis of texts and achieving state-of-the-art results. Regarding the first problem, the effectiveness of various crowdsourcing alternatives is explored and two medium-sized, emotion-labeled song datasets are created utilizing social tags.
One of the research interests of Telecom Italia was the exploration of relations between music emotional stimulation and driving style. Consequently, a context-aware music recommender system that aims to enhance driving comfort and safety was also designed. To address the second problem, a series of experiments with large text collections of various contents and domains was conducted. Word embeddings with different parameters were tested, and results revealed that their quality is influenced (mostly but not only) by the size of the texts they were created from. When working with small text datasets, it is thus important to source word features from popular and generic word embedding collections. Regarding the third problem, a series of experiments involving convolutional and max-pooling neural layers was conducted. Various patterns relating text properties and network parameters to optimal classification accuracy were observed. Combining convolutions of words, bigrams, and trigrams with regional max-pooling layers in a couple of stacks produced the best results. The derived architecture achieves competitive performance on sentiment polarity analysis of movie, business and product reviews. Given that labeled data are becoming the bottleneck of current deep learning systems, a future research direction could be the exploration of various data programming possibilities for constructing even bigger labeled datasets. Investigation of feature-level or decision-level ensemble techniques in the context of deep neural networks could also be fruitful. Different feature types usually represent complementary characteristics of data. Combining word embeddings and traditional text features, or utilizing recurrent networks on document splits and then aggregating the predictions, could further increase the prediction accuracy of such models.
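The word/bigram/trigram convolution with regional max-pooling described above can be illustrated with plain Python in place of a deep-learning framework. All filter weights and token embeddings below are toy values, not the thesis's trained parameters.

```python
# Minimal sketch: 1-d convolutions over words, bigrams and trigrams,
# followed by regional max-pooling, on a toy 6-token "sentence" of
# 2-d embeddings. One filter per n-gram width for brevity.

def conv1d(seq, width, weights, bias=0.0):
    """Slide one filter of `width` tokens over a sequence of feature vectors."""
    out = []
    for i in range(len(seq) - width + 1):
        window = [x for tok in seq[i:i + width] for x in tok]  # flatten n-gram
        out.append(sum(a * b for a, b in zip(window, weights)) + bias)
    return out

def regional_max_pool(values, regions):
    """Split the feature map into `regions` chunks and keep each chunk's max."""
    size = -(-len(values) // regions)  # ceiling division
    return [max(values[i:i + size]) for i in range(0, len(values), size)]

sent = [[0.1, 0.3], [0.5, -0.2], [0.9, 0.4],
        [-0.3, 0.8], [0.2, 0.2], [0.7, 0.1]]

uni = conv1d(sent, 1, [0.5, -0.5])                 # word-level filter
bi = conv1d(sent, 2, [0.3, 0.1, -0.2, 0.4])        # bigram filter
tri = conv1d(sent, 3, [0.1] * 6)                   # trigram filter

# Pool each map into 2 regions and concatenate as the classifier input.
features = (regional_max_pool(uni, 2)
            + regional_max_pool(bi, 2)
            + regional_max_pool(tri, 2))
print(len(features))  # 6 pooled features
```

Stacking a second convolution-pooling pair on top of `features`, as the thesis's best architecture does, follows the same pattern.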

    Unsupervised and knowledge-poor approaches to sentiment analysis

    Sentiment analysis focuses upon automatic classification of a document's sentiment (and, more generally, extraction of opinion from text). Ways of expressing sentiment have been shown to depend on what a document is about (domain dependency). This complicates supervised methods for sentiment analysis, which rely on extensive use of training data or linguistic resources that are usually either domain-specific or generic. Both kinds of resources prevent classifiers from performing well across a range of domains, as this requires appropriate in-domain (domain-specific) data. This thesis presents a novel unsupervised, knowledge-poor approach to sentiment analysis aimed at creating a domain-independent and multilingual sentiment analysis system. The approach extracts domain-specific resources from the documents that are to be processed, and uses them for sentiment analysis. It does not require any training corpora, large sets of rules or generic sentiment lexicons, which makes it domain- and language-independent but at the same time able to utilise domain- and language-specific information. The thesis describes and tests the approach, which is applied to different data, including customer reviews of various types of products, reviews of films and books, and news items, and to four languages: Chinese, English, Russian and Japanese. The approach is applied not only to binary sentiment classification, but also to three-way sentiment classification (positive, negative and neutral), subjectivity classification of documents and sentences, and the extraction of opinion holders and opinion targets. Experimental results suggest that the approach is often a viable alternative to supervised systems, especially when applied to large document collections.
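The abstract does not spell out how the domain-specific resources are extracted, so the following is not that method; it is only a generic illustration of one knowledge-poor ingredient, deriving word-association scores (pointwise mutual information) directly from the documents to be processed, with no external lexicon.

```python
# Generic illustration (not the thesis's algorithm): score how strongly
# two words associate within the target document collection itself,
# using document-level pointwise mutual information (PMI).

import math

docs = [
    "battery life is great and the screen is great",
    "battery died fast and the screen cracked",
    "great phone great price",
]
tokens = [d.split() for d in docs]

def pmi(w1, w2):
    """PMI of two words co-occurring in the same document."""
    co = sum(1 for t in tokens if w1 in t and w2 in t)
    if co == 0:
        return float("-inf")
    p_co = co / len(docs)
    p1 = sum(1 for t in tokens if w1 in t) / len(docs)
    p2 = sum(1 for t in tokens if w2 in t) / len(docs)
    return math.log(p_co / (p1 * p2))

# "great" associates with "price" (they co-occur) but not with "cracked".
print(pmi("great", "price") > pmi("great", "cracked"))
```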

    Making the FTC ☺: An Approach to Material Connections Disclosures in the Emoji Age

    In examining the rise of influencer marketing and emoji’s concurrent surge in popularity, it naturally follows that emoji should be incorporated into the FTC’s required disclosures for sponsored posts across social media platforms. While the current disclosure methods the FTC recommends are easily jumbled or lost in other text, using emoji to disclose material connections would streamline disclosure requirements, leveraging an already-popular method of communication to better reach consumers. This Note proposes that the FTC adopt an emoji as a preferred method of disclosure for influencer marketing on social media. Part I discusses the rise of influencer marketing, the FTC and its history of regulating sponsored content, and the current state of regulation. Part II explores the proliferation of emoji as a method of communication, and the role of the Unicode Consortium in regulating the adoption of new emoji. Part III makes the case for incorporating emoji as a method of disclosure to bridge compliance gaps, and offers additional recommendations to increase compliance with existing regulations.

    Rhetorical outcomes: A genre analysis of student service-learning writing

    Service-learning continues to be a popular pedagogical approach within composition studies. Despite a number of studies that document a range of positive impacts on students, faculty, institutions, and community members, the relationship between service-learning and student writing outcomes is not well understood. This study presents the results of a genre analysis of student-authored ethnographies composed in four distinct sections of a service-learning-based intermediate writing course at a Midwestern urban research university. Results of the analysis are then used to develop a contextualized writing assessment framework to evaluate student writing outcomes and to consider the implications of using contemporary genre theory for both service-learning and writing program assessment.

    Semi-Supervised Learning For Identifying Opinions In Web Content

    Thesis (Ph.D.), Indiana University, Information Science, 2011. Opinions published on the World Wide Web (Web) offer opportunities for detecting personal attitudes regarding topics, products, and services. The opinion detection literature indicates that both a large body of opinions and a wide variety of opinion features are essential for capturing subtle opinion information. Although a large amount of opinion-labeled data is preferable for opinion detection systems, opinion-labeled data is often limited, especially at sub-document levels, and manual annotation is tedious, expensive and error-prone. This shortage of opinion-labeled data is less challenging in some domains (e.g., movie reviews) than in others (e.g., blog posts). While a simple method for improving accuracy in challenging domains is to borrow opinion-labeled data from a non-target data domain, this approach often fails because of the domain transfer problem: opinion detection strategies designed for one data domain generally do not perform well in another. However, while it is difficult to obtain opinion-labeled data, unlabeled user-generated opinion data are readily available. Semi-supervised learning (SSL) requires only limited labeled data to automatically label unlabeled data, and has achieved promising results in various natural language processing (NLP) tasks, including traditional topic classification; but SSL has been applied in only a few opinion detection studies. This study investigates the application of four different SSL algorithms to three types of Web content: edited news articles, semi-structured movie reviews, and the informal and unstructured content of the blogosphere. SSL algorithms are also evaluated for their effectiveness in sparse data situations and domain adaptation. Research findings suggest that, when there is limited labeled data, SSL is a promising approach for opinion detection in Web content.
Although the contributions of SSL varied across data domains, significant improvement was demonstrated for the most challenging data domain, the blogosphere, when a domain-transfer-based SSL strategy was implemented.
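Of the SSL family discussed above, self-training is one of the simplest schemes (the abstract does not name the four algorithms studied). A minimal sketch on toy one-dimensional data: a nearest-centroid classifier repeatedly pseudo-labels its most confident unlabeled points and retrains.

```python
# Self-training sketch on toy 1-d "opinion scores": start from a few
# labeled points, pseudo-label unlabeled points whose margin between
# the two nearest class centroids exceeds a threshold, and repeat.

def centroid(points):
    return sum(points) / len(points)

def self_train(labeled, unlabeled, threshold=0.5, rounds=5):
    """labeled: dict label -> list of points; unlabeled: list of points."""
    unlabeled = list(unlabeled)
    for _ in range(rounds):
        cents = {lab: centroid(pts) for lab, pts in labeled.items()}
        confident = []
        for x in unlabeled:
            dists = sorted((abs(x - c), lab) for lab, c in cents.items())
            margin = dists[1][0] - dists[0][0]
            if margin > threshold:              # confident prediction only
                confident.append((x, dists[0][1]))
        if not confident:                       # nothing confident left
            break
        for x, lab in confident:
            labeled[lab].append(x)              # absorb pseudo-labels
            unlabeled.remove(x)
    return labeled, unlabeled

labeled = {"subjective": [0.9, 1.1], "objective": [-1.0, -0.8]}
expanded, leftover = self_train(labeled, [0.7, 1.3, -1.2, 0.05])
print(sorted(expanded["subjective"]))  # [0.7, 0.9, 1.1, 1.3]
```

The ambiguous point (0.05) is deliberately left unlabeled, which is exactly how self-training avoids amplifying low-confidence errors.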

    Algoritmos de análise de alinhamento de ofertas do mercado de trabalho

    The adoption of digital technologies promises to accelerate the transformation and agility of processes, work activities and revenue models. Yet the promised gains come with a strong need for qualified professionals who can effectively leverage the technology's potential. Job contexts are being reshaped as new models for the interaction and integration of humans and technologies take shape. To increase the readiness of the job market in fast-changing contexts, all stakeholders (companies, professionals, policymakers) must be aware of the job market's dynamics and needs. These dynamics can be observed from the collection of job announcements, but their high volume requires effective tools for analyzing and simplifying them in order to draw timely and correct conclusions. As job announcements have distinct formulations for similar roles, depending on the hiring company, this raises the need to establish a common ground for comparing job offers. In this work, an attempt is made to map job offers to ESCO (European Skills/Competences, qualifications and Occupations) occupations. ESCO is an ontology published by the European Union; its occupations correspond to job positions, each with associated mandatory and optional skills. The ELK (Elasticsearch, Logstash, Kibana) stack was used to deal with the high volume of job announcements: ELK is a stable tool that can manage large quantities of data and has an effective text search algorithm, and the Kibana layer enables rapid exploration of the data and creation of visualization dashboards. Results show that the ELK stack is a suitable tool for providing a visual interpretation of job market dynamics. Several strategies were tested to align real job offers with ESCO occupations; the best one achieved an F1-score of over 0.8 when mapping job offers to level-1 ESCO occupations and an accuracy of 63.75% when predicting the level-5 occupation. These results are comparable to the state of the art reported in the literature and are very promising, especially against the baseline of 40%, and show that ESCO is a good candidate as a common ground for comparing job market dynamics across distinct environments.
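The thesis tests several alignment strategies without detailing them in the abstract; as a generic baseline sketch, the snippet below maps a job offer to the closest occupation by bag-of-words cosine similarity. The occupation descriptions here are invented placeholders, not real ESCO entries.

```python
# Baseline sketch (not the thesis's best strategy): match a job offer
# to the most similar occupation description via bag-of-words cosine.

import math
from collections import Counter

def cosine(a, b):
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical occupation descriptions (placeholders, not ESCO data).
occupations = {
    "software developer": "develop software applications write code test programs",
    "data analyst": "analyse data build reports statistics dashboards",
    "nurse": "provide patient care administer medication hospital",
}

def best_occupation(offer):
    return max(occupations, key=lambda occ: cosine(offer, occupations[occ]))

print(best_occupation("we need someone to write code and test software"))
```

In practice the same ranking can be delegated to Elasticsearch full-text scoring over the indexed occupation texts, which is what makes the ELK stack a natural fit for this volume of data.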

    The Mapmaker’s Dilemma in Evaluating High-End Inequality

    The last thirty years have witnessed rising income and wealth concentration among the top 0.1% of the population, leading to intense political debate regarding how, if at all, policymakers should respond. Often, this debate emphasizes the tools of public economics, and in particular optimal income taxation. However, while these tools can help us in evaluating the issues raised by high-end inequality, their extreme reductionism—which, in other settings, often offers significant analytic payoffs—here proves to have serious drawbacks. This Article addresses what we do and don't learn from the optimal income tax literature regarding high-end inequality, and what other inputs might be needed to help one evaluate the relevant issues.

    FINE-GRAINED EMOTION DETECTION IN MICROBLOG TEXT

    Automatic emotion detection in text is concerned with using natural language processing techniques to recognize emotions expressed in written discourse. Endowing computers with the ability to recognize emotions in a particular kind of text, microblogs, has important applications in sentiment analysis and affective computing. In order to build computational models that can recognize the emotions represented in tweets, we need to identify a set of suitable emotion categories. Prior work has mainly focused on building computational models for only a small set of six basic emotions (happiness, sadness, fear, anger, disgust, and surprise). This thesis describes a taxonomy of 28 emotion categories, an expansion of these six basic emotions, developed inductively from data. This set of 28 fine-grained emotion categories is representative of the range of emotions expressed in tweets, microblog posts on Twitter. The ability of humans to recognize these fine-grained emotion categories is characterized using inter-annotator reliability measures based on annotations provided by expert and novice annotators. A set of 15,553 human-annotated tweets forms a gold-standard corpus, EmoTweet-28. For each emotion category, we have extracted a set of linguistic cues (i.e., punctuation marks, emoticons, emojis, abbreviated forms, interjections, lemmas, hashtags and collocations) that can serve as salient indicators for that emotion category. We evaluated the performance of automatic classification techniques on the set of 28 emotion categories through a series of experiments using several classifier and feature combinations. Our results show that it is feasible to extend machine learning classification to fine-grained emotion detection in tweets (i.e., as many as 28 emotion categories), with results that are comparable to state-of-the-art classifiers that detect six to eight basic emotions in text.
Classifiers using features extracted from the linguistic cues associated with each category match or exceed the performance of conventional corpus-based and lexicon-based features for fine-grained emotion classification. This thesis makes an important theoretical contribution in the development of a taxonomy of emotion in text. In addition, this research also makes several practical contributions, particularly in the creation of language resources (i.e., a corpus and a lexicon) and machine learning models for fine-grained emotion detection in text.
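The surface cues listed above (punctuation marks, emoticons, hashtags, interjections) can be pulled out of a tweet with simple pattern matching. The cue inventories below are illustrative stand-ins, not the EmoTweet-28 lexicon.

```python
# Illustrative cue extraction for emotion classification features:
# exclamation counts, emoticons, hashtags and a tiny interjection list.
# The patterns and word list are toy examples, not the thesis's lexicon.

import re

INTERJECTIONS = {"wow", "ugh", "yay", "omg"}  # illustrative subset

def extract_cues(tweet):
    words = re.findall(r"[a-z']+", tweet.lower())
    return {
        "exclamations": tweet.count("!"),
        "emoticons": re.findall(r"[:;][-']?[)(DPp]", tweet),
        "hashtags": re.findall(r"#\w+", tweet),
        "interjections": [w for w in words if w in INTERJECTIONS],
    }

cues = extract_cues("Yay!! Finally passed my exam :) #happy #relieved")
print(cues["hashtags"])  # ['#happy', '#relieved']
```

Vectors built from such cue dictionaries are what the cue-based classifiers above consume, alongside conventional lemma and collocation features.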