
    Sentiment Analysis: An Overview from Linguistics

    Sentiment analysis is a growing field at the intersection of linguistics and computer science that attempts to automatically determine the sentiment, or positive/negative opinion, contained in text. Sentiment can be characterized as positive or negative evaluation expressed through language. A common application of sentiment analysis is automatically determining whether a review posted online (of a movie, a book, or a consumer product) is positive or negative towards the item being reviewed. Sentiment analysis is now a common tool in the repertoire of social media analysis carried out by companies, marketers, and political analysts. Research on sentiment analysis extracts information from positive and negative words in text, from the context of those words, and from the linguistic structure of the text. This brief survey examines in particular the contributions that linguistic knowledge can make to the problem of automatically determining sentiment.
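    To illustrate how word polarity and local context interact, here is a minimal sketch of a lexicon-based scorer that flips a word's polarity when it follows a negator. The lexicon values, the negator list, and the three-token scope are hypothetical choices made for illustration. Sign flipping is only one treatment of negation (shifting polarity by a fixed amount is another), and this is not presented as the method of the survey.

```python
# Hypothetical toy lexicon: word -> polarity (positive > 0, negative < 0).
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0, "boring": -1.0}
# Hypothetical negators; a real system would need a much richer list.
NEGATORS = {"not", "never", "no"}


def score(text: str, scope: int = 3) -> float:
    """Sum word polarities, flipping the sign of lexicon words that occur
    within `scope` tokens after a negator."""
    tokens = text.lower().replace(".", " ").replace(",", " ").split()
    total = 0.0
    remaining = 0  # tokens left inside the current negation scope
    for tok in tokens:
        if tok in NEGATORS:
            remaining = scope
            continue
        polarity = LEXICON.get(tok, 0.0)
        if remaining > 0:
            polarity = -polarity
            remaining -= 1
        total += polarity
    return total


print(score("The plot was good."))                # 1.0
print(score("The plot was not good."))            # -1.0
print(score("Not bad at all, and a great cast"))  # 3.0
```

    The fixed token window is the simplest possible notion of "context"; syntactically informed scope detection, of the kind the survey discusses, would replace it.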

    Destination image online analyzed through user generated content: a systematic literature review

    Destination Image is a concept that has long been studied in tourism research. How a destination is perceived by tourists and potential new guests is an important insight, especially for local tourism managers, who need it to evaluate the strategies they have implemented and to plan further tactics. Owing to rapid digitalization over the last two decades, tourism research is increasingly examining the online Destination Image. This creates new challenges in the selection of sources and methods, and in data collection. The aim of the present study was to systematically capture the approaches used to analyze the online Destination Image through User Generated Content, using studies from the last ten years. To this end, a Systematic Literature Review of primary research from academic databases was conducted. As a summary of the findings, a conceptual model was developed, based on the insights of the studies in the dataset, to provide guidance for the preparation phase of future online Destination Image research. In short, the main findings are: TripAdvisor.com is the main source for online Destination Image analysis. Researchers recommend using software and programming languages to collect and analyze the data. As in earlier Destination Image studies, the main methods applied in online Destination Image analysis are quantitative content analysis, qualitative content analysis, and sentiment analysis, combined with the examination of cognitive and affective factors, co-occurrence analysis, and correlation analysis. The present study has several limitations: the loss of detailed information due to reducing the studies to comparable key parameters, the absence of Anglo-American studies due to the database selection, and the lack of quality testing of the included studies.
Destination Image is a concept that has long been studied in tourism research. The question of how the destination is seen by tourists and potential new guests is an important perspective, especially for the region's tourism managers, in order to evaluate the strategies implemented and to plan new tactics. Over the last two decades digitalization has advanced drastically; tourism research has adapted to this phenomenon and is now increasingly studying the online destination image. This change has created new challenges in the selection of sources and methods, and in data collection. The aim of the present work was to capture systematically the approaches taken to analyze the online destination image, using studies from the last ten years. To this end, primary studies from the years 2010-2020 in the academic databases Web of Science, ProQuest, and b-on were collected using predefined search keywords. The resulting set of articles was then subjected to an eligibility assessment, as recommended by Moher et al. (2009); that is, studies that did not meet the predefined criteria were excluded. The inclusion criteria were: the academic work had to be a primary source from a scientific journal, written in English, and the analyzed sample had to originate from online social media communication. The remaining 35 articles were then transferred to a database using a coding matrix. The coding matrix was designed to capture the key parameters of each primary study in a standardized, and therefore comparable, form. It recorded general information, such as year, location, and journal of publication, as well as specific thematic information, such as the tourism field studied and the media analyzed, together with categories covering the methodology used, the tools employed, and the results obtained.
The resulting database was then used to derive statements about the methodological approach used in analyzing online destination image. As a summary of the results, a conceptual model was developed, based on the insights obtained from the set of articles that formed the analysis dataset, to provide guidance for the preparation phase of future research on online destination image. In summary, the main conclusions are: TripAdvisor.com is the main source for online destination image analysis. Researchers recommend using software and programming languages to collect and analyze the data. As in earlier Destination Image studies, the main methods applied in online destination image analysis are quantitative content analysis, qualitative content analysis, and sentiment analysis, combined with the analysis of cognitive and affective factors, co-occurrence analysis, and correlation analysis. The present study has several limitations: the loss of detailed information due to reducing the studies to comparable key parameters, the absence of Anglo-American studies due to the database selection, and the lack of quality testing of the included studies. (TurExperience - Tourist experiences' impacts on the destination image: searching for new opportunities to the Algarve)

    Sociolinguistic Variationist Analysis of Word-Emotion Lexicon in Cook Islands English Online News

    This paper describes how journalists in the Cook Islands use sentiment lexicon when reporting online news. To do so, we employ Sentiment Analysis (SA) in combination with sociolinguistic variationist theory and logistic regression analysis. SA relies on the Word-Emotion Association Lexicon (Mohammad & Turney 2013), which comprises 10,170 lexical items. Most research on sentiment analysis distinguishes only between positive and negative emotions. By contrast, we provide a fine-grained coding by exploring how eight core emotions (i.e. ANGER, ANTICIPATION, FEAR, DISGUST, JOY, SADNESS, SURPRISE, and TRUST) are socially stratified in formal contexts. We built a small-scale corpus from web-based newspapers to find out (i) whether social factors (age and sex) condition the use of sentiment lexicon and (ii) whether the socially acknowledged generalisation that females use sentiment lexicon more than males holds. The data were quantitatively examined through mixed-effects logistic regression in Rbrul. The independent variables include word class (i.e. nouns, adjectives, verbs), sex, age, and word frequency. Word frequency is a variable involved in language processing and is commonly studied in psycholinguistics, sociolinguistics, and corpus linguistics (Mickiewicz 2019); we measure it with the SUBTLEX-US corpus (Brysbaert & New 2009). Our findings suggest that the use of sentiment lexicon is conditioned by age, with young and old speakers favouring it. Sex, word class, and word frequency do not have a significant influence on sentiment lexicon in our data.
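    The lexicon-lookup step that precedes the regression can be sketched as a per-emotion count over the eight categories. The lexicon entries below are hypothetical placeholders standing in for the real Word-Emotion Association Lexicon, and the sketch does not reproduce the mixed-effects Rbrul analysis or the SUBTLEX-US frequency coding.

```python
from collections import Counter

# The eight emotion categories named in the abstract.
EMOTIONS = ("anger", "anticipation", "fear", "disgust", "joy", "sadness", "surprise", "trust")

# Hypothetical excerpt standing in for the Word-Emotion Association Lexicon;
# the real entries must be loaded from the lexicon's distributed files.
LEXICON = {
    "celebrate": {"joy", "anticipation", "trust"},
    "storm": {"fear", "anger"},
    "loss": {"sadness", "fear"},
    "sudden": {"surprise"},
}


def emotion_profile(tokens):
    """Count lexicon hits per emotion; one word may count toward several emotions."""
    counts = Counter({emotion: 0 for emotion in EMOTIONS})
    for tok in tokens:
        for emotion in LEXICON.get(tok.lower(), ()):
            counts[emotion] += 1
    return dict(counts)


profile = emotion_profile("A sudden storm caused heavy loss".split())
print(profile["fear"], profile["surprise"], profile["joy"])  # 2 1 0
```

    Because a single word can be associated with several emotions, per-emotion counts do not sum to the number of matched tokens.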

    Engineers, Aware! Commercial Tools Disagree on Social Media Sentiment: Analyzing the Sentiment Bias of Four Major Tools

    Large commercial sentiment analysis tools are often deployed in software engineering due to their ease of use. However, it is not known how accurate these tools are, or whether the sentiment ratings given by one tool agree with those given by another. To investigate the agreement and bias of four widely used sentiment analysis (SA) tools, Microsoft Azure (MS), IBM Watson, Google Cloud, and Amazon Web Services (AWS), we use two datasets: (1) NEWS, consisting of 5,880 news stories and 60K comments from four social media platforms (Twitter, Instagram, YouTube, and Facebook); and (2) IMDB, consisting of 7,500 positive and 7,500 negative movie reviews. We find that the four tools assign the same sentiment to less than half (48.1%) of the analyzed content. We also find that AWS exhibits neutrality bias in both datasets, Google exhibits bi-polarity bias in the NEWS dataset but neutrality bias in the IMDB dataset, and IBM and MS exhibit no clear bias in the NEWS dataset but bi-polarity bias in the IMDB dataset. Overall, IBM has the highest accuracy relative to the known ground truth in the IMDB dataset. Findings indicate that psycholinguistic features (especially affect, tone, and use of adjectives) explain why the tools disagree. Engineers are urged to exercise caution when implementing SA tools for applications, as the choice of tool affects the sentiment labels obtained. © Owner/Author(s). ACM 2022. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of the ACM on Human-Computer Interaction, https://doi.org/10.1145/3532203.
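    The headline agreement figure (the share of items on which all tools give the same label) and a simple look at each tool's label distribution can be computed as below. Tool names and outputs are hypothetical and no vendor API is called. Comparing a tool's label shares with the ground-truth shares is only one way to inspect neutrality or bi-polarity bias; the paper's own operationalization may differ.

```python
from collections import Counter


def full_agreement_rate(labels_by_tool):
    """Share of items to which every tool assigns the same label.

    labels_by_tool maps a tool name to a list of labels aligned by item."""
    columns = list(labels_by_tool.values())
    n_items = len(columns[0])
    if any(len(col) != n_items for col in columns):
        raise ValueError("every tool must label the same items")
    agreeing = sum(1 for item in zip(*columns) if len(set(item)) == 1)
    return agreeing / n_items


def label_shares(labels):
    """Relative frequency of each label produced by one tool."""
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}


# Hypothetical outputs from three tools on four items.
outputs = {
    "tool_a": ["pos", "neg", "neu", "pos"],
    "tool_b": ["pos", "neg", "pos", "pos"],
    "tool_c": ["pos", "neu", "neu", "pos"],
}
print(full_agreement_rate(outputs))     # 0.5
print(label_shares(outputs["tool_c"]))  # {'pos': 0.5, 'neu': 0.5}
```

    A tool whose "neu" share sits well above the ground-truth share would be a candidate for neutrality bias; one that rarely outputs "neu" on mixed content would be a candidate for bi-polarity bias.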

    Founder CEOs and Initial Public Offerings: The Role of Narratives, Institutions and Cultural Context

    This is a two-essay dissertation that explores how founder and non-founder CEOs influence the IPO process and seeks to better understand their impact on IPO performance in a cross-national set of firms. Essay 1 addresses the question ‘how are founder and non-founder CEOs’ narratives portrayed differently in business media?’ Using insights from the narrative paradigm and qualitative content analysis of 1,057 units of data, I find that founders' and non-founders' media narratives differ in three important ways: the amount of personal information about founders, how founders talk about their business operations, and positive and negative name association. Essay 2 addresses the related question ‘how does national context influence the relationship between founder CEO presence and long-run IPO performance across multiple nations?’ Using insights from upper echelons theory and hierarchical linear modeling of over 1,000 firms, I find that founder CEOs perform best in IPO firms in national contexts where managerial discretion is low, uncertainty avoidance is high, and fewer firms have founders as CEO.

    Automated Classification of Argument Stance in Student Essays: A Linguistically Motivated Approach with an Application for Supporting Argument Summarization

    This study describes a set of document- and sentence-level classification models designed to automate two tasks: determining the argument stance (for or against) of a student argumentative essay, and identifying any arguments in the essay that provide reasons in support of that stance. A suggested application of these models is presented: the automated extraction of a single-sentence summary of an argumentative essay. This summary sentence indicates the overall argument stance of the essay from which it was extracted and provides a representative argument in support of that stance. A novel set of document-level stance classification features motivated by linguistic research on stancetaking language is described. Several document-level classification models incorporating these features are trained and tested on a corpus of student essays annotated for stance. These models achieve accuracies significantly above those of two baseline models. High-accuracy features used by these models include a dependency subtree feature incorporating information about the targets of any stancetaking language in the essay text, and a feature capturing the semantic relationship between the essay prompt text and stancetaking language in the essay text. We also describe the construction of a corpus of essay sentences annotated for supporting argument stance. The resulting corpus is used to train and test two sentence-level classification models. The first model classifies a given sentence as a supporting argument or not, while the second classifies a supporting argument as holding a for or against stance. Features motivated by influential linguistic analyses of the lexical, discourse, and rhetorical features of supporting arguments are used to build these two models, both of which achieve accuracies above their respective baseline models.
An application illustrating an interesting use case for the models presented in this dissertation is described. This application incorporates all three classification models to extract a single sentence that summarizes both the overall stance of a given text and a convincing reason in support of that stance.
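    One way the three models might be chained to extract the summary sentence is sketched below. The classifiers are hypothetical stand-ins (any callables returning a label and a confidence), and ranking candidates by the product of the two sentence-level confidences is an assumption made for illustration, not the dissertation's stated selection rule.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Prediction:
    label: str
    confidence: float


def summarize(
    sentences: List[str],
    doc_stance: Callable[[List[str]], Prediction],
    is_support: Callable[[str], Prediction],
    support_stance: Callable[[str], Prediction],
) -> Optional[Tuple[str, str]]:
    """Return (essay stance, best supporting sentence sharing that stance), or None."""
    stance = doc_stance(sentences).label  # document-level stance model
    best, best_conf = None, -1.0
    for sentence in sentences:
        support = is_support(sentence)  # sentence model 1: supporting argument or not
        if support.label != "support":
            continue
        polarity = support_stance(sentence)  # sentence model 2: for or against
        if polarity.label != stance:
            continue
        conf = support.confidence * polarity.confidence  # assumed ranking score
        if conf > best_conf:
            best, best_conf = sentence, conf
    return (stance, best) if best is not None else None


# Hypothetical toy classifiers, for demonstration only.
essay = ["Uniforms should be required.", "Uniforms reduce bullying.",
         "Critics say uniforms limit expression."]
result = summarize(
    essay,
    doc_stance=lambda ss: Prediction("for", 0.9),
    is_support=lambda s: Prediction("support", 0.8) if ("reduce" in s or "limit" in s)
    else Prediction("other", 0.7),
    support_stance=lambda s: Prediction("against", 0.9) if "limit" in s else Prediction("for", 0.6),
)
print(result)  # ('for', 'Uniforms reduce bullying.')
```

    Filtering on the document-level stance first ensures the extracted sentence never contradicts the essay's overall position, even when an opposing argument scores higher.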

    Towards a science of human stories: using sentiment analysis and emotional arcs to understand the building blocks of complex social systems

    We can leverage data and complex systems science to better understand society and human nature on a population scale through language, using tools that include sentiment analysis, machine learning, and data visualization. Data-driven science and the sociotechnical systems that we use every day are enabling a transformation from hypothesis-driven, reductionist methodology to complex systems sciences. In particular, the emergence and global adoption of social media has made possible the real-time estimation of population-scale sentiment, with profound implications for our understanding of human behavior. Advances in computing power, natural language processing, and the digitization of text now make it possible to study a culture's evolution through its texts using a big data lens. Given the growing assortment of sentiment measurement instruments, it is imperative to understand which aspects of sentiment dictionaries contribute both to their classification accuracy and to their ability to provide a richer understanding of texts. Here, we perform detailed quantitative tests and qualitative assessments of 6 dictionary-based methods applied to 4 different corpora, and briefly examine a further 20 methods. We show that, while inappropriate for classifying individual sentences, dictionary-based methods are generally robust in their classification accuracy for longer texts. Most importantly, they can aid understanding of texts with reliable and meaningful word shift graphs if (1) the dictionary covers a sufficiently large portion of a given text's lexicon when weighted by word usage frequency, and (2) words are scored on a continuous scale. Our ability to communicate relies in part upon a shared emotional experience, with stories often following distinct emotional trajectories and forming patterns that are meaningful to us.
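    The two conditions above (frequency-weighted coverage and continuous word scores) can be made concrete with a minimal sketch, assuming a dictionary that maps words to continuous scores (the scores below are invented). It computes a text's coverage and the basic additive decomposition that, as we understand it, underlies word shift graphs: each word contributes $(h_w - h_{ref})(p_w^{comp} - p_w^{ref})$, and these contributions sum exactly to $h_{comp} - h_{ref}$. Published word shift graphs add further breakdowns and plotting, which are omitted here.

```python
from collections import Counter


def average_score(tokens, scores):
    """Frequency-weighted mean score of the tokens the dictionary covers,
    plus coverage: the share of tokens that have a score."""
    hits = [scores[t] for t in tokens if t in scores]
    coverage = len(hits) / len(tokens) if tokens else 0.0
    mean = sum(hits) / len(hits) if hits else float("nan")
    return mean, coverage


def word_shift(ref_tokens, comp_tokens, scores):
    """Per-word contributions (h_w - h_ref) * (p_w_comp - p_w_ref).

    They sum to h_comp - h_ref because the relative frequencies p_w
    in each text both sum to 1."""
    ref = Counter(t for t in ref_tokens if t in scores)
    comp = Counter(t for t in comp_tokens if t in scores)
    n_ref, n_comp = sum(ref.values()), sum(comp.values())
    h_ref = sum(scores[w] * c for w, c in ref.items()) / n_ref
    return {
        w: (scores[w] - h_ref) * (comp[w] / n_comp - ref[w] / n_ref)
        for w in set(ref) | set(comp)
    }


scores = {"happy": 8.0, "sad": 2.0, "day": 6.0}  # invented continuous scores
ref = "happy day sad day".split()
comp = "happy happy day".split()
print(average_score(ref + ["unknown"], scores))                 # (5.5, 0.8)
print(round(sum(word_shift(ref, comp, scores).values()), 4))    # 1.8333
```

    Sorting the contributions by absolute value gives the ranked bars of a word shift graph; low coverage means many words never enter the decomposition, which is why condition (1) matters.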
By classifying the emotional arcs for a filtered subset of 4,803 stories from Project Gutenberg's fiction collection, we find a set of six core trajectories that form the building blocks of complex narratives. We strengthen our findings by separately applying optimization, linear decomposition, supervised learning, and unsupervised learning. For each of these six core emotional arcs, we examine the closest characteristic stories in publication today and find that particular emotional arcs enjoy greater success, as measured by downloads. Within stories lie the core values of social behavior, rich with both strategies and proper protocol, which we can begin to study more broadly and systematically as a true reflection of culture. Of profound scientific interest will be the degree to which we can eventually understand the full landscape of human stories, and data-driven approaches will play a crucial role. Finally, we use web-scale data from Twitter to study the limits of what social data can tell us about public health, mental illness, discourse around the #BlackLivesMatter protest movement, discourse around climate change, and hidden networks. We conclude with a review of published works in complex systems that separately analyze charitable donations, the happiness of words in 10 languages, 100 years of daily temperature data across the United States, and Australian Rules Football games.
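    The arc-building and linear-decomposition steps can be sketched with NumPy, assuming (hypothetically) fixed-size sliding windows scored with a word-score dictionary, and singular value decomposition as the linear decomposition. The window size, number of points, and per-arc mean-centering are illustrative choices, not parameters confirmed by the source.

```python
import numpy as np


def emotional_arc(tokens, scores, window=10_000, n_points=100):
    """Mean dictionary score in windows spread evenly through the text.
    Windows with no scored words yield NaN and must be handled before SVD."""
    if len(tokens) < window:
        raise ValueError("text is shorter than the window")
    starts = np.linspace(0, len(tokens) - window, n_points).astype(int)
    arc = []
    for s in starts:
        hits = [scores[t] for t in tokens[s:s + window] if t in scores]
        arc.append(float(np.mean(hits)) if hits else np.nan)
    return np.array(arc)


def core_modes(arcs, k=6):
    """Top-k modes of NaN-free arcs (one per row) from the SVD of the
    matrix after subtracting each arc's own mean."""
    a = np.asarray(arcs, dtype=float)
    a = a - a.mean(axis=1, keepdims=True)
    _, singular_values, vt = np.linalg.svd(a, full_matrices=False)
    return vt[:k], singular_values[:k]


# Toy "story" that starts happy and ends sad (hypothetical scores).
tokens = ["good"] * 50 + ["bad"] * 50
arc = emotional_arc(tokens, {"good": 1.0, "bad": -1.0}, window=10, n_points=11)
print(arc[0], arc[-1])  # 1.0 -1.0
```

    Each row of the returned `vt` is one candidate mode; an individual story's arc is then approximately a weighted sum of the leading modes, with weights given by its projection onto them.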

    Japanese cultural influence in the Philippines through anime's popularity and pervasiveness

    Degree system: new; report number: Kō 3676; degree type: Doctor (Academic); date conferred: 2012/6/11; Waseda degree record number: Shin 6044; Waseda University