327 research outputs found

    SenteCon: Leveraging Lexicons to Learn Human-Interpretable Language Representations

    Although deep language representations have become the dominant form of language featurization in recent years, in many settings it is important to understand a model's decision-making process. This necessitates not only an interpretable model but also interpretable features. In particular, language must be featurized in a way that is interpretable while still characterizing the original text well. We present SenteCon, a method for introducing human interpretability in deep language representations. Given a passage of text, SenteCon encodes the text as a layer of interpretable categories in which each dimension corresponds to the relevance of a specific category. Our empirical evaluations indicate that encoding language with SenteCon provides high-level interpretability at little to no cost to predictive performance on downstream tasks. Moreover, we find that SenteCon outperforms existing interpretable language representations with respect to both its downstream performance and its agreement with human characterizations of the text. Comment: Accepted to Findings of ACL 202
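To make "each dimension corresponds to the relevance of a specific category" concrete, here is a minimal sketch of a lexicon-based category vector: each dimension is the share of a passage's tokens that fall in one category's word list. This is a simplified count-based baseline under stated assumptions, not SenteCon itself (which, per the abstract, works with deep language representations); the category names and word lists are hypothetical.

```python
import re

# Hypothetical toy lexicon (assumption): category name -> member words.
LEXICON = {
    "positive_emotion": {"happy", "glad", "love", "great"},
    "negative_emotion": {"sad", "angry", "hate", "awful"},
    "social": {"friend", "family", "we", "they"},
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def category_vector(text: str, lexicon: dict[str, set[str]]) -> dict[str, float]:
    """Return one interpretable dimension per category: the fraction of
    tokens in the passage that appear in that category's word list."""
    tokens = tokenize(text)
    if not tokens:
        return {name: 0.0 for name in lexicon}
    return {
        name: sum(token in words for token in tokens) / len(tokens)
        for name, words in lexicon.items()
    }


vec = category_vector("We love our friends, but we hate awful weather.", LEXICON)
for name, value in vec.items():
    print(f"{name}: {value:.2f}")
```

Matching here is exact, with no stemming, so "friends" does not count toward the "social" category; a real lexicon-based system would need to decide how to handle inflected forms.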

    Using Text-Analysis Computer Software and Thematic Analysis on the Same Qualitative Data: A Case Example

    The acceptance and application of qualitative methods has been steadily increasing, and recent advances in computer analytic software programs have produced a rapidly evolving landscape of new methods and analytic tools. However, discussions of how to use these new computer-based methods alongside traditional qualitative methods remain sparse. This article presents an example of using quantitative text analysis software, the Linguistic Inquiry and Word Count program, alongside a traditional qualitative method, thematic analysis. The data comprised 46 transcribed life narratives shared by individuals with schizophrenia. We present findings from both analyses and offer an example of a method that combines the two approaches. The results and examples are discussed in light of their potential to strengthen analyses when these methods are used collaboratively. (PsycINFO Database Record (c) 2017 APA, all rights reserved)

    The Impact of Individual and Collective Attribution on Earnings Calls Impression Management By

    The thesis examines the language executives use on earnings calls to answer analysts’ questions about business performance and strategy. Every fiscal quarter, most publicly traded companies report their financial performance to analysts on an audio call. Equity analysts in turn factor the information conveyed on the call into their fundamental analysis of the company’s stock price, which drives their buy, sell, or hold recommendations. Earnings calls consist of two components: the presentation and the question-and-answer section. In the presentation, the CEO and CFO typically read financial information, such as sales figures, modeled after the company’s 10-K and 10-Q filings. In the Q&A section, analysts ask about specific financial indicators or the firm’s overarching business strategy, and management responds. Because executives cannot predict analysts’ questions with complete certainty, their responses tend to be less scripted than the presentation. Executives often have coaches who advise them on how best to respond to questions about challenging situations such as declining profits or impending litigation. After the call, the stock price can change drastically if significant news or a major guidance revision is disclosed. Written transcripts of earnings calls are typically collected and read post hoc by investors researching the company’s fundamentals. Investment professionals increasingly seek additional information by parsing the tone and syntax of executives’ language on these calls, an area of the decision-making literature to which this paper seeks to contribute. The objective of this paper is twofold. First, it examines how executives currently frame their responses to questions about good and bad events in terms of self-centered versus collective attribution.
Second, it determines which of these two rhetorical strategies executives should use to manage analysts’ impressions and communicate business performance clearly. To that end, the paper compares the two language patterns and identifies which one garners the most favorable response and perception from investors and the broader audience. It finds that in both downturns and strong quarters, analysts, and hence the market, respond more favorably to self-referential individualist pronouns (“I”, “mine”) than to self-referential collectivist pronouns (“we”, “our”). Other syntactic dimensions, such as internal and external attribution, are examined as secondary characteristics of executives’ speech and suggest avenues for further research.

    VaxxHesitancy: A Dataset for Studying Hesitancy Towards COVID-19 Vaccination on Twitter

    Vaccine hesitancy has probably been a common concern since vaccines were first created, and with the popularisation of social media, people began to express their concerns about vaccines online alongside those posting pro- and anti-vaccine content. Predictably, since the first mentions of a COVID-19 vaccine, social media users have posted about their fears and concerns, or about their support for and belief in the effectiveness of these rapidly developing vaccines. Identifying and understanding the reasons behind public hesitancy towards COVID-19 vaccines is important for policymakers who need to develop actions that better inform the population, with the aim of increasing vaccine take-up. In the case of COVID-19, where the fast development of the vaccines was closely mirrored by a growth in anti-vaxx disinformation, automatic means of detecting citizens' attitudes towards vaccination became necessary. This is an important computational social science task that requires data analysis to gain an in-depth understanding of the phenomena at hand. Annotated data is also necessary for training data-driven models that support more nuanced analysis of attitudes towards vaccination. To this end, we created a new collection of over 3,101 tweets annotated with users' attitudes towards COVID-19 vaccination (stance). We also develop a domain-specific language model (VaxxBERT) that achieves the best predictive performance (73.0 accuracy and 69.3 F1-score) compared with a robust set of baselines. To the best of our knowledge, these are the first dataset and model that treat vaccine hesitancy as a category distinct from pro- and anti-vaccine stance. Comment: Accepted at ICWSM 202
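For readers less familiar with the two reported metrics (given as percentages), the sketch below computes accuracy and a macro-averaged F1 score from gold and predicted stance labels. Whether the paper's F1 is macro-averaged is an assumption here, and the label names and example predictions are invented for illustration.

```python
from collections import Counter


def accuracy(gold: list[str], pred: list[str]) -> float:
    """Fraction of items whose predicted label matches the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold: list[str], pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    true_pos = Counter(g for g, p in zip(gold, pred) if g == p)
    pred_count = Counter(pred)
    gold_count = Counter(gold)
    scores = []
    for label in labels:
        precision = true_pos[label] / pred_count[label] if pred_count[label] else 0.0
        recall = true_pos[label] / gold_count[label] if gold_count[label] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical stance labels and predictions (assumption, not from the dataset).
gold = ["pro", "anti", "hesitant", "hesitant", "pro", "anti"]
pred = ["pro", "hesitant", "hesitant", "pro", "pro", "anti"]
print(f"accuracy={accuracy(gold, pred):.3f} macro-F1={macro_f1(gold, pred):.3f}")
```

Macro-averaging weights each stance class equally, so a model that ignores a small class such as "hesitant" is penalised more than under plain accuracy, which is one reason both numbers are usually reported together.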

    Computational Sociolinguistics: A Survey

    Language is a social phenomenon and variation is inherent to its social nature. Recently, there has been a surge of interest within the computational linguistics (CL) community in the social dimension of language. In this article we present a survey of the emerging field of "Computational Sociolinguistics" that reflects this increased interest. We aim to provide a comprehensive overview of CL research on sociolinguistic themes, featuring topics such as the relation between language and social identity, language use in social interaction and multilingual communication. Moreover, we demonstrate the potential for synergy between the research communities involved, by showing how the large-scale data-driven methods that are widely used in CL can complement existing sociolinguistic studies, and how sociolinguistics can inform and challenge the methods and assumptions employed in CL studies. We hope to convey the possible benefits of a closer collaboration between the two communities and conclude with a discussion of open challenges. Comment: To appear in Computational Linguistics. Accepted for publication: 18th February, 201

    Deep learning with knowledge graphs for fine-grained emotion classification in text

    This PhD thesis investigates two key challenges in fine-grained emotion detection in textual data. More specifically, this work focuses on (i) the accurate classification of emotion in tweets and (ii) improving the learning of representations from knowledge graphs using graph convolutional neural networks. The first part of this work outlines the task of emotion keyword detection in tweets and introduces a new resource called the EEK dataset. Tweets have previously been treated as short sequences, and their classification as sentence-level sentiment analysis; it could be argued that this should no longer be the case, especially since Twitter increased its character limit. Recurrent neural networks have become a well-established method for classifying tweets in recent years, but they have struggled to classify longer sequences accurately because of the vanishing and exploding gradient problems. A common technique to overcome this problem has been to prune tweets to a shorter sequence length. However, this often meant that potentially important emotion-carrying information, which is frequently found towards the end of a tweet (e.g., emojis and hashtags), was lost. Tweet classification therefore faces the same difficulty with long sequences as other natural language processing tasks. To overcome these challenges, a multi-scale hierarchical recurrent neural network is proposed and benchmarked against other existing methods. The proposed learning model outperforms existing methods on the same task by up to 10.52%. Another key component for the accurate classification of tweets has been the use of language models, where more recent techniques such as BERT and ELMo have achieved great success in a range of different tasks. However, a key challenge in sentiment analysis has always been to use language models that take advantage not only of the context in which a word is used but also of the sentiment it carries.
Therefore, the second part of this work looks at improving representation learning for emotion classification by introducing both linguistic and emotion knowledge into language models. A new linguistically inspired knowledge graph called RELATE is introduced. A new language model is then trained using a graph convolutional neural network and compared against several existing language models; the proposed embedding representations achieve results competitive with other LMs while requiring less pre-training time and data. Finally, the thesis investigates how the proposed methods can be applied to document-level classification tasks, specifically the classification of suicide notes, and analyses whether sentiment and linguistic features are important for accurate classification.
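The trade-off described above, where pruning a tweet to a fixed length discards trailing emojis and hashtags, can be illustrated with a small sketch that contrasts truncation with splitting the token sequence into fixed-size chunks, the kind of input a hierarchical model can consume without dropping tokens. This is only an illustration of the data-preparation idea, not the thesis's multi-scale hierarchical RNN; the function names, padding token, and example tweet are hypothetical.

```python
def truncate(tokens: list[str], max_len: int) -> list[str]:
    """Keep only the first max_len tokens; everything after is discarded."""
    return tokens[:max_len]


def chunk(tokens: list[str], chunk_len: int, pad: str = "<pad>") -> list[list[str]]:
    """Split tokens into consecutive chunks of chunk_len, padding the last one.

    A hierarchical model could encode each chunk with a lower-level RNN and
    then run a higher-level RNN over the chunk encodings, so no token is lost.
    """
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    chunks = [tokens[i:i + chunk_len] for i in range(0, len(tokens), chunk_len)]
    if chunks and len(chunks[-1]) < chunk_len:
        chunks[-1] = chunks[-1] + [pad] * (chunk_len - len(chunks[-1]))
    return chunks


# Made-up tweet (assumption); whitespace tokenization for simplicity.
tweet = "finally got the job offer after months of waiting #blessed 😭🎉".split()
print(truncate(tweet, 8))  # the hashtag and emojis at the end are dropped
print(chunk(tweet, 4))     # every token survives, spread over 3 chunks
```

Chunking keeps the end of the tweet at the cost of an extra level of modelling; the padding token lets every chunk share one fixed length.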

    The influence of a start-up process on the entrepreneurs’ emotions, deduced by their Twitter accounts

    This study analyses entrepreneurship over a period of time to understand how funding influences entrepreneurs’ emotions. Starting from a large base of Twitter accounts, text analysis was used to extract ratios measuring the presence of positive and negative emotions in the tweets. Each ratio was calculated by dividing the number of words referring to that emotion by the total number of words the entrepreneurs tweeted each year. The entrepreneurs are mostly from the US, and the investors funding their ideas used many different investment strategies. The main focus is the effect of the overall funding strategy on the entrepreneurs’ emotions. The purpose is to compare these emotional ratios with the funding process, testing how emotions are affected by whether the start-up was financed or received no funds at all. The results were analysed in Stata, yielding interesting findings that are discussed throughout the study.
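The ratio described in this abstract is, for each year, the number of emotion words tweeted divided by the total number of words tweeted. The sketch below computes positive and negative ratios per year for one entrepreneur's tweets. The word lists and example tweets are hypothetical (the study's actual emotion lexicon is not given), and Python is used here only for illustration; the study itself reports using Stata.

```python
import re
from collections import defaultdict

# Hypothetical emotion word lists (assumption, not the study's lexicon).
POSITIVE = {"excited", "happy", "proud", "great"}
NEGATIVE = {"worried", "tired", "sad", "frustrated"}


def yearly_emotion_ratios(tweets: list[tuple[int, str]]) -> dict[int, dict[str, float]]:
    """Map each year to its positive and negative emotion-word ratios.

    Each ratio is (emotion words tweeted that year) / (all words tweeted that year).
    """
    totals = defaultdict(lambda: {"words": 0, "positive": 0, "negative": 0})
    for year, text in tweets:
        words = re.findall(r"[a-z']+", text.lower())
        counts = totals[year]
        counts["words"] += len(words)
        counts["positive"] += sum(w in POSITIVE for w in words)
        counts["negative"] += sum(w in NEGATIVE for w in words)
    return {
        year: {
            "positive": c["positive"] / c["words"] if c["words"] else 0.0,
            "negative": c["negative"] / c["words"] if c["words"] else 0.0,
        }
        for year, c in sorted(totals.items())
    }


# Invented tweets for a single entrepreneur: (year, text).
tweets = [
    (2015, "So excited to launch our beta today"),
    (2015, "Tired but proud of the team"),
    (2016, "Worried about runway this quarter"),
]
print(yearly_emotion_ratios(tweets))
```

Summing counts across all of a year's tweets before dividing matches the abstract's definition (total words per year), rather than averaging per-tweet ratios, which would weight short tweets more heavily.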

    Linguistic expression and perception of personality in online dating texts and their effect on attraction

    A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy. Online daters report difficulties, frustration and anxiety both in conveying their desired impression of themselves and in perceiving other daters’ personalities accurately. Little research has examined how the expression of personality traits in profiles affects perception and assessments of attractiveness. This thesis aims to fill this gap by exploring the expression and perception of personality traits in online dating profile texts and by examining whether textually expressed personality affects attractiveness. The first two studies used a linguistic and content analysis approach to determine how personality was expressed in dating profiles across different dating platforms and in a comparison text, a creative story. Expression varied considerably, indicating that language may not be a reliable indicator of personality. The third study took a lens model approach, using Funder’s Realistic Accuracy Model, to examine the accuracy of personality perception in two contexts and determine whether dating profiles provided more salient trait-related cues to personality. The linguistic and content cues that judges used in making personality assessments were also investigated. While some accuracy of perception was possible for emotional stability in online dating profiles, it was context dependent and unreliable, and few cues were used accurately. The effects of actual and perceived personality, and of personality similarity, on attractiveness, which had not previously been examined in this context, were also investigated. This research shows that actual traits and similarity affect attraction only when they are perceivable, whereas perceived traits and similarity can affect attraction without accurate perception.
This thesis illustrates the complexity of accuracy in interpersonal perception from text and shows that context drives a considerable amount of the variation in achieved accuracy. Additionally, the results offer some practical implications for online daters.
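In lens-model studies, perceptual accuracy (achievement) is commonly operationalised as the correlation between targets' actual trait scores, often self-reports, and observers' judgments of those traits. The sketch below computes that correlation for one trait across a handful of profile writers. The scores are invented, and treating self-reports as the accuracy criterion is an assumption about how such a study might be set up, not a description of the thesis's exact procedure.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Invented scores for five profile writers on a 1-7 scale (assumption).
self_report = [5.0, 3.0, 6.0, 2.0, 4.0]  # criterion: writer's own emotional stability
judged = [4.0, 4.5, 5.0, 3.5, 3.0]       # mean rating from judges who read the profile
print(f"accuracy (r) = {pearson_r(self_report, judged):.2f}")
```

A correlation near zero would indicate that judges cannot recover the trait from the text, while values closer to 1 indicate more accurate perception; the same function could be applied to each cue to estimate cue validity and cue utilisation.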