    Attribute Sentiment Scoring with Online Text Reviews: Accounting for Language Structure and Missing Attributes

    The authors address two significant challenges in using online text reviews to obtain fine-grained attribute-level sentiment ratings. First, they develop a deep learning convolutional-LSTM hybrid model to account for language structure, in contrast to methods that rely on word frequency. The convolutional layer accounts for the spatial structure of language (adjacent word groups or phrases), and the LSTM accounts for its sequential structure (sentiment distributed and modified across non-adjacent phrases). Second, they address the problem of missing attributes in text when constructing attribute sentiment scores, as reviewers write only about a subset of attributes and remain silent on others. They develop a model-based imputation strategy using a structural model of heterogeneous rating behavior. Using Yelp restaurant review data, they show that their model achieves superior accuracy in converting text to numerical attribute sentiment scores. The structural model finds three reviewer segments with different motivations: status seeking, altruism/want voice, and need to vent/praise. Interestingly, the results show that reviewers write to inform and to vent/praise, but not based on attribute importance. The heterogeneous model-based imputation performs better than other common imputations and, importantly, leads to managerially significant corrections in restaurant attribute ratings.
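
    For concreteness, a minimal sketch of a convolutional-LSTM hybrid text classifier of the kind described above is given below (written in Keras; the vocabulary size, sequence length, embedding dimension, and five-class output are illustrative assumptions, not the paper's specification). The 1-D convolution operates over adjacent word embeddings (phrase-level structure), and the LSTM carries sentiment information across non-adjacent positions.

```python
# Minimal sketch of a convolutional-LSTM hybrid sentiment classifier.
# All hyperparameters here are assumptions for illustration only.
from tensorflow.keras import layers, models

VOCAB_SIZE = 20000   # assumed vocabulary size
MAX_LEN = 200        # assumed maximum review length (tokens)
NUM_CLASSES = 5      # e.g., a 1-5 sentiment rating for one attribute

inputs = layers.Input(shape=(MAX_LEN,), dtype="int32")
x = layers.Embedding(VOCAB_SIZE, 128)(inputs)
# Convolution captures local, phrase-level (spatial) structure.
x = layers.Conv1D(filters=64, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling1D(pool_size=2)(x)
# LSTM captures sequential dependencies across non-adjacent phrases.
x = layers.LSTM(64)(x)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = models.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```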

    Attribute Sentiment Scoring with Online Text Reviews: Accounting for Language Structure and Attribute Self-Selection

    The authors address two novel and significant challenges in using online text reviews to obtain attribute-level ratings. First, they introduce the problem of inferring attribute-level sentiment from text data to the marketing literature and develop a deep learning model to address it. While extant bag-of-words-based topic models are fairly good at attribute discovery based on the frequency of word or phrase occurrences, associating sentiments with attributes requires exploiting the spatial and sequential structure of language. Second, they illustrate how to correct for attribute self-selection (reviewers choose the subset of attributes to write about) in metrics of attribute-level restaurant performance. Using Yelp.com reviews for empirical illustration, they find that a hybrid deep learning (CNN-LSTM) model, in which the CNN and LSTM exploit the spatial and sequential structure of language respectively, provides the best performance in accuracy, training speed, and training data size requirements. The model does particularly well on the “hard” sentiment classification problems. Further, accounting for attribute self-selection significantly impacts sentiment scores, especially for attributes that are frequently missing.
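
    The paper's self-selection correction is a structural model; as a much simpler, purely illustrative sketch of why such a correction matters, the simulation below (all distributions and mention probabilities are made up) shows how averaging sentiment only over reviews that mention an attribute understates the attribute's true score when negative experiences are written about more often.

```python
# Toy illustration of attribute self-selection bias (not the paper's model).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_sentiment = rng.normal(loc=3.5, scale=1.0, size=n)  # latent score on a 1-5-like scale

# Self-selection: assume worse experiences are written about more often.
p_mention = np.clip(0.9 - 0.15 * true_sentiment, 0.05, 0.95)
mentioned = rng.random(n) < p_mention

print(f"true average sentiment:       {true_sentiment.mean():.2f}")
print(f"naive average over text only: {true_sentiment[mentioned].mean():.2f}")  # biased downward
```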

    Finetuning BERT and XLNet for Sentiment Analysis of Stock Market Tweets using Mixout and Dropout Regularization

    Sentiment analysis, also known as opinion mining, aims to identify how sentiments are expressed in written text. It combines several fields of study, including Natural Language Processing (NLP), data mining, and text mining, and is quickly becoming a key concern for businesses and organizations as online commerce data is increasingly used for analysis. Twitter has become a popular microblogging and social networking platform on which people share their opinions, thoughts, and attitudes. Because of the large volume of data generated on Twitter, stock market sentiment analysis has long been of interest to researchers, investors, and scientists, given the market's highly unpredictable nature. Sentiment analysis can be performed in different ways; the focus of this study is on transformer-based pre-trained models, namely BERT (Bidirectional Encoder Representations from Transformers) and XLNet, a generalized autoregressive model, fine-tuned on few training instances with Mixout regularization, because traditional machine and deep learning models such as Random Forest, Naïve Bayes, Recurrent Neural Networks (RNN), and Long Short-Term Memory (LSTM) networks fail when given few training instances and require intensive feature engineering and processing of textual data. The objective of this research is to study the performance of BERT and XLNet with few training instances using Mixout regularization for stock market sentiment analysis. The proposed approach improved accuracy, precision, recall, and F1-score for both the BERT and XLNet models with Mixout regularization when given adequate as well as under-sampled data.
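
    A minimal sketch of the mixout idea for BERT fine-tuning is shown below (PyTorch with the Hugging Face transformers library; the model name, three-class label set, and p = 0.1 are illustrative assumptions). Mixout, introduced by Lee et al. (2020), replaces dropout in the model's linear layers: during training each weight is stochastically swapped back to its pretrained value and rescaled, which anchors the fine-tuned model to the pretrained one and stabilizes training when instances are few.

```python
import torch
import torch.nn.functional as F
from transformers import BertForSequenceClassification


class MixLinear(torch.nn.Linear):
    """Linear layer with mixout: during training, each weight is replaced by
    its pretrained value with probability p, then rescaled (Lee et al., 2020)."""

    def __init__(self, in_features, out_features, pretrained_weight, p=0.1, bias=True):
        super().__init__(in_features, out_features, bias=bias)
        self.register_buffer("pretrained", pretrained_weight.clone())
        self.p = p

    def forward(self, x):
        if self.training and self.p > 0:
            mask = torch.bernoulli(torch.full_like(self.weight, self.p)).bool()
            weight = torch.where(mask, self.pretrained, self.weight)
            weight = (weight - self.p * self.pretrained) / (1.0 - self.p)
        else:
            weight = self.weight
        return F.linear(x, weight, self.bias)


def replace_linear_with_mixout(module, p=0.1):
    """Recursively swap nn.Linear layers for MixLinear copies of themselves."""
    for name, child in module.named_children():
        if isinstance(child, torch.nn.Linear):
            mix = MixLinear(child.in_features, child.out_features,
                            child.weight.data, p=p, bias=child.bias is not None)
            mix.weight.data.copy_(child.weight.data)
            if child.bias is not None:
                mix.bias.data.copy_(child.bias.data)
            setattr(module, name, mix)
        else:
            replace_linear_with_mixout(child, p)


# Three-class example (e.g., bearish / neutral / bullish tweets); the model
# name, label count, and p = 0.1 are assumptions for illustration.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)
replace_linear_with_mixout(model.bert, p=0.1)  # leave the freshly initialized classifier head alone
# ...then fine-tune `model` as usual (e.g., with the Trainer API or a manual loop).
```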

    Affect Lexicon Induction For the Github Subculture Using Distributed Word Representations

    Sentiments and emotions play essential roles in small group interactions, especially in self-organized collaborative groups. Many people view sentiments as universal constructs; however, cultural differences exist in some aspects of sentiments. Understanding the features of sentiment space in small group cultures provides essential insights into the dynamics of self-organized collaborations. However, due to the limited availability of carefully human-annotated data, it is hard to describe sentiment divergences across cultures. In this thesis, we present a new approach to inspect cultural differences at the level of sentiments and to compare a subculture with the general social environment. We use Github, a collaborative software development network, as an example of a self-organized subculture. First, we train word embeddings on large corpora and align the embeddings using a linear transformation method. Then we model finer-grained human sentiment in the Evaluation-Potency-Activity (EPA) space and extend the subculture EPA lexicon with a two-dense-layer neural network. Finally, we apply a Long Short-Term Memory (LSTM) network to analyze the identities’ sentiments triggered by event-based sentences. We evaluate the predicted EPA lexicon for the Github community using a recently collected dataset, and the results show that our approach can capture subtle changes in affective dimensions. Moreover, our induced sentiment lexicon shows that individuals from the two environments have different understandings of sentiment-related words and phrases but agree on nouns and adjectives. The sentiment features of “Github culture” suggest that people in self-organized groups tend to reduce personal sentiment to improve group collaboration.
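
    A compact sketch of the two core steps, embedding alignment by a linear (orthogonal Procrustes) transformation and EPA prediction with a two-dense-layer network, is given below; the random arrays stand in for the real embeddings and seed lexicon, and all dimensions and scales are assumptions.

```python
# Sketch of (1) aligning a subculture embedding space to a general space with
# an orthogonal linear map learned from shared anchor words, and (2) extending
# an EPA lexicon with a small two-dense-layer network. Arrays are stand-ins.
import numpy as np
import tensorflow as tf

D, N_ANCHORS, N_SEED = 300, 5000, 2000        # assumed dimensions / counts

# (1) Find orthogonal W minimizing ||X_github @ W - X_general||_F.
X_github = np.random.randn(N_ANCHORS, D)      # stand-in for Github embeddings
X_general = np.random.randn(N_ANCHORS, D)     # stand-in for general-corpus embeddings
U, _, Vt = np.linalg.svd(X_github.T @ X_general)
W = U @ Vt                                    # orthogonal alignment matrix

# (2) Two dense layers: aligned word vector -> Evaluation/Potency/Activity.
seed_vectors = np.random.randn(N_SEED, D) @ W
seed_epa = np.random.uniform(-4.3, 4.3, size=(N_SEED, 3))  # stand-in EPA ratings

epa_net = tf.keras.Sequential([
    tf.keras.Input(shape=(D,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(3),                 # E, P, A scores
])
epa_net.compile(optimizer="adam", loss="mse")
epa_net.fit(seed_vectors, seed_epa, epochs=5, verbose=0)
```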

    Three Essays on the Role of Unstructured Data in Marketing Research

    This thesis studies the use of firm- and user-generated unstructured data (e.g., text and videos) to improve market research, combining advances in text, audio, and video processing with traditional economic modeling. The first chapter is joint work with K. Sudhir and Minkyung Kim. It addresses two significant challenges in using online text reviews to obtain fine-grained attribute-level sentiment ratings. First, we develop a deep learning convolutional-LSTM hybrid model to account for language structure, in contrast to methods that rely on word frequency. The convolutional layer accounts for the spatial structure of language (adjacent word groups or phrases), and the LSTM accounts for its sequential structure (sentiment distributed and modified across non-adjacent phrases). Second, we address the problem of missing attributes in text when constructing attribute sentiment scores, as reviewers write only about a subset of attributes and remain silent on others. We develop a model-based imputation strategy using a structural model of heterogeneous rating behavior. Using Yelp restaurant review data, we show that our model achieves superior accuracy in converting text to numerical attribute sentiment scores. The structural model finds three reviewer segments with different motivations: status seeking, altruism/want voice, and need to vent/praise. Interestingly, our results show that reviewers write to inform and to vent/praise, but not based on attribute importance. Our heterogeneous model-based imputation performs better than other common imputations and, importantly, leads to managerially significant corrections in restaurant attribute ratings. The second essay, which is joint work with Aniko Oery and Joyee Deb, develops an information-theoretic model to study what causes selection in the valence of user-generated reviews. The propensity of consumers to engage in word-of-mouth (WOM) differs after good versus bad experiences, which can result in positive or negative selection of user-generated reviews. We show how the strength of brand image (the dispersion of consumer beliefs about quality) and the informativeness of good and bad experiences affect the selection of WOM in equilibrium. WOM is costly: early adopters talk only if they can affect the receiver’s purchase. If the brand image is strong (consumer beliefs are homogeneous), only negative WOM can arise. With a weak brand image or heterogeneous beliefs, positive WOM can occur if positive experiences are sufficiently informative. Using data from Yelp.com, we show that strong brands (chain restaurants) systematically receive lower evaluations, controlling for several restaurant and reviewer characteristics. The third essay, which is joint work with K. Sudhir and Khai Chiong, studies the success factors of persuasive sales pitches using a multi-modal video dataset of buyer-seller interactions. A successful sales pitch is an outcome of both the content of the message and the style of delivery. Moreover, unlike one-way interactions such as speeches, sales pitches are a two-way process, and hence interactivity and matching the wavelength of the buyer are also critical to the success of the pitch. We extract four groups of features (content-related, style-related, interactivity, and similarity) in order to build a predictive model of sales pitch effectiveness.
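
    As a purely illustrative sketch of the third essay's setup, the snippet below combines the four named feature groups into a single predictive model of pitch success; the feature dimensions, stand-in data, and choice of logistic regression are assumptions rather than the thesis's actual pipeline.

```python
# Toy sketch: four feature groups -> one predictive model of pitch success.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

n_pitches = 500                                  # stand-in sample size
features = np.hstack([
    np.random.randn(n_pitches, 50),              # content-related (e.g., topic weights)
    np.random.randn(n_pitches, 10),              # style-related (e.g., pace, pitch variance)
    np.random.randn(n_pitches, 5),               # interactivity (e.g., turn-taking measures)
    np.random.randn(n_pitches, 5),               # buyer-seller similarity
])
success = np.random.binomial(1, 0.3, n_pitches)  # stand-in outcome labels

clf = LogisticRegression(max_iter=1000)
print(cross_val_score(clf, features, success, cv=5).mean())
```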