Popularity Prediction of Reddit Texts

Author: Rohlin Tracy
Publication venue: SJSU ScholarWorks
Publication date: 01/01/2016

Popularity prediction is a useful technique for marketers to anticipate the success of marketing campaigns, to build recommendation systems that suggest new products to consumers, and to develop targeted advertising. Researchers likewise use popularity prediction to measure how popularity changes within a community or within a given timespan. In this paper, I explore ways to predict the popularity of posts on reddit.com, which is a blend of news aggregator and community forum. I frame popularity prediction as a text classification problem and attempt to solve it by first identifying topics in the text and then classifying whether the topics identified are more characteristic of popular or unpopular texts. This classifier is then used to label unseen texts as popular or not, depending on the topics found in these new posts. I explore the use of Latent Dirichlet Allocation and term frequency-inverse document frequency for topic identification, and naïve Bayes classifiers and support vector machines for classification. The relation between topics and popularity is dynamic: topics in Reddit communities can wax and wane in popularity. Despite the inherent variability, the methods explored in the paper are effective, showing prediction accuracy between 60% and 75%. The study contributes to the field in various ways. For example, it provides novel data for research and development, not only for text classification but also for the study of the relation between topics and popularity in general. The study also helps us better understand different topic identification and classification methods by illustrating their effectiveness on real-life data from a fast-changing and multi-purpose website.
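The abstract above frames popularity prediction as text classification with, among other methods, a naïve Bayes classifier. The following is a minimal sketch of that general idea, assuming a plain bag-of-words representation and add-one smoothing; it is not the author's implementation, it skips the topic identification step, and the function names (`tokenize`, `train_naive_bayes`, `predict`) and the toy posts are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train_naive_bayes(labeled_texts):
    """Collect per-class document and word counts for multinomial naive Bayes."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_texts:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log posterior for the text."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen in training
            count = word_counts[label][word]
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, invented posts; not data from the paper.
training_posts = [
    ("new game trailer released today", "popular"),
    ("game update trailer leaked", "popular"),
    ("question about my router settings", "unpopular"),
    ("help with router firmware question", "unpopular"),
]
model = train_naive_bayes(training_posts)
```

Calling `predict(model, "trailer for the new game")` labels a new post using only word statistics from the training posts. In the approach the abstract describes, the input features would instead come from the identified topics.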

SJSU ScholarWorks

Topic modelling of Finnish Internet discussion forums as a tool for trend identification and marketing applications

Author: Särkiö Ilkka
Publication date: 12/03/2019

The increasing availability of public discussion text data on the Internet motivates the study of methods for identifying current themes and trends. Being able to extract and summarize relevant information from public data in real time gives a company a competitive advantage and enables applications in its marketing actions. This thesis presents a method of topic modelling and trend identification to extract information from Finnish Internet discussion forums. The development of text analytics, and especially of topic modelling techniques, is reviewed, and suitable methods are identified from the literature. The Latent Dirichlet Allocation topic model and the Dynamic Topic Model are applied to find underlying topics in the Internet discussion forum data. The collection of discussion data by web scraping and the text preprocessing methods are presented. Trends are identified with a method derived from outlier detection. Real-world events, such as the news about the Finnish army's vegetarian meal day and the Helsinki summit of presidents Trump and Putin, were identified in an unsupervised manner. Applications for marketing are considered, e.g. automatic generation of search engine advert keywords and website content recommendation. Future prospects for further improving the developed topical trend identification method are proposed. These include the use of more complex topic models, an extensive framework for tuning trend identification parameters, and the study of more domain-specific text data sources such as blogs, social media feeds or customer feedback.
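The abstract above says trends are identified "with a method derived from outlier detection" but does not specify the method. As a hedged illustration only, the sketch below flags days on which a topic's share of discussion is an upward outlier relative to a trailing window, using a simple z-score. The function name `find_trend_days`, the `window` and `threshold` parameters, and the sample values are assumptions made for this example, not details from the thesis.

```python
from statistics import mean, stdev


def find_trend_days(daily_share, window=7, threshold=3.0):
    """Return indices whose value is an upward outlier versus the prior window."""
    flagged = []
    for i in range(window, len(daily_share)):
        history = daily_share[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            continue  # a flat history gives no scale for a z-score
        z = (daily_share[i] - mu) / sigma
        if z > threshold:
            flagged.append(i)
    return flagged


# Invented daily shares of one topic among all forum posts.
shares = [0.10, 0.11, 0.09, 0.10, 0.12, 0.10, 0.11, 0.10, 0.30, 0.11]
trend_days = find_trend_days(shares)
```

Here the spike at index 8 is the only day flagged. The day after the spike is not flagged because the spike itself inflates the window's mean and standard deviation, a known weakness of plain z-scores that more robust outlier methods address.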

Aaltodoc Publication Archive

Analyzing the Language of Food on Social Media

Author: Bell Dane; Fried Daniel; Hingle Melanie; Kobourov Stephen; Surdeanu Mihai
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 11/09/2014

We investigate the predictive power behind the language of food on social media. We collect a corpus of over three million food-related posts from Twitter and demonstrate that many latent population characteristics can be directly predicted from this data: overweight rate, diabetes rate, political leaning, and home geographical location of authors. For all tasks, our language-based models significantly outperform the majority-class baselines. Performance is further improved with more complex natural language processing, such as topic modeling. We analyze which textual features have the most predictive power for these datasets, providing insight into the connections between the language of food, geographic locale, and community characteristics. Lastly, we design and implement an online system for real-time query and visualization of the dataset. Visualization tools, such as geo-referenced heatmaps, semantics-preserving wordclouds and temporal histograms, allow us to discover more complex, global patterns mirrored in the language of food.

Comment: An extended abstract of this paper will appear in IEEE Big Data 201

arXiv.org e-Print Archive

Crossref

Sentiment analysis in hospitality using text mining: the case of a Portuguese eco-hotel

Author: Calheiros Ana Catarina dos Santos
Publication date: 01/01/2015

JEL Classification System: Z32 Tourism and Development; M30 Marketing and Advertising

The rapid development of the Internet and mobile devices enabled the emergence of travel and hospitality review sites, leading to a large number of customer opinion posts. While such comments may influence future demand for the targeted hotels, they can also be used by hotel managers to improve customer experience. Nevertheless, this trend poses a problem: the information is widely scattered, making it almost impossible to extract useful knowledge from it. In this study, with the aim of facilitating this process, sentiment classification of an eco-hotel is assessed through a text mining approach using several different sources of customer reviews. Two dictionaries are compiled to build the lexicon used to parse the 401 reviews collected from a Portuguese eco-hotel between January and August of 2015. Then, the latent Dirichlet allocation (LDA) modeling algorithm is applied to gather relevant topics that characterize a given hospitality issue by a sentiment. The findings indicate that accuracy is influenced by the interaction between the LDA-generated topic models and the correct construction of both dictionaries. These results also reveal that text mining can generate new insights into variables that have been extensively studied in the hospitality industry, including that hotel food generates ordinary positive sentiments for the case studied, while hospitality generates both ordinary and strong positive feelings. Such results are valuable for hospitality management, validating the approach proposed.

Repositório Institucional do ISCTE-IUL

Recommender Systems

Author: Linyuan Lü; Matúš Medo; Chi Ho Yeung; Yi-Cheng Zhang; Zi-Ke Zhang; Tao Zhou
Publication venue: Elsevier BV
Publication date: 06/02/2012

The ongoing rapid expansion of the Internet greatly increases the need for effective recommender systems to filter the abundant information. Extensive research on recommender systems is conducted by a broad range of communities, including social and computer scientists, physicists, and interdisciplinary researchers. Despite substantial theoretical and practical achievements, unification and comparison of different approaches are lacking, which impedes further advances. In this article, we review recent developments in recommender systems and discuss the major challenges. We compare and evaluate available algorithms and examine their roles in future developments. In addition to algorithms, physical aspects are described to illustrate the macroscopic behavior of recommender systems. Potential impacts and future directions are discussed. We emphasize that recommendation has great scientific depth and combines diverse research fields, which makes it of interest to physicists as well as interdisciplinary researchers.

Comment: 97 pages, 20 figures (To appear in Physics Reports)

arXiv.org e-Print Archive

Crossref

Aston Publications Explorer

RERO DOC Digital Library

LDA-Based Industry Classification

Author: Datta Anindya; Dutta Kaushik; Fang Fang
Publication venue: AIS Electronic Library (AISeL)
Publication date: 01/01/2013

Industry classification is a crucial step in financial analysis. However, existing industry classification schemes have several limitations. To overcome these limitations, we propose an industry classification methodology based on business commonalities, using the topic features learned by Latent Dirichlet Allocation (LDA) from firms' business descriptions. Two types of classification, firm-centric and industry-centric, were explored. Preliminary evaluation results showed the effectiveness of our method.

AIS Electronic Library (AISeL)

ScholarBank@NUS

Modeling Dynamic User Interests: A Neural Matrix Factorization Approach

Author: Aral Sinan; Dhillon Paramveer
Publication date: 12/02/2021

In recent years, there has been significant interest in understanding users' online content consumption patterns. However, the unstructured, high-dimensional, and dynamic nature of such data makes extracting valuable insights challenging. Here we propose a model that combines the simplicity of matrix factorization with the flexibility of neural networks to efficiently extract nonlinear patterns from massive text data collections relevant to consumers' online consumption patterns. Our model decomposes a user's content consumption journey into nonlinear user and content factors that are used to model their dynamic interests. This natural decomposition allows us to summarize each user's content consumption journey with a dynamic probabilistic weighting over a set of underlying content attributes. The model is fast to estimate, easy to interpret, and can harness external data sources as an empirical prior. These advantages make our method well suited to the challenges posed by modern datasets. We use our model to understand the dynamic news consumption interests of Boston Globe readers over five years. Thorough qualitative studies, including a crowdsourced evaluation, highlight our model's ability to accurately identify nuanced and coherent consumption patterns. These results are supported by our model's superior and robust predictive performance over several competitive baseline methods.

arXiv.org e-Print Archive

DSpace@MIT