285 research outputs found
Content and Context: Identifying the Impact of Qualitative Information on Consumer Choice
Managers and researchers alike suspect that the vast amounts of qualitative information in blogs, reviews, news stories, and experts’ advice influence consumer behavior. But does qualitative information impact or rather reflect consumer choices? We argue that because message content and consumer choice are endogenous, non-random selection and conflation of awareness and persuasion complicate causal estimation of the impact of message content on outcomes. We apply Latent Dirichlet Allocation to characterize the topics of transcribed content from 2,397 stock recommendations provided by Jim Cramer on his show Mad Money. We demonstrate that selection bias and audience prior awareness create measurable biases in estimates of the impact of content on stock prices. Comparing recommendation content to prior news, we show that he is less persuasive when he uses more novel arguments. The technique we develop can be applied in a variety of settings where marketers can present different messages depending on what subjects know.
Patterns of Scalable Bayesian Inference
Datasets are growing not just in size but in complexity, creating a demand
for rich models and quantification of uncertainty. Bayesian methods are an
excellent fit for this demand, but scaling Bayesian inference is a challenge.
In response to this challenge, there has been considerable recent work based on
varying assumptions about model structure, underlying computational resources,
and the importance of asymptotic correctness. As a result, there is a zoo of
ideas with few clear overarching principles.
In this paper, we seek to identify unifying principles, patterns, and
intuitions for scaling Bayesian inference. We review existing work on utilizing
modern computing resources with both MCMC and variational approximation
techniques. From this taxonomy of ideas, we characterize the general principles
that have proven successful for designing scalable inference procedures and
comment on the path forward.
Text miner's little helper: scalable self-tuning methodologies for knowledge exploration
The abstract is in the attachment.
Link formation in mobile and economic networks : model and empirical analysis
In this dissertation, we study three link formation problems in mobile and economic networks: (i) company matching for mergers and acquisitions (M&A) network in the high-technology (high-tech) industry, (ii) mobile application (app) matching for cross promotion network in mobile app markets, and (iii) online friendship formation in mobile social networks. Each problem can be modeled as link formation problem in a graph, where nodes represent independent entities (e.g., companies, apps, users) and edges represent interactions (e.g., transactions, promotions, friendships) among the nodes. First, we propose a new data-analytic approach to measure firms' dyadic business proximity to analyze M&A network in the high-tech industry. Specifically, our method analyzes the unstructured texts that describe firms' businesses using latent Dirichlet allocation (LDA) topic modeling, and constructs a novel business proximity measure based on the output. Using CrunchBase data including 24,382 high-tech companies and 1,689 M&A transactions, we empirically validate our business proximity measure in the context of industry intelligence and show the measure's effectiveness in an application of M&A network analysis. Based on the research, we build a cloud-based information system to facilitate competitive intelligence on the high-tech industry. Second, we analyze mobile app matching for cross promotion network in mobile app markets. Cross promotion (CP) is a new app promotion framework, in which a mobile app is promoted to the users of another app. Using IGAWorks data covering 1,011 CP campaigns, 325 apps, and 301,183 users, we evaluate the effectiveness of CP campaigns in comparison with existing ad channels such as mobile display ads. While CP campaigns, on average, are still suboptimal as compared with display ads, we find evidence that a careful matching of mobile apps can significantly improve the effectiveness of CP campaigns. 
Our empirical results show that app similarity, measured by LDA from apps' text descriptions, is a significant factor that increases the user engagement in CP campaigns. With this observation, we propose an app matching mechanism for the CP network to improve the ad effectiveness. Third, we study friendship network formation in a location-based social network. We build a structural model of social link creation that incorporates individual characteristics and pairwise user similarities. Specifically, we define four user proximity measures from biography, geography, mobility, and short messages (i.e., tweets). To construct proximity from unstructured text information, we build LDA topic models of user biography texts and tweets. Using Gowalla data with 385,306 users, three million locations, and 35 million check-in records, we empirically estimate the structural model to find evidence of the homophily effect in network formation.
Computer Science
Cyberspace and Real-World Behavioral Relationships: Towards the Application of Internet Search Queries to Identify Individuals At-risk for Suicide
The Internet has become an integral and pervasive aspect of society. Not surprisingly, the growth of e-commerce has led to focused research on identifying relationships between user behavior in cyberspace and the real world: retailers are tracking items customers are viewing and purchasing in order to recommend additional products and to better direct advertising. As the relationship between online search patterns and real-world behavior becomes better understood, the practice is likely to expand to other applications. Indeed, Google Flu Trends has implemented an algorithm that accurately charts the relationship between the number of people searching for flu-related topics on the Internet, and the number of people who actually have flu symptoms in that region. Because the results are real-time, studies show Google Flu Trends estimates are typically two weeks ahead of the Centers for Disease Control and Prevention. The Air Force has devoted considerable resources to suicide awareness and prevention. Despite these efforts, suicide rates have remained largely unaffected. The Air Force Suicide Prevention Program assists family, friends, and co-workers of airmen in recognizing and discussing behavioral changes with at-risk individuals. Based on other successes in correlating behaviors in cyberspace and the real world, is it possible to leverage online activities to help identify individuals that exhibit suicidal or depression-related symptoms? This research explores the notion of using Internet search queries to classify individuals with common search patterns. Text mining was performed on user search histories for a one-month period from nine Air Force installations. The search histories were clustered based on search term probabilities, providing the ability to identify relationships between individuals searching for common terms. Analysis was then performed to identify relationships between individuals searching for key terms associated with suicide, anxiety, and post-traumatic stress.
Evaluating latent content within unstructured text: an analytical methodology based on a temporal network of associated topics
In this research, various concepts from network theory and topic modelling are combined to construct a temporal network of associated topics. This solution is presented as a step-by-step process to facilitate the evaluation of latent topics from unstructured text, as well as the domain area that textual documents are sourced from. In addition to ensuring shifts and changes in the structural properties of a given corpus are visible, non-stationary classes of co-occurring topics are determined, and trends in topic prevalence, positioning, and association patterns are evaluated over time. These capabilities extend the insights gained from stand-alone topic modelling outputs by ensuring latent topics are not only identified and summarized, but more systematically interpreted, analysed, and explained in a transparent and reliable way.
The evolution of 10-K textual disclosure: Evidence from Latent Dirichlet Allocation
We document marked trends in 10-K disclosure over the period 1996–2013, with increases in length, boilerplate, stickiness, and redundancy and decreases in specificity, readability, and the relative amount of hard information. We use Latent Dirichlet Allocation (LDA) to examine specific topics and find that new FASB and SEC requirements explain most of the increase in length and that 3 of the 150 topics (fair value, internal controls, and risk factor disclosures) account for virtually all of the increase. These three disclosures also play a major role in explaining the trends in the remaining textual characteristics.
Feature selection strategies for improving data-driven decision support in bank telemarketing
The usage of data mining techniques to unveil previously undiscovered knowledge has
been applied in past years to a wide number of domains, including banking and marketing. Raw
data is the basic ingredient for successfully detecting interesting patterns. A key aspect of raw
data manipulation is feature engineering, which concerns the correct characterization or
selection of relevant features (or variables) that relate to the target goal.
This study focuses on feature engineering, aiming to uncover the
features that best characterize the problem of selling long-term bank deposits through
telemarketing campaigns. For the experimental setup, a case study from a Portuguese bank,
spanning the 2008-2013 period and encompassing the recent global financial crisis, was
addressed. To assess the relevance of this problem, a novel literature analysis using text
mining and the latent Dirichlet allocation algorithm was conducted, confirming the existence of a
research gap for bank telemarketing.
Starting from a dataset containing typical telemarketing contacts and client information,
research followed three different and complementary strategies: first, by enriching the dataset
with social and economic context features; then, by including customer lifetime value related
features; finally, by applying a divide-and-conquer strategy for splitting the problem into smaller
fractions, leading to optimized sub-problems. Each of the three approaches improved previous
results in terms of model metrics related to prediction performance. The relevance of the
proposed features was evaluated, confirming the obtained models as credible and valuable for
telemarketing campaign managers.
Hospital data analytics for business intelligence - An analytics tool for patient feedback analysis.