61 research outputs found

    Learning and Inference in Massive Social Networks

    Researchers and practitioners are increasingly gaining access to data on explicit social networks. For example, telecommunications and technology firms record data on consumer networks (via phone calls, emails, voice-over-IP, instant messaging), and social-network portal sites such as MySpace, Friendster and Facebook record consumer-generated data on social networks. Inference for fraud detection [5, 3, 8], marketing [9], and other tasks can be improved with learned models that take social networks into account and with collective inference [12], which allows inferences about nodes in the network to affect each other. However, these social-network graphs can be huge, comprising millions to billions of nodes and one or two orders of magnitude more links. This paper studies the application of collective inference to improve prediction over a massive graph. Faced initially with a social network comprising hundreds of millions of nodes and a few billion edges, our goal is to produce an approximate consumer network that is orders of magnitude smaller but still facilitates improved performance via collective inference. We introduce a sampling technique designed to reduce the size of the network by many orders of magnitude while keeping the linkages that facilitate improved prediction via collective inference. In short, the sampling scheme operates as follows: (1) choose a set of nodes of interest; (2) then, in analogy to snowball sampling [14], grow local graphs around these nodes, adding their social networks, their neighbors’ social networks, and so on; (3) next, prune these local graphs of edges that are expected to contribute little to the collective inference; (4) finally, connect the local graphs together to form a graph with (hopefully) useful inference connectivity. We apply this sampling method to assess whether collective inference can improve learned targeted-marketing models for a social network of consumers of telecommunication services.
    Prior work [9] has shown improvement to the learning of targeting models by including social-neighborhood information, in particular information on existing customers in the immediate social network of a potential target. However, the improvement was restricted to the “network neighbors”, those targets linked to a prior customer thought to be good candidates for the new service. Collective inference techniques may extend the predictive influence of existing customers beyond their immediate neighborhoods. For the present work, our motivating conjecture has been that this influence can improve prediction for consumers who are not strongly connected to existing customers. Our results show that this is indeed the case: collective inference on the approximate network enables significantly improved predictive performance for non-network-neighbor consumers, and for consumers who have few links to existing customers. In the rest of this extended abstract we motivate our approach, describe our sampling method, present results on applying our approach to a large real-world target marketing campaign in the telecommunications industry, and finally discuss our findings. NYU, Stern School of Business, IOMS Department, Center for Digital Economy Researc
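The four sampling steps listed in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say how edges are judged to "contribute little", so the sketch assumes a hypothetical edge weight (for example, a call count) and a hypothetical `min_weight` threshold, and it assumes local graphs are connected simply by sharing nodes. The names `snowball`, `sample_network`, `hops`, and `min_weight` are all invented for the example.

```python
from collections import deque

def snowball(adj, seed, hops):
    """Return the set of nodes within `hops` steps of `seed` (breadth-first)."""
    seen = {seed}
    frontier = deque([(seed, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # do not expand beyond the hop limit
        for nbr in adj.get(node, {}):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen

def sample_network(adj, seeds, hops=2, min_weight=2):
    """Grow a local graph around each seed, prune weak edges, merge the results.

    `adj` is a symmetric dict of dicts: adj[u][v] is the (assumed) edge weight.
    """
    sampled = {}
    for seed in seeds:                         # step 1: nodes of interest
        local = snowball(adj, seed, hops)      # step 2: grow the local graph
        for u in local:
            for v, w in adj.get(u, {}).items():
                # step 3: drop edges that leave the local graph or are weak
                if v in local and w >= min_weight:
                    # step 4: one shared dict joins local graphs at common nodes
                    sampled.setdefault(u, {})[v] = w
                    sampled.setdefault(v, {})[u] = w
    return sampled
```

Calling `sample_network(adj, seeds=["a", "f"], hops=2)` on a small weighted adjacency dict returns a reduced adjacency dict in the same format; nodes whose only edges fall below the threshold drop out entirely.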

    Social TV: Linking TV Content to Buzz and Sales

    “Social TV” is a term that broadly describes the online social interactions occurring between viewers while watching television. In this paper, we show that TV networks can derive value from social media content placed in shows because it leads to increased word of mouth via online posts, and it correlates highly with TV-show-related sales. In short, we show that TV event triggers change the online behavior of viewers. We first show that using social media content on the televised American reality singing competition, The Voice, led to increased social media engagement during the TV broadcast. We then illustrate that social media buzz about a contestant after a performance is highly correlated with song sales from that contestant’s performance. We believe this to be the first study linking TV content to buzz and sales in real time.

    Toward Intelligent Assistance for a Data Mining Process: An Ontology-Based Approach for Cost-Sensitive Classification

    A data mining (DM) process involves multiple stages. A simple, but typical, process might include preprocessing data, applying a data mining algorithm, and postprocessing the mining results. There are many possible choices for each stage, and only some combinations are valid. Because of the large space and nontrivial interactions, both novices and data mining specialists need assistance in composing and selecting DM processes. Extending notions developed for statistical expert systems, we present a prototype intelligent discovery assistant (IDA), which provides users with 1) systematic enumerations of valid DM processes, so that important, potentially fruitful options are not overlooked, and 2) effective rankings of these valid processes by different criteria, to facilitate the choice of DM processes to execute. We use the prototype to show that an IDA can indeed provide useful enumerations and effective rankings in the context of simple classification processes. We discuss how an IDA could be an important tool for knowledge sharing among a team of data miners. Finally, we illustrate the claims with a demonstration of cost-sensitive classification using a more complicated process and data from the 1998 KDDCUP competition.
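The two services described in this abstract, enumerating valid processes and ranking them, can be illustrated with a toy sketch. Everything concrete here is an assumption made for the example: the operator names, their `needs`/`gives` conditions, and the `speed`/`accuracy` scores are hypothetical stand-ins for what the paper's ontology would supply, and the brute-force enumeration is only one simple way to respect validity constraints.

```python
from itertools import product

# Hypothetical operator catalogue, one dict per stage. `needs` must hold in the
# current data state before the operator runs; `gives` is added afterwards.
STAGES = [
    {   # preprocessing
        "no_preprocessing": dict(needs=set(), gives=set(), speed=3, accuracy=0),
        "discretize": dict(needs={"numeric"}, gives={"categorical"}, speed=2, accuracy=1),
    },
    {   # induction
        "decision_tree": dict(needs=set(), gives={"tree"}, speed=2, accuracy=2),
        "naive_bayes": dict(needs={"categorical"}, gives={"model"}, speed=3, accuracy=1),
    },
    {   # postprocessing
        "no_postprocessing": dict(needs=set(), gives=set(), speed=3, accuracy=0),
        "prune_tree": dict(needs={"tree"}, gives=set(), speed=1, accuracy=1),
    },
]

def enumerate_valid(stages, initial_state):
    """List every one-operator-per-stage sequence whose preconditions all hold."""
    valid = []
    for names in product(*stages):
        state = set(initial_state)
        ok = True
        for stage, name in zip(stages, names):
            op = stage[name]
            if not op["needs"] <= state:   # precondition violated
                ok = False
                break
            state |= op["gives"]
        if ok:
            valid.append(names)
    return valid

def rank(stages, processes, criterion):
    """Order processes by the summed heuristic score for one criterion."""
    def score(names):
        return sum(stage[name][criterion] for stage, name in zip(stages, names))
    return sorted(processes, key=score, reverse=True)
```

With an initial state of `{"numeric"}`, `enumerate_valid(STAGES, {"numeric"})` keeps five of the eight combinations, and `rank(STAGES, valid, "accuracy")` places the discretize, decision tree, prune sequence first under these made-up scores.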

    The Gift of Gab: Evidence TelE-Commerce Firms Can Profit from Viral Marketing

    Viral or buzz marketing takes advantage of communication linkages to propagate positive influence regarding a product or service. TelE-commerce is an ideal domain within which to study viral marketing, because communication linkages can be observed. In this paper, we follow a new telE-commerce service. In particular, we observe how the communication networks of existing customers influence the rate of product diffusion. The main contribution of this paper is evidence that consumers are more likely to purchase a service if they have previously spoken to a person who has the service. In addition, we offer the following three contributions: 1) we clarify that this need not be evidence of viral influence, and we suggest different explanations; 2) we describe the relation of these explanations to theories of purchasing behavior; and 3) we present some evidence to discern among the explanations. NYU, Stern School of Business, IOMS Department, Center for Digital Economy Researc

    Expert Stock Picker: The Wisdom of (Experts in) Crowds

    The phrase “the wisdom of crowds” suggests that good verdicts can be achieved by averaging the opinions and insights of large, diverse groups of people who possess varied types of information. Online user-generated content enables researchers to view the opinions of large numbers of users publicly. These opinions, in the form of reviews and votes, can be used to automatically generate remarkably accurate verdicts (collective estimations of future performance) about companies, products, and people on the Web to resolve very tough problems. The wealth and richness of user-generated content may enable firms and individuals to aggregate consumer-think for better business understanding. Our main contribution, here applied to user-generated stock pick votes from a widely used online financial newsletter, is a genetic algorithm approach that can be used to identify the appropriate vote weights for users based on their prior individual voting success. Our method allows us to identify and rank experts within the crowd, enabling better stock pick decisions than the S&P 500. We show that the online crowd performs better, on average, than the S&P 500 for two test time periods, 2008 and 2009, in terms of both overall returns and risk-adjusted returns, as measured by the Sharpe ratio. Furthermore, we show that giving more weight to the votes of the experts in the crowds increases the accuracy of the verdicts, yielding an even greater return in the same time periods. We test our approach by utilizing more than three years of publicly available stock pick data. We compare our method to approaches derived from both the computer science and finance literature. We believe that our approach can be generalized to other domains where user opinions are publicly available early and where those opinions can be evaluated. For example, YouTube video ratings may be used to predict downloads, or online reviewer ratings on Digg may be used to predict the success or popularity of a story.
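The idea of searching for per-user vote weights with a genetic algorithm can be sketched as below. The abstract does not describe the encoding, the genetic operators, or the fitness function, so every detail here is an assumption chosen for brevity: weights in [0, 1], fitness equal to the share of past picks the weighted verdict got right, elitist selection, one-point crossover, and single-weight Gaussian mutation. The names `verdict`, `fitness`, and `evolve_weights` are hypothetical.

```python
import random

def verdict(weights, votes):
    """Weighted verdict for one pick: +1 if the weighted vote sum is positive, else -1."""
    total = sum(w * v for w, v in zip(weights, votes))
    return 1 if total > 0 else -1

def fitness(weights, history):
    """Fraction of past picks where the weighted verdict matched the outcome."""
    hits = sum(verdict(weights, votes) == outcome for votes, outcome in history)
    return hits / len(history)

def evolve_weights(history, n_users, generations=50, pop_size=20, seed=0):
    """Toy genetic search; assumes n_users >= 2 and pop_size >= 4."""
    rng = random.Random(seed)
    # Start from uniform weights (plain majority vote) plus random candidates.
    population = [[1.0] * n_users] + [
        [rng.random() for _ in range(n_users)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=lambda w: fitness(w, history), reverse=True)
        survivors = population[: pop_size // 2]    # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            mum, dad = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_users)        # one-point crossover
            child = mum[:cut] + dad[cut:]
            i = rng.randrange(n_users)             # mutate a single weight
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.2)))
            children.append(child)
        population = survivors + children
    return max(population, key=lambda w: fitness(w, history))
```

Here `history` is a list of `(votes, outcome)` pairs, with each vote and outcome encoded as +1 or -1. Because the uniform-weight candidate is in the initial population and the best candidates always survive, the returned weights never score worse on `history` than an unweighted majority vote; whether they generalize to new picks is a separate question this sketch does not address.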

    Intelligent Assistance for the Data Mining Process: An Ontology-based Approach

    A data mining (DM) process involves multiple stages. A simple, but typical, process might include preprocessing data, applying a data-mining algorithm, and postprocessing the mining results. There are many possible choices for each stage, and only some combinations are valid. Because of the large space and non-trivial interactions, both novices and data-mining specialists need assistance in composing and selecting DM processes. We present the concept of Intelligent Discovery Assistants (IDAs), which provide users with (i) systematic enumerations of valid DM processes, in order that important, potentially fruitful options are not overlooked, and (ii) effective rankings of these valid processes by different criteria, to facilitate the choice of DM processes to execute. We use a prototype to show that an IDA can indeed provide useful enumerations and effective rankings. We discuss how an IDA is an important tool for knowledge sharing among a team of data miners. Finally, we illustrate all the claims with a comprehensive demonstration using a more involved process and data from the 1998 KDDCUP competition. Information Systems Working Papers Serie

    Viral Marketing: Identifying Likely Adopters Via Consumer Networks

    We investigate the hypothesis that consumers who have communicated with a customer of a particular service have an increased likelihood of adopting the service. We survey the diverse literature on such "viral marketing," providing a categorization of the specific research questions asked, the data analyzed, and the statistical methods used. We highlight a striking gap in the literature: no prior study has had both of the two key types of data necessary to provide direct support for the hypothesis, namely data on communications between consumers and data on product adoption. We suggest a type of service for which both types of data are available: telecommunications services. Then, for a particular telecommunication service, we show support for the hypothesis. Specifically, we show three main results: 1) there is such a "viral" effect and it is statistically significant, resulting in take rates 3-5 times greater than a baseline group; 2) attributes constructed from the consumer network can improve models for ranking of targeted customers by likelihood of adoption; and 3) observing the network allows the firm to target new customers that would have fallen through the cracks, because they would not have been identified based solely on the traditional set of attributes used for marketing by the firm. We close with a discussion of challenges and opportunities for research in this area. For example, can one determine whether the reason for the viral effect is customer advocacy (e.g., via "word of mouth") versus network-identified homophily? Information Systems Working Papers Serie
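A comparison of take rates between a network-neighbor group and a baseline group, as reported in this abstract, could be checked along the following lines. The abstract does not say which significance test the authors used; the two-proportion z-test here is an assumption, the function name `take_rate_lift` is hypothetical, and any counts passed to it in an example are invented rather than taken from the study.

```python
from math import erfc, sqrt

def take_rate_lift(adopt_nn, n_nn, adopt_base, n_base):
    """Return (lift, z, p_value) comparing two adoption rates.

    lift    : network-neighbor take rate divided by the baseline take rate
    z       : two-proportion z statistic using the pooled adoption rate
    p_value : two-sided p-value under the normal approximation
    """
    p_nn = adopt_nn / n_nn
    p_base = adopt_base / n_base
    pooled = (adopt_nn + adopt_base) / (n_nn + n_base)
    se = sqrt(pooled * (1 - pooled) * (1 / n_nn + 1 / n_base))
    z = (p_nn - p_base) / se
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return p_nn / p_base, z, p_value
```

For instance, `take_rate_lift(40, 1000, 10, 1000)` describes a made-up case in which 4% of network neighbors and 1% of the baseline group adopt, giving a lift of about 4. The normal approximation is only reasonable when both groups have enough adopters and non-adopters.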

    Network-Based Marketing: Identifying Likely Adopters via Consumer Networks

    Network-based marketing refers to a collection of marketing techniques that take advantage of links between consumers to increase sales. We concentrate on the consumer networks formed using direct interactions (e.g., communications) between consumers. We survey the diverse literature on such marketing with an emphasis on the statistical methods used and the data to which these methods have been applied. We also provide a discussion of challenges and opportunities for this burgeoning research topic. Our survey highlights a gap in the literature. Because of inadequate data, prior studies have not been able to provide direct, statistical support for the hypothesis that network linkage can directly affect product/service adoption. Using a new data set that represents the adoption of a new telecommunications service, we show very strong support for the hypothesis. Specifically, we show three main results: (1) “Network neighbors” (those consumers linked to a prior customer) adopt the service at a rate 3–5 times greater than baseline groups selected by the best practices of the firm’s marketing team. In addition, analyzing the network allows the firm to acquire new customers who otherwise would have fallen through the cracks, because they would not have been identified based on traditional attributes. (2) Statistical models, built with a very large amount of geographic, demographic and prior purchase data, are significantly and substantially improved by including network information. (3) More detailed network information allows the ranking of the network neighbors so as to permit the selection of small sets of individuals with very high probabilities of adoption. NYU, Stern School of Business, IOMS, Center for Digital Economy Researc

    Towards Intelligent Assistance for a Data Mining Process:-

    A data mining (DM) process involves multiple stages. A simple, but typical, process might include preprocessing data, applying a data-mining algorithm, and postprocessing the mining results. There are many possible choices for each stage, and only some combinations are valid. Because of the large space and non-trivial interactions, both novices and data-mining specialists need assistance in composing and selecting DM processes. Extending notions developed for statistical expert systems, we present a prototype Intelligent Discovery Assistant (IDA), which provides users with (i) systematic enumerations of valid DM processes, in order that important, potentially fruitful options are not overlooked, and (ii) effective rankings of these valid processes by different criteria, to facilitate the choice of DM processes to execute. We use the prototype to show that an IDA can indeed provide useful enumerations and effective rankings in the context of simple classification processes. We discuss how an IDA could be an important tool for knowledge sharing among a team of data miners. Finally, we illustrate the claims with a comprehensive demonstration of cost-sensitive classification using a more involved process and data from the 1998 KDDCUP competition. NYU, Stern School of Business, IOMS Department, Center for Digital Economy Researc
