
    Relational Patterns

    Information Systems Working Papers Series

    Decision-centric Active Learning of Binary-Outcome Models

    It can be expensive to acquire the data that businesses need to employ data-driven predictive modeling, for example to model consumer preferences in order to optimize targeting. Prior research has introduced “active learning” policies for identifying data that are particularly useful for model induction, with the goal of decreasing the statistical error for a given acquisition cost (error-centric approaches). However, predictive models are used as part of a decision-making process, and costly improvements in model accuracy do not always result in better decisions. This paper introduces a new approach for active data acquisition that targets decision-making specifically. The new decision-centric approach departs from traditional active learning by placing emphasis on acquisitions that are more likely to affect decision-making. We describe two types of decision-centric techniques. Next, using direct-marketing data, we compare various data-acquisition techniques. We demonstrate that strategies for reducing statistical error can be wasteful in a decision-making context, and show that one decision-centric technique in particular can improve targeting decisions significantly. We also show that this method is robust as the quality of its utility estimates decreases, eventually converging to uniform random sampling, and that it can be extended to situations where different data acquisitions have different costs. The results suggest that businesses should consider modifying their strategies for acquiring information through normal business transactions. For example, a firm such as Amazon.com that models consumer preferences for customized marketing may accelerate learning by proactively offering recommendations, not merely to induce immediate sales but also to improve future recommendations.
    NYU, Stern School of Business, IOMS Department, Center for Digital Economy Research
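
    The abstract does not spell out either decision-centric technique. As a hypothetical illustration of the idea only, the sketch below assumes a targeting rule (target x when p(x) * benefit > cost) and scores each unlabeled candidate by how evenly bootstrap models split on that decision, so candidates whose label could change a decision rank highest. `knn_fit`, `decision_flip_scores`, `pick_next`, and the `benefit`/`cost` parameters are illustrative names, not the paper's; `fit` can be any function that trains a model, which keeps the scoring independent of the induction algorithm.

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, int]        # (feature value, binary outcome)
Model = Callable[[float], float]   # maps x to an estimated P(y = 1 | x)


def knn_fit(data: Sequence[Example], k: int = 5) -> Model:
    """Toy 1-D k-nearest-neighbour probability estimator (illustrative only)."""
    def predict(x: float) -> float:
        nearest = sorted(data, key=lambda ex: abs(ex[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict


def decision_flip_scores(
    labeled: Sequence[Example],
    candidates: Sequence[float],
    fit: Callable[[Sequence[Example]], Model],
    benefit: float,
    cost: float,
    n_boot: int = 20,
    seed: int = 0,
) -> List[float]:
    """Hypothetical decision-centric heuristic (not the paper's method).

    Trains n_boot models on bootstrap resamples and, for each candidate, measures
    how evenly they split on the assumed decision p(x) * benefit > cost.
    0.0 = all models agree; 0.5 = models evenly split.
    """
    rng = random.Random(seed)
    models = [fit([rng.choice(labeled) for _ in labeled]) for _ in range(n_boot)]
    scores = []
    for x in candidates:
        share = sum(m(x) * benefit > cost for m in models) / n_boot
        scores.append(min(share, 1.0 - share))
    return scores


def pick_next(labeled, candidates, fit, benefit, cost, **kwargs) -> float:
    """Return the candidate whose label appears most likely to change a decision."""
    scores = decision_flip_scores(labeled, candidates, fit, benefit, cost, **kwargs)
    return candidates[max(range(len(candidates)), key=scores.__getitem__)]
```

    With `benefit=10.0` and `cost=5.0`, a candidate whose estimates all sit well below 0.5 scores 0 even if those estimates vary, because no bootstrap model would target it; an error-centric criterion could still rate it as worth labeling.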

    Active Sampling for Class Probability Estimation and Ranking

    In many cost-sensitive environments, decision makers use class probability estimates to evaluate the expected utility of a set of alternatives. Supervised learning can be used to build class probability estimates; however, it is often very costly to obtain training data with class labels. Active sampling acquires data incrementally, at each phase identifying especially useful additional data for labeling, and can be used to economize on the examples needed for learning. We outline the critical features of an active sampling approach and present an active sampling method for estimating class probabilities and ranking. BOOTSTRAP-LV identifies particularly informative new data for learning based on the variance in probability estimates, and by accounting for a particular data item's informative value for the rest of the input space. We show empirically that the method reduces the number of data items that must be obtained and labeled, across a wide variety of domains. We investigate the contribution of the components of the algorithm and show that each provides valuable information to help identify informative examples. We also compare BOOTSTRAP-LV with UNCERTAINTY SAMPLING, an existing active sampling method designed to maximize classification accuracy. The results show that BOOTSTRAP-LV requires fewer examples to reach a given class probability estimation accuracy, and they provide insights into the behavior of the algorithms. Finally, to further our understanding of the contributions made by the elements of BOOTSTRAP-LV, we experiment with a new active sampling algorithm drawing from both UNCERTAINTY SAMPLING and BOOTSTRAP-LV, and show that it is significantly more competitive with BOOTSTRAP-LV than UNCERTAINTY SAMPLING is. The analysis suggests more general implications for improving existing active sampling algorithms for classification.
    Information Systems Working Papers Series
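
    To contrast the two selection criteria compared in the abstract, the sketch below computes (a) an uncertainty-sampling score, highest when a binary estimate is near 0.5, the region that matters for classification accuracy, and (b) the variance of probability estimates across models trained on bootstrap resamples, the core signal the abstract attributes to BOOTSTRAP-LV. It is a simplified sketch: the abstract notes that BOOTSTRAP-LV also accounts for an item's informative value for the rest of the input space, which is omitted here. The toy `knn_fit` estimator and all function names are assumptions.

```python
import random
from statistics import pvariance
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, int]        # (feature value, binary outcome)
Model = Callable[[float], float]   # maps x to an estimated P(y = 1 | x)


def knn_fit(data: Sequence[Example], k: int = 5) -> Model:
    """Toy 1-D k-nearest-neighbour probability estimator (illustrative only)."""
    def predict(x: float) -> float:
        nearest = sorted(data, key=lambda ex: abs(ex[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict


def uncertainty_scores(model: Model, candidates: Sequence[float]) -> List[float]:
    """Uncertainty sampling: 1.0 when P(y=1|x) = 0.5, falling linearly to 0.0 at 0 or 1."""
    return [1.0 - 2.0 * abs(model(x) - 0.5) for x in candidates]


def bootstrap_variance_scores(
    labeled: Sequence[Example],
    candidates: Sequence[float],
    fit: Callable[[Sequence[Example]], Model],
    n_boot: int = 20,
    seed: int = 0,
) -> List[float]:
    """Variance of each candidate's probability estimate across bootstrap-trained models.

    Simplified: omits BOOTSTRAP-LV's accounting for informativeness elsewhere
    in the input space.
    """
    rng = random.Random(seed)
    models = [fit([rng.choice(labeled) for _ in labeled]) for _ in range(n_boot)]
    return [pvariance([m(x) for m in models]) for x in candidates]
```

    The two scores can rank candidates differently: a candidate whose estimates are consistently close to 0.5 gets the maximum uncertainty score but near-zero variance.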

    Active Learning for Decision Making

    This paper addresses focused information acquisition for predictive data mining. As businesses strive to cater to the preferences of individual consumers, they often employ predictive models to customize marketing efforts. Building accurate models requires information about consumer preferences that is often costly to acquire. Prior research has introduced many “active learning” policies for identifying information that is particularly useful for model induction, the goal being to reduce the acquisition cost necessary to induce a model with a given accuracy. However, predictive models are often used as part of a decision-making process, and costly improvements in model accuracy do not always result in better decisions. This paper develops a new approach for active information acquisition that targets decision-making specifically. The method we introduce departs from the traditional error-reducing paradigm and places emphasis on acquisitions that are more likely to affect decision-making. Empirical evaluations with direct marketing data demonstrate that, for a fixed information acquisition cost, the method significantly improves targeting decisions. The method is designed to be generic (not based on a single model or induction algorithm), and we show that it can be applied effectively to various predictive modeling techniques.
    Information Systems Working Papers Series
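
    The point that costly accuracy gains do not always improve decisions can be checked on a toy example. The sketch assumes a simple expected-profit targeting rule (contact a customer when p * benefit > cost) and compares an error metric (Brier score) with the realized profit of the resulting decisions. The rule, the function names, and the numbers in the note below are hypothetical, chosen only for illustration and not taken from the paper's data.

```python
from typing import List, Sequence


def targets(probs: Sequence[float], benefit: float, cost: float) -> List[bool]:
    """Assumed targeting rule: contact a customer when expected benefit exceeds cost."""
    return [p * benefit > cost for p in probs]


def realized_profit(probs: Sequence[float], outcomes: Sequence[int],
                    benefit: float, cost: float) -> float:
    """Profit of the targeting decisions, given observed binary outcomes (1 = responded)."""
    return sum(benefit * y - cost
               for hit, y in zip(targets(probs, benefit, cost), outcomes) if hit)


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error of the probability estimates (an error-centric metric)."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)
```

    With outcomes [1, 0, 1, 0], the estimates [0.7, 0.3, 0.6, 0.4] and [0.9, 0.1, 0.8, 0.2] have Brier scores of 0.125 and 0.025, yet with `benefit=10` and `cost=5` both target the same two customers and earn the same profit of 10, so in this toy case the accuracy gain changes no decision.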

    DJ-MC: A Reinforcement-Learning Agent for Music Playlist Recommendation

    In recent years, there has been growing focus on the study of automated recommender systems. Music recommendation systems serve as a prominent domain for such work, from both an academic and a commercial perspective. A fundamental aspect of music perception is that music is experienced in temporal context and in sequence. In this work we present DJ-MC, a novel reinforcement-learning framework for music recommendation that recommends not individual songs but song sequences, or playlists, based on a model of preferences for both songs and song transitions. The model is learned online and is uniquely adapted for each listener. To reduce exploration time, DJ-MC exploits user feedback to initialize a model, which it subsequently updates by reinforcement. We evaluate our framework with human participants using both real song and playlist data. Our results indicate that DJ-MC's ability to recommend sequences of songs provides a significant improvement over more straightforward approaches, which do not take transitions into account.
    Comment: Updated to the most recent and completed version (to be presented at AAMAS 2015); updated author list. In Autonomous Agents and Multiagent Systems (AAMAS) 2015, Istanbul, Turkey, May 2015
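
    The abstract's key modeling idea is a preference model over both songs and song transitions. The sketch below assumes, for illustration, that a listener's reward decomposes additively into a per-song term and a per-transition term, and picks the next song greedily. DJ-MC itself is a reinforcement-learning framework that learns these preferences online; the greedy one-step choice here is a simplification, not the paper's planner, and the reward tables and function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple

SongRewards = Dict[str, float]                    # hypothetical per-song preference values
TransitionRewards = Dict[Tuple[str, str], float]  # hypothetical (previous, next) values


def playlist_reward(playlist: Sequence[str], song_r: SongRewards,
                    trans_r: TransitionRewards) -> float:
    """Assumed additive reward: each song's value plus the value of each transition."""
    total = sum(song_r[s] for s in playlist)
    # Unlisted transitions are treated as neutral (0.0).
    total += sum(trans_r.get((a, b), 0.0) for a, b in zip(playlist, playlist[1:]))
    return total


def greedy_next(current: str, remaining: Sequence[str], song_r: SongRewards,
                trans_r: TransitionRewards) -> str:
    """One-step choice: maximize the next song's reward plus its transition reward."""
    return max(remaining, key=lambda s: song_r[s] + trans_r.get((current, s), 0.0))
```

    With song rewards a=1.0, b=1.0, c=0.5 and transitions a→c worth 2.0 and a→b worth -1.0, ranking songs individually would not prefer c, but after a the greedy step picks c because the transition outweighs its lower song reward.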

    Who’s A Good Decision Maker? Data-Driven Expert Worker Ranking under Unobservable Quality

    Evaluating expert workers by their decision quality has substantial practical value, yet using other expert workers for decision quality evaluation tasks is costly and often infeasible. In this work, we frame the Ranking of Expert workers according to their unobserved decision Quality (REQ), without resorting to evaluation by other experts, as a new data science problem. This problem is challenging, as the correct decisions are commonly unobservable and substantial parts of the information available to the decision maker are not available for retrospective decision evaluation. We propose a new machine learning approach to address this problem. We evaluate our method on one dataset representing real expert decisions and two public datasets, and find that our approach generates highly accurate rankings. Moreover, we observe that our approach’s superiority over the baseline is particularly prominent as evaluation settings become increasingly challenging.

    Data acquisition and cost-effective predictive modeling: targeting offers for electronic commerce

    Electronic commerce is revolutionizing the way we think about data modeling by making it possible to integrate the processes of (costly) data acquisition and model induction. The opportunity for improving modeling through costly data acquisition presents itself for a diverse set of electronic commerce modeling tasks, from personalization to customer lifetime value modeling; we illustrate with the running example of choosing offers to display to web-site visitors, which captures important aspects in a familiar setting. Considering data acquisition costs explicitly can allow predictive models to be built at significantly lower cost, and a modeler may be able to improve performance via new sources of information that previously were too expensive to consider. However, existing techniques for integrating modeling and data acquisition cannot deal with the rich environment that electronic commerce presents. We discuss several possible data acquisition settings, the challenges involved in integrating them with modeling, and various research areas that may supply parts of an ultimate solution. We also briefly present and demonstrate a unified framework within which one can integrate acquisitions of different types, with any cost structure and any predictive modeling objective.
    NYU, Stern School of Business, IOMS Department, Center for Digital Economy Research
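
    The abstract does not describe the unified framework's internals. As a hypothetical sketch of the general problem it addresses, assume each candidate acquisition (a label, a feature value, and so on) comes with a positive cost and an estimated benefit to the modeling objective; a simple greedy rule then buys acquisitions in order of estimated benefit per unit cost until the budget runs out. `Acquisition`, `plan_acquisitions`, and the option names in the test are illustrative. Greedy ratio selection is a heuristic and is not guaranteed to be optimal under a budget constraint.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Acquisition:
    name: str           # hypothetical acquisition type, e.g. a label or a feature value
    cost: float         # acquisition cost; assumed positive
    est_benefit: float  # estimated improvement in the modeling objective (assumed given)


def plan_acquisitions(options: Sequence[Acquisition], budget: float) -> List[Acquisition]:
    """Greedy heuristic: buy acquisitions by estimated benefit per unit cost within budget."""
    chosen: List[Acquisition] = []
    spent = 0.0
    for opt in sorted(options, key=lambda o: o.est_benefit / o.cost, reverse=True):
        # Skip acquisitions not expected to help, and any that would exceed the budget.
        if opt.est_benefit > 0 and spent + opt.cost <= budget:
            chosen.append(opt)
            spent += opt.cost
    return chosen
```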

    Personality-Based Content Engineering for Rich Digital Media

    Firms have increasingly turned to rich digital media, such as videos and photos, to attract attention and boost awareness. Although extant research may help firms promote these media more effectively, the marketing process truly begins with the creation of the media. Thus, content creators may benefit from understanding which media are likely to achieve greater popularity based on their content features. We develop a method to understand the effect of content on the consumption of online videos, and apply it to a unique dataset of 16,414 videos from 363 YouTube channels. Our approach labels videos as high- or low-performing relative to comparable videos, and leverages random forests to identify content features associated with performance level. We test this method using the personality of speech-driven videos, employing NLP to estimate the extent to which video captions exhibit each of the “big five” personality traits. Our analysis uncovers predictive, economic, and prescriptive insights. We find that, using personality alone, we can predict whether videos perform better than expected with 72% accuracy. Furthermore, videos associated with high-performing personalities can expect a nearly 15% increase in consumption. Finally, we examine which personalities are associated with high consumption, offering prescriptive insights for content engineering.
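
    The abstract labels each video as high- or low-performing relative to comparable videos before fitting random forests. The sketch below illustrates only that labeling step, under the loud assumption that "comparable" means other videos on the same channel and that "high-performing" means more views than the channel median; the paper's actual comparison set and threshold may differ. The function name and input layout are hypothetical.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple

# Hypothetical input layout: (channel_id, video_id, view_count)
Video = Tuple[str, str, float]


def label_relative_performance(videos: Iterable[Video]) -> Dict[str, bool]:
    """Label a video True (high-performing) if its views exceed its channel's median.

    Assumption: comparable videos = videos from the same channel.
    """
    videos = list(videos)
    views_by_channel: Dict[str, List[float]] = defaultdict(list)
    for channel, _, views in videos:
        views_by_channel[channel].append(views)
    medians = {ch: median(v) for ch, v in views_by_channel.items()}
    return {vid: views > medians[ch] for ch, vid, views in videos}
```

    Labels produced this way could then serve as the target for a classifier over content features such as estimated personality traits.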

    Data-Driven Allocation of Preventive Care With Application to Diabetes Mellitus Type II

    Problem Definition. Increasing healthcare costs highlight the importance of effective disease prevention. However, decision models for allocating preventive care are lacking. Methodology/Results. In this paper, we develop a data-driven decision model for determining a cost-effective allocation of preventive treatments to patients at risk. Specifically, we combine counterfactual inference, machine learning, and optimization techniques to build a scalable decision model that can exploit high-dimensional medical data, such as the data found in modern electronic health records. We evaluate our decision model using electronic health records from 89,191 prediabetic patients. We compare the allocation of preventive treatments (metformin) prescribed by our data-driven decision model with that of current practice. We find that, if applied to the U.S. population, our approach could yield annual savings of $1.1 billion. Finally, we analyze cost-effectiveness under varying budget levels. Managerial Implications. Our work supports decision-making in health management, with the goal of achieving effective disease prevention at lower cost. Importantly, our decision model is generic and can thus be used for effective allocation of preventive care for other preventable diseases.
    Comment: Accepted by Manufacturing & Service Operations Management
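
    The abstract combines counterfactual inference, machine learning, and optimization. As a minimal sketch of only the final optimization step, assume a model has already produced, for each patient, an estimated cost saving from treatment and an integer treatment cost (both hypothetical inputs here). Choosing whom to treat under a fixed budget is then a 0/1 knapsack problem, solved exactly below by dynamic programming. This is an illustrative formulation, not the paper's model.

```python
from typing import List, Sequence, Tuple


def allocate_treatment(savings: Sequence[float], costs: Sequence[int],
                       budget: int) -> Tuple[List[int], float]:
    """0/1 knapsack: pick patients maximizing total estimated savings within the budget.

    savings[i]: hypothetical estimated saving if patient i is treated.
    costs[i]:   hypothetical non-negative integer treatment cost for patient i.
    Returns (sorted indices of treated patients, total estimated savings).
    """
    n = len(savings)
    # best[i][b] = max savings using the first i patients with budget b
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s, c = savings[i - 1], costs[i - 1]
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: do not treat patient i-1
            if c <= b and best[i - 1][b - c] + s > best[i][b]:
                best[i][b] = best[i - 1][b - c] + s  # option 2: treat patient i-1
    # Backtrack: an entry differs from the row above only if that patient was treated.
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= costs[i - 1]
    return sorted(chosen), best[n][budget]
```

    Varying `budget` and re-solving gives a simple way to trace how the optimal allocation changes across budget levels.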