3,200 research outputs found

    Statistical Significance of the Netflix Challenge

    Inspired by the legacy of the Netflix contest, we provide an overview of what has been learned, from our own efforts and those of others, concerning the problems of collaborative filtering and recommender systems. The data set consists of about 100 million movie ratings (from 1 to 5 stars) involving some 480 thousand users and some 18 thousand movies; the associated ratings matrix is about 99% sparse. The goal is to predict ratings that users will give to movies; systems which can do this accurately have significant commercial applications, particularly on the world wide web. We discuss, in some detail, approaches to "baseline" modeling, singular value decomposition (SVD), as well as kNN (nearest neighbor) and neural network models; temporal effects, cross-validation issues, ensemble methods and other considerations are discussed as well. We compare existing models in a search for new models, and also discuss the mission-critical issues of penalization and parameter shrinkage which arise when the dimension of a parameter space reaches into the millions. Although much work on such problems has been carried out by the computer science and machine learning communities, our goal here is to address a statistical audience, and to provide a primarily statistical treatment of the lessons that have been learned from this remarkable set of data. Comment: Published at http://dx.doi.org/10.1214/11-STS368 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)
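
As a rough illustration of what "baseline" modeling with parameter shrinkage can look like, the sketch below fits a global mean plus per-movie and per-user offsets, each divided by (shrink + count) so that sparsely observed users and movies are pulled toward zero. This is one common textbook form, not necessarily the formulation used in the paper; the function names, the `shrink` parameter, and the toy ratings are all hypothetical.

```python
from collections import defaultdict

def fit_baseline(ratings, shrink=5.0):
    """Fit mu + user offset + movie offset with simple shrinkage.

    ratings: list of (user, movie, rating) tuples.
    shrink:  hypothetical penalty; larger values pull offsets toward 0.
    """
    mu = sum(r for _, _, r in ratings) / len(ratings)

    # Movie offsets: shrunk average residual from the global mean.
    movie_sum, movie_n = defaultdict(float), defaultdict(int)
    for _, movie, r in ratings:
        movie_sum[movie] += r - mu
        movie_n[movie] += 1
    movie_bias = {m: movie_sum[m] / (shrink + movie_n[m]) for m in movie_sum}

    # User offsets: shrunk average residual after removing the movie offset.
    user_sum, user_n = defaultdict(float), defaultdict(int)
    for user, movie, r in ratings:
        user_sum[user] += r - mu - movie_bias[movie]
        user_n[user] += 1
    user_bias = {u: user_sum[u] / (shrink + user_n[u]) for u in user_sum}

    return mu, user_bias, movie_bias

def predict(model, user, movie):
    """Predict a rating, clipped to the 1 to 5 star scale."""
    mu, user_bias, movie_bias = model
    raw = mu + user_bias.get(user, 0.0) + movie_bias.get(movie, 0.0)
    return min(5.0, max(1.0, raw))

# Toy data, invented for illustration only.
toy = [("u1", "m1", 5), ("u1", "m2", 3), ("u2", "m1", 4), ("u2", "m2", 2)]
model = fit_baseline(toy, shrink=2.0)
print(predict(model, "u1", "m1"))
```

Unseen users or movies fall back to the global mean, which is the behavior shrinkage approaches as the number of observed ratings goes to zero.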

    To Index or Not to Index: Optimizing Exact Maximum Inner Product Search

    Exact Maximum Inner Product Search (MIPS) is an important task that is widely pertinent to recommender systems and high-dimensional similarity search. The brute-force approach to solving exact MIPS is computationally expensive, thus spurring recent development of novel indexes and pruning techniques for this task. In this paper, we show that a hardware-efficient brute-force approach, blocked matrix multiply (BMM), can outperform the state-of-the-art MIPS solvers by over an order of magnitude for some, but not all, inputs. We also present a novel MIPS solution, MAXIMUS, that takes advantage of hardware efficiency and pruning of the search space. Like BMM, MAXIMUS is faster than other solvers by up to an order of magnitude, but again only for some inputs. Since no single solution offers the best runtime performance for all inputs, we introduce a new data-dependent optimizer, OPTIMUS, that selects online with minimal overhead the best MIPS solver for a given input. Together, OPTIMUS and MAXIMUS outperform state-of-the-art MIPS solvers by 3.2× on average, and up to 10.9×, on widely studied MIPS datasets. Comment: 12 pages, 8 figures, 2 tables
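
To make the brute-force baseline concrete, the following is a minimal pure-Python sketch of exact MIPS that scans the item vectors block by block and keeps the best inner product per query. It only shows the exhaustive, block-wise structure of the computation; it is not the paper's BMM implementation, which relies on hardware-efficient matrix multiplication. The function names, the `block_size` parameter, and the toy vectors are assumptions made for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def brute_force_mips(queries, items, block_size=2):
    """Exact MIPS by exhaustive search over blocks of items.

    Returns, for each query, (index of best item, best inner product).
    block_size is a hypothetical tuning knob; it does not change the
    result, only the order in which items are visited.
    """
    best = [(-1, float("-inf"))] * len(queries)
    for start in range(0, len(items), block_size):
        block = items[start:start + block_size]
        for qi, query in enumerate(queries):
            for offset, item in enumerate(block):
                score = dot(query, item)
                if score > best[qi][1]:
                    best[qi] = (start + offset, score)
    return best

# Toy user and item factors, invented for illustration only.
users = [[1.0, 0.0], [0.5, 2.0]]
movies = [[0.2, 0.1], [3.0, -1.0], [0.0, 1.5]]
print(brute_force_mips(users, movies))
```

Because every query is compared against every item, the result is exact regardless of the input distribution; index-based and pruning solvers trade this predictability for the chance of skipping work.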

    Recommendation, collaboration and social search

    This chapter considers the social component of interactive information retrieval: what is the role of other people in searching and browsing? For simplicity we begin by considering situations without computers. After all, you can interactively retrieve information without a computer; you just have to interact with someone or something else. Such an analysis can then help us think about the new forms of collaborative interactions that extend our conceptions of information search, made possible by the growth of networked ubiquitous computing technology. Information searching and browsing have often been conceptualized as solitary activities; however, they always have a social component. We may talk about 'the' searcher or 'the' user of a database or information resource. Our focus may be on individual uses and our research may look at individual users. Our experiments may be designed to observe the behaviors of individual subjects. Our models and theories derived from our empirical analyses may focus substantially or exclusively on an individual's evolving goals, thoughts, beliefs, emotions and actions. Nevertheless there are always social aspects of information seeking and use present, both implicitly and explicitly. We start by summarizing some of the history of information access with an emphasis on social and collaborative interactions. Then we look at the nature of recommendations, social search and interfaces to support collaboration between information seekers. Following this we consider how the design of interactive information systems is influenced by their social elements.

    Data Science and Machine Learning in mathematics education: Highschool students working on the Netflix Prize

    One goal of contemporary mathematical modeling classes in schools should be to include up-to-date problems or interesting new technologies from the everyday life of students, especially if these allow the didactical reduction to elementary (school) mathematical knowledge and thus have the potential to enrich mathematics education. Data Science and Machine Learning are applied in numerous areas of science and technology and used in many applications in our everyday life. Using movie recommender systems and the so-called Netflix Prize as an example, this paper discusses how mathematics education can be enriched by modeling real-world, student-centered problems from the field of Machine Learning in school. For this purpose, we describe tested digital learning material from guided modeling projects and share our experience with giving the problem of developing a recommender system as a completely open problem to upper secondary students.

    How Algorithmic Confounding in Recommendation Systems Increases Homogeneity and Decreases Utility

    Recommendation systems are ubiquitous and impact many domains; they have the potential to influence product consumption, individuals' perceptions of the world, and life-altering decisions. These systems are often evaluated or trained with data from users already exposed to algorithmic recommendations; this creates a pernicious feedback loop. Using simulations, we demonstrate how using data confounded in this way homogenizes user behavior without increasing utility.

    Recommender Systems by means of Information Retrieval

    In this paper we present a method for reformulating the Recommender Systems problem as an Information Retrieval one. In our tests we have a dataset of users who give ratings for some movies; we hide some values from the dataset, and we try to predict them again using its remaining portion (the so-called "leave-n-out approach"). In order to use an Information Retrieval algorithm, we reformulate this Recommender Systems problem in this way: a user corresponds to a document, a movie corresponds to a term, the active user (whose rating we want to predict) plays the role of the query, and the ratings are used as weights, in place of the weighting schema of the original IR algorithm. The output is the ranking list of the documents ("users") relevant for the query ("active user"). We use the ratings of these users, weighted according to the rank, to predict the rating of the active user. We carry out the comparison by means of a typical metric, namely the accuracy of the predictions returned by the algorithm, and we compare this to the real ratings from users. In our first tests, we use two different Information Retrieval algorithms: LSPR, a recently proposed model based on Discrete Fourier Transform, and a simple vector space model.
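
The user-as-document mapping described above can be sketched with a simple vector space model: each user is a "document" whose "term weights" are movie ratings, the active user is the query, and retrieved users are ranked by cosine similarity. The abstract does not specify how the rank is turned into a weight, so the 1/rank weighting below is an assumption, as are the function names, the `top_k` parameter, and the toy data. LSPR is not covered here.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse rating vectors (dicts)."""
    shared = set(a) & set(b)
    num = sum(a[m] * b[m] for m in shared)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def predict_rating(ratings, active_user, movie, top_k=2):
    """Predict active_user's rating for movie from the top-ranked users.

    ratings: dict mapping user -> {movie: rating}.
    Weighting by 1/rank is an illustrative assumption.
    """
    # The query is the active user's known ratings, with the target hidden.
    query = {m: r for m, r in ratings[active_user].items() if m != movie}
    # Candidate "documents" are the other users who rated the target movie.
    candidates = [(cosine(query, doc), user)
                  for user, doc in ratings.items()
                  if user != active_user and movie in doc]
    ranked = sorted(candidates, reverse=True)[:top_k]
    if not ranked:
        return None
    weights = [1.0 / rank for rank in range(1, len(ranked) + 1)]
    total = sum(w * ratings[user][movie]
                for w, (_, user) in zip(weights, ranked))
    return total / sum(weights)

# Toy ratings, invented for illustration only.
toy_ratings = {
    "ann": {"A": 5, "B": 1},
    "bob": {"A": 5, "B": 1, "C": 4},
    "cat": {"A": 1, "B": 5, "C": 2},
}
print(predict_rating(toy_ratings, "ann", "C"))
```

In a leave-n-out evaluation, the hidden ratings would be removed from `ratings` before calling `predict_rating`, and the predictions compared against the held-out values.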

    And the Winner Is...Capturing the Promise of Philanthropic Prizes

    Philanthropists and governments have long used prizes to drive innovation and engagement to produce societal benefit, but the use of this powerful instrument is undergoing a renaissance. Philanthropic prizes are growing in number and size, are appearing in new forms, and are being applied to a wider range of societal objectives by a wider range of sponsors than ever before. Not all of the growth has been positive, however, as the many overlapping prizes and growing clutter of the sector attests. In response, current and potential participants are asking when they should use prizes, and how they can develop and deliver effective ones. This report addresses these questions by drawing on academic literature, interviews with analysts and practitioners, surveys of prize sponsors and competitors, databases of small and large awards, and case studies of twelve effective prizes to produce lessons from a range of sectors, goals, and prize types. It aims to help improve current prizes and stimulate effective future use by developing a number of simple frameworks and compiling useful lessons for sponsors. While targeting the philanthropic sponsor, we believe these perspectives will also be helpful to governments and corporations considering prizes. Our research found that prizes are a unique and powerful tool that should be in the basic toolkit of many of today's philanthropists. Their recent renaissance is largely due to a new appreciation for the multiple ways in which they can produce change: not only by identifying new levels of excellence and by encouraging specific innovations, but also by changing wider perceptions, improving the performance of communities of problem-solvers, building the skills of individuals, and mobilizing new talent or capital. 
    These change drivers give prize sponsors compelling opportunities to use the open, competitive, and media-friendly attributes of prizes to stimulate attention and drive innovation in a highly leveraged and result-focused way. Recent prize growth is reinforced by powerful external trends such as the arrival of new philanthropic wealth, different attitudes to shifting risk, interest in open source approaches, and an increasingly networked, media-driven and technology-intensive world. We believe that the outlook for prizes is particularly strong because of the increased interest of philanthropists and the emergence of an industry of prize facilitators that is driving improvements in prize economics and improved practices for managing execution challenges and risks.

    Pycobra: A Python Toolbox for Ensemble Learning and Visualisation

    We introduce pycobra, a Python library devoted to ensemble learning (regression and classification) and visualisation. Its main assets are the implementation of several ensemble learning algorithms, a flexible and generic interface to compare and blend any existing machine learning algorithm available in Python libraries (as long as a predict method is given), and visualisation tools such as Voronoi tessellations. pycobra is fully scikit-learn compatible and is released under the MIT open-source license. pycobra can be downloaded from the Python Package Index (PyPI) and Machine Learning Open Source Software (MLOSS). The current version (along with Jupyter notebooks, extensive documentation, and continuous integration tests) is available at https://github.com/bhargavvader/pycobra and the official documentation website is https://modal.lille.inria.fr/pycobra.