Statistical Significance of the Netflix Challenge
Inspired by the legacy of the Netflix contest, we provide an overview of what
has been learned---from our own efforts, and those of others---concerning the
problems of collaborative filtering and recommender systems. The data set
consists of about 100 million movie ratings (from 1 to 5 stars) involving some
480 thousand users and some 18 thousand movies; the associated ratings matrix
is about 99% sparse. The goal is to predict ratings that users will give to
movies; systems which can do this accurately have significant commercial
applications, particularly on the world wide web. We discuss, in some detail,
approaches to "baseline" modeling, singular value decomposition (SVD), as well
as kNN (nearest neighbor) and neural network models; temporal effects,
cross-validation issues, ensemble methods and other considerations are
discussed as well. We compare existing models in a search for new models, and
also discuss the mission-critical issues of penalization and parameter
shrinkage which arise when the dimension of a parameter space reaches into the
millions. Although much work on such problems has been carried out by the
computer science and machine learning communities, our goal here is to address
a statistical audience, and to provide a primarily statistical treatment of the
lessons that have been learned from this remarkable set of data.
Comment: Published at http://dx.doi.org/10.1214/11-STS368 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
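The "baseline" modeling and parameter shrinkage mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a common form of baseline predictor (global mean plus a per-movie and a per-user offset, each shrunk toward zero by a penalty constant); the function names, the penalty values, and the toy data layout are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def fit_baseline(ratings, lam_movie=25.0, lam_user=10.0):
    """Fit mu + b_user + b_movie on (user, movie, stars) triples.

    lam_movie and lam_user are illustrative shrinkage constants (assumed
    values): an offset estimated from n ratings is divided by (lam + n)
    instead of n, so sparsely rated movies and users are pulled toward 0.
    """
    mu = sum(r for _, _, r in ratings) / len(ratings)

    # Movie offsets: shrunken mean residual from the global mean.
    movie_sum, movie_n = defaultdict(float), defaultdict(int)
    for _, movie, r in ratings:
        movie_sum[movie] += r - mu
        movie_n[movie] += 1
    b_movie = {m: movie_sum[m] / (lam_movie + movie_n[m]) for m in movie_sum}

    # User offsets: shrunken mean residual after removing the movie offset.
    user_sum, user_n = defaultdict(float), defaultdict(int)
    for user, movie, r in ratings:
        user_sum[user] += r - mu - b_movie[movie]
        user_n[user] += 1
    b_user = {u: user_sum[u] / (lam_user + user_n[u]) for u in user_sum}

    return mu, b_user, b_movie


def predict(mu, b_user, b_movie, user, movie):
    """Predict a rating, clamped to the 1-5 star scale; unseen ids get offset 0."""
    raw = mu + b_user.get(user, 0.0) + b_movie.get(movie, 0.0)
    return min(5.0, max(1.0, raw))
```

Setting both penalties to zero recovers plain per-movie and per-user means; larger penalties matter most for movies and users with only a handful of ratings, which is the usual situation in a ratings matrix that is about 99% sparse.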
To Index or Not to Index: Optimizing Exact Maximum Inner Product Search
Exact Maximum Inner Product Search (MIPS) is an important task that is widely
pertinent to recommender systems and high-dimensional similarity search. The
brute-force approach to solving exact MIPS is computationally expensive, thus
spurring recent development of novel indexes and pruning techniques for this
task. In this paper, we show that a hardware-efficient brute-force approach,
blocked matrix multiply (BMM), can outperform the state-of-the-art MIPS solvers
by over an order of magnitude, for some -- but not all -- inputs.
In this paper, we also present a novel MIPS solution, MAXIMUS, that takes
advantage of hardware efficiency and pruning of the search space. Like BMM,
MAXIMUS is faster than other solvers by up to an order of magnitude, but again
only for some inputs. Since no single solution offers the best runtime
performance for all inputs, we introduce a new data-dependent optimizer,
OPTIMUS, that selects online with minimal overhead the best MIPS solver for a
given input. Together, OPTIMUS and MAXIMUS outperform state-of-the-art MIPS
solvers by 3.2× on average, and up to 10.9×, on widely studied
MIPS datasets.
Comment: 12 pages, 8 figures, 2 tables
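As a rough illustration of the brute-force approach described in the abstract above, the sketch below scores users against all items one block at a time and keeps the top-k items per user. It only shows the blocking structure: the function name, the list-of-lists layout, and the default block size are assumptions, and it does not reproduce the paper's BMM, MAXIMUS, or OPTIMUS. A hardware-efficient version would hand each block to an optimized matrix multiply routine rather than a Python loop.

```python
import heapq


def blocked_mips(users, items, k=1, block=256):
    """Exact top-k inner product search by exhaustive scoring.

    users, items: lists of equal-length numeric vectors.
    Returns, for each user, the indices of the k highest-scoring items.
    """
    results = []
    for start in range(0, len(users), block):
        user_block = users[start:start + block]
        # Score the whole block against every item; this is the step that
        # corresponds to one dense (block x items) matrix multiply.
        scores = [
            [sum(a * b for a, b in zip(u, v)) for v in items]
            for u in user_block
        ]
        # Select the k best item indices per user row.
        for row in scores:
            top = heapq.nlargest(k, range(len(row)), key=row.__getitem__)
            results.append(top)
    return results
```

Because every user-item pair is scored, the result is exact regardless of the block size; blocking only changes how the work is grouped.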
Recommendation, collaboration and social search
This chapter considers the social component of interactive information retrieval: what is the role of other people in searching and browsing? For simplicity we begin by considering situations without computers. After all, you can interactively retrieve information without a computer; you just have to interact with someone or something else. Such an analysis can then help us think about the new forms of collaborative interactions that extend our conceptions of information search, made possible by the growth of networked ubiquitous computing technology.
Information searching and browsing have often been conceptualized as a solitary activity; however, they always have a social component. We may talk about 'the' searcher or 'the' user of a database or information resource. Our focus may be on individual uses and our research may look at individual users. Our experiments may be designed to observe the behaviors of individual subjects. Our models and theories derived from our empirical analyses may focus substantially or exclusively on an individual's evolving goals, thoughts, beliefs, emotions and actions. Nevertheless, there are always social aspects of information seeking and use present, both implicitly and explicitly.
We start by summarizing some of the history of information access with an emphasis on social and collaborative interactions. Then we look at the nature of recommendations, social search and interfaces to support collaboration between information seekers. Following this, we consider how the design of interactive information systems is influenced by their social elements.
Data Science and Machine Learning in mathematics education: Highschool students working on the Netflix Prize
One goal of contemporary mathematical modeling classes in schools should be to include up-to-date problems or interesting, new technologies from the everyday life of students, especially if these allow the didactical reduction to elementary (school) mathematical knowledge and thus have the potential to enrich mathematics education. Data Science and Machine Learning are applied in numerous areas of science and technology and used in many applications in our everyday life. Using movie recommender systems and the so-called Netflix Prize as an example, this paper discusses how mathematics education can be enriched by modeling real-world, student-centered problems from the field of Machine Learning in school. For this purpose, we describe tested digital learning material from guided modeling projects and share our experience with giving the problem of developing a recommender system as a completely open problem to upper secondary students.
How Algorithmic Confounding in Recommendation Systems Increases Homogeneity and Decreases Utility
Recommendation systems are ubiquitous and impact many domains; they have the
potential to influence product consumption, individuals' perceptions of the
world, and life-altering decisions. These systems are often evaluated or
trained with data from users already exposed to algorithmic recommendations;
this creates a pernicious feedback loop. Using simulations, we demonstrate how
using data confounded in this way homogenizes user behavior without increasing
utility.
Recommender Systems by means of Information Retrieval
In this paper we present a method for reformulating the Recommender Systems
problem as an Information Retrieval one. In our tests we have a dataset of
users who give ratings for some movies; we hide some values from the dataset,
and we try to predict them again using its remaining portion (the so-called
"leave-n-out approach"). In order to use an Information Retrieval algorithm, we
reformulate this Recommender Systems problem in this way: a user corresponds to
a document, a movie corresponds to a term, the active user (whose rating we
want to predict) plays the role of the query, and the ratings are used as
weights, in place of the weighting scheme of the original IR algorithm. The
output is the ranking list of the documents ("users") relevant for the query
("active user"). We use the ratings of these users, weighted according to the
rank, to predict the rating of the active user. We carry out the comparison by
means of a typical metric, namely the accuracy of the predictions returned by
the algorithm, and we compare this to the real ratings from users. In our first
tests, we use two different Information Retrieval algorithms: LSPR, a recently
proposed model based on Discrete Fourier Transform, and a simple vector space
model.
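The reformulation described in the abstract above (user as document, movie as term, ratings as weights, active user as query) can be sketched with a simple vector space model. This is only an illustration under stated assumptions: cosine similarity as the ranking function, a 1/rank weighting of the retrieved users, and restricting candidates to users who rated the target movie are choices made here, not details confirmed by the abstract, and the function names are hypothetical. LSPR is not shown.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse {movie: rating} vectors."""
    shared = set(a) & set(b)
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_rating(profiles, active, movie, top_n=10):
    """Predict the active user's rating of `movie` from ranked similar users.

    profiles: {user: {movie: rating}}. The active user's own rating of the
    target movie, if present, is hidden from the query (leave-n-out style).
    Returns None when no other user has rated the movie.
    """
    query = {m: r for m, r in profiles[active].items() if m != movie}
    # Retrieve "documents" (users) that contain the target "term" (movie).
    candidates = [
        (cosine(query, profile), user)
        for user, profile in profiles.items()
        if user != active and movie in profile
    ]
    ranked = sorted(candidates, key=lambda pair: pair[0], reverse=True)[:top_n]
    if not ranked:
        return None
    # Assumed rank weighting: the user at rank r contributes with weight 1/r.
    weights = [1.0 / rank for rank in range(1, len(ranked) + 1)]
    total = sum(w * profiles[user][movie] for w, (_, user) in zip(weights, ranked))
    return total / sum(weights)
```

Hiding known ratings and comparing the returned predictions with the real ones gives the kind of accuracy comparison the abstract describes.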
And the Winner Is...Capturing the Promise of Philanthropic Prizes
Philanthropists and governments have long used prizes to drive innovation and engagement to produce societal benefit, but the use of this powerful instrument is undergoing a renaissance. Philanthropic prizes are growing in number and size, are appearing in new forms, and are being applied to a wider range of societal objectives by a wider range of sponsors than ever before. Not all of the growth has been positive, however, as the many overlapping prizes and growing clutter of the sector attest. In response, current and potential participants are asking when they should use prizes, and how they can develop and deliver effective ones. This report addresses these questions by drawing on academic literature, interviews with analysts and practitioners, surveys of prize sponsors and competitors, databases of small and large awards, and case studies of twelve effective prizes to produce lessons from a range of sectors, goals, and prize types. It aims to help improve current prizes and stimulate effective future use by developing a number of simple frameworks and compiling useful lessons for sponsors. While targeting the philanthropic sponsor, we believe these perspectives will also be helpful to governments and corporations considering prizes. Our research found that prizes are a unique and powerful tool that should be in the basic toolkit of many of today's philanthropists. Their recent renaissance is largely due to a new appreciation for the multiple ways in which they can produce change: not only by identifying new levels of excellence and by encouraging specific innovations, but also by changing wider perceptions, improving the performance of communities of problem-solvers, building the skills of individuals, and mobilizing new talent or capital.
These change drivers give prize sponsors compelling opportunities to use the open, competitive, and media-friendly attributes of prizes to stimulate attention and drive innovation in a highly leveraged and result-focused way. Recent prize growth is reinforced by powerful external trends such as the arrival of new philanthropic wealth, different attitudes to shifting risk, interest in open source approaches, and an increasingly networked, media-driven and technology-intensive world. We believe that the outlook for prizes is particularly strong because of the increased interest of philanthropists and the emergence of an industry of prize facilitators that is driving improvements in prize economics and improved practices for managing execution challenges and risks.
Pycobra: A Python Toolbox for Ensemble Learning and Visualisation
We introduce pycobra, a Python library devoted to ensemble learning
(regression and classification) and visualisation. Its main assets are the
implementation of several ensemble learning algorithms, a flexible and generic
interface to compare and blend any existing machine learning algorithm
available in Python libraries (as long as a predict method is given),
and visualisation tools such as Voronoi tessellations. pycobra is
fully scikit-learn compatible and is released under the MIT
open-source license. pycobra can be downloaded from the Python Package
Index (PyPI) and Machine Learning Open Source Software (MLOSS). The current
version (along with Jupyter notebooks, extensive documentation, and continuous
integration tests) is available at
https://github.com/bhargavvader/pycobra
and the official documentation website is
https://modal.lille.inria.fr/pycobra