How Algorithmic Confounding in Recommendation Systems Increases Homogeneity and Decreases Utility
Recommendation systems are ubiquitous and impact many domains; they have the
potential to influence product consumption, individuals' perceptions of the
world, and life-altering decisions. These systems are often evaluated or
trained with data from users already exposed to algorithmic recommendations;
this creates a pernicious feedback loop. Using simulations, we demonstrate how
using data confounded in this way homogenizes user behavior without increasing
utility.
Approximation and Relaxation Approaches for Parallel and Distributed Machine Learning
Large-scale machine learning requires tradeoffs. Commonly, this has led practitioners to choose simpler, less powerful models (e.g., linear models) in order to process more training examples in a limited time. In this work, we introduce parallelism to the training of non-linear models by leveraging a different tradeoff: approximation. We demonstrate various techniques by which non-linear models can be made amenable to larger data sets and significantly more training parallelism by strategically introducing approximation in certain optimization steps.
For gradient boosted regression tree ensembles, we replace precise selection of tree splits with a coarse-grained, approximate split selection, yielding both faster sequential training and a significant increase in parallelism, particularly in the distributed setting. For metric learning with nearest neighbor classification, rather than explicitly training a neighborhood structure, we leverage the implicit neighborhood structure induced by task-specific random forest classifiers, yielding a highly parallel method for metric learning. For support vector machines, we follow existing work to learn a reduced basis set with extremely high parallelism, particularly on GPUs, via existing linear algebra libraries.
We believe these optimization tradeoffs are widely applicable wherever machine learning is put into practice in large-scale settings. By carefully introducing approximation, we also introduce significantly higher parallelism and consequently can process more training examples for more iterations than competing exact methods. Although the model is seemingly learned with less precision, this tradeoff often yields noticeably higher accuracy under a restricted training time budget.
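The coarse-grained split selection described in this abstract can be illustrated with a minimal sketch. It assumes equal-width histogram binning of a single feature, which is one common way to approximate split search and is not necessarily the authors' exact procedure. All function names (`build_histogram`, `merge_histograms`, `best_binned_split`) are hypothetical. The point it shows is that per-bin counts and target sums are additive, so shards of the data can be summarized independently and merged before the split is chosen.

```python
from typing import List, Sequence, Tuple

Histogram = List[List[float]]  # one [count, sum_of_targets] pair per bin


def split_gain(n_left: float, s_left: float, n_right: float, s_right: float) -> float:
    """Variance-reduction score (up to a constant) for a regression split."""
    if n_left == 0 or n_right == 0:
        return float("-inf")  # a split with an empty side is not allowed
    return s_left ** 2 / n_left + s_right ** 2 / n_right


def build_histogram(xs: Sequence[float], ys: Sequence[float],
                    lo: float, hi: float, n_bins: int) -> Histogram:
    """Summarize one data shard as per-bin counts and target sums (assumes hi > lo)."""
    width = (hi - lo) / n_bins
    hist: Histogram = [[0, 0.0] for _ in range(n_bins)]
    for x, y in zip(xs, ys):
        b = min(int((x - lo) / width), n_bins - 1)  # clamp x == hi into the last bin
        hist[b][0] += 1
        hist[b][1] += y
    return hist


def merge_histograms(a: Histogram, b: Histogram) -> Histogram:
    """Histograms from different workers combine by elementwise addition."""
    return [[ca + cb, sa + sb] for (ca, sa), (cb, sb) in zip(a, b)]


def best_binned_split(hist: Histogram, lo: float, hi: float) -> Tuple[float, float]:
    """Evaluate only the bin edges as thresholds; return (threshold, gain)."""
    n_bins = len(hist)
    width = (hi - lo) / n_bins
    total_n = sum(c for c, _ in hist)
    total_s = sum(s for _, s in hist)
    best_gain, best_threshold = float("-inf"), lo
    n_left, s_left = 0, 0.0
    for b in range(n_bins - 1):
        n_left += hist[b][0]
        s_left += hist[b][1]
        gain = split_gain(n_left, s_left, total_n - n_left, total_s - s_left)
        if gain > best_gain:
            best_gain, best_threshold = gain, lo + (b + 1) * width
    return best_threshold, best_gain
```

With `n_bins` candidate thresholds instead of one per distinct feature value, the search cost no longer grows with the number of training examples once the histograms are built, at the price of a threshold that is only accurate to within one bin width.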
Bayesian Inference of Networks Across Multiple Sample Groups and Data Types
In this paper, we develop a graphical modeling framework for the inference of
networks across multiple sample groups and data types. In medical studies, this
setting arises whenever a set of subjects, which may be heterogeneous due to
differing disease stage or subtype, is profiled across multiple platforms, such
as metabolomics, proteomics, or transcriptomics data. Our proposed Bayesian
hierarchical model first links the network structures within each platform
using a Markov random field prior to relate edge selection across sample
groups, and then links the network similarity parameters across platforms. This
enables joint estimation in a flexible manner, as we make no assumptions on the
directionality of influence across the data types or the extent of network
similarity across the sample groups and platforms. In addition, our model
formulation allows the number of variables and number of subjects to differ
across the data types, and only requires that we have data for the same set of
groups. We illustrate the proposed approach through both simulation studies and
an application to gene expression levels and metabolite abundances on subjects
with varying severity levels of Chronic Obstructive Pulmonary Disease (COPD).
Neural Graph Collaborative Filtering
Learning vector representations (a.k.a. embeddings) of users and items lies at
the core of modern recommender systems. Ranging from early matrix factorization
to recently emerged deep learning based methods, existing efforts typically
obtain a user's (or an item's) embedding by mapping from pre-existing features
that describe the user (or the item), such as ID and attributes. We argue that
an inherent drawback of such methods is that the collaborative signal, which
is latent in user-item interactions, is not encoded in the embedding process.
As such, the resultant embeddings may not be sufficient to capture the
collaborative filtering effect.
In this work, we propose to integrate the user-item interactions -- more
specifically the bipartite graph structure -- into the embedding process. We
develop a new recommendation framework, Neural Graph Collaborative Filtering
(NGCF), which exploits the user-item graph structure by propagating embeddings
on it. This leads to the expressive modeling of high-order connectivity in
the user-item graph, effectively injecting the collaborative signal into the
embedding process in an explicit manner. We conduct extensive experiments on
three public benchmarks, demonstrating significant improvements over several
state-of-the-art models like HOP-Rec and Collaborative Memory Network. Further
analysis verifies the importance of embedding propagation for learning better
user and item representations, justifying the rationality and effectiveness of
NGCF. Code is available at
https://github.com/xiangwang1223/neural_graph_collaborative_filtering.
Comment: SIGIR 2019; the latest version of NGCF paper, which is distinct from
the version published in ACM Digital Library
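The idea of propagating embeddings over the user-item bipartite graph can be sketched in a few lines. This is a deliberately simplified illustration, not the NGCF model itself: it assumes each node keeps its own embedding and adds a degree-normalized sum of its neighbors' embeddings, and it omits the learned weight matrices and nonlinearity that a trained model would use. The function name `propagate` and the toy data are hypothetical, and the interaction list is assumed to contain no duplicate pairs.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple

Embeddings = Dict[str, List[float]]


def propagate(user_emb: Embeddings, item_emb: Embeddings,
              interactions: Sequence[Tuple[str, str]]) -> Tuple[Embeddings, Embeddings]:
    """One propagation step over the bipartite graph of (user, item) interactions."""
    # Node degrees, used for the symmetric 1 / sqrt(deg(u) * deg(i)) normalization.
    user_deg = {u: 0 for u in user_emb}
    item_deg = {i: 0 for i in item_emb}
    for u, i in interactions:
        user_deg[u] += 1
        item_deg[i] += 1

    # Start from a copy of each node's own embedding (the self-connection).
    new_user = {u: list(v) for u, v in user_emb.items()}
    new_item = {i: list(v) for i, v in item_emb.items()}

    # Each edge passes a normalized message in both directions,
    # always reading from the embeddings of the previous layer.
    for u, i in interactions:
        norm = 1.0 / sqrt(user_deg[u] * item_deg[i])
        for k in range(len(user_emb[u])):
            new_user[u][k] += norm * item_emb[i][k]
            new_item[i][k] += norm * user_emb[u][k]
    return new_user, new_item
```

Applying `propagate` once mixes a user's embedding with those of the items they interacted with; applying it twice lets information from other users who share an item reach that user, which is the kind of high-order connectivity the abstract refers to.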