Safe Collaborative Filtering
Excellent tail performance is crucial for modern machine learning tasks, such
as algorithmic fairness, class imbalance, and risk-sensitive decision making,
as it ensures the effective handling of challenging samples within a dataset.
Tail performance is also a vital determinant of success for personalised
recommender systems, as it reduces the risk of losing users with low
satisfaction.
This study introduces a "safe" collaborative filtering method that prioritises
recommendation quality for less-satisfied users rather than focusing on the
average performance. Our approach minimises the conditional value at risk
(CVaR), which represents the average risk over the tails of users' loss. To
overcome computational challenges for web-scale recommender systems, we develop
a robust yet practical algorithm that extends the most scalable method,
implicit alternating least squares (iALS). Empirical evaluation on real-world
datasets demonstrates that our approach achieves excellent tail performance
while maintaining competitive computational efficiency.
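The CVaR objective over per-user losses can be sketched in a few lines. A
minimal NumPy illustration, where the loss values and the tail level alpha are
invented for the example (the paper's actual optimisation inside iALS is not
shown here):

```python
import numpy as np

def cvar(user_losses, alpha=0.8):
    """Conditional value at risk: the mean loss over the worst
    (1 - alpha) fraction of users, i.e. the tail beyond the
    alpha-quantile (the value at risk, VaR)."""
    losses = np.sort(np.asarray(user_losses, dtype=float))
    var = np.quantile(losses, alpha)      # value at risk (alpha-quantile)
    return losses[losses >= var].mean()   # average over the tail

# Toy per-user recommendation losses (illustrative numbers only).
user_losses = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.4, 0.8, 0.9, 0.95]
tail_risk = cvar(user_losses, alpha=0.8)   # mean loss of the worst 20% of users
mean_loss = float(np.mean(user_losses))    # the usual "average performance"
```

Minimising the tail average rather than `mean_loss` is what shifts attention to
the less-satisfied users; the paper's contribution is making this objective
practical at web scale by extending iALS.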
Machine learning in space forms: Embeddings, classification, and similarity comparisons
We take a non-Euclidean view of three classical machine learning subjects: low-dimensional embedding, classification, and similarity comparisons.
We first introduce kinetic Euclidean distance matrices to solve kinetic distance geometry problems. In distance geometry problems (DGPs), the task is to find a geometric representation, that is, an embedding, for a collection of entities consistent with pairwise distance (metric) or similarity (nonmetric) measurements. In kinetic DGPs, the twist is that the points are dynamic, and the goal is to localize them by exploiting information about their trajectory class. We show that a semidefinite relaxation can reconstruct trajectories from incomplete, noisy, time-varying distance observations. We then introduce another distance-geometric object: hyperbolic distance matrices. Recent works have focused on hyperbolic embedding methods for low-distortion embedding of distance measurements associated with hierarchical data. We derive a semidefinite relaxation to estimate the missing distance measurements and denoise them. Further, we formalize hyperbolic Procrustes analysis, which uses extraneous information in the form of anchor points to uniquely identify the embedded points.
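A hyperbolic distance matrix is assembled from pairwise geodesic distances. A
small sketch in the Poincaré ball model (the function names are mine, and this
shows only the matrix itself, not the semidefinite relaxation used to complete
and denoise it):

```python
import numpy as np

def poincare_dist(u, v, eps=1e-12):
    """Geodesic distance between two points in the Poincare unit ball."""
    num = np.dot(u - v, u - v)
    den = max((1.0 - np.dot(u, u)) * (1.0 - np.dot(v, v)), eps)
    return np.arccosh(1.0 + 2.0 * num / den)

def hyperbolic_distance_matrix(X):
    """Pairwise hyperbolic distances for points X (rows inside the unit ball)."""
    n = len(X)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = poincare_dist(X[i], X[j])
    return D

# Three illustrative points in the 2D Poincare ball.
X = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5]])
D = hyperbolic_distance_matrix(X)
```

As a sanity check, the distance from the origin to a point at Euclidean norm r
is 2 artanh(r), so D[0, 1] here equals 2 artanh(0.5) = ln 3.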
Next, we address the design of learning algorithms in mixed-curvature spaces. Learning algorithms in low-dimensional mixed-curvature spaces have been limited to certain non-Euclidean neural networks. Here, we study the problem of learning a linear classifier (a perceptron) in product of Euclidean, spherical, and hyperbolic spaces, i.e., space forms. We introduce a notion of linear separation surfaces in Riemannian manifolds and use a metric that renders distances in different space forms compatible with each other and integrates them into one classifier.
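One way to make distances in different space forms compatible, as a rough
illustration only: compute the geodesic distance within each factor and combine
them, here with a plain l2 combination. The l2 product metric and all names
below are my assumptions for the sketch; the paper's compatible metric and its
linear separation surfaces are more specific than this.

```python
import numpy as np

def spherical_dist(u, v):
    """Geodesic distance on the unit sphere."""
    return np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))

def hyperboloid_dist(u, v):
    """Geodesic distance on the hyperboloid ('Loid) model; points satisfy
    t**2 - ||x||**2 = 1 with first coordinate t > 0."""
    neg_mink = u[0] * v[0] - np.dot(u[1:], v[1:])   # -<u, v> in Minkowski form
    return np.arccosh(max(neg_mink, 1.0))

def product_space_dist(xs, ys):
    """Distance in a product of space forms: l2-combine the factor distances."""
    d_e = np.linalg.norm(xs["euc"] - ys["euc"])
    d_s = spherical_dist(xs["sph"], ys["sph"])
    d_h = hyperboloid_dist(xs["hyp"], ys["hyp"])
    return np.sqrt(d_e**2 + d_s**2 + d_h**2)

# One point per factor: Euclidean plane, unit circle, 2D hyperboloid.
xs = {"euc": np.array([0.0, 0.0]), "sph": np.array([1.0, 0.0]),
      "hyp": np.array([1.0, 0.0, 0.0])}
ys = {"euc": np.array([3.0, 4.0]), "sph": np.array([0.0, 1.0]),
      "hyp": np.array([1.0, 0.0, 0.0])}
d_total = product_space_dist(xs, ys)
```

Here the factor distances are 5 (Euclidean), pi/2 (a quarter circle), and 0
(identical hyperbolic points), so the combined distance is sqrt(25 + pi^2/4).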
Lastly, we show how similarity comparisons carry information about the underlying space of geometric graphs. We introduce the ordinal spread of a distance list and relate it to the ordinal capacity of the underlying space, a notion that quantifies the space's ability to host extreme patterns in nonmetric measurements. Then, we use the distribution of random ordinal spread variables as a practical tool to identify the underlying space form.
On Estimating Recommendation Evaluation Metrics under Sampling
Since the study by Krichene and Rendle (2020) on sampling-based top-k
evaluation metrics for recommendation, there has been much debate on the
validity of using sampling to evaluate recommendation algorithms. Though their
work and the recent work of Li et al. (2020) have proposed some basic
approaches for mapping the sampling-based metrics to their global
counterparts, which rank the entire set of items, there is still a lack of
understanding and consensus on how sampling should be used for recommendation
evaluation. The proposed approaches are either rather uninformative (linking
sampling to metric evaluation) or work only on simple metrics such as
Recall/Precision (Krichene and Rendle 2020; Li et al. 2020). In this paper, we
introduce a new research problem, learning the empirical rank distribution,
and a new approach based on the estimated rank distribution to estimate the
top-k metrics. Since this question is closely related to the underlying
mechanism of sampling for recommendation, tackling it can help us better
understand the power of sampling and resolve the questions of whether and how
we should use sampling for evaluating recommendation. We introduce two
approaches, based on MLE (Maximum Likelihood Estimation) and its weighted
variants, and on the ME (Maximum Entropy) principle, to recover the empirical
rank distribution, and then utilize them for metric estimation. The
experimental results show the advantages of using the new approaches for
evaluating recommendation algorithms based on top-k metrics.
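The link between a recovered rank distribution and top-k metrics is direct:
once the empirical distribution P(R = r) of the target item's global rank has
been estimated (e.g. by the MLE or ME approaches above), any top-k metric is
an expectation under it. A toy sketch with an invented rank distribution (the
1/r shape and all numbers are purely illustrative, not an estimate):

```python
import numpy as np

n = 100                            # total number of items
ranks = np.arange(1, n + 1)
p = 1.0 / ranks                    # toy rank distribution, heavier at the top
p /= p.sum()                       # normalise so it sums to 1

def hit_ratio_at_k(p, k):
    """Top-k Hit-Ratio / Recall: probability the target ranks within top k."""
    return p[:k].sum()

def ndcg_at_k(p, k):
    """Expected NDCG@k under the rank distribution (one relevant item)."""
    gains = 1.0 / np.log2(ranks[:k] + 1)
    return (p[:k] * gains).sum()

hr10 = hit_ratio_at_k(p, 10)       # cumulative mass on ranks 1..10
```

The estimation problem in the paper is the hard part: recovering p from
sampled observations; the metric computation afterwards is this cheap.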
On Sampling Top-K Recommendation Evaluation
Recently, Rendle has warned that the use of sampling-based top-K metrics
might not suffice. This throws a number of recent studies on deep
learning-based recommendation algorithms, and classic non-deep-learning
algorithms evaluated with such metrics, into jeopardy. In this work, we
thoroughly investigate the relationship between the sampling and global top-K
Hit-Ratio (HR, or Recall), originally proposed by Koren [2] and extensively
used by others. By formulating the problem of aligning the sampling top-k and
global top-K Hit-Ratios through a mapping function f, so that the sampling
HR@k approximates the global HR@f(k), we demonstrate both theoretically and
experimentally that the sampling top-k Hit-Ratio provides an accurate
approximation of its global (exact) counterpart, and can consistently predict
the correct winners (the same as indicated by their corresponding global
Hit-Ratios).
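The sampled-versus-global relationship is easy to see in simulation. A sketch
under a simplifying assumption: each user has one target item with some global
rank, and m negatives are drawn uniformly with replacement, so the number of
sampled items that beat the target is binomial. All sizes and ranks below are
made up for the illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, m, k, n_users = 1000, 100, 10, 5000

# Hypothetical global ranks of each user's target item (1 = best).
global_ranks = rng.integers(1, n_items + 1, size=n_users)

# Global top-k Hit-Ratio: target ranked within top k among all items.
global_hr = np.mean(global_ranks <= k)

# Sampled rank: each of the (global_rank - 1) better items is drawn with
# probability 1/(n_items - 1) per sample, so with m uniform samples the
# count of better items seen is Binomial(m, (r - 1) / (n_items - 1)).
sampled_ranks = 1 + rng.binomial(m, (global_ranks - 1) / (n_items - 1))
sampled_hr = np.mean(sampled_ranks <= k)
```

With only m = 100 sampled negatives, HR@10 is far higher than its global
counterpart, which is exactly why a mapping between the two is needed rather
than reading the sampled number at face value.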