
    Robust Recommender System: A Survey and Future Directions

    With the rapid growth of information, recommender systems have become integral for providing personalized suggestions and overcoming information overload. However, their practical deployment often encounters "dirty" data, where noise or malicious information can lead to abnormal recommendations. Research on improving recommender systems' robustness against such dirty data has thus gained significant attention. This survey provides a comprehensive review of recent work on the robustness of recommender systems. We first present a taxonomy that organizes current techniques for withstanding malicious attacks and natural noise. We then explore state-of-the-art methods in each category, including fraudster detection, adversarial training, and certifiably robust training against malicious attacks, as well as regularization, purification, and self-supervised learning against natural noise. Additionally, we summarize the evaluation metrics and common datasets used to assess robustness. We discuss robustness across varying recommendation scenarios and its interplay with other properties such as accuracy, interpretability, privacy, and fairness. Finally, we delve into open issues and future research directions in this emerging field. Our goal is to equip readers with a holistic understanding of robust recommender systems and to spotlight pathways for future research and development.
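    As a rough illustration of one defence category the survey mentions, the sketch below shows FGSM-style adversarial training for a toy matrix-factorization recommender: the embeddings are perturbed along the loss gradient, and the model is trained on both the clean and perturbed losses. The data, names, and hyperparameters are illustrative assumptions, not taken from the survey.

```python
# Minimal sketch (illustrative assumptions, not from the survey) of adversarial
# training for a toy matrix-factorization recommender.
import torch

n_users, n_items, dim, eps = 100, 200, 16, 0.1
U = torch.randn(n_users, dim, requires_grad=True)  # user embeddings
V = torch.randn(n_items, dim, requires_grad=True)  # item embeddings
opt = torch.optim.Adam([U, V], lr=0.01)

# toy observed interactions: (user, item, rating)
interactions = [(0, 3, 1.0), (1, 7, 0.0), (2, 3, 1.0)]
users = torch.tensor([u for u, _, _ in interactions])
items = torch.tensor([i for _, i, _ in interactions])
y = torch.tensor([r for _, _, r in interactions])

for _ in range(100):
    # clean loss on the observed interactions
    pred = (U[users] * V[items]).sum(dim=1)
    loss = torch.nn.functional.mse_loss(pred, y)

    # FGSM-style perturbation of the embeddings along the loss gradient
    gU, gV = torch.autograd.grad(loss, [U, V], retain_graph=True)
    U_adv, V_adv = U + eps * gU.sign(), V + eps * gV.sign()
    pred_adv = (U_adv[users] * V_adv[items]).sum(dim=1)
    loss_adv = torch.nn.functional.mse_loss(pred_adv, y)

    # train against both the clean and the perturbed loss
    opt.zero_grad()
    (loss + loss_adv).backward()
    opt.step()
```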

    Interpreting Neural Networks Using Flip Points

    Neural networks have been criticized for their lack of easy interpretation, which undermines confidence in their use for important applications. Here, we introduce a novel technique for interpreting a trained neural network by investigating its flip points. A flip point is any point that lies on the boundary between two output classes: e.g., for a neural network with a binary yes/no output, a flip point is any input that generates equal scores for "yes" and "no". The flip point closest to a given input is of particular importance, and this point is the solution to a well-posed optimization problem. This paper gives an overview of the uses of flip points and how they are computed. Through results on standard datasets, we demonstrate how flip points can be used to provide detailed interpretation of the output produced by a neural network. Moreover, for a given input, flip points enable us to measure confidence in the correctness of outputs much more effectively than the softmax score. They also identify influential features of the input, reveal bias, and find changes in the input that change the output of the model. We show that the distance between an input and its closest flip point identifies the most influential points in the training data. Using principal component analysis (PCA) and rank-revealing QR factorization (RR-QR), the set of directions from each training input to its closest flip point provides explanations of how a trained neural network processes an entire dataset: what features are most important for classification into a given class, which features are most responsible for particular misclassifications, how an adversary might fool the network, etc. Although we investigate flip points for neural networks, their usefulness is actually model-agnostic.
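    To make the flip-point idea concrete, here is a minimal sketch under the assumption of a small binary classifier and a simple penalty method (not necessarily the well-posed formulation or solver used in the paper): it searches for an input close to a given point at which the two class scores are equal, and reports the distance to that point as a confidence proxy.

```python
# Minimal sketch (assumed penalty formulation, not the paper's exact solver):
# approximate the closest flip point of a small binary classifier.
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.Tanh(), torch.nn.Linear(8, 2))

x0 = torch.randn(4)                   # the input we want to interpret
x = x0.clone().requires_grad_(True)   # candidate flip point, initialized at x0
opt = torch.optim.Adam([x], lr=0.05)
lam = 10.0                            # penalty weight on the "equal scores" constraint

for _ in range(500):
    scores = net(x)
    gap = scores[0] - scores[1]       # zero exactly on the decision boundary
    loss = (x - x0).pow(2).sum() + lam * gap.pow(2)
    opt.zero_grad()
    loss.backward()
    opt.step()

# distance to the (approximate) closest flip point; smaller => less confident output
print("distance to flip point:", (x - x0).norm().item())
print("score gap at solution:", (net(x)[0] - net(x)[1]).item())
```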

    Sparse Plus Low Rank Matrix Decomposition: A Discrete Optimization Approach

    We study the Sparse Plus Low-Rank decomposition problem (SLR), which is the problem of decomposing a corrupted data matrix into a sparse matrix of perturbations plus a low-rank matrix containing the ground truth. SLR is a fundamental problem in Operations Research and Machine Learning that arises in various applications, including data compression, latent semantic indexing, collaborative filtering, and medical imaging. We introduce a novel formulation for SLR that directly models its underlying discreteness. For this formulation, we develop an alternating minimization heuristic that computes high-quality solutions and a novel semidefinite relaxation that provides meaningful bounds for the solutions returned by our heuristic. We also develop a custom branch-and-bound algorithm that leverages our heuristic and convex relaxations to solve small instances of SLR to certifiable (near) optimality. Given an input n-by-n matrix, our heuristic scales to solve instances where n = 10,000 in minutes, our relaxation scales to instances where n = 200 in hours, and our branch-and-bound algorithm scales to instances where n = 25 in minutes. Our numerical results demonstrate that our approach outperforms existing state-of-the-art approaches in terms of rank, sparsity, and mean-square error while maintaining a comparable runtime.
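    A rough sketch of the alternating-minimization idea, under the simplifying assumptions that the target rank and the number of sparse corruptions are known (this is not the paper's discrete formulation, relaxation, or branch-and-bound method): alternate a truncated SVD for the low-rank part with hard-thresholding of the residual for the sparse part.

```python
# Minimal sketch (assumed simplification, not the paper's exact heuristic) of
# alternating minimization for D ≈ L + S with rank(L) <= k and S sparse.
import numpy as np

rng = np.random.default_rng(0)
n, k, n_outliers = 50, 3, 100

# synthetic data: low-rank ground truth plus sparse corruptions
L_true = rng.standard_normal((n, k)) @ rng.standard_normal((k, n))
S_true = np.zeros((n, n))
idx = rng.choice(n * n, size=n_outliers, replace=False)
S_true.flat[idx] = 10.0 * rng.standard_normal(n_outliers)
D = L_true + S_true

S = np.zeros_like(D)
for _ in range(30):
    # low-rank step: best rank-k approximation of D - S via truncated SVD
    U, s, Vt = np.linalg.svd(D - S, full_matrices=False)
    L = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    # sparse step: keep only the largest-magnitude residual entries
    R = D - L
    thresh = np.sort(np.abs(R).ravel())[-n_outliers]
    S = np.where(np.abs(R) >= thresh, R, 0.0)

print("relative error in L:", np.linalg.norm(L - L_true) / np.linalg.norm(L_true))
```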

    Towards private and robust machine learning for information security

    Many problems in information security are pattern recognition problems. For example, determining whether a digital communication can be trusted amounts to certifying that the communication does not carry malicious or secret content, which can be distilled into the problem of recognising the difference between benign and malicious content. At a high level, machine learning is the study of how patterns are formed within data and how learning these patterns generalises beyond the potentially limited data pool at a practitioner's disposal, and so it has become a powerful tool in information security. In this work, we study the benefits machine learning can bring to two problems in information security. Firstly, we show that machine learning can be used to detect which websites are visited by an internet user over an encrypted connection. By analysing timing and packet size information of encrypted network traffic, we train a machine learning model that predicts the target website given a stream of encrypted network traffic, even if browsing is performed over an anonymous communication network. Secondly, in addition to studying how machine learning can be used to design attacks, we study how it can be used to solve the problem of hiding information within a cover medium, such as an image or an audio recording, which is commonly referred to as steganography. How well an algorithm can hide information within a cover medium amounts to how well the algorithm models and exploits areas of redundancy. This can again be reduced to a pattern recognition problem, and so we apply machine learning to design a steganographic algorithm that efficiently hides a secret message within an image. Following this, we discuss why machine learning is not a panacea for information security and can be an attack vector in and of itself. We show that machine learning can leak private and sensitive information about the data it used to learn, and how malicious actors can exploit vulnerabilities in these learning algorithms to compel them to exhibit adversarial behaviours. Finally, we examine the disconnect between image recognition as performed by humans and by machine learning models. While human classification of an image is relatively robust to noise, machine learning models do not possess this property. We show how an attacker can cause targeted misclassifications against an entire data distribution by exploiting this property, and go on to introduce a mitigation that ameliorates this undesirable trait of machine learning.
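    As an illustration of the first attack described above, the sketch below builds a toy website-fingerprinting classifier from traffic metadata alone: each trace's packet sizes and inter-arrival times are summarized into a fixed-length feature vector and fed to an off-the-shelf classifier. The data, features, and model choice are assumptions for illustration, not the thesis's pipeline.

```python
# Minimal sketch (illustrative assumptions, not the thesis's model) of website
# fingerprinting from encrypted-traffic metadata only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def features(sizes, times):
    """Crude per-trace summary: counts, byte totals, and timing statistics."""
    gaps = np.diff(times) if len(times) > 1 else np.array([0.0])
    return [len(sizes), np.sum(sizes), np.mean(sizes), np.std(sizes),
            np.mean(gaps), np.std(gaps)]

# synthetic traces for 3 "websites", each with a characteristic size/timing profile
X, y = [], []
for site in range(3):
    for _ in range(100):
        n = rng.integers(50, 150)
        sizes = rng.normal(500 + 300 * site, 100, size=n)
        times = np.cumsum(rng.exponential(0.01 * (site + 1), size=n))
        X.append(features(sizes, times))
        y.append(site)

X_tr, X_te, y_tr, y_te = train_test_split(np.array(X), np.array(y), random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
```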

    Training Datasets for Machine Reading Comprehension and Their Limitations

    Neural networks are a powerful model class for learning machine Reading Comprehension (RC), yet they crucially depend on the availability of suitable training datasets. In this thesis we describe methods for data collection, evaluate the performance of established models, and examine a number of model behaviours and dataset limitations. We first describe the creation of a data resource for the science exam QA domain, and compare existing models on the resulting dataset. The collected questions are plausible – non-experts can distinguish them from real exam questions with 55% accuracy – and using them as additional training data leads to improved model scores on real science exam questions. Second, we describe and apply a distant supervision dataset construction method for multi-hop RC across documents. We identify and mitigate several dataset assembly pitfalls – a lack of unanswerable candidates, label imbalance, and spurious correlations between documents and particular candidates – which often leave shallow predictive cues for the answer. Furthermore, we demonstrate that selecting relevant document combinations is a critical performance bottleneck on the datasets created. We thus investigate Pseudo-Relevance Feedback, which leads to improvements over TF-IDF-based document combination selection, both in retrieval metrics and in answer accuracy. Third, we investigate model undersensitivity: model predictions do not change when given adversarially altered questions in SQuAD 2.0 and NewsQA, even though they should. We characterise affected samples, and show that the phenomenon is related to a lack of structurally similar but unanswerable samples during training: data augmentation reduces the adversarial error rate, e.g. from 51.7% to 20.7% for a BERT model on SQuAD 2.0, and also improves robustness in other settings. Finally, we explore efficient formal model verification via Interval Bound Propagation (IBP) to measure and address model undersensitivity, and show that using an IBP-derived auxiliary loss can improve verification rates, e.g. from 2.8% to 18.4% on the SNLI test set.
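    As a small illustration of the retrieval step discussed above, the sketch below performs TF-IDF retrieval followed by one round of pseudo-relevance feedback (expanding the query with the top-ranked document and re-ranking). The documents, query, and expansion rule are illustrative assumptions, not the thesis's exact method.

```python
# Minimal sketch (illustrative assumptions, not the thesis's pipeline) of TF-IDF
# retrieval with a simple pseudo-relevance feedback step.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "The Amazon river flows through Brazil and Peru.",
    "The Nile is the longest river in Africa.",
    "Machine reading comprehension models answer questions about documents.",
    "Brazil borders Peru and shares the Amazon basin.",
]
question = "Which countries does the Amazon flow through?"

vec = TfidfVectorizer().fit(docs)
D = vec.transform(docs)

# first-pass retrieval: rank documents by similarity to the question
scores = cosine_similarity(vec.transform([question]), D).ravel()
top = int(np.argmax(scores))

# pseudo-relevance feedback: append the best document's text to the query and re-rank
expanded = question + " " + docs[top]
scores_prf = cosine_similarity(vec.transform([expanded]), D).ravel()
print("initial ranking:", np.argsort(-scores))
print("after feedback: ", np.argsort(-scores_prf))
```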

    Dealing with Intransitivity, Non-Convexity, and Algorithmic Bias in Preference Learning

    Rankings are ubiquitous since they are a natural way to present information to people who are making decisions. There are seemingly countless scenarios where rankings arise, such as deciding whom to hire at a company, determining what movies to watch, purchasing products, understanding human perception, judging science fair projects, voting for political candidates, and so on. In many of these scenarios, the number of items in consideration is prohibitively large, such that asking someone to rank all of the choices is essentially impossible. On the other hand, collecting preference data on a small subset of the items is feasible, e.g., collecting answers to "Do you prefer item A or item B?" or "Is item A closer to item B or item C?". Therefore, an important machine learning task is to learn a ranking of the items based on this preference data. This thesis theoretically and empirically addresses three key challenges of preference learning: intransitivity in preference data, non-convex optimization, and algorithmic bias. Chapter 2 addresses the challenge of learning a ranking given pairwise comparison data that violates rational choice, such as intransitivity. Our key observation is that two items compared in isolation from other items may be compared based on only a salient subset of features. Formalizing this framework, we propose the salient feature preference model and prove a sample complexity result for learning the parameters of our model and the underlying ranking with maximum likelihood estimation. Chapter 3 addresses the non-convexity of an optimization problem inspired by ordinal embedding, which is a preference learning task. We aim to understand the landscape, that is, the local and global minimizers, of the non-convex objective, which corresponds to the hinge loss arising from quadratic constraints. Under certain assumptions, we give necessary conditions for non-global local minimizers of our objective and additionally show that in two dimensions, every local minimizer is a global minimizer. Chapters 4 and 5 address the challenge of algorithmic bias. We consider training machine learning models that are fair in the sense that their performance is invariant under certain sensitive perturbations to the inputs. For example, the performance of a resume screening system should be invariant under changes to the gender and ethnicity of the applicant. We formalize this notion of algorithmic fairness as a variant of individual fairness. In Chapter 4, we consider classification and develop a distributionally robust optimization approach, SenSR, that enforces this notion of individual fairness during training and provably learns individually fair classifiers. Chapter 5 builds upon Chapter 4. We develop a related algorithm, SenSTIR, to train provably individually fair learning-to-rank (LTR) models. The proposed approach ensures items from minority groups appear alongside similar items from majority groups. This notion of fair ranking is based on the individual fairness definition considered in Chapter 4 for the supervised learning context and is more nuanced than prior fair LTR approaches that simply provide underrepresented items with a basic level of exposure. The crux of our method is an optimal transport-based regularizer that enforces individual fairness and an efficient algorithm for optimizing the regularizer.
    PhD thesis, Applied and Interdisciplinary Mathematics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/166120/1/amandarg_1.pd
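    As a small illustration of learning a ranking from pairwise comparisons, the sketch below fits a standard Bradley-Terry-style model by maximum likelihood with gradient ascent; this is a generic baseline for illustration, not the salient feature preference model or the fairness-regularized methods developed in the thesis.

```python
# Minimal sketch (generic Bradley-Terry-style baseline, not the thesis's model):
# learn item scores from pairwise comparisons by maximum likelihood.
import numpy as np

rng = np.random.default_rng(0)
n_items = 5
true_scores = np.array([2.0, 1.0, 0.0, -1.0, -2.0])

# comparisons: (i, j) means "item i was preferred over item j"
comparisons = []
for _ in range(2000):
    i, j = rng.choice(n_items, size=2, replace=False)
    p_i_wins = 1.0 / (1.0 + np.exp(-(true_scores[i] - true_scores[j])))
    comparisons.append((i, j) if rng.random() < p_i_wins else (j, i))

theta = np.zeros(n_items)
lr = 0.5
for _ in range(300):
    grad = np.zeros(n_items)
    for i, j in comparisons:
        p = 1.0 / (1.0 + np.exp(-(theta[i] - theta[j])))
        grad[i] += 1.0 - p          # d/dtheta_i of log P(i beats j)
        grad[j] -= 1.0 - p
    theta += lr * grad / len(comparisons)

# recovered ordering; should typically match the ordering of true_scores
print("recovered ranking:", np.argsort(-theta))
```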