72 research outputs found
A Faster Algorithm to Build New Users Similarity List in Neighbourhood-based Collaborative Filtering
Neighbourhood-based Collaborative Filtering (CF) has been applied in the
industry for several decades, because of the easy implementation and high
recommendation accuracy. As the core of neighbourhood-based CF, the task of
dynamically maintaining users' similarity lists is challenged by the cold-start
and scalability problems. Recently, several methods have been presented to
solve these two problems. However, these methods apply an algorithm
to compute the similarity list in a special case, where new users with
enough recommendation data have the same rating list. To address the problem
of large computational cost caused by this special case, we design a faster
algorithm, the TwinSearch Algorithm, which avoids repeatedly computing and
sorting the similarity list for such new users and so saves
computational resources. Both theoretical and experimental results show that
the TwinSearch Algorithm achieves better running time than the traditional
method.
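The special case described above (new users whose rating lists are identical) can be illustrated with a small Python sketch. This is not the TwinSearch Algorithm itself, whose details the abstract does not give; it only shows the general idea of reusing a similarity list that has already been computed and sorted for a "twin". The function names, the cosine similarity measure, and the dictionary-based rating format are all assumptions made for illustration.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two rating dicts {item: rating}."""
    common = a.keys() & b.keys()
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_list(target, existing):
    """Similarities of `target` to every existing user, sorted descending."""
    sims = [(user, cosine(target, ratings)) for user, ratings in existing.items()]
    return sorted(sims, key=lambda pair: pair[1], reverse=True)


def lists_for_new_users(existing, new_users):
    """Build similarity lists for new users, reusing the list of any twin.

    Returns the lists and the number of full computations performed.
    Twins share the same list object (hypothetical design choice).
    """
    cache = {}      # frozen rating list -> already sorted similarity list
    result = {}
    computed = 0
    for user, ratings in new_users.items():
        key = frozenset(ratings.items())
        if key not in cache:
            cache[key] = similarity_list(ratings, existing)
            computed += 1
        result[user] = cache[key]
    return result, computed
```

With three new users of whom two share a rating list, only two full compute-and-sort passes are needed; the saving grows with the number of twins.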
RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning
Anonymized electronic medical records are an increasingly popular source of
research data. However, these datasets often lack race and ethnicity
information. This creates problems for researchers modeling human disease, as
race and ethnicity are powerful confounders for many health exposures and
treatment outcomes; race and ethnicity are closely linked to
population-specific genetic variation. We showed that deep neural networks
generate more accurate estimates for missing racial and ethnic information than
competing methods (e.g., logistic regression, random forest). RIDDLE yielded
significantly better classification performance across all metrics that were
considered: accuracy, cross-entropy loss (error), and area under the curve for
receiver operating characteristic plots. We made specific
efforts to interpret the trained neural network models to identify, quantify,
and visualize medical features which are predictive of race and ethnicity. We
used these characterizations of informative features to perform a systematic
comparison of differential disease patterns by race and ethnicity. The fact
that clinical histories are informative for imputing race and ethnicity could
reflect (1) a skewed distribution of blue- and white-collar professions across
racial and ethnic groups, (2) uneven accessibility and subjective importance of
prophylactic health, (3) possible variation in lifestyle, such as dietary
habits, and (4) differences in background genetic variation which predispose to
diseases
Algorithms for Differentially Private Multi-Armed Bandits
We present differentially private algorithms for the stochastic Multi-Armed
Bandit (MAB) problem. This is a problem for applications such as adaptive
clinical trials, experiment design, and user-targeted advertising where private
information is connected to individual rewards. Our major contribution is to
show that there exist differentially private variants of
Upper Confidence Bound algorithms which have optimal regret. This is a
significant improvement over previous results, which achieve only poly-log
regret, and it follows from our use of a novel interval-based mechanism. We also
substantially improve the bounds of the previous family of algorithms which use
a continual release mechanism. Experiments clearly validate our theoretical
bounds.
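For readers unfamiliar with the Upper Confidence Bound family mentioned above, the following is a minimal sketch of the standard, non-private UCB1 rule. It is background only: the private variants and the interval-based mechanism from the abstract are not shown, and the function name `ucb1` and the `pull` callback are hypothetical names chosen for this example.

```python
import math


def ucb1(pull, n_arms, horizon):
    """Run non-private UCB1 for `horizon` rounds.

    `pull(arm)` must return a reward in [0, 1]. Returns how often
    each arm was played.
    """
    counts = [0] * n_arms    # number of pulls per arm
    sums = [0.0] * n_arms    # total reward per arm

    def index(arm, t):
        # Empirical mean plus an exploration bonus that shrinks
        # as the arm is pulled more often.
        mean = sums[arm] / counts[arm]
        return mean + math.sqrt(2.0 * math.log(t) / counts[arm])

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1      # play every arm once to initialise
        else:
            arm = max(range(n_arms), key=lambda a: index(a, t))
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

A private variant would have to protect the individual rewards that enter `sums`; how that is done with optimal regret is the contribution of the paper and is not reproduced here.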
When and where do you want to hide? Recommendation of location privacy preferences with local differential privacy
In recent years, it has become easy to obtain location information quite
precisely. However, the acquisition of such information has risks such as
individual identification and leakage of sensitive information, so it is
necessary to protect the privacy of location information. For this purpose,
people should know their location privacy preferences, that is, whether or not
they can release location information at each place and time. However, it is
not easy for each user to make such decisions, and it is troublesome to set the
privacy preference each time. Therefore, we propose a method to recommend
location privacy preferences for decision making. Compared to the existing method,
our method can improve the accuracy of recommendation by using matrix
factorization and preserve privacy strictly by local differential privacy,
whereas the existing method does not achieve a formal privacy guarantee. In
addition, we found the best granularity of a location privacy preference, that
is, how to express the information in location privacy protection. To evaluate
and verify the utility of our method, we integrated two existing datasets
to create a dataset with a large number of users. From the results of the
evaluation using this dataset, we confirmed that our method can predict
location privacy preferences accurately and that it provides a suitable way
to define the location privacy preference.
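Local differential privacy, as referred to above, means that each user perturbs their own data before it leaves their device. A minimal Python sketch of one classic mechanism, binary randomized response, is given below. It assumes a preference reduced to a single bit (release or hide at one place and time); the paper's actual mechanism and its combination with matrix factorization are not described in the abstract and are not reproduced here. The function names are hypothetical.

```python
import math
import random


def randomized_response(bit, epsilon, rng=random):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it.

    This satisfies epsilon-local differential privacy for one binary value.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit


def estimate_rate(reports, epsilon):
    """Unbiased estimate of the true fraction of 1s from noisy reports.

    Requires epsilon > 0, otherwise the reports carry no information.
    """
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    # E[observed] = rate * p + (1 - rate) * (1 - p); solve for rate.
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)
```

Smaller values of `epsilon` flip more bits and give stronger privacy, at the cost of a noisier aggregate estimate.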
Concrete Problems in AI Safety, Revisited
As AI systems proliferate in society, the AI community is increasingly
preoccupied with the concept of AI Safety, namely the prevention of failures
due to accidents that arise from an unanticipated departure of a system's
behavior from designer intent in AI deployment. We demonstrate through an
analysis of real world cases of such incidents that although current vocabulary
captures a range of the encountered issues of AI deployment, an expanded
socio-technical framing will be required for a more complete understanding of
how AI systems and implemented safety mechanisms fail and succeed in real life.
Comment: Published at ICLR workshop on ML in the Real World, 202
Recommending with an Agenda: Active Learning of Private Attributes using Matrix Factorization
Recommender systems leverage user demographic information, such as age,
gender, etc., to personalize recommendations and better place their targeted
ads. Oftentimes, users do not volunteer this information due to privacy
concerns, or due to a lack of initiative in filling out their online profiles.
We illustrate a new threat in which a recommender learns private attributes of
users who do not voluntarily disclose them. We design both passive and active
attacks that solicit ratings for strategically selected items, and could thus
be used by a recommender system to pursue this hidden agenda. Our methods are
based on a novel usage of Bayesian matrix factorization in an active learning
setting. Evaluations on multiple datasets illustrate that such attacks are
indeed feasible and use significantly fewer rated items than static inference
methods. Importantly, they succeed without sacrificing the quality of
recommendations to users.
Comment: This is the extended version of a paper that appeared in ACM RecSys 201
An Accuracy-Assured Privacy-Preserving Recommender System for Internet Commerce
Recommender systems, tools for predicting users' potential preferences by
processing historical data and users' interests, are increasingly important in
various Internet applications such as online shopping. As a well-known
recommendation method, neighbourhood-based collaborative filtering has
attracted considerable attention recently. The risk of revealing users' private
information during the filtering process has attracted noticeable research
interest. Among the current solutions, probabilistic techniques have shown
a powerful privacy-preserving effect. When facing the Nearest Neighbour attack,
however, none of the existing methods provides a data utility guarantee, because
they introduce global randomness. In this paper, to overcome the problem of
recommendation accuracy loss, we propose a novel approach, Partitioned
Probabilistic Neighbour Selection, to ensure a required prediction accuracy
while maintaining high security against the NN attack. We define the sum of
neighbours' similarity as the accuracy metric alpha, and the number of user
partitions across which we select the neighbours as the security metric
beta. We generalise the Nearest Neighbour attack to the beta k Nearest
Neighbours attack. Unlike the existing approach, which selects neighbours
across the entire candidate list randomly, our method selects neighbours from
each exclusive partition of size with a decreasing probability. Theoretical
and experimental analyses show that, to provide an accuracy-assured
recommendation, our Partitioned Probabilistic Neighbour Selection method yields
a better trade-off between the recommendation accuracy and system security.
Comment: replacement for the previous versio