72 research outputs found

    A Faster Algorithm to Build New Users Similarity List in Neighbourhood-based Collaborative Filtering

    Full text link
    Neighbourhood-based Collaborative Filtering (CF) has been applied in the industry for several decades, because of the easy implementation and high recommendation accuracy. As the core of neighbourhood-based CF, the task of dynamically maintaining users' similarity list is challenged by cold-start problem and scalability problem. Recently, several methods are presented on solving the two problems. However, these methods applied an O(n2)O(n^2) algorithm to compute the similarity list in a special case, where the new users, with enough recommendation data, have the same rating list. To address the problem of large computational cost caused by the special case, we design a faster (O(1125n2)O(\frac{1}{125}n^2)) algorithm, TwinSearch Algorithm, to avoid computing and sorting the similarity list for the new users repeatedly to save the computational resources. Both theoretical and experimental results show that the TwinSearch Algorithm achieves better running time than the traditional method

    RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning

    Full text link
    Anonymized electronic medical records are an increasingly popular source of research data. However, these datasets often lack race and ethnicity information. This creates problems for researchers modeling human disease, as race and ethnicity are powerful confounders for many health exposures and treatment outcomes; race and ethnicity are closely linked to population-specific genetic variation. We showed that deep neural networks generate more accurate estimates for missing racial and ethnic information than competing methods (e.g., logistic regression, random forest). RIDDLE yielded significantly better classification performance across all metrics that were considered: accuracy, cross-entropy loss (error), and area under the curve for receiver operating characteristic plots (all p<106p < 10^{-6}). We made specific efforts to interpret the trained neural network models to identify, quantify, and visualize medical features which are predictive of race and ethnicity. We used these characterizations of informative features to perform a systematic comparison of differential disease patterns by race and ethnicity. The fact that clinical histories are informative for imputing race and ethnicity could reflect (1) a skewed distribution of blue- and white-collar professions across racial and ethnic groups, (2) uneven accessibility and subjective importance of prophylactic health, (3) possible variation in lifestyle, such as dietary habits, and (4) differences in background genetic variation which predispose to diseases

    Algorithms for Differentially Private Multi-Armed Bandits

    Get PDF
    We present differentially private algorithms for the stochastic Multi-Armed Bandit (MAB) problem. This is a problem for applications such as adaptive clinical trials, experiment design, and user-targeted advertising where private information is connected to individual rewards. Our major contribution is to show that there exist (ϵ,δ)(\epsilon, \delta) differentially private variants of Upper Confidence Bound algorithms which have optimal regret, O(ϵ1+logT)O(\epsilon^{-1} + \log T). This is a significant improvement over previous results, which only achieve poly-log regret O(ϵ2log2T)O(\epsilon^{-2} \log^{2} T), because of our use of a novel interval-based mechanism. We also substantially improve the bounds of previous family of algorithms which use a continual release mechanism. Experiments clearly validate our theoretical bounds

    When and where do you want to hide? Recommendation of location privacy preferences with local differential privacy

    Full text link
    In recent years, it has become easy to obtain location information quite precisely. However, the acquisition of such information has risks such as individual identification and leakage of sensitive information, so it is necessary to protect the privacy of location information. For this purpose, people should know their location privacy preferences, that is, whether or not he/she can release location information at each place and time. However, it is not easy for each user to make such decisions and it is troublesome to set the privacy preference at each time. Therefore, we propose a method to recommend location privacy preferences for decision making. Comparing to existing method, our method can improve the accuracy of recommendation by using matrix factorization and preserve privacy strictly by local differential privacy, whereas the existing method does not achieve formal privacy guarantee. In addition, we found the best granularity of a location privacy preference, that is, how to express the information in location privacy protection. To evaluate and verify the utility of our method, we have integrated two existing datasets to create a rich information in term of user number. From the results of the evaluation using this dataset, we confirmed that our method can predict location privacy preferences accurately and that it provides a suitable method to define the location privacy preference

    Concrete Problems in AI Safety, Revisited

    Full text link
    As AI systems proliferate in society, the AI community is increasingly preoccupied with the concept of AI Safety, namely the prevention of failures due to accidents that arise from an unanticipated departure of a system's behavior from designer intent in AI deployment. We demonstrate through an analysis of real world cases of such incidents that although current vocabulary captures a range of the encountered issues of AI deployment, an expanded socio-technical framing will be required for a more complete understanding of how AI systems and implemented safety mechanisms fail and succeed in real life.Comment: Published at ICLR workshop on ML in the Real World, 202

    Recommending with an Agenda: Active Learning of Private Attributes using Matrix Factorization

    Full text link
    Recommender systems leverage user demographic information, such as age, gender, etc., to personalize recommendations and better place their targeted ads. Oftentimes, users do not volunteer this information due to privacy concerns, or due to a lack of initiative in filling out their online profiles. We illustrate a new threat in which a recommender learns private attributes of users who do not voluntarily disclose them. We design both passive and active attacks that solicit ratings for strategically selected items, and could thus be used by a recommender system to pursue this hidden agenda. Our methods are based on a novel usage of Bayesian matrix factorization in an active learning setting. Evaluations on multiple datasets illustrate that such attacks are indeed feasible and use significantly fewer rated items than static inference methods. Importantly, they succeed without sacrificing the quality of recommendations to users.Comment: This is the extended version of a paper that appeared in ACM RecSys 201

    An Accuracy-Assured Privacy-Preserving Recommender System for Internet Commerce

    Full text link
    Recommender systems, tool for predicting users' potential preferences by computing history data and users' interests, show an increasing importance in various Internet applications such as online shopping. As a well-known recommendation method, neighbourhood-based collaborative filtering has attracted considerable attention recently. The risk of revealing users' private information during the process of filtering has attracted noticeable research interests. Among the current solutions, the probabilistic techniques have shown a powerful privacy preserving effect. When facing kk Nearest Neighbour attack, all the existing methods provide no data utility guarantee, for the introduction of global randomness. In this paper, to overcome the problem of recommendation accuracy loss, we propose a novel approach, Partitioned Probabilistic Neighbour Selection, to ensure a required prediction accuracy while maintaining high security against kkNN attack. We define the sum of kk neighbours' similarity as the accuracy metric alpha, the number of user partitions, across which we select the kk neighbours, as the security metric beta. We generalise the kk Nearest Neighbour attack to beta k Nearest Neighbours attack. Differing from the existing approach that selects neighbours across the entire candidate list randomly, our method selects neighbours from each exclusive partition of size kk with a decreasing probability. Theoretical and experimental analysis show that to provide an accuracy-assured recommendation, our Partitioned Probabilistic Neighbour Selection method yields a better trade-off between the recommendation accuracy and system security.Comment: replacement for the previous versio
    corecore