Fast Differentially Private Matrix Factorization

Author: Ahn S.
Chen T.
Ding N.
Hartstein A.
Keshavan R.
Kyrola A.
Marsaglia G.
Meka R.
Mir D. J.
Neal R. M.
Niu F.
Sato I.
Srebro N.
Wang Y.-X.
Welling M.
Xin Y.
Zhao H.
Publication date: 07/05/2015

Differentially private collaborative filtering is a challenging task, both in terms of accuracy and speed. We present a simple algorithm that is provably differentially private, while offering good performance, using a novel connection of differential privacy to Bayesian posterior sampling via Stochastic Gradient Langevin Dynamics. Due to its simplicity, the algorithm lends itself to efficient implementation. By careful systems design and by exploiting the power-law behavior of the data to maximize CPU cache bandwidth, we are able to generate 1024-dimensional models at a rate of 8.5 million recommendations per second on a single PC.

arXiv.org e-Print Archive

Crossref

Synthetic sequence generator for recommender systems - memory biased random walk on sequence multilayer network

Author: B. Berendt
B. Kenig
B.C. Chen
C. Dwork
C.C. Aggarwal
G. Adomavicius
J. Brookshear
J. Reiter
M. Deshpande
P. Samarati
R. Burke
R.A. Dandekar
T. Raghunathan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Personalized recommender systems rely on each user's personal usage data in the system to assist in decision making. However, privacy policies protecting users' rights prevent these highly personal data from being made publicly available to a wider research audience. In this work, we propose a memory biased random walk model on a multilayer sequence network as a generator of synthetic sequential data for recommender systems. We demonstrate the applicability of the synthetic data in training recommender system models for cases when privacy policies restrict clickstream publishing. Comment: The new updated version of the pape

arXiv.org e-Print Archive

Crossref

Prochlo: Strong Privacy for Analytics in the Crowd

Author: Abadi M.
Avent B.
Bellare M.
Bulck J. V.
Buse R. P. L.
Chen R.
Corrigan-Gibbs H.
Dang H.
Denning D. E. R.
Dinh T. T. A.
Dwork
Lee S.
Maniatis P.
Ohrimenko O.
Ravindranath L.
Roy I.
Saltzer J. H.
Viega J.
Wang T.
Warner
Zheng W.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 02/10/2017

The large-scale monitoring of computer users' software activities has become commonplace, e.g., for application telemetry, error reporting, or demographic profiling. This paper describes a principled systems architecture, Encode, Shuffle, Analyze (ESA), for performing such monitoring with high utility while also protecting user privacy. The ESA design, and its Prochlo implementation, are informed by our practical experiences with an existing, large deployment of privacy-preserving software monitoring. (cont.; see the paper

arXiv.org e-Print Archive

University of Toronto Research Repository

Crossref

Let Your CyberAlter Ego Share Information and Manage Spam

Author: Boykin P. Oscar
Kong Joseph S.
Rezaei Behnam A.
Roychowdhury Vwani P.
Sarshar Nima
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 07/05/2005

Almost all of us have multiple cyberspace identities, and these cyberalter egos are networked together to form a vast cyberspace social network. This network is distinct from the World Wide Web (WWW), which is queried and mined to the tune of billions of dollars every day, and it has gone largely unexplored until recently. Empirically, cyberspace social networks have been found to possess many of the same complex features that characterize their real counterparts, including scale-free degree distributions, low diameter, and extensive connectivity. We show that these topological features make the latent networks particularly suitable for exploration and management via local-only messaging protocols. Cyberalter egos can communicate via their direct links (i.e., using only their own address books) and set up a highly decentralized and scalable message passing network that can allow large-scale sharing of information and data. As one particular example of such collaborative systems, we provide a design of a spam filtering system, and our large-scale simulations show that the system achieves a spam detection rate close to 100%, while the false positive rate is kept near zero. This system has several advantages over other recent proposals: (i) It uses an already existing network, created by the same social dynamics that govern our daily lives, and no dedicated peer-to-peer (P2P) systems or centralized server-based systems need be constructed; (ii) It utilizes a percolation search algorithm that makes the query-generated traffic scalable; (iii) The network has a built-in trust system (just as in social networks) that can be used to thwart malicious attacks; (iv) It can be implemented right now as a plugin to popular email programs, such as MS Outlook, Eudora, and Sendmail. Comment: 13 pages, 10 figure

arXiv.org e-Print Archive

Crossref

Privacy-Friendly Collaboration for Cyber Threat Mitigation

Author: Brito Alex
De Cristofaro Emiliano
Freudiger Julien
Publication date: 01/03/2017

Sharing of security data across organizational boundaries has often been advocated as a promising way to enhance cyber threat mitigation. However, collaborative security faces a number of important challenges, including privacy, trust, and liability concerns with the potential disclosure of sensitive data. In this paper, we focus on data sharing for predictive blacklisting, i.e., forecasting attack sources based on past attack information. We propose a novel privacy-enhanced data sharing approach in which organizations estimate collaboration benefits without disclosing their datasets, organize into coalitions of allied organizations, and securely share data within these coalitions. We study how different partner selection strategies affect prediction accuracy by experimenting on a real-world dataset of 2 billion IP addresses and observe up to a 105% prediction improvement. Comment: This paper has been withdrawn as it has been superseded by arXiv:1502.0533

arXiv.org e-Print Archive

CiteSeerX

Data Leak Detection As a Service: Challenges and Solutions

Author: Shu Xiaokui
Yao Danfeng (Daphne)
Publication date: 01/01/2012

We describe a network-based data-leak detection (DLD) technique, the main feature of which is that the detection does not require the data owner to reveal the content of the sensitive data. Instead, only a small number of specialized digests is needed. Our technique – referred to as the fuzzy fingerprint – can be used to detect accidental data leaks due to human errors or application flaws. The privacy-preserving feature of our algorithms minimizes the exposure of sensitive data and enables the data owner to safely delegate the detection to others. We describe how cloud providers can offer their customers data-leak detection as an add-on service with strong privacy guarantees. We perform extensive experimental evaluation of the privacy, efficiency, accuracy, and noise tolerance of our techniques. Our evaluation results under various data-leak scenarios and setups show that our method can support accurate detection with a very small number of false alarms, even when the presentation of the data has been transformed. They also indicate that the detection accuracy does not degrade when partial digests are used. We further provide a quantifiable method to measure the privacy guarantee offered by our fuzzy fingerprint framework.

Computer Science Technical Reports @Virginia Tech