Supporting Regularized Logistic Regression Privately and Efficiently

Author: Li Wenfa
Liu Hongzhe
Xie Wei
Yang Peng
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 30/09/2015

As one of the most popular statistical and machine learning models, logistic regression with regularization has found wide adoption in biomedicine, social sciences, information technology, and other fields. These domains often involve human-subject data that are subject to strict privacy regulations. Growing concerns over data privacy make it increasingly difficult to coordinate and conduct large-scale collaborative studies, which typically rely on cross-institution data sharing and joint analysis. Our work focuses on safeguarding regularized logistic regression, a machine learning model that is widely used across disciplines but has not been investigated from a data security and privacy perspective. We consider a common scenario of multi-institution collaborative studies, such as the research consortia or networks widely seen in genetics, epidemiology, social sciences, etc. To make our privacy-enhancing solution practical, we demonstrate a non-conventional and computationally efficient method that leverages distributed computing and strong cryptography to provide comprehensive protection over individual-level and summary data. Extensive empirical evaluation on several studies validated the privacy guarantees, efficiency, and scalability of our proposal. We also discuss the practical implications of our solution for large-scale studies and applications from various disciplines, including genetic and biomedical studies, smart grid, network analysis, etc.
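To illustrate why regularized logistic regression suits a multi-institution setting, the following is a minimal sketch, not the authors' protocol. It assumes L2 regularization and plain gradient descent, and it adds the per-site gradients in the clear, where a real deployment would use a cryptographic secure-summation step. The names `local_gradient` and `fit`, and the parameters `lam`, `lr`, and `steps`, are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def local_gradient(weights, rows, labels):
    """Log-loss gradient summed over one institution's own records."""
    grad = [0.0] * len(weights)
    for x, y in zip(rows, labels):
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        for j, xi in enumerate(x):
            grad[j] += (p - y) * xi
    return grad


def fit(sites, dim, lam=0.1, lr=0.1, steps=500):
    """Gradient descent on the L2-regularized loss using only per-site sums.

    `sites` is a list of (rows, labels) pairs, one per institution.
    ASSUMPTION: the sums are combined in the clear here; a privacy-preserving
    system would replace this addition with a secure aggregation protocol.
    """
    weights = [0.0] * dim
    n = sum(len(rows) for rows, _ in sites)
    for _ in range(steps):
        total = [0.0] * dim
        for rows, labels in sites:
            g = local_gradient(weights, rows, labels)
            total = [t + gj for t, gj in zip(total, g)]
        # The regularization term lam * w needs no data, so any party can add it.
        weights = [w - lr * (t / n + lam * w) for w, t in zip(weights, total)]
    return weights


# Two institutions; each row is [intercept, feature].
site_a = ([[1.0, 0.5], [1.0, -1.0]], [1, 0])
site_b = ([[1.0, 2.0], [1.0, -0.3]], [1, 0])
split_weights = fit([site_a, site_b], dim=2)
pooled_weights = fit([(site_a[0] + site_b[0], site_a[1] + site_b[1])], dim=2)
```

Because the gradient is a sum over records, fitting on the split data gives the same weights as fitting on the pooled data, up to floating-point rounding. Only the per-site sums need to leave each institution.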

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

Controlled Data Sharing for Collaborative Predictive Blacklisting

Author: B Applebaum
C Blundo
C Song
D Gusfield
E De Cristofaro
E Kenneally
I Bilogrevic
MJ Freedman
Publication date: 16/04/2015

Although sharing data across organizations is often advocated as a promising way to enhance cybersecurity, collaborative initiatives are rarely put into practice owing to confidentiality, trust, and liability challenges. In this paper, we investigate whether collaborative threat mitigation can be realized via a controlled data sharing approach, whereby organizations make informed decisions as to whether, and how much, to share. Using appropriate cryptographic tools, entities can estimate the benefits of collaboration and agree on what to share in a privacy-preserving way, without having to disclose their datasets. We focus on collaborative predictive blacklisting, i.e., forecasting attack sources based on one's own logs and those contributed by other organizations. We study the impact of different sharing strategies by experimenting on a real-world dataset of two billion suspicious IP addresses collected from DShield over two months. We find that controlled data sharing yields up to 105% accuracy improvement on average, while also reducing the false positive rate. Comment: A preliminary version of this paper appears in DIMVA 2015. This is the full version. arXiv admin note: substantial text overlap with arXiv:1403.212
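As a rough illustration of what "estimating the benefits of collaboration" could mean, the sketch below computes two overlap measures between the attack sources seen by two organizations. Both the choice of measures (intersection size and Jaccard similarity) and the function name `overlap_metrics` are assumptions made for illustration. The computation is done in the clear, whereas the abstract describes obtaining such estimates with cryptographic tools so that neither dataset is disclosed.

```python
def overlap_metrics(own_sources, partner_sources):
    """Overlap between two sets of observed attack sources.

    ASSUMPTION: computed on plaintext sets for illustration only; a
    privacy-preserving protocol would reveal just the resulting numbers.
    """
    own, partner = set(own_sources), set(partner_sources)
    shared = own & partner
    union = own | partner
    return {
        "intersection_size": len(shared),
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Documentation-range IP addresses used as stand-ins for logged sources.
org_a = ["198.51.100.7", "203.0.113.9", "192.0.2.1"]
org_b = ["203.0.113.9", "192.0.2.1", "192.0.2.44"]
metrics = overlap_metrics(org_a, org_b)
```

An organization could then rank candidate partners by such a score before deciding whether, and how much, to share.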

arXiv.org e-Print Archive

Crossref

UCL Discovery

Privacy-Friendly Collaboration for Cyber Threat Mitigation

Author: Brito Alex
De Cristofaro Emiliano
Freudiger Julien
Publication date: 01/03/2017

Sharing security data across organizational boundaries has often been advocated as a promising way to enhance cyber threat mitigation. However, collaborative security faces a number of important challenges, including privacy, trust, and liability concerns arising from the potential disclosure of sensitive data. In this paper, we focus on data sharing for predictive blacklisting, i.e., forecasting attack sources based on past attack information. We propose a novel privacy-enhanced data sharing approach in which organizations estimate collaboration benefits without disclosing their datasets, organize into coalitions of allied organizations, and securely share data within these coalitions. We study how different partner selection strategies affect prediction accuracy by experimenting on a real-world dataset of 2 billion IP addresses and observe up to a 105% prediction improvement. Comment: This paper has been withdrawn as it has been superseded by arXiv:1502.0533

arXiv.org e-Print Archive

CiteSeerX

Secure and Private Data Aggregation in WSN

Author: Taban Gelareh
Publication date: 21/11/2008

Data aggregation is an important efficiency mechanism for large-scale, resource-constrained networks such as wireless sensor networks (WSN). Security and privacy are central for many data aggregation applications: (1) entities make decisions based on the results of the data aggregation, so they need to be assured that the aggregation process, and in particular the aggregate data they receive, has not been corrupted (i.e., they must be able to verify the integrity of the aggregation); (2) if the aggregation application has been attacked, then the attack must be handled efficiently; (3) the privacy requirements of the sensor network must be preserved. The nature of both wireless sensor networks and data aggregation makes it particularly challenging to meet the desired security and privacy requirements: (1) sensors in a WSN can be easily compromised and subsequently corrupted by an adversary, since they are unmonitored and have little physical security; (2) a malicious aggregator node at the root of an aggregation subtree can corrupt not just its own value but also the values of all the nodes in its entire aggregation subtree; (3) since sensors have limited resources, it is crucial to achieve the security objectives while adopting only cheap symmetric-key based operations and minimizing communication cost. In this thesis, we first address the problem of efficient handling of adversarial attacks on data aggregation applications in WSN. We propose and analyze a detection and identification solution, presenting a precise cost-based characterization of when in-network data aggregation retains its assumed benefits under persistent attacks. Second, we address the issue of data privacy in WSN in the context of data aggregation. We introduce and analyze the problem of privacy-preserving integrity-assured data aggregation (PIA) and show that there is an inherent tension between preservation of data privacy and secure data aggregation. Additionally, we look at the problem of PIA in publish-subscribe networks when there are multiple, collaborative yet competing subscribers.
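The integrity requirement and the restriction to cheap symmetric-key operations can be made concrete with a small sketch. It is not the thesis's scheme: it assumes each sensor shares a secret key with the base station and tags its reading with an HMAC, and the base station sums only the readings whose tags verify. The names `tag_reading` and `verified_sum` are hypothetical. The base station sees every individual reading, so the sketch covers integrity only, without in-network aggregation or privacy.

```python
import hashlib
import hmac


def tag_reading(key, sensor_id, reading):
    """Symmetric-key authentication tag over one sensor's reading."""
    message = f"{sensor_id}:{reading}".encode()
    return hmac.new(key, message, hashlib.sha256).digest()


def verified_sum(keys, reports):
    """Sum the readings whose tags verify; list the sensors that fail.

    ASSUMPTION: `keys` maps each sensor id to a key pre-shared with the
    base station, and `reports` holds (sensor_id, reading, tag) tuples.
    """
    total = 0
    rejected = []
    for sensor_id, reading, tag in reports:
        expected = tag_reading(keys[sensor_id], sensor_id, reading)
        if hmac.compare_digest(expected, tag):
            total += reading
        else:
            rejected.append(sensor_id)
    return total, rejected


keys = {"s1": b"key-one", "s2": b"key-two"}
reports = [
    ("s1", 20, tag_reading(keys["s1"], "s1", 20)),
    # A tampered report: the tag was computed over 21, but 22 is claimed.
    ("s2", 22, tag_reading(keys["s2"], "s2", 21)),
]
total, rejected = verified_sum(keys, reports)
```

Checking every reading individually gives up the bandwidth savings of in-network aggregation, which is one way to see the cost trade-off the abstract describes.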

Digital Repository at the University of Maryland