11,593 research outputs found
Supporting Regularized Logistic Regression Privately and Efficiently
As one of the most popular statistical and machine learning models, logistic
regression with regularization has found wide adoption in biomedicine, social
sciences, information technology, and so on. These domains often involve data
of human subjects that are contingent upon strict privacy regulations.
Increasing concerns over data privacy make it more and more difficult to
coordinate and conduct large-scale collaborative studies, which typically rely
on cross-institution data sharing and joint analysis. Our work here focuses on
safeguarding regularized logistic regression, a widely-used machine learning
model in various disciplines while at the same time has not been investigated
from a data security and privacy perspective. We consider a common use scenario
of multi-institution collaborative studies, such as in the form of research
consortia or networks as widely seen in genetics, epidemiology, social
sciences, etc. To make our privacy-enhancing solution practical, we demonstrate
a non-conventional and computationally efficient method leveraging distributing
computing and strong cryptography to provide comprehensive protection over
individual-level and summary data. Extensive empirical evaluation on several
studies validated the privacy guarantees, efficiency and scalability of our
proposal. We also discuss the practical implications of our solution for
large-scale studies and applications from various disciplines, including
genetic and biomedical studies, smart grid, network analysis, etc
Controlled Data Sharing for Collaborative Predictive Blacklisting
Although sharing data across organizations is often advocated as a promising
way to enhance cybersecurity, collaborative initiatives are rarely put into
practice owing to confidentiality, trust, and liability challenges. In this
paper, we investigate whether collaborative threat mitigation can be realized
via a controlled data sharing approach, whereby organizations make informed
decisions as to whether or not, and how much, to share. Using appropriate
cryptographic tools, entities can estimate the benefits of collaboration and
agree on what to share in a privacy-preserving way, without having to disclose
their datasets. We focus on collaborative predictive blacklisting, i.e.,
forecasting attack sources based on one's logs and those contributed by other
organizations. We study the impact of different sharing strategies by
experimenting on a real-world dataset of two billion suspicious IP addresses
collected from Dshield over two months. We find that controlled data sharing
yields up to 105% accuracy improvement on average, while also reducing the
false positive rate.Comment: A preliminary version of this paper appears in DIMVA 2015. This is
the full version. arXiv admin note: substantial text overlap with
arXiv:1403.212
Privacy-Friendly Collaboration for Cyber Threat Mitigation
Sharing of security data across organizational boundaries has often been
advocated as a promising way to enhance cyber threat mitigation. However,
collaborative security faces a number of important challenges, including
privacy, trust, and liability concerns with the potential disclosure of
sensitive data. In this paper, we focus on data sharing for predictive
blacklisting, i.e., forecasting attack sources based on past attack
information. We propose a novel privacy-enhanced data sharing approach in which
organizations estimate collaboration benefits without disclosing their
datasets, organize into coalitions of allied organizations, and securely share
data within these coalitions. We study how different partner selection
strategies affect prediction accuracy by experimenting on a real-world dataset
of 2 billion IP addresses and observe up to a 105% prediction improvement.Comment: This paper has been withdrawn as it has been superseded by
arXiv:1502.0533
Secure and Private Data Aggregation in WSN
Data aggregation is an important efficiency mechanism for large scale, resource constrained networks such as wireless sensor networks (WSN). Security and privacy are central for many data aggregation applications: (1) entities make decisions based on the results of the data aggregation, so the entities need to be assured that the aggregation process and in particular the aggregate data they receive has not been corrupted (i.e., verify the integrity of the aggregation); (2) If the aggregation application has been attacked, then the attack must be handled efficiently; (3) the privacy requirements of the sensor network must be preserved.
The nature of both wireless sensor networks and data aggregation make it particularly challenging to provide the desired security and privacy requirements: (1) sensors in WSN can be easily compromised and subsequently corrupted by an adversary since they are unmonitored and have little physical security; (2) a malicious aggregator node at the root of an aggregation subtree can corrupt not just its own value but also that of all the nodes in its entire aggregation subtree; (3) since sensors have limited resourced, it is crucial to achieve the security objectives while adopting only cheap symmetric-key based operations and minimizing communication cost.
In this thesis, we first address the problem of efficient handling of adversarial attacks on data aggregation applications in WSN. We propose and analyze a detection and identification solution, presenting a precise cost-based characterization when in-network data aggregation retains its assumed benefits under persistent attacks. Second, we address the issue of data privacy in WSN in the context of data aggregation. We introduce and analyze the problem of privacy-preserving integrity-assured data aggregation (PIA) and show that there is an inherent tension between preservation of data privacy and secure data aggregation. Additionally, we look at the problem of PIA in publish-subscribe networks when there are multiple, collaborative yet competing subscribers
- …