Controlled Data Sharing for Collaborative Predictive Blacklisting
Although sharing data across organizations is often advocated as a promising
way to enhance cybersecurity, collaborative initiatives are rarely put into
practice owing to confidentiality, trust, and liability challenges. In this
paper, we investigate whether collaborative threat mitigation can be realized
via a controlled data sharing approach, whereby organizations make informed
decisions as to whether or not, and how much, to share. Using appropriate
cryptographic tools, entities can estimate the benefits of collaboration and
agree on what to share in a privacy-preserving way, without having to disclose
their datasets. We focus on collaborative predictive blacklisting, i.e.,
forecasting attack sources based on one's logs and those contributed by other
organizations. We study the impact of different sharing strategies by
experimenting on a real-world dataset of two billion suspicious IP addresses
collected from Dshield over two months. We find that controlled data sharing
yields up to 105% accuracy improvement on average, while also reducing the
false positive rate.
Comment: A preliminary version of this paper appears in DIMVA 2015. This is
the full version. arXiv admin note: substantial text overlap with
arXiv:1403.212
Privacy-Friendly Collaboration for Cyber Threat Mitigation
Sharing of security data across organizational boundaries has often been
advocated as a promising way to enhance cyber threat mitigation. However,
collaborative security faces a number of important challenges, including
privacy, trust, and liability concerns with the potential disclosure of
sensitive data. In this paper, we focus on data sharing for predictive
blacklisting, i.e., forecasting attack sources based on past attack
information. We propose a novel privacy-enhanced data sharing approach in which
organizations estimate collaboration benefits without disclosing their
datasets, organize into coalitions of allied organizations, and securely share
data within these coalitions. We study how different partner selection
strategies affect prediction accuracy by experimenting on a real-world dataset
of 2 billion IP addresses and observe up to a 105% prediction improvement.
Comment: This paper has been withdrawn as it has been superseded by
arXiv:1502.0533
On Collaborative Predictive Blacklisting
Collaborative predictive blacklisting (CPB) allows forecasting future attack
sources based on logs and alerts contributed by multiple organizations.
Unfortunately, however, research on CPB has only focused on increasing the
number of predicted attacks but has not considered the impact on false
positives and false negatives. Moreover, sharing alerts is often hindered by
confidentiality, trust, and liability issues, which motivates the need for
privacy-preserving approaches to the problem. In this paper, we present a
measurement study of state-of-the-art CPB techniques, aiming to shed light on
the actual impact of collaboration. To this end, we reproduce and measure two
systems: a non-privacy-friendly one that uses a trusted coordinating party with
access to all alerts (Soldo et al., 2010) and a peer-to-peer one using
privacy-preserving data sharing (Freudiger et al., 2015). We show that, while
collaboration boosts the number of predicted attacks, it also yields high false
positives, ultimately leading to poor accuracy. This motivates us to present a
hybrid approach, using a semi-trusted central entity, aiming to increase
utility from collaboration while, at the same time, limiting information
disclosure and false positives. This leads to a better trade-off of true and
false positive rates, while at the same time addressing privacy concerns.
Comment: A preliminary version of this paper appears in ACM SIGCOMM's Computer
Communication Review (Volume 48 Issue 5, October 2018). This is the full
version
Hardening DGA classifiers utilizing IVAP
Domain Generation Algorithms (DGAs) are used by malware to generate a deterministic set of domains, usually by utilizing a pseudo-random seed. A malicious botmaster can establish connections between their command-and-control center (C&C) and any malware-infected machines by registering domains that will be DGA-generated given a specific seed, rendering traditional domain blacklisting ineffective. Given the nature of this threat, the real-time detection of DGA domains based on incoming DNS traffic is highly important. The use of neural network machine learning (ML) models for this task has been well-studied, but there is still substantial room for improvement. In this paper, we propose to use Inductive Venn-Abers predictors (IVAPs) to calibrate the output of existing ML models for DGA classification. The IVAP is a computationally efficient procedure which consistently improves the predictive accuracy of classifiers at the expense of not offering predictions for a small subset of inputs and consuming an additional amount of training data
Predictive Cyber Situational Awareness and Personalized Blacklisting: A Sequential Rule Mining Approach
Cybersecurity adopts data mining for its ability to extract concealed and indistinct patterns in the data, such as for the needs of alert correlation. Inferring common attack patterns and rules from the alerts helps in understanding the threat landscape for the defenders and allows for the realization of cyber situational awareness, including the projection of ongoing attacks. In this paper, we explore the use of data mining, namely sequential rule mining, in the analysis of intrusion detection alerts. We employed a dataset of 12 million alerts from 34 intrusion detection systems in 3 organizations gathered in an alert sharing platform, and processed it using our analytical framework. We mine sequential rules and use them to predict security events, from which we create a predictive blacklist. Thus, the recipients of the data from the sharing platform will receive only a small number of alerts of events that are likely to occur instead of a large number of alerts of past events. The predictive blacklist is only 3% of the size of the raw data, and more than 60% of its entries are shown to be successful in performing accurate predictions in operational, real-world settings
Big Data Blacklisting
“Big data blacklisting” is the process of categorizing individuals as administratively “guilty until proven innocent” by virtue of suspicious digital data and database screening results. Database screening and digital watchlisting systems are increasingly used to determine who can work, vote, fly, etc. In a big data world, through the deployment of these big data tools, both substantive and procedural due process protections may be threatened in new and nearly invisible ways. Substantive due process rights safeguard fundamental liberty interests. Procedural due process rights prevent arbitrary deprivations by the government of constitutionally protected interests. This Article frames the increasing digital mediation of rights and privileges through government-led big data programs as a constitutional harm under substantive due process, and identifies the obstruction of core liberties with big data tools as rapidly evolving and systemic.
To illustrate the mass scale and unprecedented nature of the big data blacklisting phenomenon, this Article undertakes a significant descriptive burden to introduce and contextualize big data blacklisting programs. Through this descriptive effort, this Article explores how a commonality of big data harms may be associated with nonclassified big data programs, such as the No Work List and No Vote List, programs that the government uses to establish or deny an individual's eligibility for certain benefits or rights through database screening. The big data blacklisting harms of big data tools to make eligibility decisions are not, of course, limited to nonclassified programs. This Article also suggests how the same consequences may be at play with classified and semi-classified big data programs such as the Terrorist Watchlist and No Fly List. This Article concludes that big data blacklisting harms interfere with and obstruct fundamental liberty interests in a way that now necessitates an evolution of the existing due process jurisprudence
Predictive Methods in Cyber Defense: Current Experience and Research Challenges
Predictive analysis allows next-generation cyber defense that is more proactive than current approaches based on intrusion detection. In this paper, we discuss various aspects of predictive methods in cyber defense and illustrate them on three examples of recent approaches. The first approach uses data mining to extract frequent attack scenarios and uses them to project ongoing cyberattacks. The second approach uses a dynamic network entity reputation score to predict malicious actors. The third approach uses time series analysis to forecast attack rates in the network. This paper presents a unique evaluation of the three distinct methods in a common environment of an intrusion detection alert sharing platform, which allows for a comparison of the approaches and illustrates the capabilities of predictive analysis for current and future research and cybersecurity operations. Our experiments show that all three methods achieved a sufficient technology readiness level for experimental deployment in an operational setting with promising accuracy and usability. Namely, the prediction and projection methods, despite their differences, are highly usable for predictive blacklisting; the first provides a more detailed output, and the second is more extensible. Network security situation forecasting is lightweight and displays very high accuracy, but does not provide details on predicted events