Controlled Data Sharing for Collaborative Predictive Blacklisting
Although sharing data across organizations is often advocated as a promising
way to enhance cybersecurity, collaborative initiatives are rarely put into
practice owing to confidentiality, trust, and liability challenges. In this
paper, we investigate whether collaborative threat mitigation can be realized
via a controlled data sharing approach, whereby organizations make informed
decisions as to whether or not, and how much, to share. Using appropriate
cryptographic tools, entities can estimate the benefits of collaboration and
agree on what to share in a privacy-preserving way, without having to disclose
their datasets. We focus on collaborative predictive blacklisting, i.e.,
forecasting attack sources based on one's logs and those contributed by other
organizations. We study the impact of different sharing strategies by
experimenting on a real-world dataset of two billion suspicious IP addresses
collected from DShield over two months. We find that controlled data sharing yields up to a 105% average improvement in accuracy, while also reducing the false positive rate.
Comment: A preliminary version of this paper appears in DIMVA 2015. This is the full version. arXiv admin note: substantial text overlap with arXiv:1403.212
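The abstract does not spell out which cryptographic tools are used, but estimating collaboration benefits without disclosing datasets is precisely the job of private set intersection cardinality (PSI-CA). The toy sketch below (illustrative only; the modulus, the hash-to-group, and the two-round structure are simplified assumptions, not the paper's actual construction) shows how two organizations can learn only the size of their log overlap:

```python
# Toy DDH-style PSI-CA sketch: organizations A and B learn only how many
# log entries (e.g., suspicious IPs) they share, not which ones. The group
# parameters and hash-to-group below are simplifications for illustration;
# a real deployment needs a large prime-order group and vetted primitives.
import hashlib
import random

P = 2**127 - 1  # demo modulus only; real use requires a much larger group

def hash_to_group(item):
    """Toy 'hash into the group' (random-oracle stand-in)."""
    digest = hashlib.sha256(item.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1  # value in [1, P-1]

def blind(items, secret):
    vals = [pow(hash_to_group(x), secret, P) for x in items]
    random.shuffle(vals)  # shuffling hides which item maps to which value
    return vals

logs_a = {"198.51.100.7", "203.0.113.9", "192.0.2.44"}
logs_b = {"203.0.113.9", "192.0.2.44", "198.51.100.250"}

a = random.randrange(2, P - 2)  # A's ephemeral secret exponent
b = random.randrange(2, P - 2)  # B's ephemeral secret exponent

# Round 1, A -> B: A's items blinded under a.
msg_a = blind(logs_a, a)
# Round 2, B -> A: A's values re-blinded under b, plus B's items under b.
msg_ab = [pow(v, b, P) for v in msg_a]
random.shuffle(msg_ab)
msg_b = blind(logs_b, b)
# A finishes: raise B's values to a and count collisions with msg_ab.
overlap = len({pow(v, a, P) for v in msg_b} & set(msg_ab))
print(overlap)  # 2 shared entries; A learns only the count
```

Because both parties shuffle before sending, matching values reveal only that *some* item is shared, which is what lets each organization quantify the benefit of collaborating before deciding what to disclose.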
Privacy-Friendly Collaboration for Cyber Threat Mitigation
Sharing of security data across organizational boundaries has often been
advocated as a promising way to enhance cyber threat mitigation. However,
collaborative security faces a number of important challenges, including
privacy, trust, and liability concerns with the potential disclosure of
sensitive data. In this paper, we focus on data sharing for predictive
blacklisting, i.e., forecasting attack sources based on past attack
information. We propose a novel privacy-enhanced data sharing approach in which
organizations estimate collaboration benefits without disclosing their
datasets, organize into coalitions of allied organizations, and securely share
data within these coalitions. We study how different partner selection
strategies affect prediction accuracy by experimenting on a real-world dataset
of 2 billion IP addresses, and observe up to a 105% improvement in prediction accuracy.
Comment: This paper has been withdrawn, as it has been superseded by arXiv:1502.0533
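The abstract mentions organizing into coalitions based on estimated benefits but does not give the algorithm. As a purely hypothetical illustration of one partner-selection strategy, the sketch below greedily merges organizations by descending pairwise benefit (e.g., the PSI-CA overlap estimates from the previous sketch) up to a coalition size cap; the function names, the cap, and the greedy rule are all assumptions:

```python
# Hypothetical greedy coalition formation: merge organizations pairwise,
# highest estimated sharing benefit first, keeping coalitions small.
from itertools import combinations

def greedy_coalitions(orgs, benefit, max_size=3):
    """benefit maps frozenset({i, j}) -> estimated benefit of i-j sharing."""
    coalition_of = {o: {o} for o in orgs}  # start from singletons
    pairs = sorted(combinations(orgs, 2),
                   key=lambda p: benefit[frozenset(p)], reverse=True)
    for i, j in pairs:
        ci, cj = coalition_of[i], coalition_of[j]
        if ci is not cj and len(ci) + len(cj) <= max_size:
            merged = ci | cj  # merge the two coalitions
            for o in merged:
                coalition_of[o] = merged
    return {frozenset(c) for c in coalition_of.values()}

orgs = ["A", "B", "C", "D"]
benefit = {frozenset(p): w for p, w in
           [(("A", "B"), 9), (("A", "C"), 1), (("A", "D"), 2),
            (("B", "C"), 8), (("B", "D"), 1), (("C", "D"), 7)]}
print(greedy_coalitions(orgs, benefit))  # e.g. {{'A','B','C'}, {'D'}}
```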
P2KMV: A Privacy-preserving Counting Sketch for Efficient and Accurate Set Intersection Cardinality Estimations
In this paper, we propose P2KMV, a novel privacy-preserving counting sketch based on the k minimum values algorithm. With P2KMV, we offer a versatile privacy-enhanced technology for obtaining statistics, following the principle of data minimization and aiming for the sweet spot between privacy, accuracy, and computational efficiency. As our main contribution, we develop methods to perform set operations, which facilitate cardinality estimates under strong privacy requirements. Most notably, we propose an efficient, privacy-preserving algorithm to estimate the set intersection cardinality. P2KMV provides plausible deniability for all data items contained in the sketch. We discuss the algorithm's privacy guarantees as well as the accuracy of the obtained estimates. An experimental evaluation confirms our analytical expectations and provides insights regarding parameter choices.
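To make the underlying mechanics concrete, here is a minimal, non-private KMV baseline: keep the k smallest hash values of a set; the k-th smallest (normalized) hash yields a distinct-count estimate, and comparing two sketches yields an intersection-cardinality estimate. This is the standard k-minimum-values construction that P2KMV builds on; P2KMV's privacy layer (plausible deniability for sketch items) is deliberately not reproduced here:

```python
# Plain (non-private) KMV sketch with intersection-cardinality estimation.
import hashlib

HASH_SPACE = float(2 ** 64)

def h(item):
    """Hash an item to a value in (0, 1)."""
    d = hashlib.sha256(str(item).encode()).digest()
    return (int.from_bytes(d[:8], "big") + 1) / HASH_SPACE

def kmv_sketch(items, k):
    """Keep the k smallest distinct hash values."""
    return sorted({h(x) for x in items})[:k]

def estimate_distinct(sketch, k):
    if len(sketch) < k:  # saw fewer than k distinct values: count is exact
        return len(sketch)
    return (k - 1) / sketch[k - 1]

def estimate_intersection(sa, sb, k):
    """Estimate |A ∩ B| from two KMV sketches (Beyer et al. style)."""
    set_a, set_b = set(sa), set(sb)
    union_k = sorted(set_a | set_b)[:k]  # a KMV sketch of A ∪ B
    if not union_k:
        return 0
    rho = sum(1 for v in union_k if v in set_a and v in set_b)
    return rho / len(union_k) * estimate_distinct(union_k, k)

k = 256
A = range(0, 10_000)        # 10k distinct items
B = range(5_000, 15_000)    # overlaps A in 5k items
sa, sb = kmv_sketch(A, k), kmv_sketch(B, k)
print(round(estimate_intersection(sa, sb, k)))  # ~5000, up to sketch error
```

The key fact making this work: any value among the k smallest hashes of A ∪ B that originates from A is necessarily among the k smallest hashes of A, so membership in both per-set sketches can be tested exactly.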
On Multidimensional Inequality in Partitions of Multisets
We study multidimensional inequality in partitions of finite multisets with thresholds. In such a setting, we define a Lorenz-like preorder, a family of functions preserving that preorder, and a counterpart of the Pigou-Dalton transfers, and we provide a version of the celebrated Hardy-Littlewood-Pólya characterization results.
Keywords: multisets, majorization, Lorenz preorder, Hardy-Littlewood-Pólya theorem, transfers
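For readers unfamiliar with the classical one-dimensional setting that this paper generalizes, the Hardy-Littlewood-Pólya theorem ties the Lorenz preorder to Pigou-Dalton transfers. A compact statement (standard textbook material, not taken from the paper itself) is:

```latex
% Classical Hardy--Littlewood--P\'olya equivalence (one-dimensional case).
% Here $x_{[1]} \ge \dots \ge x_{[n]}$ is the decreasing rearrangement.
x \prec y \iff
  \sum_{i=1}^{m} x_{[i]} \le \sum_{i=1}^{m} y_{[i]}
  \quad (m = 1, \dots, n-1),
  \qquad \sum_{i=1}^{n} x_{[i]} = \sum_{i=1}^{n} y_{[i]}.
% Equivalently: $\sum_i \varphi(x_i) \le \sum_i \varphi(y_i)$ for every
% convex $\varphi$; equivalently $x = Dy$ for some doubly stochastic $D$;
% equivalently $x$ arises from $y$ by finitely many Pigou--Dalton transfers.
```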
Aspects of generic entanglement
We study entanglement and other correlation properties of random states in
high-dimensional bipartite systems. These correlations are quantified by
parameters that are subject to the "concentration of measure" phenomenon,
meaning that on a large-probability set these parameters are close to their
expectation. For the entropy of entanglement, this has the counterintuitive
consequence that there exist large subspaces in which all pure states are close
to maximally entangled. This, in turn, implies the existence of mixed states
with entanglement of formation near that of a maximally entangled state, but
with negligible quantum mutual information and, therefore, negligible
distillable entanglement, secret key, and common randomness. It also implies a
very strong locking effect for the entanglement of formation: its value can
jump from maximal to near zero by tracing over a number of qubits negligible compared to the size of the total system. Furthermore, such properties are generic.
Similar phenomena are observed for random multiparty states, leading us to
speculate on the possibility that the theory of entanglement is much simplified
when restricted to asymptotically generic states. Further consequences of our
results include a complete derandomization of the protocol for universal
superdense coding of quantum states.
Comment: 22 pages, 1 figure, 1 table
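For context, the concentration-of-measure statement invoked above can be made quantitative with standard tools; the following summary (Lévy's lemma and Page's average-entropy formula, both textbook material rather than claims from the paper) shows why typical states are near-maximally entangled:

```latex
% Levy's lemma: for an $L$-Lipschitz function $f$ on the unit sphere
% $S^{2n-1} \subset \mathbb{C}^n$ and Haar-random $\psi$,
\Pr\big[\, |f(\psi) - \mathbb{E} f| \ge \epsilon \,\big]
  \le 2 \exp\!\left( - \frac{c\, n\, \epsilon^2}{L^2} \right)
% for a universal constant $c > 0$. Page's formula gives the expected
% entropy of entanglement on $\mathbb{C}^{d_A} \otimes \mathbb{C}^{d_B}$
% with $d_A \le d_B$ (in nats):
\mathbb{E}\, S(\rho_A) = \ln d_A - \frac{d_A}{2 d_B} + o(1),
% i.e. almost maximal whenever $d_A \ll d_B$, with fluctuations
% exponentially suppressed in the total dimension $d_A d_B$.
```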
Quantum-locked key distribution at nearly the classical capacity rate
Quantum data locking is a protocol that allows for a small secret key to
(un)lock an exponentially larger amount of information, hence yielding the
strongest violation of the classical one-time pad encryption in the quantum
setting. This violation mirrors a large gap existing between two security
criteria for quantum cryptography quantified by two entropic quantities: the
Holevo information and the accessible information. We show that the latter
becomes a sensible security criterion if an upper bound on the coherence time
of the eavesdropper's quantum memory is known. Under this condition we
introduce a protocol for secret key generation through a memoryless qudit
channel. For channels with enough symmetry, such as the d-dimensional erasure
and depolarizing channels, this protocol allows secret key generation at an
asymptotic rate as high as the classical capacity minus one bit.
Comment: v2 is close to the published version and contains only the key distribution protocols (4+5 pages); an extended version of the direct communication protocol is posted in arXiv:1410.4748. Comments always welcome.
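The two security criteria contrasted above are standard entropic quantities; their definitions are stated here for convenience (textbook material, not specific to this paper):

```latex
% For a classical-quantum ensemble $\mathcal{E} = \{p_x, \rho_x\}$,
% the Holevo information is
\chi(\mathcal{E}) = S\Big(\sum_x p_x \rho_x\Big) - \sum_x p_x S(\rho_x),
% and the accessible information maximizes over measurements $M$:
I_{\mathrm{acc}}(\mathcal{E}) = \max_{M} I(X ; M),
% with $I_{\mathrm{acc}} \le \chi$ (the Holevo bound). Quantum data
% locking exploits ensembles where $I_{\mathrm{acc}} \ll \chi$; under the
% bounded-coherence-time assumption, the protocol above achieves key rates
% approaching $C - 1$ bits per channel use, $C$ the classical capacity.
```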
Prediction of claims in export credit finance: a comparison of four machine learning techniques
This study evaluates four machine learning (ML) techniques (Decision Trees (DT), Random Forests (RF), Neural Networks (NN), and Probabilistic Neural Networks (PNN)) on their ability to accurately predict export credit insurance claims. Additionally, we compare the performance of the ML techniques against a simple benchmark (BM) heuristic. The analysis is based on a dataset provided by the Berne Union, which is the most comprehensive collection of export credit insurance data and has been used in only two scientific studies so far. All ML techniques performed relatively well in predicting whether or not claims would be incurred and, with limitations, in predicting the order of magnitude of the claims. No satisfactory results were achieved in predicting actual claim ratios. RF performed significantly better than DT, NN, and PNN on all prediction tasks, and most reliably carried its validation performance forward to test performance.
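Since the Berne Union dataset is not public, a faithful replication is not possible here; the sketch below only illustrates the DT-versus-RF benchmarking pattern the study describes, using scikit-learn on synthetic data. All parameters are placeholder assumptions, and PNN is omitted because scikit-learn has no stock implementation:

```python
# Illustrative model-comparison pattern on synthetic data; the study's
# actual features, preprocessing, and metrics are not reproduced here.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the binary target "claim incurred or not",
# with a class imbalance typical of insurance claims.
X, y = make_classification(n_samples=5000, n_features=20, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)

models = {
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=300, random_state=0),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean AUC = {auc.mean():.3f} (+/- {auc.std():.3f})")
```

On data like this, the ensemble (RF) typically outperforms the single tree (DT), consistent with the study's finding that RF dominated the other learners.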
Mining Network Events using Traceroute Empathy
In the never-ending quest for tools that enable an ISP to streamline troubleshooting and improve awareness of network behavior, much effort has been devoted to collecting data through active and passive measurement at both the data plane and the control plane level. Exploitation of the collected data has mostly focused on anomaly detection and root-cause analysis. Our objective lies somewhere in between. We consider traceroutes collected by a network of probes and aim to introduce a practically applicable methodology for quickly spotting measurements related to high-impact events that happened in the network. Such a filtering process eases further in-depth, human-based analysis, for example with visual tools, which are effective only when handling a limited amount of data. We introduce the empathy relation between traceroutes
as the cornerstone of our formal characterization of the traceroutes related to
a network event. Based on this model, we describe an algorithm that finds
traceroutes related to high-impact events in an arbitrary set of measurements.
Evidence of the effectiveness of our approach is given by experimental results
produced on real-world data.
Comment: 8 pages, 7 figures; extended version of "Discovering High-Impact Routing Events using Traceroutes," in Proc. 20th International Symposium on Computers and Communications (ISCC 2015).
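The abstract names the empathy relation but does not define it, so the sketch below is a hypothetical reading for illustration only: traceroutes are grouped when they record a path change involving the same hop within the same time window. The function names, the time-bin heuristic, and the shared-hop criterion are all assumptions, not the paper's formal definition:

```python
# Hypothetical "empathy" filter: flag traceroutes whose path changes
# co-occur in time and touch a common router hop, as candidate witnesses
# of a single high-impact network event.
from collections import defaultdict

def changed_hops(before, after):
    """Hops that differ between two consecutive paths to the same target."""
    return set(before) ^ set(after)  # symmetric difference of hop sets

def empathic_groups(measurements, bin_seconds=300):
    """measurements: list of (timestamp, probe, path_before, path_after)."""
    groups = defaultdict(list)
    for ts, probe, before, after in measurements:
        delta = changed_hops(before, after)
        if not delta:
            continue  # no path change, ignore this measurement
        for hop in delta:  # index by (time bin, changed hop)
            groups[(ts // bin_seconds, hop)].append(probe)
    # keep only groups where several probes saw the same hop change together
    return {k: v for k, v in groups.items() if len(v) >= 2}

measurements = [
    (1000, "probe1", ["a", "b", "c"], ["a", "x", "c"]),
    (1100, "probe2", ["d", "b", "e"], ["d", "x", "e"]),
    (9000, "probe3", ["f", "g"], ["f", "h"]),
]
print(empathic_groups(measurements))  # probe1 & probe2 share the b->x change
```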