123 research outputs found

    Markov Models for Network-Behavior Modeling and Anonymization

    Get PDF
    Modern network security research has demonstrated a clear need for open sharing of traffic datasets between organizations, a need that has so far been superseded by the challenge of removing sensitive content beforehand. Network Data Anonymization (NDA) is emerging as a field dedicated to this problem, with its main direction focusing on removal of identifiable artifacts that might pierce privacy, such as usernames and IP addresses. However, recent research has demonstrated that more subtle statistical artifacts, also present, may yield fingerprints that are just as differentiable as the former. This result highlights certain shortcomings in current anonymization frameworks -- particularly, ignoring the behavioral idiosyncrasies of network protocols, applications, and users. Recent anonymization results have shown that the extent to which utility and privacy can be obtained is mainly a function of the information in the data that one is aware and not aware of. This paper leverages the predictability of network behavior in our favor to augment existing frameworks through a new machine-learning-driven anonymization technique. Our approach uses the substitution of individual identities with group identities where members are divided based on behavioral similarities, essentially providing anonymity-by-crowds in a statistical mix-net. We derive time-series models for network traffic behavior which quantifiably models the discriminative features of network "behavior" and introduce a kernel-based framework for anonymity which fits together naturally with network-data modeling

    Deterministic Heavy Hitters with Sublinear Query Time

    Get PDF
    We study the classic problem of finding l_1 heavy hitters in the streaming model. In the general turnstile model, we give the first deterministic sublinear-time sketching algorithm which takes a linear sketch of length O(epsilon^{-2} log n * log^*(epsilon^{-1})), which is only a factor of log^*(epsilon^{-1}) more than the best existing polynomial-time sketching algorithm (Nelson et al., RANDOM \u2712). Our approach is based on an iterative procedure, where most unrecovered heavy hitters are identified in each iteration. Although this technique has been extensively employed in the related problem of sparse recovery, this is the first time, to the best of our knowledge, that it has been used in the context of heavy hitters. Along the way we also obtain a sublinear time algorithm for the closely related problem of the l_1/l_1 compressed sensing, matching the space usage of previous (super-)linear time algorithms. In the strict turnstile model, we show that the runtime can be improved and the sketching matrix can be made strongly explicit with O(epsilon^{-2}log^3 n/log^3(1/epsilon)) rows

    Sublinear-Time Algorithms for Compressive Phase Retrieval

    Full text link
    In the compressive phase retrieval problem, or phaseless compressed sensing, or compressed sensing from intensity only measurements, the goal is to reconstruct a sparse or approximately kk-sparse vector x∈Rnx \in \mathbb{R}^n given access to y=∣Φx∣y= |\Phi x|, where ∣v∣|v| denotes the vector obtained from taking the absolute value of v∈Rnv\in\mathbb{R}^n coordinate-wise. In this paper we present sublinear-time algorithms for different variants of the compressive phase retrieval problem which are akin to the variants considered for the classical compressive sensing problem in theoretical computer science. Our algorithms use pure combinatorial techniques and near-optimal number of measurements.Comment: The ell_2/ell_2 algorithm was substituted by a modification of the ell_infty/ell_2 algorithm which strictly subsumes i

    Nearly Optimal Distinct Elements and Heavy Hitters on Sliding Windows

    Get PDF
    We study the distinct elements and l_p-heavy hitters problems in the sliding window model, where only the most recent n elements in the data stream form the underlying set. We first introduce the composable histogram, a simple twist on the exponential (Datar et al., SODA 2002) and smooth histograms (Braverman and Ostrovsky, FOCS 2007) that may be of independent interest. We then show that the composable histogram{} along with a careful combination of existing techniques to track either the identity or frequency of a few specific items suffices to obtain algorithms for both distinct elements and l_p-heavy hitters that are nearly optimal in both n and epsilon. Applying our new composable histogram framework, we provide an algorithm that outputs a (1+epsilon)-approximation to the number of distinct elements in the sliding window model and uses O{1/(epsilon^2) log n log (1/epsilon)log log n+ (1/epsilon) log^2 n} bits of space. For l_p-heavy hitters, we provide an algorithm using space O{(1/epsilon^p) log^2 n (log^2 log n+log 1/epsilon)} for 0<p <=2, improving upon the best-known algorithm for l_2-heavy hitters (Braverman et al., COCOON 2014), which has space complexity O{1/epsilon^4 log^3 n}. We also show complementing nearly optimal lower bounds of Omega ((1/epsilon) log^2 n+(1/epsilon^2) log n) for distinct elements and Omega ((1/epsilon^p) log^2 n) for l_p-heavy hitters, both tight up to O{log log n} and O{log 1/epsilon} factors
    • …
    corecore