
    The Sketching Complexity of Graph and Hypergraph Counting

    Subgraph counting is a fundamental primitive in graph processing, with applications in social network analysis (e.g., estimating the clustering coefficient of a graph), database processing and other areas. The space complexity of subgraph counting has been studied extensively in the literature, but many natural settings are still not well understood. In this paper we revisit the subgraph (and hypergraph) counting problem in the sketching model, where the algorithm's state as it processes a stream of updates to the graph is a linear function of the stream. This model has recently received a lot of attention in the literature, and has become a standard model for solving dynamic graph streaming problems. We give a tight bound on the sketching complexity of counting the number of occurrences of a small subgraph $H$ in a bounded degree graph $G$ presented as a stream of edge updates. Specifically, we show that the space complexity of the problem is governed by the fractional vertex cover number of the graph $H$. Our subgraph counting algorithm implements a natural vertex sampling approach, with sampling probabilities governed by the vertex cover of $H$. Our main technical contribution lies in a new set of Fourier analytic tools that we develop to analyze multiplayer communication protocols in the simultaneous communication model, allowing us to prove a tight lower bound. We believe that our techniques are likely to find applications in other settings. Besides giving tight bounds for all graphs $H$, both our algorithm and lower bounds extend to the hypergraph setting, albeit with some loss in space complexity.
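For readers unfamiliar with the parameter named in this abstract, the fractional vertex cover number can be written as a small linear program. The notation below ($x_v$, $\tau^*(H)$, $V(H)$, $E(H)$) is the standard textbook one and is an assumption here, not taken from the paper; the abstract does not state the exact form of the space bound, so none is given.

```latex
% Fractional vertex cover number of a graph H = (V(H), E(H)):
% assign a weight x_v >= 0 to every vertex so that each edge
% receives total weight at least 1, minimizing the total weight.
\[
  \tau^*(H) \;=\; \min \sum_{v \in V(H)} x_v
  \quad \text{subject to} \quad
  x_u + x_v \ge 1 \;\; \forall \{u,v\} \in E(H),
  \qquad x_v \ge 0 \;\; \forall v \in V(H).
\]
% Example: for the triangle K_3, setting x_v = 1/2 on all three
% vertices is optimal, so \tau^*(K_3) = 3/2, whereas the smallest
% integral vertex cover has size 2.
```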

    Traversing combinatorial 0/1-polytopes via optimization

    In this paper, we present a new framework that exploits combinatorial optimization for efficiently generating a large variety of combinatorial objects based on graphs, matroids, posets and polytopes. Our method relies on a simple and versatile algorithm for computing a Hamilton path on the skeleton of any 0/1-polytope $\mathrm{conv}(X)$, where $X \subseteq \{0,1\}^n$. The algorithm uses as a black box any algorithm that solves a variant of the classical linear optimization problem $\min\{w\cdot x \mid x\in X\}$, and the resulting delay, i.e., the running time per visited vertex on the Hamilton path, is only by a factor of $\log n$ larger than the running time of the optimization algorithm. When $X$ encodes a particular class of combinatorial objects, then traversing the skeleton of the polytope $\mathrm{conv}(X)$ along a Hamilton path corresponds to listing the combinatorial objects by local change operations, i.e., we obtain Gray code listings. As concrete results of our general framework, we obtain efficient algorithms for generating all ($c$-optimal) bases and independent sets in a matroid; ($c$-optimal) spanning trees, forests, matchings, maximum matchings, and $c$-optimal matchings in a general graph; vertex covers, minimum vertex covers, $c$-optimal vertex covers, stable sets, maximum stable sets and $c$-optimal stable sets in a bipartite graph; as well as antichains, maximum antichains, $c$-optimal antichains, and $c$-optimal ideals of a poset. Specifically, the delay and space required by these algorithms are polynomial in the size of the matroid ground set, graph, or poset, respectively. Furthermore, all of these listings correspond to Hamilton paths on the corresponding combinatorial polytopes, namely the base polytope, matching polytope, vertex cover polytope, stable set polytope, chain polytope and order polytope, respectively.
As another corollary from our framework, we obtain an $\mathcal{O}(t_{\mathrm{LP}} \log n)$ delay algorithm for the vertex enumeration problem on 0/1-polytopes $\{x\in\mathbb{R}^n \mid Ax\leq b\}$, where $A\in \mathbb{R}^{m\times n}$ and $b\in\mathbb{R}^m$, and $t_{\mathrm{LP}}$ is the time needed to solve the linear program $\min\{w\cdot x \mid Ax\leq b\}$. This improves upon the 25-year-old $\mathcal{O}(t_{\mathrm{LP}}\,n)$ delay algorithm due to Bussieck and Lübbecke.
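To make the notion of a Hamilton path on a polytope skeleton concrete, the sketch below covers only the simplest special case, $X = \{0,1\}^n$, where the polytope is the hypercube and two vertices are adjacent exactly when they differ in one coordinate. It uses the classical binary reflected Gray code, not the optimization-based algorithm of the paper, and the function names are hypothetical.

```python
def gray_code_vertices(n):
    """List all 0/1 vectors of length n in binary reflected Gray code order."""
    vertices = []
    for i in range(1 << n):
        g = i ^ (i >> 1)  # standard binary reflected Gray code of i
        # Coordinate k of the vertex is bit k of g.
        vertices.append(tuple((g >> k) & 1 for k in range(n)))
    return vertices


def is_hamilton_path_on_cube(vertices, n):
    """Check that the listing visits every cube vertex once, flipping one bit per step."""
    all_distinct = len(set(vertices)) == len(vertices) == (1 << n)
    single_flips = all(
        sum(a != b for a, b in zip(u, v)) == 1
        for u, v in zip(vertices, vertices[1:])
    )
    return all_distinct and single_flips


if __name__ == "__main__":
    for vertex in gray_code_vertices(3):
        print(vertex)
    print(is_hamilton_path_on_cube(gray_code_vertices(3), 3))
```

Each consecutive pair in the listing is an edge of the hypercube, which is what "listing by local change operations" means in this special case; for other sets $X$ the adjacent vertices of $\mathrm{conv}(X)$ correspond to different local changes.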

    Analyzing massive datasets with missing entries: models and algorithms

    We initiate a systematic study of computational models to analyze algorithms for massive datasets with missing or erased entries and study the relationship of our models with existing algorithmic models for large datasets. We focus on algorithms whose inputs are naturally represented as functions, codewords, or graphs. First, we generalize the property testing model, one of the most widely studied models of sublinear-time algorithms, to account for the presence of adversarially erased function values. We design efficient erasure-resilient property testing algorithms for several fundamental properties of real-valued functions such as monotonicity, the Lipschitz property, convexity, and linearity. We then investigate the problems of local decoding and local list decoding of codewords containing erasures. We show that, in some cases, these problems are strictly easier than the corresponding problems of decoding codewords containing errors. Moreover, we use this understanding to show a separation between our erasure-resilient property testing model and the (error) tolerant property testing model. The philosophical message of this separation is that errors occurring in large datasets are, in general, harder to deal with than erasures. Finally, we develop models and notions to reason about algorithms that are intended to run on large graphs with missing edges. When running algorithms on large graphs containing several missing edges, it is desirable to output solutions that are close to the solutions output when there are no missing edges. With this motivation, we define average sensitivity, a robustness metric for graph algorithms. We discuss various useful features of our definition and design approximation algorithms with good average sensitivity bounds for several optimization problems on graphs. We also define a model of erasure-resilient sublinear-time graph algorithms and design an efficient algorithm for testing connectivity of graphs.

    Advances in SCA and RF-DNA Fingerprinting Through Enhanced Linear Regression Attacks and Application of Random Forest Classifiers

    Radio Frequency (RF) emissions from electronic devices expose security vulnerabilities that can be used by an attacker to extract otherwise unobtainable information. Two realms of study were investigated here: the exploitation of 1) unintentional RF emissions in the field of Side Channel Analysis (SCA), and 2) intentional RF emissions from physical devices in the field of RF-Distinct Native Attribute (RF-DNA) fingerprinting. Statistical analysis on the linear model fit to measured SCA data in Linear Regression Attacks (LRA) improved performance, achieving a 98% success rate for AES key-byte identification from unintentional emissions. However, the presence of non-Gaussian noise required the use of a non-parametric classifier to further improve key guessing attacks. Random Forest (RndF)-based profiling attacks were successful in very high dimensional data sets, correctly guessing all 16 bytes of the AES key with a 50,000 variable dataset. With variable reduction, Random Forest still outperformed Template Attack for this data set, requiring fewer traces and achieving higher success rates with a lower misclassification rate. Finally, the use of a RndF classifier is examined for intentional RF emissions from ZigBee devices to enhance security using RF-DNA fingerprinting. RndF outperformed parametric MDA/ML and non-parametric GRLVQI classifiers, providing up to GS = 18.0 dB improvement (reduction in required SNR). Network penetration, measured using rogue ZigBee devices, shows that the RndF method improved rogue rejection in noisier environments: gains of up to GS = 18.0 dB are realized over previous methods.

    Quantization and erasures in frame representations

    Thesis (Sc. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 123-126). Frame representations, which correspond to overcomplete generalizations of basis expansions, are often used in signal processing to provide robustness to errors. In this thesis robustness is provided through the use of projections to compensate for errors in the representation coefficients, with specific focus on quantization and erasure errors. The projections are implemented by modifying the unaffected coefficients using an additive term, which is linear in the error. This low-complexity implementation only assumes linear reconstruction using a pre-determined synthesis frame, and makes no assumption on how the representation coefficients are generated. In the context of quantization, the limits of scalar quantization of frame representations are first examined, assuming the analysis is using inner products with the frame vectors. Bounds on the error and the bit-efficiency are derived, demonstrating that scalar quantization of the coefficients is suboptimal. As an alternative to scalar quantization, a generalization of Sigma-Delta noise shaping to arbitrary frame representations is developed by reformulating noise shaping as a sequence of compensations for the quantization error using projections. The total error is quantified using both the additive noise model of quantization, and a deterministic upper bound based on the triangle inequality. It is thus shown that both the average and the worst-case error are reduced compared to scalar quantization of the coefficients. The projection principle is also used to provide robustness to erasures.
Specifically, the case of a transmitter that is aware of the erasure occurrence is considered, which compensates for the erasure error by projecting it to the subsequent frame vectors. It is further demonstrated that the transmitter can be split into a transmitter/receiver combination that performs the same compensation, but in which only the receiver is aware of the erasure occurrence. Furthermore, an algorithm to puncture dense representations in order to produce sparse approximate ones is introduced. In this algorithm the error due to the puncturing is also projected to the span of the remaining coefficients. The algorithm can be combined with quantization to produce quantized sparse representations approximating the original dense representation. By Petros T. Boufounos. Sc.D.

    On the Stability of Distribution Topologies in Peer-to-Peer Live Streaming Systems

    The stability of peer-to-peer live streaming systems is constantly challenged. In particular, the unreliability and vulnerability of their participants allow for failures and attacks that suddenly disable certain sets of peers. The consequences of such events are largely determined by the distribution topology, i.e., the pattern of communication between the peers. In this thesis, we analyze a broad range of optimization problems concerning the stability of distribution topologies. For this, we discuss notions of stability against both attacks and failures. First, we investigate the computational complexity and approximability of finding resource-efficient attacks. This allows us to point out limitations of an attacker's planning capabilities and demonstrates the influence of the chosen system parameters on the hardness of such attack problems. Then, we turn to topology formation problems. Here, a set of topology parameters is given and the task consists in finding an eligible distribution topology. In particular, it has to minimize the maximum damage achievable by attacks with arbitrary attack parameters. We identify necessary and sufficient conditions on attack-stable distribution topologies.
Thereby, we give mathematically sound guidelines for the topology management of peer-to-peer live streaming systems. We find large classes of efficiently constructible topologies minimizing the system-wide packet loss under attacks. Additionally, we show that determining this feature for arbitrary topologies is coNP-complete. For topologies minimizing the maximum number of peers for which an attack leads to a heavy decrease in perceived streaming quality, the requirements change. Here, we show that the corresponding topology formation problem is closely related to long-standing open problems of Design and Coding Theory. Finally, we study topologies minimizing the expected packet loss due to uncoordinated peer failures. We investigate properties and existence conditions of such topologies. Furthermore, we determine the computational complexity of constructing them. Our results provide guidelines for the topology management of peer-to-peer live streaming systems and mathematically determine which goals can be achieved efficiently.