    Deleting and Testing Forbidden Patterns in Multi-Dimensional Arrays

    Understanding the local behaviour of structured multi-dimensional data is a fundamental problem in various areas of computer science. As the amount of data is often huge, it is desirable to obtain sublinear time algorithms, and specifically property testers, to understand local properties of the data. We focus on the natural local problem of testing pattern freeness: given a large d-dimensional array A and a fixed d-dimensional pattern P over a finite alphabet, we say that A is P-free if it does not contain a copy of the forbidden pattern P as a consecutive subarray. The distance of A to P-freeness is the fraction of entries of A that need to be modified to make it P-free. For any ϵ ∈ [0,1] and any large enough pattern P over any alphabet, other than a very small set of exceptional patterns, we design a tolerant tester that distinguishes between the case that the distance is at least ϵ and the case that it is at most a_d ϵ, with query complexity and running time c_d ϵ^{-1}, where a_d < 1 and c_d depend only on d. To analyze the testers we establish several combinatorial results, including the following d-dimensional modification lemma, which might be of independent interest: for any large enough pattern P over any alphabet (excluding a small set of exceptional patterns for the binary case), and any array A containing a copy of P, one can delete this copy by modifying one of its locations without creating new P-copies in A. Our results address an open question of Fischer and Newman, who asked whether there exist efficient testers for properties related to tight substructures in multi-dimensional structured data. They serve as a first step towards a general understanding of local properties of multi-dimensional arrays, as any such property can be characterized by a fixed family of forbidden patterns.
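
The definitions above can be made concrete with a small brute-force sketch for the two-dimensional case. It is not the sublinear tester from the abstract: it reads the whole array, and the function names (`find_copies`, `is_pattern_free`, `delete_copy`) are hypothetical names chosen for illustration. `delete_copy` simply searches for a single-entry change of the kind the modification lemma promises, and may return `None` for patterns the lemma excludes.

```python
from itertools import product


def find_copies(A, P):
    """Return the top-left corners (i, j) of all consecutive copies of P in A."""
    n_rows, n_cols = len(A), len(A[0])
    p_rows, p_cols = len(P), len(P[0])
    copies = []
    for i in range(n_rows - p_rows + 1):
        for j in range(n_cols - p_cols + 1):
            if all(A[i + r][j + c] == P[r][c]
                   for r, c in product(range(p_rows), range(p_cols))):
                copies.append((i, j))
    return copies


def is_pattern_free(A, P):
    """A is P-free if it contains no consecutive copy of P."""
    return not find_copies(A, P)


def delete_copy(A, P, corner, alphabet):
    """Search for one entry inside the copy at `corner` whose modification
    removes that copy without creating any new copy of P.

    Returns (row, col, new_symbol), or None if no such change exists.
    A is restored to its original contents before returning.
    """
    before = set(find_copies(A, P))
    i, j = corner
    for r, c in product(range(len(P)), range(len(P[0]))):
        original = A[i + r][j + c]
        for symbol in alphabet:
            if symbol == original:
                continue
            A[i + r][j + c] = symbol          # tentative modification
            after = set(find_copies(A, P))
            A[i + r][j + c] = original        # undo it
            if corner not in after and after <= before:
                return (i + r, j + c, symbol)
    return None


A = [[0, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
P = [[1, 1],
     [1, 1]]
fix = delete_copy(A, P, find_copies(A, P)[0], alphabet=[0, 1])
```

The distance to P-freeness in the abstract is the minimum number of such modifications divided by the number of entries of A; the sketch only handles one copy at a time and makes no attempt to compute that minimum.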

    Testing Local Properties of Arrays

    We study testing of local properties in one-dimensional and multi-dimensional arrays. A property of d-dimensional arrays f:[n]^d -> Sigma is k-local if it can be defined by a family of k x ... x k forbidden consecutive patterns. This definition captures numerous interesting properties. For example, monotonicity, Lipschitz continuity and submodularity are 2-local; convexity is (usually) 3-local; and many typical problems in computational biology and computer vision involve o(n)-local properties. In this work, we present a generic approach to test all local properties of arrays over any finite (and not necessarily bounded-size) alphabet. We show that any k-local property of d-dimensional arrays is testable by a simple canonical one-sided error non-adaptive epsilon-test, whose query complexity is O(epsilon^{-1} k log((epsilon n)/k)) for d = 1 and O(c_d epsilon^{-1/d} k * n^{d-1}) for d > 1. The queries made by the canonical test constitute sphere-like structures of varying sizes, and are completely independent of the property and the alphabet Sigma. The query complexity is optimal for a wide range of parameters: for d = 1, this matches the query complexity of many previously investigated local properties, while for d > 1 we design and analyze new constructions of k-local properties whose one-sided non-adaptive query complexity matches our upper bounds. For some previously studied properties, our method provides the first known sublinear upper bound on the query complexity.
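
To illustrate what "k-local" means in one dimension, here is a minimal sketch that expresses monotonicity as a 2-local property and convexity as a 3-local one, each through a predicate on windows of k consecutive entries. The checker is exhaustive (it reads every entry), so it is not the canonical sublinear test described above; the names `satisfies_local_property`, `breaks_monotonicity` and `breaks_convexity` are hypothetical.

```python
def satisfies_local_property(f, k, is_forbidden):
    """Return True if no window of k consecutive entries of f is forbidden."""
    return not any(is_forbidden(tuple(f[i:i + k]))
                   for i in range(len(f) - k + 1))


def breaks_monotonicity(window):
    """Forbidden 2-patterns for (non-decreasing) monotonicity: a > b."""
    a, b = window
    return a > b


def breaks_convexity(window):
    """Forbidden 3-patterns for discrete convexity: a - 2b + c < 0."""
    a, b, c = window
    return a - 2 * b + c < 0


monotone = satisfies_local_property([1, 2, 2, 5], 2, breaks_monotonicity)
convex = satisfies_local_property([4, 1, 0, 1, 4], 3, breaks_convexity)
```

Any other k-local property fits the same shape by swapping in a different window predicate, which is the sense in which the property is "defined by a family of forbidden consecutive patterns".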

    Efficient Removal Lemmas for Matrices

    The authors and Fischer recently proved that any hereditary property of two-dimensional matrices (where the row and column order is not ignored) over a finite alphabet is testable with a constant number of queries, by establishing an (ordered) matrix removal lemma, which states the following: If a matrix is far from satisfying some hereditary property, then a large enough constant-size random submatrix of it does not satisfy the property with probability at least 9/10. Here being far from the property means that one needs to modify a constant fraction of the entries of the matrix to make it satisfy the property. However, in the above general removal lemma, the required size of the random submatrix grows very fast as a function of the distance of the matrix from satisfying the property. In this work we establish much more efficient removal lemmas for several special cases of the above problem. In particular, we show the following: If an epsilon-fraction of the entries of a binary matrix M can be covered by pairwise-disjoint copies of some (s x t) matrix A, then a delta-fraction of the (s x t)-submatrices of M are equal to A, where delta is polynomial in epsilon. We generalize the work of Alon, Fischer and Newman [SICOMP'07] and make progress towards proving one of their conjectures. The proofs combine their efficient conditional regularity lemma for matrices with additional combinatorial and probabilistic ideas.

    Geometric Inhomogeneous Random Graphs for Algorithm Engineering

    The design and analysis of graph algorithms is heavily based on the worst case. In practice, however, many algorithms perform much better than the worst case would suggest. Furthermore, various problems can be tackled more efficiently if one assumes the input to be, in a sense, realistic. The field of network science, which studies the structure and emergence of real-world networks, identifies locality and heterogeneity as two frequently occurring properties. A popular model that captures these properties is that of geometric inhomogeneous random graphs (GIRGs), a generalization of hyperbolic random graphs (HRGs). Aside from their importance to network science, GIRGs can be an immensely valuable tool in algorithm engineering. Since they convincingly mimic real-world networks, guarantees about quality and performance of an algorithm on instances of the model can be transferred to real-world applications. They have model parameters to control the amount of heterogeneity and locality, which makes it possible to evaluate those properties in isolation while keeping the rest fixed. Moreover, they can be generated efficiently, which allows for experimental analysis. While realistic instances are often rare, generated instances are readily available. Furthermore, the underlying geometry of GIRGs helps to visualize the network, e.g., for debugging or to improve understanding of its structure. The aim of this work is to demonstrate the capabilities of geometric inhomogeneous random graphs in algorithm engineering and establish them as routine tools to replace previous models like the Erdős-Rényi model, where each edge exists with equal probability. We utilize geometric inhomogeneous random graphs to design, evaluate, and optimize efficient algorithms for realistic inputs. In detail, we provide the currently fastest sequential generator for GIRGs and HRGs and describe algorithms for maximum flow, directed spanning arborescence, cluster editing, and hitting set. For all four problems, our implementations beat the state-of-the-art on realistic inputs. On top of providing crucial benchmark instances, GIRGs allow us to obtain valuable insights. Most notably, our efficient generator allows us to experimentally show sublinear running time of our flow algorithm, investigate the solution structure of cluster editing, complement our benchmark set of arborescence instances with a density for which there are no real-world networks available, and generate networks with adjustable locality and heterogeneity to reveal the effects of these properties on our algorithms.

    Fingerprinting Codes and Related Combinatorial Structures

    Fingerprinting codes were introduced by Boneh and Shaw in 1998 as a method of copyright control. The desired properties of a good fingerprinting code have been found to have deep connections to combinatorial structures such as error-correcting codes and cover-free families. The particular property that motivated our research is called "frameproof". This has been studied extensively when the alphabet size q is at least as large as the number of colluders w. Much less is known about the case q < w, and we prove several interesting properties about the binary case q = 2 in this thesis. When the length of the code N is relatively small, we show that the number of codewords n cannot exceed N, which is a tight bound since the n = N case can be satisfied by a trivial construction using permutation matrices. Furthermore, the only possible candidates are equivalent to this trivial construction. A generalization to a restricted parameter set of separating hash families is also given. As a consequence, the above result motivates the question of when a non-trivial construction can be found, and we give some definitive answers by considering combinatorial designs. In particular, we give a necessary and sufficient condition for a symmetric design to be a binary 3-frameproof code, and provide example classes of symmetric designs that satisfy or fail this condition. Finally, we apply our results to a problem of constructing short binary frameproof codes.
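
The abstract does not spell out the frameproof condition, so the following sketch assumes the standard definition: a code is w-frameproof if no coalition of at most w codewords can produce, by choosing in each coordinate a symbol that some coalition member has there, a codeword outside the coalition. Under that assumption, the brute-force check below confirms that the rows of an identity matrix (one instance of the trivial permutation-matrix construction with n = N) form a binary frameproof code. The function names are hypothetical, and the codewords are assumed to be distinct.

```python
from itertools import combinations


def is_frameproof(code, w):
    """Brute-force check of the w-frameproof property (assumed definition:
    no coalition of size <= w can frame a codeword outside the coalition)."""
    code = [tuple(word) for word in code]
    for size in range(1, w + 1):
        for coalition in combinations(code, size):
            # Symbols available to the coalition in each coordinate.
            seen = [set(column) for column in zip(*coalition)]
            for word in code:
                if word in coalition:
                    continue
                if all(symbol in seen[i] for i, symbol in enumerate(word)):
                    return False  # the coalition can frame `word`
    return True


def identity_code(length):
    """Rows of the length x length identity matrix as binary codewords."""
    return [tuple(1 if i == j else 0 for j in range(length))
            for i in range(length)]


trivial_ok = is_frameproof(identity_code(4), w=3)
full_cube_ok = is_frameproof([(0, 0), (0, 1), (1, 0), (1, 1)], w=2)
```

The check is exponential in w and is meant only for small examples; in the second call the coalition {00, 11} can produce 01, so the full binary cube of length 2 fails the condition.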

    GigaFitter at CDF: Offline-Quality Track Fitting in a Nanosecond for Hadron Collider Triggers

    The online event selection system is a fundamental ingredient for the success of a High Energy Physics experiment at hadron colliders. The number of events of interest for physics is suppressed by a number of uninteresting events that is many orders of magnitude larger; in addition, the number of events that can be written to disk for analysis is a small fraction of those produced. The quality of the acquired data, in terms of interesting signal events collected, depends on the online selection, so a sophisticated and efficient selection system (trigger) is necessary. Reconstructing the trajectories of charged particles provides detailed information about the event, which makes it possible to build very effective selections. It is, however, an extremely difficult problem to solve within the time required for online use. The SVT processor of CDF reconstructs, for each event, the trajectories of all charged particles with pT > 2 GeV in a few tens of microseconds and with quality comparable to offline reconstruction. The information reconstructed by SVT is used in the second-level trigger selection. The use of SVT at CDF underlies many of the experiment's physics successes. To remain up to date and able to cope with the new characteristics of the accelerator, whose performance increases over time, the SVT system has undergone a series of upgrades over its history. This thesis describes the GigaFitter processor, the latest upgrade of SVT. The purpose of the GigaFitter processor is to replace the previous SVT computing system, the Track Fitter++ (16 boards), with a small (1 board) but powerful processor. The GigaFitter removes some of the limitations imposed by the computing capacity of the Track Fitter and makes it possible to increase the acceptance and efficiency of the SVT system. Improving acceptance and efficiency improves the statistics of the interesting signal in the collected sample. The GigaFitter is based on a powerful FPGA with dedicated computing units. The system developed can perform 1.4 fits/ns. The architecture design and the hardware and firmware details are described in the thesis. The studies performed to validate the hardware performance, in terms of both efficiency gain and timing, are reported. A study is also presented showing how the GigaFitter makes it possible to recover trajectories that are currently not reconstructed, raising the single-track efficiency of SVT from 75% with the old system to 80%. The GigaFitter has been part of SVT since February 2010 and has completely replaced the old Track Fitter++ system. The excellent performance demonstrated by the GigaFitter processor and its successful use in CDF are also an important validation in view of the construction of the FTK processor for the ATLAS experiment at the LHC.

    Compiling Programs for Nonshared Memory Machines

    Nonshared-memory parallel computers promise scalable performance for scientific computing needs. Unfortunately, these machines are now difficult to program because the message-passing languages available for them do not reflect the computational models used in designing algorithms. This introduces a semantic gap in the programming process which is difficult for the programmer to fill. The purpose of this research is to show how nonshared-memory machines can be programmed at a higher level than is currently possible. We do this by developing techniques for compiling shared-memory programs for execution on those architectures. The heart of the compilation process is translating references to shared memory into explicit messages between processors. To do this, we first define a formal model for distributing data structures across processor memories. Several abstract results describing the messages needed to execute a program are immediately derived from this formalism. We then develop two distinct forms of analysis to translate these formulas into actual programs. Compile-time analysis is used when enough information is available to the compiler to completely characterize the data sent in the messages. This allows excellent code to be generated for a program. Run-time analysis produces code to examine data references while the program is running. This allows dynamic generation of messages and a correct implementation of the program. While the overhead of the run-time approach is higher than that of the compile-time approach, run-time analysis is applicable to any program. Performance data from an initial implementation show that both approaches are practical and produce code with acceptable efficiency.
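
As a rough illustration of turning shared-memory references into messages, the sketch below enumerates the references of the loop `A[i] = B[i + shift]` and groups the remote ones by sender and receiver. Everything specific here is an assumption made for the example and not taken from the abstract: a block distribution of both arrays, an owner-computes rule (the processor holding `A[i]` executes the assignment), and the hypothetical function names `owner` and `messages_for_shift`. Inspecting each reference individually is in the spirit of the run-time analysis; a compile-time analysis would instead derive the same message sets in closed form.

```python
from collections import defaultdict


def owner(index, n, p):
    """Processor that owns element `index` of an n-element array
    distributed in contiguous blocks over p processors (assumed layout)."""
    block = -(-n // p)  # ceil(n / p)
    return index // block


def messages_for_shift(n, p, shift):
    """Messages needed to execute `A[i] = B[i + shift]` for all valid i.

    Returns a dict mapping (sender, receiver) to the list of B indices
    that the sender must ship to the receiver.
    """
    messages = defaultdict(list)
    for i in range(n):
        j = i + shift
        if not 0 <= j < n:
            continue  # reference falls outside the array; no work for this i
        sender, receiver = owner(j, n, p), owner(i, n, p)
        if sender != receiver:
            messages[(sender, receiver)].append(j)
    return dict(messages)


plan = messages_for_shift(n=8, p=4, shift=1)
```

With a shift of 1, each processor only needs the first element of its right neighbour's block, so every message in `plan` carries a single index; local references generate no message at all.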
