13 research outputs found

    On the Parikh-de-Bruijn grid

    Full text link
    We introduce the Parikh-de-Bruijn grid, a graph whose vertices are fixed-order Parikh vectors and whose edges are given by a simple shift operation. This graph gives structural insight into the nature of sets of Parikh vectors, as well as into the Parikh set of a given string. We show its utility by proving some results on Parikh-de-Bruijn strings, the abelian analog of de Bruijn sequences.
    Comment: 18 pages, 3 figures, 1 table.
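
    As a concrete illustration of the objects involved (a minimal Python sketch with names of our own choosing, not code from the paper): the order-k Parikh set of a string collects, for every length-k window, the vector of letter multiplicities, and sliding the window one position is the shift that induces edges in the grid.

        from collections import Counter

        def parikh_set(s, k, alphabet):
            """Collect the Parikh vectors of all length-k substrings of s.

            A Parikh vector records, for each letter of the alphabet,
            how often it occurs in the current window.
            """
            counts = Counter(s[:k])
            vectors = {tuple(counts[a] for a in alphabet)}
            for i in range(k, len(s)):
                counts[s[i - k]] -= 1  # letter leaving the window
                counts[s[i]] += 1      # letter entering the window
                vectors.add(tuple(counts[a] for a in alphabet))
            return vectors

        # Example: every length-2 window of "abab" has one 'a' and one 'b'.
        print(parikh_set("abab", 2, "ab"))  # {(1, 1)}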

    Bubble-Flip – A New Generation Algorithm for Prefix Normal Words

    Full text link
    We present a new recursive generation algorithm for prefix normal words. These are binary strings with the property that no substring has more 1s than the prefix of the same length. The new algorithm uses two operations on binary strings which exploit certain properties of prefix normal words. We introduce infinite prefix normal words and show that one of the operations used by the algorithm, if applied repeatedly to extend the string, produces an ultimately periodic infinite word which is prefix normal. Moreover, based on the original finite word, we can predict both the length and the density of an ultimate period of this infinite word.
    Comment: 30 pages, 3 figures, accepted in Theoret. Comput. Sci. This is the journal version of the paper with the same title at LATA 2018 (12th International Conference on Language and Automata Theory and Applications, Tel Aviv, April 9-11, 2018).
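
    The defining property is easy to test directly; a brute-force Python check of our own (not the paper's generation algorithm):

        def is_prefix_normal(w):
            """Return True if no substring of the binary word w contains
            more 1s than the prefix of the same length (brute force)."""
            n = len(w)
            for k in range(1, n + 1):
                prefix_ones = w[:k].count("1")
                for i in range(n - k + 1):
                    if w[i:i + k].count("1") > prefix_ones:
                        return False
            return True

        # "1101" is prefix normal; "1011" is not, since its factor "11"
        # has more 1s than the prefix "10" of the same length.
        print(is_prefix_normal("1101"), is_prefix_normal("1011"))  # True False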

    Algorithms and Data Structures for Coding, Indexing, and Mining of Sequential Data

    Get PDF
    In recent years, the production of sequential data has been rapidly increasing. This raises challenging questions about how to represent information, how to retrieve information, and how to extract knowledge from sequential data. These questions belong to the areas of coding, indexing, and mining, respectively. In this thesis, we investigate problems from all three areas.

    Coding refers to the way in which information is represented. Coding aims at generating optimal codes, that is, codes having minimum expected length. Codes can be generated for different purposes, from data compression to error detection/correction. The Lempel-Ziv 77 parsing produces an asymptotically optimal code in terms of compression. We study algorithms to efficiently decompress strings from the Lempel-Ziv 77 parsing, using memory proportional to the size of the parsing itself. We provide the first implementation of an algorithm by Bille et al., the only work we are aware of on this problem. We present a practical evaluation of this approach and several optimizations which improve the performance on all datasets we tested. Through the Ulam-Rényi game, it is possible to provide optimal adaptive error-correcting codes. The game consists of discovering an unknown m-bit number by asking membership questions, the answers to which can be erroneous. Questions are formulated knowing the answers to all previous ones. We want to find an optimal strategy, i.e., a strategy that can identify any m-bit number using the theoretical minimum number of questions. We study the case where each question is a union of up to a fixed number of intervals, and up to three answers can be erroneous. We first show that, for any sufficiently large m, there exists a strategy to identify an initially unknown m-bit number which uses at most four intervals per question. We further refine our main tool to turn this asymptotic result into a complete characterization of those instances of the Ulam-Rényi game that admit optimal strategies.

    Indexing refers to the way in which information is retrieved. An index for texts permits finding all occurrences of any substring without traversing the whole text. Many applications require looking for approximate substrings. One of these is the problem of jumbled pattern matching, where two strings match if one is a permutation of the other. We study combinatorial aspects of prefix normal words, a class of binary words introduced in this context. These words can be used as indices for the Indexed Binary Jumbled Pattern Matching problem. We present a new recursive generation algorithm for prefix normal words that is competitive with the previous one but also allows listing all prefix normal words sharing the same prefix. This sheds light on insights that may help in counting the number of prefix normal words of a given length. We then introduce infinite prefix normal words, and we show that one of the operations used by the algorithm, when repeatedly applied to extend a word, produces an infinite prefix normal word. This motivates the search for other operations that produce infinite prefix normal words. We found that one such operation establishes a connection between prefix normal words and Sturmian words. We also explored the relationship between prefix normal words and Abelian complexity, as well as between prefix normal words and lexicographic order.

    Mining refers to the way in which information is converted into knowledge. The process of knowledge discovery covers several processing steps, including knowledge extraction. We analyze the problem of mining assertions for an embedded system from its simulation traces. This problem can be modeled as a pattern discovery problem on colored strings. We consider two variants of pattern discovery on colored strings: patterns for one color only, or for all colors at the same time. We present two suffix-tree-based algorithms. The first algorithm solves both the one-color problem and the all-colors problem. We then introduce modifications which improve the performance of the algorithm on both synthetic and real data. We implemented and evaluated the proposed approaches, highlighting the time trade-offs that can be obtained. A different way of extracting knowledge is based on the information-theoretic perspective of Pearl's model of causality. It has been postulated that the true causality direction between two phenomena A and B is related to the problem of finding the minimum entropy joint distribution of A and B. This problem is known to be NP-hard, and greedy algorithms have recently been proposed. We provide a novel analysis of one of the proposed heuristics, showing that this algorithm guarantees an additive approximation of 1 bit. We then provide a general criterion for guaranteeing an additive approximation factor of 1; this criterion may be of independent interest in other contexts where couplings are used.
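
    To make the coding part concrete: an LZ77 parsing represents a string as a sequence of phrases, each either a literal character or a reference into the already-decoded text. A naive Python decoder under a phrase encoding of our own choosing (unlike the Bille et al. algorithm studied in the thesis, this one materializes the whole output):

        def lz77_decode(phrases):
            """Decode a list of phrases: ('lit', c) appends the literal
            character c; ('copy', pos, length) copies `length` characters
            starting at absolute position `pos` of the decoded text.
            Copies may overlap their own output, hence the per-character loop.
            """
            out = []
            for phrase in phrases:
                if phrase[0] == "lit":
                    out.append(phrase[1])
                else:
                    _, pos, length = phrase
                    for j in range(length):
                        out.append(out[pos + j])
            return "".join(out)

        # 'a', 'b', then copy 5 characters starting at position 0:
        print(lz77_decode([("lit", "a"), ("lit", "b"), ("copy", 0, 5)]))  # abababa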
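
    For the mining part, the greedy heuristics mentioned above repeatedly pair the largest remaining probability masses of the two marginals; a hedged Python reconstruction of this idea (the exact minimum entropy coupling is NP-hard, so this is only a heuristic):

        import heapq
        import math

        def greedy_coupling(p, q):
            """Build a joint distribution with marginals p and q by
            repeatedly matching the largest remaining mass of each marginal
            and assigning their minimum as a joint probability."""
            hp = [(-x, i) for i, x in enumerate(p)]  # max-heaps via negation
            hq = [(-y, j) for j, y in enumerate(q)]
            heapq.heapify(hp)
            heapq.heapify(hq)
            joint = {}
            while hp and hq:
                x, i = heapq.heappop(hp)
                y, j = heapq.heappop(hq)
                m = min(-x, -y)
                joint[(i, j)] = joint.get((i, j), 0.0) + m
                if -x - m > 1e-12:  # push back any leftover mass
                    heapq.heappush(hp, (x + m, i))
                if -y - m > 1e-12:
                    heapq.heappush(hq, (y + m, j))
            return joint

        def entropy(dist):
            return -sum(v * math.log2(v) for v in dist.values() if v > 0)

        joint = greedy_coupling([0.5, 0.5], [0.75, 0.25])
        print(joint, entropy(joint))  # entropy 1.5 bits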

    The Telecommunications and Data Acquisition Report

    Get PDF
    Deep Space Network (DSN) progress in flight project support, tracking and data acquisition research and technology, network engineering, hardware and software implementation, and operations is discussed. In addition, developments in Earth-based radio technology as applied to geodynamics, astrophysics, and the radio search for extraterrestrial intelligence are reported.

    Theoretical and Numerical Approaches to Co-/Sparse Recovery in Discrete Tomography

    Get PDF
    We investigate theoretical and numerical results that guarantee the exact reconstruction of piecewise constant images from insufficient projections in discrete tomography. This situation often arises in the non-destructive quality inspection of industrial objects made of few homogeneous materials, where fast scanning times do not allow for full sampling. The low number of projections yields an underdetermined linear system of equations. We restrict the solution space by requiring that solutions (a) possess a sparse image gradient and (b) have constrained pixel values. To that end, we develop a lower bound, using compressed sensing theory, on the number of measurements required to uniquely recover, by convex programming, an image in our constrained setting. We also develop a second bound, in the non-convex setting, whose novelty is to use the number of connected components when bounding the number of linear measurements needed for unique reconstruction. Having established theoretical lower bounds on the number of required measurements, we then examine several optimization models that enforce sparse gradients or restrict the image domain. We provide a novel convex relaxation that is provably tighter than existing models, assuming the target image to be gradient-sparse and integer-valued. Given that the number of connected components in an image is critical for unique reconstruction, we provide an integer programming model that restricts the maximum number of connected components in the reconstructed image. When solving the convex models, we view the image domain as a manifold and use tools from differential geometry and optimization on manifolds to develop a first-order multilevel optimization algorithm. The developed multilevel algorithm exhibits fast convergence and enables us to recover images of higher resolution.
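
    To make the recovery model concrete, here is a LaTeX sketch of a typical convex formulation of this kind (our own notation, and an assumption about the exact model, which varies across the thesis): with projection matrix $A$, measurement vector $b$, and discrete gradient operator $\nabla$,

        \min_{u \in \mathbb{R}^n} \; \|\nabla u\|_1
        \quad \text{subject to} \quad A u = b, \qquad 0 \le u_i \le 1 \;\; \text{for all } i,

    where the $\ell_1$ gradient term promotes piecewise constant solutions and the box constraints encode the restricted range of pixel values.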

    Approximate text generation from non-hierarchical representations in a declarative framework

    Get PDF
    This thesis is on Natural Language Generation. It describes a linguistic realisation system that translates the semantic information encoded in a conceptual graph into an English sentence. The use of a non-hierarchically structured semantic representation (conceptual graphs) and of approximate matching between semantic structures allows us to investigate a more general version of the sentence generation problem, in which one is not pre-committed to a choice of the syntactically prominent elements in the initial semantics. We show how the semantic structure is declaratively related to a linguistically motivated syntactic representation; we use D-Tree Grammars, which stem from work on Tree-Adjoining Grammars. The declarative specification of the mapping between semantics and syntax allows different processing strategies to be exploited. A number of generation strategies have been considered: a pure top-down strategy, and a chart-based generation technique which allows partially successful computations to be reused in other branches of the search space. Having a generator with increased paraphrasing power, as a consequence of using non-hierarchical input and approximate matching, raises the question of whether certain 'better' paraphrases can be generated before others. We investigate preference-based processing in the context of generation.

    Algorithms for Integer Programming and Allocation

    Get PDF
    The first part of the thesis contains pseudo-polynomial algorithms for integer linear programs (ILPs). When certain parameters of an ILP are fixed, that is, treated as constants in the running time, it is possible to obtain algorithms whose running time is pseudo-polynomial in the entries of the ILP's matrix. We present a tight pseudo-polynomial running time for ILPs with a constant number of constraints. Furthermore, we study an extension of this model to MILPs (linear programs that contain both fractional and integer variables). We then move to n-fold ILPs, a class of ILPs with block-structured matrices. We present the first algorithm for n-folds that is near-linear in the dimensions of the ILP. The second part is about scheduling in non-identical machine models, more precisely, restricted allocation problems. Here a set of jobs has to be allocated to a set of machines; however, every job comes with a subset of machines and may only be assigned to a machine from this subset. We consider the objectives of minimizing the makespan and of maximizing the minimum load. We study the integrality gap of a particularly strong linear programming relaxation, the configuration LP, for variations of this problem. The integrality gap can be seen as a measure of the strength of an LP relaxation, and a local search technique can be used to bound it. However, the proofs are generally non-constructive, i.e., they do not directly yield an efficient approximation algorithm. We derive better upper bounds on the integrality gap of the problems Restricted Assignment, Restricted Santa Claus, and Graph Balancing. Furthermore, we give the first (constructive) quasi-polynomial time approximation algorithm for Restricted Assignment with an approximation ratio strictly less than 2.
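
    For reference, the configuration LP mentioned above can be sketched as follows for Restricted Assignment with makespan bound $T$ (a standard formulation written in our own notation): with $\mathcal{C}_m(T)$ the family of job sets that may run on machine $m$ with total size at most $T$, one asks for $x \ge 0$ such that

        \sum_{C \in \mathcal{C}_m(T)} x_{m,C} \le 1 \quad \text{for every machine } m,
        \qquad
        \sum_{m} \sum_{C \in \mathcal{C}_m(T) : j \in C} x_{m,C} \ge 1 \quad \text{for every job } j.

    The integrality gap is then the worst-case ratio between the optimal makespan and the smallest $T$ for which this LP is feasible.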

    Notes on the combinatorial fundamentals of algebra

    Full text link
    This is a detailed survey, with rigorous and self-contained proofs, of some of the basics of elementary combinatorics and algebra, including the properties of finite sums, binomial coefficients, permutations and determinants. It is entirely expository (and written to a large extent as a repository for folklore proofs); no new results (and few, if any, new proofs) appear.
    Comment: 1360 pages. v2 corrects typos and adds Exercises 6.62-6.64. Not a textbook; rather a repository of proofs I could cite. Posted here for easier referencing (and long-term archival). This project is tracked on https://github.com/darijgr/detnote