17 research outputs found
Circular pattern matching with k mismatches
The k-mismatch problem consists in computing the Hamming distance between a pattern P of length m and every length-m substring of a text T of length n, if this distance is no more than k. In many real-world applications, any cyclic shift of P is a relevant pattern, and thus one is interested in computing the minimal distance of every length-m substring of T and any cyclic shift of P. This is the circular pattern m
Circular pattern matching with k mismatches
We consider the circular pattern matching with k mismatches (k-CPM) problem in which one is to compute the minimal Hamming distance of every length-m substring of T and any cyclic rotation of P, if this distance is no more than k. It is a variation of the well-studied k-mismatch problem. A multitude of papers has been devoted
Fine-Grained Complexity of Analyzing Compressed Data: Quantifying Improvements over Decompress-And-Solve
Can we analyze data without decompressing it? As our data keeps growing, understanding the time complexity of problems on compressed inputs, rather than in convenient uncompressed forms, becomes more and more relevant. Suppose we are given a compression of size of data that originally has size , and we want to solve a problem with time complexity . The naive strategy of "decompress-and-solve" gives time , whereas "the gold standard" is time : to analyze the compression as efficiently as if the original data was small. We restrict our attention to data in the form of a string (text, files, genomes, etc.) and study the most ubiquitous tasks. While the challenge might seem to depend heavily on the specific compression scheme, most methods of practical relevance (Lempel-Ziv-family, dictionary methods, and others) can be unified under the elegant notion of Grammar Compressions. A vast literature, across many disciplines, established this as an influential notion for Algorithm design. We introduce a framework for proving (conditional) lower bounds in this field, allowing us to assess whether decompress-and-solve can be improved, and by how much. Our main results are: - The bound for LCS and the bound for Pattern Matching with Wildcards are optimal up to factors, under the Strong Exponential Time Hypothesis. (Here, denotes the uncompressed length of the compressed pattern.) - Decompress-and-solve is essentially optimal for Context-Free Grammar Parsing and RNA Folding, under the -Clique conjecture. - We give an algorithm showing that decompress-and-solve is not optimal for Disjointness
Fine-Grained Complexity of Analyzing Compressed Data: Quantifying Improvements over Decompress-And-Solve
Can we analyze data without decompressing it? As our data keeps growing,
understanding the time complexity of problems on compressed inputs, rather than
in convenient uncompressed forms, becomes more and more relevant. Suppose we
are given a compression of size of data that originally has size , and
we want to solve a problem with time complexity . The naive strategy
of "decompress-and-solve" gives time , whereas "the gold standard" is
time : to analyze the compression as efficiently as if the original data
was small.
We restrict our attention to data in the form of a string (text, files,
genomes, etc.) and study the most ubiquitous tasks. While the challenge might
seem to depend heavily on the specific compression scheme, most methods of
practical relevance (Lempel-Ziv-family, dictionary methods, and others) can be
unified under the elegant notion of Grammar Compressions. A vast literature,
across many disciplines, established this as an influential notion for
Algorithm design.
We introduce a framework for proving (conditional) lower bounds in this
field, allowing us to assess whether decompress-and-solve can be improved, and
by how much. Our main results are:
- The bound for LCS and the
bound for Pattern Matching with Wildcards are optimal up to factors,
under the Strong Exponential Time Hypothesis. (Here, denotes the
uncompressed length of the compressed pattern.)
- Decompress-and-solve is essentially optimal for Context-Free Grammar
Parsing and RNA Folding, under the -Clique conjecture.
- We give an algorithm showing that decompress-and-solve is not optimal for
Disjointness.Comment: Presented at FOCS'17. Full version. 63 page
Faster Approximate Pattern Matching: {A} Unified Approach
Approximate pattern matching is a natural and well-studied problem on strings: Given a text , a pattern , and a threshold , find (the starting positions of) all substrings of that are at distance at most from . We consider the two most fundamental string metrics: the Hamming distance and the edit distance. Under the Hamming distance, we search for substrings of that have at most mismatches with , while under the edit distance, we search for substrings of that can be transformed to with at most edits. Exact occurrences of in have a very simple structure: If we assume for simplicity that and trim so that occurs both as a prefix and as a suffix of , then both and are periodic with a common period. However, an analogous characterization for the structure of occurrences with up to mismatches was proved only recently by Bringmann et al. [SODA'19]: Either there are -mismatch occurrences of in , or both and are at Hamming distance from strings with a common period . We tighten this characterization by showing that there are -mismatch occurrences in the case when the pattern is not (approximately) periodic, and we lift it to the edit distance setting, where we tightly bound the number of -edit occurrences by in the non-periodic case. Our proofs are constructive and let us obtain a unified framework for approximate pattern matching for both considered distances. We showcase the generality of our framework with results for the fully-compressed setting (where and are given as a straight-line program) and for the dynamic setting (where we extend a data structure of Gawrychowski et al. [SODA'18])
Approximating Properties of Data Streams
In this dissertation, we present algorithms that approximate properties in the data stream model, where elements of an underlying data set arrive sequentially, but algorithms must use space sublinear in the size of the underlying data set. We first study the problem of finding all k-periods of a length-n string S, presented as a data stream. S is said to have k-period p if its prefix of length n − p differs from its suffix of length n − p in at most k locations. We give algorithms to compute the k-periods of a string S using poly(k, log n) bits of space and we complement these results with comparable lower bounds. We then study the problem of identifying a longest substring of strings S and T of length n that forms a d-near-alignment under the edit distance, in the simultaneous streaming model. In this model, symbols of strings S and T are streamed at the same time and form a d-near-alignment if the distance between them in some given metric is at most d. We give several algorithms, including an exact one-pass algorithm that uses O(d2 + d log n) bits of space. We then consider the distinct elements and `p-heavy hitters problems in the sliding window model, where only the most recent n elements in the data stream form the underlying set. We first introduce the composable histogram, a simple twist on the exponential (Datar et al., SODA 2002) and smooth histograms (Braverman and Ostrovsky, FOCS 2007) that may be of independent interest. We then show that the composable histogram along with a careful combination of existing techniques to track either the identity or frequency of a few specific items suffices to obtain algorithms for both distinct elements and `p-heavy hitters that is nearly optimal in both n and c. Finally, we consider the problem of estimating the maximum weighted matching of a graph whose edges are revealed in a streaming fashion. We develop a reduction from the maximum weighted matching problem to the maximum cardinality matching problem that only doubles the approximation factor of a streaming algorithm developed for the maximum cardinality matching problem. As an application, we obtain an estimator for the weight of a maximum weighted matching in bounded-arboricity graphs and in particular, a (48 + )-approximation estimator for the weight of a maximum weighted matching in planar graphs
Proceedings of the 26th International Symposium on Theoretical Aspects of Computer Science (STACS'09)
The Symposium on Theoretical Aspects of Computer Science (STACS) is held alternately in France and in Germany. The conference of February 26-28, 2009, held in Freiburg, is the 26th in this series. Previous meetings took place in Paris (1984), Saarbr¨ucken (1985), Orsay (1986), Passau (1987), Bordeaux (1988), Paderborn (1989), Rouen (1990), Hamburg (1991), Cachan (1992), W¨urzburg (1993), Caen (1994), M¨unchen (1995), Grenoble (1996), L¨ubeck (1997), Paris (1998), Trier (1999), Lille (2000), Dresden (2001), Antibes (2002), Berlin (2003), Montpellier (2004), Stuttgart (2005), Marseille (2006), Aachen (2007), and Bordeaux (2008). ..
Técnicas de compresión de imágenes hiperespectrales sobre hardware reconfigurable
Tesis de la Universidad Complutense de Madrid, Facultad de Informática, leída el 18-12-2020Sensors are nowadays in all aspects of human life. When possible, sensors are used remotely. This is less intrusive, avoids interferces in the measuring process, and more convenient for the scientist. One of the most recurrent concerns in the last decades has been sustainability of the planet, and how the changes it is facing can be monitored. Remote sensing of the earth has seen an explosion in activity, with satellites now being launched on a weekly basis to perform remote analysis of the earth, and planes surveying vast areas for closer analysis...Los sensores aparecen hoy en día en todos los aspectos de nuestra vida. Cuando es posible, de manera remota. Esto es menos intrusivo, evita interferencias en el proceso de medida, y además facilita el trabajo científico. Una de las preocupaciones recurrentes en las últimas décadas ha sido la sotenibilidad del planeta, y cómo menitoirzar los cambios a los que se enfrenta. Los estudios remotos de la tierra han visto un gran crecimiento, con satélites lanzados semanalmente para analizar la superficie, y aviones sobrevolando grades áreas para análisis más precisos...Fac. de InformáticaTRUEunpu