
    Parameterized Strings: Algorithms and Data Structures

    A parameterized string (p-string) T = T[1] T[2]...T[n] is a sophisticated string of length n composed of symbols from a constant alphabet Sigma and a parameter alphabet Pi. Given a pair of p-strings S and T, the parameterized pattern matching (p-match) problem is to verify whether the individual constant symbols match and whether there exists a bijection between the parameter symbols of S and T. If the two conditions are met, S is said to be a p-match of T. A significant breakthrough in the p-match area is the prev encoding, which is proven to identify a p-match between S and T if and only if prev(S) == prev(T). In order to utilize suffix data structures in terms of p-matching, we must account for the dynamic nature of the parameterized suffixes (p-suffixes) of T, namely prev(T[i...n]) ∀ i, 1 ≤ i ≤ n.
    In this work, we propose transformative approaches to the direct parameterized suffix sorting (p-suffix sorting) problem by generating and sorting lexicographically numeric fingerprints and arithmetic codes that correspond to individual p-suffixes. Our algorithm to p-suffix sort via fingerprints is the first theoretical linear time algorithm for p-suffix sorting for non-binary parameter alphabets, which assumes that each code is represented by a practical integer. We eliminate the key problems of fingerprints by introducing an algorithm that exploits the ordering of arithmetic codes to sort p-suffixes in linear time on average.
    The longest previous factor (LPF) problem is defined for traditional strings exclusively from the constant alphabet Sigma. We generalize the LPF problem to the parameterized longest previous factor (pLPF) problem defined for p-strings. Subsequently, we present a linear time solution to construct the pLPF array. Given our pLPF algorithm, we show how to construct the pLCP (parameterized longest common prefix) array in linear time. Our algorithm is further exploited to construct the standard LPF and LCP arrays all in linear time.
    We then study the structural string (s-string), a variant of the p-string that extends the p-string alphabets to include complementary parameters that correspond to one another. The s-string problem involves the new encoding schemes sencode and compl in order to identify a structural match (s-match). Current s-match solutions use a structural suffix tree (s-suffix tree) to study structural matches in RNA sequences. We introduce the suffix array, LCP, and LPF data structures for the s-string encoding schemes. Using our new data structures, we identify the first suffix array solution to the s-match problem. Our algorithms and data structures are shown to apply to s-strings and also p-strings and traditional strings.
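
    The abstract above relies on the prev encoding without defining it. As a minimal sketch, assuming the commonly used definition (each parameter symbol becomes 0 at its first occurrence and otherwise the distance back to its previous occurrence, while constants are left unchanged), the test prev(S) == prev(T) could look as follows. The function names `prev_encode` and `p_match`, and the choice to pass the parameter alphabet as a Python set, are illustrative assumptions and not taken from the dissertation.

```python
def prev_encode(s, params):
    """Return a prev-style encoding of the p-string s.

    Assumption: parameter symbols (members of `params`) are replaced by 0
    at their first occurrence and by the distance to their previous
    occurrence afterwards; constant symbols are copied unchanged.
    """
    last_seen = {}   # parameter symbol -> index of its latest occurrence
    encoded = []
    for i, symbol in enumerate(s):
        if symbol in params:
            encoded.append(i - last_seen[symbol] if symbol in last_seen else 0)
            last_seen[symbol] = i
        else:
            encoded.append(symbol)
    return encoded


def p_match(s, t, params):
    """Check for a p-match by comparing the two prev encodings."""
    return len(s) == len(t) and prev_encode(s, params) == prev_encode(t, params)
```

    For example, with constants a, b and parameters x, y, z, the strings "axbxy" and "aybyz" share the same encoding and therefore p-match, whereas "axbxy" and "axbyx" do not. Encoding a suffix T[i...n] separately, rather than slicing the encoding of T, reflects the dynamic nature of p-suffixes that the abstract mentions.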

    Parameterized Strings: Algorithms and Applications

    The parameterized string (p-string), a generalization of the traditional string, is composed of constant and parameter symbols. A parameterized match (p-match) exists between two p-strings if the constants match exactly and there exists a bijection between the parameter symbols. Historically, p-strings have been employed in source code cloning, plagiarism detection, and structural similarity between biological sequences. By handling the intricacies of the parameterized suffix, we can efficiently address complex applications with data structures also reusable in traditional matching scenarios. In this dissertation, we extend data structures for p-strings (and variants) to address sophisticated string computations.
    We introduce a taxonomy of classes for longest factor problems. Using this taxonomy, we show an interesting connection between the parameterized longest previous factor (pLPF) and familiar data structures in string theory, including the border array, prefix array, longest common prefix array, and analogous p-string data structures. Exploiting this connection, we construct a multitude of data structures using the same general pLPF framework.
    Before this dissertation, the p-match was defined predominantly by the matching between uncompressed p-strings. Here, we introduce the compressed parameterized pattern match to find all p-matches between a pattern and a text, using only the pattern and a compressed form of the text. We present parameterized compression (p-compression) as a new way to losslessly compress data to support p-matching. Experimentally, it is shown that p-compression is competitive with standard compression schemes. Using p-compression, we address the compressed p-match independent of the underlying compression routine.
    Currently, p-string theory lacks the capability to support indeterminate symbols, an essential feature for applications involving inexact matching such as music analysis. In this work, we propose and efficiently address two new types of p-matching with indeterminate symbols. (1) We introduce the indeterminate parameterized match (ip-match) to permit matching with indeterminate holes in a p-string. We support the ip-match by introducing data structures that extend the prefix array. (2) From a different perspective, the equivalence parameterized match (e-match) evolves the p-match to consider intra-alphabet symbol classes as equivalence classes. We propose a method to perform the e-match using the p-string suffix array framework, i.e. the parameterized suffix array (pSA) and parameterized longest common prefix array (pLCP). Historically, direct constructions of the pSA and pLCP have suffered from quadratic time bounds in the worst case. Here, we introduce new p-string theory to efficiently construct the pSA/pLCP and break the theoretical worst-case time barrier.
    Biological applications have become a classical use of p-string theory. Here, we introduce the structural border array to provide a lightweight solution to the biologically-oriented variant of the p-match, i.e. the structural match (s-match) on structural strings (s-strings). Following the s-match, we show how to use s-string suffix structures to support various pattern matching problems involving RNA secondary structures. Finally, we propose and construct the forward stem matrix (FSM), a data structure to access RNA stem structures, and we apply the FSM to the detection of hairpins and pseudoknots in an RNA sequence.
    This dissertation advances the state of the art in p-string theory by developing data structures for p-strings/s-strings and using p-string/s-string theory in new and old contexts to address various applications. Due to the flexibility of the p-string/s-string, the data structures and algorithms in this work are also applicable to the myriad of problems in the string community that involve traditional strings.

    Polynomial fixed-parameter algorithms: a case study for longest path on interval graphs

    We study the design of fixed-parameter algorithms for problems already known to be solvable in polynomial time. The main motivation is to get more efficient algorithms for problems with unattractive polynomial running times. Here, we focus on a fundamental graph problem: Longest Path; it is NP-hard in general but known to be solvable in O(n^4) time on n-vertex interval graphs. We show how to solve Longest Path on Interval Graphs, parameterized by vertex deletion number k to proper interval graphs, in O(k^9n) time. Notably, Longest Path is trivially solvable in linear time on proper interval graphs, and the parameter value k can be approximated up to a factor of 4 in linear time. From a more general perspective, we believe that using parameterized complexity analysis for polynomial-time solvable problems offers a very fertile ground for future studies for all sorts of algorithmic problems. It may enable a refined understanding of efficiency aspects for polynomial-time solvable problems, similarly to what classical parameterized complexity analysis does for NP-hard problems.

    Lossy Kernelization

    In this paper we propose a new framework for analyzing the performance of preprocessing algorithms. Our framework builds on the notion of kernelization from parameterized complexity. However, as opposed to the original notion of kernelization, our definitions combine well with approximation algorithms and heuristics. The key new definition is that of a polynomial size α-approximate kernel. Loosely speaking, a polynomial size α-approximate kernel is a polynomial time pre-processing algorithm that takes as input an instance (I,k) to a parameterized problem, and outputs another instance (I′,k′) to the same problem, such that |I′| + k′ ≤ k^{O(1)}. Additionally, for every c ≥ 1, a c-approximate solution s′ to the pre-processed instance (I′,k′) can be turned in polynomial time into a (c·α)-approximate solution s to the original instance (I,k). Our main technical contributions are α-approximate kernels of polynomial size for three problems, namely Connected Vertex Cover, Disjoint Cycle Packing and Disjoint Factors. These problems are known not to admit any polynomial size kernels unless NP ⊆ coNP/poly. Our approximate kernels simultaneously beat both the lower bounds on the (normal) kernel size, and the hardness of approximation lower bounds for all three problems. On the negative side we prove that Longest Path parameterized by the length of the path and Set Cover parameterized by the universe size do not admit even an α-approximate kernel of polynomial size, for any α ≥ 1, unless NP ⊆ coNP/poly. In order to prove this lower bound we need to combine in a non-trivial way the techniques used for showing kernelization lower bounds with the methods for showing hardness of approximation.
    Comment: 58 pages. Version 2 contains new results: PSAKS for Cycle Packing and approximate kernel lower bounds for Set Cover and Hitting Set parameterized by universe size.

    Parameterized and approximation complexity of the detection pair problem in graphs

    We study the complexity of the problem DETECTION PAIR. A detection pair of a graph G is a pair (W,L) of sets of detectors with W ⊆ V(G), the watchers, and L ⊆ V(G), the listeners, such that for every pair u,v of vertices that are not dominated by a watcher of W, there is a listener of L whose distances to u and to v are different. The goal is to minimize |W| + |L|. This problem generalizes the two classic problems DOMINATING SET and METRIC DIMENSION, that correspond to the restrictions L = ∅ and W = ∅, respectively. DETECTION PAIR was recently introduced by Finbow, Hartnell and Young [A. S. Finbow, B. L. Hartnell and J. R. Young. The complexity of monitoring a network with both watchers and listeners. Manuscript, 2015], who proved it to be NP-complete on trees, a surprising result given that both DOMINATING SET and METRIC DIMENSION are known to be linear-time solvable on trees. It follows from an existing reduction by Hartung and Nichterlein for METRIC DIMENSION that even on bipartite subcubic graphs of arbitrarily large girth, DETECTION PAIR is NP-hard to approximate within a sub-logarithmic factor and W[2]-hard (when parameterized by solution size). We show, using a reduction to SET COVER, that DETECTION PAIR is approximable within a factor logarithmic in the number of vertices of the input graph. Our two main results are a linear-time 2-approximation algorithm and an FPT algorithm for DETECTION PAIR on trees.
    Comment: 13 pages
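
    The definition of a detection pair can be checked directly for a given candidate (W,L). The following is only a verification sketch, not any of the algorithms from the paper. It assumes an unweighted graph given as an adjacency dictionary, reads "dominated by a watcher" as "is a watcher or adjacent to one", and reads "pair u,v" as a pair of distinct undominated vertices. The names `bfs_distances` and `is_detection_pair` are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def is_detection_pair(adj, watchers, listeners):
    """Check the detection-pair condition for (watchers, listeners).

    Assumption: a vertex is dominated if it is a watcher or a neighbour
    of a watcher; every pair of distinct undominated vertices must be
    separated by some listener, i.e. have different distances to it.
    """
    dominated = set(watchers)
    for w in watchers:
        dominated.update(adj[w])
    undominated = [v for v in adj if v not in dominated]
    dist = {l: bfs_distances(adj, l) for l in listeners}
    return all(
        any(dist[l].get(u) != dist[l].get(v) for l in listeners)
        for u, v in combinations(undominated, 2)
    )
```

    On the path 0-1-2-3-4, a single listener at an endpoint separates all vertices, whereas a single listener at the middle vertex 2 cannot separate vertices 1 and 3. Adding a watcher at vertex 1 dominates 0, 1 and 2, after which the listener at 2 separates the remaining vertices 3 and 4.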

    A Logical Framework for Reputation Systems

    Reputation systems are meta systems that record, aggregate and distribute information about the past behaviour of principals in an application. Typically, these applications are large-scale open distributed systems where principals are virtually anonymous, and (a priori) have no knowledge about the trustworthiness of each other. Reputation systems serve two primary purposes: helping principals decide whom to trust, and providing an incentive for principals to behave well. A logical policy-based framework for reputation systems is presented. In the framework, principals specify policies which state precise requirements on the past behaviour of other principals that must be fulfilled in order for interaction to take place. The framework consists of a formal model of behaviour, based on event structures; a declarative logical language for specifying properties of past behaviour; and efficient dynamic algorithms for checking whether a particular behaviour satisfies a property from the language. It is shown how the framework can be extended in several ways, most notably to encompass parameterized events and quantification over parameters. In an extended application, it is illustrated how the framework can be applied for dynamic history-based access control for safe execution of unknown and untrusted programs.