277 research outputs found

    Online Pattern Matching for String Edit Distance with Moves

    Edit distance with moves (EDM) is a string-to-string distance measure that includes substring moves in addition to ordinary editing operations to turn one string into the other. Although optimizing EDM is intractable, it has many applications, especially in error detection. Edit sensitive parsing (ESP) is an efficient parsing algorithm that guarantees an upper bound on parsing discrepancies between different appearances of the same substring in a string. ESP can be used for computing an approximate EDM as the L1 distance between characteristic vectors built from node labels in parse trees. However, ESP is not applicable to streaming text data, where the whole text is unknown in advance. We present an online ESP (OESP) that enables online pattern matching for EDM. OESP builds a parse tree for a streaming text and computes the L1 distance between characteristic vectors in an online manner. For space-efficient computation of EDM, OESP directly encodes the parse tree into a succinct representation by leveraging the idea behind recent results on dynamic succinct trees. We experimentally test OESP's ability to compute EDM in an online manner on benchmark datasets, and we show OESP's efficiency. Comment: This paper has been accepted to the 21st edition of the International Symposium on String Processing and Information Retrieval (SPIRE2014)
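
    The L1 distance between characteristic vectors mentioned in this abstract can be illustrated with a minimal sketch. It assumes the parse-tree node labels have already been produced by some parser and are given as plain lists; it does not implement ESP or OESP, and the function name `l1_distance` is hypothetical.

```python
from collections import Counter


def l1_distance(labels_a, labels_b):
    """L1 distance between the label-count vectors of two parse trees.

    labels_a, labels_b: iterables of node labels (assumed to be given).
    """
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    # Sum the absolute count differences over every label seen in either tree.
    all_labels = count_a.keys() | count_b.keys()
    return sum(abs(count_a[x] - count_b[x]) for x in all_labels)
```

    For example, the label lists `["X1", "X2", "X2", "a"]` and `["X1", "X2", "b"]` differ by one `X2`, one `a`, and one `b`.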

    Fast Searching in Packed Strings

    Given strings $P$ and $Q$, the (exact) string matching problem is to find all positions of substrings in $Q$ matching $P$. The classical Knuth-Morris-Pratt algorithm [SIAM J. Comput., 1977] solves the string matching problem in linear time, which is optimal if we can only read one character at a time. However, most strings are stored in a computer in a packed representation with several characters in a single word, giving us the opportunity to read multiple characters simultaneously. In this paper we study the worst-case complexity of string matching on strings given in packed representation. Let $m \leq n$ be the lengths of $P$ and $Q$, respectively, and let $\sigma$ denote the size of the alphabet. On a standard unit-cost word-RAM with logarithmic word size we present an algorithm using time $O\left(\frac{n}{\log_\sigma n} + m + \mathrm{occ}\right)$. Here $\mathrm{occ}$ is the number of occurrences of $P$ in $Q$. For $m = o(n)$ this improves the $O(n)$ bound of the Knuth-Morris-Pratt algorithm. Furthermore, if $m = O(n/\log_\sigma n)$ our algorithm is optimal since any algorithm must spend at least $\Omega\left(\frac{(n+m)\log \sigma}{\log n} + \mathrm{occ}\right) = \Omega\left(\frac{n}{\log_\sigma n} + \mathrm{occ}\right)$ time to read the input and report all occurrences. The result is obtained by a novel automaton construction based on the Knuth-Morris-Pratt algorithm combined with a new compact representation of subautomata allowing an optimal tabulation-based simulation. Comment: To appear in Journal of Discrete Algorithms. Special Issue on CPM 200

    Efficient LZ78 factorization of grammar compressed text

    We present an efficient algorithm for computing the LZ78 factorization of a text, where the text is represented as a straight line program (SLP), which is a context-free grammar in Chomsky normal form that generates a single string. Given an SLP of size $n$ representing a text $S$ of length $N$, our algorithm computes the LZ78 factorization of $S$ in $O(n\sqrt{N} + m\log N)$ time and $O(n\sqrt{N} + m)$ space, where $m$ is the number of resulting LZ78 factors. We also show how to improve the algorithm so that the $n\sqrt{N}$ term in the time and space complexities becomes either $nL$, where $L$ is the length of the longest LZ78 factor, or $(N - \alpha)$, where $\alpha \geq 0$ is a quantity which depends on the amount of redundancy that the SLP captures with respect to substrings of $S$ of a certain length. Since $m = O(N/\log_\sigma N)$, where $\sigma$ is the alphabet size, the latter is asymptotically at least as fast as a linear time algorithm which runs on the uncompressed string when $\sigma$ is constant, and can be more efficient when the text is compressible, i.e. when $m$ and $n$ are small. Comment: SPIRE 201
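
    For orientation, the LZ78 factorization itself can be sketched on an uncompressed string. This is the textbook dictionary-based procedure, not the SLP-based algorithm of the abstract; the function name `lz78_factorize` and the tuple-keyed trie are illustrative assumptions.

```python
def lz78_factorize(text):
    """Return the LZ78 factors of `text` as a list of substrings.

    Each factor is the longest previously seen factor plus one new character,
    except possibly the last one, which may repeat an earlier factor.
    """
    trie = {}      # (parent phrase id, char) -> phrase id; id 0 is the empty phrase
    factors = []
    node = 0       # phrase id matched so far for the current factor
    start = 0      # start position of the current factor
    for i, ch in enumerate(text):
        nxt = trie.get((node, ch))
        if nxt is not None:
            node = nxt                        # still inside a known phrase
        else:
            trie[(node, ch)] = len(trie) + 1  # register known phrase + ch
            factors.append(text[start:i + 1])
            node = 0
            start = i + 1
    if start < len(text):
        factors.append(text[start:])          # leftover suffix is a known phrase
    return factors
```

    On `"abababab"` this yields the factors `a`, `b`, `ab`, `aba`, `b`, whose concatenation is the input.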

    Fingerprints in Compressed Strings

    The Karp-Rabin fingerprint of a string is a type of hash value that, due to its strong properties, has been used in many string algorithms. In this paper we show how to construct a data structure for a string S of size N compressed by a context-free grammar of size n that answers fingerprint queries. That is, given indices i and j, the answer to a query is the fingerprint of the substring S[i,j]. We present the first O(n) space data structures that answer fingerprint queries without decompressing any characters. For Straight Line Programs (SLP) we get O(log N) query time, and for Linear SLPs (an SLP derivative that captures LZ78 compression and its variations) we get O(log log N) query time. Hence, our data structures have the same time and space complexity as for random access in SLPs. We utilize the fingerprint data structures to solve the longest common extension problem in query time O(log N log l) and O(log l log log l + log log N) for SLPs and Linear SLPs, respectively. Here, l denotes the length of the LCE.
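
    A fingerprint query on S[i,j] can be illustrated on an uncompressed string with a table of prefix fingerprints. This sketch uses O(N) space rather than the O(n) grammar-based structures of the abstract; the class name, the fixed `BASE`, and the modulus `MOD` are assumptions chosen for illustration (a real Karp-Rabin scheme would typically pick the base at random).

```python
MOD = (1 << 61) - 1   # assumed modulus: the Mersenne prime 2^61 - 1
BASE = 1_000_003      # assumed fixed base, for reproducibility only


class PrefixFingerprints:
    """Answers fingerprint queries on substrings of an uncompressed string."""

    def __init__(self, s):
        self.prefix = [0]   # prefix[k] = fingerprint of s[0..k-1]
        self.power = [1]    # power[k]  = BASE^k mod MOD
        for ch in s:
            self.prefix.append((self.prefix[-1] * BASE + ord(ch)) % MOD)
            self.power.append((self.power[-1] * BASE) % MOD)

    def fingerprint(self, i, j):
        """Fingerprint of s[i..j], with 0-based inclusive indices."""
        length = j - i + 1
        # Remove the contribution of s[0..i-1], shifted by the substring length.
        return (self.prefix[j + 1] - self.prefix[i] * self.power[length]) % MOD
```

    Equal substrings always receive equal fingerprints, so in `"abracadabra"` the two occurrences of `abra` at positions 0..3 and 7..10 compare equal.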

    The Nexus of Political Violence and Economic Deprivation: Pakistani Migrants Disrupt the Refugee / Migrant Dichotomy

    There have been discussions about how the labels “forced migrants,” related to political violence, and “voluntary migrants,” associated with economic factors, cannot be understood in categorical ways. However, there has been less focus on the specificities of the asylum-migrant nexus from the perspective of migrants. This essay discusses how such factors intersect as understood by Pakistani migrants residing in Germany. Through enacting a critical view of Pakistan, the migrants demonstrate how aspects of corruption, economic deprivation, and political violence come to intersect so that it becomes impossible to classify asylum seekers in binary/dichotomous ways.

    Cryptosporidium Priming Is More Effective than Vaccine for Protection against Cryptosporidiosis in a Murine Protein Malnutrition Model

    Cryptosporidium is a major cause of severe diarrhea, especially in malnourished children. Using a murine model of C. parvum oocyst challenge that recapitulates clinical features of severe cryptosporidiosis during malnutrition, we interrogated the effect of protein malnutrition (PM) on primary and secondary responses to C. parvum challenge, and tested the differential ability of mucosal priming strategies to overcome the PM-induced susceptibility. We determined that while PM fundamentally alters systemic and mucosal primary immune responses to Cryptosporidium, priming with C. parvum (10^6 oocysts) provides robust protective immunity against re-challenge despite ongoing PM. C. parvum priming restores mucosal Th1-type effectors (CD3+CD8+CD103+ T-cells) and cytokines (IFNγ and IL12p40) that otherwise decrease with ongoing PM. Vaccination strategies with Cryptosporidium antigens expressed in the S. Typhi vector 908htr, however, do not enhance Th1-type responses to C. parvum challenge during PM, even though vaccination strongly boosts immunity in challenged fully nourished hosts. Remote non-specific exposures to the attenuated S. Typhi vector alone or the TLR9 agonist CpG ODN-1668 can partially attenuate C. parvum severity during PM, but neither as effectively as viable C. parvum priming. We conclude that although PM interferes with basal and vaccine-boosted immune responses to C. parvum, sustained reductions in disease severity are possible through mucosal activators of host defenses, and specifically C. parvum priming can elicit impressively robust Th1-type protective immunity despite ongoing protein malnutrition. These findings add insight into potential correlates of Cryptosporidium immunity and future vaccine strategies in malnourished children.

    How generalist are these forest specialists? What Sweden's avian indicators indicate

    Monitoring of forest biodiversity and habitats is an important part of forest conservation, but due to the impossible task of monitoring all species, indicator species are frequently used. However, reliance on an incorrect indicator of valuable habitat can reduce the efficiency of conservation efforts. Birds are often used as indicators as they are charismatic, relatively easy to survey, and because we often have knowledge of their habitat and resource requirements. In the Swedish government's environmental quality goals, there are a number of bird species identified as being associated with 'older' and 'high natural value' forests. Here we evaluate the occurrence of four of these indicator species using data from 91 production forest stands and 10 forest reserves in southern Sweden. The bird species assessed are willow tit Poecile montanus, coal tit Periparus ater, European crested tit Lophophanes cristatus and Eurasian treecreeper Certhia familiaris. For the production stands assessed, these indicator species exhibited no significant preferences regarding forest composition and structure, indicating a wider range of habitat associations than expected. These species frequently showed territorial behavior in forest stands <60 and even 40 years of age, much younger than the 120-year threshold for 'older forest' as defined by governmental environmental goals. As almost 80% of the production stands >= 10 years old included at least one of the four indicator species, this raises questions regarding the suitability of these species as indicators of forests of high conservation value in southern Sweden. Notably, besides the four species assessed here, none of the additional indicator taxa identified by the government were recorded in the 10 reserves. This outcome may reflect the difficulties involved in finding bird indicator species indicative of high natural values in this region. Our results highlight the importance of coupling bird surveys with quantified assessments of proximate vegetation cover.