    New Algorithms for Position Heaps

    We present several results about position heaps, a relatively new alternative to suffix trees and suffix arrays. First, we show that, if we limit the maximum length of patterns to be sought, then we can also limit the height of the heap and reduce the worst-case cost of insertions and deletions. Second, we show how to build a position heap in linear time independent of the size of the alphabet. Third, we show how to augment a position heap such that it supports access to the corresponding suffix array, and vice versa. Fourth, we introduce a variant of a position heap that can be simulated efficiently by a compressed suffix array with a linear number of extra bits.

    Efficient Seeds Computation Revisited

    The notion of a cover is a generalization of a period of a string, and there are linear-time algorithms for finding the shortest cover. The seed is a more complicated generalization of periodicity: it is a cover of a superstring of a given string, and the shortest seed problem is of much higher algorithmic difficulty. The problem is not well understood, and no linear-time algorithm is known. In the paper we give linear-time algorithms for some of its versions: computing the shortest left-seed array, computing the longest left-seed array, and checking for seeds of a given length. The algorithm for the last problem is used to compute the seed array of a string (i.e., the shortest seeds for all the prefixes of the string) in $O(n^2)$ time. We also describe a simpler alternative algorithm that efficiently computes the shortest seeds. As a by-product we obtain an $O(n\log(n/m))$ time algorithm checking if the shortest seed has length at least $m$ and finding the corresponding seed. We also correct some important details missing in the previously known shortest-seed algorithm (Iliopoulos et al., 1996). Comment: 14 pages, accepted to CPM 201
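
    To make the definition of a cover concrete, here is a minimal sketch that checks the definition directly on an uncompressed string. It is a naive, superlinear illustration only, not the linear-time method the abstract refers to, and it does not handle seeds. The function names `is_cover` and `shortest_cover` are hypothetical.

```python
def is_cover(w: str, s: str) -> bool:
    """Return True if every position of s lies inside some occurrence of w."""
    if not w or len(w) > len(s):
        return False
    covered_up_to = 0  # positions [0, covered_up_to) are covered so far
    start = s.find(w)
    while start != -1:
        if start > covered_up_to:
            return False  # a gap: some position is not inside any occurrence
        covered_up_to = start + len(w)
        start = s.find(w, start + 1)  # allow overlapping occurrences
    return covered_up_to == len(s)


def shortest_cover(s: str) -> str:
    """Naive search: a cover must be a prefix of s, so try prefixes by length."""
    for k in range(1, len(s) + 1):
        if is_cover(s[:k], s):
            return s[:k]
    return s  # only reached for the empty string


print(shortest_cover("ababa"))  # aba
```

    Every string covers itself, so the loop always returns for a non-empty input.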

    Fast Label Extraction in the CDAWG

    The compact directed acyclic word graph (CDAWG) of a string $T$ of length $n$ takes space proportional just to the number $e$ of right extensions of the maximal repeats of $T$, and it is thus an appealing index for highly repetitive datasets, like collections of genomes from similar species, in which $e$ grows significantly more slowly than $n$. We reduce from $O(m\log\log n)$ to $O(m)$ the time needed to count the number of occurrences of a pattern of length $m$, using an existing data structure that takes an amount of space proportional to the size of the CDAWG. This implies a reduction from $O(m\log\log n+\mathtt{occ})$ to $O(m+\mathtt{occ})$ in the time needed to locate all the $\mathtt{occ}$ occurrences of the pattern. We also reduce from $O(k\log\log n)$ to $O(k)$ the time needed to read the $k$ characters of the label of an edge of the suffix tree of $T$, and we reduce from $O(m\log\log n)$ to $O(m)$ the time needed to compute the matching statistics between a query of length $m$ and $T$, using an existing representation of the suffix tree based on the CDAWG. All such improvements derive from extracting the label of a vertex or of an arc of the CDAWG using a straight-line program induced by the reversed CDAWG. Comment: 16 pages, 1 figure. In proceedings of the 24th International Symposium on String Processing and Information Retrieval (SPIRE 2017). arXiv admin note: text overlap with arXiv:1705.0864

    Efficient LZ78 factorization of grammar compressed text

    We present an efficient algorithm for computing the LZ78 factorization of a text, where the text is represented as a straight line program (SLP), which is a context-free grammar in Chomsky normal form that generates a single string. Given an SLP of size $n$ representing a text $S$ of length $N$, our algorithm computes the LZ78 factorization of $S$ in $O(n\sqrt{N}+m\log N)$ time and $O(n\sqrt{N}+m)$ space, where $m$ is the number of resulting LZ78 factors. We also show how to improve the algorithm so that the $n\sqrt{N}$ term in the time and space complexities becomes either $nL$, where $L$ is the length of the longest LZ78 factor, or $(N - \alpha)$, where $\alpha \geq 0$ is a quantity which depends on the amount of redundancy that the SLP captures with respect to substrings of $S$ of a certain length. Since $m = O(N/\log_\sigma N)$, where $\sigma$ is the alphabet size, the latter is asymptotically at least as fast as a linear-time algorithm which runs on the uncompressed string when $\sigma$ is constant, and can be more efficient when the text is compressible, i.e. when $m$ and $n$ are small. Comment: SPIRE 201
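
    For readers unfamiliar with the LZ78 factorization itself, the following is a minimal sketch of the textbook trie-based procedure on an uncompressed string. It is not the SLP-based algorithm of the abstract; the function name `lz78_factorize` and the dictionary-based trie are assumptions made for illustration.

```python
def lz78_factorize(text: str) -> list[str]:
    """Split text into LZ78 factors: each factor is the longest previous
    factor that matches at the current position, plus one extra character."""
    trie = {}      # (parent_node_id, char) -> child_node_id; node 0 is the root
    factors = []
    node = 0       # current trie node while reading the next factor
    start = 0      # start index of the factor being read
    next_id = 1
    for i, ch in enumerate(text):
        child = trie.get((node, ch))
        if child is not None:
            node = child                  # keep extending along a known factor
        else:
            trie[(node, ch)] = next_id    # new factor = known factor + ch
            next_id += 1
            factors.append(text[start:i + 1])
            node = 0
            start = i + 1
    if start < len(text):
        factors.append(text[start:])      # trailing factor may repeat an earlier one
    return factors


print(lz78_factorize("abababab"))  # ['a', 'b', 'ab', 'aba', 'b']
```

    Concatenating the returned factors always reproduces the input, which is a convenient sanity check.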

    One-variable word equations in linear time

    In this paper we consider word equations with one variable (and arbitrarily many appearances of it). A recent technique of recompression, which is applicable to general word equations, is shown to be suitable also in this case. While in the general case it is non-deterministic, it determinises in the case of one variable, and the obtained running time is $O(n + \#_X \log n)$, where $\#_X$ is the number of appearances of the variable in the equation. This matches the previously best algorithm due to Dąbrowski and Plandowski. Then, using a couple of heuristics as well as a more detailed time analysis, the running time is lowered to $O(n)$ in the RAM model. Unfortunately, no new properties of solutions are shown. Comment: submitted to a journal, general overhaul over the previous version

    The effects of the spontaneous presence of a spouse/partner and others on cardiovascular reactions to an acute psychological challenge

    The presence of supportive others has been associated with attenuated cardiovascular reactivity in the laboratory. The effects of the presence of a spouse and others in a more naturalistic setting have received little attention. Blood pressure and heart rate reactions to mental stress were recorded at home in 1028 married/partnered individuals. For 112 participants, their spouse/partner was present; for 78, at least one other person was present. Women tested with a spouse/partner present showed lower magnitude systolic blood pressure and heart rate reactivity than those tested without. Individuals tested with at least one nonspousal other present also displayed attenuated reactivity. This extends the results of laboratory studies and indicates that the spontaneous presence of others is associated with a reduction in cardiovascular reactivity in an everyday environment; spouse/partner presence would appear to be especially effective for women.

    Compressed Subsequence Matching and Packed Tree Coloring

    We present a new algorithm for subsequence matching in grammar compressed strings. Given a grammar of size $n$ compressing a string of size $N$ and a pattern string of size $m$ over an alphabet of size $\sigma$, our algorithm uses $O(n+\frac{n\sigma}{w})$ space and $O(n+\frac{n\sigma}{w}+m\log N\log w\cdot occ)$ or $O(n+\frac{n\sigma}{w}\log w+m\log N\cdot occ)$ time. Here $w$ is the word size and $occ$ is the number of occurrences of the pattern. Our algorithm uses less space than previous algorithms and is also faster for $occ=o(\frac{n}{\log N})$ occurrences. The algorithm uses a new data structure that allows us to efficiently find the next occurrence of a given character after a given position in a compressed string. This data structure in turn is based on a new data structure for the tree color problem, where the node colors are packed in bit strings. Comment: To appear at CPM '1
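
    The "next occurrence of a character after a position" query mentioned above can be illustrated on an uncompressed string. The sketch below is an assumption-laden toy, not the paper's compressed data structure: it stores a full table of size proportional to the text length times the alphabet size, and the names `build_next_table` and `find_subsequence` are hypothetical. It finds only the leftmost embedding of the pattern as a subsequence.

```python
def build_next_table(text: str) -> list[dict[str, int]]:
    """nxt[i][c] is the smallest j >= i with text[j] == c (absent if none)."""
    n = len(text)
    nxt: list[dict[str, int]] = [dict() for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        nxt[i] = dict(nxt[i + 1])  # inherit answers from the suffix to the right
        nxt[i][text[i]] = i        # text[i] itself is the nearest occurrence
    return nxt


def find_subsequence(text: str, pattern: str):
    """Return positions of the leftmost embedding of pattern in text, or None."""
    nxt = build_next_table(text)
    positions = []
    pos = 0
    for ch in pattern:
        j = nxt[pos].get(ch)       # next occurrence of ch at or after pos
        if j is None:
            return None
        positions.append(j)
        pos = j + 1
    return positions


print(find_subsequence("abracadabra", "ada"))  # [0, 6, 7]
```

    Each pattern character costs one table lookup here; the abstract's contribution is answering the same kind of query without decompressing the string.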

    Challenges and Opportunities: What Can We Learn from Patients Living with Chronic Musculoskeletal Conditions, Health Professionals and Carers about the Concept of Health Literacy Using Qualitative Methods of Inquiry?

    The field of health literacy continues to evolve and concern public health researchers and yet remains a largely overlooked concept elsewhere in the healthcare system. We conducted focus group discussions in England, UK, about the concept of health literacy with older patients with chronic musculoskeletal conditions (mean age = 73.4 years), carers and health professionals. Our research posed methodological, intellectual and practical challenges. Gaps in conceptualisation and expectations were revealed, reiterating deficiencies in predominant models for understanding health literacy and methodological shortcomings of using focus groups in qualitative research for this topic. Building on this unique insight into what the concept of health literacy meant to participants, we present analysis of our findings on factors perceived to foster and inhibit health literacy and on the issue of responsibility in health literacy. Patients saw health literacy as a result of an inconsistent interactive process and the implications as wide ranging; healthcare professionals had more heterogeneous views. All focus group discussants agreed that health literacy most benefited from good inter-personal communication and partnership. By proposing a needs-based approach to health literacy we offer an alternative way of conceptualising health literacy to help improve the health of older people with chronic conditions.