    On the maximal sum of exponents of runs in a string

    A run is an inclusion-maximal occurrence in a string (as a subinterval) of a repetition $v$ with a period $p$ such that $2p \le |v|$. The exponent of a run is defined as $|v|/p$ and is $\ge 2$. We show new bounds on the maximal sum of exponents of runs in a string of length $n$. Our upper bound of $4.1n$ is better than the best previously known proven bound of $5.6n$ by Crochemore & Ilie (2008). The lower bound of $2.035n$, obtained using a family of binary words, contradicts the conjecture of Kolpakov & Kucherov (1999) that the maximal sum of exponents of runs in a string of length $n$ is smaller than $2n$. Comment: 7 pages, 1 figure
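    For intuition, the definitions above can be checked on short strings by brute force. The sketch below is an illustration, not the authors' method: it treats a run as a maximal substring whose smallest period p satisfies 2p <= length, finds smallest periods with the KMP prefix function, and sums exponents exactly with fractions. The function names (smallest_period, runs, sum_of_exponents) are our own, and the quadratic scan is only meant for small inputs.

```python
from fractions import Fraction


def smallest_period(w: str) -> int:
    """Smallest period of a non-empty string w, via the KMP prefix (failure) function."""
    pi = [0] * len(w)
    for i in range(1, len(w)):
        k = pi[i - 1]
        while k > 0 and w[i] != w[k]:
            k = pi[k - 1]
        if w[i] == w[k]:
            k += 1
        pi[i] = k
    # The longest proper border has length pi[-1]; the smallest period is len(w) minus it.
    return len(w) - pi[-1]


def runs(s: str) -> list[tuple[int, int, int]]:
    """All runs of s as (start, end_exclusive, period) triples.

    Brute force over every candidate period p; suitable for short strings only.
    """
    n = len(s)
    found = []
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if s[k] != s[k + p]:
                k += 1
                continue
            # Grow the maximal block of positions j with s[j] == s[j + p].
            j = k
            while j + p < n and s[j] == s[j + p]:
                j += 1
            start, end = k, j + p  # s[start:end] is a maximal p-periodic substring
            # Keep it only if it repeats at least twice and p is its smallest period.
            if end - start >= 2 * p and smallest_period(s[start:end]) == p:
                found.append((start, end, p))
            k = j
    return found


def sum_of_exponents(s: str) -> Fraction:
    """Sum of |v| / p over all runs, computed exactly."""
    return sum((Fraction(end - start, p) for start, end, p in runs(s)), Fraction(0))
```

    For example, "aabaab" has three runs: "aa" twice (period 1, exponent 2 each) and the whole string (period 3, exponent 2), so sum_of_exponents("aabaab") is 6.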

    Online Pattern Matching for String Edit Distance with Moves

    Edit distance with moves (EDM) is a string-to-string distance measure that includes substring moves in addition to ordinary editing operations to turn one string into the other. Although optimizing EDM is intractable, it has many applications, especially in error detection. Edit sensitive parsing (ESP) is an efficient parsing algorithm that guarantees an upper bound on parsing discrepancies between different appearances of the same substrings in a string. ESP can be used to compute an approximate EDM as the L1 distance between characteristic vectors built from node labels in parsing trees. However, ESP is not applicable to streaming text data, where the whole text is not known in advance. We present an online ESP (OESP) that enables online pattern matching for EDM. OESP builds a parse tree for a streaming text and computes the L1 distance between characteristic vectors in an online manner. For space-efficient computation of EDM, OESP directly encodes the parse tree into a succinct representation by leveraging the idea behind recent results on dynamic succinct trees. We experimentally test OESP's ability to compute EDM in an online manner on benchmark datasets and show OESP's efficiency. Comment: This paper has been accepted to the 21st edition of the International Symposium on String Processing and Information Retrieval (SPIRE 2014).
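    The online L1 computation can be illustrated independently of the parser. The sketch below assumes the node labels are already produced by some parser (the parsing step, OESP itself, and its succinct tree encoding are not reproduced here); the class OnlineL1 is a hypothetical helper showing that, when one label is added to or removed from the text's characteristic vector, the L1 distance to a fixed pattern vector changes by exactly one, so it can be maintained in constant time per update.

```python
from collections import Counter
from collections.abc import Hashable, Iterable


class OnlineL1:
    """Maintain the L1 distance between a fixed pattern vector P and a text vector T.

    P and T are characteristic vectors, i.e. counts of (hypothetical) parse-tree
    node labels. Each update to T adjusts the distance in O(1) time.
    """

    def __init__(self, pattern_labels: Iterable[Hashable]) -> None:
        self.p = Counter(pattern_labels)
        self.t: Counter = Counter()
        self.dist = sum(self.p.values())  # T starts empty, so ||P - T||_1 = |P|

    def add(self, label: Hashable) -> None:
        # Incrementing T[label] moves it toward P[label] only while T[label] < P[label].
        self.dist += -1 if self.t[label] < self.p[label] else 1
        self.t[label] += 1

    def remove(self, label: Hashable) -> None:
        if self.t[label] == 0:
            raise ValueError(f"label {label!r} is not in the text vector")
        self.t[label] -= 1
        # Decrementing moved T[label] toward P[label] iff it is still >= P[label].
        self.dist += -1 if self.t[label] >= self.p[label] else 1
```

    Adding labels as the stream arrives and removing labels that leave a sliding window keeps dist equal to the L1 distance without recomputing it from scratch.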

    Near-Optimal Computation of Runs over General Alphabet via Non-Crossing LCE Queries

    Longest common extension queries (LCE queries) and runs are ubiquitous in algorithmic stringology. Linear-time algorithms computing runs and preprocessing for constant-time LCE queries have been known for over a decade. However, these algorithms assume a linearly-sortable integer alphabet. A recent breakthrough paper by Bannai et al. (SODA 2015) showed a link between the two notions: all the runs in a string can be computed via a linear number of LCE queries. The first to consider these problems over a general ordered alphabet was Kosolobov (Inf. Process. Lett., 2016), who presented an $O(n (\log n)^{2/3})$-time algorithm for answering $O(n)$ LCE queries. This result was improved by Gawrychowski et al. (accepted to CPM 2016) to $O(n \log \log n)$ time. In this work we note a special non-crossing property of LCE queries asked in the runs computation. We show that any $n$ such non-crossing queries can be answered on-line in $O(n \alpha(n))$ time, which yields an $O(n \alpha(n))$-time algorithm for computing runs.
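    The link between runs and LCE queries can be illustrated on a single candidate: fixing a period p and a position i, one forward and one backward LCE query give the maximal interval with period p that contains s[i:i+p]. The helpers below (lce, lce_left, periodic_extension) are hypothetical names for this sketch. Each query is answered naively by character comparisons, which needs only equality tests and so works over a general alphabet, but this is not the paper's non-crossing $O(n \alpha(n))$ technique, nor the rule for choosing which queries to ask.

```python
def lce(s: str, i: int, j: int) -> int:
    """Length of the longest common prefix of s[i:] and s[j:] (naive)."""
    k = 0
    while i + k < len(s) and j + k < len(s) and s[i + k] == s[j + k]:
        k += 1
    return k


def lce_left(s: str, i: int, j: int) -> int:
    """Length of the longest common suffix of s[:i] and s[:j] (naive)."""
    k = 0
    while k < i and k < j and s[i - 1 - k] == s[j - 1 - k]:
        k += 1
    return k


def periodic_extension(s: str, i: int, p: int) -> tuple[int, int]:
    """Maximal interval [start, end) around position i in which s has period p.

    Uses one forward and one backward LCE query. The interval is a run with
    period p if end - start >= 2 * p and p is the smallest period of s[start:end].
    """
    right = lce(s, i, i + p)
    left = lce_left(s, i, i + p)
    return i - left, i + p + right
```

    For instance, periodic_extension("abcabcab", 3, 3) returns (0, 8): the whole string has period 3 and length at least 6, so it is a run.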

    Composite repetition-aware data structures

    In highly repetitive strings, like collections of genomes from the same species, distinct measures of repetition all grow sublinearly in the length of the text, and indexes targeted to such strings typically depend on only one of these measures. We describe two data structures whose size depends on multiple measures of repetition at once, and which provide competitive tradeoffs between the time for counting and reporting all the exact occurrences of a pattern and the space taken by the structure. The key component of our constructions is the run-length encoded BWT (RLBWT), which takes space proportional to the number of BWT runs: rather than augmenting RLBWT with suffix array samples, we combine it with data structures from LZ77 indexes, which take space proportional to the number of LZ77 factors, and with the compact directed acyclic word graph (CDAWG), which takes space proportional to the number of extensions of maximal repeats. The combination of CDAWG and RLBWT also enables a new representation of the suffix tree, whose size again depends on the number of extensions of maximal repeats, and which is powerful enough to support matching statistics and constant-space traversal. Comment: the name of the third co-author was inadvertently omitted from the previous version.
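    To make "number of BWT runs" concrete, the sketch below computes a BWT naively from sorted rotations and run-length encodes it; the length of the encoding is the r that RLBWT space is proportional to. It is a small illustration under our own function names (bwt, run_length_encode), assuming a "$" sentinel that sorts before every text character; it is not the construction used for the paper's data structures, and real indexes build the BWT far more efficiently.

```python
from itertools import groupby


def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform: last column of the sorted rotations.

    O(n^2 log n); assumes the sentinel is absent from text and sorts before
    every character of it (true for "$" and alphanumeric text).
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in text")
    t = text + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(rot[-1] for rot in rotations)


def run_length_encode(s: str) -> list[tuple[str, int]]:
    """Run-length encoding as (character, run length) pairs; r = len(result)."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]
```

    For "banana" the BWT is "annb$aa" with 5 runs, while the more repetitive "abababab" gives "bbbb$aaaa" with only 3 runs, which is the effect RLBWT exploits.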

    Fingerprints in Compressed Strings

    The Karp-Rabin fingerprint of a string is a type of hash value that, due to its strong properties, has been used in many string algorithms. In this paper we show how to construct a data structure for a string S of size N compressed by a context-free grammar of size n that answers fingerprint queries. That is, given indices i and j, the answer to a query is the fingerprint of the substring S[i,j]. We present the first O(n)-space data structures that answer fingerprint queries without decompressing any characters. For Straight Line Programs (SLP) we get O(log N) query time, and for Linear SLPs (an SLP derivative that captures LZ78 compression and its variations) we get O(log log N) query time. Hence, our data structures have the same time and space complexity as for random access in SLPs. We utilize the fingerprint data structures to solve the longest common extension (LCE) problem in query time O(log N log l) and O(log l log log l + log log N) for SLPs and Linear SLPs, respectively. Here, l denotes the length of the LCE.
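    On an uncompressed string, a fingerprint query for S[i,j] reduces to two prefix fingerprints, and an LCE query can be answered by binary search over fingerprint comparisons. The sketch below shows only that uncompressed baseline under our own class name (PrefixFingerprints); it does not attempt the paper's contribution of answering queries over a grammar-compressed string without decompression. It assumes a random base modulo the Mersenne prime 2^61 - 1, so unequal substrings collide only with small probability (Monte Carlo answers), and it uses plain binary search rather than whatever search strategy yields the stated bounds.

```python
import random


class PrefixFingerprints:
    """Karp-Rabin fingerprints of all substrings of an uncompressed string.

    phi(S[i..j]) = sum over k of S[k] * b^(j-k) mod q, for a random base b and prime q.
    """

    def __init__(self, s: str, mod: int = (1 << 61) - 1) -> None:
        self.mod = mod
        self.base = random.randrange(256, mod - 1)
        self.n = len(s)
        self.h = [0] * (self.n + 1)   # h[i] = fingerprint of the prefix s[:i]
        self.pw = [1] * (self.n + 1)  # pw[i] = base^i mod q
        for i, ch in enumerate(s):
            self.h[i + 1] = (self.h[i] * self.base + ord(ch)) % mod
            self.pw[i + 1] = self.pw[i] * self.base % mod

    def fingerprint(self, i: int, j: int) -> int:
        """Fingerprint of S[i..j], both ends inclusive and 0-indexed."""
        return (self.h[j + 1] - self.h[i] * self.pw[j - i + 1]) % self.mod

    def lce(self, i: int, j: int) -> int:
        """LCE of suffixes i and j by binary search on fingerprints (Monte Carlo)."""
        lo, hi = 0, self.n - max(i, j)
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if self.fingerprint(i, i + mid - 1) == self.fingerprint(j, j + mid - 1):
                lo = mid
            else:
                hi = mid - 1
        return lo
```

    Each LCE query makes O(log n) fingerprint comparisons here; equal substrings always receive equal fingerprints, so errors can only be overestimates caused by collisions.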

    Willow short-rotation production systems in Canada and Northern United States: A review

    Willow short-rotation coppice (SRC) systems are becoming an attractive practice because they are sustainable systems that fulfill multiple ecological objectives with significant environmental benefits. A sustainable supply of bioenergy feedstock can be produced by willow on marginal land using well-adapted or tolerant cultivars. Across Canada and the northern U.S.A., there are millions of hectares of available degraded land that have the potential for willow SRC biomass production, with a C sequestration potential capable of offsetting an appreciable amount of anthropogenic greenhouse gas emissions. A fundamental question concerning sustainable SRC willow yields was whether long-term soil productivity is maintained within a multi-rotation SRC system, given the rapid growth rate and the associated nutrient exports offsite when the willow biomass is harvested after repeated short rotations. Early results from the first willow SRC rotation indicated that willow systems have relatively low nutrient demands, with minimal nutrient output other than in harvested biomass. The overall aim of this manuscript is to summarize the literature and present findings and data from ongoing research trials across Canada and the northern U.S.A. examining willow SRC system establishment and viability. The research areas of interest presented here are the crop production of willow SRC systems, above- and below-ground biomass dynamics and the C budget, the comprehensive soil-willow system nutrient budget, and soil nutrient amendments (via fertilization) in willow SRC systems. Existing research gaps were also identified for the Canadian context.

    Tailoring r-index for Document Listing Towards Metagenomics Applications

    A basic problem in metagenomics is to assign a sequenced read to the correct species in the reference collection. In typical applications in genomic epidemiology and viral metagenomics, the reference collection consists of a set of species, each represented by its highly similar strains. It has recently been shown that accurate read assignment can be achieved with k-mer hashing-based pseudoalignment: a read is assigned to species A if each of its k-mer hits to the reference collection is located only on strains of A. We study the underlying primitives required in pseudoalignment and related tasks. We propose three space-efficient solutions building upon the document listing with frequencies problem. All the solutions use an r-index (Gagie et al., SODA 2018) as the underlying index structure for the text obtained by concatenating the set of species, as well as for each species. Given t species whose concatenation length is n, and whose Burrows-Wheeler transform contains r runs, our first solution, based on a grammar-compressed document array with precomputed queries at nonterminal symbols, reports the frequencies for the distinct documents in which the pattern of length m occurs in time. Our second solution is also based on a grammar-compressed document array, but enhanced with bitvectors, and reports the frequencies in time, over a machine with word size w. Our third solution, based on the interleaved LCP array, answers the same query in time. We implemented our solutions and tested them on real-world and synthetic datasets. The results show that all the solutions are fast on highly repetitive data, and that the size overhead introduced by the indexes is comparable with the size of the r-index. Peer reviewed
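    The pseudoalignment rule quoted above can be stated in a few lines when the k-mer index is simply a hash map from k-mers to species. The sketch below uses a plain Python dictionary in place of the r-index and document listing structures of the paper; the function names (kmer_index, assign_read) are hypothetical, and ignoring k-mers that have no hit at all is our reading of "each of its k-mer hits".

```python
from collections import defaultdict
from typing import Optional


def kmer_index(references: dict[str, list[str]], k: int) -> dict[str, set[str]]:
    """Map each k-mer to the set of species whose strains contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for species, strains in references.items():
        for strain in strains:
            for i in range(len(strain) - k + 1):
                index[strain[i:i + k]].add(species)
    return dict(index)


def assign_read(read: str, index: dict[str, set[str]], k: int) -> Optional[str]:
    """Return species A if every k-mer hit of the read lies only on strains of A.

    k-mers with no hit are ignored; reads with no hits, or with hits on
    more than one species, are left unassigned (None).
    """
    hit_species: set[str] = set()
    for i in range(len(read) - k + 1):
        hit_species |= index.get(read[i:i + k], set())
    return next(iter(hit_species)) if len(hit_species) == 1 else None
```

    With references {"A": ["ACGTAC", "ACGTTC"], "B": ["GGGCCC"]} and k = 3, the read "ACGTA" is assigned to "A", while "ACGGG" hits both species and stays unassigned.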

    Fish Oil Supplementation During Late Pregnancy Does Not Influence Plasma Lipids or Lipoprotein Levels in Young Adult Offspring

    Nutritional influences on cardiovascular disease operate throughout life. Studies in both experimental animals and humans have suggested that changes in peri- and early postnatal nutrition can affect the development of the various components of the metabolic syndrome in adult life. This has led to the hypothesis that n-3 fatty acid supplementation in pregnancy may have a beneficial effect on the lipid profile of the offspring. The aim of the present study was to investigate the effect of supplementation with n-3 fatty acids during the third trimester of pregnancy on lipids and lipoproteins in the 19-year-old offspring. The study was based on the follow-up of a randomized controlled trial from 1990, in which 533 pregnant women were randomized to fish oil (n = 266), olive oil (n = 136) or no oil (n = 131). In 2009, the offspring were invited to a physical examination including blood sampling. A total of 243 of the offspring participated. Lipid values did not differ between the fish oil and olive oil groups. The relative adjusted difference (95% confidence interval) in lipid concentrations was −3% (−11; 7) for LDL cholesterol, 3% (−3; 10) for HDL cholesterol, −1% (−6; 5) for total cholesterol, −4% (−16; 10) for TAG concentrations, 2% (−2; 7) for apolipoprotein A1, −1% (−9; 7) for apolipoprotein B and 3% (−7; 15) for the relative abundance of small dense LDL. In conclusion, there was no effect of fish oil supplementation during the third trimester of pregnancy on offspring plasma lipids and lipoproteins in adolescence.

    Knee complaints and consequences on work status; a 10-year follow-up survey among floor layers and graphic designers

    Background: The purpose of the study was to examine whether knee complaints among floor layers predict exclusion from the trade.
    Methods: In 1994/95, self-reported data were obtained from a cohort of floor layers and graphic designers, with and without knee-straining work activities, respectively. At follow-up in 2005 the questionnaire survey was repeated. The study population consisted of 81 floor layers and 173 graphic designers who were working in their trades at baseline (1995). All participants were men aged 36–70 years in 2005. We computed the risk of losing gainful employment in the trade according to the occurrence of knee complaints at baseline, using Cox proportional hazards regression adjusted for a number of potential confounding variables. Moreover, the crude and adjusted odds ratios for knee complaints according to status of employment in the trade were computed, using graphic designers as reference.
    Results: A positive but non-significant association between knee complaints lasting more than 30 days during the past 12 months and exclusion from the trade was found among floor layers (hazard ratio = 1.4, 95% CI = 0.6–3.5). The frequency of self-reported knee complaints was lower among floor layers still working in the trade in 2005 (26.3%) than at baseline in 1995 (41.1%), while the opposite tendency was seen among graphic designers (20.7% vs. 10.7%).
    Conclusion: The study suggests that knee complaints are a risk factor for premature exclusion from a knee-demanding trade. However, the low statistical power of the study precludes strong conclusions. The study also indicates a healthy worker effect among floor layers and a survivor effect among graphic designers.