213 research outputs found

On the maximal sum of exponents of runs in a string

Author: D. Gusfield
F. Franek
J. Berstel
J. Simpson
M. Crochemore
M. Crochemore
M. Crochemore
M. Crochemore
M. Crochemore
M. Giraud
M. Lothaire
R.M. Kolpakov
S.J. Puglisi
W. Rytter
W. Rytter
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 25/03/2010

A run is an inclusion maximal occurrence in a string (as a subinterval) of a repetition $v$ with a period $p$ such that $2p \le |v|$. The exponent of a run is defined as $|v|/p$ and is $\ge 2$. We show new bounds on the maximal sum of exponents of runs in a string of length $n$. Our upper bound of $4.1n$ is better than the best previously known proven bound of $5.6n$ by Crochemore & Ilie (2008). The lower bound of $2.035n$, obtained using a family of binary words, contradicts the conjecture of Kolpakov & Kucherov (1999) that the maximal sum of exponents of runs in a string of length $n$ is smaller than $2n$.

Comment: 7 pages, 1 figure
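To make the definitions of a run and its exponent concrete, here is a minimal brute-force sketch in Python. It is not the method of the paper; the function names `runs` and `sum_of_exponents` are illustrative assumptions, and the quadratic scan is only suitable for short strings.

```python
from fractions import Fraction


def runs(s):
    """Return all runs of s as (start, end, period) triples, end exclusive.

    Brute force: for each candidate period p, find maximal stretches where
    s[k] == s[k + p]; a stretch of at least p matches gives a substring of
    length >= 2p with period p. Periods are tried in increasing order, so the
    first period recorded for an interval is its smallest one.
    """
    n = len(s)
    seen = {}  # (start, end) -> smallest period
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if s[k] != s[k + p]:
                k += 1
                continue
            a = k
            while k + p < n and s[k] == s[k + p]:
                k += 1
            # s[a : k + p] is an inclusion-maximal stretch with period p.
            if k - a >= p:
                seen.setdefault((a, k + p), p)
    return sorted((a, b, p) for (a, b), p in seen.items())


def sum_of_exponents(s):
    """Sum of |v| / p over all runs v of s, as an exact fraction."""
    return sum(Fraction(b - a, p) for a, b, p in runs(s))
```

For example, `runs("mississippi")` yields the three squares `ss`, `ss`, `pp` (period 1) and `ississi` (period 3), so the sum of exponents is 2 + 2 + 2 + 7/3 = 25/3.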

arXiv.org e-Print Archive

CiteSeerX

Crossref

Elsevier - Publisher Connector

King's Research Portal

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Online Pattern Matching for String Edit Distance with Moves

Author: D. Shapira
G. Navarro
J. Kececioglu
R. Clifford
S. Maruyama
V. Bafna
V.I. Levenshtein
W. Rytter
Publication venue
Publication date: 01/01/2014

Edit distance with moves (EDM) is a string-to-string distance measure that includes substring moves in addition to ordinary editing operations to turn one string into the other. Although optimizing EDM is intractable, it has many applications, especially in error detection. Edit sensitive parsing (ESP) is an efficient parsing algorithm that guarantees an upper bound on parsing discrepancies between different appearances of the same substring in a string. ESP can be used for computing an approximate EDM as the L1 distance between characteristic vectors built from node labels in parsing trees. However, ESP is not applicable to streaming text data, where the whole text is unknown in advance. We present an online ESP (OESP) that enables online pattern matching for EDM. OESP builds a parse tree for a streaming text and computes the L1 distance between characteristic vectors in an online manner. For the space-efficient computation of EDM, OESP directly encodes the parse tree into a succinct representation by leveraging the idea behind recent results on dynamic succinct trees. We experimentally test OESP on its ability to compute EDM in an online manner on benchmark datasets, and we show OESP's efficiency.

Comment: This paper has been accepted to the 21st edition of the International Symposium on String Processing and Information Retrieval (SPIRE2014)

arXiv.org e-Print Archive

Crossref

Near-Optimal Computation of Runs over General Alphabet via Non-Crossing LCE Queries

Author: C Hohlweg
CSJA Nash-Williams
D Kosolobov
GS Brodal
H Barcelo
J Fischer
M Crochemore
M Crochemore
M Crochemore
M Crochemore
M Crochemore
M Giraud
SJ Puglisi
W Rytter
W Rytter
Publication venue
Publication date: 01/01/2016

Longest common extension queries (LCE queries) and runs are ubiquitous in algorithmic stringology. Linear-time algorithms computing runs and preprocessing for constant-time LCE queries have been known for over a decade. However, these algorithms assume a linearly-sortable integer alphabet. A recent breakthrough paper by Bannai et al. (SODA 2015) showed a link between the two notions: all the runs in a string can be computed via a linear number of LCE queries. The first to consider these problems over a general ordered alphabet was Kosolobov (Inf. Process. Lett., 2016), who presented an $O(n (\log n)^{2/3})$-time algorithm for answering $O(n)$ LCE queries. This result was improved by Gawrychowski et al. (accepted to CPM 2016) to $O(n \log \log n)$ time. In this work we note a special non-crossing property of LCE queries asked in the runs computation. We show that any $n$ such non-crossing queries can be answered on-line in $O(n \alpha(n))$ time, which yields an $O(n \alpha(n))$-time algorithm for computing runs.
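As a rough illustration of what an LCE query returns and how it relates to periodicity, the sketch below answers queries by a naive character scan. It does not implement the non-crossing technique of the paper; `lce` and `right_extent` are hypothetical helper names introduced here.

```python
def lce(s, i, j):
    """Length of the longest common prefix of s[i:] and s[j:] (naive scan)."""
    k = 0
    while i + k < len(s) and j + k < len(s) and s[i + k] == s[j + k]:
        k += 1
    return k


def right_extent(s, i, p):
    """End (exclusive) of the longest substring starting at i with period p.

    Assumes i + p <= len(s). Every position k in [i, i + lce) satisfies
    s[k] == s[k + p], so s[i : i + p + lce] has period p.
    """
    return i + p + lce(s, i, i + p)
```

With `s = "abababx"`, `lce(s, 0, 2)` is 4, so the period-2 repetition starting at position 0 ends at index 6 and covers `ababab`.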

arXiv.org e-Print Archive

Crossref

King's Research Portal

Hal-Diderot

HAL - UPEC / UPEM

Composite repetition-aware data structures

Author: A Blumer
A Lempel
D Arroyuelo
D Belazzougui
DE Willard
J Radoszewski
J Sirén
J Ziv
M Crochemore
M Crochemore
M Raffinot
P Ferragina
S Kreft
T Gagie
V Mäkinen
V Mäkinen
W Rytter
Publication venue
Publication date: 01/01/2015

In highly repetitive strings, like collections of genomes from the same species, distinct measures of repetition all grow sublinearly in the length of the text, and indexes targeted to such strings typically depend on only one of these measures. We describe two data structures whose size depends on multiple measures of repetition at once, and that provide competitive tradeoffs between the time for counting and reporting all the exact occurrences of a pattern, and the space taken by the structure. The key component of our constructions is the run-length encoded BWT (RLBWT), which takes space proportional to the number of BWT runs: rather than augmenting RLBWT with suffix array samples, we combine it with data structures from LZ77 indexes, which take space proportional to the number of LZ77 factors, and with the compact directed acyclic word graph (CDAWG), which takes space proportional to the number of extensions of maximal repeats. The combination of CDAWG and RLBWT also enables a new representation of the suffix tree, whose size depends again on the number of extensions of maximal repeats, and that is powerful enough to support matching statistics and constant-space traversal.

Comment: (the name of the third co-author was inadvertently omitted from previous version)
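The quantity "number of BWT runs" can be illustrated with a small sketch. This builds the BWT by sorting all rotations, which is only practical for tiny inputs and is not how the indexes in the paper are constructed. The names `bwt` and `run_length_encode` are assumptions, as is the `$` sentinel, which must not occur in the text and must sort before every text character.

```python
from itertools import groupby


def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform of text + sentinel via sorted rotations."""
    t = text + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(rot[-1] for rot in rotations)


def run_length_encode(s):
    """Collapse s into (character, run length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]
```

For `"banana"` the transform is `annb$aa`, which run-length encodes to five runs: `a`, `nn`, `b`, `$`, `aa`. On highly repetitive inputs the run count is typically much smaller than the text length, which is what an RLBWT exploits.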

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università degli Studi di Udine

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Archivio della ricerca- LUISS Libera Università Internazionale degli Studi Sociali Guido Carli di Roma

Physical engineering of an island-braided river by two riparian tree species: Evidence from aerial images and airborne lidar

Author: Houston Durrant T.
Naiman R. J.
Rigo D.
Rytter L.
Thompson K.
Wilson S. M.
Publication venue: 'Wiley'
Publication date: 01/09/2020

Crossref

Queen Mary Research Online

Fingerprints in Compressed Strings

Author: A. Amir
D. Harel
D. Willard
F. Claude
G. Cormode
J. Ziv
J. Ziv
K. Mehlhorn
L. Gąsieniec
M. Bender
M. Charikar
M. Farach
O. Berkman
P. Bille
P. Emde Boas van
P.F. Dietz
R. Cole
R.M. Karp
S. Alstrup
W. Rytter
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

The Karp-Rabin fingerprint of a string is a type of hash value that, due to its strong properties, has been used in many string algorithms. In this paper we show how to construct a data structure for a string S of size N compressed by a context-free grammar of size n that answers fingerprint queries. That is, given indices i and j, the answer to a query is the fingerprint of the substring S[i,j]. We present the first O(n) space data structures that answer fingerprint queries without decompressing any characters. For Straight Line Programs (SLP) we get O(log N) query time, and for Linear SLPs (an SLP derivative that captures LZ78 compression and its variations) we get O(log log N) query time. Hence, our data structures have the same time and space complexity as for random access in SLPs. We utilize the fingerprint data structures to solve the longest common extension problem in query time O(log N log l) and O(log l log log l + log log N) for SLPs and Linear SLPs, respectively. Here, l denotes the length of the LCE.
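The following sketch shows fingerprint queries and a fingerprint-based LCE on an uncompressed string, using prefix fingerprints. It is a simplification: the paper answers the same queries on a grammar-compressed string without decompressing it, which this code does not attempt. The class name, the fixed `base`, and the modulus are assumptions made for illustration; substrings are half-open `s[i:j]` rather than the inclusive S[i,j] of the abstract, and equality of fingerprints implies equality of substrings only with high probability.

```python
class Fingerprints:
    """Karp-Rabin fingerprints of substrings via prefix hashes."""

    def __init__(self, s, base=257, mod=(1 << 61) - 1):
        self.n = len(s)
        self.mod = mod
        self.pref = [0] * (self.n + 1)  # pref[k] = fingerprint of s[:k]
        self.pow = [1] * (self.n + 1)   # pow[k] = base**k mod mod
        for k, ch in enumerate(s):
            self.pref[k + 1] = (self.pref[k] * base + ord(ch)) % mod
            self.pow[k + 1] = (self.pow[k] * base) % mod

    def fingerprint(self, i, j):
        """Fingerprint of s[i:j] in O(1) time."""
        return (self.pref[j] - self.pref[i] * self.pow[j - i]) % self.mod

    def lce(self, i, j):
        """Longest common extension of positions i and j by binary search."""
        lo, hi = 0, self.n - max(i, j)
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if self.fingerprint(i, i + mid) == self.fingerprint(j, j + mid):
                lo = mid
            else:
                hi = mid - 1
        return lo
```

For `fp = Fingerprints("abcabcabd")`, the two occurrences of `abc` have equal fingerprints and `fp.lce(0, 3)` returns 5.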

arXiv.org e-Print Archive

CiteSeerX

Crossref

Warwick Research Archives Portal Repository

Online Research Database In Technology

Willow short-rotation production systems in Canada and Northern United States: A review

Author: Abrahamson
Abrahamson
Achat
Adegbidi
Adegbidi
Adegbidi
Alriksson
Amichev
Amichev
Arevalo
Armstrong
Aylott
Ballard
Ballard
Bell
Beyhan Y. Amichev
Block
Boehmel
Bollmark
Brundrett
Bullard
Burke
Caputo
Cardinael
Christersson
Christersson
Christersson
Christersson
Christine N. Stadnyk
Clinch
Corredor
Corredor
Corseuil
Dickmann
Ens
Ens
Environment Canada National Climate Data and Information Archive
Ericsson
Ericsson
Ferm
Girouard
Greer
Grigal
Grogan
Gruenewald
Guidi
Gunderson
Hangs
Hangs
Hangs
Hangs
Hansen
Hasselgren
Heller
Heller
Hendrick
Hofmann-Schielle
Hoogwijk
Hytönen
Hytönen
Hytönen
Ingestad
Jackson
Jeff J. Schoenau
Johnson
Judicaël Moukoumi
Karrenberg
Keller
Ken C.J. Van Rees
Keoleian
Kering
Kiernan
Kopp
Kowalik
Kummerow
Labrecque
Lal
Lemus
Leuschner
Liedgens
Maynard
Mele
Mitchell
Mola-Yudego
Mosseler
Moukoumi
Nadelhoffer
Nicolas Bélanger
Nicoullaud
Norby
Norby
Ostonen
Pacaldo
Pacaldo
Puttsepp
Quaye
Quaye
Rockwood
Ryan D. Hangs
Rytter
Rytter
Rytter
Rytter
Rytter
Sanchez
Scholz
Sennerby-Forsse
Sheala M. Konecsni
Singh
Smith
Soil Classification Working Group
Soil Landscapes of Canada Working Group
Steele
Stolarski
Swamy
Tharakan
Timothy A. Volk
Vargas
Verwijst
Vladimir Vujanovic
Volk
Volk
Volk
Volk
Vujanovic
Weih
Weih
Weih
Zalesny
Zan
Publication venue: 'Soil Science Society of America'
Publication date: 01/07/2014

Willow short rotation coppice (SRC) systems are becoming an attractive practice because they are a sustainable system fulfilling multiple ecological objectives with significant environmental benefits. A sustainable supply of bioenergy feedstock can be produced by willow on marginal land using well-adapted or tolerant cultivars. Across Canada and northern U.S.A., there are millions of hectares of available degraded land that have the potential for willow SRC biomass production, with a C sequestration potential capable of offsetting an appreciable amount of anthropogenic greenhouse gas emissions. A fundamental question concerning sustainable SRC willow yields was whether long-term soil productivity is maintained within a multi-rotation SRC system, given the rapid growth rate and associated nutrient exports offsite when harvesting the willow biomass after repeated short rotations. Based on early results from the first willow SRC rotation, it was found that willow systems are relatively low nutrient-demanding, with minimal nutrient output other than in harvested biomass. The overall aim of this manuscript is to summarize the literature and present findings and data from ongoing research trials across Canada and northern U.S.A. examining willow SRC system establishment and viability. The research areas of interest presented here are the crop production of willow SRC systems, above- and below-ground biomass dynamics and the C budget, comprehensive soil-willow system nutrient budget, and soil nutrient amendments (via fertilization) in willow SRC systems. Areas of existing research gaps were also identified for the Canadian context.

R-libre

Crossref

Tailoring r-index for Document Listing Towards Metagenomics Applications

Author: A Jez
D Belazzougui
D Carroll
D Cobas
DE Wood
DH Huson
F Claude
G Navarro
G Navarro
G Navarro
J Fischer
K Sadakane
L Schaeffer
M Charikar
ML Fredman
MS Lindner
N Välimäki
NL Bray
S Gog
T Gagie
T Gagie
T Gagie
T Gagie
U Manber
V Mäkinen
W Rytter
Z Iqbal
Publication venue: Springer Nature Switzerland AG
Publication date: 01/01/2020

A basic problem in metagenomics is to assign a sequenced read to the correct species in the reference collection. In typical applications in genomic epidemiology and viral metagenomics the reference collection consists of a set of species with each species represented by its highly similar strains. It has been recently shown that accurate read assignment can be achieved with k-mer hashing-based pseudoalignment: a read is assigned to species A if each of its k-mer hits to a reference collection is located only on strains of A. We study the underlying primitives required in pseudoalignment and related tasks. We propose three space-efficient solutions building upon the document listing with frequencies problem. All the solutions use an r-index (Gagie et al., SODA 2018) as an underlying index structure for the text obtained as concatenation of the set of species, as well as for each species. Given t species whose concatenation length is n, and whose Burrows-Wheeler transform contains r runs, our first solution, based on a grammar-compressed document array with precomputed queries at non-terminal symbols, reports the frequencies for the distinct documents in which the pattern of length m occurs in time. Our second solution is also based on a grammar-compressed document array, but enhanced with bitvectors and reports the frequencies in time, over a machine with wordsize w. Our third solution, based on the interleaved LCP array, answers the same query in time. We implemented our solutions and tested them on real-world and synthetic datasets. The results show that all the solutions are fast on highly-repetitive data, and the size overhead introduced by the indexes is comparable with the size of the r-index.

Peer reviewed
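The pseudoalignment rule quoted in the abstract (assign a read to species A if every k-mer hit lands only on strains of A) can be sketched with a plain hash table. This is only an illustration of the rule, not of the r-index based solutions the paper proposes; the function names and the toy dictionary layout of the reference collection are assumptions.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map each k-mer to the set of species whose strains contain it.

    reference: dict mapping a species name to a list of strain sequences.
    """
    index = defaultdict(set)
    for species, strains in reference.items():
        for seq in strains:
            for i in range(len(seq) - k + 1):
                index[seq[i:i + k]].add(species)
    return index


def assign_read(read, index, k):
    """Return the unique species hit by the read's k-mers, or None.

    k-mers with no hit are ignored; the read is assigned only if every
    k-mer that does hit the collection hits strains of a single species.
    """
    hit_species = set()
    for i in range(len(read) - k + 1):
        hit_species |= index.get(read[i:i + k], set())
    return next(iter(hit_species)) if len(hit_species) == 1 else None
```

A read whose k-mers hit two different species, or hit nothing at all, is left unassigned (`None`) under this rule.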

Crossref

Helsingin yliopiston digitaalinen arkisto

Fish Oil Supplementation During Late Pregnancy Does Not Influence Plasma Lipids or Lipoprotein Levels in Young Adult Offspring

Author: A Singhal
A Singhal
AE Barden
AE Di
AP Toft-Petersen
BA Griffin
Bodil H. Bech
C Adams
CG Owen
CT Damsgaard
D Rytter
DJ Barker
DJ Barker
Dorte Rytter
Erik B. Schmidt
GS Berenson
I Thorsdottir
IF Gazi
IG Davies
J Baird
Jeppe H. Christensen
JG Ayer
KE Harchaoui
MJ Stampfer
QM Nguyen
SA Stanner
SF Olsen
SF Olsen
Sjurdur F. Olsen
Tine B. Henriksen
TJ Roseboom
WJ Elliott
WW Wong
Publication venue: Springer-Verlag
Publication date: 01/01/2011

Nutritional influences on cardiovascular disease operate throughout life. Studies in both experimental animals and humans have suggested that changes in the peri- and early post-natal nutrition can affect the development of the various components of the metabolic syndrome in adult life. This has led to the hypothesis that n-3 fatty acid supplementation in pregnancy may have a beneficial effect on lipid profile in the offspring. The aim of the present study was to investigate the effect of supplementation with n-3 fatty acids during the third trimester of pregnancy on lipids and lipoproteins in the 19-year-old offspring. The study was based on the follow-up of a randomized controlled trial from 1990 where 533 pregnant women were randomized to fish oil (n = 266), olive oil (n = 136) or no oil (n = 131). In 2009, the offspring were invited to a physical examination including blood sampling. A total of 243 of the offspring participated. Lipid values did not differ between the fish oil and olive oil groups. The relative adjusted difference (95% confidence intervals) in lipid concentrations was −3% (−11; 7) for LDL cholesterol, 3% (−3; 10) for HDL cholesterol, −1% (−6; 5) for total cholesterol, −4% (−16; 10) for TAG concentrations, 2% (−2; 7) for apolipoprotein A1, −1% (−9; 7) for apolipoprotein B and 3% (−7; 15) in relative abundance of small dense LDL. In conclusion, there was no effect of fish oil supplementation during the third trimester of pregnancy on offspring plasma lipids and lipoproteins in adolescence.

Crossref

Springer - Publisher Connector

PubMed Central

Knee complaints and consequences on work status; a 10-year follow-up survey among floor layers and graphic designers

Author: D Coggon
DJ Hunter
EpiData Association
G Enderlein
H Brenner
H Ekström
I Kuorinka
I Lissau
J Kivimäki
Jens Peter Bonde
Lilli Kirkeskov Jensen
LK Jensen
LK Jensen
LK Jensen
LK Jensen
M Karpansalo
M Thun
N Krause
P Baker
SC O'Reilly
StataCorp LP
Søren Rytter
T Myllymäki
U Siebert
V Arndt
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: The purpose of the study was to examine if knee complaints among floor layers predict exclusion from the trade.

Methods: In 1994/95 self-reported data were obtained from a cohort of floor layers and graphic designers with and without knee straining work activities, respectively. At follow-up in 2005 the questionnaire survey was repeated. The study population consisted of 81 floor layers and 173 graphic designers who were presently working in their trades at baseline (1995). All participants were men aged 36–70 years in 2005. We computed the risk of losing gainful employment in the trade according to occurrence of knee complaints at baseline, using Cox proportional hazard regression adjusted for a number of potential confounding variables. Moreover, the crude and adjusted odds risk ratio for knee complaints according to status of employment in the trade were computed, using graphic designers as reference.

Results: A positive but non-significant association between knee complaints lasting more than 30 days in the past 12 months and exclusion from the trade was found among floor layers (Hazard Ratio = 1.4, 95% CI = 0.6–3.5). The frequency of self-reported knee complaints was lower among floor layers presently at work in the trade in year 2005 (26.3%) compared with baseline in 1995 (41.1%), while the opposite tendency was seen among graphic designers (20.7% vs. 10.7%).

Conclusion: The study suggests that knee complaints are a risk factor for premature exclusion from a knee demanding trade. However, low power of the study precludes strong conclusions. The study also indicates a healthy worker effect among floor layers and a survivor effect among graphic designers.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central