On the maximal sum of exponents of runs in a string
A run is an inclusion maximal occurrence in a string (as a subinterval) of a
repetition with a period such that . The exponent of a run
is defined as and is . We show new bounds on the maximal sum of
exponents of runs in a string of length . Our upper bound of is
better than the best previously known proven bound of by Crochemore &
Ilie (2008). The lower bound of , obtained using a family of binary
words, contradicts the conjecture of Kolpakov & Kucherov (1999) that the
maximal sum of exponents of runs in a string of length is smaller than
Comment: 7 pages, 1 figure
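The inline formulas of this abstract were lost, so the following is only a minimal sketch under the standard definitions, which are assumed here rather than taken from the text: a run is a maximal interval with smallest period p and length at least 2p, and its exponent is its length divided by p. The function names `runs` and `sum_of_exponents` are hypothetical, and the quadratic-time enumeration is a brute-force illustration, not the method of the paper.

```python
from fractions import Fraction


def runs(s):
    """Enumerate runs of s as (start, end, period) triples, end inclusive.

    Brute force: for every candidate period p, scan maximal stretches where
    s[k] == s[k + p]. Periods are tried in increasing order, so an interval
    that was already reported with a smaller period is skipped; this keeps
    only the smallest period of each run.
    """
    n = len(s)
    seen = set()
    result = []
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if s[k] != s[k + p]:
                k += 1
                continue
            a = k
            while k + p < n and s[k] == s[k + p]:
                k += 1
            # Positions a..k-1 match their p-shifted copies,
            # so the periodic interval is [a, k - 1 + p].
            start, end = a, k - 1 + p
            if end - start + 1 >= 2 * p and (start, end) not in seen:
                seen.add((start, end))
                result.append((start, end, p))
    return result


def sum_of_exponents(s):
    """Sum of length / period over all runs, as an exact fraction."""
    return sum(Fraction(end - start + 1, p) for start, end, p in runs(s))
```

For example, under these assumptions the string "aabaab" has three runs ("aa" twice and the whole string with period 3), each of exponent 2.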
Online Pattern Matching for String Edit Distance with Moves
Edit distance with moves (EDM) is a string-to-string distance measure that
includes substring moves in addition to ordinary editing operations to turn one
string into the other. Although computing EDM exactly is intractable, it has many
applications, especially in error detection. Edit sensitive parsing (ESP) is an
efficient parsing algorithm that guarantees an upper bound of parsing
discrepancies between different appearances of the same substrings in a string.
ESP can be used for computing an approximate EDM as the L1 distance between
characteristic vectors built from node labels in parsing trees. However, ESP is
not applicable to streaming text data, where the whole text is unknown in
advance. We present an online ESP (OESP) that enables online pattern
matching for EDM. OESP builds a parse tree for a streaming text and computes
the L1 distance between characteristic vectors in an online manner. For the
space-efficient computation of EDM, OESP directly encodes the parse tree into a
succinct representation by leveraging the idea behind recent results of a
dynamic succinct tree. We experimentally test OESP's ability to compute
EDM in an online manner on benchmark datasets, and we show OESP's efficiency.
Comment: This paper has been accepted to the 21st edition of the International
Symposium on String Processing and Information Retrieval (SPIRE 2014)
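The approximation step described above (L1 distance between characteristic vectors of parse-tree node labels) can be sketched as follows. This is only an illustration: the parse trees themselves are not built here, the label lists are assumed to be given, and `l1_distance` is a hypothetical helper name rather than part of OESP.

```python
from collections import Counter


def l1_distance(labels_a, labels_b):
    """L1 distance between the label-count vectors of two parse trees.

    Each argument is an iterable of node labels; a label that is absent
    from one tree counts as zero in that tree's characteristic vector.
    """
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    return sum(abs(count_a[k] - count_b[k]) for k in count_a.keys() | count_b.keys())
```

With labels ["X", "Y", "Y"] and ["Y", "Z"], the vectors differ by 1 in each of X, Y, and Z, giving a distance of 3.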
Near-Optimal Computation of Runs over General Alphabet via Non-Crossing LCE Queries
Longest common extension queries (LCE queries) and runs are ubiquitous in
algorithmic stringology. Linear-time algorithms computing runs and
preprocessing for constant-time LCE queries have been known for over a decade.
However, these algorithms assume a linearly-sortable integer alphabet. A recent
breakthrough paper by Bannai et al. (SODA 2015) showed a link between the
two notions: all the runs in a string can be computed via a linear number of
LCE queries. The first to consider these problems over a general ordered
alphabet was Kosolobov (Inf. Process. Lett., 2016), who presented an
-time algorithm for answering LCE queries. This
result was improved by Gawrychowski et al. (accepted to CPM 2016) to time. In this work we note a special non-crossing property
of LCE queries asked in the runs computation. We show that any such
non-crossing queries can be answered on-line in time, which
yields an -time algorithm for computing runs
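For readers unfamiliar with LCE queries, the sketch below shows what a single query returns, using only equality comparisons between characters. It is a naive linear-time-per-query baseline written for illustration, not the algorithm of the paper; the function name `lce` is an assumption.

```python
def lce(s, i, j):
    """Longest common extension: length of the longest common prefix
    of the suffixes s[i:] and s[j:], found by direct comparison."""
    n = len(s)
    k = 0
    while i + k < n and j + k < n and s[i + k] == s[j + k]:
        k += 1
    return k
```

In a runs computation, a query such as `lce(s, i, i + p)` tells how far a repetition with candidate period p starting at position i extends to the right.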
On the maximal number of cubic subwords in a string
We investigate the problem of the maximum number of cubic subwords (of the
form www) in a given word. We also consider square subwords (of the form
ww). The problem of the maximum number of squares in a word is not well
understood. Several new results related to this problem are produced in the
paper. We consider two simple problems related to the maximum number of
subwords which are squares or which are highly repetitive; then we provide a
nontrivial estimation for the number of cubes. We show that the maximum number
of squares such that is not a primitive word (nonprimitive squares) in
a word of length is exactly , and the
maximum number of subwords of the form , for , is exactly .
In particular, the maximum number of cubes in a word is not greater than
either. Using very technical properties of occurrences of cubes, we improve
this bound significantly. We show that the maximum number of cubes in a word of
length is between and . (In particular, we improve the
lower bound from the conference version of the paper.)
Comment: 14 pages
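As a small illustration of the quantity being bounded, the sketch below lists the cubes occurring in a word by brute force. It assumes that cubes are counted as distinct subwords (not as occurrences), which the abstract does not state explicitly; `distinct_cubes` is a hypothetical name.

```python
def distinct_cubes(s):
    """Return the set of distinct subwords of s that are cubes,
    i.e. three consecutive copies of the same nonempty word."""
    n = len(s)
    found = set()
    for length in range(1, n // 3 + 1):
        for i in range(n - 3 * length + 1):
            w = s[i:i + length]
            second = s[i + length:i + 2 * length]
            third = s[i + 2 * length:i + 3 * length]
            if second == w and third == w:
                found.add(w * 3)
    return found
```

For instance, "abababa" contains the two cubes "ababab" and "bababa" under this counting.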
Composite repetition-aware data structures
In highly repetitive strings, like collections of genomes from the same
species, distinct measures of repetition all grow sublinearly in the length of
the text, and indexes targeted to such strings typically depend only on one of
these measures. We describe two data structures whose size depends on multiple
measures of repetition at once, and that provide competitive tradeoffs between
the time for counting and reporting all the exact occurrences of a pattern, and
the space taken by the structure. The key component of our constructions is the
run-length encoded BWT (RLBWT), which takes space proportional to the number of
BWT runs: rather than augmenting RLBWT with suffix array samples, we combine it
with data structures from LZ77 indexes, which take space proportional to the
number of LZ77 factors, and with the compact directed acyclic word graph
(CDAWG), which takes space proportional to the number of extensions of maximal
repeats. The combination of CDAWG and RLBWT also enables a new representation
of the suffix tree, whose size depends again on the number of extensions of
maximal repeats, and that is powerful enough to support matching statistics and
constant-space traversal.
Comment: (the name of the third co-author was inadvertently omitted from the
previous version)
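To make the "number of BWT runs" measure concrete, here is a minimal sketch that builds the BWT by sorting rotations and then run-length encodes it. This is a textbook construction chosen for brevity, not the construction used in the paper; the `$` sentinel and the function names are assumptions.

```python
from itertools import groupby


def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform via sorted rotations.

    Assumes the sentinel does not occur in text and sorts before
    every other character (true for '$' with lowercase letters).
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in text")
    t = text + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(rotation[-1] for rotation in rotations)


def run_length_encode(s):
    """Collapse s into (character, run length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]
```

For "banana" the transform is "annb$aa", which run-length encodes into 5 runs; on highly repetitive texts the number of runs is typically much smaller than the text length, which is what an RLBWT exploits.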
Infections in temporal proximity to HPV vaccination and adverse effects following vaccination in Denmark: A nationwide register-based cohort study and case-crossover analysis
BACKGROUND: Public trust in the human papillomavirus (HPV) vaccination programme has been challenged by reports of potential severe adverse effects. The reported adverse symptoms were heterogeneous, overlapped with those characterised as chronic fatigue syndrome (CFS), and have been described as CFS-like symptoms. Evidence suggests that CFS is often precipitated by an infection. The aim of the study was to examine whether an infection in temporal proximity to HPV vaccination is a risk factor for suspected adverse effects following HPV vaccination.
METHODS AND FINDINGS: The study was a nationwide register-based cohort study and case-crossover analysis. The study population consisted of all HPV-vaccinated females living in Denmark, born between 1974 and 2006, and vaccinated between January 1, 2006 and December 31, 2017. The exposure was any infection in the period ± 1 month around the time of first HPV vaccination and was defined as (1) hospital-treated infection; (2) redemption of anti-infective medication; or (3) having a rapid streptococcal test done at the general practitioner. The outcome was referral to a specialised hospital setting (5 national HPV centres opened June 1, 2015) due to suspected adverse effects following HPV vaccination. Multivariable logistic regression was used to estimate the association between infection and later HPV centre referral. The participants were 600,400 HPV-vaccinated females aged 11 to 44 years. Of these, 48,361 (8.1%) females had a hospital-treated infection, redeemed anti-infective medication, or had a rapid streptococcal test ± 1 month around the time of first HPV vaccination. A total of 1,755 (0.3%) females were referred to an HPV centre. Having a hospital-treated infection in temporal proximity to vaccination was associated with significantly elevated risk of later referral to an HPV centre (odds ratio (OR) 2.75, 95% confidence interval (CI) 1.72 to 4.40; P < 0.001). Increased risk was also observed among females who redeemed anti-infective medication (OR 1.56, 95% CI 1.33 to 1.83; P < 0.001) or had a rapid streptococcal test (OR 1.45, 95% CI 1.10 to 1.93; P = 0.010). Results from a case-crossover analysis, which was performed to adjust for potential unmeasured confounding, supported the findings. A key limitation of the study is that the HPV centres did not open until June 1, 2015, which may have led to an underestimation of the risk of suspected adverse effects, but analyses stratified by year of vaccination yielded similar results.
CONCLUSIONS: Treated infection in temporal proximity to HPV vaccination is associated with increased risk of later referral with suspected adverse vaccine effects. Thus, the infection could potentially be a trigger of the CFS-like symptoms in a subset of the referred females. To our knowledge, the study is the first to investigate the role of infection in the development of suspected adverse effects after HPV vaccination, and replication of these findings is needed in other studies.
One-dimensional staged self-assembly
17th International Conference, DNA 17, Pasadena, CA, USA, September 19-23, 2011. Proceedings
We introduce the problem of staged self-assembly of one-dimensional nanostructures, which becomes interesting when the elements are labeled (e.g., representing functional units that must be placed at specific locations). In a restricted model in which each operation has a single terminal assembly, we prove that assembling a given string of labels with the fewest stages is equivalent, up to constant factors, to compressing the string to be uniquely derived from the smallest possible context-free grammar (a well-studied O(log n)-approximable problem). Without this restriction, we show that the optimal assembly can be substantially smaller than the optimal context-free grammar, by a factor of Ω(√n/log n) even for binary strings of length n. Fortunately, we can bound this separation in model power by a quadratic function in the number of distinct glues or tiles allowed in the assembly, which is typically small in practice.