Reverse-Safe Data Structures for Text Indexing
We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm which constructs a z-reverse-safe data structure that has size O(n) and answers pattern matching queries of length at most d optimally, where d is maximal for any such z-reverse-safe data structure. The construction algorithm takes O(n^ω log d) time, where ω is the matrix multiplication exponent. We show that, despite the n^ω factor, our engineered implementation takes only a few minutes to finish for million-letter texts. We further show that plugging our method in data analysis applications gives insignificant or no data utility loss. Finally, we show how our technique can be extended to support applications under a realistic adversary model.
Palindromic Length of Words with Many Periodic Palindromes
The palindromic length PL(v) of a finite word v is the minimal
number of palindromes whose concatenation is equal to v. In 2013, Frid,
Puzynina, and Zamboni conjectured that: If w is an infinite word and k is
an integer such that PL(u) ≤ k for every factor u of w then w
is ultimately periodic.
Suppose that w is an infinite word and k is an integer such that
PL(u) ≤ k for every factor u of w. Let Ω(w,k) be the set
of all factors u of w that have more than (|u|/k)^{1/k}
palindromic prefixes. We show that Ω(w,k) is an infinite set and we show
that for each positive integer j there are palindromes a, b and a word
u ∈ Ω(w,k) such that (ab)^j is a factor of u and b is nonempty. Note
that (ab)^j is a periodic word and (ab)^i a is a palindrome for each
i ≤ j. These results justify the following question: What is the palindromic
length of a concatenation of a suffix of b and a periodic word (ab)^j with
"many" periodic palindromes?
It is known that |PL(uv) − PL(u)| ≤ PL(v),
where u and v are nonempty words. The main result of our article shows that
if a, b are palindromes, b is nonempty, u is a nonempty suffix of b,
|ab| is the minimal period of aba, and j is a positive integer
with j ≥ 3PL(u) then PL(u(ab)^j) − PL(u) ≥ 0.
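As a small illustration of the definition used in the abstract above (the minimal number of palindromes whose concatenation equals a given finite word), the palindromic length can be computed by a simple dynamic program over prefixes. This is only a sketch under that definition and is not taken from the article; the function names `is_palindrome` and `palindromic_length` are hypothetical.

```python
def is_palindrome(s: str) -> bool:
    """Return True if s reads the same forwards and backwards."""
    return s == s[::-1]


def palindromic_length(word: str) -> int:
    """Minimal number of palindromes whose concatenation equals `word`.

    Naive dynamic program (cubic time in the worst case); an
    illustrative sketch, not an optimized algorithm.
    """
    n = len(word)
    # best[i] = palindromic length of the prefix word[:i]; best[0] = 0.
    best = [0] + [n + 1] * n
    for i in range(1, n + 1):
        for j in range(i):
            # Extend an optimal factorization of word[:j] by the
            # palindrome word[j:i], if that improves the prefix word[:i].
            if best[j] + 1 < best[i] and is_palindrome(word[j:i]):
                best[i] = best[j] + 1
    return best[n]
```

For instance, `palindromic_length("abaab")` factors the word as `a` followed by `baab`.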
Substring Complexity in Sublinear Space
Shannon's entropy is a definitive lower bound for statistical compression.
Unfortunately, no such clear measure exists for the compressibility of
repetitive strings. Thus, ad-hoc measures are employed to estimate the
repetitiveness of strings, e.g., the size z of the Lempel-Ziv parse or the
number r of equal-letter runs of the Burrows-Wheeler transform. A more recent
one is the size γ of a smallest string attractor. Unfortunately, Kempa
and Prezza [STOC 2018] showed that computing γ is NP-hard. Kociumaka et
al. [LATIN 2020] considered a new measure that is based on the function S_T
counting the cardinalities of the sets of substrings of each length of T,
also known as the substring complexity. This new measure is defined as
δ = sup{S_T(k)/k, k ≥ 1} and lower bounds all the measures previously
considered. In particular, δ ≤ γ always holds and δ can be
computed in O(n) time using Θ(n) working space. Kociumaka et
al. showed that if δ is given, one can construct an O(δ log(n/δ))-sized
representation of T supporting efficient direct
access and efficient pattern matching queries on T. Given that for highly
compressible strings, δ is significantly smaller than n, it is natural
to pose the following question: Can we compute δ efficiently using
sublinear working space?
It is straightforward to show that any algorithm computing δ using O(b)
space requires Ω(n^{2-o(1)}/b) time through a reduction
from the element distinctness problem [Yao, SIAM J. Comput. 1994]. We present
the following results: an O((n^3 log b)/b^2)-time and
O(b)-space algorithm to compute δ, for any b ∈ [1,n]; and
an Õ(n^2/b)-time and Õ(b)-space algorithm to
compute δ, for any b ∈ [√n,n].
Substring Complexity in Sublinear Space
Shannon’s entropy is a definitive lower bound for statistical compression. Unfortunately, no such clear measure exists for the compressibility of repetitive strings. Thus, ad hoc measures are employed to estimate the repetitiveness of strings, e.g., the size z of the Lempel–Ziv parse or the number r of equal-letter runs of the Burrows-Wheeler transform. A more recent one is the size γ of a smallest string attractor. Let T be a string of length n. A string attractor of T is a set of positions of T capturing the occurrences of all the substrings of T. Unfortunately, Kempa and Prezza [STOC 2018] showed that computing γ is NP-hard. Kociumaka et al. [LATIN 2020] considered a new measure of compressibility that is based on the function S_T(k) counting the number of distinct substrings of length k of T, also known as the substring complexity of T. This new measure is defined as δ = sup{S_T(k)/k, k ≥ 1} and lower bounds all the relevant ad hoc measures previously considered. In particular, δ ≤ γ always holds and δ can be computed in O(n) time using Θ(n) working space. Kociumaka et al. showed that one can construct an O(δ log(n/δ))-sized representation of T supporting efficient direct access and efficient pattern matching queries on T. Given that for highly compressible strings, δ is significantly smaller than n, it is natural to pose the following question: Can we compute δ efficiently using sublinear working space? It is straightforward to show that in the comparison model, any algorithm computing δ using O(b) space requires Ω(n^{2-o(1)}/b) time through a reduction from the element distinctness problem [Yao, SIAM J. Comput. 1994]. We thus wanted to investigate whether we can indeed match this lower bound. We address this algorithmic challenge by showing the following bounds to compute δ:
- O((n^3 log b)/b^2) time using O(b) space, for any b ∈ [1,n], in the comparison model.
- Õ(n^2/b) time using Õ(b) space, for any b ∈ [√n,n], in the word RAM model.
This gives an Õ(n^{1+ε})-time and Õ(n^{1-ε})-space algorithm to compute δ, for any 0 < ε ≤ 1/2. Let us remark that our algorithms compute S_T(k), for all k, within the same complexities.
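To make the definition δ = sup{S_T(k)/k, k ≥ 1} concrete, the following sketch computes S_T(k) for every k by explicitly collecting the distinct substrings of each length, and then takes the maximum ratio. It is a naive baseline that uses quadratic space and roughly cubic time, so it is not one of the sublinear-space algorithms of the abstract. The function names `substring_complexity` and `delta` are hypothetical. Since S_T(k) = 0 for k > n, restricting k to [1, n] is enough.

```python
from fractions import Fraction
from typing import List


def substring_complexity(text: str) -> List[int]:
    """Return [S_T(1), ..., S_T(n)]: the number of distinct substrings
    of each length k of `text`. Naive: stores all substrings in sets."""
    n = len(text)
    return [
        len({text[i:i + k] for i in range(n - k + 1)})
        for k in range(1, n + 1)
    ]


def delta(text: str) -> Fraction:
    """Return delta = max over k in [1, n] of S_T(k)/k as an exact fraction.

    Returns 0 for the empty string (an assumption of this sketch)."""
    counts = substring_complexity(text)
    if not counts:
        return Fraction(0)
    return max(Fraction(s, k) for k, s in enumerate(counts, start=1))
```

For example, for T = abab the values S_T(1), …, S_T(4) are 2, 2, 2, 1, so the maximum ratio is attained at k = 1.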
Constructing Antidictionaries of Long Texts in Output-Sensitive Space
A word x that is absent from a word y is called minimal if all its proper factors occur in y. Given a collection of k words y_1, …, y_k over an alphabet Σ, we are asked to compute the set M^ℓ_{y_1,…,y_k} of minimal absent words of length at most ℓ of the collection {y_1, …, y_k}. The set M^ℓ_{y_1,…,y_k} contains all the words x such that x is absent from all the words of the collection while there exist i, j such that the maximal proper suffix of x is a factor of y_i and the maximal proper prefix of x is a factor of y_j. In data compression, this corresponds to computing the antidictionary of k documents. In bioinformatics, it corresponds to computing words that are absent from a genome of k chromosomes. Indeed, the set M^ℓ_y of minimal absent words of a word y is equal to M^ℓ_{y_1,…,y_k} for any decomposition of y into a collection of words y_1, …, y_k such that there is an overlap of length at least ℓ − 1 between any two consecutive words in the collection. This computation generally requires Ω(n) space for n = |y| using any of the many available O(n)-time algorithms. This is because an Ω(n)-sized text index is constructed over y, which can be impractical for large n. We do the identical computation incrementally using output-sensitive space. This goal is reasonable when ∥M^ℓ_{y_1,…,y_N}∥ = o(n), for all N ∈ [1,k], where ∥S∥ denotes the sum of the lengths of words in set S. For instance, in the human genome, n ≈ 3 × 10^9 but ∥M^12_{y_1,…,y_k}∥ ≈ 10^6. We consider a constant-sized alphabet for stating our results. We show that all M^ℓ_{y_1}, …, M^ℓ_{y_1,…,y_k} can be computed in O(kn + ∑_{N=1}^{k} ∥M^ℓ_{y_1,…,y_N}∥) total time using O(MaxIn + MaxOut) space, where MaxIn is the length of the longest word in {y_1, …, y_k} and MaxOut = max{∥M^ℓ_{y_1,…,y_N}∥ : N ∈ [1,k]}. Proof-of-concept experimental results are also provided confirming our theoretical findings and justifying our contribution.
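The definition of minimal absent words of a collection can be checked on small inputs with a brute-force sketch: collect all factors of length at most ℓ of every word, then keep each absent candidate whose maximal proper prefix and maximal proper suffix both occur somewhere in the collection. This is only an illustration of the definition; it builds the full factor set in memory and is not the output-sensitive algorithm of the abstract. The function names `factors` and `minimal_absent_words`, and the explicit `alphabet` parameter, are assumptions of this sketch.

```python
from typing import Iterable, Set


def factors(words: Iterable[str], max_len: int) -> Set[str]:
    """All factors of length 0..max_len of any word in the collection."""
    result = set()
    for y in words:
        for k in range(max_len + 1):
            for i in range(len(y) - k + 1):
                result.add(y[i:i + k])
    return result


def minimal_absent_words(words: Iterable[str], alphabet: str,
                         max_len: int) -> Set[str]:
    """Minimal absent words of length at most max_len of the collection.

    A candidate x = prefix + letter is kept when x is absent from every
    word while its maximal proper prefix and maximal proper suffix both
    occur in some word of the collection.
    """
    present = factors(list(words), max_len)
    result = set()
    for prefix in present:
        if len(prefix) >= max_len:
            continue  # extending would exceed the length bound
        for letter in alphabet:
            x = prefix + letter
            if x not in present and x[1:] in present:
                result.add(x)
    return result
```

As a usage example, for y = abaab and ℓ = 3 the sketch returns {bb, aaa, bab}, and the same set is obtained from the decomposition {abaa, aab}, whose two words overlap by ℓ − 1 = 2 letters.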
POSTURE AND POSTUROLOGY, ANATOMICAL AND PHYSIOLOGICAL PROFILES: OVERVIEW AND CURRENT STATE OF ART
Background and aim of work: posture is the position of the body in space, and is controlled by a set of anatomical structures. The maintenance and control of posture result from a set of interactions between the musculoskeletal, visual, vestibular, and skin systems. Lately, numerous studies have correlated the musculoskeletal system with the maintenance of posture. In particular, the correction of defects and obstruction of temporomandibular disorders seems to have an impact on posture. The aim of this work is to collect information from the literature on posture and on the influence of the stomatognathic system on the postural system. Methods: comparison of the literature on posture and posturology by consulting books and scientific sites. Results: the results were obtained from the comparison of the literature on posture and posturology by consulting books and scientific sites. Some studies support the correlation between the stomatognathic system and posture, while others do not support such a correlation. Conclusions: further studies are necessary to confirm one or the other argument. (www.actabiomedica.it)
A Characterization of Bispecial Sturmian Words
A finite Sturmian word w over the alphabet {a,b} is left special (resp. right
special) if aw and bw (resp. wa and wb) are both Sturmian words. A bispecial
Sturmian word is a Sturmian word that is both left and right special. We show
as a main result that bispecial Sturmian words are exactly the maximal internal
factors of Christoffel words, that are words coding the digital approximations
of segments in the Euclidean plane. This result is an extension of the known
relation between central words and primitive Christoffel words. Our
characterization allows us to give an enumerative formula for bispecial
Sturmian words. We also investigate the minimal forbidden words for the set of
Sturmian words. Comment: Accepted to MFCS 201
Words with the Maximum Number of Abelian Squares
An abelian square is the concatenation of two words that are anagrams of one
another. A word of length n can contain Θ(n^2) distinct factors that
are abelian squares. We study infinite words such that the number of abelian
square factors of length n grows quadratically with n. Comment: To appear in the proceedings of WORDS 201
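The definition of an abelian square (two halves that are anagrams of one another) can be illustrated by a brute-force enumeration of the distinct abelian-square factors of a finite word. The sketch below is only meant to make the definition concrete and is not a method from the paper; the function names `is_abelian_square` and `distinct_abelian_squares` are hypothetical.

```python
from collections import Counter
from typing import Set


def is_abelian_square(w: str) -> bool:
    """True if w is nonempty, has even length, and its two halves
    contain the same letters with the same multiplicities."""
    if not w or len(w) % 2 != 0:
        return False
    half = len(w) // 2
    return Counter(w[:half]) == Counter(w[half:])


def distinct_abelian_squares(word: str) -> Set[str]:
    """Set of distinct factors of `word` that are abelian squares.

    Brute force over all even-length factors."""
    n = len(word)
    return {
        word[i:j]
        for i in range(n)
        for j in range(i + 2, n + 1, 2)
        if is_abelian_square(word[i:j])
    }
```

For example, the word abba has two distinct abelian-square factors, bb and abba itself.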
Minimal Absent Words in Rooted and Unrooted Trees
We extend the theory of minimal absent words to (rooted and unrooted) trees, having edges labeled by letters from an alphabet Σ of cardinality σ. We show that the set MAW(T) of minimal absent words of a rooted (resp. unrooted) tree T with n nodes has cardinality O(nσ) (resp. O(n^2 σ)), and we show that these bounds are realized. Then, we exhibit algorithms to compute all minimal absent words in a rooted (resp. unrooted) tree in output-sensitive time O(n + |MAW(T)|) (resp. O(n^2 + |MAW(T)|)), assuming an integer alphabet of size polynomial in n.
Single-cell NGS-based analysis of copy number alterations reveals new insights in circulating tumor cells persistence in early-stage breast cancer
Circulating tumor cells (CTCs) are a rare population of cells representing a key player in the metastatic cascade. They are recognized as a validated tool for the identification of patients with a higher risk of relapse, including those diagnosed with breast cancer (BC). However, CTCs are characterized by high levels of heterogeneity that also involve copy number alterations (CNAs), structural variations associated with gene dosage changes. In this study, single CTCs were isolated from the peripheral blood of 11 early-stage BC patients at different time points. A label-free enrichment of CTCs was performed using OncoQuick, and single CTCs were isolated using DEPArray. Libraries were prepared from single CTCs and DNA extracted from matched tumor tissues for a whole-genome low-coverage next-generation sequencing (NGS) analysis using the Ion Torrent S5 System. The analysis of the CNA burden highlighted that CTCs had different degrees of aberration based on the time point and subtype. CTCs were found even six months after surgery and shared CNAs with matched tumor tissue. Tumor-associated CNAs that were recurrent in CTCs were patient-specific, and some alterations involved regions associated with BC and survival (i.e., gains at 1q21-23 and 5p15.33). The enrichment analysis emphasized the involvement of aberrations of terms, associated in particular with interferon (IFN) signaling. Collectively, our findings reveal that these aberrations may contribute to understanding the molecular mechanisms involving CTC-related processes and their survival ability in occult niches, supporting the goal of exploiting their application in patients’ surveillance and follow-up