114 research outputs found

Reverse-Safe Data Structures for Text Indexing

Author: Gabriele Fici
Giulia Bernardini
Grigorios Loukides
Huiping Chen
Solon P. Pissis
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2020

We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm which constructs a z-reverse-safe data structure that has size O(n) and answers pattern matching queries of length at most d optimally, where d is maximal for any such z-reverse-safe data structure. The construction algorithm takes O(n^ω log d) time, where ω is the matrix multiplication exponent. We show that, despite the n^ω factor, our engineered implementation takes only a few minutes to finish for million-letter texts. We further show that plugging our method into data analysis applications gives insignificant or no data utility loss. Finally, we show how our technique can be extended to support applications under a realistic adversary model.

Archivio istituzionale della ricerca - Università di Trieste

Crossref

CWI's Institutional Repository

University of Birmingham Research Portal

Archivio istituzionale della ricerca - Università di Palermo

Palindromic Length of Words with Many Periodic Palindromes

Author: A Frid
A Saarela
AE Frid
D Kosolobov
G Fici
M Bucci
M Rubinchik
P Ambrož
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 04/05/2020

The palindromic length $\text{PL}(v)$ of a finite word $v$ is the minimal number of palindromes whose concatenation is equal to $v$. In 2013, Frid, Puzynina, and Zamboni conjectured that: If $w$ is an infinite word and $k$ is an integer such that $\text{PL}(u)\leq k$ for every factor $u$ of $w$, then $w$ is ultimately periodic. Suppose that $w$ is an infinite word and $k$ is an integer such that $\text{PL}(u)\leq k$ for every factor $u$ of $w$. Let $\Omega(w,k)$ be the set of all factors $u$ of $w$ that have more than $\sqrt[k]{k^{-1}\vert u\vert}$ palindromic prefixes. We show that $\Omega(w,k)$ is an infinite set and we show that for each positive integer $j$ there are palindromes $a,b$ and a word $u\in \Omega(w,k)$ such that $(ab)^j$ is a factor of $u$ and $b$ is nonempty. Note that $(ab)^j$ is a periodic word and $(ab)^ia$ is a palindrome for each $i\leq j$. These results justify the following question: What is the palindromic length of a concatenation of a suffix of $b$ and a periodic word $(ab)^j$ with "many" periodic palindromes? It is known that $\lvert\text{PL}(uv)-\text{PL}(u)\rvert\leq \text{PL}(v)$, where $u$ and $v$ are nonempty words. The main result of our article shows that if $a,b$ are palindromes, $b$ is nonempty, $u$ is a nonempty suffix of $b$, $\vert ab\vert$ is the minimal period of $aba$, and $j$ is a positive integer with $j\geq3\text{PL}(u)$, then $\text{PL}(u(ab)^j)-\text{PL}(u)\geq 0$.
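The definition of palindromic length in the abstract above can be made concrete with a small sketch. It assumes a plain quadratic dynamic program over prefixes, which is not taken from the article; the function name `palindromic_length` is hypothetical and chosen only for illustration.

```python
def palindromic_length(v: str) -> int:
    """Minimal number of palindromes whose concatenation equals v.

    Illustrative O(n^2) time and space dynamic program (an assumption of this
    sketch, not the article's method).
    """
    n = len(v)
    # is_pal[i][j] is True when the factor v[i..j] (inclusive) is a palindrome.
    is_pal = [[False] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for j in range(i, n):
            if v[i] == v[j] and (j - i < 2 or is_pal[i + 1][j - 1]):
                is_pal[i][j] = True
    # best[j] is the palindromic length of the prefix v[:j]; the empty prefix needs 0.
    best = [0] + [n + 1] * n
    for j in range(1, n + 1):
        for i in range(j):
            if is_pal[i][j - 1]:
                best[j] = min(best[j], best[i] + 1)
    return best[n]


# With a = "a", b = "b", u = "b" and j = 3 we have |ab| = 2, the minimal
# period of "aba", and j >= 3 * PL(u) = 3, so the stated bound applies.
gap = palindromic_length("b" + "ab" * 3) - palindromic_length("b")
```

Here `gap` evaluates to 0, since "bababab" is itself a palindrome, which is consistent with the inequality stated as the main result.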

arXiv.org e-Print Archive

Crossref

Substring Complexity in Sublinear Space

Author: Bernardini Giulia
Fici Gabriele
Gawrychowski Paweł
Pissis Solon P.
Publication date: 16/07/2020

Shannon's entropy is a definitive lower bound for statistical compression. Unfortunately, no such clear measure exists for the compressibility of repetitive strings. Thus, ad-hoc measures are employed to estimate the repetitiveness of strings, e.g., the size $z$ of the Lempel-Ziv parse or the number $r$ of equal-letter runs of the Burrows-Wheeler transform. A more recent one is the size $\gamma$ of a smallest string attractor. Unfortunately, Kempa and Prezza [STOC 2018] showed that computing $\gamma$ is NP-hard. Kociumaka et al. [LATIN 2020] considered a new measure that is based on the function $S_T$ counting the cardinalities of the sets of substrings of each length of $T$, also known as the substring complexity. This new measure is defined as $\delta= \sup\{S_T(k)/k, k\geq 1\}$ and lower bounds all the measures previously considered. In particular, $\delta\leq \gamma$ always holds and $\delta$ can be computed in $\mathcal{O}(n)$ time using $\Omega(n)$ working space. Kociumaka et al. showed that if $\delta$ is given, one can construct an $\mathcal{O}(\delta \log \frac{n}{\delta})$-sized representation of $T$ supporting efficient direct access and efficient pattern matching queries on $T$. Given that for highly compressible strings, $\delta$ is significantly smaller than $n$, it is natural to pose the following question: Can we compute $\delta$ efficiently using sublinear working space? It is straightforward to show that any algorithm computing $\delta$ using $\mathcal{O}(b)$ space requires $\Omega(n^{2-o(1)}/b)$ time through a reduction from the element distinctness problem [Yao, SIAM J. Comput. 1994]. We present the following results: an $\mathcal{O}(n^3/b^2)$-time and $\mathcal{O}(b)$-space algorithm to compute $\delta$, for any $b\in[1,n]$; and an $\tilde{\mathcal{O}}(n^2/b)$-time and $\mathcal{O}(b)$-space algorithm to compute $\delta$, for any $b\in[n^{2/3},n]$.
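To make the definition of the measure concrete, the following is a brute-force sketch that computes the substring complexity and its supremum directly from the definition. It stores every distinct substring, so it uses far more than linear space and is not one of the algorithms described in the abstract; the names `substring_complexity` and `delta` are hypothetical.

```python
from fractions import Fraction


def substring_complexity(t: str) -> list[int]:
    """Return [S_T(1), ..., S_T(n)], the number of distinct substrings of each length."""
    n = len(t)
    return [len({t[i:i + k] for i in range(n - k + 1)}) for k in range(1, n + 1)]


def delta(t: str) -> Fraction:
    """Return max over k of S_T(k)/k as an exact fraction.

    Lengths k > n contribute S_T(k) = 0, so restricting k to 1..n loses nothing.
    """
    counts = substring_complexity(t)
    if not counts:
        return Fraction(0)
    return max(Fraction(s, k) for k, s in enumerate(counts, start=1))


profile = substring_complexity("abab")  # distinct substrings: {a,b}, {ab,ba}, {aba,bab}, {abab}
measure = delta("abab")
```

For "abab" the profile is [2, 2, 2, 1], so the maximum ratio is attained at k = 1 and equals 2. Exact fractions are used only to avoid floating-point comparisons.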

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Trieste

CWI's Institutional Repository

Substring Complexity in Sublinear Space

Author: Solon P. Pissis
Gabriele Fici
Giulia Bernardini
Paweł Gawrychowski
Publication venue: Schloss-Dagstuhl - Leibniz Zentrum für Informatik
Publication date: 28/11/2023

Shannon’s entropy is a definitive lower bound for statistical compression. Unfortunately, no such clear measure exists for the compressibility of repetitive strings. Thus, ad hoc measures are employed to estimate the repetitiveness of strings, e.g., the size z of the Lempel–Ziv parse or the number r of equal-letter runs of the Burrows-Wheeler transform. A more recent one is the size γ of a smallest string attractor. Let T be a string of length n. A string attractor of T is a set of positions of T capturing the occurrences of all the substrings of T. Unfortunately, Kempa and Prezza [STOC 2018] showed that computing γ is NP-hard. Kociumaka et al. [LATIN 2020] considered a new measure of compressibility that is based on the function S_T(k) counting the number of distinct substrings of length k of T, also known as the substring complexity of T. This new measure is defined as δ = sup{S_T(k)/k, k ≥ 1} and lower bounds all the relevant ad hoc measures previously considered. In particular, δ ≤ γ always holds and δ can be computed in O(n) time using Θ(n) working space. Kociumaka et al. showed that one can construct an O(δ log(n/δ))-sized representation of T supporting efficient direct access and efficient pattern matching queries on T. Given that for highly compressible strings, δ is significantly smaller than n, it is natural to pose the following question: Can we compute δ efficiently using sublinear working space? It is straightforward to show that in the comparison model, any algorithm computing δ using O(b) space requires Ω(n^{2-o(1)}/b) time through a reduction from the element distinctness problem [Yao, SIAM J. Comput. 1994]. We thus wanted to investigate whether we can indeed match this lower bound. We address this algorithmic challenge by showing the following bounds to compute δ:
- O((n^3 log b)/b^2) time using O(b) space, for any b ∈ [1,n], in the comparison model.
- Õ(n^2/b) time using Õ(b) space, for any b ∈ [√n,n], in the word RAM model.
This gives an Õ(n^{1+ε})-time and Õ(n^{1-ε})-space algorithm to compute δ, for any 0 < ε ≤ 1/2. Let us remark that our algorithms compute S_T(k), for all k, within the same complexities.
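The final trade-off stated in the abstract appears to follow from the word RAM bound by a single substitution. The short derivation below is a reading of the abstract, assuming the choice b = n^{1-ε}; it is not quoted from the paper.

```latex
\begin{align*}
b = n^{1-\varepsilon}
  &\;\Longrightarrow\; \tilde{O}\!\left(\frac{n^{2}}{b}\right)
     = \tilde{O}\!\left(n^{2-(1-\varepsilon)}\right)
     = \tilde{O}\!\left(n^{1+\varepsilon}\right) \text{ time},\\
  &\;\Longrightarrow\; \tilde{O}(b)
     = \tilde{O}\!\left(n^{1-\varepsilon}\right) \text{ space},\\
\sqrt{n} \le b
  &\;\Longleftrightarrow\; \tfrac{1}{2} \le 1-\varepsilon
   \;\Longleftrightarrow\; \varepsilon \le \tfrac{1}{2}.
\end{align*}
```

The last line shows why the stated range stops at ε = 1/2: the second bound is only claimed for b ≥ √n.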

Archivio istituzionale della ricerca - Università di Palermo

Constructing Antidictionaries of Long Texts in Output-Sensitive Space

Author: Ayad Lorraine A. K.
Badkobeh Golnaz
Fici Gabriele
Heliou Alice
Pissis Solon P.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 14/12/2020

A word x that is absent from a word y is called minimal if all its proper factors occur in y. Given a collection of k words y_1, …, y_k over an alphabet Σ, we are asked to compute the set M^ℓ_{y_1,…,y_k} of minimal absent words of length at most ℓ of the collection {y_1, …, y_k}. The set M^ℓ_{y_1,…,y_k} contains all the words x such that x is absent from all the words of the collection while there exist i, j such that the maximal proper suffix of x is a factor of y_i and the maximal proper prefix of x is a factor of y_j. In data compression, this corresponds to computing the antidictionary of k documents. In bioinformatics, it corresponds to computing words that are absent from a genome of k chromosomes. Indeed, the set M^ℓ_y of minimal absent words of a word y is equal to M^ℓ_{y_1,…,y_k} for any decomposition of y into a collection of words y_1, …, y_k such that there is an overlap of length at least ℓ − 1 between any two consecutive words in the collection. This computation generally requires Ω(n) space for n = |y| using any of the many available O(n)-time algorithms. This is because an Ω(n)-sized text index is constructed over y, which can be impractical for large n. We do the identical computation incrementally using output-sensitive space. This goal is reasonable when ∥M^ℓ_{y_1,…,y_N}∥ = o(n), for all N ∈ [1,k], where ∥S∥ denotes the sum of the lengths of words in set S. For instance, in the human genome, n ≈ 3 × 10^9 but ∥M^{12}_{y_1,…,y_k}∥ ≈ 10^6. We consider a constant-sized alphabet for stating our results. We show that all M^ℓ_{y_1}, …, M^ℓ_{y_1,…,y_k} can be computed in O(kn + ∑_{N=1}^{k} ∥M^ℓ_{y_1,…,y_N}∥) total time using O(MaxIn + MaxOut) space, where MaxIn is the length of the longest word in {y_1, …, y_k} and MaxOut = max{∥M^ℓ_{y_1,…,y_N}∥ : N ∈ [1,k]}. Proof-of-concept experimental results are also provided confirming our theoretical findings and justifying our contribution.
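The definition of minimal absent words of a collection can be checked on small inputs with a brute-force sketch. It assumes that storing every factor of length at most ℓ in a set is acceptable, which is exactly the kind of space usage the paper is trying to avoid; the function name `minimal_absent_words` and its parameters are hypothetical.

```python
def minimal_absent_words(words: list[str], alphabet: str, max_len: int) -> set[str]:
    """Minimal absent words of length at most max_len of a collection of words.

    Brute force: x is reported when x is not a factor of any word, while its
    maximal proper prefix and maximal proper suffix are factors of some words.
    """
    # All factors of length <= max_len, plus the empty word (a factor of everything).
    factors = {""}
    for y in words:
        for i in range(len(y)):
            for j in range(i + 1, min(len(y), i + max_len) + 1):
                factors.add(y[i:j])
    maws = set()
    for prefix in factors:
        if len(prefix) >= max_len:
            continue
        for letter in alphabet:
            x = prefix + letter  # the maximal proper prefix of x is a factor by construction
            if x not in factors and x[1:] in factors:
                maws.add(x)
    return maws


whole = minimal_absent_words(["abaab"], "ab", 3)
# Decomposition of "abaab" into two words overlapping by ell - 1 = 2 letters.
pieces = minimal_absent_words(["abaa", "aab"], "ab", 3)
```

Both calls return {"bb", "aaa", "bab"}, which illustrates the decomposition property stated in the abstract for ℓ = 3.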

Goldsmiths Research Online

VU Research Portal

CWI's Institutional Repository

INRIA a CCSD electronic archive server

Brunel University Research Archive

HAL-Polytechnique

Archivio istituzionale della ricerca - Università di Palermo

POSTURE AND POSTUROLOGY, ANATOMICAL AND PHYSIOLOGICAL PROFILES: OVERVIEW AND CURRENT STATE OF ART

Author: Carini F.
Damiani P.
Fici C.
Mazzola M.
Messina M.
Palmeri S.
Tomasello G.
Publication venue: country:IT
Publication date: 01/01/2017

Background and aim of work: Posture is the position of the body in space and is controlled by a set of anatomical structures. The maintenance and control of posture result from a set of interactions between the musculoskeletal, visual, vestibular, and skin systems. Numerous recent studies correlate the musculoskeletal system with the maintenance of posture. In particular, the correction of defects and obstruction of temporomandibular disorders seems to have an impact on posture. The aim of this work is to collect information from the literature on posture and on the influence of the stomatognathic system on the postural system. Methods: Comparison of the literature on posture and posturology by consulting books and scientific sites. Results: The results were obtained from the comparison of the literature on posture and posturology by consulting books and scientific sites. Some studies support the correlation between the stomatognathic system and posture, while others do not support such a correlation. Conclusions: Further studies are necessary to confirm one or the other argument. (www.actabiomedica.it)

Archivio istituzionale della ricerca - Università di Palermo

A Characterization of Bispecial Sturmian Words

Author: A. Carpi
A. Luca de
E.M. Coven
E.P. Lipatov
F. Mignosi
G. Fici
G.H. Hardy
J. Berstel
M. Crochemore
M. Morse
M. Sciortino
M.-P. Béal
S. Dulucq
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

A finite Sturmian word w over the alphabet {a,b} is left special (resp. right special) if aw and bw (resp. wa and wb) are both Sturmian words. A bispecial Sturmian word is a Sturmian word that is both left and right special. We show as a main result that bispecial Sturmian words are exactly the maximal internal factors of Christoffel words, which are words coding the digital approximations of segments in the Euclidean plane. This result is an extension of the known relation between central words and primitive Christoffel words. Our characterization allows us to give an enumerative formula for bispecial Sturmian words. We also investigate the minimal forbidden words for the set of Sturmian words. Comment: Accepted to MFCS 201
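The definitions of left special, right special and bispecial words can be tested by brute force on short words. The sketch below relies on an assumption not stated in the abstract: it uses the standard characterization of finite Sturmian words as the balanced binary words (any two factors of the same length differ by at most one in their number of a's). The function names are hypothetical.

```python
def is_balanced(w: str) -> bool:
    """True when any two equal-length factors of w differ by at most 1 in their count of 'a'."""
    n = len(w)
    for k in range(1, n + 1):
        counts = {w[i:i + k].count("a") for i in range(n - k + 1)}
        if max(counts) - min(counts) > 1:
            return False
    return True


def is_bispecial(w: str) -> bool:
    """True when aw, bw, wa and wb are all balanced, i.e. w is left and right special."""
    return all(is_balanced(x) for x in ("a" + w, "b" + w, w + "a", w + "b"))


central = is_bispecial("aba")      # all four extensions stay balanced
not_special = is_bispecial("aab")  # the extension "aabb" contains both "aa" and "bb"
```

Here `central` is True and `not_special` is False. The check takes quadratic time per extension and is meant only for experimenting with short words.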

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università di Palermo

Words with the Maximum Number of Abelian Squares

Author: A Luca de
AS Fraenkel
D Damanik
F Mignosi
G Fici
J Cassaigne
L Ilie
L Kuipers
M Christodoulakis
M Lothaire
P Erdös
S Brlek
T Kociumaka
V Keränen
Publication date: 01/01/2015

An abelian square is the concatenation of two words that are anagrams of one another. A word of length $n$ can contain $\Theta(n^2)$ distinct factors that are abelian squares. We study infinite words such that the number of abelian square factors of length $n$ grows quadratically with $n$. Comment: To appear in the proceedings of WORDS 201
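As an illustration of the definition, the following brute-force sketch collects the distinct factors of a finite word that are abelian squares. It simply compares letter counts of the two halves of every even-length factor and is not the paper's construction; the function name `abelian_square_factors` is hypothetical.

```python
from collections import Counter


def abelian_square_factors(w: str) -> set[str]:
    """Distinct factors of w of the form uv where v is an anagram of u (u nonempty)."""
    n = len(w)
    found = set()
    for start in range(n):
        for half in range(1, (n - start) // 2 + 1):
            u = w[start:start + half]
            v = w[start + half:start + 2 * half]
            # Two words are anagrams exactly when their letter multisets agree.
            if Counter(u) == Counter(v):
                found.add(u + v)
    return found


squares = abelian_square_factors("abba")
```

For "abba" the result is {"bb", "abba"}: "bb" is an ordinary square, while "abba" qualifies because "ba" is an anagram of "ab".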

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università di Palermo

Minimal Absent Words in Rooted and Unrooted Trees

Author: B Schieber
C Barton
D Belazzougui
F Mignosi
G Fici
M Béal
M Crochemore
M-P Béal
MA Bender
P Charalampopoulos
RM Silva
S Chairungsee
T Shibuya
Y Almirantis
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

We extend the theory of minimal absent words to (rooted and unrooted) trees, having edges labeled by letters from an alphabet of cardinality. We show that the set of minimal absent words of a rooted (resp. unrooted) tree T with n nodes has cardinality (resp.), and we show that these bounds are realized. Then, we exhibit algorithms to compute all minimal absent words in a rooted (resp. unrooted) tree in output-sensitive time (resp. assuming an integer alphabet of size polynomial in n

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università di Palermo

Single-cell NGS-based analysis of copy number alterations reveals new insights in circulating tumor cells persistence in early-stage breast cancer

Author: Angeli D.
Bandini E.
Cocchi C.
Fabbri F.
Fici P.
Gallerani G.
Gaudio M.
Maltoni R.
Martinelli G.
Rocca A.
Rossi T.
Publication venue: 'MDPI AG'
Publication date: 01/01/2020

Circulating tumor cells (CTCs) are a rare population of cells representing a key player in the metastatic cascade. They are recognized as a validated tool for the identification of patients with a higher risk of relapse, including those diagnosed with breast cancer (BC). However, CTCs are characterized by high levels of heterogeneity that also involve copy number alterations (CNAs), structural variations associated with gene dosage changes. In this study, single CTCs were isolated from the peripheral blood of 11 early-stage BC patients at different time points. A label-free enrichment of CTCs was performed using OncoQuick, and single CTCs were isolated using DEPArray. Libraries were prepared from single CTCs and DNA extracted from matched tumor tissues for a whole-genome low-coverage next-generation sequencing (NGS) analysis using the Ion Torrent S5 System. The analysis of the CNA burden highlighted that CTCs had different degrees of aberration based on the time point and subtype. CTCs were found even six months after surgery and shared CNAs with matched tumor tissue. Tumor-associated CNAs that were recurrent in CTCs were patient-specific, and some alterations involved regions associated with BC and survival (i.e., gains at 1q21-23 and 5p15.33). The enrichment analysis emphasized the involvement of aberrations of terms, associated in particular with interferon (IFN) signaling. Collectively, our findings reveal that these aberrations may contribute to understanding the molecular mechanisms involving CTC-related processes and their survival ability in occult niches, supporting the goal of exploiting their application in patients’ surveillance and follow-up

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna