755 research outputs found

    String attractors and combinatorics on words

    The notion of string attractor has recently been introduced in [Prezza, 2017] and studied in [Kempa and Prezza, 2018] to provide a unifying framework for known dictionary-based compressors. A string attractor for a word w = w[1]w[2]···w[n] is a subset Γ of the positions 1, ..., n, such that all distinct factors of w have an occurrence crossing at least one of the elements of Γ. While finding the smallest string attractor for a word is an NP-complete problem, it has been proved in [Kempa and Prezza, 2018] that dictionary compressors can be interpreted as algorithms approximating the smallest string attractor for a given word. In this paper we explore the notion of string attractor from a combinatorial point of view, by focusing on several families of finite words. The results presented in the paper suggest that the notion of string attractor can be used to define new tools to investigate combinatorial properties of words.
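
The definition above can be checked directly on small words. The following is a minimal brute-force sketch, not the method of the paper: the function name `is_string_attractor` is hypothetical, positions are 1-based as in the abstract, and the cubic running time makes it suitable only for short examples.

```python
def is_string_attractor(w, gamma):
    """Return True if every distinct factor of w has an occurrence
    crossing at least one position in gamma (positions are 1-based)."""
    n = len(w)
    if any(p < 1 or p > n for p in gamma):
        raise ValueError("positions must lie in 1..len(w)")
    # Collect every factor w[i..j] (1-based, inclusive) with i <= p <= j
    # for some p in gamma, i.e. every factor occurrence crossing p.
    covered = set()
    for p in gamma:
        for i in range(1, p + 1):
            for j in range(p, n + 1):
                covered.add(w[i - 1:j])
    # All distinct non-empty factors of w.
    all_factors = {w[i:j] for i in range(n) for j in range(i + 1, n + 1)}
    return all_factors <= covered


# For w = "abab", position 2 alone misses the factor "a",
# while {1, 2} covers every factor.
print(is_string_attractor("abab", {2}))     # False
print(is_string_attractor("abab", {1, 2}))  # True
```

The full set of positions is always an attractor, so the interesting question, as the abstract notes, is how small Γ can be.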

    The Alternating BWT: An algorithmic perspective

    The Burrows-Wheeler Transform (BWT) is a word transformation introduced in 1994 for Data Compression. It has become a fundamental tool for designing self-indexing data structures, with important applications in several areas in science and engineering. The Alternating Burrows-Wheeler Transform (ABWT) is another transformation recently introduced in Gessel et al. (2012) [21] and studied in the field of Combinatorics on Words. It is analogous to the BWT, except that it uses an alternating lexicographical order instead of the usual one. Building on results in Giancarlo et al. (2018) [23], where we have shown that BWT and ABWT are part of a larger class of reversible transformations, here we provide a combinatorial and algorithmic study of the novel transform ABWT. We establish a deep analogy between BWT and ABWT by proving they are the only ones in the above mentioned class to be rank-invertible, a novel notion guaranteeing efficient invertibility. In addition, we show that the backward-search procedure can be efficiently generalized to the ABWT; this result implies that the ABWT can also be used as a basis for efficient compressed full text indices. Finally, we prove that the ABWT can be efficiently computed by using a combination of the Difference Cover suffix sorting algorithm (Kärkkäinen et al., 2006 [28]) with a linear time algorithm for finding the minimal cyclic rotation of a word with respect to the alternating lexicographical order.
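
To make the difference between the two orders concrete, here is a naive sketch that sorts all cyclic rotations and reads off the last column. It is not the efficient construction described in the abstract. It assumes the alternating lexicographical order compares the first character in the usual order, the second in reverse order, the third in the usual order, and so on; the function names are hypothetical, and the index needed to invert either transform is omitted.

```python
def rotations(w):
    """All cyclic rotations of w, in order of starting position."""
    return [w[i:] + w[:i] for i in range(len(w))]


def bwt(w):
    """Last column of the rotations sorted in the usual lexicographic order."""
    return "".join(r[-1] for r in sorted(rotations(w)))


def alt_key(r):
    """Sort key for the assumed alternating order: characters at even
    0-based offsets compare normally, those at odd offsets in reverse."""
    return tuple(ord(c) if i % 2 == 0 else -ord(c) for i, c in enumerate(r))


def abwt(w):
    """Last column of the rotations sorted in the alternating order."""
    return "".join(r[-1] for r in sorted(rotations(w), key=alt_key))


print(bwt("banana"))   # nnbaaa
print(abwt("banana"))  # bnnaaa
```

Sorting rotations explicitly takes quadratic space, so this is only an illustration of the definition on short words.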

    A new class of string transformations for compressed text indexing

    Introduced about thirty years ago in the field of data compression, the Burrows-Wheeler Transform (BWT) is a string transformation that, besides being a booster of the performance of memoryless compressors, plays a fundamental role in the design of efficient self-indexing compressed data structures. Finding other string transformations with the same remarkable properties of BWT has been a challenge for many researchers for a long time. In this paper, we introduce a whole class of new string transformations, called local orderings-based transformations, which have all the “myriad virtues” of BWT. As a further result, we show that such new string transformations can be used for the construction of the recently introduced r-index, which makes them suitable also for highly repetitive collections. In this context, we consider the problem of finding, for a given string, the BWT variant that minimizes the number of runs in the transformed string
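
The quantity being minimized in the last sentence, the number of runs, is easy to compute for a given transformed string. The sketch below counts maximal runs of equal characters and applies this to a naive rotation-sorting BWT; it does not implement the local orderings-based transformations of the paper, and the names `count_runs` and `naive_bwt` are hypothetical.

```python
from itertools import groupby


def count_runs(s):
    """Number of maximal runs of equal consecutive characters in s."""
    return sum(1 for _ in groupby(s))


def naive_bwt(w):
    """Last column of the lexicographically sorted cyclic rotations of w."""
    rots = sorted(w[i:] + w[:i] for i in range(len(w)))
    return "".join(r[-1] for r in rots)


word = "banana"
print(count_runs(word))             # 6
print(count_runs(naive_bwt(word)))  # 3, since naive_bwt(word) == "nnbaaa"
```

Comparing `count_runs` across candidate transforms of the same input is one simple way to see which variant yields fewer runs on a particular string.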

    Incidence of mild cognitive impairment and dementia in Parkinson's disease: The Parkinson's disease cognitive impairment study

    Background: Cognitive impairment in Parkinson's disease (PD) includes a spectrum varying from Mild Cognitive Impairment (PD-MCI) to PD Dementia (PDD). The main aim of the present study is to evaluate the incidence of PD-MCI, its rate of progression to dementia, and to identify demographic and clinical characteristics which predict cognitive impairment in PD patients. Methods: PD patients from a large hospital-based cohort who underwent at least two comprehensive neuropsychological evaluations were retrospectively enrolled in the study. PD-MCI and PDD were diagnosed according to the Movement Disorder Society criteria. Incidence rates of PD-MCI and PDD were estimated. Clinical and demographic factors predicting PD-MCI and dementia were evaluated using a Cox proportional hazards model. Results: Out of 139 enrolled PD patients, 84 were classified with normal cognition (PD-NC), while 55 (39.6%) fulfilled the diagnosis of PD-MCI at baseline. At follow-up (mean follow-up 23.5 ± 10.3 months) 28 (33.3%) of the 84 PD-NC at baseline developed MCI and 4 (4.8%) converted to PDD. The incidence rate of PD-MCI was 184.0/1000 pyar (95% CI 124.7-262.3). In multivariate analysis a negative association between education and MCI development at follow-up was observed (HR 0.37, 95% CI 0.15-0.89; p = 0.03). The incidence rate of dementia was 24.3/1000 pyar (95% CI 7.7-58.5). Out of 55 PD-MCI patients at baseline, 14 (25.4%) converted to PDD, giving an incidence rate of 123.5/1000 pyar (95% CI 70.3-202.2). A fivefold increased risk of PDD was found in PD patients with MCI at baseline (RR 5.09, 95% CI 1.60-21.4). Conclusion: Our study supports the relevant role of PD-MCI in predicting PDD and underlines the importance of education in reducing the risk of cognitive impairment.

    Sorting conjugates and Suffixes of Words in a Multiset

    In this paper we study the combinatorial aspects of extending the Burrows-Wheeler transform to a multiset of words. This study involves the notions of suffixes and conjugates of words and is based on two different order relations, denoted by <_lex and ≺_ω, which, although closely related, are quite different from the computational point of view. In particular, we introduce a method that uses only the <_lex sorting among suffixes of a multiset of words in order to sort their conjugates according to the ≺_ω-order. In this study an important role is played by Lyndon words. This strategy could be used in applications, especially in the field of Bioinformatics, where for instance the advent of "next-generation" DNA sequencing technologies has meant that huge collections of DNA sequences are now commonplace.
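
The two orders can disagree even on very short words. The sketch below assumes that ≺_ω compares the infinite powers u^ω and v^ω lexicographically, which is the usual reading of this notation; by the Fine and Wilf theorem it is enough to compare prefixes of length |u| + |v|. The function name `omega_cmp` is hypothetical, and any tie-breaking rule for words with the same infinite power is left out.

```python
from functools import cmp_to_key


def omega_cmp(u, v):
    """Compare u^ω and v^ω lexicographically: returns -1, 0 or 1.
    Assumes both words are non-empty."""
    if not u or not v:
        raise ValueError("words must be non-empty")
    n = len(u) + len(v)  # a prefix of this length decides the comparison
    a = (u * (n // len(u) + 1))[:n]
    b = (v * (n // len(v) + 1))[:n]
    return (a > b) - (a < b)


# "b" is a prefix of "ba", so "b" <_lex "ba", but bbb... > baba...,
# so the ω-comparison puts "ba" first.
print("b" < "ba")            # True
print(omega_cmp("b", "ba"))  # 1
print(sorted(["b", "ba", "a"], key=cmp_to_key(omega_cmp)))  # ['a', 'ba', 'b']
```

This direct comparison is what the method in the abstract avoids, by relying only on <_lex sorting of suffixes.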

    Factorization in Formal Languages

    We consider several novel aspects of unique factorization in formal languages. We reprove the familiar fact that the set uf(L) of words having unique factorization into elements of L is regular if L is regular, and from this deduce a quadratic upper and lower bound on the length of the shortest word not in uf(L). We observe that uf(L) need not be context-free if L is context-free. Next, we consider variations on unique factorization. We define a notion of "semi-unique" factorization, where every factorization has the same number of terms, and show that, if L is regular or even finite, the set of words having such a factorization need not be context-free. Finally, we consider additional variations, such as unique factorization "up to permutation" and "up to subset".
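
For a finite language L, membership of a word in uf(L) can be tested by counting its factorizations with a simple dynamic program. This is an illustrative sketch restricted to finite L given as a Python collection of strings, not the automata-based argument of the paper; the function names are hypothetical, and the empty word is excluded from L so that the count stays finite.

```python
def count_factorizations(w, L):
    """Number of ways to write w as a concatenation of non-empty words of L."""
    pieces = {x for x in L if x}
    # ways[i] = number of factorizations of the prefix w[:i]
    ways = [0] * (len(w) + 1)
    ways[0] = 1  # the empty prefix has exactly one (empty) factorization
    for i in range(1, len(w) + 1):
        for x in pieces:
            if len(x) <= i and w[i - len(x):i] == x:
                ways[i] += ways[i - len(x)]
    return ways[len(w)]


def in_uf(w, L):
    """True if w has exactly one factorization into elements of L."""
    return count_factorizations(w, L) == 1


L = {"a", "ab", "ba"}
print(count_factorizations("aba", L))  # 2: a·ba and ab·a
print(in_uf("ab", L))                  # True: only "ab" itself
print(in_uf("b", L))                   # False: no factorization at all
```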

    In Silico Design, Synthesis and Biological Evaluation of Anticancer Arylsulfonamide Endowed with Anti-Telomerase Activity

    Telomerase, a reverse transcriptase enzyme involved in DNA synthesis, has a tangible role in tumor progression. Several studies have identified telomerase as a promising target for developing cancer therapeutics. The main reason is the overexpression of telomerase in cancer cells (85–90%) compared with normal cells, where it is almost unexpressed. In this paper, we used a structure-based approach to design potential inhibitors of the telomerase active site. The MYSHAPE (Molecular dYnamics SHared PharmacophorE) approach and docking were used to screen an in-house library of 126 arylsulfonamide derivatives. Promising compounds were synthesized using classical and green methods. Compound 2C revealed an interesting IC50 (33 ± 4 µM) against the K-562 cell line compared with the known telomerase inhibitor BIBR1532 IC50 (208 ± 11 µM) with an SI ~10 compared to the BALB/3-T3 cell line. A 100 ns MD simulation of 2C in the telomerase active site indicated Phe494 as the key residue, as for BIBR1532. Each moiety of compound 2C was involved in key interactions with some residues of the active site: Arg557, Ile550, and Gly553. Compound 2C, as an arylsulfonamide derivative, is an interesting hit compound that deserves further investigation in terms of optimization of its structure to obtain more active telomerase inhibitors.