8 research outputs found

    Comparing Elastic-Degenerate Strings: Algorithms, Lower Bounds, and Applications

    An elastic-degenerate (ED) string T is a sequence of n sets T[1], ..., T[n] containing m strings in total whose cumulative length is N. We call n, m, and N the length, the cardinality, and the size of T, respectively. The language of T is defined as L(T) = {S_1 ··· S_n : S_i ∈ T[i] for all i ∈ [1, n]}. ED strings have been introduced to represent a set of closely related DNA sequences, also known as a pangenome. The basic question we investigate here is: given two ED strings, how fast can we check whether the two languages they represent have a nonempty intersection? We call the underlying problem the ED String Intersection (EDSI) problem. For two ED strings T_1 and T_2 of lengths n_1 and n_2, cardinalities m_1 and m_2, and sizes N_1 and N_2, respectively, we show the following. There is no O((N_1 N_2)^(1−ε))-time algorithm, thus no O((N_1 m_2 + N_2 m_1)^(1−ε))-time algorithm and no O((N_1 n_2 + N_2 n_1)^(1−ε))-time algorithm, for any constant ε > 0, for EDSI even when T_1 and T_2 are over a binary alphabet, unless the Strong Exponential-Time Hypothesis is false. There is no combinatorial O((N_1 + N_2)^(1.2−ε) f(n_1, n_2))-time algorithm, for any constant ε > 0 and any function f, for EDSI even when T_1 and T_2 are over a binary alphabet, unless the Boolean Matrix Multiplication conjecture is false. We give an O(N_1 log N_1 log n_1 + N_2 log N_2 log n_2)-time algorithm for outputting a compact (RLE) representation of the intersection language of two unary ED strings. In the case when T_1 and T_2 are given in a compact representation, we show that the problem is NP-complete. We give an O(N_1 m_2 + N_2 m_1)-time algorithm for EDSI. We give an Õ(N_1^(ω−1) n_2 + N_2^(ω−1) n_1)-time algorithm for EDSI, where ω is the exponent of matrix multiplication; the Õ notation suppresses factors that are polylogarithmic in the input size. We also show that the techniques we develop have applications outside of ED string comparison.
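
    To make the definition of L(T) and the EDSI question concrete, here is a minimal brute-force sketch. It is not one of the algorithms from the abstract: it enumerates both languages explicitly, which takes exponential time in general. The names `ed_language` and `ed_intersect_naive`, and the choice to model an ED string as a Python list of sets of strings (with the empty string allowed), are assumptions made for illustration only.

```python
from itertools import product


def ed_language(T):
    """Return L(T): every concatenation S_1 ... S_n with S_i taken from T[i].

    T is modelled as a list of sets of strings (an illustrative assumption).
    """
    return {"".join(choice) for choice in product(*T)}


def ed_intersect_naive(T1, T2):
    """Decide EDSI by brute force: is L(T1) ∩ L(T2) nonempty?"""
    return not ed_language(T1).isdisjoint(ed_language(T2))


# Toy example: T1 has length n = 3, cardinality m = 5, and size N = 5.
T1 = [{"a", "ab"}, {"c", ""}, {"b"}]
T2 = [{"ab"}, {"b", "cb"}]

print(sorted(ed_language(T1)))     # ['ab', 'abb', 'abcb', 'acb']
print(ed_intersect_naive(T1, T2))  # True: "abb" and "abcb" lie in both languages
```

    The sketch is only useful for checking small examples by hand; the bounds quoted in the abstract concern algorithms that avoid enumerating the languages.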

    On Strings Having the Same Length- k Substrings

    Let Substr_k(X) denote the set of length-k substrings of a given string X for a given integer k > 0. We study the following basic string problem, called z-Shortest S_k-Equivalent Strings: Given a set S_k of n length-k strings and an integer z > 0, list z shortest distinct strings T_1, ..., T_z such that Substr_k(T_i) = S_k, for all i ∈ [1, z]. The z-Shortest S_k-Equivalent Strings problem arises naturally as an encoding problem in many real-world applications; e.g., in data privacy, in data compression, and in bioinformatics. The 1-Shortest S_k-Equivalent Strings, referred to as Shortest S_k-Equivalent String, asks for a shortest string X such that Substr_k(X) = S_k. Our main contributions are summarized below: Given a directed graph G(V, E), the Directed Chinese Postman (DCP) problem asks for a shortest closed walk that visits every edge of G at least once. DCP can be solved in Õ(|E||V|) time using an algorithm for min-cost flow. We show, via a non-trivial reduction, that if Shortest S_k-Equivalent String over a binary alphabet has a near-linear-time solution then so does DCP. We show that the length of a shortest string output by Shortest S_k-Equivalent String is in O(k + n^2). We generalize this bound by showing that the total length of z shortest strings is in O(zk + zn^2 + z^2 n). We derive these upper bounds by showing (asymptotically tight) bounds on the total length of z shortest Eulerian walks in general directed graphs. We present an algorithm for solving z-Shortest S_k-Equivalent Strings in O(nk + n^2 log^2 n + zn^2 log n + |output|) time. If z = 1, the time becomes O(nk + n^2 log^2 n) by the fact that the size of the input is Θ(nk) and the size of the output is O(k + n^2).
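
    The set of length-k substrings defined at the start of this abstract, and the equivalence condition the problem asks for, can be checked directly. The sketch below only verifies a candidate string; it does not construct shortest strings and is not the algorithm described in the abstract. The function names `substr_k` and `is_sk_equivalent` are hypothetical.

```python
def substr_k(x, k):
    """Return the set of all length-k substrings of x (empty if len(x) < k)."""
    return {x[i:i + k] for i in range(len(x) - k + 1)}


def is_sk_equivalent(x, s_k, k):
    """Check whether the length-k substrings of x are exactly the set s_k."""
    return substr_k(x, k) == set(s_k)


S_k = {"ab", "ba"}
print(is_sk_equivalent("aba", S_k, 2))   # True
print(is_sk_equivalent("bab", S_k, 2))   # True: a second string of the same length
print(is_sk_equivalent("abb", S_k, 2))   # False: contains "bb" and lacks "ba"
```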

    On Strings Having the Same Length- k Substrings

    Let Substr_k(X) denote the set of length-k substrings of a given string X for a given integer k > 0. We study the following basic string problem, called z-Shortest S_k-Equivalent Strings: Given a set S_k of n length-k strings and an integer z > 0, list z shortest distinct strings T_1, ..., T_z such that Substr_k(T_i) = S_k, for all i ∈ [1, z]. The z-Shortest S_k-Equivalent Strings problem arises naturally as an encoding problem in many real-world applications; e.g., in data privacy, in data compression, and in bioinformatics. The 1-Shortest S_k-Equivalent Strings, referred to as Shortest S_k-Equivalent String, asks for a shortest string X such that Substr_k(X) = S_k. Our main contributions are summarized below: Given a directed graph G(V, E), the Directed Chinese Postman (DCP) problem asks for a shortest closed walk that visits every edge of G at least once. DCP can be solved in Õ(|E||V|) time using an algorithm for min-cost flow. We show, via a non-trivial reduction, that if Shortest S_k-Equivalent String over a binary alphabet has a near-linear-time solution then so does DCP. We show that the length of a shortest string output by Shortest S_k-Equivalent String is in O(k + n^2). We generalize this bound by showing that the total length of z shortest strings is in O(zk + zn^2 + z^2 n). We derive these upper bounds by showing (asymptotically tight) bounds on the total length of z shortest Eulerian walks in general directed graphs. We present an algorithm for solving z-Shortest S_k-Equivalent Strings in O(nk + n^2 log^2 n + zn^2 log n + |output|) time. If z = 1, the time becomes O(nk + n^2 log^2 n) by the fact that the size of the input is Θ(nk) and the size of the output is O(k + n^2).
    Giulia Bernardini; Alessio Conte; Esteban Gabory; Roberto Grossi; Grigorios Loukides; Solon P. Pissis; Giulia Punzi; Michelle Sweering

    Elastic-Degenerate String Matching with 1 Error

    No full text
    An elastic-degenerate (ED) string is a sequence of n finite sets of strings of total length N, introduced to represent a set of related DNA sequences, also known as a pangenome. The ED string matching (EDSM) problem consists in reporting all occurrences of a pattern of length m in an ED text. The EDSM problem has recently received some attention by the combinatorial pattern matching community, culminating in an Õ(nm^(ω−1)) + O(N)-time algorithm [Bernardini et al., SIAM J. Comput. 2022], where ω denotes the matrix multiplication exponent and the Õ(·) notation suppresses polylog factors. In the k-EDSM problem, the approximate version of EDSM, we are asked to report all pattern occurrences with at most k errors. k-EDSM can be solved in O(k^2 mG + kN) time under edit distance, where G denotes the total number of strings in the ED text [Bernardini et al., Theor. Comput. Sci. 2020]. Unfortunately, G is only bounded by N, and so even for k = 1, the existing algorithm runs in Ω(mN) time in the worst case. Here we make progress in this direction. We show that 1-EDSM can be solved in O((nm^2 + N) log m) or O(nm^3 + N) time under edit distance. For the decision version of the problem, we present a faster O(nm^2 log m + N log log m)-time algorithm. Our algorithms rely on non-trivial reductions from 1-EDSM to special instances of classic computational geometry problems (2d rectangle stabbing or range emptiness), which we show how to solve efficiently.

    Should a neck dissection be performed on patients with cN0 adenoid cystic carcinoma? A REFCOR propensity score matching study

    No full text
    Background: Patterns of nodal involvement in adenoid cystic carcinoma (ACC) of the head and neck have not been sufficiently assessed to guide a decision on prophylactic neck dissection (ND). The objective of this study is to analyse the influence of ND on event-free survival (EFS) for patients with cN0 ACC. Patients and methods: A multicentre prospective study was conducted between 2009 and 2018. Patients presenting with cN0 non-metastatic ACC at any site, and who received surgery on the tumour, were included. EFS was the main judgement criterion. A comparative survival analysis between the group that received ND and the group that did not was performed, using a propensity score. Analyses were carried out using the R software. Results: Between 2009 and 2018, 322 patients with cN0 ACC were included, of whom 58% were female. The average age was 53 years. Tumours were in minor salivary glands in 58% of cases, and 52% had T3/T4 stages. ND was performed on 46% of patients. Of these, seven had histological lymph node invasion, six of whom had tumour infiltration of the oral cavity mucosa. After propensity score matching, the median EFS for N0 patients with ND was 72 months (95% confidence interval (CI) [48-81]), compared to 73 months (95% CI [52-85]) for patients without ND (HR = 1.33; 95% CI [0.82-2.16]; p = 0.2). Conclusion: ND in cN0 patients does not provide any benefit in EFS, which suggests that its application in such patients is not necessary.

    Exercise and epigenetic inheritance of disease risk

    No full text
    Epigenetics is the study of gene expression changes that occur in the absence of altered genotype. Current evidence indicates a role for environmentally induced alterations to epigenetic modifications leading to health and disease changes across multiple generations. This phenomenon is called intergenerational or transgenerational epigenetic inheritance of health or disease. Environmental insults, in the form of toxins, plastics and particular dietary interventions, perturb the epigenetic landscape and influence the health of F1 through to F4 generations in rodents. There is, however, the possibility that healthy lifestyles and environmental factors, such as exercise training, could lead to favourable, heritable epigenetic modifications that augment transcriptional programmes protective of disease, including metabolic dysfunction, heart disease and cancer. The health benefits conferred by regular physical exercise training are unquestionable, yet many of the molecular changes may have heritable health implications for future generations. Similar to other environmental factors, exercise modulates the epigenome of somatic cells and researchers are beginning to study exercise epigenetics in germ cells. The germ cell epigenetic modifications affected by exercise offer a molecular mechanism for the inheritance of health and disease risk. The aims of this review are to: (i) provide an update on the expanding field of exercise epigenetics; (ii) offer an overview of data on intergenerational/transgenerational epigenetic inheritance of disease by environmental insults; (iii) discuss the potential of exercise-induced intergenerational inheritance of health and disease risk; and (iv) outline potential mechanisms and avenues for future work on epigenetic inheritance through exercise.