1,646 research outputs found

    Rates of DNA Sequence Profiles for Practical Values of Read Lengths

    Full text link
    A recent study by one of the authors has demonstrated the importance of profile vectors in DNA-based data storage. We provide exact values and lower bounds on the number of profile vectors for finite values of alphabet size qq, read length ℓ\ell, and word length nn.Consequently, we demonstrate that for q≥2q\ge 2 and n≤qℓ/2−1n\le q^{\ell/2-1}, the number of profile vectors is at least qκnq^{\kappa n} with κ\kappa very close to one.In addition to enumeration results, we provide a set of efficient encoding and decoding algorithms for each of two particular families of profile vectors

    Wavelet analysis on symbolic sequences and two-fold de Bruijn sequences

    Full text link
    The concept of symbolic sequences play important role in study of complex systems. In the work we are interested in ultrametric structure of the set of cyclic sequences naturally arising in theory of dynamical systems. Aimed at construction of analytic and numerical methods for investigation of clusters we introduce operator language on the space of symbolic sequences and propose an approach based on wavelet analysis for study of the cluster hierarchy. The analytic power of the approach is demonstrated by derivation of a formula for counting of {\it two-fold de Bruijn sequences}, the extension of the notion of de Bruijn sequences. Possible advantages of the developed description is also discussed in context of applied

    On Greedy Algorithms for Binary de Bruijn Sequences

    Full text link
    We propose a general greedy algorithm for binary de Bruijn sequences, called Generalized Prefer-Opposite (GPO) Algorithm, and its modifications. By identifying specific feedback functions and initial states, we demonstrate that most previously-known greedy algorithms that generate binary de Bruijn sequences are particular cases of our new algorithm

    Partial DNA Assembly: A Rate-Distortion Perspective

    Full text link
    Earlier formulations of the DNA assembly problem were all in the context of perfect assembly; i.e., given a set of reads from a long genome sequence, is it possible to perfectly reconstruct the original sequence? In practice, however, it is very often the case that the read data is not sufficiently rich to permit unambiguous reconstruction of the original sequence. While a natural generalization of the perfect assembly formulation to these cases would be to consider a rate-distortion framework, partial assemblies are usually represented in terms of an assembly graph, making the definition of a distortion measure challenging. In this work, we introduce a distortion function for assembly graphs that can be understood as the logarithm of the number of Eulerian cycles in the assembly graph, each of which correspond to a candidate assembly that could have generated the observed reads. We also introduce an algorithm for the construction of an assembly graph and analyze its performance on real genomes.Comment: To be published at ISIT-2016. 11 pages, 10 figure

    Derandomized Parallel Repetition via Structured PCPs

    Full text link
    A PCP is a proof system for NP in which the proof can be checked by a probabilistic verifier. The verifier is only allowed to read a very small portion of the proof, and in return is allowed to err with some bounded probability. The probability that the verifier accepts a false proof is called the soundness error, and is an important parameter of a PCP system that one seeks to minimize. Constructing PCPs with sub-constant soundness error and, at the same time, a minimal number of queries into the proof (namely two) is especially important due to applications for inapproximability. In this work we construct such PCP verifiers, i.e., PCPs that make only two queries and have sub-constant soundness error. Our construction can be viewed as a combinatorial alternative to the "manifold vs. point" construction, which is the only construction in the literature for this parameter range. The "manifold vs. point" PCP is based on a low degree test, while our construction is based on a direct product test. We also extend our construction to yield a decodable PCP (dPCP) with the same parameters. By plugging in this dPCP into the scheme of Dinur and Harsha (FOCS 2009) one gets an alternative construction of the result of Moshkovitz and Raz (FOCS 2008), namely: a construction of two-query PCPs with small soundness error and small alphabet size. Our construction of a PCP is based on extending the derandomized direct product test of Impagliazzo, Kabanets and Wigderson (STOC 09) to a derandomized parallel repetition theorem. More accurately, our PCP construction is obtained in two steps. We first prove a derandomized parallel repetition theorem for specially structured PCPs. Then, we show that any PCP can be transformed into one that has the required structure, by embedding it on a de-Bruijn graph

    Automorphisms of shift spaces and the Higman - Thompson groups : the one-sided case

    Get PDF
    Funding: The authors are all grateful for support from EPSRC research grant EP/R032866/1; the third author also gratefully acknowledges support from Leverhulme Trust Research Project Grant RPG-2017-159.Let 1 ≤ r < n be integers. We give a proof that the group Aut(Xℕn,σn) of automorphisms of the one-sided shift on n letters embeds naturally as a subgroup ℋn of the outer automorphism group Out(Gn,r) of the Higman-Thompson group Gn,r. From this, we can represent the elements of Aut(Xℕn,σn) by finite state non-initial transducers admitting a very strong synchronizing condition. Let H ∈ ℋn and write |H| for the number of states of the minimal transducer representing H. We show that H can be written as a product of at most |H| torsion elements. This result strengthens a similar result of Boyle, Franks and Kitchens, where the decomposition involves more complex torsion elements and also does not support practical a priori estimates of the length of the resulting product. We also explore the number of foldings of de Bruijn graphs and give acounting result for these for word length 2 and alphabet size n. Finally, we offer new proofs of some known results about Aut(Xℕn,σn).Publisher PDFPeer reviewe
    • …
    corecore