Rates of DNA Sequence Profiles for Practical Values of Read Lengths
A recent study by one of the authors has demonstrated the importance of
profile vectors in DNA-based data storage. We provide exact values and lower
bounds on the number of profile vectors for finite values of alphabet size,
read length, and word length. Consequently, we demonstrate that for
and , the number of profile vectors is at least
with very close to one. In addition to enumeration
results, we provide a set of efficient encoding and decoding algorithms for
each of two particular families of profile vectors.
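To make the enumeration problem concrete, here is a minimal brute-force sketch. It assumes the standard ℓ-gram profile: the vector that counts how many times each q-ary word of length ℓ (the read length) occurs as a substring of a stored word of length n, with windows taken linearly (no wrap-around). The function names are hypothetical, and this is not the paper's encoding or decoding algorithm; it only counts distinct profiles for tiny parameters, which can serve as a sanity check for exact values.

```python
from itertools import product


def profile_vector(word, q, ell):
    """Return the ell-gram profile of `word` (a tuple of symbols in range(q)).

    Entry i counts how many length-ell windows of `word` equal the i-th
    q-ary ell-tuple in lexicographic order. Windows are taken linearly,
    without wrapping around (an assumption of this sketch).
    """
    counts = [0] * (q ** ell)
    for start in range(len(word) - ell + 1):
        index = 0
        for symbol in word[start:start + ell]:
            index = index * q + symbol  # base-q rank of the window
        counts[index] += 1
    return tuple(counts)


def count_profile_vectors(q, ell, n):
    """Brute-force count of distinct ell-gram profiles over all q-ary words of length n."""
    return len({profile_vector(w, q, ell) for w in product(range(q), repeat=n)})
```

For example, with q = 2, ℓ = 2 and n = 4 the 16 binary words yield 12 distinct profiles, because words such as 0010, 0100 and 1001 share the profile (1, 1, 1, 0). Exhaustive enumeration grows as qⁿ, which is why exact values and lower bounds are needed for practical parameters.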
Wavelet analysis on symbolic sequences and two-fold de Bruijn sequences
The concept of symbolic sequences plays an important role in the study of complex
systems. In this work we are interested in the ultrametric structure of the set of
cyclic sequences that arise naturally in the theory of dynamical systems. Aiming to
construct analytic and numerical methods for investigating clusters, we
introduce an operator language on the space of symbolic sequences and propose an
approach based on wavelet analysis for studying the cluster hierarchy. The
analytic power of the approach is demonstrated by deriving a formula for
counting two-fold de Bruijn sequences, an extension of the notion of
de Bruijn sequences. Possible advantages of the developed description are also
discussed in the context of applied
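The abstract does not define two-fold de Bruijn sequences. Assuming the usual definition (a cyclic binary sequence of length 2^(n+1) in which every binary word of length n occurs exactly twice as a cyclic window), the following minimal sketch checks the property and counts such sequences by brute force for very small orders, identifying sequences that differ only by rotation. Both the rotation convention and the function names are assumptions for illustration; this is not the wavelet-based method, only a way to sanity-check a closed counting formula on tiny cases.

```python
from itertools import product


def is_two_fold(seq, n):
    """True if every binary n-word occurs exactly twice among the cyclic windows of seq."""
    if len(seq) != 2 * 2 ** n:
        return False
    wrapped = seq + seq[:n - 1]  # append a prefix so windows can wrap around
    counts = {}
    for i in range(len(seq)):
        window = wrapped[i:i + n]
        counts[window] = counts.get(window, 0) + 1
    return len(counts) == 2 ** n and all(c == 2 for c in counts.values())


def count_two_fold_necklaces(n):
    """Brute-force count of two-fold sequences of order n, up to rotation (assumed convention)."""
    length = 2 * 2 ** n
    found = set()
    for bits in product("01", repeat=length):
        s = "".join(bits)
        if is_two_fold(s, n):
            # canonical representative: lexicographically smallest rotation
            found.add(min(s[i:] + s[:i] for i in range(length)))
    return len(found)
```

For order 1 the only classes are 0011 and 0101, so the count is 2. The search space is 2^(2^(n+1)), so this is only usable for n ≤ 3.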
On Greedy Algorithms for Binary de Bruijn Sequences
We propose a general greedy algorithm for binary de Bruijn sequences, called
the Generalized Prefer-Opposite (GPO) Algorithm, and its modifications. By
identifying specific feedback functions and initial states, we demonstrate that
most previously known greedy algorithms that generate binary de Bruijn
sequences are particular cases of our new algorithm.
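As a point of reference for what "greedy" means here, below is a minimal sketch of the classical prefer-one rule (Martin, 1934): start from n zeros and, at each step, append 1 if the newest length-n window has not been seen yet, otherwise append 0, stopping when neither choice gives a new window. This is one previously known greedy algorithm, not an implementation of GPO itself; the function names are hypothetical.

```python
def prefer_one(n):
    """Binary de Bruijn sequence of order n via the prefer-one greedy rule."""
    seq = [0] * n
    seen = {tuple(seq)}
    while True:
        for bit in (1, 0):  # prefer 1, fall back to 0
            window = tuple(seq[len(seq) - n + 1:] + [bit])
            if window not in seen:
                seen.add(window)
                seq.append(bit)
                break
        else:
            break  # neither extension produces an unseen window
    # the linear output has length 2**n + n - 1; its first 2**n bits form the cycle
    return "".join(map(str, seq[:2 ** n]))


def is_de_bruijn(s, n):
    """True if the cyclic binary string s contains every n-word exactly once."""
    if len(s) != 2 ** n:
        return False
    wrapped = s + s[:n - 1]
    return len({wrapped[i:i + n] for i in range(len(s))}) == 2 ** n
```

For n = 3 the rule produces 00011101, whose cyclic windows are 000, 001, 011, 111, 110, 101, 010, 100. A prefer-opposite rule differs only in which bit is tried first at each step, which is the kind of choice that feedback functions and initial states parameterize.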
Partial DNA Assembly: A Rate-Distortion Perspective
Earlier formulations of the DNA assembly problem were all in the context of
perfect assembly; i.e., given a set of reads from a long genome sequence, is it
possible to perfectly reconstruct the original sequence? In practice, however,
it is very often the case that the read data is not sufficiently rich to permit
unambiguous reconstruction of the original sequence. While a natural
generalization of the perfect assembly formulation to these cases would be to
consider a rate-distortion framework, partial assemblies are usually
represented in terms of an assembly graph, making the definition of a
distortion measure challenging. In this work, we introduce a distortion
function for assembly graphs that can be understood as the logarithm of the
number of Eulerian cycles in the assembly graph, each of which corresponds to a
candidate assembly that could have generated the observed reads. We also
introduce an algorithm for the construction of an assembly graph and analyze
its performance on real genomes. Comment: To be published at ISIT-2016. 11 pages, 10 figures.
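The abstract does not say how the Eulerian cycles are counted. One standard tool is the BEST theorem: for a connected directed graph in which every vertex has equal in- and out-degree, the number of Eulerian circuits is t_w(G) · ∏_v (deg⁺(v) − 1)!, where t_w(G) is the number of spanning arborescences oriented toward any fixed vertex w, obtained from the matrix-tree theorem as a Laplacian minor. The minimal sketch below assumes that approach. It treats parallel edges as distinct, which may over-count assemblies that spell the same string. Whether the paper corrects for this is not stated, and the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial, log


def _determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    size = len(m)
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return det


def count_eulerian_circuits(num_vertices, edges):
    """BEST-theorem count of Eulerian circuits in a directed multigraph.

    `edges` is a list of (u, v) pairs; parallel edges and self-loops are
    allowed and treated as distinct. Every vertex is assumed to have at
    least one outgoing edge (input convention of this sketch).
    """
    out_deg = [0] * num_vertices
    in_deg = [0] * num_vertices
    laplacian = [[0] * num_vertices for _ in range(num_vertices)]
    for u, v in edges:
        out_deg[u] += 1
        in_deg[v] += 1
        laplacian[u][u] += 1  # out-degree on the diagonal
        laplacian[u][v] -= 1  # minus adjacency (a self-loop cancels out)
    if min(out_deg) == 0:
        raise ValueError("every vertex needs at least one outgoing edge")
    if out_deg != in_deg:
        return 0  # Eulerian circuits need in-degree == out-degree everywhere
    # Matrix-tree theorem: arborescences oriented toward vertex 0.
    minor = [row[1:] for row in laplacian[1:]]
    trees = int(_determinant(minor))
    weight = 1
    for d in out_deg:
        weight *= factorial(d - 1)
    return trees * weight


def assembly_distortion(num_vertices, edges):
    """Hypothetical distortion: natural log of the Eulerian-circuit count."""
    return log(count_eulerian_circuits(num_vertices, edges))


def de_bruijn_graph(n):
    """Binary de Bruijn graph of order n: vertices are (n-1)-bit words."""
    k = 2 ** (n - 1)
    return k, [(u, (2 * u + b) % k) for u in range(k) for b in (0, 1)]
```

A quick check: on the binary de Bruijn graph of order n, every Eulerian circuit spells a distinct de Bruijn sequence, so the count should equal the known value 2^(2^(n−1) − n). It gives 2 for n = 3 and 16 for n = 4, so the distortion for the order-3 graph is log 2.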
Derandomized Parallel Repetition via Structured PCPs
A PCP is a proof system for NP in which the proof can be checked by a
probabilistic verifier. The verifier is only allowed to read a very small
portion of the proof, and in return is allowed to err with some bounded
probability. The probability that the verifier accepts a false proof is called
the soundness error, and is an important parameter of a PCP system that one
seeks to minimize. Constructing PCPs with sub-constant soundness error and, at
the same time, a minimal number of queries into the proof (namely two) is
especially important because of its applications to inapproximability.
In this work we construct such PCP verifiers, i.e., PCPs that make only two
queries and have sub-constant soundness error. Our construction can be viewed
as a combinatorial alternative to the "manifold vs. point" construction, which
is the only construction in the literature for this parameter range. The
"manifold vs. point" PCP is based on a low degree test, while our construction
is based on a direct product test. We also extend our construction to yield a
decodable PCP (dPCP) with the same parameters. By plugging this dPCP into
the scheme of Dinur and Harsha (FOCS 2009) one gets an alternative construction
of the result of Moshkovitz and Raz (FOCS 2008), namely: a construction of
two-query PCPs with small soundness error and small alphabet size.
Our construction of a PCP is based on extending the derandomized direct
product test of Impagliazzo, Kabanets and Wigderson (STOC 2009) to a derandomized
parallel repetition theorem. More precisely, our PCP construction is obtained
in two steps. We first prove a derandomized parallel repetition theorem for
specially structured PCPs. Then, we show that any PCP can be transformed into
one that has the required structure by embedding it on a de Bruijn graph.
Automorphisms of shift spaces and the Higman-Thompson groups: the one-sided case
Funding: The authors are all grateful for support from EPSRC research grant EP/R032866/1; the third author also gratefully acknowledges support from Leverhulme Trust Research Project Grant RPG-2017-159.
Let $1 \le r < n$ be integers. We give a proof that the group $\mathrm{Aut}(X_n^{\mathbb{N}}, \sigma_n)$ of automorphisms of the one-sided shift on $n$ letters embeds naturally as a subgroup $\mathcal{H}_n$ of the outer automorphism group $\mathrm{Out}(G_{n,r})$ of the Higman-Thompson group $G_{n,r}$. From this, we can represent the elements of $\mathrm{Aut}(X_n^{\mathbb{N}}, \sigma_n)$ by finite-state non-initial transducers admitting a very strong synchronizing condition. Let $H \in \mathcal{H}_n$ and write $|H|$ for the number of states of the minimal transducer representing $H$. We show that $H$ can be written as a product of at most $|H|$ torsion elements. This result strengthens a similar result of Boyle, Franks and Kitchens, where the decomposition involves more complex torsion elements and also does not support practical a priori estimates of the length of the resulting product. We also explore the number of foldings of de Bruijn graphs and give a counting result for these for word length 2 and alphabet size $n$. Finally, we offer new proofs of some known results about $\mathrm{Aut}(X_n^{\mathbb{N}}, \sigma_n)$.