114 research outputs found

Perfect Necklaces

Author: Becher Verónica
Ferrari Pablo A.
Yuhjtman Sergio A.
Álvarez Nicolás
Publication date: 28/01/2016

We introduce a variant of de Bruijn words that we call perfect necklaces. Fix a finite alphabet. Recall that a word is a finite sequence of symbols in the alphabet and a circular word, or necklace, is the equivalence class of a word under rotations. For positive integers k and n, we call a necklace (k,n)-perfect if each word of length k occurs exactly n times, at positions which are different modulo n, for any convention on the starting point. We call a necklace perfect if it is (k,k)-perfect for some k. We prove that every arithmetic sequence with difference coprime with the alphabet size induces a perfect necklace. In particular, the concatenation of all words of the same length in lexicographic order yields a perfect necklace. For each k and n, we give a closed formula for the number of (k,n)-perfect necklaces. Finally, we prove that every infinite periodic sequence whose period coincides with some (k,n)-perfect necklace, for any n, passes all statistical tests of size up to k, but not all larger tests. This last theorem motivated this work.
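The definition in this abstract can be checked mechanically on small cases. The following is a minimal sketch, not the authors' code: the function names `lexicographic_necklace` and `is_perfect` are hypothetical, and the check simply walks every position of the circular word and records its residue modulo n.

```python
from itertools import product


def lexicographic_necklace(alphabet, k):
    """Concatenate all words of length k over `alphabet` in lexicographic order."""
    return "".join("".join(w) for w in product(sorted(alphabet), repeat=k))


def is_perfect(necklace, alphabet, k, n):
    """Return True if the circular word `necklace` is (k,n)-perfect.

    A (k,n)-perfect necklace has length n * |alphabet|**k, and every word of
    length k occurs at exactly n positions that are pairwise different mod n.
    """
    length = len(necklace)
    if length != n * len(alphabet) ** k:
        return False
    residues = {}
    for i in range(length):
        # Read k symbols circularly, starting at position i.
        factor = "".join(necklace[(i + j) % length] for j in range(k))
        residues.setdefault(factor, set()).add(i % n)
    # There are n * |alphabet|**k positions in total, so n distinct residues
    # for each of the |alphabet|**k words forces exactly n occurrences each.
    return all(
        len(residues.get("".join(w), ())) == n
        for w in product(alphabet, repeat=k)
    )


if __name__ == "__main__":
    word = lexicographic_necklace("01", 2)
    print(word, is_perfect(word, "01", 2, 2))
```

Because n divides the length of the necklace, shifting the starting point only shifts all residues by a constant, so the check does not depend on the chosen starting position. An ordinary de Bruijn word is the special case n = 1.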

arXiv.org e-Print Archive

CONICET Digital

Rates of DNA Sequence Profiles for Practical Values of Read Lengths

Author: Chang Zuling
Chrisnata Johan
Ezerman Martianus Frederic
Kiah Han Mao
Publication date: 08/07/2016

A recent study by one of the authors has demonstrated the importance of profile vectors in DNA-based data storage. We provide exact values and lower bounds on the number of profile vectors for finite values of alphabet size $q$, read length $\ell$, and word length $n$. Consequently, we demonstrate that for $q\ge 2$ and $n\le q^{\ell/2-1}$, the number of profile vectors is at least $q^{\kappa n}$ with $\kappa$ very close to one. In addition to enumeration results, we provide a set of efficient encoding and decoding algorithms for each of two particular families of profile vectors.
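The abstract does not define a profile vector. As an illustration only, the sketch below assumes the usual meaning: for a word of length n over a q-letter alphabet, the profile vector lists how many times each of the q^ℓ possible reads of length ℓ occurs as a contiguous substring. The function names are hypothetical, and the brute-force count is only practical for very small parameters.

```python
from collections import Counter
from itertools import product


def profile_vector(word, alphabet, ell):
    """Counts of every length-`ell` substring of `word`, in lexicographic order.

    Assumption: a profile vector records the number of occurrences of each
    possible read of length `ell` (this definition is not in the abstract).
    """
    counts = Counter(word[i:i + ell] for i in range(len(word) - ell + 1))
    return [counts["".join(g)] for g in product(sorted(alphabet), repeat=ell)]


def count_profile_vectors(alphabet, ell, n):
    """Brute-force count of distinct profile vectors over all words of length n."""
    seen = {
        tuple(profile_vector("".join(w), alphabet, ell))
        for w in product(alphabet, repeat=n)
    }
    return len(seen)


if __name__ == "__main__":
    print(profile_vector("0010", "01", 2))
    print(count_profile_vectors("01", 2, 3))
```

Distinct words can share a profile vector (for example `010` and `101` with reads of length 2), which is why the number of profile vectors can be smaller than the number of words.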

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

Partial DNA Assembly: A Rate-Distortion Perspective

Author: Courtade Thomas A.
Kamath Govinda M.
Shomorony Ilan
Tse David N.
Xia Fei
Publication date: 06/05/2016

Earlier formulations of the DNA assembly problem were all in the context of perfect assembly; i.e., given a set of reads from a long genome sequence, is it possible to perfectly reconstruct the original sequence? In practice, however, it is very often the case that the read data is not sufficiently rich to permit unambiguous reconstruction of the original sequence. While a natural generalization of the perfect assembly formulation to these cases would be to consider a rate-distortion framework, partial assemblies are usually represented in terms of an assembly graph, making the definition of a distortion measure challenging. In this work, we introduce a distortion function for assembly graphs that can be understood as the logarithm of the number of Eulerian cycles in the assembly graph, each of which corresponds to a candidate assembly that could have generated the observed reads. We also introduce an algorithm for the construction of an assembly graph and analyze its performance on real genomes. Comment: To be published at ISIT-2016. 11 pages, 10 figures.
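The quantity named in this abstract, the logarithm of the number of Eulerian cycles, can be computed exactly for small directed graphs. The sketch below is not the authors' algorithm: it uses the standard BEST theorem (number of Eulerian circuits = number of arborescences times the product of (deg(v) - 1)! over all vertices), treats parallel edges as distinguishable, and tests itself on de Bruijn graphs rather than on real assembly graphs. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import factorial, log2


def determinant(matrix):
    """Exact determinant by Gaussian elimination over Fractions."""
    a = [row[:] for row in matrix]
    size = len(a)
    det = Fraction(1)
    for c in range(size):
        pivot = next((r for r in range(c, size) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, size):
            factor = a[r][c] / a[c][c]
            for j in range(c, size):
                a[r][j] -= factor * a[c][j]
    return det


def count_eulerian_cycles(edges):
    """Number of Eulerian circuits of a directed multigraph (BEST theorem).

    `edges` is a list of (tail, head) pairs; every vertex must have equal
    in-degree and out-degree. A disconnected graph yields 0.
    """
    vertices = sorted({v for edge in edges for v in edge})
    index = {v: i for i, v in enumerate(vertices)}
    m = len(vertices)
    out_deg, in_deg = [0] * m, [0] * m
    laplacian = [[Fraction(0)] * m for _ in range(m)]
    for tail, head in edges:
        i, j = index[tail], index[head]
        out_deg[i] += 1
        in_deg[j] += 1
        laplacian[i][i] += 1   # self-loops cancel out in the Laplacian
        laplacian[i][j] -= 1
    if in_deg != out_deg:
        raise ValueError("graph is not balanced, so it has no Eulerian circuit")
    # Arborescences: determinant of the Laplacian with one row/column removed.
    minor = [row[1:] for row in laplacian[1:]]
    total = determinant(minor)
    for d in out_deg:
        total *= factorial(d - 1)
    return int(total)


def de_bruijn_edges(alphabet, n):
    """Edges of the de Bruijn graph whose edges are the words of length n."""
    return [("".join(w[:-1]), "".join(w[1:])) for w in product(alphabet, repeat=n)]


if __name__ == "__main__":
    cycles = count_eulerian_cycles(de_bruijn_edges("01", 4))
    print(cycles, log2(cycles))
```

In an assembly setting, parallel edges that carry the same label would spell the same sequence, so a count of distinct candidate assemblies may be smaller than this circuit count; that adjustment is left out of the sketch.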

arXiv.org e-Print Archive

Crossref

Designing q-Unique DNA Sequences with Integer Linear Programs and Euler Tours in De Bruijn Graphs

Author: D'Addario Marianna
Kriege Nils
Rahmann Sven
Publication venue: OASIcs - OpenAccess Series in Informatics. German Conference on Bioinformatics 2012
Publication date: 01/01/2012

DNA nanoarchitectures require carefully designed oligonucleotides with certain non-hybridization guarantees, which can be formalized as the q-uniqueness property on the sequence level. We study the optimization problem of finding a longest q-unique DNA sequence. We first present a convenient formulation as an integer linear program on the underlying De Bruijn graph that allows a variety of constraints to be incorporated flexibly; solution times for practically relevant values of q are short. We then provide additional insights into the problem structure using the quotient graph of the De Bruijn graph with respect to the equivalence relation induced by reverse complementarity. Specifically, for odd q the quotient graph is Eulerian, so finding a longest q-unique sequence is equivalent to finding an Euler tour and is solved in linear time with respect to the output string length. For even q, self-complementary edges complicate the problem, and the graph has to be Eulerized by deleting a minimum number of edges. Two sub-cases arise, for one of which we present a complete solution, while the other one remains open.
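The link between Euler tours and sequences in which no q-gram repeats can be illustrated on the plain De Bruijn graph. The sketch below is a simplification and not the method of the paper: it ignores reverse complementarity and the quotient graph entirely, and uses the standard Hierholzer algorithm to build a cyclic sequence in which every q-gram occurs exactly once. The function name `de_bruijn_sequence` is hypothetical.

```python
from itertools import product


def de_bruijn_sequence(alphabet, q):
    """Cyclic sequence containing every q-gram over `alphabet` exactly once.

    Vertices are (q-1)-grams and edges are q-grams; an Euler tour is found
    with Hierholzer's algorithm in time linear in the output length.
    Assumption: reverse complements are NOT taken into account here.
    """
    if q == 1:
        return "".join(alphabet)
    # Unused outgoing edges of each vertex, stored as the appended symbol.
    unused = {
        "".join(v): list(alphabet) for v in product(alphabet, repeat=q - 1)
    }
    start = alphabet[0] * (q - 1)
    stack, tour = [start], []
    while stack:
        vertex = stack[-1]
        if unused[vertex]:
            symbol = unused[vertex].pop()
            stack.append(vertex[1:] + symbol)  # follow the edge vertex + symbol
        else:
            tour.append(stack.pop())           # vertex is finished
    tour.reverse()
    # The tour is a closed walk; each step contributes its last symbol.
    return "".join(vertex[-1] for vertex in tour[1:])


if __name__ == "__main__":
    seq = de_bruijn_sequence("ACGT", 3)
    print(len(seq), seq)
```

Reading the result circularly, each of the 4^q possible q-grams over `ACGT` appears exactly once, which is the unconstrained analogue of the uniqueness property discussed in the abstract.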

Dagstuhl Research Online Publication Server

Shift registers and De Bruijn graphs

Author: van Vredendaal C.
Publication date: 01/01/2011

Repository TU/e

Pure OAI Repository

A constructive approach for discovering new drug leads: Using a kernel methodology for the inverse-QSAR problem

Author: A Tatsuya
AC Good
B Mak
BB Masek
C Steinbeck
CA Azencott
CJ Churchwell
DB Reitz
FJ Burkowski
Forbes J Burkowski
GH Bakir
HC Huang
J Shawe-Taylor
JJ Sutherland
JL Faulon
JTY Kwok
JW Robin
K-R Müller
KA Sharp
L Ralaivola
LB Kier
LH Hall
MI Skvortsova
N Brown
P Chavatte
P Mahe
PA Pevzner
R Todeschini
RA Lewis
RC Glenn
RP Sheridan
S Mika
SJ Swamidass
V Kvasnicka
V Venkatasubramanian
VJ Gillet
William WL Wong
X Leval
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: The inverse-QSAR problem seeks to find a new molecular descriptor from which one can recover the structure of a molecule that possesses a desired activity or property. Surprisingly, there are very few papers providing solutions to this problem. It is a difficult problem because the molecular descriptors involved with the inverse-QSAR algorithm must adequately address the forward QSAR problem for a given biological activity if the subsequent recovery phase is to be meaningful. In addition, one should be able to construct a feasible molecule from such a descriptor. The difficulty of recovering the molecule from its descriptor is the major limitation of most inverse-QSAR methods. Results: In this paper, we describe the reversibility of our previously reported descriptor, the vector space model molecular descriptor (VSMMD) based on a vector space model that is suitable for kernel studies in QSAR modeling. Our inverse-QSAR approach can be described using five steps: (1) generate the VSMMD for the compounds in the training set; (2) map the VSMMD in the input space to the kernel feature space using an appropriate kernel function; (3) design or generate a new point in the kernel feature space using a kernel feature space algorithm; (4) map the feature space point back to the input space of descriptors using a pre-image approximation algorithm; (5) build the molecular structure template using our VSMMD molecule recovery algorithm. Conclusion: The empirical results reported in this paper show that our strategy of using kernel methodology for an inverse-Quantitative Structure-Activity Relationship is sufficiently powerful to find a meaningful solution for practical problems.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Stationary Distribution and Eigenvalues for a de Bruijn Process

Author: Abbas Alhakim
Anthony Ralston
B. Nooten Van
Donald E. Knuth
Haiyan Chen
Herbert S. Wilf
J. Sherman
N. G. Bruijn
P. Flajolet
Pavel A. Pevzner
R A Blythe
R. Dawson
T. Aardenne-Ehrenfest van
T. Mori
V. V. Strok
W. T. Tutte
Publication venue: Springer Science and Business Media LLC
Publication date: 29/08/2011

We define a de Bruijn process with parameters n and L as a certain continuous-time Markov chain on the de Bruijn graph with words of length L over an n-letter alphabet as vertices. We determine explicitly its steady state distribution and its characteristic polynomial, which turns out to decompose into linear factors. In addition, we examine the stationary state of two specializations in detail. In the first one, the de Bruijn-Bernoulli process, this is a product measure. In the second one, the Skin-deep de Bruijn process, the distribution has constant density but nontrivial correlation functions. The two point correlation function is determined using generating function techniques. Comment: Dedicated to Herb Wilf on the occasion of his 80th birthday.

arXiv.org e-Print Archive

Crossref

eScholarship - University of California