Learning Character Strings via Mastermind Queries, with a Case Study Involving mtDNA

Author: Goodrich Michael T.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 13/04/2010

We study the degree to which a character string, Q, leaks details about itself any time it engages in comparison protocols with strings provided by a querier, Bob, even if those protocols are cryptographically guaranteed to produce no additional information other than the scores that assess the degree to which Q matches strings offered by Bob. We show that such scenarios allow Bob to play variants of the game of Mastermind with Q so as to learn the complete identity of Q. We show that there are a number of efficient implementations for Bob to employ in these Mastermind attacks, depending on knowledge he has about the structure of Q, which show how quickly he can determine Q. Indeed, we show that Bob can discover Q using a number of rounds of test comparisons that is much smaller than the length of Q, under reasonable assumptions regarding the types of scores that are returned by the cryptographic protocols and whether he can use knowledge about the distribution that Q comes from. We also provide the results of a case study we performed on a database of mitochondrial DNA, showing the vulnerability of existing real-world DNA data to the Mastermind attack.

Comment: Full version of related paper appearing in IEEE Symposium on Security and Privacy 2009, "The Mastermind Attack on Genomic Data." This version corrects the proofs of what are now Theorems 2 and 4

arXiv.org e-Print Archive

Crossref

Genetic Algorithms for the Imitation of Genomic Styles in Protein Backtranslation

Author: Moreira Andres
Publication date: 05/04/2003

Several technological applications require the translation of a protein into a nucleic acid that codes for it ("backtranslation"). The degeneracy of the genetic code makes this translation ambiguous; moreover, not every translation is equally viable. The common answer to this problem is the imitation of the codon usage of the target species. Here we discuss several other features of coding sequences ("coding statistics") that are relevant for the "genomic style" of different species. A genetic algorithm is then used to obtain backtranslations that mimic these styles, by minimizing the difference in the coding statistics. Possible improvements and applications are discussed.

Comment: 17 pages, 13 figures. Submitted to Theor. Comp. Scienc
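The "common answer" mentioned in the abstract, imitating the codon usage of the target species, can be sketched as follows. This is only the simple baseline, not the genetic algorithm the paper proposes. The codon assignments follow the standard genetic code, but the table covers just six amino acids and the usage frequencies are invented for illustration; `CODON_USAGE`, `backtranslate`, and `translate` are hypothetical names.

```python
# Hypothetical relative codon frequencies for an imaginary target species.
# Codon-to-amino-acid assignments are standard; the numbers are made up.
CODON_USAGE = {
    "M": {"ATG": 1.0},
    "W": {"TGG": 1.0},
    "K": {"AAA": 0.3, "AAG": 0.7},
    "F": {"TTT": 0.4, "TTC": 0.6},
    "E": {"GAA": 0.55, "GAG": 0.45},
    "D": {"GAT": 0.35, "GAC": 0.65},
}


def backtranslate(protein, usage=CODON_USAGE):
    """Pick the most frequently used codon for each amino acid."""
    codons = []
    for aa in protein:
        table = usage[aa]  # raises KeyError for residues missing from the toy table
        codons.append(max(table, key=table.get))
    return "".join(codons)


def translate(dna, usage=CODON_USAGE):
    """Translate DNA back to protein using the same toy table."""
    reverse = {codon: aa for aa, table in usage.items() for codon in table}
    return "".join(reverse[dna[i:i + 3]] for i in range(0, len(dna), 3))


dna = backtranslate("MKFED")
print(dna)             # ATGAAGTTCGAAGAC
print(translate(dna))  # MKFED
```

Always taking the most frequent codon ignores the other coding statistics the abstract refers to, which is the gap the genetic-algorithm approach is meant to address.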

arXiv.org e-Print Archive

CiteSeerX

Elsevier - Publisher Connector

CERN Document Server

Towards Understanding the Origin of Genetic Languages

Author: Patel Apoorva D.
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 28/10/2008

Molecular biology is a nanotechnology that works--it has worked for billions of years and in an amazing variety of circumstances. At its core is a system for acquiring, processing and communicating information that is universal, from viruses and bacteria to human beings. Advances in genetics and experience in designing computers have taken us to a stage where we can understand the optimisation principles at the root of this system, from the availability of basic building blocks to the execution of tasks. The languages of DNA and proteins are argued to be the optimal solutions to the information processing tasks they carry out. The analysis also suggests simpler predecessors to these languages, and provides fascinating clues about their origin. Obviously, a comprehensive unraveling of the puzzle of life would have a lot to say about what we may design or convert ourselves into.

Comment: (v1) 33 pages, contributed chapter to "Quantum Aspects of Life", edited by D. Abbott, P. Davies and A. Pati, (v2) published version with some editin

arXiv.org e-Print Archive

Crossref

Wavelet analysis on symbolic sequences and two-fold de Bruijn sequences

Author: Osipov Vladimir Al.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 09/01/2016

The concept of symbolic sequences plays an important role in the study of complex systems. In this work we are interested in the ultrametric structure of the set of cyclic sequences naturally arising in the theory of dynamical systems. Aiming at the construction of analytic and numerical methods for the investigation of clusters, we introduce an operator language on the space of symbolic sequences and propose an approach based on wavelet analysis for the study of the cluster hierarchy. The analytic power of the approach is demonstrated by the derivation of a formula for counting two-fold de Bruijn sequences, an extension of the notion of de Bruijn sequences. Possible advantages of the developed description are also discussed in context of applied

arXiv.org e-Print Archive

Lund University Publications

Training-free Measures Based on Algorithmic Probability Identify High Nucleosome Occupancy in DNA Sequences

Authors: Minary Peter, Zenil Hector
Publication date: 16/10/2018

We introduce and study a set of training-free methods of information-theoretic and algorithmic complexity nature applied to DNA sequences to identify their potential capabilities to determine nucleosomal binding sites. We test our measures on well-studied genomic sequences of different sizes drawn from different sources. The measures reveal the known in vivo versus in vitro predictive discrepancies and uncover their potential to pinpoint (high) nucleosome occupancy. We explore different possible signals within and beyond the nucleosome length and find that complexity indices are informative of nucleosome occupancy. We compare against the gold standard (Kaplan model) and find similar and complementary results with the main difference that our sequence complexity approach. For example, for high occupancy, complexity-based scores outperform the Kaplan model for predicting binding, representing a significant advancement in predicting the highest nucleosome occupancy following a training-free approach.

Comment: 8 pages main text (4 figures), 12 total with Supplementary (1 figure

arXiv.org e-Print Archive

Oxford University Research Archive