Search CORE


Local Renyi entropic profiles of DNA sequences

Author: A Bouvier
A Vanet
B Haubold
B Schoelkopf
BY Liao
C Dufraigne
D Chu
D Dubnau
D Gusfield
D Holste
E Parzen
FR Blattner
G Bejerano
H Herzel
HJ Jeffrey
HS Koo
J Vilo
JD Helmann
JL Oliver
JM Freeman
Jonas S Almeida
JS Almeida
K Arakawa
LM Ettwiller
LY Chen
M Bakkali
M Crochemore
M Vandenbogaert
OG Troyanskaya
P Buhlmann
P Deschavanne
P Tino
R Redon
RW Jernigan
S Karlin
S Robin
S Sourice
S Vinga
SL Salzberg
Susana Vinga
T Davidsen
Y Wang
Publication venue: BioMed Central
Publication date: 01/01/2007

Abstract

Background: In a recent report the authors presented a new measure of continuous entropy for DNA sequences that allows their level of randomness to be estimated. The definition explored there was based on the Rényi entropy of a probability density function (pdf) estimated with Parzen's window method and applied to the Chaos Game Representation/Universal Sequence Maps (CGR/USM). Subsequent work proposed a fractal pdf kernel as a more exact solution for the iterated map representation. This report extends the concept of continuous entropy by defining entropic profiles of DNA sequences, using the new pdf estimates to refine the density estimation of motifs.

Results: The new methodology yields two results. First, entropic profiles are directly related to the statistical significance of motifs, which allows the under- and over-representation of segments to be studied. Second, by spanning the parameters of the kernel function it is possible to extract important information about the scale of each conserved DNA region. The computational applications, developed in Matlab m-code, the corresponding binary executables, and additional material and examples are publicly available at http://kdbio.inesc-id.pt/~svinga/ep/.

Conclusion: The ability to detect local conservation from a scale-independent representation of symbolic sequences is particularly relevant for biological applications where conserved motifs occur at multiple, overlapping scales, with significant future applications in the recognition of foreign genomic material and the inference of motif structures.
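The continuous-entropy measure summarised above can be sketched as follows. This is a minimal illustration, assuming a Gaussian Parzen kernel of width `sigma` and the quadratic (order-2) Rényi entropy, for which the integral of the squared density has a closed form; the fractal kernel and the entropic profiles of this report are not reproduced. The nucleotide-to-corner assignment and the function names `cgr_points` and `renyi2_parzen` are illustrative choices, not the published Matlab code.

```python
import math

# Corners of the unit square assigned to nucleotides (an assumed layout;
# any fixed assignment gives an equivalent map).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Chaos Game Representation: move halfway towards each symbol's corner."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def renyi2_parzen(points, sigma):
    """Quadratic Renyi entropy of a 2-D Gaussian Parzen density estimate.

    With Gaussian kernels of variance sigma**2, the integral of f(x)**2 equals
    the mean over all pairs (i, j) of a Gaussian with variance 2*sigma**2
    evaluated at x_i - x_j.
    """
    n = len(points)
    var2 = 2.0 * sigma ** 2  # variance of the convolved kernel
    norm = 1.0 / (2.0 * math.pi * var2)  # 2-D normalising constant
    total = 0.0
    for xi, yi in points:
        for xj, yj in points:
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2
            total += norm * math.exp(-d2 / (2.0 * var2))
    return -math.log(total / (n * n))
```

A repetitive sequence such as `"A" * 64` concentrates its CGR points near one corner and therefore yields a lower entropy than `"ACGT" * 16`, whose points spread over four regions. The double loop is O(N²), so this sketch suits short sequences only.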

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Repositório da Universidade Nova de Lisboa

On the entropy of protein families

Author: Barton John
Chakraborty Arup
Cocco Simona
Jacquin Hugo
Monasson Rémi
Publication venue: Springer Science and Business Media LLC
Publication date: 26/12/2015

Proteins are essential components of living systems, capable of performing a huge variety of tasks at the molecular level, such as recognition, signalling, copying, transport, ... The protein sequences realizing a given function may vary widely across organisms, giving rise to a protein family. Here, we estimate the entropy of those families based on different approaches, including the Hidden Markov Models used for protein databases and inferred statistical models that reproduce the low-order (1- and 2-point) statistics of multiple-sequence alignments. We also compute the entropic cost, that is, the loss in entropy resulting from a constraint acting on the protein, such as the fixation of one particular amino acid at a specific site, and relate this notion to the escape probability of the HIV virus. The case of lattice proteins, for which the entropy can be computed exactly, provides another illustration of the concept of cost, arising from the competition between different folds. The relevance of entropy to directed evolution experiments is stressed.
Comment: to appear in Journal of Statistical Physics.
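The simplest of the statistical models mentioned above reproduces only the 1-point (single-column) statistics of an alignment. The sketch below, a hypothetical illustration rather than the authors' code, computes that independent-site entropy and the corresponding entropic cost of fixing one site. Models that also reproduce 2-point statistics can only lower this value, because the independent-site entropy is the maximum among all distributions with the same single-site frequencies. Raw frequencies without pseudocounts are assumed, and gap characters are treated as ordinary symbols.

```python
import math
from collections import Counter


def site_entropies(alignment):
    """Shannon entropy (in nats) of each column of a multiple sequence alignment."""
    length = len(alignment[0])
    entropies = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        total = sum(counts.values())
        h = -sum((c / total) * math.log(c / total) for c in counts.values())
        entropies.append(h)
    return entropies


def independent_site_entropy(alignment):
    """Entropy of the maximum-entropy model that matches only 1-point frequencies."""
    return sum(site_entropies(alignment))


def entropic_cost_of_fixing(alignment, site):
    """Entropy lost when one site is fixed, under the independent-site model.

    Because sites are independent in this model, conditioning on the residue
    at `site` removes exactly that column's entropy, whichever residue is fixed.
    """
    return site_entropies(alignment)[site]
```

For the toy alignment `["AC", "AD", "AC", "AD"]` the first column is conserved (entropy 0) and the second contributes ln 2, so fixing the residue at site 1 costs ln 2 nats under this model.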

arXiv.org e-Print Archive

DSpace@MIT

Hal-Diderot

Information profiles for DNA pattern discovery

Author: Ferreira Paulo J. S. G.
Pinho Armando J.
Pratas Diogo
Publication date: 19/01/2014

Finite-context modeling is a powerful tool for compressing, and hence for representing, DNA sequences. We describe an algorithm to detect genomic regularities within a blind discovery strategy. The algorithm uses information profiles built from suitable combinations of finite-context models. We used the genome of the fission yeast Schizosaccharomyces pombe strain 972 h- for illustration, unveiling locations of low information content, which are usually associated with DNA regions of potential biological interest.
Comment: Full version of DCC 2014 paper "Information profiles for DNA pattern discovery".
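An information profile assigns to every position the number of bits a model needs to encode that symbol. The sketch below uses a single adaptive order-k finite-context model with an additive estimator; the described algorithm combines several such models, so the single model and the parameter values `k`, `alpha` and `window` are simplifying assumptions. Dips in the smoothed profile mark low-information (highly predictable) regions.

```python
import math
from collections import defaultdict


def information_profile(seq, k=3, alpha=1.0, alphabet="ACGT"):
    """Bits needed to encode each symbol with an adaptive order-k finite-context model.

    Probabilities use an additive estimator with parameter alpha, and the counts
    are updated only after each symbol has been coded (as a decoder could).
    """
    counts = defaultdict(lambda: defaultdict(int))
    profile = []
    for i, sym in enumerate(seq):
        ctx = seq[max(0, i - k):i]  # shorter context at the start of the sequence
        table = counts[ctx]
        total = sum(table.values())
        p = (table[sym] + alpha) / (total + alpha * len(alphabet))
        profile.append(-math.log2(p))
        table[sym] += 1
    return profile


def smooth(profile, window=11):
    """Centred moving average used to expose regions of low information content."""
    half = window // 2
    out = []
    for i in range(len(profile)):
        segment = profile[max(0, i - half):i + half + 1]
        out.append(sum(segment) / len(segment))
    return out
```

On a periodic sequence such as `"ACGT" * 50`, the first symbol costs 2 bits and later symbols cost far less, because each order-3 context is always followed by the same base.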

arXiv.org e-Print Archive

Crossref

On the comparison of regulatory sequences with multiple resolution Entropic Profiles

Author: ANTONELLO MORRIS
COMIN MATTEO
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Enhancers are stretches of DNA (100-1000 bp) that play a major role in development gene expression, evolution and disease. It has recently been shown that in higher eukaryotes enhancers rarely work alone; instead, they collaborate by forming clusters of cis-regulatory modules (CRMs). Although the binding of transcription factors is sequence-specific, the identification of functionally similar enhancers is very difficult and cannot be carried out with traditional alignment-based techniques.

Springer - Publisher Connector

PubMed Central

Archivio istituzionale della ricerca - Università di Padova

Biological sequences as pictures – a generic two dimensional solution for iterated maps

Author: A Andreeva
A Fiser
C Dutta
HJ Jeffrey
J Giles
J Joseph
J Schwacke
JH Choi
JL Oliver
Jonas S Almeida
JS Almeida
KA Hill
KP Pleissner
LK Gallos
P Cenac
P Deschavanne
P Tino
PJ Deschavanne
S Basu
S Vinga
Susana Vinga
W Fu
ZB Wu
ZG Yu
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract

Background: Representing symbolic sequences graphically using iterated maps has enjoyed enduring popularity since it was first proposed by Jeffrey (1990) as the chaos game representation (CGR). The usefulness of this representation goes beyond the convenience of a scale-independent representation: it provides a variable-memory-length representation of transitions. This includes the representation of succession with non-integer order, which holds the promise of generalizing Markovian formalisms. The original proposal targeted genomic sequences only, but several generalizations have since been proposed, many of them designed specifically to handle protein data.

Results: The challenge of a general solution is that of deriving a bijective transformation of symbolic sequences into two-dimensional planes. More specifically, it requires the regular fractal nesting of polygons. A first attempt at a general solution was proposed by Fiser (1994), using non-overlapping circles that contain the polygons. This was used as a starting point to identify a more efficient solution in which the encapsulating circles can overlap without the same happening to the sequence maps, which are circumscribed to fractal polygon domains.

Conclusion: We identified the optimal inscribed packing solution for iterated maps of any biological sequence, and indeed of any symbolic sequence. The new solution maintains the prized bijective mapping property and includes the Sierpinski triangle and the CGR square as particular solutions of the more encompassing formulation.
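The bijective property of the original CGR square can be illustrated with a short sketch. It covers only Jeffrey's four-symbol case, not the polygon generalization derived here, and assumes a particular corner layout; the names `cgr_encode` and `cgr_decode` are hypothetical. Exact rational arithmetic (`fractions.Fraction`) is used because with floating-point coordinates the earliest symbols are lost after a few dozen steps.

```python
from fractions import Fraction

# Corner of the unit square assigned to each nucleotide (an assumed layout).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}
BY_CORNER = {corner: base for base, corner in CORNERS.items()}
HALF = Fraction(1, 2)


def cgr_encode(seq):
    """Return the final CGR point, computed exactly with rational arithmetic."""
    x, y = HALF, HALF
    for base in seq:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
    return x, y


def cgr_decode(point, length):
    """Recover the sequence from its last CGR point by inverting each step.

    Every point lies strictly inside the quadrant of the symbol just read, so
    the quadrant identifies that symbol, and x_prev = 2 * x - corner undoes
    the step.
    """
    x, y = point
    symbols = []
    for _ in range(length):
        corner = (int(x > HALF), int(y > HALF))
        symbols.append(BY_CORNER[corner])
        x, y = 2 * x - corner[0], 2 * y - corner[1]
    return "".join(reversed(symbols))
```

For example, `cgr_decode(cgr_encode("GATTACA"), 7)` gives back `"GATTACA"`. The length must be supplied because the map alone does not say how many steps to undo.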

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Repositório da Universidade Nova de Lisboa

DNA entropy reveals a significant difference in complexity between housekeeping and tissue specific gene promoters

Author: Finan Christopher
Jones Susan
Newport Melanie
Thomas David
Publication venue: Elsevier BV
Publication date: 01/10/2015

BACKGROUND: The complexity of DNA can be quantified using estimates of entropy. Variation in DNA complexity is expected between the promoters of genes with different transcriptional mechanisms, namely housekeeping (HK) and tissue-specific (TS) genes. The former are transcribed constitutively to maintain general cellular functions; the latter are transcribed in restricted tissues and cell types for specific molecular events. It is known that promoter features in the human genome are related to tissue specificity, but this has been difficult to quantify on a genomic scale. If entropy effectively quantifies DNA complexity, calculating the entropies of HK and TS gene promoters as profiles may reveal significant differences.

RESULTS: Entropy profiles were calculated for a total dataset of 12,003 human gene promoters and for 501 HK and 587 TS human gene promoters. The mean profiles show that TS promoters have significantly lower entropy (p < 2.2e-16) than HK gene promoters. The entropy distributions for the three datasets show that promoter entropies could be used to identify novel HK genes.

CONCLUSION: Functional features comprise DNA sequence patterns that are non-random and hence have lower entropies. The lower entropy of TS gene promoters can be explained by a higher density of positive and negative regulatory elements, required for genes with complex spatial and temporal expression.
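The abstract does not specify the entropy estimator, so the following is only one plausible way to build an entropy profile along a promoter: the Shannon entropy of the k-mer composition in sliding windows. The function names, window size, step and k are assumptions for illustration, not values from the study.

```python
import math
from collections import Counter


def shannon_entropy(segment, k=1):
    """Shannon entropy (bits) of the k-mer composition of a DNA segment."""
    kmers = [segment[i:i + k] for i in range(len(segment) - k + 1)]
    counts = Counter(kmers)
    n = len(kmers)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_profile(seq, window=50, step=10, k=1):
    """Entropy of successive windows along a sequence such as a promoter."""
    return [
        shannon_entropy(seq[start:start + window], k)
        for start in range(0, len(seq) - window + 1, step)
    ]
```

Profiles computed this way for two sets of equally long, aligned promoters (for example HK and TS) could then be averaged position by position and compared with a two-sample test. A low-complexity window such as a poly-A run scores 0 bits, while a window with uniform base composition approaches 2 bits for k = 1.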

University of Dundee Online Publications

Sussex Research Online

Informational laws of genome structures

Author: Bonnici Vincenzo
Manca Vincenzo
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

In recent years, the analysis of genomes by means of strings of length k occurring in them, called k-mers, has provided important insights into the basic mechanisms and design principles of genome structures. In the present study, we focus on the proper choice of the value of k for applying information-theoretic concepts that express intrinsic aspects of genomes. The value k = lg2(n), where n is the genome length, is determined to be the best choice in the definition of some genomic informational indexes that are studied and computed for seventy genomes. These indexes, which are based on information entropies and on suitable comparisons with random genomes, suggest five informational laws that all of the considered genomes obey. Moreover, an informational genome complexity measure is proposed, which is a generalized logistic map that balances the entropic and anti-entropic components of genomes and is related to their evolutionary dynamics. Finally, applications to computational synthetic biology are briefly outlined.
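The choice k = lg2(n) and an entropy-based comparison with a random genome can be sketched as below. Reading lg2 as the base-2 logarithm, rounding to the nearest integer, and the ratio computed by the hypothetical `entropy_ratio` function are assumptions made for illustration; the paper's actual informational indexes and laws are not reproduced here.

```python
import math
import random
from collections import Counter


def choose_k(n):
    """k = lg2(n), rounded to the nearest integer (the rounding is an assumption)."""
    return max(1, round(math.log2(n)))


def kmer_entropy(genome, k):
    """Empirical Shannon entropy (bits) of the k-mer distribution of a genome."""
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def random_genome(n, seed=0):
    """Uniform random sequence of length n, used as a reference genome."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(n))


def entropy_ratio(genome, seed=0):
    """Hypothetical index: k-mer entropy of a genome relative to a random genome."""
    k = choose_k(len(genome))
    reference = random_genome(len(genome), seed)
    return kmer_entropy(genome, k) / kmer_entropy(reference, k)
```

For a genome of about one million bases this gives k = 20. At that length almost every k-mer of a random sequence is unique, so a ratio noticeably below 1 points to repeated k-mers in the real genome.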

Archivio istituzionale della Ricerca - Università degli Studi di Parma

PubMed Central

Catalogo dei prodotti della ricerca

New Robust Similarity Measures Derived from Entropic Profiles


Enhancers are stretches of DNA that play a major role in development gene expression. They contain short DNA motifs, so their classification can be addressed by alignment-free compositional approaches. The contributions of this work are the derivation of the statistical properties of entropic profiles and the definition of new similarity measures derived from them. Experiments on both simulated and real enhancers reveal that the multi-resolution property enhances the similarity score.
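The abstract does not restate how an entropic profile is computed. The sketch below assumes the formulation commonly used in the entropic-profile literature, in which the fractal-kernel density at position i reduces to weighted counts of the words ending at i,

f_{L,φ}(x_i) = (1 + (1/n) Σ_{k=1..L} 4^k φ^k c([i−k+1, i])) / Σ_{k=0..L} φ^k,

where c([i−k+1, i]) is the number of occurrences of the length-k word ending at position i, followed by standardisation over all positions. Truncating words at the start of the sequence and the default values of `L` and `phi` are assumptions, and the similarity measures themselves are not shown.

```python
import math
from collections import Counter


def entropic_profile(seq, L=3, phi=10.0):
    """Standardised entropic profile of every position of a DNA sequence.

    Assumes f(x_i) = (1 + (1/n) * sum_{k=1..L} 4**k * phi**k * c_k(i))
                     / sum_{k=0..L} phi**k,
    where c_k(i) counts occurrences in seq of the length-k word ending at i.
    """
    n = len(seq)
    # Count every word of length 1..L once, so each lookup is O(1).
    counts = Counter(
        seq[j:j + k] for k in range(1, L + 1) for j in range(n - k + 1)
    )
    denom = sum(phi ** k for k in range(L + 1))
    f = []
    for i in range(n):
        acc = 0.0
        for k in range(1, L + 1):
            if i - k + 1 < 0:
                break  # assumption: words running past the start are skipped
            acc += (4 ** k) * (phi ** k) * counts[seq[i - k + 1:i + 1]]
        f.append((1.0 + acc / n) / denom)
    # Standardise so values are comparable across positions.
    mean = sum(f) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in f) / n)
    return [(v - mean) / std if std > 0 else 0.0 for v in f]
```

Running the function with several values of `L` gives a multi-resolution view of the same sequence: positions inside a repeated word, such as the poly-A run in `"A" * 15 + "CGT"`, score well above the unique bases at the end.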

Padua Thesis and Dissertation Archive

Decoding genomic information

Author: Franco G.
Manca V.
Publication venue: Stepney, Susan; Rasmussen, Steen; Amos, Martyn (Eds.)
Publication date: 01/01/2018

Our work here outlines and follows some trends of research that analyze and interpret (i.e., decode) genomic information by treating the genome as a book encrypted in an unknown language. This analysis is performed with alignment-free sequence methods based on information-theoretic concepts, in order to convert the genomic information into a comprehensible mathematical form and to understand its complexity.

Crossref

Catalogo dei prodotti della ricerca