14 research outputs found

    Fast algorithms for computing sequence distances by exhaustive substring composition

    The increasing throughput of sequencing raises growing needs for methods of sequence analysis and comparison on a genomic scale, notably in connection with phylogenetic tree reconstruction. Such needs are hardly fulfilled by the more traditional measures of sequence similarity and distance, like string edit and gene rearrangement, due to a mixture of epistemological and computational problems. Alternative measures, based on the subword composition of sequences, have emerged in recent years and proved to be both fast and effective in a variety of tested cases. The common denominator of such measures is an underlying information-theoretic notion of relative compressibility. Their viability depends critically on computational cost. The present paper describes, as a paradigm, the extension and efficient implementation of one of the methods in this class. The method is based on the comparison of the frequencies of all subwords in the two input sequences, where frequencies are suitably adjusted to take into account the statistical background.
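    The core idea, comparing the full subword (k-mer) composition of two sequences, can be sketched in a few lines. The snippet below is only an illustration under simplifying assumptions (plain frequency vectors, a cosine-style distance, no background adjustment, no suffix-tree machinery), not the paper's algorithm; the function names are invented for the example.

```python
# Illustrative sketch only: compare two DNA sequences by the composition of
# all substrings up to length max_k, via a cosine-style distance between
# frequency vectors. The paper's method additionally adjusts frequencies for
# the statistical background and enumerates all subwords efficiently.
from collections import Counter
from math import sqrt

def kmer_profile(seq, max_k=4):
    """Relative frequency of every substring of length 1..max_k in seq."""
    counts = Counter()
    for k in range(1, max_k + 1):
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def composition_distance(seq_a, seq_b, max_k=4):
    """1 - cosine similarity between the two subword frequency profiles."""
    pa, pb = kmer_profile(seq_a, max_k), kmer_profile(seq_b, max_k)
    words = set(pa) | set(pb)
    dot = sum(pa.get(w, 0.0) * pb.get(w, 0.0) for w in words)
    norm = sqrt(sum(v * v for v in pa.values())) * sqrt(sum(v * v for v in pb.values()))
    return 1.0 - dot / norm if norm else 1.0

print(composition_distance("ACGTACGTGGCC", "ACGTTTGGCCAA"))
```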

    A reexamination of information theory-based methods for DNA-binding site identification

    Background: Searching for transcription factor binding sites in genome sequences is still an open problem in bioinformatics. Despite substantial progress, search methods based on information theory remain a standard in the field, even though the full validity of their underlying assumptions has only been tested in artificial settings. Here we use newly available data on transcription factors from different bacterial genomes to make a more thorough assessment of information theory-based search methods. Results: Our results reveal that conventional benchmarking against artificial sequence data frequently leads to overestimation of search efficiency. In addition, we find that sequence information by itself is often inadequate and therefore must be complemented by other cues, such as curvature, in real genomes. Furthermore, results on skewed genomes show that methods integrating skew information, such as Relative Entropy, are not effective because their assumptions may not hold in real genomes. The evidence suggests that binding sites tend to evolve towards genomic skew, rather than against it, and to maintain their information content through increased conservation. Based on these results, we identify several misconceptions about information theory as applied to binding sites, such as negative entropy, and we propose a revised paradigm to explain the observed results. Conclusion: We conclude that, among information theory-based methods, the most unassuming search methods perform, on average, better than any other alternatives, since heuristic corrections to these methods are prone to fail when working on real data. A reexamination of information content in binding sites reveals that information content is a compound measure of search and binding affinity requirements, a fact that has important repercussions for our understanding of binding site evolution.
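    The information content the abstract refers to can be illustrated with a small computation. The sketch below assumes the classic per-column measure for a binding-site motif: the relative entropy of the observed base frequencies against a background distribution, which reduces to 2 + sum_b f_b log2 f_b bits when the background is uniform. The column frequencies and the function name are made up for the example, not taken from the paper's data.

```python
# Hedged illustration: information content (in bits) of one column of a
# DNA binding-site motif, computed as the relative entropy between the
# observed base frequencies and a background distribution.
from math import log2

def column_information(freqs, background=None):
    """Relative entropy (bits) of one motif column versus the background."""
    if background is None:
        background = {b: 0.25 for b in "ACGT"}  # uniform-genome assumption
    return sum(f * log2(f / background[b]) for b, f in freqs.items() if f > 0)

# Toy column: a strongly conserved A.
column = {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05}
print(column_information(column))  # against a uniform background
# The "Relative Entropy" variant discussed in the abstract plugs in the
# skewed composition of the actual genome instead of the uniform background:
print(column_information(column, {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35}))
```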

    Non-random pre-transcriptional evolution in HIV-1. A refutation of the foundational conditions for neutral evolution

    The complete base sequence of the HIV-1 virus and of the GP120 ENV gene were analyzed to establish their distance from the expected neutral random sequence. A special methodology was devised to achieve this aim. The analyses included: a) the proportion of dinucleotides (signatures); b) the homogeneity of the distribution of dinucleotides and bases (isochores), obtained by dividing the two segments into ten and three sub-segments, respectively; c) the probability of runs of bases and No-bases according to the Bose-Einstein distribution. The analyses showed a huge deviation from the random distribution expected under neutral evolution and neutral-neighbor influence of nucleotide sites. The most significant result is the severe depletion of CG dinucleotides (p < 10^-50), a selective trait of eukaryotes and not of single-stranded RNA virus genomes. These results not only refute neutral evolution and neutral-neighbor influence, but also strongly indicate that any base at any nucleotide site correlates with the whole viral genome or its sub-segments. They suggest that the evolution of HIV-1 is pan-selective rather than neutral or nearly neutral.
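    The dinucleotide "signature" analysis mentioned above can be sketched as an observed-versus-expected odds ratio per dinucleotide, where a ratio well below 1 for CG indicates depletion relative to the random expectation from the base composition. The code below is only an illustration of that general statistic on a toy sequence; it does not reproduce the paper's methodology or its Bose-Einstein run analysis.

```python
# Illustrative sketch: dinucleotide odds ratios rho(XY) = f(XY) / (f(X) * f(Y)).
# Values well below 1 (classically for CG) indicate depletion relative to what
# the single-base composition would predict.
from collections import Counter

def dinucleotide_odds(seq):
    seq = seq.upper()
    n = len(seq)
    base_freq = {b: c / n for b, c in Counter(seq).items()}
    dinuc = Counter(seq[i:i + 2] for i in range(n - 1))
    total = n - 1
    return {d: (c / total) / (base_freq[d[0]] * base_freq[d[1]])
            for d, c in dinuc.items()}

# Toy sequence standing in for a real HIV-1 segment.
rho = dinucleotide_odds("ATGACCGTTAACGGTACTGATCATCGGA" * 4)
for d in ("CG", "GC", "TA"):
    print(d, round(rho.get(d, 0.0), 3))
```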

    Mind the Gap: Transitions Between Concepts of Information in Varied Domains

    The concept of 'information' in five different realms – technological, physical, biological, social and philosophical – is briefly examined. The 'gaps' between these conceptions are discussed, and unifying frameworks of diverse nature, including those of Shannon/Wiener, Landauer, Stonier, Bates and Floridi, are examined. The value of attempting to bridge the gaps, while avoiding shallow analogies, is explained. With information physics gaining general acceptance, and biology gaining the status of an information science, it seems rational to look for links, relationships, analogies and even helpful metaphors between them and the library/information sciences. Prospects for doing so, involving concepts of complexity and emergence, are suggested.

    Information theory and the ethylene genetic network

    The original aim of Information Theory (IT) was to solve a purely technical problem: to increase the performance of communication systems, which are constantly affected by interferences that diminish the quality of the transmitted information. That is, the theory deals only with the problem of transmitting the symbols constituting a message with maximal precision. In Shannon's theory, messages are characterized only by their probabilities, regardless of their value or meaning. As for its present-day status, it is generally acknowledged that Information Theory has solid mathematical foundations and strong, fruitful links with Physics in both theoretical and experimental areas. However, many applications of Information Theory to Biology are limited to using it as a technical tool to analyze biopolymers, such as DNA, RNA or protein sequences. The main point of discussion about the applicability of IT to explaining the information flow in biological systems is that in a classic communication channel the symbols that make up the coded message are transmitted one by one, independently, through a noisy channel, and noise can alter each of the symbols, distorting the message; in contrast, in a genetic communication channel the coded messages are not transmitted in the form of symbols but by signaling cascades. Consequently, the information flow from the emitter to the effector is due to a series of coupled physicochemical processes that must ensure the accurate transmission of the message. In this review we discuss a novel proposal to overcome this difficulty, which consists of modeling gene expression with a stochastic approach that allows the Shannon entropy (H) to be used directly to measure the amount of uncertainty that the genetic machinery has about the correct decoding of a message transmitted into the nucleus by a signaling pathway. From the value of H we can define a function I that measures the amount of information contained in the input message that the cell's genetic machinery is processing during a given time interval. Furthermore, by combining Information Theory with the frequency-response analysis of dynamical systems, we can examine the cell's genetic response to input signals of varying frequency, amplitude and form, in order to determine whether the cell can distinguish between different regimes of information flow from the environment. In the particular case of the ethylene signaling pathway, the amount of information managed by the root cell of Arabidopsis can be correlated with the frequency of the input signal. The ethylene signaling pathway cuts off very low and very high frequencies, leaving a window of frequency response in which the nucleus reads the incoming message as a varying input; outside of this window the nucleus reads the input message as approximately non-varying. This frequency-response analysis is also useful for estimating the rate of information transfer during the transport of each new ERF1 molecule into the nucleus. Additionally, applying Information Theory to the analysis of the flow of information in the ethylene signaling pathway provides deeper insight into the way in which the transition between auxin and ethylene hormonal activity occurs during a circadian cycle. An ambitious goal for the future would be to use Information Theory as the theoretical foundation for a suitable model of the information flow that runs at each level, and through all levels, of biological organization.
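    As a concrete illustration of the quantities H and I described above, the sketch below computes the Shannon entropy of a toy distribution over gene-expression states and an information measure defined as the reduction of uncertainty relative to the maximum-entropy case. The state probabilities, the four-state discretization and the function names are assumptions made for this example; they are not the review's actual model of the ethylene pathway.

```python
# Minimal sketch, assuming a discrete set of expression states for a target
# gene (e.g. ERF1) and a probability distribution over those states.
from math import log2

def shannon_entropy(probs):
    """H = -sum(p * log2(p)) in bits: the uncertainty of the decoding step."""
    return -sum(p * log2(p) for p in probs if p > 0)

def expression_information(probs):
    """I = H_max - H: how much the input signal reduces the nucleus's uncertainty."""
    return log2(len(probs)) - shannon_entropy(probs)

# Toy distributions over four expression states of the target gene.
no_signal   = [0.25, 0.25, 0.25, 0.25]  # no ethylene input: maximal uncertainty, I = 0
with_signal = [0.85, 0.05, 0.05, 0.05]  # strong input: one state dominates, I > 0
print(expression_information(no_signal))
print(expression_information(with_signal))
```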