A methodology for determining amino-acid substitution matrices from set covers
We introduce a new methodology for the determination of amino-acid
substitution matrices for use in the alignment of proteins. The new methodology
is based on a pre-existing set cover on the set of residues and on the
undirected graph that describes residue exchangeability given the set cover.
For fixed functional forms indicating how to obtain edge weights from the set
cover and, after that, substitution-matrix elements from weighted distances on
the graph, the resulting substitution matrix can be checked for performance
against some known set of reference alignments and for given gap costs. Finding
the appropriate functional forms and gap costs can then be formulated as an
optimization problem that seeks to maximize the performance of the substitution
matrix on the reference alignment set. We give computational results on the
BAliBASE suite using a genetic algorithm for optimization. Our results indicate
that it is possible to obtain substitution matrices whose performance is either
comparable to or surpasses that of several others, depending on the particular
scenario under consideration.
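The graph construction described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the toy cover, the function names, and the edge-weight rule (one over the number of sets two residues share) are all hypothetical stand-ins for the functional forms that the paper optimizes. The sketch stops at weighted graph distances and does not attempt the mapping from distances to substitution scores.

```python
from itertools import combinations

def exchange_graph(cover):
    """Link two residues when they share at least one set of the cover.

    ASSUMPTION: edge weight = 1 / (number of shared sets). The paper leaves
    this functional form open and tunes it by optimization.
    """
    shared = {}
    for group in cover:
        for a, b in combinations(sorted(group), 2):
            shared[(a, b)] = shared.get((a, b), 0) + 1
    return {pair: 1.0 / count for pair, count in shared.items()}

def all_pairs_distances(residues, edges):
    """Weighted shortest-path distances between residues (Floyd-Warshall)."""
    inf = float("inf")
    dist = {a: {b: (0.0 if a == b else inf) for b in residues} for a in residues}
    for (a, b), weight in edges.items():
        dist[a][b] = dist[b][a] = min(dist[a][b], weight)
    for k in residues:
        for i in residues:
            for j in residues:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

# Hypothetical toy cover over four residues (illustrative only).
toy_cover = [{"I", "L", "V"}, {"L", "M"}, {"I", "L"}]
residues = sorted(set().union(*toy_cover))
edges = exchange_graph(toy_cover)
dist = all_pairs_distances(residues, edges)
```

In this toy cover, I and L share two sets and therefore sit closer together than I and V, which share only one; M is reachable from I only through L.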
An Alternative Model of Amino Acid Replacement
The observed correlations between pairs of homologous protein sequences are
typically explained in terms of a Markovian dynamic of amino acid substitution.
This model assumes that every location on the protein sequence has the same
background distribution of amino acids, an assumption that is incompatible with
the observed heterogeneity of protein amino acid profiles and with the success
of profile multiple sequence alignment. We propose an alternative model of
amino acid replacement during protein evolution based upon the assumption that
the variation of the amino acid background distribution from one residue to the
next is sufficient to explain the observed sequence correlations of homologs.
The resulting dynamical model of independent replacements drawn from
heterogeneous backgrounds is simple and consistent, and provides a unified
homology match score for sequence-sequence, sequence-profile and
profile-profile alignment.
Comment: Minor improvements. Added figure and reference
Regulatory motif discovery using a population clustering evolutionary algorithm
This paper describes a novel evolutionary algorithm for regulatory motif discovery in DNA promoter sequences. The algorithm uses data clustering to logically distribute the evolving population across the search space. Mating then takes place within local regions of the population, promoting overall solution diversity and encouraging discovery of multiple solutions. Experiments using synthetic data sets have demonstrated the algorithm's capacity to find position frequency matrix models of known regulatory motifs in relatively long promoter sequences. These experiments have also shown the algorithm's ability to maintain diversity during search and discover multiple motifs within a single population. The utility of the algorithm for discovering motifs in real biological data is demonstrated by its ability to find meaningful motifs within muscle-specific regulatory sequences.
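For readers unfamiliar with the position frequency matrix models mentioned in this abstract, the sketch below shows how such a matrix is counted from a set of aligned motif instances. It illustrates the data structure only, not the paper's evolutionary algorithm; the function name and the example sites are hypothetical.

```python
def position_frequency_matrix(sites):
    """Count A/C/G/T occurrences in each column of equal-length aligned sites."""
    if not sites:
        raise ValueError("need at least one site")
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    matrix = {base: [0] * width for base in "ACGT"}
    for site in sites:
        for pos, base in enumerate(site.upper()):
            matrix[base][pos] += 1  # raises KeyError on non-ACGT symbols
    return matrix

# Hypothetical aligned motif instances (illustrative data, not from the paper).
example_sites = ["TATAAA", "TATATA", "TATAAG", "CATAAA"]
pfm = position_frequency_matrix(example_sites)
```

Each column of the matrix sums to the number of sites, so dividing by that number would give per-position base frequencies.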
From quantum groups to genetic mutations
In the framework of the crystal basis model of the genetic code, where each
codon is assigned to an irreducible representation of , single base mutation matrices are introduced. The strength of the
mutation is assumed to depend on the "distance" between the codons. Preliminary
general predictions of the model are compared with experimental data, with a
satisfactory agreement.
Comment: 11 pages, Talk at Int. Conf. "Symmetries in Science XIII", Bregenz, July
20-24 200
Back-translation for discovering distant protein homologies
Frameshift mutations in protein-coding DNA sequences produce a drastic change
in the resulting protein sequence, which prevents classic protein alignment
methods from revealing the proteins' common origin. Moreover, when a large
number of substitutions are additionally involved in the divergence, the
homology detection becomes difficult even at the DNA level. To cope with this
situation, we propose a novel method to infer distant homology relations of two
proteins, that accounts for frameshift and point mutations that may have
affected the coding sequences. We design a dynamic programming alignment
algorithm over memory-efficient graph representations of the complete set of
putative DNA sequences of each protein, with the goal of determining the two
putative DNA sequences which have the best scoring alignment under a powerful
scoring system designed to reflect the most probable evolutionary process. This
allows us to uncover evolutionary information that is not captured by
traditional alignment methods, which is confirmed by biologically significant
examples.
Comment: The 9th International Workshop in Algorithms in Bioinformatics
(WABI), Philadelphia, United States of America (2009)
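To make "the complete set of putative DNA sequences of each protein" concrete, the sketch below back-translates a protein by brute force using the standard genetic code. It is an illustration only: the paper works on a memory-efficient graph representation that is not reproduced here, and the function names are hypothetical. The rapid growth of the count is the reason explicit enumeration does not scale.

```python
from itertools import product
from math import prod

# Standard genetic code, codons ordered with bases T, C, A, G ('*' marks stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid to the list of codons that encode it.
CODONS_FOR = {}
for (b1, b2, b3), aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS_FOR.setdefault(aa, []).append(b1 + b2 + b3)

def count_back_translations(protein):
    """Number of distinct DNA sequences that encode the given protein."""
    return prod(len(CODONS_FOR[aa]) for aa in protein)

def back_translations(protein):
    """Yield every putative coding DNA sequence (feasible for short inputs only)."""
    for codons in product(*(CODONS_FOR[aa] for aa in protein)):
        yield "".join(codons)
```

For instance, `count_back_translations("MKL")` is 1 × 2 × 6 = 12, and the count multiplies by up to six for every additional residue.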
Adaptive evolution of transcription factor binding sites
The regulation of a gene depends on the binding of transcription factors to
specific sites located in the regulatory region of the gene. The generation of
these binding sites and of cooperativity between them are essential building
blocks in the evolution of complex regulatory networks. We study a theoretical
model for the sequence evolution of binding sites by point mutations. The
approach is based on biophysical models for the binding of transcription
factors to DNA. Hence we derive empirically grounded fitness landscapes, which
enter a population genetics model including mutations, genetic drift, and
selection. We show that the selection for factor binding generically leads to
specific correlations between nucleotide frequencies at different positions of
a binding site. We demonstrate the possibility of rapid adaptive evolution
generating a new binding site for a given transcription factor by point
mutations. The evolutionary time required is estimated in terms of the neutral
(background) mutation rate, the selection coefficient, and the effective
population size. The efficiency of binding site formation is seen to depend on
two joint conditions: the binding site motif must be short enough and the
promoter region must be long enough. These constraints on promoter architecture
are indeed seen in eukaryotic systems. Furthermore, we analyse the adaptive
evolution of genetic switches and of signal integration through binding
cooperativity between different sites. Experimental tests of this picture
involving the statistics of polymorphisms and phylogenies of sites are
discussed.
Comment: published version
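The dependence on motif length and promoter length stated in this abstract can be given some intuition with a back-of-the-envelope count. The sketch below is not the paper's biophysical or population-genetic model: it assumes a single strand, independent uniformly distributed bases, and a hypothetical function name, and it only computes how many windows of a random region are expected to lie within a few mismatches of a given motif.

```python
from math import comb

def expected_matches(region_length, motif_length, max_mismatches=0):
    """Expected number of windows within max_mismatches of a fixed motif.

    ASSUMPTION: bases are independent and uniform over A, C, G, T, and only
    one strand is scanned. This is a neutral baseline, not a fitness model.
    """
    windows = region_length - motif_length + 1
    if windows <= 0:
        return 0.0
    # Number of sequences at Hamming distance k from the motif: C(n, k) * 3^k.
    near = sum(comb(motif_length, k) * 3 ** k for k in range(max_mismatches + 1))
    return windows * near / 4 ** motif_length
```

Under these assumptions the expectation falls by a factor of four for each extra motif position and grows roughly linearly with region length, so short motifs in long regions are the most likely to have near-matches that point mutations can complete.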
Dynamics of transcription factor binding site evolution
Evolution of gene regulation is crucial for our understanding of the
phenotypic differences between species, populations and individuals.
Sequence-specific binding of transcription factors to the regulatory regions on
the DNA is a key regulatory mechanism that determines gene expression and hence
heritable phenotypic variation. We use a biophysical model for directional
selection on gene expression to estimate the rates of gain and loss of
transcription factor binding sites (TFBS) in finite populations under both
point and insertion/deletion mutations. Our results show that these rates are
typically slow for a single TFBS in an isolated DNA region, unless the
selection is extremely strong. These rates decrease drastically with increasing
TFBS length or increasingly specific protein-DNA interactions, making the
evolution of sites longer than ~10 bp unlikely on typical eukaryotic speciation
timescales. Similarly, evolution converges to the stationary distribution of
binding sequences very slowly, making the equilibrium assumption questionable.
The availability of longer regulatory sequences in which multiple binding sites
can evolve simultaneously, the presence of "pre-sites" or partially decayed old
sites in the initial sequence, and biophysical cooperativity between
transcription factors, can all facilitate gain of TFBS and reconcile
theoretical calculations with timescales inferred from comparative genetics.
Comment: 28 pages, 15 figures