Targeting determinants of dosage compensation in Drosophila
The dosage compensation complex (DCC) in Drosophila melanogaster is responsible for up-regulating transcription from the single male X chromosome to equal the transcription from the two X chromosomes in females. Visualization of the DCC, a large ribonucleoprotein complex, on male larval polytene chromosomes reveals that the complex binds selectively to many interbands on the X chromosome. The targeting of the DCC is thought to be determined in part by DNA sequences that are enriched on the X. So far, the lack of knowledge about DCC binding sites has prevented the identification of sequence determinants. Only three binding sites have been identified to date, and analysis of their DNA sequences did not allow the prediction of further binding sites. We have used chromatin immunoprecipitation to identify a number of new DCC binding fragments and characterized them in vivo by visualizing DCC binding to autosomal insertions of these fragments, showing that they possess a wide range of potential to recruit the DCC. By varying the in vivo concentration of the DCC, we provide evidence that this range of recruitment potential is due to differences in the affinity of the complex for these sites. We were also able to establish that DCC binding to ectopic high-affinity sites can allow nearby low-affinity sites to recruit the complex. Using the sequences of the newly identified and previously characterized binding fragments, we have uncovered a number of short sequence motifs, which in combination may contribute to DCC recruitment. Our findings suggest that the DCC is recruited to the X via a number of binding sites of decreasing affinities, and that the presence of high- and moderate-affinity sites on the X may ensure that lower-affinity sites are occupied in a context-dependent manner. Our bioinformatics analysis suggests that DCC binding sites may be composed of variable combinations of degenerate motifs.
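As a minimal sketch of what profiling fragments for "variable combinations of degenerate motifs" could look like, the Python below counts overlapping matches of IUPAC-coded motifs on both strands of each fragment and tabulates the counts per fragment. The function names, fragments and motifs are hypothetical placeholders, not the motifs or the discovery procedure reported by the authors.

```python
from __future__ import annotations

import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (non-ACGT characters are kept)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def motif_to_regex(motif: str) -> re.Pattern:
    """Compile an IUPAC motif; the lookahead lets overlapping hits be found."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile(f"(?=({body}))")


def count_hits(seq: str, motif: str) -> int:
    """Overlapping matches on both strands (palindromic sites count twice)."""
    pattern = motif_to_regex(motif)
    forward = len(pattern.findall(seq.upper()))
    reverse = len(pattern.findall(reverse_complement(seq)))
    return forward + reverse


def motif_profile(fragments: dict[str, str], motifs: list[str]) -> dict[str, dict[str, int]]:
    """Per-fragment hit counts, showing which motif combinations each fragment carries."""
    return {name: {m: count_hits(seq, m) for m in motifs}
            for name, seq in fragments.items()}


if __name__ == "__main__":
    # Hypothetical fragments and motifs, for illustration only.
    fragments = {"frag_a": "TTGAGAGAGCATCTCTC", "frag_b": "ACGTACGTACGT"}
    print(motif_profile(fragments, ["GAGA", "YGAGAR"]))
```

Counting both strands matters because a binding motif can occur in either orientation; the per-fragment table is what a combinatorial analysis would then compare across high- and low-affinity fragments.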
Conspiracy in bacterial genomes
The rank-ordered distribution of the codon usage frequencies for 123
bacteria is best fitted by a three-parameter function that is the sum of a
constant, an exponential and a linear term in the rank n. The parameters depend
on the total GC content (two of them parabolically). The rank-ordered distribution of
the amino acids is fitted by a straight line. The Shannon entropy computed over
all the codons is well fitted by a parabola in the GC content, while the
partial entropies computed over subsets of the codons show peculiar, differing
behaviors, thereby exhibiting a first conspiracy effect. Moreover, the sum of
the codon usage frequencies over particular sets, e.g. codons with C or A
(respectively G or U) as the i-th nucleotide, shows a clear linear dependence on
the GC content, exhibiting another conspiracy effect.
Comment: revised version: introduction and conclusion enhanced, references
added, figures added, some tables removed
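The quantities named in this abstract (rank-ordered codon frequencies, Shannon entropy over codons, GC content, and summed usage of codons with given bases at position i) can be computed roughly as follows. This is a sketch assuming a single in-frame coding sequence per call (a genome-wide analysis would pool all annotated genes) with U written as T; the helper names are hypothetical, and the three-parameter fit is not reproduced because its exact functional form is not given here.

```python
from __future__ import annotations

import math
from collections import Counter


def codon_frequencies(cds: str) -> dict[str, float]:
    """Relative usage of each codon in an in-frame coding sequence (U read as T)."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def rank_ordered(freqs: dict[str, float]) -> list[float]:
    """Frequencies in decreasing order; element n-1 is the frequency of rank n."""
    return sorted(freqs.values(), reverse=True)


def shannon_entropy(probs) -> float:
    """H = -sum p log2 p, in bits, skipping zero probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def gc_content(seq: str) -> float:
    """Fraction of G and C among all characters of the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def positional_sum(freqs: dict[str, float], position: int, bases: str) -> float:
    """Summed usage of codons whose nucleotide at `position` (0, 1 or 2) is in `bases`."""
    return sum(p for codon, p in freqs.items() if codon[position] in bases)
```

For example, `positional_sum(codon_frequencies(cds), 0, "CA")` gives the summed usage of codons with C or A as first nucleotide, which the abstract reports to vary linearly with GC content across genomes.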
Application of regulatory sequence analysis and metabolic network analysis to the interpretation of gene expression data
We present two complementary approaches for the interpretation of clusters of
co-regulated genes, such as those obtained from DNA chips and related methods.
Starting from a cluster of genes with similar expression profiles, two basic
questions can be asked:
1. Which mechanism is responsible for the coordinated transcriptional response
of the genes? This question is approached by extracting motifs that are shared
between the upstream sequences of these genes. The motifs extracted are putative
cis-acting regulatory elements.
2. What is the physiological significance, for the cell, of co-expressing these
genes? One way to answer this question is to search for potential metabolic
pathways that could be catalyzed by the products of the genes. This can be
done by selecting the genes from the cluster that code for enzymes, and trying
to assemble the catalyzed reactions to form metabolic pathways.
We present tools to answer these two questions, and we illustrate their use with
selected examples in the yeast Saccharomyces cerevisiae. The tools are available
on the web (http://ucmb.ulb.ac.be/bioinformatics/rsa-tools/;
http://www.ebi.ac.uk/research/pfbp/; http://www.soi.city.ac.uk/~msch/).
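The first question (extracting motifs shared by the upstream sequences of a cluster) can be illustrated with a simplified oligonucleotide over-representation test: count every k-mer in the cluster's upstream sequences, estimate its expected frequency from a background set of upstream sequences, and score the excess with a binomial upper-tail probability. This is a hypothetical sketch in the spirit of the approach, not the code behind the tools listed above; the pseudocount, k, the significance threshold and single-strand counting are all arbitrary assumptions.

```python
from __future__ import annotations

import math
from collections import Counter


def kmer_counts(seqs: list[str], k: int) -> tuple[Counter, int]:
    """Overlapping k-mer counts on one strand, skipping words with non-ACGT letters.

    Returns the counts and the number of counted positions."""
    counts: Counter = Counter()
    positions = 0
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if set(word) <= set("ACGT"):
                counts[word] += 1
                positions += 1
    return counts, positions


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p) with 0 < p < 1, summed in log space."""
    log_p, log_q = math.log(p), math.log1p(-p)
    mode = (n + 1) * p
    total = 0.0
    for i in range(k, n + 1):
        term = math.exp(math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                        + i * log_p + (n - i) * log_q)
        total += term
        # Past the mode the terms only shrink, so stop once they are negligible.
        if i > mode and term <= total * 1e-16:
            break
    return total


def overrepresented(cluster: list[str], background: list[str], k: int = 6,
                    alpha: float = 1e-4, pseudo: float = 1.0):
    """K-mers over-represented in `cluster` relative to `background` frequencies.

    Returns (word, observed, expected, p_value) tuples, most significant first.
    `pseudo` must be > 0 so that words unseen in the background get p > 0."""
    observed, n = kmer_counts(cluster, k)
    bg_counts, n_bg = kmer_counts(background, k)
    n_words = 4 ** k
    hits = []
    for word, obs in observed.items():
        p = (bg_counts[word] + pseudo) / (n_bg + pseudo * n_words)
        p_value = binom_upper_tail(obs, n, p)
        if p_value < alpha:
            hits.append((word, obs, n * p, p_value))
    return sorted(hits, key=lambda h: h[3])
```

Using a background drawn from upstream regions rather than a uniform base model is the design choice that matters here: it keeps generic promoter composition (for example AT richness in yeast) from being reported as cluster-specific motifs.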
Back-translation for discovering distant protein homologies
Frameshift mutations in protein-coding DNA sequences produce a drastic change
in the resulting protein sequence, which prevents classic protein alignment
methods from revealing the proteins' common origin. Moreover, when a large
number of substitutions are additionally involved in the divergence, the
homology detection becomes difficult even at the DNA level. To cope with this
situation, we propose a novel method to infer distant homology relations between
two proteins that accounts for frameshift and point mutations that may have
affected the coding sequences. We design a dynamic programming alignment
algorithm over memory-efficient graph representations of the complete set of
putative DNA sequences of each protein, with the goal of determining the two
putative DNA sequences that have the best-scoring alignment under a powerful
scoring system designed to reflect the most probable evolutionary process. This
allows us to uncover evolutionary information that is not captured by
traditional alignment methods, as confirmed by biologically significant
examples.
Comment: The 9th International Workshop on Algorithms in Bioinformatics
(WABI), Philadelphia, United States (2009)
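A heavily simplified sketch of aligning back-translations follows. Each protein is back-translated into a sequence of nucleotide sets (the bases allowed at each codon position under the standard genetic code), and the two set sequences are aligned globally at the nucleotide level, so that 1- or 2-nucleotide gaps can absorb a frameshift. Unlike the graph representation described above, this per-position encoding ignores dependencies between codon positions (it lets serine's sets spell non-serine codons, for instance), and the match, mismatch and gap scores are arbitrary assumptions rather than the authors' scoring system.

```python
from __future__ import annotations

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def back_translate(protein: str) -> list[frozenset[str]]:
    """Replace each residue by three nucleotide sets, one per codon position.

    Simplification: positions are treated independently, so codon-level
    dependencies (e.g. serine's TCN/AGY split) are lost."""
    sets = []
    for aa in protein.upper():
        codons = [codon for codon, amino in CODON_TABLE.items() if amino == aa]
        if not codons:
            raise ValueError(f"unknown residue: {aa!r}")
        for pos in range(3):
            sets.append(frozenset(codon[pos] for codon in codons))
    return sets


def align_score(x: list[frozenset[str]], y: list[frozenset[str]],
                match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Global (Needleman-Wunsch) score with a linear gap penalty.

    Two positions match when their nucleotide sets intersect; gaps of length
    1 or 2 can absorb a frameshift. Score values are arbitrary placeholders."""
    prev = [j * gap for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        cur = [i * gap] + [0] * len(y)
        for j in range(1, len(y) + 1):
            s = match if x[i - 1] & y[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]
```

Aligning at the nucleotide level rather than the amino-acid level is what lets a single inserted or deleted base cost one gap instead of scrambling every downstream residue, which is the situation the abstract describes.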
In the search for the low-complexity sequences in prokaryotic and eukaryotic genomes: how to derive a coherent picture from global and local entropy measures
We investigate a possible way to connect the presence of Low-Complexity
Sequences (LCS) in DNA genomes with the nonstationary properties of base
correlations. Under the hypothesis that these variations signal a change in
DNA function, we use a new technique, called the Non-Stationarity Entropic Index
(NSEI) method, and we prove that this technique is an efficient way to detect
functional changes with respect to a random baseline. The remarkable aspect is
that NSEI does not require any training data or fitting parameters, the only
arbitrary element being the choice of a marker in the sequence. We make this choice
on the basis of biological information about LCS distributions in genomes. We
show that there exists a correlation between changes in the amount of LCS and the
ratio of long- to short-range correlation
- …
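The NSEI method is not specified here in enough detail to reproduce. As a related but different illustration of global versus local entropy measures, the sketch below computes the Shannon entropy of base composition over a whole sequence and in sliding windows, flagging windows below a threshold as candidate low-complexity regions. The function names, window size and threshold are hypothetical assumptions, and this is a generic composition-entropy filter, not NSEI.

```python
from __future__ import annotations

import math
from collections import Counter


def composition_entropy(seq: str) -> float:
    """Entropy (bits) of the symbol composition of `seq`; at most 2 for pure ACGT."""
    counts = Counter(seq.upper())
    total = len(seq)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def window_entropy(seq: str, window: int = 64, step: int = 1) -> list[tuple[int, float]]:
    """Local entropies as (start, H) for every window of the given width."""
    return [(start, composition_entropy(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]


def low_complexity_starts(seq: str, window: int = 64, threshold: float = 1.5) -> list[int]:
    """Window starts whose local entropy falls below `threshold` (arbitrary cutoff)."""
    return [start for start, h in window_entropy(seq, window) if h < threshold]
```

Comparing `composition_entropy(genome)` with the local values from `window_entropy` shows how a globally near-random sequence can still contain sharply low-entropy stretches; choosing the window width plays a role loosely analogous to the arbitrary marker choice mentioned above.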