Genomic regions with distinct genomic distance conservation in vertebrate genomes
Background: A number of vertebrate highly conserved elements (HCEs) have been detected, and their genomic interval distances have been reported to be more conserved than those of protein-coding genes among mammalian genomes. A characteristic of the human versus non-mammalian comparisons is a bimodal distribution of the relative distance difference of conserved consecutive HCE pairs, a profile that is difficult to attribute to random assortment. We therefore undertook an analysis of the human genomic regions confined by consecutive HCE pairs common to eight genomes (human, mouse, rat, chicken, frog, zebrafish, Tetraodon and fugu).
Results: Among HCE pairs, we found that some consistently preserve a highly conserved interval distance among genomes, while others have relatively low distance conservation. Using a partition method, we detected two groups of inter-HCE regions (IHRs) with distinct distance conservation patterns in vertebrate genomes: IHR1s, which are bordered by HCE pairs with relatively small distance variation, and IHR2s, which have larger distance difference values. Compared to random background, annotated repeat sequences are significantly less frequent in IHR1s than in IHR2s, which reflects a correlation between repeat sequences and the length expansion of IHRs. Both groups of IHRs are unexpectedly more enriched in human indel (i.e. insertion and deletion) polymorphisms than random background. The correlation between the percentage of conserved sequence and human IHR length was stronger for IHR1s than for IHR2s. Both groups of IHRs are significantly enriched for CpG islands.
Conclusion: The data suggest that subsets of HCE pairs may undergo different evolutionary paths in light of their genomic distance conservation, and that the sets of genomic regions pertaining to HCEs, as well as the regions in which HCEs reside, should be treated as integrated domains.
Coexistence of different base periodicities in prokaryotic genomes as related to DNA curvature, supercoiling, and transcription
We analyzed the periodic patterns in E. coli promoters and compared the
distributions of the corresponding patterns in promoters and in the complete
genome to elucidate their function. Except for the three-base periodicity,
which coincides with that in the coding regions and grows stronger in the region
downstream of the transcription start (TS), all other salient periodicities
peak upstream of the TS. We found that helical periodicities with lengths
close to the B-helix pitch (~10.2-10.5 bp) and the A-helix pitch (~10.8-11.1 bp)
coexist in the genomic sequences. We mapped the distributions of stretches with
A-, B-, and Z-like DNA periodicities onto the E. coli genome. All three periodicities tend
to concentrate within non-coding regions when their intensity becomes stronger
and prevail in the promoter sequences. The comparison with available
experimental data indicates that promoters with the most pronounced
periodicities may be related to the supercoiling-sensitive genes.
Comment: 23 pages, 6 figures, 2 tables
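A base periodicity of the kind discussed in this abstract can be illustrated with a Fourier-style score on a nucleotide indicator sequence. The sketch below is only an illustration under stated assumptions: the abstract does not describe the authors' estimator, and the function name `periodicity_power`, the normalization by sequence length, and the toy sequence are all hypothetical.

```python
import cmath

def periodicity_power(seq, base, period):
    """Fourier power of the indicator sequence for `base` at `period` (in bp).

    Assumption: power is normalized by sequence length; this is an
    illustrative choice, not the estimator used in the paper.
    """
    n = len(seq)
    if n == 0:
        return 0.0
    # Sum one unit phasor per occurrence of `base`; occurrences spaced by
    # multiples of `period` add up in phase and give a large magnitude.
    total = sum(
        cmath.exp(-2j * cmath.pi * i / period)
        for i, ch in enumerate(seq)
        if ch == base
    )
    return abs(total) ** 2 / n

# Toy sequence (hypothetical): an 'A' every 10 bp on a background of 'C'.
toy = ("A" + "C" * 9) * 20
p10 = periodicity_power(toy, "A", 10.0)  # in-phase period
p7 = periodicity_power(toy, "A", 7.0)    # out-of-phase period
print(p10 > p7)
```

Non-integer periods such as 10.5 or 11.0 bp can be passed directly, which is why the period is treated as a float rather than as an index into a discrete transform.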
Limited Lifespan of Fragile Regions in Mammalian Evolution
An important question in genome evolution is whether there exist fragile
regions (rearrangement hotspots) where chromosomal rearrangements are happening
over and over again. Although nearly all recent studies supported the existence
of fragile regions in mammalian genomes, the most comprehensive phylogenomic
study of mammals (Ma et al. (2006) Genome Research 16, 1557-1565) raised some
doubts about their existence. We demonstrate that fragile regions are subject
to a "birth and death" process, implying that fragility has limited
evolutionary lifespan. This finding implies that fragile regions migrate to
different locations in different mammals, explaining why there exist only a few
chromosomal breakpoints shared between different lineages. The birth-and-death
phenomenon of fragile regions reinforces the hypothesis that rearrangements are
promoted by matching segmental duplications and suggests putative locations of
the currently active fragile regions in the human genome.
Kerfuffle: a web tool for multi-species gene colocalization analysis
The evolutionary pressures that underlie the large-scale functional
organization of the genome are not well understood in eukaryotes. Recent
evidence suggests that functionally similar genes may colocalize (cluster) in
the eukaryotic genome, pointing to a role for chromatin-level gene regulation
in shaping the physical distribution of coordinated genes. However, few of the
bioinformatic tools currently available allow for a systematic study of gene
colocalization across several, evolutionarily distant species. Kerfuffle is a
web tool designed to help discover, visualize, and quantify the physical
organization of genomes by identifying significant gene colocalization and
conservation across the assembled genomes of available species (currently up to
47, from humans to worms). Kerfuffle only requires the user to specify a list
of human genes and the names of other species of interest. Without further
input from the user, the software queries the e!Ensembl BioMart server to
obtain positional information and discovers homology relations in all genes and
species specified. Using this information, Kerfuffle performs a multi-species
clustering analysis, presents downloadable lists of clustered genes, performs
Monte Carlo statistical significance calculations, estimates how conserved gene
clusters are across species, plots histograms and interactive graphs, allows
users to save their queries, and generates a downloadable visualization of the
clusters using the Circos software. These analyses may be used to further
explore the functional roles of gene clusters by interrogating the enriched
molecular pathways associated with each cluster.
Comment: BMC Bioinformatics, in press
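The Monte Carlo significance step mentioned in this abstract can be sketched as a simple permutation-style test. This is not Kerfuffle's code: the abstract does not specify its clustering statistic or null model, so the statistic (neighbouring genes within `max_gap` bp), the uniform-placement null on a single chromosome, and all names below are assumptions made for illustration.

```python
import random

def count_close_pairs(positions, max_gap):
    """Count neighbouring genes (after sorting) at most max_gap bp apart."""
    ordered = sorted(positions)
    return sum(1 for a, b in zip(ordered, ordered[1:]) if b - a <= max_gap)

def monte_carlo_p_value(positions, chrom_length, max_gap, n_trials=2000, seed=0):
    """Estimate how often uniformly placed genes cluster as much as observed.

    Assumption: the null model places each gene uniformly at random on one
    chromosome of length chrom_length; real tools may use other nulls.
    """
    rng = random.Random(seed)
    observed = count_close_pairs(positions, max_gap)
    hits = 0
    for _ in range(n_trials):
        simulated = [rng.randrange(chrom_length) for _ in positions]
        if count_close_pairs(simulated, max_gap) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_trials + 1)

# Hypothetical example: eight genes spaced 10 kb apart on a 100 Mb chromosome.
clustered = [1_000_000 + 10_000 * i for i in range(8)]
p = monte_carlo_p_value(clustered, chrom_length=100_000_000, max_gap=50_000)
print(p < 0.01)
```

The add-one correction is a common convention for Monte Carlo p-values, since a finite number of trials cannot support an estimate of exactly zero.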
How to infer relative fitness from a sample of genomic sequences
Mounting evidence suggests that natural populations can harbor extensive
fitness diversity with numerous genomic loci under selection. It is also known
that genealogical trees for populations under selection are quantifiably
different from those expected under neutral evolution and described
statistically by Kingman's coalescent. While differences in the statistical
structure of genealogies have long been used as a test for the presence of
selection, the full extent of the information that they contain has not been
exploited. Here we shall demonstrate that the shape of the reconstructed
genealogical tree for a moderately large number of random genomic samples taken
from a fitness diverse, but otherwise unstructured asexual population can be
used to predict the relative fitness of individuals within the sample. To
achieve this we define a heuristic algorithm, which we test in silico using
simulations of a Wright-Fisher model for a realistic range of mutation rates
and selection strength. Our inferred fitness ranking is based on a linear
discriminator which identifies rapidly coalescing lineages in the reconstructed
tree. Inferred fitness ranking correlates strongly with actual fitness, with a
genome in the top 10% ranked being in the top 20% fittest with false discovery
rate of 0.1-0.3 depending on the mutation/selection parameters. The ranking
also makes it possible to predict the genotypes that future populations inherit from the
present one. While the inference accuracy increases monotonically with sample
size, samples of 200 nearly saturate the performance. We propose that our
approach can be used for inferring relative fitness of genomes obtained in
single-cell sequencing of tumors and in monitoring viral outbreaks.
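The Wright-Fisher model used for the in silico tests can be sketched as fitness-proportional resampling of an asexual population. The sketch below covers only the simulation model, not the authors' fitness-inference heuristic, which the abstract does not describe; the multiplicative mutation effect, the parameter names, and the toy population are assumptions.

```python
import random

def wright_fisher_step(fitnesses, mutation_rate, effect, rng):
    """One generation: each offspring picks a parent with probability
    proportional to parental fitness, then may mutate.

    Assumption: a mutation multiplies fitness by (1 + effect).
    """
    n = len(fitnesses)
    parents = rng.choices(fitnesses, weights=fitnesses, k=n)
    return [
        f * (1.0 + effect) if rng.random() < mutation_rate else f
        for f in parents
    ]

def simulate(fitnesses, generations, mutation_rate=0.0, effect=0.0, seed=0):
    """Run the model for a number of generations at constant population size."""
    rng = random.Random(seed)
    pop = list(fitnesses)
    for _ in range(generations):
        pop = wright_fisher_step(pop, mutation_rate, effect, rng)
    return pop

# Hypothetical example: two fitness classes, no new mutations.
start = [1.0] * 100 + [1.5] * 100
end = simulate(start, generations=50)
print(sum(end) / len(end) > sum(start) / len(start))
```

With mutation switched off, the fitter class tends to take over, which is the kind of rapid coalescence of fit lineages that the abstract's ranking exploits.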
Joint assembly and genetic mapping of the Atlantic horseshoe crab genome reveals ancient whole genome duplication
Horseshoe crabs are marine arthropods with a fossil record extending back
approximately 450 million years. They exhibit remarkable morphological
stability over their long evolutionary history, retaining a number of ancestral
arthropod traits, and are often cited as examples of "living fossils." As
arthropods, they belong to the Ecdysozoa, an ancient super-phylum whose
sequenced genomes (including insects and nematodes) have thus far shown more
divergence from the ancestral pattern of eumetazoan genome organization than
cnidarians, deuterostomes, and lophotrochozoans. However, much of ecdysozoan
diversity remains unrepresented in comparative genomic analyses. Here we use a
new strategy of combined de novo assembly and genetic mapping to examine the
chromosome-scale genome organization of the Atlantic horseshoe crab Limulus
polyphemus. We constructed a genetic linkage map of this 2.7 Gbp genome by
sequencing the nuclear DNA of 34 wild-collected, full-sibling embryos and their
parents at a mean redundancy of 1.1x per sample. The map includes 84,307
sequence markers and 5,775 candidate conserved protein coding genes. Comparison
to other metazoan genomes shows that the L. polyphemus genome preserves
ancestral bilaterian linkage groups, and that a common ancestor of modern
horseshoe crabs underwent one or more ancient whole genome duplications (WGDs)
~300 MYA, followed by extensive chromosome fusion.