27,636 research outputs found
Revealing mammalian evolutionary relationships by comparative analysis of gene clusters
Many software tools for comparative analysis of genomic sequence data have been released in recent decades. Despite this, it remains challenging to determine evolutionary relationships in gene clusters due to their complex histories involving duplications, deletions, inversions, and conversions. One concept describing these relationships is orthology. Orthologs derive from a common ancestor by speciation, in contrast to paralogs, which derive from duplication. Discriminating orthologs from paralogs is a necessary step in most multispecies sequence analyses, but doing so accurately is impeded by the occurrence of gene conversion events. We propose a refined method of orthology assignment based on two paradigms for interpreting its definition: by genomic context or by sequence content. X-orthology (based on context) traces orthology resulting from speciation and duplication only, while N-orthology (based on content) includes the influence of conversion events
Accelerated Evolution of the ASPM Gene Controlling Brain Size Begins Prior to Human Brain Expansion
Primary microcephaly (MCPH) is a neurodevelopmental disorder characterized by global reduction in cerebral cortical volume. The microcephalic brain has a volume comparable to that of early hominids, raising the possibility that some MCPH genes may have been evolutionary targets in the expansion of the cerebral cortex in mammals and especially primates. Mutations in ASPM, which encodes the human homologue of a fly protein essential for spindle function, are the most common known cause of MCPH. Here we have isolated large genomic clones containing the complete ASPM gene, including promoter regions and introns, from chimpanzee, gorilla, orangutan, and rhesus macaque by transformation-associated recombination cloning in yeast. We have sequenced these clones and show that whereas much of the sequence of ASPM is substantially conserved among primates, specific segments are subject to high Ka/Ks ratios (nonsynonymous/synonymous DNA changes) consistent with strong positive selection for evolutionary change. The ASPM gene sequence shows accelerated evolution in the African hominoid clade, and this precedes hominid brain expansion by several million years. Gorilla and human lineages show particularly accelerated evolution in the IQ domain of ASPM. Moreover, ASPM regions under positive selection in primates are also the most highly diverged regions between primates and nonprimate mammals. We report the first direct application of TAR cloning technology to the study of human evolution. Our data suggest that evolutionary selection of specific segments of the ASPM sequence strongly relates to differences in cerebral cortical size
Independent large scale duplications in multiple M. tuberculosis lineages overlapping the same genomic region
Mycobacterium tuberculosis, the causative agent of most human tuberculosis, infects one third of the world's population and kills an estimated 1.7 million people a year. With the world-wide emergence of drug resistance, and the finding of more functional genetic diversity than previously expected, there is a renewed interest in understanding the forces driving genome evolution of this important pathogen. Genetic diversity in M. tuberculosis is dominated by single nucleotide polymorphisms and small scale gene deletion, with little or no evidence for large scale genome rearrangements seen in other bacteria. Recently, a single report described a large scale genome duplication that was suggested to be specific to the Beijing lineage. We report here multiple independent large-scale duplications of the same genomic region of M. tuberculosis detected through whole-genome sequencing. The duplications occur in strains belonging to both M. tuberculosis lineage 2 and 4, and are thus not limited to Beijing strains. The duplications occur in both drug-resistant and drug susceptible strains. The duplicated regions also have substantially different boundaries in different strains, indicating different originating duplication events. We further identify a smaller segmental duplication of a different genomic region of a lab strain of H37Rv. The presence of multiple independent duplications of the same genomic region suggests either instability in this region, a selective advantage conferred by the duplication, or both. The identified duplications suggest that large-scale gene duplication may be more common in M. tuberculosis than previously considere
Content based network model with duplication and divergence
We construct a minimal content-based realization of the duplication and
divergence model of genomic networks introduced by Wagner [A. Wagner, Proc.
Natl. Acad. Sci. {\bf 91}, 4387 (1994)] and investigate the scaling properties
of the directed degree distribution and clustering coefficient. We find that
the content based network exhibits crossover between two scaling regimes, with
log-periodic oscillations for large degrees. These features are not present in
the original gene duplication model, but inherent in the content based model of
Balcan and Erzan. The scaling exponents and
of the Balcan-Erzan model turn out to be robust under duplication and point
mutations, but get modified in the presence of splitting and merging of
strings. The clustering coefficient as a function of the degree, , is
found, for the Balcan-Erzan model, to behave in a way qualitatively similar to
the out-degree distribution, however with a very small exponent and an envelope for the oscillatory part, which is essentially
flat, thus . Under duplication and mutations including splitting
and merging of strings, is found to decay exponentially.Comment: 12 pages, 6 figure
The inference of gene trees with species trees
Molecular phylogeny has focused mainly on improving models for the
reconstruction of gene trees based on sequence alignments. Yet, most
phylogeneticists seek to reveal the history of species. Although the histories
of genes and species are tightly linked, they are seldom identical, because
genes duplicate, are lost or horizontally transferred, and because alleles can
co-exist in populations for periods that may span several speciation events.
Building models describing the relationship between gene and species trees can
thus improve the reconstruction of gene trees when a species tree is known, and
vice-versa. Several approaches have been proposed to solve the problem in one
direction or the other, but in general neither gene trees nor species trees are
known. Only a few studies have attempted to jointly infer gene trees and
species trees. In this article we review the various models that have been used
to describe the relationship between gene trees and species trees. These models
account for gene duplication and loss, transfer or incomplete lineage sorting.
Some of them consider several types of events together, but none exists
currently that considers the full repertoire of processes that generate gene
trees along the species tree. Simulations as well as empirical studies on
genomic data show that combining gene tree-species tree models with models of
sequence evolution improves gene tree reconstruction. In turn, these better
gene trees provide a better basis for studying genome evolution or
reconstructing ancestral chromosomes and ancestral gene sequences. We predict
that gene tree-species tree methods that can deal with genomic data sets will
be instrumental to advancing our understanding of genomic evolution.Comment: Review article in relation to the "Mathematical and Computational
Evolutionary Biology" conference, Montpellier, 201
Formation of regulatory modules by local sequence duplication
Turnover of regulatory sequence and function is an important part of
molecular evolution. But what are the modes of sequence evolution leading to
rapid formation and loss of regulatory sites? Here, we show that a large
fraction of neighboring transcription factor binding sites in the fly genome
have formed from a common sequence origin by local duplications. This mode of
evolution is found to produce regulatory information: duplications can seed new
sites in the neighborhood of existing sites. Duplicate seeds evolve
subsequently by point mutations, often towards binding a different factor than
their ancestral neighbor sites. These results are based on a statistical
analysis of 346 cis-regulatory modules in the Drosophila melanogaster genome,
and a comparison set of intergenic regulatory sequence in Saccharomyces
cerevisiae. In fly regulatory modules, pairs of binding sites show
significantly enhanced sequence similarity up to distances of about 50 bp. We
analyze these data in terms of an evolutionary model with two distinct modes of
site formation: (i) evolution from independent sequence origin and (ii)
divergent evolution following duplication of a common ancestor sequence. Our
results suggest that pervasive formation of binding sites by local sequence
duplications distinguishes the complex regulatory architecture of higher
eukaryotes from the simpler architecture of unicellular organisms
- …