1,341 research outputs found

    Accelerated Evolution of the ASPM Gene Controlling Brain Size Begins Prior to Human Brain Expansion

    Primary microcephaly (MCPH) is a neurodevelopmental disorder characterized by a global reduction in cerebral cortical volume. The microcephalic brain has a volume comparable to that of early hominids, raising the possibility that some MCPH genes may have been evolutionary targets in the expansion of the cerebral cortex in mammals, and especially in primates. Mutations in ASPM, which encodes the human homologue of a fly protein essential for spindle function, are the most common known cause of MCPH. Here we have isolated large genomic clones containing the complete ASPM gene, including promoter regions and introns, from chimpanzee, gorilla, orangutan, and rhesus macaque by transformation-associated recombination (TAR) cloning in yeast. We have sequenced these clones and show that, whereas much of the ASPM sequence is substantially conserved among primates, specific segments have high Ka/Ks ratios (nonsynonymous substitutions per nonsynonymous site divided by synonymous substitutions per synonymous site), consistent with strong positive selection for evolutionary change. The ASPM gene sequence shows accelerated evolution in the African hominoid clade, and this acceleration precedes hominid brain expansion by several million years. The gorilla and human lineages show particularly accelerated evolution in the IQ domain of ASPM. Moreover, the ASPM regions under positive selection in primates are also the most highly diverged regions between primates and nonprimate mammals. We report the first direct application of TAR cloning technology to the study of human evolution. Our data suggest that evolutionary selection of specific segments of the ASPM sequence is strongly related to differences in cerebral cortical size.
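
    The Ka/Ks ratio compares nonsynonymous substitutions per nonsynonymous site (Ka) with synonymous substitutions per synonymous site (Ks); values above 1 are the usual signal of positive selection. The abstract does not say which estimator was used, so the Python sketch below is only an illustration: a simplified Nei-Gojobori-style count on two aligned, gap-free coding sequences, followed by a Jukes-Cantor correction. It assumes the standard genetic code, counts mutations to stop codons as nonsynonymous when tallying sites, averages multi-difference codons over pathways that avoid stop codons, and rejects stop codons in the input. All function names are hypothetical.

```python
import math
from itertools import permutations

# Standard genetic code, codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def codon_sites(codon):
    """Return (synonymous, nonsynonymous) site counts for one sense codon."""
    syn = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] == CODE[codon]:
                syn += 1 / 3  # each of the 3 alternative bases is 1/3 of a site
    return syn, 3 - syn


def codon_differences(c1, c2):
    """Return (synonymous, nonsynonymous) differences, averaged over pathways."""
    diffs = [i for i in range(3) if c1[i] != c2[i]]
    if not diffs:
        return 0.0, 0.0
    syn_total = nonsyn_total = 0.0
    paths = 0
    for order in permutations(diffs):
        current, syn, nonsyn, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":  # discard pathways that pass through a stop codon
                valid = False
                break
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = nxt
        if valid:
            syn_total += syn
            nonsyn_total += nonsyn
            paths += 1
    if paths == 0:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    return syn_total / paths, nonsyn_total / paths


def jukes_cantor(p):
    """Correct a p-distance for multiple hits (Jukes-Cantor)."""
    if p == 0:
        return 0.0
    if p >= 0.75:
        raise ValueError("p-distance too large for Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(seq1, seq2):
    """Return (Ka, Ks, Ka/Ks) for two aligned, gap-free coding sequences."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    sites_s = sites_n = diff_s = diff_n = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if "*" in (CODE[c1], CODE[c2]):
            raise ValueError("stop codons are not allowed in this sketch")
        s1, n1 = codon_sites(c1)
        s2, n2 = codon_sites(c2)
        sites_s += (s1 + s2) / 2  # average site counts over the two sequences
        sites_n += (n1 + n2) / 2
        ds, dn = codon_differences(c1, c2)
        diff_s += ds
        diff_n += dn
    if sites_s == 0 or sites_n == 0:
        raise ValueError("no synonymous or no nonsynonymous sites")
    ka = jukes_cantor(diff_n / sites_n)
    ks = jukes_cantor(diff_s / sites_s)
    ratio = ka / ks if ks > 0 else float("nan")  # undefined when Ks == 0
    return ka, ks, ratio
```

    For example, `ka_ks("CTGAAAGGC", "CTAAAAGGC")` differs only by a synonymous Leu codon change, so Ka is 0 and the ratio is 0.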

    Mechanisms and impact of alternative transposition-induced segmental duplications

    Segmental duplications are prevalent in both plant and animal genomes and have played important roles in genome evolution. The focus of my project is to understand the transposition-mediated mechanisms that lead to the formation of segmental duplications, and the immediate impact of recently generated large (up to 14.6 Mb) tandem duplications in maize. To study these questions, we applied a variety of genetic, molecular, statistical and bioinformatic approaches, including genetic screening, PCR, Southern blotting, qRT-PCR, microarrays, mRNA sequencing, small RNA sequencing, and a program we developed, STRAND (Search for Transposon-Induced Tandem Direct Duplications). We discovered new genome rearrangement mechanisms, including transposition of paired DNA transposon termini, which can generate tandem direct duplications (TDDs), and novel structures termed Composite Insertions. Genome-wide analysis revealed that these mechanisms have played an important role in generating TDDs in 8 of 22 examined plant genomes. We also found a significant dosage-dependent effect of a 14.6 Mb duplication on phenotypic variation and on the expression of mRNA and small RNA transcripts. This work expands our knowledge of how DNA transposons contribute to rapid genome expansion, extends our understanding of the significance of DNA transposons in altering genome structure, and provides new insight into the transcriptional and phenotypic effects of a specific, recent maize duplication.
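
    A tandem direct duplication (TDD) is a segment followed immediately by a copy of itself in the same orientation. The abstract does not describe how STRAND works, so the Python sketch below is not STRAND: it is a hypothetical brute-force scan that reports exact adjacent repeats in a single sequence. A real search would need to tolerate mismatches, scale to whole genomes, and presumably check for transposon termini at the duplication junctions. The function name and the `min_len` parameter are illustrative assumptions.

```python
def find_tandem_direct_duplications(seq, min_len=4):
    """Return (start, unit_length) for each exact tandem direct duplication.

    Here a TDD is seq[start:start+L] immediately followed by an identical,
    same-orientation copy: seq[start:start+L] == seq[start+L:start+2L].
    Brute force, roughly O(n^2 * L), so only suitable for toy inputs.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for unit in range(min_len, n // 2 + 1):
        for start in range(0, n - 2 * unit + 1):
            if seq[start:start + unit] == seq[start + unit:start + 2 * unit]:
                hits.append((start, unit))
    return hits
```

    For example, in `"GGG" + "ACGTAC" * 2 + "TTT"` the scan with `min_len=6` reports a single 6 bp unit starting at position 3.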

    Acquired Chromosomal Abnormalities and Their Potential Formation Mechanisms in Solid Tumours

    Solid tumours comprise numerous types of carcinomas and sarcomas, ranging from malignant to relatively benign. Acquired chromosomal abnormalities in solid tumours are hallmarks of gene deregulation and genome instability. Chromosomal abnormalities fall into two main groups: structural and numerical alterations. Structural rearrangements include deletions, translocations, duplications, inversions and gene amplification, whereas numerical abnormalities result in aneuploidy or polyploidy. Structural abnormalities can arise from non-allelic homologous recombination (NAHR), non-homologous end joining (NHEJ), and fork stalling and template switching (FoSTeS). Numerical abnormalities can arise from errors in the mitotic spindle checkpoint and in other cellular processes during mitosis. This chapter reviews acquired structural and numerical chromosomal abnormalities in solid tumours and presents their potential formation mechanisms. The chapter also investigates the relationship between long inverted repeats (LIRs) and MYCN amplification in neuroblastoma. The distribution of LIRs across chromosome 2p25.3–2p24.3 was determined using the Inverted Repeats Finder (IRF) software. LIRs were also identified at the boundaries of MYCN amplicons in 14 neuroblastoma cell lines and 42 solid tumours. Statistical analysis showed a significant association between LIRs and MYCN amplification loci. These data provide important insights into the mechanism of MYCN amplification, and the chapter closes by proposing a new model for how MYCN amplification forms.
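
    An inverted repeat is a pair of segments on the same strand in which one is the reverse complement of the other. The Python sketch below finds exact k-mer seeds of inverted repeats with a hash index; it is not the algorithm used by IRF, and the seed length `k`, the `max_spacing` limit, and the function names are hypothetical choices. Extending seeds into full LIR arms and scoring mismatches are left out.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (non-ACGT left as-is)."""
    return seq.translate(COMPLEMENT)[::-1]


def inverted_repeat_seeds(seq, k=12, max_spacing=10_000):
    """Return (i, j) with seq[i:i+k] == reverse_complement(seq[j:j+k]).

    Only non-overlapping arms (i + k <= j) separated by at most
    max_spacing bases are reported.
    """
    seq = seq.upper()
    index = defaultdict(list)  # k-mer -> list of start positions
    for j in range(len(seq) - k + 1):
        index[seq[j:j + k]].append(j)
    seeds = []
    for i in range(len(seq) - k + 1):
        rc = reverse_complement(seq[i:i + k])
        for j in index.get(rc, []):
            if 0 <= j - (i + k) <= max_spacing:
                seeds.append((i, j))
    return seeds
```

    A left arm, a short spacer, and the reverse complement of the left arm (for example `"GATTACAGGC" + "TTTTT" + "GCCTGTAATC"` with `k=10`) yield the seed `(0, 15)`.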

    Polymorphic segmental duplication in the nematode Caenorhabditis elegans

    Background: The nematode Caenorhabditis elegans was the first multicellular organism to have its genome fully sequenced. In the 10 years since the original publication in 1998, the C. elegans genome has been closely scrutinized, and the last gaps were filled in November 2002; the complete sequence presents a unique opportunity for examining genome-wide segmental duplications.
    Results: Here, we searched the C. elegans genome for segmental duplications using OrthoCluster, a new tool we recently developed. We detected 3,484 duplicated segments (duplicons) ranging in size from 234 bp to 108 kb. The largest pair of duplicons, each 108 kb long and located on the left arm of chromosome V, was characterized further. The two copies are nearly identical at the DNA level (99.7% identity), and each contains 26 putative protein-coding genes. Genotyping of 76 wild-type strains obtained from different labs in the C. elegans community revealed that only 29 strains carry this large segmental duplication, suggesting a very recent duplication event in the C. elegans genome.
    Conclusion: This report is the first demonstration that the C. elegans laboratory wild-type N2 strains have acquired large-scale differences.
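
    The abstract says the duplicons were found with OrthoCluster but does not describe its algorithm. As a hypothetical illustration of detecting a duplicated block by gene content, the Python sketch below reports pairs of non-overlapping runs of genes on one chromosome that share the same ordered gene-family labels. The input format (a list of family IDs in chromosomal order), the `min_genes` threshold, and the exact-match rule are assumptions; practical tools also tolerate missing, extra, and inverted genes.

```python
def duplicated_gene_blocks(families, min_genes=3):
    """Find pairs of non-overlapping gene runs with identical ordered labels.

    families: gene-family IDs in chromosomal order (hypothetical input).
    Returns (start_a, start_b, length) with start_a < start_b.
    """
    n = len(families)
    blocks = []
    for a in range(n):
        for b in range(a + 1, n):
            # Skip pairs that merely continue a run starting one gene earlier.
            if a > 0 and families[a - 1] == families[b - 1]:
                continue
            length = 0
            # Extend while labels match and copy A stays left of copy B.
            while (b + length < n and a + length < b
                   and families[a + length] == families[b + length]):
                length += 1
            if length >= min_genes:
                blocks.append((a, b, length))
    return blocks
```

    With the toy input `["x", "A", "B", "C", "y", "A", "B", "C", "z"]`, the three-gene block `A, B, C` is reported once as `(1, 5, 3)`.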

    Bias of Selection on Human Copy-Number Variants

    Although large-scale copy-number variation is an important contributor to conspecific genomic diversity, whether these variants frequently contribute to human phenotypic differences remains unknown. If they have few functional consequences, then copy-number variants (CNVs) might be expected both to be distributed uniformly throughout the human genome and to contain genes that are representative of the genome as a whole. We find that human CNVs are significantly overrepresented close to telomeres and centromeres and in simple tandem repeat sequences. Human CNVs are also unusually enriched in protein-coding genes that have experienced significantly elevated synonymous and nonsynonymous nucleotide substitution rates, as estimated between single human and mouse orthologues. CNV genes encode disproportionately large numbers of secreted, olfactory, and immunity proteins, although they contain fewer genes associated with Mendelian disease than expected. Although mouse CNVs also show significantly elevated synonymous substitution rates, in most other respects they do not differ significantly from the genomic background. Nevertheless, they encode proteins that are depleted in olfactory function, and they exhibit significantly decreased amino acid sequence divergence. Natural selection appears to have acted discriminately among human CNV genes. The significant overabundance within human CNVs of genes associated with olfaction, immunity, protein secretion, and elevated coding-sequence divergence indicates that a subset may have been retained in the human population because of the adaptive benefit of increased gene dosage. By contrast, the functional characteristics of mouse CNVs suggest either that advantageous gene copies have been depleted during recent selective breeding of laboratory mouse strains or that they were preferentially fixed as a consequence of the larger effective population size of wild mice. It thus appears that CNV differences among mouse strains do not provide an appropriate model for large-scale sequence variation in the human population.
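
    The abstract reports overrepresentation of functional categories (for example, olfactory genes) among CNV genes but does not name the statistical test used. One common choice for this kind of question is a one-sided hypergeometric test, sketched below in Python with only the standard library. The function names are hypothetical, and any counts passed to them here are made up rather than taken from the study.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K in the category, n drawn).

    Here the n "drawn" genes are those overlapping CNVs; a small value means
    the category is overrepresented among CNV genes.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def fold_enrichment(k, N, K, n):
    """Observed category count divided by its expectation n * K / N."""
    return k / (n * K / N)
```

    For instance, `hypergeom_sf(60, 20000, 400, 1000)` would give the probability of seeing at least 60 olfactory genes among 1,000 CNV genes drawn from 20,000 genes of which 400 are olfactory (made-up numbers); the expected count in that scenario is 20.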

    Multi-platform discovery of haplotype-resolved structural variation in human genomes

    Incomplete identification of structural variants from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, and strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three human parent-child trios and define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,181 indel variants (<50 bp) and 31,599 structural variants (≥50 bp) per human genome, a sevenfold increase in structural variation compared with previous reports, including those from the 1000 Genomes Project. We also discovered 156 inversions per genome, most of which previously escaped detection, as well as large unbalanced chromosomal rearrangements. We provide near-complete, haplotype-resolved structural variation for three genomes that can now serve as a gold standard for the scientific community, and we make specific recommendations for maximizing structural variation sensitivity in future large-scale genome sequencing studies.
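
    The abstract separates indels (<50 bp) from structural variants (≥50 bp) purely by event size. The Python sketch below applies that cut-off. `event_length` is a hypothetical helper that only makes sense for simple insertions and deletions written as VCF-style REF/ALT alleles; inversions and other balanced events would need their length taken from breakpoints instead.

```python
from collections import Counter

SV_MIN_BP = 50  # cut-off from the abstract: indels < 50 bp, SVs >= 50 bp


def event_length(ref, alt):
    """Length of a simple insertion or deletion from REF/ALT allele strings."""
    return abs(len(alt) - len(ref))


def variant_class(length_bp):
    """Classify a variant by event size using the 50 bp cut-off."""
    if length_bp <= 0:
        raise ValueError("event length must be positive")
    return "indel" if length_bp < SV_MIN_BP else "structural variant"


def tally(lengths):
    """Count indels and structural variants in a list of event lengths."""
    return Counter(variant_class(n) for n in lengths)
```

    Note that the boundary is asymmetric: a 49 bp event is an indel, while a 50 bp event already counts as a structural variant.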

    inGAP-sv: a novel scheme to identify and visualize structural variation from paired end mapping data

    Mining genetic variation from personal genomes is a crucial step towards investigating the relationship between genotype and phenotype. However, compared with detecting SNPs and small indels, characterizing large and particularly complex structural variation is much more difficult and less intuitive. In this article, we present a new scheme (inGAP-sv) to detect and visualize structural variation from paired-end mapping data. Under this scheme, abnormally mapped read pairs are clustered based on the location of a gap signature. Several important features, including local depth of coverage, mapping quality and associated tandem repeats, are used to evaluate the quality of predicted structural variants. Compared with other approaches, inGAP-sv detects many more large insertions and complex variants with a lower false discovery rate. Moreover, inGAP-sv, written in the Java programming language, provides a user-friendly interface and runs on multiple operating systems. It is freely available at http://ingap.sourceforge.net/
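
    The abstract summarizes inGAP-sv's first step as clustering abnormally mapped read pairs by the location of their gap signature. The Python sketch below illustrates that general idea, not inGAP-sv's implementation (which is written in Java and also scores coverage, mapping quality, and tandem repeats): pairs whose insert size falls outside mean ± `n_sd` standard deviations are flagged, and flagged pairs with nearby mate positions are grouped greedily. The `ReadPair` record, the thresholds, and the window size are assumptions, and both mates are assumed to map to the same chromosome in the expected orientation. In this simplified picture, a cluster whose inserts are much longer than expected is consistent with a deletion between the mates.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    left: int   # mapped start of the upstream mate (hypothetical record)
    right: int  # mapped start of the downstream mate


def abnormal_pairs(pairs, mean_insert, sd_insert, n_sd=3.0):
    """Keep pairs whose apparent insert size lies outside mean +/- n_sd * sd."""
    lo = mean_insert - n_sd * sd_insert
    hi = mean_insert + n_sd * sd_insert
    return [p for p in pairs if not lo <= p.right - p.left <= hi]


def cluster_pairs(pairs, window=500):
    """Greedily group pairs whose mates both lie within `window` bp of a
    cluster's first (anchor) pair."""
    clusters = []
    for p in sorted(pairs, key=lambda q: (q.left, q.right)):
        for cluster in clusters:
            anchor = cluster[0]
            if (abs(p.left - anchor.left) <= window
                    and abs(p.right - anchor.right) <= window):
                cluster.append(p)
                break
        else:  # no existing cluster matched: start a new one
            clusters.append([p])
    return clusters
```

    With a library of mean insert 400 bp and SD 30 bp, three pairs spanning about 3.4 kb near the same positions form one cluster, while pairs with ordinary inserts are filtered out beforehand.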