457 research outputs found

    Contributions of Zea mays subspecies mexicana haplotypes to modern maize

    Maize was domesticated from lowland teosinte (Zea mays ssp. parviglumis), but the contribution of highland teosinte (Zea mays ssp. mexicana, hereafter mexicana) to modern maize is not clear. Here, two genomes for Mo17 (a modern maize inbred) and mexicana are assembled using a meta-assembly strategy after sequencing of 10 lines derived from a maize-teosinte cross. Comparative analyses reveal a high level of diversity between Mo17, B73, and mexicana, including three Mb-size structural rearrangements. The maize spontaneous mutation rate is estimated to be 2.17 × 10⁻⁸ to 3.87 × 10⁻⁸ per site per generation with a nonrandom distribution across the genome. A higher deleterious mutation rate is observed in the pericentromeric regions, and might be caused by differences in recombination frequency. Over 10% of the maize genome shows evidence of introgression from the mexicana genome, suggesting that mexicana contributed to maize adaptation and improvement. Our data offer a rich resource for constructing the pan-genome of Zea mays and genetic improvement of modern maize varieties

    Widespread Genomic Signatures of Natural Selection in Hominid Evolution

    Selection acting on genomic functional elements can be detected by its indirect effects on population diversity at linked neutral sites. To illuminate the selective forces that shaped hominid evolution, we analyzed the genomic distributions of human polymorphisms and sequence differences among five primate species relative to the locations of conserved sequence features. Neutral sequence diversity in human and ancestral hominid populations is substantially reduced near such features, resulting in a surprisingly large genome average diversity reduction due to selection of 19–26% on the autosomes and 12–40% on the X chromosome. The overall trends are broadly consistent with “background selection” or hitchhiking in ancestral populations acting to remove deleterious variants. Average selection is much stronger on exonic (both protein-coding and untranslated) conserved features than non-exonic features. Long term selection, rather than complex speciation scenarios, explains the large intragenomic variation in human/chimpanzee divergence. Our analyses reveal a dominant role for selection in shaping genomic diversity and divergence patterns, clarify hominid evolution, and provide a baseline for investigating specific selective events

    A mutation degree model for the identification of transcriptional regulatory elements

    BACKGROUND: Current approaches for identifying transcriptional regulatory elements mainly combine two properties: evolutionary conservation and the overrepresentation of functional elements in the promoters of co-regulated genes. Despite the development of many motif detection algorithms, the discovery of conserved motifs in a wide range of phylogenetically related promoters is still a challenge, especially for short motifs embedded in distantly related gene promoters or very closely related promoters, or when not enough orthologous genes are available. RESULTS: A mutation degree model is proposed and a new word counting method is developed for the identification of transcriptional regulatory elements from a set of co-expressed genes. The new method comprises two parts: 1) identifying overrepresented oligo-nucleotides in promoters of co-expressed genes, 2) estimating the conservation of the oligo-nucleotides in promoters of phylogenetically related genes by the mutation degree model. Compared with other algorithms, our method shows a low false positive rate and higher specificity, and is especially robust to noisy data. When the method was applied to co-expressed gene sets from Arabidopsis, most known cis-elements were successfully detected. The tool and example are available at http://mcube.nju.edu.cn/jwang/lab/soft/ocw/OCW.html. CONCLUSIONS: The mutation degree model proposed in this paper is adapted to phylogenetic data of different qualities, and to a wide range of evolutionary distances. The new word-counting method based on this model performs better in detecting short cis-element sequences from co-expressed genes of eukaryotes and is robust to less complete phylogenetic data.
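To make the first part of the method concrete (identifying overrepresented oligo-nucleotides), here is a minimal word-counting sketch. It is not the authors' implementation. The function names, the pseudocount, and the simple foreground/background frequency ratio are assumptions chosen for illustration, and the mutation degree model from the second part is not reproduced.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every overlapping word of length k across a list of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def overrepresented_kmers(foreground, background, k, pseudocount=1.0):
    """Rank k-mers by foreground/background frequency ratio (highest first).

    Assumption: a smoothed frequency ratio stands in for whatever
    overrepresentation statistic the original tool uses.
    """
    fg = count_kmers(foreground, k)
    bg = count_kmers(background, k)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    if fg_total == 0 or bg_total == 0:
        raise ValueError("sequences must be at least k bases long")
    n_words = 4 ** k  # number of possible DNA words of length k (A, C, G, T)
    scores = {}
    for word, n in fg.items():
        fg_freq = (n + pseudocount) / (fg_total + pseudocount * n_words)
        bg_freq = (bg[word] + pseudocount) / (bg_total + pseudocount * n_words)
        scores[word] = fg_freq / bg_freq
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy promoter sets, invented for illustration only.
    promoters = ["ACGTACGT", "TTACGTAA"]
    controls = ["AAAAAAAA", "TTTTTTTT"]
    for word, score in overrepresented_kmers(promoters, controls, k=4)[:3]:
        print(word, round(score, 3))
```

In practice the background would be a large set of promoters from genes that are not co-expressed, and the top-ranked words would then be passed to a conservation test across related species.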

    Duplication and positive selection among hominin-specific PRAME genes

    BACKGROUND: The physiological and phenotypic differences between human and chimpanzee are largely specified by our genomic differences. We have been particularly interested in recent duplications in the human genome as examples of relatively large-scale changes to our genome. We performed an in-depth evolutionary analysis of a region of chromosome 1, which is copy number polymorphic among humans, and that contains at least 32 PRAME (Preferentially expressed antigen of melanoma) genes and pseudogenes. PRAME-like genes are expressed in the testis and in a large number of tumours, and are thought to possess roles in spermatogenesis and oogenesis. RESULTS: Using nucleotide substitution rate estimates for exons and introns, we show that two large segmental duplications, of six and seven human PRAME genes respectively, occurred in the last 3 million years. These duplicated genes are thus hominin-specific, having arisen in our genome since the divergence from chimpanzee. This cluster of PRAME genes appears to have arisen initially from a translocation approximately 95–85 million years ago. We identified multiple sites within human or mouse PRAME sequences which exhibit strong evidence of positive selection. These form a pronounced cluster on one face of the predicted PRAME protein structure. CONCLUSION: We predict that PRAME genes evolved adaptively due to strong competition between rapidly-dividing cells during spermatogenesis and oogenesis. We suggest that as PRAME gene copy number is polymorphic among individuals, positive selection of PRAME alleles may still prevail within the human population

    Evolutionary distance estimation and fidelity of pair wise sequence alignment

    BACKGROUND: Evolutionary distances are a critical measure in comparative genomics and molecular evolutionary biology. A simulation study was used to examine the effect of alignment accuracy of DNA sequences on evolutionary distance estimation. RESULTS: Under the studied conditions, distance estimation was relatively unaffected by alignment error (50% or more of the sites incorrectly aligned) as long as 50% or more of the sites were identical among the sequences (observed P-distance < 0.5). Beyond this threshold, the alignment procedure artificially inflates the apparent sequence identity, skewing distance estimates, and creating alignments that are essentially indistinguishable from random data. This general result was independent of substitution model, sequence length, and insertion and deletion size and rate. CONCLUSION: Examination of the estimated sequence identity may yield some guidance as to the accuracy of the alignment. Inaccurate alignments are expected to have large effects on analyses dependent on site specificity, but analyses that depend on evolutionary distance may be somewhat robust to alignment error as long as fewer than half of the sites have diverged
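The observed P-distance mentioned in this abstract is the fraction of aligned sites that differ between two sequences. The sketch below computes it and applies the Jukes-Cantor correction as one familiar example of a model-based distance. The abstract does not say which substitution models were simulated, so the choice of Jukes-Cantor and the function names are assumptions made for illustration.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites over aligned columns without gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def jukes_cantor_distance(p):
    """Jukes-Cantor distance d = -(3/4) * ln(1 - 4p/3).

    The correction is undefined once p reaches 0.75, the value expected
    for unrelated sequences; infinity is returned in that case.
    """
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    p = p_distance("ACGTACGT", "ACGTACGA")
    print(p, jukes_cantor_distance(p))
```

The correction grows rapidly as p approaches 0.75, which is one way to see why distance estimates become fragile for highly diverged sequences.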

    Computational pan-genomics: status, promises and challenges

    Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example for a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains

    Algorithms for Analysis of Heterogeneous Cancer and Viral Populations Using High-Throughput Sequencing Data

    Next-generation sequencing (NGS) technologies have made giant leaps in recent years. Short read samples reach millions of reads, and the number of samples has been growing enormously in the wake of the COVID-19 pandemic. This data can expose essential aspects of disease transmission and development and reveal the key to its treatment. At the same time, single-cell sequencing has progressed from dozens to tens of thousands of cells per sample. These technological advances bring new challenges for computational biology and require the development of scalable, robust methods to deal with a wide range of problems varying from epidemiology to cancer studies. The first part of this work is focused on processing virus NGS data. It proposes algorithms that facilitate the initial data analysis steps by filtering genetically related sequencing data, together with a tool for investigating intra-host virus diversity, which is vital for biomedical research and epidemiology. The second part addresses single-cell data in cancer studies. It develops evolutionary cancer models involving new quantitative parameters of cancer subclones to better understand the underlying processes of cancer development