34 research outputs found

    AMY-tree: an algorithm to use whole genome SNP calling for Y chromosomal phylogenetic applications

    BACKGROUND: Due to the rapid progress of next-generation sequencing (NGS) facilities, an explosion of human whole genome data will become available in the coming years. These data can be used to optimize and to increase the resolution of the phylogenetic Y chromosomal tree. Moreover, the exponential growth of known Y chromosomal lineages will require an automatic determination of the phylogenetic position of an individual based on whole genome SNP calling data and an up-to-date Y chromosomal tree. RESULTS: We present an automated approach, ‘AMY-tree’, which is able to determine the phylogenetic position of a Y chromosome using a whole genome SNP profile, independently of the NGS platform and SNP calling program, whereby mistakes in the SNP calling or phylogenetic Y chromosomal tree are taken into account. Moreover, AMY-tree indicates ambiguities within the present phylogenetic tree and points out new Y-SNPs which may be phylogenetically relevant. The AMY-tree software package was validated successfully on 118 whole genome SNP profiles of 109 males with different origins. Moreover, support was found for an unknown recurrent mutation, wrongly reported mutation conversions and a large number of new, interesting Y-SNPs. CONCLUSIONS: AMY-tree is therefore a useful tool to determine the Y lineage of a sample based on SNP calling, to identify Y-SNPs with yet unknown phylogenetic position and to optimize the Y chromosomal phylogenetic tree in the future. AMY-tree will not add lineages to the existing phylogenetic tree of the Y-chromosome, but it is the first step to analyse whole genome SNP profiles in a phylogenetic framework.
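The idea of placing a sample on a known tree from its SNP calls can be illustrated with a minimal sketch. This is not the published AMY-tree algorithm: the function name `assign_lineage`, the toy tree, the SNP labels and the `min_support` threshold are all hypothetical, chosen only to show how a descent through the tree might tolerate a few miscalled or missing SNPs.

```python
def assign_lineage(tree, defining_snps, calls, root="root", min_support=0.5):
    """Walk down a lineage tree while the sample supports one of the children.

    tree:          {node: [child, ...]} (hypothetical toy structure)
    defining_snps: {node: [snp_name, ...]} SNPs that define each branch
    calls:         {snp_name: True/False} True = derived, False = ancestral;
                   SNPs without a call are simply absent
    min_support:   assumed fraction of called SNPs that must be derived
    """
    node = root
    while True:
        best_child, best_support = None, min_support
        for child in tree.get(node, ()):
            states = [calls[s] for s in defining_snps[child] if s in calls]
            if not states:
                continue  # no usable call on this branch, so it cannot be supported
            support = sum(states) / len(states)  # fraction of derived calls
            if support >= best_support:
                best_child, best_support = child, support
        if best_child is None:
            return node  # deepest node the calls still support
        node = best_child


# Toy example with made-up lineage and SNP labels.
tree = {"root": ["X", "Y"], "X": ["X1", "X2"]}
defining_snps = {"X": ["s1", "s2", "s3"], "Y": ["s4"], "X1": ["s5"], "X2": ["s6", "s7"]}
calls = {"s1": True, "s2": True, "s3": False, "s4": False, "s5": False, "s6": True}
lineage = assign_lineage(tree, defining_snps, calls)
```

In this toy case one of the three SNPs defining branch X is called ancestral, yet the sample is still placed below X because two thirds of the called SNPs are derived. That mirrors, in a very simplified way, the abstract's point that calling mistakes have to be taken into account.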

    The Y-Chromosome Tree Bursts into Leaf: 13,000 High-Confidence SNPs Covering the Majority of Known Clades

    Many studies of human populations have used the male-specific region of the Y chromosome (MSY) as a marker, but MSY sequence variants have traditionally been subject to ascertainment bias. Also, dating of haplogroups has relied on Y-specific short tandem repeats (STRs), involving problems of mutation rate choice, and possible long-term mutation saturation. Next-generation sequencing can ascertain single nucleotide polymorphisms (SNPs) in an unbiased way, leading to phylogenies in which branch-lengths are proportional to time, and allowing the times-to-most-recent-common-ancestor (TMRCAs) of nodes to be estimated directly. Here we describe the sequencing of 3.7 Mb of MSY in each of 448 human males at a mean coverage of 51x, yielding 13,261 high-confidence SNPs, 65.9% of which are previously unreported. The resulting phylogeny covers the majority of the known clades, provides date estimates of nodes, and constitutes a robust evolutionary framework for analyzing the history of other classes of mutation. Different clades within the tree show subtle but significant differences in branch lengths to the root. We also apply a set of 23 Y-STRs to the same samples, allowing SNP- and STR-based diversity and TMRCA estimates to be systematically compared. Ongoing purifying selection is suggested by our analysis of the phylogenetic distribution of nonsynonymous variants in 15 MSY single-copy genes.

    Genomic variation in next-generation sequencing data: from human Y chromosomal phylogeny to insect genomics

    In the past decade next-generation sequencing (NGS) technologies have revolutionized the fields of molecular biology, genetics and genomics by enabling cost-effective and quick generation of DNA sequence data with exquisite accuracy and resolution. The technical strategy of the NGS technology is straightforward: the sequencing throughput is boosted by miniaturizing the sequencing chemical reactions such that millions of those reactions can take place. The amount of data produced by the NGS technologies has been truly astonishing: the amount produced doubles about every seven months, and sequencing capacities are expected to continue to grow very rapidly over the next ten years. This allows for answering several new research questions while new kinds of experiments can be performed as the NGS technology is used creatively to sequence the genome, transcriptome and their interactions with the proteome. Investigations that were, for most, unreachable luxuries just a few years ago are now being increasingly enabled, at a rapid pace. Two different applications that use the current advantages of NGS in biology, and which were the main focus of my PhD thesis, are the enlargement of the human Y chromosomal phylogeny and the detection of genomic imprinting in social insects. Publicly available NGS data are used to validate and improve the existing human Y chromosomal phylogeny, which represents the evolutionary relationship among the studied human Y chromosomes. The human genome can be sequenced with NGS in a time- and cost-efficient manner and therefore more and more human genomes will become publicly available in the coming years. By using Y chromosomal NGS data, a more detailed genealogical history of the evolutionary change of the Y chromosome can be reconstructed. We created two software packages, AMY-tree and PENNY, to deal with NGS data to improve the phylogeny.
AMY-tree was created to determine the Y chromosomal lineage of a sample while also detecting Y-SNP recurrent mutations and wrongly reported Y-SNP conversions and reporting Y-SNPs which are not yet reported in scientific publications. As there are still many false positive SNP calls in NGS data, PENNY was developed to deal in silico with the huge number of newly reported Y-SNPs before adding them to the phylogeny. Both programs were validated on a dataset of DNA samples sequenced on different NGS platforms and with different sequencing depths. This resulted in a new Y chromosomal phylogeny; however, the practical use of this phylogeny is becoming increasingly complex, and therefore a minimalized version of the Y phylogeny has also been constructed. The second application developed during my PhD project is the use of NGS in a complex experimental design to find genomic imprinting in the bumblebee Bombus terrestris. Genomic imprinting is the epigenetic phenomenon whereby alleles are expressed based on their inheritance from one specific parent, and since it only affects a small number of genes, it definitely benefits from the use of NGS technologies. NGS makes it possible to determine the genetic variation in the parents first, before the transcriptome of the offspring is sequenced to find the parent-of-origin specific gene expression. Without NGS, finding new imprinted genes is almost infeasible, as only targeted genes can be tested. A large number of samples is required to distinguish parent-of-origin effects from other effects specific to the family, lineage and individual, and the falling cost of NGS technologies makes it possible to study imprinting in a genome-wide manner in this social bumblebee. Our extensive experiment was designed such that parent-of-origin effects on gene expression could be discriminated from other effects, and this resulted in the detection of 93 genes with a parent-of-origin bias, all of which expressed the patrigene more than the matrigene.
The number of imprinted genes found accords with the percentages found in other species; however, no known imprinted genes or functions were found. As the experimental design is complex, it was crucial that all technical requirements regarding the NGS were fulfilled to successfully detect parent-of-origin gene expression. Our results do not support David Haig's kin conflict theory as an adaptive basis of genomic imprinting in bees. Instead, both our results from bumblebees and a recently published study on genomic imprinting in the honeybee show that genomic imprinting is used to tune and reduce the phenotypic variance at specific loci. For decades it was laborious to get an adequate amount of DNA and RNA data, but with the upswing in NGS technologies the challenges have shifted onto the interpretation of the huge amounts of data that can be generated, as shown by the complexity of detecting genomic imprinting in B. terrestris. Furthermore, the efforts that make it possible to answer research questions are also challenges resulting from the increasing use of NGS methods, as shown by updating the human Y chromosomal phylogeny. By updating the phylogeny the discrimination power of samples increases such that other fields using this Y chromosomal phylogeny can answer their research questions more accurately.

    Automating a combined composite-consensus method to generate DNA profiles from low and high template mixture samples

    We present an automated method to generate DNA profiles from replicate PCRs by combining the advantages of the composite and consensus methods through a system of brackets in which an allelic balance threshold is used as a variable to separate the DNA profiles of major donors from those of minor donors. Through the analysis of artificial low (125 pg) and high (250 pg) template three-person mixtures with low (1:1.5:3) and high (1:5:10) donor ratios we demonstrate the usefulness of a tool to determine the optimal allelic balance threshold within a locus. The automated extraction of dominant profiles saves considerable amounts of time when producing composite-consensus profiles. Drop-in/drop-out rates are produced and a comparison is made with an alternative open source script to evaluate the dominant profiles generated. By introducing this script into the forensic community we hope to increase awareness of much needed collaborative efforts with bioinformaticians and statisticians to develop forensic open source software scripts.

    Neutral and adaptive genomic signatures of rapid poleward range expansion

    Many species are expanding their range polewards and this has been associated with rapid phenotypic change. Yet, it is unclear to what extent this reflects rapid genetic adaptation or neutral processes associated with range expansion, or selection linked to the new thermal conditions encountered. To disentangle these alternatives, we studied the genomic signature of range expansion in the damselfly Coenagrion scitulum using 4950 newly developed genomic SNPs and linked this to the rapidly evolved phenotypic differences between core and (newly established) edge populations. Most edge populations were genetically clearly differentiated from the core populations and all were differentiated from each other indicating independent range expansion events. In addition, evidence for genetic drift in the edge populations, and strong evidence for adaptive genetic variation in association with the range expansion was detected. We identified one SNP under consistent selection in four of the five edge populations and showed that the allele increasing in frequency is associated with increased flight performance. This indicates collateral, non-neutral evolutionary changes in independent edge populations driven by the range expansion process. We also detected a genomic signature of adaptation to the newly encountered thermal regimes, reflecting a pattern of countergradient variation. The latter signature was identified at a single SNP as well as in a set of covarying SNPs using a polygenic multilocus approach to detect selection. Overall, this study highlights how a strategic geographic sampling design and the integration of genomic, phenotypic and environmental data can identify and disentangle the neutral and adaptive processes that are simultaneously operating during range expansions.

    Track-a-Forager: a program for the automated analysis of RFID tracking data to reconstruct foraging behaviour

    Behavioural studies increasingly make use of passive radio-frequency identification (RFID) technology to monitor the foraging behaviour and activity patterns of individual animals over extended periods of time. Central place foragers, such as social insects, birds and many rodents, have proved particularly well suited for this technology. As yet, however, there is no standardized methodology to filter and postprocess the data resulting from RFID scanners. Here we present a new user-friendly, publicly available Java program named “Track a Forager” to analyse and rigorously filter RFID animal tracking data. The program is particularly suited to, and has special features for, the analysis of social insect behaviour, but it is generic enough to analyse data obtained from any species. The implemented filtering algorithm consists of several well-defined steps to cluster multiple temporally clustered RFID scans of the same individual, determine events of leaving and entering the nest and/or feeder and reconstruct foraging trips for each individual. Track-a-Forager analyses RFID data independently of the scanner system used, for eight different types of standard experimental setups that are common in foraging behaviour research. These setups differ with respect to whether or not foraging at an artificial feeder is monitored and the specific placement of the RFID scanners at the nest or feeder. As a real-life example, we show how Track-a-Forager enables one to reconstruct 75% more foraging trips than would be obtained from the raw data.
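The first filtering step named in the abstract, merging temporally clustered scans of the same individual, can be sketched as follows. This is an illustration only and not Track-a-Forager's actual Java implementation: the function name `cluster_scans`, the input layout and the `max_gap` threshold of 5 seconds are assumptions made for the example.

```python
from collections import defaultdict


def cluster_scans(scans, max_gap=5.0):
    """Merge scans of the same tag that lie at most max_gap seconds apart.

    scans:   iterable of (tag_id, timestamp_in_seconds) pairs, in any order
    max_gap: assumed maximum gap between scans of one passage (hypothetical)
    Returns {tag_id: [(first_time, last_time, n_scans), ...]} sorted by time.
    """
    by_tag = defaultdict(list)
    for tag, timestamp in scans:
        by_tag[tag].append(timestamp)

    clusters = {}
    for tag, times in by_tag.items():
        times.sort()
        bouts = []
        start = prev = times[0]
        count = 1
        for t in times[1:]:
            if t - prev <= max_gap:
                # Still the same passage past the scanner: extend the cluster.
                prev = t
                count += 1
            else:
                # Gap too large: close the current cluster and open a new one.
                bouts.append((start, prev, count))
                start = prev = t
                count = 1
        bouts.append((start, prev, count))
        clusters[tag] = bouts
    return clusters


# Made-up raw scans: tag "A" passes the scanner twice, tag "B" once.
raw = [("A", 0.0), ("A", 2.0), ("A", 60.0), ("B", 1.0), ("A", 3.5)]
events = cluster_scans(raw)
```

Each resulting cluster stands for one passage past a scanner. Deciding whether a passage is a departure or a return, and pairing passages into trips, depends on the scanner placement and is left out of this sketch.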

    Microarray analysis of copy number variants on the human Y chromosome reveals novel and frequent duplications overrepresented in specific haplogroups

    Background: The human Y chromosome is almost always excluded from genome-wide investigations of copy number variants (CNVs) due to its highly repetitive structure. This chromosome should not be forgotten, not only for its well-known relevance in male fertility, but also for its involvement in clinical phenotypes such as cancers, heart failure and sex specific effects on brain and behaviour. Results: We analysed Y chromosome data from Affymetrix 6.0 SNP arrays and found that the signal intensities for most of 8179 SNP/CN probes in the male specific region (MSY) discriminated between a male, background signals in a female and an isodicentric male containing a large deletion of the q-arm and a duplication of the p-arm of the Y chromosome. Therefore, this SNP/CN platform is suitable for identification of gain and loss of Y chromosome sequences. In a set of 1718 males, we found 25 different CNV patterns, many of which are novel. We confirmed some of these variants by PCR or qPCR. The total frequency of individuals with CNVs was 14.7%, including 9.5% with duplications, 4.5% with deletions and 0.7% exhibiting both. Hence, a novel observation is that the frequency of duplications was more than twice the frequency of deletions. Another striking result was that 10 of the 25 detected variants were significantly overrepresented in one or more haplogroups, demonstrating the importance of controlling for haplogroups in genome-wide investigations to avoid stratification. NO-M214(xM175) individuals presented the highest percentage (95%) of CNVs. If they were not counted, 12.4% of the rest included CNVs, and the difference between duplications (8.9%) and deletions (2.8%) was even larger. Conclusions: Our results demonstrate that currently available genome-wide SNP platforms can be used to identify duplications and deletions in the human Y chromosome.
Future association studies of the full spectrum of Y chromosome variants will demonstrate the potential involvement of gain or loss of Y chromosome sequence in different human phenotypes.
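The frequencies quoted in this abstract can be checked with simple arithmetic. The sketch below assumes that the three reported categories (duplications, deletions, both) are mutually exclusive, which is how they add up to the stated total; the ratios are rounded to one decimal.

```latex
\begin{align*}
9.5\% + 4.5\% + 0.7\% &= 14.7\% && \text{(total frequency of males with CNVs)}\\
\frac{9.5\%}{4.5\%} &\approx 2.1 && \text{(duplications vs. deletions, all 1718 males)}\\
\frac{8.9\%}{2.8\%} &\approx 3.2 && \text{(same ratio with NO-M214(xM175) excluded)}
\end{align*}
```

The second line matches the statement that duplications were more than twice as frequent as deletions, and the third shows why the difference is described as even larger once NO-M214(xM175) individuals are left out.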

    Data from: Neutral and adaptive genomic signatures of rapid poleward range expansion

    Many species are expanding their range polewards and this has been associated with rapid phenotypic change. Yet, it is unclear to what extent this reflects rapid genetic adaptation or neutral processes associated with range expansion, or selection linked to the new thermal conditions encountered. To disentangle these alternatives, we studied the genomic signature of range expansion in the damselfly Coenagrion scitulum using 4950 newly developed genomic SNPs and linked this to the rapidly evolved phenotypic differences between core and (newly established) edge populations. Most edge populations were genetically clearly differentiated from the core populations and all were differentiated from each other indicating independent range expansion events. In addition, evidence for genetic drift in the edge populations, and strong evidence for adaptive genetic variation in association with the range expansion was detected. We identified one SNP under consistent selection in four of the five edge populations and showed that the allele increasing in frequency is associated with increased flight performance. This indicates collateral, non-neutral evolutionary changes in independent edge populations driven by the range expansion process. We also detected a genomic signature of adaptation to the newly encountered thermal regimes, reflecting a pattern of countergradient variation. The latter signature was identified at a single SNP as well as in a set of covarying SNPs using a polygenic multilocus approach to detect selection. Overall, this study highlights how a strategic geographic sampling design and the integration of genomic, phenotypic and environmental data can identify and disentangle the neutral and adaptive processes that are simultaneously operating during range expansions.