71 research outputs found

    SONiCS: PCR stutter noise correction in genome-scale microsatellites

    Motivation: Massively parallel capture of short tandem repeats (STRs, or microsatellites) provides a strategy for high-resolution population genomic and demographic analyses, with or without a reference genome. However, the high number of polymerase chain reaction (PCR) cycles needed for target capture experiments creates genotyping noise through polymerase slippage, known as PCR stutter. Results: We developed SONiCS (Stutter mONte Carlo Simulation), a solution for stutter correction based on dense forward simulations of PCR and capture experimental conditions. To test SONiCS, we genotyped a 2499-marker STR panel in 22 humpback dolphins (Sousa sahulensis) using target capture, and generated capillary-based genotypes to validate five of these markers. In these 110 comparisons (22 individuals × 5 markers), SONiCS showed a 99.1% accuracy rate and a 98.2% genotyping success rate, miscalling a single allele in a marker with low sequence coverage and rejecting another as uncallable. Availability and implementation: Source code and documentation for SONiCS are freely available at https://github.com/kzkedzierska/sonics. Raw read data used in the experimental validation of SONiCS have been deposited in the Sequence Read Archive under accession number SRP135756.
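
    The forward-simulation idea behind stutter correction can be illustrated with a toy Monte Carlo model. This is a minimal sketch under assumed parameters, not the SONiCS model or code. The function names, the per-copy slippage probabilities `p_down` and `p_up`, the amplification `efficiency`, and the pool cap are all hypothetical. For each candidate genotype, the sketch simulates PCR cycles and converts the resulting molecule pool into expected repeat-length frequencies. It then keeps the candidate whose simulated profile gives the observed read counts the highest multinomial likelihood.

```python
import math
import random
from collections import Counter

# Hypothetical toy model of PCR stutter; not the SONiCS implementation.

def simulate_pcr_stutter(true_alleles, n_cycles=20, p_down=0.02, p_up=0.005,
                         efficiency=0.9, n_molecules=10, max_pool=20000, seed=0):
    """Forward-simulate PCR at one STR locus; return repeat-length frequencies.

    true_alleles: template repeat counts, e.g. (12, 15) for a heterozygote.
    Each cycle, every molecule is copied with probability `efficiency`; a new
    copy loses one repeat unit with probability `p_down` or gains one with
    probability `p_up` (placeholder rates, not calibrated values).
    """
    rng = random.Random(seed)
    pool = [allele for allele in true_alleles for _ in range(n_molecules)]
    for _ in range(n_cycles):
        copies = []
        for length in pool:
            if rng.random() < efficiency:
                r = rng.random()
                if r < p_down:
                    copies.append(length - 1)      # slippage: one repeat lost
                elif r < p_down + p_up:
                    copies.append(length + 1)      # slippage: one repeat gained
                else:
                    copies.append(length)          # faithful copy
        pool.extend(copies)
        if len(pool) > max_pool:
            # Random downsampling keeps the simulated pool size bounded.
            pool = rng.sample(pool, max_pool)
    counts = Counter(pool)
    total = len(pool)
    return {length: counts[length] / total for length in sorted(counts)}


def best_genotype(observed_counts, candidates, floor=1e-6, **sim_kwargs):
    """Pick the candidate genotype whose simulated profile best explains the reads.

    observed_counts: dict mapping repeat length -> number of reads.
    Scores each candidate by multinomial log-likelihood (up to a constant),
    flooring lengths that the simulation never produced.
    """
    def log_likelihood(genotype):
        freqs = simulate_pcr_stutter(genotype, **sim_kwargs)
        return sum(n * math.log(freqs.get(length, floor))
                   for length, n in observed_counts.items())
    return max(candidates, key=log_likelihood)
```

    For example, given read counts {11: 10, 12: 80, 14: 9, 15: 75}, the sketch prefers the heterozygote (12, 15) over (12, 12), (11, 15) and (15, 15). The minor peaks at 11 and 14 are explained as one-repeat stutter products. In practice the stutter rates would need calibration for a given polymerase, motif and cycle number. Here they are fixed placeholders.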

    Supporting California condor conservation management through analysis of species-wide whole genome sequence variation

    The critically endangered California condor (Gymnogyps californianus) has been the focus of intensive conservation efforts for several decades. After the species was reduced to twenty-three birds in 1985, the entire surviving population was brought under captive management for recovery. Founded by fourteen individuals, the surviving California condor gene pool has been managed through captive breeding of individuals paired through pedigree analysis. As of August 2013, there were 424 California condors, 223 of which were flying in the wild in four reintroduced populations in California, Arizona, and Baja California, Mexico. Every condor is sexed by amplification of sex-chromosome-specific markers, and DNA samples are stored for every individual of the species. Microsatellite genotyping has confirmed parentage in captive and wild condor chicks, corrected switched identities, and identified successful extra-pair copulation in the wild population. Whole genome sequencing, using data generated on multiple platforms, has produced a de novo genome assembly for a founder male condor and sequence data for thirty additional condors that together encompass the entire genetic variation of the species. This is perhaps the first time such a comprehensive effort has been conducted for any species. Studbook-based kinship relationships between founder birds can be compared with kinship estimates from genome-wide genetic variation. Both can then be evaluated in the context of retention of genetic diversity across generations of California condors. Genomic studies of California condors are providing a model system for avian conservation genomics and allow empirical evaluation of basic facets of transmission genetics, including segregation, linkage, recombination, and mutation.
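
    Studbook-based kinship is conventionally computed with the recursive kinship-coefficient rules. The sketch below is a generic textbook formulation, not the software used in the condor program. It assumes a hypothetical pedigree format of (id, sire, dam) tuples listed parents-before-offspring, with unrelated, non-inbred founders. The kinship coefficient φ(a, b) is the probability that alleles drawn at random from a and b are identical by descent. An individual's inbreeding coefficient equals the kinship of its parents.

```python
from functools import lru_cache

def make_kinship(pedigree):
    """Build a kinship function phi(a, b) from a studbook-style pedigree.

    pedigree: list of (id, sire, dam) tuples (hypothetical format) in which
    parents always appear before their offspring; founders have
    sire = dam = None and are assumed unrelated and non-inbred.
    """
    order = {ind: i for i, (ind, _, _) in enumerate(pedigree)}
    parents = {ind: (sire, dam) for ind, sire, dam in pedigree}

    @lru_cache(maxsize=None)
    def phi(a, b):
        if a is None or b is None:   # unknown parent contributes no kinship
            return 0.0
        if a == b:
            # Self-kinship: 1/2 * (1 + inbreeding coefficient of the individual).
            sire, dam = parents[a]
            return 0.5 * (1.0 + phi(sire, dam))
        # Recurse through the younger individual, who cannot be an ancestor of the other.
        if order[a] < order[b]:
            a, b = b, a
        sire, dam = parents[a]
        return 0.5 * (phi(sire, b) + phi(dam, b))

    return phi
```

    Consider two unrelated founders F1 and F2 with full-sib offspring A and B. Here φ(A, B) = 0.25. An offspring D of A × B has self-kinship 0.5 × (1 + 0.25) = 0.625, which corresponds to an inbreeding coefficient of 0.25. Genomic kinship estimates could then be set against these pedigree values.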

    Analysis of complete mitochondrial genomes from extinct and extant rhinoceroses reveals lack of phylogenetic resolution

    Background: The scientific literature contains many examples where DNA sequence analyses have been used to provide definitive answers to phylogenetic problems that traditional (non-DNA-based) approaches alone have failed to resolve. One notable example concerns the rhinoceroses, a group for which several contradictory phylogenies were proposed on the basis of morphology and then apparently resolved using mitochondrial DNA fragments. Results: In this study we report the first complete mitochondrial genome sequences of the extinct ice-age woolly rhinoceros (Coelodonta antiquitatis) and the threatened Javan (Rhinoceros sondaicus), Sumatran (Dicerorhinus sumatrensis), and black (Diceros bicornis) rhinoceroses. In combination with the previously published mitochondrial genomes of the white (Ceratotherium simum) and Indian (Rhinoceros unicornis) rhinoceroses, this data set putatively enables reconstruction of the rhinoceros phylogeny. The six species cluster into three strongly supported sister pairings: (i) black/white, (ii) woolly/Sumatran, and (iii) Javan/Indian. However, resolution of the higher-level relationships has no statistical support. The phylogenetic signal from individual genes is highly diffuse, with mixed topological support from different genes. Furthermore, the choice of outgroup (horse vs. tapir) has a considerable effect on reconstruction of the phylogeny. The lack of resolution suggests a hard polytomy at the base of crown-group Rhinocerotidae, and an investigation of the relative branch lengths supports this. Conclusion: Satisfactory resolution of the rhinoceros phylogeny may not be achievable without additional analyses of substantial amounts of nuclear DNA. This study provides a compelling demonstration that, in spite of substantial sequence length, single-locus phylogenetics has significant limitations. We expect further examples of this to appear as next-generation, large-scale sequencing of complete mitochondrial genomes becomes commonplace in evolutionary studies. "The human factor in classification is nowhere more evident than in dealing with this superfamily (Rhinocerotoidea)." (G. G. Simpson, 1945)

    Telomere length is not a main factor for the development of islet autoimmunity and type 1 diabetes in the TEDDY study

    Funder: Lund University. The Environmental Determinants of Diabetes in the Young (TEDDY) study enrolled 8676 children, 3-4 months of age, born with HLA-susceptibility genotypes for islet autoimmunity (IA) and type 1 diabetes (T1D). Whole-genome sequencing (WGS) was performed in 1119 children in a nested case-control study design. Telomere length was estimated from WGS data using five tools: Computel, Telseq, Telomerecat, qMotif and Motif_counter. The estimated median telomere length was 5.10 kb (IQR 4.52-5.68 kb) using Computel. Age at blood sampling was significantly negatively correlated with telomere length (P = 0.003). European children, particularly those from Finland (P = 0.041) and Sweden (P = 0.001), had shorter telomeres than children from the U.S.A. Paternal age was positively associated with telomere length (P = 0.019). First-degree relative status, gestational diabetes in the mother, and maternal age did not have a significant impact on estimated telomere length. Children with HLA-DR4/4 or HLA-DR4/X had significantly longer telomeres than children with HLA-DR3/3 or HLA-DR3/9 haplogenotypes (P = 0.008). Estimated telomere length did not differ significantly with respect to any IA (P = 0.377), IAA-first (P = 0.248), GADA-first (P = 0.248), or T1D (P = 0.861). These results suggest that telomere length has no major impact on the risk of IA, the first step in the development of T1D. Nevertheless, telomere length was shorter in Finland and Sweden, the populations with high T1D prevalence.
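
    Motif-based telomere length estimation from WGS reads can be illustrated in a deliberately simplified form. This sketch does not reproduce Computel, Telseq, Telomerecat, qMotif or Motif_counter, which use their own, more elaborate models and normalizations. The function names, the repeat threshold `k`, and the first-order formula are assumptions for illustration. The sketch counts a read as telomeric if it carries at least k tandem TTAGGG repeats, or the reverse-complement CCCTAA. It then divides the total telomeric bases by the mean genome coverage and by the number of telomere ends. A diploid human cell has 92 telomere ends: 46 chromosomes with two ends each.

```python
TELOMERE_MOTIF = "TTAGGG"
TELOMERE_MOTIF_RC = "CCCTAA"   # reverse complement, for reads from the other strand

def count_telomeric_reads(reads, k=4):
    """Count reads containing at least k tandem telomere repeats on either strand."""
    fwd = TELOMERE_MOTIF * k
    rev = TELOMERE_MOTIF_RC * k
    count = 0
    for read in reads:
        seq = read.upper()
        if fwd in seq or rev in seq:
            count += 1
    return count

def estimate_mean_telomere_length(n_telomeric_reads, read_length, mean_coverage,
                                  n_telomere_ends=92):
    """Hypothetical first-order estimate, in bp per telomere end.

    Assumes every telomeric read is entirely telomeric and that telomeres are
    sequenced at the genome-wide mean coverage:
        length = telomeric bases / (coverage * number of telomere ends)
    """
    telomeric_bases = n_telomeric_reads * read_length
    return telomeric_bases / (mean_coverage * n_telomere_ends)
```

    For instance, 9,200 telomeric 150-bp reads at 30× mean coverage give 9200 × 150 / (30 × 92) = 500 bp per telomere end. This toy figure is not comparable to the tool-based estimates reported above.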

    SVXplorer: Three-tier approach to identification of structural variants via sequential recombination of discordant cluster signatures

    The identification of structural variants using short-read data remains challenging. Most approaches that use discordant paired-end sequences ignore the non-trivial signatures presented by variants containing three breakpoints, such as those generated by various copy-paste and cut-paste mechanisms. This can lower precision and sensitivity in the identification of more common structural variants such as deletions and duplications. We present SVXplorer, which uses a graph-based clustering approach streamlined by the integration of non-trivial signatures from discordant paired-end alignments, split reads, and read-depth information to improve upon existing methods. We show that SVXplorer is more sensitive and precise than several existing approaches on multiple real and simulated datasets. SVXplorer is available for download at https://github.com/kunalkathuria/SVXplorer.
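
    Clustering discordant paired-end alignments can be sketched as single-linkage grouping on a compatibility graph. This is a minimal sketch under stated assumptions, not SVXplorer's algorithm, which additionally integrates split reads, read depth and three-breakpoint signatures. Two discordant pairs are joined by an edge when three conditions hold. They map to the same chromosome pair, they share an orientation, and both of their ends lie within a hypothetical `window` of each other. Clusters are the connected components, found with union-find. The `DiscordantPair` type and the orientation labels are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class DiscordantPair:
    """Hypothetical record for one discordant read pair."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int
    orientation: str  # assumed labels, e.g. "+-" deletion-like, "-+" duplication-like

def cluster_discordant_pairs(pairs, window=500):
    """Group discordant pairs into connected components of a compatibility graph."""
    parent = list(range(len(pairs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    # Add an edge between every compatible pair (quadratic, for clarity only).
    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            a, b = pairs[i], pairs[j]
            same_signature = (a.chrom1, a.chrom2, a.orientation) == (b.chrom1, b.chrom2, b.orientation)
            if same_signature and abs(a.pos1 - b.pos1) <= window and abs(a.pos2 - b.pos2) <= window:
                union(i, j)

    clusters = defaultdict(list)
    for i, pair in enumerate(pairs):
        clusters[find(i)].append(pair)
    # Largest clusters (most supporting pairs) first.
    return sorted(clusters.values(), key=len, reverse=True)
```

    Because linkage is single, a pair can join a cluster through an intermediate pair even when it lies farther than `window` from some members. The all-pairs loop is quadratic, so a practical tool would sort alignments by position and compare only nearby pairs.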

    Comparison of sequencing platforms for single nucleotide variant calls in a human sample

    Next-generation sequencing platforms, coupled with advanced bioinformatic tools, enable re-sequencing of the human genome at high speed and with large cost savings. We compare sequencing platforms from Roche/454 (GS FLX), Illumina/HiSeq (HiSeq 2000), and Life Technologies/SOLiD (SOLiD 3 ECC) for their ability to identify single nucleotide substitutions in whole genome sequences from the same human sample. We report significant GC-related bias in the data sequenced on the Illumina and SOLiD platforms. Differences in the variant calls were investigated with regard to coverage and sequencing error. Some of the variants called by only one or two of the platforms were tested experimentally using mass spectrometry, a method that is independent of DNA sequencing. We identify several platform-specific reasons why variants went unreported. We also report the indels called using the three sequencing technologies. From the obtained results we conclude that sequencing human genomes with more than one platform and multiple libraries is beneficial when a high level of accuracy is required.
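
    A cross-platform comparison of this kind starts by partitioning variant calls according to which platforms reported them. The sketch below assumes calls are already normalized to hypothetical (chrom, pos, ref, alt) tuples and only does the set bookkeeping, which yields the regions of a three-way Venn diagram. It does not model coverage, GC bias or mass-spectrometry validation.

```python
def platform_concordance(calls):
    """Partition variant calls by the exact set of platforms that reported them.

    calls: dict mapping platform name -> set of (chrom, pos, ref, alt) tuples
    (hypothetical normalized representation).
    Returns a dict mapping frozenset(platforms) -> set of variants called by
    exactly those platforms.
    """
    all_variants = set().union(*calls.values())
    regions = {}
    for variant in all_variants:
        key = frozenset(p for p, variants in calls.items() if variant in variants)
        regions.setdefault(key, set()).add(variant)
    return regions

def counts_by_support(regions):
    """Number of variants called by exactly 1, 2, ... platforms."""
    summary = {}
    for platforms, variants in regions.items():
        summary[len(platforms)] = summary.get(len(platforms), 0) + len(variants)
    return dict(sorted(summary.items()))
```

    Variants in the single-platform regions are the natural candidates for orthogonal validation, such as the mass-spectrometry testing described above.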