Compressed Genotyping
Significant volumes of knowledge have been accumulated in recent years
linking subtle genetic variations to a wide variety of medical disorders from
Cystic Fibrosis to mental retardation. Nevertheless, there are still great
challenges in applying this knowledge routinely in the clinic, largely due to
the relatively tedious and expensive process of DNA sequencing. Since the
genetic polymorphisms that underlie these disorders are relatively rare in the
human population, the presence or absence of a disease-linked polymorphism can
be thought of as a sparse signal. Using methods and ideas from compressed
sensing and group testing, we have developed a cost-effective genotyping
protocol. In particular, we have adapted our scheme to a recently developed
class of high throughput DNA sequencing technologies, and assembled a
mathematical framework that has some important distinctions from 'traditional'
compressed sensing ideas in order to address different biological and technical
constraints. Comment: Submitted to IEEE Transactions on Information Theory - Special Issue
on Molecular Biology and Neuroscience
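The pooling idea behind this abstract can be illustrated with a minimal group-testing sketch. It is not the protocol from the paper: the random pool design, the pool sizes, and the decoder (the standard COMP rule, which clears every sample that appears in a negative pool) are assumptions chosen for brevity, and all function names are hypothetical.

```python
import random


def make_pools(n_samples, n_pools, pool_size, seed=0):
    """Assumed design: each pool is a random subset of the samples."""
    rng = random.Random(seed)
    return [set(rng.sample(range(n_samples), pool_size)) for _ in range(n_pools)]


def run_pools(pools, carriers):
    """Idealized, noise-free assay: a pool is positive if it holds a carrier."""
    return [bool(pool & carriers) for pool in pools]


def comp_decode(pools, results, n_samples):
    """COMP rule: every sample seen in a negative pool is cleared.

    Whatever is left is a candidate carrier (possibly a false positive).
    """
    candidates = set(range(n_samples))
    for pool, positive in zip(pools, results):
        if not positive:
            candidates -= pool
    return candidates


n_samples = 40
carriers = {3, 27}  # sparse signal: 2 carriers among 40 samples (toy values)
pools = make_pools(n_samples, n_pools=15, pool_size=8)
results = run_pools(pools, carriers)
candidates = comp_decode(pools, results, n_samples)
print(f"{len(pools)} pooled assays instead of {n_samples}; candidates: {sorted(candidates)}")
```

Because a negative pool clears all of its members, this decoder never drops a true carrier in the noise-free setting, but it can leave non-carriers among the candidates, which would then need follow-up testing.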
High-Throughput SNP Genotyping by SBE/SBH
Despite much progress over the past decade, current Single Nucleotide
Polymorphism (SNP) genotyping technologies still offer an insufficient degree
of multiplexing when required to handle user-selected sets of SNPs. In this
paper we propose a new genotyping assay architecture combining multiplexed
solution-phase single-base extension (SBE) reactions with sequencing by
hybridization (SBH) using universal DNA arrays such as all k-mer arrays. In
addition to PCR amplification of genomic DNA, SNP genotyping using SBE/SBH
assays involves the following steps: (1) Synthesizing primers complementing the
genomic sequence immediately preceding SNPs of interest; (2) Hybridizing these
primers with the genomic DNA; (3) Extending each primer by a single base using
polymerase enzyme and dideoxynucleotides labeled with 4 different fluorescent
dyes; and finally (4) Hybridizing extended primers to a universal DNA array and
determining the identity of the bases that extend each primer by hybridization
pattern analysis. Our contributions include a study of multiplexing algorithms
for SBE/SBH genotyping assays and preliminary experimental results showing the
achievable tradeoffs between the number of array probes and primer length on
one hand and the number of SNPs that can be assayed simultaneously on the
other. Simulation results on datasets both randomly generated and extracted
from the NCBI dbSNP database suggest that the SBE/SBH architecture provides a
flexible and cost-effective alternative to genotyping assays currently used in
the industry, enabling genotyping of up to hundreds of thousands of
user-specified SNPs per assay. Comment: 19 pages
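The tradeoff between probe length and the number of SNPs that can be multiplexed can be made concrete with a toy decodability check. This is only a sketch under strong assumptions, not the multiplexing algorithm studied in the paper: a primer is treated as decodable if at least one of its k-mers occurs in no other primer of the assay, and reverse complements, the extension base, and hybridization errors are all ignored. The primer sequences and function names are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_probes(primers, k):
    """For each primer, the k-mers that no other primer in the set contains."""
    counts = Counter(km for p in primers for km in kmers(p, k))
    return {p: sorted(km for km in kmers(p, k) if counts[km] == 1) for p in primers}


def decodable(primers, k):
    """Assumed criterion: every primer has at least one private k-mer probe."""
    probes = unique_probes(primers, k)
    return all(probes[p] for p in primers)


primers = ["ACGTAC", "CGTACG", "TTGACA"]  # hypothetical toy primers
for k in (4, 3):
    print(k, decodable(primers, k), unique_probes(primers, k))
```

With these toy primers, 4-mer probes separate all three primers, while 3-mer probes cannot tell the first two apart because they share exactly the same 3-mers.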
Forensic SNP genotyping using nanopore MinION sequencing
One of the latest developments in next-generation sequencing is the Oxford Nanopore Technologies' (ONT) MinION nanopore sequencer. We studied the applicability of this system to perform forensic genotyping of the forensic female DNA standard 9947 A using the 52 SNP-plex assay developed by the SNPforID consortium. All but one of the loci were correctly genotyped. Several SNP loci were identified as problematic for correct and robust genotyping using nanopore sequencing. All these loci contained homopolymers in the sequence flanking the forensic SNP and most of them were already reported as problematic in studies using other sequencing technologies. When these problematic loci are avoided, correct forensic genotyping using nanopore sequencing is technically feasible.
Benchmarking database systems for Genomic Selection implementation
Motivation: With high-throughput genotyping systems now available, it has become feasible to fully integrate genotyping information into breeding programs. Using this information effectively requires DNA extraction and marker production facilities that can efficiently deploy the desired set of markers across samples, with a turnaround time rapid enough to allow selection before crosses need to be made. In practice, breeders often have only a short window in which to make decisions once they have collected all their phenotyping data and received the corresponding genotyping data. This makes it a challenge to organize the information and use it in downstream analyses that support breeders' decisions. Implementing genomic selection routinely as part of breeding programs therefore requires an efficient genotyping data storage system. We selected and benchmarked six popular open-source data storage systems, including relational database management and columnar storage systems.
Results: We found that data extract times are greatly influenced by the orientation in which genotype data is stored in a system. HDF5 consistently performed best, in part because it can more efficiently work with both orientations of the allele matrix.
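Why storage orientation matters for extraction can be seen in a small pure-Python sketch. It does not reproduce any of the benchmarked systems: the flat-list layouts, the toy allele matrix, and the function names are assumptions for illustration. Pulling one marker across all samples is a strided read when the matrix is stored sample by sample, and a single contiguous slice when it is stored marker by marker.

```python
def flatten_sample_major(matrix):
    """Store the matrix row by row: all markers of sample 0, then sample 1, ..."""
    return [g for row in matrix for g in row]


def flatten_marker_major(matrix):
    """Store the matrix column by column: all samples of marker 0, then marker 1, ..."""
    n_markers = len(matrix[0])
    return [row[j] for j in range(n_markers) for row in matrix]


def marker_from_sample_major(flat, n_markers, j):
    """Strided access: one element out of every n_markers."""
    return flat[j::n_markers]


def marker_from_marker_major(flat, n_samples, j):
    """Contiguous access: one block of n_samples elements."""
    return flat[j * n_samples:(j + 1) * n_samples]


# Toy allele matrix: 4 samples (rows) x 3 markers (columns), coded 0/1/2.
matrix = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 0],
    [0, 2, 1],
]
n_samples, n_markers = len(matrix), len(matrix[0])
by_sample = flatten_sample_major(matrix)
by_marker = flatten_marker_major(matrix)

strided = marker_from_sample_major(by_sample, n_markers, 1)
contiguous = marker_from_marker_major(by_marker, n_samples, 1)
print(strided, contiguous)
```

Both calls return the same genotypes; the difference is only in how far apart the elements sit in storage, which is the kind of effect an on-disk benchmark would be expected to expose at scale.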
GBS-SNP-CROP: a reference-optional pipeline for SNP discovery and plant germplasm characterization using variable length, paired-end genotyping-by-sequencing data
Background: With its simple library preparation and robust approach to genome reduction, genotyping-by-sequencing (GBS) is a flexible and cost-effective strategy for SNP discovery and genotyping, provided an appropriate reference genome is available. For resource-limited curation, research, and breeding programs of underutilized plant genetic resources, however, even low-depth references may not be within reach, despite declining sequencing costs. Such programs would find value in an open-source bioinformatics pipeline that can maximize GBS data usage and perform high-density SNP genotyping in the absence of a reference.
Results: The GBS SNP-Calling Reference Optional Pipeline (GBS-SNP-CROP) developed and presented here adopts a clustering strategy to build a population-tailored “Mock Reference” from the same GBS data used for downstream SNP calling and genotyping. Designed for libraries of paired-end (PE) reads, GBS-SNP-CROP maximizes data usage by eliminating unnecessary data culling due to imposed read-length uniformity requirements. Using 150 bp PE reads from a GBS library of 48 accessions of tetraploid kiwiberry (Actinidia arguta), GBS-SNP-CROP yielded on average three times as many SNPs as TASSEL-GBS analyses (32 and 64 bp tag lengths) and over 18 times as many as TASSEL-UNEAK, with fewer genotyping errors in all cases, as evidenced by comparing the genotypic characterizations of biological replicates. Using the published reference genome of a related diploid species (A. chinensis), the reference-based version of GBS-SNP-CROP behaved similarly to TASSEL-GBS in terms of the number of SNPs called but had an improved read depth distribution and fewer genotyping errors. Our results also indicate that the sets of SNPs detected by the different pipelines above are largely orthogonal to one another; thus GBS-SNP-CROP may be used to augment the results of alternative analyses, whether or not a reference is available.
Conclusions: By achieving high-density SNP genotyping in populations for which no reference genome is available, GBS-SNP-CROP is worth consideration by curators, researchers, and breeders of under-researched plant genetic resources. In cases where a reference is available, especially if from a related species or when the target population is particularly diverse, GBS-SNP-CROP may complement other reference-based pipelines by extracting more information per sequencing dollar spent. The current version of GBS-SNP-CROP is available at https://github.com/halelab/GBS-SNP-CROP.gi
A universal method for automated gene mapping
Small insertions or deletions (InDels) constitute a ubiquitous class of sequence polymorphisms found in eukaryotic genomes. Here, we present an automated high-throughput genotyping method that relies on the detection of fragment-length polymorphisms (FLPs) caused by InDels. The protocol utilizes standard sequencers and genotyping software. We have established genome-wide FLP maps for both Caenorhabditis elegans and Drosophila melanogaster that facilitate genetic mapping with a minimum of manual input and at comparatively low cost.
Automated SNP genotype clustering algorithm to improve data completeness in high-throughput SNP genotyping datasets from custom arrays
High-throughput SNP genotyping platforms use automated genotype calling algorithms to assign genotypes. While these algorithms work efficiently for individual platforms, they are not compatible with other platforms, and have individual biases that result in missed genotype calls. Here we present data on the use of a second complementary SNP genotype clustering algorithm. The algorithm was originally designed for individual fluorescent SNP genotyping assays, and has been optimized to permit the clustering of large datasets generated from custom-designed Affymetrix SNP panels. In an analysis of data from a 3K array genotyped on 1,560 samples, the additional analysis increased the overall number of genotypes by over 45,000, significantly improving the completeness of the experimental data. This analysis suggests that the use of multiple genotype calling algorithms may be advisable in high-throughput SNP genotyping experiments. The software is written in Perl and is available from the corresponding author.
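How a second calling algorithm can raise completeness is easy to sketch, although the abstract does not describe the authors' clustering method and nothing here reproduces their Perl software. The merge policy (keep the primary call, fill gaps from the secondary caller, flag disagreements for review), the sample and marker identifiers, and the function names are all assumptions.

```python
def merge_calls(primary, secondary):
    """Fill missing primary calls from a secondary caller.

    Assumed policy: the primary call wins when both callers made a call;
    disagreements are kept but reported so they can be reviewed.
    """
    merged, conflicts = {}, []
    for key in primary.keys() | secondary.keys():
        p, s = primary.get(key), secondary.get(key)
        if p is not None and s is not None and p != s:
            conflicts.append(key)
        merged[key] = p if p is not None else s
    return merged, sorted(conflicts)


def n_called(calls):
    """Number of (sample, marker) pairs with a genotype call."""
    return sum(1 for g in calls.values() if g is not None)


# Hypothetical calls keyed by (sample, marker); None marks a missed call.
primary = {
    ("s1", "snp1"): "AA", ("s1", "snp2"): None,
    ("s2", "snp1"): "AB", ("s2", "snp2"): None,
}
secondary = {
    ("s1", "snp1"): "AA", ("s1", "snp2"): "BB",
    ("s2", "snp1"): "BB", ("s2", "snp2"): None,
}

merged, conflicts = merge_calls(primary, secondary)
print(n_called(primary), n_called(merged), conflicts)
```

In this toy data the merged set gains one call that the primary caller missed, and one pair where the two callers disagree is flagged rather than silently overwritten.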