193 research outputs found
Using multiple alignments to improve seeded local alignment algorithms
Multiple alignments among genomes are becoming increasingly prevalent. This trend motivates the development of tools for efficient homology search between a query sequence and a database of multiple alignments. In this paper, we present an algorithm that uses the information implicit in a multiple alignment to dynamically build an index that is weighted most heavily towards the promising regions of the multiple alignment. We have implemented Typhon, a local alignment tool that incorporates our indexing algorithm, which our test results show to be more sensitive than algorithms that index only a sequence. This suggests that when applied on a whole-genome scale, Typhon should provide improved homology searches in time comparable to existing algorithms
Efficiency and Power as a Function of Sequence Coverage, SNP Array Density, and Imputation
High coverage whole genome sequencing provides near complete information about genetic variation. However, other technologies can be more efficient in some settings by (a) reducing redundant coverage within samples and (b) exploiting patterns of genetic variation across samples. To characterize as many samples as possible, many genetic studies therefore employ lower coverage sequencing or SNP array genotyping coupled to statistical imputation. To compare these approaches individually and in conjunction, we developed a statistical framework to estimate genotypes jointly from sequence reads, array intensities, and imputation. In European samples, we find similar sensitivity (89%) and specificity (99.6%) from imputation with either 1Γ sequencing or 1 M SNP arrays. Sensitivity is increased, particularly for low-frequency polymorphisms (MAF <5%), when low coverage sequence reads are added to dense genome-wide SNP arrays β the converse, however, is not true. At sites where sequence reads and array intensities produce different sample genotypes, joint analysis reduces genotype errors and identifies novel error modes. Our joint framework informs the use of next-generation sequencing in genome wide association studies and supports development of improved methods for genotype calling
Recommended from our members
Genetic and Computational Identification of a Conserved Bacterial Metabolic Module
We have experimentally and computationally defined a set of genes that form a conserved metabolic module in the Ξ±-proteobacterium Caulobacter crescentus and used this module to illustrate a schema for the propagation of pathway-level annotation across bacterial genera. Applying comprehensive forward and reverse genetic methods and genome-wide transcriptional analysis, we (1) confirmed the presence of genes involved in catabolism of the abundant environmental sugar myo-inositol, (2) defined an operon encoding an ABC-family myo-inositol transmembrane transporter, and (3) identified a novel myo-inositol regulator protein and cis-acting regulatory motif that control expression of genes in this metabolic module. Despite being encoded from non-contiguous loci on the C. crescentus chromosome, these myo-inositol catabolic enzymes and transporter proteins form a tightly linked functional group in a computationally inferred network of protein associations. Primary sequence comparison was not sufficient to confidently extend annotation of all components of this novel metabolic module to related bacterial genera. Consequently, we implemented the Graemlin multiple-network alignment algorithm to generate cross-species predictions of genes involved in myo-inositol transport and catabolism in other Ξ±-proteobacteria. Although the chromosomal organization of genes in this functional module varied between species, the upstream regions of genes in this aligned network were enriched for the same palindromic cis-regulatory motif identified experimentally in C. crescentus. Transposon disruption of the operon encoding the computationally predicted ABC myo-inositol transporter of Sinorhizobium meliloti abolished growth on myo-inositol as the sole carbon source, confirming our cross-genera functional prediction. Thus, we have defined regulatory, transport, and catabolic genes and a cis-acting regulatory sequence that form a conserved module required for myo-inositol metabolism in select Ξ±-proteobacteria. Moreover, this study describes a forward validation of gene-network alignment, and illustrates a strategy for reliably transferring pathway-level annotation across bacterial species.</p
Exome sequencing of 20,791 cases of type 2 diabetes and 24,440 controls
Protein-coding genetic variants that strongly affect disease risk can yield relevant clues to disease pathogenesis. Here we report exome-sequencing analyses of 20,791 individuals with type 2 diabetes (T2D) and 24,440 non-diabetic control participants from 5 ancestries. We identify gene-level associations of rare variants (with minor allele frequencies of less than 0.5%) in 4 genes at exome-wide significance, including a series of more than 30 SLC30A8 alleles that conveys protection against T2D, and in 12 gene sets, including those corresponding to T2D drug targets (P = 6.1 Γ 10β3) and candidate genes from knockout mice (P = 5.2 Γ 10β3). Within our study, the strongest T2D gene-level signals for rare variants explain at most 25% of the heritability of the strongest common single-variant signals, and the gene-level effect sizes of the rare variants that we observed in established T2D drug targets will require 75,000β185,000 sequenced cases to achieve exome-wide significance. We propose a method to interpret these modest rare-variant associations and to incorporate these associations into future target or gene prioritization efforts
The genetic architecture of type 2 diabetes
The genetic architecture of common traits, including the number, frequency, and effect sizes of inherited variants that contribute to individual risk, has been long debated. Genome-wide association studies have identified scores of common variants associated with type 2 diabetes, but in aggregate, these explain only a fraction of heritability. To test the hypothesis that lower-frequency variants explain much of the remainder, the GoT2D and T2D-GENES consortia performed whole genome sequencing in 2,657 Europeans with and without diabetes, and exome sequencing in a total of 12,940 subjects from five ancestral groups. To increase statistical power, we expanded sample size via genotyping and imputation in a further 111,548 subjects. Variants associated with type 2 diabetes after sequencing were overwhelmingly common and most fell within regions previously identified by genome-wide association studies. Comprehensive enumeration of sequence variation is necessary to identify functional alleles that provide important clues to disease pathophysiology, but large-scale sequencing does not support a major role for lower-frequency variants in predisposition to type 2 diabetes
Recommended from our members
A combined polygenic score of 21,293 rare and 22 common variants improves diabetes diagnosis based on hemoglobin A1C levels
Polygenic scores (PGSs) combine the effects of common genetic variants1,2 to predict risk or treatment strategies for complex diseases3-7. Adding rare variation to PGSs has largely unknown benefits and is methodically challenging. Here, we developed a method for constructing rare variant PGSs and applied it to calculate genetically modified hemoglobin A1C thresholds for type 2 diabetes (T2D) diagnosis7-10. The resultant rare variant PGS is highly polygenic (21,293 variants across 154 genes), depends on ultra-rare variants (72.7% observed in fewer than three people) and identifies significantly more undiagnosed T2D cases than expected by chance (odds ratioβ=β2.71; Pβ=β1.51βΓβ10-6). A PGS combining common and rare variants is expected to identify 4.9βmillion misdiagnosed T2D cases in the United States-nearly 1.5-fold more than the common variant PGS alone. These results provide a method for constructing complex trait PGSs from rare variants and suggest that rare variants will augment common variants in precision medicine approaches for common disease
Sequence data and association statistics from 12,940 type 2 diabetes cases and controls
To investigate the genetic basis of type 2 diabetes (T2D) to high resolution, the GoT2D and T2D-GENES consortia catalogued variation from whole-genome sequencing of 2,657 European individuals and exome sequencing of 12,940 individuals of multiple ancestries. Over 27M SNPs, indels, and structural variants were identified, including 99% of low-frequency (minor allele frequency [MAF] 0.1β5%) non-coding variants in the whole-genome sequenced individuals and 99.7% of low-frequency coding variants in the whole-exome sequenced individuals. Each variant was tested for association with T2D in the sequenced individuals, and, to increase power, most were tested in larger numbers of individuals (\u3e80% of low-frequency coding variants in ~82βK Europeans via the exome chip, and ~90% of low-frequency non-coding variants in ~44βK Europeans via genotype imputation). The variants, genotypes, and association statistics from these analyses provide the largest reference to date of human genetic information relevant to T2D, for use in activities such as T2D-focused genotype imputation, functional characterization of variants or genes, and other novel analyses to detect associations between sequence variation and T2D
Human gain-of-function variants in HNF1A confer protection from diabetes but independently increase hepatic secretion of atherogenic lipoproteins
Loss-of-function mutations in hepatocyte nuclear factor 1A (HNF1A) are known to cause rare forms of diabetes and alter hepatic physiology through unclear mechanisms. In the general population, 1:100 individuals carry a rare, protein-coding HNF1A variant, most of unknown functional consequence. To characterize the full allelic series, we performed deep mutational scanning of 11,970 protein-coding HNF1A variants in human hepatocytes and clinical correlation with 553,246 exome-sequenced individuals. Surprisingly, we found that βΌ1:5 rare protein-coding HNF1A variants in the general population cause molecular gain of function (GOF), increasing the transcriptional activity of HNF1A by up to 50% and conferring protection from type 2 diabetes (odds ratio [OR] = 0.77, p = 0.007). Increased hepatic expression of HNF1A promoted a pro-atherogenic serum profile mediated in part by enhanced transcription of risk genes including ANGPTL3 and PCSK9. In summary, βΌ1:300 individuals carry a GOF variant in HNF1A that protects carriers from diabetes but enhances hepatic secretion of atherogenic lipoproteins.publishedVersio
- β¦