193 research outputs found

Using multiple alignments to improve seeded local alignment algorithms

Author: Batzoglou Serafim
Flannick Jason
Publication venue: Oxford University Press
Publication date: 01/01/2005

Multiple alignments among genomes are becoming increasingly prevalent. This trend motivates the development of tools for efficient homology search between a query sequence and a database of multiple alignments. In this paper, we present an algorithm that uses the information implicit in a multiple alignment to dynamically build an index that is weighted most heavily towards the promising regions of the multiple alignment. We have implemented Typhon, a local alignment tool that incorporates our indexing algorithm, which our test results show to be more sensitive than algorithms that index only a sequence. This suggests that when applied on a whole-genome scale, Typhon should provide improved homology searches in time comparable to existing algorithms.

CiteSeerX

Crossref

PubMed Central

Sequential PAttern mining using a bitmap representation

Author: Jason Flannick
Jay Ayres
Johannes Gehrke
Tomi Yiu
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2004

Crossref

Efficiency and Power as a Function of Sequence Coverage, SNP Array Density, and Imputation

Author: David Altshuler
Eric Banks
George B. Grant
Jason Flannick
Joshua M. Korn
Mark A. DePristo
Pierre Fontanillas
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2012

High coverage whole genome sequencing provides near complete information about genetic variation. However, other technologies can be more efficient in some settings by (a) reducing redundant coverage within samples and (b) exploiting patterns of genetic variation across samples. To characterize as many samples as possible, many genetic studies therefore employ lower coverage sequencing or SNP array genotyping coupled to statistical imputation. To compare these approaches individually and in conjunction, we developed a statistical framework to estimate genotypes jointly from sequence reads, array intensities, and imputation. In European samples, we find similar sensitivity (89%) and specificity (99.6%) from imputation with either 1× sequencing or 1 M SNP arrays. Sensitivity is increased, particularly for low-frequency polymorphisms (MAF <5%), when low coverage sequence reads are added to dense genome-wide SNP arrays; the converse, however, is not true. At sites where sequence reads and array intensities produce different sample genotypes, joint analysis reduces genotype errors and identifies novel error modes. Our joint framework informs the use of next-generation sequencing in genome-wide association studies and supports development of improved methods for genotype calling.

CiteSeerX

Public Library of Science (PLOS)

DSpace@MIT

Crossref

Harvard University - DASH

Directory of Open Access Journals

PubMed Central

The Francis Crick Institute

Genetic and Computational Identification of a Conserved Bacterial Metabolic Module

Author: Batzoglou Serafim
Boutte Cara C.
Crosson Sean
Flannick Jason A.
Martens Andrew T.
Novak Antal F.
Srinivasan Balaji S.
Viollier Patrick H.
Publication date: 03/01/2024

We have experimentally and computationally defined a set of genes that form a conserved metabolic module in the α-proteobacterium Caulobacter crescentus and used this module to illustrate a schema for the propagation of pathway-level annotation across bacterial genera. Applying comprehensive forward and reverse genetic methods and genome-wide transcriptional analysis, we (1) confirmed the presence of genes involved in catabolism of the abundant environmental sugar myo-inositol, (2) defined an operon encoding an ABC-family myo-inositol transmembrane transporter, and (3) identified a novel myo-inositol regulator protein and cis-acting regulatory motif that control expression of genes in this metabolic module. Despite being encoded from non-contiguous loci on the C. crescentus chromosome, these myo-inositol catabolic enzymes and transporter proteins form a tightly linked functional group in a computationally inferred network of protein associations. Primary sequence comparison was not sufficient to confidently extend annotation of all components of this novel metabolic module to related bacterial genera. Consequently, we implemented the Graemlin multiple-network alignment algorithm to generate cross-species predictions of genes involved in myo-inositol transport and catabolism in other α-proteobacteria. Although the chromosomal organization of genes in this functional module varied between species, the upstream regions of genes in this aligned network were enriched for the same palindromic cis-regulatory motif identified experimentally in C. crescentus. Transposon disruption of the operon encoding the computationally predicted ABC myo-inositol transporter of Sinorhizobium meliloti abolished growth on myo-inositol as the sole carbon source, confirming our cross-genera functional prediction. 
Thus, we have defined regulatory, transport, and catabolic genes and a cis-acting regulatory sequence that form a conserved module required for myo-inositol metabolism in select α-proteobacteria. Moreover, this study describes a forward validation of gene-network alignment, and illustrates a strategy for reliably transferring pathway-level annotation across bacterial species.

Knowledge UChicago

Exome sequencing of 20,791 cases of type 2 diabetes and 24,440 controls

Author: Atzmon Gil
Blangero John
Duggirala Ravindranath
Flannick Jason
Fuchsberger Christian
Glaser Benjamin
Mahajan Anubha
Mercader Josep M.
Udler Miriam S.
Wessel Jennifer
Publication venue: ScholarWorks @ UTRGV
Publication date: 22/05/2019

Protein-coding genetic variants that strongly affect disease risk can yield relevant clues to disease pathogenesis. Here we report exome-sequencing analyses of 20,791 individuals with type 2 diabetes (T2D) and 24,440 non-diabetic control participants from 5 ancestries. We identify gene-level associations of rare variants (with minor allele frequencies of less than 0.5%) in 4 genes at exome-wide significance, including a series of more than 30 SLC30A8 alleles that conveys protection against T2D, and in 12 gene sets, including those corresponding to T2D drug targets (P = 6.1 × 10^−3) and candidate genes from knockout mice (P = 5.2 × 10^−3). Within our study, the strongest T2D gene-level signals for rare variants explain at most 25% of the heritability of the strongest common single-variant signals, and the gene-level effect sizes of the rare variants that we observed in established T2D drug targets will require 75,000–185,000 sequenced cases to achieve exome-wide significance. We propose a method to interpret these modest rare-variant associations and to incorporate these associations into future target or gene prioritization efforts.

Scholarworks@UTRGV Univ. of Texas RioGrande Valley

The genetic architecture of type 2 diabetes

Author: Agarwala Vineeta
Blangero John
Curran Joanne E
Duggirala Ravi
Flannick Jason
Fuchsberger Christian
Gaulton Kyle J.
Kumar Satish
Mahajan Anubha
Teslovich Tanya M.
Publication venue: ScholarWorks @ UTRGV
Publication date: 04/08/2016

The genetic architecture of common traits, including the number, frequency, and effect sizes of inherited variants that contribute to individual risk, has been long debated. Genome-wide association studies have identified scores of common variants associated with type 2 diabetes, but in aggregate, these explain only a fraction of heritability. To test the hypothesis that lower-frequency variants explain much of the remainder, the GoT2D and T2D-GENES consortia performed whole genome sequencing in 2,657 Europeans with and without diabetes, and exome sequencing in a total of 12,940 subjects from five ancestral groups. To increase statistical power, we expanded sample size via genotyping and imputation in a further 111,548 subjects. Variants associated with type 2 diabetes after sequencing were overwhelmingly common and most fell within regions previously identified by genome-wide association studies. Comprehensive enumeration of sequence variation is necessary to identify functional alleles that provide important clues to disease pathophysiology, but large-scale sequencing does not support a major role for lower-frequency variants in predisposition to type 2 diabetes.

Scholarworks@UTRGV Univ. of Texas RioGrande Valley

A combined polygenic score of 21,293 rare and 22 common variants improves diabetes diagnosis based on hemoglobin A1C levels

Author: Cole Joanne B
Dornbos Peter
Flannick Jason
Florez Jose C
Koesterer Ryan
Leong Aaron
Meigs James B
Nguyen Trang
Rotter Jerome I
Ruttenburg Andrew
Udler Miriam S
Publication venue: eScholarship, University of California
Publication date: 01/11/2022

Polygenic scores (PGSs) combine the effects of common genetic variants [1,2] to predict risk or treatment strategies for complex diseases [3-7]. Adding rare variation to PGSs has largely unknown benefits and is methodically challenging. Here, we developed a method for constructing rare variant PGSs and applied it to calculate genetically modified hemoglobin A1C thresholds for type 2 diabetes (T2D) diagnosis [7-10]. The resultant rare variant PGS is highly polygenic (21,293 variants across 154 genes), depends on ultra-rare variants (72.7% observed in fewer than three people) and identifies significantly more undiagnosed T2D cases than expected by chance (odds ratio = 2.71; P = 1.51 × 10^-6). A PGS combining common and rare variants is expected to identify 4.9 million misdiagnosed T2D cases in the United States, nearly 1.5-fold more than the common variant PGS alone. These results provide a method for constructing complex trait PGSs from rare variants and suggest that rare variants will augment common variants in precision medicine approaches for common disease.

eScholarship - University of California

Sequence data and association statistics from 12,940 type 2 diabetes cases and controls

Author: Agarwala Vineeta
Arya Rector
Blangero John
Caulkins Lizz
Curran Joanne E
Duggirala Ravi
Flannick Jason
Fuchsberger Christian
Gaulton Kyle J.
Kumar Satish
Mahajan Anubha
Teslovich Tanya M.
Publication venue: ScholarWorks @ UTRGV
Publication date: 19/12/2017

To investigate the genetic basis of type 2 diabetes (T2D) to high resolution, the GoT2D and T2D-GENES consortia catalogued variation from whole-genome sequencing of 2,657 European individuals and exome sequencing of 12,940 individuals of multiple ancestries. Over 27M SNPs, indels, and structural variants were identified, including 99% of low-frequency (minor allele frequency [MAF] 0.1–5%) non-coding variants in the whole-genome sequenced individuals and 99.7% of low-frequency coding variants in the whole-exome sequenced individuals. Each variant was tested for association with T2D in the sequenced individuals, and, to increase power, most were tested in larger numbers of individuals (>80% of low-frequency coding variants in ~82 K Europeans via the exome chip, and ~90% of low-frequency non-coding variants in ~44 K Europeans via genotype imputation). The variants, genotypes, and association statistics from these analyses provide the largest reference to date of human genetic information relevant to T2D, for use in activities such as T2D-focused genotype imputation, functional characterization of variants or genes, and other novel analyses to detect associations between sequence variation and T2D.

Scholarworks@UTRGV Univ. of Texas RioGrande Valley

Human gain-of-function variants in HNF1A confer protection from diabetes but independently increase hepatic secretion of atherogenic lipoproteins

Author: Abagyan Ruben
De Arruda Saldanha Camila
Deaton Aimee M.
DeForest Natalie
Du Xiaomi
Flannick Jason
Gordts Philip L.S.M.
Gylys Jenny
Heinz Sven
Hu Siqi
Isaac Roi
Kavitha Babu
Khera Amit V.
Krohn Lynne
Majithia Amit R.
Merli Edoardo
Mohan Viswanathan
Najmi Laeya Abdoli
Olefsky Jerrold
Peloso Gina M.
Radha Venkatesan
Wang Minxian
Publication venue: Cell Press
Publication date: 01/01/2023

Loss-of-function mutations in hepatocyte nuclear factor 1A (HNF1A) are known to cause rare forms of diabetes and alter hepatic physiology through unclear mechanisms. In the general population, 1:100 individuals carry a rare, protein-coding HNF1A variant, most of unknown functional consequence. To characterize the full allelic series, we performed deep mutational scanning of 11,970 protein-coding HNF1A variants in human hepatocytes and clinical correlation with 553,246 exome-sequenced individuals. Surprisingly, we found that ∼1:5 rare protein-coding HNF1A variants in the general population cause molecular gain of function (GOF), increasing the transcriptional activity of HNF1A by up to 50% and conferring protection from type 2 diabetes (odds ratio [OR] = 0.77, p = 0.007). Increased hepatic expression of HNF1A promoted a pro-atherogenic serum profile mediated in part by enhanced transcription of risk genes including ANGPTL3 and PCSK9. In summary, ∼1:300 individuals carry a GOF variant in HNF1A that protects carriers from diabetes but enhances hepatic secretion of atherogenic lipoproteins.

University of Bergen