6 research outputs found
Human Whole-Exome Genotype Data for Alzheimer's Disease
The heterogeneity of whole-exome sequencing (WES) data generation methods presents a challenge to joint analysis. Here we present a bioinformatics strategy for joint-calling 20,504 WES samples collected across nine studies and sequenced using ten capture kits in fourteen sequencing centers in the Alzheimer's Disease Sequencing Project. The joint-genotype called variant call format (VCF) file contains only positions within the union of capture kits. The VCF was then processed specifically to account for the batch effects arising from the use of different capture kits from different studies. We identified 8.2 million autosomal variants. 96.82% of the variants are high-quality and are located in 28,579 Ensembl transcripts. 41% of the variants are intronic and 1.8% of the variants have CADD > 30, indicating they are of high predicted pathogenicity. Here we show our new strategy can generate high-quality data from processing these diversely generated WES samples. The improved ability to combine data sequenced in different batches benefits the whole genomics research community.
The early-onset Alzheimer's disease whole-genome sequencing project: Study design and methodology
INTRODUCTION
Sequencing efforts to identify genetic variants and pathways underlying Alzheimer's disease (AD) have largely focused on late-onset AD although early-onset AD (EOAD), accounting for ∼10% of cases, is largely unexplained by known mutations, resulting in a lack of understanding of its molecular etiology.
METHODS
Whole-genome sequencing and harmonization of clinical, neuropathological, and biomarker data of over 5000 EOAD cases of diverse ancestries.
RESULTS
A publicly available genomics resource for EOAD with extensive harmonized phenotypes. Primary analysis will (1) identify novel EOAD risk loci and druggable targets; (2) assess local-ancestry effects; (3) create EOAD prediction models; and (4) assess genetic overlap with cardiovascular and other traits.
DISCUSSION
This novel resource complements over 50,000 control and late-onset AD samples generated through the Alzheimer's Disease Sequencing Project (ADSP). The harmonized EOAD/ADSP joint call will be available through upcoming ADSP data releases and will allow for additional analyses across the full onset range.
Highlights
Sequencing efforts to identify genetic variants and pathways underlying Alzheimer's disease (AD) have largely focused on late-onset AD although early-onset AD (EOAD), accounting for ∼10% of cases, is largely unexplained by known mutations. This results in a significant lack of understanding of the molecular etiology of this devastating form of the disease.
The Early-Onset Alzheimer's Disease Whole-genome Sequencing Project is a collaborative initiative to generate a large-scale genomics resource for early-onset Alzheimer's disease with extensive harmonized phenotype data.
Primary analyses are designed to (1) identify novel EOAD risk and protective loci and druggable targets; (2) assess local-ancestry effects; (3) create EOAD prediction models; and (4) assess genetic overlap with cardiovascular and other traits.
The harmonized genomic and phenotypic data from this initiative will be available through NIAGADS.
The Alzheimer’s Disease Sequencing Project – Follow Up Study (ADSP‐FUS): APOE genotype status and demographic characteristics across datasets
Background: The ADSP‐FUS is a National Institute on Aging (NIA) initiative focused on identifying genetic risk and protective variants for Alzheimer Disease (AD) by expanding the ADSP beyond non‐Hispanic White populations of European ancestry (NHW‐EA). Given the lack of diversity in the ADSP, the ADSP‐FUS was designed to perform whole-genome sequencing (WGS) on existing ethnically diverse and unique cohorts. The upcoming phase, ADSP‐FUS 2.0: The Diverse Population Initiative, focuses on the inclusion of Hispanic/Latino (HL), non‐Hispanic Black with African Ancestry (NHB‐AA), and Asian populations. Methods: ADSP‐FUS cohorts consist of studies of AD, dementia, and age‐related conditions. Clinical classifications are assigned based on standard criteria from clinical measures and history, as well as additional neuropathologic data. In addition to WGS, genome‐wide array and APOE genotyping are acquired or performed for all ADSP‐FUS samples. Results: The ADSP‐FUS currently consists of 38 cohorts comprising ∼40,000 individuals, with plans to sequence >100,000 individuals from diverse ancestries. Genotyping, sequencing, and clinical adjudication have been performed on 23,428 participants (cases N = 6,961, median age = 73; controls N = 13,007, median age = 72; ADRD N = 3,460, median age = 77). More participants are female (62.3%) than male, and this proportion is similar across cases (61.0%), controls (63.1%), and ADRD (61.8%). As expected, the most prevalent APOE genotype is APOE 3/3 (% in cases, controls: 2/2 = 0.2, 0.4; 2/3 = 4.3, 8.2; 2/4 = 2.2, 1.8; 3/3 = 43.8, 64.4; 3/4 = 39.5, 23.0; 4/4 = 10.1, 2.2). These proportions vary greatly between ethnicities; for example, the highest APOE 4/4 frequency is observed in Asian participants (8.8%) and the lowest in Hispanic participants (2.5%). Mean Braak stage is higher for AD cases (5.1 ± 1.2) than for controls (2.6 ± 1.3) and ADRD participants (3.5 ± 1.6). Conclusion: The results provide an overview of features of ADSP‐FUS cohorts. As the ADSP‐FUS expands in size and diversity, this genomic resource, available via NIAGADS, will be integrated with ADSP programs focused on phenotype harmonization, association analyses, functional genomics, and machine learning. In concert with these programs, the ADSP‐FUS will accelerate the identification and understanding of potential genetic risk and protective variants for AD across all populations, with the target of developing new treatments that are globally effective.
ADSP Whole Genome Sequencing (WGS) Release 4 Data Update from Genome Center for Alzheimer’s Disease
Background: The Genome Center for Alzheimer’s Disease (GCAD) coordinates the integration of all available Alzheimer’s disease (AD) relevant whole genome sequencing (WGS) data with the goal of identifying AD risk or protective genetic variants and eventual therapeutic targets. The WGS datasets are generated through collaboration between investigators from the Alzheimer’s Disease Sequencing Project (ADSP) and GCAD. To minimize the data heterogeneity introduced by different sequencing protocols and assays, GCAD processes all samples using standardized pipelines and performs quality control (QC)/quality assurance (QA) checks. Methods: Raw sequencing data (FASTQs or BAMs) were aligned to GRCh38/hg38 with BWA, and variant calling and joint genotyping of single nucleotide variants (SNVs) and insertions and deletions (indels) were done with GATK. Structural variants (SVs) were called per sample using the Smoove, Manta, and Strelka packages. Preliminary QA checks, including sex check, contamination, and genotype concordance, were performed, followed by QC per ADSP protocol to evaluate the quality of samples and variants. To facilitate access to and use of the massive joint‐genotype called VCF files, a compact version storing only variant information and sample genotypes was released first. Results: We dropped 275 (0.7%) samples of poor coverage (362M bi‐allelic variants, >58M multi‐allelic variants, with 95% of variants remaining after QC. SV calling is ongoing and data will be ready prior to the conference. Conclusion: The ADSP and GCAD generate high‐quality SNV, indel, and SV calls. GCAD is currently preparing the next release of ∼60,000 more ancestrally‐diverse WGS samples sequenced primarily through the ADSP Follow‐Up Study, which we anticipate will be released in 2023 to greatly benefit the AD genetics community.
Human whole-exome genotype data for Alzheimer's disease
The heterogeneity of whole-exome sequencing (WES) data generation methods presents a challenge to joint analysis. Here we present a bioinformatics strategy for joint-calling 20,504 WES samples collected across nine studies and sequenced using ten capture kits in fourteen sequencing centers in the Alzheimer's Disease Sequencing Project. The joint-genotype called variant call format (VCF) file contains only positions within the union of capture kits. The VCF was then processed specifically to account for the batch effects arising from the use of different capture kits from different studies. We identified 8.2 million autosomal variants. 96.82% of the variants are high-quality and are located in 28,579 Ensembl transcripts. 41% of the variants are intronic and 1.8% of the variants have CADD > 30, indicating they are of high predicted pathogenicity. Here we show our new strategy can generate high-quality data from processing these diversely generated WES samples. The improved ability to combine data sequenced in different batches benefits the whole genomics research community.
Human whole-exome genotype data for Alzheimer’s disease
The heterogeneity of whole-exome sequencing (WES) data generation methods presents a challenge to joint analysis. Here we present a bioinformatics strategy for joint-calling 20,504 WES samples collected across nine studies and sequenced using ten capture kits in fourteen sequencing centers in the Alzheimer’s Disease Sequencing Project. The joint-genotype called variant call format (VCF) file contains only positions within the union of capture kits. The VCF was then processed specifically to account for the batch effects arising from the use of different capture kits from different studies. We identified 8.2 million autosomal variants. 96.82% of the variants are high-quality and are located in 28,579 Ensembl transcripts. 41% of the variants are intronic and 1.8% of the variants have CADD > 30, indicating they are of high predicted pathogenicity. Here we show our new strategy can generate high-quality data from processing these diversely generated WES samples. The improved ability to combine data sequenced in different batches benefits the whole genomics research community.