103,180 research outputs found
Consensus Rules in Variant Detection from Next-Generation Sequencing Data
A critical step in detecting variants from next-generation sequencing data is post hoc filtering of putative variants called or predicted by computational tools. Here, we highlight four critical parameters that could enhance the accuracy of called single nucleotide variants and insertions/deletions: quality and depth, refinement and improvement of initial mapping, allele/strand balance, and examination of spurious genes. Appropriate use of these sequence features in variant filtering could greatly improve validation rates, thereby saving time and costs in next-generation sequencing projects.
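As a rough illustration of this kind of post hoc filtering, the sketch below applies three of the four parameters (quality and depth, allele balance, strand balance) to variants that have already been called. The record fields and every threshold are hypothetical placeholders chosen for the example, not values from the paper, and the mapping-refinement and spurious-gene checks are left out.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """One putative heterozygous variant call (all fields are hypothetical)."""
    qual: float    # Phred-scaled call quality
    depth: int     # total reads covering the site
    alt_fwd: int   # alternate-allele reads on the forward strand
    alt_rev: int   # alternate-allele reads on the reverse strand


def passes_filters(call, min_qual=30.0, min_depth=10,
                   min_allele_frac=0.25, min_strand_frac=0.1):
    """Return True if the call survives all illustrative filters."""
    alt = call.alt_fwd + call.alt_rev
    if call.qual < min_qual or call.depth < min_depth or alt == 0:
        return False
    # Allele balance: fraction of reads supporting the alternate allele.
    if alt / call.depth < min_allele_frac:
        return False
    # Strand balance: the alternate allele should appear on both strands.
    minor_strand = min(call.alt_fwd, call.alt_rev)
    return minor_strand / alt >= min_strand_frac


calls = [
    Call(qual=60.0, depth=40, alt_fwd=9, alt_rev=11),   # well supported
    Call(qual=60.0, depth=40, alt_fwd=20, alt_rev=0),   # one strand only
    Call(qual=12.0, depth=40, alt_fwd=9, alt_rev=11),   # low quality
    Call(qual=60.0, depth=4, alt_fwd=1, alt_rev=1),     # shallow coverage
]
kept = [c for c in calls if passes_filters(c)]
print(len(kept))
```

In practice the cut-offs would need tuning per platform and per caller; the point here is only that each parameter becomes an independent, cheap check on the call record.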
RarePedia: A Tool for Integrating Scattered Information on Rare Genetic Variants
Thesis (Master's) -- Seoul National University Graduate School: College of Natural Sciences, Interdisciplinary Program in Bioinformatics, 2018. 8. Ju Han Kim.
Finding the causal factors of various diseases, whether environmental factors, stress, aging, or others, has been the focus of many researchers around the globe. As science and technology have advanced, many disease-related genes and mechanisms have been discovered. With the development of DNA sequencing techniques, the sequences of disease-associated genes and specific disease-related genetic variants are being revealed. However, variants that occur at very low frequencies are often ignored, seemingly because little is known about these rare variants, which in turn makes rare variant analyses difficult without increasing the sample size.
RarePedia was designed to make a unified collection of information about rare variants. Furthermore, it focuses on deleterious variants that are expected to be related to diseases. RarePedia is another way to use Next Generation Sequencing (NGS) data, and a helpful tool for viewing organized information previously scattered across many sources.
The ultimate goal of RarePedia is the accumulation of information through additional updates, whenever there is new information about rare and deleterious variants. It can be a way to organize information about rare and deleterious variants that have not been organized systematically.
1. INTRODUCTION
2. DATA
2.1. Next Generation Sequencing (NGS) data
2.1.1. 1000 Genomes Project data
2.1.2. Alzheimer's Disease Sequencing Project (ADSP) data
2.2. Databases and Resources
2.2.1. Databases prepared by ANNOVAR developers
2.2.2. Resources used by Oncotator
2.2.3. Additional publicly available resources
3. METHODS
3.1. Filtering scheme for extraction of rare-damaging variants
3.1.1. Analysis focused on variants
3.1.2. Analysis focused on genes
3.2. Annotation of information
3.2.1. Annotation with ANNOVAR
3.2.2. Annotation with Oncotator
3.3. Generation of HTML pages about rare-damaging variants
3.4. Validation with publicly available Next Generation Sequencing (NGS) data
3.4.1. Statistics from 1000 Genomes Project data
3.4.2. Statistics from Alzheimer's Disease Sequencing Project (ADSP) data
4. RESULT
4.1. Statistics about rare-damaging variants from Next Generation Sequencing (NGS) data
4.1.1. Statistics from 1000 Genomes Project data
4.1.2. Statistics from Alzheimer's Disease Sequencing Project (ADSP) data
4.1.3. Additional Whole Exome Sequencing (WXS) data
4.1.4. ClinVar information about variants
4.2. Information contained in the HTML page generated by RarePedia
5. DISCUSSION
6. REFERENCE
ABSTRACT (KOREAN)
Master
Identification of Sequence Variants in Genetic Disease-Causing Genes Using Targeted Next-Generation Sequencing
Identification of gene variants plays an important role in research on and diagnosis of genetic diseases. A combination of enrichment of targeted genes and next-generation sequencing (targeted DNA-HiSeq) results in both high efficiency and low cost for targeted sequencing of genes of interest. To identify mutations associated with genetic diseases, we designed an array-based gene chip to capture all of the exons of 193 genes involved in 103 genetic diseases. To evaluate this technology, we selected seven samples from seven patients with six different genetic diseases resulting from six disease-causing genes and 100 samples from normal human adults as controls. The data obtained showed that on average, 99.14% of 3,382 exons with more than 30-fold coverage were successfully detected using Targeted DNA-HiSeq technology, and we found six known variants in four disease-causing genes and two novel mutations in two other disease-causing genes (the STS gene for XLI and the FBN1 gene for MFS) as well as one exon deletion mutation in the DMD gene. These results were confirmed in their entirety using either the Sanger sequencing method or real-time PCR. Targeted DNA-HiSeq combines next-generation sequencing with the capture of sequences from a relevant subset of high-interest genes. This method was tested by capturing sequences from a DNA library through hybridization to oligonucleotide probes specific for genetic disorder-related genes and was found to show high selectivity, improve the detection of mutations, enable the discovery of novel variants, and provide additional indel data. Thus, targeted DNA-HiSeq can be used to analyze the gene variant profiles of monogenic diseases with high sensitivity, fidelity, throughput and speed.
Combining effects from rare and common genetic variants in an exome-wide association study of sequence data
Recent breakthroughs in next-generation sequencing technologies allow cost-effective methods for measuring a growing list of cellular properties, including DNA sequence and structural variation. Next-generation sequencing has the potential to revolutionize complex trait genetics by directly measuring common and rare genetic variants within a genome-wide context. Because for a given gene both rare and common causal variants can coexist and have independent effects on a trait, strategies that model the effects of both common and rare variants could enhance the power of identifying disease-associated genes. To date, little work has been done on integrating signals from common and rare variants into powerful statistics for finding disease genes in genome-wide association studies. In this analysis of the Genetic Analysis Workshop 17 data, we evaluate various strategies for association of rare, common, or a combination of both rare and common variants on quantitative phenotypes in unrelated individuals. We show that the analysis of common variants only using classical approaches can achieve higher power to detect causal genes than recently proposed rare variant methods and that strategies that combine association signals derived independently in rare and common variants can slightly increase the power compared to strategies that focus on the effect of either the rare variants or the common variants
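The abstract does not say how the rare-variant and common-variant signals were combined. One standard option for merging two independent p-values, shown here purely as an assumed example, is Fisher's method. The function name and the input p-values below are hypothetical.

```python
import math


def fisher_combine_two(p_rare, p_common):
    """Combine two independent p-values with Fisher's method.

    The statistic -2 * (ln p1 + ln p2) follows a chi-square distribution
    with 4 degrees of freedom under the null, whose upper tail has the
    closed form exp(-x / 2) * (1 + x / 2).
    """
    for p in (p_rare, p_common):
        if not 0.0 < p <= 1.0:
            raise ValueError("p-values must lie in (0, 1]")
    stat = -2.0 * (math.log(p_rare) + math.log(p_common))
    return math.exp(-stat / 2.0) * (1.0 + stat / 2.0)


# Hypothetical per-gene p-values from separate rare- and common-variant tests.
combined = fisher_combine_two(0.04, 0.03)
print(round(combined, 4))
```

Fisher's method assumes the two tests are independent, which is plausible only when the rare and common variant sets do not overlap and are not in strong linkage disequilibrium.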
NGS Panels applied to Hereditary Cancer Syndromes
Cancer is among the leading causes of morbidity and mortality worldwide (Okur et al, 2017). Germline pathogenic variants in monogenic, highly penetrant cancer susceptibility genes are observed in 5-10% of all cancers (Lu et al, 2014). Hereditary cancers due to monogenic causes are characterized by earlier age of onset, other associated cancers, and often a family history of specific cancers. From the clinical perspective, it is important to recognize the affected individuals to provide them the best clinical management (Hennessy et al, 2010; Ledermann et al, 2014; Pennington et al, 2014) and to identify at-risk family members who will benefit from predictive genetic testing and enhanced surveillance, including early detection and/or risk reduction measures (Kurian et al, 2010; Okur et al, 2017). Germline variants identified in major cancer susceptibility genes associated with hereditary breast or ovarian cancer (HBOC) or hereditary colorectal cancer (HCRC) also account for 5-10% of the patients with these cancers. In recent years, new susceptibility genes, with different degrees of penetrance, have been identified. Variants in any of those genes are rare, and classical methodologies (e.g. Sanger sequencing - SS) are time consuming and expensive. Next-generation sequencing (NGS) has several advantages over SS, including the simultaneous analysis of many samples and sequencing of a large set of genes, higher sensitivity (down to 1% vs 15-20% in SS), lower cost and faster turnaround time, which make NGS the best approach for molecular diagnosis.
It is possible nowadays to choose between whole-genome sequencing (WGS), whole-exome sequencing (WES) and NGS limited to a set of genes (NGS-Panel). In cases where a suspected genetic disease or condition has been identified, targeted sequencing of specific genes or genomic regions is preferred (Grada et al, 2013). For that reason, we use an NGS-Panel approach based on TruSight Cancer (Illumina) to sequence DNA extracted from blood samples of patients with a personal and/or family history of cancer. This hereditary cancer gene panel sequences 94 genes associated with both common (e.g., breast, colorectal) and rare hereditary cancers and allows the creation of virtual gene panels according to each phenotype or disease under study.
NGS workflow analysis (Figure 1) includes five steps: quality assessment of raw data, read alignment to a reference genome, variant identification/calling, variant annotation and data visualization (Pabinger et al, 2013). The establishment of the most appropriate bioinformatics pipeline is crucial in order to achieve the best results. NGS data allows the identification of several types of variants like single nucleotide variants (SNVs), small insertions/deletions, inversions and also copy number variants (CNVs).
FCT - UID/BIM/0009/2016
Use of a targeted, combinatorial next-generation sequencing approach for the study of bicuspid aortic valve
BACKGROUND: Bicuspid aortic valve (BAV) is the most common type of congenital heart disease with a population prevalence of 1-2%. While BAV is known to be highly heritable, mutations in single genes (such as GATA5 and NOTCH1) have been reported in few human BAV cases. Traditional gene sequencing methods are time and labor intensive, while next-generation high throughput sequencing remains costly for large patient cohorts and requires extensive bioinformatics processing. Here we describe an approach to targeted multi-gene sequencing with combinatorial pooling of samples from BAV patients. METHODS: We studied a previously described cohort of 78 unrelated subjects with echocardiogram-identified BAV. Subjects were identified as having isolated BAV or BAV associated with coarctation of aorta (BAV-CoA). BAV cusp fusion morphology was defined as right-left cusp fusion, right non-coronary cusp fusion, or left non-coronary cusp fusion. Samples were combined into 19 pools using a uniquely overlapping combinatorial design; a given mutation could be attributed to a single individual on the basis of which pools contained the mutation. A custom gene capture of 97 candidate genes was sequenced on the Illumina HiSeq 2000. Multistep bioinformatics processing was performed for base calling, variant identification, and in-silico analysis of putative disease-causing variants. RESULTS: Targeted capture identified 42 rare, non-synonymous, exonic variants involving 35 of the 97 candidate genes. Among these variants, in-silico analysis classified 33 of these variants as putative disease-causing changes. Sanger sequencing confirmed thirty-one of these variants, found among 16 individuals. There were no significant differences in variant burden among BAV fusion phenotypes or isolated BAV versus BAV-CoA. Pathway analysis suggests a role for the WNT signaling pathway in human BAV. 
CONCLUSION: We successfully developed a pooling and targeted capture strategy that enabled rapid and cost effective next generation sequencing of target genes in a large patient cohort. This approach identified a large number of putative disease-causing variants in a cohort of patients with BAV, including variants in 26 genes not previously associated with human BAV. The data suggest that BAV heritability is complex and polygenic. Our pooling approach saved over $39,350 compared to an unpooled, targeted capture sequencing strategy
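The decoding idea behind the pooling design can be sketched as follows. The abstract does not give the actual pool layout, so this example assumes a simple hypothetical one in which each of the 78 samples is placed in a distinct pair of the 19 pools. A variant seen in exactly those two pools then points to one individual.

```python
from itertools import combinations


def assign_pools(sample_ids, n_pools, pools_per_sample=2):
    """Give every sample a distinct set of pools (hypothetical design)."""
    signatures = combinations(range(n_pools), pools_per_sample)
    design = {}
    for sample, signature in zip(sample_ids, signatures):
        design[sample] = frozenset(signature)
    if len(design) < len(sample_ids):
        raise ValueError("not enough distinct pool combinations")
    return design


def decode_carrier(design, positive_pools):
    """Return the samples whose pool set equals the positive pools."""
    observed = frozenset(positive_pools)
    return [s for s, pools in design.items() if pools == observed]


samples = [f"S{i:02d}" for i in range(1, 79)]   # 78 subjects
design = assign_pools(samples, n_pools=19)      # 19 pools
carrier = "S42"
print(decode_carrier(design, design[carrier]))  # prints ['S42']
```

This exact-match decoding only resolves variants carried by a single individual. A variant shared by several carriers lights up the union of their pools and would need either a more redundant design or follow-up sequencing to resolve.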
Restriction Enzyme Generated Next-Generation Sequencing Libraries and Genetic Risk Modifiers of BRCA1 Mutation Carriers
Next-generation sequencing (NGS) is a high throughput technique used to sequence large amounts of DNA in a short amount of time. However, a limitation to NGS is that the generated data is in a single consensus sequence without distinguishing between variants on homologous chromosomes. Separating or phasing the variants from the maternal and paternal chromosomes can provide information about the genetic origin of disease and information about how DNA nucleotide alterations interact in cis. This dissertation explores a new technical method of using restriction enzymes during NGS library preparation and its ability to increase the amount of phasing information that can be derived from NGS data. This study provides evidence that increasing the fragment size of NGS libraries can increase the amount of variant phasing information derived from NGS data.
BRCA1 is a well-known tumor suppressor that, when mutated, predisposes the mutation carrier to breast cancer. BRCA1 mutation carriers have a 44-75% risk of developing breast cancer by age 70. In this study, we used next-generation sequencing data to identify germline genetic variants that modify the risk of breast cancer in BRCA1 mutation carriers. With the use of both biological and statistical filters, five variants were identified that changed breast cancer risk in BRCA1 mutation carriers. Furthermore, it was shown that two of the affected genes alter the growth of BRCA1 mutation breast cell lines. Perhaps more importantly, the two variants were shown to alter the function of the affected genes. This is the first study to provide functional evidence on how common genetic variants can modify the risk of breast cancer in BRCA1 mutation carriers.
Analysis of human mini-exome sequencing data from Genetic Analysis Workshop 17 using a Bayesian hierarchical mixture model
Next-generation sequencing technologies are rapidly changing the field of genetic epidemiology and enabling exploration of the full allele frequency spectrum underlying complex diseases. Although sequencing technologies have shifted our focus toward rare genetic variants, statistical methods traditionally used in genetic association studies are inadequate for estimating effects of low minor allele frequency variants. For our study we use the Genetic Analysis Workshop 17 data from 697 unrelated individuals (genotypes for 24,487 autosomal variants from 3,205 genes). We apply a Bayesian hierarchical mixture model to identify genes associated with a simulated binary phenotype using a transformed genotype design matrix weighted by allele frequencies. A Metropolis-Hastings algorithm is used to jointly sample each indicator variable and additive genetic effect pair from its conditional posterior distribution, and remaining parameters are sampled by Gibbs sampling. This method identified 58 genes with a posterior probability greater than 0.8 for being associated with the phenotype. One of these 58 genes, PIK3C2B, was correctly identified as being associated with affected status based on the simulation process. This project demonstrates the utility of Bayesian hierarchical mixture models using a transformed genotype matrix to detect genes containing rare and common variants associated with a binary phenotype.
Germline Analysis from Tumor-Germline Sequencing Dyads to Identify Clinically Actionable Secondary Findings
In the past decade, tumor-germline next generation sequencing has become a
routine part of personalized oncology care. Via this method, germline mutations are
typically subtracted from those in the tumor to identify somatic mutations, thus negating
the possibility of discovering germline variants. Previously, it has been proposed that the
identification of germline variants could have significant clinical implications for patients
with hereditary cancers and their family members. In this exploratory research study, we
sought to investigate the prevalence of germline variants identified through clinical
tumor-germline sequencing among a cohort of patients across ten major cancer types.
Germline sequencing data from 439 individuals undergoing tumor-germline sequencing
through the LCCC1108/UNCseq™ (NCT01457196) study were analyzed for genetic
variants in 36 hereditary cancer susceptibility genes. Variants indicative of hereditary
cancer predisposition were identified in 19 (4.3%) patients. For about half (10/19), these
findings represent new molecular diagnostic information with potentially important
implications for the patient and their family. Genes with pathogenic variants included the
hereditary cancer genes: ATM, BRCA1, BRCA2, CDKN2A, and CHEK2. Furthermore, a
substantial proportion of patients (178, or 40.5%) had Variants of Uncertain Significance
(VUS), 24 of whom had VUS in genes pertinent to the presenting cancer. Overall, with
approximately 4% of cases harboring pathogenic variants in known hereditary cancer
susceptibility genes, diagnostic germline findings such as these could be beneficial for
patients and their families.
Bachelor of Science