Search CORE

17 research outputs found

ForestQC: Quality control on genetic variants from next-generation sequencing data using random forest.

Author: Coppola Giovanni
Freimer Nelson B
Hwang Sungoo
Jew Brandon
Li Jiajin
Sul Jae Hoon
Zhan Lingyu
Publication venue: eScholarship, University of California
Publication date: 01/12/2019

Next-generation sequencing (NGS) technology enables the discovery of nearly all genetic variants present in a genome. A subset of these variants, however, may have poor sequencing quality due to limitations in NGS or variant callers. In genetic studies that analyze a large number of sequenced individuals, it is critical to detect and remove variants with poor quality, as they may cause spurious findings. In this paper, we present ForestQC, a statistical tool for performing quality control on variants identified from NGS data by combining a traditional filtering approach and a machine learning approach. Our software uses information on sequencing quality, such as sequencing depth, genotyping quality, and GC content, to predict whether a particular variant is likely to be a false positive. To evaluate ForestQC, we applied it to two whole-genome sequencing datasets, one consisting of related individuals from families and the other of unrelated individuals. Results indicate that ForestQC outperforms widely used methods for performing quality control on variants, such as VQSR of GATK, by considerably improving the quality of variants to be included in the analysis. ForestQC is also very efficient, and hence can be applied to large sequencing datasets. We conclude that combining a machine learning algorithm trained with sequencing quality information and the filtering approach is a practical way to perform quality control on genetic variants from sequencing data.

Directory of Open Access Journals

eScholarship - University of California

ForestQC: Quality control on genetic variants from next-generation sequencing data using random forest.

Author: Brandon Jew
Giovanni Coppola
Jae Hoon Sul
Jiajin Li
Lingyu Zhan
Nelson B Freimer
Sungoo Hwang
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/12/2019

Directory of Open Access Journals

The custom making of hierarchical micro/nanoscaled titanium phosphate coatings and their formation mechanism analysis

Author: An
Barrere
Bavykin
Becker
Chen
Chuan
Deij
Eriksson
Fang
Gao
Gellynck
Haider
Hoemann
Ignotz
Jiang
Jiang
Kakihana
Krupa
Krupa
Lai
Li
Li
Malek
Mao
Mitsunori
Muhlebach
Nanev
Park
Park
Privman
Rasmusson
Sul
Sul
Thakral
Wu
Wu
Yang
Zhan
Zhang
Zhu
Zhu
Zinger
Publication venue: 'Royal Society of Chemistry (RSC)'
Publication date: 01/01/2019

Crossref

The effect of bumetanide on photodynamic therapy-induced peri-tumor edema of C6 glioma xenografts

Author: Beck
Candolfi
Canessa
Chen
Ding
Elliott
Flynn
Haas
Haas
Hirschberg
Huang
Ito
Jayakumar
Kahle
Kaplan
Lang
Li
Lu
Lu
Lynn
Madsen
Madsen
Nagaraja
Ozawa
Panet
Panet
Panet
Panet
Rutkowsky
Shiozaki
Smith
Song
Staub
Sul
Tabatabai
Ulmer
Unterberg
Wang
Worrell
Xu
Zelenkov
Zhan
Zhan
Publication venue: 'Wiley'

Crossref

Leveraging genomic diversity for discovery in an electronic health record linked biobank: the UCLA ATLAS Community Health Initiative.

Author: Arboleda Valerie A
Balliu Brunilda
Bhattacharya Arjun
Boulier Kristin
Burch Kathryn S
Butte Manish J
Caggiano Christa
Chiu Alec
Denny Christopher T
Ding Yi
Freund Malika
Geschwind Daniel H
Halperin Eran
Hill Brian
Johnson Ruth
Knyazev Sergey
Lajonchere Clara
Pasaniuc Bogdan
Rakocz Nadav
Sankararaman Sriram
Schwarz Tommer
Sul Jae Hoon
UCLA Precision Health Data Discovery Repository Working Group
UCLA Precision Health ATLAS Working Group
Venkateswaran Vidhya
Zaitlen Noah
Zhan Lingyu
Publication venue: eScholarship, University of California
Publication date: 01/09/2022

Background: Large medical centers in urban areas, like Los Angeles, care for a diverse patient population and offer the potential to study the interplay between genetic ancestry and social determinants of health. Here, we explore the implications of genetic ancestry within the University of California, Los Angeles (UCLA) ATLAS Community Health Initiative, an ancestrally diverse biobank of genomic data linked with de-identified electronic health records (EHRs) of UCLA Health patients (N=36,736). Methods: We quantify the extensive continental and subcontinental genetic diversity within the ATLAS data through principal component analysis, identity-by-descent, and genetic admixture. We assess the relationship between genetically inferred ancestry (GIA) and >1500 EHR-derived phenotypes (phecodes). Finally, we demonstrate the utility of genetic data linked with EHR to perform ancestry-specific and multi-ancestry genome- and phenome-wide scans across a broad set of disease phenotypes. Results: We identify 5 continental-scale GIA clusters including European American (EA), African American (AA), Hispanic Latino American (HL), South Asian American (SAA) and East Asian American (EAA) individuals and 7 subcontinental GIA clusters within the EAA GIA corresponding to Chinese American, Vietnamese American, and Japanese American individuals. Although we broadly find that self-identified race/ethnicity (SIRE) is highly correlated with GIA, we still observe marked differences between the two, emphasizing that the populations defined by these two criteria are not analogous. We find a total of 259 significant associations between continental GIA and phecodes even after accounting for individuals' SIRE, demonstrating that for some phenotypes, GIA provides information not already captured by SIRE. GWAS identifies significant associations for liver disease in the 22q13.31 locus across the HL and EAA GIA groups (HL p-value=2.32×10⁻¹⁶, EAA p-value=6.73×10⁻¹¹). A subsequent PheWAS at the top SNP reveals significant associations with neurologic and neoplastic phenotypes specifically within the HL GIA group. Conclusions: Overall, our results explore the interplay between SIRE and GIA within a disease context and underscore the utility of studying the genomes of diverse individuals through biobank-scale genotyping linked with EHR-based phenotyping.

PubMed Central

eScholarship - University of California

FigShare