6 research outputs found

Deep learning-based phenotype imputation on population-scale biobank data increases genetic discoveries.

Author: An Ulzee
Publication date: 23/12/2023

Ezid

Deep learning-based phenotype imputation on population-scale biobank data increases genetic discoveries.

Author: Alvarez Marcus
An Ulzee
Bacanu Silviu
Cai Na
Dahl Andy
Flint Jonathan
Huang Lianyun
Kendler Kenneth
Pajukanta Päivi
Pazokitoroudi Ali
Sankararaman Sriram
Schork Andrew
Zaitlen Noah
Publication venue: eScholarship, University of California
Publication date: 01/12/2023

Biobanks that collect deep phenotypic and genomic data across many individuals have emerged as a key resource in human genetics. However, phenotypes in biobanks are often missing across many individuals, limiting their utility. We propose AutoComplete, a deep learning-based imputation method to impute, or fill in, missing phenotypes in population-scale biobank datasets. When applied to collections of phenotypes measured across ~300,000 individuals from the UK Biobank, AutoComplete substantially improved imputation accuracy over existing methods. On three traits with notable amounts of missingness, we show that AutoComplete yields imputed phenotypes that are genetically similar to the originally observed phenotypes while increasing the effective sample size by about twofold on average. Further, genome-wide association analyses on the resulting imputed phenotypes led to a substantial increase in the number of associated loci. Our results demonstrate the utility of deep learning-based phenotype imputation to increase power for genetic discoveries in existing biobank datasets.
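As a rough illustration of how imputation accuracy can be checked, the sketch below hides one observed value, fills every missing entry, and compares the filled value with the hidden truth. It uses column-mean imputation as a stand-in baseline; it is not AutoComplete and is not taken from the paper. The toy table and the function names `column_means` and `impute_with_means` are hypothetical.

```python
# Minimal sketch: evaluate an imputation method by masking an observed entry.
# Assumption: column-mean imputation stands in for the real method (not AutoComplete).

def column_means(rows):
    """Mean of the observed (non-None) values in each column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return means

def impute_with_means(rows, means):
    """Replace every missing entry (None) with its column mean."""
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]

# Toy phenotype table: rows are individuals, columns are phenotypes, None is missing.
phenotypes = [
    [1.0, 2.0],
    [3.0, None],
    [5.0, 6.0],
    [None, 4.0],
]

# Hide one observed entry so the imputed value can be scored against the truth.
held_out_row, held_out_col = 0, 1
truth = phenotypes[held_out_row][held_out_col]
masked = [row[:] for row in phenotypes]
masked[held_out_row][held_out_col] = None

means = column_means(masked)
imputed = impute_with_means(masked, means)
error = abs(imputed[held_out_row][held_out_col] - truth)
print(f"imputed={imputed[held_out_row][held_out_col]}, truth={truth}, abs error={error}")
```

In practice many entries would be masked at once and the errors aggregated per phenotype, so that methods can be compared on the same held-out values.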

eScholarship - University of California

Phenotype integration improves power and preserves specificity in biobank-based genetic studies of major depressive disorder

Author: An Ulzee
Appadurai Vivek
Bacanu Silviu-Alin
Border Richard
Cai Na
Dahl Andrew
Flint Jonathan
Kendler Kenneth S
Krebs Morten
Sankararaman Sriram
Schork Andrew J
Thompson Michael
Werge Thomas
Publication venue: eScholarship, University of California
Publication date: 01/12/2023

Biobanks often contain several phenotypes relevant to diseases such as major depressive disorder (MDD), with partly distinct genetic architectures. Researchers face complex tradeoffs between shallow (large sample size, low specificity/sensitivity) and deep (small sample size, high specificity/sensitivity) phenotypes, and the optimal choices are often unclear. Here we propose to integrate these phenotypes to combine the benefits of each. We use phenotype imputation to integrate information across hundreds of MDD-relevant phenotypes, which significantly increases genome-wide association study (GWAS) power and polygenic risk score (PRS) prediction accuracy of the deepest available MDD phenotype in UK Biobank, LifetimeMDD. We demonstrate that imputation preserves specificity in its genetic architecture using a novel PRS-based pleiotropy metric. We further find that integration via summary statistics also enhances GWAS power and PRS predictions, but can introduce nonspecific genetic effects depending on input. Our work provides a simple and scalable approach to improve genetic studies in large biobanks by integrating shallow and deep phenotypes.

eScholarship - University of California

A machine learning algorithm to increase COVID-19 inpatient diagnostic capacity.

Author: Akos Rudas
Brandon Jew
Brian L Hill
Brunilda Balliu
David Goodman-Meza
Eleazar Eskin
Elior Rahmani
Eran Halperin
Faysal G Saab
Jeffrey N Chiang
Jennifer A Fulcher
Joseph Ebinger
Misagh Kordi
Nancy Sun
Patrick Botting
Paul C Adamson
Rachel Brook
Ulzee An
Vladimir Manuel
Zeyuan Chen
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2020

Worldwide, testing capacity for SARS-CoV-2 is limited and bottlenecks in the scale up of polymerase chain reaction (PCR)-based testing exist. Our aim was to develop and evaluate a machine learning algorithm to diagnose COVID-19 in the inpatient setting. The algorithm was based on basic demographic and laboratory features to serve as a screening tool at hospitals where testing is scarce or unavailable. We used retrospectively collected data from the UCLA Health System in Los Angeles, California. We included all emergency room or inpatient cases receiving SARS-CoV-2 PCR testing who also had a set of ancillary laboratory features (n = 1,455) between 1 March 2020 and 24 May 2020. We tested seven machine learning models and used a combination of those models for the final diagnostic classification. In the test set (n = 392), our combined model had an area under the receiver operating characteristic curve of 0.91 (95% confidence interval 0.87-0.96). The model achieved a sensitivity of 0.93 (95% CI 0.85-0.98) and a specificity of 0.64 (95% CI 0.58-0.69). We found that our machine learning algorithm had excellent diagnostic metrics compared to SARS-CoV-2 PCR. This ensemble machine learning algorithm to diagnose COVID-19 has the potential to be used as a screening tool in hospital settings where PCR testing is scarce or unavailable.
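For readers unfamiliar with the reported metrics, the sketch below shows how sensitivity and specificity are computed for a combined classifier. The abstract does not say how the seven models were combined, so simple probability averaging with a 0.5 threshold is an assumption here, and the toy probabilities, labels, and function names (`ensemble_mean`, `sensitivity_specificity`) are hypothetical rather than taken from the study.

```python
# Minimal sketch: combine several models' probabilities and score the result.
# Assumptions: averaging as the combination rule, 0.5 as the decision threshold.

def ensemble_mean(prob_lists):
    """Average the per-model predicted probabilities for each case."""
    n_models = len(prob_lists)
    return [sum(ps) / n_models for ps in zip(*prob_lists)]

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Assumes both classes occur in y_true, so neither denominator is zero.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# Toy data: three models, four cases (1 = PCR positive, 0 = PCR negative).
model_probs = [
    [0.9, 0.2, 0.7, 0.6],
    [0.8, 0.4, 0.5, 0.6],
    [0.7, 0.3, 0.9, 0.5],
]
y_true = [1, 0, 1, 0]

combined = ensemble_mean(model_probs)
y_pred = [1 if p >= 0.5 else 0 for p in combined]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Lowering the threshold trades specificity for sensitivity, which is the usual choice for a screening tool where missing a positive case is costlier than a false alarm.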

Directory of Open Access Journals

eScholarship - University of California

A machine learning algorithm to increase COVID-19 inpatient diagnostic capacity.

Author: Adamson Paul C
An Ulzee
Balliu Brunilda
Botting Patrick
Brook Rachel
Chen Zeyuan
Chiang Jeffrey N
Ebinger Joseph
Eskin Eleazar
Fulcher Jennifer A
Goodman-Meza David
Halperin Eran
Hill Brian L
Jew Brandon
Kordi Misagh
Manuel Vladimir
Rahmani Elior
Rudas Akos
Saab Faysal G
Sun Nancy
Publication venue: eScholarship, University of California
Publication date: 01/01/2020

eScholarship - University of California