4 research outputs found

FasTag: Automatic text classification of unstructured medical narratives.

Author: Arturo Lopez Pineda
Ashley M Zehnder
Carlos D Bustamante
Guhan Ram Venkataraman
Manuel A Rivas
Oliver J Bear Don't Walk IV
Rodney L Page
Sandeep Ayyar
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2020

Unstructured clinical narratives are continuously being recorded as part of delivery of care in electronic health records, and dedicated tagging staff spend considerable effort manually assigning clinical codes for billing purposes. Despite these efforts, however, label availability and accuracy are both suboptimal. In this retrospective study, we aimed to automate the assignment of top-level International Classification of Diseases version 9 (ICD-9) codes to clinical records from human and veterinary data stores using minimal manual labor and feature curation. Automating top-level annotations could in turn enable rapid cohort identification, especially in a veterinary setting. To this end, we trained long short-term memory (LSTM) recurrent neural networks (RNNs) on 52,722 human and 89,591 veterinary records. We investigated the accuracy of both separate-domain and combined-domain models and probed model portability. We established relevant baseline classification performances by training Decision Trees (DT) and Random Forests (RF). We also investigated whether transforming the data using MetaMap Lite, a clinical natural language processing tool, affected classification performance. We showed that the LSTM-RNNs accurately classify veterinary and human text narratives into top-level categories with an average weighted macro F1 score of 0.74 and 0.68 respectively. In the "neoplasia" category, the model trained on veterinary data had a high validation accuracy in veterinary data and moderate accuracy in human data, with F1 scores of 0.91 and 0.70 respectively. Our LSTM method scored slightly higher than that of the DT and RF models. The use of LSTM-RNN models represents a scalable structure that could prove useful in cohort identification for comparative oncology studies. Digitization of human and veterinary health information will continue to be a reality, particularly in the form of unstructured narratives. 
Our approach is a step forward for these two domains to learn from and inform one another.

Directory of Open Access Journals

Nanoparticle enrichment mass-spectrometry proteomics identifies protein-altering variants for precise pQTL mapping

Author: Anna Halama
Asim Siddiqui
Frank Schmidt
Gaurav Thareja
Guhan Ram Venkataraman
Harendra Guturu
Hina Sarwath
Karsten Suhre
Khatereh Motamedchaboki
Margaret K. R. Donovan
Nisha Stephan
Serafim Batzoglou
Publication venue: Nature Portfolio
Publication date: 01/02/2024

Proteogenomics studies generate hypotheses on protein function and provide genetic evidence for drug target prioritization. Most previous work has been conducted using affinity-based proteomics approaches. These technologies face challenges, such as uncertainty regarding target identity, non-specific binding, and handling of variants that affect epitope affinity binding. Mass spectrometry-based proteomics can overcome some of these challenges. Here we report a pQTL study using the Proteograph™ Product Suite workflow (Seer, Inc.) where we quantify over 18,000 unique peptides from nearly 3000 proteins in more than 320 blood samples from a multi-ethnic cohort in a bottom-up, peptide-centric, mass spectrometry-based proteomics approach. We identify 184 protein-altering variants in 137 genes that are significantly associated with their corresponding variant peptides, confirming target specificity of co-associated affinity binders, identifying putatively causal cis-encoded proteins and providing experimental evidence for their presence in blood, including proteins that may be inaccessible to affinity-based proteomics.

Directory of Open Access Journals

Bayesian model comparison for rare-variant association studies

Author: Aguirre Matthew
Bustamante Carlos D.
Daly Mark J.
DeBoever Christopher
Ioannidis Alexander G.
Mostafavi Hakhamanesh
Pirinen Matti
Poterba Timothy
Rivas Manuel A.
Spencer Chris C. A.
Tanigawa Yosuke
Venkataraman Guhan Ram
Publication venue: Cell Press
Publication date: 24/11/2021

Whole-genome sequencing studies applied to large populations or biobanks with extensive phenotyping raise new analytic challenges. The need to consider many variants at a locus or group of genes simultaneously and the potential to study many correlated phenotypes with shared genetic architecture provide opportunities for discovery not addressed by the traditional one variant, one phenotype association study. Here, we introduce a Bayesian model comparison approach called MRP (multiple rare variants and phenotypes) for rare-variant association studies that considers correlation, scale, and direction of genetic effects across a group of genetic variants, phenotypes, and studies, requiring only summary statistic data. We apply our method to exome sequencing data (n = 184,698) across 2,019 traits from the UK Biobank, aggregating signals in genes. MRP demonstrates an ability to recover signals such as associations between PCSK9 and LDL cholesterol levels. We additionally find MRP effective in conducting meta-analyses in exome data. Non-biomarker findings include associations between MC1R and red hair color and skin color, IL17RA and monocyte count, and IQGAP2 and mean platelet volume. Finally, we apply MRP in a multi-phenotype setting; after clustering the 35 biomarker phenotypes based on genetic correlation estimates, we find that joint analysis of these phenotypes results in substantial power gains for gene-trait associations, such as in TNFRSF13B in one of the clusters containing diabetes- and lipid-related traits. Overall, we show that the MRP model comparison approach improves upon useful features from widely used meta-analysis approaches for rare-variant association analyses and prioritizes protective modifiers of disease risk.

PubMed Central

Helsingin yliopiston digitaalinen arkisto

Genetics of 35 blood and urine biomarkers in the UK Biobank

Author: Agarwala Vineeta
Aguirre Matthew
Amar David
Assimes Themistocles L.
Benner Christian
Daly Mark J.
Hastie Trevor
Havulinna Aki S.
Kiiskinen Tuomo
Mars Nina
Ollila Hanna M.
Pirruccello James P.
Pritchard Jonathan K.
Qian Junyang
Ripatti Samuli
Rivas Manuel A.
Rodriguez Fatima
Shcherbina Anna
Sinnott-Armstrong Nasa
Tanigawa Yosuke
Tibshirani Robert
Venkataraman Guhan Ram
Wainberg Michael
Publication venue: Springer Science and Business Media LLC

Julkari