Phenotypic and genetic subtyping of hypertension – toward personalized hypertension care
Current knowledge of phenotypic and genotypic hypertension risk factors has not been effectively translated into personalized hypertension care. The aim of this thesis was to explore hypertension subtyping by applying publicly available supervised and unsupervised subtyping algorithms to large datasets with extensive phenotyping and genotyping.
This thesis included participants from two large Finnish studies: 32,442 from FINRISK and 218,792 from FinnGen. FINRISK is a cross-sectional population survey carried out every five years on risk factors for chronic, non-communicable diseases. FinnGen is a public-private partnership research project combining imputed genotype data from biobanks, patient cohorts, and prospective epidemiological surveys. Because every Finnish citizen is linked to health registers via a personal identity code, accurate follow-up is possible for all major end points, including hypertension and cardiovascular disease. In addition, we used publicly available genome-wide association data from several large-scale studies, including the UK Biobank.
In FINRISK, we observed a phenotypic hypertension subgroup characterized by high blood sugar and elevated body mass index, conferring an increased risk for cardiovascular disease. In a genotyped subset of FINRISK, systolic and diastolic blood pressure polygenic risk scores improved the predictive power of an externally validated clinical hypertension risk equation. Using publicly available genetic association data, we observed four genetic hypertension components corresponding to recognizable clinical features and demonstrated their clinical relevance in FINRISK and FinnGen.
In conclusion, the data support the existence of a hyperglycemic hypertension subtype and robust genetic hypertension subtypes. Our findings demonstrate both the current ability and the future potential of genetics, combined with methodological development, to improve personalized hypertension care.
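A polygenic risk score of the kind this abstract describes is, at its core, a weighted sum of risk-allele dosages, with per-variant GWAS effect sizes as the weights. A minimal sketch in Python; the variant IDs and effect sizes below are invented for illustration, not drawn from the thesis:

```python
# Minimal polygenic risk score (PRS): sum of effect size x allele dosage.
# All variant IDs and effect sizes are hypothetical placeholders.

GWAS_EFFECTS = {   # variant -> per-allele effect on blood pressure (illustrative)
    "rs0001": 0.35,
    "rs0002": -0.20,
    "rs0003": 0.50,
}

def polygenic_risk_score(dosages):
    """Weighted sum over scored variants; dosage is the allele count 0, 1, or 2."""
    return sum(beta * dosages.get(variant, 0) for variant, beta in GWAS_EFFECTS.items())

person = {"rs0001": 2, "rs0002": 1, "rs0003": 0}
print(round(polygenic_risk_score(person), 2))  # prints 0.5 (2*0.35 + 1*(-0.20) + 0)
```

In practice such a score would be standardized against a reference cohort and entered as one additional covariate in a clinical risk equation, which is how the abstract reports it improving predictive power.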
Electronic Health Record-Derived Phenotyping Models to Improve Genomic Research in Stroke
Stroke is a highly heterogeneous and complex disease that is a leading cause of death in the United States. The landscape of risk factors for stroke is vast, and its large genetic burden has yet to be fully discovered. We hypothesize that the small number of stroke variants recovered so far is due to 1) the vast phenotypic heterogeneity of stroke and 2) binary labeling of stroke genome-wide association study (GWAS) participants as cases or controls. Specifically, genome-wide association studies accumulate hundreds of thousands to millions of participants to acquire adequate signal for variant discovery. This requires time-consuming manual curation of cases and controls often involving large-scale collaborations. Genetic biobanks connected to electronic health records (EHR) can facilitate these studies by using data routinely captured during clinical care like billing diagnosis codes. These data, however, do not define adjudicated cases and controls, with many patients falling somewhere in between. There is an opportunity to use machine learning to add nuance to these definitions. We hypothesize that an expanded definition of disease by incorporating correlated diseases and risk factors from EHR data will improve GWAS power. We also hypothesize that granularly subtyping stroke using unsupervised learning methods can provide insight into stroke etiology and heterogeneity. In Chapter 1, we described the motivation for building upon current phenotyping methods for subtyping and genome-wide association studies to improve GWAS power. In Chapter 2, using patients from Columbia-New York Presbyterian (NYP) Hospital, we built and evaluated machine learning models to identify patients with acute ischemic stroke based on 75 different case-control and classifier combinations. 
In Chapter 3, we compared two data-driven, unsupervised methods, non-negative matrix factorization (NMF) and Hierarchical Poisson Factorization, to subtype stroke patients and determined whether any of the subtypes correlate with stroke severity. In Chapter 4, we estimated the heritability of acute ischemic stroke by treating the patient probabilities assigned by the machine learning phenotyping models from Chapter 2 as a quantitative trait and mapping the probabilities onto Columbia-NYP EHR-generated pedigrees. We also applied our machine learning phenotyping method, which we call QTPhenProxy, to venous thromboembolism in Columbia eMERGE Consortium patients and ran a genome-wide association study using the model probabilities as a quantitative trait. Finally, we applied QTPhenProxy to subjects in the UK Biobank for stroke and 14 other diseases and ran genome-wide association studies for each disease. We found that our machine-learned models performed well in identifying acute ischemic stroke patients in the Columbia-NYP EHR and in the UK Biobank. We also found some NMF-derived subtypes that were significantly correlated with stroke severity. We were underpowered in the eMERGE venous thromboembolism cohort GWAS and did not recover any known or new variants. Finally, we found that QTPhenProxy improved the power of GWAS of stroke and several subtypes in the UK Biobank, recovered known variants, and discovered a new variant that replicates in a previous stroke GWAS. Our results for QTPhenProxy demonstrate the promise of incorporating large but messy data sources, such as the electronic health record, to improve signal in genome-wide association studies.
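The core of the QTPhenProxy idea, as the abstract describes it, is to replace binary case/control labels with model-assigned disease probabilities and treat them as a quantitative trait. Per variant, the simplest association test is then an ordinary least-squares regression of the probability on allele dosage. A hedged sketch with invented data (real pipelines add covariates and significance testing):

```python
# Per-variant association test for a quantitative trait: regress the
# model-assigned disease probability on allele dosage. Data are invented.

def ols_slope(x, y):
    """Slope of the least-squares line y ~ x (single-variant effect estimate)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

dosages = [0, 1, 2, 0, 2, 1]                     # allele counts at one variant
probs = [0.10, 0.40, 0.85, 0.15, 0.90, 0.35]     # classifier-assigned probabilities
print(round(ols_slope(dosages, probs), 3))       # prints 0.375
```

A positive slope means carrying more copies of the allele tracks with a higher predicted probability of disease; the same regression is repeated for every variant genome-wide.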
Polygenic Risk Score for Cardiovascular Diseases in Artificial Intelligence Paradigm: A Review
Cardiovascular disease (CVD)-related mortality and morbidity heavily strain society. The relationship between external risk factors and our genetics has not been well established. It is widely acknowledged that environmental influence and individual behaviours play a significant role in CVD vulnerability, leading to the development of polygenic risk scores (PRS). We employed the PRISMA search method to locate pertinent research and literature to extensively review artificial intelligence (AI)-based PRS models for CVD risk prediction. Furthermore, we analyzed and compared conventional vs. AI-based solutions for PRS. We summarized the recent advances in our understanding of the use of AI-based PRS for risk prediction of CVD. Our study proposes three hypotheses: i) Multiple genetic variations and risk factors can be incorporated into AI-based PRS to improve the accuracy of CVD risk prediction. ii) AI-based PRS for CVD circumvents the drawbacks of conventional PRS calculators by incorporating a larger variety of genetic and non-genetic components, allowing for more precise and individualized risk estimations. iii) Using AI approaches, it is possible to significantly reduce the dimensionality of huge genomic datasets, resulting in more accurate and effective disease risk prediction models. Our study highlighted that the AI-PRS model outperformed traditional PRS calculators in predicting CVD risk. Furthermore, using AI-based methods to calculate PRS may increase the precision of risk predictions for CVD and have significant ramifications for individualized prevention and treatment plans.
Systems Analytics and Integration of Big Omics Data
A “genotype"" is essentially an organism's full hereditary information which is obtained from its parents. A ""phenotype"" is an organism's actual observed physical and behavioral properties. These may include traits such as morphology, size, height, eye color, metabolism, etc. One of the pressing challenges in computational and systems biology is genotype-to-phenotype prediction. This is challenging given the amount of data generated by modern Omics technologies. This “Big Data” is so large and complex that traditional data processing applications are not up to the task. Challenges arise in collection, analysis, mining, sharing, transfer, visualization, archiving, and integration of these data. In this Special Issue, there is a focus on the systems-level analysis of Omics data, recent developments in gene ontology annotation, and advances in biological pathways and network biology. The integration of Omics data with clinical and biomedical data using machine learning is explored. This Special Issue covers new methodologies in the context of gene–environment interactions, tissue-specific gene expression, and how external factors or host genetics impact the microbiome
Learning and validating clinically meaningful phenotypes from electronic health data
The ever-growing adoption of electronic health records (EHR) to record patients' health journeys has resulted in vast amounts of heterogeneous, complex, and unwieldy information [Hripcsak and Albers, 2013]. Distilling this raw data into clinical insights presents great opportunities and challenges for the research and medical communities. One approach to this distillation is called computational phenotyping. Computational phenotyping is the process of extracting clinically relevant and interesting characteristics from a set of clinical documentation, such as that which is recorded in electronic health records (EHRs). Clinicians can use computational phenotyping, which can be viewed as a form of dimensionality reduction where a set of phenotypes form a latent space, to reason about populations, identify patients for randomized case-control studies, and extrapolate patient disease trajectories. In recent years, high-throughput computational approaches have made strides in extracting potentially clinically interesting phenotypes from data contained in EHR systems.
Tensor factorization methods have shown particular promise in deriving phenotypes. However, phenotyping methods via tensor factorization have the following weaknesses: 1) the extracted phenotypes can lack diversity, which makes them more difficult for clinicians to reason about and utilize in practice, 2) many of the tensor factorization methods are unsupervised and do not utilize side information that may be available about the population or about the relationships between the clinical characteristics in the data (e.g., diagnoses and medications), and 3) validating the clinical relevance of the extracted phenotypes requires domain training and expertise. This dissertation addresses all three of these limitations. First, we present tensor factorization methods that discover sparse and concise phenotypes in unsupervised, supervised, and semi-supervised settings. Second, via two tools we built, we show how to leverage domain expertise in the form of publicly available medical articles to evaluate the clinical validity of the discovered phenotypes. Third, we combine tensor factorization and the phenotype validation tools to guide the discovery process to more clinically relevant phenotypes.
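A full tensor factorization example would run long, so the sketch below uses the closely related matrix case: non-negative matrix factorization with Lee-Seung multiplicative updates on a tiny, invented patient-by-code count matrix. It illustrates factorization-style phenotyping in general, not the dissertation's specific methods; each row of H reads as a "phenotype" (a weighted bundle of codes), and each row of W gives one patient's loading on those phenotypes.

```python
# NMF via Lee-Seung multiplicative updates, pure Python. V, the rank, and
# the patient-by-code interpretation are all invented for illustration.

import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(row) for row in zip(*A)]

def nmf(V, rank, iters=2000, seed=0, eps=1e-9):
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(rank)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(n)]
    return W, H

# Patients x diagnosis-code counts, constructed to contain two latent patterns.
V = [[1, 2, 0],
     [0, 1, 3],
     [1, 3, 3],
     [2, 4, 0]]
W, H = nmf(V, rank=2)
R = matmul(W, H)
err = sum((V[i][j] - R[i][j]) ** 2 for i in range(len(V)) for j in range(len(V[0]))) ** 0.5
print(round(err, 3))  # small, since V is exactly rank 2
```

The same alternating multiplicative idea carries over to tensors, where a third mode (e.g., medications) is factored jointly with patients and diagnoses.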
Computational modeling of aging-related gene networks: a review
The aging process is a complex and multifaceted phenomenon affecting all living organisms. It involves a gradual deterioration of tissue and cellular function, leading to a higher risk of developing various age-related diseases (ARDs), including cancer, neurodegenerative, and cardiovascular diseases. The gene regulatory networks (GRNs) and their respective niches are crucial in determining the aging rate. Unveiling these GRNs holds promise for developing novel therapies and diagnostic tools to enhance healthspan and longevity. This review examines GRN modeling approaches in aging, encompassing differential equations, Boolean/fuzzy logic decision trees, Bayesian networks, mutual information, and regression clustering. These approaches provide nuanced insights into the intricate gene-protein interactions in aging, unveiling potential therapeutic targets and ARD biomarkers. Nevertheless, outstanding challenges persist, demanding more comprehensive datasets and advanced algorithms to comprehend and predict GRN behavior accurately. Despite these hurdles, identifying GRNs associated with aging bears immense potential and is poised to transform our comprehension of human health and aging. This review aspires to stimulate further research in aging, fostering the innovation of computational approaches for promoting healthspan and longevity.
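Of the modeling approaches the review lists, Boolean logic networks are the easiest to sketch: each gene is simply on or off, all genes update synchronously under fixed regulatory rules, and repeated iteration settles into an attractor (a fixed point or a limit cycle). The three-gene rules below are invented for illustration, not a real circuit:

```python
# Toy Boolean gene regulatory network with synchronous updates.
# The regulatory logic is hypothetical.

def step(state):
    a, b, c = state
    return (
        not c,       # gene A is repressed by C
        a,           # gene B is activated by A
        a and b,     # gene C requires both A and B
    )

def attractor(state, max_steps=64):
    """Iterate until a state repeats; return the cycle (the attractor)."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
        if len(seen) > max_steps:
            raise RuntimeError("no recurrence found")
    return seen[seen.index(state):]

cycle = attractor((True, False, False))
print(len(cycle))  # prints 5: this start state lies on a five-state limit cycle
```

With n genes there are only 2^n states, so every trajectory must eventually revisit a state; attractors are the usual stand-in for stable cellular phenotypes in this formalism.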
Deep learning for health outcome prediction
Modern medical data contains rich information that allows us to make new types of inferences to predict health outcomes. However, the complexity of modern medical data has rendered many classical analysis approaches insufficient.
Machine learning with deep neural networks enables computational models to process raw data and learn useful representations with multiple levels of abstraction.
In this thesis, I present novel deep learning methods for health outcome prediction from brain MRI and genomic data.
I show that a deep neural network can learn a biomarker from structural brain MRI and that this biomarker provides a useful measure for investigating brain and systemic health, can augment neuroradiological research and potentially serve as a decision-support tool in clinical environments. I also develop two tensor methods for deep neural networks: the first, tensor dropout, for improving the robustness of deep neural networks, and the second, Kronecker machines, for combining multiple sources of data to improve prediction accuracy. Finally, I present a novel deep learning method for predicting polygenic risk scores from genome sequences by leveraging both local and global interactions between genetic variants.
These contributions demonstrate the benefits of using deep learning for health outcome prediction in both research and clinical settings.
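The thesis's tensor dropout is not reproduced here, but the standard inverted dropout it presumably generalizes fits in a few lines: during training, each activation is zeroed with probability p, and survivors are rescaled by 1/(1 - p) so the layer's expected output matches evaluation time. A framework-free, illustrative sketch:

```python
# Standard (inverted) dropout on a plain list of activations.
# This is the textbook baseline, not the thesis's tensor variant.

import random

def dropout(activations, p, rng=None):
    rng = rng or random.Random(0)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

print(dropout([1.0, 2.0, 3.0], p=0.0))  # p=0 keeps everything: [1.0, 2.0, 3.0]
```

Randomly deleting activations at training time prevents co-adaptation of units, which is the robustness benefit the thesis extends to structured (tensor-shaped) sets of parameters.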