
    Integration of genome-wide association studies with biological knowledge identifies six novel genes related to kidney function

    In conducting genome-wide association studies (GWAS), analytical approaches leveraging biological information may further understanding of the pathophysiology of clinical traits. To discover novel associations with estimated glomerular filtration rate (eGFR), a measure of kidney function, we developed a strategy for integrating prior biological knowledge into the existing GWAS data for eGFR from the CKDGen Consortium. Our strategy focuses on single nucleotide polymorphisms (SNPs) in genes that are connected by functional evidence, determined by literature mining and gene ontology (GO) hierarchies, to genes near previously validated eGFR associations. It then requires association thresholds consistent with multiple testing, and finally evaluates novel candidates by independent replication. Among the samples of European ancestry, we identified a genome-wide significant SNP in FBXL20 (P = 5.6 × 10⁻⁹) in meta-analysis of all available data, and additional SNPs at the INHBC, LRP2, PLEKHA1, SLC3A2 and SLC7A6 genes meeting multiple-testing corrected significance for replication, with overall P-values ranging from 4.5 × 10⁻⁴ to 2.2 × 10⁻⁷. Neither the novel PLEKHA1 nor FBXL20 associations, both further supported by association with eGFR among African Americans and with transcript abundance, would have been implicated by eGFR candidate gene approaches. LRP2, encoding the megalin receptor, was identified through connection with the previously known eGFR gene DAB2 and extends understanding of the megalin system in kidney function. These findings highlight integration of existing genome-wide association data with independent biological knowledge to uncover novel candidate eGFR associations, including candidates lacking known connections to kidney-specific pathways. The strategy may also be applicable to other clinical phenotypes, although more testing will be needed to assess its potential for discovery in genera
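    The multiple-testing step in the abstract above can be illustrated with a minimal sketch. It assumes a Bonferroni correction, which is one common choice; the abstract does not say which correction the authors applied. The helper name and the candidate SNP count are hypothetical.

```python
# Illustrative sketch only: Bonferroni is an assumed correction method,
# and n_candidate_snps is a made-up number, not a figure from the paper.

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Conventional genome-wide threshold: alpha = 0.05 spread over
# roughly one million independent tests, i.e. about 5e-8.
GENOME_WIDE = bonferroni_threshold(0.05, 1_000_000)

# The FBXL20 P-value reported in the abstract.
p_fbxl20 = 5.6e-9
print("FBXL20 genome-wide significant:", p_fbxl20 < GENOME_WIDE)

# Restricting the analysis to SNPs in biologically connected genes means
# fewer tests, so the per-SNP threshold becomes less stringent.
n_candidate_snps = 4_000  # hypothetical count
print("Candidate-set threshold:", bonferroni_threshold(0.05, n_candidate_snps))
```

    Under these assumptions, a SNP that misses the genome-wide threshold can still pass the threshold of the smaller candidate set, which is the mechanism that lets a knowledge-driven restriction surface additional associations.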

    Genetic prediction of quantitative traits: a machine learner's guide focused on height

    Machine learning and deep learning have had many successes when applied to biological problems, especially in the domain of protein folding. Another equally complex and important question has received relatively little attention from the machine learning community, namely the prediction of complex traits from genetics. Tackling this problem requires in-depth knowledge of the related genetics literature and awareness of various subtleties associated with genetic data. In this guide, we provide an overview for the machine learning community of current state-of-the-art models and the associated subtleties that need to be taken into consideration when developing new models for phenotype prediction. We use height as an example of a continuous-valued phenotype and provide an introduction to benchmark datasets, confounders, feature selection, and common metrics

    Applying Process-Oriented Data Science to Dentistry

    Background: Healthcare services now often follow evidence-based principles, so technologies such as process and data mining will help inform their drive towards optimal service delivery. Process mining (PM) can help the monitoring and reporting of this service delivery, measure compliance with guidelines, and assess effectiveness. In this research, PM extracts information about clinical activity recorded in dental electronic health records (EHRs) and converts it into process models, providing stakeholders with unique insights into the dental treatment process. This thesis addresses a gap in prior research by demonstrating how process analytics can enhance our understanding of these processes and the effects of changes in strategy and policy over time. It also emphasises the importance of a rigorous and documented methodological approach, which is often missing from the published literature. Aim: Apply the emerging technology of PM to an oral health dataset, illustrating the value of the data in the dental repository, and demonstrating how it can be presented in a useful and actionable manner to address public health questions. A subsidiary aim is to present the methodology used in this research in a way that provides useful guidance to future applications of dental PM. Objectives: Review the dental and healthcare PM literature to establish the state of the art. Evaluate existing PM methods and their applicability to this research’s dataset. Extend existing PM methods to achieve the aims of this research. Apply PM methods to the research dataset to address public health questions. Document and present this research’s methodology. Apply data mining, PM, and data visualisation to provide insights into the variable pathways leading to different outcomes. Identify the data needed for PM of a dental EHR. Identify challenges to PM of dental EHR data.
    Methods: Extend existing PM methods to facilitate PM research in public health by detailing how data extracts from a dental EHR can be effectively managed, prepared, and used for PM. Use existing dental EHR and PM standards to generate a data reference model for effective PM. Develop a data-quality management framework. Results: Comparing the outputs of PM to established care pathways showed that the dataset facilitated generation of high-level pathways but was less suitable for detailed guidelines. PM was used to identify the care pathway preceding a dental extraction under general anaesthetic, providing unique insights into this pathway and into the effects of policy decisions around school dental screenings. Conclusions: The research showed that PM and data-mining techniques can be applied to dental EHR data, leading to fresh insights about dental treatment processes. This emerging technology, along with established data-mining techniques, should provide valuable insights to policy makers such as principal and chief dental officers to inform care pathways and policy decisions

    Statistical methods for clinical genome interpretation with specific application to inherited cardiac conditions

    Background: While next-generation sequencing has enabled us to rapidly identify sequence variants, clinical application is limited by our ability to determine which rare variants impact disease risk. Aim: To develop computational methods to identify clinically important variants. Methods and Results: (1) I built a disease-specific variant classifier for inherited cardiac conditions (ICCs), which outperforms genome-wide tools in a wide range of benchmarks. It discriminates pathogenic variants from benign variants with global accuracy improved by 4-24% over existing tools. Variants classified with >90% confidence are significantly associated with both disease status and clinical outcomes. (2) To better interpret missense variants, I examined evolutionarily equivalent residues across protein domain families to identify positions intolerant of variation. Homologous residue constraint is a strong predictor of variant pathogenicity. It can identify a subset of de novo missense variants whose impact on developmental disorders is comparable to that of protein-truncating variants. Independent of existing approaches, it can also improve the prioritisation of disease-relevant genes for both developmental disorders and inherited hypertrophic cardiomyopathy. (3) TTN-truncating variants are known to cause dilated cardiomyopathy (DCM), but the effect of missense variants is poorly understood. Using the approach in (2), I studied the role of TTN missense variants in DCM. The prioritised residues are enriched with known pathogenic variants, including the two known to cause DCM and others involved in skeletal myopathies. I also found a significant association between constrained variants of TTN I-set domains and DCM in a case-control burden test of Caucasian samples (OR=3.2, 95%CI=1.3-9.4). Within subsets of DCM, the association is replicated in alcoholic cardiomyopathy. (4) Finally, I also developed a tool to annotate 5’UTR variants creating or disrupting upstream open reading frames (uORFs).
    Its utility is demonstrated by detecting high-impact uORF-disturbing variants from ClinVar, gnomAD and Genomics England. Conclusion: These studies established broadly applicable methods and improved understanding of ICCs.
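    The idea behind item (4) above, flagging 5’UTR variants that create or disrupt an upstream start codon, can be sketched in a few lines. This is a toy illustration and not the thesis tool: the function names and the example sequence are hypothetical, and a real annotator would also have to consider in-frame stop codons, Kozak context, strand and transcript coordinates.

```python
# Toy sketch under stated assumptions: function names and the example
# sequence are hypothetical and only upstream ATG gain/loss is checked.

def upstream_start_positions(utr5: str) -> list[int]:
    """Return the 0-based position of every ATG in a 5'UTR sequence."""
    seq = utr5.upper()
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]


def apply_snv(seq: str, pos: int, alt: str) -> str:
    """Return seq with the base at 0-based position pos replaced by alt."""
    if not 0 <= pos < len(seq):
        raise IndexError("pos outside sequence")
    if len(alt) != 1:
        raise ValueError("alt must be a single base")
    return seq[:pos] + alt + seq[pos + 1:]


def uorf_effect(utr5: str, pos: int, alt: str) -> dict[str, list[int]]:
    """List upstream ATGs created or disrupted by a single-base change."""
    before = set(upstream_start_positions(utr5))
    after = set(upstream_start_positions(apply_snv(utr5, pos, alt)))
    return {
        "created": sorted(after - before),
        "disrupted": sorted(before - after),
    }


utr = "CCACGGTTATGCC"  # hypothetical 5'UTR with one upstream ATG at index 8
print(uorf_effect(utr, 3, "T"))  # C>T at index 3 turns ACG into ATG
print(uorf_effect(utr, 9, "C"))  # T>C at index 9 destroys the existing ATG
```

    Comparing the sets of start positions before and after the substitution keeps the sketch symmetric: the same function reports both gained and lost upstream start codons.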

    Geeniinfo väärtus südame-veresoonkonnahaiguste riski hindamisel (The value of genetic information in assessing the risk of cardiovascular diseases)

    The electronic version of the dissertation does not include the publications. The fact that cardiovascular diseases are the leading cause of death worldwide underscores the need to advance and improve existing disease prevention and prediction strategies. In today's clinical practice, cardiovascular disease risk assessment relies on risk models that account for classical phenotypic risk factors. Although this strategy identifies high-risk individuals relatively well, the risk estimate for nearly a third remains inaccurate and the treatment decision unclear. In addition, the limited utility of the models is reflected in the fact that enumerating risk factors in effect assesses changes that have already occurred at the molecular level. The current strategy therefore tends to alleviate the advanced course of the pathology rather than suppress or prevent the disruption of molecular mechanisms at an early stage. One proposed improvement is to take the genetic information of the disease into account, primarily because genetic association studies of cardiovascular diseases have now produced estimates with the potential to make both early disease risk assessment in healthy individuals and the clinical management of patients substantially more accurate. The main aim of this doctoral thesis is to give an overview of current cardiovascular disease risk assessment measures and of whether and how incorporating genetic information into everyday clinical decisions could improve them. I also give examples of how high-resolution genome sequence data would allow trait-associated causal gene variants to be identified more precisely, and how using population-based biobank data would improve the clinical management of high-risk individuals.
    Cardiovascular diseases are the main cause of morbidity and mortality worldwide, underscoring the need for improved strategies for disease prevention and risk prediction.
The main approach applied in today's clinical practice to identify those at increased cardiovascular risk relies on phenotypic risk models that estimate an individual's disease risk from traditional risk factors. While this strategy helps prevent disease and, on the whole, targets high-risk individuals for treatment sufficiently well, a third of the individuals who experience an adverse event are misclassified into a lower risk category and therefore receive ambiguous treatment recommendations. Importantly, the current approach does not provide accurate estimates for primordial prevention, that is, estimating risk before risk factors emerge. To overcome this issue and enhance risk estimation, attention has now turned to genetics, with the aim of incorporating genetic information into established risk prediction strategies. The study of the genetic architecture of cardiovascular diseases over recent decades has produced estimates that can be of clinical utility and value. This doctoral thesis aims to give an overview of the status quo of genomic research on cardiovascular diseases and to consider what the advances in molecular technology, computational capacity and large-scale initiatives have enabled, what the progress of these endeavours entails, and whether they add incremental clinical value. Furthermore, I give examples of how high-coverage sequencing data can enhance the search for the genetic underpinnings of cardiovascular disease-associated phenotypes, and how large-scale cohorts and population-based biobanks can enable the anticipated improvement in disease risk estimation, especially when integrated into a national healthcare system. https://www.ester.ee/record=b522706