137 research outputs found

    Genome annotation for clinical genomic diagnostics: strengths and weaknesses

    The Human Genome Project and advances in DNA sequencing technologies have revolutionized the identification of genetic disorders through the use of clinical exome sequencing. However, in a considerable number of patients, the genetic basis remains unclear. As clinicians begin to consider whole-genome sequencing, an understanding of the processes and tools involved and the factors to consider in the annotation of the structure and function of genomic elements that might influence variant identification is crucial. Here, we discuss and illustrate the strengths and weaknesses of approaches for the annotation and classification of important elements of protein-coding genes, other genomic elements such as pseudogenes and the non-coding genome, comparative-genomic approaches for inferring gene function, and new technologies for aiding genome annotation, as a practical guide for clinicians when considering pathogenic sequence variation. Complete and accurate annotation of structure and function of genome features has the potential to reduce both false-negative (from missing annotation) and false-positive (from incorrect annotation) errors in causal variant identification in exome and genome sequences. Re-analysis of unsolved cases will be necessary as newer technology improves genome annotation, potentially improving the rate of diagnosis

    Identification of genetic factors underpinning phenotypic heterogeneity in Huntington's disease and other neurodegenerative disorders

    Neurodegenerative diseases including Huntington’s disease (HD), the spinocerebellar ataxias and C9orf72 associated Amyotrophic Lateral Sclerosis / Frontotemporal dementia (ALS/FTD) do not present and progress in the same way in all patients. Instead there is phenotypic variability in age at onset, progression and symptoms. Understanding this variability is not only clinically valuable, but identification of the genetic factors underpinning this variability has the potential to highlight genes and pathways which may be amenable to therapeutic manipulation, hence help find drugs for these devastating and currently incurable diseases. Identification of genetic modifiers of neurodegenerative diseases is the overarching aim of this thesis. To identify genetic variants which modify disease progression it is first necessary to have a detailed characterization of the disease and its trajectory over time. In this thesis clinical data from the TRACK-HD studies, for which I collected data as a clinical fellow, was used to study disease progression over time in HD, and give subjects a progression score for subsequent analysis. In this thesis I show blood transcriptomic signatures of HD status and stage which parallel HD brain and overlap with Alzheimer’s disease brain. Using the Huntington’s disease progression score in a genome wide association study, both a locus on chromosome 5 tagging MSH3, and DNA handling pathways more broadly, are shown to modify HD progression: these results are explored. Transcriptomic signatures associated with HD progression rate are also investigated. In this thesis I show that DNA repair variants also modify age at onset in spinocerebellar ataxias (1, 2, 3, 6, 7 and 17), which are, like HD, caused by triplet repeat expansions, suggesting a common mechanism. 
Extending this thesis’ examination of the relationship between phenotype and genotype I show that the C9orf72 expansion, normally associated with ALS/FTD, is also the commonest cause of HD phenocopy presentations

    Improving CNV detection from short-read MPS data in neuromuscular disorders

    Neuromuscular disorders (NMDs) are highly heterogeneous, with around 1000 different subtypes reported. Most are genetic in origin, and some 500 genes have so far been identified as causing NMDs. Massively parallel sequencing (MPS) approaches have been widely used to increase the cost-effectiveness and diagnostic yield in the work-up of the genetic molecular diagnosis and to speed up the process. Copy number variants (CNVs), deletions and duplications larger than 50 base pairs, explain approximately 10% of the Mendelian disorders. No best-practice pipelines have yet been developed for CNV analysis from MPS data. Therefore, the detection and verification of CNV findings has often involved complementary methods, such as array comparative genomic hybridization (array CGH), multiplex ligation-dependent probe amplification (MLPA) and quantitative PCR approaches. Recently, various CNV detection programs have been developed, but for widely different types of designated research settings, which complicates choosing the correct approach for NMDs. These individual programs have generally exhibited less than ideal sensitivity and specificity for CNV detection. Our aim was to develop a comprehensive pipeline for the detection and annotation of CNVs with high accuracy from targeted gene panel sequencing and whole exome sequencing (WES) data of patients with NMDs. Four different CNV analysis programs were chosen for this study: CoNIFER, XHMM, ExomeDepth and CODEX. In their current panel versions, the targeted gene panel MYOcap includes 349 genes for myopathic disorders and MNDcap includes 302 genes for neurogenic disorders. In total, 2359 samples were sequenced with MYOcap, 942 samples with MNDcap and 262 samples with WES. For the targeted gene panels, this included 24 positive control samples with previously characterized CNVs and 31 negative control samples in which certain genes had been verified not to have CNVs. A detection sensitivity of 100% and specificity of 100% were reached for these control samples. 
Previously undetected CNVs from MYOcap or MNDcap sequenced samples were verified as true positive detections in 36 cases with MLPA, PCR or array CGH, and eight CNVs were verified as false positive detections. These and the positive control samples were utilized in validation of a predictive logistic regression model. In silico CNV generation into MYOcap sequenced samples provided 18,677 specific and 3892 nonspecific CNV detections to initially train the model. The model was trained to differentiate true positive detections from false positive detections in order to increase the specificity of the CNV detection pipeline. The advantage of using four different CNV detection programs compared to using them individually, or with any other combination, was demonstrated by CNV detection sensitivity from the set of in silico CNVs. The predictive model with variables from all four programs provided the highest sensitivity (96.6%) and specificity (87.5%) for predicting CNV detections correctly, indicating an accuracy of 95.5% (95% CI 87.3–99.1%). The CNV detection pipeline together with the predictive model was validated for WES data using control samples with 235 previously characterized CNVs. For CNVs spanning at least three exons, the detection sensitivity was 97.3% and the sensitivity of the predictive model was 99.3% after adjusting the model threshold for WES data. The CNV annotation platform cnvScan was expanded to contain the most recent CNV population databases as well as in-house CNV databases for all the sequenced sample sets. CNV detection results were filtered by < 1% frequency with reciprocal overlap of 90% in the common CNV population databases, by this criterion combined with < 5% frequency at 50% reciprocal overlap in the in-house CNV database, and by the true positive prediction with the model. These procedures significantly decreased the workload (with 3–13% of the original CNV detections preserved) in evaluating the CNVs further regarding clinical significance. 
The added value, i.e. the additional diagnostic yield from CNVs for both the targeted gene panel sequenced samples and WES samples was estimated to be 1.9%. Altogether 39 final genetic diagnoses were solved with these CNV findings. In addition, 18 patient cases had a likely pathogenic finding, and five had a heterozygous CNV likely pathogenic for a recessive disease without association to the patient’s phenotype. The clarified cases included six different DMD deletions or duplications causing dystrophinopathies. In three sequenced familial cases, the detected CNVs in CACNA1A, SGCD and TTN genes co-segregated with the disease. One case had two separate genetic diseases, tibial muscular dystrophy (TMD) and BMD, caused by the founder mutation FINmaj in the gene TTN and a deletion in DMD. Some of the solved cases had novel findings: the second ever reported large intragenic deletion in NEB causing dominant disease, and the first CNV, an intragenic deletion, in TIA1 in a patient diagnosed with Welander distal myopathy (WDM). Some of the genes associated with NMDs are challenging to analyze from short-read sequencing data due to homology or repetitive regions. An additional script was thus written to differentiate copy numbers of the highly homologous genes, SMN1 and SMN2. Two SMN1/SMN2 copy number 0/3 control cases were successfully recognized, and five cases were identified with a possible exon 7 conversion in SMN1 and a compatible spinal muscular atrophy phenotype. The latter findings were considered likely pathogenic and are awaiting further validation on the genomic level. Comparison of CNV detections within the in-house CNV database revealed divergences in the CNV detections within the triplicate repetitive region of NEB with potentially clinically significant changes. One array CGH validated change correlated well with the nemaline rod pathology observed in the patient. 
CNV analysis utilizing MPS data from targeted gene panels and WES samples provided increased diagnostic yield as reported also in other studies on NMDs. Our multi-algorithm and -platform approach decreased the workload in variant analysis and provided more insight into the many difficult to analyze genomic regions involved in NMDs. In the future, whole genome sequencing and long-read sequencing will likely provide higher resolution for CNV detections and reveal an even wider spectrum of structural genomic variants, together with other emerging comprehensive methods, such as optical mapping.
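The frequency filter described in the abstract above (for example, < 1% frequency at 90% reciprocal overlap) can be illustrated with a minimal sketch. This is not the cnvScan implementation: the `CNV` record, the function names, the half-open coordinate convention and the example database are all hypothetical, and the sketch ignores CNV type (deletion versus duplication) for brevity.

```python
from typing import Iterable, NamedTuple


class CNV(NamedTuple):
    """Hypothetical CNV record; coordinates are half-open [start, end)."""
    chrom: str
    start: int
    end: int
    freq: float = 0.0  # population frequency, meaningful for database entries only


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Smaller of the two fractions: shared length / length of a, shared length / length of b."""
    if a.chrom != b.chrom:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def passes_frequency_filter(call: CNV, database: Iterable[CNV],
                            max_freq: float, min_overlap: float) -> bool:
    """Keep a call unless a database CNV matches it at min_overlap and is at least max_freq common."""
    for entry in database:
        if reciprocal_overlap(call, entry) >= min_overlap and entry.freq >= max_freq:
            return False
    return True


# Illustrative data only (assumed, not taken from the thesis).
population_db = [CNV("chr2", 1_000, 2_000, freq=0.05)]
common_call = CNV("chr2", 1_050, 2_020)
novel_call = CNV("chr2", 5_000, 6_000)

print(passes_frequency_filter(common_call, population_db, max_freq=0.01, min_overlap=0.9))
print(passes_frequency_filter(novel_call, population_db, max_freq=0.01, min_overlap=0.9))
```

Requiring the overlap to be reciprocal means a small call nested inside a large database CNV is not treated as the same event, which is presumably why the stricter 90% threshold is paired with the population databases and the looser 50% threshold with the in-house database.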

    Leveraging big data resources and data integration in biology: applying computational systems analyses and machine learning to gain insights into the biology of cancers

    Recently, many "molecular profiling" projects have yielded vast amounts of genetic, epigenetic, transcription, protein expression, metabolic and drug response data for cancerous tumours, healthy tissues, and cell lines. We aim to facilitate a multi-scale understanding of these high-dimensional biological data and the complexity of the relationships between the different data types taken from human tumours. Further, we intend to identify molecular disease subtypes of various cancers, uncover the subtype-specific drug targets and identify sets of therapeutic molecules that could potentially be used to inhibit these targets. We collected data from over 20 publicly available resources. We then leverage integrative computational systems analyses, network analyses and machine learning, to gain insights into the pathophysiology of pancreatic cancer and 32 other human cancer types. Here, we uncover aberrations in multiple cell signalling and metabolic pathways that implicate regulatory kinases and the Warburg effect as the likely drivers of the distinct molecular signatures of three established pancreatic cancer subtypes. Then, we apply an integrative clustering method to four different types of molecular data to reveal that pancreatic tumours can be segregated into two distinct subtypes. We define sets of proteins, mRNAs, miRNAs and DNA methylation patterns that could serve as biomarkers to accurately differentiate between the two pancreatic cancer subtypes. Then we confirm the biological relevance of the identified biomarkers by showing that these can be used together with pattern-recognition algorithms to infer the drug sensitivity of pancreatic cancer cell lines accurately. Further, we evaluate the alterations of metabolic pathway genes across 32 human cancers. We find that while alterations of metabolic genes are pervasive across all human cancers, the extent of these gene alterations varies between them. 
Based on these gene alterations, we define two distinct cancer supertypes that tend to be associated with different clinical outcomes and show that these supertypes are likely to respond differently to anticancer drugs. Overall, we show that the time has already arrived where we can leverage available data resources to potentially elicit more precise and personalised cancer therapies that would yield better clinical outcomes at a much lower cost than is currently being achieved

    Bioinformatic approaches to determine pathogenicity and function of clinical genetic variants across ion channels and neurodevelopmental disorder associated genes

    Clinical genetic testing for rare monogenic diseases aims to identify disease-causing variants. Identification of the molecular etiology of the disease can already improve clinical care and is essential for the administration of precision medicines that are currently in development for many disorders. However, distinguishing pathogenic variants from benign genetic variants remains a challenge, in particular for missense variants where a single amino acid is substituted. The effect of a pathogenic variant on protein function, for example whether it causes a gain (GoF) or a loss (LoF) of function, is usually not understood, since most genetic variants are ultra-rare and have not been molecularly tested. In particular, for genes associated with severe developmental disorders, first-generation symptomatic treatments often offer only limited relief. Consequently, the development and application of targeted treatments that promise improvement is urgently needed. Identifying disease-causing pathogenic variants and predicting their function is crucial, as targeted therapies can only be administered to patients with classified pathogenic variants whose functional effects are known, to avoid adverse treatment outcomes. In this dissertation, I present bioinformatic approaches to enhance the assessment of variant pathogenicity and understanding of the functional effects of genetic variants. The developed approaches were applied on an exome-wide scale using public datasets and for selected disorders for which I had expert-curated clinical-genetic data available from collaborators. The major focus of this thesis is on genes implicated in neurodevelopmental disorders and diseases associated with ion channel dysfunction, for which collaboration with other research groups enabled the aggregation of the genetic, clinical, and functional datasets required to develop and test the bioinformatic approaches. 
In the first study (Bruenger and Ivaniuk et al., in preparation for submission to Genetics in Medicine), we developed a novel approach to extend the application of current variant interpretation guidelines as proposed by the American College of Medical Genetics and Genomics (ACMG). Currently, a major limitation of interpreting the pathogenicity of variants with the ACMG guidelines is the rare applicability of some of the proposed evidence criteria. We evaluated the potential of incorporating individual pathogenic variants observed in paralogous genes to extend the applicability of two criteria of the guidelines. Our results demonstrated that pathogenic variants in evolutionarily conserved paralogous genes can serve as evidence for a variant's pathogenicity and thus extend the current criteria's applicability more than fourfold. We further explored whether the selection of the paralogous pathogenic variants can be improved by incorporating phenotype information. We assembled a clinically well-defined cohort of patients with variants in voltage-gated sodium channels (VGSC) and identified phenotype correlations among paralogous genes based on the shared variant properties. By integrating these phenotype correlations into our proposed extension of the ACMG criteria, we demonstrated an enhanced ability to provide evidence for the pathogenicity of genetic variants in VGSC-encoding genes. In the second study (Brunklaus, Feng, and Bruenger et al., Brain, 2022), we examined whether experimentally obtained functional effects of variants in one VGSC-encoding gene could predict function in conserved variants in paralogous genes with high sequence similarity. We aggregated 437 in-vitro functionally tested variants from an intensive literature search and found that the functional effect across conserved variants in paralogous genes was conserved in 94% of cases. 
Our findings represent the first GoF versus LoF topological map of VGSC proteins, which could guide precision therapy as functionally tested variants are rare across VGSC. We integrated our findings into a publicly accessible webtool (http://SCN-viewer.broadinstitute.org) to facilitate functional variant interpretation across VGSC. In the third study (Bruenger et al., Brain, 2022), we systematically identified biological properties associated with variant pathogenicity across all major voltage and ligand-gated ion-channel families. We discovered and independently replicated that several pore residue properties and proximity to the pore axis were significantly enriched for pathogenic variants compared to population variants across all ion channels. Using a newly developed structural framework, we provide quantitative evidence that variants at the pore showed the strongest pathogenic variant enrichment. Moreover, we found that a hydrophobic pore environment was most strongly associated with variant pathogenicity. Finally, we showed that the identified biological properties correlated with in-vitro functional readouts from 679 variants and clinical phenotypes in 1,422 patients with neurodevelopmental disorders which were collected through collaboration with other research groups. In summary, we identified biological properties associated with ion-channel malfunction and show that these are correlated with in vitro functional readouts and clinical phenotypes in patients with neurodevelopmental disorders. Our results suggest that clinical decision support algorithms that predict variant pathogenicity and function are feasible in the future. In the fourth study (Iqbal and Bruenger et al., Brain, 2022), we developed a novel consensus approach that combines evolutionary and population-based genomic scores to identify 3D essential sites (Essential3D) on protein structures encoded by genes associated with neurodevelopmental disorders (NDDs). 
NDDs encompass severe clinical conditions caused by pathogenic variants in different genes. However, many of those genes were only recently associated with NDDs and are not well studied. We identified 14,377 Essential3D sites on protein structures encoded by 189 genes and found that these sites were eight-fold enriched for pathogenic variants versus population controls in an independent cohort of over 360,000 patient and population variants. The Essential3D sites offer insights into molecular mechanisms of protein function, such as key protein-protein interaction sites. The provided annotations are available at https://es-ndd.broadinstitute.org and will guide clinical variant interpretation. In summary, within these major studies in my Ph.D., we aggregated genetic, clinical, and functional datasets and developed bioinformatic approaches to enhance the assessment of variant pathogenicity and improve understanding of the functional effects of genetic variants on protein function. The advances made during my Ph.D. research demonstrate the power of integrating multiple data sources to study novel genetic variants and their implication for rare monogenic diseases. Our approaches specifically improve variant function and pathogenicity assessment in genes implicated in several severe diseases for which currently applied first-generation therapies cannot adequately lower the disease burden. Thus, our results contribute to a new era in precision medicine, where personalized treatments and improved clinical care become increasingly accessible to patients. Finally, the annotations developed in these studies can serve as a foundation for further studies, including the application of machine learning methods to predict variant pathogenicity and protein functional effects more accurately

    Towards the Next Generation of Clinical Decision Support: Overcoming the Integration Challenges of Genomic Data and Electronic Health Records

    The wide adoption of electronic health records (EHRs), the unprecedented abundance of genomic data, and the rapid advancements in computational methods have paved the way for next generation clinical decision support (NGCDS) systems. NGCDS provides significant opportunities for the prevention, early detection, and personalized treatment of complex diseases. The integration of genomic and EHR data into the NGCDS workflow faces significant challenges due to the high complexity and sheer magnitude of the associated data. This dissertation performs an in-depth investigation to address the computational and algorithmic challenges of integrating genomic and EHR data within the NGCDS workflow. In particular, the dissertation (i) defines the major genomic challenges NGCDS faces and discusses possible resolution directions, (ii) proposes an accelerated method for processing raw genomic data, (iii) introduces a data representation and compression method to store the processed genomic outcomes in a database schema, and finally, (iv) investigates the feasibility of using EHR data to produce accurate disease risk assessments. We hope that the proposed solutions will expedite the adoption of NGCDS and help advance the state of healthcare
