5 research outputs found

    Connecting Repositories to the Global Research Community: A Re-Curation Process

    Over the last decade, significant changes have affected the work that data repositories of all kinds do. First, the emergence of globally unique and persistent identifiers (PIDs) has created new opportunities for repositories to engage with the global research community by connecting existing repository resources to the global research infrastructure. Second, repository use cases have evolved from data discovery alone to data discovery and reuse, significantly increasing metadata requirements. To respond to these evolving requirements, we need retrospective and ongoing curation (i.e., re-curation) processes that 1) find identifiers and add them to existing metadata to connect datasets to a wider range of communities, and 2) add elements that support reuse to globally connected metadata. The goal of this work is to introduce the concept of re-curation with representative examples that are generally applicable to many repositories: 1) increasing completeness of affiliations and identifiers for organizations and funders in the Dryad Repository, and 2) measuring and increasing FAIRness of DataCite metadata beyond required fields for institutional repositories. These re-curation efforts are a critical part of reshaping existing metadata and repository processes so they can take advantage of new connections, engage with global research communities, and facilitate data reuse.

    Generation and Applications of Knowledge Graphs in Systems and Networks Biology

    The acceleration in the generation of data in the biomedical domain has necessitated the use of computational approaches to assist in its interpretation. However, these approaches rely on the availability of high-quality, structured, formalized biomedical knowledge. This thesis has two goals: to improve methods for curation and semantic data integration in order to generate high-granularity biological knowledge graphs, and to develop novel methods for using prior biological knowledge to propose new biological hypotheses. The first two publications describe an ecosystem for handling biological knowledge graphs encoded in the Biological Expression Language throughout the stages of curation, visualization, and analysis. The next two publications describe the reproducible acquisition and integration of high-granularity knowledge with low contextual specificity from structured biological data sources on a massive scale, and support the semi-automated curation of new content at high speed and precision. After building the ecosystem and acquiring content, the last three publications in this thesis demonstrate three different applications of biological knowledge graphs in modeling and simulation. The first demonstrates the use of agent-based modeling for simulation of neurodegenerative disease biomarker trajectories using biological knowledge graphs as priors. The second applies network representation learning to prioritize nodes in biological knowledge graphs based on corresponding experimental measurements to identify novel targets. Finally, the third uses biological knowledge graphs and develops algorithms to deconvolute the mechanism of action of drugs, which could also serve to identify drug repositioning candidates. Ultimately, this thesis lays the groundwork for production-level applications of drug repositioning algorithms and other knowledge-driven approaches to analyzing biomedical experiments.

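    Utvrđivanje povezanosti genotipa i fenotipa hipertrofične kardiomiopatije primenom mašinskog učenja (Determining genotype-phenotype associations in hypertrophic cardiomyopathy using machine learning)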

    Hypertrophic cardiomyopathy (HCM) is the most prevalent heritable cardiomyopathy. HCM is diagnosed by the presence of left ventricular hypertrophy in the absence of abnormal loading conditions that would explain it. HCM is genetically heterogeneous, and its clinical manifestations and prognosis vary widely: some patients are completely asymptomatic, while others develop severe heart failure or sudden cardiac death. Definitive genotype-phenotype associations are still unknown. Machine learning (ML) is a subdiscipline of artificial intelligence in which computer algorithms learn complex patterns from data. The aim of this research was to decipher genotype-phenotype associations in HCM using ML. The study was multi-centric and retroprospective, and involved 143 adult HCM patients. Medical and family history, anthropometric measurements, genetic testing, blood markers, transthoracic echocardiography with Doppler, cardiopulmonary exercise testing (CPET), ECG, and ECG Holter monitoring data were collected and analysed. HCM subphenotypes were identified using clustering. Associations between genotype and phenotype were evaluated using the Python modules Scikit-learn and SHapley Additive exPlanations (SHAP). Genotype-specific echocardiogram findings were identified using the Python deep learning (DL) and computer vision library Fast AI, by training DL models to classify ultrasound images and then analysing the most decisive image regions.
Four HCM subtypes were identified based on the overall phenotypic appearance: cluster 0 (“AHOLD”), distinguishable by aortic root diameter (AO) and lactate dehydrogenase (LDH), with values mostly AO > 30 mm and LDH > 300 U/L; cluster 1 (“RVSP ASCAOVS”), distinguishable by right ventricle systolic pressure (RVSP), diameter of the ascending aorta (AscAO), and aortic leaflet separation diameter (AOvs), with values of RVSP [...], AscAO [...], and AOvs > 27 m/s; cluster 2 (“weight”), recognizable by weight, with values mostly > 95 kg; and cluster 3 (“AV LVOT PG”), distinguishable by aortic valve mean pressure gradient (AV meanPG), aortic valve peak pressure gradient (AV maxPG), and left ventricular outflow tract peak gradient (LVOT maxPG), with AV maxPG > 15 mmHg, AV meanPG > 6 mmHg, and LVOT maxPG > 15 mmHg. The ML algorithms confirmed that determining genotype-phenotype associations in HCM is a difficult task. Two phenotypic outcomes can be predicted from mutated genes alone: the absence or presence of sinus rhythm and the absence or presence of myocardial injury. Models predicting sinus rhythm performed similarly whether they were built using only causative genes or all analyzed genes, indicating the potential importance of causative genes and the irrelevance of non-causative genes for that outcome. In contrast, models predicting myocardial injury (infarction) performed better when they were built using all analyzed genes rather than only causative ones, indicating a potentially significant role of non-causative genes in that outcome.
Using genotypic and phenotypic data together, the ML algorithms were able to predict the following phenotypic outcomes: fatigue, dyspnea, chest pain, palpitations, syncope, heart murmur, pretibial edema, systolic anterior motion, papillary muscle abnormalities, hypokinesia, atrial fibrillation (AF), first-degree atrioventricular (AV) block, left bundle branch block (LBBB), right bundle branch block (RBBB), left anterior hemiblock, ST segment abnormalities, and negative T wave. The combination of a mutation in TNNT2 and peak respiratory exchange ratio (RER) contributed the most to predicting fatigue. The combination of a mutation in MYBPC3 and peak VO2 contributed the most to predicting dyspnea. The combination of a mutation in TNNI3 and high-density lipoprotein (HDL) level contributed the most to predicting chest pain. The combination of a mutation in MYH7 and pacemaker/defibrillator implants in family history, as well as the combination of a mutation in TNNT2 and left atrial volume (LAV), contributed the most to predicting heart murmur. Lastly, the combination of a mutation in MYBPC3 and transmitral maximal pressure gradient (MV maxPG) contributed the most to predicting negative T wave. Genotype-specific echocardiogram findings were identified: for mutations in the MYH7 gene (vs. mutation not detected), the most discriminative structures are the left ventricular outflow tract, septum, anterior wall, apex, right ventricle, and mitral apparatus; for mutations in the TNNT2 gene (vs. mutation not detected), they are the septum and right ventricle; and for mutations in the MYBPC3 gene (vs. mutation not detected), they are the septum, left ventricle, and left ventricle chamber. ML has thus been demonstrated to be useful in deciphering genotype-phenotype associations in HCM.
