792 research outputs found

    Detecting Early Warning Signal of Influenza A Disease Using Sample-Specific Dynamical Network Biomarkers


    Integrative Analysis to Investigate Complex Interaction in Alzheimer’s Disease

    Alzheimer’s disease (AD) is a neurodegenerative disorder featuring progressive cognitive and functional deficits. Pathologically, AD is characterized by tau and amyloid β protein deposition in the brain. AD is the sixth leading cause of death in the U.S., and the disease course usually lasts 7 to 10 years before death. In 2019, an estimated 5.8 million Americans were living with AD, affecting 16 million family members. At a certain stage of the disease, patients who can no longer maintain daily functioning depend heavily on caregivers, primarily family members, who provide an estimated 18.4 billion hours of unpaid care, equivalent to 232 billion dollars. This economic burden and the emotional distress on families and society are expected to grow, as the AD-affected population could triple by 2050. Altered cellular composition, such as neuronal loss and astrocytosis, is associated with AD progression and cognitive decline; it is a key feature of neurodegeneration but has often been overlooked in transcriptome research. To explore the cellular composition changes in AD, I developed a deconvolution pipeline for bulk RNA-Seq to account for cell-type-specific effects in brain tissues. I found that neuronal and astrocyte relative proportions differ between healthy and diseased brains and also among AD cases that carry specific genetic risk variants. Brains from carriers of pathogenic mutations in APP, PSEN1, or PSEN2 presented lower neuron and higher astrocyte relative proportions compared to sporadic AD. Similarly, APOE ε4 carriers showed decreased neuronal and increased astrocyte relative proportions compared to AD non-carriers. In contrast, carriers of TREM2 risk variants showed a lower degree of neuronal loss compared to matched AD cases in multiple independent studies.
These findings suggest that genetic risk factors associated with AD etiology have a specific effect on the cellular composition of AD brains. The digital deconvolution approach provides an enhanced understanding of the fundamental molecular mechanisms underlying neurodegeneration, enabling the analysis of large bulk RNA-sequencing studies for cell composition. It also suggests that correcting for cellular composition when performing transcriptomic analysis will lead to novel insights into AD. Using deconvolution methods to delineate cell population changes in disease conditions helps interpret transcriptomic results and reveal transcriptional changes in a cell-type-specific manner. One application demonstrated in this dissertation is the use of cell type proportion as a quantitative trait to identify genetic factors associated with cellular composition changes. I performed cell type QTL analysis and identified a common pathway associated with neuronal protection in aging brains, in the presence or absence of neurodegenerative disease symptoms. A variant of TMEM106B, previously identified as protective in FTD, was found to be associated with neuronal proportion in aging brains, suggesting a common pathway underlying neuronal protection and cognitive reserve in the elderly. This extended analysis of the deconvolution results demonstrates a promising direction: deconvolution followed by cell type QTL analysis to identify new genes or pathways underlying neurodegeneration or brain aging. To understand the complexity of the brain under disease conditions, network analysis, as a large-scale system-level approach, provides an unbiased, data-driven view for identifying gene-gene interactions altered by disease status.
Using network analysis, I replicated the co-expression pattern between the MS4A gene cluster and TREM2 in sporadic AD. Bayesian network analysis provided further evidence that MS4A4A might be a regulator of TREM2, which was validated by in vitro experiments. In an autosomal dominant AD (ADAD) cohort, disrupted and acquired genes were identified in PSEN1 mutation carriers. Among these genes, previously identified AD risk genes and pathways were revealed along with novel findings. These results demonstrate the great potential of the network approach for identifying disease-associated genes and the interactions among them. To conclude the dissertation at the methodological, empirical, and theoretical levels: a deconvolution pipeline for bulk RNA-Seq, cell type QTL analysis, and network analysis were applied to understand the transcriptome changes underlying disease etiology. Previous AD-related findings were replicated, which validated the methods, and novel genes and pathways were identified as potential new therapeutic targets. Based on prior knowledge and the empirical evidence from this dissertation, a model is proposed to explain how genetic factors assemble into a highly interconnected interactome network that affects the proteinopathy observed in neurodegenerative disorders, which causes cellular composition changes in the brain and ultimately leads to the cognitive and functional deficits observed in AD patients.
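
The idea of estimating cell type proportions from bulk expression can be illustrated with a toy two-cell-type model. This is only a minimal sketch and not the dissertation's pipeline: the function name, the reference profiles, and the expression values are all made up for illustration, and it assumes the bulk profile is a simple linear mixture of a neuron and an astrocyte reference.

```python
def estimate_neuron_fraction(bulk, neuron_ref, astro_ref):
    """Least-squares estimate of p in: bulk = p * neuron_ref + (1 - p) * astro_ref.

    Hypothetical helper for illustration only. The result is clipped to [0, 1].
    """
    # Rewrite the model as (bulk - astro_ref) = p * (neuron_ref - astro_ref)
    diff = [n - a for n, a in zip(neuron_ref, astro_ref)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("reference profiles are identical")
    num = sum((b - a) * d for b, a, d in zip(bulk, astro_ref, diff))
    return min(1.0, max(0.0, num / denom))


# Made-up marker-gene expression for each pure cell type
neuron_ref = [10.0, 8.0, 1.0, 0.5]
astro_ref = [1.0, 0.5, 9.0, 7.0]

# Simulated bulk sample: 70% neurons, 30% astrocytes
bulk = [0.7 * n + 0.3 * a for n, a in zip(neuron_ref, astro_ref)]
p = estimate_neuron_fraction(bulk, neuron_ref, astro_ref)
```

With more than two cell types the same idea is usually posed as a constrained regression over a reference matrix; the closed form above only holds for the two-component case.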

    Application of early warning signs to physiological contexts: a comparison of multivariate indices in patients on long-term hemodialysis

    Early warning signs (EWSs) can anticipate abrupt changes in system state, known as “critical transitions,” by detecting dynamic variations, including increases in variance, autocorrelation (AC), and cross-correlation. Numerous EWSs have been proposed, yet there is no consensus on which perform best. Here, we compared 15 multivariate EWSs in time series of 763 hemodialyzed patients, previously shown to present relevant critical transition dynamics. We calculated five EWSs based on AC, six on variance, one on cross-correlation, and three on AC and variance. We assessed their pairwise correlations, trends before death, and mortality predictive power, alone and in combination. Variance-based EWSs showed stronger correlations (r = 0.663 ± 0.222 vs. 0.170 ± 0.205 for AC-based indices) and a steeper increase before death. Two variance-based EWSs yielded HR95 > 9 (HR95 standing for a scale-invariant metric of hazard ratio), but combining them did not improve the area under the receiver-operating curve (AUC) much compared to using them alone (AUC = 0.798 vs. 0.796 and 0.791). Nevertheless, the AUC reached 0.825 when combining 13 indices. While some indicators did not perform particularly well alone, adding them to the best-performing EWSs increased the predictive power, suggesting that combining indices captures a broader range of dynamic changes occurring within the system. It is unclear whether this added benefit reflects measurement error of a unified phenomenon or heterogeneity in the nature of signals preceding critical transitions. Finally, the modest predictive performance and weak correlations among some indices call into question their validity, at least in this context.
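
The two basic ingredients named in this abstract, variance and autocorrelation, can be sketched for a single variable as follows. This is a simplified univariate illustration, not one of the 15 multivariate indices the study compares; the function names, the window size, and the series values are assumptions made up for the example.

```python
from statistics import mean, pvariance


def lag1_autocorrelation(window):
    """Lag-1 autocorrelation of a sequence (0.0 if the sequence is constant)."""
    m = mean(window)
    denom = sum((x - m) ** 2 for x in window)
    if denom == 0:
        return 0.0
    num = sum((a - m) * (b - m) for a, b in zip(window, window[1:]))
    return num / denom


def rolling_ews(series, window_size):
    """Return one (variance, lag-1 AC) pair per sliding window over the series."""
    out = []
    for start in range(len(series) - window_size + 1):
        w = series[start:start + window_size]
        out.append((pvariance(w), lag1_autocorrelation(w)))
    return out


# Made-up series whose fluctuations grow over time
series = [0.0, 0.1, -0.1, 0.2, -0.2, 0.5, -0.4, 0.9, -0.8, 1.5]
indicators = rolling_ews(series, window_size=5)
```

A rising trend in either column of `indicators` is the kind of signal an EWS looks for; in practice the window size and any detrending step would need to be chosen for the data at hand.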

    Mechanism-driven hypothesis generation support for a predictive adverse effect in colorectal cancer treatment

    This bioinformatics thesis describes hypothesis generation in tumor biology, based on a use case in colorectal cancer. The work is motivated by an observation from clinical practice. Various authors report that, upon treatment with inhibitors of the Epidermal Growth Factor Receptor (EGFR), in particular the therapeutic antibody Cetuximab, a minority of patients show the common adverse effect of skin toxicity only in clearly reduced form or not at all. For these patients, a reduced efficacy of the therapy is described at the same time. The absence of the adverse effect is therefore used as a phenotypic biomarker to prompt a switch of therapy. However, this has drawbacks: preventive skin care during treatment counteracts the biomarker signal, and the therapy must be started before the information can be obtained. As the underlying molecular mechanisms remain elusive, predictions ahead of treatment, e.g. by a clinical test, are not yet possible.
In the presented work, the aim was to generate hypotheses about which proteins and cellular signaling pathways might be causal for the differing response of the patient groups. Starting from the assumption that naturally occurring germline variants functionally discriminate individuals in the context of the treatment, the thesis builds on a small dataset of 23 exomes from participants in clinical studies. These sequencing data were processed into genomic variants and analyzed for their potential influence at the mechanistic level. Targeted restrictions were introduced by modeling the biomedical context of the use case in order to enrich the sparse data with further information. The resulting candidate genes, which still need to be validated in practical studies, are described and evaluated in detail. Methodologically, the result of the thesis is the "Molecular Systems Map", a network structure modeled in Cytoscape that interactively visualizes the functional interactions of proteins and simultaneously filters the called variants by biological context. The aim is to enable biomedical domain experts to review their experimental data in its functional context, rather than scrolling through tabular information on called variants, and so to support them in generating hypotheses. Additionally, this makes it possible to apply graph algorithms and to integrate further data, e.g. from complementary 'omics experiments.

    Biomarker lists stability in genomic studies: analysis and improvement by prior biological knowledge integration into the learning process

    The analysis of high-throughput sequencing, microarray and mass spectrometry data has proved extremely helpful for identifying the genes and proteins, called biomarkers, that are useful for answering both diagnostic/prognostic and functional questions. In this context, robustness of the results is critical both to understand the biological mechanisms underlying diseases and to gain sufficient reliability for clinical/pharmaceutical applications. Recently, different studies have shown that the lists of identified biomarkers are poorly reproducible, so the validation of biomarkers as robust predictors of a disease remains an open issue. These differences are attributable both to data dimensions (few subjects with respect to the number of features) and to the heterogeneity of complex diseases, characterized by alterations of multiple regulatory pathways and of the interplay between different genes and the environment. Typically, in an experimental design, the data to analyze come from different subjects and different phenotypes (e.g. normal and pathological). The most widely used methodologies for identifying significant disease-related genes from microarray data compute differential gene expression between phenotypes by univariate statistical tests. Such an approach provides information on the effect of specific genes as independent features, whereas it is now recognized that the interplay among weakly up/down regulated genes, although not significantly differentially expressed, might be extremely important for characterizing a disease status. Machine learning algorithms are, in principle, able to identify multivariate nonlinear combinations of features and thus have the potential to select a more complete set of experimentally relevant features.
In this context, supervised classification methods are often used to select biomarkers, and different methods, such as discriminant analysis, random forests and support vector machines, among others, have been used, especially in cancer studies. Although high accuracy is often achieved in classification approaches, the reproducibility of biomarker lists remains an open issue: many possible sets of biological features (i.e. genes or proteins) can be considered equally relevant in terms of prediction, so stability can in principle be lacking even when the best accuracy is achieved. This thesis studies several computational aspects of biomarker discovery in genomic studies, from the classification and feature selection strategies to the type and reliability of the biological information used, and proposes new approaches to cope with the problem of the reproducibility of biomarker lists. The study highlights that, although reasonable and comparable classification accuracy can be achieved by different methods, further developments are necessary to achieve robust biomarker list stability, because of the high number of features and the high correlation among them. In particular, this thesis proposes two different approaches to improve biomarker list stability by using prior information on the biological interplay and functional correlation among the analyzed features. Both approaches were able to improve biomarker selection. The first approach, which uses prior information to divide the application of the method into different subproblems, improves the interpretability of results and offers an alternative way to assess list reproducibility. The second, which integrates prior information into the kernel function of the learning algorithm, improves list stability.
Finally, the interpretability of results is strongly affected by the quality of the biological information available. The analysis of the heterogeneities in the Gene Ontology database revealed the importance of providing new methods able to verify the reliability of the biological properties assigned to a specific feature, discriminating missing or less specific information from possible inconsistencies among the annotations. These aspects will be investigated in increasing depth in the future, as new sequencing technologies monitor an increasing number of features and the number of functional annotations in genomic databases grows considerably in the coming years.
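
The notion of biomarker list stability can be made concrete with a small example. The abstract does not say which stability measure the thesis uses; the sketch below assumes one common choice, the average pairwise Jaccard similarity between the lists selected in repeated runs. The function names and gene labels are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature lists (1.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def list_stability(lists):
    """Average pairwise Jaccard similarity over all pairs of lists."""
    pairs = list(combinations(lists, 2))
    if not pairs:
        raise ValueError("need at least two lists")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Made-up biomarker lists selected on three resampled training sets
runs = [
    ["G1", "G2", "G3"],
    ["G1", "G2", "G4"],
    ["G1", "G5", "G6"],
]
stability = list_stability(runs)
```

A value near 1 means the runs select nearly the same features, while a value near 0 means the lists barely overlap even if each run classifies well, which is the situation the abstract describes.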

    Markers of Bone Turnover In Preclinical Development of Drugs for Skeletal Diseases

    Skeletal tissue is constantly remodeled in a process where osteoclasts resorb old bone and osteoblasts form new bone. Balance in bone remodeling is related to age, gender and genetic factors, but many skeletal diseases, such as osteoporosis and cancer-induced bone metastasis, also cause imbalance in bone turnover and lead to decreased bone mass and increased fracture risk. Biochemical markers of bone turnover are surrogates for bone metabolism and may be used as indicators of the balance between bone resorption and formation. They are released during the remodeling process and can be conveniently and reliably measured from blood or urine by immunoassays. The most commonly used bone formation markers include N-terminal propeptides of type I collagen (PINP) and osteocalcin, whereas tartrate-resistant acid phosphatase isoform 5b (TRACP 5b) and C-terminal cross-linked telopeptide of type I collagen (CTX) are common resorption markers. Of these, PINP has been, until recently, the only marker not commercially available for preclinical use. To date, widespread use of bone markers is still limited due to their unclear biological significance, variability, and insufficient evidence of their prognostic value to reflect long-term changes. In this study, the feasibility of bone markers as predictors of drug efficacy in preclinical osteoporosis models was investigated. A non-radioactive PINP immunoassay for preclinical use was characterized and validated. The levels of PINP, N-terminal mid-fragment of osteocalcin, TRACP 5b and CTX were studied in preclinical osteoporosis models, and the results were compared with those obtained by traditional analysis methods such as histology, densitometry and microscopy. Changes in all bone markers at early timepoints correlated strongly with the changes observed in bone mass and bone quality parameters at the end of the study.
TRACP 5b correlated strongly with osteoclast number and CTX correlated with osteoclast activity in both in vitro and in vivo studies. The concept of a “resorption index” was applied to the ratio CTX/TRACP 5b to describe the mean osteoclast activity. The index showed more substantial changes than either of the markers alone in the preclinical osteoporosis models used in this study. PINP was strongly associated with bone formation, whereas osteocalcin was associated with both bone formation and resorption. These results provide novel insight into the feasibility of PINP, osteocalcin, TRACP 5b and CTX as predictors of drug efficacy in preclinical osteoporosis models. The results support clinical findings which indicate that short-term changes of these markers reflect long-term responses in bone mass and quality. Furthermore, this information may be useful when considering cost-efficient and clinically predictive drug screening and development assays for finding new drug candidates for skeletal diseases.

    Metabolic profiling on 2D NMR TOCSY spectra using machine learning

    Because biological cells are highly dynamic, metabolic profiling plays a significant role in discovering the biological fingerprints of diseases and their evolution, as well as the cellular pathways of different biological or chemical stimuli. Two-dimensional nuclear magnetic resonance (2D NMR) is one of the fundamental and powerful analytical instruments for metabolic profiling. Although total correlation spectroscopy (2D NMR 1H-1H TOCSY) can reduce the spectral overlap of 1D NMR, strong peak shifts, signal overlap, spectral crowding and matrix effects in complex biological mixtures remain extremely challenging in 2D NMR analysis. In this work, we introduce automated metabolic deconvolution and assignment based on the deconvolution of 2D TOCSY spectra of real breast cancer tissue, as well as of different differentiation pathways of adipose tissue-derived human mesenchymal stem cells. In contrast to the common approaches in NMR-based machine learning, where images of the spectra are used as input, our metabolic assignment is based only on the vertical and horizontal frequencies of metabolites in the 1H-1H TOCSY. One- and multi-class kernel null Foley–Sammon transform, support vector machines, polynomial classifier kernel density estimation, and support vector data description classifiers were tested in semi-supervised learning and novelty detection settings. The classifiers’ performance was evaluated by comparing the conventional human-based methodology and automatic assignments under different initial training set sizes. The results of our novel metabolic profiling methods demonstrate their suitability, robustness, and speed in automated nontargeted NMR metabolic analysis.

    Pathway Semantics: An Algebraic Data Driven Algorithm to Generate Hypotheses about Molecular Patterns Underlying Disease Progression

    The overarching goal of the Pathway Semantics Algorithm (PSA) is to improve the in silico identification of clinically useful hypotheses about molecular patterns in disease progression. By framing biomedical questions within a variety of matrix representations, PSA has the flexibility to analyze combined quantitative and qualitative data over a wide range of stratifications. The resulting hypothetical answers can then move to in vitro and in vivo verification, research assay optimization, clinical validation, and commercialization. Herein PSA is shown to generate novel hypotheses about the significant biological pathways in two disease domains, shock / trauma and hemophilia A, with experimental validation in the latter. The PSA matrix algebra approach identified differential molecular patterns in biological networks over time and outcome that would not be easily found through direct assays, literature or database searches. In this dissertation, Chapter 1 provides a broad overview of the background and motivation for the study, followed by Chapter 2 with a literature review of relevant computational methods. Chapters 3 and 4 describe PSA for node and edge analysis respectively, and apply the method to disease progression in shock / trauma. Chapter 5 demonstrates the application of PSA to hemophilia A and the validation with experimental results. The work is summarized in Chapter 6, followed by extensive references and an Appendix with additional material.

    The Pharmacoepigenomics Informatics Pipeline and H-GREEN Hi-C Compiler: Discovering Pharmacogenomic Variants and Pathways with the Epigenome and Spatial Genome

    Over the last decade, biomedical science has been transformed by the epigenome and spatial genome, but the discipline of pharmacogenomics, the study of the genetic underpinnings of pharmacological phenotypes like drug response and adverse events, has not. Scientists have begun to use omics atlases of increasing depth, and inferences relating to the bidirectional causal relationship between the spatial epigenome and gene expression, as a foundational underpinning for genetics research. The epigenome and spatial genome are increasingly used to discover causative regulatory variants in the significance regions of genome-wide association studies, for the discovery of the biological mechanisms underlying these phenotypes and the design of genetic tests to predict them. Such variants often have more predictive power than coding variants, but in the area of pharmacogenomics, such advances have been radically underapplied. The majority of pharmacogenomics tests are designed manually on the basis of mechanistic work with coding variants in candidate genes, and where genome wide approaches are used, they are typically not interpreted with the epigenome. This work describes a series of analyses of pharmacogenomics association studies with the tools and datasets of the epigenome and spatial genome, undertaken with the intent of discovering causative regulatory variants to enable new genetic tests. It describes the potent regulatory variants discovered thereby to have a putative causative and predictive role in a number of medically important phenotypes, including analgesia and the treatment of depression, bipolar disorder, and traumatic brain injury with opiates, anxiolytics, antidepressants, lithium, and valproate, and in particular the tendency for such variants to cluster into spatially interacting, conceptually unified pathways which offer mechanistic insight into these phenotypes. 
It describes the Pharmacoepigenomics Informatics Pipeline (PIP), an integrative multiple omics variant discovery pipeline designed to make this kind of analysis easier and cheaper to perform, more reproducible, and amenable to the addition of advanced features. It describes the successes of the PIP in rediscovering manually discovered gene networks for lithium response, as well as discovering a previously unknown genetic basis for warfarin response in anticoagulation therapy. It describes the H-GREEN Hi-C compiler, which was designed to analyze spatial genome data and discover the distant target genes of such regulatory variants, and its success in discovering spatial contacts not detectable by preceding methods and using them to build spatial contact networks that unite disparate TADs with phenotypic relationships. It describes a potential feature set of a future pipeline, using the latest epigenome research and the lessons of the previous pipeline. It describes my thinking about how to use the output of a multiple omics variant pipeline to design genetic tests that also incorporate clinical data. And it concludes by describing a long-term vision for a comprehensive pharmacophenomic atlas, to be constructed by applying a variant pipeline and machine learning test design system, such as is described, to thousands of phenotypes in parallel. Scientists struggled to assay genotypes for the better part of a century, and in the last twenty years, succeeded. The struggle to predict phenotypes on the basis of the genotypes we assay remains ongoing. The use of multiple omics variant pipelines and machine learning models with omics atlases, genetic association, and medical records data will be an increasingly significant part of that struggle for the foreseeable future.
PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/145835/1/ariallyn_1.pd