56 research outputs found

    Extending principal covariates regression for high-dimensional multi-block data

    This dissertation addresses the challenge of deciphering extensive datasets collected from multiple sources, such as health habits and genetic information, in the context of studying complex issues like depression. A data analysis method known as Principal Covariate Regression (PCovR) provides a strong basis for addressing this challenge. Yet analyzing these intricate datasets is far from straightforward. The data often contain redundant and irrelevant variables, making it difficult to extract meaningful insights. Furthermore, these data may involve different types of outcome variables (for instance, the variable pertaining to depression could be a score from a depression scale or a binary yes/no diagnosis from a medical professional), adding another layer of complexity. To overcome these obstacles, novel adaptations of PCovR are proposed in this dissertation. The methods automatically select important variables, categorize insights into those originating from a single source or multiple sources, and accommodate various outcome variable types. The effectiveness of these methods is demonstrated in predicting outcomes and revealing the subtle relationships within data from multiple sources. Moreover, the dissertation offers a glimpse of future directions in enhancing PCovR. The implications of extending the method so that it selects important variables are critically examined. Also, an algorithm that has the potential to yield optimal results is suggested. In conclusion, this dissertation proposes methods to tackle the complexity of large data from multiple sources, and points towards where opportunities may lie in the next line of research.

    New approaches in statistical network data analysis

    This cumulative dissertation is dedicated to the statistical analysis of network data. The general approach of combining network science with statistical methodology has become very popular in recent years. An important reason for this development lies in the ability of statistical network data analysis to provide a means to model and quantify interdependencies of complex systems. A network can be understood as a structure consisting of nodes and edges. The nodes represent general entities that are related via the edges. Depending on the research question at hand, it is of interest to analyze either the dependence structure among the nodes or the distribution of the edges given the nodes. This thesis consists of six contributed manuscripts that are concerned with the latter. Based on statistical models, edges in different dynamic and weighted networks are investigated or reconstructed. To put the contributing articles in a general context, the thesis starts with an introductory chapter. In this introduction, central concepts and models from statistical network data analysis are explained. Besides giving an overview of the available methodology, the advantages and drawbacks of the models are given, supplemented with a discussion of potential extensions and modifications. Content-wise, the articles can be divided into two projects. One project is focused on the statistical analysis of international arms trade networks. Two articles are devoted to the global exchange of major conventional weapons with a focus on the dynamic structure of the system and the volume traded. A third article explores latent patterns in the international trade system of small arms and ammunition. Additionally, the arms trade data is used in a survey paper that is concerned with dynamic network models. The second project regards the reconstruction of financial networks from their marginals and includes two articles. All contributing articles are attached in the form in which they were published as preprints. For publications in scientific journals, the respective sources are given. Additionally, the contributions of all authors are included. All computations were done with the statistical software R and the corresponding code is available from GitHub.
    This cumulative dissertation deals with the statistical analysis of network data. The general approach of conceptualizing interdependent systems as networks and then analyzing them with statistical methodology has gained considerable relevance in recent years. In particular, the flexibility of the methodology, together with the ability to model complex dependence structures, has contributed to its popularity. A network is a system composed of nodes and edges. The nodes are general entities that are connected to one another by the edges. Depending on the research question, the interest lies either in the dependencies among the nodes or in the distribution of the edges given the nodes. With a total of six articles, this thesis takes up the second approach. With the help of statistical models, the edges in various binary and weighted networks are analyzed or reconstructed. To give the thesis a general context, the attached articles are preceded by an introductory part, which covers central concepts and models of statistical network analysis. The advantages as well as the disadvantages of the models are discussed, and potential extensions and modifications are described. The articles contained in this dissertation can be roughly grouped into two projects. One project focuses on the statistical modelling of the international arms trade. Two articles examine the global exchange of major conventional weapons, analyzing both the dynamic structure and the volume of weapons traded. A further article is devoted to the latent structures in the international trade in small arms and ammunition. In addition, the arms trade data are used in a survey article on dynamic network models. The second project, spread over two articles, addresses the reconstruction of financial networks based on the marginal sums of network matrices. All articles attached to this dissertation appear in the form in which they were published as preprints. For publications in scientific journals, the respective source is given. In addition, the contribution of each author is stated before each article. All analyses were carried out with the statistical software R. The corresponding code is available on GitHub.

    Pacific Symposium on Biocomputing 2023

    The Pacific Symposium on Biocomputing (PSB) 2023 is an international, multidisciplinary conference for the presentation and discussion of current research in the theory and application of computational methods in problems of biological significance. Presentations are rigorously peer reviewed and are published in an archival proceedings volume. PSB 2023 will be held on January 3-7, 2023 in Kohala Coast, Hawaii. Tutorials and workshops will be offered prior to the start of the conference. PSB 2023 will bring together top researchers from the US, the Asian Pacific nations, and around the world to exchange research results and address open issues in all aspects of computational biology. It is a forum for the presentation of work in databases, algorithms, interfaces, visualization, modeling, and other computational methods, as applied to biological problems, with emphasis on applications in data-rich areas of molecular biology. The PSB has been designed to be responsive to the need for critical mass in sub-disciplines within biocomputing. For that reason, it is the only meeting whose sessions are defined dynamically each year in response to specific proposals. PSB sessions are organized by leaders of research in biocomputing's 'hot topics.' In this way, the meeting provides an early forum for serious examination of emerging methods and approaches in this rapidly changing field.

    Novel statistical and bioinformatic tools for identifying predictive metabolic biomarkers in molecular epidemiology studies

    A top-down systems biology approach investigating metabolic responses to external stimuli or physiological processes requires multivariate statistical tools to identify metabolites associated with the global biochemical changes in a supra-organism. In this thesis I describe several tools I have developed to improve or supplement currently used methods in molecular epidemiology studies. First, I describe the MetaboNetworks toolbox, which is able to create custom, multi-compartmental metabolic reaction networks for a supra-organism, combining both mammalian and microbial reactions. These networks are essentially a summary of the supra-organism's homeostatic signature. Second, I describe a novel statistical spectroscopy approach called STORM, which aids in the elucidation of unknown biomarker signals in 1H NMR spectra. Third, I describe the Metabolome-Wide Association Study on obesity in U.S. and U.K. populations. Many novel metabolic associations with obesity are described in a systems framework, including metabolites associated with energy, skeletal muscle, lipid, amino acid and gut microbial metabolism. Last, I describe a new multivariate approach to adjust for confounders, CA-OPLS. Correcting for confounders is an essential aspect of molecular epidemiology studies, as metabolites can be related to a variety of factors such as lifestyle, diet and environmental exposures, which may or may not be causally related to disease risk. In developing CA-OPLS, another aim was to simultaneously eliminate or minimize the effects of different types of sampling bias, which are often not taken into account when modelling metabonomics data with current methods.

    A comparison of the CAR and DAGAR spatial random effects models with an application to diabetics rate estimation in Belgium

    When hierarchically modelling an epidemiological phenomenon on a finite collection of sites in space, one must always take a latent spatial effect into account in order to capture the correlation structure that links the phenomenon to the territory. In this work, we compare two autoregressive spatial models that can be used for this purpose: the classical CAR model and the more recent DAGAR model. Unlike the former, the latter has a desirable property: its ρ parameter can be naturally interpreted as the average neighbor pair correlation and, in addition, this parameter can be directly estimated when the effect is modelled using a DAGAR rather than a CAR structure. As an application, we model the diabetics rate in Belgium in 2014 and show the adequacy of these models in predicting the response variable when no covariates are available.
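
The interpretability point about ρ can be illustrated with a small numerical sketch. It assumes one common "proper CAR" parameterization, with precision matrix Q = D − ρW (W the adjacency matrix, D the diagonal matrix of neighbor counts, unit scale); the abstract does not state which parameterization the authors use. The four-site map and the value of ρ are hypothetical. The sketch only shows that, under this assumption, the implied correlations between neighboring sites do not coincide with ρ; it does not implement DAGAR.

```python
import numpy as np

# Hypothetical toy map: 4 sites on a line, each adjacent to the next.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(W.sum(axis=1))   # number of neighbors per site
rho = 0.9                    # spatial parameter (illustrative value)

# Assumed proper CAR precision matrix with unit scale.
Q = D - rho * W

# Implied covariance and correlation of the spatial effect.
cov = np.linalg.inv(Q)
sd = np.sqrt(np.diag(cov))
corr = cov / np.outer(sd, sd)

# Correlations for every ordered pair of neighboring sites.
neighbor_corr = corr[W == 1]
print("rho:", rho)
print("neighbor correlations:", np.round(neighbor_corr, 3))
print("average neighbor correlation:", round(float(neighbor_corr.mean()), 3))
```

In this toy case the neighbor correlations differ from pair to pair and their average is not ρ, which is the kind of gap that motivates a model whose parameter has a direct correlation interpretation.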

    A Statistical Approach to the Alignment of fMRI Data

    Multi-subject functional magnetic resonance imaging (fMRI) studies are critical. Because anatomical and functional structure varies across subjects, image alignment is necessary. We define a probabilistic model to describe functional alignment. By imposing a prior distribution, such as the matrix von Mises-Fisher distribution, on the orthogonal transformation parameter, anatomical information is embedded in the estimation of the parameters, i.e., by penalizing the combination of spatially distant voxels. Applications to real data show an improvement in the classification and interpretability of the results compared with various functional alignment methods.
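
As a rough illustration of alignment with an orthogonal transformation, the sketch below solves the classical orthogonal Procrustes problem by SVD and optionally adds a prior term. It is not the authors' implementation. The function name `align`, the arguments `F` (prior location matrix) and `k` (concentration), and the unit noise variance are assumptions made for this example: under a Gaussian error model with unit variance and a matrix von Mises-Fisher prior with density proportional to exp(k · tr(FᵀR)), the posterior mode is obtained from the SVD of XᵀM + kF.

```python
import numpy as np

def align(X, M, F=None, k=0.0):
    """Orthogonal R maximizing tr(R^T (X^T M + k F)).

    X : (time points x voxels) data of one subject
    M : (time points x voxels) reference data
    F : optional (voxels x voxels) prior location matrix (hypothetical name)
    k : prior concentration; k = 0 gives plain orthogonal Procrustes
    """
    A = X.T @ M
    if F is not None:
        A = A + k * F
    U, _, Vt = np.linalg.svd(A)
    return U @ Vt

# Synthetic check: recover a known orthogonal transformation.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
R_true, _ = np.linalg.qr(rng.normal(size=(5, 5)))
M = X @ R_true

R_hat = align(X, M)
print(np.allclose(R_hat, R_true))
```

In the setting of the abstract, `F` would encode anatomical information, for example larger entries for spatially close voxels, so that combinations of distant voxels are penalized; how that matrix is built is not specified in the abstract.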

    Genome-scale modeling of redox metabolism and therapeutic response in radiation-resistant tumors

    Despite being one of the oldest forms of cancer therapy and still a primary treatment modality, radiation therapy is not effective across all cancer types and tumor resistance to radiation is still not well understood. As our ability to characterize tumor pathophysiology increases with new -omic technologies, a broad clinical goal is to identify prognostic indicators of therapeutic outcomes for personalizing therapeutic regimens. While redox metabolism is a known factor, methods for analyzing systems-level involvement of cellular metabolism in radiation response have not been previously developed. This dissertation presents the construction of novel genome-scale Flux Balance Analysis (FBA) models of individual radiation-sensitive and -resistant patient tumors from The Cancer Genome Atlas (TCGA) to explore the role of redox metabolism in radiation sensitivity, to identify diagnostic and therapeutic biomarkers for radiation response, and to predict response to radiation-sensitizing chemotherapies in radiation-resistant tumors. A novel bioinformatics platform was developed to integrate genomic, transcriptomic, kinetic, and thermodynamic parameters from 716 radiation-sensitive and 199 radiation-resistant TCGA tumors into personalized genome-scale FBA models. Pan-cancer model predictions identified increased mitochondrial production of redox cofactors, including NADPH and glutathione, as well as increased H2O2-scavenging fluxes in radiation-resistant tumors. Simulated gene knockout screens were utilized to discover novel targets in redox metabolism, central carbon metabolism, and folate metabolism which differentially impact antioxidant production and ROS clearance in radiation-resistant tumors; these targets were experimentally validated through siRNA gene knockdown in matched radiation-sensitive and -resistant cancer cell lines among multiple cancer types.
Finally, personalized metabolic flux profiles were generated for individual radiation-resistant cancer patients to identify optimal targets for radiation sensitization. This work not only improved upon methodological shortcomings of previous FBA models of cancer metabolism, but is the first to utilize genome-scale modeling for identifying metabolic differences between radiation-sensitive and -resistant tumors that could be exploited for improving radiation sensitivity. Machine learning classifiers were developed which integrate multi-omic data from TCGA patients and novel metabolic outcomes from personalized FBA models to predict radiation sensitivity. A dataset-independent ensemble architecture with gradient boosting models and Bayesian optimization yielded improved predictive accuracy and biomarker detection compared to previously-developed classifiers for radiation response. Experimentally-validated predictions of metabolite production from radiation-sensitive and -resistant FBA tumor models were integrated into multi-omic classifiers; metabolites involved in lipid metabolism, nucleotide metabolism, and immune modulation were identified as having significant associations with radiation response. Subgroups of patients with differing utilities of clinical versus metabolomic datasets for radiation response prediction were discovered, and personalized panels of multi-omic and non-invasive biomarkers with optimal diagnostic utility were developed. This work made significant advancements by being the first to integrate FBA model predictions into machine learning classifiers for cancer treatment outcomes. Finally, FBA models of radiation-resistant TCGA tumors were used to predict response to radiation-sensitizing chemotherapies and investigate their effects on tumor redox metabolism.
A novel multi-feature FBA objective function screen was developed, resulting in significant improvements in model predictions of treatment response, as well as identification of redox cofactors directly involved in drug metabolism. The radiation-sensitizing effect of chemotherapeutic treatment was predicted in radiation-resistant tumors by assessing drug-associated decreases in antioxidant levels, and machine learning regressors were utilized to identify multi-omic biomarkers from patient tumors which are associated with increased radiation sensitization. This work was the first to utilize genome-scale modeling to assess the role of chemotherapeutic treatment on tumor redox metabolism and radiation sensitization. In summary, a generalizable framework for creating genome-scale metabolic models of individual patient tumors was developed. The collective properties of these personalized models improved pathophysiological insights into the role of redox metabolism in the tumor responses to radiation and radiation-sensitizing chemotherapies. This framework resulted in a reduced set of clinically-useful biomarkers for both the a priori prediction of radiation response as well as targeted sensitization of radiation-resistant tumors to radiation therapy. This personalized medicine approach represents a paradigm shift in how diagnostic and treatment strategies for radiation-resistant cancer patients are developed, ultimately improving the standard of care for these patients.