    Inferential stability in systems biology

    The modern biological sciences are fraught with statistical difficulties. Biomolecular stochasticity, experimental noise, and the “large p, small n” problem all contribute to the challenge of data analysis. Nevertheless, we routinely seek to draw robust, meaningful conclusions from observations. In this thesis, we explore methods for assessing the effects of data variability upon downstream inference, in an attempt to quantify and promote the stability of the inferences we make. We start with a review of existing methods for addressing this problem, focusing upon the bootstrap and similar methods. The key requirement for all such approaches is a statistical model that approximates the data generating process. We move on to consider biomarker discovery problems. We present a novel algorithm for proposing putative biomarkers on the strength of both their predictive ability and the stability with which they are selected. In a simulation study, we find our approach to perform favourably in comparison to strategies that select on the basis of predictive performance alone. We then consider the real problem of identifying protein peak biomarkers for HAM/TSP, an inflammatory condition of the central nervous system caused by HTLV-1 infection. We apply our algorithm to a set of SELDI mass spectral data, and identify a number of putative biomarkers. Additional experimental work, together with known results from the literature, provides corroborating evidence for the validity of these putative biomarkers. Having focused on static observations, we then make the natural progression to time course data sets. We propose a (Bayesian) bootstrap approach for such data, and then apply our method in the context of gene network inference and the estimation of parameters in ordinary differential equation models. 
We find that the inferred gene networks are relatively unstable, and demonstrate the importance of finding distributions of ODE parameter estimates, rather than single point estimates.
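A minimal sketch of a generic Bayesian bootstrap may help show what "distributions of parameter estimates" means in practice. The abstract does not detail the thesis's time-course method, so this is not it. This sketch treats observations as exchangeable and reweights them with Dirichlet(1, ..., 1) weights. It then re-estimates a parameter under each draw. Here the parameter is a weighted least-squares slope, a hypothetical stand-in for an ODE parameter. The function names `weighted_slope` and `bayesian_bootstrap` are illustrative only.

```python
import random


def weighted_slope(t, y, w):
    """Weighted least-squares slope of y on t with observation weights w."""
    sw = sum(w)
    t_bar = sum(wi * ti for wi, ti in zip(w, t)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    num = sum(wi * (ti - t_bar) * (yi - y_bar) for wi, ti, yi in zip(w, t, y))
    den = sum(wi * (ti - t_bar) ** 2 for wi, ti in zip(w, t))
    return num / den


def bayesian_bootstrap(t, y, n_draws=2000, seed=0):
    """Return n_draws slope estimates, each under fresh Dirichlet(1, ..., 1)
    observation weights (assumes exchangeable observations)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # Normalised Gamma(1, 1) variates form one Dirichlet(1, ..., 1) draw.
        g = [rng.gammavariate(1.0, 1.0) for _ in t]
        total = sum(g)
        w = [gi / total for gi in g]
        draws.append(weighted_slope(t, y, w))
    return draws


# Hypothetical noisy time course; the spread of `draws` reflects estimate stability.
draws = bayesian_bootstrap([0, 1, 2, 3, 4, 5], [0.1, 1.9, 4.2, 5.8, 8.1, 9.9])
```

The spread of the returned draws, for example their 2.5% and 97.5% quantiles, summarises how stable the estimate is. Ignoring temporal dependence between time points is the main simplification in this sketch.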

    Biomarker Detection in Association Studies: Modeling SNPs Simultaneously via Logistic ANOVA

    In genome-wide association studies, the primary task is to detect biomarkers in the form of Single Nucleotide Polymorphisms (SNPs) that have nontrivial associations with a disease phenotype and some other important clinical/environmental factors. However, the extremely large number of SNPs compared to the sample size inhibits the application of classical methods such as multiple logistic regression. Currently, the most commonly used approach is still to analyze one SNP at a time. In this paper, we propose to consider the genotypes of the SNPs simultaneously via a logistic analysis of variance (ANOVA) model, which expresses the logit-transformed mean of SNP genotypes as the sum of the SNP effects, the effects of the disease phenotype and/or other clinical variables, and the interaction effects. We use a reduced-rank representation of the interaction-effect matrix for dimensionality reduction, and employ an L1 penalty in a penalized likelihood framework to filter out the SNPs that have no associations. We develop a Majorization-Minimization algorithm for computational implementation. In addition, we propose a modified BIC criterion to select the penalty parameters and determine the rank. The proposed method is applied to a Multiple Sclerosis data set and to simulated data sets, and shows promise in biomarker detection.
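The core computational idea, a majorization-minimization (MM) step combined with an L1 penalty, can be sketched for the much simpler case of plain L1-penalized logistic regression. This is not the paper's logistic ANOVA algorithm. The sketch has no reduced-rank interaction matrix, no intercept, and no BIC-based tuning. It is a generic MM scheme that relies on the standard bound of 1/4 on the curvature of the logistic loss. Under that bound, each iteration reduces to a soft-thresholding update. The names `l1_logistic_mm`, `soft_threshold`, and `lam` are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, thr):
    # Proximal operator of thr * |v|: shrink towards zero, clip small values to 0.
    if v > thr:
        return v - thr
    if v < -thr:
        return v + thr
    return 0.0


def l1_logistic_mm(X, y, lam, n_iter=500):
    """Minimise mean logistic loss + lam * ||beta||_1 (no intercept) by MM."""
    n, p = len(X), len(X[0])
    # The Hessian of the mean logistic loss is X^T W X / n with W <= 1/4, so
    # L = ||X||_F^2 / (4n) bounds its largest eigenvalue and yields a
    # quadratic majoriser of the loss around the current beta.
    L = sum(x * x for row in X for x in row) / (4.0 * n)
    beta = [0.0] * p
    for _ in range(n_iter):
        resid = [sigmoid(sum(b * x for b, x in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        # Minimising the majoriser plus the L1 penalty is a soft-threshold step.
        beta = [soft_threshold(beta[j] - grad[j] / L, lam / L) for j in range(p)]
    return beta


# Toy data: column 0 determines the label, column 1 is uninformative.
X = [[1, 1], [2, -1], [-1, 1], [-2, -1], [1, -1], [2, 1], [-1, -1], [-2, 1]]
y = [1, 1, 0, 0, 1, 1, 0, 0]
beta = l1_logistic_mm(X, y, lam=0.01)  # beta[0] > 0, beta[1] == 0.0
```

The penalty filters out features whose coefficients stay at exactly zero. In the toy example the first column determines the label and the second does not. A large enough `lam` zeroes every coefficient.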

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to perform integrative analysis of biomedical data acquired from diverse modalities effectively and efficiently. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability.

    Artificial intelligence for dementia drug discovery and trials optimization

    Drug discovery and clinical trial design for dementia have historically been challenging. In part, these challenges have arisen from patient heterogeneity, the length of the disease course, and the tractability of targets in the brain. Applying big data analytics and machine learning tools to drug discovery, and using them to inform successful clinical trial design, has the potential to accelerate progress. Opportunities arise at multiple stages in the therapy pipeline, and the growing availability of large medical data sets opens possibilities for big data analyses to answer key questions about clinical and therapeutic challenges. However, before this goal is reached, several challenges need to be overcome, and only a multidisciplinary approach can bring data-driven decision-making to its full potential. Herein we review the current state of machine learning applications to clinical trial design and drug discovery, and present opportunities and recommendations that can break down the barriers to implementation.

    Differential Micro RNA Expression in PBMC from Multiple Sclerosis Patients

    Differences in gene expression patterns have been documented not only between multiple sclerosis patients and healthy controls but also during relapse of the disease. Recently, a new gene expression modulator has been identified: the microRNA, or miRNA. The aim of this work is to analyze the possible role of miRNAs in multiple sclerosis, focusing on the relapse stage. We analyzed the expression patterns of 364 miRNAs in PBMC obtained from multiple sclerosis patients in relapse, multiple sclerosis patients in remission, and healthy controls. The expression patterns of the miRNAs with significantly different expression were validated in an independent set of samples. To determine the effect of these miRNAs, the expression of some of their predicted target genes was studied by qPCR. Gene interaction networks were constructed to obtain a co-expression and multivariate view of the experimental data. The data analysis and subsequent validation reveal that two miRNAs (hsa-miR-18b and hsa-miR-599) may be relevant at the time of relapse and that another miRNA (hsa-miR-96) may be involved in remission. The genes targeted by hsa-miR-96 are involved in immunological pathways such as interleukin signaling and in other pathways such as Wnt signaling. This work highlights the importance of miRNA expression in the molecular mechanisms implicated in the disease. Moreover, the proposed involvement of these small molecules in multiple sclerosis opens up a new therapeutic approach to explore and highlights some candidate biomarker targets in MS.

    Computational modelling of imaging markers to support the diagnosis and monitoring of multiple sclerosis

    Multiple sclerosis is a leading cause of neurological disability in young adults and affects more than 2.5 million people worldwide. An important substrate of disability accrual is the loss of neurons and of the connections between them (neurodegeneration), which can be captured by serial brain imaging, especially in the cerebral grey matter. In this thesis, across four separate subprojects, I aimed to assess the strength of imaging-derived grey matter volume as a biomarker for diagnosing multiple sclerosis, predicting its evolution, and developing a staging system to stratify patients. In total, I retrospectively studied 1701 subjects, of whom 1548 had longitudinal brain imaging data. I used advanced computational models to investigate cross-sectional and longitudinal datasets. In the cross-sectional study, I demonstrated that grey matter volumes could distinguish multiple sclerosis from another demyelinating disorder (neuromyelitis optica) with an accuracy of 74%. In longitudinal studies, I showed that the deep grey matter nuclei had the fastest rate of volume loss among brain regions in multiple sclerosis (up to 1.66% per year). The volume of the deep grey matter was the strongest predictor of disability progression. I found that multiple sclerosis affects different brain areas in a specific temporal order (or sequence) that starts with the deep grey matter nuclei, posterior cingulate cortex, precuneus, and cerebellum. Finally, with multivariate mechanistic and causal modelling, I showed that brain volume loss causes disability and cognitive worsening, which can be delayed with a potential neuroprotective treatment (simvastatin). This work provides conclusive evidence that grey matter volume loss affects some brain regions more severely, can predict future disability progression, can be used as an outcome measure in phase II clinical trials, and causes clinical and cognitive worsening.
This thesis also provides a subject staging system by which patients can be scored during multiple sclerosis.
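The arithmetic below puts the reported deep grey matter rate into perspective. Purely for illustration, it assumes that a loss of 1.66% per year compounds at a constant rate. The thesis reports this figure as an upper value ("up to"), and real trajectories need not be constant. The helper name `cumulative_loss` is hypothetical.

```python
def cumulative_loss(annual_rate: float, years: float) -> float:
    """Fraction of baseline volume lost after `years`, assuming a constant
    compounding annual loss rate (an illustrative simplification)."""
    return 1.0 - (1.0 - annual_rate) ** years


# Hypothetical scenario: 1.66% per year sustained over 5 years.
five_year = cumulative_loss(0.0166, 5)
print(f"{five_year:.1%}")  # prints 8.0%
```

Under the constant-rate assumption, the cumulative loss over five years is roughly 8.0%. Simply multiplying 1.66% by five would give 8.3%. The compounded figure is smaller because each year's loss applies to a smaller remaining volume.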

    Knowledge Management approaches to model pathophysiological mechanisms and discover drug targets in Multiple Sclerosis

    Multiple Sclerosis (MS) is one of the most prevalent neurodegenerative diseases for which a cure is not yet available. MS is a complex disease for numerous reasons: its etiology is unknown, the diagnosis is not exclusive, the disease course is unpredictable, and therapeutic response varies from patient to patient. There are four established subtypes of MS, which are distinguished by different characteristics. Many environmental and genetic factors are considered to play a role in MS etiology, including viral infection, vitamin D deficiency, epigenetic changes, and certain genes. Despite the large body of diverse scientific knowledge, from laboratory findings to clinical trials, no integrated model that portrays the underlying mechanisms of the disease state of MS is available. Contemporary therapies only reduce the severity of the disease, and there is an unmet need for efficient drugs. The present thesis provides a knowledge-based rationale to model MS disease mechanisms and identify potential drug candidates by using systems biology approaches. Systems biology is an emerging field that uses computational methods to integrate datasets of various granularities and simulate the disease outcome. It provides a framework to model molecular dynamics with their precise interactions and contextual details. The proposed approaches were used to extract knowledge from the literature with state-of-the-art text mining technologies, integrate it with proprietary data using semantic platforms, and build different models (a molecular interaction map, agent-based models to simulate disease outcome, and an MS disease progression model over time). For better information representation, a disease ontology was also developed and a methodology for automatic enrichment was derived. The models provide insight into the disease, and several pathways were explored by combining the therapeutics and the disease-specific prescriptions.
The approaches and models developed in this work resulted in the identification of novel drug candidates that are backed up by existing experimental and clinical knowledge.