109 research outputs found

    Integrative methods for analyzing big data in precision medicine

    We provide an overview of recent developments in big data analysis in the context of precision medicine and health informatics. With advances in technologies for capturing molecular and medical data, we have entered the era of “Big Data” in biology and medicine. These data offer many opportunities to advance precision medicine. We outline key challenges in precision medicine and present recent advances in data-integration methods that uncover personalized information from the big data produced by various omics studies. We survey recent integrative methods for disease subtyping, biomarker discovery, and drug repurposing, and list the tools available to domain scientists. Given the ever-growing nature of these data, we highlight key issues that big data integration methods will face.

    Probabilistic analysis of the human transcriptome with side information

    Understanding the functional organization of genetic information is a major challenge in modern biology. Following the initial publication of the human genome sequence in 2001, advances in high-throughput measurement technologies and efficient sharing of research material through community databases have opened up new perspectives on the study of living organisms and the structure of life. In this thesis, novel computational strategies have been developed to investigate a key functional layer of genetic information, the human transcriptome, which regulates the function of living cells through protein synthesis. The key contributions of the thesis are general exploratory tools for high-throughput data analysis that have provided new insights into cell-biological networks, cancer mechanisms and other aspects of genome function. A central challenge in functional genomics is that high-dimensional genomic observations are associated with high levels of complex and largely unknown sources of variation. By combining statistical evidence across multiple measurement sources with the wealth of background information in genomic data repositories, it has been possible to resolve some of the uncertainties associated with individual observations and to identify functional mechanisms that could not be detected from individual measurement sources. Statistical learning and probabilistic models provide a natural framework for such modeling tasks. Open source implementations of the key methodological contributions have been released to facilitate further adoption of the developed methods by the research community. Comment: Doctoral thesis. 103 pages, 11 figures.

    A Statistical Framework For Nutriomics Data Analysis

    Nutriomics is a new discipline that investigates the relationship between nutrition and health through the use of high-throughput omics technologies. However, the inherent complexity of nutriomics data poses several challenges for data analysis. This thesis introduces nutriomics and the statistical challenges associated with its analysis, and proposes statistical modelling and machine learning methods to tackle three main challenges: non-linearity, high dimensionality, and data heterogeneity. To deal with these challenges, we first propose a statistical framework, which we call LC-N2G, to test whether nutrition intake and omics features of interest are significantly associated. We use public data as an example to show LC-N2G's ability to discover non-linear associations between nutrition and gene expression. We then propose a statistical method, eNODAL, to cluster high-dimensional omics features based on how they respond to nutrition intake. Applying eNODAL to a mouse proteomics nutrition study shows that it can identify interpretable clusters of proteins with similar responses to diet and drug treatment. Finally, we propose a statistical model, NEMoE, to uncover the heterogeneous interplay among diet, omics, and health outcomes. We use a microbiome Parkinson’s disease (PD) study to illustrate the method and show that NEMoE is able to identify diet-specific microbial signatures of PD. Overall, this thesis proposes statistical methods to analyze nutriomics data and outlines possible future extensions of the research. The proposed methods could help researchers better understand the complex relationships between nutrition and health, ultimately leading to improved health outcomes.

    Computational approaches in infectious disease research: Towards improved diagnostic methods

    Thesis advisor: Kenneth Williams. Due to the overuse and misuse of antibiotics, the global threat of antibiotic resistance is a growing crisis. Three critical issues surrounding antibiotic resistance are the lack of rapid testing, treatment failure, and the evolution of resistance. However, with new technology facilitating data collection and with powerful advances in statistical learning, our understanding of the bacterial stress response to antibiotics is rapidly expanding. With a recent influx of omics data, it has become possible to develop powerful computational methods that make the best use of growing systems-level datasets. In this work, I present several such approaches that address the three challenges around resistance. While this body of work was motivated by the antibiotic resistance crisis, the approaches presented here favor generalization, that is, applicability beyond just one context. First, I present ShinyOmics, a web-based application that allows visualization, sharing, exploration and comparison of systems-level data. An overview of transcriptomics data in the bacterial pathogen Streptococcus pneumoniae led to the hypothesis that stress-susceptible strains have more chaotic gene expression patterns than stress-resistant ones. This hypothesis was supported by data from multiple strains, species, antibiotics and non-antibiotic stress factors, leading to the development of a general predictor of bacterial fitness based on transcriptomic entropy. I show the potential utility of this predictor in predicting antibiotic susceptibility phenotypes and drug minimum inhibitory concentrations, which could be applied to bacterial isolates from patients in the near future. Predictors of antibiotic susceptibility are of great value when there is large phenotypic variability across isolates from the same species. Phenotypic variability is accompanied by the genomic diversity harbored within a species.
I address the genomic diversity by developing BFClust, a software package that for the first time enables pan-genome analysis with confidence scores. Using pan-genome level information, I then develop predictors of essential genes unique to certain strains and predictors for genes that acquire adaptive mutations under prolonged stress exposure. Genes that are essential offer attractive drug targets, and those that are essential only in certain strains would make great targets for very narrow-spectrum antibiotics, potentially leading the way to personalized therapies in infectious disease. Finally, the prediction of adaptive outcome can lead to predictions of future cross-resistance or collateral sensitivities. Overall, this body of work exemplifies how computational methods can complement the increasingly rapid data generation in the lab, and pave the way to the development of more effective antibiotic stewardship practices. Thesis (PhD), Boston College, 2020. Submitted to: Boston College. Graduate School of Arts and Sciences. Discipline: Biology.

    Analyzing Granger causality in climate data with time series classification methods

    Attribution studies in climate science aim to ascertain scientifically the influence of climatic variations on natural or anthropogenic factors. Many of those studies adopt the concept of Granger causality to infer statistical cause-effect relationships, typically using traditional autoregressive models. In this article, we investigate the potential of state-of-the-art time series classification techniques to enhance causal inference in climate science. We conduct a comparative experimental study of different types of algorithms on a large test suite that comprises a unique collection of datasets from the area of climate-vegetation dynamics. The results indicate that specialized time series classification methods are able to improve existing inference procedures. Substantial differences are observed among the methods that were tested.
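
    As a rough illustration of the traditional autoregressive approach to Granger causality mentioned in this abstract (not the time series classification methods the article evaluates), the sketch below compares a model of y on its own lags against a model that also includes lags of x, and returns an F-statistic. The function names, the lag order p and the synthetic data are assumptions made for this example; nothing here is taken from the article.

```python
import numpy as np


def lagged(series, p):
    """Matrix whose row for time t holds series[t-1], ..., series[t-p]."""
    n = len(series)
    return np.column_stack([series[p - k: n - k] for k in range(1, p + 1)])


def granger_f_stat(x, y, p=2):
    """F-statistic for 'x Granger-causes y' using p lags (illustrative only)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = y[p:]
    intercept = np.ones((len(target), 1))
    # Restricted model: y explained by its own past only.
    restricted = np.hstack([intercept, lagged(y, p)])
    # Full model: y explained by its own past plus the past of x.
    full = np.hstack([restricted, lagged(x, p)])

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_f = rss(restricted), rss(full)
    dof = len(target) - full.shape[1]
    # Large values mean the lags of x noticeably reduce the residual error.
    return ((rss_r - rss_f) / p) / (rss_f / dof)


# Synthetic example (assumed data): y is driven by the previous value of x.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()

f_xy = granger_f_stat(x, y)  # x -> y: expected to be large
f_yx = granger_f_stat(y, x)  # y -> x: expected to be small
```

    In practice the statistic would be compared against an F distribution with (p, dof) degrees of freedom to obtain a p-value; that step is omitted to keep the sketch dependent on NumPy alone.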

    Can oral infection be a risk factor for Alzheimer’s disease?

    Alzheimer’s disease (AD) is a scourge of longevity that will drain enormous resources from public health budgets in the future. Currently, there is no diagnostic biomarker and/or treatment for this most common form of dementia in humans. AD can be of early familial onset or sporadic with a late onset. Apart from the two main hallmarks, amyloid-beta and neurofibrillary tangles, inflammation is a characteristic feature of AD neuropathology. Inflammation may be caused by a local central nervous system insult and/or by peripheral infections. Numerous microorganisms are suspected in AD brains, ranging from bacteria (mainly oral and non-oral Treponema species) and viruses (Herpes simplex type I) to yeasts (Candida species). A causal relationship between periodontal pathogens/non-oral Treponema species of bacteria and AD has been proposed via the amyloid-beta and inflammatory links. Periodontitis constitutes a peripheral oral infection that can provide the brain with intact bacteria, virulence factors and inflammatory mediators due to daily, transient bacteraemias. If and when genetic risk factors meet environmental risk factors in the brain, disease is expressed, in which neurocognition may be impacted, leading to the development of dementia. To achieve the goal of finding a diagnostic biomarker and possible prophylactic treatment for AD, there is an initial need to solve the etiological puzzle contributing to its pathogenesis. This review therefore addresses oral infection as the plausible aetiology of late-onset AD (LOAD).

    Development of innovative bioinformatics approaches for the integration of longitudinal multi-omics data

    New high-throughput “omics” technologies, including genomics, epigenomics, transcriptomics, proteomics, metabolomics and metagenomics, have expanded considerably in recent years. Taken independently, each omics technology is an essential source of knowledge for the study of the human genome, epigenome, transcriptome, proteome and metabolome, as well as the human microbiota, making it possible to identify biomarkers responsible for diseases, to determine therapeutic targets, to establish preventive diagnoses and to increase knowledge of living organisms. Lower costs and easier acquisition of multi-omics data have led to new experimental designs based on time series, in which the same biological sample is sequenced, measured and quantified at several time points. By combining omics technologies with time series, it is possible to capture the changes in expression that take place in a dynamic system for each molecule and to obtain a comprehensive view of multi-omics interactions, which is inaccessible to a standard single-omics approach. However, handling this amount of multi-omics data raises new challenges: continuous technological evolution, the volume of data produced, its heterogeneity, the variety of omics data and the interpretability of integration results all call for new analysis methods and innovative tools capable of identifying the useful elements within this multitude of information. With this in mind, we propose several tools and methods to address the challenges related to the integration and interpretation of these particular multi-omics data.
Finally, the integration of longitudinal multi-omics data offers prospects in fields such as precision medicine and in environmental and industrial applications. The democratisation of multi-omics analyses and the implementation of innovative integration and interpretation methods will lead to a deeper understanding of biological ecosystems.