6 research outputs found

    Coda4microbiome: compositional data analysis for microbiome cross-sectional and longitudinal studies

    Background: One of the main challenges of microbiome analysis is its compositional nature, which, if ignored, can lead to spurious results. Addressing the compositional structure of microbiome data is particularly critical in longitudinal studies, where abundances measured at different times can correspond to different sub-compositions. Results: We developed coda4microbiome, a new R package for analyzing microbiome data within the Compositional Data Analysis (CoDA) framework in both cross-sectional and longitudinal studies. The aim of coda4microbiome is prediction; more specifically, the method is designed to identify a model (microbial signature) containing the minimum number of features with the maximum predictive power. The algorithm relies on the analysis of log-ratios between pairs of components, and variable selection is addressed through penalized regression on the “all-pairs log-ratio model”, the model containing all possible pairwise log-ratios. For longitudinal data, the algorithm infers dynamic microbial signatures by performing penalized regression over a summary of the log-ratio trajectories (the area under these trajectories). In both cross-sectional and longitudinal studies, the inferred microbial signature is expressed as the (weighted) balance between two groups of taxa: those that contribute positively to the microbial signature and those that contribute negatively. The package provides several graphical representations that facilitate the interpretation of the analysis and of the identified microbial signatures. We illustrate the new method with data from a Crohn's disease study (cross-sectional data) and from a study of the developing microbiome of infants (longitudinal data). Conclusions: coda4microbiome is a new algorithm for the identification of microbial signatures in both cross-sectional and longitudinal studies. Peer Reviewed. Postprint (published version).
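The two ideas named in the abstract above, the all-pairs log-ratio representation and the reading of a fitted model as a balance between two groups of taxa, can be illustrated with a minimal Python sketch. This is not the coda4microbiome implementation (the package is written in R, and the penalized regression step is omitted here). The function names, the `pseudocount` parameter, and the example coefficients are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def all_pairs_log_ratios(abundances, pseudocount=0.0):
    """Return {(i, j): log(x_i / x_j)} for every pair i < j in one sample.

    `pseudocount` is an assumption of this sketch (a simple way to handle
    zeros); it is not taken from the abstract.
    """
    shifted = [x + pseudocount for x in abundances]
    return {
        (i, j): math.log(shifted[i]) - math.log(shifted[j])
        for i, j in combinations(range(len(shifted)), 2)
    }


def taxon_weights(pair_coefs, n_taxa):
    """Collapse coefficients on log(x_i / x_j) into one weight per taxon.

    Because log(x_i / x_j) = log x_i - log x_j, each pair coefficient adds
    to taxon i and subtracts from taxon j, so the weights sum to zero.
    """
    weights = [0.0] * n_taxa
    for (i, j), beta in pair_coefs.items():
        weights[i] += beta
        weights[j] -= beta
    return weights


# Log-ratios do not change when a sample is rescaled to proportions.
sample = [10.0, 40.0, 5.0, 150.0]
ratios = all_pairs_log_ratios(sample)
rescaled = all_pairs_log_ratios([x / sum(sample) for x in sample])

# Hypothetical non-zero coefficients, as a penalized fit might leave behind.
coefs = {(0, 1): 0.5, (2, 3): -0.2, (0, 3): 0.1}
weights = taxon_weights(coefs, n_taxa=4)
positive_group = [k for k, w in enumerate(weights) if w > 0]
negative_group = [k for k, w in enumerate(weights) if w < 0]
```

In this sketch the taxa with positive weights and those with negative weights play the role of the two groups whose weighted balance forms the signature.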

    Kroak: A metadata collection system for long term microbial community monitoring

    Amplytica is a start-up company whose software, the Amplytica Cloud Platform, helps organizations determine how microbes influence their bioprocesses. Examples of such bioprocesses include anaerobic digestion, wastewater treatment and mine site reclamation. The Amplytica Cloud Platform does this by integrating and analyzing metagenomically derived microbial community data (species composition, diversity, and abundance) and industrial bioprocess data (e.g. temperature, pH, nutrients). To achieve data integration, industrial bioprocess data is treated as metadata to the microbial community information, describing the environmental conditions in which the microbial community is found. Capturing this industrial metadata requires a robust metadata capture system. Kroak is a metadata capture system for the Amplytica Cloud Platform that facilitates tagging per-sample microbial community information with industrial environmental metadata. It uses a modern web interface for easy deployment, Office Open XML Workbook (XLSX) template files for easy metadata capture, and metadata classes to ensure data consistency and type identification for follow-on automated statistics and machine learning. Kroak is a functional metadata capture system that will be iteratively improved upon by Amplytica. Potential improvements include changes to Kroak’s data model, more reliable metadata parsing, and the expansion of its existing web application programming interface.

    Microbial Community Structure and Function: Implications for Current and Future Respiratory Therapies

    Thesis advisor: Babak Momeni. Diseases of the upper respiratory tract (URT) encompass a plethora of complex, multifaceted etiologies ranging from acute viral and bacterial infections to chronic diseases of the lung and nasal cavity. Due to this inherent complexity, typical treatments often fail in the face of recalcitrant infections and/or severe forms of chronic disease, including asthma. Thus, in order to provide an improved standard of care, the mechanisms at play in hard-to-treat etiologies must be better understood. More recently, research has demonstrated a significant association between microbiota and many URT diseases. Previous work has also identified species capable of directly inhibiting standard treatments used to control asthma exacerbations. Despite an exhaustive collection of data characterizing microbiota composition in states of both health and disease, our knowledge of which microbiota profiles are observed in which specific disease etiologies is severely lacking. Yet gaining these insights is crucial for translating such data into application. In this thesis I sought to: 1) identify gut microbiota profiles associated with severe and treatment-resistant forms of childhood asthma, and 2) formulate a predictive model to facilitate the restructuring of microbiota for desired therapeutic outcomes. To identify gut microbiota and metabolites enriched in severe and treatment-resistant childhood asthma, I looked to an ongoing longitudinal human study on vitamin D and childhood asthma. In this study, I found several fecal bacterial taxa and metabolites associated with more severe (i.e., higher wheeze proportion) and treatment-resistant asthma in children at age 3 years. Specifically, several Veillonella species were enriched in children with higher wheeze proportion and in children who responded poorly to inhaled corticosteroid (ICS) treatment (i.e., non-responders).
Haemophilus parainfluenzae, a species previously identified as enriched in the airway of adults with ICS-resistant asthma, was also uniquely enriched in children considered ICS non-responders in this study. Several metabolic pathways were also distinctly enriched: histidine metabolism was enriched in children with higher wheeze proportion, while sphingolipid metabolism was enriched in ICS non-responders. Both metabolic pathways have previously been identified in association with asthma, further corroborating their role in this disease. Yet this study is the first to identify these taxa and metabolites in children with preexisting and treatment-resistant asthma. In the pursuit of improved treatment outcomes for recalcitrant URT diseases, recent efforts have turned towards microbiota-based therapies. While such treatments have proven successful against gastrointestinal infections, these methods have not yet been extended to other conditions. Considering this, I asked whether a predictive model describing microbial interactions can facilitate the restructuring of microbiota for desired therapeutic outcomes. For this, I used a community of nasal microbiota to determine when a simple Lotka-Volterra-like (LV-like) model is a suitable representation of microbial interactions. I then used this LV-like model to examine whether environmental fluctuations, specifically pH fluctuations, have a major influence on community assembly and composition. In this study, I found that LV-like models are most suitable for describing community dynamics in complex, low-nutrient conditions. I also identified simple in vitro experiments that can reliably predict the suitability of an LV-like model for describing outcomes of a two-species community.
When the LV-like model was applied to an in silico community of nasal species to determine the impact of environmental fluctuations, I found that nasal communities are generally robust against pH fluctuations and that, under these conditions, facilitative interactions are a stabilizing force and are thus selected for in in silico enrichment experiments. Overall, this thesis further corroborates the association of microbiota with URT diseases and treatment outcomes, while also providing unique insight into their association with specific etiologies in childhood asthma. This thesis also provides a framework for developing models that can facilitate future microbiota-based therapies, and for determining how, and when, environmental factors impact community assembly and composition. Thesis (PhD), Boston College, 2021. Submitted to: Boston College, Graduate School of Arts and Sciences. Discipline: Biology.
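For readers unfamiliar with Lotka-Volterra-like models, the following Python sketch simulates the generic generalized Lotka-Volterra form dx_i/dt = x_i (r_i + Σ_j a_ij x_j) with explicit Euler steps. The abstract does not give the thesis's actual equations or parameters, so the model form, the function name `simulate_lv`, and all numerical values below are assumptions made purely for illustration.

```python
def simulate_lv(x0, r, a, dt=0.01, steps=5000):
    """Integrate dx_i/dt = x_i * (r_i + sum_j a[i][j] * x_j) by explicit Euler.

    x0: initial abundances; r: intrinsic growth rates;
    a: interaction matrix (a[i][j] is the effect of species j on species i).
    Abundances are clipped at zero so a species cannot become negative.
    """
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        per_capita = [
            r[i] + sum(a[i][j] * x[j] for j in range(n)) for i in range(n)
        ]
        x = [max(x[i] + dt * x[i] * per_capita[i], 0.0) for i in range(n)]
    return x


# One species with self-limitation behaves like logistic growth.
single = simulate_lv([0.1], r=[1.0], a=[[-1.0]])

# Two species with weak mutual facilitation (positive off-diagonal terms).
# These values are illustrative and not fitted to any data.
pair = simulate_lv(
    [0.1, 0.1],
    r=[1.0, 1.0],
    a=[[-1.0, 0.2], [0.1, -1.0]],
)
```

With the illustrative parameters above, the positive off-diagonal terms let both species settle at abundances above the level each would reach alone, which is one simple way to picture a facilitative interaction.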

    Implementation of innovative bioinformatics approaches for the integration of longitudinal multi-omics data

    New high-throughput "omics" technologies, including genomics, epigenomics, transcriptomics, proteomics, metabolomics and metagenomics, have expanded considerably in recent years. Taken independently, each omics technology is an essential source of knowledge for the study of the human genome, epigenome, transcriptome, proteome and metabolome, as well as the human microbiota, making it possible to identify disease biomarkers, to determine therapeutic targets, to establish preventive diagnoses and to increase knowledge of living organisms. Lower costs and easier acquisition of multi-omics data have enabled new experimental designs based on time series, in which the same biological sample is sequenced, measured and quantified at several time points. By combining omics technologies with time series, it is possible to capture the changes in expression that take place in a dynamic system for each molecule and to obtain a comprehensive view of multi-omics interactions, which is inaccessible with a simple standard omics approach. However, handling this amount of multi-omics data raises new challenges: continuous technological evolution, the volume of data produced, their heterogeneity, the variety of omics data and the interpretability of integration results all require new analysis methods and innovative tools capable of identifying the useful elements within this multitude of information. With this in mind, we propose several tools and methods to address the challenges of integrating and interpreting these particular multi-omics data.
Finally, the integration of longitudinal multi-omics data offers prospects in fields such as precision medicine and in environmental and industrial applications. The democratisation of multi-omics analyses and the implementation of innovative integration and interpretation methods will undoubtedly lead to a better understanding of biological ecosystems.