31 research outputs found

    Scalable Profiling and Visualization for Characterizing Microbiomes

    Metagenomics is the study of the combined genetic material found in microbiome samples, and it serves as an instrument for studying microbial communities, their biodiversities, and the relationships to their host environments. Creating, interpreting, and understanding microbial community profiles produced from microbiome samples is a challenging task as it requires large computational resources along with innovative techniques to process and analyze datasets that can contain terabytes of information. The community profiles are critical because they provide information about what microorganisms are present in the sample, and in what proportions. This is particularly important as many human diseases and environmental disasters are linked to changes in microbiome compositions. In this work we propose novel approaches for the creation and interpretation of microbial community profiles. This includes: (a) a cloud-based, distributed computational system that generates detailed community profiles by processing large DNA sequencing datasets against large reference genome collections, (b) the creation of Microbiome Maps: interpretable, high-resolution visualizations of community profiles, and (c) a machine learning framework for characterizing microbiomes from the Microbiome Maps that delivers deep insights into microbial communities. The proposed approaches have been implemented in three software solutions: Flint, a large scale profiling framework for commercial cloud systems that can process millions of DNA sequencing fragments and produces microbial community profiles at a very low cost; Jasper, a novel method for creating Microbiome Maps, which visualizes the abundance profiles based on the Hilbert curve; and Amber, a machine learning framework for characterizing microbiomes using the Microbiome Maps generated by Jasper with high accuracy. 
Results show that Flint scales well for reference genome collections that are an order of magnitude larger than those used by competing tools, while taking less than a minute to profile a million reads on the cloud with 65 commodity processors. Microbiome Maps produced by Jasper are compact, scalable representations of extremely complex microbial community profiles with numerous demonstrable advantages, including the ability to display latent relationships that are hard to elicit. Finally, experiments show that by using images as input instead of unstructured tabular input, the carefully engineered software, Amber, can outperform other sophisticated machine learning tools available for classification of microbiomes.
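
The abstract above says that Jasper lays abundance profiles out along a Hilbert curve. The following is a minimal sketch of that general idea only, not Jasper's implementation: the function names `hilbert_d2xy` and `abundance_to_grid`, the grid order, and the toy abundance values are all assumptions made for illustration. It uses the standard index-to-coordinate conversion for a Hilbert curve, so entries that are neighbours in the ordered profile land in neighbouring cells of the image.

```python
def hilbert_d2xy(order, d):
    """Map index d along a Hilbert curve to (x, y) on a 2**order x 2**order grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the current quadrant so the curve stays continuous.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def abundance_to_grid(abundances, order):
    """Place an ordered abundance profile on a square grid along the curve.

    The ordering of `abundances` (for example, by taxonomy) is an assumption
    of this sketch; unused cells are left at 0.0.
    """
    n = 1 << order
    if len(abundances) > n * n:
        raise ValueError("profile does not fit on a grid of this order")
    grid = [[0.0] * n for _ in range(n)]
    for d, value in enumerate(abundances):
        x, y = hilbert_d2xy(order, d)
        grid[y][x] = value
    return grid


if __name__ == "__main__":
    # Hypothetical four-taxon profile on a 2 x 2 grid.
    for row in abundance_to_grid([0.4, 0.3, 0.2, 0.1], order=1):
        print(row)
```

Because consecutive curve indices always map to adjacent cells, groups of related taxa that sit next to each other in the ordering appear as contiguous patches, which is one plausible reason an image-based classifier could exploit such a layout.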

    Large scale microbiome profiling in the cloud

    Motivation: Bacterial metagenomics profiling for metagenomic whole sequencing (mWGS) usually starts by aligning sequencing reads to a collection of reference genomes. Current profiling tools are designed to work against a small representative collection of genomes, and do not scale very well to larger reference genome collections. However, large reference genome collections are capable of providing a more complete and accurate profile of the bacterial population in a metagenomics dataset. In this paper, we discuss a scalable, efficient and affordable approach to this problem, bringing big data solutions within the reach of laboratories with modest resources. Results: We developed FLINT, a metagenomics profiling pipeline that is built on top of the Apache Spark framework, and is designed for fast real-time profiling of metagenomic samples against a large collection of reference genomes. FLINT takes advantage of Spark’s built-in parallelism and streaming engine architecture to quickly map reads against a large (170 GB) reference collection of 43 552 bacterial genomes from Ensembl. FLINT runs on Amazon’s Elastic MapReduce service, and is able to profile 1 million Illumina paired-end reads against over 40 K genomes on 64 machines in 67 s, an order of magnitude faster than the state of the art, while using a much larger reference collection. Streaming the sequencing reads allows this approach to sustain mapping rates of 55 million reads per hour, at an hourly cluster cost of $8.00 USD, while avoiding the necessity of storing large quantities of intermediate alignments.
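
As a back-of-envelope check on the figures quoted in this abstract, the sketch below converts the single-batch timing (1 million reads in 67 s) into an hourly rate and derives a cost per million reads from the quoted streaming rate and hourly cluster cost. The function names are hypothetical, and the per-million cost is a derived figure that the abstract itself does not state.

```python
def reads_per_hour(reads, seconds):
    """Extrapolate a batch timing to an hourly mapping rate."""
    return reads * 3600.0 / seconds


def cost_per_million_reads(hourly_cost_usd, reads_per_hour_rate):
    """Cluster cost attributable to one million mapped reads."""
    return hourly_cost_usd / (reads_per_hour_rate / 1_000_000)


if __name__ == "__main__":
    # Batch figure from the abstract: 1 million reads in 67 s.
    batch_rate = reads_per_hour(1_000_000, 67)
    print(f"batch extrapolation: {batch_rate / 1e6:.1f} million reads/hour")

    # Streaming figures from the abstract: 55 million reads/hour at $8.00/hour.
    unit_cost = cost_per_million_reads(8.00, 55_000_000)
    print(f"derived cost: ${unit_cost:.3f} per million reads")
```

The batch extrapolation comes out slightly below the 55 million reads per hour quoted for streaming, which is in line with the abstract's statement that streaming the reads sustains that rate.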

    Adapting Flint for Calculating Bacterial Replication Rates in Microbiomes

    We extend Flint, a Spark-based metagenomic profiling tool, to efficiently measure bacterial growth rates for large data sets. The tool bPTR for bacterial growth rate measurement from metagenomic samples [Brown et al., Nat Biotech, 2016] was adapted and integrated into Flint’s MapReduce framework in order to take advantage of Flint’s efficient read alignments and mapping, thus enabling the creation of bacterial abundance profiles that are enhanced with growth-rate information. To show the viability of our method we analyzed whole metagenome sequence data from a longitudinal study of sampled preterm infants [Gibson et al., Nat Micro, 2016], computing the abundance profile enhanced with growth rate information. The conclusions shed light on the new perspective obtained on antibiotics treatments and antibiotic resistance by looking at replication rates.

    Modeling association in microbial communities with clique loglinear models

    There is a growing awareness of the important roles that microbial communities play in complex biological processes. Modern investigation of these often uses next generation sequencing of metagenomic samples to determine community composition. We propose a statistical technique based on clique loglinear models and Bayes model averaging to identify microbial components in a metagenomic sample at various taxonomic levels that have significant associations. We describe the model class, a stochastic search technique for model selection, and the calculation of estimates of posterior probabilities of interest. We demonstrate our approach using data from the Human Microbiome Project and from a study of the skin microbiome in chronic wound healing. Our technique also identifies significant dependencies among microbial components as evidence of possible microbial syntrophy. Keywords: contingency tables, graphical models, model selection, microbiome, next generation sequencing. Comment: 30 pages, 17 figures.

    GenSensor Suite: A Web-Based Tool for the Analysis of Gene and Protein Interactions, Pathways, and Regulation

    The GenSensor Suite consists of four web tools for elucidating relationships among genes and proteins. GenPath results show which biochemical, regulatory, or other gene set categories are over- or under-represented in an input list compared to a background list. All common gene sets are available for searching in GenPath, plus some specialized sets. Users can add custom background lists. GenInteract builds an interaction gene list from a single gene input and then analyzes this in GenPath. GenPubMed uses a PubMed query to identify a list of PubMed IDs, from which a gene list is extracted and queried in GenPath. GenViewer allows the user to query one gene set against another in GenPath. GenPath results are presented with relevant P- and q-values in an uncluttered, fully linked, and integrated table. Users can easily copy this table and paste it directly into a spreadsheet or document.
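
The over-representation analysis described for GenPath can be illustrated with a hypergeometric upper-tail test, which is a common choice for comparing an input gene list against a background list. The abstract does not say which test GenPath uses, so the test itself, the function name `hypergeom_upper_tail`, and the example counts are assumptions of this sketch.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) for X ~ Hypergeometric.

    N: genes in the background list
    K: background genes that belong to the gene set
    n: genes in the input list
    k: input genes that belong to the gene set
    """
    total = comb(N, n)
    upper = min(K, n)
    # Sum the probability of seeing k, k+1, ..., upper set members by chance.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


if __name__ == "__main__":
    # Hypothetical counts: 40 of 200 input genes fall in a set that covers
    # 500 of 10,000 background genes.
    p = hypergeom_upper_tail(k=40, K=500, n=200, N=10_000)
    print(f"over-representation P-value: {p:.3g}")
```

A small P-value here suggests the gene set appears in the input list more often than random sampling from the background would explain. Under-representation could be tested with the corresponding lower tail, and q-values would additionally require a multiple-testing correction across all gene sets tested.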

    Thyroid stimulating hormone levels and geriatric syndromes: secondary nested case–control study of the Mexican Health and Aging Study

    Purpose: To determine the incidence of geriatric syndromes (GS) in community dwelling older adults with subclinical hypothyroidism. Methods: This is an analysis from the Mexican Health and Aging Study, of a subsample of 2089 subjects with TSH determination. From this last subsample, we included 1628 individuals with TSH levels in the subclinical range (4.5–10 µU/ml). Results: The multivariate analysis showed that when comparing data obtained from the 2012 wave with the 2015 wave results, there was a significant incidence of some GS such as falls (OR 1.79, CI 1.16–2.77, p=0.0116), fatigue (OR 2.17, CI 1.40–3.38, p=0.0348) and depression (OR 1.70, CI 1.06–2.71, p=0.0246) among the subclinical hypothyroidism group. Conclusion: This study showed a greater incidence of GS in subjects 50 years and older with subclinical hypothyroidism, when compared to those with normal thyroid function. Keywords: Thyroid stimulating hormone · Aging · Geriatric syndromes · Chronic disease · Subclinical hypothyroidism

    Country-level gender inequality is associated with structural differences in the brains of women and men

    Gender inequality and sex differences in the brain: gender inequality is associated with sex differences in brain structure. Kyoto University press release. 2023-05-10. Gender inequality across the world has been associated with a higher risk of mental health problems and lower academic achievement in women compared to men. We also know that the brain is shaped by nurturing and adverse socio-environmental experiences. Therefore, unequal exposure to harsher conditions for women compared to men in gender-unequal countries might be reflected in differences in their brain structure, and this could be the neural mechanism partly explaining women’s worse outcomes in gender-unequal countries. We examined this through a random-effects meta-analysis on cortical thickness and surface area differences between adult healthy men and women, including a meta-regression in which country-level gender inequality acted as an explanatory variable for the observed differences. A total of 139 samples from 29 different countries, totaling 7,876 MRI scans, were included. Thickness of the right hemisphere, and particularly the right caudal anterior cingulate, right medial orbitofrontal, and left lateral occipital cortex, presented no differences or even thicker regional cortices in women compared to men in gender-equal countries, reversing to thinner cortices in countries with greater gender inequality. These results point to the potentially hazardous effect of gender inequality on women’s brains and provide initial evidence for neuroscience-informed policies for gender equality.