673 research outputs found

    Microbial community pattern detection in human body habitats via ensemble clustering framework

    The human habitat is a host where microbial species evolve, function, and continue to evolve. Elucidating how microbial communities respond to human habitats is a fundamental and critical task, as establishing baselines of the human microbiome is essential to understanding its role in human disease and health. However, current studies usually overlook the complex and interconnected landscape of the human microbiome and restrict themselves to particular body habitats, using learning models with a single specific criterion. Therefore, these methods cannot capture the real-world underlying microbial patterns effectively. To obtain a comprehensive view, we propose a novel ensemble clustering framework to mine the structure of microbial community patterns on large-scale metagenomic data. Specifically, we first build a microbial similarity network by integrating 1920 metagenomic samples from three body habitats of healthy adults. Then a novel ensemble model based on symmetric Nonnegative Matrix Factorization (NMF) is proposed and applied to the network to detect clustering patterns. Extensive experiments are conducted to evaluate the effectiveness of our model in deriving microbial communities with respect to body habitat and host gender. From the clustering results, we observed that body habitats exhibit strongly bound but non-unique microbial structural patterns. Meanwhile, the human microbiome reveals different degrees of structural variation across body habitat and host gender. In summary, our ensemble clustering framework can efficiently explore integrated clustering results to accurately identify microbial communities, and provide a comprehensive view of a set of microbial communities. Such trends depict an integrated biography of microbial communities, which offers new insight towards uncovering pathogenic models of the human microbiome. Comment: BMC Systems Biology 201

    Dynamics of interacting diseases

    Current modeling of infectious diseases allows for the study of complex and realistic scenarios that go from the population to the individual level of description. However, most epidemic models assume that the spreading process takes place on a single level (be it a single population, a meta-population system or a network of contacts). In particular, interdependent contagion phenomena can only be addressed if we go beyond the one pathogen-one network scheme. In this paper, we propose a framework that allows describing the spreading dynamics of two concurrent diseases. Specifically, we characterize analytically the epidemic thresholds of the two diseases for different scenarios and also compute the temporal evolution characterizing the unfolding dynamics. Results show that there are regions of the parameter space in which the onset of one disease's outbreak is conditioned on the prevalence levels of the other disease. Moreover, we show, for the SIS scheme, that under certain circumstances, finite and non-vanishing epidemic thresholds are found even in the thermodynamic limit for scale-free networks. For the SIR scenario, the phenomenology is richer and additional interdependencies show up. We also find that the secondary thresholds for the SIS and SIR models are different, which results directly from the interaction between both diseases. Our work thus solves an important problem and paves the way towards a more comprehensive description of the dynamics of interacting diseases. Comment: 24 pages, 9 figures, 4 tables, 3 appendices. Final version accepted for publication in Physical Review

    An Ecological Perspective of American Rodent-Borne Orthohantavirus Surveillance

    Orthohantaviruses are a global group of viruses found primarily in rodents, though several viruses have also been found in shrews and moles. Many rodent-borne orthohantaviruses are capable of causing one of several diseases in humans, and the mortality associated with these diseases ranges from < 0.1% to 50% depending on the specific etiological virus. In North and South America, orthohantavirus research was ignited by an outbreak of severe disease in the Four Corners region of the United States in 1993. However, despite the discovery of over 20 orthohantaviruses in the Americas, our understanding of orthohantavirus ecology and virus-host dynamics in this region is still limited, and orthohantavirus surveillance is generally restricted in scope to select regions and small portions of host distributional ranges. In Chapter I, I present a literature review on the current understanding of American rodent-borne orthohantavirus ecology. This review focused on under-studied orthohantaviruses, addressing gaps in knowledge by extrapolating information from well-studied orthohantaviruses, general rodent ecology, and occasionally from Eurasian orthohantavirus-host ecology. There were several key conclusions generated from this review that warrant further research: 1) the large number of putative orthohantaviruses and gaps in orthohantavirus evolution necessitate further surveillance and characterization, 2) orthohantavirus traits differ and are more generalizable based on host taxonomy rather than geography, and 3) orthohantavirus host species are disproportionately found in grasslands and disturbed habitats. In Chapter II, I present a prioritized list of rodent species to target for orthohantavirus surveillance based on predictive modeling using machine learning. Probable orthohantavirus hosts were predicted based on traits of known orthohantavirus hosts using two different types of evidence: RT-PCR and virus isolation. Predicted host distributions were also mapped to identify geographic hotspots to spatially guide future surveillance efforts. In Chapter III, I present a framework for understanding and predicting orthohantavirus traits based on reservoir host phylogeny, as opposed to the traditional geographic dichotomy used to group orthohantaviruses. This framework establishes three distinct orthohantavirus groups: murid-borne orthohantaviruses, arvicoline-borne orthohantaviruses, and non-arvicoline cricetid-borne orthohantaviruses, which differ in several key traits, including the human disease they cause, transmission routes, and virus-host fidelity. In Chapter IV, I compare rodent communities and orthohantavirus prevalence among grassland management regimes. Sites that were periodically burned had high rodent diversity and a high proportion of grassland species. However, rodent seroprevalence for orthohantavirus was also highest in burned sites, representing a trade-off in habitat management outcomes. The high seroprevalence in burned sites is likely due to the robust populations supported by the high-quality habitat resulting from prescribed burning. In Chapters V and VI, I describe Ozark virus and Sager Creek virus, two novel orthohantaviruses discovered from specimens collected during Chapter IV. Both chapters report full genome sequences of the respective viruses and compare both nucleotide and protein phylogenies with related orthohantaviruses. Additionally, in Chapter VI, I support the genetic analyses with molecular and ecological characterizations, including seasonal fluctuations in host abundance, correlates of prevalence, evidence of virus shedding, and information on host cell susceptibility to Sager Creek virus.

    Approaches for integrating heterogeneous RNA-seq data reveal cross-talk between microbes and genes in asthmatic patients.

    Sputum induction is a non-invasive method to evaluate the airway environment, particularly for asthma. RNA sequencing (RNA-seq) of sputum samples can be challenging to interpret due to the complex and heterogeneous mixtures of human cells and exogenous (microbial) material. In this study, we develop a pipeline that integrates dimensionality reduction and statistical modeling to grapple with the heterogeneity. LDA-link (LDA: Latent Dirichlet allocation) connects microbes to genes using reduced-dimensionality LDA topics. We validate our method with single-cell RNA-seq and microscopy and then apply it to the sputum of asthmatic patients to find known and novel relationships between microbes and genes.

    Joint learning from multiple information sources for biological problems

    Thanks to technological advancements, more and more biological data have been generated in recent years. Data availability offers unprecedented opportunities to look at the same problem from multiple aspects. It also unveils a more global view of the problem that takes into account the intricate interplay between the involved molecules/entities. Nevertheless, biological datasets are biased, limited in quantity, and contain many false-positive samples. Such challenges often drastically degrade the performance of a predictive model on unseen data and, thus, limit its applicability in real biological studies. Human learning is a multi-stage process in which we usually start with simple things. Through knowledge accumulated over time, our cognitive ability extends to more complex concepts. Children learn to speak simple words before being able to formulate sentences. Similarly, being able to speak correct sentences supports our learning to speak correct and meaningful paragraphs, etc. Generally, knowledge acquired from related learning tasks helps boost our learning capability in the current task. Motivated by this phenomenon, in this thesis, we study supervised machine learning models for bioinformatics problems that can improve their performance by exploiting multiple related knowledge sources. More specifically, we are concerned with ways to enrich the supervised models’ knowledge base with publicly available related data to enhance the computational models’ prediction performance. Our work shares commonality with existing works in multimodal learning, multi-task learning, and transfer learning. Nevertheless, there are certain differences in some cases. Besides the proposed architectures, we present large-scale experiment setups with consensus evaluation metrics along with the creation and release of large datasets to showcase our approaches’ superiority. Moreover, we add case studies with detailed analyses in which we make no simplifying assumptions to demonstrate the systems’ utility in realistic application scenarios. Finally, we develop and make available an easy-to-use website for non-expert users to query the model’s generated prediction results to facilitate field experts’ assessments and adaptation. We believe that our work serves as one of the first steps in bridging the gap between “Computer Science” and “Biology” that will open a new era of fruitful collaboration between computer scientists and biological field experts.