673 research outputs found
Microbial community pattern detection in human body habitats via ensemble clustering framework
The human body is a host habitat in which microbial species function and
continue to evolve. Elucidating how microbial communities respond to human
habitats is a fundamental and critical task, as establishing baselines of the
human microbiome is essential to understanding its role in human disease and
health. However, current studies usually overlook the complex and interconnected
landscape of the human microbiome and restrict themselves to particular body
habitats and to learning models with a single specific criterion. As a result,
these methods cannot capture the real-world underlying microbial patterns
effectively. To obtain a comprehensive view, we propose a novel ensemble
clustering framework to mine the structure of microbial community patterns in
large-scale metagenomic data.
Specifically, we first build a microbial similarity network by integrating
1920 metagenomic samples from three body habitats of healthy adults. Then a
novel ensemble model based on symmetric Nonnegative Matrix Factorization (NMF)
is proposed and applied to the network to detect clustering patterns. Extensive
experiments are conducted to evaluate the effectiveness of our model in
deriving microbial communities with respect to body habitat and host gender.
From the clustering results, we observe that body habitats exhibit strongly
bound but non-unique microbial structural patterns. Meanwhile, the human
microbiome shows different degrees of structural variation across body habitat
and host gender. In summary, our ensemble clustering framework can efficiently
explore integrated clustering results to accurately identify microbial
communities and provide a comprehensive view of a set of microbial communities.
Such trends depict an integrated biography of microbial communities, which
offers new insight towards uncovering pathogenic models of the human microbiome.
Comment: BMC Systems Biology 201
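To make the clustering step in this abstract more concrete, the following is a minimal sketch of plain symmetric NMF applied to a similarity matrix. It is not the authors' ensemble model: the damped multiplicative update used here is a commonly cited rule for symmetric NMF, and the function names, the 6-node toy matrix, and all parameter values are assumptions chosen for illustration only.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def reconstruction_error(a, h):
    """Squared Frobenius norm of A - H H^T."""
    approx = matmul(h, transpose(h))
    n = len(a)
    return sum((a[i][j] - approx[i][j]) ** 2 for i in range(n) for j in range(n))


def symmetric_nmf(a, k, iterations=300, beta=0.5, seed=0, eps=1e-12):
    """Approximate a symmetric nonnegative matrix A by H H^T with H >= 0.

    Uses the damped multiplicative update
    H <- H * (1 - beta + beta * (A H) / (H H^T H)), applied element-wise.
    """
    rng = random.Random(seed)
    n = len(a)
    # Strictly positive random start so that no entry is stuck at zero.
    h = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n)]
    for _ in range(iterations):
        ah = matmul(a, h)
        hhth = matmul(h, matmul(transpose(h), h))
        h = [[h[i][j] * (1.0 - beta + beta * ah[i][j] / (hhth[i][j] + eps))
              for j in range(k)] for i in range(n)]
    return h


def cluster_labels(h):
    """Assign each node to the factor column with the largest weight."""
    return [max(range(len(row)), key=row.__getitem__) for row in h]


# Hypothetical toy similarity network: nodes 0-2 and nodes 3-5 form two groups.
similarity = [[1.0 if (i < 3) == (j < 3) else 0.0 for j in range(6)]
              for i in range(6)]
factor = symmetric_nmf(similarity, k=2)
labels = cluster_labels(factor)
print(labels)
```

An ensemble variant would presumably combine several such factorizations, but how the paper does so is not stated in the abstract and is not reproduced here.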
Dynamics of interacting diseases
Current modeling of infectious diseases allows for the study of complex and
realistic scenarios that go from the population to the individual level of
description. However, most epidemic models assume that the spreading process
takes place on a single level (be it a single population, a meta-population
system or a network of contacts). In particular, interdependent contagion
phenomena can only be addressed if we go beyond the scheme one pathogen-one
network. In this paper, we propose a framework that allows describing the
spreading dynamics of two concurrent diseases. Specifically, we characterize
analytically the epidemic thresholds of the two diseases for different
scenarios and also compute the temporal evolution characterizing the unfolding
dynamics. Results show that there are regions of the parameter space in which
the onset of a disease's outbreak is conditioned on the prevalence levels of
the other disease. Moreover, we show, for the SIS scheme, that under certain
circumstances, finite, non-vanishing epidemic thresholds are found even in
the thermodynamic limit for scale-free networks. For the SIR scenario, the
phenomenology is richer and additional interdependencies show up. We also find
that the secondary thresholds for the SIS and SIR models are different, which
results directly from the interaction between both diseases. Our work thus
solves an important problem and paves the way towards a more comprehensive
description of the dynamics of interacting diseases.
Comment: 24 pages, 9 figures, 4 tables, 3 appendices. Final version accepted
for publication in Physical Review
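For context on the claim about finite thresholds in scale-free networks, the sketch below computes the textbook single-pathogen baseline, not the two-disease framework of this paper. It assumes the heterogeneous mean-field estimate of the SIS threshold, lambda_c = <k>/<k^2>, and a truncated power-law degree distribution P(k) proportional to k^(-gamma). The function name and the values of gamma, k_min, and the cutoffs are illustrative assumptions.

```python
def hmf_sis_threshold(gamma, k_min, k_max):
    """Heterogeneous mean-field SIS threshold <k>/<k^2>.

    Degrees follow a truncated power law P(k) ~ k^(-gamma) on [k_min, k_max].
    """
    degrees = range(k_min, k_max + 1)
    weights = [k ** (-gamma) for k in degrees]
    norm = sum(weights)
    mean_k = sum(k * w for k, w in zip(degrees, weights)) / norm
    mean_k2 = sum(k * k * w for k, w in zip(degrees, weights)) / norm
    return mean_k / mean_k2


# A growing degree cutoff mimics a growing network size.
cutoffs = [10 ** e for e in range(2, 6)]
thresholds = [hmf_sis_threshold(2.5, 3, k_max) for k_max in cutoffs]
for k_max, lam in zip(cutoffs, thresholds):
    print(f"k_max={k_max:>7}  lambda_c={lam:.5f}")
```

In this baseline the threshold keeps shrinking as the cutoff grows, because <k^2> diverges for 2 < gamma <= 3. That is the behaviour against which the abstract's finite thresholds for interacting diseases stand out.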
An Ecological Perspective of American Rodent-Borne Orthohantavirus Surveillance
Orthohantaviruses are a global group of viruses found primarily in rodents, though several viruses have also been found in shrews and moles. Many rodent-borne orthohantaviruses are capable of causing one of several diseases in humans, and the mortality associated with these diseases ranges from < 0.1% to 50% depending on the specific etiological virus. In North and South America, orthohantavirus research was ignited by an outbreak of severe disease in the Four Corners region of the United States in 1993. However, despite the discovery of over 20 orthohantaviruses in the Americas, our understanding of orthohantavirus ecology and virus-host dynamics in this region is still limited, and orthohantavirus surveillance is generally restricted in scope to select regions and small portions of host distributional ranges. In Chapter I, I present a literature review on the current understanding of American rodent-borne orthohantavirus ecology. This review focused on under-studied orthohantaviruses, addressing gaps in knowledge by extrapolating information from well-studied orthohantaviruses, general rodent ecology, and occasionally from Eurasian orthohantavirus-host ecology. There were several key conclusions generated from this review that warrant further research: 1) the large number of putative orthohantaviruses and gaps in orthohantavirus evolution necessitate further surveillance and characterization, 2) orthohantavirus traits differ and are more generalizable based on host taxonomy rather than geography, and 3) orthohantavirus host species are disproportionately found in grasslands and disturbed habitats. In Chapter II, I present a prioritized list of rodent species to target for orthohantavirus surveillance based on predictive modeling using machine learning. Probable orthohantavirus hosts were predicted based on traits of known orthohantavirus hosts using two different types of evidence: RT-PCR and virus isolation.
Predicted host distributions were also mapped to identify geographic hotspots to spatially guide future surveillance efforts. In Chapter III, I present a framework for understanding and predicting orthohantavirus traits based on reservoir host phylogeny, as opposed to the traditional geographic dichotomy used to group orthohantaviruses. This framework establishes three distinct orthohantavirus groups: murid-borne orthohantaviruses, arvicoline-borne orthohantaviruses, and non-arvicoline cricetid-borne orthohantaviruses, which differ in several key traits, including the human disease they cause, transmission routes, and virus-host fidelity. In Chapter IV, I compare rodent communities and orthohantavirus prevalence among grassland management regimes. Sites that were periodically burned had high rodent diversity and a high proportion of grassland species. However, rodent seroprevalence for orthohantavirus was also highest in burned sites, representing a trade-off in habitat management outcomes. The high seroprevalence in burned sites is likely due to the robust populations supported by the high-quality habitat resulting from prescribed burning. In Chapters V and VI, I describe Ozark virus and Sager Creek virus, two novel orthohantaviruses discovered from specimens collected during Chapter IV. Both chapters report full genome sequences of the respective viruses and compare both nucleotide and protein phylogenies with related orthohantaviruses. Additionally, in Chapter VI, I support the genetic analyses with molecular and ecological characterizations, including seasonal fluctuations in host abundance, correlates of prevalence, evidence of virus shedding, and information on host cell susceptibility to Sager Creek virus.
Approaches for integrating heterogeneous RNA-seq data reveal cross-talk between microbes and genes in asthmatic patients.
Sputum induction is a non-invasive method to evaluate the airway environment, particularly for asthma. RNA sequencing (RNA-seq) of sputum samples can be challenging to interpret due to the complex and heterogeneous mixtures of human cells and exogenous (microbial) material. In this study, we develop a pipeline that integrates dimensionality reduction and statistical modeling to grapple with the heterogeneity. LDA(Latent Dirichlet allocation)-link connects microbes to genes using reduced-dimensionality LDA topics. We validate our method with single-cell RNA-seq and microscopy and then apply it to the sputum of asthmatic patients to find known and novel relationships between microbes and genes
Joint learning from multiple information sources for biological problems
Thanks to technological advancements, more and more biological data have been generated in recent years. Data availability offers unprecedented opportunities to look at the same problem from multiple aspects. It also unveils a more global view of the problem that takes into account the intricate interplay between the involved molecules/entities. Nevertheless, biological datasets are biased, limited in quantity, and contain many false-positive samples. Such challenges often drastically degrade the performance of a predictive model on unseen data and, thus, limit its applicability in real biological studies.
Human learning is a multi-stage process in which we usually start with simple things. Through knowledge accumulated over time, our cognitive ability extends to more complex concepts. Children learn to speak simple words before being able to formulate sentences. Similarly, being able to speak correct sentences supports our learning to speak correct and meaningful paragraphs, etc. Generally, knowledge acquired from related learning tasks helps boost our learning capability in the current task. Motivated by this phenomenon, in this thesis, we study supervised machine learning models for bioinformatics problems that can improve their performance by exploiting multiple related knowledge sources. More specifically, we are concerned with ways to enrich the supervised models’ knowledge base with publicly available related data to enhance the computational models’ prediction performance.
Our work has much in common with existing work in multimodal learning, multi-task learning, and transfer learning. Nevertheless, there are certain differences in some cases. Besides the proposed architectures, we present large-scale experiment setups with consensus evaluation metrics, along with the creation and release of large datasets, to showcase our approaches’ superiority. Moreover, we add case studies with detailed analyses, in which we make no simplifying assumptions, to demonstrate the systems’ utility in realistic application scenarios. Finally, we develop and make available an easy-to-use website for non-expert users to query the model’s generated prediction results to facilitate field experts’ assessments and adaptation. We believe that our work serves as one of the first steps in bridging the gap between “Computer Science” and “Biology” that will open a new era of fruitful collaboration between computer scientists and biological field experts.
- …