
    Clusters of Adaptive Evolution in the Human Genome

    Considerable work has been devoted to identifying regions of the human genome that have been subjected to recent positive selection. Although detailed follow-up studies of putatively selected regions are critical for a deeper understanding of human evolutionary history, such studies have received comparatively little attention. Recently, we have shown that ALMS1 has been the target of recent positive selection acting on standing variation in Eurasian populations. Here, we describe a careful follow-up analysis of genetic variation across the ALMS1 region, which unexpectedly revealed a cluster of substrates of positive selection. Specifically, through the analysis of SNP data from the HapMap and Human Genome Diversity Project–Centre d’Etude du Polymorphisme Humain samples, as well as sequence data from the region, we find compelling evidence for three independent and distinct signals of recent positive selection across this 3 Mb region surrounding ALMS1. Moreover, we analyzed the HapMap data to identify other putative clusters of independent selective events and, using conservative criteria, discovered 19 additional clusters of adaptive evolution. This work has important implications for the interpretation of genome scans for positive selection in humans and, more broadly, contributes to a better understanding of how recent positive selection has shaped genetic variation across the human genome.

    Gut microbiota related to Giardia duodenalis, Entamoeba spp. and Blastocystis hominis infections in humans from Côte d'Ivoire.

    INTRODUCTION: The literature provides little information about protozoan infections and compositional shifts in the human gut microbiota. This preliminary study aimed to describe the fecal bacterial community composition of people from Côte d'Ivoire harboring Giardia duodenalis, Entamoeba spp., and Blastocystis hominis, in order to identify possible alterations in fecal microbiota structure related to the presence of these parasites. METHODOLOGY: Twenty fecal samples were collected from people living in three different localities of Côte d'Ivoire for copromicroscopic analysis and molecular identification of G. duodenalis, Entamoeba spp., and B. hominis. Temporal temperature gradient gel electrophoresis (TTGE) was used to obtain a fingerprint of the overall bacterial community, quantitative polymerase chain reaction (qPCR) was used to determine the relative abundances of selected bacterial species/groups, and multivariate statistical analyses were used to correlate all data. RESULTS: Cluster analysis revealed a significant separation of TTGE profiles into four clusters (p < 0.0001), with G. duodenalis-positive samples differing markedly from the others (p = 5.4×10⁻⁶). qPCR data showed that G. duodenalis-positive samples were associated with a dysbiotic condition favoring potentially harmful species (such as Escherichia coli), while Entamoeba spp./B. hominis-positive subjects were associated with a eubiotic condition, as indicated by a significantly higher Faecalibacterium prausnitzii to Escherichia coli ratio. CONCLUSIONS: This preliminary investigation demonstrates a differential fecal microbiota structure in subjects infected with G. duodenalis versus Entamoeba spp./B. hominis, paving the way for further use of next-generation DNA technologies to better understand host-parasite-bacteria interactions and to identify potential indicators of microbiota changes.
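    One common way to express a two-species ratio such as F. prausnitzii to E. coli from qPCR data is the comparative Ct (ΔCt) approach: each cycle of amplification multiplies the template by the assay efficiency, so a difference in threshold cycles maps to a fold difference in starting abundance. The abstract does not say how the study computed its ratio (standard-curve quantification is an equally common choice), so the function below is only a minimal sketch. It assumes both assays amplify with the same efficiency and that the target genes occur at equal copy number per cell; the name fp_ec_ratio is hypothetical.

```python
def fp_ec_ratio(ct_fprau: float, ct_ecoli: float, efficiency: float = 2.0) -> float:
    """Relative F. prausnitzii : E. coli abundance from qPCR threshold cycles (Ct).

    Assumptions (not taken from the study): both assays share the same
    amplification efficiency (2.0 = perfect doubling per cycle) and the target
    genes have equal copy number per cell. A lower Ct means more starting
    template, so the ratio is efficiency ** (Ct_ecoli - Ct_fprau).
    """
    return efficiency ** (ct_ecoli - ct_fprau)


# A sample where E. coli crosses the threshold 3 cycles later than F. prausnitzii
# has roughly 2**3 = 8 times more F. prausnitzii template under these assumptions.
print(fp_ec_ratio(20.0, 23.0))
```

    A ratio above 1 points toward relatively more F. prausnitzii; comparing such ratios between infection groups would still require a proper statistical test.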

    UNCLES: Method for the identification of genes differentially consistently co-expressed in a specific subset of datasets

    Background: Collective analysis of the growing number of gene expression datasets is required. The recently proposed binarisation of consensus partition matrices (Bi-CoPaM) method can combine clustering results from multiple datasets to identify, in a tuneable manner, the subsets of genes that are consistently co-expressed in all of the provided datasets. However, results validation and parameter setting complicate the design of such methods. Moreover, although methods are commonly tested on synthetic datasets, the mathematical models used to synthesise such datasets are usually based on approximations that may not be sufficiently representative of real datasets. Results: Here, we propose an unsupervised method for the unification of clustering results from multiple datasets using external specifications (UNCLES). This method can identify the subsets of genes that are consistently co-expressed in one subset of datasets while being poorly co-expressed in another subset, as well as the subsets of genes consistently co-expressed in all given datasets. We also propose the M-N scatter plots validation technique and adopt it to set the parameters of UNCLES, such as the number of clusters, automatically. Additionally, we propose an approach for synthesising gene expression datasets from real data profiles, which combines the ground-truth knowledge of synthetic data with the realistic expression values of real data and therefore overcomes the problem of faithfully modelling synthetic expression data. By application to these datasets, we validate UNCLES and compare it with other conventional clustering methods and, of particular relevance, with biclustering methods. We further validate UNCLES by applying it to a set of 14 real genome-wide yeast datasets, where it produces focused clusters that conform well to known biological facts. Furthermore, in-silico-based hypotheses are drawn regarding the function of a few previously unknown genes in those focused clusters. Conclusions: The UNCLES method, the M-N scatter plots technique, and the expression data synthesis approach will have wide application for the comprehensive analysis of genomic and other sources of multiple complex biological datasets. Moreover, the derived in-silico-based biological hypotheses represent subjects for future functional studies.
    The National Institute for Health Research (NIHR), under its Programme Grants for Applied Research Programme (Grant Reference Number RP-PG-0310-1004).
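    The core idea of combining clustering results across datasets can be illustrated with a toy consensus partition matrix: for each gene and cluster, record the fraction of datasets that assign the gene to that cluster, then binarise with a threshold. The sketch below is a simplified illustration, not the published Bi-CoPaM or UNCLES algorithm. It assumes cluster labels are already aligned across the "positive" datasets (the real method needs a relabelling step), uses plain value thresholding, and approximates "poorly co-expressed" in the "negative" datasets by a low fraction of gene pairs that share a cluster there. All function names and the max_neg cut-off are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence

Partition = Sequence[int]  # cluster label per gene, same gene order in every dataset


def consensus_partition_matrix(partitions: Sequence[Partition], n_clusters: int) -> List[List[float]]:
    """CPM[k][g] = fraction of partitions that place gene g in cluster k.

    Assumes labels are aligned across datasets (label k is the same cluster
    everywhere) and that every label is in range(n_clusters).
    """
    n_genes = len(partitions[0])
    counts = [[0] * n_genes for _ in range(n_clusters)]
    for labels in partitions:
        if len(labels) != n_genes:
            raise ValueError("every partition must label the same genes")
        for gene, k in enumerate(labels):
            counts[k][gene] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def binarise(cpm: List[List[float]], delta: float) -> List[List[int]]:
    """Value thresholding: gene g is kept in cluster k iff CPM[k][g] >= delta."""
    return [[1 if v >= delta else 0 for v in row] for row in cpm]


def co_clustering(genes: Sequence[int], labels: Partition) -> float:
    """Fraction of gene pairs in `genes` that share a label in one partition."""
    pairs = list(combinations(genes, 2))
    if not pairs:
        return 0.0
    return sum(labels[a] == labels[b] for a, b in pairs) / len(pairs)


def differential_clusters(pos: Sequence[Partition], neg: Sequence[Partition],
                          n_clusters: int, delta: float = 1.0,
                          max_neg: float = 0.5) -> List[List[int]]:
    """Hypothetical selection: gene sets consistent across `pos` but scattered in `neg`."""
    if not neg:
        raise ValueError("need at least one negative partition")
    bpm = binarise(consensus_partition_matrix(pos, n_clusters), delta)
    result = []
    for row in bpm:
        genes = [g for g, member in enumerate(row) if member]
        if len(genes) < 2:
            continue  # a single gene cannot be co-expressed with anything
        mean_neg = sum(co_clustering(genes, labels) for labels in neg) / len(neg)
        if mean_neg <= max_neg:
            result.append(genes)
    return result


# Genes 0-2 cluster together in both positive datasets but are split in the
# negative one; genes 3-5 stay together everywhere, so only [0, 1, 2] is kept.
pos = [[0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1]]
neg = [[0, 1, 2, 3, 3, 3]]
print(differential_clusters(pos, neg, n_clusters=2))
```

    Labels in the negative datasets are only compared for equality, so they do not need to be aligned with the positive ones; lowering delta below 1.0 would relax the "consistently co-expressed" requirement.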

    Centralized and distributed learning methods for predictive health analytics

    The U.S. health care system is considered costly and highly inefficient, devoting substantial resources to treating acute conditions in a hospital setting rather than to prevention and keeping patients out of the hospital. The potential for cost savings is large: in the U.S., more than $30 billion is spent each year on hospitalizations deemed preventable, 31% of which is attributed to heart disease and 20% to diabetes. Motivated by this, our work focuses on developing centralized and distributed learning methods to predict future heart- or diabetes-related hospitalizations from patient Electronic Health Records (EHRs). We explore a variety of supervised classification methods and present a novel likelihood ratio based method (K-LRT) that predicts hospitalizations and offers interpretability by identifying, for each patient, the K most significant features that lead to a positive prediction. Next, assuming that the positive class consists of multiple clusters (patients hospitalized for different reasons) while the negative class is drawn from a single cluster (non-hospitalized patients who are healthy in every respect), we present an alternating optimization approach that jointly discovers the clusters in the positive class and optimizes the classifiers that separate each positive cluster from the negative samples. We establish the convergence of the method and characterize its VC dimension. Finally, we develop a decentralized cluster Primal-Dual Splitting (cPDS) method for large-scale problems, which is computationally efficient and privacy-aware. Such a distributed learning scheme is relevant for multi-institutional collaborations or peer-to-peer applications, allowing the agents to collaborate while keeping each participant's data private. cPDS is proved to have an improved convergence rate compared to existing centralized and decentralized methods. We test all methods on real EHR data from the Boston Medical Center and compare results in terms of prediction accuracy and interpretability.
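    To make the "K most significant features" idea concrete, here is a minimal sketch of a top-K likelihood ratio classifier. It is not the authors' K-LRT implementation. It assumes binary EHR features (1 = event recorded in the observation window), estimates per-feature class-conditional probabilities with Laplace smoothing, treats features independently, and scores a patient by summing the K largest per-feature log-likelihood ratios. The encoding, smoothing constant, threshold, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Row = Sequence[int]  # hypothetical binary encoding: 1 = event present in the EHR window


def fit_feature_probs(X: Sequence[Row], y: Sequence[int],
                      alpha: float = 1.0) -> Tuple[List[float], List[float]]:
    """Per-feature P(x_j = 1 | class) for the positive (1) and negative (0) classes."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    p_pos, p_neg = [], []
    for j in range(len(X[0])):
        # Laplace smoothing keeps every probability strictly between 0 and 1.
        p_pos.append((sum(x[j] for x in pos) + alpha) / (len(pos) + 2 * alpha))
        p_neg.append((sum(x[j] for x in neg) + alpha) / (len(neg) + 2 * alpha))
    return p_pos, p_neg


def top_k_llr(x: Row, p_pos: List[float], p_neg: List[float], k: int) -> List[Tuple[int, float]]:
    """Log-likelihood ratio of each observed feature value; keep the K largest."""
    llrs = []
    for j, v in enumerate(x):
        num = p_pos[j] if v else 1.0 - p_pos[j]
        den = p_neg[j] if v else 1.0 - p_neg[j]
        llrs.append((j, math.log(num / den)))
    llrs.sort(key=lambda t: t[1], reverse=True)  # stable: ties keep feature order
    return llrs[:k]


def k_lrt_predict(x: Row, p_pos: List[float], p_neg: List[float], k: int,
                  threshold: float = 0.0) -> Tuple[int, List[int]]:
    """Return (prediction, indices of the K features driving the score)."""
    top = top_k_llr(x, p_pos, p_neg, k)
    score = sum(v for _, v in top)
    return int(score > threshold), [j for j, _ in top]


X = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]]
y = [1, 1, 0, 0]
p_pos, p_neg = fit_feature_probs(X, y)
print(k_lrt_predict([1, 0, 0], p_pos, p_neg, k=1))
```

    Returning the selected feature indices alongside the label is what gives the per-patient explanation; in practice the threshold would be tuned on held-out data to trade off false alarms against missed hospitalizations.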