
    clusterExperiment and RSEC: A Bioconductor package and framework for clustering of single-cell and other large gene expression datasets

    Clustering of genes and/or samples is a common task in gene expression analysis. The goals in clustering can vary, but an important scenario is that of finding biologically meaningful subtypes within the samples. This is an application that is particularly appropriate when there are large numbers of samples, as in many human disease studies. With the increasing popularity of single-cell transcriptome sequencing (RNA-Seq), many more controlled experiments on model organisms are similarly creating large gene expression datasets with the goal of detecting previously unknown heterogeneity within cells. It is common in the detection of novel subtypes to run many clustering algorithms, as well as rely on subsampling and ensemble methods to improve robustness. We introduce a Bioconductor R package, clusterExperiment, that implements a general and flexible strategy we call Resampling-based Sequential Ensemble Clustering (RSEC). RSEC enables the user to easily create multiple, competing clusterings of the data based on different techniques and associated tuning parameters, including easy integration of resampling and sequential clustering, and then provides methods for consolidating the multiple clusterings into a final consensus clustering. The package is modular and allows the user to separately apply the individual components of the RSEC procedure, i.e., apply multiple clustering algorithms, create a consensus clustering or choose tuning parameters, and merge clusters. Additionally, clusterExperiment provides a variety of visualization tools for the clustering process, as well as methods for the identification of possible cluster signatures or biomarkers. The R package clusterExperiment is publicly available through the Bioconductor Project, with a detailed manual (vignette) as well as well-documented help pages for each function.
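The consensus ("co-clustering") idea at the heart of this workflow can be sketched outside R. The following Python sketch, with made-up toy data and parameter choices, builds many competing K-means clusterings and consolidates them via a co-occurrence matrix; it illustrates the general strategy, not the package's exact algorithm:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Toy "expression" matrix: 60 samples x 10 features with 3 latent subtypes
X, _ = make_blobs(n_samples=60, n_features=10, centers=3, random_state=0)

# Step 1: create multiple competing clusterings (varying k and seed)
labelings = [KMeans(n_clusters=k, n_init=10, random_state=s).fit_predict(X)
             for k in (2, 3, 4) for s in range(5)]

# Step 2: co-clustering matrix: C[i, j] is the fraction of clusterings
# that place samples i and j in the same cluster
C = np.mean([lab[:, None] == lab[None, :] for lab in labelings], axis=0)

# Step 3: consensus clustering, treating 1 - C as a distance matrix
dist = squareform(1 - C, checks=False)
consensus = fcluster(linkage(dist, method="average"), t=3, criterion="maxclust")
print(consensus)
```

Averaging over many clusterings makes the final partition less sensitive to any single algorithm's tuning, which is the same robustness argument the package makes.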

    The Mycobacterium tuberculosis transposon sequencing database (MtbTnDB): a large-scale guide to genetic conditional essentiality [preprint]

    Characterization of gene essentiality across different conditions is a useful approach for predicting gene function. Transposon sequencing (TnSeq) is a powerful means of generating genome-wide profiles of essentiality and has been used extensively in Mycobacterium tuberculosis (Mtb) genetic research. Over the past two decades, dozens of TnSeq screens have been published, yielding valuable insights into the biology of Mtb in vitro, inside macrophages, and in model host organisms. However, these Mtb TnSeq profiles are distributed across dozens of research papers within supplementary materials, which makes querying them cumbersome and assembling a complete and consistent synthesis of existing data challenging. Here, we address this problem by building a central repository of publicly available TnSeq screens performed in M. tuberculosis, which we call the Mtb transposon sequencing database (MtbTnDB). The MtbTnDB encompasses 64 published and unpublished TnSeq screens; it is standardized and open-access, and gives users easy access to data, visualizations, and functional predictions through an interactive web-app (www.mtbtndb.app). We also present evidence that (i) genes in the same genomic neighborhood tend to have similar TnSeq profiles, and (ii) clusters of genes with similar TnSeq profiles tend to be enriched for genes belonging to the same functional categories. Finally, we test and evaluate machine learning models trained on TnSeq profiles to guide functional annotation of orphan genes in Mtb. In addition to facilitating the exploration of conditional genetic essentiality in this important human pathogen via a centralized TnSeq data repository, the MtbTnDB will enable hypothesis generation and the extraction of meaningful patterns by facilitating the comparison of datasets across conditions. This will provide a basis for insights into the functional organization of Mtb genes as well as gene function prediction.
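Observation (ii), that genes with similar TnSeq profiles fall into functionally coherent clusters, can be illustrated with a toy example. Everything below (the matrix size, the two "modules", the Jaccard and average-linkage choices) is hypothetical and not taken from the MtbTnDB:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)

# Hypothetical essentiality matrix: 30 genes x 12 screens
# (True = gene scored essential in that screen)
module_a = rng.random((10, 12)) < np.array([0.9] * 6 + [0.1] * 6)  # e.g., in vitro
module_b = rng.random((10, 12)) < np.array([0.1] * 6 + [0.9] * 6)  # e.g., in vivo
background = rng.random((10, 12)) < 0.5
profiles = np.vstack([module_a, module_b, background])

# Jaccard distance between binary profiles; genes with similar
# condition-dependent essentiality land in the same cluster
dist = pdist(profiles, metric="jaccard")
clusters = fcluster(linkage(dist, method="average"), t=3, criterion="maxclust")
print(clusters)
```

Enrichment of such clusters for shared functional categories is then a straightforward contingency-table test on the cluster labels.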

    Machine Intelligence Identifies Soluble TNFα as a Therapeutic Target for Spinal Cord Injury

    Traumatic spinal cord injury (SCI) produces a complex syndrome that is expressed across multiple endpoints ranging from molecular and cellular changes to functional behavioral deficits. Effective therapeutic strategies for CNS injury are therefore likely to manifest multi-factorial effects across a broad range of biological and functional outcome measures. Thus, multivariate analytic approaches are needed to capture the linkage between biological and neurobehavioral outcomes. Injury-induced neuroinflammation (NI) presents a particularly challenging therapeutic target, since NI is involved in both degeneration and repair. Here, we used big-data integration and large-scale analytics to examine a large dataset of preclinical efficacy tests combining five blinded, fully counterbalanced treatment trials of acute anti-inflammatory treatments for cervical spinal cord injury in rats. Multi-dimensional discovery using topological data analysis (TDA) and principal components analysis (PCA) revealed that only one treatment showed consistent multidimensional syndromic benefit: intrathecal application of recombinant soluble TNFα receptor 1 (sTNFR1), which showed inverse-U dose-response efficacy. Using the optimal acute dose, we showed that clinically-relevant 90 min delayed treatment profoundly affected multiple biological indices of NI in the first 48 h after injury, including reduction in pro-inflammatory cytokines and gene expression of a coherent complex of acute inflammatory mediators and receptors. Further, a 90 min delayed bolus dose of sTNFR1 reduced the expression of NI markers in the chronic perilesional spinal cord, and consistently improved neurological function over 6 weeks post SCI. These results provide validation of a novel strategy for precision preclinical drug discovery that is likely to improve translation in the difficult landscape of CNS trauma, and confirm the importance of TNFα signaling as a therapeutic target.
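The PCA half of the multivariate approach can be sketched with scikit-learn. The outcome matrix below is synthetic (a shared latent "severity" factor plus noise) and stands in for real behavioral and molecular endpoints; it illustrates the idea of collapsing many correlated outcomes into a syndromic score, not the study's actual analysis:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic outcome matrix: 40 animals x 6 correlated endpoints
severity = rng.normal(size=(40, 1))                  # shared latent factor
weights = rng.normal(size=(1, 6))                    # endpoint loadings
outcomes = severity @ weights + 0.5 * rng.normal(size=(40, 6))

# Standardize endpoints, then project onto principal components;
# PC1 serves as a single multivariate "syndromic" score per animal
Z = StandardScaler().fit_transform(outcomes)
pca = PCA(n_components=2).fit(Z)
scores = pca.transform(Z)
print(pca.explained_variance_ratio_)
```

Because the endpoints share one latent cause, the first component absorbs most of the variance, which is what makes a single syndromic score meaningful.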

    Quantifying the Economic Costs of Global Warming

    Climate change poses a threat to the well-being of people across the globe. Rising global temperatures will increase the frequency and magnitude of extreme climate events, threatening the lives and livelihoods of vulnerable people. Yet the magnitude and persistence of these economic impacts are poorly understood, making it difficult both to design equitable mitigation and adaptation strategies and to hold emitters accountable for the impacts of their emissions. In this thesis, I combine methods from detection and attribution, climate projection, and causal inference to understand the global economic consequences of past and future climate change. I show that two extreme climate events that have not been previously integrated into climate-economy analyses (heat waves and El Niño events) reduce economic growth globally. But these impacts are highly unequal across the globe: Heat waves have their greatest effects in warm regions, and El Niño events primarily harm highly teleconnected countries. As a result, these effects fall most severely on the people that have contributed least to warming, a sign of the inequities embedded in the causes and consequences of global warming. To quantitatively understand these inequities and support efforts to hold major emitters accountable for the impacts of their emissions, I develop an end-to-end attribution framework that links individual emitters to the economic effects of the warming induced by their emissions. I show that warming from the emissions of high-income countries in the global North has driven billions of dollars of economic losses in low-income, low-emitting countries. I then combine this framework with my previous results on extreme heat, showing that the emissions of major fossil fuel firms have intensified heat waves, and the resulting economic penalties, across the global tropics. These first-of-their-kind results lend scientific support to emerging discussions over climate liability and loss and damage payments. More broadly, these findings together highlight the already-emerging economic threat of global warming, raising the importance of climate mitigation and adaptation in order to avoid accelerating losses to the most vulnerable people around the globe.

    Assessment of Stability in Partitional Clustering Using Resampling Techniques

    The assessment of stability in cluster analysis is closely related to the difficult problem of determining the number of clusters present in the data. The latter is the subject of many investigations and papers that consider different resampling techniques as practical tools. In this paper, we consider non-parametric resampling from the empirical distribution of a given dataset in order to investigate the stability of results of partitional clustering. Specifically, we investigate only the popular K-means method. The estimation of the sampling distribution of the adjusted Rand index (ARI) and the averaged Jaccard index seems to be the most general way to do this. In addition, we compare bootstrapping with different subsampling schemes (i.e., with different cardinality of the drawn samples) with respect to their performance in finding the true number of clusters for both synthetic and real data.
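A minimal sketch of this bootstrap-stability idea, using K-means and the adjusted Rand index from scikit-learn (the data, the candidate k values, and the number of resamples are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
X, _ = make_blobs(n_samples=200, centers=3, cluster_std=1.0, random_state=0)

def stability(X, k, n_boot=20):
    """Mean ARI between a reference k-means partition and partitions of
    non-parametric bootstrap resamples, compared on the resampled points."""
    base = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    aris = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), size=len(X))   # resample with replacement
        boot = KMeans(n_clusters=k, n_init=10).fit(X[idx])
        aris.append(adjusted_rand_score(base.labels_[idx], boot.labels_))
    return float(np.mean(aris))

scores = {k: stability(X, k) for k in (2, 3, 4, 5)}
print(scores)  # a stable (often the true) k yields ARI near 1
```

Subsampling variants simply replace the with-replacement draw by a smaller without-replacement draw, which is the comparison the paper pursues.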

    Machine Learning-Based Rockfalls Detection with 3D Point Clouds, Example in the Montserrat Massif (Spain)

    Rock slope monitoring using 3D point cloud data allows the creation of rockfall inventories, provided that an efficient methodology is available to quantify the activity. However, monitoring with high temporal and spatial resolution entails the processing of a great volume of data, which can become a problem for the processing system. The standard methodology for monitoring includes the steps of data capture, point cloud alignment, the measure of differences, clustering differences, and identification of rockfalls. In this article, we propose a new methodology adapted from existing algorithms (multiscale model-to-model cloud comparison and the density-based spatial clustering of applications with noise algorithm) and machine learning techniques to facilitate the identification of rockfalls from compared temporal 3D point clouds, possibly the step requiring the most user interpretation. Point clouds are processed to generate 33 new features related to the rock cliff differences, predominant differences, or orientation for classification with 11 machine learning models, combined with 2 undersampling and 13 oversampling methods. The proposed methodology is divided into two software packages: point cloud monitoring and cluster classification. The prediction model, applied to two case studies in the Montserrat conglomeratic massif (Barcelona, Spain), reveals that a reduction of 98% in the initial number of clusters is sufficient to identify the totality of rockfalls in the first case study. The second case study requires a 96% reduction to identify 90% of the rockfalls, suggesting that the homogeneity of the rockfall characteristics is a key factor for the correct prediction of the machine learning models.
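The clustering step (the DBSCAN part of the pipeline) can be illustrated on synthetic 3D "difference" points; the coordinates, eps, and min_samples below are made up and would in practice be tuned to the survey's resolution:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)

# Synthetic change points: 3D locations where two survey epochs differ
# beyond a change-detection threshold (as produced by, e.g., M3C2)
fall1 = rng.normal(loc=(0.0, 0.0, 0.0), scale=0.2, size=(50, 3))  # one rockfall
fall2 = rng.normal(loc=(5.0, 5.0, 2.0), scale=0.2, size=(40, 3))  # another rockfall
noise = rng.uniform(-10, 10, size=(30, 3))                        # scattered outliers
pts = np.vstack([fall1, fall2, noise])

# DBSCAN groups dense difference regions into candidate rockfall clusters
labels = DBSCAN(eps=1.0, min_samples=10).fit_predict(pts)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(n_clusters, "candidate clusters;", int((labels == -1).sum()), "noise points")
```

The downstream classifier in the paper then decides which such candidate clusters are genuine rockfalls, which is where the 33 engineered features come in.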
