66 research outputs found

    An AUC-based Permutation Variable Importance Measure for Random Forests

    The random forest (RF) method is a commonly used tool for classification with high-dimensional data as well as for ranking candidate predictors based on the so-called random forest variable importance measures (VIMs). However, the classification performance of RF is known to be suboptimal in the case of strongly unbalanced data, i.e. data where response class sizes differ considerably. Suggestions have been made to obtain better classification performance based either on sampling procedures or on cost sensitivity analyses. However, to our knowledge, the performance of the VIMs has not yet been examined in the case of unbalanced response classes. In this paper we explore the performance of the permutation VIM for unbalanced data settings and introduce an alternative permutation VIM based on the area under the curve (AUC) that is expected to be more robust towards class imbalance. We investigated the performance of the standard permutation VIM and of our novel AUC-based permutation VIM for different class imbalance levels using simulated data and real data. The results suggest that, with increasing class imbalance, the standard permutation VIM loses its ability to discriminate between predictors that are associated with the response and predictors that are not. It is outperformed by our new AUC-based permutation VIM for unbalanced data settings, while the performance of both VIMs is very similar in the case of balanced classes. The new AUC-based VIM is implemented in the R package party for the unbiased RF variant based on conditional inference trees. The code implementing our study is available from the companion website: http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/070_drittmittel/janitza/index.html
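
    The idea of scoring a permutation by the drop in AUC rather than the drop in accuracy can be sketched roughly as follows. This is a simplified, model-agnostic illustration, not the implementation in party: it uses a single fixed scoring function on one data set instead of averaging over the out-of-bag observations of each tree, and the names `auc`, `auc_permutation_importance`, `make_unbalanced_data` and `toy_score` are hypothetical, as is the simulated data.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one observation from each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_permutation_importance(score_fn, X, y, n_repeats=20, seed=0):
    """Mean drop in AUC after randomly permuting each predictor column in turn."""
    rng = random.Random(seed)
    baseline = auc(y, [score_fn(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # breaks the link between predictor j and the response
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - auc(y, [score_fn(row) for row in permuted]))
        importances.append(sum(drops) / n_repeats)
    return importances


def make_unbalanced_data(n=400, minority_fraction=0.1, seed=1):
    """Hypothetical simulated data: predictor 0 is associated with the class, predictor 1 is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < minority_fraction else 0
        X.append([rng.gauss(1.5 * label, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


def toy_score(row):
    """Hypothetical stand-in for a fitted model: scores an observation by predictor 0 only."""
    return row[0]


X, y = make_unbalanced_data()
importances = auc_permutation_importance(toy_score, X, y)
```

    Because AUC depends only on how positives are ranked relative to negatives, it does not reward a model for simply predicting the majority class, which is the intuition behind expecting it to be more robust to class imbalance than an error-rate-based measure.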

    Conditional variable importance for random forests

    Random forests are becoming increasingly popular in many scientific fields because they can cope with "small n large p" problems, complex interactions and even highly correlated predictor variables. Their variable importance measures have recently been suggested as screening tools for, e.g., gene expression studies. However, these variable importance measures show a bias towards correlated predictor variables. We identify two mechanisms responsible for this finding: (i) a preference for the selection of correlated predictors in the tree building process and (ii) an additional advantage for correlated predictor variables induced by the unconditional permutation scheme that is employed in the computation of the variable importance measure. Based on these considerations we develop a new, conditional permutation scheme for the computation of the variable importance measure. The resulting conditional variable importance is shown to reflect the true impact of each predictor variable more reliably than the original marginal approach.
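
    The difference between an unconditional and a conditional permutation can be illustrated with a small sketch. It assumes, as a simplification, that conditioning is done by shuffling a predictor only within bins of one correlated covariate. The fixed cutpoints, the helper names `permute_within_groups` and `bin_labels`, and the simulated data are all hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def permute_within_groups(values, groups, rng):
    """Shuffle `values` only among observations that share the same group label."""
    indices_by_group = defaultdict(list)
    for i, g in enumerate(groups):
        indices_by_group[g].append(i)
    permuted = list(values)
    for indices in indices_by_group.values():
        shuffled = [values[i] for i in indices]
        rng.shuffle(shuffled)
        for i, v in zip(indices, shuffled):
            permuted[i] = v
    return permuted


def bin_labels(z, cutpoints):
    """Label each value of z by the interval of the grid it falls into."""
    return [sum(1 for c in cutpoints if value > c) for value in z]


rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(200)]   # covariate correlated with x
x = [zi + rng.gauss(0.0, 0.3) for zi in z]      # predictor whose importance is assessed

# Conditional scheme: x is permuted within strata of z, so the x-z relation is roughly kept.
groups = bin_labels(z, cutpoints=[-1.0, 0.0, 1.0])
x_conditional = permute_within_groups(x, groups, rng)

# Unconditional (marginal) scheme: x is permuted across all observations.
x_marginal = list(x)
rng.shuffle(x_marginal)
```

    Under the marginal scheme the permuted predictor becomes independent of both the response and the correlated covariate, whereas the within-stratum shuffle mainly breaks its association with the response given the covariate.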

    Bioassays to Monitor Taspase1 Function for the Identification of Pharmacogenetic Inhibitors

    Background: Threonine Aspartase 1 (Taspase1) mediates cleavage of the mixed lineage leukemia (MLL) protein and leukemia-provoking MLL fusions. In contrast to other proteases, the understanding of Taspase1's (patho)biological relevance and function is limited, since neither small-molecule inhibitors nor cell-based functional assays for Taspase1 are currently available. Methodology/Findings: Efficient cell-based assays to probe Taspase1 function in vivo are presented here. These are composed of glutathione S-transferase, autofluorescent protein variants, Taspase1 cleavage sites and rational combinations of nuclear import and export signals. The biosensors localize predominantly to the cytoplasm, whereas expression of biologically active Taspase1, but not of inactive Taspase1 mutants or of the protease Caspase3, triggers their proteolytic cleavage and nuclear accumulation. Compared to in vitro assays using recombinant components, the in vivo assay was highly efficient. Employing an optimized nuclear translocation algorithm, the triple-color assay could be adapted to a high-throughput microscopy platform (Z' factor = 0.63). Automated high-content data analysis was used to screen a focused compound library, selected by an in silico pharmacophore screening approach, as well as a collection of fungal extracts. Screening identified two compounds, N-[2-[(4-amino-6-oxo-3H-pyrimidin-2-yl)sulfanyl]ethyl]benzenesulfonamide and 2-benzyltriazole-4,5-dicarboxylic acid, which partially inhibited Taspase1 cleavage in living cells. Additionally, the assay was exploited to probe endogenous Taspase1 in solid tumor cell models and to identify an improved consensus sequence for efficient Taspase1 cleavage. This allowed the in silico identification of novel putative Taspase1 targets. Those include the FERM Domain-Containing Protein 4B, the Tyrosine-Protein Phosphatase Zeta, and DNA Polymerase Zeta. Cleavage site recognition and proteolytic processing of these substrates were verified in the context of the biosensor. Conclusions: The assay not only allows genetic probing of Taspase1 structure and function in vivo, but is also applicable to high-content screening to identify Taspase1 inhibitors. Such tools will provide novel insights into Taspase1's function and its potential therapeutic relevance.
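
    For readers unfamiliar with the Z' factor quoted above, the commonly used definition is Z' = 1 - 3(σ_pos + σ_neg) / |μ_pos - μ_neg|, computed from positive and negative control wells. The sketch below assumes that definition and the use of sample standard deviations. The function name `z_prime_factor` and the control readouts are hypothetical and are not data from the study.

```python
import statistics


def z_prime_factor(positive_controls, negative_controls):
    """Z' = 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|, using sample standard deviations."""
    mean_pos = statistics.mean(positive_controls)
    mean_neg = statistics.mean(negative_controls)
    if mean_pos == mean_neg:
        raise ValueError("control means must differ")
    sd_pos = statistics.stdev(positive_controls)
    sd_neg = statistics.stdev(negative_controls)
    return 1.0 - 3.0 * (sd_pos + sd_neg) / abs(mean_pos - mean_neg)


# Hypothetical nuclear-translocation readouts (arbitrary units) for control wells.
active_taspase1_wells = [0.82, 0.85, 0.80, 0.84, 0.83]
inactive_mutant_wells = [0.21, 0.24, 0.20, 0.23, 0.22]
z_prime = z_prime_factor(active_taspase1_wells, inactive_mutant_wells)
```

    Values between 0.5 and 1 are conventionally read as indicating a screening assay with good separation between controls, which is the range the reported value of 0.63 falls into.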

    Mutations in KEOPS-Complex Genes Cause Nephrotic Syndrome with Primary Microcephaly

    Galloway-Mowat syndrome (GAMOS) is an autosomal-recessive disease characterized by the combination of early-onset nephrotic syndrome (SRNS) and microcephaly with brain anomalies. Here we identified recessive mutations in OSGEP, TP53RK, TPRKB, and LAGE3, genes encoding the four subunits of the KEOPS complex, in 37 individuals from 32 families with GAMOS. CRISPR-Cas9 knockout in zebrafish and mice recapitulated the human phenotype of primary microcephaly and resulted in early lethality. Knockdown of OSGEP, TP53RK, or TPRKB inhibited cell proliferation, which human mutations did not rescue. Furthermore, knockdown of these genes impaired protein translation, caused endoplasmic reticulum stress, activated DNA-damage-response signaling, and ultimately induced apoptosis. Knockdown of OSGEP or TP53RK induced defects in the actin cytoskeleton and decreased the migration rate of human podocytes, an established intermediate phenotype of SRNS. We thus identified four new monogenic causes of GAMOS, describe a link between KEOPS function and human disease, and delineate potential pathogenic mechanisms.

    National identity predicts public health support during a global pandemic

    Changing collective behaviour and supporting non-pharmaceutical interventions is an important component in mitigating virus transmission during a pandemic. In a large international collaboration (Study 1, N = 49,968 across 67 countries), we investigated self-reported factors associated with public health behaviours (e.g., spatial distancing and stricter hygiene) and endorsed public policy interventions (e.g., closing bars and restaurants) during the early stage of the COVID-19 pandemic (April-May 2020). Respondents who reported identifying more strongly with their nation consistently reported greater engagement in public health behaviours and support for public health policies. Results were similar for representative and non-representative national samples. Study 2 (N = 42 countries) conceptually replicated the central finding using aggregate indices of national identity (obtained using the World Values Survey) and a measure of actual behaviour change during the pandemic (obtained from Google mobility reports). Higher levels of national identification prior to the pandemic predicted lower mobility during the early stage of the pandemic (r = −0.40). We discuss the potential implications of links between national identity, leadership, and public health for managing COVID-19 and future pandemics.

    National identity predicts public health support during a global pandemic (Nature Communications (2022), 13(1), 517, 10.1038/s41467-021-27668-9)

    In this article the author name ‘Agustin Ibanez’ was incorrectly written as ‘Augustin Ibanez’. The original article has been corrected.

    Predicting attitudinal and behavioral responses to COVID-19 pandemic using machine learning

    At the beginning of 2020, COVID-19 became a global problem. Despite all the efforts to emphasize the relevance of preventive measures, not everyone adhered to them. Thus, learning more about the characteristics determining attitudinal and behavioral responses to the pandemic is crucial to improving future interventions. In this study, we applied machine learning to the multinational data collected by the International Collaboration on the Social and Moral Psychology of COVID-19 (N = 51,404) to test the predictive efficacy of constructs from social, moral, cognitive, and personality psychology, as well as socio-demographic factors, in the attitudinal and behavioral responses to the pandemic. The results point to several valuable insights. Internalized moral identity provided the most consistent predictive contribution: individuals perceiving moral traits as central to their self-concept reported higher adherence to preventive measures. Similar results were found for morality as cooperation, symbolized moral identity, self-control, open-mindedness, and collective narcissism, while the inverse relationship was evident for the endorsement of conspiracy theories. However, we also found non-negligible variability in the explained variance and predictive contributions with respect to macro-level factors such as the pandemic stage or cultural region. Overall, the results underscore the importance of morality-related and contextual factors in understanding adherence to public health recommendations during the pandemic.