3 research outputs found

    Effect of hyperparameters on variable selection in random forests

    Random forests (RFs) are well suited for prediction modeling and variable selection in high-dimensional omics studies. The effects of the RF hyperparameters on prediction performance and variable importance estimation have previously been investigated. However, how hyperparameters impact RF-based variable selection remains unclear. We evaluate their effects on the Vita and the Boruta variable selection procedures in two simulation studies utilizing theoretical distributions and empirical gene expression data. We assess the ability of the procedures to select important variables (sensitivity) while controlling the false discovery rate (FDR). Our results show that the proportion of splitting candidate variables (mtry.prop) and the sample fraction (sample.fraction) for the training dataset influence the selection procedures more than the drawing strategy of the training datasets and the minimal terminal node size. A suitable setting of the RF hyperparameters depends on the correlation structure in the data. For weakly correlated predictor variables, the default value of mtry is optimal, but smaller values of sample.fraction result in larger sensitivity. In contrast, the difference in sensitivity of the optimal compared to the default value of sample.fraction is negligible for strongly correlated predictor variables, whereas smaller values than the default are better in the other settings. In conclusion, the default values of the hyperparameters will not always be suitable for identifying important variables. Thus, adequate values differ depending on whether the aim of the study is optimizing prediction performance or variable selection. Comment: 18 pages, 2 figures + 2 figures in appendix, 3 tables
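
The two evaluation criteria named in the abstract above, sensitivity and FDR, can be illustrated for a single simulation replicate with a minimal sketch. The function name, the variable labels, and the convention that an empty selection counts as a false discovery proportion of 0 are assumptions made for illustration and are not taken from the paper. Strictly speaking, the FDR is the expectation of this proportion over many replicates.

```python
def selection_metrics(selected, important):
    """Return (sensitivity, false discovery proportion) for one replicate.

    selected  -- variables returned by a selection procedure (hypothetical input)
    important -- variables that are truly important in the simulation
    """
    selected = set(selected)
    important = set(important)
    if not important:
        raise ValueError("at least one truly important variable is required")

    true_positives = len(selected & important)
    false_positives = len(selected - important)

    # Sensitivity: share of truly important variables that were selected.
    sensitivity = true_positives / len(important)
    # False discovery proportion: share of selected variables that are not
    # important. Assumed to be 0 when nothing was selected.
    fdp = false_positives / len(selected) if selected else 0.0
    return sensitivity, fdp
```

For example, `selection_metrics(["g1", "g2", "g7"], ["g1", "g2", "g3", "g4"])` gives a sensitivity of 0.5 (two of four important variables found) and a false discovery proportion of 1/3 (one of three selected variables is a false positive). Averaging the second value over replicates would give an empirical FDR estimate.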

    Assessment of causality of natriuretic peptides and atrial fibrillation and heart failure : a Mendelian randomization study in the FINRISK cohort

    Aims: Natriuretic peptides are extensively studied biomarkers for atrial fibrillation (AF) and heart failure (HF). Their role in the pathogenesis of both diseases is not entirely understood, and previous studies have identified several single-nucleotide polymorphisms (SNPs) at the NPPA-NPPB locus associated with natriuretic peptides. We investigated the causal relationship between natriuretic peptides and AF as well as HF using a Mendelian randomization approach. Methods and results: N-terminal pro B-type natriuretic peptide (NT-proBNP) (N = 6669), B-type natriuretic peptide (BNP) (N = 6674), and mid-regional pro atrial natriuretic peptide (MR-proANP) (N = 6813) were measured in the FINRISK 1997 cohort. Thirty common SNPs related to NT-proBNP, BNP, and MR-proANP were selected from previous studies. We performed six Mendelian randomization analyses, one for each combination of the three natriuretic peptide biomarkers and the two outcomes, AF and HF. Polygenic risk scores (PRSs) based on multiple SNPs were used as the genetic instrumental variables. The PRSs were significantly associated with the three natriuretic peptides but not with incident AF or HF. Most cardiovascular risk factors showed significant confounding percentages, but no association with the PRSs. Apart from small causal betas, a causal relation is unlikely. Conclusion: In our Mendelian randomization approach, we confirmed an association between common genetic variation at the NPPA-NPPB locus and natriuretic peptides. A strong causal relationship between natriuretic peptides and incidence of AF as well as HF at the community level was ruled out. Therapeutic approaches targeting natriuretic peptides will therefore very likely work through indirect mechanisms. Peer reviewed
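
The use of a polygenic risk score as an instrumental variable, as described in the abstract above, can be sketched with the textbook ratio (Wald) estimator. This is not necessarily the estimator the authors used: the abstract does not name one, and the study outcomes are incident events, whereas this sketch assumes a continuous outcome and simple linear slopes. All function names and the toy interface are hypothetical.

```python
def polygenic_risk_score(allele_counts, weights):
    """Weighted sum of effect-allele counts (0, 1 or 2 per SNP) for one person."""
    if len(allele_counts) != len(weights):
        raise ValueError("one weight per SNP is required")
    return sum(count * weight for count, weight in zip(allele_counts, weights))


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def wald_ratio(instrument, exposure, outcome):
    """Ratio estimate of the causal effect of exposure on outcome.

    Divides the instrument-outcome slope by the instrument-exposure slope.
    Assumes the instrument (here a PRS) affects the outcome only through
    the exposure, which is the core Mendelian randomization assumption.
    """
    beta_zx = ols_slope(instrument, exposure)
    beta_zy = ols_slope(instrument, outcome)
    if beta_zx == 0:
        raise ValueError("instrument is not associated with the exposure")
    return beta_zy / beta_zx
```

In this framing, the abstract's findings correspond to a clearly non-zero instrument-exposure slope (PRS versus natriuretic peptide level) combined with an instrument-outcome slope that is indistinguishable from zero, which yields a ratio estimate near zero.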