Accelerated and interpretable oblique random survival forests
The oblique random survival forest (RSF) is an ensemble supervised learning
method for right-censored outcomes. Trees in the oblique RSF are grown using
linear combinations of predictors to create branches, whereas in the standard
RSF, a single predictor is used. Oblique RSF ensembles often have higher
prediction accuracy than standard RSF ensembles. However, assessing all
possible linear combinations of predictors induces significant computational
overhead that limits applications to large-scale data sets. In addition, few
methods have been developed for interpretation of oblique RSF ensembles, and
they remain more difficult to interpret compared to their axis-based
counterparts. We introduce a method to increase computational efficiency of the
oblique RSF and a method to estimate importance of individual predictor
variables with the oblique RSF. Our strategy to reduce computational overhead
makes use of Newton-Raphson scoring, a classical optimization technique that we
apply to the Cox partial likelihood function within each non-leaf node of
decision trees. We estimate the importance of individual predictors for the
oblique RSF by negating each coefficient used for the given predictor in linear
combinations, and then computing the reduction in out-of-bag accuracy. In
general benchmarking experiments, we find that our implementation of the
oblique RSF is approximately 450 times faster with equivalent discrimination
and superior Brier score compared to existing software for oblique RSFs. We
find in simulation studies that 'negation importance' discriminates between
relevant and irrelevant predictors more reliably than permutation importance,
Shapley additive explanations, and a previously introduced technique to measure
variable importance with oblique RSFs based on analysis of variance. Methods
introduced in the current study are available in the aorsf R package.
Comment: 40 pages, 6 figures
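The negation step described in the abstract above can be illustrated with a minimal sketch. This is not the aorsf implementation. As simplifying assumptions, it uses a single linear combination of predictors rather than a forest of oblique trees, Harrell's concordance index as the accuracy measure, and the supplied data in place of out-of-bag samples. The function names `concordance`, `linear_risk`, and `negation_importance` are hypothetical.

```python
from itertools import combinations


def concordance(times, events, risk):
    """Harrell's C: share of comparable pairs where the higher risk fails first."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject a has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # tied times, or the earlier subject is censored
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1.0
        elif risk[a] == risk[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def linear_risk(X, coefs):
    """Risk score as a linear combination of predictors (one oblique split)."""
    return [sum(c * x for c, x in zip(coefs, row)) for row in X]


def negation_importance(X, times, events, coefs):
    """Drop in concordance after negating each predictor's coefficient in turn."""
    baseline = concordance(times, events, linear_risk(X, coefs))
    importance = []
    for j in range(len(coefs)):
        flipped = list(coefs)
        flipped[j] = -flipped[j]  # negate only predictor j's coefficient
        score = concordance(times, events, linear_risk(X, flipped))
        importance.append(baseline - score)
    return importance


# Toy data (made up for illustration): predictor 0 drives risk, predictor 1 has a zero coefficient.
X = [[1.0, 0.3], [2.0, -0.1], [3.0, 0.2], [4.0, -0.4]]
times = [8.0, 6.0, 4.0, 2.0]
events = [1, 1, 1, 1]
coefs = [1.0, 0.0]
scores = negation_importance(X, times, events, coefs)
```

In this toy setup, negating the coefficient of the relevant predictor reverses the risk ranking, while negating a zero coefficient leaves predictions unchanged, so only the first predictor receives nonzero importance.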
High resolution spatiotemporal patterns of flow at the landscape scale in montane non-perennial streams
Intermittent and ephemeral streams in dryland environments support diverse assemblages of aquatic and terrestrial life. Understanding when and where water flows provides insights into the availability of water, its response to external controlling factors, and potential sensitivity to climate change and a host of human activities. Knowledge regarding the timing of drying/wetting cycles can also be useful to map critical habitats for species and ecosystems that rely on these temporary water sources. However, identifying the locations and monitoring the timing of streamflow and channel sediment moisture remains a challenging endeavor. In this paper, we analyzed daily conductivity from 37 sensors distributed along 10 streams across an arid mountain front in Arizona (United States) to assess spatiotemporal patterns in flow permanence, defined as the timing and extent of water in streams. Conductivity sensors provide information on surface flow and sediment moisture, supporting a stream classification based on seasonal flow dynamics. Our results provide insight into flow responses to seasonal rainfall, highlighting stream reaches very reactive to rainfall versus those demonstrating more stable streamflow. The strength of stream responses to precipitation is explored in the context of surficial geology. In summary, conductivity data can be used to map potential stream habitat for water-dependent species in both space and time, while also providing the basis upon which sensitivity to ongoing climate change can be evaluated.
The natural wood regime in rivers
A many-analysts approach to the relation between religiosity and well-being
The relation between religiosity and well-being is one of the most researched topics in the psychology of religion, yet the directionality and robustness of the effect remain debated. Here, we adopted a many-analysts approach to assess the robustness of this relation based on a new cross-cultural dataset (N = 10,535 participants from 24 countries). We recruited 120 analysis teams to investigate (1) whether religious people self-report higher well-being, and (2) whether the relation between religiosity and self-reported well-being depends on perceived cultural norms of religion (i.e., whether it is considered normal and desirable to be religious in a given country). In a two-stage procedure, the teams first created an analysis plan and then executed their planned analysis on the data. For the first research question, all but 3 teams reported positive effect sizes with credible/confidence intervals excluding zero (median reported β = 0.120). For the second research question, this was the case for 65% of the teams (median reported β = 0.039). While most teams applied (multilevel) linear regression models, there was considerable variability in the choice of items used to construct the independent variables, the dependent variable, and the included covariates.
A Many-analysts Approach to the Relation Between Religiosity and Well-being
The relation between religiosity and well-being is one of the most researched topics in the psychology of religion, yet the directionality and robustness of the effect remain debated. Here, we adopted a many-analysts approach to assess the robustness of this relation based on a new cross-cultural dataset (N = 10,535 participants from 24 countries). We recruited 120 analysis teams to investigate (1) whether religious people self-report higher well-being, and (2) whether the relation between religiosity and self-reported well-being depends on perceived cultural norms of religion (i.e., whether it is considered normal and desirable to be religious in a given country). In a two-stage procedure, the teams first created an analysis plan and then executed their planned analysis on the data. For the first research question, all but 3 teams reported positive effect sizes with credible/confidence intervals excluding zero (median reported β = 0.120). For the second research question, this was the case for 65% of the teams (median reported β = 0.039). While most teams applied (multilevel) linear regression models, there was considerable variability in the choice of items used to construct the independent variables, the dependent variable, and the included covariates.
Creative destruction in science
Drawing on the concept of a gale of creative destruction in a capitalistic economy, we argue that initiatives to assess the robustness of findings in the organizational literature should aim to simultaneously test competing ideas operating in the same theoretical space. In other words, replication efforts should seek not just to support or question the original findings, but also to replace them with revised, stronger theories with greater explanatory power. Achieving this will typically require adding new measures, conditions, and subject populations to research designs, in order to carry out conceptual tests of multiple theories in addition to directly replicating the original findings. To illustrate the value of the creative destruction approach for theory pruning in organizational scholarship, we describe recent replication initiatives re-examining culture and work morality, working parents' reasoning about day care options, and gender discrimination in hiring decisions.
Significance statement
It is becoming increasingly clear that many, if not most, published research findings across scientific fields are not readily replicable when the same method is repeated. Although extremely valuable, failed replications risk leaving a theoretical void: reducing confidence that the original theoretical prediction is true, but not replacing it with positive evidence in favor of an alternative theory. We introduce the creative destruction approach to replication, which combines theory pruning methods from the field of management with emerging best practices from the open science movement, with the aim of making replications as generative as possible. In effect, we advocate for a Replication 2.0 movement in which the goal shifts from checking on the reliability of past findings to actively engaging in competitive theory testing and theory building.
Scientific transparency statement
The materials, code, and data for this article are posted publicly on the Open Science Framework, with links provided in the article.
Finishing the euchromatic sequence of the human genome
The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ~99% of the euchromatic genome and is accurate to an error rate of ~1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.