The problem of evaluating automated large-scale evidence aggregators
In the biomedical context, policy makers face a large amount of potentially discordant evidence from different sources. This prompts the question of how this evidence should be aggregated in the interests of best-informed policy recommendations. The starting point of our discussion is Hunter and Williams’ recent work on an automated aggregation method for medical evidence. Our negative claim is that it is far from clear what the relevant criteria for evaluating an evidence aggregator of this sort are. What is the appropriate balance between explicitly coded algorithms and implicit reasoning involved, for instance, in the packaging of input evidence? In short: What is the optimal degree of ‘automation’? On the positive side: We propose the ability to perform an adequate robustness analysis as the focal criterion, primarily because it directs efforts to what is most important, namely, the structure of the algorithm and the appropriate extent of automation. Moreover, where there are resource constraints on the aggregation process, one must also consider what balance between volume of evidence and accuracy in the treatment of individual evidence best facilitates inference. There is no prerogative to aggregate the total evidence available if this would in fact reduce overall accuracy.
Hierarchical spatial models for predicting tree species assemblages across large domains
Spatially explicit data layers of tree species assemblages, referred to as
forest types or forest type groups, are a key component in large-scale
assessments of forest sustainability, biodiversity, timber biomass, carbon
sinks and forest health monitoring. This paper explores the utility of coupling
georeferenced national forest inventory (NFI) data with readily available and
spatially complete environmental predictor variables through spatially-varying
multinomial logistic regression models to predict forest type groups across
large forested landscapes. These models exploit underlying spatial associations
within the NFI plot array and the spatially-varying impact of predictor
variables to improve the accuracy of forest type group predictions. The
richness of these models incurs onerous computational burdens and we discuss
dimension reducing spatial processes that retain the richness in modeling. We
illustrate using NFI data from Michigan, USA, where we provide a comprehensive
analysis of this large study area and demonstrate improved prediction with
associated measures of uncertainty.
Comment: Published at http://dx.doi.org/10.1214/09-AOAS250 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
Truly Unordered Probabilistic Rule Sets for Multi-class Classification
Rule set learning has long been studied and has recently been frequently
revisited due to the need for interpretable models. Still, existing methods
have several shortcomings: 1) most recent methods require a binary feature
matrix as input, while learning rules directly from numeric variables is
understudied; 2) existing methods impose orders among rules, either explicitly
or implicitly, which harms interpretability; and 3) currently no method exists
for learning probabilistic rule sets for multi-class target variables (there is
only one for probabilistic rule lists).
We propose TURS, for Truly Unordered Rule Sets, which addresses these
shortcomings. We first formalize the problem of learning truly unordered rule
sets. To resolve conflicts caused by overlapping rules, i.e., instances covered
by multiple rules, we propose a novel approach that exploits the probabilistic
properties of our rule sets. We next develop a two-phase heuristic algorithm
that learns rule sets by carefully growing rules. An important innovation is
that we use a surrogate score to take the global potential of the rule set into
account when learning a local rule.
Finally, we empirically demonstrate that, compared to non-probabilistic and
(explicitly or implicitly) ordered state-of-the-art methods, our method learns
rule sets that not only have better interpretability but also better predictive
performance.
Comment: Camera ready version for ECMLPKDD 2022, with Supplementary Material