Recursive Partitioning for Heterogeneous Causal Effects
In this paper we study the problems of estimating heterogeneity in causal
effects in experimental or observational studies and conducting inference about
the magnitude of the differences in treatment effects across subsets of the
population. In applications, our method provides a data-driven approach to
determine which subpopulations have large or small treatment effects and to
test hypotheses about the differences in these effects. For experiments, our
method allows researchers to identify heterogeneity in treatment effects that
was not specified in a pre-analysis plan, without concern about invalidating
inference due to multiple testing. In most of the literature on supervised
machine learning (e.g., regression trees, random forests, LASSO), the goal
is to build a model of the relationship between a unit's attributes and an
observed outcome. A prominent role in these methods is played by
cross-validation, which compares predictions to actual outcomes in test samples,
in order to select the level of complexity of the model that provides the best
predictive power. Our method is closely related, but it differs in that it is
tailored for predicting causal effects of a treatment rather than a unit's
outcome. The challenge is that the "ground truth" for a causal effect is not
observed for any individual unit: we observe the unit with the treatment, or
without the treatment, but not both at the same time. Thus, it is not obvious
how to use cross-validation to determine whether a causal effect has been
accurately predicted. We propose several novel cross-validation criteria for
this problem and demonstrate through simulations the conditions under which
they perform better than standard methods for the problem of causal effects. We
then apply the method to a large-scale field experiment re-ranking results on a
search engine.
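The observability problem described in this abstract can be illustrated with a small simulation. The sketch below is a generic illustration, not the cross-validation criteria the paper proposes: it assumes a randomized binary treatment, a single binary attribute `x`, and a hypothetical true effect of 2.0 when `x == 1` and 0.0 otherwise. Each unit has two potential outcomes, but only one is recorded, so effects can only be estimated at the subgroup level by comparing treated and control means.

```python
import random
from statistics import mean


def simulate(n, seed=0):
    """Simulate a randomized experiment; all parameters here are illustrative assumptions."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)          # unit attribute
        w = rng.randint(0, 1)          # randomized treatment indicator
        y0 = rng.gauss(0.0, 1.0)       # potential outcome without treatment
        y1 = y0 + (2.0 if x == 1 else 0.0)  # potential outcome with treatment
        y = y1 if w == 1 else y0       # only one potential outcome is observed
        rows.append((x, w, y))
    return rows


def subgroup_effect(rows, x_value):
    """Difference in mean observed outcomes between treated and control units in a subgroup."""
    treated = [y for x, w, y in rows if x == x_value and w == 1]
    control = [y for x, w, y in rows if x == x_value and w == 0]
    return mean(treated) - mean(control)


rows = simulate(20000)
effect_x0 = subgroup_effect(rows, 0)
effect_x1 = subgroup_effect(rows, 1)
print(f"estimated effect for x=0: {effect_x0:.2f}")
print(f"estimated effect for x=1: {effect_x1:.2f}")
```

Because the per-unit difference `y1 - y0` never appears in `rows`, a predicted individual effect cannot be scored against an observed one, which is the difficulty for cross-validation that the abstract points to.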
Test Set Diameter: Quantifying the Diversity of Sets of Test Cases
A common and natural intuition among software testers is that test cases need
to differ if a software system is to be tested properly and its quality
ensured. Consequently, much research has gone into formulating distance
measures for how test cases, their inputs and/or their outputs differ. However,
common to these proposals is that they are data-type specific and/or calculate
the diversity only between pairs of test inputs, traces or outputs.
We propose a new metric to measure the diversity of sets of tests: the test
set diameter (TSDm). It extends our earlier, pairwise test diversity metrics
based on recent advances in information theory regarding the calculation of the
normalized compression distance (NCD) for multisets. An advantage is that TSDm
can be applied regardless of data type and on any test-related information, not
only the test inputs. A downside is the increased computational time compared
to competing approaches.
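As a rough illustration of the underlying idea, the sketch below computes the pairwise normalized compression distance and one commonly cited multiset form, NCD_1(X) = (C(X) - min C(x)) / max C(X \ {x}), where C is a compressed length. It is a minimal sketch under stated assumptions, not the authors' TSDm implementation: `zlib` as the compressor, plain concatenation as the way a set is serialized, and the helper names are all choices made here for illustration.

```python
import hashlib
import zlib


def c(data: bytes) -> int:
    """Compressed length in bytes, a practical stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Pairwise normalized compression distance."""
    cx, cy = c(x), c(y)
    return (c(x + y) - min(cx, cy)) / max(cx, cy)


def ncd1_multiset(items: list[bytes]) -> float:
    """NCD_1 over a multiset; concatenation order is an assumption of this sketch."""
    if len(items) < 2:
        raise ValueError("need at least two items")
    whole = c(b"".join(items))
    smallest = min(c(x) for x in items)
    # Largest compressed size of the set with one element left out.
    largest_rest = max(
        c(b"".join(items[:i] + items[i + 1:])) for i in range(len(items))
    )
    return (whole - smallest) / largest_rest


def pseudo_random(tag: bytes, blocks: int = 32) -> bytes:
    """Deterministic, poorly compressible bytes used only as example test inputs."""
    return b"".join(hashlib.sha256(tag + bytes([i])).digest() for i in range(blocks))


a, b, d = pseudo_random(b"a"), pseudo_random(b"b"), pseudo_random(b"d")
print("duplicates:", ncd1_multiset([a, a, a]))
print("diverse:   ", ncd1_multiset([a, b, d]))
```

A set of identical inputs compresses almost as well as a single input, so its score stays near zero, while a set of unrelated inputs scores close to one. Note that the leave-one-out step recompresses the set once per element, which is consistent with the computational cost mentioned in the abstract.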
Our experiments on four different systems show that the test set diameter can
help select test sets with higher structural and fault coverage than random
selection even when only applied to test inputs. This can enable early test
design and selection, prior to even having a software system to test, and
complement other types of test automation and analysis. We argue that this
quantification of test set diversity creates a number of opportunities to
better understand software quality and provides practical ways to increase it.
Comment: In submissio
Metamodel Instance Generation: A systematic literature review
Modelling and thus metamodelling have become increasingly important in
Software Engineering through the use of Model Driven Engineering. In this paper
we present a systematic literature review of instance generation techniques for
metamodels, i.e. the process of automatically generating models from a given
metamodel. We start by presenting a set of research questions that our review
is intended to answer. We then identify the main topics that are related to
metamodel instance generation techniques, and use these to initiate our
literature search. This search resulted in the identification of 34 key papers
in the area, and each of these is reviewed here and discussed in detail. The
outcome is that we are able to identify a knowledge gap in this field, and we
offer suggestions as to some potential directions for future research.
Comment: 25 page