Regression Trees for Longitudinal Data
While studying response trajectories, the population of interest is often
diverse enough to contain distinct subgroups, and the longitudinal change in
response may not be uniform across these subgroups. That is, the time slope
and/or the influence of covariates on the longitudinal profile may vary among
the subgroups. For example, Raudenbush (2001) used depression as an example to
argue that it is incorrect to assume that all the people in a given population
experience either increasing or decreasing levels of depression. In such
cases, a traditional linear mixed-effects model (assuming a common parametric
form for covariates and time) is not directly applicable to the entire
population, as a group-averaged trajectory can mask important subgroup
differences. Our aim is to identify and characterize longitudinally
homogeneous subgroups based on combinations of baseline covariates in the most
parsimonious way. This goal can be achieved by constructing a regression tree
for longitudinal data using baseline covariates as partitioning variables.
We propose the LongCART algorithm to construct a regression tree for
longitudinal data. In each node, LongCART determines the need for further
splitting (i.e., whether any parameter of the longitudinal profile is
influenced by a baseline attribute) via parameter instability tests, so the
decision to split further is type-I error controlled. We derive asymptotic
results for the proposed instability test and examine the finite-sample
behavior of the whole algorithm through simulation studies. Finally, we apply
the LongCART algorithm to study longitudinal changes in choline level among
HIV patients.
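The core idea — test whether a baseline covariate destabilizes the subgroup's time slope before splitting — can be illustrated with a minimal sketch. This is not the paper's actual instability test; as a stand-in, a two-sample z-test on subject-level OLS slopes decides whether a hypothetical binary baseline covariate warrants a split. All data and names below are illustrative.

```python
import numpy as np
from math import erf, sqrt

def subject_slope(t, y):
    """OLS slope of one subject's response on time."""
    t = np.asarray(t, float) - np.mean(t)
    y = np.asarray(y, float)
    return float(t @ (y - y.mean()) / (t @ t))

def split_is_needed(slopes, covariate, alpha=0.05):
    """Decide whether a binary baseline covariate destabilizes the time
    slope. A two-sample z-test on subject-level slopes stands in for
    LongCART's parameter instability test (a deliberate simplification)."""
    a, b = slopes[covariate == 0], slopes[covariate == 1]
    se = sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    z = (a.mean() - b.mean()) / se
    p = 2.0 * (1.0 - 0.5 * (1.0 + erf(abs(z) / sqrt(2.0))))
    return p < alpha, p  # split only if instability is significant

# Simulated cohort: 40 subjects, 5 visits; subgroup 0 improves, 1 declines.
rng = np.random.default_rng(0)
visits = np.arange(5.0)
group = np.repeat([0, 1], 20)  # baseline covariate
slopes = np.array([
    subject_slope(visits, (1.0 if g == 0 else -1.0) * visits
                  + rng.normal(0.0, 0.2, visits.size))
    for g in group
])
needed, p_value = split_is_needed(slopes, group)
print(needed)  # prints: True -- the node should be split on this covariate
```

Because the splitting decision is a significance test at level alpha, the tree stops growing once no baseline covariate shows significant instability, which is what makes the procedure type-I error controlled.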
A cognitive architecture for learning in reactive environments
A framework for empirical discovery
Previous research in machine learning has viewed the process of empirical discovery as search through a space of 'theoretical' terms. In this paper, we propose a problem space for empirical discovery, specifying six complementary operators for defining new terms that ease the statement of empirical laws. The six types of terms include: numeric attributes (such as PV/T); intrinsic properties (such as mass); composite objects (such as pairs of colliding balls); classes of objects (such as acids and alkalis); composite relations (such as chemical reactions); and classes of relations (such as combustion/oxidation). We review existing machine discovery systems in light of this framework, examining which parts of the problem space were covered by these systems. Finally, we outline an integrated discovery system (IDS) we are constructing that includes all six of the operators and should be able to discover a broad range of empirical laws.
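The first operator — defining a numeric attribute such as PV/T — can be sketched concretely: propose a composite term from observed quantities and keep it if it is nearly invariant across trials, since an invariant term states an empirical law. The gas-trial data and constants below are illustrative, not from the paper.

```python
import numpy as np

# Hypothetical gas trials: a fixed amount of gas (1.5 mol), varying
# temperature T (K) and volume V (m^3); pressure P measured with 1% noise.
rng = np.random.default_rng(1)
R, n_mol = 8.314, 1.5
T = rng.uniform(250.0, 350.0, 30)
V = rng.uniform(0.01, 0.05, 30)
P = n_mol * R * T / V * (1.0 + rng.normal(0.0, 0.01, 30))

def coefficient_of_variation(x):
    """Relative spread of a candidate term across trials."""
    return float(np.std(x) / np.mean(x))

# The 'define a numeric attribute' operator: propose PV/T and keep it
# because it is (nearly) constant across trials, unlike raw pressure.
term = P * V / T
cv_raw = coefficient_of_variation(P)
cv_term = coefficient_of_variation(term)
print(cv_term < 0.05 < cv_raw)  # prints: True
```

A discovery system would generate many such candidate terms with its six operators and retain the few that make an empirical regularity expressible this simply.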
Automated data pre-processing via meta-learning
A data mining algorithm may perform differently on datasets with different characteristics; e.g., it might perform better on a dataset with continuous attributes than on one with categorical attributes, or the other way around.
As a matter of fact, a dataset usually needs to be pre-processed. Taking into account all the possible pre-processing operators, there exists a staggeringly large number of alternatives, and non-experienced users become overwhelmed.
We show that this problem can be addressed by an automated approach, leveraging ideas from meta-learning.
Specifically, we consider a wide range of data pre-processing techniques and a set of data mining algorithms. For each data mining algorithm and selected dataset, we are able to predict the transformations that improve the result
of the algorithm on the respective dataset. Our approach will help non-expert users to more effectively identify the transformations appropriate to their applications, and hence to achieve improved results.
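A minimal sketch of the meta-learning idea, under assumptions of my own (the paper's actual meta-features and learner are richer): describe each dataset by a few meta-features, record which pre-processing step helped most on past datasets, and recommend the step that worked on the most similar previously seen dataset.

```python
import numpy as np

def meta_features(n_rows, n_numeric, n_categorical):
    """Simple dataset meta-features: log-size and attribute-type mix."""
    total = n_numeric + n_categorical
    return np.array([np.log10(n_rows), n_numeric / total, n_categorical / total])

# Hypothetical history of past runs: meta-features of a dataset paired
# with the pre-processing step that most improved a given algorithm.
history = [
    (meta_features(1000, 8, 0), "standardize"),
    (meta_features(500, 0, 10), "one_hot_encode"),
    (meta_features(20000, 3, 3), "discretize"),
    (meta_features(300, 1, 9), "one_hot_encode"),
]

def recommend(mf):
    """1-nearest-neighbour over meta-features: reuse the transformation
    that worked on the most similar previously seen dataset."""
    dists = [np.linalg.norm(mf - h_mf) for h_mf, _ in history]
    return history[int(np.argmin(dists))][1]

# A new, mostly categorical dataset should get one-hot encoding.
print(recommend(meta_features(800, 1, 12)))  # prints: one_hot_encode
```

The design choice here is the standard meta-learning move: the search over pre-processing alternatives, overwhelming for a non-expert, is replaced by a prediction learned from prior experiments.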
Recursive Partitioning for Heterogeneous Causal Effects
In this paper we study the problems of estimating heterogeneity in causal
effects in experimental or observational studies and conducting inference about
the magnitude of the differences in treatment effects across subsets of the
population. In applications, our method provides a data-driven approach to
determine which subpopulations have large or small treatment effects and to
test hypotheses about the differences in these effects. For experiments, our
method allows researchers to identify heterogeneity in treatment effects that
was not specified in a pre-analysis plan, without concern about invalidating
inference due to multiple testing. In most of the literature on supervised
machine learning (e.g. regression trees, random forests, LASSO, etc.), the goal
is to build a model of the relationship between a unit's attributes and an
observed outcome. A prominent role in these methods is played by
cross-validation which compares predictions to actual outcomes in test samples,
in order to select the level of complexity of the model that provides the best
predictive power. Our method is closely related, but it differs in that it is
tailored for predicting causal effects of a treatment rather than a unit's
outcome. The challenge is that the "ground truth" for a causal effect is not
observed for any individual unit: we observe the unit with the treatment, or
without the treatment, but not both at the same time. Thus, it is not obvious
how to use cross-validation to determine whether a causal effect has been
accurately predicted. We propose several novel cross-validation criteria for
this problem and demonstrate through simulations the conditions under which
they perform better than standard methods for the problem of estimating causal
effects. We then apply the method to a large-scale field experiment that
re-ranks results on a search engine.
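The central difficulty the abstract describes — no unit reveals its own treatment effect, so there is no direct "ground truth" for cross-validation — can be illustrated with one well-known device (a sketch, not necessarily the paper's exact criterion): the Horvitz-Thompson transformed outcome, whose conditional mean equals the treatment effect, so it can serve as a noisy validation target. The simulation below is illustrative.

```python
import numpy as np

def transformed_outcome(y, w, p):
    """Horvitz-Thompson transform: E[y_star | x] = tau(x), so y_star can
    play the role of the unobserved treatment effect in test samples."""
    return y * w / p - y * (1.0 - w) / (1.0 - p)

def proxy_mse(tau_hat, y, w, p):
    """Score a predicted treatment effect against the transformed outcome."""
    return float(np.mean((transformed_outcome(y, w, p) - tau_hat) ** 2))

# Simulated randomized experiment: constant true effect tau = 2,
# treatment assigned with probability p = 0.5.
rng = np.random.default_rng(2)
n, p, tau = 20000, 0.5, 2.0
w = rng.binomial(1, p, n).astype(float)
y = tau * w + rng.normal(0.0, 1.0, n)

# The criterion ranks the correct effect estimate above a wrong one,
# even though tau is never observed for any single unit.
print(proxy_mse(tau, y, w, p) < proxy_mse(0.0, y, w, p))  # prints: True
```

The transformed outcome is very noisy for any single unit, which is exactly why the choice of cross-validation criterion matters and why the paper compares several.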
Price Indexes For Multi-dwelling Properties In Sweden
The econometric tests in this paper indicate that standard property and municipality attributes are important determinants of sales prices for MDCBs (multi-dwelling and commercial buildings) in Sweden. I also employ spatial econometric techniques and find that spatially specified regressions improve the models' explanatory power. The constant-quality price for a model estimated with OLS is roughly one percentage point higher than for a model controlling for spatial autocorrelation. When the constant-quality price trend is estimated on a yearly basis, there are hardly any differences between the estimated parameters, notwithstanding whether all MDCBs are in the sample or the sample is split into sub-markets. However, estimating models with a quarterly constant-quality price trend shows, to some extent, different price trends for the three sub-markets.
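A constant-quality price trend of the kind described here is typically recovered from a hedonic regression: attribute effects are held fixed and time-dummy coefficients trace the pure price change. The sketch below uses simulated data and OLS only (no spatial correction); all coefficients and variable names are illustrative.

```python
import numpy as np

# Hypothetical hedonic data: log price = intercept + 0.5 * log(size)
# + year effect + noise. The yearly dummy coefficients are the
# constant-quality log-price trend (attribute effects held fixed).
rng = np.random.default_rng(3)
n = 300
year = rng.integers(0, 3, n)              # three sample years
log_size = rng.normal(6.0, 0.4, n)
year_fx = np.array([0.0, 0.08, 0.15])     # true log-price trend
log_price = (1.0 + 0.5 * log_size + year_fx[year]
             + rng.normal(0.0, 0.05, n))

# Design matrix: intercept, log size, dummies for years 1 and 2.
X = np.column_stack([
    np.ones(n),
    log_size,
    (year == 1).astype(float),
    (year == 2).astype(float),
])
beta, *_ = np.linalg.lstsq(X, log_price, rcond=None)
trend = beta[2:]  # constant-quality log-price change relative to year 0
print(np.round(trend, 2))
```

A spatially specified model would add, e.g., a spatial lag of neighbouring prices to this regression; per the abstract, omitting it biases the constant-quality price upward by roughly one percentage point.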