49,361 research outputs found

    Measurement and Measurement Methodology

    Get PDF

    Regression Trees for Longitudinal Data

    Get PDF
    While studying response trajectory, often the population of interest may be diverse enough to exist distinct subgroups within it and the longitudinal change in response may not be uniform in these subgroups. That is, the timeslope and/or influence of covariates in longitudinal profile may vary among these different subgroups. For example, Raudenbush (2001) used depression as an example to argue that it is incorrect to assume that all the people in a given population would be experiencing either increasing or decreasing levels of depression. In such cases, traditional linear mixed effects model (assuming common parametric form for covariates and time) is not directly applicable for the entire population as a group-averaged trajectory can mask important subgroup differences. Our aim is to identify and characterize longitudinally homogeneous subgroups based on the combination of baseline covariates in the most parsimonious way. This goal can be achieved via constructing regression tree for longitudinal data using baseline covariates as partitioning variables. We have proposed LongCART algorithm to construct regression tree for the longitudinal data. In each node, the proposed LongCART algorithm determines the need for further splitting (i.e. whether parameter(s) of longitudinal profile is influenced by any baseline attributes) via parameter instability tests and thus the decision of further splitting is type-I error controlled. We have obtained the asymptotic results for the proposed instability test and examined finite sample behavior of the whole algorithm through simulation studies. Finally, we have applied the LongCART algorithm to study the longitudinal changes in choline level among HIV patients

    Automated data pre-processing via meta-learning

    Get PDF
    The final publication is available at link.springer.comA data mining algorithm may perform differently on datasets with different characteristics, e.g., it might perform better on a dataset with continuous attributes rather than with categorical attributes, or the other way around. As a matter of fact, a dataset usually needs to be pre-processed. Taking into account all the possible pre-processing operators, there exists a staggeringly large number of alternatives and nonexperienced users become overwhelmed. We show that this problem can be addressed by an automated approach, leveraging ideas from metalearning. Specifically, we consider a wide range of data pre-processing techniques and a set of data mining algorithms. For each data mining algorithm and selected dataset, we are able to predict the transformations that improve the result of the algorithm on the respective dataset. Our approach will help non-expert users to more effectively identify the transformations appropriate to their applications, and hence to achieve improved results.Peer ReviewedPostprint (published version

    Recursive Partitioning for Heterogeneous Causal Effects

    Full text link
    In this paper we study the problems of estimating heterogeneity in causal effects in experimental or observational studies and conducting inference about the magnitude of the differences in treatment effects across subsets of the population. In applications, our method provides a data-driven approach to determine which subpopulations have large or small treatment effects and to test hypotheses about the differences in these effects. For experiments, our method allows researchers to identify heterogeneity in treatment effects that was not specified in a pre-analysis plan, without concern about invalidating inference due to multiple testing. In most of the literature on supervised machine learning (e.g. regression trees, random forests, LASSO, etc.), the goal is to build a model of the relationship between a unit's attributes and an observed outcome. A prominent role in these methods is played by cross-validation which compares predictions to actual outcomes in test samples, in order to select the level of complexity of the model that provides the best predictive power. Our method is closely related, but it differs in that it is tailored for predicting causal effects of a treatment rather than a unit's outcome. The challenge is that the "ground truth" for a causal effect is not observed for any individual unit: we observe the unit with the treatment, or without the treatment, but not both at the same time. Thus, it is not obvious how to use cross-validation to determine whether a causal effect has been accurately predicted. We propose several novel cross-validation criteria for this problem and demonstrate through simulations the conditions under which they perform better than standard methods for the problem of causal effects. We then apply the method to a large-scale field experiment re-ranking results on a search engine

    Price Indexes For Multi-dwelling Properties In Sweden

    Get PDF
    The econometric test in this paper indicates that standard property and municipality attributes are important determinants of sales prices for MDCBs (multi-dwelling and commercial buildings) in Sweden. I also employ spatial econometric techniques and find that spatial specified regressions improved the models? explanatory power. The constant quality price for a model estimated with OLS is roughly one percentage point higher than for a model controlling for spatial autocorrelation. When the constant quality price trend is estimated on a yearly basis, there are hardly any differences between the estimated parameters, notwithstand-ing if all MDCBs are in the sample or if the sample is split into sub markets. However, estimating models with a quarterly constant quality price trend to some extent shows different price trends for the three sub markets.
    corecore