Regression tree models for designed experiments
Although regression trees were originally designed for large datasets, they
can profitably be used on small datasets as well, including those from
replicated or unreplicated complete factorial experiments. We show that in the
latter situations, regression tree models can provide simpler and more
intuitive interpretations of interaction effects as differences between
conditional main effects. We present simulation results to verify that the
models can yield lower prediction mean squared errors than the traditional
techniques. The tree models span a wide range of sophistication, from piecewise
constant to piecewise simple and multiple linear, and from least squares to
Poisson and logistic regression.

Comment: Published at http://dx.doi.org/10.1214/074921706000000464 in the IMS
Lecture Notes--Monograph Series
(http://www.imstat.org/publications/lecnotes.htm) by the Institute of
Mathematical Statistics (http://www.imstat.org).
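The factorial-experiment setting the abstract describes can be illustrated with an off-the-shelf CART tree; this is a minimal sketch, not the paper's implementation, and the scikit-learn tree, the 2^2 design, and the synthetic effect sizes are all assumptions:

```python
# Minimal sketch, not the paper's implementation: a piecewise-constant
# regression tree fit to a replicated 2^2 factorial experiment. The
# scikit-learn CART tree and the synthetic effect sizes are assumptions.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Two factors A and B coded -1/+1, four replicates per cell (16 runs).
A, B = np.meshgrid([-1, 1], [-1, 1])
X = np.repeat(np.column_stack([A.ravel(), B.ravel()]), 4, axis=0)

# Response with a main effect of A and a strong A:B interaction.
y = 2 * X[:, 0] + 3 * X[:, 0] * X[:, 1] + rng.normal(0, 0.1, len(X))

tree = DecisionTreeRegressor(max_depth=2).fit(X, y)

# With four leaves the tree recovers the four cell means; reading the two
# second-level splits gives the A:B interaction as two conditional main
# effects of B (one per level of A) with opposite signs.
print(tree.predict([[-1, -1], [-1, 1], [1, -1], [1, 1]]))
```

The point of the sketch is the reading of the fitted tree: the interaction never appears as a cross-product term, only as main effects that differ between branches.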
PLUTO: Penalized Unbiased Logistic Regression Trees
We propose a new algorithm, PLUTO, for fitting logistic regression trees to
binary response data. PLUTO can capture nonlinear and interaction
patterns in messy data by recursively partitioning the sample space. It fits a
simple or a multiple linear logistic regression model in each partition. PLUTO
employs the cyclical coordinate descent method for estimation of multiple
linear logistic regression models with elastic net penalties, which allows it
to deal with high-dimensional data efficiently. The tree structure comprises a
graphical description of the data. Together with the logistic regression
models, it provides an accurate classifier as well as a piecewise smooth
estimate of the probability of "success". PLUTO controls selection bias by: (1)
separating split variable selection from split point selection; (2) applying an
adjusted chi-squared test to find the split variable instead of exhaustive
search. A bootstrap calibration technique is employed to further correct
selection bias. Comparisons on real datasets show that, on average, the multiple
linear PLUTO models predict more accurately than other algorithms.

Comment: 59 pages, 25 figures, 14 tables
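The two-stage recipe in the abstract (tree partition, then penalized logistic models in the leaves) can be sketched roughly as follows; this is not the authors' code: a plain CART split stands in for PLUTO's bias-corrected chi-squared split selection, scikit-learn's saga solver plays the role of glmnet-style cyclical coordinate descent, and the simulated data are an assumption:

```python
# Rough sketch of the PLUTO recipe, not the authors' code: partition the
# sample space with a tree, then fit an elastic-net-penalized logistic
# regression in each partition. The CART split stands in for PLUTO's
# bias-corrected chi-squared split selection; the data are simulated.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 5))
# Binary response whose logistic surface flips with the sign of x0:
# a single linear logit cannot represent it, a piecewise one can.
logit = np.where(X[:, 0] > 0, 2 + 3 * X[:, 1], -2 - 3 * X[:, 1])
y = (rng.random(400) < 1 / (1 + np.exp(-logit))).astype(int)

# Step 1: a coarse partition (two leaves).
part = DecisionTreeClassifier(max_depth=1).fit(X, y)
leaf = part.apply(X)

# Step 2: one elastic-net logistic model per leaf; saga is scikit-learn's
# stand-in for glmnet-style cyclical coordinate descent.
models = {}
for node in np.unique(leaf):
    m = LogisticRegression(penalty="elasticnet", solver="saga",
                           l1_ratio=0.5, max_iter=5000)
    models[node] = m.fit(X[leaf == node], y[leaf == node])

# Compare training accuracy against a single global logistic model.
pred_pw = np.empty(len(X), dtype=int)
for node, m in models.items():
    pred_pw[leaf == node] = m.predict(X[leaf == node])
acc_pw = (pred_pw == y).mean()
acc_gl = (LogisticRegression(max_iter=5000).fit(X, y).predict(X) == y).mean()
print(f"piecewise: {acc_pw:.2f}  global: {acc_gl:.2f}")
```

Because the slope of x1 reverses across the partition, the global model's x1 coefficient averages out to near zero while each leaf model recovers its local slope, which is the "piecewise smooth estimate of the probability of success" the abstract refers to.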
A Machine-Learning Classification Tree Model of Perceived Organizational Performance in U.S. Federal Government Health Agencies
Perceived organizational performance (POP) is an important factor influencing employees’ attitudes and behaviors, such as retention and turnover, which in turn improve or impede organizational sustainability. The current study aims to identify interaction patterns of risk factors that differentiate public health and human services employees who perceived their agency’s performance as low. The 2018 Federal Employee Viewpoint Survey (FEVS), a nationally representative sample of U.S. federal government employees, was used for this study. The study included 43,029 federal employees (weighted n = 75,706) across 10 sub-agencies in the public health and human services sector. Machine-learning classification decision-tree modeling identified several tree-splitting variables and classified 33 subgroups of employees: 2 high-risk, 6 moderate-risk, and 25 low-risk subgroups of POP. The important variables predicting POP included performance-oriented culture, organizational satisfaction, organizational procedural justice, task-oriented leadership, work security and safety, and employees’ commitment to their agency; these variables interacted with one another in predicting the risk of low POP. Complex interaction patterns in the high- and moderate-risk subgroups, the importance of a machine-learning approach to sustainable human resource management in Industry 4.0, and the study’s limitations and future research directions are discussed.
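The subgroup-classification step can be illustrated in miniature; this is purely a sketch, since the FEVS data are not reproduced here: the two survey items, their scoring, and the risk model below are simulated stand-ins, not the study's variables or results:

```python
# Purely illustrative: the survey items are simulated stand-ins, not FEVS
# data. A classification tree carves respondents into subgroups whose
# rates of low perceived organizational performance (POP) differ,
# mirroring the paper's high/moderate/low-risk subgroups.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(2)
n = 2000
culture = rng.integers(1, 6, n)       # performance-oriented culture (1-5)
satisfaction = rng.integers(1, 6, n)  # organizational satisfaction (1-5)

# Risk of perceiving performance as low rises as both scores fall.
p_low = 1 / (1 + np.exp(-(4 - culture - satisfaction)))
low_pop = (rng.random(n) < p_low).astype(int)

X = np.column_stack([culture, satisfaction])
tree = DecisionTreeClassifier(max_depth=2, min_samples_leaf=100).fit(X, low_pop)
print(export_text(tree, feature_names=["culture", "satisfaction"]))

# Observed low-POP rate in each terminal node (the "risk subgroups").
leaves = tree.apply(X)
for node in np.unique(leaves):
    print(f"node {node}: risk {low_pop[leaves == node].mean():.2f}")
```

The nested splits are what the abstract means by interaction patterns: the risk attached to one variable depends on the branch, i.e. on the level of the other.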
Identification of subgroups with differential treatment effects for longitudinal and multiresponse variables
We describe and evaluate a regression tree algorithm for finding subgroups with differential treatment effects in randomized trials with multivariate outcomes. The data may contain missing values in the outcomes and covariates, and the treatment variable is not limited to two levels. Simulation results show that the regression tree models have unbiased variable selection and that the estimates of subgroup treatment effects are approximately unbiased. A bootstrap calibration technique is proposed for constructing confidence intervals for the treatment effects. The method is illustrated with data from a longitudinal study comparing two diabetes drugs and from a mammography screening trial comparing two treatments and a control.
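A toy version of the subgroup idea can be sketched as follows; this is not the authors' algorithm: a transformed-outcome tree replaces their split-selection machinery, a plain percentile bootstrap stands in for their calibrated bootstrap intervals, and the covariate, effect size, and 50/50 randomization are all assumptions:

```python
# Toy sketch, not the authors' algorithm: in a simulated two-arm
# randomized trial, a regression tree grown on the transformed outcome
# finds the subgroup boundary, and a plain percentile bootstrap stands in
# for the paper's calibrated bootstrap intervals.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
n = 1000
age = rng.uniform(20, 80, n)
trt = rng.integers(0, 2, n)                         # randomized 0/1 arm
y = 0.5 * trt * (age > 50) + rng.normal(0, 0.2, n)  # effect only over 50

# With P(trt=1)=1/2, the transformed outcome 2*y*(2*trt-1) has
# conditional mean equal to the treatment effect, so a tree fit to it
# splits where the effect changes.
ystar = 2 * y * (2 * trt - 1)
tree = DecisionTreeRegressor(max_depth=1, min_samples_leaf=100)
tree.fit(age[:, None], ystar)
leaf = tree.apply(age[:, None])

effects = []
for node in np.unique(leaf):
    idx = np.flatnonzero(leaf == node)
    diff = y[idx][trt[idx] == 1].mean() - y[idx][trt[idx] == 0].mean()
    effects.append(diff)
    # Percentile bootstrap for the subgroup treatment effect.
    boots = []
    for _ in range(500):
        b = rng.choice(idx, size=len(idx), replace=True)
        boots.append(y[b][trt[b] == 1].mean() - y[b][trt[b] == 0].mean())
    lo, hi = np.percentile(boots, [2.5, 97.5])
    print(f"subgroup {node}: effect {diff:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

In this simulation the tree recovers a boundary near age 50, one subgroup shows an effect near 0.5 and the other near zero; the paper's calibration step would additionally adjust the intervals for the fact that the subgroups were selected by the same data.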