Fitting Prediction Rule Ensembles with R Package pre
Prediction rule ensembles (PREs) are sparse collections of rules, offering
highly interpretable regression and classification models. This paper presents
the R package pre, which derives PREs through the methodology of Friedman and
Popescu (2008). The implementation and functionality of package pre are
described and illustrated through application to a dataset on the prediction of
depression. Furthermore, the accuracy and sparsity of PREs are compared with those of
single trees, random forests and lasso regression on four benchmark datasets.
Results indicate that pre derives ensembles with predictive accuracy comparable
to that of random forests, while using a smaller number of variables for
prediction.
Optimizing the assessment of suicidal behavior: the application of curtailment techniques
Background:
Given their length, commonly used scales to assess suicide risk, such as the Beck Scale for Suicide Ideation (SSI), are of limited use as screening tools. In the current study we tested whether deterministic and stochastic curtailment can be applied to shorten the 19-item SSI without compromising its accuracy.
Methods:
Data from 366 patients, who were seen by a liaison psychiatry service in a general hospital in Scotland after a suicide attempt, were used. Within 24 h of admission, the SSI was administered; 15 months later, it was determined whether a patient was re-admitted to a hospital as the result of another suicide attempt. We fitted a Receiver Operating Characteristic curve to derive the best cut-off value of the SSI for predicting future suicidal behavior. Using this cut-off, both deterministic and stochastic curtailment were simulated on the item score patterns of the SSI.
Results:
A cut-off value of SSI≥6 provided the best classification accuracy for future suicidal behavior. Using this cut-off, we found that both deterministic and stochastic curtailment reduce the length of the SSI without reducing the accuracy of the final classification decision. With stochastic curtailment, on average, fewer than 8 items are needed to assess whether administration of the full-length test will result in an SSI score below or above the cut-off value of 6.
Limitations:
New studies using other datasets should re-validate the optimal cut-off for risk of repeated suicidal behavior after being treated in a hospital following an attempt.
Conclusions:
Curtailment can be used to simplify the assessment of suicidal behavior, and should be considered as an alternative to the full scale.
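As an illustration of the deterministic curtailment idea described in this abstract, the following is a minimal sketch, not the authors' implementation. It assumes each of the 19 SSI items is scored 0 to 2 and uses the reported cut-off of 6; the function name and its parameters are hypothetical. Testing stops as soon as the final classification can no longer change, whatever the remaining answers are.

```python
from typing import Sequence, Tuple


def deterministic_curtailment(
    item_scores: Sequence[int],
    cutoff: int = 6,          # cut-off reported in the abstract (SSI >= 6)
    max_item_score: int = 2,  # assumption: each item is scored 0, 1 or 2
) -> Tuple[bool, int]:
    """Return (at_or_above_cutoff, number_of_items_administered)."""
    total = 0
    n_items = len(item_scores)
    for administered, score in enumerate(item_scores, start=1):
        total += score
        remaining = n_items - administered
        # The cut-off is already reached: later items cannot lower the total.
        if total >= cutoff:
            return True, administered
        # Even maximum scores on all remaining items cannot reach the cut-off.
        if total + remaining * max_item_score < cutoff:
            return False, administered
    return total >= cutoff, n_items


high_risk = deterministic_curtailment([2, 2, 2] + [0] * 16)
low_risk = deterministic_curtailment([0] * 19)
print(high_risk, low_risk)
```

Stochastic curtailment, also studied in the abstract, would additionally stop when the final classification is highly probable rather than certain; that requires a probability model for the remaining items and is not sketched here.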
Fitting Prediction Rule Ensembles to Psychological Research Data: An Introduction and Tutorial
Prediction rule ensembles (PREs) are a relatively new statistical learning
method that aims to strike a balance between predictive accuracy and
interpretability. Starting from a decision tree ensemble, like a boosted tree
ensemble or a random forest, PREs retain a small subset of tree nodes in the
final predictive model. These nodes can be written as simple rules of the form
if [condition] then [prediction]. As a result, PREs are often much less complex
than full decision tree ensembles, while they have been found to provide
similar predictive accuracy in many situations. The current paper introduces
the methodology and shows how PREs can be fitted using the R package pre
through several real-data examples from psychological research. The examples
also illustrate a number of features of package pre that may be
particularly useful for applications in psychology: support for categorical,
multivariate and count responses, application of (non-)negativity constraints,
inclusion of confirmatory rules and standardized variable importance measures.
Comment: Published in Psychological Method
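To make the "if [condition] then [prediction]" form concrete, here is a minimal sketch of how a fitted rule ensemble could produce a prediction: an intercept plus the coefficients of all rules whose conditions hold. This is not the API of package pre; the rules, variable names and coefficients below are invented purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Observation = Dict[str, float]


@dataclass
class Rule:
    description: str
    condition: Callable[[Observation], bool]
    coefficient: float  # contribution added when the condition holds


def predict(intercept: float, rules: List[Rule], obs: Observation) -> float:
    """Sum the intercept and the coefficients of all rules that apply."""
    return intercept + sum(r.coefficient for r in rules if r.condition(obs))


# Hypothetical rules; in practice they would be selected from tree nodes
# and weighted by a sparse (e.g. lasso) regression.
rules = [
    Rule("age > 40 and stress > 5",
         lambda o: o["age"] > 40 and o["stress"] > 5, 3.0),
    Rule("sleep < 6", lambda o: o["sleep"] < 6, 1.5),
]

print(predict(10.0, rules, {"age": 45, "stress": 7, "sleep": 8}))
print(predict(10.0, rules, {"age": 30, "stress": 2, "sleep": 5}))
```

Because only a few rules carry non-zero coefficients, each prediction can be explained by listing the rules that applied to the observation.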
Stacked Penalized Logistic Regression for Selecting Views in Multi-View Learning
In biomedical research, many different types of patient data can be
collected, such as various types of omics data and medical imaging modalities.
Applying multi-view learning to these different sources of information can
increase the accuracy of medical classification models compared with
single-view procedures. However, collecting biomedical data can be expensive
and/or burdensome for patients, so it is important to reduce the amount of
required data collection. It is therefore necessary to develop multi-view
learning methods which can accurately identify those views that are most
important for prediction. In recent years, several biomedical studies have used
an approach known as multi-view stacking (MVS), where a model is trained on
each view separately and the resulting predictions are combined through
stacking. In these studies, MVS has been shown to increase classification
accuracy. However, the MVS framework can also be used for selecting a subset of
important views. To study the view selection potential of MVS, we develop a
special case called stacked penalized logistic regression (StaPLR). Compared
with existing view-selection methods, StaPLR can make use of faster
optimization algorithms and is easily parallelized. We show that nonnegativity
constraints on the parameters of the function which combines the views play an
important role in preventing unimportant views from entering the model. We
investigate the performance of StaPLR through simulations, and consider two
real data examples. We compare the performance of StaPLR with an existing view
selection method called the group lasso and observe that, in terms of view
selection, StaPLR is often more conservative and has a consistently lower false
positive rate.
Comment: 26 pages, 9 figures. Accepted manuscrip
Why we need systematic reviews and meta-analyses in the testing and assessment literature
Multivariate analysis of psychological dat
Analyzing hierarchical multi-view MRI data with StaPLR: An application to Alzheimer's disease classification
Multi-view data refers to a setting where features are divided into feature
sets, for example because they correspond to different sources. Stacked
penalized logistic regression (StaPLR) is a recently introduced method that can
be used for classification and automatically selecting the views that are most
important for prediction. We introduce an extension of this method to a setting
where the data has a hierarchical multi-view structure. We also introduce a new
view importance measure for StaPLR, which allows us to compare the importance
of views at any level of the hierarchy. We apply our extended StaPLR algorithm
to Alzheimer's disease classification where different MRI measures have been
calculated from three scan types: structural MRI, diffusion-weighted MRI, and
resting-state fMRI. StaPLR can identify which scan types and which derived MRI
measures are most important for classification, and it outperforms elastic net
regression in classification performance.
Comment: 36 pages, 9 figures. Accepted manuscrip
Predicting mental health improvement and deterioration in a large community sample of 11- to 13-year-olds.
Of children with mental health problems who access specialist help, 50% show reliable improvement on self-report measures at case closure and 10% reliable deterioration. To contextualise these figures it is necessary to consider rates of improvement for those in the general population. This study examined rates of reliable improvement/deterioration for children in a school sample over time. N = 9074 children (mean age 12; 52% female; 79% white) from 118 secondary schools across England provided self-report mental health (SDQ), quality of life and demographic data (age, ethnicity and free school meals (FSM)) at baseline and 1 year, and self-report data on access to mental health support at 1 year. Multinomial logistic regressions and classification trees were used to analyse the data. Of 2270 (25%) scoring above threshold for mental health problems at outset, 27% reliably improved and 9% reliably deteriorated at 1-year follow-up. Of 6804 (75%) scoring below threshold, 4% reliably improved and 12% reliably deteriorated. Greater emotional difficulties at outset were associated with greater rates of reliable improvement for both groups (above threshold group: OR = 1.89, p < 0.001, 95% CI [1.64, 2.17], below threshold group: OR = 2.23, p < 0.001, 95% CI [1.93, 2.57]). For those above threshold, higher baseline quality of life was associated with greater likelihood of reliable improvement (OR = 1.28, p < 0.001, 95% CI [1.13, 1.46]), whilst being in receipt of FSM was associated with reduced likelihood of reliable improvement (OR = 0.68, p < 0.01, 95% CI [0.53, 0.88]). For the group below threshold, being female was associated with increased likelihood of reliable deterioration (OR = 1.20, p < 0.025, 95% CI [1.00, 1.42]), whereas being from a non-white ethnic background was associated with decreased likelihood of reliable deterioration (OR = 0.66, p < 0.001, 95% CI [0.54, 0.80]). For those above threshold, almost one in three children showed reliable improvement at 1 year.
The extent of emotional difficulties at outset showed the highest associations with rates of reliable improvement.