Risk estimation using probability machines
BACKGROUND: Logistic regression has been the de facto, and often the only, model used in the description and analysis of relationships between a binary outcome and observed features. It is widely used to obtain the conditional probabilities of the outcome given predictors, as well as predictor effect size estimates using conditional odds ratios. RESULTS: We show how statistical learning machines for binary outcomes, provably consistent for the nonparametric regression problem, can be used to provide both consistent conditional probability estimation and conditional effect size estimates. Effect size estimates from learning machines leverage our understanding of counterfactual arguments central to the interpretation of such estimates. We show that, if the data generating model is logistic, we can recover accurate probability predictions and effect size estimates with nearly the same efficiency as a correct logistic model, both for main effects and interactions. We also propose a method using learning machines to scan for possible interaction effects quickly and efficiently. Simulations using random forest probability machines are presented. CONCLUSIONS: The models we propose make no assumptions about the data structure, and capture the patterns in the data by specifying only the predictors involved and not any particular model structure. They therefore do not run the same risks of model mis-specification and the resultant estimation biases as a logistic model. This methodology, which we call a “risk machine”, will share properties from the statistical machine that it is derived from.
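The counterfactual effect size idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation: to stay within the Python standard library it swaps random forests for k-nearest-neighbour regression on the 0/1 outcome, which is another nonparametric regression learner. The function names, the simulated logistic model, and the choice k = 25 are all assumptions made for illustration.

```python
import math
import random


def knn_probability(train_x, train_y, query, k=25):
    """Estimate P(y = 1 | x = query) as the mean 0/1 outcome of the
    k nearest training points (k-NN regression on a binary outcome)."""
    nearest = sorted(
        range(len(train_x)),
        key=lambda i: math.dist(train_x[i], query),
    )[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def counterfactual_risk_difference(train_x, train_y, column, k=25):
    """Average change in estimated risk when the binary predictor in
    `column` is set to 1 versus 0, holding the other predictors fixed."""
    diffs = []
    for row in train_x:
        exposed = list(row)
        exposed[column] = 1.0
        unexposed = list(row)
        unexposed[column] = 0.0
        diffs.append(
            knn_probability(train_x, train_y, exposed, k)
            - knn_probability(train_x, train_y, unexposed, k)
        )
    return sum(diffs) / len(diffs)


def simulate_logistic(n, seed=0):
    """Hypothetical data: logit P(y = 1) = -1 + 2*x1 + x2,
    with x1 binary and x2 uniform on [-1, 1]."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x1 = 1.0 if rng.random() < 0.5 else 0.0
        x2 = rng.uniform(-1.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(-1.0 + 2.0 * x1 + x2)))
        xs.append((x1, x2))
        ys.append(1 if rng.random() < p else 0)
    return xs, ys


if __name__ == "__main__":
    xs, ys = simulate_logistic(400)
    rd = counterfactual_risk_difference(xs, ys, column=0)
    print(f"estimated risk difference for x1: {rd:.3f}")
```

The effect size here is a risk difference obtained by predicting each observation twice, once with the predictor forced to 1 and once forced to 0, and averaging the gap. No model structure is specified beyond the list of predictors.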
The Collapse of Bell Determinism
The Bell-Kochen-Specker conditions (BKS) for a deterministic noncontextual
hidden-variable model are wonderfully simple to state, deal with just
one-dimensional projectors on a Hilbert space H and make no reference to a
probabilistic phase space or quantum system. They only ask for an assignment of
zero or one to every projector such that the assignment respects orthogonal
resolutions of the identity. Various no-go results in the literature show that
the pair of statements {BKS is valid; dim H greater than or equal to 3} are
inconsistent. Here we show, more radically, that the pair actually contradicts
the dimensionality of the space itself, by implying that there can exist at
most a single one-dimensional projector acting on H. Our derivation involves
only elementary inner product spaces. It is non-probabilistic, inequality-free,
state independent, does not use entanglement, and is simultaneously valid in
all dimensions three or greater.
Comment: accepted for publication in Physics Letters A; expanded abstract; typos corrected
The behaviour of random forest permutation-based variable importance measures under predictor correlation
Background: Random forests (RF) have been increasingly used in applications such as genome-wide association and microarray studies where predictor correlation is frequently observed. Recent works on permutation-based variable importance measures (VIMs) used in RF have come to apparently contradictory conclusions. We present an extended simulation study to synthesize results. Results: In the case when both predictor correlation was present and predictors were associated with the outcome (H_A), the unconditional RF VIM attributed a higher share of importance to correlated predictors, while under the null hypothesis that no predictors are associated with the outcome (H_0) the unconditional RF VIM was unbiased. Conditional VIMs showed a decrease in VIM values for correlated predictors versus the unconditional VIMs under H_A and were unbiased under H_0. Scaled VIMs were clearly biased under H_A and H_0. Conclusions: Unconditional unscaled VIMs are a computationally tractable choice for large datasets and are unbiased under the null hypothesis. Whether the observed increased VIMs for correlated predictors should be considered a "bias" (because they do not directly reflect the coefficients in the generating model) or a beneficial attribute of these VIMs depends on the application. For example, in genetic association studies, where correlation between markers may help to localize the functionally relevant variant, the increased importance of correlated predictors may be an advantage. On the other hand, we show examples where this increased importance may result in spurious signals.
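A rough sketch of how an unconditional permutation importance can credit a predictor that has no coefficient in the generating model, merely because it is correlated with one that does. This is not the study's simulation design: it uses a k-nearest-neighbour regressor in place of a random forest, a held-out test set in place of out-of-bag samples, and a continuous outcome. The function names, sample sizes, and the model y = 2*x1 + noise are illustrative assumptions.

```python
import heapq
import math
import random


def knn_predict(train_x, train_y, query, k=10):
    """Mean outcome of the k nearest training points."""
    nearest = heapq.nsmallest(
        k, range(len(train_x)), key=lambda i: math.dist(train_x[i], query)
    )
    return sum(train_y[i] for i in nearest) / len(nearest)


def mse(train_x, train_y, test_x, test_y, k=10):
    """Mean squared prediction error on the test set."""
    errors = [
        (knn_predict(train_x, train_y, x, k) - y) ** 2
        for x, y in zip(test_x, test_y)
    ]
    return sum(errors) / len(errors)


def permutation_importance(train, test, column, rng, n_repeats=5, k=10):
    """Unconditional permutation importance: average increase in test
    MSE after shuffling one predictor column across the test rows."""
    train_x, train_y = train
    test_x, test_y = test
    baseline = mse(train_x, train_y, test_x, test_y, k)
    increases = []
    for _ in range(n_repeats):
        shuffled = [row[column] for row in test_x]
        rng.shuffle(shuffled)
        permuted = [
            row[:column] + (value,) + row[column + 1:]
            for row, value in zip(test_x, shuffled)
        ]
        increases.append(mse(train_x, train_y, permuted, test_y, k) - baseline)
    return sum(increases) / n_repeats


def simulate(n, rho, rng):
    """Hypothetical data: y depends on x1 only; x2 has correlation rho with x1."""
    xs, ys = [], []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rho * x1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        xs.append((x1, x2))
        ys.append(2.0 * x1 + rng.gauss(0.0, 0.5))
    return xs, ys


def importance_of_null_predictor(rho, seed=0):
    """Permutation importance of x2, which has no direct effect on y."""
    rng = random.Random(seed)
    train = simulate(300, rho, rng)
    test = simulate(200, rho, rng)
    return permutation_importance(train, test, column=1, rng=rng)


if __name__ == "__main__":
    for rho in (0.0, 0.9):
        print(f"rho={rho}: importance of x2 = {importance_of_null_predictor(rho):.3f}")
```

Under these assumptions, shuffling x2 when it is strongly correlated with x1 pushes test points away from the region where training data lie, so the error rises and x2 receives importance. When x2 is independent of x1, shuffling leaves the joint distribution unchanged and the importance should stay near zero.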
Using Multivariate Machine Learning Methods and Structural MRI to Classify Childhood Onset Schizophrenia and Healthy Controls
Introduction: Multivariate machine learning methods can be used to classify groups of schizophrenia patients and controls using structural magnetic resonance imaging (MRI). However, machine learning methods to date have not been extended beyond classification and contemporaneously applied in a meaningful way to clinical measures. We hypothesized that brain measures would classify groups, and that increased likelihood of being classified as a patient using regional brain measures would be positively related to illness severity, developmental delays, and genetic risk. Methods: Using 74 anatomic brain MRI sub regions and Random Forest (RF), a machine learning method, we classified 98 childhood onset schizophrenia (COS) patients and 99 age, sex, and ethnicity-matched healthy controls. We also used RF to estimate the probability of being classified as a schizophrenia patient based on MRI measures. We then explored relationships between brain-based probability of illness and symptoms, premorbid development, and presence of copy number variation (CNV) associated with schizophrenia. Results: Brain regions jointly classified COS and control groups with 73.7% accuracy. Greater brain-based probability of illness was associated with worse functioning (p = 0.0004) and fewer developmental delays (p = 0.02). Presence of CNV was associated with lower probability of being classified as schizophrenia (p = 0.001). The regions that were most important in classifying groups included left temporal lobes, bilateral dorsolateral prefrontal regions, and left medial parietal lobes. Conclusion: Schizophrenia and control groups can be well classified using RF and anatomic brain measures, and brain-based probability of illness has a positive relationship with illness severity and a negative relationship with developmental delays/problems and CNV-based risk
Detecting Gene-Gene Interactions Using a Permutation-Based Random Forest Method
Identifying gene-gene interactions is essential to understand disease susceptibility and to detect genetic architectures underlying complex diseases. Here, we aimed at developing a permutation-based methodology relying on a machine learning method, random forest (RF), to detect gene-gene interactions. Our approach, called permuted random forest (pRF), identified the top interacting single nucleotide polymorphism (SNP) pairs by estimating how much the power of a random forest classification model is influenced by removing pairwise interactions.
- …
