Exploratory Mediation Analysis with Many Potential Mediators
Social and behavioral scientists are increasingly employing technologies such
as fMRI, smartphones, and gene sequencing, which yield 'high-dimensional'
datasets with more columns than rows. There is increasing interest, but little
substantive theory, in the role the variables in these data play in known
processes. This necessitates exploratory mediation analysis, for which
structural equation modeling is the benchmark method. However, this method
cannot perform mediation analysis with more variables than observations. One
option is to run a series of univariate mediation models, which incorrectly
assumes independence of the mediators. Another option is regularization, but
the available implementations may lead to high false positive rates. In this
paper, we develop a hybrid approach which uses components of both filter and
regularization: the 'Coordinate-wise Mediation Filter'. It performs filtering
conditional on the other selected mediators. We show through simulation that it
improves performance over existing methods. Finally, we provide an empirical
example, showing how our method may be used for epigenetic research.
Comment: R code and package are available online as supplementary material at
https://github.com/vankesteren/cmfilter and
https://github.com/vankesteren/ema_simulation
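To make the baseline mentioned in the abstract concrete, the following is a minimal sketch of a single univariate mediation model using the standard product-of-coefficients estimate. It is not the Coordinate-wise Mediation Filter and does not use the cmfilter package; the function names (`cross`, `indirect_effect`) and the toy data are assumptions made for illustration only.

```python
def mean(values):
    return sum(values) / len(values)


def cross(u, v):
    """Sum of cross-products of two sequences after centering each one."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def indirect_effect(x, m, y):
    """Product-of-coefficients estimate for one candidate mediator m.

    a: slope of the simple regression m ~ x
    b: partial slope of m in the regression y ~ x + m
    The indirect effect is a * b.
    """
    sxx, smm, sxm = cross(x, x), cross(m, m), cross(x, m)
    sxy, smy = cross(x, y), cross(m, y)
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a, b, a * b


# Hypothetical toy data: m depends on x, and y depends on both m and x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
noise = [1.0, -1.0, 0.0, 1.0, -1.0]
m = [2.0 * xi + e for xi, e in zip(x, noise)]
y = [3.0 * mi + 0.5 * xi for xi, mi in zip(x, m)]

a, b, ab = indirect_effect(x, m, y)
print(a, b, ab)
```

Running one such model per candidate mediator treats each mediator in isolation, which is the independence assumption the abstract describes as incorrect.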
Bayesian threshold selection for extremal models using measures of surprise
Statistical extreme value theory is concerned with the use of asymptotically
motivated models to describe the extreme values of a process. A number of
commonly used models are valid for observed data that exceed some high
threshold. However, in practice a suitable threshold is unknown and must be
determined for each analysis. While there are many threshold selection methods
for univariate extremes, there are relatively few that can be applied in the
multivariate setting. In addition, there are only a few Bayesian-based methods,
which are naturally attractive in the modelling of extremes due to data
scarcity. The use of Bayesian measures of surprise to determine suitable
thresholds for extreme value models is proposed. Such measures quantify the
level of support for the proposed extremal model and threshold, without the
need to specify any model alternatives. This approach is easily implemented for
both univariate and multivariate extremes.
Comment: To appear in Computational Statistics and Data Analysis
Subjectively Interesting Subgroup Discovery on Real-valued Targets
Deriving insights from high-dimensional data is one of the core problems in
data mining. The difficulty mainly stems from the fact that there are
exponentially many variable combinations to potentially consider, and there are
infinitely many if we consider weighted combinations, even for linear
combinations. Hence, an obvious question is whether we can automate the search
for interesting patterns and visualizations. In this paper, we consider the
setting where a user wants to learn as efficiently as possible about
real-valued attributes: for example, to understand the distribution of crime
rates in different geographic areas in terms of other (numerical, ordinal
and/or categorical) variables that describe the areas. We introduce a method to
find subgroups in the data that are maximally informative (in the formal
Information Theoretic sense) with respect to a single or set of real-valued
target attributes. The subgroup descriptions are in terms of a succinct set of
arbitrarily-typed other attributes. The approach is based on the Subjective
Interestingness framework FORSIED to enable the use of prior knowledge when
finding most informative non-redundant patterns, and hence the method also
supports iterative data mining.
Comment: 12 pages, 10 figures, 2 tables, conference submission
Unsupervised Learning via Total Correlation Explanation
Learning by children and animals occurs effortlessly and largely without
obvious supervision. Successes in automating supervised learning have not
translated to the more ambiguous realm of unsupervised learning where goals and
labels are not provided. Barlow (1961) suggested that the signal that brains
leverage for unsupervised learning is dependence, or redundancy, in the sensory
environment. Dependence can be characterized using the information-theoretic
multivariate mutual information measure called total correlation. The principle
of Total Cor-relation Ex-planation (CorEx) is to learn representations of data
that "explain" as much dependence in the data as possible. We review some
manifestations of this principle along with successes in unsupervised learning
problems across diverse domains including human behavior, biology, and
language.
Comment: Invited contribution for IJCAI 2017 Early Career Spotlight. 5 pages,
1 figure
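As an illustration of the quantity named in the abstract, total correlation for discrete variables is the sum of the marginal entropies minus the joint entropy. The sketch below estimates it from empirical frequencies; it is not the CorEx algorithm, and the function names and toy samples are assumptions chosen for this example.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Empirical Shannon entropy, in bits, of a sequence of hashable outcomes."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def total_correlation(rows):
    """Sum of per-variable entropies minus the entropy of the joint outcomes."""
    rows = [tuple(r) for r in rows]
    columns = list(zip(*rows))
    return sum(entropy(col) for col in columns) - entropy(rows)


# Two perfectly redundant bits versus two independent bits.
copied = [(0, 0), (1, 1), (0, 0), (1, 1)]
independent = [(0, 0), (0, 1), (1, 0), (1, 1)]

tc_copied = total_correlation(copied)
tc_independent = total_correlation(independent)
print(tc_copied, tc_independent)
```

For the redundant pair the estimate is one bit, and for the independent pair it is zero, matching the reading of total correlation as a measure of dependence.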
Informative Data Projections: A Framework and Two Examples
Methods for Projection Pursuit aim to facilitate the visual exploration of
high-dimensional data by identifying interesting low-dimensional projections. A
major challenge is the design of a suitable quality metric of projections,
commonly referred to as the projection index, to be maximized by the Projection
Pursuit algorithm. In this paper, we introduce a new information-theoretic
strategy for tackling this problem, based on quantifying the amount of
information the projection conveys to a user given their prior beliefs about
the data. The resulting projection index is a subjective quantity, explicitly
dependent on the intended user. As a useful illustration, we developed this
idea for two particular kinds of prior beliefs. The first kind leads to PCA
(Principal Component Analysis), shining new light on when PCA is (not)
appropriate. The second kind leads to a novel projection index, the
maximization of which can be regarded as a robust variant of PCA. We show how
this projection index, though non-convex, can be effectively maximized using a
modified power method as well as using a semidefinite programming relaxation.
The usefulness of this new projection index is demonstrated in comparative
empirical experiments against PCA and a popular Projection Pursuit method.
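For context on the power method mentioned in the abstract, the sketch below shows the plain, unmodified power iteration applied to a sample covariance matrix, which recovers the first principal component. The paper's modified power method and its robust projection index are not reproduced here; the helper names, iteration count, and toy data are assumptions for illustration.

```python
from math import sqrt


def covariance(rows):
    """Sample covariance matrix of a list of equal-length numeric rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def power_iteration(matrix, iterations=200):
    """Approximate the leading eigenvector and eigenvalue of a symmetric matrix.

    Assumes the matrix is nonzero and the start vector is not orthogonal
    to the leading eigenvector.
    """
    d = len(matrix)
    v = [1.0 / sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the eigenvalue estimate.
    eigenvalue = sum(
        v[i] * sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)
    )
    return v, eigenvalue


# Hypothetical toy data lying exactly on the line y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
direction, variance = power_iteration(covariance(data))
print(direction, variance)
```

On this toy data the recovered direction is proportional to (1, 2), the only direction along which the points vary.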
Pattern classification of valence in depression
Copyright © The authors, 2013. This is an open access article available under Creative Commons Licence, CC-BY-NC-ND 3.0.
Neuroimaging biomarkers of depression have potential to aid diagnosis, identify individuals at risk and predict treatment response or course of illness. Nevertheless none have been identified so far, potentially because no single brain parameter captures the complexity of the pathophysiology of depression. Multi-voxel pattern analysis (MVPA) may overcome this issue as it can identify patterns of voxels that are spatially distributed across the brain. Here we present the results of an MVPA to investigate the neuronal patterns underlying passive viewing of positive, negative and neutral pictures in depressed patients. A linear support vector machine (SVM) was trained to discriminate different valence conditions based on the functional magnetic resonance imaging (fMRI) data of nine unipolar depressed patients. A similar dataset obtained in nine healthy individuals was included to conduct a group classification analysis via linear discriminant analysis (LDA). Accuracy scores of 86% or higher were obtained for each valence contrast via patterns that included limbic areas such as the amygdala and frontal areas such as the ventrolateral prefrontal cortex. The LDA identified two areas (the dorsomedial prefrontal cortex and caudate nucleus) that allowed group classification with 72.2% accuracy. Our preliminary findings suggest that MVPA can identify stable valence patterns, with more sensitivity than univariate analysis, in depressed participants and that it may be possible to discriminate between healthy and depressed individuals based on differences in the brain's response to emotional cues.
This work was supported by a PhD studentship to I.H. from the National Institute for Social Care and Health Research (NISCHR) HS/10/25 and MRC grant G 1100629