    Exploratory Mediation Analysis with Many Potential Mediators

    Social and behavioral scientists are increasingly employing technologies such as fMRI, smartphones, and gene sequencing, which yield 'high-dimensional' datasets with more columns than rows. There is increasing interest, but little substantive theory, in the role the variables in these data play in known processes. This necessitates exploratory mediation analysis, for which structural equation modeling is the benchmark method. However, this method cannot perform mediation analysis with more variables than observations. One option is to run a series of univariate mediation models, which incorrectly assumes independence of the mediators. Another option is regularization, but the available implementations may lead to high false positive rates. In this paper, we develop a hybrid approach which uses components of both filter and regularization: the 'Coordinate-wise Mediation Filter'. It performs filtering conditional on the other selected mediators. We show through simulation that it improves performance over existing methods. Finally, we provide an empirical example, showing how our method may be used for epigenetic research.
    Comment: R code and package are available online as supplementary material at https://github.com/vankesteren/cmfilter and https://github.com/vankesteren/ema_simulation

    Bayesian threshold selection for extremal models using measures of surprise

    Statistical extreme value theory is concerned with the use of asymptotically motivated models to describe the extreme values of a process. A number of commonly used models are valid for observed data that exceed some high threshold. However, in practice a suitable threshold is unknown and must be determined for each analysis. While there are many threshold selection methods for univariate extremes, there are relatively few that can be applied in the multivariate setting. In addition, there are only a few Bayesian-based methods, which are naturally attractive in the modelling of extremes due to data scarcity. The use of Bayesian measures of surprise to determine suitable thresholds for extreme value models is proposed. Such measures quantify the level of support for the proposed extremal model and threshold, without the need to specify any model alternatives. This approach is easily implemented for both univariate and multivariate extremes.
    Comment: To appear in Computational Statistics and Data Analysis.

    Subjectively Interesting Subgroup Discovery on Real-valued Targets

    Deriving insights from high-dimensional data is one of the core problems in data mining. The difficulty mainly stems from the fact that there are exponentially many variable combinations to potentially consider, and infinitely many if we consider weighted combinations, even when restricted to linear ones. Hence, an obvious question is whether we can automate the search for interesting patterns and visualizations. In this paper, we consider the setting where a user wants to learn as efficiently as possible about real-valued attributes. For example, a user may want to understand the distribution of crime rates in different geographic areas in terms of other (numerical, ordinal and/or categorical) variables that describe the areas. We introduce a method to find subgroups in the data that are maximally informative (in the formal Information Theoretic sense) with respect to a single real-valued target attribute or a set of them. The subgroup descriptions are in terms of a succinct set of arbitrarily-typed other attributes. The approach is based on the Subjective Interestingness framework FORSIED to enable the use of prior knowledge when finding the most informative non-redundant patterns, and hence the method also supports iterative data mining.
    Comment: 12 pages, 10 figures, 2 tables, conference submission.

    Unsupervised Learning via Total Correlation Explanation

    Learning by children and animals occurs effortlessly and largely without obvious supervision. Successes in automating supervised learning have not translated to the more ambiguous realm of unsupervised learning where goals and labels are not provided. Barlow (1961) suggested that the signal that brains leverage for unsupervised learning is dependence, or redundancy, in the sensory environment. Dependence can be characterized using the information-theoretic multivariate mutual information measure called total correlation. The principle of Total Cor-relation Ex-planation (CorEx) is to learn representations of data that "explain" as much dependence in the data as possible. We review some manifestations of this principle along with successes in unsupervised learning problems across diverse domains including human behavior, biology, and language.
    Comment: Invited contribution for IJCAI 2017 Early Career Spotlight. 5 pages, 1 figure.
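
    The total correlation mentioned in this abstract is the sum of the marginal entropies minus the joint entropy. The following is a minimal sketch of that quantity for discrete data, estimated from empirical counts. It is not the CorEx algorithm itself, and the function names and toy datasets are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Empirical Shannon entropy (in bits) of a list of hashable outcomes."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def total_correlation(rows):
    """Sum of marginal entropies minus joint entropy, from empirical counts."""
    columns = list(zip(*rows))  # one tuple of observations per variable
    marginal = sum(entropy(list(col)) for col in columns)
    joint = entropy([tuple(r) for r in rows])
    return marginal - joint


# Hypothetical toy data: three variables that are exact copies of each other,
# and two variables that take every combination of values equally often.
copies = [(0, 0, 0), (1, 1, 1), (0, 0, 0), (1, 1, 1)]
independent = [(a, b) for a in (0, 1) for b in (0, 1)]

tc_copies = total_correlation(copies)
tc_independent = total_correlation(independent)
```

    In this toy setting the copied variables share two bits of redundancy, while the independent variables share none, which is the kind of dependence a learned representation would aim to "explain".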

    Informative Data Projections: A Framework and Two Examples

    Methods for Projection Pursuit aim to facilitate the visual exploration of high-dimensional data by identifying interesting low-dimensional projections. A major challenge is the design of a suitable quality metric of projections, commonly referred to as the projection index, to be maximized by the Projection Pursuit algorithm. In this paper, we introduce a new information-theoretic strategy for tackling this problem, based on quantifying the amount of information the projection conveys to a user given their prior beliefs about the data. The resulting projection index is a subjective quantity, explicitly dependent on the intended user. As a useful illustration, we developed this idea for two particular kinds of prior beliefs. The first kind leads to PCA (Principal Component Analysis), shining new light on when PCA is (not) appropriate. The second kind leads to a novel projection index, the maximization of which can be regarded as a robust variant of PCA. We show how this projection index, though non-convex, can be effectively maximized using a modified power method as well as using a semidefinite programming relaxation. The usefulness of this new projection index is demonstrated in comparative empirical experiments against PCA and a popular Projection Pursuit method.
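
    For orientation, the standard (unmodified) power method recovers the leading principal component of a covariance matrix by repeated multiplication and normalization. The sketch below shows only this textbook baseline. The authors' modified power method and their robust projection index are not reproduced here, and the helper names and toy data are assumptions.

```python
from math import sqrt


def covariance(data):
    """Sample covariance matrix (divisor n - 1) of a list of equal-length rows."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def matvec(matrix, v):
    """Matrix-vector product for a square list-of-lists matrix."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in matrix]


def power_iteration(matrix, iters=200):
    """Leading eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    d = len(matrix)
    v = [1.0 / sqrt(d)] * d  # start vector; assumed not orthogonal to the answer
    for _ in range(iters):
        w = matvec(matrix, v)
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, matvec(matrix, v)))  # Rayleigh quotient
    return eigenvalue, v


# Hypothetical 2-D data that varies mostly along the first axis.
points = [(2.0, 0.1), (-2.0, -0.1), (1.0, -0.1), (-1.0, 0.1)]
cov = covariance(points)
eigenvalue, direction = power_iteration(cov)
```

    On this toy data the recovered direction lies almost entirely along the first axis, which is the projection PCA would select as most informative.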

    Pattern classification of valence in depression

    Copyright @ The authors, 2013. This is an open access article available under Creative Commons Licence, CC-BY-NC-ND 3.0.
    Neuroimaging biomarkers of depression have the potential to aid diagnosis, identify individuals at risk and predict treatment response or course of illness. Nevertheless, none have been identified so far, potentially because no single brain parameter captures the complexity of the pathophysiology of depression. Multi-voxel pattern analysis (MVPA) may overcome this issue as it can identify patterns of voxels that are spatially distributed across the brain. Here we present the results of an MVPA to investigate the neuronal patterns underlying passive viewing of positive, negative and neutral pictures in depressed patients. A linear support vector machine (SVM) was trained to discriminate different valence conditions based on the functional magnetic resonance imaging (fMRI) data of nine unipolar depressed patients. A similar dataset obtained in nine healthy individuals was included to conduct a group classification analysis via linear discriminant analysis (LDA). Accuracy scores of 86% or higher were obtained for each valence contrast via patterns that included limbic areas such as the amygdala and frontal areas such as the ventrolateral prefrontal cortex. The LDA identified two areas (the dorsomedial prefrontal cortex and caudate nucleus) that allowed group classification with 72.2% accuracy. Our preliminary findings suggest that MVPA can identify stable valence patterns, with more sensitivity than univariate analysis, in depressed participants and that it may be possible to discriminate between healthy and depressed individuals based on differences in the brain's response to emotional cues.
    This work was supported by a PhD studentship to I.H. from the National Institute for Social Care and Health Research (NISCHR) HS/10/25 and MRC grant G 1100629.