Clustering with shallow trees
We propose a new method for hierarchical clustering based on the optimisation of a cost function over trees of limited depth, and we derive a message-passing algorithm that solves it efficiently. The method can be interpreted as a natural interpolation between two well-known approaches, namely single linkage and the recently introduced Affinity Propagation. Within this general scheme we analyze three biological/medical structured datasets (human populations based on genetic information, proteins based on sequences, and verbal autopsies) and show that the interpolation technique provides new insight.
Comment: 11 pages, 7 figures
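The single-linkage endpoint of this interpolation is easy to illustrate; a minimal sketch using SciPy's hierarchical clustering (the data and cluster count below are illustrative, not from the paper):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# two well-separated 2-D blobs (synthetic stand-in for the paper's datasets)
pts = np.vstack([rng.normal(0, 0.1, (20, 2)),
                 rng.normal(5, 0.1, (20, 2))])

# single linkage builds the unbounded-depth hierarchy that the paper's
# depth-limited trees interpolate away from
Z = linkage(pts, method="single")
labels = fcluster(Z, t=2, criterion="maxclust")
print(sorted(set(labels)))  # the two blobs are recovered as two clusters
```

Cutting the single-linkage dendrogram at two clusters recovers the blobs exactly; the paper's contribution is to bound the depth of such trees and optimise a cost over them instead.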
The perfect recovery? Interactive influence of perfectionism and spillover work tasks on changes in exhaustion and mood around a vacation
This study examined week-level changes in affective well-being among school teachers as they transitioned into and out of a 1-week vacation. In addition, we investigated the interactive influence of personality characteristics (specifically perfectionism) and spillover work activities during the vacation on changes in teachers' well-being. A sample of 224 teachers completed study measures across 7 consecutive weeks, spanning the period before, during, and after a midterm vacation (providing a total of 1,525 responses across the study period). Results obtained from discontinuous multilevel growth models revealed evidence of a vacation effect, indicated by significant reductions in emotional exhaustion, anxiety, and depressed mood from before to during the vacation. Across 4 working weeks following the vacation, exhaustion and negative mood exhibited a nonlinear pattern of gradual convergence back to prevacation levels. Teachers with a higher level of perfectionistic concerns experienced elevated working week levels of exhaustion, anxious mood, and depressed mood, followed by pronounced reductions in anxious and depressed mood as they transitioned into the vacation. However, a strongly beneficial effect of the vacation was only obtained by perfectionistic teachers who refrained from spillover work tasks during the vacation. This pattern of findings is consistent with a diathesis-stress model, in that the perfectionists' vulnerability was relatively dormant (or deactivated) during a respite from job demands. Our results may provide an explanation for why engaging in work-related activities during vacations has previously exhibited weak relationships with employees' recovery and well-being
Feature-to-feature regression for a two-step conditional independence test
The algorithms for causal discovery, and more broadly for learning the structure of graphical models, require well-calibrated and consistent conditional independence (CI) tests. We revisit CI tests based on two-step procedures that involve regression with a subsequent (unconditional) independence test on the regression residuals (RESIT), and investigate the assumptions under which these tests operate. In particular, we demonstrate that, when going beyond simple functional relationships with additive noise, such tests can lead to an inflated number of false discoveries. We study the relationship of these tests with those based on dependence measures using reproducing kernel Hilbert spaces (RKHS) and propose an extension of RESIT which uses RKHS-valued regression. The resulting test inherits the simple two-step testing procedure of RESIT while giving correct Type I error control and competitive power. When used as a component of the PC algorithm, the proposed test is more robust to the case where hidden variables induce a switching behaviour in the associations present in the data.
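A toy sketch of the two-step RESIT idea described above, using linear regression and a Pearson test as the unconditional step (the paper's proposal replaces these with RKHS-valued regression and kernel dependence measures; all data here are synthetic):

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

def resit_ci_test(x, y, z):
    """Toy two-step RESIT-style CI test: regress x and y on z, then run
    an unconditional independence test (Pearson correlation here; the
    paper studies kernel-based alternatives) on the residuals."""
    rx = x - LinearRegression().fit(z, x).predict(z)
    ry = y - LinearRegression().fit(z, y).predict(z)
    return stats.pearsonr(rx, ry)[1]  # p-value of residual dependence

rng = np.random.default_rng(1)
z = rng.normal(size=(2000, 1))
x = 2 * z[:, 0] + rng.normal(size=2000)
y_indep = -z[:, 0] + rng.normal(size=2000)  # X independent of Y given Z
y_dep = y_indep + 0.5 * x                   # conditional independence violated
p_indep = resit_ci_test(x, y_indep, z)
p_dep = resit_ci_test(x, y_dep, z)
print(p_dep < p_indep)
```

The violated case yields a vanishingly small p-value; the abstract's point is that this simple recipe can break down once the true relationships are not additive-noise, which motivates the RKHS-valued extension.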
Bayesian kernel two-sample testing
In modern data analysis, nonparametric measures of discrepancies between random variables are particularly important. The subject is well-studied in the frequentist literature, while the development in the Bayesian setting is limited where applications are often restricted to univariate cases. Here, we propose a Bayesian kernel two-sample testing procedure based on modelling the difference between kernel mean embeddings in the reproducing kernel Hilbert space utilising the framework established by Flaxman et al (2016). The use of kernel methods enables its application to random variables in generic domains beyond the multivariate Euclidean spaces. The proposed procedure results in a posterior inference scheme that allows an automatic selection of the kernel parameters relevant to the problem at hand. In a series of synthetic experiments and two real data experiments (i.e. testing network heterogeneity from high-dimensional data and six-membered monocyclic ring conformation comparison), we illustrate the advantages of our approach
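The quantity being modelled, the difference between kernel mean embeddings, has a simple frequentist counterpart in the (biased) squared maximum mean discrepancy; a minimal NumPy sketch with an RBF kernel (the bandwidth and sample sizes are illustrative, and this omits the paper's Bayesian machinery):

```python
import numpy as np

def mmd2_biased(X, Y, gamma=1.0):
    """Biased estimate of squared MMD with the RBF kernel
    k(a, b) = exp(-gamma * ||a - b||^2): the squared RKHS distance
    between the empirical kernel mean embeddings of X and Y."""
    def k(A, B):
        d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(2)
same = mmd2_biased(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)))
diff = mmd2_biased(rng.normal(size=(200, 2)), rng.normal(3, 1, size=(200, 2)))
print(diff > same)  # embeddings are far apart when the distributions differ
```

The Bayesian procedure in the abstract places a model on this embedding difference, which is what enables posterior inference over the kernel parameters rather than a fixed-bandwidth test statistic.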
Scalable high-resolution forecasting of sparse spatiotemporal events with kernel methods: a winning solution to the NIJ "Real-Time Crime Forecasting Challenge"
We propose a generic spatiotemporal event forecasting method, which we developed for the National Institute of Justice’s (NIJ) Real-Time Crime Forecasting Challenge (National Institute of Justice, 2017). Our method is a spatiotemporal forecasting model combining scalable randomized Reproducing Kernel Hilbert Space (RKHS) methods for approximating Gaussian processes with autoregressive smoothing kernels in a regularized supervised learning framework. While the smoothing kernels capture the two main approaches in current use in the field of crime forecasting, kernel density estimation (KDE) and self-exciting point process (SEPP) models, the RKHS component of the model can be understood as an approximation to the popular log-Gaussian Cox Process model. For inference, we discretize the spatiotemporal point pattern and learn a log-intensity function using the Poisson likelihood and highly efficient gradient-based optimization methods. Model hyperparameters, including quality of RKHS approximation, spatial and temporal kernel lengthscales, number of autoregressive lags, bandwidths for smoothing kernels, as well as cell shape, size, and rotation, were learned using cross-validation. Resulting predictions significantly exceeded baseline KDE estimates and SEPP models for sparse events.
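A hedged sketch of the core modelling idea, random Fourier features as a scalable RKHS/GP approximation combined with a Poisson-likelihood log-intensity fit, using scikit-learn (the grid, lengthscale, and feature count are illustrative, not the challenge configuration):

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(3)

# grid-cell centroids (x, y, week) for a discretized point pattern (synthetic)
X = rng.uniform(0, 1, size=(500, 3))
true_rate = np.exp(1.5 * np.sin(6 * X[:, 0]) + X[:, 2])
counts = rng.poisson(true_rate)

# random Fourier features: a scalable approximation to an RBF-kernel RKHS
D, lengthscale = 200, 0.3
W = rng.normal(0, 1 / lengthscale, size=(3, D))
b = rng.uniform(0, 2 * np.pi, D)
Phi = np.sqrt(2 / D) * np.cos(X @ W + b)

# log-intensity learned under the Poisson likelihood (log link)
model = PoissonRegressor(alpha=1e-3, max_iter=300).fit(Phi, counts)
print(model.score(Phi, counts))  # D^2 deviance score, 1.0 = perfect
```

The cross-validation over lengthscales, lags, and cell geometry mentioned in the abstract would wrap a loop around exactly this fit.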
Self-guided mindfulness and cognitive behavioural practices reduce anxiety in autistic adults: A pilot 8-month waitlist-controlled trial of widely available online tools
Anxiety in autism is an important treatment target because of its consequences for quality of life and wellbeing. Growing evidence suggests that Cognitive Behaviour Therapies (CBT) and Mindfulness-Based Therapies (MBT) can ameliorate anxiety in autism but cost-effective delivery remains a challenge. This pilot randomized controlled trial examined whether online CBT and MBT self-help programmes could help reduce anxiety in 54 autistic adults who were randomly allocated to either an online CBT (n=16) or MBT (n=19) programme or a waitlist control group (WL; n=19). Primary outcome measures of anxiety, secondary outcome measures of broader wellbeing, and potential process of change variables were collected at baseline, after programme completion, and then 3 and 6 months post-completion. Baseline data confirmed that intolerance of uncertainty and emotional acceptance accounted for up to 61% of self-reported anxiety across all participants. The 23 participants who were retained in the active conditions (14 MBT, 9 CBT) showed significant decreases in anxiety that were maintained over 3, and to some extent also 6 months. Overall, results suggest that online self-help CBT and MBT tools may provide a cost-effective method for delivering mental health support to those autistic adults who can engage effectively with online support tools
Evaluating distributional regression strategies for modelling self-reported sexual age-mixing
The age dynamics of sexual partnership formation determine patterns of sexually transmitted disease transmission and have long been a focus of researchers studying human immunodeficiency virus. Data on self-reported sexual partner age distributions are available from a variety of sources. We sought to explore statistical models that accurately predict the distribution of sexual partner ages over age and sex. We identified which probability distributions and outcome specifications best captured variation in partner age and quantified the benefits of modelling these data using distributional regression. We found that distributional regression with a sinh-arcsinh distribution replicated observed partner age distributions most accurately across three geographically diverse data sets. This framework can be extended with well-known hierarchical modelling tools and can help improve estimates of sexual age-mixing dynamics
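The sinh-arcsinh distribution at the heart of the winning specification can be sketched by transforming a standard normal draw (Jones & Pewsey parameterisation; the skew and tail-weight values below are illustrative, not fitted to partner-age data):

```python
import numpy as np

def sinh_arcsinh_sample(n, loc=0.0, scale=1.0, skew=0.0, tail=1.0, rng=None):
    """Draw from the sinh-arcsinh distribution (Jones & Pewsey):
    transform a standard normal Z via sinh((arcsinh(Z) + skew) / tail),
    giving separate control over skewness and tail weight."""
    rng = rng or np.random.default_rng()
    z = rng.standard_normal(n)
    return loc + scale * np.sinh((np.arcsinh(z) + skew) / tail)

rng = np.random.default_rng(4)
symmetric = sinh_arcsinh_sample(100_000, rng=rng)            # reduces to N(0, 1)
right_skewed = sinh_arcsinh_sample(100_000, skew=1.0, rng=rng)
print(right_skewed.mean() > symmetric.mean())  # positive skew shifts mass right
```

Distributional regression then lets loc, scale, skew, and tail each vary with age and sex, which is what allows the model to track how partner-age distributions change shape rather than just location.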
PriorVAE: encoding spatial priors with variational autoencoders for small-area estimation.
Gaussian processes (GPs), implemented through multivariate Gaussian distributions for a finite collection of data, are the most popular approach in small-area spatial statistical modelling. In this context, they are used to encode correlation structures over space and can generalize well in interpolation tasks. Despite their flexibility, off-the-shelf GPs present serious computational challenges which limit their scalability and practical usefulness in applied settings. Here, we propose a novel, deep generative modelling approach to tackle this challenge, termed PriorVAE: for a particular spatial setting, we approximate a class of GP priors through prior sampling and subsequent fitting of a variational autoencoder (VAE). Given a trained VAE, the resultant decoder allows spatial inference to become incredibly efficient due to the low dimensional, independently distributed latent Gaussian space representation of the VAE. Once trained, inference using the VAE decoder replaces the GP within a Bayesian sampling framework. This approach provides tractable and easy-to-implement means of approximately encoding spatial priors and facilitates efficient statistical inference. We demonstrate the utility of our VAE two-stage approach on Bayesian, small-area estimation tasks
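The first stage of a PriorVAE-style pipeline, drawing GP prior realisations that would form the VAE's training set, can be sketched in a few lines (the grid, kernel, and sample counts are illustrative, and the VAE fitting stage is omitted):

```python
import numpy as np

# GP prior over a 1-D spatial grid with an RBF kernel (hypothetical setting)
grid = np.linspace(0, 1, 50)
lengthscale, variance = 0.1, 1.0
d2 = (grid[:, None] - grid[None, :]) ** 2
K = variance * np.exp(-0.5 * d2 / lengthscale**2) + 1e-6 * np.eye(50)

# draw prior realisations via the Cholesky factor; these samples would
# be the training data for the VAE whose decoder later replaces the GP
rng = np.random.default_rng(5)
L = np.linalg.cholesky(K)
prior_draws = (L @ rng.standard_normal((50, 1000))).T
print(prior_draws.shape)  # (1000, 50): 1000 GP prior samples on the grid
```

At inference time the expensive multivariate Gaussian over the grid is swapped for the trained decoder applied to a low-dimensional independent Gaussian latent, which is where the claimed efficiency comes from.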
Modeling and forecasting art movements with CGANs
Conditional generative adversarial networks (CGANs) are a recent and popular method for generating samples from a probability distribution conditioned on latent information. The latent information often comes in the form of a discrete label from a small set. We propose a novel method for training CGANs which allows us to condition on a sequence of continuous latent distributions f(1), …, f(K). This training allows CGANs to generate samples from a sequence of distributions. We apply our method to paintings from a sequence of artistic movements, where each movement is considered to be its own distribution. Exploiting the temporal aspect of the data, a vector autoregressive (VAR) model is fitted to the means of the latent distributions that we learn, and used for one-step-ahead forecasting, to predict the latent distribution of a future art movement f(K+1). Realizations from this distribution can be used by the CGAN to generate ‘future’ paintings. In experiments, this novel methodology generates accurate predictions of the evolution of art. The training set consists of a large dataset of past paintings. While there is no agreement on exactly what current art period we find ourselves in, we test on plausible candidate sets of present art, and show that the mean distance to our predictions is small
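The forecasting step can be sketched with a minimal VAR(1) fitted by least squares to a sequence of latent means (all dimensions and dynamics below are synthetic stand-ins for the learned CGAN latents; the paper fits a general VAR model):

```python
import numpy as np

# synthetic per-movement latent means mu_1..mu_K following VAR(1) dynamics
rng = np.random.default_rng(6)
K, d = 8, 4
A_true = 0.6 * np.eye(d)
mus = [rng.normal(size=d)]
for _ in range(K - 1):
    mus.append(A_true @ mus[-1] + 0.1 * rng.normal(size=d))
mus = np.array(mus)                       # shape (K, d)

# least-squares VAR(1) fit: mu_t ~= mu_{t-1} @ A_hat
A_hat, *_ = np.linalg.lstsq(mus[:-1], mus[1:], rcond=None)
mu_next = mus[-1] @ A_hat                 # one-step-ahead forecast of mu_{K+1}
print(mu_next.shape)
```

In the paper's pipeline, a forecast like `mu_next` parameterises the latent distribution f(K+1) that the trained CGAN decodes into candidate "future" paintings.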
Probabilistic Analysis of Facility Location on Random Shortest Path Metrics
The facility location problem is an NP-hard optimization problem, so approximation algorithms are often used to solve large instances. Such algorithms often perform much better than worst-case analysis suggests, and probabilistic analysis is therefore a widely used tool for analyzing them. Most research on probabilistic analysis of NP-hard optimization problems involving metric spaces, such as the facility location problem, has focused on Euclidean instances; instances with independent (random) edge lengths, which are non-metric, have also been studied. We would like to extend this knowledge to other, more general, metrics.
We investigate the facility location problem using random shortest path metrics. We analyze some probabilistic properties of a simple greedy heuristic which gives a solution to the facility location problem: opening the cheapest facilities, with the number of opened facilities depending only on the facility opening costs. If the facility opening costs are such that this number is not too large, we show that this heuristic is asymptotically optimal. On the other hand, for large values of this number the analysis becomes more difficult, and we provide a closed-form expression as an upper bound on the expected approximation ratio. In the special case where all facility opening costs are equal, this closed-form expression reduces to simpler bounds, which improve further if the opening costs are sufficiently small.
Comment: A preliminary version accepted to CiE 201
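The greedy heuristic analysed in the abstract can be sketched directly (the instance below uses random distances as a stand-in for a random shortest path metric, and the number of opened facilities is fixed by hand rather than derived from the opening costs):

```python
import numpy as np

def greedy_facility_location(open_costs, dist, n_open):
    """Toy version of the greedy heuristic from the abstract: open the
    n_open cheapest facilities (chosen from the opening costs alone),
    then assign every client to its nearest open facility."""
    open_idx = np.argsort(open_costs)[:n_open]
    assignment = open_idx[np.argmin(dist[:, open_idx], axis=1)]
    cost = open_costs[open_idx].sum() + dist[np.arange(len(dist)), assignment].sum()
    return open_idx, assignment, cost

rng = np.random.default_rng(7)
n_clients, n_facilities = 30, 10
open_costs = rng.uniform(1, 5, n_facilities)
dist = rng.uniform(0, 1, (n_clients, n_facilities))  # stand-in for the metric
idx, assign, cost = greedy_facility_location(open_costs, dist, n_open=3)
print(len(idx), cost > 0)
```

The paper's analysis concerns how the expected cost of exactly this kind of solution compares to the optimum when the distances come from a random shortest path metric.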