25,237 research outputs found
Transferable neural networks for enhanced sampling of protein dynamics
Variational auto-encoder frameworks have demonstrated success in reducing
complex nonlinear dynamics in molecular simulation to a single non-linear
embedding. In this work, we illustrate how this non-linear latent embedding can
be used as a collective variable for enhanced sampling, and present a simple
modification that allows us to rapidly perform sampling in multiple related
systems. We first demonstrate our method is able to describe the effects of
force field changes in capped alanine dipeptide after learning a model using
AMBER99. We further provide a simple extension to variational dynamics encoders
that allows the model to be trained in a more efficient manner on larger
systems by encoding the outputs of a linear transformation using time-structure
based independent component analysis (tICA). Using this technique, we show how
such a model trained for one protein, the WW domain, can efficiently be
transferred to perform enhanced sampling on a related mutant protein, the GTT
mutation. This method shows promise for its ability to rapidly sample related
systems using a single transferable collective variable and is generally
applicable to sets of related simulations, enabling us to probe the effects of
variation in increasingly large systems of biophysical interest.Comment: 20 pages, 10 figure
Approximated and User Steerable tSNE for Progressive Visual Analytics
Progressive Visual Analytics aims at improving the interactivity in existing
analytics techniques by means of visualization as well as interaction with
intermediate results. One key method for data analysis is dimensionality
reduction, for example, to produce 2D embeddings that can be visualized and
analyzed efficiently. t-Distributed Stochastic Neighbor Embedding (tSNE) is a
well-suited technique for the visualization of several high-dimensional data.
tSNE can create meaningful intermediate results but suffers from a slow
initialization that constrains its application in Progressive Visual Analytics.
We introduce a controllable tSNE approximation (A-tSNE), which trades off speed
and accuracy, to enable interactive data exploration. We offer real-time
visualization techniques, including a density-based solution and a Magic Lens
to inspect the degree of approximation. With this feedback, the user can decide
on local refinements and steer the approximation level during the analysis. We
demonstrate our technique with several datasets, in a real-world research
scenario and for the real-time analysis of high-dimensional streams to
illustrate its effectiveness for interactive data analysis
Multi-view Learning as a Nonparametric Nonlinear Inter-Battery Factor Analysis
Factor analysis aims to determine latent factors, or traits, which summarize
a given data set. Inter-battery factor analysis extends this notion to multiple
views of the data. In this paper we show how a nonlinear, nonparametric version
of these models can be recovered through the Gaussian process latent variable
model. This gives us a flexible formalism for multi-view learning where the
latent variables can be used both for exploratory purposes and for learning
representations that enable efficient inference for ambiguous estimation tasks.
Learning is performed in a Bayesian manner through the formulation of a
variational compression scheme which gives a rigorous lower bound on the log
likelihood. Our Bayesian framework provides strong regularization during
training, allowing the structure of the latent space to be determined
efficiently and automatically. We demonstrate this by producing the first (to
our knowledge) published results of learning from dozens of views, even when
data is scarce. We further show experimental results on several different types
of multi-view data sets and for different kinds of tasks, including exploratory
data analysis, generation, ambiguity modelling through latent priors and
classification.Comment: 49 pages including appendi
Challenges of Big Data Analysis
Big Data bring new opportunities to modern society and challenges to data
scientists. On one hand, Big Data hold great promises for discovering subtle
population patterns and heterogeneities that are not possible with small-scale
data. On the other hand, the massive sample size and high dimensionality of Big
Data introduce unique computational and statistical challenges, including
scalability and storage bottleneck, noise accumulation, spurious correlation,
incidental endogeneity, and measurement errors. These challenges are
distinguished and require new computational and statistical paradigm. This
article give overviews on the salient features of Big Data and how these
features impact on paradigm change on statistical and computational methods as
well as computing architectures. We also provide various new perspectives on
the Big Data analysis and computation. In particular, we emphasis on the
viability of the sparsest solution in high-confidence set and point out that
exogeneous assumptions in most statistical methods for Big Data can not be
validated due to incidental endogeneity. They can lead to wrong statistical
inferences and consequently wrong scientific conclusions
Accelerating Parallel Tempering: Quantile Tempering Algorithm (QuanTA)
Using MCMC to sample from a target distribution, on a
-dimensional state space can be a difficult and computationally expensive
problem. Particularly when the target exhibits multimodality, then the
traditional methods can fail to explore the entire state space and this results
in a bias sample output. Methods to overcome this issue include the parallel
tempering algorithm which utilises an augmented state space approach to help
the Markov chain traverse regions of low probability density and reach other
modes. This method suffers from the curse of dimensionality which dramatically
slows the transfer of mixing information from the auxiliary targets to the
target of interest as . This paper introduces a novel
prototype algorithm, QuanTA, that uses a Gaussian motivated transformation in
an attempt to accelerate the mixing through the temperature schedule of a
parallel tempering algorithm. This new algorithm is accompanied by a
comprehensive theoretical analysis quantifying the improved efficiency and
scalability of the approach; concluding that under weak regularity conditions
the new approach gives accelerated mixing through the temperature schedule.
Empirical evidence of the effectiveness of this new algorithm is illustrated on
canonical examples
- …