Search CORE

25,237 research outputs found

Transferable neural networks for enhanced sampling of protein dynamics

Author: Pande Vijay S.
Sultan Mohammad M.
Wayment-Steele Hannah K.
Publication venue
Publication date: 02/01/2018
Field of study

Variational auto-encoder frameworks have demonstrated success in reducing complex nonlinear dynamics in molecular simulation to a single non-linear embedding. In this work, we illustrate how this non-linear latent embedding can be used as a collective variable for enhanced sampling, and present a simple modification that allows us to rapidly perform sampling in multiple related systems. We first demonstrate our method is able to describe the effects of force field changes in capped alanine dipeptide after learning a model using AMBER99. We further provide a simple extension to variational dynamics encoders that allows the model to be trained in a more efficient manner on larger systems by encoding the outputs of a linear transformation using time-structure based independent component analysis (tICA). Using this technique, we show how such a model trained for one protein, the WW domain, can efficiently be transferred to perform enhanced sampling on a related mutant protein, the GTT mutation. This method shows promise for its ability to rapidly sample related systems using a single transferable collective variable and is generally applicable to sets of related simulations, enabling us to probe the effects of variation in increasingly large systems of biophysical interest.Comment: 20 pages, 10 figure

arXiv.org e-Print Archive

FigShare

Approximated and User Steerable tSNE for Progressive Visual Analytics

Author: Eisemann Elmar
Höllt Thomas
Lelieveldt Boudewijn P. F.
Pezzotti Nicola
van der Maaten Laurens
Vilanova Anna
Publication venue
Publication date: 01/01/2016
Field of study

Progressive Visual Analytics aims at improving the interactivity in existing analytics techniques by means of visualization as well as interaction with intermediate results. One key method for data analysis is dimensionality reduction, for example, to produce 2D embeddings that can be visualized and analyzed efficiently. t-Distributed Stochastic Neighbor Embedding (tSNE) is a well-suited technique for the visualization of several high-dimensional data. tSNE can create meaningful intermediate results but suffers from a slow initialization that constrains its application in Progressive Visual Analytics. We introduce a controllable tSNE approximation (A-tSNE), which trades off speed and accuracy, to enable interactive data exploration. We offer real-time visualization techniques, including a density-based solution and a Magic Lens to inspect the degree of approximation. With this feedback, the user can decide on local refinements and steer the approximation level during the analysis. We demonstrate our technique with several datasets, in a real-world research scenario and for the real-time analysis of high-dimensional streams to illustrate its effectiveness for interactive data analysis

arXiv.org e-Print Archive

Repository TU/e

TU Delft Repository

Pure OAI Repository

Leiden University Scholary Publications

Multi-view Learning as a Nonparametric Nonlinear Inter-Battery Factor Analysis

Author: Damianou Andreas
Ek Carl Henrik
Lawrence Neil D.
Publication venue
Publication date: 17/04/2016
Field of study

Factor analysis aims to determine latent factors, or traits, which summarize a given data set. Inter-battery factor analysis extends this notion to multiple views of the data. In this paper we show how a nonlinear, nonparametric version of these models can be recovered through the Gaussian process latent variable model. This gives us a flexible formalism for multi-view learning where the latent variables can be used both for exploratory purposes and for learning representations that enable efficient inference for ambiguous estimation tasks. Learning is performed in a Bayesian manner through the formulation of a variational compression scheme which gives a rigorous lower bound on the log likelihood. Our Bayesian framework provides strong regularization during training, allowing the structure of the latent space to be determined efficiently and automatically. We demonstrate this by producing the first (to our knowledge) published results of learning from dozens of views, even when data is scarce. We further show experimental results on several different types of multi-view data sets and for different kinds of tasks, including exploratory data analysis, generation, ambiguity modelling through latent priors and classification.Comment: 49 pages including appendi

arXiv.org e-Print Archive

Explore Bristol Research

Challenges of Big Data Analysis

Author: Fan Jianqing
Han Fang
Liu Han
Publication venue: 'Oxford University Press (OUP)'
Publication date: 06/02/2014
Field of study

Big Data bring new opportunities to modern society and challenges to data scientists. On one hand, Big Data hold great promises for discovering subtle population patterns and heterogeneities that are not possible with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottleneck, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinguished and require new computational and statistical paradigm. This article give overviews on the salient features of Big Data and how these features impact on paradigm change on statistical and computational methods as well as computing architectures. We also provide various new perspectives on the Big Data analysis and computation. In particular, we emphasis on the viability of the sparsest solution in high-confidence set and point out that exogeneous assumptions in most statistical methods for Big Data can not be validated due to incidental endogeneity. They can lead to wrong statistical inferences and consequently wrong scientific conclusions

arXiv.org e-Print Archive

CiteSeerX

Princeton University Open Access Repository

Accelerating Parallel Tempering: Quantile Tempering Algorithm (QuanTA)

Author: Roberts Gareth O.
Tawn Nicholas G.
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 30/08/2018
Field of study

Using MCMC to sample from a target distribution,

\pi(x)

on a

d

-dimensional state space can be a difficult and computationally expensive problem. Particularly when the target exhibits multimodality, then the traditional methods can fail to explore the entire state space and this results in a bias sample output. Methods to overcome this issue include the parallel tempering algorithm which utilises an augmented state space approach to help the Markov chain traverse regions of low probability density and reach other modes. This method suffers from the curse of dimensionality which dramatically slows the transfer of mixing information from the auxiliary targets to the target of interest as

d \rightarrow \infty

. This paper introduces a novel prototype algorithm, QuanTA, that uses a Gaussian motivated transformation in an attempt to accelerate the mixing through the temperature schedule of a parallel tempering algorithm. This new algorithm is accompanied by a comprehensive theoretical analysis quantifying the improved efficiency and scalability of the approach; concluding that under weak regularity conditions the new approach gives accelerated mixing through the temperature schedule. Empirical evidence of the effectiveness of this new algorithm is illustrated on canonical examples

arXiv.org e-Print Archive

Warwick Research Archives Portal Repository