Harnessing the Power of Many: Extensible Toolkit for Scalable Ensemble Applications

Author: Balasubramanian Vivek
Cervone Guido
Hu Weiming
Jha Shantenu
Lefebvre Matthieu
Lei Wenjie
Modrak Ryan
Tromp Jeroen
Turilli Matteo
Publication date: 16/05/2018

Many scientific problems require multiple distinct computational tasks to be executed in order to achieve a desired solution. We introduce the Ensemble Toolkit (EnTK) to address the challenges of scale, diversity and reliability that such problems pose. We describe the design and implementation of EnTK, characterize its performance and integrate it with two distinct exemplar use cases: seismic inversion and adaptive analog ensembles. We perform nine experiments, characterizing EnTK overheads, strong and weak scalability, and the performance of two use case implementations, at scale and on production infrastructures. We show how EnTK meets the following general requirements: (i) implementing dedicated abstractions to support the description and execution of ensemble applications; (ii) support for execution on heterogeneous computing infrastructures; (iii) efficient scalability up to O(10^4) tasks; and (iv) fault tolerance. We discuss novel computational capabilities that EnTK enables and the scientific advantages arising thereof. We propose EnTK as an important addition to the suite of tools in support of production scientific computing.

arXiv.org e-Print Archive

Princeton University Open Access Repository

Crossref

Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening

Author: Cang Zixuan
Mu Lin
Wei Guowei
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 27/08/2017

This work introduces a number of algebraic topology approaches, such as multicomponent persistent homology, multi-level persistent homology and electrostatic persistence for the representation, characterization, and description of small molecules and biomolecular complexes. Multicomponent persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensemble of trees, and deep convolutional neural networks, to manifest their descriptive and predictive powers for chemical and biological problems. Extensive numerical experiments involving more than 4,000 protein-ligand complexes from the PDBBind database and nearly 100,000 ligands and decoys in the DUD database are performed to test the scoring power and the virtual screening power, respectively, of the proposed topological approaches. It is demonstrated that the present approaches outperform the modern machine learning based methods in protein-ligand binding affinity predictions and ligand-decoy discrimination.

arXiv.org e-Print Archive

Directory of Open Access Journals

FigShare

MolMod – an open access database of force fields for molecular simulations of fluids

Author: Hasse Hans
Horsch Martin T.
Stephan Simon
Vrabec Jadran
Publication date: 17/10/2019

The MolMod database is presented, which is openly accessible at http://molmod.boltzmann-zuse.de and contains intermolecular force fields for over 150 pure fluids at present. It was developed and is maintained by the Boltzmann-Zuse Society for Computational Molecular Engineering (BZS). The set of molecular models in the MolMod database provides a coherent framework for molecular simulations of fluids. The molecular models in the MolMod database consist of Lennard-Jones interaction sites, point charges, and point dipoles and quadrupoles, which can be equivalently represented by multiple point charges. The force fields can be exported as input files for the simulation programmes ms2 and ls1 mardyn, GROMACS, and LAMMPS. To characterise the semantics associated with the numerical database content, a force field nomenclature is introduced that can also be used in other contexts in materials modelling at the atomistic and mesoscopic levels. The models of the pure substances that are included in the database were generally optimised such as to yield good representations of experimental data of the vapour–liquid equilibrium with a focus on the vapour pressure and the saturated liquid density. In many cases, the models also yield good predictions of caloric, transport, and interfacial properties of the pure fluids. For all models, references to the original works in which they were developed are provided. The models can be used straightforwardly for predictions of properties of fluid mixtures using established combination rules. Input errors are a major source of errors in simulations. The MolMod database contributes to reducing such errors.

Funding: BMBF, 01IH16008E, Verbundprojekt: TaLPas - Task-basierte Lastverteilung und Auto-Tuning in der Partikelsimulation; EC/H2020/694807/EU/Enrichment of Components at Interfaces and Mass Transfer in Fluid Separation Technologies/ENRICO; EC/H2020/760907/EU/Virtual Materials Market Place (VIMMP)/VIMM

DepositOnce

TopologyNet: Topology based deep convolutional neural networks for biomolecular property predictions

Author: Cang Zixuan
Wei Guo-Wei
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 31/03/2017

Although deep learning approaches have had tremendous success in image, video and audio processing, computer vision, and speech recognition, their applications to three-dimensional (3D) biomolecular structural data sets have been hindered by the entangled geometric complexity and biological complexity. We introduce topology, i.e., element specific persistent homology (ESPH), to untangle geometric complexity and biological complexity. ESPH represents 3D complex geometry by one-dimensional (1D) topological invariants and retains crucial biological information via a multichannel image representation. It is able to reveal hidden structure-function relationships in biomolecules. We further integrate ESPH and convolutional neural networks to construct a multichannel topological neural network (TopologyNet) for the predictions of protein-ligand binding affinities and protein stability changes upon mutation. To overcome the limitations to deep learning arising from small and noisy training sets, we present a multitask topological convolutional neural network (MT-TCNN). We demonstrate that the present TopologyNet architectures outperform other state-of-the-art methods in the predictions of protein-ligand binding affinities, globular protein mutation impacts, and membrane protein mutation impacts.

Comment: 20 pages, 8 figures, 5 table

arXiv.org e-Print Archive

Directory of Open Access Journals

Process Calculi Abstractions for Biology

Author: Guerriero Maria Luisa
Prandi Davide
Priami Corrado
Quaglia Paola
Publication date: 01/01/2006

Several approaches have been proposed to model biological systems by means of the formal techniques and tools available in computer science. To mention just a few of them, some representations are inspired by Petri net theory, and others by stochastic processes. A more recent approach consists of interpreting living entities as terms of process calculi, where the behavior of the represented systems can be inferred by applying syntax-driven rules. A comprehensive picture of the state of the art of the process calculi approach to biological modeling is still missing. This paper goes in the direction of providing such a picture by presenting a comparative survey of the process calculi that have been used and proposed to describe the behavior of living entities.

This is the preliminary version of a paper that was published in Algorithmic Bioprocesses. The original publication is available at http://www.springer.com/computer/foundations/book/978-3-540-88868-

Unitn-eprints Research