Accepting Hybrid Networks of Evolutionary Processors with Special Topologies and Small Communication
Starting from the fact that complete Accepting Hybrid Networks of
Evolutionary Processors allow much communication between the nodes and are far
from network structures used in practice, we propose in this paper three
network topologies that restrict the communication: star networks, ring
networks, and grid networks. We show that ring-AHNEPs can simulate 2-tag
systems, thus we deduce the existence of a universal ring-AHNEP. For star
networks or grid networks, we show a more general result; that is, each
recursively enumerable language can be accepted efficiently by a star- or
grid-AHNEP. We also present bounds for the size of these star and grid
networks. As a consequence we get that each recursively enumerable language can be
accepted by networks with at most 13 communication channels and by networks
where each node communicates with at most three other nodes.
Comment: In Proceedings DCFS 2010, arXiv:1008.127
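To make the notion of a 2-tag system concrete, here is a minimal sketch of a tag-system interpreter. It is not the ring-AHNEP construction from the paper; the rule set is a commonly cited Collatz-style example chosen only for illustration, and the names `RULES`, `step`, and `run` are hypothetical.

```python
# Illustrative 2-tag system (an assumption, not taken from the paper):
# a word of n copies of "a" is mapped to n/2 (n even) or (3n+1)/2 (n odd) copies.
RULES = {"a": "bc", "b": "a", "c": "aaa"}


def step(word, rules, m=2):
    """Append the production of the first symbol, then delete the first m symbols."""
    return word[m:] + rules[word[0]]


def run(word, rules, m=2, max_steps=10_000):
    """Iterate until the word is shorter than m, no rule applies, or max_steps is hit."""
    trace = [word]
    while len(word) >= m and word[0] in rules and len(trace) <= max_steps:
        word = step(word, rules, m)
        trace.append(word)
    return trace


if __name__ == "__main__":
    print(run("aa", RULES))  # prints ['aa', 'bc', 'a']
```

The system halts once fewer than two symbols remain, which is the usual halting convention for 2-tag systems.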
Probabilistic Graphical Models on Multi-Core CPUs using Java 8
In this paper, we discuss software design issues related to the development
of parallel computational intelligence algorithms on multi-core CPUs, using the
new Java 8 functional programming features. In particular, we focus on
probabilistic graphical models (PGMs) and present the parallelisation of a
collection of algorithms that deal with inference and learning of PGMs from
data, namely maximum likelihood estimation, importance sampling, and greedy
search for solving combinatorial optimisation problems. Through these concrete
examples, we tackle the problem of defining efficient data structures for PGMs
and parallel processing of same-size batches of data sets using Java 8
features. We also provide straightforward techniques to code parallel
algorithms that seamlessly exploit multi-core processors. The experimental
analysis, carried out using our open source AMIDST (Analysis of MassIve Data
STreams) Java toolbox, shows the merits of the proposed solutions.
Comment: Pre-print version of the paper presented in the special issue on
Computational Intelligence Software at IEEE Computational Intelligence
Magazine journa
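The batch-parallel maximum likelihood idea can be sketched as a map-reduce over sufficient statistics. The example below is written in Python rather than with Java 8 streams, does not use the AMIDST toolbox, and assumes a single categorical variable; `batch_counts` and `multinomial_mle` are hypothetical names.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def batch_counts(batch):
    """Map step: sufficient statistics (value counts) for one batch."""
    return Counter(batch)


def multinomial_mle(batches, workers=4):
    """Reduce step: merge per-batch counts, then normalise to probabilities."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total = sum(pool.map(batch_counts, batches), Counter())
    n = sum(total.values())
    return {value: count / n for value, count in total.items()}


if __name__ == "__main__":
    print(multinomial_mle([["a", "b"], ["a", "a"]]))
```

Because the counts are additive, the batches can be processed in any order and merged afterwards, which is what makes this estimator straightforward to parallelise.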
Targeting the Poly (ADP-Ribose) Polymerase-1 Catalytic Pocket Using AutoGrow4, a Genetic Algorithm for De Novo Design
AutoGrow4 is a free and open-source program for de novo drug design that uses a genetic algorithm (GA) to create novel predicted small-molecule ligands for a given protein target without the constraints of a finite, pre-defined virtual library. By leveraging recent computational and cheminformatic advancements, AutoGrow4 is faster, more stable, and more modular than previous versions. Features such as docking-software compatibility, chemical filters, multithreading options, and selection methods have been expanded to support a wide range of user needs. This dissertation will cover the development and validation of AutoGrow4, as well as its application to poly (ADP-ribose) polymerase-1 (PARP-1).
PARP-1 is a well-characterized DNA-damage recognition protein, and PARP-1 inhibition is an effective treatment for ovarian and breast cancers that are homologous-recombination (HR) deficient [1–5]. As a well-studied protein, PARP-1 is also an excellent drug target with which to validate AutoGrow4. Multiple crystallographic structures of PARP-1 bound to various PARP-1 inhibitors (PARPi) serve as positive controls for assessing the quality of AutoGrow4-generated compounds in terms of predicted binding affinity, chemical structure, and predicted protein-ligand interactions.
This dissertation describes how I (1) generated novel potential PARPi with predicted binding affinities that surpass those of known PARPi; (2) validated AutoGrow4 as a tool for de novo drug design, lead optimization, and hypothesis generation, using PARP-1 as a test target; (3) contributed support to the growing notion that there is a need for HR-deficient cancer chemotherapies that do not rely on the same set of protein-ligand interactions typical of current PARPi; (4) generated novel potential PARPi that are predicted to bind to PARP-1 independent of a post-translational modification that is known to cause PARPi resistance; and (5) generated novel potential PARPi that are predicted to bind a secondary PARP-1 pocket that is distant from the primary catalytic site
Parallel Markov Chain Monte Carlo
The increasing availability of multi-core and multi-processor architectures provides
new opportunities for improving the performance of many computer simulations.
Markov Chain Monte Carlo (MCMC) simulations are widely used for approximate
counting problems, Bayesian inference and as a means for estimating very high-dimensional
integrals. As such, MCMC has found a wide variety of applications in
fields including computational biology and physics, financial econometrics, machine
learning and image processing.
This thesis presents a number of new methods for reducing the runtime of
Markov Chain Monte Carlo simulations by using SMP machines and/or clusters.
Two of the methods speculatively perform iterations in parallel, reducing the runtime
of MCMC programs whilst producing statistically identical results to conventional
sequential implementations. The other methods apply only to problem domains
that can be presented as an image, and involve using various means of dividing
the image into subimages that can be processed with some degree of independence.
Where possible the thesis includes a theoretical analysis of the reduction in
runtime that may be achieved using our technique under perfect conditions, and
in all cases the methods are tested and compared on a selection of multi-core and
multi-processor architectures. A framework is provided to allow easy construction
of MCMC applications that implement these parallelisation methods
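The idea of speculatively performing iterations in parallel can be illustrated with a small Metropolis sampler. This is a sketch under stated assumptions, not the thesis's implementation: the target is a standard normal, the proposal is a random walk, and the per-iteration randomness is drawn up front so that the sequential and speculative runs can be compared exactly. All function names are hypothetical.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def log_target(x):
    """Unnormalised log density of a standard normal (illustrative target)."""
    return -0.5 * x * x


def make_noise(n, seed=0, scale=1.0):
    """Per-iteration randomness: (random-walk increment, uniform draw)."""
    rng = random.Random(seed)
    return [(rng.gauss(0.0, scale), rng.random()) for _ in range(n)]


def accepts(x, eps, u):
    """Metropolis test for the proposal x + eps using the uniform draw u."""
    log_ratio = log_target(x + eps) - log_target(x)
    return u < math.exp(min(0.0, log_ratio))


def sequential(x0, noise):
    x, chain = x0, []
    for eps, u in noise:
        if accepts(x, eps, u):
            x = x + eps
        chain.append(x)
    return chain


def speculative(x0, noise, width=4):
    """Test `width` future iterations at once, all from the current state.

    Iterations before the first acceptance are rejections, so the state does
    not change and their tests stay valid; later tests are discarded and redone.
    """
    x, chain, t = x0, [], 0
    with ThreadPoolExecutor(max_workers=width) as pool:
        while t < len(noise):
            batch = noise[t:t + width]
            verdicts = list(pool.map(lambda p, x=x: accepts(x, p[0], p[1]), batch))
            if True in verdicts:
                first = verdicts.index(True)
                chain.extend([x] * first)   # rejected iterations keep the state
                x = x + batch[first][0]     # first accepted proposal
                chain.append(x)
                t += first + 1
            else:
                chain.extend([x] * len(batch))
                t += len(batch)
    return chain


if __name__ == "__main__":
    noise = make_noise(1000, seed=1)
    print(sequential(0.0, noise) == speculative(0.0, noise))
```

Python threads are used here only to show the control flow; a real speed-up would need target evaluations that run outside the interpreter lock or in separate processes.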
Monte Carlo Methods in Practice and Efficiency Enhancements via Parallel Computation
Monte Carlo methods are crucial when dealing with advanced problems in Bayesian inference. Indeed, common approaches such as Markov chain Monte Carlo (MCMC) and sequential Monte Carlo (SMC) can be endlessly adapted to tackle the most complex problems. What is important then is to construct efficient algorithms, and significant attention in the literature is devoted to developing algorithms that mix well, have low computational complexity and can scale up to large datasets. One of the most commonly used and straightforward approaches is to speed up Monte Carlo algorithms by running them in parallel computing environments. The compute time of Monte Carlo algorithms is random and can vary depending on the current state of the Markov chain. Other computing-infrastructure related factors, such as competing jobs on the same processor, or memory bandwidth, which are prevalent in shared computing architectures such as cloud computing, can also affect this compute time. However, many algorithms running in parallel require the processors to communicate every so often, and for that we must ensure that they are simultaneously ready and any idle wait time is minimised. This can be done by employing a framework known as Anytime Monte Carlo, which imposes a real-time deadline on parallel computations.
The contributions in this thesis include novel applications of the Anytime framework to construct efficient Anytime MCMC and SMC algorithms which make use of parallel computing in order to perform inference for advanced problems. Examples of such problems investigated include models in which the likelihood cannot be evaluated analytically, and changepoint models, which are often used to model the heterogeneity of sequential data, but tricky to infer upon due to the unknown number and locations of the changepoints. This thesis also focuses on the difficult task of performing parameter inference in single-molecule microscopy, a category of models in which the arrival rate of observations is not uniformly distributed and measurement models have complex forms. These issues are exacerbated when molecules have trajectories described by stochastic differential equations.
The original contributions of this thesis are organised in Chapters 4-6. Chapter 4 shows the development of a novel Anytime parallel tempering algorithm and demonstrates the performance enhancements the Anytime framework brings to parallel tempering, an algorithm that runs multiple interacting MCMC chains in order to more efficiently explore the state space. In Chapter 5, a general Anytime SMC sampler is developed for performing changepoint inference using reversible jump MCMC (RJ-MCMC), an algorithm that takes into account the unknown number of changepoints by including transdimensional MCMC updates. The workings of the algorithm are illustrated on a particularly complex changepoint model, and once again the improvements in performance brought by employing the Anytime framework are demonstrated. Chapter 6 moves away from the Anytime framework, and presents a novel and general SMC approach to performing parameter inference for molecules with stochastic trajectories
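The interaction between chains in parallel tempering is the swap move between neighbouring temperatures. The sketch below shows only that standard swap step, not the Anytime variant developed in the thesis; it assumes tempered targets proportional to exp(-beta * E(x)), and `energy`, `swap_log_ratio`, and `try_swaps` are hypothetical names.

```python
import math
import random


def energy(x):
    """Illustrative energy function (an assumption): E(x) = x^2 / 2."""
    return 0.5 * x * x


def swap_log_ratio(beta_i, beta_j, e_i, e_j):
    """Log acceptance ratio for exchanging the states of chains i and j."""
    return (beta_i - beta_j) * (e_i - e_j)


def try_swaps(states, betas, rng):
    """Attempt one sweep of swaps between adjacent inverse temperatures."""
    states = list(states)
    for k in range(len(states) - 1):
        log_ratio = swap_log_ratio(
            betas[k], betas[k + 1], energy(states[k]), energy(states[k + 1])
        )
        if rng.random() < math.exp(min(0.0, log_ratio)):
            states[k], states[k + 1] = states[k + 1], states[k]
    return states


if __name__ == "__main__":
    rng = random.Random(0)
    print(try_swaps([2.0, 0.0], [1.0, 0.5], rng))
```

A swap that hands the lower-energy state to the colder chain (larger beta) has a non-negative log ratio and is always accepted.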
Computational miRNA Target Prediction in Animals
miRNAs are a class of small RNA molecules about 22 nucleotides long that regulate gene expression at the post-transcriptional level. The discovery of the second miRNA 10 years ago was as much a surprise in its own way as the very structure of DNA discovered a half century earlier[1]. How could these small molecules regulate so many genes? During the past decade the complex cascade of regulation has been investigated and reported in detail[2]. The regions of the genome called untranslated regions, or UTRs, proved true to their name: they were indeed untranslated, but certainly not unimportant: they act as the origin and often the destination of miRNAs.
miRBase[3] contains 1048 human miRNAs with more undoubtedly on the way. But experimental identification of miRNA targets has proven dreadfully slow and difficult. Instead, scientists have turned to computational target prediction programs as the preferred method to quickly identify potential miRNA targets. Current prediction tools have produced a huge number of potential target sites, but determining if they are correct, or which algorithms produce the most reliable predictions, remains an open question.
This project examines one type of algorithm, a probabilistic model called a profile Hidden Markov Model (pHMM), and uses it to predict miRNA target sites. HMMs are known to be very effective in pattern recognition and have been successfully applied to various bioinformatic applications, such as gene finding, multiple sequence alignment and protein family classification[4]. We proposed to build a pHMM from known miRNA interactions and use this model to identify potential miRNA target sites in UTRs by abstracting the Watson-Crick base pairs into meta codes intended to more naturally describe important relationships in RNA folding. High-quality positive training data came from the best curated mRNA:miRNA databases we could find, while negative training data was generated using random sequences. The purpose of this project was to demonstrate the flexibility of the pHMM architecture to process many kinds of interesting data and, by doing so, improve miRNA target site prediction
PynPoint: a modular pipeline architecture for processing and analysis of high-contrast imaging data
The direct detection and characterization of planetary and substellar
companions at small angular separations is a rapidly advancing field. Dedicated
high-contrast imaging instruments deliver unprecedented sensitivity, enabling
detailed insights into the atmospheres of young low-mass companions. In
addition, improvements in data reduction and PSF subtraction algorithms are
equally relevant for maximizing the scientific yield, both from new and
archival data sets. We aim at developing a generic and modular data reduction
pipeline for processing and analysis of high-contrast imaging data obtained
with pupil-stabilized observations. The package should be scalable and robust
for future implementations and in particular well suited for the 3-5 micron
wavelength range where typically (tens of) thousands of frames have to be processed
and an accurate subtraction of the thermal background emission is critical.
PynPoint is written in Python 2.7 and applies various image processing
techniques, as well as statistical tools for analyzing the data, building on
open-source Python packages. The current version of PynPoint has evolved from
an earlier version that was developed as a PSF subtraction tool based on PCA.
The architecture of PynPoint has been redesigned with the core functionalities
decoupled from the pipeline modules. Modules have been implemented for
dedicated processing and analysis steps, including background subtraction,
frame registration, PSF subtraction, photometric and astrometric measurements,
and estimation of detection limits. The pipeline package enables end-to-end
data reduction of pupil-stabilized data and supports classical dithering and
coronagraphic data sets. As an example, we processed archival VLT/NACO L' and
M' data of beta Pic b and reassessed the planet's brightness and position with
an MCMC analysis, and we provide a derivation of the photometric error budget.
Comment: 16 pages, 9 figures, accepted for publication in A&A, PynPoint is
available at https://github.com/PynPoint/PynPoin
Parallel Markov Chain Monte Carlo
The increasing availability of multi-core and multi-processor architectures provides new opportunities for improving the performance of many computer simulations. Markov Chain Monte Carlo (MCMC) simulations are widely used for approximate counting problems, Bayesian inference and as a means for estimating very high-dimensional integrals. As such, MCMC has found a wide variety of applications in fields including computational biology and physics, financial econometrics, machine learning and image processing. This thesis presents a number of new methods for reducing the runtime of Markov Chain Monte Carlo simulations by using SMP machines and/or clusters. Two of the methods speculatively perform iterations in parallel, reducing the runtime of MCMC programs whilst producing statistically identical results to conventional sequential implementations. The other methods apply only to problem domains that can be presented as an image, and involve using various means of dividing the image into subimages that can be processed with some degree of independence. Where possible the thesis includes a theoretical analysis of the reduction in runtime that may be achieved using our technique under perfect conditions, and in all cases the methods are tested and compared on a selection of multi-core and multi-processor architectures. A framework is provided to allow easy construction of MCMC applications that implement these parallelisation methods.
EThOS - Electronic Theses Online Service; University of Warwick. Dept. of Computer Science; GB; United Kingdo
Status and Future Perspectives for Lattice Gauge Theory Calculations to the Exascale and Beyond
In this and a set of companion whitepapers, the USQCD Collaboration lays out
a program of science and computing for lattice gauge theory. These whitepapers
describe how calculations using lattice QCD (and other gauge theories) can aid
the interpretation of ongoing and upcoming experiments in particle and nuclear
physics, as well as inspire new ones.
Comment: 44 pages. 1 of USQCD whitepapers