Bayesian Optimization with Dimension Scheduling: Application to Biological Systems
Bayesian Optimization (BO) is a data-efficient method for global black-box
optimization of an expensive-to-evaluate fitness function. BO typically assumes
that the computational cost of BO itself is cheap, while experiments are time-consuming or
costly. In practice, this allows us to optimize ten or fewer critical
parameters in up to 1,000 experiments. But experiments may be less expensive
than BO methods assume: In some simulation models, we may be able to conduct
multiple thousands of experiments in a few hours, and the computational burden
of BO is no longer negligible compared to experimentation time. To address this
challenge, we introduce a new Dimension Scheduling Algorithm (DSA), which
reduces the computational burden of BO for many experiments. The key idea is
that DSA optimizes the fitness function only along a small set of dimensions at
each iteration. This DSA strategy (1) reduces the necessary computation time,
(2) finds good solutions faster than the traditional BO method, and (3) can be
parallelized straightforwardly. We evaluate the DSA in the context of
optimizing parameters of dynamic models of microalgae metabolism and show
faster convergence than traditional BO.
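The dimension-scheduling idea can be illustrated with a short Python sketch. It is a hypothetical simplification, not the authors' DSA: at each iteration it picks a random subset of `subset_size` coordinates (an assumed schedule) and improves the incumbent only along those coordinates, holding the others fixed. To keep the sketch self-contained, plain random search stands in for the per-subset Bayesian optimization step; the function name and all parameters are illustrative.

```python
import random


def dimension_scheduled_search(f, x0, bounds, subset_size=2, n_iter=50,
                               n_candidates=20, seed=0):
    """Minimize f, moving only a scheduled subset of coordinates per iteration.

    Hypothetical sketch: random search over the active coordinates stands in
    for the Bayesian-optimization step a real DSA would run on that subset.
    """
    rng = random.Random(seed)
    d = len(x0)
    best_x = list(x0)
    best_y = f(best_x)
    for _ in range(n_iter):
        # Schedule: choose which dimensions are free in this iteration.
        active = rng.sample(range(d), min(subset_size, d))
        for _ in range(n_candidates):
            cand = list(best_x)  # inactive coordinates stay at the incumbent
            for k in active:
                lo, hi = bounds[k]
                cand[k] = rng.uniform(lo, hi)
            y = f(cand)
            if y < best_y:
                best_x, best_y = cand, y
    return best_x, best_y
```

Because each subset search touches only a few coordinates, searches over disjoint subsets could in principle run in parallel, which is the property the abstract highlights.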
Clear and Compress: Computing Persistent Homology in Chunks
We present a parallelizable algorithm for computing the persistent homology
of a filtered chain complex. Our approach differs from the commonly used
reduction algorithm by first computing persistence pairs within local chunks,
then simplifying the unpaired columns, and finally applying standard reduction
on the simplified matrix. The approach generalizes a technique by Günther et
al., which uses discrete Morse Theory to compute persistence; we derive the
same worst-case complexity bound in a more general context. The algorithm
employs several practical optimization techniques which are of independent
interest. Our sequential implementation of the algorithm is competitive with
state-of-the-art methods, and we improve the performance through parallelized
computation. Comment: This result was presented at TopoInVis 2013
(http://www.sci.utah.edu/topoinvis13.html).
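For reference, the standard column reduction that the chunk algorithm ultimately applies to the simplified matrix can be sketched in Python over Z/2 coefficients. This is the textbook algorithm, not the chunked Clear-and-Compress method; it assumes the boundary matrix is given as one list of row indices per column, with columns in filtration order. The function name `reduce_boundary` is hypothetical.

```python
def reduce_boundary(boundary):
    """Standard persistence reduction of a Z/2 boundary matrix.

    boundary[j] lists the row indices (faces) of column j; columns are in
    filtration order. Returns (pairs, essential): pairs are (birth, death)
    index pairs, essential lists unpaired simplices (infinite classes).
    """
    reduced = [set(col) for col in boundary]
    pivot_of = {}  # lowest row index -> column that owns that pivot
    pairs = []
    for j, col in enumerate(reduced):
        # Add earlier columns (mod 2, i.e. symmetric difference) until the
        # lowest nonzero entry of column j is not claimed by another column.
        while col and max(col) in pivot_of:
            col ^= reduced[pivot_of[max(col)]]
        if col:
            low = max(col)
            pivot_of[low] = j
            pairs.append((low, j))
    births = {low for low, _ in pairs}
    essential = [j for j, col in enumerate(reduced) if not col and j not in births]
    return pairs, essential
```

For a filled triangle with vertices 0 to 2, edges 3 to 5, and the 2-cell 6, this yields the pairs (1, 3), (2, 4), and (5, 6), with vertex 0 as the single essential class.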
Practical Bayesian Optimization of Machine Learning Algorithms
Machine learning algorithms frequently require careful tuning of model
hyperparameters, regularization terms, and optimization parameters.
Unfortunately, this tuning is often a "black art" that requires expert
experience, unwritten rules of thumb, or sometimes brute-force search. Much
more appealing is the idea of developing automatic approaches which can
optimize the performance of a given learning algorithm to the task at hand. In
this work, we consider the automatic tuning problem within the framework of
Bayesian optimization, in which a learning algorithm's generalization
performance is modeled as a sample from a Gaussian process (GP). The tractable
posterior distribution induced by the GP leads to efficient use of the
information gathered by previous experiments, enabling optimal choices about
what parameters to try next. Here we show how the effects of the Gaussian
process prior and the associated inference procedure can have a large impact on
the success or failure of Bayesian optimization. We show that thoughtful
choices can lead to results that exceed expert-level performance in tuning
machine learning algorithms. We also describe new algorithms that take into
account the variable cost (duration) of learning experiments and that can
leverage the presence of multiple cores for parallel experimentation. We show
that these proposed algorithms improve on previous automatic procedures and can
reach or surpass human expert-level optimization on a diverse set of
contemporary algorithms including latent Dirichlet allocation, structured SVMs
and convolutional neural networks.
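The core loop described here (fit a GP to past results, then choose the next point by an acquisition function) can be sketched in pure Python for a one-dimensional problem. Assumptions: a Matérn 5/2 kernel with fixed unit length scale and signal variance, a small jitter term, expected improvement for minimization, and a fixed candidate grid instead of a proper acquisition optimizer. The paper additionally treats GP hyperparameters, experiment costs, and parallel proposals, none of which appear here; all names are illustrative.

```python
import math


def matern52(a, b, length=1.0, var=1.0):
    """Matern 5/2 covariance between scalars a and b."""
    s = math.sqrt(5.0) * abs(a - b) / length
    return var * (1.0 + s + s * s / 3.0) * math.exp(-s)


def cholesky(K):
    """Lower-triangular L with L L^T = K (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward(L, b):
    """Solve L x = b by forward substitution."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def backward(L, b):
    """Solve L^T x = b by back substitution."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_fit(xs, ys, noise=1e-6):
    """Factor K + noise*I once; return (L, alpha) with alpha = (K + noise*I)^-1 y."""
    K = [[matern52(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    return L, backward(L, forward(L, ys))


def gp_predict(xs, L, alpha, x):
    """Posterior mean and variance of a zero-mean GP at x."""
    k_star = [matern52(a, x) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward(L, k_star)
    var = max(matern52(x, x) - sum(vi * vi for vi in v), 1e-12)
    return mean, var


def expected_improvement(mean, var, best):
    """EI for minimization: E[max(best - f(x), 0)] under N(mean, var)."""
    sd = math.sqrt(var)
    z = (best - mean) / sd
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + sd * pdf


def bayes_opt(f, lo, hi, n_init=3, n_iter=10, grid=200):
    """Minimize f on [lo, hi]; EI is maximized over a fixed candidate grid."""
    xs = [lo + (hi - lo) * i / (n_init - 1) for i in range(n_init)]
    ys = [f(x) for x in xs]
    cands = [lo + (hi - lo) * i / (grid - 1) for i in range(grid)]
    for _ in range(n_iter):
        L, alpha = gp_fit(xs, ys)
        best = min(ys)
        x_next = max(cands, key=lambda c: expected_improvement(
            *gp_predict(xs, L, alpha, c), best))
        if x_next in xs:  # EI peaked on an already-evaluated point; stop early
            break
        xs.append(x_next)
        ys.append(f(x_next))
    i_best = min(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]
```

Factoring the kernel matrix once per iteration and reusing it for every candidate keeps the sketch cheap; the abstract's point is that such modeling choices (kernel, inference) strongly affect how well the loop performs.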
Counting Triangles in Large Graphs on GPU
The clustering coefficient and the transitivity ratio are concepts often used
in network analysis, which creates a need for fast practical algorithms for
counting triangles in large graphs. Previous research in this area focused on
sequential algorithms, MapReduce parallelization, and fast approximations.
In this paper, we propose a parallel triangle-counting algorithm for CUDA GPUs.
We describe the implementation details necessary to achieve high performance
and present the experimental evaluation of our approach. Our algorithm achieves
an 8- to 15-fold speedup over the CPU implementation and can find 3.8
billion triangles in an 89-million-edge graph in less than 10 seconds on the
Nvidia Tesla C2050 GPU. Comment: 2016 IEEE International Parallel and Distributed Processing Symposium
Workshops (IPDPSW).
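A sequential Python sketch of one common exact approach, edge-iterator counting on a degree-ordered orientation, shows why the problem parallelizes well: each oriented edge needs only a merge-style intersection of two sorted neighbor lists. This is an assumed illustration, not the paper's CUDA kernel; the function names are hypothetical, and vertex ids are assumed to be mutually comparable (e.g. integers).

```python
from collections import defaultdict


def _count_common(a, b, rank):
    """Count common vertices of two lists sorted by rank (merge intersection)."""
    i = j = common = 0
    while i < len(a) and j < len(b):
        ra, rb = rank[a[i]], rank[b[j]]
        if ra == rb:
            common += 1
            i += 1
            j += 1
        elif ra < rb:
            i += 1
        else:
            j += 1
    return common


def count_triangles(edges):
    """Count triangles in an undirected simple graph given as an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops; the sets drop duplicate edges
            adj[u].add(v)
            adj[v].add(u)
    # Rank vertices by (degree, id) and orient each edge toward higher rank,
    # so every triangle is found exactly once, at its lowest-ranked edge.
    rank = {v: (len(nbrs), v) for v, nbrs in adj.items()}
    out = {v: sorted((w for w in nbrs if rank[w] > rank[v]), key=rank.__getitem__)
           for v, nbrs in adj.items()}
    total = 0
    # Each oriented edge (u, v) is an independent unit of work; this is the
    # loop a GPU version could distribute across threads.
    for u, succ in out.items():
        for v in succ:
            total += _count_common(out[u], out[v], rank)
    return total
```

Orienting edges toward higher-degree vertices keeps the out-lists short for high-degree hubs, which bounds the work per intersection.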
Parallel String Sample Sort
We discuss how string sorting algorithms can be parallelized on modern
multi-core shared memory machines. As a synthesis of the best sequential string
sorting algorithms and successful parallel sorting algorithms for atomic
objects, we propose string sample sort. The algorithm makes effective use of
the memory hierarchy, uses additional word-level parallelism, and largely
avoids branch mispredictions. Additionally, we parallelize variants of multikey
quicksort and radix sort that are also useful in certain situations. Comment: 34 pages, 7 figures and 12 tables
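A simplified, sequential Python sketch of the sample-sort structure: draw a random sample, choose splitters, classify each string into a "between splitters" or "equal to a splitter" bucket, and sort the buckets independently. Unlike the algorithm described above, it compares whole strings rather than machine words at the current depth and ignores caching and branch avoidance; the parameters `num_splitters`, `oversampling`, and `base_case` are illustrative choices.

```python
import bisect
import random


def string_sample_sort(strings, num_splitters=7, oversampling=3, base_case=32, seed=0):
    """Sort strings by splitter-based bucketing (simplified sample sort sketch)."""
    if len(strings) <= base_case:
        return sorted(strings)
    rng = random.Random(seed)
    # Oversample, sort the sample, and keep every oversampling-th element.
    sample = sorted(rng.sample(strings, min(len(strings), num_splitters * oversampling)))
    splitters = sample[oversampling - 1::oversampling][:num_splitters]
    # Bucket 2i holds strings strictly between splitters i-1 and i;
    # bucket 2i+1 holds strings equal to splitter i.
    buckets = [[] for _ in range(2 * len(splitters) + 1)]
    for s in strings:
        i = bisect.bisect_left(splitters, s)
        if i < len(splitters) and splitters[i] == s:
            buckets[2 * i + 1].append(s)
        else:
            buckets[2 * i].append(s)
    result = []
    for b, bucket in enumerate(buckets):
        if b % 2 == 1:
            result.extend(bucket)  # equal bucket: already sorted
        else:
            # Independent subproblem; every splitter string is excluded,
            # so the bucket is strictly smaller and recursion terminates.
            result.extend(string_sample_sort(bucket, num_splitters,
                                             oversampling, base_case, seed + 1))
    return result
```

Equal buckets need no further work, and the remaining buckets are independent subproblems, which is what makes the scheme easy to parallelize.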
Engineering Parallel String Sorting
We discuss how string sorting algorithms can be parallelized on modern
multi-core shared memory machines. As a synthesis of the best sequential string
sorting algorithms and successful parallel sorting algorithms for atomic
objects, we first propose string sample sort. The algorithm makes effective use
of the memory hierarchy, uses additional word-level parallelism, and largely
avoids branch mispredictions. Then we focus on NUMA architectures, and develop
parallel multiway LCP-merge and -mergesort to reduce the number of random
memory accesses to remote nodes. Additionally, we parallelize variants of
multikey quicksort and radix sort that are also useful in certain situations.
Comprehensive experiments on five current multi-core platforms are then
reported and discussed. The experiments show that our implementations scale
very well on real-world inputs and modern machines. Comment: 46 pages, extension of "Parallel String Sample Sort" arXiv:1305.115
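The LCP-merge idea can be illustrated with a two-way Python sketch (the paper's variant is multiway). Each sorted input carries an LCP array, where entry i is the length of the longest common prefix of strings i-1 and i (entry 0 is 0). While merging, the sketch tracks each head's LCP with the last output string: when those values differ, the head with the longer LCP is smaller and no characters are read; when they are equal, character comparison resumes at that offset. The names `lcp_merge`, `lcp_array`, and `lcp_from` are hypothetical.

```python
def lcp_from(a, b, start):
    """Length of the common prefix of a and b, given the first `start` chars match."""
    k, n = start, min(len(a), len(b))
    while k < n and a[k] == b[k]:
        k += 1
    return k


def lcp_array(strings):
    """LCP array of a sorted list: entry i = LCP(strings[i-1], strings[i])."""
    return [0 if i == 0 else lcp_from(strings[i - 1], s, 0) for i, s in enumerate(strings)]


def lcp_merge(A, lcpA, B, lcpB):
    """Merge two sorted string lists with LCP arrays; return (merged, merged_lcp)."""
    out, out_lcp = [], []
    i = j = 0
    ha = hb = 0  # LCP of the current heads A[i], B[j] with the last output string
    while i < len(A) and j < len(B):
        if ha == hb:
            k = lcp_from(A[i], B[j], ha)  # both share ha chars with each other
            # Slices give "" past the end, so a proper prefix compares smaller.
            take_a = A[i][k:k + 1] <= B[j][k:k + 1]
        else:
            k = min(ha, hb)  # LCP between the two heads
            take_a = ha > hb  # longer LCP with last output means smaller
        if take_a:
            out.append(A[i]); out_lcp.append(ha)
            i += 1
            ha = lcpA[i] if i < len(A) else 0
            hb = k
        else:
            out.append(B[j]); out_lcp.append(hb)
            j += 1
            hb = lcpB[j] if j < len(B) else 0
            ha = k
    while i < len(A):  # drain the rest of A
        out.append(A[i]); out_lcp.append(ha)
        i += 1
        ha = lcpA[i] if i < len(A) else 0
    while j < len(B):  # drain the rest of B
        out.append(B[j]); out_lcp.append(hb)
        j += 1
        hb = lcpB[j] if j < len(B) else 0
    return out, out_lcp
```

The merged LCP array falls out of the merge for free, so the output can feed the next merge level without rescanning any strings.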