Paramotopy: Parameter homotopies in parallel
Numerical algebraic geometry provides a number of efficient tools for
approximating the solutions of polynomial systems. One such tool is the
parameter homotopy, which can be an extremely efficient method to solve
numerous polynomial systems that differ only in coefficients, not monomials.
This technique is frequently used for solving a parameterized family of
polynomial systems at multiple parameter values. Parameter homotopies have
recently been useful in several areas of application and have been implemented
in at least two software packages. This article describes Paramotopy, a new,
parallel, optimized implementation of this technique, making use of the Bertini
software package. The novel features of this implementation, not available
elsewhere, include allowing for the simultaneous solution of arbitrary
polynomial systems in a parameterized family on an automatically generated (or
manually provided) mesh in the parameter space of coefficients, front ends and
back ends that are easily specialized to particular classes of problems, and
adaptive techniques for solving polynomial systems near singular points in the
parameter space. This last feature automates and simplifies a task that is
important but often misunderstood by non-experts.Comment: Long version of ICMS extended abstrac
Automatic differentiation in ML: Where we are and where we should be going
We review the current state of automatic differentiation (AD) for array
programming in machine learning (ML), including the different approaches such
as operator overloading (OO) and source transformation (ST) used for AD,
graph-based intermediate representations for programs, and source languages.
Based on these insights, we introduce a new graph-based intermediate
representation (IR) which specifically aims to efficiently support
fully-general AD for array programming. Unlike existing dataflow programming
representations in ML frameworks, our IR naturally supports function calls,
higher-order functions and recursion, making ML models easier to implement. The
ability to represent closures allows us to perform AD using ST without a tape,
making the resulting derivative (adjoint) program amenable to ahead-of-time
optimization using tools from functional language compilers, and enabling
higher-order derivatives. Lastly, we introduce a proof-of-concept compiler
toolchain called Myia, which uses a subset of Python as a front end.
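The operator-overloading (OO) flavor of AD that the survey contrasts with source transformation (ST) can be illustrated with a minimal forward-mode sketch using dual numbers (illustrative only; Myia itself performs source transformation and supports reverse mode):

```python
class Dual:
    """Minimal dual number for forward-mode AD via operator overloading:
    each value carries its derivative, and arithmetic propagates both."""
    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value * other.value,          # product rule
                    self.value * other.deriv + self.deriv * other.value)
    __rmul__ = __mul__

def grad(f):
    """Return x -> df/dx by seeding the input's derivative with 1."""
    return lambda x: f(Dual(x, 1.0)).deriv

# d/dx (x*x + 3*x) = 2x + 3
df = grad(lambda x: x * x + 3 * x)
```

Unlike the tape-based reverse-mode OO used by many ML frameworks, an ST approach such as the one the paper advocates emits the derivative as an ordinary program, open to ahead-of-time compiler optimization.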
A Python Extension for the Massively Parallel Multiphysics Simulation Framework waLBerla
We present a Python extension to the massively parallel HPC simulation
toolkit waLBerla. waLBerla is a framework for stencil based algorithms
operating on block-structured grids, with the main application field being
fluid simulations in complex geometries using the lattice Boltzmann method.
Careful performance engineering results in excellent node performance and good
scalability to over 400,000 cores. To increase the usability and flexibility of
the framework, a Python interface was developed. Python extensions are used at
all stages of the simulation pipeline: They simplify and automate scenario
setup, evaluation, and plotting. We show how our Python interface outperforms
the existing text-file-based configuration mechanism, providing features like
automatic nondimensionalization of physical quantities and handling of complex
parameter dependencies. Furthermore, Python is used to process and evaluate
results while the simulation is running, leading to smaller output files and
the possibility to adjust parameters dependent on the current simulation state.
C++ data structures are exported such that a seamless interfacing to other
numerical Python libraries is possible. The expressive power of Python and the
performance of C++ make it possible to develop efficient code quickly.
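As an illustration of the kind of nondimensionalization such an interface can automate, here is a small Python sketch converting a physical viscosity into lattice units for a BGK lattice Boltzmann model (the function names are hypothetical, not waLBerla's API):

```python
def lattice_viscosity(nu_phys, dx, dt):
    """Convert a physical kinematic viscosity [m^2/s] to lattice units,
    given grid spacing dx [m] and time step dt [s]."""
    return nu_phys * dt / dx**2

def relaxation_time(nu_lattice):
    """BGK relaxation time for a lattice with sound speed c_s^2 = 1/3,
    from the standard relation nu = c_s^2 * (tau - 0.5)."""
    return 3.0 * nu_lattice + 0.5

# Water at room temperature (~1e-6 m^2/s) on a 1 mm grid with a 1e-4 s step:
tau = relaxation_time(lattice_viscosity(1e-6, 1e-3, 1e-4))
```

Automating this conversion (and checking, e.g., that tau stays in a stable range) is exactly the sort of parameter handling the abstract attributes to the Python layer.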
A Randomized Algorithm for Approximating the Log Determinant of a Symmetric Positive Definite Matrix
We introduce a novel algorithm for approximating the logarithm of the
determinant of a symmetric positive definite (SPD) matrix. The algorithm is
randomized and approximates the traces of a small number of matrix powers of a
specially constructed matrix, using the method of Avron and Toledo~\cite{AT11}.
From a theoretical perspective, we present additive and relative error bounds
for our algorithm. Our additive error bound works for any SPD matrix, whereas
our relative error bound works for SPD matrices whose eigenvalues lie in a
suitable bounded interval; the latter setting was proposed
in~\cite{icml2015_hana15}. From an empirical perspective, we demonstrate that a
C++ implementation of our algorithm can approximate the logarithm of the
determinant of large matrices very accurately in a matter of seconds.
Comment: working paper
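The approach the abstract describes, estimating traces of powers of a specially constructed matrix, can be sketched in Python; the probe and truncation counts below are illustrative choices, not the paper's tuned constants:

```python
import numpy as np

def logdet_spd(A, num_probes=30, num_terms=40, rng=None):
    """Randomized estimate of log det(A) for symmetric positive definite A.

    Scales A so its eigenvalues fall in (0, 1), then uses the series
    log det(A) = n*log(alpha) - sum_k tr(C^k)/k with C = I - A/alpha,
    estimating each trace with Hutchinson-style Rademacher probes.
    """
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    alpha = np.linalg.norm(A, 1) * 1.01        # cheap upper bound on lambda_max
    C = np.eye(n) - A / alpha
    estimate = n * np.log(alpha)
    probes = rng.choice([-1.0, 1.0], size=(n, num_probes))  # Rademacher probes
    v = probes.copy()
    for k in range(1, num_terms + 1):
        v = C @ v                              # v now holds C^k @ probes
        estimate -= np.mean(np.sum(probes * v, axis=0)) / k
    return estimate
```

Each iteration costs one matrix-probe product, so the estimator needs only matrix-vector multiplies, which is what makes it attractive for large sparse matrices.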
Assessing Excel VBA Suitability for Monte Carlo Simulation
Monte Carlo (MC) simulation includes a wide range of stochastic techniques
used to quantitatively evaluate the behavior of complex systems or processes.
Microsoft Excel with Visual Basic for Applications (VBA) is, arguably, the
most commonly employed general-purpose tool for MC simulation. Despite the
popularity of Excel in many industries and
educational institutions, it has been repeatedly criticized for its flaws and
often described as questionable, if not completely unsuitable, for statistical
problems. The purpose of this study is to assess the suitability of Excel
(specifically its 2010 and 2013 versions) with VBA programming as a tool for MC
simulation. The results of the study indicate that Microsoft Excel (versions
2010 and 2013) is a strong Monte Carlo simulation application offering a solid
framework of core simulation components including spreadsheets for data input
and output, VBA development environment and summary statistics functions. This
framework should be complemented with an external high-quality pseudo-random
number generator added as a VBA module. A large and diverse category of
incidental Excel simulation components, including statistical distributions,
linear and non-linear regression, and other statistical, engineering, and
business functions, requires due diligence to determine suitability for a
specific MC project.
An Optimization Framework to Improve 4D-Var Data Assimilation System Performance
This paper develops a computational framework for optimizing the parameters
of data assimilation systems in order to improve their performance. The
approach formulates a continuous meta-optimization problem for parameters; the
meta-optimization is constrained by the original data assimilation problem. The
numerical solution process employs adjoint models and iterative solvers. The
proposed framework is applied to optimize observation values, data weighting
coefficients, and the location of sensors for a test problem. The ability to
optimize a distributed measurement network is crucial for reducing operating
costs and detecting malfunctions.
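The nested structure the abstract describes, an outer meta-optimization of assimilation parameters constrained by an inner assimilation problem, can be sketched on a scalar toy problem (a grid search stands in for the adjoint-based iterative solvers used in the paper, and all numbers below are illustrative):

```python
import numpy as np

def analysis(w, x_b, y):
    """Inner assimilation problem: minimize w*(x - x_b)^2 + (x - y)^2
    over x. This scalar toy case has a closed form."""
    return (w * x_b + y) / (w + 1.0)

def meta_objective(w, x_b, y, x_truth):
    """Outer (meta) objective: mean squared analysis error against the
    truth, a stand-in for the verification cost the framework minimizes."""
    return np.mean((analysis(w, x_b, y) - x_truth) ** 2)

rng = np.random.default_rng(0)
x_truth = np.zeros(10_000)
x_b = x_truth + rng.normal(0.0, 2.0, x_truth.shape)  # background, sigma_b = 2
y = x_truth + rng.normal(0.0, 1.0, x_truth.shape)    # observations, sigma_o = 1
weights = np.linspace(0.01, 1.0, 100)
best_w = weights[np.argmin([meta_objective(w, x_b, y, x_truth) for w in weights])]
```

On this toy the minimizer recovers the classical optimal background weight sigma_o^2 / sigma_b^2 = 1/4, illustrating how tuning data-weighting coefficients against a verification cost improves the assimilation system.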
A performance spectrum for parallel computational frameworks that solve PDEs
Important computational physics problems are often large-scale in nature, and
it is highly desirable to have robust and high performing computational
frameworks that can quickly address these problems. However, it is no trivial
task to determine whether a computational framework is performing efficiently
or is scalable. The aim of this paper is to present various strategies for
better understanding the performance of any parallel computational frameworks
for solving PDEs. Important performance issues that negatively impact
time-to-solution are discussed, and we propose a performance spectrum analysis
that can enhance one's understanding of the aforementioned critical performance
issues. As proof of concept, we examine commonly used finite element simulation
packages and apply the performance spectrum to quickly analyze the
performance and scalability across various hardware platforms, software
implementations, and numerical discretizations. It is shown that the proposed
performance spectrum is a versatile performance model that is not only
extendable to more complex PDEs such as hydrostatic ice sheet flow equations,
but also useful for understanding hardware performance in a massively parallel
computing environment. Potential applications and future extensions of this
work are also discussed.
An efficient null space inexact Newton method for hydraulic simulation of water distribution networks
Null space Newton algorithms are efficient in solving the nonlinear equations
arising in hydraulic analysis of water distribution networks. In this article,
we propose and evaluate an inexact Newton method that relies on partial updates
of the network pipes' frictional headloss computations to solve the linear
systems more efficiently and with numerical reliability. The update set
parameters are studied to propose appropriate values. Different null space
basis generation schemes are analysed to choose methods for sparse and
well-conditioned null space bases resulting in a smaller update set. The Newton
steps are computed in the null space by solving sparse, symmetric positive
definite systems with sparse Cholesky factorizations. By using the constant
structure of the null space system matrices, a single symbolic factorization in
the Cholesky decomposition is used multiple times, reducing the computational
cost of linear solves. The algorithms and analyses are validated using medium
to large-scale water network models.
Comment: 15 pages, 9 figures. Preprint extension of Abraham and Stoianov, 2015
(https://dx.doi.org/10.1061/(ASCE)HY.1943-7900.0001089), September 2015.
Includes extended exposition, additional case studies, and new simulations and
analysis.
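The null-space reduction with a Cholesky solve at its core can be sketched on a linear stand-in (an equality-constrained quadratic program rather than a hydraulic network, with dense factorizations in place of the sparse Cholesky and reusable symbolic factorization the paper employs):

```python
import numpy as np
from scipy.linalg import null_space, cho_factor, cho_solve

def null_space_solve(G, c, A, b):
    """Solve min 0.5*x'Gx - c'x subject to A x = b by the null-space
    method: reduce to a small symmetric positive definite system in the
    null space of A and solve it with a Cholesky factorization, the
    same structure as one Newton step of the hydraulic problem."""
    x_p = np.linalg.lstsq(A, b, rcond=None)[0]   # particular solution of A x = b
    Z = null_space(A)                            # basis of ker(A)
    reduced = Z.T @ G @ Z                        # SPD reduced system matrix
    rhs = Z.T @ (c - G @ x_p)
    y = cho_solve(cho_factor(reduced), rhs)
    return x_p + Z @ y

# Tiny example: 4 unknowns, 2 linear constraints, SPD G.
rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4))
G = M @ M.T + 4 * np.eye(4)
c = rng.normal(size=4)
A = rng.normal(size=(2, 4))
b = rng.normal(size=2)
x = null_space_solve(G, c, A, b)
```

In the nonlinear hydraulic setting the reduced matrix changes values but not sparsity pattern between Newton steps, which is what lets a single symbolic Cholesky factorization be reused across solves.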
A Novel Partitioning Method for Accelerating the Block Cimmino Algorithm
We propose a novel block-row partitioning method in order to improve the
convergence rate of the block Cimmino algorithm for solving general sparse
linear systems of equations. The convergence rate of the block Cimmino
algorithm depends on the orthogonality among the block rows obtained by the
partitioning method. The proposed method takes numerical orthogonality among
block rows into account by proposing a row inner-product graph model of the
coefficient matrix. In the graph partitioning formulation defined on this graph
model, the partitioning objective of minimizing the cutsize directly
corresponds to minimizing the sum of inter-block inner products between block
rows thus leading to an improvement in the eigenvalue spectrum of the iteration
matrix. This in turn leads to a significant reduction in the number of
iterations required for convergence. Extensive experiments conducted on a large
set of matrices confirm the validity of the proposed method against a
state-of-the-art method.
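The block Cimmino iteration itself, and why inter-block orthogonality governs its convergence, can be sketched in a few lines of Python (a dense toy version; the partition below is hand-picked, not produced by the proposed row inner-product graph model):

```python
import numpy as np

def block_cimmino(A, b, partitions, omega=0.9, iters=500):
    """Basic block Cimmino iteration: split A row-wise into blocks A_i and
    add up each block's correction A_i^+ (b_i - A_i x) at every step.
    The more mutually orthogonal the blocks, the better the spectrum of
    the iteration matrix and the faster the convergence, which is what
    the proposed partitioning method targets."""
    x = np.zeros(A.shape[1])
    pinvs = [(np.linalg.pinv(A[rows]), rows) for rows in partitions]
    for _ in range(iters):
        x = x + omega * sum(Ai_pinv @ (b[rows] - A[rows] @ x)
                            for Ai_pinv, rows in pinvs)
    return x

A = np.array([[2., 1., 0., 0.],
              [0., 2., 1., 0.],
              [0., 0., 2., 1.],
              [1., 0., 0., 2.]])
b = A @ np.array([1., 2., 3., 4.])
x = block_cimmino(A, b, partitions=[[0, 1], [2, 3]])
```

With fully orthogonal block rows the iteration converges in one step; large inter-block inner products shrink the smallest eigenvalue of the sum of projectors and slow it down, hence the cutsize objective in the paper's graph model.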
Generalized Rybicki Press algorithm
This article discusses a more general and numerically stable Rybicki Press
algorithm, which enables inverting and computing determinants of covariance
matrices whose elements are sums of exponentials. The algorithm is exact in
exact arithmetic and relies on introducing new variables and corresponding
equations, thereby converting the matrix into a banded matrix of larger size.
Linear complexity banded algorithms for solving linear systems and computing
determinants on the larger matrix enable linear complexity algorithms for the
initial semi-separable matrix as well. Benchmarks provided illustrate the
linear scaling of the algorithm.
Comment: 13 pages, 11 figures, 1 table