A novel approach to integration by parts reduction
Integration by parts reduction is a standard component of most modern
multi-loop calculations in quantum field theory. We present a novel strategy
constructed to overcome the limitations of currently available reduction
programs based on Laporta's algorithm. The key idea is to construct algebraic
identities from numerical samples obtained from reductions over finite fields.
We expect the method to be highly amenable to parallelization, show a low
memory footprint during the reduction step, and allow for significantly better
run-times.
Comment: 4 pages. Version 2 is the final, published version of this article.
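To illustrate the general idea of recovering an exact algebraic object from numerical samples over a finite field, the following is a minimal sketch of Lagrange interpolation modulo a prime. It is not the method of the paper: the prime `P`, the function name `interpolate_mod_p`, and the sample polynomial are all assumptions chosen for illustration.

```python
# Illustrative sketch only: recover polynomial coefficients from samples
# taken modulo a prime. The prime and all names here are hypothetical.
P = 2_147_483_647  # the Mersenne prime 2**31 - 1


def interpolate_mod_p(xs, ys, p=P):
    """Return coefficients (lowest degree first) of the unique polynomial
    of degree < len(xs) passing through the points (xs[i], ys[i]) mod p."""
    n = len(xs)
    coeffs = [0] * n
    for i in range(n):
        basis = [1]   # running product of (x - xs[j]) for j != i
        denom = 1     # running product of (xs[i] - xs[j]) for j != i
        for j in range(n):
            if j == i:
                continue
            new = [0] * (len(basis) + 1)
            for k, c in enumerate(basis):
                new[k] = (new[k] - c * xs[j]) % p
                new[k + 1] = (new[k + 1] + c) % p
            basis = new
            denom = denom * (xs[i] - xs[j]) % p
        # pow(denom, -1, p) is the modular inverse (Python 3.8+)
        scale = ys[i] * pow(denom, -1, p) % p
        for k, c in enumerate(basis):
            coeffs[k] = (coeffs[k] + c * scale) % p
    return coeffs


# Sample f(x) = 3x^2 + 5x + 7 at three points and recover its coefficients.
samples_x = [1, 2, 3]
samples_y = [(3 * x * x + 5 * x + 7) % P for x in samples_x]
recovered = interpolate_mod_p(samples_x, samples_y)
```

Each sample point can be evaluated independently of the others, which is the property that makes sampling-based strategies of this kind attractive for parallel execution.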
Novel Parallelization Techniques for Computer Graphics Applications
Increasingly complex and data-intensive algorithms in computer graphics applications require software engineers to improve performance and scalability to meet the requirements of customers and users. Parallelizing and tailoring each algorithm for each specific application is time-consuming, and the resulting implementation is domain-specific because it cannot be reused outside the problem for which the algorithm is defined. Identifying reusable parallelization patterns that can be applied to other algorithms is therefore essential for providing consistent parallelization improvements and reducing the development time needed to turn a sequential algorithm into a parallel one.
This thesis focuses on defining general and efficient parallelization techniques and approaches that can be followed in order to parallelize complex 3D graphic algorithms. These parallelization patterns can be easily applied in order to convert most kinds of sequential complex and data-intensive algorithms to parallel ones obtaining consistent optimization results.
The main idea in the thesis is to use multi-threading techniques to improve the parallelization and core utilization of 3D algorithms. Most of the 3D algorithms apply similar repetitive independent operations on a vast amount of 3D data. These application characteristics bring the opportunity of applying multi-thread parallelization techniques on such applications. The efficiency of the proposed idea is tested on two common computer graphics algorithms: hidden-line removal and collision detection. Both algorithms are data-intensive algorithms, whose conversions from a sequential to a multithread implementation introduce challenges, due to their complexities and the fact that elements in their data have different sizes and complexities, producing work-load imbalances and asymmetries between processing elements.
The results show that the proposed principles and patterns can be easily applied to both algorithms, transforming their sequential implementations into multithreaded ones and obtaining consistent optimization results proportional to the number of processing elements. From the work done in this thesis, it is concluded that the suggested parallelization warrants further study and development in order to extend its usage to heterogeneous platforms such as a Graphics Processing Unit (GPU). OpenCL is the most feasible framework to explore in the future due to its interoperability among different platforms.
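The work-load imbalance mentioned in the abstract can be illustrated with a small sketch. This is not the thesis's technique: it is one common pattern (one task per data element, largest elements submitted first, idle workers pulling the next task), and the names `process_item` and `process_all` are hypothetical. Note also that in CPython, threads do not speed up pure-Python CPU-bound work because of the global interpreter lock, so the sketch shows the scheduling pattern rather than a performance result.

```python
# Illustrative sketch only: dynamic scheduling of unevenly sized work items.
from concurrent.futures import ThreadPoolExecutor


def process_item(item):
    # Stand-in for per-element work whose cost grows with element size.
    return sum(v * v for v in item)


def process_all(items, workers=4):
    # Submit the largest items first so that a big item is less likely
    # to be the last one running while other workers sit idle.
    order = sorted(range(len(items)), key=lambda i: len(items[i]), reverse=True)
    results = [None] * len(items)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One task per item: a worker that finishes early takes the next task.
        futures = {i: pool.submit(process_item, items[i]) for i in order}
        for i, future in futures.items():
            results[i] = future.result()
    return results  # results are in the original item order


demo_items = [[1, 2, 3], [4], [], [1] * 10]
demo_results = process_all(demo_items)
```

Submitting one task per element trades some scheduling overhead for better balance; with very many tiny elements, grouping them into chunks would be a reasonable variation.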
Parallelization of Modular Algorithms
In this paper we investigate the parallelization of two modular algorithms.
Specifically, we consider the modular computation of Gröbner bases (resp. standard
bases) and the modular computation of the associated primes of a
zero-dimensional ideal and describe their parallel implementation in SINGULAR.
Our modular algorithms to solve problems over Q mainly consist of three parts:
solving the problem modulo p for several primes p, lifting the result to Q by
applying Chinese remaindering and rational reconstruction, and a
verification step. Arnold proved using the Hilbert function that the verification
step in the modular algorithm to compute Gröbner bases can be simplified for
homogeneous ideals (cf. \cite{A03}). The idea of the proof could easily be
adapted to the local case, i.e. for local orderings and not necessarily
homogeneous ideals, using the Hilbert-Samuel function (cf. \cite{Pf07}). In
this paper we prove the corresponding theorem for non-homogeneous ideals in
the case of a global ordering.
Comment: 16 pages
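The lifting step described in the abstract (combine results modulo several primes, then recover a rational number) can be sketched for a single coefficient as follows. This is a generic textbook version, not the SINGULAR implementation; the function names and the chosen primes are assumptions, and the verification step is omitted.

```python
# Illustrative sketch only: Chinese remaindering followed by rational
# reconstruction for one coefficient. All names here are hypothetical.
from math import gcd, isqrt


def crt_pair(r1, m1, r2, m2):
    """Combine x = r1 (mod m1) and x = r2 (mod m2) for coprime m1, m2."""
    t = (r2 - r1) * pow(m1, -1, m2) % m2
    return (r1 + m1 * t) % (m1 * m2), m1 * m2


def rational_reconstruct(a, m):
    """Return (n, d) with n = a*d (mod m), d > 0 and |n|, d <= sqrt(m/2),
    or None if no such fraction in lowest terms exists."""
    bound = isqrt(m // 2)
    r0, r1 = m, a % m
    t0, t1 = 0, 1
    # Extended Euclid; the invariant r_i = t_i * a (mod m) holds throughout.
    while r1 > bound:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        t0, t1 = t1, t0 - q * t1
    if t1 == 0 or abs(t1) > bound:
        return None
    if t1 < 0:
        r1, t1 = -r1, -t1
    if gcd(r1, t1) != 1:
        return None
    return r1, t1


def lift_to_rational(residues, primes):
    """Chinese-remainder all residues together, then reconstruct a fraction."""
    value, modulus = residues[0], primes[0]
    for r, p in zip(residues[1:], primes[1:]):
        value, modulus = crt_pair(value, modulus, r, p)
    return rational_reconstruct(value, modulus)


# The images of -2/3 modulo three small primes are enough to recover it.
primes = [101, 103, 107]
residues = [(-2 * pow(3, -1, p)) % p for p in primes]
lifted = lift_to_rational(residues, primes)
```

The computations modulo each prime are independent of one another, which is what makes this class of algorithm a natural candidate for parallelization; only the lifting and verification need the combined results.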
Towards parallelizable sampling-based Nonlinear Model Predictive Control
This paper proposes a new sampling-based nonlinear model predictive control
(MPC) algorithm, with a bound on complexity quadratic in the prediction horizon
N and linear in the number of samples. The idea of the proposed algorithm is to
use the sequence of predicted inputs from the previous time step as a warm
start, and to iteratively update this sequence by changing its elements one by
one, starting from the last predicted input and ending with the first predicted
input. This strategy, which resembles the dynamic programming principle, allows
for parallelization up to a certain level and yields a suboptimal nonlinear MPC
algorithm with guaranteed recursive feasibility, stability and improved cost
function at every iteration, which is suitable for real-time implementation.
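The backward, one-element-at-a-time update described above can be sketched as follows. This is a simplified illustration, not the paper's algorithm: it omits the feasibility and stability machinery, and the dynamics, cost, uniform sampling scheme, and all names (`rollout_cost`, `sweep_update`) are assumptions. Because a candidate is accepted only if it lowers the total cost, the returned cost never exceeds that of the warm start.

```python
# Illustrative sketch only: sampling-based update of an input sequence,
# sweeping from the last predicted input back to the first.
import random


def rollout_cost(x0, inputs, step, stage_cost):
    """Simulate the model over the horizon and accumulate the cost."""
    x, total = x0, 0.0
    for u in inputs:
        total += stage_cost(x, u)
        x = step(x, u)
    return total + stage_cost(x, 0.0)  # terminal cost on the final state


def sweep_update(x0, warm_start, step, stage_cost,
                 n_samples=20, u_max=1.0, rng=None):
    rng = rng or random.Random(0)
    inputs = list(warm_start)
    best = rollout_cost(x0, inputs, step, stage_cost)
    for k in reversed(range(len(inputs))):      # last input first
        for _ in range(n_samples):
            candidate = rng.uniform(-u_max, u_max)  # respects |u| <= u_max
            trial = inputs[:k] + [candidate] + inputs[k + 1:]
            cost = rollout_cost(x0, trial, step, stage_cost)
            if cost < best:                     # keep only improvements
                inputs, best = trial, cost
    return inputs, best


# Hypothetical scalar example: x+ = x + u with quadratic stage cost.
step = lambda x, u: x + u
stage_cost = lambda x, u: x * x + 0.1 * u * u
warm = [0.0] * 5
initial_cost = rollout_cost(2.0, warm, step, stage_cost)
new_inputs, new_cost = sweep_update(2.0, warm, step, stage_cost)
```

The sketch performs one rollout of length N for each of N positions and each sample, which mirrors the stated bound that is quadratic in the horizon and linear in the number of samples. The samples at a given position do not depend on each other, so they could be evaluated on separate threads.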
The complexity of the algorithm per time step in the prediction horizon
depends only on the horizon, the number of samples and parallel threads, and it
is independent of the measured system state. Comparisons with the fmincon
nonlinear optimization solver on benchmark examples indicate that as the
simulation time progresses, the proposed algorithm converges rapidly to the
"optimal" solution, even when using a small number of samples.
Comment: 9 pages, 9 pictures, submitted to IFAC World Congress 201
Improved parallelization techniques for the density matrix renormalization group
A distributed-memory parallelization strategy for the density matrix
renormalization group is proposed for cases where correlation functions are
required. This new strategy offers substantial improvements over
previous work. A scalability analysis shows an overall serial fraction of 9.4%
and an efficiency of around 60% considering up to eight nodes. Sources of
possible parallel slowdown are pointed out and solutions to circumvent these
issues are brought forward in order to achieve a better performance.
Comment: 8 pages, 4 figures; version published in Computer Physics
Communications
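As a quick consistency check on the figures quoted in the abstract, the sketch below applies Amdahl's law. That the paper's efficiency figure is defined this way is an assumption; the abstract does not say how it was computed, and the function names are hypothetical.

```python
# Illustrative sketch only: Amdahl's law relating serial fraction to
# speedup and efficiency. Assumes the abstract's numbers follow this model.
def amdahl_speedup(serial_fraction, nodes):
    """Ideal speedup on `nodes` workers when `serial_fraction` cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / nodes)


def parallel_efficiency(serial_fraction, nodes):
    """Speedup divided by the number of nodes."""
    return amdahl_speedup(serial_fraction, nodes) / nodes


speedup_8 = amdahl_speedup(0.094, 8)          # about 4.83
efficiency_8 = parallel_efficiency(0.094, 8)  # about 0.60
```

Under this model, a serial fraction of 9.4% on eight nodes gives an efficiency of roughly 60%, which agrees with the value reported in the abstract.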