Algorithmic patterns for H-matrices on many-core processors
In this work, we consider the reformulation of hierarchical (H-)
matrix algorithms for many-core processors with a model implementation on
graphics processing units (GPUs). H-matrices approximate specific
dense matrices, e.g., from discretized integral equations or kernel ridge
regression, leading to log-linear time complexity in dense matrix-vector
products. The parallelization of H-matrix operations on many-core
processors is difficult due to the complex nature of the underlying algorithms.
While previous algorithmic advances for many-core hardware focused on
accelerating existing H-matrix CPU implementations by many-core
processors, we here aim at relying entirely on that processor type. As our main
contribution, we introduce the necessary parallel algorithmic patterns that
allow mapping the full H-matrix construction and the fast matrix-vector
product to many-core hardware. Here, crucial ingredients are space-filling
curves, parallel tree traversal and batching of linear algebra operations. The
resulting model GPU implementation hmglib is, to the best of the authors'
knowledge, the first entirely GPU-based Open Source H-matrix library of
this kind. We conclude this work with an in-depth performance analysis and a
comparative performance study against a standard H-matrix library,
highlighting profound speedups of our many-core parallel approach.
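To make the role of space-filling curves concrete, the following is a minimal sketch of sorting 2D points along a Z-order (Morton) curve, which places spatially close points near each other in a linear array. The abstract does not say which curve hmglib uses, so the choice of the Morton curve, the function names `interleave_bits` and `morton_order`, and the 16-bit resolution are all assumptions made for illustration.

```python
def interleave_bits(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into one Morton code.

    Bit i of x goes to position 2*i, bit i of y to position 2*i + 1.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def morton_order(points, bits=16):
    """Return the indices of `points` sorted along the Z-order curve.

    Assumes every point is a pair (x, y) with coordinates in [0, 1].
    """
    scale = (1 << bits) - 1  # largest integer coordinate on the grid

    def key(index):
        px, py = points[index]
        return interleave_bits(int(px * scale), int(py * scale), bits)

    return sorted(range(len(points)), key=key)


if __name__ == "__main__":
    pts = [(0.9, 0.9), (0.1, 0.1), (0.9, 0.1), (0.1, 0.9)]
    print(morton_order(pts))
```

A sort of this kind is attractive on many-core hardware because each code is computed independently per point and the subsequent sort is a standard parallel primitive; contiguous index ranges of the sorted array can then stand in for spatial clusters.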
Enhanced LFR-toolbox for MATLAB and LFT-based gain scheduling
We describe recent developments and enhancements of the LFR-Toolbox for MATLAB for building LFT-based uncertainty models and for LFT-based gain scheduling. A major development is the new LFT-object definition supporting a large class of uncertainty descriptions: continuous- and discrete-time uncertain models, regular and singular parametric expressions, and more general uncertainty blocks (nonlinear, time-varying, etc.). By associating names with uncertainty blocks, the reusability of generated LFT-models and the user-friendliness of manipulating LFR-descriptions have been greatly increased. Significant enhancements of the computational efficiency and of numerical accuracy have been achieved by employing efficient and numerically robust Fortran implementations of order reduction tools via mex-function interfaces. The new enhancements, in conjunction with improved symbolic preprocessing, generally lead to faster generation of LFT-models with significantly lower orders. Scheduled gains can be viewed as LFT-objects. Two techniques for designing such gains are presented. Analysis tools are also considered.
The Role of Representations in Executive Function: Investigating a Developmental Link between Flexibility and Abstraction.
Young children often perseverate, engaging in previously correct, but no longer appropriate behaviors. One account posits that such perseveration results from the use of stimulus-specific representations of a situation, which are distinct from abstract, generalizable representations that support flexible behavior. Previous findings supported this account, demonstrating that only children who flexibly switch between rules could generalize their behavior to novel stimuli. However, this link between flexibility and generalization might reflect general cognitive abilities, or depend upon similarities across the measures or their temporal order. The current work examined these issues by testing the specificity and generality of this link. In two experiments with 3-year-old children, flexibility was measured in terms of switching between rules in a card-sorting task, while abstraction was measured in terms of selecting which stimulus did not belong in an odd-one-out task. The link between flexibility and abstraction was general across (1) abstraction dimensions similar to or different from those in the card-sorting task and (2) abstraction tasks that preceded or followed the switching task. Good performance on abstraction and flexibility measures did not extend to all cognitive tasks, including an IQ measure, and dissociated from children's ability to gaze at the correct stimulus in the odd-one-out task, suggesting that the link between flexibility and abstraction is specific to such measures, rather than reflecting general abilities that affect all tasks. We interpret these results in terms of the role that developing prefrontal cortical regions play in processes such as working memory, which can support both flexibility and abstraction.
Task-based adaptive multiresolution for time-space multi-scale reaction-diffusion systems on multi-core architectures
A new solver featuring time-space adaptation and error control has been
recently introduced to tackle the numerical solution of stiff
reaction-diffusion systems. Based on operator splitting, finite volume adaptive
multiresolution and high order time integrators with specific stability
properties for each operator, this strategy yields high computational
efficiency for large multidimensional computations on standard architectures
such as powerful workstations. However, the data structure of the original
implementation, based on trees of pointers, provides limited opportunities for
efficiency enhancements, while posing serious challenges in terms of parallel
programming and load balancing. The present contribution proposes a new
implementation of the whole set of numerical methods including Radau5 and
ROCK4, relying on a fully different data structure together with the use of a
specific library, TBB, for shared-memory, task-based parallelism with
work-stealing. The performance of our implementation is assessed in a series of
test-cases of increasing difficulty in two and three dimensions on multi-core
and many-core architectures, demonstrating high scalability.
A scalable H-matrix approach for the solution of boundary integral equations on multi-GPU clusters
In this work, we consider the solution of boundary integral equations by
means of a scalable hierarchical matrix approach on clusters equipped with
graphics hardware, i.e. graphics processing units (GPUs). To this end, we
extend our existing single-GPU hierarchical matrix library hmglib such that it
is able to scale on many GPUs and such that it can be coupled to arbitrary
application codes. Using a model GPU implementation of a boundary element
method (BEM) solver, we are able to achieve more than 67 percent relative
parallel speed-up going from 128 to 1024 GPUs for a model geometry test case
with 1.5 million unknowns and a real-world geometry test case with almost 1.2
million unknowns. On 1024 GPUs of the cluster Titan, it takes less than 6
minutes to solve the 1.5 million unknowns problem, with 5.7 minutes for the
setup phase and 20 seconds for the iterative solver. To the best of the
authors' knowledge, we here discuss the first fully GPU-based
distributed-memory parallel hierarchical matrix Open Source library using the
traditional H-matrix format and adaptive cross approximation with an
application to BEM problems.
A permanent formula for the Jones polynomial
The permanent of a square matrix is defined in a way similar to the
determinant, but without using signs. The exact computation of the permanent is
hard, but there are Monte-Carlo algorithms that can estimate general
permanents. Given a planar diagram of a link L with n crossings, we define a
7n by 7n matrix whose permanent equals the Jones polynomial of L. This
result, together with recent work of Freedman, Kitaev, Larson and Wang,
provides a Monte-Carlo algorithm for any decision problem belonging to the class
BQP, i.e. such that it can be computed with bounded error in polynomial time
using quantum resources. Comment: To appear in Advances in Applied Mathematics
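The definition of the permanent as "the determinant without signs" can be illustrated with a minimal sketch that evaluates both quantities directly from the sum over permutations. This is a brute-force illustration of the definition only: it takes factorial time, it is not the Monte-Carlo estimator mentioned in the abstract, and the function names `permanent` and `determinant` are chosen here for illustration.

```python
from itertools import permutations
from math import prod


def permanent(matrix):
    """Sum over all permutations of the product of selected entries."""
    n = len(matrix)
    return sum(
        prod(matrix[i][sigma[i]] for i in range(n))
        for sigma in permutations(range(n))
    )


def determinant(matrix):
    """Same sum as the permanent, but each term carries the permutation's sign."""
    n = len(matrix)
    total = 0
    for sigma in permutations(range(n)):
        # The sign is +1 for an even number of inversions, -1 for an odd number.
        inversions = sum(
            1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j]
        )
        sign = -1 if inversions % 2 else 1
        total += sign * prod(matrix[i][sigma[i]] for i in range(n))
    return total


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    print(permanent(a), determinant(a))
```

For the 2 by 2 matrix above, the permanent is 1*4 + 2*3 and the determinant is 1*4 - 2*3, so the two differ only in the sign attached to the odd permutation.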