11,084 research outputs found

    Algorithmic patterns for H\mathcal{H}-matrices on many-core processors

    Get PDF
    In this work, we consider the reformulation of hierarchical (H\mathcal{H}) matrix algorithms for many-core processors with a model implementation on graphics processing units (GPUs). H\mathcal{H} matrices approximate specific dense matrices, e.g., from discretized integral equations or kernel ridge regression, leading to log-linear time complexity in dense matrix-vector products. The parallelization of H\mathcal{H} matrix operations on many-core processors is difficult due to the complex nature of the underlying algorithms. While previous algorithmic advances for many-core hardware focused on accelerating existing H\mathcal{H} matrix CPU implementations by many-core processors, we here aim at totally relying on that processor type. As main contribution, we introduce the necessary parallel algorithmic patterns allowing to map the full H\mathcal{H} matrix construction and the fast matrix-vector product to many-core hardware. Here, crucial ingredients are space filling curves, parallel tree traversal and batching of linear algebra operations. The resulting model GPU implementation hmglib is the, to the best of the authors knowledge, first entirely GPU-based Open Source H\mathcal{H} matrix library of this kind. We conclude this work by an in-depth performance analysis and a comparative performance study against a standard H\mathcal{H} matrix library, highlighting profound speedups of our many-core parallel approach

    Enhanced LFR-toolbox for MATLAB and LFT-based gain scheduling

    Get PDF
    We describe recent developments and enhancements of the LFR-Toolbox for MATLAB for building LFT-based uncertainty models and for LFT-based gain scheduling. A major development is the new LFT-object definition supporting a large class of uncertainty descriptions: continuous- and discrete-time uncertain models, regular and singular parametric expressions, more general uncertainty blocks (nonlinear, time-varying, etc.). By associating names to uncertainty blocks the reusability of generated LFT-models and the user friendliness of manipulation of LFR-descriptions have been highly increased. Significant enhancements of the computational efficiency and of numerical accuracy have been achieved by employing efficient and numerically robust Fortran implementations of order reduction tools via mex-function interfaces. The new enhancements in conjunction with improved symbolical preprocessing lead generally to a faster generation of LFT-models with significantly lower orders. Scheduled gains can be viewed as LFT-objects. Two techniques for designing such gains are presented. Analysis tools are also considered

    The Role of Representations in Executive Function: Investigating a Developmental Link between Flexibility and Abstraction.

    Get PDF
    Young children often perseverate, engaging in previously correct, but no longer appropriate behaviors. One account posits that such perseveration results from the use of stimulus-specific representations of a situation, which are distinct from abstract, generalizable representations that support flexible behavior. Previous findings supported this account, demonstrating that only children who flexibly switch between rules could generalize their behavior to novel stimuli. However, this link between flexibility and generalization might reflect general cognitive abilities, or depend upon similarities across the measures or their temporal order. The current work examined these issues by testing the specificity and generality of this link. In two experiments with 3-year-old children, flexibility was measured in terms of switching between rules in a card-sorting task, while abstraction was measured in terms of selecting which stimulus did not belong in an odd-one-out task. The link between flexibility and abstraction was general across (1) abstraction dimensions similar to or different from those in the card-sorting task and (2) abstraction tasks that preceded or followed the switching task. Good performance on abstraction and flexibility measures did not extend to all cognitive tasks, including an IQ measure, and dissociated from children's ability to gaze at the correct stimulus in the odd-one-out task, suggesting that the link between flexibility and abstraction is specific to such measures, rather than reflecting general abilities that affect all tasks. We interpret these results in terms of the role that developing prefrontal cortical regions play in processes such as working memory, which can support both flexibility and abstraction

    Task-based adaptive multiresolution for time-space multi-scale reaction-diffusion systems on multi-core architectures

    Get PDF
    A new solver featuring time-space adaptation and error control has been recently introduced to tackle the numerical solution of stiff reaction-diffusion systems. Based on operator splitting, finite volume adaptive multiresolution and high order time integrators with specific stability properties for each operator, this strategy yields high computational efficiency for large multidimensional computations on standard architectures such as powerful workstations. However, the data structure of the original implementation, based on trees of pointers, provides limited opportunities for efficiency enhancements, while posing serious challenges in terms of parallel programming and load balancing. The present contribution proposes a new implementation of the whole set of numerical methods including Radau5 and ROCK4, relying on a fully different data structure together with the use of a specific library, TBB, for shared-memory, task-based parallelism with work-stealing. The performance of our implementation is assessed in a series of test-cases of increasing difficulty in two and three dimensions on multi-core and many-core architectures, demonstrating high scalability

    A scalable H-matrix approach for the solution of boundary integral equations on multi-GPU clusters

    Get PDF
    In this work, we consider the solution of boundary integral equations by means of a scalable hierarchical matrix approach on clusters equipped with graphics hardware, i.e. graphics processing units (GPUs). To this end, we extend our existing single-GPU hierarchical matrix library hmglib such that it is able to scale on many GPUs and such that it can be coupled to arbitrary application codes. Using a model GPU implementation of a boundary element method (BEM) solver, we are able to achieve more than 67 percent relative parallel speed-up going from 128 to 1024 GPUs for a model geometry test case with 1.5 million unknowns and a real-world geometry test case with almost 1.2 million unknowns. On 1024 GPUs of the cluster Titan, it takes less than 6 minutes to solve the 1.5 million unknowns problem, with 5.7 minutes for the setup phase and 20 seconds for the iterative solver. To the best of the authors' knowledge, we here discuss the first fully GPU-based distributed-memory parallel hierarchical matrix Open Source library using the traditional H-matrix format and adaptive cross approximation with an application to BEM problems

    A permanent formula for the Jones polynomial

    Get PDF
    The permanent of a square matrix is defined in a way similar to the determinant, but without using signs. The exact computation of the permanent is hard, but there are Monte-Carlo algorithms that can estimate general permanents. Given a planar diagram of a link L with nn crossings, we define a 7n by 7n matrix whose permanent equals to the Jones polynomial of L. This result accompanied with recent work of Freedman, Kitaev, Larson and Wang provides a Monte-Carlo algorithm to any decision problem belonging to the class BQP, i.e. such that it can be computed with bounded error in polynomial time using quantum resources.Comment: To appear in Advances in Applied Mathematic
    corecore