8,198 research outputs found

    MPI+X: task-based parallelization and dynamic load balance of finite element assembly

    Get PDF
    The main computing tasks of a finite element code(FE) for solving partial differential equations (PDE's) are the algebraic system assembly and the iterative solver. This work focuses on the first task, in the context of a hybrid MPI+X paradigm. Although we will describe algorithms in the FE context, a similar strategy can be straightforwardly applied to other discretization methods, like the finite volume method. The matrix assembly consists of a loop over the elements of the MPI partition to compute element matrices and right-hand sides and their assemblies in the local system to each MPI partition. In a MPI+X hybrid parallelism context, X has consisted traditionally of loop parallelism using OpenMP. Several strategies have been proposed in the literature to implement this loop parallelism, like coloring or substructuring techniques to circumvent the race condition that appears when assembling the element system into the local system. The main drawback of the first technique is the decrease of the IPC due to bad spatial locality. The second technique avoids this issue but requires extensive changes in the implementation, which can be cumbersome when several element loops should be treated. We propose an alternative, based on the task parallelism of the element loop using some extensions to the OpenMP programming model. The taskification of the assembly solves both aforementioned problems. In addition, dynamic load balance will be applied using the DLB library, especially efficient in the presence of hybrid meshes, where the relative costs of the different elements is impossible to estimate a priori. This paper presents the proposed methodology, its implementation and its validation through the solution of large computational mechanics problems up to 16k cores

    HoPP: Robust and Resilient Publish-Subscribe for an Information-Centric Internet of Things

    Full text link
    This paper revisits NDN deployment in the IoT with a special focus on the interaction of sensors and actuators. Such scenarios require high responsiveness and limited control state at the constrained nodes. We argue that the NDN request-response pattern which prevents data push is vital for IoT networks. We contribute HoP-and-Pull (HoPP), a robust publish-subscribe scheme for typical IoT scenarios that targets IoT networks consisting of hundreds of resource constrained devices at intermittent connectivity. Our approach limits the FIB tables to a minimum and naturally supports mobility, temporary network partitioning, data aggregation and near real-time reactivity. We experimentally evaluate the protocol in a real-world deployment using the IoT-Lab testbed with varying numbers of constrained devices, each wirelessly interconnected via IEEE 802.15.4 LowPANs. Implementations are built on CCN-lite with RIOT and support experiments using various single- and multi-hop scenarios

    Seismic Analysis Capability in NASTRAN

    Get PDF
    Seismic analysis is a technique which pertains to loading described in terms of boundary accelerations. Earthquake shocks to buildings is the type of excitation which usually comes to mind when one hears the word seismic, but this technique also applied to a broad class of acceleration excitations which are applied at the base of a structure such as vibration shaker testing or shocks to machinery foundations. Four different solution paths are available in NASTRAN for seismic analysis. They are: Direct Seismic Frequency Response, Direct Seismic Transient Response, Modal Seismic Frequency Response, and Modal Seismic Transient Response. This capability, at present, is invoked not as separate rigid formats, but as pre-packaged ALTER packets to existing RIGID Formats 8, 9, 11, and 12. These ALTER packets are included with the delivery of the NASTRAN program and are stored on the computer as a library of callable utilities. The user calls one of these utilities and merges it into the Executive Control Section of the data deck to perform any of the four options are invoked by setting parameter values in the bulk data

    Implementing multifrontal sparse solvers for multicore architectures with Sequential Task Flow runtime systems

    Get PDF
    International audienceTo face the advent of multicore processors and the ever increasing complexity of hardware architectures, programming models based on DAG parallelism regained popularity in the high performance, scientific computing community. Modern runtime systems offer a programming interface that complies with this paradigm and powerful engines for scheduling the tasks into which the application is decomposed. These tools have already proved their effectiveness on a number of dense linear algebra applications. This paper evaluates the usability and effectiveness of runtime systems based on the Sequential Task Flow model for complex applications , namely, sparse matrix multifrontal factorizations which feature extremely irregular workloads, with tasks of different granularities and characteristics and with a variable memory consumption. Most importantly, it shows how this parallel programming model eases the development of complex features that benefit the performance of sparse, direct solvers as well as their memory consumption. We illustrate our discussion with the multifrontal QR factorization running on top of the StarPU runtime system. ACM Reference Format: Emmanuel Agullo, Alfredo Buttari, Abdou Guermouche and Florent Lopez, 2014. Implementing multifrontal sparse solvers for multicore architectures with Sequential Task Flow runtime system
    • …
    corecore