8,198 research outputs found
MPI+X: task-based parallelization and dynamic load balance of finite element assembly
The main computing tasks of a finite element code(FE) for solving partial
differential equations (PDE's) are the algebraic system assembly and the
iterative solver. This work focuses on the first task, in the context of a
hybrid MPI+X paradigm. Although we will describe algorithms in the FE context,
a similar strategy can be straightforwardly applied to other discretization
methods, like the finite volume method. The matrix assembly consists of a loop
over the elements of the MPI partition to compute element matrices and
right-hand sides and their assemblies in the local system to each MPI
partition. In a MPI+X hybrid parallelism context, X has consisted traditionally
of loop parallelism using OpenMP. Several strategies have been proposed in the
literature to implement this loop parallelism, like coloring or substructuring
techniques to circumvent the race condition that appears when assembling the
element system into the local system. The main drawback of the first technique
is the decrease of the IPC due to bad spatial locality. The second technique
avoids this issue but requires extensive changes in the implementation, which
can be cumbersome when several element loops should be treated. We propose an
alternative, based on the task parallelism of the element loop using some
extensions to the OpenMP programming model. The taskification of the assembly
solves both aforementioned problems. In addition, dynamic load balance will be
applied using the DLB library, especially efficient in the presence of hybrid
meshes, where the relative costs of the different elements is impossible to
estimate a priori. This paper presents the proposed methodology, its
implementation and its validation through the solution of large computational
mechanics problems up to 16k cores
HoPP: Robust and Resilient Publish-Subscribe for an Information-Centric Internet of Things
This paper revisits NDN deployment in the IoT with a special focus on the
interaction of sensors and actuators. Such scenarios require high
responsiveness and limited control state at the constrained nodes. We argue
that the NDN request-response pattern which prevents data push is vital for IoT
networks. We contribute HoP-and-Pull (HoPP), a robust publish-subscribe scheme
for typical IoT scenarios that targets IoT networks consisting of hundreds of
resource constrained devices at intermittent connectivity. Our approach limits
the FIB tables to a minimum and naturally supports mobility, temporary network
partitioning, data aggregation and near real-time reactivity. We experimentally
evaluate the protocol in a real-world deployment using the IoT-Lab testbed with
varying numbers of constrained devices, each wirelessly interconnected via IEEE
802.15.4 LowPANs. Implementations are built on CCN-lite with RIOT and support
experiments using various single- and multi-hop scenarios
Seismic Analysis Capability in NASTRAN
Seismic analysis is a technique which pertains to loading described in terms of boundary accelerations. Earthquake shocks to buildings is the type of excitation which usually comes to mind when one hears the word seismic, but this technique also applied to a broad class of acceleration excitations which are applied at the base of a structure such as vibration shaker testing or shocks to machinery foundations. Four different solution paths are available in NASTRAN for seismic analysis. They are: Direct Seismic Frequency Response, Direct Seismic Transient Response, Modal Seismic Frequency Response, and Modal Seismic Transient Response. This capability, at present, is invoked not as separate rigid formats, but as pre-packaged ALTER packets to existing RIGID Formats 8, 9, 11, and 12. These ALTER packets are included with the delivery of the NASTRAN program and are stored on the computer as a library of callable utilities. The user calls one of these utilities and merges it into the Executive Control Section of the data deck to perform any of the four options are invoked by setting parameter values in the bulk data
Implementing multifrontal sparse solvers for multicore architectures with Sequential Task Flow runtime systems
International audienceTo face the advent of multicore processors and the ever increasing complexity of hardware architectures, programming models based on DAG parallelism regained popularity in the high performance, scientific computing community. Modern runtime systems offer a programming interface that complies with this paradigm and powerful engines for scheduling the tasks into which the application is decomposed. These tools have already proved their effectiveness on a number of dense linear algebra applications. This paper evaluates the usability and effectiveness of runtime systems based on the Sequential Task Flow model for complex applications , namely, sparse matrix multifrontal factorizations which feature extremely irregular workloads, with tasks of different granularities and characteristics and with a variable memory consumption. Most importantly, it shows how this parallel programming model eases the development of complex features that benefit the performance of sparse, direct solvers as well as their memory consumption. We illustrate our discussion with the multifrontal QR factorization running on top of the StarPU runtime system. ACM Reference Format: Emmanuel Agullo, Alfredo Buttari, Abdou Guermouche and Florent Lopez, 2014. Implementing multifrontal sparse solvers for multicore architectures with Sequential Task Flow runtime system
- …