197 research outputs found
Task-based Runtime Optimizations Towards High Performance Computing Applications
The last decades have witnessed a rapid improvement of computational capabilities in high-performance computing (HPC) platforms thanks to hardware technology scaling. HPC architectures benefit from mainstream advances on the hardware with many-core systems, deep hierarchical memory subsystem, non-uniform memory access, and an ever-increasing gap between computational power and memory bandwidth. This has necessitated continuous adaptations across the software stack to maintain high hardware utilization. In this HPC landscape of potentially million-way parallelism, task-based programming models associated with dynamic runtime systems are becoming more popular, which fosters developers’ productivity at extreme scale by abstracting the underlying hardware complexity.
In this context, this dissertation highlights how a software bundle powered by a task-based programming model can address the heterogeneous workloads engendered by HPC applications., i.e., data redistribution, geospatial modeling and 3D unstructured mesh deformation here. Data redistribution aims to reshuffle data to optimize some objective for an algorithm, whose objective can be multi-dimensional, such as improving computational load balance or decreasing communication volume or cost, with the ultimate goal of increasing the efficiency and therefore reducing the time-to-solution for the algorithm. Geostatistical modeling, one of the prime motivating applications for exascale computing, is a technique for predicting desired quantities from geographically distributed data, based on statistical models and optimization of parameters. Meshing the deformable contour of moving 3D bodies is an expensive operation that can cause huge computational challenges in fluid-structure interaction (FSI) applications. Therefore, in this dissertation, Redistribute-PaRSEC, ExaGeoStat-PaRSEC and HiCMA-PaRSEC are proposed to efficiently tackle these HPC applications respectively at extreme scale, and they are evaluated on multiple HPC clusters, including AMD-based, Intel-based, Arm-based CPU systems and IBM-based multi-GPU system. This multidisciplinary work emphasizes the need for runtime systems to go beyond their primary responsibility of task scheduling on massively parallel hardware system for servicing the next-generation scientific applications
GPU LSM: A Dynamic Dictionary Data Structure for the GPU
We develop a dynamic dictionary data structure for the GPU, supporting fast
insertions and deletions, based on the Log Structured Merge tree (LSM). Our
implementation on an NVIDIA K40c GPU has an average update (insertion or
deletion) rate of 225 M elements/s, 13.5x faster than merging items into a
sorted array. The GPU LSM supports the retrieval operations of lookup, count,
and range query operations with an average rate of 75 M, 32 M and 23 M
queries/s respectively. The trade-off for the dynamic updates is that the
sorted array is almost twice as fast on retrievals. We believe that our GPU LSM
is the first dynamic general-purpose dictionary data structure for the GPU.Comment: 11 pages, accepted to appear on the Proceedings of IEEE International
Parallel and Distributed Processing Symposium (IPDPS'18
Solving Pallet loading Problem with Real-World Constraints
Efficient cargo packing and transport unit stacking play a vital role in
enhancing logistics efficiency and reducing costs in the field of logistics.
This article focuses on the challenging problem of loading transport units onto
pallets, which belongs to the class of NP-hard problems. We propose a novel
method for solving the pallet loading problem using a branch and bound
algorithm, where there is a loading order of transport units. The derived
algorithm considers only a heuristically favourable subset of possible
positions of the transport units, which has a positive effect on computability.
Furthermore, it is ensured that the pallet configuration meets real-world
constraints, such as the stability of the position of transport units under the
influence of transport inertial forces and gravity.Comment: 8 pages, 1 figure, project report pape
PRODUCTION SEQUENCING AND STABILITY ANALYSIS OF A JUST-IN-TIME SYSTEM WITH SEQUENCE DEPENDENT SETUPS
Just-In-Time (JIT) production systems is a popular area for researchers but real-world issues such as sequence dependent setups are often overlooked. This research investigates an approach for determining stability and an approach for mixed product sequencing in production systems with sequence dependent setups and buffer thresholds which signal replenishment of a given buffer. Production systems in this research operate under JIT pull production principles by producing only when demand exists and idle when no demand exists.
In the first approach, an iterative method is presented to determine stability for a multi-product production system that operates with replenishment signals and may have sequence dependent setups. In this method, a network of nodes representing machine states and arcs representing the buffer inventory levels is used to find a stable trajectory for the production system via an iterative procedure. The method determines suitable buffer levels for the production system that ensure that a trajectory originating from any point within a buffer region will always map to a point contained on another buffer region for all future mappings.
This iterative method for determining the stability of a production system was implemented using an algorithm to calculate the buffer inventory regions for all arcs in a given arc-node network. The algorithm showed favorable results for two and three product systems in which sequence dependent setups may exist.
In the second approach, a product sequencing algorithm determines a product sequence for a production system based on system parameters – setup times, buffer levels, usage rates, production rates, etc. The algorithm selects a product by evaluating the goodness of each product that has reached the replenishment threshold at the current time. The algorithm also incorporates a lookahead function that calculates the goodness for some time interval into the future. The lookahead function considers all branches of the tree of potential sequences to prevent the sequence from travelling down a dead-end branch in which the system will be unable to avoid a depleted buffer. The sequencing algorithm allows the user to weight the five terms of the goodness equations (current and lookahead) to control the behavior of the sequence
Research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, numerical analysis and computer science
This report summarizes research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, numerical analysis, and computer science during the period April l, 1988 through September 30, 1988
Streaming Verification of Graph Properties
Streaming interactive proofs (SIPs) are a framework for outsourced
computation. A computationally limited streaming client (the verifier) hands
over a large data set to an untrusted server (the prover) in the cloud and the
two parties run a protocol to confirm the correctness of result with high
probability. SIPs are particularly interesting for problems that are hard to
solve (or even approximate) well in a streaming setting. The most notable of
these problems is finding maximum matchings, which has received intense
interest in recent years but has strong lower bounds even for constant factor
approximations.
In this paper, we present efficient streaming interactive proofs that can
verify maximum matchings exactly. Our results cover all flavors of matchings
(bipartite/non-bipartite and weighted). In addition, we also present streaming
verifiers for approximate metric TSP. In particular, these are the first
efficient results for weighted matchings and for metric TSP in any streaming
verification model.Comment: 26 pages, 2 figure, 1 tabl
QUANTUM COMPUTING AND HPC TECHNIQUES FOR SOLVING MICRORHEOLOGY AND DIMENSIONALITY REDUCTION PROBLEMS
Tesis doctoral en perÃodo de exposición públicaDoctorado en Informática (RD99/11)(8908
- …