Search CORE

197 research outputs found

Task-based Runtime Optimizations Towards High Performance Computing Applications

Author: Cao Qinglei
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/08/2022
Field of study

The last decades have witnessed a rapid improvement of computational capabilities in high-performance computing (HPC) platforms thanks to hardware technology scaling. HPC architectures benefit from mainstream advances on the hardware with many-core systems, deep hierarchical memory subsystem, non-uniform memory access, and an ever-increasing gap between computational power and memory bandwidth. This has necessitated continuous adaptations across the software stack to maintain high hardware utilization. In this HPC landscape of potentially million-way parallelism, task-based programming models associated with dynamic runtime systems are becoming more popular, which fosters developers’ productivity at extreme scale by abstracting the underlying hardware complexity. In this context, this dissertation highlights how a software bundle powered by a task-based programming model can address the heterogeneous workloads engendered by HPC applications., i.e., data redistribution, geospatial modeling and 3D unstructured mesh deformation here. Data redistribution aims to reshuffle data to optimize some objective for an algorithm, whose objective can be multi-dimensional, such as improving computational load balance or decreasing communication volume or cost, with the ultimate goal of increasing the efficiency and therefore reducing the time-to-solution for the algorithm. Geostatistical modeling, one of the prime motivating applications for exascale computing, is a technique for predicting desired quantities from geographically distributed data, based on statistical models and optimization of parameters. Meshing the deformable contour of moving 3D bodies is an expensive operation that can cause huge computational challenges in fluid-structure interaction (FSI) applications. Therefore, in this dissertation, Redistribute-PaRSEC, ExaGeoStat-PaRSEC and HiCMA-PaRSEC are proposed to efficiently tackle these HPC applications respectively at extreme scale, and they are evaluated on multiple HPC clusters, including AMD-based, Intel-based, Arm-based CPU systems and IBM-based multi-GPU system. This multidisciplinary work emphasizes the need for runtime systems to go beyond their primary responsibility of task scheduling on massively parallel hardware system for servicing the next-generation scientific applications

University of Tennessee, Knoxville: Trace

GPU LSM: A Dynamic Dictionary Data Structure for the GPU

Author: Amenta Nina
Ashkiani Saman
Farach-Colton Martin
Li Shengren
Owens John D.
Publication venue
Publication date: 01/01/2017
Field of study

We develop a dynamic dictionary data structure for the GPU, supporting fast insertions and deletions, based on the Log Structured Merge tree (LSM). Our implementation on an NVIDIA K40c GPU has an average update (insertion or deletion) rate of 225 M elements/s, 13.5x faster than merging items into a sorted array. The GPU LSM supports the retrieval operations of lookup, count, and range query operations with an average rate of 75 M, 32 M and 23 M queries/s respectively. The trade-off for the dynamic updates is that the sorted array is almost twice as fast on retrievals. We believe that our GPU LSM is the first dynamic general-purpose dictionary data structure for the GPU.Comment: 11 pages, accepted to appear on the Proceedings of IEEE International Parallel and Distributed Processing Symposium (IPDPS'18

arXiv.org e-Print Archive

eScholarship - University of California

Content addressable memory: design and usage for general purpose computing

Author: Blair Gerard M.
Publication venue: The University of Edinburgh
Publication date: 01/01/1986
Field of study

Edinburgh Research Archive

Solving Pallet loading Problem with Real-World Constraints

Author: Kristović Pietro
Vidaković Josip
Šekoranja Bojan
Šuligoj Filip
Švaco Marko
Publication venue
Publication date: 21/07/2023
Field of study

Efficient cargo packing and transport unit stacking play a vital role in enhancing logistics efficiency and reducing costs in the field of logistics. This article focuses on the challenging problem of loading transport units onto pallets, which belongs to the class of NP-hard problems. We propose a novel method for solving the pallet loading problem using a branch and bound algorithm, where there is a loading order of transport units. The derived algorithm considers only a heuristically favourable subset of possible positions of the transport units, which has a positive effect on computability. Furthermore, it is ensured that the pallet configuration meets real-world constraints, such as the stability of the position of transport units under the influence of transport inertial forces and gravity.Comment: 8 pages, 1 figure, project report pape

arXiv.org e-Print Archive

PRODUCTION SEQUENCING AND STABILITY ANALYSIS OF A JUST-IN-TIME SYSTEM WITH SEQUENCE DEPENDENT SETUPS

Author: Henninger John Thomas
Publication venue: UKnowledge
Publication date: 01/01/2009
Field of study

Just-In-Time (JIT) production systems is a popular area for researchers but real-world issues such as sequence dependent setups are often overlooked. This research investigates an approach for determining stability and an approach for mixed product sequencing in production systems with sequence dependent setups and buffer thresholds which signal replenishment of a given buffer. Production systems in this research operate under JIT pull production principles by producing only when demand exists and idle when no demand exists. In the first approach, an iterative method is presented to determine stability for a multi-product production system that operates with replenishment signals and may have sequence dependent setups. In this method, a network of nodes representing machine states and arcs representing the buffer inventory levels is used to find a stable trajectory for the production system via an iterative procedure. The method determines suitable buffer levels for the production system that ensure that a trajectory originating from any point within a buffer region will always map to a point contained on another buffer region for all future mappings. This iterative method for determining the stability of a production system was implemented using an algorithm to calculate the buffer inventory regions for all arcs in a given arc-node network. The algorithm showed favorable results for two and three product systems in which sequence dependent setups may exist. In the second approach, a product sequencing algorithm determines a product sequence for a production system based on system parameters – setup times, buffer levels, usage rates, production rates, etc. The algorithm selects a product by evaluating the goodness of each product that has reached the replenishment threshold at the current time. The algorithm also incorporates a lookahead function that calculates the goodness for some time interval into the future. The lookahead function considers all branches of the tree of potential sequences to prevent the sequence from travelling down a dead-end branch in which the system will be unable to avoid a depleted buffer. The sequencing algorithm allows the user to weight the five terms of the goodness equations (current and lookahead) to control the behavior of the sequence

University of Kentucky

Research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, numerical analysis and computer science

Author
Publication venue
Publication date
Field of study

This report summarizes research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, numerical analysis, and computer science during the period April l, 1988 through September 30, 1988

NASA Technical Reports Server

Streaming Verification of Graph Properties

Author: Abdullah Amirali
Daruki Samira
Roy Chitradeep Dutta
Venkatasubramanian Suresh
Publication venue
Publication date: 01/01/2016
Field of study

Streaming interactive proofs (SIPs) are a framework for outsourced computation. A computationally limited streaming client (the verifier) hands over a large data set to an untrusted server (the prover) in the cloud and the two parties run a protocol to confirm the correctness of result with high probability. SIPs are particularly interesting for problems that are hard to solve (or even approximate) well in a streaming setting. The most notable of these problems is finding maximum matchings, which has received intense interest in recent years but has strong lower bounds even for constant factor approximations. In this paper, we present efficient streaming interactive proofs that can verify maximum matchings exactly. Our results cover all flavors of matchings (bipartite/non-bipartite and weighted). In addition, we also present streaming verifiers for approximate metric TSP. In particular, these are the first efficient results for weighted matchings and for metric TSP in any streaming verification model.Comment: 26 pages, 2 figure, 1 tabl

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

QUANTUM COMPUTING AND HPC TECHNIQUES FOR SOLVING MICRORHEOLOGY AND DIMENSIONALITY REDUCTION PROBLEMS

Author: Orts Gómez Francisco José
Publication venue
Publication date: 23/09/2021
Field of study

Tesis doctoral en período de exposición públicaDoctorado en Informática (RD99/11)(8908

Repositorio Institucional de la Universidad de Almería (Spain)