57 research outputs found

An efficient rounding boundary test for pow(x,y) in double precision

Author: Lauter Christoph
Lefèvre Vincent
Publication venue: HAL CCSD
Publication date: 04/09/2007

18 pages. The correct rounding of the function pow: (x,y) -> x^y is currently based on Ziv's iterative approximation process. To ensure its termination, cases where x^y falls on a rounding boundary must be filtered out. Such rounding boundaries are floating-point numbers and midpoints between two consecutive floating-point numbers. Detecting rounding boundaries for pow is a difficult problem. Previous approaches use repeated square root extraction followed by repeated square and multiply. This article presents a new rounding boundary test for pow in double precision which reduces to a few comparisons with pre-computed constants. These constants are deduced from worst cases for the Table Maker's Dilemma, searched over a small subset of the input domain. This is a novel use of such worst-case bounds. The resulting algorithm has been designed for a fast-on-average correctly rounded implementation of pow, considering the scarcity of rounding boundary cases. It does not stall average computations for rounding boundary detection. The article includes a correctness proof and experimental results.
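
To make the notion of a rounding boundary concrete, the following minimal sketch classifies an exact rational value as a double-precision floating-point number, a midpoint between two consecutive ones, or neither. It is a brute-force illustration of the definition given in the abstract using exact rational arithmetic, not the article's test based on pre-computed constants. The names `classify_rounding_boundary` and `exact_pow` are hypothetical, the exponent is restricted to integers, and the exponent range (overflow, subnormals) is ignored.

```python
from fractions import Fraction

PRECISION = 53  # significand bits of IEEE 754 binary64 (double precision)


def classify_rounding_boundary(value: Fraction) -> str:
    """Classify an exact rational as 'float', 'midpoint', or 'none'.

    Assumption: exponent range is ignored, so only the significand
    length is checked.
    """
    if value == 0:
        return "float"
    den = value.denominator
    # A binary floating-point number or midpoint must be m * 2^e,
    # so the reduced denominator has to be a power of two.
    if den & (den - 1):
        return "none"
    num = abs(value.numerator)
    # Strip trailing zero bits to obtain the odd part of the numerator.
    odd = num >> ((num & -num).bit_length() - 1)
    bits = odd.bit_length()
    if bits <= PRECISION:
        return "float"      # fits in a 53-bit significand
    if bits == PRECISION + 1:
        return "midpoint"   # needs exactly one extra bit
    return "none"


def exact_pow(x: float, n: int) -> Fraction:
    """Exact value of x**n for a double x and an integer exponent n."""
    return Fraction(x) ** n


if __name__ == "__main__":
    for x, n in [(3.0, 33), (3.0, 34), (3.0, 35), (0.1, 2)]:
        print(x, n, classify_rounding_boundary(exact_pow(x, n)))
```

Exact arithmetic of this kind is far too slow for a production pow, which is the motivation for replacing it with a handful of comparisons; the sketch is only useful as a reference for checking small cases.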

HAL-ENS-LYON

INRIA a CCSD electronic archive server

Hal-Diderot

Application benchmark results for Big Red, an IBM e1350 BladeCenter Cluster

Author: Aiken Ross
Hancock David Y.
Jurenz Matthias
Lieber Matthias
Link Matthew R.
McCaulay D. Scott
Mueller Matthias S.
Pierce Marlon
Plale Beth A.
Rodgers Greg
Saied Faisal
Stewart Craig A.
Tillotson Jenett
Turner George
Wang Peng
Publication date: 01/01/2009

The purpose of this report is to present the results of benchmark tests with Big Red, an IBM e1350 BladeCenter Cluster. It focuses on documenting the system architecture and test run results in enough detail to support analysis in other reports and comparison with other systems, rather than presenting such analysis here.

IUScholarWorks (University of Indiana)

Parallel and Distributed Computing

Publication venue: 'IntechOpen'
Publication date: 20/04/2021

The 14 chapters presented in this book cover a wide variety of representative works ranging from hardware design to application development. The topics addressed are programmable and reconfigurable devices and systems, dependability of GPUs (Graphics Processing Units), network topologies, cache coherence protocols, resource allocation, scheduling algorithms, peer-to-peer networks, large-scale network simulation, and parallel routines and algorithms. The articles included in this book thus constitute a reference for engineers and researchers with particular interests in any of these topics in parallel and distributed computing.

Directory of Open Access Books (DOAB)

Multithreaded Processor Core Optimized for Parallel Thread Execution

KFUPM ePrints

Design and implementation of an out of order execution engine of floating point arithmetic operations

Author: Ramírez Lazo Cristóbal
Publication venue: Universitat Politècnica de Catalunya
Publication date: 04/02/2016

In this thesis, work is undertaken towards the design, in hardware description languages, and the implementation on FPGA of an out-of-order execution engine for floating-point arithmetic operations. This thesis is part of a project called Lagarto.

UPCommons. Portal del coneixement obert de la UPC

Scalable system software for high performance large-scale applications

Author: Morari Alessandro
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2014

In the last decades, high-performance large-scale systems have been a fundamental tool for scientific discovery and engineering advances. The sustained growth of supercomputing performance and the concurrent reduction in cost have made this technology available to a large number of scientists and engineers working on many different problems. The design of next-generation supercomputers will include traditional HPC requirements as well as new requirements to handle data-intensive computations. Data-intensive applications will hence play an important role in a variety of fields, and are the current focus of several research trends in HPC. Due to the challenges of scalability and power efficiency, the next generation of supercomputers needs a redesign of the whole software stack. Being at the bottom of the software stack, system software is expected to change drastically to support the upcoming hardware and to meet new application requirements. This PhD thesis addresses the scalability of system software. The thesis starts at the operating system level, first studying general-purpose OSes (e.g., Linux) and then lightweight kernels (e.g., CNK). Then, we focus on the runtime system: we implement a runtime system for distributed memory systems that includes many of the system services required by next-generation applications. Finally, we focus on hardware features that can be exploited at user level to improve application performance, and potentially be included in our advanced runtime system. The thesis contributions are the following. Operating System Scalability: We provide an accurate study of the scalability problems of modern operating systems for HPC. We design and implement a methodology whereby detailed quantitative information may be obtained for each OS noise event. We validate our approach by comparing it to other well-known standard techniques for analyzing OS noise, such as FTQ (Fixed Time Quantum).
Evaluation of the address translation management for a lightweight kernel: We provide a performance evaluation of different TLB management approaches (dynamic memory mapping, static memory mapping with replaceable TLB entries, and static memory mapping with fixed TLB entries, i.e., no TLB misses) on an IBM BlueGene/P system. Runtime System Scalability: We show that a runtime system can efficiently incorporate system services and improve scalability for a specific class of applications. We design and implement a full-featured runtime system and programming model to execute irregular applications on a commodity cluster. The runtime library is called Global Memory and Threading library (GMT) and integrates a locality-aware Partitioned Global Address Space communication model with a fork/join program structure. It supports massive lightweight multi-threading, overlapping of communication and computation, and aggregation of small messages to tolerate network latencies. We compare GMT to other PGAS models, hand-optimized MPI code and custom architectures (Cray XMT) on a set of large-scale irregular applications: breadth-first search, random walk and concurrent hash map access. Our runtime system shows performance orders of magnitude higher than other solutions on commodity clusters and competitive with custom architectures. User-level Scalability Exploiting Hardware Features: We show the high complexity of low-level hardware optimizations for single applications, as a motivation to incorporate this logic into an adaptive runtime system. We evaluate the effects of a controllable hardware-thread priority mechanism that controls the rate at which each hardware thread decodes instructions on IBM POWER5 and POWER6 processors. Finally, we show how to effectively exploit cache locality and the network-on-chip on the Tilera many-core architecture to improve intra-core scalability.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Secretaría de Estado de Cultura