497 research outputs found

Factory: An Object-Oriented Parallel Programming Substrate for Deep Multiprocessors

Author: Schneider Scott Arthur
Publication venue: W&M ScholarWorks
Publication date: 01/01/2005

Exploring coordinated software and hardware support for hardware resource allocation

Author: Figueiredo Boneti Carlos Santieri de
Publication venue: Universitat Politècnica de Catalunya
Publication date: 04/09/2009

Multithreaded processors are now common in the industry as they offer high performance at a low cost. Traditionally, in such processors, the assignment of hardware resources among the threads is done implicitly, by hardware policies. However, a new class of multithreaded hardware allows the explicit allocation of resources to be controlled or biased by the software. Currently, there is little or no coordination between the allocation of resources done by the hardware and the prioritization of tasks done by the software. This thesis aims to narrow the gap between software and hardware with respect to hardware resource allocation, by proposing a new explicit resource allocation hardware mechanism and novel schedulers that use the currently available hardware resource allocation mechanisms. It approaches the problem in two types of computing systems. In the high-performance computing domain, we characterize the IBM POWER5, the first processor to provide a mechanism that allows the software to bias the allocation of hardware resources. In addition, we propose the use of hardware resource allocation as a way to balance high-performance computing applications. Finally, we propose two new scheduling mechanisms that are able to transparently and successfully balance applications in real systems using the hardware resource allocation. In the soft real-time domain, we propose a hardware extension to the existing explicit resource allocation hardware and, in addition, two software schedulers that use the explicit allocation hardware to improve the schedulability of tasks in a soft real-time system. In this thesis, we demonstrate that system performance improves by making the software aware of the mechanisms that control the amount of resources given to each running thread.
In particular, for the high-performance computing domain, we show that it is possible to decrease the execution time of MPI applications by biasing the hardware resource assignment between threads. In addition, we show that it is possible to decrease the number of missed deadlines when scheduling tasks in a soft real-time SMT system.

UPCommons. Portal del coneixement obert de la UPC

Architectural support for real-time task scheduling in SMT processors

Author: Cazorla Almeida Francisco Javier
Fernández Enrique
Knijnenburg Peter M.W.
Ramírez Bellido Alejandro
Sakellariou Rizos
Valero Cortés Mateo
Publication date: 01/01/2005

In Simultaneous Multithreaded (SMT) architectures most hardware resources are shared between threads. This provides a good cost/performance trade-off which renders these architectures suitable for use in embedded systems. However, since threads share many resources, like caches, they also interfere with each other. As a result, execution times of applications become highly unpredictable and highly dependent on the context in which an application is executed. Obviously, this poses problems if an SMT is to be used in a (soft) real-time system. In this paper, we propose two novel hardware mechanisms that can be used to reduce this performance variability. In contrast to previous approaches, our proposed mechanisms do not need any information beyond the information already known by traditional job schedulers. Neither do they require extensive profiling of workloads to determine optimal schedules. Our mechanisms are based on dynamic resource partitioning. The OS-level job scheduler needs to be slightly adapted in order to provide the hardware resource allocator some information on how this resource partitioning needs to be done. We show that our mechanisms provide high stability for SMT architectures to be used in real-time systems: the real-time benchmarks we used meet their deadlines in more than 98% of the cases considered while the other thread in the workload still achieves high throughput.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

International Migration, Integration and Social Cohesion online publications

Reconfigurable Instruction Cell Architecture Reconfiguration and Interconnects

Author: Nousias Ioannis
Publication venue: The University of Edinburgh
Publication date: 01/01/2009

Edinburgh Research Archive

Exploring coordinated software and hardware support for hardware resource allocation

Author: Figueiredo Boneti Carlos Santieri de
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2009

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Secretaría de Estado de Cultura

Economic Evaluation of New Technologies and Promotions in the Australian Sheep and Wool Industries

Author: Mounter Stuart W.

Knowledge about the size and distribution of returns from alternative broad types of R&D and promotion investments permits strategic-level decisions about resource allocation, both within and across research programs. The Australian sheep meat and wool industries are characterised by strong cross-commodity relationships due to the joint product nature of the industries. An equilibrium displacement model of the Australian sheep meat and wool industries was developed to account for these relationships and any indirect benefits and costs arising from spill-over and feedback effects between the industries as a result of research-induced innovation or promotion. The potential annual returns and their distribution among the various industry sectors were estimated from different hypothetical investment scenarios to demonstrate the model's relevance to R&D and promotion policy and decision-making.

Keywords: Australian sheep and wool industries, equilibrium displacement model, cross-commodity relationships, R&D and promotion evaluation, Livestock Production/Industries

Research Papers in Economics

Simultaneous Branch and Warp Interweaving for Sustained GPU Performance

Author: Brunie Nicolas
Collange Caroline
Diamos Gregory
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 21/11/2011

Single-Instruction Multiple-Thread (SIMT) micro-architectures implemented in Graphics Processing Units (GPUs) run fine-grained threads in lockstep by grouping them into units, referred to as warps, to amortize the cost of instruction fetch, decode and control logic over multiple execution units. As individual threads take divergent execution paths, their processing takes place sequentially, defeating part of the efficiency advantage of SIMD execution. We present two complementary techniques that mitigate the impact of thread divergence on SIMT micro-architectures. Both techniques relax the SIMD execution model by allowing two distinct instructions to be scheduled to disjoint subsets of the same row of execution units, instead of one single instruction. They increase flexibility by providing more thread grouping opportunities than SIMD, while preserving the affinity between threads to avoid introducing extra memory divergence. We consider (1) co-issuing instructions from different divergent paths of the same warp and (2) co-issuing instructions from different warps. To support (1), we introduce a novel thread reconvergence technique that ensures threads are run back in lockstep at control-flow reconvergence points without hindering their ability to run branches in parallel. We propose a lane shuffling technique to allow solution (2) to benefit from inter-warp correlations in divergence patterns. The combination of all these techniques improves performance by 23% on a set of regular GPGPU applications and by 40% on irregular applications, while maintaining the same instruction-fetch and processing-unit resource requirements as the contemporary Fermi GPU architecture.

HAL-ENS-LYON

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1

Implicitly-multithreaded processors

Author: Falsafi Babak
Park Il
Vijaykumar T. N.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 06/04/2009

This paper proposes the Implicitly-MultiThreaded (IMT) architecture to execute compiler-specified speculative threads on a modified Simultaneous Multithreading pipeline. IMT reduces hardware complexity by relying on the compiler to select suitable thread spawning points and orchestrate inter-thread register communication. To enhance IMT's effectiveness, this paper proposes three novel microarchitectural mechanisms: (1) resource- and dependence-based fetch policy to fetch and execute suitable instructions, (2) context multiplexing to improve utilization and map as many threads to a single context as allowed by availability of resources, and (3) early thread-invocation to hide thread start-up overhead by overlapping one thread's invocation with other threads' execution. We use SPEC2K benchmarks and cycle-accurate simulation to show that a microarchitecture-optimized IMT improves performance on average by 24% and at best by 69% over an aggressive superscalar. We also compare IMT to two prior proposals, TME and DMT, for speculative threading on an SMT using hardware-extracted threads. Our best IMT design outperforms a comparable TME and DMT on average by 26% and 38% respectively.

Infoscience - École polytechnique fédérale de Lausanne