Search CORE

14 research outputs found

Enabling SMT for real-time embedded systems

Author: Cazorla Almeida Francisco Javier
Fernandez Prieto Enrique
Knijnenburg Peter M.W.
Ramírez Bellido Alejandro
Sakellariou Rizos
Valero Cortés Mateo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2004
Field of study

In order to deal with real time constraints, current embedded processors are usually simple in-order processors with no speculation capabilities to ensure that execution times of applications are predictable. However, embedded systems require ever more compute power and the trend is that they will become as complex as current high performance systems. SMTs are viable candidates for future high performance embedded processors, because of their good cost/performance trade-off. However, current SMTs exhibit unpredictable performance. Hence, the SMT hardware needs to be adapted in order to meet real time constraints. This paper is a first step toward the use of high performance SMT processors in future real time systems. We present a novel collaboration between OS and SMT processors that entails that the OS exercises control over how resources are shared inside the processor. We illustrate this collaboration by a mechanism in which the OS cooperates with the SMT hardware to guarantee that a given thread runs at a specific speed, enabling SMT for real-time systems.Peer ReviewedPostprint (published version

UPCommons. Portal del coneixement obert de la UPC

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

International Migration, Integration and Social Cohesion online publications

Efficient resources assignment schemes for clustered multithreaded processors

Author: Fernando Latorre
González Colás Antonio María
González González José
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008
Field of study

New feature sizes provide larger number of transistors per chip that architects could use in order to further exploit instruction level parallelism. However, these technologies bring also new challenges that complicate conventional monolithic processor designs. On the one hand, exploiting instruction level parallelism is leading us to diminishing returns and therefore exploiting other sources of parallelism like thread level parallelism is needed in order to keep raising performance with a reasonable hardware complexity. On the other hand, clustering architectures have been widely studied in order to reduce the inherent complexity of current monolithic processors. This paper studies the synergies and trade-offs between two concepts, clustering and simultaneous multithreading (SMT), in order to understand the reasons why conventional SMT resource assignment schemes are not so effective in clustered processors. These trade-offs are used to propose a novel resource assignment scheme that gets and average speed up of 17.6% versus Icount improving fairness in 24%.Peer ReviewedPostprint (published version

UPCommons. Portal del coneixement obert de la UPC

Performance Enhancement of Multicore Architecture

Author: Awadalla Medhat
Konsowa Hanan
Publication venue: 'Institute of Advanced Engineering and Science'
Publication date: 01/08/2015
Field of study

Multicore processors integrate several cores on a single chip. The fixed architecture of multicore platforms often fails to accommodate the inherent diverse requirements of different applications. The permanent need to enhance the performance of multicore architecture motivates the development of a dynamic architecture. To address this issue, this paper presents new algorithms for thread selection in fetch stage. Moreover, this paper presents three new fetch stage policies, EACH_LOOP_FETCH, INC-FETCH, and WZ-FETCH, based on Ordinary Least Square (OLS) regression statistic method. These new fetch policies differ on thread selection time which is represented by instructions’ count and window size. Furthermore, the simulation multicore tool, , is adapted to cope with multicore processor dynamic design by adding a dynamic feature in the policy of thread selection in fetch stage. SPLASH2, parallel scientific workloads, has been used to validate the proposed adaptation for multi2sim. Intensive simulated experiments have been conducted and the obtained results show that remarkable performance enhancements have been achieved in terms of execution time and number of instructions per second produces less broadcast operations compared to the typical algorithm

Institute of Advanced Engineering and Science

Software-controlled priority characterization of POWER5 processor

Author: Boneti Carlos
Buyuktosunoglu Alper
Cazorla Francisco
Cher Chen-Yong
Gioiosa Roberto
Valero Cortés Mateo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008
Field of study

Due to the limitations of instruction-level parallelism, thread-level parallelism has become a popular way to improve processor performance. One example is the IBM POWER5TM processor, a two-context simultaneous-multithreaded dual-core chip. In each SMT core, the IBM POWER5 features two levels of thread resource balancing and prioritization. The first level provides automatic in-hardware resource balancing, while the second level is a software-controlled priority mechanism that presents eight levels of thread priorities. Currently, software-controlled prioritization is only used in limited number of cases in the software platforms due to lack of performance characterization of the effects of this mechanism. In this work, we characterize the effects of the software-based prioritization on several different workloads. We show that the impact of the prioritization significantly depends on the workloads coscheduled on a core. By prioritizing the right task, it is possible to obtain more than two times of throughput improvement for synthetic workloads compared to the baseline. We also present two application case studies targeting two different performance metrics: the first case study improves overall throughput by 23.7% and the second case study reduces the total execution time by 9.3%. In addition, we show the circumstances when a background thread can be run transparently without affecting the performance of the foreground thread.Peer ReviewedPostprint (published version

UPCommons. Portal del coneixement obert de la UPC

Managing SMT Resource Usage through Speculative Instruction Window Weighting

Author: Seznec André
Vandierendonck Hans
Publication venue: HAL CCSD
Publication date: 01/01/2009
Field of study

Simultaneous multithreading processors dynamically share processor resources between multiple threads. In general, shared SMT resources may be managed explicitly, e.g. by dynamically setting queue occupation bounds for each thread as in the DCRA and Hill-Climbing policies. Alternatively, resources may be managed implicitly, i.e. resource usage is controlled by placing the desired instruction mix in the resources. In this case, the main resource management tool is the instruction fetch policy which must predict the behavior of each thread (branch mispredictions, long-latency loads, etc.) as it fetches instructions. In this paper, we present the use of Speculative Instruction Window Weighting (SIWW) to bridge the gap between implicit and explicit SMT fetch policies. SIWW estimates for each thread the amount of outstanding work in the processor pipeline. Fetch proceeds for the thread with the least amount of work left. SIWW policies are implicit as fetch proceeds for the thread with the least amount of work left. They are also explicit as maximum resource allocation can also be set. SIWW can use and combine virtually any of the indicators that were previously proposed for guiding the instruction fetch policy (number of in-flight instructions, number of low confidence branches, number of predicted cache misses, etc.). Therefore, SIWW is an \emph{approach to designing SMT fetch policies}, rather than a particular fetch policy. Targeting fairness or throughput is often contradictory and a SMT scheduling policy often optimizes only one performance metric at the sacrifice of the other metric. Our simulations show that the SIWW fetch policy can achieve at the same time state-of-the-art throughput, state-of-the-art fairness and state-of-the-art harmonic performance mean

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

Ghent University Academic Bibliography

HAL-Rennes 1

Hybrid Designs for Caches and Cores.

Author: Sleiman Faissal
Publication venue
Publication date
Field of study

Processor power constraints have come to the forefront over the last decade, heralded by the stagnation of clock frequency scaling. High-performance core and cache designs often utilize power-hungry techniques to increase parallelism. Conversely, the most energy-efficient designs opt for a serial execution to avoid unnecessary overheads. While both of these extremes constitute one-size-fits-all approaches, a judicious mix of parallel and serial execution has the potential to achieve the best of both high-performing and energy-efficient designs. This dissertation examines such hybrid designs for cores and caches. Firstly, we introduce a novel, hybrid out-of-order/in-order core microarchitecture. Instructions that are steered towards in-order execution skip register allocation, reordering and dynamic scheduling. At the same time, these instructions can interleave on an instruction-by-instruction basis with instructions that continue to benefit from these conventional out-of-order mechanisms. Secondly, this dissertation revisits a hybrid technique introduced for L1 caches, way-prediction, in the context of last-level caches that are larger, have higher associativity, and experience less locality.PhDComputer Science and EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/113484/1/sleimanf_1.pd

Deep Blue Documents at the University of Michigan

Hill-climbing SMT processor resource distribution

Author: Donald Yeung
Dorai G. K.
El-Moursy A.
Goncalves R.
Luo K.
Luo K.
Madon D.
Marr D. T.
Raasch S. E.
Raasch S. E.
Seungryul Choi
Sherwood T.
Tullsen D. M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date
Field of study

Crossref