Search CORE

254 research outputs found

Dynamic thread and data mapping for NoC based CMPs

Author: Kandemir M.
Muralidhara S.P.
Ozturk O.
Publication venue
Publication date: 01/01/2009
Field of study

Thread mapping and data mapping are two important problems in the context of NoC (network-on-chip) based CMPs (chip multiprocessors). While a compiler can determine suitable mappings for data and threads, such static mappings may not work well for multithreaded applications that go through different execution phases during their execution, each phase with potentially different data access patterns than others. Instead, a dynamic mapping strategy, if its overheads can be kept low, may be a more promising option. In this work, we present dynamic (runtime) thread and data mappings for NoC based CMPs. The goal of these mappings is to reduce the distance between the location of the core that requests data and the core whose local memory contains that requested data. In our experiments, we evaluate our proposed thread mapping and data mapping in isolation as well as in an integrated manner. Copyright 2009 ACM

Crossref

Bilkent University Institutional Repository

A Survey of Prediction and Classification Techniques in Multicore Processor Systems

Author: Ababei Cristinel
Moghaddam Milad Ghorbani
Publication venue: e-Publications@Marquette
Publication date: 01/05/2019
Field of study

In multicore processor systems, being able to accurately predict the future provides new optimization opportunities, which otherwise could not be exploited. For example, an oracle able to predict a certain application\u27s behavior running on a smart phone could direct the power manager to switch to appropriate dynamic voltage and frequency scaling modes that would guarantee minimum levels of desired performance while saving energy consumption and thereby prolonging battery life. Using predictions enables systems to become proactive rather than continue to operate in a reactive manner. This prediction-based proactive approach has become increasingly popular in the design and optimization of integrated circuits and of multicore processor systems. Prediction transforms from simple forecasting to sophisticated machine learning based prediction and classification that learns from existing data, employs data mining, and predicts future behavior. This can be exploited by novel optimization techniques that can span across all layers of the computing stack. In this survey paper, we present a discussion of the most popular techniques on prediction and classification in the general context of computing systems with emphasis on multicore processors. The paper is far from comprehensive, but, it will help the reader interested in employing prediction in optimization of multicore processor systems

epublications@Marquette

Cache Equalizer: A Cache Pressure Aware Block Placement Scheme for Large-Scale Chip Multiprocessors

Author: Cho Sangyeun
Hammoud Mohammad
Melhem Rami
Publication venue: Department of Computer Science, University of Pittsburgh
Publication date: 01/01/2009
Field of study

This paper describes Cache Equalizer (CE), a novel distributed cache management scheme for large scale chip multiprocessors (CMPs). Our work is motivated by large asymmetry in cache sets usages. CE decouples the physical locations of cache blocks from their addresses for the sake of reducing misses caused by destructive interferences. Temporal pressure at the on-chip last-level cache, is continuously collected at a group (comprised of cache sets) granularity, and periodically recorded at the memory controller to guide the placement process. An incoming block is consequently placed at a cache group that exhibits the minimum pressure. CE provides Quality of Service (QoS) by robustly offering better performance than the baseline shared NUCA cache. Simulation results using a full-system simulator demonstrate that CE outperforms shared NUCA caches by an average of 15.5% and by as much as 28.5% for the benchmark programs we examined. Furthermore, evaluations manifested the outperformance of CE versus related CMP cache designs

D-Scholarship@Pitt

A survey of system level power management schemes in the dark-silicon era for many-core architectures

Author: Ofori-Attah Emmanuel
Opoku Agyeman Michael
Wang Xiaohang
Publication venue: 'European Alliance for Innovation n.o.'
Publication date: 19/09/2018
Field of study

Power consumption in Complementary Metal Oxide Semiconductor (CMOS) technology has escalated to a point that only a fractional part of many-core chips can be powered-on at a time. Fortunately, this fraction can be increased at the expense of performance through the dark-silicon solution. However, with many-core integration set to be heading towards its thousands, power consumption and temperature increases per time, meaning the number of active nodes must be reduced drastically. Therefore, optimized techniques are demanded for continuous advancement in technology. Existing eﬀorts try to overcome this challenge by activating nodes from diﬀerent parts of the chip at the expense of communication latency. Other eﬀorts on the other hand employ run-time power management techniques to manage the power performance of the cores trading-oﬀ performance for power. We found out that, for a signiﬁcant amount of power to saved and high temperature to be avoided, focus should be on reducing the power consumption of all the on-chip components. Especially, the memory hierarchy and the interconnect. Power consumption can be minimized by, reducing the size of high leakage power dissipating elements, turning-oﬀ idle resources and integrating power saving materials

Crossref

University of Northampton's Research Explorer

NECTAR

Jigsaw: Scalable software-defined caches

Author: Beckmann Nathan Zachary
Sanchez Daniel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/09/2013
Field of study

Shared last-level caches, widely used in chip-multi-processors (CMPs), face two fundamental limitations. First, the latency and energy of shared caches degrade as the system scales up. Second, when multiple workloads share the CMP, they suffer from interference in shared cache accesses. Unfortunately, prior research addressing one issue either ignores or worsens the other: NUCA techniques reduce access latency but are prone to hotspots and interference, and cache partitioning techniques only provide isolation but do not reduce access latency.United States. Defense Advanced Research Projects Agency (DARPA PERFECT contract HR0011-13-2-0005)Quanta Computer (Firm

DSpace@MIT

Crossref

Addressing Manufacturing Challenges in NoC-based ULSI Designs

Author: Hernández Luz Carles
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 19/07/2012
Field of study

Hernández Luz, C. (2012). Addressing Manufacturing Challenges in NoC-based ULSI Designs [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/1669

RiuNet

Hardware/Software Co-design for Multicore Architectures

Author: Xu Thomas Canhao
Publication venue: Turku Centre for Computer Science
Publication date: 12/09/2012
Field of study

Siirretty Doriast

UTUPub