29 research outputs found

    McSimA+: A Manycore Simulator with Application-level+ Simulation and Detailed Microarchitecture Modeling

    Get PDF
    Abstract: With their significant performance and energy advantages, emerging manycore processors have also brought new challenges to the architecture research community. Manycore processors are highly integrated complex system-on-chips with complicated core and uncore subsystems. The core subsystems can consist of a large number of traditional and asymmetric cores. The uncore subsystems have also become unprecedentedly powerful and complex with deeper cache hierarchies, advanced on-chip interconnects, and high-performance memory controllers. In order to conduct research for emerging manycore processor systems, a microarchitecture-level and cycle-level manycore simulation infrastructure is needed. This paper introduces McSimA+, a new timing simulation infrastructure, to meet these needs. McSimA+ models x86-based asymmetric manycore microarchitectures in detail for both core and uncore subsystems, including a full spectrum of asymmetric cores from single-threaded to multithreaded and from in-order to out-of-order, sophisticated cache hierarchies, coherence hardware, on-chip interconnects, memory controllers, and main memory. McSimA+ is an application-level+ simulator, offering a middle ground between a full-system simulator and an application-level simulator. Therefore, it enjoys the light weight of an application-level simulator and the full control of threads and processes as in a full-system simulator. This paper also explores an asymmetric clustered manycore architecture that can reduce the thread migration cost to achieve a noticeable performance improvement compared to a state-of-the-art asymmetric manycore architecture.
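
    A minimal sketch of the kind of cycle-level timing loop such a simulator is built around appears below; it is illustrative only (not McSimA+ code), with all class names, latencies, and the single-request workload assumed for the example.

```python
# A minimal sketch (not McSimA+ code) of a cycle-level timing loop: each modeled
# component is ticked once per simulated cycle, and memory requests flow from the
# core-side L1 toward the memory controller. Latencies and the one-request
# workload are illustrative assumptions; hit/miss and multi-core arbitration
# logic are elided.
from collections import deque

class Component:
    def __init__(self, name, latency, downstream=None):
        self.name = name
        self.latency = latency        # fixed service latency in cycles (assumed)
        self.downstream = downstream  # next level, e.g. L1 -> L2 -> memory
        self.queue = deque()          # (ready_cycle, request) pairs, FIFO order

    def accept(self, cycle, request):
        self.queue.append((cycle + self.latency, request))

    def tick(self, cycle, completed):
        # Retire every request whose service latency has elapsed this cycle.
        while self.queue and self.queue[0][0] <= cycle:
            _, request = self.queue.popleft()
            if self.downstream:       # forward to the next level (misses assumed)
                self.downstream.accept(cycle, request)
            else:                     # reached the memory controller: done
                completed.append(request)

# A tiny one-core core/uncore chain: L1 -> shared L2 -> memory controller.
mem = Component("mem_ctrl", latency=100)
l2 = Component("L2", latency=12, downstream=mem)
l1 = Component("L1", latency=2, downstream=l2)

completed = []
l1.accept(0, "load A")                # the core issues one load at cycle 0
for cycle in range(200):              # advance the whole model cycle by cycle
    for comp in (l1, l2, mem):
        comp.tick(cycle, completed)
print(completed)                      # -> ['load A'] after all latencies elapse
```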

    An Overview of Chip Multi-Processors Simulators Technology

    Full text link
    Computer System Architecture (CSA) simulators are generally used to develop and validate new CSA designs and developments. The goal of this paper is to provide an insight into the importance of CSA simulation and the possible criteria that differentiate between various CSA simulators. Multi-dimensional aspects determine the taxonomy of CSA simulators, including their accuracy, performance, functionality, and flexibility. The Sniper simulator has been selected for a closer look and testing. Sniper proves its ability to scale to a hundred cores with a wide range of functionality and performance. © Springer International Publishing Switzerland 2015

    A Comparison of x86 Computer Architecture Simulators

    Get PDF
    The significance of computer architecture simulators in advancing computer architecture research is widely acknowledged. Computer architects have developed numerous simulators in the past few decades and their number continues to rise. This paper explores different simulation techniques and surveys many simulators. Comparing simulators with each other and validating their correctness has been a challenging task. In this paper, we compare and contrast x86 simulators in terms of flexibility, level of detail, user friendliness, and simulation models. In addition, we measure the experimental error and compare the speed of four contemporary x86 simulators: gem5, Sniper, Multi2sim, and PTLsim. We also discuss the strengths and limitations of the different simulators. We believe that this paper provides insights into different simulation strategies and aims to help computer architects understand the differences among the existing simulation tools.

    Energy Demand Response for High-Performance Computing Systems

    Get PDF
    The growing computational demand of scientific applications has greatly motivated the development of large-scale high-performance computing (HPC) systems in the past decade. To accommodate the increasing demand of applications, HPC systems have been going through dramatic architectural changes (e.g., the introduction of many-core and multi-core systems and the rapid growth of complex interconnection networks for efficient communication between thousands of nodes), as well as a significant increase in size (e.g., modern supercomputers consist of hundreds of thousands of nodes). With such changes in architecture and size, the energy consumption of these systems has increased significantly. With the advent of exascale supercomputers in the next few years, power consumption of HPC systems will increase further; some systems may even consume hundreds of megawatts of electricity. Demand response programs are designed to help energy service providers stabilize the power system by reducing the energy consumption of participating systems during periods of high power demand or temporary shortages in power supply. This dissertation focuses on developing energy-efficient demand-response models and algorithms to enable HPC systems' demand response participation. In the first part, we present interconnection network models for performance prediction of large-scale HPC applications. They are based on interconnect topologies widely used in HPC systems: dragonfly, torus, and fat-tree. Our interconnect models are fully integrated with an implementation of the message-passing interface (MPI) that can mimic most of its functions with packet-level accuracy. Extensive experiments show that our integrated models provide good accuracy for predicting network behavior, while at the same time allowing for good parallel scaling performance. In the second part, we present an energy-efficient demand-response model to reduce HPC systems' energy consumption during demand response periods. We propose HPC job scheduling and resource provisioning schemes to enable the HPC system's participation in emergency demand response. In the final part, we propose an economic demand-response model to allow both the HPC operator and HPC users to jointly reduce the HPC system's energy cost. Our proposed model allows the participation of HPC systems in economic demand-response programs through a contract-based rewarding scheme that can incentivize HPC users to participate in demand response.
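
    As a concrete illustration of the demand-response idea, the sketch below shows a toy scheduler that keeps the running jobs' total power draw under a temporary cap; it is a hedged example rather than the dissertation's algorithm, and the job attributes, the priority-per-kilowatt heuristic, and the cap value are assumptions.

```python
# A hedged sketch (not the dissertation's algorithm) of emergency demand
# response for an HPC system: during a demand-response event the scheduler
# keeps the total power draw of running jobs under a temporary cap, greedily
# preferring jobs with the best priority per kilowatt. Job fields, the
# heuristic, and the cap value are assumptions made for this example.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    power_kw: float   # estimated power draw while running
    priority: float   # site-specific importance score

def select_jobs(queue, power_cap_kw):
    """Greedily choose which queued jobs may run under the demand-response cap."""
    chosen, used_kw = [], 0.0
    # Highest priority per kW first; a production scheduler would also weigh
    # deadlines, checkpoint cost, and node placement.
    for job in sorted(queue, key=lambda j: j.priority / j.power_kw, reverse=True):
        if used_kw + job.power_kw <= power_cap_kw:
            chosen.append(job)
            used_kw += job.power_kw
    return chosen, used_kw

if __name__ == "__main__":
    queue = [Job("climate_sim", 400, 9.0), Job("cfd_sweep", 250, 5.0),
             Job("ml_train", 300, 8.0), Job("post_proc", 80, 2.0)]
    running, draw = select_jobs(queue, power_cap_kw=700)  # cap during the event
    print([j.name for j in running], draw)  # jobs kept running and their draw
```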

    A Study on Performance Sensitivity to the Partitioning Policy of a Shared Last-Level Cache Using Cache Partitioning Technology

    Get PDF
    Master's thesis, Graduate School of Convergence Science and Technology, Seoul National University, February 2018. Advisor: Jung Ho Ahn. When multiple applications run concurrently on a multi-core system, contention and interference in shared system resources (the shared cache, main memory, and so on) can degrade the performance of some or all of the applications, and this degradation is very hard to predict quantitatively. In particular, when several applications divide a limited shared-cache capacity through contention, a real-time or execution-time-critical application can have its cache occupancy excessively taken away by other applications and suffer severe slowdowns. To prevent such situations in the shared cache, an appropriate amount of cache capacity can be allocated exclusively to high-priority applications; such schemes were studied and evaluated extensively before appearing in real products. Starting with the Xeon processor v3 family, Intel has shipped Cache Allocation Technology (CAT), which can partition and allocate the shared last-level cache per application. The cache is partitioned at the granularity of groups of applications that share the same priority, and the partitioning method supports both isolated and overlapped modes. Because overlapped partitioning can assign the entire cache area to a high-priority application, it can be expected to maximize that application's performance better than isolated partitioning; however, the opposite of this intuitive expectation can also occur. This study compares the performance of isolated and overlapped partitioning through hardware experiments when CAT is used to maximize the performance of a target application, and identifies the causes of the differences with an analysis program and simulation. Isolated partitioning showed a performance advantage (~12%) in 14 of 20 application-pair combinations, while overlapped partitioning showed a performance advantage (~16%) in the remaining combinations. When isolated partitioning was superior, we confirmed that overlapped partitioning suffered from excessive cache misses caused by competition between applications in the shared region. Simulation reproduced this behavior and confirmed that the additional misses arise because the cache replacement policy (for example, least recently used) cannot be applied properly under overlapped partitioning. Contents: Chapter 1 Introduction (motivation; related work and background; research content); Chapter 2 Performance comparison of isolated and overlapped partitioning (isolated and overlapped partitioning setup; hardware experiment environment and configuration; standalone application results; comparative analysis of isolated and overlapped partitioning results); Chapter 3 Analysis of the causes of overlapped-partitioning performance degradation (relationship between the cache replacement policy and overlapped-partitioning performance; simulation environment and configuration; validation results); Chapter 4 Conclusion; References; Abstract.
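
    To make the distinction concrete, the sketch below expresses isolated and overlapped partitions as CAT capacity bitmasks, for example as they could be applied through the Linux resctrl interface; it is not from the thesis, and the mask width, way split, and group names are assumptions for illustration.

```python
# Illustrative sketch (not from the thesis): expressing isolated vs. overlapped
# last-level-cache partitions as Intel CAT capacity bitmasks and writing them
# through the Linux resctrl interface. The 20-bit mask width, the 14/6-way
# split, the group names, and the existence of the resctrl groups are all
# assumptions; real mask widths and schemata contents are CPU- and
# kernel-specific.
import os

CBM_BITS = 20                    # assumed capacity-bitmask width (CPU-specific)
RESCTRL = "/sys/fs/resctrl"      # assumed resctrl mount point

def mask(ways, shift=0):
    """Contiguous capacity bitmask covering `ways` ways starting at `shift`."""
    return ((1 << ways) - 1) << shift

def write_schemata(group, cbm):
    """Write an L3 capacity bitmask for one resource group (cache id 0 assumed)."""
    with open(os.path.join(RESCTRL, group, "schemata"), "w") as f:
        f.write(f"L3:0={cbm:x}\n")

# Isolated partitioning: disjoint ways for the two priority classes.
hi_isolated = mask(14, shift=6)  # ways 6..19 reserved for the high-priority app
lo_isolated = mask(6)            # ways 0..5 for the low-priority app

# Overlapped partitioning: the high-priority class may fill every way, while
# the low-priority class stays confined to the small region it shares with it.
hi_overlapped = mask(CBM_BITS)   # all 20 ways
lo_overlapped = mask(6)          # ways 0..5, overlapping the high-priority mask

if __name__ == "__main__":
    # Apply the isolated configuration (assumes resctrl is mounted and the
    # "hi_prio" and "lo_prio" groups were created beforehand).
    write_schemata("hi_prio", hi_isolated)
    write_schemata("lo_prio", lo_isolated)
```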

    DRackSim: Simulator for Rack-scale Memory Disaggregation

    Full text link
    Memory disaggregation has emerged as an alternative to traditional server architecture in data centers. This paper introduces DRackSim, a simulation infrastructure to model rack-scale hardware-disaggregated memory. DRackSim models multiple compute nodes, memory pools, and a rack-scale interconnect similar to GenZ. An application-level simulation approach simulates an x86 out-of-order multi-core processor with a multi-level cache hierarchy at the compute nodes. A queue-based simulation is used to model the remote memory controller and rack-level interconnect, which allows both cache-based and page-based access to remote memory. DRackSim models a central memory manager to manage the address space of the memory pools. We integrate the community-accepted DRAMSim2 to perform memory simulation at local and remote memory using multiple DRAMSim2 instances. An incremental approach is followed to validate the core and cache subsystems of DRackSim against those of Gem5. We measure the performance of various HPC workloads and show the performance impact for different node/pool configurations.
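
    The queue-based remote-access model can be pictured as in the short sketch below: a request pays the rack-interconnect latency in each direction plus queueing and service delay at the remote memory controller. This is an illustration rather than DRackSim source, with assumed latency constants.

```python
# An illustrative sketch (not DRackSim source) of the queue-based view of a
# remote memory access described above: each request pays the rack-interconnect
# latency in both directions plus queueing and service delay at the remote
# memory controller. Latency constants and names are assumptions.
LINK_NS = 300      # one-way rack-interconnect latency in ns (assumed)
SERVICE_NS = 50    # remote memory controller service time per request (assumed)

def remote_access_latencies(arrival_times_ns):
    """Round-trip latency of each access to a remote memory pool, in order."""
    latencies = []
    controller_free_at = 0
    for arrival in sorted(arrival_times_ns):
        at_controller = arrival + LINK_NS               # cross the rack fabric
        start = max(at_controller, controller_free_at)  # wait behind earlier requests
        controller_free_at = start + SERVICE_NS         # single-server controller
        done = controller_free_at + LINK_NS             # response crosses back
        latencies.append(done - arrival)
    return latencies

# Closely spaced requests see growing queueing delay at the remote controller.
print(remote_access_latencies([0, 10, 20]))  # -> [650, 690, 730]
```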