29 research outputs found

    McSimA+: A Manycore Simulator with Application-level+ Simulation and Detailed Microarchitecture Modeling

    Get PDF
    Abstract: With their significant performance and energy advantages, emerging manycore processors have also brought new challenges to the architecture research community. Manycore processors are highly integrated complex system-on-chips with complicated core and uncore subsystems. The core subsystems can consist of a large number of traditional and asymmetric cores. The uncore subsystems have also become unprecedentedly powerful and complex with deeper cache hierarchies, advanced on-chip interconnects, and high-performance memory controllers. In order to conduct research for emerging manycore processor systems, a microarchitecture-level and cycle-level manycore simulation infrastructure is needed. This paper introduces McSimA+, a new timing simulation infrastructure, to meet these needs. McSimA+ models x86-based asymmetric manycore microarchitectures in detail for both core and uncore subsystems, including a full spectrum of asymmetric cores from single-threaded to multithreaded and from in-order to out-of-order, sophisticated cache hierarchies, coherence hardware, on-chip interconnects, memory controllers, and main memory. McSimA+ is an application-level+ simulator, offering a middle ground between a full-system simulator and an application-level simulator. Therefore, it enjoys the light weight of an application-level simulator and the full control of threads and processes as in a full-system simulator. This paper also explores an asymmetric clustered manycore architecture that can reduce the thread migration cost to achieve a noticeable performance improvement compared to a state-of-the-art asymmetric manycore architecture.
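
    A minimal sketch of the kind of cycle-level timing loop such a simulator is built around appears below; it is illustrative only (not McSimA+ code), with all class names, latencies, and the single-request workload assumed for the example.

```python
# A minimal sketch (not McSimA+ code) of a cycle-level timing loop: each modeled
# component is ticked once per simulated cycle, and memory requests flow from the
# core-side L1 toward the memory controller. Latencies and the one-request
# workload are illustrative assumptions; hit/miss and multi-core arbitration
# logic are elided.
from collections import deque

class Component:
    def __init__(self, name, latency, downstream=None):
        self.name = name
        self.latency = latency        # fixed service latency in cycles (assumed)
        self.downstream = downstream  # next level, e.g. L1 -> L2 -> memory
        self.queue = deque()          # (ready_cycle, request) pairs, FIFO order

    def accept(self, cycle, request):
        self.queue.append((cycle + self.latency, request))

    def tick(self, cycle, completed):
        # Retire every request whose service latency has elapsed this cycle.
        while self.queue and self.queue[0][0] <= cycle:
            _, request = self.queue.popleft()
            if self.downstream:       # forward to the next level (misses assumed)
                self.downstream.accept(cycle, request)
            else:                     # reached the memory controller: done
                completed.append(request)

# A tiny one-core core/uncore chain: L1 -> shared L2 -> memory controller.
mem = Component("mem_ctrl", latency=100)
l2 = Component("L2", latency=12, downstream=mem)
l1 = Component("L1", latency=2, downstream=l2)

completed = []
l1.accept(0, "load A")                # the core issues one load at cycle 0
for cycle in range(200):              # advance the whole model cycle by cycle
    for comp in (l1, l2, mem):
        comp.tick(cycle, completed)
print(completed)                      # -> ['load A'] after all latencies elapse
```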

    An Overview of Chip Multi-Processors Simulators Technology

    Full text link
    Computer System Architecture (CSA) simulators are generally used to develop and validate new CSA designs and developments. The goal of this paper is to provide an insight into the importance of CSA simulation and the possible criteria that differentiate between various CSA simulators. Multi-dimensional aspects determine the taxonomy of CSA simulators, including their accuracy, performance, functionality, and flexibility. The Sniper simulator has been selected for a closer look and testing. Sniper proves its ability to scale to a hundred cores with a wide range of functionality and performance. © Springer International Publishing Switzerland 2015

    A Comparison of x86 Computer Architecture Simulators

    Get PDF
    The significance of computer architecture simulators in advancing computer architecture research is widely acknowledged. Computer architects have developed numerous simulators in the past few decades and their number continues to rise. This paper explores different simulation techniques and surveys many simulators. Comparing simulators with each other and validating their correctness has been a challenging task. In this paper, we compare and contrast x86 simulators in terms of flexibility, level of detail, user friendliness, and simulation models. In addition, we measure the experimental error and compare the speed of four contemporary x86 simulators: gem5, Sniper, Multi2sim, and PTLsim. We also discuss the strengths and limitations of the different simulators. We believe that this paper provides insights into different simulation strategies and aims to help computer architects understand the differences among the existing simulation tools.

    Energy Demand Response for High-Performance Computing Systems

    Get PDF
    The growing computational demand of scientific applications has greatly motivated the development of large-scale high-performance computing (HPC) systems in the past decade. To accommodate the increasing demand of applications, HPC systems have been going through dramatic architectural changes (e.g., the introduction of many-core and multi-core systems and the rapid growth of complex interconnection networks for efficient communication between thousands of nodes), as well as a significant increase in size (e.g., modern supercomputers consist of hundreds of thousands of nodes). With such changes in architecture and size, the energy consumption of these systems has increased significantly. With the advent of exascale supercomputers in the next few years, power consumption of HPC systems will increase further; some systems may even consume hundreds of megawatts of electricity. Demand response programs are designed to help energy service providers stabilize the power system by reducing the energy consumption of participating systems during periods of high power demand or temporary shortages in power supply. This dissertation focuses on developing energy-efficient demand-response models and algorithms to enable HPC systems' demand response participation. In the first part, we present interconnection network models for performance prediction of large-scale HPC applications. They are based on interconnect topologies widely used in HPC systems: dragonfly, torus, and fat-tree. Our interconnect models are fully integrated with an implementation of the message-passing interface (MPI) that can mimic most of its functions with packet-level accuracy. Extensive experiments show that our integrated models provide good accuracy for predicting network behavior, while at the same time allowing for good parallel scaling performance. In the second part, we present an energy-efficient demand-response model to reduce HPC systems' energy consumption during demand response periods. We propose HPC job scheduling and resource provisioning schemes to enable the HPC system's participation in emergency demand response. In the final part, we propose an economic demand-response model to allow both the HPC operator and HPC users to jointly reduce the HPC system's energy cost. Our proposed model allows the participation of HPC systems in economic demand-response programs through a contract-based rewarding scheme that can incentivize HPC users to participate in demand response.
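
    As a concrete illustration of the demand-response idea, the sketch below shows a toy scheduler that keeps the running jobs' total power draw under a temporary cap; it is a hedged example rather than the dissertation's algorithm, and the job attributes, the priority-per-kilowatt heuristic, and the cap value are assumptions.

```python
# A hedged sketch (not the dissertation's algorithm) of emergency demand
# response for an HPC system: during a demand-response event the scheduler
# keeps the total power draw of running jobs under a temporary cap, greedily
# preferring jobs with the best priority per kilowatt. Job fields, the
# heuristic, and the cap value are assumptions made for this example.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    power_kw: float   # estimated power draw while running
    priority: float   # site-specific importance score

def select_jobs(queue, power_cap_kw):
    """Greedily choose which queued jobs may run under the demand-response cap."""
    chosen, used_kw = [], 0.0
    # Highest priority per kW first; a production scheduler would also weigh
    # deadlines, checkpoint cost, and node placement.
    for job in sorted(queue, key=lambda j: j.priority / j.power_kw, reverse=True):
        if used_kw + job.power_kw <= power_cap_kw:
            chosen.append(job)
            used_kw += job.power_kw
    return chosen, used_kw

if __name__ == "__main__":
    queue = [Job("climate_sim", 400, 9.0), Job("cfd_sweep", 250, 5.0),
             Job("ml_train", 300, 8.0), Job("post_proc", 80, 2.0)]
    running, draw = select_jobs(queue, power_cap_kw=700)  # cap during the event
    print([j.name for j in running], draw)  # jobs kept running and their draw
```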

    A Study on Performance Sensitivity to the Partitioning Policy of a Shared Last-Level Cache Using Cache Partitioning Technology

    Get PDF
    Master's thesis, Graduate School of Convergence Science and Technology, Seoul National University, February 2018. Advisor: Jung Ho Ahn. When multiple applications run concurrently on a multi-core system, contention and interference in shared system resources (the shared cache, main memory, and so on) can degrade the performance of some or all of the applications, and this degradation is very hard to predict quantitatively. In particular, when several applications divide a limited shared-cache capacity through contention, a real-time or execution-time-critical application can have its cache occupancy excessively taken away by other applications and suffer severe slowdowns. To prevent such situations in the shared cache, an appropriate amount of cache capacity can be allocated exclusively to high-priority applications; such schemes were studied and evaluated extensively before appearing in real products. Starting with the Xeon processor v3 family, Intel has shipped Cache Allocation Technology (CAT), which can partition and allocate the shared last-level cache per application. The cache is partitioned at the granularity of groups of applications that share the same priority, and the partitioning method supports both isolated and overlapped modes. Because overlapped partitioning can assign the entire cache area to a high-priority application, it can be expected to maximize that application's performance better than isolated partitioning; however, the opposite of this intuitive expectation can also occur. This study compares the performance of isolated and overlapped partitioning through hardware experiments when CAT is used to maximize the performance of a target application, and identifies the causes of the differences with an analysis program and simulation. Isolated partitioning showed a performance advantage (~12%) in 14 of 20 application-pair combinations, while overlapped partitioning showed a performance advantage (~16%) in the remaining combinations. When isolated partitioning was superior, we confirmed that overlapped partitioning suffered from excessive cache misses caused by competition between applications in the shared region. Simulation reproduced this behavior and confirmed that the additional misses arise because the cache replacement policy (for example, least recently used) cannot be applied properly under overlapped partitioning. Contents: Chapter 1 Introduction (motivation; related work and background; research content); Chapter 2 Performance comparison of isolated and overlapped partitioning (isolated and overlapped partitioning setup; hardware experiment environment and configuration; standalone application results; comparative analysis of isolated and overlapped partitioning results); Chapter 3 Analysis of the causes of overlapped-partitioning performance degradation (relationship between the cache replacement policy and overlapped-partitioning performance; simulation environment and configuration; validation results); Chapter 4 Conclusion; References; Abstract.
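
    To make the distinction concrete, the sketch below expresses isolated and overlapped partitions as CAT capacity bitmasks, for example as they could be applied through the Linux resctrl interface; it is not from the thesis, and the mask width, way split, and group names are assumptions for illustration.

```python
# Illustrative sketch (not from the thesis): expressing isolated vs. overlapped
# last-level-cache partitions as Intel CAT capacity bitmasks and writing them
# through the Linux resctrl interface. The 20-bit mask width, the 14/6-way
# split, the group names, and the existence of the resctrl groups are all
# assumptions; real mask widths and schemata contents are CPU- and
# kernel-specific.
import os

CBM_BITS = 20                    # assumed capacity-bitmask width (CPU-specific)
RESCTRL = "/sys/fs/resctrl"      # assumed resctrl mount point

def mask(ways, shift=0):
    """Contiguous capacity bitmask covering `ways` ways starting at `shift`."""
    return ((1 << ways) - 1) << shift

def write_schemata(group, cbm):
    """Write an L3 capacity bitmask for one resource group (cache id 0 assumed)."""
    with open(os.path.join(RESCTRL, group, "schemata"), "w") as f:
        f.write(f"L3:0={cbm:x}\n")

# Isolated partitioning: disjoint ways for the two priority classes.
hi_isolated = mask(14, shift=6)  # ways 6..19 reserved for the high-priority app
lo_isolated = mask(6)            # ways 0..5 for the low-priority app

# Overlapped partitioning: the high-priority class may fill every way, while
# the low-priority class stays confined to the small region it shares with it.
hi_overlapped = mask(CBM_BITS)   # all 20 ways
lo_overlapped = mask(6)          # ways 0..5, overlapping the high-priority mask

if __name__ == "__main__":
    # Apply the isolated configuration (assumes resctrl is mounted and the
    # "hi_prio" and "lo_prio" groups were created beforehand).
    write_schemata("hi_prio", hi_isolated)
    write_schemata("lo_prio", lo_isolated)
```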

    DRackSim: Simulator for Rack-scale Memory Disaggregation

    Full text link
    Memory disaggregation has emerged as an alternative to traditional server architecture in data centers. This paper introduces DRackSim, a simulation infrastructure to model rack-scale hardware-disaggregated memory. DRackSim models multiple compute nodes, memory pools, and a rack-scale interconnect similar to GenZ. An application-level simulation approach simulates an x86 out-of-order multi-core processor with a multi-level cache hierarchy at the compute nodes. A queue-based simulation is used to model the remote memory controller and rack-level interconnect, which allows both cache-based and page-based access to remote memory. DRackSim models a central memory manager to manage the address space of the memory pools. We integrate the community-accepted DRAMSim2 to perform memory simulation at local and remote memory using multiple DRAMSim2 instances. An incremental approach is followed to validate the core and cache subsystems of DRackSim against those of Gem5. We measure the performance of various HPC workloads and show the performance impact for different node/pool configurations.
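
    The queue-based remote-access model can be pictured as in the short sketch below: a request pays the rack-interconnect latency in each direction plus queueing and service delay at the remote memory controller. This is an illustration rather than DRackSim source, with assumed latency constants.

```python
# An illustrative sketch (not DRackSim source) of the queue-based view of a
# remote memory access described above: each request pays the rack-interconnect
# latency in both directions plus queueing and service delay at the remote
# memory controller. Latency constants and names are assumptions.
LINK_NS = 300      # one-way rack-interconnect latency in ns (assumed)
SERVICE_NS = 50    # remote memory controller service time per request (assumed)

def remote_access_latencies(arrival_times_ns):
    """Round-trip latency of each access to a remote memory pool, in order."""
    latencies = []
    controller_free_at = 0
    for arrival in sorted(arrival_times_ns):
        at_controller = arrival + LINK_NS               # cross the rack fabric
        start = max(at_controller, controller_free_at)  # wait behind earlier requests
        controller_free_at = start + SERVICE_NS         # single-server controller
        done = controller_free_at + LINK_NS             # response crosses back
        latencies.append(done - arrival)
    return latencies

# Closely spaced requests see growing queueing delay at the remote controller.
print(remote_access_latencies([0, 10, 20]))  # -> [650, 690, 730]
```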