230 research outputs found

    A Survey of Prediction and Classification Techniques in Multicore Processor Systems

    In multicore processor systems, being able to accurately predict the future provides new optimization opportunities that could not otherwise be exploited. For example, an oracle able to predict a certain application's behavior on a smartphone could direct the power manager to switch to appropriate dynamic voltage and frequency scaling (DVFS) modes that guarantee minimum levels of desired performance while saving energy and thereby prolonging battery life. Using predictions enables systems to become proactive rather than continuing to operate in a reactive manner. This prediction-based proactive approach has become increasingly popular in the design and optimization of integrated circuits and of multicore processor systems. Prediction has evolved from simple forecasting to sophisticated machine-learning-based prediction and classification that learns from existing data, employs data mining, and predicts future behavior, which novel optimization techniques spanning all layers of the computing stack can exploit. In this survey paper, we present a discussion of the most popular techniques for prediction and classification in the general context of computing systems, with emphasis on multicore processors. The paper is far from comprehensive, but it will help readers interested in employing prediction in the optimization of multicore processor systems.
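
    As a concrete illustration of the prediction-driven DVFS idea in this abstract, below is a minimal sketch assuming a hypothetical set of P-states and a simple exponentially weighted moving average (EWMA) load predictor; the surveyed work covers far more sophisticated machine-learning predictors.

```python
# Minimal sketch of prediction-driven DVFS (hypothetical interface).
# An EWMA predictor forecasts the next interval's CPU load, and the
# governor picks the lowest frequency expected to cover that demand.

FREQS_MHZ = [600, 1200, 1800, 2400]  # assumed available P-states
ALPHA = 0.5                          # EWMA smoothing factor

class LoadPredictor:
    def __init__(self):
        self.estimate = 0.0

    def update(self, observed_load: float) -> float:
        """Fold the last observed load (0..1) into the forecast."""
        self.estimate = ALPHA * observed_load + (1 - ALPHA) * self.estimate
        return self.estimate

def pick_frequency(predicted_load: float) -> int:
    """Choose the lowest frequency whose capacity covers predicted demand.

    Demand is expressed relative to the fastest state; running slower
    saves energy as long as the predicted demand still fits.
    """
    demand_mhz = predicted_load * FREQS_MHZ[-1]
    for f in FREQS_MHZ:
        if f >= demand_mhz:
            return f
    return FREQS_MHZ[-1]

predictor = LoadPredictor()
for load in [0.20, 0.35, 0.90, 0.85, 0.30]:  # sampled per-interval loads
    forecast = predictor.update(load)
    print(f"load={load:.2f} forecast={forecast:.2f} -> {pick_frequency(forecast)} MHz")
```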

    A Survey of Research into Mixed Criticality Systems

    This survey covers research into mixed criticality systems that has been published since Vestal's seminal paper in 2007, up until the end of 2016. The survey is organised along the lines of the major research areas within this topic. These include single processor analysis (including fixed priority and EDF scheduling, shared resources and static and synchronous scheduling), multiprocessor analysis, realistic models, and systems issues. The survey also explores the relationship between research into mixed criticality systems and other topics such as hard and soft time constraints, fault tolerant scheduling, hierarchical scheduling, cyber physical systems, probabilistic real-time systems, and industrial safety standards.
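
    To ground the terminology, the sketch below encodes the Vestal-style task model that most of the surveyed single-processor analyses build on: each task carries one WCET estimate per criticality level, a LO-mode schedulability test uses the optimistic estimates for all tasks, and a HI-mode test uses the conservative estimates for HI tasks only. This is a simplified illustration of the model with a standard fixed-priority response-time recurrence, not any specific analysis from the survey.

```python
# Sketch of a Vestal-style mixed-criticality task set, checked per mode
# with the classic fixed-priority response-time recurrence.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    period: int      # T (taken as the deadline too, for simplicity)
    wcet_lo: int     # C(LO): optimistic WCET estimate
    wcet_hi: int     # C(HI): conservative WCET estimate
    crit: str        # "LO" or "HI"

def response_time(task, higher_prio, wcet):
    """Iterate R = C + sum ceil(R/T_j) * C_j to a fixed point, or None on a miss."""
    r = wcet(task)
    while True:
        r_next = wcet(task) + sum(-(-r // t.period) * wcet(t) for t in higher_prio)
        if r_next == r:
            return r
        if r_next > task.period:
            return None  # deadline miss
        r = r_next

# Tasks listed in priority order (highest first).
tasks = [
    Task("sensor",  period=10, wcet_lo=2,  wcet_hi=4,  crit="HI"),
    Task("control", period=20, wcet_lo=4,  wcet_hi=8,  crit="HI"),
    Task("logging", period=50, wcet_lo=10, wcet_hi=10, crit="LO"),
]

# LO mode: every task runs, all budgets are C(LO).
for i, t in enumerate(tasks):
    print(f"LO mode: {t.name} R={response_time(t, tasks[:i], lambda x: x.wcet_lo)}")

# HI mode: LO tasks are dropped, HI tasks use C(HI).
hi_tasks = [t for t in tasks if t.crit == "HI"]
for i, t in enumerate(hi_tasks):
    print(f"HI mode: {t.name} R={response_time(t, hi_tasks[:i], lambda x: x.wcet_hi)}")
```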

    Time-predictable Chip-Multiprocessor Design

    Abstract—Real-time systems need time-predictable platforms to enable static worst-case execution time (WCET) analysis. Improving processor performance with superscalar techniques makes static WCET analysis practically impossible. However, most real-time systems are multi-threaded applications, and performance can be improved by using several processor cores on a single chip. In this paper we present a time-predictable chip-multiprocessor system that aims to improve system performance while still enabling WCET analysis. The proposed chip-multiprocessor (CMP) uses a shared memory with time-division multiple access (TDMA) based memory access scheduling. The static TDMA schedule can be integrated into the WCET analysis. Experiments with a JOP-based CMP showed that memory access starts to dominate the execution time when using more than 4 processor cores. To provide better scalability, more local memories have to be used. We add a processor-local scratchpad memory and split data caches, which are still time-predictable, to the processor cores.
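
    The WCET-analysis property of such a TDMA arbiter can be captured in a few lines: under a static slot table, a core's worst-case wait for memory depends only on the schedule, never on what the other cores do. The sketch below computes that bound under assumed slot and access parameters rather than JOP's actual configuration.

```python
# Sketch: worst-case memory-access latency under static TDMA arbitration.
# With one equal-length slot per core per period, the worst case is a
# request arriving just after the core's own slot started: it waits
# (almost) a full period for the next slot, then pays the access itself.
# No behavior of the other cores can make this any worse, which is what
# makes the bound statically analyzable.

def tdma_worst_case_latency(num_cores: int, slot_cycles: int,
                            access_cycles: int) -> int:
    """Safe upper bound (in cycles) on one memory access for any core."""
    assert access_cycles <= slot_cycles  # an access must fit in a slot
    period = num_cores * slot_cycles
    return period + access_cycles

for cores in (2, 4, 8):
    bound = tdma_worst_case_latency(cores, slot_cycles=6, access_cycles=6)
    print(f"{cores} cores: worst-case access latency <= {bound} cycles")
```

    Note how the bound grows linearly with the core count, which matches the abstract's observation that memory access starts to dominate beyond 4 cores and motivates the added local memories.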

    A survey of techniques for reducing interference in real-time applications on multicore platforms

    This survey reviews the scientific literature on techniques for reducing interference in real-time multicore systems, focusing on approaches proposed between 2015 and 2020. It also presents proposals that use interference reduction techniques without considering the predictability issue. The survey highlights interference sources and categorizes proposals from the perspective of the shared resource. It covers techniques for reducing contention in main memory, cache memory, and the memory bus, as well as the integration of interference effects into schedulability analysis. Every section contains an overview of each proposal and an assessment of its advantages and disadvantages. This work was supported in part by the Comunidad de Madrid Government "Nuevas Técnicas de Desarrollo de Software de Tiempo Real Embarcado Para Plataformas MPSoC de Próxima Generación" under Grant IND2019/TIC-17261.
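
    One recurring mechanism among main-memory contention-reduction proposals of this period is budget-based bandwidth regulation in the style of MemGuard: each core receives a per-period budget of memory transactions and is stalled once the budget is exhausted, bounding the interference it can inject into shared DRAM. The sketch below illustrates that control loop; the counter and stall hooks are hypothetical stand-ins for PMU access and scheduler control, not any specific proposal's API.

```python
# Sketch of per-core memory-bandwidth regulation (MemGuard-style).
# Budgets cap the DRAM transactions each core may issue per period.

PERIOD_US = 1000                 # regulation period (replenish interval)
BUDGETS = {0: 4000, 1: 1000}     # core id -> transaction budget per period

used = {core: 0 for core in BUDGETS}
stalled = set()

def on_memory_transaction(core: int) -> None:
    """Called (conceptually) once per counted LLC miss / DRAM transaction."""
    used[core] += 1
    if used[core] >= BUDGETS[core] and core not in stalled:
        stalled.add(core)
        print(f"core {core}: budget exhausted, stalled")   # stall_core(core)

def on_period_boundary() -> None:
    """Timer tick every PERIOD_US: replenish budgets, release stalled cores."""
    for core in BUDGETS:
        used[core] = 0
    for core in list(stalled):
        stalled.discard(core)
        print(f"core {core}: resumed")                     # resume_core(core)

# Demo: a miss-heavy phase on core 1 trips its budget within one period.
for _ in range(1200):
    on_memory_transaction(1)
on_period_boundary()
```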

    Contention in multicore hardware shared resources: Understanding of the state of the art

    The real-time systems community has over the years devoted considerable attention to the impact on execution timing that arises from contention on access to hardware shared resources. The relevance of this problem has been accentuated with the arrival of multicore processors. The state of the art on the subject shows considerable diversity in the understanding of the problem and in the "approach" to solving it. This sparseness makes it difficult for any reader to form a coherent picture of the problem and solution space. This paper draws a tentative taxonomy in which each known approach to the problem can be categorised based on its specific goals and assumptions.

    RT-OpenStack: CPU Resource Management for Real-Time Cloud Computing

    Clouds have become appealing platforms for not only general-purpose applications but also real-time ones. However, current clouds cannot provide real-time performance to virtual machines (VMs). We observe the demand for, and the advantage of, co-hosting real-time (RT) VMs with non-real-time (regular) VMs in the same cloud. RT VMs can benefit from the easily deployed, elastic resource provisioning provided by the cloud, while regular VMs effectively utilize the remaining resources without affecting the performance of RT VMs, through proper resource management at both the cloud and hypervisor levels. This paper presents RT-OpenStack, a cloud CPU resource management system for co-hosting real-time and regular VMs. RT-OpenStack entails three main contributions: (1) integration of a real-time hypervisor (RT-Xen) and a cloud management system (OpenStack) through a real-time resource interface; (2) a real-time VM scheduler that allows regular VMs to share hosts with RT VMs without interfering with the real-time performance of the RT VMs; and (3) a VM-to-host mapping strategy that provisions real-time performance to RT VMs while allowing effective resource sharing with regular VMs. Experimental results demonstrate that RT-OpenStack can effectively improve the real-time performance of RT VMs while allowing regular VMs to fully utilize the remaining CPU resources.
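
    The third contribution, the VM-to-host mapping strategy, can be illustrated with a simple placement filter: an RT VM is admitted to a host only if the host's total admitted RT utilization stays under a schedulability bound, while a regular VM needs only leftover capacity. This is an illustrative reconstruction under assumed data structures and an assumed utilization bound, not RT-OpenStack's actual scheduler code.

```python
# Illustrative sketch of an RT-aware VM-to-host placement filter.
from dataclasses import dataclass

RT_UTIL_BOUND = 0.7   # assumed per-host cap on total RT VM utilization

@dataclass
class Host:
    name: str
    capacity: float         # total normalized CPU capacity
    rt_util: float = 0.0    # utilization reserved by RT VMs
    total_util: float = 0.0 # all admitted utilization

def place(hosts, vm_util: float, is_rt: bool):
    for h in hosts:
        if h.total_util + vm_util > h.capacity:
            continue                          # no room at all
        if is_rt and h.rt_util + vm_util > RT_UTIL_BOUND * h.capacity:
            continue                          # would break the RT bound
        h.total_util += vm_util
        if is_rt:
            h.rt_util += vm_util
        return h.name
    return None  # reject: no feasible host

hosts = [Host("h1", capacity=4.0), Host("h2", capacity=4.0)]
print(place(hosts, 2.5, is_rt=True))    # h1 (under its RT bound of 2.8)
print(place(hosts, 0.5, is_rt=True))    # h2 (h1's RT bound would be exceeded)
print(place(hosts, 1.4, is_rt=False))   # h1 (regular VM fills leftover capacity)
```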

    Last-Level Cache Partitioning through Memory Virtual Channels

    Ph.D. thesis -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, February 2023. Advisor: Jangwoo Kim.
    Ensuring fairness or providing isolation between multiple workloads with distinct characteristics that are collocated on a single shared-memory system is a challenge. Recent multicore processors provide last-level cache (LLC) hardware partitioning to support isolation, with the cache partitioning often specified by the user. While more LLC capacity usually yields higher performance, in this dissertation we identify that a workload allocated more LLC capacity can nevertheless suffer worse performance in real-machine experiments, which we refer to as MiW (more is worse). Through various controlled experiments, we identify that the other workload, given less LLC capacity, incurs more frequent LLC misses; it then stresses the main memory system shared by both workloads and degrades the performance of the former workload even though LLC partitioning is in place (a balloon effect). To resolve this problem, we propose virtualizing the data path of the main memory controllers and dedicating memory virtual channels (mVCs) to each group of applications grouped for LLC partitioning. mVCs can further fine-tune group performance by differentiating buffer sizes among channels, and they can reduce total system cost by letting latency-critical and throughput-oriented workloads run together on shared machines whose performance criteria would otherwise be achievable only on dedicated machines. Experiments on a simulated chip multiprocessor show that our proposals effectively eliminate the MiW phenomenon, providing additional opportunities for workload consolidation in a datacenter. Our case study demonstrates a potential 21.8% reduction in machine count with mVC, in a consolidation that would otherwise violate a service-level objective (SLO).
    Contents:
    1. Introduction (1.1 Research Contributions; 1.2 Outline)
    2. Background (2.1 Cache Hierarchy and Policies; 2.2 Cache Partitioning; 2.3 Benchmarks: Working Set Size, Top-down Analysis, Profiling Tools)
    3. More-is-Worse Phenomenon (3.1 More LLC Leading to Performance Drop; 3.2 Synthetic Workload Evaluation; 3.3 Impact on Latency-critical Workloads; 3.4 Workload Analysis; 3.5 The Root Cause of the MiW Phenomenon; 3.6 Limitations of Existing Solutions: Memory Bandwidth Throttling, Fairness-aware Memory Scheduling)
    4. Virtualizing Memory Channels (4.1 Memory Virtual Channel (mVC); 4.2 mVC Buffer Allocation Strategies; 4.3 Evaluation: Experimental Setup, Reproducing Hardware Results, Mitigating MiW through mVC, Evaluation on Four Groups, Potentials for Operating Cost Savings with mVC)
    5. Related Work (5.1 Component-wise QoS/Fairness for Shared Resources; 5.2 Holistic Approaches to QoS/Fairness; 5.3 MiW on Recent Architectures)
    6. Conclusion (6.1 Discussion; 6.2 Future Work)
    Bibliography
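
    The core mechanism proposed here, dedicating a virtual channel (a private slice of the memory controller's request buffering) to each LLC-partition group, can be sketched as below. This is a simplified, simulation-style illustration under assumed queue sizes and a round-robin issue policy, not the dissertation's actual simulator or buffer allocation strategy.

```python
# Sketch: memory virtual channels (mVCs) as per-group request queues in
# the memory controller. Each LLC-partition group owns a bounded private
# queue, so a miss-heavy group can only fill its own queue; it cannot
# crowd a latency-critical group out of the controller (the "balloon
# effect" the dissertation identifies). Per-group buffer sizes are the
# knob used to fine-tune group performance.
from collections import deque

class MemoryController:
    def __init__(self, mvc_buffer_sizes):
        # One bounded queue per group, e.g. {"latency": 16, "batch": 48}.
        self.queues = {g: deque(maxlen=n) for g, n in mvc_buffer_sizes.items()}
        self.order = list(mvc_buffer_sizes)
        self.next = 0

    def enqueue(self, group: str, request) -> bool:
        q = self.queues[group]
        if len(q) == q.maxlen:
            return False        # back-pressure stays inside the noisy group
        q.append(request)
        return True

    def issue(self):
        """Round-robin across mVCs so every group keeps making progress."""
        for _ in range(len(self.order)):
            g = self.order[self.next]
            self.next = (self.next + 1) % len(self.order)
            if self.queues[g]:
                return g, self.queues[g].popleft()
        return None

mc = MemoryController({"latency": 16, "batch": 48})
for i in range(64):                  # a miss storm from the batch group...
    mc.enqueue("batch", f"b{i}")
print(mc.enqueue("latency", "l0"))   # ...cannot block the latency group: True
print(mc.issue())                    # ('latency', 'l0')
```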