
    Enabling Intra-Plane Parallel Block Erase to Alleviate the Impact of Garbage Collection

    Garbage collection (GC) in NAND flash can significantly decrease I/O performance in SSDs by copying valid data to other locations, thus blocking incoming I/O requests. To help improve performance, NAND flash utilizes various advanced commands to increase internal parallelism. Currently, these commands only parallelize operations across channels, chips, dies, and planes, neglecting the block level and below due to structural bottlenecks along the data path and the risk of disturbances that can compromise valid data by inducing errors. However, owing to the triple-well structure of the NAND flash plane architecture and the erase procedure, it is possible to erase multiple blocks within a plane, in parallel, without being restricted by structural limitations or diminishing the integrity of the valid data. The number of page movements caused by erasing multiple blocks can be restrained so as to bound the overhead per GC. Moreover, more capacity can be reclaimed per GC, which delays future GCs and effectively reduces their frequency. Such an Intra-Plane Parallel Block Erase (IPPBE) in turn diminishes the impact of GC on incoming requests, improving their response times. Experimental results show that IPPBE can reduce the time spent performing GC by up to 50.7% (33.6% on average), read/write response times by up to 47.0%/45.4% (16.5%/14.8% on average), page movements by up to 52.2% (26.6% on average), and blocks erased by up to 14.2% (3.6% on average). An energy analysis indicates that by reducing the number of page copies and block erases, the energy cost of garbage collection can be reduced by up to 44.1% (19.3% on average).
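    The selection policy the abstract describes lends itself to a compact illustration. Below is a minimal C sketch of the kind of victim selection IPPBE implies: several blocks of one plane are chosen for a single combined erase, while a copy budget bounds the page movements per GC. All names and constants (pick_victims, COPY_BUDGET, MAX_PARALLEL_ERASE) are hypothetical illustrations, not taken from the paper.

```c
/* Hypothetical sketch of IPPBE-style victim selection: erase several
 * blocks of one plane together, but cap the valid pages copied per GC. */
#include <stddef.h>

#define BLOCKS_PER_PLANE 1024
#define MAX_PARALLEL_ERASE 4   /* assumed per-plane parallel erase width */
#define COPY_BUDGET 64         /* assumed bound on page movements per GC */

struct block { int valid_pages; int free; };

/* Pick up to MAX_PARALLEL_ERASE non-free blocks whose combined
 * valid-page count stays within COPY_BUDGET; returns the victim count. */
size_t pick_victims(struct block plane[BLOCKS_PER_PLANE],
                    size_t victims[MAX_PARALLEL_ERASE])
{
    size_t n = 0;
    int copied = 0;
    /* Greedy linear scan keeps the sketch short; sorting blocks by
     * fewest valid pages first would reclaim more per erase. */
    for (size_t b = 0; b < BLOCKS_PER_PLANE && n < MAX_PARALLEL_ERASE; b++) {
        if (plane[b].free) continue;
        if (copied + plane[b].valid_pages > COPY_BUDGET) continue;
        copied += plane[b].valid_pages;
        victims[n++] = b;
    }
    return n;  /* caller copies valid pages, then issues ONE erase for all n */
}
```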

    ํ”Œ๋ž˜์‹œ๋ฉ”๋ชจ๋ฆฌ ์ €์žฅ์žฅ์น˜๋ฅผ ์œ„ํ•œ ๋ฆฌ์…‹ ๊ธฐ๋ฐ˜์˜ ์ฝ๊ธฐ ์„ฑ๋Šฅ ์ตœ์ ํ™” ๊ธฐ๋ฒ•

    ํ•™์œ„๋…ผ๋ฌธ (์„์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2018. 2. ๊น€์ง€ํ™.๋ณธ ์—ฐ๊ตฌ์—์„œ๋Š” ๊ธฐ์กด์˜ ์ฝ๊ธฐ ์‘๋‹ต์‹œ๊ฐ„์˜ ์ตœ์ ํ™” ๊ธฐ๋ฒ•๋“ค์„ ์†Œ๊ฐœํ•˜๊ณ  ๊ธฐ์กด ๊ธฐ๋ฒ•์˜ ํ•œ๊ณ„๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•œ ๋ฆฌ์…‹ ๋ช…๋ น์„ ์ œ์•ˆํ•œ๋‹ค. ๋˜ํ•œ ๊ธฐ์กด ๊ธฐ๋ฒ•๋“ค๊ณผ ์ƒˆ๋กญ๊ฒŒ ์ œ์•ˆํ•˜๋Š” ๋ฆฌ์…‹ ๋ช…๋ น์„ ํ†ตํ•ฉ ์ ์šฉํ•˜์—ฌ ์‹คํ—˜์„ ์ง„ํ–‰ํ•˜๊ณ  ๊ฐ ๊ธฐ๋ฒ•์˜ ์žฅ์ ๊ณผ ๋‹จ์ ์„ ๋น„๊ต ๋ถ„์„ํ•˜์˜€๋‹ค. ๋ฆฌ์…‹ ๋ช…๋ น์€ ์ฝ๊ธฐ ์š”์ฒญ์ด ์“ฐ๊ธฐ/์†Œ๊ฑฐ๊ฐ€ ์ง„ํ–‰์ค‘์ธ ์นฉ์„ ์„ ์ ํ•˜๋Š”๋ฐ ๊นŒ์ง€ ๊ฑธ๋ฆฌ๋Š” ์‹œ๊ฐ„์ด ์ผ์‹œ์ •์ง€์— ๋น„ํ•˜์—ฌ ๋น ๋ฅด์ง€๋งŒ, ์ˆ˜๋ช…์„ ๋‹จ์ถ•์‹œํ‚ค๋Š” ๋ฌธ์ œ๊ฐ€ ์กด์žฌํ•œ๋‹ค. ๋”ฐ๋ผ์„œ ์ค‘์š”ํ•œ ์ฝ๊ธฐ์— ๋Œ€ํ•ด์„œ๋งŒ ์„ ํƒ์ ์œผ๋กœ ์ ์šฉํ•  ๊ฒƒ์„ ์ œ์•ˆํ•˜์˜€๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ ์ œ์•ˆํ•˜๋Š” ์šฐ์„ ์ˆœ์œ„ ๊ธฐ๋ฐ˜ ์„ ํƒ์  ๋ฆฌ์…‹ ๊ธฐ๋ฒ•์€ ์ €์žฅ์žฅ์น˜ ๋‚ด์˜ ๊ฐ€๋น„์ง€ ์ปฌ๋ ‰์…˜ ๋™์•ˆ์— ์œ ํšจ ํŽ˜์ด์ง€ ์ฝ๊ธฐ์™€ ์‚ฌ์šฉ์ž ์ฝ๊ธฐ๊ฐ€ ์„œ๋กœ ๋‹ค๋ฅธ ์šฐ์„ ์ˆœ์œ„๋ฅผ ๊ฐ–๊ณ  ์‹คํ–‰๋˜๋Š” ๊ฒƒ์— ์ฐฉ์•ˆํ•˜์—ฌ, ์ €์žฅ์žฅ์น˜ ๋ฐ–์˜ ํ˜ธ์ŠคํŠธ ์‹œ์Šคํ…œ์—์„œ ์œ ๋ฐœ๋˜๋Š” ๊ฐ€๋น„์ง€ ์ปฌ๋ ‰์…˜ ์ฝ๊ธฐ์™€ ์‚ฌ์šฉ์ž ์ฝ๊ธฐ์—๋„ ์„œ๋กœ ๋‹ค๋ฅธ ์šฐ์„ ์ˆœ์œ„๋ฅผ ๋ถ€์—ฌํ•˜์—ฌ ์„ ํƒ์ ์œผ๋กœ ๋ฆฌ์…‹์„ ์ ์šฉํ•˜์˜€๋‹ค. ํ‰๊ฐ€ ๊ฒฐ๊ณผ, ์“ฐ๊ธฐ/์†Œ๊ฑฐ ์ถฉ๋Œ์ด ๋ฐœ์ƒํ•˜๋Š” ๋ชจ๋“  ์‚ฌ์šฉ์ž ์ฝ๊ธฐ ์š”์ฒญ์— ๋ฆฌ์…‹์„ ์ˆ˜ํ–‰ํ•˜๋Š” ๊ฒฝ์šฐ์™€ ์šฐ์„ ์ˆœ์œ„ ๊ธฐ๋ฐ˜์œผ๋กœ ์„ ํƒ์  ๋ฆฌ์…‹์„ ์ ์šฉํ•˜๋Š” ๊ฒฝ์šฐ, ํ‰๊ท  ์ฝ๊ธฐ ์‘๋‹ต์‹œ๊ฐ„, 99.99th ์ฝ๊ธฐ ๊ผฌ๋ฆฌ์‘๋‹ต์‹œ๊ฐ„, ์ˆ˜๋ช…์ด ๊ฐ๊ฐ 9%, 6.9%, 20% ํ–ฅ์ƒ๋˜๋Š” ๊ฒƒ์„ ํ™•์ธํ•˜์˜€๋‹ค. ๋ชจ๋“  ์ถฉ๋Œ์— ๋Œ€ํ•˜์—ฌ ๋ฆฌ์…‹์„ ์ˆ˜ํ–‰ํ•˜๋Š” ๊ฒฝ์šฐ์— ๋น„ํ•˜์—ฌ ์„ ํƒ์  ๋ฆฌ์…‹ ๊ธฐ๋ฒ•์ด ๋” ์ข‹์•„์ง€๋Š” ์ด์œ ๋Š” ๊ธฐ์กด์— ํ•ด๊ฒฐํ•  ์ˆ˜ ์—†์—ˆ๋˜ ์ฝ๊ธฐ ์š”์ฒญ ๊ฐ„์˜ ํ ์ง€์—ฐ์‹œ๊ฐ„์ด ํ•ด๊ฒฐ๋˜๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค. ๋˜ํ•œ ์ €์žฅ์žฅ์น˜์˜ ์ˆ˜๋ช…์ด ๋ฆฌ์…‹์„ ์ „ํ˜€ ์‚ฌ์šฉํ•˜์ง€ ์•Š๋Š” ๊ฒฝ์šฐ์™€ ๋น„๊ตํ•˜์—ฌ ๋‹จ 2.8% ์ •๋„์˜ ์ˆ˜๋ช… ๋‹จ์ถ•์ด ๋ฐœ์ƒํ•จ์„ ๊ด€์ฐฐํ•˜์˜€๋‹ค.์ œ 1 ์žฅ ์„œ๋ก  1 1.1 ์—ฐ๊ตฌ ๋ฐฐ๊ฒฝ 1 1.2 ์—ฐ๊ตฌ ๋™๊ธฐ 6 1.3 ์—ฐ๊ตฌ ๊ธฐ์—ฌ 9 ์ œ 2 ์žฅ ์ฝ๊ธฐ ์‘๋‹ต์‹œ๊ฐ„ ๊ฐ์†Œ๋ฅผ ์œ„ํ•œ ๊ธฐ์กด์˜ ์—ฐ๊ตฌ 11 2.1 ์„ ์  ๊ฐ€๋Šฅํ•œ ๊ฐ€๋น„์ง€ ์ปฌ๋ ‰์…˜ 11 2.2 ์†Œํ”„ํŠธ์›จ์–ด ๊ธฐ๋ฐ˜ ๋ฌด์ˆœ์„œ ์ค‘์ฒฉ ์‹คํ–‰ ์Šค์ผ€์ค„๋ง ๊ธฐ๋ฒ• 12 2.3 ์“ฐ๊ธฐ ๋ฐ ์†Œ๊ฑฐ ์ผ์‹œ์ •์ง€ /์žฌ์‹œ์ž‘ ๊ธฐ๋ฒ• 15 ์ œ 3 ์žฅ ๋ฆฌ์…‹ ๋ช…๋ น์–ด(Reset Command) ์ œ์•ˆ 17 3.1 ์ฝ๊ธฐ ์‘๋‹ต์‹œ๊ฐ„ ์ตœ์ ํ™” ๊ธฐ๋ฒ• ํ†ตํ•ฉ ํ‰๊ฐ€ 19 3.2 ํ†ตํ•ฉ ํ‰๊ฐ€ ๊ฒฐ๊ณผ 23 โ€ƒ ์ œ 4 ์žฅ ์šฐ์„ ์ˆœ์œ„๋ฅผ ๊ณ ๋ คํ•œ ์„ ํƒ์  ๋ฆฌ์…‹ ๊ธฐ๋ฒ• 24 4.1 Log-Structured Merge-Tree ๊ธฐ๋ฐ˜ ์‹œ์Šคํ…œ 24 4.2 PAReset(Priority-Aware Reset) ๊ตฌ์กฐ 26 ์ œ 5 ์žฅ ์‹คํ—˜ ๊ฒฐ๊ณผ ๋ฐ ๋ถ„์„ 29 5.1 ์‹คํ—˜ ํ™˜๊ฒฝ 29 5.2 ์‹คํ—˜ ๊ฒฐ๊ณผ 30 ์ œ 6 ์žฅ ๊ฒฐ๋ก  33 6.1 ๊ฒฐ๋ก  33 6.2 ํ–ฅํ›„ ์—ฐ๊ตฌ 36 ์ฐธ๊ณ ๋ฌธํ—Œ 36 Abstract 37Maste

    Implementation of an AMIDAR-based Java Processor

    This thesis presents a Java processor based on the Adaptive Microinstruction Driven Architecture (AMIDAR). This processor is intended as a research platform for investigating adaptive processor architectures. Combined with a configurable accelerator, it is able to detect and speed up hot spots of arbitrary applications dynamically. In contrast to classical RISC processors, an AMIDAR-based processor consists of four main types of components: a token machine, functional units (FUs), a token distribution network and an FU interconnect structure. The token machine is a specialized functional unit and controls the other FUs by means of tokens. These tokens are delivered to the FUs over the token distribution network. The tokens inform the FUs about what to do with input data and where to send the results. Data is exchanged among the FUs over the FU interconnect structure. Based on the virtual machine architecture defined by the Java bytecode, a total of six FUs have been developed for the Java processor, namely a frame stack, a heap manager, a thread scheduler, a debugger, an integer ALU and a floating-point unit. Using these FUs, the processor can already execute the SPEC JVM98 benchmark suite properly. This indicates that it can be employed to run a broad variety of applications rather than embedded software only. Besides bytecode execution, several enhanced features have also been implemented in the processor to improve its performance and usability. First, the processor includes an object cache using a novel cache index generation scheme that provides a better average hit rate than the classical XOR-based scheme. Second, a hardware garbage collector has been integrated into the heap manager, which greatly reduces the overhead caused by the garbage collection process. Third, thread scheduling has been realized in hardware as well, which allows it to be performed concurrently with the running application. Furthermore, a complete debugging framework has been developed for the processor, which provides powerful debugging functionalities at both software and hardware levels.
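    The token concept can be sketched compactly. The C fragment below models, under assumptions, how a token machine might encode the token stream for the iadd bytecode: two pops from the frame stack routed to the integer ALU, and the sum routed back. The struct layout, opcode values and IADD_TOKENS table are illustrative, not the processor's actual encoding.

```c
/* Hypothetical model of the AMIDAR token idea: the token machine tells
 * each FU what to do with its inputs and where to route the result. */
enum fu_id { FU_FRAME_STACK, FU_HEAP, FU_SCHEDULER, FU_DEBUGGER,
             FU_INT_ALU, FU_FPU };

struct token {
    enum fu_id target;    /* FU that executes this token */
    int        opcode;    /* operation the FU should perform */
    enum fu_id dest;      /* FU that receives the result */
    int        dest_port; /* input port at the destination FU */
};

/* One decoded bytecode expands into a token sequence, e.g. iadd: */
static const struct token IADD_TOKENS[] = {
    { FU_FRAME_STACK, /*POP*/  1, FU_INT_ALU,     0 },
    { FU_FRAME_STACK, /*POP*/  1, FU_INT_ALU,     1 },
    { FU_INT_ALU,     /*ADD*/  2, FU_FRAME_STACK, 0 },
};
```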

    Memory management in the Smalltalk-80 system

    This work presents an examination of the memory management area of the Smalltalk-80 system. Two implementations of this system were completed. The first used virtual memory managed in an object-oriented manner; its performance and related factors are examined in detail. The second was based wholly in RAM and was used to examine in detail the factors that affected the performance of the system. Two areas of the RAM-based system are examined in detail. The first is the logical manner in which the memory of the system is structured and its effect on the performance of the system. The second is the way in which object reference counts are decremented; this second area potentially has the largest effect on the system's running speed.
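    For readers unfamiliar with the operation in question, the following C sketch shows a straightforward recursive reference-count decrement; the cascading frees it can trigger are exactly what makes the decrement strategy performance-critical. The rc_decrement name and object layout are illustrative, not taken from the thesis.

```c
/* Illustrative recursive reference-count decrement: when a count hits
 * zero, every object it references is decremented in turn. */
#include <stdlib.h>

struct object {
    unsigned refcount;
    size_t nfields;
    struct object *fields[]; /* object references held by this object */
};

void rc_decrement(struct object *o)
{
    if (o == NULL || --o->refcount > 0)
        return;
    /* Count reached zero: release everything we reference, then free.
     * The recursion can cascade arbitrarily deep; deferring these
     * decrements is one strategy such systems weigh against doing
     * them eagerly, as here. */
    for (size_t i = 0; i < o->nfields; i++)
        rc_decrement(o->fields[i]);
    free(o);
}
```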

    Software Performance Engineering using Virtual Time Program Execution

    In this thesis we introduce a novel approach to software performance engineering that is based on the execution of code in virtual time. Virtual time execution models the timing behaviour of unmodified applications by scaling observed method times or replacing them with results acquired from performance model simulation. This facilitates the investigation of "what-if" performance predictions of applications comprising an arbitrary combination of real code and performance models. The ability to analyse code and models in a single framework enables performance testing throughout the software lifecycle, without the need to extract performance models from code. This is accomplished by forcing thread scheduling decisions to take into account the hypothetical time-scaling or model-based performance specifications of each method. The virtual time execution of I/O operations or multicore targets is also investigated. We explore these ideas using a Virtual EXecution (VEX) framework, which provides performance predictions for multi-threaded applications. The language-independent VEX core is driven by an instrumentation layer that notifies it of thread state changes and method profiling events; it is then up to VEX to control the progress of application threads in virtual time on top of the operating system scheduler. We also describe a Java Instrumentation Environment (JINE), demonstrating the challenges involved in virtual time execution at the JVM level. We evaluate the VEX/JINE tools by executing client-side Java benchmarks in virtual time and identifying the causes of deviations from observed real times. Our results show that VEX and JINE transparently provide predictions for the response time of unmodified applications with typically good accuracy (within 5-10%) and low simulation overheads (25-50% additional time). We conclude this thesis with a case study that shows how models and code can be integrated, thus illustrating our vision on how virtual time execution can support performance testing throughout the software lifecycle.
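    A minimal sketch of the central idea, written in C for brevity: observed method durations are scaled before being charged to a per-thread virtual clock, and the scheduler then resumes whichever runnable thread is furthest behind in virtual time. The names (vthread, on_method_exit) and the scaling interface are assumptions for illustration, not VEX's actual API.

```c
/* Sketch of virtual-time accounting: observed method durations are
 * scaled (or replaced by model output) before being charged to the
 * thread's virtual clock. */
typedef unsigned long long vtime_ns;

struct vthread {
    vtime_ns virtual_now; /* this thread's position in virtual time */
};

/* Called by the instrumentation layer when a profiled method returns.
 * A 'scale' below 1.0 answers "what if this method were faster?". */
void on_method_exit(struct vthread *t, vtime_ns observed_ns, double scale)
{
    t->virtual_now += (vtime_ns)(observed_ns * scale);
    /* The scheduler always resumes the runnable thread with the
     * smallest virtual_now, so threads interleave as they would in
     * the hypothetical scenario rather than in real time. */
}
```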

    TACKLING PERFORMANCE AND SECURITY ISSUES FOR CLOUD STORAGE SYSTEMS

    Building data-intensive applications and emerging computing paradigms (e.g., Machine Learning (ML), Artificial Intelligence (AI), and the Internet of Things (IoT)) in cloud computing environments is becoming the norm, given the many advantages in scalability, reliability, security and performance. However, under rapid changes in applications, system middleware and underlying storage devices, service providers face new challenges in delivering performance and security isolation in the context of resources shared among multiple tenants. The gap between the decades-old storage abstraction and modern storage devices keeps widening, calling for software/hardware co-designs to achieve more effective performance and security protocols. This dissertation rethinks the storage subsystem from the device level to the system level and proposes new designs at different levels to tackle performance and security issues for cloud storage systems. In the first part, we present an event-based SSD (Solid State Drive) simulator that models modern protocols, firmware and storage backends in detail. The proposed simulator can capture the nuances of SSD internal states under various I/O workloads, which helps researchers understand the impact of various SSD designs and workload characteristics on end-to-end performance. In the second part, we study the security challenges of shared in-storage computing infrastructures. Many cloud providers offer isolation at multiple levels to secure data and instances; however, security measures in emerging in-storage computing infrastructures have not been studied. We first investigate the attacks that could be conducted by offloaded in-storage programs in a multi-tenant cloud environment. To defend against these attacks, we build a lightweight Trusted Execution Environment, IceClave, to enable security isolation between in-storage programs and internal flash management functions. We show that while enforcing security isolation in the SSD controller with minimal hardware cost, IceClave still keeps the performance benefit of in-storage computing by delivering up to 2.4x better performance than the conventional host-based trusted computing approach. In the third part, we investigate the performance interference problem caused by other tenants' I/O flows. We demonstrate that I/O resource sharing can often lead to performance degradation and instability. The block device abstraction fails to expose SSD parallelism and to pass application requirements. To this end, we propose a software/hardware co-design that enforces performance isolation by bridging the semantic gap. Our design significantly improves QoS (Quality of Service) by reducing throughput penalties and tail latency spikes. Lastly, we explore more effective I/O control to address contention in the storage software stack. We illustrate that the state-of-the-art resource control mechanism, Linux cgroups, is insufficient for controlling I/O resources. Inappropriate cgroup configurations may even hurt the performance of co-located workloads under memory-intensive scenarios. We add kernel support for limiting page cache usage per cgroup and achieving I/O proportionality.
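    The event-based simulation style named in the first part can be illustrated with a generic discrete-event core, sketched in C below: a time-ordered queue of events drives simulated time forward from one event to the next. This is a textbook pattern under assumed names (schedule, run, ev_type), not the dissertation's simulator code.

```c
/* Generic discrete-event core of the kind an event-based SSD simulator
 * is built on: simulated time jumps from one queued event to the next. */
#include <stdlib.h>

enum ev_type { EV_HOST_IO, EV_NAND_READ_DONE, EV_NAND_PROG_DONE, EV_GC };

struct event {
    unsigned long long when_ns; /* simulated timestamp of the event */
    enum ev_type type;
    struct event *next;         /* singly linked, kept sorted by when_ns */
};

static struct event *queue;

/* Insert an event in timestamp order. */
void schedule(struct event *e)
{
    struct event **p = &queue;
    while (*p && (*p)->when_ns <= e->when_ns)
        p = &(*p)->next;
    e->next = *p;
    *p = e;
}

/* Dispatch events until the simulated horizon is reached. */
void run(unsigned long long end_ns)
{
    while (queue && queue->when_ns <= end_ns) {
        struct event *e = queue;
        queue = e->next;
        /* A handler here would update firmware/FTL state and may
         * schedule follow-up events (e.g., a program completion
         * tPROG nanoseconds later). */
        free(e);
    }
}
```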

    Introductory Microcontroller Programming

    This text is a treatise on microcontroller programming. It introduces the major peripherals found on most microcontrollers and their usage, focusing on the ATmega644p in the AVR family produced by Atmel. General information and background knowledge on several topics is also presented. These topics include information regarding the hardware of a microcontroller and assembly code, as well as instructions regarding good program structure and coding practices. Examples with code and discussion are presented throughout. The text is intended for hobbyists and students seeking knowledge of microcontroller programming, and is written at a level that students entering junior-level core robotics classes would find useful.
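    A classic first example for this family of parts is a C program that blinks an LED, compiled with avr-gcc. DDRB, PORTB and _delay_ms are the standard avr-libc interfaces; only the clock frequency and the LED pin are assumptions here.

```c
/* Minimal AVR example for the ATmega644P: toggle an LED on pin PB0.
 * Build with avr-gcc; F_CPU must match the actual clock source. */
#define F_CPU 8000000UL   /* assumed 8 MHz clock */
#include <avr/io.h>
#include <util/delay.h>

int main(void)
{
    DDRB |= (1 << DDB0);       /* configure PB0 as an output */
    for (;;) {
        PORTB ^= (1 << PB0);   /* toggle the LED */
        _delay_ms(500);        /* busy-wait half a second */
    }
}
```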