
    Enabling Intra-Plane Parallel Block Erase to Alleviate the Impact of Garbage Collection

    Garbage collection (GC) in NAND flash can significantly decrease I/O performance in SSDs by copying valid data to other locations, thus blocking incoming I/O requests. To help improve performance, NAND flash utilizes various advanced commands to increase internal parallelism. Currently, these commands only parallelize operations across channels, chips, dies, and planes, neglecting the block level and below due to structural bottlenecks along the data path and the risk of disturbances that can compromise valid data by inducing errors. However, owing to the triple-well structure of the NAND flash plane architecture and the erase procedure, it is possible to erase multiple blocks within a plane, in parallel, without being restricted by structural limitations or diminishing the integrity of the valid data. The number of page movements caused by erasing multiple blocks can be restrained so as to bound the overhead per GC. Moreover, more capacity can be reclaimed per GC, which delays future GCs and effectively reduces their frequency. Such an Intra-Plane Parallel Block Erase (IPPBE) in turn diminishes the impact of GC on incoming requests, improving their response times. Experimental results show that IPPBE can reduce the time spent performing GC by up to 50.7% (33.6% on average), read/write response times by up to 47.0%/45.4% (16.5%/14.8% on average), page movements by up to 52.2% (26.6% on average), and blocks erased by up to 14.2% (3.6% on average). An energy analysis indicates that by reducing the number of page copies and block erases, the energy cost of garbage collection can be reduced by up to 44.1% (19.3% on average).
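    The selection policy the abstract describes lends itself to a compact illustration. Below is a minimal C sketch of the kind of victim selection IPPBE implies: several blocks of one plane are chosen for a single combined erase, while a copy budget bounds the page movements per GC. All names and constants (pick_victims, COPY_BUDGET, MAX_PARALLEL_ERASE) are hypothetical illustrations, not taken from the paper.

```c
/* Hypothetical sketch of IPPBE-style victim selection: erase several
 * blocks of one plane together, but cap the valid pages copied per GC. */
#include <stddef.h>

#define BLOCKS_PER_PLANE 1024
#define MAX_PARALLEL_ERASE 4   /* assumed per-plane parallel erase width */
#define COPY_BUDGET 64         /* assumed bound on page movements per GC */

struct block { int valid_pages; int free; };

/* Pick up to MAX_PARALLEL_ERASE non-free blocks whose combined
 * valid-page count stays within COPY_BUDGET; returns the victim count. */
size_t pick_victims(struct block plane[BLOCKS_PER_PLANE],
                    size_t victims[MAX_PARALLEL_ERASE])
{
    size_t n = 0;
    int copied = 0;
    /* Greedy linear scan keeps the sketch short; sorting blocks by
     * fewest valid pages first would reclaim more per erase. */
    for (size_t b = 0; b < BLOCKS_PER_PLANE && n < MAX_PARALLEL_ERASE; b++) {
        if (plane[b].free) continue;
        if (copied + plane[b].valid_pages > COPY_BUDGET) continue;
        copied += plane[b].valid_pages;
        victims[n++] = b;
    }
    return n;  /* caller copies valid pages, then issues ONE erase for all n */
}
```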

    ํ”Œ๋ž˜์‹œ๋ฉ”๋ชจ๋ฆฌ ์ €์žฅ์žฅ์น˜๋ฅผ ์œ„ํ•œ ๋ฆฌ์…‹ ๊ธฐ๋ฐ˜์˜ ์ฝ๊ธฐ ์„ฑ๋Šฅ ์ตœ์ ํ™” ๊ธฐ๋ฒ•

    ํ•™์œ„๋…ผ๋ฌธ (์„์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2018. 2. ๊น€์ง€ํ™.๋ณธ ์—ฐ๊ตฌ์—์„œ๋Š” ๊ธฐ์กด์˜ ์ฝ๊ธฐ ์‘๋‹ต์‹œ๊ฐ„์˜ ์ตœ์ ํ™” ๊ธฐ๋ฒ•๋“ค์„ ์†Œ๊ฐœํ•˜๊ณ  ๊ธฐ์กด ๊ธฐ๋ฒ•์˜ ํ•œ๊ณ„๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•œ ๋ฆฌ์…‹ ๋ช…๋ น์„ ์ œ์•ˆํ•œ๋‹ค. ๋˜ํ•œ ๊ธฐ์กด ๊ธฐ๋ฒ•๋“ค๊ณผ ์ƒˆ๋กญ๊ฒŒ ์ œ์•ˆํ•˜๋Š” ๋ฆฌ์…‹ ๋ช…๋ น์„ ํ†ตํ•ฉ ์ ์šฉํ•˜์—ฌ ์‹คํ—˜์„ ์ง„ํ–‰ํ•˜๊ณ  ๊ฐ ๊ธฐ๋ฒ•์˜ ์žฅ์ ๊ณผ ๋‹จ์ ์„ ๋น„๊ต ๋ถ„์„ํ•˜์˜€๋‹ค. ๋ฆฌ์…‹ ๋ช…๋ น์€ ์ฝ๊ธฐ ์š”์ฒญ์ด ์“ฐ๊ธฐ/์†Œ๊ฑฐ๊ฐ€ ์ง„ํ–‰์ค‘์ธ ์นฉ์„ ์„ ์ ํ•˜๋Š”๋ฐ ๊นŒ์ง€ ๊ฑธ๋ฆฌ๋Š” ์‹œ๊ฐ„์ด ์ผ์‹œ์ •์ง€์— ๋น„ํ•˜์—ฌ ๋น ๋ฅด์ง€๋งŒ, ์ˆ˜๋ช…์„ ๋‹จ์ถ•์‹œํ‚ค๋Š” ๋ฌธ์ œ๊ฐ€ ์กด์žฌํ•œ๋‹ค. ๋”ฐ๋ผ์„œ ์ค‘์š”ํ•œ ์ฝ๊ธฐ์— ๋Œ€ํ•ด์„œ๋งŒ ์„ ํƒ์ ์œผ๋กœ ์ ์šฉํ•  ๊ฒƒ์„ ์ œ์•ˆํ•˜์˜€๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ ์ œ์•ˆํ•˜๋Š” ์šฐ์„ ์ˆœ์œ„ ๊ธฐ๋ฐ˜ ์„ ํƒ์  ๋ฆฌ์…‹ ๊ธฐ๋ฒ•์€ ์ €์žฅ์žฅ์น˜ ๋‚ด์˜ ๊ฐ€๋น„์ง€ ์ปฌ๋ ‰์…˜ ๋™์•ˆ์— ์œ ํšจ ํŽ˜์ด์ง€ ์ฝ๊ธฐ์™€ ์‚ฌ์šฉ์ž ์ฝ๊ธฐ๊ฐ€ ์„œ๋กœ ๋‹ค๋ฅธ ์šฐ์„ ์ˆœ์œ„๋ฅผ ๊ฐ–๊ณ  ์‹คํ–‰๋˜๋Š” ๊ฒƒ์— ์ฐฉ์•ˆํ•˜์—ฌ, ์ €์žฅ์žฅ์น˜ ๋ฐ–์˜ ํ˜ธ์ŠคํŠธ ์‹œ์Šคํ…œ์—์„œ ์œ ๋ฐœ๋˜๋Š” ๊ฐ€๋น„์ง€ ์ปฌ๋ ‰์…˜ ์ฝ๊ธฐ์™€ ์‚ฌ์šฉ์ž ์ฝ๊ธฐ์—๋„ ์„œ๋กœ ๋‹ค๋ฅธ ์šฐ์„ ์ˆœ์œ„๋ฅผ ๋ถ€์—ฌํ•˜์—ฌ ์„ ํƒ์ ์œผ๋กœ ๋ฆฌ์…‹์„ ์ ์šฉํ•˜์˜€๋‹ค. ํ‰๊ฐ€ ๊ฒฐ๊ณผ, ์“ฐ๊ธฐ/์†Œ๊ฑฐ ์ถฉ๋Œ์ด ๋ฐœ์ƒํ•˜๋Š” ๋ชจ๋“  ์‚ฌ์šฉ์ž ์ฝ๊ธฐ ์š”์ฒญ์— ๋ฆฌ์…‹์„ ์ˆ˜ํ–‰ํ•˜๋Š” ๊ฒฝ์šฐ์™€ ์šฐ์„ ์ˆœ์œ„ ๊ธฐ๋ฐ˜์œผ๋กœ ์„ ํƒ์  ๋ฆฌ์…‹์„ ์ ์šฉํ•˜๋Š” ๊ฒฝ์šฐ, ํ‰๊ท  ์ฝ๊ธฐ ์‘๋‹ต์‹œ๊ฐ„, 99.99th ์ฝ๊ธฐ ๊ผฌ๋ฆฌ์‘๋‹ต์‹œ๊ฐ„, ์ˆ˜๋ช…์ด ๊ฐ๊ฐ 9%, 6.9%, 20% ํ–ฅ์ƒ๋˜๋Š” ๊ฒƒ์„ ํ™•์ธํ•˜์˜€๋‹ค. ๋ชจ๋“  ์ถฉ๋Œ์— ๋Œ€ํ•˜์—ฌ ๋ฆฌ์…‹์„ ์ˆ˜ํ–‰ํ•˜๋Š” ๊ฒฝ์šฐ์— ๋น„ํ•˜์—ฌ ์„ ํƒ์  ๋ฆฌ์…‹ ๊ธฐ๋ฒ•์ด ๋” ์ข‹์•„์ง€๋Š” ์ด์œ ๋Š” ๊ธฐ์กด์— ํ•ด๊ฒฐํ•  ์ˆ˜ ์—†์—ˆ๋˜ ์ฝ๊ธฐ ์š”์ฒญ ๊ฐ„์˜ ํ ์ง€์—ฐ์‹œ๊ฐ„์ด ํ•ด๊ฒฐ๋˜๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค. ๋˜ํ•œ ์ €์žฅ์žฅ์น˜์˜ ์ˆ˜๋ช…์ด ๋ฆฌ์…‹์„ ์ „ํ˜€ ์‚ฌ์šฉํ•˜์ง€ ์•Š๋Š” ๊ฒฝ์šฐ์™€ ๋น„๊ตํ•˜์—ฌ ๋‹จ 2.8% ์ •๋„์˜ ์ˆ˜๋ช… ๋‹จ์ถ•์ด ๋ฐœ์ƒํ•จ์„ ๊ด€์ฐฐํ•˜์˜€๋‹ค.์ œ 1 ์žฅ ์„œ๋ก  1 1.1 ์—ฐ๊ตฌ ๋ฐฐ๊ฒฝ 1 1.2 ์—ฐ๊ตฌ ๋™๊ธฐ 6 1.3 ์—ฐ๊ตฌ ๊ธฐ์—ฌ 9 ์ œ 2 ์žฅ ์ฝ๊ธฐ ์‘๋‹ต์‹œ๊ฐ„ ๊ฐ์†Œ๋ฅผ ์œ„ํ•œ ๊ธฐ์กด์˜ ์—ฐ๊ตฌ 11 2.1 ์„ ์  ๊ฐ€๋Šฅํ•œ ๊ฐ€๋น„์ง€ ์ปฌ๋ ‰์…˜ 11 2.2 ์†Œํ”„ํŠธ์›จ์–ด ๊ธฐ๋ฐ˜ ๋ฌด์ˆœ์„œ ์ค‘์ฒฉ ์‹คํ–‰ ์Šค์ผ€์ค„๋ง ๊ธฐ๋ฒ• 12 2.3 ์“ฐ๊ธฐ ๋ฐ ์†Œ๊ฑฐ ์ผ์‹œ์ •์ง€ /์žฌ์‹œ์ž‘ ๊ธฐ๋ฒ• 15 ์ œ 3 ์žฅ ๋ฆฌ์…‹ ๋ช…๋ น์–ด(Reset Command) ์ œ์•ˆ 17 3.1 ์ฝ๊ธฐ ์‘๋‹ต์‹œ๊ฐ„ ์ตœ์ ํ™” ๊ธฐ๋ฒ• ํ†ตํ•ฉ ํ‰๊ฐ€ 19 3.2 ํ†ตํ•ฉ ํ‰๊ฐ€ ๊ฒฐ๊ณผ 23 โ€ƒ ์ œ 4 ์žฅ ์šฐ์„ ์ˆœ์œ„๋ฅผ ๊ณ ๋ คํ•œ ์„ ํƒ์  ๋ฆฌ์…‹ ๊ธฐ๋ฒ• 24 4.1 Log-Structured Merge-Tree ๊ธฐ๋ฐ˜ ์‹œ์Šคํ…œ 24 4.2 PAReset(Priority-Aware Reset) ๊ตฌ์กฐ 26 ์ œ 5 ์žฅ ์‹คํ—˜ ๊ฒฐ๊ณผ ๋ฐ ๋ถ„์„ 29 5.1 ์‹คํ—˜ ํ™˜๊ฒฝ 29 5.2 ์‹คํ—˜ ๊ฒฐ๊ณผ 30 ์ œ 6 ์žฅ ๊ฒฐ๋ก  33 6.1 ๊ฒฐ๋ก  33 6.2 ํ–ฅํ›„ ์—ฐ๊ตฌ 36 ์ฐธ๊ณ ๋ฌธํ—Œ 36 Abstract 37Maste

    Implementation of an AMIDAR-based Java Processor

    This thesis presents a Java processor based on the Adaptive Microinstruction Driven Architecture (AMIDAR). This processor is intended as a research platform for investigating adaptive processor architectures. Combined with a configurable accelerator, it is able to detect and speed up hot spots of arbitrary applications dynamically. In contrast to classical RISC processors, an AMIDAR-based processor consists of four main types of components: a token machine, functional units (FUs), a token distribution network and an FU interconnect structure. The token machine is a specialized functional unit and controls the other FUs by means of tokens. These tokens are delivered to the FUs over the token distribution network. The tokens inform the FUs about what to do with input data and where to send the results. Data is exchanged among the FUs over the FU interconnect structure. Based on the virtual machine architecture defined by the Java bytecode, a total of six FUs have been developed for the Java processor, namely a frame stack, a heap manager, a thread scheduler, a debugger, an integer ALU and a floating-point unit. Using these FUs, the processor can already execute the SPEC JVM98 benchmark suite properly. This indicates that it can be employed to run a broad variety of applications rather than embedded software only. Besides bytecode execution, several enhanced features have also been implemented in the processor to improve its performance and usability. First, the processor includes an object cache using a novel cache index generation scheme that provides a better average hit rate than the classical XOR-based scheme. Second, a hardware garbage collector has been integrated into the heap manager, which greatly reduces the overhead caused by the garbage collection process. Third, thread scheduling has been realized in hardware as well, which allows it to be performed concurrently with the running application. Furthermore, a complete debugging framework has been developed for the processor, which provides powerful debugging functionalities at both software and hardware levels.
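    The token concept can be sketched compactly. The C fragment below models, under assumptions, how a token machine might encode the token stream for the iadd bytecode: two pops from the frame stack routed to the integer ALU, and the sum routed back. The struct layout, opcode values and IADD_TOKENS table are illustrative, not the processor's actual encoding.

```c
/* Hypothetical model of the AMIDAR token idea: the token machine tells
 * each FU what to do with its inputs and where to route the result. */
enum fu_id { FU_FRAME_STACK, FU_HEAP, FU_SCHEDULER, FU_DEBUGGER,
             FU_INT_ALU, FU_FPU };

struct token {
    enum fu_id target;    /* FU that executes this token */
    int        opcode;    /* operation the FU should perform */
    enum fu_id dest;      /* FU that receives the result */
    int        dest_port; /* input port at the destination FU */
};

/* One decoded bytecode expands into a token sequence, e.g. iadd: */
static const struct token IADD_TOKENS[] = {
    { FU_FRAME_STACK, /*POP*/  1, FU_INT_ALU,     0 },
    { FU_FRAME_STACK, /*POP*/  1, FU_INT_ALU,     1 },
    { FU_INT_ALU,     /*ADD*/  2, FU_FRAME_STACK, 0 },
};
```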

    Memory management in the Smalltalk-80 system

    This work presents an examination of the memory management area of the Smalltalk-80 system. Two implementations of this system were completed. The first used virtual memory managed in an object-oriented manner; its performance and related factors are examined in detail. The second was based wholly in RAM and was used to examine in detail the factors that affected the performance of the system. Two areas of the RAM-based system are examined in detail. The first is the logical manner in which the memory of the system is structured and its effect on the performance of the system. The second is the way in which object reference counts are decremented; this second area potentially has the largest effect on the system's running speed.
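    For readers unfamiliar with the operation in question, the following C sketch shows a straightforward recursive reference-count decrement; the cascading frees it can trigger are exactly what makes the decrement strategy performance-critical. The rc_decrement name and object layout are illustrative, not taken from the thesis.

```c
/* Illustrative recursive reference-count decrement: when a count hits
 * zero, every object it references is decremented in turn. */
#include <stdlib.h>

struct object {
    unsigned refcount;
    size_t nfields;
    struct object *fields[]; /* object references held by this object */
};

void rc_decrement(struct object *o)
{
    if (o == NULL || --o->refcount > 0)
        return;
    /* Count reached zero: release everything we reference, then free.
     * The recursion can cascade arbitrarily deep; deferring these
     * decrements is one strategy such systems weigh against doing
     * them eagerly, as here. */
    for (size_t i = 0; i < o->nfields; i++)
        rc_decrement(o->fields[i]);
    free(o);
}
```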

    Software Performance Engineering using Virtual Time Program Execution

    In this thesis we introduce a novel approach to software performance engineering that is based on the execution of code in virtual time. Virtual time execution models the timing behaviour of unmodified applications by scaling observed method times or replacing them with results acquired from performance model simulation. This facilitates the investigation of "what-if" performance predictions of applications comprising an arbitrary combination of real code and performance models. The ability to analyse code and models in a single framework enables performance testing throughout the software lifecycle, without the need to extract performance models from code. This is accomplished by forcing thread scheduling decisions to take into account the hypothetical time-scaling or model-based performance specifications of each method. The virtual time execution of I/O operations or multicore targets is also investigated. We explore these ideas using a Virtual EXecution (VEX) framework, which provides performance predictions for multi-threaded applications. The language-independent VEX core is driven by an instrumentation layer that notifies it of thread state changes and method profiling events; it is then up to VEX to control the progress of application threads in virtual time on top of the operating system scheduler. We also describe a Java Instrumentation Environment (JINE), demonstrating the challenges involved in virtual time execution at the JVM level. We evaluate the VEX/JINE tools by executing client-side Java benchmarks in virtual time and identifying the causes of deviations from observed real times. Our results show that VEX and JINE transparently provide predictions for the response time of unmodified applications with typically good accuracy (within 5-10%) and low simulation overheads (25-50% additional time). We conclude this thesis with a case study that shows how models and code can be integrated, thus illustrating our vision on how virtual time execution can support performance testing throughout the software lifecycle.
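    A minimal sketch of the central idea, written in C for brevity: observed method durations are scaled before being charged to a per-thread virtual clock, and the scheduler then resumes whichever runnable thread is furthest behind in virtual time. The names (vthread, on_method_exit) and the scaling interface are assumptions for illustration, not VEX's actual API.

```c
/* Sketch of virtual-time accounting: observed method durations are
 * scaled (or replaced by model output) before being charged to the
 * thread's virtual clock. */
typedef unsigned long long vtime_ns;

struct vthread {
    vtime_ns virtual_now; /* this thread's position in virtual time */
};

/* Called by the instrumentation layer when a profiled method returns.
 * A 'scale' below 1.0 answers "what if this method were faster?". */
void on_method_exit(struct vthread *t, vtime_ns observed_ns, double scale)
{
    t->virtual_now += (vtime_ns)(observed_ns * scale);
    /* The scheduler always resumes the runnable thread with the
     * smallest virtual_now, so threads interleave as they would in
     * the hypothetical scenario rather than in real time. */
}
```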

    TACKLING PERFORMANCE AND SECURITY ISSUES FOR CLOUD STORAGE SYSTEMS

    Building data-intensive applications and emerging computing paradigms (e.g., Machine Learning (ML), Artificial Intelligence (AI), and the Internet of Things (IoT)) in cloud computing environments is becoming the norm, given the many advantages in scalability, reliability, security and performance. However, under rapid changes in applications, system middleware and underlying storage devices, service providers face new challenges in delivering performance and security isolation in the context of resources shared among multiple tenants. The gap between the decades-old storage abstraction and modern storage devices keeps widening, calling for software/hardware co-designs to achieve more effective performance and security protocols. This dissertation rethinks the storage subsystem from the device level to the system level and proposes new designs at different levels to tackle performance and security issues for cloud storage systems. In the first part, we present an event-based SSD (Solid State Drive) simulator that models modern protocols, firmware and storage backends in detail. The proposed simulator can capture the nuances of SSD internal states under various I/O workloads, which helps researchers understand the impact of various SSD designs and workload characteristics on end-to-end performance. In the second part, we study the security challenges of shared in-storage computing infrastructures. Many cloud providers offer isolation at multiple levels to secure data and instances; however, security measures in emerging in-storage computing infrastructures have not been studied. We first investigate the attacks that could be conducted by offloaded in-storage programs in a multi-tenant cloud environment. To defend against these attacks, we build a lightweight Trusted Execution Environment, IceClave, to enable security isolation between in-storage programs and internal flash management functions. We show that while enforcing security isolation in the SSD controller with minimal hardware cost, IceClave still keeps the performance benefit of in-storage computing by delivering up to 2.4x better performance than the conventional host-based trusted computing approach. In the third part, we investigate the performance interference problem caused by other tenants' I/O flows. We demonstrate that I/O resource sharing can often lead to performance degradation and instability. The block device abstraction fails to expose SSD parallelism and to pass application requirements. To this end, we propose a software/hardware co-design that enforces performance isolation by bridging the semantic gap. Our design significantly improves QoS (Quality of Service) by reducing throughput penalties and tail latency spikes. Lastly, we explore more effective I/O control to address contention in the storage software stack. We illustrate that the state-of-the-art resource control mechanism, Linux cgroups, is insufficient for controlling I/O resources. Inappropriate cgroup configurations may even hurt the performance of co-located workloads under memory-intensive scenarios. We add kernel support for limiting page cache usage per cgroup and achieving I/O proportionality.
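    The event-based simulation style named in the first part can be illustrated with a generic discrete-event core, sketched in C below: a time-ordered queue of events drives simulated time forward from one event to the next. This is a textbook pattern under assumed names (schedule, run, ev_type), not the dissertation's simulator code.

```c
/* Generic discrete-event core of the kind an event-based SSD simulator
 * is built on: simulated time jumps from one queued event to the next. */
#include <stdlib.h>

enum ev_type { EV_HOST_IO, EV_NAND_READ_DONE, EV_NAND_PROG_DONE, EV_GC };

struct event {
    unsigned long long when_ns; /* simulated timestamp of the event */
    enum ev_type type;
    struct event *next;         /* singly linked, kept sorted by when_ns */
};

static struct event *queue;

/* Insert an event in timestamp order. */
void schedule(struct event *e)
{
    struct event **p = &queue;
    while (*p && (*p)->when_ns <= e->when_ns)
        p = &(*p)->next;
    e->next = *p;
    *p = e;
}

/* Dispatch events until the simulated horizon is reached. */
void run(unsigned long long end_ns)
{
    while (queue && queue->when_ns <= end_ns) {
        struct event *e = queue;
        queue = e->next;
        /* A handler here would update firmware/FTL state and may
         * schedule follow-up events (e.g., a program completion
         * tPROG nanoseconds later). */
        free(e);
    }
}
```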

    Introductory Microcontroller Programming

    This text is a treatise on microcontroller programming. It introduces the major peripherals found on most microcontrollers and their usage, focusing on the ATmega644p in the AVR family produced by Atmel. General information and background knowledge on several topics is also presented. These topics include information regarding the hardware of a microcontroller and assembly code, as well as instructions regarding good program structure and coding practices. Examples with code and discussion are presented throughout. The text is intended for hobbyists and students seeking knowledge of microcontroller programming, and is written at a level that students entering junior-level core robotics classes would find useful.
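    A classic first example for this family of parts is a C program that blinks an LED, compiled with avr-gcc. DDRB, PORTB and _delay_ms are the standard avr-libc interfaces; only the clock frequency and the LED pin are assumptions here.

```c
/* Minimal AVR example for the ATmega644P: toggle an LED on pin PB0.
 * Build with avr-gcc; F_CPU must match the actual clock source. */
#define F_CPU 8000000UL   /* assumed 8 MHz clock */
#include <avr/io.h>
#include <util/delay.h>

int main(void)
{
    DDRB |= (1 << DDB0);       /* configure PB0 as an output */
    for (;;) {
        PORTB ^= (1 << PB0);   /* toggle the LED */
        _delay_ms(500);        /* busy-wait half a second */
    }
}
```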