Reducing Application Launch Time by Using Execution-time Prefetching Techniques
Thesis (Ph.D.) -- Seoul National University Graduate School, Dept. of Electrical and Computer Engineering, February 2013.
Recently, as mobile devices have become widely used, application responsiveness has become critical to the user experience. Among many metrics, application launch performance is one of the key indices for evaluating user-perceived system performance. However, users suffer from long application launch delays even when they use flash-based disks as their system disks. This is mainly because system resources are used in a serialized manner during the application launch process, whereas processors and disk drives improve their performance by exploiting parallelism.
To optimize launch performance, this dissertation presents a new execution-time prefetching technique, which accurately monitors the blocks accessed during the first launch of each application and prefetches them into the disk cache, in an optimized order, at subsequent launches. The key idea is to overlap processor computation with disk I/O while effectively exploiting the internal parallelism of disk drives. To optimize prefetch performance, we employ various merging, logical-block-number sorting, and prefetch-level dependency-resolution schemes.
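A minimal sketch of two of the scheduling steps named above, logical-block-number sorting and distance-based merging, assuming requests are recorded as (LBN, block-count) pairs at first launch; the merge-gap threshold and request format are illustrative assumptions, not the dissertation's actual parameters:

```python
def schedule_prefetch(requests, merge_gap=8):
    """Sort recorded block requests by logical block number (LBN) and
    merge requests whose gap is at most `merge_gap` blocks, so the disk
    can serve each merged run as one sequential read."""
    # requests: list of (lbn, count) pairs captured at first launch
    ordered = sorted(requests)  # logical-block-number sort
    merged = []
    for lbn, count in ordered:
        if merged and lbn - (merged[-1][0] + merged[-1][1]) <= merge_gap:
            # Distance-based merge: extend the previous request, reading
            # the small gap in between as well.
            prev_lbn, _ = merged[-1]
            merged[-1] = (prev_lbn, lbn + count - prev_lbn)
        else:
            merged.append((lbn, count))
    return merged
```

For example, `schedule_prefetch([(100, 4), (10, 2), (106, 4)])` sorts the three requests and merges the two near LBN 100 into a single ten-block read, yielding `[(10, 2), (100, 10)]`.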
We implemented the proposed prefetcher on Linux kernel 3.5.0 and evaluated it by launching a set of widely-used applications. Experiments demonstrate an average 52% reduction of application launch time on an HDD-based system and a 34.1% reduction on an SSD-based system, compared to cold-start performance. We also achieved an average 28.1~31.4% reduction on the mobile MeeGo platform using an SSD as the system disk, and, after porting the proposed prefetcher to the Android platform, an average 12.8% reduction for widely-used Android applications on a Galaxy Nexus phone. In addition, we implemented the proposed prefetcher at the user level, which does not require kernel modification; it demonstrated an average 21.7~28.5% reduction of application launch time on SSDs.
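A user-level prefetcher can warm the page cache through standard system calls alone. One plausible mechanism, an assumption here rather than the dissertation's exact implementation, is `posix_fadvise` with `POSIX_FADV_WILLNEED`, which asks the kernel to read the listed extents into the page cache asynchronously:

```python
import os

def prefetch_files(extents):
    """Warm the page cache for each (path, offset, length) extent.
    POSIX_FADV_WILLNEED returns immediately; the kernel issues the
    reads in the background, overlapping them with application startup."""
    for path, offset, length in extents:
        fd = os.open(path, os.O_RDONLY)
        try:
            os.posix_fadvise(fd, offset, length, os.POSIX_FADV_WILLNEED)
        finally:
            os.close(fd)
```

Because the call is advisory and non-blocking, a launcher can invoke it for the whole recorded launch sequence and then start the application immediately, letting the prefetch reads overlap with the application's own computation.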
The proposed scheme incurs little overhead from its implementation and operation in the existing environment. It is expected to contribute significantly to the performance of desktop PCs and smartphones by improving both system responsiveness and user-perceived performance.
Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Contents and Summary
1.3 Organization of the Dissertation
Chapter 2 Background
2.1 General-Purpose Disk Drives
2.1.1 Hard Disk Drives
2.1.2 NAND Flash-Based Solid-State Drives (SSD)
2.1.3 Hybrid Hard Disks
2.2 The Disk I/O Subsystem in Linux
2.2.1 The Linux Disk I/O Stack
2.2.2 The Linux Disk Cache
2.2.3 Types and Characteristics of I/O Schedulers
2.2.4 I/O Plug/Unplug
2.2.5 Analysis of Process Time Saved by Successful Prefetching
2.3 Prior Work on Fast Application Launch
2.3.1 Disk Caching Techniques for Fast Application Launch
2.3.2 Disk Caching Techniques for Fast Response of General Workloads
2.3.3 Other Techniques
Chapter 3 Analysis of Application Launch Behavior
3.1 Launch Scenario
3.2 Analysis of Disk I/O During Launch
3.3 Analysis of Processor and Disk Activity Patterns
Chapter 4 Design, Implementation, and Evaluation of the Kernel-Level Execution-Time Prefetcher
4.1 Overview and Goals of the Execution-Time Prefetcher
4.2 Launch-Sequence Collection
4.3 Prefetch-Sequence Scheduler
4.3.1 Extent-Dependency Analysis
4.3.2 Metadata Shift for Resolving Inter-Block Dependencies
4.3.3 Distance-Based Merging
4.3.4 Distance-Based Merging Across Gaps
4.3.5 Logical-Block-Number Sorting
4.3.6 Plug/Unplug
4.4 Parallelizing Application and Prefetcher Execution
4.4.1 HDD-Based Systems
4.4.2 SSD-Based Systems
4.4.3 Multi-Disk Systems
4.5 Managing Launch-Sequence Validity
4.6 OS Boot Prefetcher
4.7 Time-Limited Prefetcher Interface
4.8 Experimental Setup
4.9 Application Launch Time
4.10 Runtime and Storage Overhead
4.11 Stability of the Kernel-Level Prefetcher
Chapter 5 Design, Implementation, and Evaluation of the User-Level Execution-Time Prefetcher
5.1 Overview and Architecture of the User-Level Prefetcher
5.2 Generating an Application's Prefetch Sequence
5.2.1 Collecting Disk I/O Information
5.2.2 Extracting the Launch Sequence
5.2.3 Scheduling the Prefetch Sequence
5.3 Block-to-File Map
5.3.1 Overview of the Block-to-File Map
5.3.2 Collecting the List of Files Related to the Launch Sequence
5.4 Generating the User-Level Prefetcher Program
5.5 Application Launch Manager
5.6 Advantages and Disadvantages of the User-Level Prefetcher
5.7 Experimental Setup
5.8 Application Launch Time
5.9 Runtime and Storage Overhead
Chapter 6 Conclusion and Future Work
6.1 Conclusion
6.2 Future Work
References
Abstract
Analysis and optimization of storage IO in distributed and massive parallel high performance systems
Although Moore's law ensures continued growth in computational power, IO performance appears to be left behind, which limits the benefits gained from that growth: processors must idle for long periods waiting for IO. Another factor that slows IO is the increased parallelism required by today's computations. Most modern processing units are built from many weak cores, and since IO exhibits low parallelism, these weak cores decrease system performance. Furthermore, to avoid the added delay of external storage, future High Performance Computing (HPC) systems will employ Active Storage Fabrics (ASF), which embed storage directly into large HPC systems. The IO performance of a single HPC node will therefore require optimization, which can only be achieved with a full understanding of the IO stack's operation. Analysis of the IO stack under the new conditions of multi-core and massive parallelism leads to some important conclusions: the IO stack is generally built for single devices and is heavily optimized for HDDs. Two main optimization approaches are taken. The first is optimizing the IO stack to accommodate parallelism. The analysis shows that a design based on several storage devices operating in parallel is the best approach to parallelism in the IO stack. A parallel IO device with a unified storage space is introduced; the unified storage space allows optimal division of work among resources for both reads and writes. The design also avoids the overhead of large parallel file systems by making only limited changes to a conventional file system. Furthermore, the design leaves the interface of the IO stack unchanged, an important restriction that avoids application rewrites. An implementation of this design is shown to increase performance. The second approach is optimizing the IO stack for Solid State Drives (SSD). Optimization for the new storage technology demanded further analysis.
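The abstract does not describe the layout of the unified storage space; as a hedged illustration only, a RAID-0-style mapping in which fixed-size stripes rotate across the parallel devices could look like this (the stripe size, device count, and function name are assumptions):

```python
def stripe_map(lbn, num_devices, stripe_blocks=64):
    """Map a block number in the unified storage space to a
    (device index, block number on that device) pair.  Consecutive
    stripes of `stripe_blocks` blocks rotate across the devices, so a
    large sequential access is served by all devices in parallel."""
    stripe, offset = divmod(lbn, stripe_blocks)
    device = stripe % num_devices
    device_lbn = (stripe // num_devices) * stripe_blocks + offset
    return device, device_lbn
```

With four devices and 64-block stripes, blocks 0-63 land on device 0, blocks 64-127 on device 1, and so on, so a 256-block sequential read keeps all four devices busy at once.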
The analysis shows that the IO stack requires revision on many levels to accommodate SSDs optimally. File-system preallocation of free blocks is used as an example: preallocation is important for data contiguity on HDDs, but given the fast random access of SSDs it represents pure overhead. After careful analysis of the block allocation algorithms, preallocation is removed. As an additional optimization approach, IO compression is suggested for future work: it can utilize cores that sit idle during an IO transaction to perform on-the-fly compression of IO data.
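The suggested future-work direction, using idle cores for on-the-fly IO compression, can be sketched with a thread pool; the worker count and the choice of zlib are illustrative assumptions, not part of the proposal:

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_io_buffers(buffers, workers=4):
    """Compress outgoing IO buffers in parallel before they reach the
    storage device.  CPython's zlib releases the GIL while compressing
    large buffers, so the work spreads across otherwise idle cores."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, buffers))
```

Whether this pays off depends on the workload: compression shrinks the bytes transferred to the device at the cost of CPU cycles, which is exactly the trade the abstract proposes when cores would otherwise stall on IO.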