5 research outputs found

    Reducing Application Launch Time by Using Execution-time Prefetching Techniques

    Doctoral dissertation (Ph.D.), Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2013. Advisor: μ‹ ν˜„μ‹.
    Recently, as mobile devices have become widely used, application responsiveness has become a major factor in user experience. Among many metrics, application launch performance is one of the key indices for evaluating user-perceived system performance. However, users suffer from long application launch delays even when flash-based disks are used as system disks. This is mainly because system resources are used in a serialized manner during the launch process, whereas processors and disk drives improve their performance by exploiting parallelism. To optimize launch performance, this dissertation presents a new execution-time prefetching technique, which accurately records the blocks accessed during the first launch of each application and prefetches them into the disk cache in an optimized order at subsequent launches. The key idea is to overlap processor computation with disk I/O while effectively exploiting the internal parallelism of disk drives and the available processor cores. To shorten prefetch time, we employ various merging, logical-block-number sorting, and prefetch-level dependency resolution schemes, chosen according to the characteristics of the disk. We implemented the proposed prefetcher in Linux kernel 3.5.0 and evaluated it by launching a set of widely used applications.
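    The kernel-level prefetcher itself is not shown in this record. As a rough illustration only, the hypothetical user-space C sketch below replays a previously recorded launch sequence from an assumed trace file launch.seq (lines of the form "<path> <offset> <length>"), sorts the requests by file and offset as a crude stand-in for the logical-block-number sort, and asks the kernel to load the ranges into the page cache with posix_fadvise(POSIX_FADV_WILLNEED) so that the disk reads overlap later computation. It is a minimal sketch under these assumptions, not the dissertation's implementation.

    /*
     * Hypothetical user-level prefetch replay -- a minimal sketch, not the
     * dissertation's code.  It reads an assumed trace file "launch.seq"
     * ("<path> <offset> <length>" per line, recorded at a previous launch),
     * sorts the requests by file and offset, and asks the kernel to pull the
     * ranges into the page cache before the application is started again.
     */
    #define _XOPEN_SOURCE 600                        /* for posix_fadvise */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define MAX_REQ 4096

    struct req { char path[256]; long long off; long long len; };

    static int cmp(const void *a, const void *b)
    {
        const struct req *x = a, *y = b;
        int c = strcmp(x->path, y->path);
        if (c) return c;
        return (x->off > y->off) - (x->off < y->off);
    }

    int main(void)
    {
        static struct req v[MAX_REQ];
        int n = 0;

        FILE *f = fopen("launch.seq", "r");          /* assumed trace file */
        if (!f) { perror("launch.seq"); return 1; }
        while (n < MAX_REQ &&
               fscanf(f, "%255s %lld %lld", v[n].path, &v[n].off, &v[n].len) == 3)
            n++;
        fclose(f);

        /* Issue requests per file in ascending offset order (a rough analogue
         * of the logical-block-number sort described in the abstract). */
        qsort(v, n, sizeof(v[0]), cmp);

        for (int i = 0; i < n; i++) {
            int fd = open(v[i].path, O_RDONLY);
            if (fd < 0) continue;
            /* Hint the kernel to read the range into the page cache; the reads
             * proceed asynchronously, overlapping subsequent CPU work. */
            posix_fadvise(fd, (off_t)v[i].off, (off_t)v[i].len, POSIX_FADV_WILLNEED);
            close(fd);
        }
        /* The target application would be exec'd here; a real prefetcher would
         * also merge adjacent ranges and keep one fd per file. */
        return 0;
    }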
    Experiments demonstrate an average 52% reduction in application launch time on an HDD-based desktop workload and a 34.1% reduction on an SSD-based system, compared with cold-start performance. We also achieve an average reduction of 28.1-31.4% on the mobile MeeGo platform using an SSD as the system disk. We further ported the proposed prefetcher to the Android platform and achieve an average 12.8% reduction for widely used Android applications on a Galaxy Nexus phone. In addition, we implemented the prefetcher at user level, which requires no kernel modification; it demonstrated an average 21.7-28.5% reduction in application launch time on SSDs. The proposed scheme incurs little implementation and operational overhead in existing environments and is expected to contribute significantly to the performance of desktop PCs and smartphones by improving both system and user-perceived responsiveness.
    Table of contents:
    Chapter 1. Introduction
      1.1 Research motivation
      1.2 Research content and significance
      1.3 Organization of the dissertation
    Chapter 2. Background
      2.1 General-purpose disk drives
        2.1.1 Hard disk drives
        2.1.2 NAND flash-based solid-state drives (SSDs)
        2.1.3 Hybrid hard disks
      2.2 The Linux disk I/O subsystem
        2.2.1 The Linux disk I/O stack
        2.2.2 The Linux disk cache
        2.2.3 Types and characteristics of I/O schedulers
        2.2.4 I/O plugging/unplugging
        2.2.5 Analysis of processor time saved by successful prefetching
      2.3 Prior work on fast application launching
        2.3.1 Disk caching schemes for fast application launching
        2.3.2 Disk caching schemes for fast response on general workloads
        2.3.3 Other schemes
    Chapter 3. Analysis of application launch behavior
      3.1 Launch scenario
      3.2 Analysis of disk I/O issued during launch
      3.3 Analysis of processor and disk activity patterns
    Chapter 4. Design, implementation, and evaluation of the kernel-level execution-time prefetcher
      4.1 Overview and goals of the execution-time prefetcher
      4.2 Launch-sequence collection
      4.3 Prefetch-sequence scheduler
        4.3.1 Extent-dependency analysis
        4.3.2 Metadata shift for resolving inter-block dependencies
        4.3.3 Distance-based merging
        4.3.4 Distance-based gap-filling merging
        4.3.5 Logical-block-number sorting
        4.3.6 Plugging/unplugging
      4.4 Parallelizing application and prefetcher operation
        4.4.1 HDD-based systems
        4.4.2 SSD-based systems
        4.4.3 Multi-disk systems
      4.5 Launch-sequence validity management
      4.6 OS boot prefetcher
      4.7 Idle-time prefetcher interface
      4.8 Experimental environment
      4.9 Application launch time
      4.10 Operational and storage-space overhead
      4.11 Safety of the kernel-level prefetcher
    Chapter 5. Design, implementation, and evaluation of the user-level execution-time prefetcher
      5.1 Overview and structure of the user-level prefetcher
      5.2 Generating application prefetch sequences
        5.2.1 Collecting disk I/O information
        5.2.2 Extracting the launch sequence
        5.2.3 Scheduling the prefetch sequence
      5.3 Block-to-file map
        5.3.1 Overview of the block-to-file map
        5.3.2 Collecting the list of files related to the launch sequence
      5.4 Generating the user-level prefetcher program
      5.5 Application launch manager
      5.6 Advantages and disadvantages of the user-level prefetcher
      5.7 Experimental environment
      5.8 Application launch time
      5.9 Operational and storage-space overhead
    Chapter 6. Conclusion and future work
      6.1 Conclusion
      6.2 Future research directions
    References
    Abstract

    Analysis and optimization of storage IO in distributed and massive parallel high performance systems

    Although Moore's law ensures continued growth in computational power, IO performance has been left behind, which limits the benefits gained from the increased computational power: processors must idle for long periods waiting for IO. Another factor slowing IO is the increased parallelism required by today's computations. Most modern processing units are built from many relatively weak cores, and since the IO path offers little parallelism, these weak cores further reduce IO performance. Furthermore, to avoid the added delay of external storage, future High Performance Computing (HPC) systems will employ Active Storage Fabrics (ASF), which embed storage directly into large HPC systems. The IO performance of a single HPC node will therefore require optimization, which can only be achieved with a full understanding of how the IO stack operates. Analysis of the IO stack under the new conditions of multi-core processing and massive parallelism leads to some important conclusions: the IO stack is generally built for single devices and is heavily optimized for HDDs. Two main optimization approaches are taken.
    The first is optimizing the IO stack to accommodate parallelism. The analysis shows that a design based on several storage devices operating in parallel is the best approach to parallelism in the IO stack. A parallel IO device with a unified storage space is introduced; the unified storage space allows functions to be divided optimally among resources for both reads and writes. The design also avoids the overhead of large parallel file systems by requiring only limited changes to a conventional file system, and it leaves the interface of the IO stack unchanged, an important restriction that avoids rewriting applications. Implementing this design is shown to increase performance.
    The second approach is optimizing the IO stack for Solid State Drives (SSDs). Optimizing for this newer storage technology demanded further analysis, which shows that the IO stack requires revision at many levels to accommodate SSDs well. File-system preallocation of free blocks is used as an example: preallocation is important for data contiguity on HDDs, but given the fast random access of SSDs it is pure overhead, and after careful analysis of the block allocation algorithms it is removed. As an additional optimization approach, IO compression is suggested for future work: cores that would otherwise idle during an IO transaction can perform on-the-fly compression of the IO data.
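    The abstract describes the parallel IO device only at a high level. The toy C sketch below shows one possible reading of a unified storage space spread over several devices operating in parallel: the backing file paths, the fixed stripe size, the round-robin mapping, and the one-thread-per-stripe reads are all illustrative assumptions, not the design evaluated in this work.

    /*
     * Toy sketch of a unified storage space striped round-robin over several
     * backing devices and read in parallel -- an illustration of the idea only.
     */
    #define _XOPEN_SOURCE 700
    #include <fcntl.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define NDEV   4
    #define STRIPE (64 * 1024)                       /* bytes per stripe */

    static const char *devs[NDEV] = {                /* hypothetical devices */
        "/tmp/dev0.img", "/tmp/dev1.img", "/tmp/dev2.img", "/tmp/dev3.img"
    };

    struct job { off_t log_off; size_t len; char *dst; };

    /* Map a logical offset of the unified space to (device, offset on device):
     * consecutive stripes rotate round-robin over the devices. */
    static void map_addr(off_t log_off, int *dev, off_t *dev_off)
    {
        off_t stripe_no = log_off / STRIPE;
        *dev = (int)(stripe_no % NDEV);
        *dev_off = (stripe_no / NDEV) * STRIPE + (log_off % STRIPE);
    }

    static void *read_stripe(void *arg)
    {
        struct job *j = arg;
        int dev; off_t dev_off;
        map_addr(j->log_off, &dev, &dev_off);
        int fd = open(devs[dev], O_RDONLY);
        if (fd >= 0) {
            if (pread(fd, j->dst, j->len, dev_off) < 0)
                perror("pread");
            close(fd);
        }
        return NULL;
    }

    /* Read [off, off + len) of the unified space, one thread per stripe, so the
     * underlying devices service their stripes concurrently.  For brevity, off
     * is assumed to be stripe-aligned. */
    static void parallel_read(off_t off, size_t len, char *buf)
    {
        size_t njobs = (len + STRIPE - 1) / STRIPE;
        pthread_t *tid = malloc(njobs * sizeof(*tid));
        struct job *jobs = malloc(njobs * sizeof(*jobs));
        for (size_t i = 0; i < njobs; i++) {
            size_t l = (i + 1 < njobs) ? STRIPE : len - i * STRIPE;
            jobs[i] = (struct job){ off + (off_t)(i * STRIPE), l, buf + i * STRIPE };
            pthread_create(&tid[i], NULL, read_stripe, &jobs[i]);
        }
        for (size_t i = 0; i < njobs; i++)
            pthread_join(tid[i], NULL);
        free(tid);
        free(jobs);
    }

    int main(void)
    {
        char *buf = malloc(1 << 20);
        parallel_read(0, 1 << 20, buf);              /* fetch 1 MiB in parallel */
        free(buf);
        return 0;
    }

    Compile with -pthread; a real implementation would keep the devices open, handle short reads, and expose the same block interface upward so applications need no changes.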