3 research outputs found

    Improved Cache-hot Page Allocation Technique for Reducing Page Initialization Latency of Linux Based Systems

    Get PDF
    Master's thesis (M.S.) -- Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2019. Advisor: Seongsoo Hong.

    Recently, user-interactive applications have come to request large amounts of memory from the OS frequently. When an application requests memory, the OS allocates it in units of pages, and it must initialize each page before handing it over. For user-interactive applications that allocate pages frequently, the latency of this page initialization degrades application performance. To shorten it, the legacy Linux kernel preferentially allocates cache-hot pages, i.e., pages already mapped into a CPU cache, which can therefore be accessed quickly during initialization. However, the legacy kernel considers only cache-hot pages mapped to each CPU's private cache and does not exploit cache-hot pages mapped to the shared cache. As a result, when no private-cache-hot page is available, the kernel allocates a cache-cold page even though shared-cache-hot pages exist in the system. This thesis proposes an improved cache-hot page allocation technique that considers cache-hot pages mapped to both the private and the shared caches, raising the probability that an application is allocated a cache-hot page above that of the legacy kernel.
    When a page allocation request occurs, the proposed method first allocates a cache-hot page mapped to the requesting core's private cache. If no such page exists, it allocates a cache-hot page mapped to the shared cache instead of falling back to a cache-cold page. This raises the probability of allocating a cache-hot page and consequently reduces the average page initialization latency. We implemented the proposed method in a desktop environment based on Linux kernel 4.18.10; experimental results show that the average page initialization latency is reduced by about 7% compared to the legacy Linux kernel.

    Table of contents: Chapter 1 Introduction (1.1 Background, 1.2 Research Content, 1.3 Thesis Organization); Chapter 2 Physical Memory Management in Linux (2.1 Physical Memory Management Data Structures, 2.2 Page Allocation and Free Handlers); Chapter 3 Problem Definition (3.1 Problem Description, 3.2 Solution Overview); Chapter 4 Cache-hot Page Allocation Using Multi-level Lists (4.1 Page Free, 4.2 Page Allocation); Chapter 5 Experimental Validation (5.1 Measuring Average Page Initialization Latency, 5.2 Measuring Overhead); Chapter 6 Related Work; Chapter 7 Conclusion; References; Abstract.
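    The allocation order described in the abstract is easy to picture as code. Below is a minimal user-space sketch of the idea, not the thesis' actual kernel patch: the names (struct hot_lists, alloc_page_hot, the three lists) are illustrative assumptions, and a real implementation would operate on the kernel's per-CPU page free lists.

/* Sketch of the multi-level free-list policy: each CPU keeps a list of
 * pages presumed hot in its private cache, plus a list of pages presumed
 * hot in the shared last-level cache. Allocation tries the private list
 * first, then the shared list, and only then falls back to cold pages. */
#include <stdio.h>
#include <stddef.h>

struct page_node {
    struct page_node *next;
    void *page;                     /* backing page frame */
};

struct hot_lists {
    struct page_node *private_hot;  /* hot in this core's private cache */
    struct page_node *shared_hot;   /* hot in the shared cache */
    struct page_node *cold;         /* everything else */
};

static void *pop(struct page_node **head)
{
    struct page_node *n = *head;
    if (!n)
        return NULL;
    *head = n->next;
    return n->page;
}

/* Allocate one page, preferring cache-hot candidates so the mandatory
 * zero-fill touches cache lines that are already resident. */
void *alloc_page_hot(struct hot_lists *pcp)
{
    void *page;

    if ((page = pop(&pcp->private_hot)))    /* fastest: private cache hit */
        return page;
    if ((page = pop(&pcp->shared_hot)))     /* added step: shared cache hit */
        return page;
    return pop(&pcp->cold);                 /* legacy fallback: cold page */
}

int main(void)
{
    static char frame[4096];
    struct page_node hot = { NULL, frame };
    struct hot_lists pcp = { &hot, NULL, NULL };

    /* First request drains the private-hot list; the second finds every
     * list empty and returns NULL (a real kernel would refill from the
     * buddy allocator instead). */
    printf("hot:  %p\n", alloc_page_hot(&pcp));
    printf("cold: %p\n", alloc_page_hot(&pcp));
    return 0;
}

    The ordering reflects the latency argument made in the abstract: initializing a page whose lines sit in the private cache is cheapest, a shared-cache hit still beats DRAM, and a cold page pays the full memory latency.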

    Introducing kernel-level page reuse for high performance computing

    No full text
    International audience

    Methods for efficient resource utilization in statistical machine learning algorithms

    Get PDF
    In recent years, statistical machine learning has emerged as a key technique for tackling problems that elude a classic algorithmic approach. One such problem, with a major impact on human life, is the analysis of complex biomedical data. Solving this problem fast and efficiently is of major importance, as it enables, e.g., predicting the efficacy of different drugs for therapy selection. While achieving the highest possible prediction quality appears desirable, doing so is often simply infeasible due to resource constraints. Statistical learning algorithms for predicting the health status of a patient, or for finding the best algorithm configuration for the prediction, require an excessively high amount of resources. Furthermore, these algorithms are often implemented with no awareness of the underlying system architecture, which leads to sub-optimal resource utilization. This thesis presents methods for efficient resource utilization in statistical learning applications. The goal is to reduce the resource demands of these algorithms to meet a given time budget while preserving prediction quality. As a first step, the resource consumption characteristics of learning algorithms are analyzed, as well as their scheduling on the underlying parallel architectures, in order to develop optimizations that let these algorithms scale to larger problem sizes. For this purpose, new profiling mechanisms are incorporated into a holistic profiling framework. The results show that one major contributor to the resource issues is memory consumption. To overcome this obstacle, a new optimization based on dynamic sharing of memory is developed; it speeds up computation by several orders of magnitude when available main memory is the bottleneck and the system would otherwise swap. One important technique for automated parameter tuning of learning algorithms is model-based optimization: within a huge search space, algorithm configurations are evaluated to find the configuration with the best prediction quality. An important step towards better managing this search space is to parallelize the search itself. However, high runtime variance within the configuration space can cause inefficient resource utilization. To address this, new resource-aware scheduling strategies are developed that efficiently map evaluations of configurations onto the parallel architecture according to their resource demands. In contrast to classical scheduling problems, the new scheduler interacts with the configuration proposal mechanism to select configurations with suitable resource demands. With these strategies, it becomes possible to exploit the full potential of parallel architectures. Compared to established parallel execution models, the results show that the new approach enables model-based optimization to converge to the optimum faster within a given time budget.
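    The interplay between scheduling and configuration proposal described above can be illustrated with a small sketch. This is an assumption-laden toy, not the thesis' actual interface: the predicted memory demands, the worst-fit worker choice in pick_worker, and the deferral of oversized candidates are illustrative stand-ins for the real resource-aware strategies.

/* Resource-aware dispatch sketch: each candidate configuration carries a
 * predicted memory demand, and the scheduler only dispatches it to a
 * worker whose remaining capacity covers that demand; otherwise the
 * candidate is deferred, signalling the proposal mechanism to offer a
 * cheaper one. */
#include <stdio.h>

#define NWORKERS 4

struct candidate {
    int id;
    double mem_gb;      /* predicted resource demand (assumed) */
    double quality;     /* surrogate-model score (assumed) */
};

/* Worst-fit choice: pick the fitting worker with the most free memory,
 * so large candidates still find room later. Returns -1 if none fits. */
static int pick_worker(const double free_gb[], double need)
{
    int best = -1;
    for (int w = 0; w < NWORKERS; w++)
        if (free_gb[w] >= need && (best < 0 || free_gb[w] > free_gb[best]))
            best = w;
    return best;
}

int main(void)
{
    double free_gb[NWORKERS] = { 16, 16, 8, 8 };
    /* Candidates in the order the proposal mechanism might rank them. */
    struct candidate cand[] = {
        { 1, 12.0, 0.91 }, { 2, 20.0, 0.90 }, { 3, 6.0, 0.88 },
    };

    for (size_t i = 0; i < sizeof cand / sizeof cand[0]; i++) {
        int w = pick_worker(free_gb, cand[i].mem_gb);
        if (w < 0) {    /* no worker fits: defer, ask for a cheaper config */
            printf("config %d (%.0f GB) deferred\n", cand[i].id, cand[i].mem_gb);
            continue;
        }
        free_gb[w] -= cand[i].mem_gb;
        printf("config %d -> worker %d\n", cand[i].id, w);
    }
    return 0;
}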