STUN: Reinforcement-Learning-Based Optimization of Kernel Scheduler Parameters for Static Workload Performance
Modern Linux operating systems are used in a wide range of fields, from small embedded IoT devices to supercomputers. However, most machines run with the default Linux scheduler parameters, which are tuned for general-purpose environments. As a result, the scheduler cannot exploit the characteristics of a particular hardware and software environment, and it is difficult to achieve optimal performance on such machines. In this paper, we propose STUN, an automatic scheduler optimization framework. STUN automatically tunes the five scheduling policies of the Linux kernel and ten of their parameters to fit each workload environment. It reduces training time and improves efficiency through a filtering mechanism and training reward algorithms. Using STUN, users can optimize the performance of their machines at the OS scheduler level without manually controlling the scheduler. On a face detection workload, STUN improved execution time and FPS by 18.3% and 22.4%, respectively. In addition, STUN achieved performance improvements of 26.97%, 54.42%, and 256.13% for microbenchmarks on machines with 4, 44, and 120 cores, respectively.
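The abstract does not include implementation details; the following is a minimal sketch, assuming an epsilon-greedy search over two sysctl-style scheduler knobs and a hypothetical workload command, of the kind of reinforcement-learning loop STUN describes: apply a candidate configuration, run the workload, and use the negated execution time as the reward. The knob names, search space, and agent are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of an RL loop for tuning Linux scheduler knobs.
# The knob names, value ranges, and workload command are illustrative assumptions.
import random
import subprocess
import time

SEARCH_SPACE = {
    "kernel.sched_latency_ns":         [6_000_000, 12_000_000, 24_000_000],
    "kernel.sched_min_granularity_ns": [750_000, 1_500_000, 3_000_000],
}

def apply_config(config):
    # Write each candidate value via sysctl (requires root in practice).
    for knob, value in config.items():
        subprocess.run(["sysctl", "-w", f"{knob}={value}"], check=True)

def measure(workload_cmd):
    # Reward is the negated execution time: faster runs score higher.
    start = time.monotonic()
    subprocess.run(workload_cmd, check=True)
    return -(time.monotonic() - start)

def epsilon_greedy_search(workload_cmd, episodes=30, epsilon=0.2):
    q = {}          # configuration -> best observed reward
    best = None
    for _ in range(episodes):
        if best is None or random.random() < epsilon:
            config = {k: random.choice(v) for k, v in SEARCH_SPACE.items()}
        else:
            config = dict(best)            # exploit best known configuration
        apply_config(config)
        reward = measure(workload_cmd)
        key = tuple(sorted(config.items()))
        if key not in q or reward > q[key]:
            q[key] = reward
        best = dict(max(q, key=q.get))     # highest-reward configuration so far
    return best

if __name__ == "__main__":
    print(epsilon_greedy_search(["./face_detection_benchmark"]))
```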
Exploiting GPUs in Virtual Machine for BioCloud
Recently, biological applications have begun to be reimplemented to exploit the many cores of GPUs for better computational performance. By providing virtualized GPUs to VMs in a cloud computing environment, many biological applications could move into the cloud to improve their computational performance and utilize virtually unlimited cloud resources while reducing computation costs. In this paper, we propose a BioCloud system architecture that enables VMs to use GPUs in a cloud environment. Because much of the previous research has focused on mechanisms for sharing GPUs among VMs, it cannot achieve sufficient performance for biological applications, for which computational throughput is more crucial than sharing. The proposed system exploits the pass-through mode of the PCI Express (PCI-E) channel. By allowing each VM to access the underlying GPUs directly, applications achieve almost the same performance as in a native environment. In addition, our scheme multiplexes GPUs by using the hot plug-in/out device features of the PCI-E channel. By adding or removing GPUs in each VM on demand, VMs on the same physical host can time-share the GPUs. We implemented the proposed system using the Xen VMM and NVIDIA GPUs and showed that our prototype is highly effective for biological GPU applications in cloud environments.
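The following is a host-side sketch of the time-sharing idea, assuming the Xen xl toolstack and placeholder domain names and a placeholder GPU PCI address; the actual BioCloud mechanism is implemented inside the VMM and its toolstack rather than as a script like this.

```python
# Host-side sketch of GPU time-sharing by PCI-E hot plug-in/out (illustrative).
# Assumes the Xen `xl` toolstack; domain names and the GPU BDF are placeholders.
import subprocess
import time

GPU_BDF = "0000:03:00.0"           # PCI bus/device/function of the GPU (assumed)
VMS = ["bio-vm-1", "bio-vm-2"]     # guest domains that take turns (assumed)
TIME_SLICE_SEC = 60

def attach(domain, bdf):
    subprocess.run(["xl", "pci-attach", domain, bdf], check=True)

def detach(domain, bdf):
    subprocess.run(["xl", "pci-detach", domain, bdf], check=True)

def time_share(vms, bdf, slice_sec):
    """Rotate the GPU among VMs, with one holder at a time."""
    while True:
        for vm in vms:
            attach(vm, bdf)        # hot plug-in: the VM sees the GPU directly
            time.sleep(slice_sec)  # VM runs its GPU workload in pass-through mode
            detach(vm, bdf)        # hot plug-out: release the GPU for the next VM

if __name__ == "__main__":
    time_share(VMS, GPU_BDF, TIME_SLICE_SEC)
```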
Development of Behavior-Profilers for Multimedia Consumer Electronics
Despite rapid improvements in hardware performance, debugging and optimization remain important procedures in the development of consumer electronics embedded systems because of manufacturing cost and product quality. However, because the properties of consumer electronics systems differ significantly from those of traditional computing systems, the required functionalities of behavior profilers for multimedia consumer electronics systems have to be newly defined. We analyze the desirable characteristics of behavior profilers for multimedia consumer electronics systems and, based on the analysis results, implement a novel profiler tool set consisting of lightweight profiler components and remotely executed GUI client programs. The implemented profiler tool set is independent of the processor architecture and can analyze all system layers, from the operating system to functions inside user-level applications. The effectiveness of our tool set was verified by optimizing a commodity digital TV system.
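As a rough illustration of the split architecture described above (a lightweight on-device component paired with a remotely executed client), the sketch below streams sampled records over a socket to a remote analysis client; the record format, transport, and sampling source are assumptions, not the paper's tool set.

```python
# Illustrative sketch of a lightweight on-device sampler streaming events to a
# remote analysis/GUI client. Ports, record format, and sampling source are assumed.
import json
import socket
import time

def run_device_profiler(host, port, sample_period_sec=0.01):
    """Lightweight on-device component: periodically samples and ships records."""
    with socket.create_connection((host, port)) as sock:
        while True:
            record = {
                "ts": time.monotonic(),
                # A real profiler would record kernel- and user-level events here;
                # this placeholder only reports the sampling timestamp.
                "event": "sample",
            }
            sock.sendall((json.dumps(record) + "\n").encode())
            time.sleep(sample_period_sec)

def run_remote_client(port):
    """Remote client: receives records for off-device analysis and display."""
    with socket.create_server(("0.0.0.0", port)) as server:
        conn, _ = server.accept()
        with conn, conn.makefile() as stream:
            for line in stream:
                print(json.loads(line))   # a GUI would visualize these records
```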
A superblock-based flash translation layer for NAND flash memory
In NAND flash-based storage systems, an intermediate software layer called a flash translation layer (FTL) is usually employed to hide the erase-before-write characteristics of NAND flash memory. This paper proposes a novel superblock-based FTL scheme, which combines a set of adjacent logical blocks into a superblock. In the proposed FTL scheme, superblocks are mapped at coarse granularity, while pages inside a superblock are mapped freely, at fine granularity, to any location in several physical blocks. To reduce extra storage and flash memory operations, the fine-grain mapping information is stored in the spare area of the NAND flash memory. This hybrid mapping technique has the flexibility of fine-grain address translation while reducing the memory overhead to the level of coarse-grain address translation. Our experimental results show that the proposed FTL scheme decreases garbage collection overhead by up to 40% compared with previous FTL schemes.
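The following is a simplified model of the hybrid mapping idea, with made-up geometry constants: superblocks are mapped coarsely to physical blocks, while each logical page is mapped at fine granularity to any page within the superblock's blocks. In the actual scheme the fine-grain map is kept in the NAND spare area rather than in RAM, as this sketch does for brevity.

```python
# Simplified model of superblock-based hybrid mapping (illustrative constants).
PAGES_PER_BLOCK = 64
BLOCKS_PER_SUPERBLOCK = 4          # adjacent logical blocks grouped together
PAGES_PER_SUPERBLOCK = PAGES_PER_BLOCK * BLOCKS_PER_SUPERBLOCK

class SuperblockFTL:
    def __init__(self):
        self.superblock_map = {}   # coarse map: logical superblock -> physical blocks
        self.page_map = {}         # fine map: (superblock, offset) -> (phys block, phys page)
        self.write_ptr = {}        # next free page position inside each superblock

    def _locate(self, lpn):
        return lpn // PAGES_PER_SUPERBLOCK, lpn % PAGES_PER_SUPERBLOCK

    def write(self, lpn, allocate_block):
        """Map logical page lpn to the next free page of its superblock.
        `allocate_block` hands out a fresh physical block when needed."""
        sb, offset = self._locate(lpn)
        ptr = self.write_ptr.get(sb, 0)
        assert ptr < PAGES_PER_SUPERBLOCK, "superblock full: garbage collection needed"
        blocks = self.superblock_map.setdefault(sb, [])
        if ptr // PAGES_PER_BLOCK >= len(blocks):
            blocks.append(allocate_block())          # coarse-grain allocation
        phys_block = blocks[ptr // PAGES_PER_BLOCK]
        phys_page = ptr % PAGES_PER_BLOCK
        # Fine-grain mapping: any page offset may land anywhere in the superblock.
        # (The paper stores this map in the NAND spare area, not in RAM.)
        self.page_map[(sb, offset)] = (phys_block, phys_page)
        self.write_ptr[sb] = ptr + 1

    def read(self, lpn):
        sb, offset = self._locate(lpn)
        return self.page_map.get((sb, offset))       # None if never written
```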
A group-based wear-leveling algorithm for large-capacity flash memory storage systems
Although NAND flash memory has become one of the most popular storage media for portable devices, it has a serious problem with respect to lifetime. Each block of NAND flash memory has a limited number of program/erase cycles, usually 10,000–100,000, and data in a block become unreliable once the limit is reached. For this reason, distributing erase operations evenly across the whole flash memory medium is an important concern in designing flash memory storage systems. In this paper, we propose a memory-efficient group-based wear-leveling algorithm. Our group-based algorithm achieves a small memory footprint by grouping several logically sequential blocks and managing only summary information for each group. We also propose an effective group summary structure and a method to reduce unnecessary wear-leveling operations in order to improve wear-leveling performance. The evaluation results show that our group-based algorithm consumes only 8.75% of the memory space of the previous scheme that manages per-block information, while showing roughly the same wear-leveling performance.
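A minimal sketch of the grouping idea follows, assuming an illustrative group size and wear-gap threshold: only an aggregate erase count is kept per group, and leveling is triggered when the hottest and coldest groups drift too far apart. The actual group summary structure and triggering policy are those described in the paper, not this simplification.

```python
# Illustrative sketch of group-based wear-leveling: per-group summaries instead
# of per-block counters. Group size and threshold are made-up constants.
BLOCKS_PER_GROUP = 128
WEAR_GAP_THRESHOLD = 64            # trigger leveling when groups drift this far apart

class GroupWearLeveler:
    def __init__(self, num_blocks):
        self.num_groups = num_blocks // BLOCKS_PER_GROUP
        # Group summary: only an aggregate erase count per group is kept in RAM,
        # which is what keeps the memory footprint small.
        self.group_erase_sum = [0] * self.num_groups

    def on_erase(self, block):
        self.group_erase_sum[block // BLOCKS_PER_GROUP] += 1

    def average_wear(self, group):
        return self.group_erase_sum[group] / BLOCKS_PER_GROUP

    def pick_swap_candidates(self):
        """Return (hot, cold) groups whose data should be swapped, or None."""
        hot = max(range(self.num_groups), key=self.average_wear)
        cold = min(range(self.num_groups), key=self.average_wear)
        if self.average_wear(hot) - self.average_wear(cold) > WEAR_GAP_THRESHOLD:
            return hot, cold       # migrate cold data into the hot group's blocks
        return None                # wear is balanced enough; avoid extra erases
```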
SSD-HDD-Hybrid Virtual Disk in Consolidated Environments
With the prevalence of multi-core processors and cloud computing, server consolidation using virtualization has increasingly expanded its territory, and the degree of consolidation has also become higher. As a large number of virtual machines individually require their own disks, the storage capacity of a data center can be exceeded. To address this problem, copy-on-write storage systems allow virtual machines to initially share a template disk image. This paper proposes a hybrid copy-on-write storage system that combines solid-state disks and hard disk drives for consolidated environments. In order to take advantage of both devices, the proposed scheme places a read-only template disk image on a solid-state disk, while write operations are isolated to the hard disk drive. In this hybrid architecture, disk I/O performance benefits from the fast read access of the solid-state disk, especially for random reads, while write operations are prevented from degrading flash memory performance. We show that the hybrid virtual disk is more effective, in terms of performance and cost, than pure copy-on-write disks for a highly consolidated system.
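The read/write split can be sketched as follows, with illustrative file paths and block size: reads are served from the SSD-resident template unless a block has been written, in which case they come from the HDD-resident overlay, and all writes go to the overlay. This is a toy model of the idea, not the paper's implementation.

```python
# Minimal sketch of the SSD/HDD copy-on-write split: the read-only template
# image lives on the SSD, all writes are redirected to an overlay on the HDD.
# File paths and the block size are illustrative.
BLOCK_SIZE = 4096

class HybridCowDisk:
    def __init__(self, template_path, overlay_path):
        self.template = open(template_path, "rb")   # SSD: fast (random) reads only
        self.overlay = open(overlay_path, "r+b")    # HDD: absorbs all writes
        self.written = set()                        # blocks present in the overlay

    def read_block(self, block_no):
        src = self.overlay if block_no in self.written else self.template
        src.seek(block_no * BLOCK_SIZE)
        return src.read(BLOCK_SIZE)

    def write_block(self, block_no, data):
        # Copy-on-write: the SSD template is never modified, so the flash
        # device only ever serves reads.
        self.overlay.seek(block_no * BLOCK_SIZE)
        self.overlay.write(data)
        self.written.add(block_no)
```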
KAL: kernel-assisted non-invasive memory leak tolerance with a general-purpose memory allocator
Memory leaks are a continuing problem in software developed in programming languages such as C and C++. A recent approach adopted by some researchers is to tolerate leaks in the application and to reclaim the leaked memory by using specially constructed memory allocation routines. However, such routines replace the usual general-purpose memory allocator and tend to be less efficient in speed and memory utilization. We propose a new scheme that coexists with the existing memory allocation routines and reclaims memory leaks. Our scheme identifies and reclaims leaked memory at the kernel level. There are some major advantages to our approach: (1) the application software does not need to be modified; (2) the application does not need to be suspended while leaked memory is reclaimed; (3) a remote host can be used to identify the leaked memory, thus minimizing the impact on the application program's performance; and (4) our scheme does not degrade the service availability of the application while detecting and reclaiming memory leaks. We have implemented a prototype that works with the GNU C library and the Linux kernel. Our prototype has been tested and evaluated with various real-world applications. Our results show that, in terms of throughput and average response time, the computational overhead of our approach is around 2% relative to the conventional memory allocator. We also verified that the prototype successfully suppressed address space expansion caused by memory leaks when the applications were run on synthetic workloads.
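As a highly simplified, user-level illustration of the reachability idea, the sketch below treats a heap block as leaked when no pointer-sized word found in the application's memory refers into it; KAL itself performs this analysis at the kernel level, optionally on a remote host, and the data structures here are assumptions.

```python
# Highly simplified illustration of leak identification by reachability:
# a heap block that no pointer-sized word in the application's memory refers to
# is considered leaked and can be reclaimed. KAL performs this analysis at the
# kernel level (with an optional remote host), not in-process as modeled here.

def find_leaked_blocks(heap_blocks, memory_words):
    """heap_blocks: {start_address: size} of live allocations.
    memory_words: pointer-sized values found in stacks, globals, registers,
    and heap contents (a conservative scan)."""
    starts = sorted(heap_blocks)
    referenced = set()
    for word in memory_words:
        # Conservative test: does this word point into any allocated block?
        for start in starts:
            if start <= word < start + heap_blocks[start]:
                referenced.add(start)
                break
    return [start for start in starts if start not in referenced]

# Example: the block at 0x2000 is never referenced, so it is reported as leaked.
blocks = {0x1000: 64, 0x2000: 128}
words = [0x1000, 0x1010, 0x500]
assert find_leaked_blocks(blocks, words) == [0x2000]
```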
Energy Reduction in Consolidated Servers through Memory-Aware Virtual Machine Scheduling
Increasing energy consumption in server consolidation environments leads to high maintenance costs for data centers. Main memory, no less than the processor, is a major energy consumer in this environment. This paper proposes a technique for reducing memory energy consumption through virtual machine scheduling in multicore systems. We devise several heuristic scheduling algorithms using a memory power simulator that we designed and implemented. We also implement the biggest cover set first (BCSF) scheduling algorithm in a working server system. Through extensive simulation and implementation experiments, we observe the effectiveness of memory-aware virtual machine scheduling in saving memory energy. In addition, we find that power-aware memory management is essential to reducing memory energy consumption.
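The abstract names the BCSF heuristic but does not spell it out, so the sketch below only illustrates the general goal of memory-aware co-scheduling: greedily pick VMs to run together so that their pages are concentrated on as few memory ranks as possible, allowing the idle ranks to enter low-power states. The rank model and selection rule are assumptions, not the paper's algorithm.

```python
# Hedged sketch of memory-aware VM co-scheduling. It greedily selects VMs that
# keep the set of active memory ranks small; ranks left idle can be powered down.
# Rank assignments per VM and the greedy rule are illustrative assumptions.

def pick_coschedule_set(vm_ranks, num_cores):
    """vm_ranks: {vm_name: set of memory ranks the VM's pages occupy}.
    Greedily pick up to num_cores VMs that keep the active-rank set smallest."""
    remaining = dict(vm_ranks)
    chosen, active = [], set()
    while remaining and len(chosen) < num_cores:
        # Choose the VM adding the fewest new ranks, i.e. the one whose memory
        # is best covered by the ranks that must stay active anyway.
        vm = min(remaining, key=lambda v: len(remaining[v] - active))
        chosen.append(vm)
        active |= remaining.pop(vm)
    return chosen, active   # ranks not in `active` can enter low-power mode

vms = {"vm1": {0, 1}, "vm2": {0}, "vm3": {2, 3}, "vm4": {1}}
print(pick_coschedule_set(vms, num_cores=3))   # -> (['vm2', 'vm1', 'vm4'], {0, 1})
```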