570 research outputs found

    POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging

    Fine-tuning models on edge devices like mobile phones would enable privacy-preserving personalization over sensitive data. However, edge training has historically been limited to relatively small models with simple architectures, because training is both memory- and energy-intensive. We present POET, an algorithm that enables training large neural networks on memory-scarce, battery-operated edge devices. POET jointly optimizes the integrated search spaces of rematerialization and paging, two algorithms that reduce the memory consumption of backpropagation. Given a memory budget and a run-time constraint, we formulate a mixed-integer linear program (MILP) for energy-optimal training. Our approach enables training significantly larger models on embedded devices while reducing energy consumption, without modifying the mathematical correctness of backpropagation. We demonstrate that it is possible to fine-tune both ResNet-18 and BERT within the memory constraints of a Cortex-M class embedded device while outperforming current edge training methods in energy efficiency. POET is an open-source project available at https://github.com/ShishirPatil/poet. Comment: Proceedings of the 39th International Conference on Machine Learning 2022 (ICML 2022).
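    Rematerialization, one of the two techniques the POET abstract mentions, can be illustrated with a minimal sketch (the layer functions and checkpoint interval below are illustrative, not from the paper): instead of storing every intermediate activation for the backward pass, only every k-th one is kept, and the others are recomputed from the nearest checkpoint when needed, trading compute for memory.

```python
# Toy sketch of rematerialization (gradient checkpointing).
# Peak activation memory drops from O(n) to roughly O(n/k) saved
# activations, at the cost of replaying up to k-1 layers per recovery.

def forward(x, layers, k):
    """Run layers, keeping only the input and every k-th activation."""
    saved = {0: x}
    for i, f in enumerate(layers):
        x = f(x)
        if (i + 1) % k == 0:
            saved[i + 1] = x
    return x, saved

def recompute(saved, layers, i):
    """Recover activation i by replaying from the nearest saved checkpoint."""
    j = max(s for s in saved if s <= i)
    x = saved[j]
    for f in layers[j:i]:
        x = f(x)
    return x
```

    POET's contribution is choosing, via the MILP, which activations to rematerialize and which to page out to secondary storage so that the combined schedule meets the memory budget at minimal energy.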

    Three main pain points from today's Smartphones

    The massive usage of smartphones in recent years has been a boon for the telecommunications industry. Smartphones have blossomed and are widely used in our daily life, learning, and work. The services, applications, and connectivity that the industry provides are something that consumers talk, blog, and tweet about with passion. The average smartphone consumes only 10% of the data traffic that a laptop does, but smartphones, largely driven by the applications that consumers love, can connect to the network in the background hundreds or even thousands of times daily, pushing signaling traffic higher than network planners ever imagined. Unfortunately, with the widespread introduction of smartphones, mobile network operators are confronted with new challenges: congested network resources, worsening network KPIs, and increasing complaints from end users. What users want: the ingredients for a good quality of experience can be summarized in four words: convenience, immediacy, simplicity, and reliability. Users want a service that is efficient, predictable, and easy to use. Factors contributing to a poor quality of experience include lengthy loading times, no access, crashing, slowing down, poor battery life, or anything potentially complicated or confusing. So the task before us all is to understand how handsets, their batteries, networks, and applications work together, and to determine the proper performance balance for each, so that both end-user experience and network efficiency can be optimized. Keywords: mobile apps, battery, signaling, smartphone

    C-RAM: Breaking Mobile Device Memory Barriers Using the Cloud

    Mobile applications are constrained by the available memory of mobile devices. We present C-RAM, a system that uses cloud-based memory to extend the memory of mobile devices. It splits application state and its associated computation between a mobile device and a cloud node to allow applications to consume more memory, while minimising the performance impact. C-RAM thus enables developers to realise new applications or port legacy desktop applications with a large memory footprint to mobile platforms without explicitly designing them to account for memory limitations. To handle network failures with partitioned application state, C-RAM uses a new snapshot-based fault tolerance mechanism in which changes to remote memory objects are periodically backed up to the device. After failure, or when network usage exceeds a given limit, the device rolls back execution to continue from the last snapshot. C-RAM supports local execution with an application state that exceeds the available device memory through a user-level virtual memory: objects are loaded on demand from snapshots in flash memory. Our C-RAM prototype supports Objective-C applications on the unmodified iOS platform. With C-RAM, applications can consume 10× more memory than the device capacity, with a negligible impact on application performance. In some cases, C-RAM even achieves a significant speed-up in execution time (up to 9.7×).
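    The snapshot-and-rollback idea behind C-RAM's fault tolerance can be sketched in a few lines (class and method names here are illustrative; the real system checkpoints Objective-C object state at a much finer granularity): state is periodically deep-copied to a local backup, and on a network failure execution resumes from that copy.

```python
# Hedged sketch of periodic-snapshot fault tolerance: checkpoint()
# models the periodic backup of remote-object changes to the device,
# rollback() models recovery after a network failure.
import copy

class SnapshotStore:
    def __init__(self, state):
        self.state = state
        self.snapshot = copy.deepcopy(state)

    def checkpoint(self):
        """Periodically back up current state to local storage."""
        self.snapshot = copy.deepcopy(self.state)

    def rollback(self):
        """On failure, discard un-checkpointed changes and resume."""
        self.state = copy.deepcopy(self.snapshot)
```

    The trade-off this models is the one the abstract describes: more frequent checkpoints cost bandwidth and energy, but bound how much work is lost when the device must roll back.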

    Swap System Optimization through Memory Swap Pattern Analysis

    Master's thesis -- Seoul National University Graduate School, Department of Computer Science and Engineering, February 2021. Advisor: Heon Y. Yeom. The use of memory is one of the key parts of modern computer architecture (the von Neumann architecture), but when memory is limited, it can be the most lethal part at the same time. Advances in hardware and software are making rapid strides in areas such as Big Data, HPC, and machine learning, and the use of memory increases along with those advances. In the server environment, various programs share resources, which leads to a shortage of resources. Memory is one of those resources and needs to be managed. When the system is out of memory, the operating system evicts some pages to storage and then loads the requested pages into memory. Given that storage performance is slower than memory, swap-induced delay is one of the critical issues behind overall performance degradation. Therefore, we designed and implemented swpTracer to visualize and trace swap-in/out movement. To check the generality of the tool, we used mlock to optimize 429.mcf of SPEC CPU 2006 based on hints from swpTracer. The optimized program executes 2 to 3 times faster than the original in a memory-scarce environment. The performance improvement obtained with existing system calls shrinks as the memory shortfall grows. To sustain the improvement, we built a swap-prefetch mechanism to read swapped-out pages ahead of time. The application optimized with swpTracer and swap-prefetch consistently exceeds the performance of the original code by 1.5x. The Korean abstract adds that swap-prefetch was also evaluated on a program traversing an array three times while varying the array size: it averaged a 1.5x improvement over the original code and over madvise, and was on average 1.25x faster than mlock.
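    The kind of swap-in/out trace the thesis tool visualizes can be mimicked with a toy simulator (all names and sizes below are illustrative, not from the thesis): pages are managed with LRU eviction under a fixed frame budget, and pinning a hot page, as an mlock-based hint would, removes its re-fetches from the trace.

```python
# Illustrative sketch, not swpTracer itself: count swap-ins for an
# access sequence under LRU eviction; `pinned` models mlock'd pages
# that may never be chosen as eviction victims.
from collections import OrderedDict

def simulate(accesses, frames, pinned=frozenset()):
    mem, swap_ins = OrderedDict(), 0
    for page in accesses:
        if page in mem:
            mem.move_to_end(page)        # refresh LRU position
            continue
        swap_ins += 1                    # page fault: fetch from swap
        if len(mem) >= frames:
            for victim in mem:           # evict LRU, skipping pinned
                if victim not in pinned:
                    del mem[victim]
                    break
        mem[page] = True
    return swap_ins
```

    Running the same access pattern with and without a pinned page shows the effect the thesis measures: pinning the page that thrashes most cuts the swap-in count, which is exactly the hint swpTracer is designed to surface.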

    Scrolling vs Paging: Reading Performance and Preference of Reading Modes in Long-form Online News

    This study explores the impact of scrolling and dynamic pagination in long-form online documents on reader performance and reader experience. Previous research has produced mixed results, indicating either no difference between modes or a positive effect favouring scrolling. Recent advances in web standards have enabled simpler, dynamic, performant methods of pagination that tailor content responsively to any screen, meriting renewed study in this area. This paper uses one such method to load subsequent online news pages instantly, without buffering. In an online browser experiment with 38 participants, a statistically significant increase in reading speed was found in the scrolling mode. This follows previous research suggesting that while a scrolling presentation style places extra demands on working memory capacity (WMC), many current web users have developed compensatory strategies and cognitive flexibility for navigating scrolling web documents.

    VIRTUAL MEMORY ON A MANY-CORE NOC

    Many-core devices are likely to become increasingly common in real-time and embedded systems as computational demands grow and as expectations for higher performance can generally only be met by increasing core counts rather than relying on higher clock speeds. Network-on-chip devices, where multiple cores share a single slice of silicon and employ packetised communications, are a widely-deployed many-core option for system designers. As NoCs are expected to run larger and more complex programs, the small amount of fast, on-chip memory available to each core is unlikely to be sufficient for all but the simplest of tasks, and it is necessary to find an efficient, effective, and time-bounded means of accessing resources stored in off-chip memory, such as DRAM or Flash storage. The abstraction of paged virtual memory is a familiar technique for managing similar tasks in general computing but has often been shunned by real-time developers because of concern about time predictability. We show it can be a poor choice for a many-core NoC system as, unmodified, it typically uses page sizes optimised for interaction with spinning disks rather than solid-state media, and transports significant volumes of subsequently unused data across already congested links. In this work we outline and simulate an efficient partial paging algorithm where only those memory resources that are locally accessed are transported between global and local storage. We further show that smaller page sizes add to efficiency. We examine the factors that lead to timing delays in such systems, and show we can predict worst-case execution times, even at safety-critical thresholds, by using statistical methods from extreme value theory. We also show these results are applicable to systems with a variety of connections to memory.
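    The partial-paging argument can be made concrete with a small sketch (function name, page size, and block size are illustrative, not from the paper): fetching a whole page moves every byte of it over the congested NoC links, while fetching only the sub-blocks that are actually accessed moves far less, and smaller transfer granules amplify the saving.

```python
# Toy model of NoC traffic per page fault: whole-page transfer vs.
# partial paging at sub-block granularity.

def bytes_transferred(accessed_offsets, page_size, block_size=None):
    """Bytes moved to service the given byte offsets within one page.

    block_size=None models conventional whole-page transfer; otherwise
    only the distinct block_size-sized sub-blocks touched are fetched.
    """
    if block_size is None:
        return page_size
    blocks = {off // block_size for off in accessed_offsets}
    return len(blocks) * block_size
```

    For a sparse access pattern touching two cache lines of a 4 KiB page, whole-page transfer moves 4096 bytes where 64-byte partial paging moves only 128, which is the kind of link-traffic reduction the simulated algorithm targets.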