105 research outputs found

    Dynamic Binary Translation for Embedded Systems with Scratchpad Memory

    Get PDF
    Embedded software development has recently changed with advances in computing. Rather than fully co-designing software and hardware to perform a relatively simple task, nowadays embedded and mobile devices are designed as a platform where multiple applications can be run, new applications can be added, and existing applications can be updated. In this scenario, traditional constraints in embedded systems design (i.e., performance, memory and energy consumption, and real-time guarantees) are more difficult to address. New concerns (e.g., security) have become important and increase software complexity as well. In general-purpose systems, Dynamic Binary Translation (DBT) has been used to address these issues with services such as Just-In-Time (JIT) compilation, dynamic optimization, virtualization, power management and code security. In embedded systems, however, DBT is not usually employed due to performance, memory and power overhead. This dissertation presents StrataX, a low-overhead DBT framework for embedded systems. StrataX addresses the challenges faced by DBT in embedded systems using novel techniques. To reduce DBT overhead, StrataX loads code from NAND-Flash storage and translates it into a Scratchpad Memory (SPM), a software-managed on-chip SRAM with limited capacity. SPM has access latency similar to that of a hardware cache, but consumes less power and chip area. StrataX manages SPM as a software instruction cache, and employs victim compression and pinning to reduce retranslation cost and capture frequently executed code in the SPM. To prevent performance loss due to excessive code expansion, StrataX minimizes the amount of code inserted by DBT to maintain control of program execution. When a hardware instruction cache is available, StrataX dynamically partitions translated code between the SPM and main memory. With these techniques, StrataX has low performance overhead relative to native execution for MiBench programs.
Further, it simplifies embedded software and hardware design by operating transparently to applications without any special hardware support. StrataX achieves sufficiently low overhead to make it feasible to use DBT in embedded systems to address important design goals and requirements.
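    The SPM management described above can be illustrated with a small sketch. This is a toy model, not StrataX's implementation: the class name, eviction order, and the use of zlib as the victim compressor are assumptions made for illustration; it only shows the idea of pinning hot fragments and compressing evicted ones so a later miss decompresses instead of retranslating.

```python
import zlib

class SpmCache:
    """Toy model of an SPM managed as a software instruction cache.

    Names, sizes, and the zlib-based victim store are illustrative;
    StrataX's actual policies are more involved.
    """

    def __init__(self, capacity):
        self.capacity = capacity   # SPM bytes available for code
        self.used = 0
        self.fragments = {}        # addr -> translated code bytes
        self.pinned = set()        # hot fragments never evicted
        self.victims = {}          # addr -> compressed evicted code

    def insert(self, addr, code, pin=False):
        while self.used + len(code) > self.capacity:
            self._evict_one()
        self.fragments[addr] = code
        self.used += len(code)
        if pin:
            self.pinned.add(addr)

    def _evict_one(self):
        # Evict any unpinned fragment, compressing it instead of
        # discarding it, so a later miss decompresses rather than
        # retranslating from NAND flash.
        addr = next(a for a in self.fragments if a not in self.pinned)
        code = self.fragments.pop(addr)
        self.used -= len(code)
        self.victims[addr] = zlib.compress(code)

    def lookup(self, addr):
        if addr in self.fragments:
            return self.fragments[addr]          # SPM hit
        if addr in self.victims:                 # victim-store hit
            code = zlib.decompress(self.victims.pop(addr))
            self.insert(addr, code)
            return code
        return None                              # must retranslate
```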

    Executing Hard Real-Time Programs on NAND Flash Memory Considering Read Disturb Errors

    Get PDF
    Thesis (Master's) -- Seoul National University, Graduate School, College of Engineering, Department of Computer Science and Engineering, August 2017. Advisor: Chang-Gun Lee. With the recent surge of interest in IoT and embedded systems, devices that use NAND flash memory are also increasing in number. These devices benefit greatly from NAND flash memory, but unresolved reliability issues remain. This thesis discusses how to overcome one such reliability problem. NAND flash memory physically limits how many times each page can be read, so a page must be reallocated before its read count reaches this limit. We propose a technique that reduces read disturb errors by reallocating pages while still satisfying the real-time constraints of real-time embedded systems that execute code by reading the read-only pages where program code is stored. By implementing the proposed technique and experimenting with it, we show that reallocation is guaranteed before the NAND flash memory's read limit is reached. We also confirm that the required RAM size decreases by up to 48% when the proposed technique is used.
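    The core guarantee of the abstract above — reallocation before the read count reaches its limit — can be sketched as a per-page counter with a safety margin. The class name, the limit value, and the 90% margin are assumptions for illustration; the thesis derives its bounds from real-time analysis and convex optimization rather than a fixed threshold.

```python
class ReadDisturbManager:
    """Toy model of read-count tracking with proactive reallocation.

    The read limit and margin are illustrative; real NAND parts
    specify a vendor read-disturb limit per block.
    """

    def __init__(self, read_limit=100_000, margin=0.9):
        self.read_limit = read_limit
        self.threshold = int(read_limit * margin)  # reallocate early
        self.reads = {}          # physical page -> read count
        self.mapping = {}        # logical page -> physical page
        self.next_free = 0
        self.reallocations = 0

    def _alloc(self):
        p = self.next_free
        self.next_free += 1
        self.reads[p] = 0
        return p

    def map_page(self, logical):
        self.mapping[logical] = self._alloc()

    def read(self, logical):
        phys = self.mapping[logical]
        self.reads[phys] += 1
        if self.reads[phys] >= self.threshold:
            # Copy the page to a fresh physical location before the
            # read-disturb limit is reached; the old page can then
            # be erased and reused.
            self.mapping[logical] = self._alloc()
            self.reallocations += 1
        return phys
```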

    Elevating commodity storage with the SALSA host translation layer

    Full text link
    To satisfy increasing storage demands in both capacity and performance, industry has turned to multiple storage technologies, including Flash SSDs and SMR disks. These devices employ a translation layer that conceals the idiosyncrasies of their media and enables random access. Device translation layers are, however, inherently constrained: resources on the drive are scarce, they cannot be adapted to application requirements, and they lack visibility across multiple devices. As a result, the performance and durability of many storage devices are severely degraded. In this paper, we present SALSA: a translation layer that executes on the host and allows unmodified applications to better utilize commodity storage. SALSA supports a wide range of single- and multi-device optimizations and, because it is implemented in software, can adapt to specific workloads. We describe SALSA's design, and demonstrate its significant benefits using microbenchmarks and case studies based on three applications: MySQL, the Swift object store, and a video server. Comment: Presented at the 2018 IEEE 26th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS).
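    The central idea of a host translation layer — a host-resident mapping that redirects random logical writes to a sequentially advancing frontier, which suits both flash and SMR media — can be sketched minimally. This is an assumption-laden sketch, not SALSA's design: the class and method names are invented, and garbage collection of stale blocks is omitted entirely.

```python
class HostTranslationLayer:
    """Toy host-side translation layer in the spirit of SALSA.

    Random logical writes go to a sequential write frontier; the
    logical-to-physical table lives on the host, where memory is
    plentiful. The absence of garbage collection is a simplification.
    """

    def __init__(self, device_blocks):
        self.device_blocks = device_blocks
        self.l2p = {}        # logical block -> physical block
        self.frontier = 0    # next sequential physical block

    def write(self, lba):
        if self.frontier >= self.device_blocks:
            raise RuntimeError("device full (no GC in this sketch)")
        self.l2p[lba] = self.frontier  # old mapping becomes stale
        self.frontier += 1
        return self.l2p[lba]

    def read(self, lba):
        return self.l2p.get(lba)
```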

    Multi-level Hybrid Cache: Impact and Feasibility

    Full text link

    Synergistically Coupling Of Solid State Drives And Hard Disks For Qos-Aware Virtual Memory

    Get PDF
    With significant advantages in capacity, power consumption, and price, solid state disk (SSD) has good potential to be employed as an extension of dynamic random-access memory, such that applications with large working sets could run efficiently on a modestly configured system. While initial results reported in recent works show promising prospects for this use of SSD by incorporating it into the management of virtual memory, frequent writes from write-intensive programs could quickly wear out SSD, making the idea less practical. This thesis makes four contributions towards solving this issue. First, we propose a scheme, HybridSwap, that integrates a hard disk with an SSD for virtual memory management, synergistically achieving the advantages of both. In addition, HybridSwap can constrain performance loss caused by swapping according to user-specified QoS requirements. Second, we develop an efficient algorithm to record memory access history and to identify page access sequences and evaluate their locality. Using a history of page access patterns, HybridSwap dynamically creates an out-of-memory virtual memory page layout on the swap space spanning the SSD and hard disk such that random reads are served by SSD and sequential reads are asynchronously served by the hard disk with high efficiency. Third, we build a QoS-assurance mechanism into HybridSwap to demonstrate the flexibility of the system in bounding the performance penalty due to swapping. It allows users to specify a bound on the program stall time due to page faults as a percentage of the program's total run time. Fourth, we have implemented HybridSwap in a recent Linux kernel, version 2.6.35.7.
Our evaluation with representative benchmarks, such as Memcached for key-value store, and scientific programs from the ALGLIB cross-platform numerical analysis and data processing library, shows that the number of writes to SSD can be reduced by 40% with the system's performance comparable to that with pure SSD swapping, and can satisfy a swapping-related QoS requirement as long as the I/O resource is sufficient.
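    The routing policy described above — random reads to SSD, sequential runs to the hard disk — can be sketched with a simple run detector. This is a simplified stand-in for HybridSwap's sequence detection: the function name, the run-length threshold, and the page-number heuristic are assumptions for illustration, not the thesis's actual locality analysis.

```python
def route_swap_reads(fault_sequence, run_threshold=4):
    """Classify each faulted page as SSD (random) or HDD (sequential).

    Runs of at least `run_threshold` consecutive page numbers are
    assumed sequential and worth serving from the hard disk;
    everything else goes to the SSD. Thresholds are illustrative.
    """
    routes = {}

    def flush(run):
        device = "hdd" if len(run) >= run_threshold else "ssd"
        for page in run:
            routes[page] = device

    run = [fault_sequence[0]]
    for page in fault_sequence[1:]:
        if page == run[-1] + 1:
            run.append(page)      # extends the sequential run
        else:
            flush(run)            # run ended: classify it
            run = [page]
    flush(run)
    return routes
```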

    Optimizing the flash-RAM energy trade-off in deeply embedded systems

    Full text link
    Deeply embedded systems often have the tightest constraints on energy consumption, requiring that they consume tiny amounts of current and run on batteries for years. However, they typically execute code directly from flash, instead of the more energy-efficient RAM. We implement a novel compiler optimization that exploits the relative efficiency of RAM by statically moving carefully selected basic blocks from flash to RAM. Our technique uses integer linear programming, with an energy cost model to select a good set of basic blocks to place into RAM, without impacting stack or data storage. We evaluate our optimization on a common ARM microcontroller and succeed in reducing the average power consumption by up to 41% and reducing energy consumption by up to 22%, while increasing execution time. A case study is presented, where an application executes code and then sleeps for a period of time. For this example we show that our optimization could allow the application to run on battery for up to 32% longer. We also show that for this scenario the total application energy can be reduced, even if the optimization increases the execution time of the code.
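    The selection problem described above — choosing which basic blocks to place in RAM under a size budget — has the shape of a 0/1 knapsack. The paper formulates it as an integer linear program over a measured energy cost model; the sketch below substitutes a plain dynamic-programming knapsack, and the block names, sizes, and savings are made-up illustrative values.

```python
def select_blocks_for_ram(blocks, ram_budget):
    """Pick basic blocks to move from flash to RAM.

    `blocks` is a list of (name, size_bytes, energy_saving) tuples
    in made-up units. The core selection is a 0/1 knapsack, solved
    here by dynamic programming rather than an ILP solver.
    """
    # best[b] = (total saving, chosen names) within budget b
    best = [(0, [])] * (ram_budget + 1)
    for name, size, saving in blocks:
        # Iterate budgets downward so each block is used at most once.
        for b in range(ram_budget, size - 1, -1):
            cand = best[b - size][0] + saving
            if cand > best[b][0]:
                best[b] = (cand, best[b - size][1] + [name])
    return best[ram_budget]
```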

    VIRTUAL MEMORY ON A MANY-CORE NOC

    Get PDF
    Many-core devices are likely to become increasingly common in real-time and embedded systems as computational demands grow and as expectations for higher performance can generally only be met by increasing core numbers rather than relying on higher clock speeds. Network-on-chip devices, where multiple cores share a single slice of silicon and employ packetised communications, are a widely-deployed many-core option for system designers. As NoCs are expected to run larger and more complex programs, the small amount of fast, on-chip memory available to each core is unlikely to be sufficient for all but the simplest of tasks, and it is necessary to find an efficient, effective, and time-bounded means of accessing resources stored in off-chip memory, such as DRAM or Flash storage. The abstraction of paged virtual memory is a familiar technique to manage similar tasks in general computing but has often been shunned by real-time developers because of concern about time predictability. We show it can be a poor choice for a many-core NoC system as, unmodified, it typically uses page sizes optimised for interaction with spinning disks and not solid state media, and transports significant volumes of subsequently unused data across already congested links. In this work we outline and simulate an efficient partial paging algorithm where only those memory resources that are locally accessed are transported between global and local storage. We further show that smaller page sizes add to efficiency. We examine the factors that lead to timing delays in such systems, and show we can predict worst case execution times at even safety-critical thresholds by using statistical methods from extreme value theory. We also show these results are applicable to systems with a variety of connections to memory.
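    The traffic argument for partial paging above can be made concrete with a small counting model: fetch only the touched sub-block rather than the whole page on first access, and compare bytes moved across the NoC. The function name, page size, and sub-block size are assumptions for illustration, not the parameters used in the work.

```python
def transported_bytes(accesses, page_size=4096, sub_block=512,
                      partial=True):
    """Count bytes moved between global and local memory for a trace.

    A toy comparison of conventional paging (whole page on first
    touch) against partial paging (only the touched sub-block).
    Sizes are illustrative.
    """
    fetched = set()   # units already resident in local storage
    total = 0
    for addr in accesses:
        if partial:
            unit, size = addr // sub_block, sub_block
        else:
            unit, size = addr // page_size, page_size
        if unit not in fetched:   # miss: transport one unit
            fetched.add(unit)
            total += size
    return total
```

For a scattered trace, the partial scheme moves far fewer bytes, which is the intuition behind the claim that smaller transfer units reduce congestion on already-loaded links.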