Dynamic Binary Translation for Embedded Systems with Scratchpad Memory
Embedded software development has recently changed with advances in computing. Rather than fully co-designing software and hardware to perform a relatively simple task, nowadays embedded and mobile devices are designed as platforms where multiple applications can be run, new applications can be added, and existing applications can be updated. In this scenario, traditional constraints in embedded systems design (e.g., performance, memory and energy consumption, and real-time guarantees) are more difficult to address. New concerns (e.g., security) have become important and increase software complexity as well.
In general-purpose systems, Dynamic Binary Translation (DBT) has been used to address these issues with services such as Just-In-Time (JIT) compilation, dynamic optimization, virtualization, power management and code security. In embedded systems, however, DBT is not usually employed due to performance, memory and power overhead.
This dissertation presents StrataX, a low-overhead DBT framework for embedded systems. StrataX addresses the challenges faced by DBT in embedded systems using novel techniques. To reduce DBT overhead, StrataX loads code from NAND-flash storage and translates it into a Scratchpad Memory (SPM), a software-managed on-chip SRAM with limited capacity. SPM has access latency similar to a hardware cache, but consumes less power and chip area.
StrataX manages SPM as a software instruction cache, and employs victim compression and pinning to reduce retranslation cost and capture frequently executed code in the SPM. To prevent performance loss due to excessive code expansion, StrataX minimizes the amount of code inserted by DBT to maintain control of program execution. When a hardware instruction cache is available, StrataX dynamically partitions translated code between the SPM and main memory. With these techniques, StrataX has low performance overhead relative to native execution for MiBench programs. Further, it simplifies embedded software and hardware design by operating transparently to applications without any special hardware support. StrataX achieves sufficiently low overhead to make it feasible to use DBT in embedded systems to address important design goals and requirements.
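The SPM-as-software-instruction-cache idea with pinning and victim compression can be sketched as a toy model (names and sizes are invented, zlib stands in for whatever compressor a real system would use, and this is not the actual StrataX implementation):

```python
import zlib
from collections import OrderedDict

class SpmCodeCache:
    """Toy software instruction cache over a fixed-size SPM.

    Evicted translations are compressed into a main-memory victim
    buffer so they can be reinstalled without a full retranslation;
    pinned fragments are never evicted. Illustrative only.
    """

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.spm = OrderedDict()   # addr -> translated code, in LRU order
        self.pinned = set()
        self.victims = {}          # addr -> compressed evicted translation
        self.retranslations = 0

    def translate(self, addr):
        # Stand-in for the real binary translator.
        self.retranslations += 1
        return b"translated:%d" % addr * 4

    def fetch(self, addr):
        if addr in self.spm:
            self.spm.move_to_end(addr)         # SPM hit
            return self.spm[addr]
        if addr in self.victims:               # decompress, no retranslation
            code = zlib.decompress(self.victims.pop(addr))
        else:
            code = self.translate(addr)
        self._install(addr, code)
        return code

    def _install(self, addr, code):
        # Evict LRU, non-pinned fragments until the new code fits.
        while self.used + len(code) > self.capacity:
            victim, vcode = next(
                (a, c) for a, c in self.spm.items() if a not in self.pinned)
            del self.spm[victim]
            self.used -= len(vcode)
            self.victims[victim] = zlib.compress(vcode)
        self.spm[addr] = code
        self.used += len(code)

    def pin(self, addr):
        self.pinned.add(addr)
```

Refetching an evicted fragment then costs only a decompression, which models how victim compression reduces retranslation cost.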
Executing Hard Real-Time Programs on NAND Flash Memory Considering Read Disturb Errors
Thesis (Master's) -- Department of Computer Science and Engineering, College of Engineering, Seoul National University Graduate School, 2017. 8. Advisor: Chang-Gun Lee.
With the recent surge of interest in IoT and embedded systems, the number of devices that use NAND flash memory is also growing. These devices gain substantial benefits from using NAND flash memory, but unresolved reliability issues remain. This thesis discusses ways to overcome these reliability problems. Because NAND flash memory can physically sustain only a limited number of repeated read operations per page, each page must be reallocated before its read count reaches that limit. This thesis proposes a technique for real-time embedded systems that execute code by reading the read-only pages in which program code is stored: it reduces read disturb errors by reallocating pages while still satisfying real-time constraints. By implementing and evaluating the proposed technique, we show that reallocation is guaranteed to occur before the read limit of the NAND flash memory is reached. We also confirm that the proposed technique reduces the required RAM size by up to 48%.
1 Introduction
2 Related works
3 Background and problem description
3.1 NAND flash memory
3.2 HRT-PLRU
3.3 Reliability issues of the NAND flash memory
3.4 Problem description
3.5 System notation
4 Approach
4.1 Per-task analysis
4.2 Convex optimization
5 Evaluation
6 Future works
7 Conclusion
Summary (Korean)
References
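The read-count-driven reallocation the abstract describes can be illustrated with a toy model (the constants, names, and guard margin are invented, and a real implementation must also schedule the page copy so that real-time deadlines still hold):

```python
READ_DISTURB_LIMIT = 100_000   # illustrative per-page read budget
GUARD = 1_000                  # reallocate this many reads before the limit

class FlashCodeStore:
    """Toy model of read-only code pages on NAND flash.

    Each read increments a per-physical-page counter; when a page
    approaches the read-disturb limit, its contents are copied to a
    fresh physical page and the counter starts over.
    """

    def __init__(self, code_pages):
        self.phys = list(code_pages)                         # physical pages
        self.vmap = {i: i for i in range(len(code_pages))}   # virtual -> physical
        self.reads = {i: 0 for i in range(len(code_pages))}  # per physical page
        self.reallocations = 0

    def read(self, vpage):
        p = self.vmap[vpage]
        self.reads[p] += 1
        if self.reads[p] >= READ_DISTURB_LIMIT - GUARD:
            self._reallocate(vpage)
        return self.phys[self.vmap[vpage]]

    def _reallocate(self, vpage):
        # Copy the page to a fresh physical location before the read
        # count can ever reach the disturb limit.
        old = self.vmap[vpage]
        self.phys.append(self.phys[old])
        new = len(self.phys) - 1
        self.reads[new] = 0
        self.vmap[vpage] = new
        self.reallocations += 1
```

The guard margin is what gives a scheduler slack to perform the copy off the critical path; choosing it, and bounding its interference with deadlines, is the hard part the thesis addresses.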
Elevating commodity storage with the SALSA host translation layer
To satisfy increasing storage demands in both capacity and performance,
industry has turned to multiple storage technologies, including Flash SSDs and
SMR disks. These devices employ a translation layer that conceals the
idiosyncrasies of their mediums and enables random access. Device translation
layers are, however, inherently constrained: resources on the drive are scarce,
they cannot be adapted to application requirements, and lack visibility across
multiple devices. As a result, the performance and durability of many storage
devices are severely degraded.
In this paper, we present SALSA: a translation layer that executes on the
host and allows unmodified applications to better utilize commodity storage.
SALSA supports a wide range of single- and multi-device optimizations and,
because it is implemented in software, can adapt to specific workloads. We
describe SALSA's design, and demonstrate its significant benefits using
microbenchmarks and case studies based on three applications: MySQL, the Swift
object store, and a video server.
Comment: Presented at the 2018 IEEE 26th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS).
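A minimal log-structured translation map illustrates the general idea of a host translation layer (this is a generic sketch, not SALSA's actual design): random logical writes become sequential appends on the medium, with the logical-to-physical map kept in host memory where it can be resized and tuned per workload.

```python
class HostTranslationLayer:
    """Minimal log-structured translation layer sketch.

    Writes are always appended (sequential I/O, friendly to flash
    and SMR media); a host-resident map redirects reads to the
    latest copy of each logical block.
    """

    def __init__(self):
        self.log = []          # append-only model of the physical medium
        self.l2p = {}          # logical block -> physical log index

    def write(self, lba, data):
        self.l2p[lba] = len(self.log)   # remap, then append sequentially
        self.log.append(data)

    def read(self, lba):
        return self.log[self.l2p[lba]]
```

Overwrites leave stale copies behind in the log, so a real host translation layer also needs garbage collection to reclaim them.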
Recommended from our members
Multi-level Hybrid Cache: Impact and Feasibility
Storage class memories, including flash, have been attracting much attention as promising candidates for today's enterprise storage systems. In particular, since the cost and performance characteristics of flash fall between those of DRAM and hard disks, many studies have considered it as a secondary caching layer underneath the main memory cache. However, there has been a lack of study of the correlation and interdependency between DRAM and flash caching. This paper views this problem as a special form of multi-level caching, and tries to understand the benefits of this multi-level hybrid cache hierarchy. We reveal that significant costs could be saved by using flash to reduce the size of the DRAM cache while maintaining the same performance. We also discuss design challenges of using flash in the caching hierarchy and present potential solutions.
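One simple form of the DRAM-plus-flash hierarchy the paper studies can be sketched as follows (the policies here are illustrative only: LRU in both tiers, with DRAM victims demoted to a larger flash tier instead of being discarded, and flash hits promoted back to DRAM):

```python
from collections import OrderedDict

class HybridCache:
    """Toy two-level cache: a small DRAM tier over a larger flash tier."""

    def __init__(self, dram_slots, flash_slots):
        self.dram = OrderedDict()
        self.flash = OrderedDict()
        self.dram_slots, self.flash_slots = dram_slots, flash_slots
        self.hits = {"dram": 0, "flash": 0, "miss": 0}

    def get(self, key, load):
        if key in self.dram:
            self.dram.move_to_end(key)
            self.hits["dram"] += 1
            return self.dram[key]
        if key in self.flash:
            self.hits["flash"] += 1          # served from flash, promote
            value = self.flash.pop(key)
        else:
            self.hits["miss"] += 1           # go to disk
            value = load(key)
        self._insert_dram(key, value)
        return value

    def _insert_dram(self, key, value):
        if len(self.dram) >= self.dram_slots:
            vk, vv = self.dram.popitem(last=False)   # demote the LRU victim
            if len(self.flash) >= self.flash_slots:
                self.flash.popitem(last=False)
            self.flash[vk] = vv
        self.dram[key] = value
```

Counting hits per tier is exactly the kind of measurement needed to quantify how much DRAM can be replaced by flash at equal performance.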
Synergistically Coupling Of Solid State Drives And Hard Disks For Qos-Aware Virtual Memory
With significant advantages in capacity, power consumption, and price, solid state disk (SSD) has good potential to be employed as an extension of dynamic random-access memory, such that applications with large working sets could run efficiently on a modestly configured system. While initial results reported in recent works show promising prospects for this use of SSD by incorporating it into the management of virtual memory, frequent writes from write-intensive programs could quickly wear out SSD, making the idea less practical.
This thesis makes four contributions towards solving this issue. First, we propose a scheme, HybridSwap, that integrates a hard disk with an SSD for virtual memory management, synergistically achieving the advantages of both. In addition, HybridSwap can constrain the performance loss caused by swapping according to user-specified QoS requirements.
Second, we develop an efficient algorithm to record memory access history, identify page access sequences, and evaluate their locality. Using a history of page access patterns, HybridSwap dynamically creates an out-of-memory virtual memory page layout on the swap space spanning the SSD and hard disk, such that random reads are served by the SSD and sequential reads are asynchronously served by the hard disk with high efficiency.
Third, we build a QoS-assurance mechanism into HybridSwap to demonstrate the flexibility of the system in bounding the performance penalty due to swapping. It allows users to specify a bound on the program stall time due to page faults as a percentage of the program's total run time.
Fourth, we have implemented HybridSwap in a recent Linux kernel, version 2.6.35.7. Our evaluation with representative benchmarks, such as Memcached for key-value store, and scientific programs from the ALGLIB cross-platform numerical analysis and data processing library, shows that the number of writes to the SSD can be reduced by 40% with performance comparable to that of pure SSD swapping, and that a swapping-related QoS requirement can be satisfied as long as the I/O resource is sufficient.
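The random-versus-sequential layout decision described above can be sketched as a simple heuristic (the `min_run` threshold is an assumed tuning knob and the function expects a non-empty fault trace; HybridSwap's actual locality analysis is more sophisticated):

```python
def classify_swap_layout(fault_addrs, min_run=4):
    """Partition faulted page numbers between SSD and HDD swap areas.

    Pages faulted in long sequential runs go to the hard disk (fast,
    prefetchable sequential reads); the rest go to the SSD (fast
    random reads). Toy heuristic, for illustration only.
    """
    ssd, hdd = set(), set()
    run = [fault_addrs[0]]
    for addr in fault_addrs[1:] + [None]:   # None flushes the last run
        if addr is not None and addr == run[-1] + 1:
            run.append(addr)
        else:
            (hdd if len(run) >= min_run else ssd).update(run)
            run = [addr]
    return ssd, hdd
```

Routing runs this way is also what lets writes of sequential regions land on the hard disk, reducing SSD wear.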
Optimizing the flash-RAM energy trade-off in deeply embedded systems
Deeply embedded systems often have the tightest constraints on energy
consumption, requiring that they consume tiny amounts of current and run on
batteries for years. However, they typically execute code directly from flash,
instead of from the more energy-efficient RAM. We implement a novel compiler
optimization that exploits the relative efficiency of RAM by statically moving
carefully selected basic blocks from flash to RAM. Our technique uses integer
linear programming, with an energy cost model to select a good set of basic
blocks to place into RAM, without impacting stack or data storage.
We evaluate our optimization on a common ARM microcontroller and succeed in
reducing the average power consumption by up to 41% and reducing energy
consumption by up to 22%, while increasing execution time. A case study is
presented, where an application executes code then sleeps for a period of time.
For this example we show that our optimization could allow the application to
run on battery for up to 32% longer. We also show that for this scenario the
total application energy can be reduced, even if the optimization increases the
execution time of the code.
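A much-simplified stand-in for the block-placement optimization can be written as a 0/1 knapsack solved by dynamic programming (the paper's integer linear program with its energy cost model is richer, and the sizes and savings below are made up):

```python
def place_blocks(blocks, ram_budget):
    """Choose basic blocks to copy from flash to RAM.

    Maximizes modeled energy saving subject to a RAM byte budget.
    blocks: list of (name, size_bytes, energy_saving) tuples.
    Returns (total_saving, frozenset_of_chosen_names).
    """
    best = {0: (0, frozenset())}   # RAM bytes used -> (saving, chosen blocks)
    for name, size, saving in blocks:
        # Snapshot items so each block is considered at most once.
        for used, (sav, chosen) in list(best.items()):
            nu = used + size
            if nu <= ram_budget and (nu not in best
                                     or best[nu][0] < sav + saving):
                best[nu] = (sav + saving, chosen | {name})
    return max(best.values(), key=lambda t: t[0])
```

A real version would derive each block's `energy_saving` from execution counts and per-instruction flash-versus-RAM energy costs, and account for interactions between adjacent blocks that plain knapsack ignores.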
VIRTUAL MEMORY ON A MANY-CORE NOC
Many-core devices are likely to become increasingly common in real-time and embedded systems as computational demands grow and as expectations for higher performance can generally only be met by increasing core numbers rather than relying on higher clock speeds.
Network-on-chip devices, where multiple cores share a single slice of silicon and employ packetised communications, are a widely-deployed many-core option for system designers. As NoCs are expected to run larger and more complex programs, the small amount of fast, on-chip memory available to each core is unlikely to be sufficient for all but the simplest of tasks, and it is necessary to find an efficient, effective, and time-bounded, means of accessing resources stored in off-chip memory, such as DRAM or Flash storage.
The abstraction of paged virtual memory is a familiar technique to manage similar tasks in general computing but has often been shunned by real-time developers because of concern about time predictability. We show it can be a poor choice for a many-core NoC system as, unmodified, it typically uses page sizes optimised for interaction with spinning disks and not solid state media, and transports significant volumes of subsequently unused data across already congested links.
In this work we outline and simulate an efficient partial paging algorithm where only those memory resources that are locally accessed are transported between global and local storage. We further show that smaller page sizes add to efficiency. We examine the factors that lead to timing delays in such systems, and show we can predict worst-case execution times at even safety-critical thresholds by using statistical methods from extreme value theory. We also show these results are applicable to systems with a variety of connections to memory.
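The partial-paging idea can be sketched as follows (the block size and bookkeeping are illustrative; the simulated systems in the work are far more detailed): on a local-memory miss, only the sub-block of the page that was actually touched crosses the network, not the whole page.

```python
PAGE_SIZE = 4096
BLOCK_SIZE = 512   # illustrative sub-page transfer unit

class PartialPager:
    """Toy model of partial paging on a NoC core's local memory.

    Tracks which (page, sub-block) pairs are resident locally and
    counts bytes moved over the interconnect, so the saving over
    whole-page transfers can be compared.
    """

    def __init__(self):
        self.local = set()        # (page, block) pairs resident locally
        self.bytes_moved = 0

    def access(self, addr):
        blk = (addr // PAGE_SIZE, (addr % PAGE_SIZE) // BLOCK_SIZE)
        if blk not in self.local:
            self.local.add(blk)               # fetch only this sub-block
            self.bytes_moved += BLOCK_SIZE
```

For an access pattern that touches a few scattered words, this moves a small multiple of BLOCK_SIZE instead of whole 4 KiB pages, which is exactly the traffic reduction the work measures on congested NoC links.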