80 research outputs found
Parallel Modelling Paradigm in Multimedia Applications: Mapping and Scheduling onto a Multi-Processor System-on-Chip Platform
Multi-processor systems have emerged as a promising alternative to the difficulty of building ever-faster uni-processor systems with the latest technologies. Emerging design paradigms such as Multiprocessor System-on-a-Chip (MpSoC) offer high levels of performance and flexibility and at the same time promise low-cost, reliable and power-efficient implementations. However, the design complexity of such systems has increased tremendously. One source of the complexity is the highly parallel, heterogeneous nature of the underlying hardware architecture, which poses many challenges for mapping an application to the architecture. This motivates the development of a unified programming paradigm that facilitates the mapping by hiding the architectural complexity and exposing the parallel resources of the architecture. To enable design reuse, such a programming paradigm has to support a smooth translation of sequentially-coded software algorithms into their parallel implementations. In this paper we address the parallelization of sequential multimedia applications written in C/C++ for their mapping and scheduling onto a flexible MpSoC platform. We show that using our approach an architecture-independent multi-threaded model of an MPEG-2 video decoder algorithm can be obtained with only a few modifications to an existing sequential implementation of the algorithm.
Power-gated MOS current mode logic (PG-MCML): a power aware DPA-resistant standard cell library
MOS Current Mode Logic (MCML) is one of the most promising logic styles to counteract power analysis attacks. Unfortunately, the static power consumption of MCML standard cells is significantly higher than that of equivalent functions implemented using static CMOS logic. As a result, the use of such a logic style is very limited in portable devices. Paradoxically, these devices are the most sensitive to physical attacks, and thus the ones that would benefit most from the adoption of MCML
Dean's Message
Customizable processors augmented with application-specific Instruction Set Extensions (ISEs) have begun to gain traction in recent years. The most effective ISEs include Architecturally Visible Storage (AVS), compiler-controlled memories accessible exclusively to the ISEs. Unfortunately, the usage of AVS memories creates a coherence problem with the data cache. A multiprocessor coherence protocol can solve the problem; however, this is an expensive solution when applied in a uniprocessor context. Instead, we can solve the problem by modifying the cache controller so that the AVS memories function as extra ways of the cache with respect to coherence, but are not generally accessible as extra ways for use under normal software execution. This solution, which we call Virtual Ways, is less costly than a hardware coherence protocol and eliminates coherence messages from the system bus, which improves energy consumption. Moreover, eliminating these messages makes Virtual Ways significantly more robust to performance degradation when there is a significant disparity in clock frequency between the processor and main memory. © 2010 Springer-Verlag
An out-of-order load-store queue for spatial computing
The efficiency of spatial computing depends on the ability to achieve maximal parallelism. This necessitates memory interfaces that can correctly handle memory accesses that arrive in arbitrary order while still respecting data dependencies and ensuring appropriate ordering for semantic correctness. However, a typical memory interface for out-of-order processors (i.e., a load-store queue) cannot immediately meet these requirements: a different allocation policy is needed to achieve out-of-order execution in spatial systems that naturally omit the notion of sequential program order, a fundamental piece of information for correct execution. We show a novel and practical way to organize the allocation for an out-of-order load-store queue for spatial computing. The main idea is to dynamically allocate groups of memory accesses (depending on the dynamic behavior of the application), where the access order within the group is statically predetermined (for instance by a high-level synthesis tool). We detail the construction of our load-store queue and demonstrate on a few practical cases its advantages over standard accelerator-memory interfaces
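The group-allocation idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' design: the class `GroupAllocatedLSQ`, its method names, and the group labels are hypothetical, and store retirement and store-to-load forwarding are left out. The sketch only shows how allocating whole groups at run time, each with a statically fixed internal order, recovers an overall access order against which out-of-order arrivals can be checked.

```python
class GroupAllocatedLSQ:
    """Toy load-store queue (hypothetical names; an illustration only)."""

    def __init__(self, static_groups):
        # static_groups maps a group name to a tuple of "L"/"S" markers,
        # listed in the fixed order determined ahead of time for that group.
        self.static_groups = static_groups
        self.entries = []  # queue entries in the reconstructed access order

    def allocate(self, group):
        """Allocate every access of one group at once; return queue indices."""
        start = len(self.entries)
        for kind in self.static_groups[group]:
            self.entries.append({"kind": kind, "addr": None})
        return list(range(start, len(self.entries)))

    def set_address(self, index, addr):
        """Record an address; addresses may arrive in any order."""
        self.entries[index]["addr"] = addr

    def load_may_issue(self, index):
        """A load may issue only if its own address is known and every
        earlier store has a known address that differs from it."""
        entry = self.entries[index]
        if entry["kind"] != "L" or entry["addr"] is None:
            return False
        for earlier in self.entries[:index]:
            if earlier["kind"] == "S" and earlier["addr"] in (None, entry["addr"]):
                return False
        return True
```

In this sketch, a load from a later group whose address arrives first still has to wait until all earlier stores have resolved their addresses, which is the dependency check the abstract refers to.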
Scalable and Low Cost Design Approach for Variable Block Size Motion Estimation (VBSME)
Variable block size motion estimation (VBSME) in state-of-the-art video coding standards is one of the key features that significantly improves coding efficiency compared to previous standards. VBSME hardware design is a challenging task due to its complexity. The processing power requirement for VBSME depends on many factors such as frame size, frame rate and search area. In video coding standards these features are allowed to vary, depending on the requirements of the application. In this paper, a scalable and low cost approach is proposed for designing the VBSME which allows us to tailor the architecture efficiently for different application requirements and implementation targets. This approach can be used in redesigning current VBSME architectures to improve their scalability and reduce their design costs. Moreover, as this technique is not block size dependent, it can be employed in designing future coding standards with different block sizes
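The abstract does not describe the proposed architecture. As a rough illustration of how block size can be kept out of the core computation, the Python sketch below uses the widely used SAD-reuse idea: sums of absolute differences (SADs) are computed once for 4x4 sub-blocks and then added together for larger partitions. The function names and the 16x16 block size are assumptions for illustration, not taken from the paper.

```python
def sad_4x4_grid(cur, ref):
    """Return a 4x4 grid of SADs, one per 4x4 sub-block of a 16x16 block.

    cur and ref are 16x16 lists of pixel values (current and candidate block).
    """
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            grid[y // 4][x // 4] += abs(cur[y][x] - ref[y][x])
    return grid


def merge_sads(grid, block_h, block_w):
    """Add 4x4 SADs into SADs for block_h x block_w partitions.

    block_h and block_w must be 4, 8 or 16; no pixel is read again.
    """
    gh, gw = block_h // 4, block_w // 4
    return [
        [sum(grid[r + i][c + j] for i in range(gh) for j in range(gw))
         for c in range(0, 4, gw)]
        for r in range(0, 4, gh)
    ]
```

For example, `merge_sads(grid, 16, 8)` yields the two SADs of the 16x8 partitions from the same grid that serves the 8x8 and 16x16 cases.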
Iterative layering: Optimizing arithmetic circuits by structuring the information flow
Current logic synthesis techniques are ineffective for arithmetic circuits. They perform poorly for XOR-dominated circuits and for those with a high fan-in dependency between inputs and outputs. Many optimizers therefore employ libraries of hand-optimized arithmetic components, but cannot optimize across component boundaries. To remedy this situation, we introduce a new logic synthesis algorithm which analyzes the input circuit based on its behavior on a set of random assignments of input variables, and outputs a structural implementation of the input circuit. The method presented here is similar to the covering algorithm used in multi-level optimizations [4]; however, it is not based on Sum-of-Product form, or any specific input representation. Our experiments show that our approach is not only capable of automatically reproducing some known architectural implementations without any prior knowledge about the functionality of the circuit, but also, in some cases, able to discover completely new designs which we have not seen described in the literature. Copyright 2009 ACM
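Analyzing a circuit through its behavior on random input assignments can be illustrated with plain random simulation. The Python sketch below is not the paper's algorithm, and all function names are hypothetical. It packs a function's outputs over a set of sampled assignments into a signature, so that two descriptions of the same function always get the same signature regardless of how they are written. Matching signatures only suggest equivalence, while differing signatures prove the functions differ.

```python
import random


def random_assignments(num_inputs, count, seed=0):
    """Draw `count` random 0/1 assignments for `num_inputs` variables."""
    rng = random.Random(seed)
    return [tuple(rng.getrandbits(1) for _ in range(num_inputs))
            for _ in range(count)]


def signature(func, assignments):
    """Pack func's 1-bit outputs over the assignments into one integer."""
    sig = 0
    for bits in assignments:
        sig = (sig << 1) | (func(*bits) & 1)
    return sig


def sum_sop(a, b, c):
    """Full-adder sum bit written as a flat sum of products."""
    na, nb, nc = 1 - a, 1 - b, 1 - c
    return a & nb & nc | na & b & nc | na & nb & c | a & b & c


def sum_xor(a, b, c):
    """The same sum bit written structurally as a chain of XORs."""
    return a ^ b ^ c


samples = random_assignments(3, 64)
same_behavior = signature(sum_sop, samples) == signature(sum_xor, samples)
```

Here `same_behavior` is true because the two forms compute the same function, even though one is a sum of products and the other is XOR-based.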