11,561 research outputs found
Design of multimedia processor based on metric computation
Media-processing applications, such as signal processing, 2D and 3D graphics
rendering, and image compression, are the dominant workloads in many embedded
systems today. The real-time constraints of those media applications have
taxing demands on today's processor performances with low cost, low power and
reduced design delay. To satisfy those challenges, a fast and efficient
strategy consists in upgrading a low cost general purpose processor core. This
approach is based on the personalization of a general RISC processor core
according the target multimedia application requirements. Thus, if the extra
cost is justified, the general purpose processor GPP core can be enforced with
instruction level coprocessors, coarse grain dedicated hardware, ad hoc
memories or new GPP cores. In this way the final design solution is tailored to
the application requirements. The proposed approach is based on three main
steps: the first one is the analysis of the targeted application using
efficient metrics. The second step is the selection of the appropriate
architecture template according to the first step results and recommendations.
The third step is the architecture generation. This approach is experimented
using various image and video algorithms showing its feasibility
Motion estimation and CABAC VLSI co-processors for real-time high-quality H.264/AVC video coding
Real-time and high-quality video coding is gaining a wide interest in the research and industrial community for different applications. H.264/AVC, a recent standard for high performance video coding, can be successfully exploited in several scenarios including digital video broadcasting, high-definition TV and DVD-based systems, which require to sustain up to tens of Mbits/s. To that purpose this paper proposes optimized architectures for H.264/AVC most critical tasks, Motion estimation and context adaptive binary arithmetic coding. Post synthesis results on sub-micron CMOS standard-cells technologies show that the proposed architectures can actually process in real-time 720 Ă 480 video sequences at 30 frames/s and grant more than 50 Mbits/s. The achieved circuit complexity and power consumption budgets are suitable for their integration in complex VLSI multimedia systems based either on AHB bus centric on-chip communication system or on novel Network-on-Chip (NoC) infrastructures for MPSoC (Multi-Processor System on Chip
Exploring Processor and Memory Architectures for Multimedia
Multimedia has become one of the cornerstones of our 21st century society and, when combined with mobility, has enabled a tremendous evolution of our society. However, joining these two concepts introduces many technical challenges. These range from having sufficient performance for handling multimedia content to having the battery stamina for acceptable mobile usage. When taking a projection of where we are heading, we see these issues becoming ever more challenging by increased mobility as well as advancements in multimedia content, such as introduction of stereoscopic 3D and augmented reality. The increased performance needs for handling multimedia come not only from an ongoing step-up in resolution going from QVGA (320x240) to Full HD (1920x1080) a 27x increase in less than half a decade. On top of this, there is also codec evolution (MPEG-2 to H.264 AVC) that adds to the computational load increase. To meet these performance challenges there has been processing and memory architecture advances (SIMD, out-of-order superscalarity, multicore processing and heterogeneous multilevel memories) in the mobile domain, in conjunction with ever increasing operating frequencies (200MHz to 2GHz) and on-chip memory sizes (128KB to 2-3MB). At the same time there is an increase in requirements for mobility, placing higher demands on battery-powered systems despite the steady increase in battery capacity (500 to 2000mAh). This leaves negative net result in-terms of battery capacity versus performance advances. In order to make optimal use of these architectural advances and to meet the power limitations in mobile systems, there is a need for taking an overall approach on how to best utilize these systems. The right trade-off between performance and power is crucial. On top of these constraints, the flexibility aspects of the system need to be addressed. All this makes it very important to reach the right architectural balance in the system. The first goal for this thesis is to examine multimedia applications and propose a flexible solution that can meet the architectural requirements in a mobile system. Secondly, propose an automated methodology of optimally mapping multimedia data and instructions to a heterogeneous multilevel memory subsystem. The proposed methodology uses constraint programming for solving a multidimensional optimization problem. Results from this work indicate that using todayâs most advanced mobile processor technology together with a multi-level heterogeneous on-chip memory subsystem can meet the performance requirements for handling multimedia. By utilizing the automated optimal memory mapping method presented in this thesis lower total power consumption can be achieved, whilst performance for multimedia applications is improved, by employing enhanced memory management. This is achieved through reduced external accesses and better reuse of memory objects. This automatic method shows high accuracy, up to 90%, for predicting multimedia memory accesses for a given architecture
Coarse-grained reconfigurable array architectures
Coarse-Grained ReconïŹgurable Array (CGRA) architectures accelerate the same inner loops that beneïŹt from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efïŹciently. This chapter discusses the basic principles of CGRAs, and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on ïŹexibility, performance, and power-efïŹciency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual ïŹne-tuning of source code
Online Reinforcement Learning for Dynamic Multimedia Systems
In our previous work, we proposed a systematic cross-layer framework for
dynamic multimedia systems, which allows each layer to make autonomous and
foresighted decisions that maximize the system's long-term performance, while
meeting the application's real-time delay constraints. The proposed solution
solved the cross-layer optimization offline, under the assumption that the
multimedia system's probabilistic dynamics were known a priori. In practice,
however, these dynamics are unknown a priori and therefore must be learned
online. In this paper, we address this problem by allowing the multimedia
system layers to learn, through repeated interactions with each other, to
autonomously optimize the system's long-term performance at run-time. We
propose two reinforcement learning algorithms for optimizing the system under
different design constraints: the first algorithm solves the cross-layer
optimization in a centralized manner, and the second solves it in a
decentralized manner. We analyze both algorithms in terms of their required
computation, memory, and inter-layer communication overheads. After noting that
the proposed reinforcement learning algorithms learn too slowly, we introduce a
complementary accelerated learning algorithm that exploits partial knowledge
about the system's dynamics in order to dramatically improve the system's
performance. In our experiments, we demonstrate that decentralized learning can
perform as well as centralized learning, while enabling the layers to act
autonomously. Additionally, we show that existing application-independent
reinforcement learning algorithms, and existing myopic learning algorithms
deployed in multimedia systems, perform significantly worse than our proposed
application-aware and foresighted learning methods.Comment: 35 pages, 11 figures, 10 table
PI-BA Bundle Adjustment Acceleration on Embedded FPGAs with Co-observation Optimization
Bundle adjustment (BA) is a fundamental optimization technique used in many
crucial applications, including 3D scene reconstruction, robotic localization,
camera calibration, autonomous driving, space exploration, street view map
generation etc. Essentially, BA is a joint non-linear optimization problem, and
one which can consume a significant amount of time and power, especially for
large optimization problems. Previous approaches of optimizing BA performance
heavily rely on parallel processing or distributed computing, which trade
higher power consumption for higher performance. In this paper we propose
{\pi}-BA, the first hardware-software co-designed BA engine on an embedded
FPGA-SoC that exploits custom hardware for higher performance and power
efficiency. Specifically, based on our key observation that not all points
appear on all images in a BA problem, we designed and implemented a
Co-Observation Optimization technique to accelerate BA operations with
optimized usage of memory and computation resources. Experimental results
confirm that {\pi}-BA outperforms the existing software implementations in
terms of performance and power consumption.Comment: in Proceedings of IEEE FCCM 201
An Adaptive Design Methodology for Reduction of Product Development Risk
Embedded systems interaction with environment inherently complicates
understanding of requirements and their correct implementation. However,
product uncertainty is highest during early stages of development. Design
verification is an essential step in the development of any system, especially
for Embedded System. This paper introduces a novel adaptive design methodology,
which incorporates step-wise prototyping and verification. With each adaptive
step product-realization level is enhanced while decreasing the level of
product uncertainty, thereby reducing the overall costs. The back-bone of this
frame-work is the development of Domain Specific Operational (DOP) Model and
the associated Verification Instrumentation for Test and Evaluation, developed
based on the DOP model. Together they generate functionally valid test-sequence
for carrying out prototype evaluation. With the help of a case study 'Multimode
Detection Subsystem' the application of this method is sketched. The design
methodologies can be compared by defining and computing a generic performance
criterion like Average design-cycle Risk. For the case study, by computing
Average design-cycle Risk, it is shown that the adaptive method reduces the
product development risk for a small increase in the total design cycle time.Comment: 21 pages, 9 figure
- âŠ