11,561 research outputs found

    Design of multimedia processor based on metric computation

    Get PDF
    Media-processing applications, such as signal processing, 2D and 3D graphics rendering, and image compression, are the dominant workloads in many embedded systems today. The real-time constraints of those media applications have taxing demands on today's processor performances with low cost, low power and reduced design delay. To satisfy those challenges, a fast and efficient strategy consists in upgrading a low cost general purpose processor core. This approach is based on the personalization of a general RISC processor core according the target multimedia application requirements. Thus, if the extra cost is justified, the general purpose processor GPP core can be enforced with instruction level coprocessors, coarse grain dedicated hardware, ad hoc memories or new GPP cores. In this way the final design solution is tailored to the application requirements. The proposed approach is based on three main steps: the first one is the analysis of the targeted application using efficient metrics. The second step is the selection of the appropriate architecture template according to the first step results and recommendations. The third step is the architecture generation. This approach is experimented using various image and video algorithms showing its feasibility

    Motion estimation and CABAC VLSI co-processors for real-time high-quality H.264/AVC video coding

    Get PDF
    Real-time and high-quality video coding is gaining a wide interest in the research and industrial community for different applications. H.264/AVC, a recent standard for high performance video coding, can be successfully exploited in several scenarios including digital video broadcasting, high-definition TV and DVD-based systems, which require to sustain up to tens of Mbits/s. To that purpose this paper proposes optimized architectures for H.264/AVC most critical tasks, Motion estimation and context adaptive binary arithmetic coding. Post synthesis results on sub-micron CMOS standard-cells technologies show that the proposed architectures can actually process in real-time 720 × 480 video sequences at 30 frames/s and grant more than 50 Mbits/s. The achieved circuit complexity and power consumption budgets are suitable for their integration in complex VLSI multimedia systems based either on AHB bus centric on-chip communication system or on novel Network-on-Chip (NoC) infrastructures for MPSoC (Multi-Processor System on Chip

    Exploring Processor and Memory Architectures for Multimedia

    Get PDF
    Multimedia has become one of the cornerstones of our 21st century society and, when combined with mobility, has enabled a tremendous evolution of our society. However, joining these two concepts introduces many technical challenges. These range from having sufficient performance for handling multimedia content to having the battery stamina for acceptable mobile usage. When taking a projection of where we are heading, we see these issues becoming ever more challenging by increased mobility as well as advancements in multimedia content, such as introduction of stereoscopic 3D and augmented reality. The increased performance needs for handling multimedia come not only from an ongoing step-up in resolution going from QVGA (320x240) to Full HD (1920x1080) a 27x increase in less than half a decade. On top of this, there is also codec evolution (MPEG-2 to H.264 AVC) that adds to the computational load increase. To meet these performance challenges there has been processing and memory architecture advances (SIMD, out-of-order superscalarity, multicore processing and heterogeneous multilevel memories) in the mobile domain, in conjunction with ever increasing operating frequencies (200MHz to 2GHz) and on-chip memory sizes (128KB to 2-3MB). At the same time there is an increase in requirements for mobility, placing higher demands on battery-powered systems despite the steady increase in battery capacity (500 to 2000mAh). This leaves negative net result in-terms of battery capacity versus performance advances. In order to make optimal use of these architectural advances and to meet the power limitations in mobile systems, there is a need for taking an overall approach on how to best utilize these systems. The right trade-off between performance and power is crucial. On top of these constraints, the flexibility aspects of the system need to be addressed. All this makes it very important to reach the right architectural balance in the system. The first goal for this thesis is to examine multimedia applications and propose a flexible solution that can meet the architectural requirements in a mobile system. Secondly, propose an automated methodology of optimally mapping multimedia data and instructions to a heterogeneous multilevel memory subsystem. The proposed methodology uses constraint programming for solving a multidimensional optimization problem. Results from this work indicate that using today’s most advanced mobile processor technology together with a multi-level heterogeneous on-chip memory subsystem can meet the performance requirements for handling multimedia. By utilizing the automated optimal memory mapping method presented in this thesis lower total power consumption can be achieved, whilst performance for multimedia applications is improved, by employing enhanced memory management. This is achieved through reduced external accesses and better reuse of memory objects. This automatic method shows high accuracy, up to 90%, for predicting multimedia memory accesses for a given architecture

    Coarse-grained reconfigurable array architectures

    Get PDF
    Coarse-Grained ReconïŹgurable Array (CGRA) architectures accelerate the same inner loops that beneïŹt from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efïŹciently. This chapter discusses the basic principles of CGRAs, and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on ïŹ‚exibility, performance, and power-efïŹciency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual ïŹne-tuning of source code

    Online Reinforcement Learning for Dynamic Multimedia Systems

    Full text link
    In our previous work, we proposed a systematic cross-layer framework for dynamic multimedia systems, which allows each layer to make autonomous and foresighted decisions that maximize the system's long-term performance, while meeting the application's real-time delay constraints. The proposed solution solved the cross-layer optimization offline, under the assumption that the multimedia system's probabilistic dynamics were known a priori. In practice, however, these dynamics are unknown a priori and therefore must be learned online. In this paper, we address this problem by allowing the multimedia system layers to learn, through repeated interactions with each other, to autonomously optimize the system's long-term performance at run-time. We propose two reinforcement learning algorithms for optimizing the system under different design constraints: the first algorithm solves the cross-layer optimization in a centralized manner, and the second solves it in a decentralized manner. We analyze both algorithms in terms of their required computation, memory, and inter-layer communication overheads. After noting that the proposed reinforcement learning algorithms learn too slowly, we introduce a complementary accelerated learning algorithm that exploits partial knowledge about the system's dynamics in order to dramatically improve the system's performance. In our experiments, we demonstrate that decentralized learning can perform as well as centralized learning, while enabling the layers to act autonomously. Additionally, we show that existing application-independent reinforcement learning algorithms, and existing myopic learning algorithms deployed in multimedia systems, perform significantly worse than our proposed application-aware and foresighted learning methods.Comment: 35 pages, 11 figures, 10 table

    PI-BA Bundle Adjustment Acceleration on Embedded FPGAs with Co-observation Optimization

    Full text link
    Bundle adjustment (BA) is a fundamental optimization technique used in many crucial applications, including 3D scene reconstruction, robotic localization, camera calibration, autonomous driving, space exploration, street view map generation etc. Essentially, BA is a joint non-linear optimization problem, and one which can consume a significant amount of time and power, especially for large optimization problems. Previous approaches of optimizing BA performance heavily rely on parallel processing or distributed computing, which trade higher power consumption for higher performance. In this paper we propose {\pi}-BA, the first hardware-software co-designed BA engine on an embedded FPGA-SoC that exploits custom hardware for higher performance and power efficiency. Specifically, based on our key observation that not all points appear on all images in a BA problem, we designed and implemented a Co-Observation Optimization technique to accelerate BA operations with optimized usage of memory and computation resources. Experimental results confirm that {\pi}-BA outperforms the existing software implementations in terms of performance and power consumption.Comment: in Proceedings of IEEE FCCM 201

    An Adaptive Design Methodology for Reduction of Product Development Risk

    Full text link
    Embedded systems interaction with environment inherently complicates understanding of requirements and their correct implementation. However, product uncertainty is highest during early stages of development. Design verification is an essential step in the development of any system, especially for Embedded System. This paper introduces a novel adaptive design methodology, which incorporates step-wise prototyping and verification. With each adaptive step product-realization level is enhanced while decreasing the level of product uncertainty, thereby reducing the overall costs. The back-bone of this frame-work is the development of Domain Specific Operational (DOP) Model and the associated Verification Instrumentation for Test and Evaluation, developed based on the DOP model. Together they generate functionally valid test-sequence for carrying out prototype evaluation. With the help of a case study 'Multimode Detection Subsystem' the application of this method is sketched. The design methodologies can be compared by defining and computing a generic performance criterion like Average design-cycle Risk. For the case study, by computing Average design-cycle Risk, it is shown that the adaptive method reduces the product development risk for a small increase in the total design cycle time.Comment: 21 pages, 9 figure
    • 

    corecore