258 research outputs found

    Optimizations for real-time implementation of H264/AVC video encoder on DSP processor

    Real-time H.264/AVC high-definition video encoding represents a challenging workload for most existing programmable processors. New programmable-processor technologies such as the Graphics Processing Unit (GPU) and the multicore Digital Signal Processor (DSP) offer a very promising way to overcome these constraints. In this paper, an optimized implementation of the H.264/AVC video encoder on a single core among the six cores of the TMS320C6472 DSP for Common Intermediate Format (CIF, 352x288) resolution is presented, as a step toward a multicore implementation for standard and high definitions (SD, HD). Algorithmic optimization is applied to the intra-prediction module to reduce computation time. Furthermore, based on the DSP architectural features, various structural and hardware optimizations are adopted to minimize external memory accesses. The parallelism between CPU processing and data transfers is fully exploited using the Enhanced Direct Memory Access (EDMA) controller. Experimental results show that, on a single core running at 700 MHz for CIF resolution, the proposed optimizations together improve the encoding speed by up to 42.91%. They allow real-time encoding at 25 f/s to be reached without any Peak Signal-to-Noise Ratio (PSNR) degradation or bit-rate increase, and make real-time implementation for SD and HD resolutions achievable when exploiting the multicore features.
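
    A minimal sketch of the CPU/DMA overlap this abstract describes: while the CPU encodes one macroblock row from an on-chip buffer, the DMA engine fills a second buffer with the next row from external memory (ping-pong double buffering). The dma_start_copy(), dma_wait(), and encode_mb_row() functions below are hypothetical placeholders, not the actual TI EDMA3 driver API or the authors' encoder code.

```c
#include <stddef.h>
#include <stdint.h>

#define ROW_BYTES  (352 * 16)   /* one 16-pixel macroblock row at CIF width */

extern void dma_start_copy(void *dst, const void *src, size_t n); /* hypothetical */
extern void dma_wait(void);                                       /* hypothetical */
extern void encode_mb_row(const uint8_t *row, int row_idx);       /* hypothetical */

static uint8_t onchip[2][ROW_BYTES];  /* ping-pong buffers in internal RAM */

void encode_frame(const uint8_t *ext_frame, int mb_rows)
{
    int buf = 0;
    /* Prefetch the first row before entering the loop. */
    dma_start_copy(onchip[buf], ext_frame, ROW_BYTES);

    for (int r = 0; r < mb_rows; ++r) {
        dma_wait();                              /* row r is now on-chip      */
        if (r + 1 < mb_rows)                     /* start fetching row r+1    */
            dma_start_copy(onchip[buf ^ 1],
                           ext_frame + (size_t)(r + 1) * ROW_BYTES, ROW_BYTES);
        encode_mb_row(onchip[buf], r);           /* CPU work overlaps the DMA */
        buf ^= 1;                                /* swap ping and pong        */
    }
}
```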

    A Survey of Fault-Tolerance Techniques for Embedded Systems from the Perspective of Power, Energy, and Thermal Issues

    The relentless technology scaling has provided a significant increase in processor performance, but, on the other hand, it has led to adverse impacts on system reliability. In particular, technology scaling increases the processor's susceptibility to radiation-induced transient faults. Moreover, technology scaling combined with the end of Dennard scaling increases power densities, and thereby temperatures, on the chip. High temperature, in turn, accelerates transistor aging mechanisms, which may ultimately lead to permanent faults on the chip. To assure reliable system operation despite these potential reliability concerns, fault-tolerance techniques have emerged. Specifically, fault-tolerance techniques employ some kind of redundancy to satisfy specific reliability requirements. However, the integration of fault-tolerance techniques into real-time embedded systems complicates preserving timing constraints. As a remedy, many task mapping/scheduling policies have been proposed that consider the integration of fault-tolerance techniques and enforce both timing and reliability guarantees for real-time embedded systems. More advanced techniques additionally aim at minimizing power and energy while satisfying timing and reliability constraints. Recently, some scheduling techniques have started to tackle a new challenge: the temperature increase induced by employing fault-tolerance techniques. These emerging techniques aim at satisfying temperature constraints besides timing and reliability constraints. This paper provides an in-depth survey of the emerging research efforts that exploit fault-tolerance techniques while considering timing, power/energy, and temperature from the real-time embedded systems' design perspective. In particular, the task mapping/scheduling policies for fault-tolerant real-time embedded systems are reviewed and classified according to their considered goals and constraints. Moreover, the employed fault-tolerance techniques, application models, and hardware models are considered as additional dimensions of the presented classification. Lastly, this survey gives deep insights into the main achievements and shortcomings of the existing approaches and highlights the most promising ones.
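
    As an illustration of the kind of trade-off the surveyed policies reason about, the sketch below picks the smallest number of re-executions that meets a reliability target while still fitting inside a deadline, assuming a task with WCET C, deadline D, and independent transient faults with per-execution success probability r. The numbers and helper names are invented for illustration and are not taken from any surveyed paper.

```c
#include <math.h>
#include <stdio.h>

/* Reliability after the first run plus k retries, assuming independent faults. */
static double reliability(double r, int k)
{
    return 1.0 - pow(1.0 - r, (double)(k + 1));
}

/* Smallest retry count that reaches 'target' reliability and still fits in D. */
int min_retries(double r, double target, double C, double D)
{
    for (int k = 0; (k + 1) * C <= D; ++k)        /* all executions must fit before D */
        if (reliability(r, k) >= target)
            return k;
    return -1;                                    /* infeasible under this deadline   */
}

int main(void)
{
    /* Example: C = 2 ms, D = 10 ms, r = 0.999, target reliability 1 - 1e-9. */
    int k = min_retries(0.999, 1.0 - 1e-9, 2.0, 10.0);
    printf("retries needed: %d\n", k);
    return 0;
}
```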

    Efficient Elliptic Curve Cryptography Software Implementation on Embedded Platforms


    Hierarchical Scheduling for Real-Time Periodic Tasks in Symmetric Multiprocessing

    In this paper, we present a new hierarchical scheduling framework for periodic tasks on symmetric multiprocessing (SMP) platforms. Partitioned and global scheduling are the two main approaches used by SMP-based systems, where global scheduling is recommended for overall performance and partitioned scheduling is recommended for hard real-time performance. Our approach combines the global and partitioned approaches of traditional SMP-based schedulers to provide hard real-time guarantees for critical tasks and improved response times for soft real-time tasks. The framework was implemented as part of VxWorks, and the results are confirmed using a real-time benchmark application, in which response times were improved for soft real-time tasks while hard real-time performance was still provided.
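
    A simplified illustration of the combined policy described above: hard real-time tasks are statically partitioned (pinned) to a core, while soft real-time tasks sit in a single global queue that any idle core may serve. This is a scheduling-decision sketch under those assumptions, not the authors' VxWorks implementation.

```c
#include <stddef.h>

#define NUM_CORES 4

typedef struct task {
    int          priority;   /* lower value = higher priority          */
    int          core;       /* fixed core for hard tasks, -1 if soft  */
    struct task *next;
} task_t;

static task_t *hard_queue[NUM_CORES];  /* per-core partitioned queues */
static task_t *soft_queue;             /* single global queue         */

/* Pick the next task for a core: its own hard queue always wins;
 * otherwise pull the highest-priority soft task from the global queue. */
task_t *pick_next(int core)
{
    if (hard_queue[core])
        return hard_queue[core];               /* hard tasks never migrate    */

    task_t *best = NULL;
    for (task_t *t = soft_queue; t; t = t->next)
        if (!best || t->priority < best->priority)
            best = t;                          /* soft tasks may run anywhere */
    return best;                               /* NULL -> core idles          */
}
```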

    Per-task energy accounting in computing systems

    We present for the first time the concept of per-task energy accounting (PTEA) and relate it to per-task energy metering (PTEM). We show the benefits of supporting both in future computing systems. Using the shared last-level cache (LLC) as an example: (1) we illustrate the complexities in providing PTEM and PTEA; (2) we present an idealized PTEM model and an accurate, low-cost implementation of it; and (3) we introduce a hardware mechanism to provide accurate PTEA in the cache.
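
    A back-of-the-envelope version of the idea for the LLC example: dynamic energy is charged to a task from its own hit/miss counters, and leakage energy is split according to the fraction of cache lines the task occupies over the interval. The energy constants and counter sources below are assumptions for illustration, not the paper's hardware mechanism.

```c
#include <stdint.h>

typedef struct {
    uint64_t hits;        /* per-task LLC hit counter                  */
    uint64_t misses;      /* per-task LLC miss counter                 */
    double   occupancy;   /* average fraction of LLC lines owned, 0..1 */
} llc_stats_t;

/* Hypothetical per-event energies (joules) and leakage power (watts). */
#define E_HIT   0.5e-9
#define E_MISS  2.0e-9
#define P_LEAK  0.3

/* Energy accounted to one task over an interval of 'seconds'. */
double llc_task_energy(const llc_stats_t *s, double seconds)
{
    double dynamic    = s->hits * E_HIT + s->misses * E_MISS;
    double leak_share = s->occupancy * P_LEAK * seconds;   /* leakage split by occupancy */
    return dynamic + leak_share;
}
```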

    Integration of generic operating systems in partitioned architectures

    Master's thesis, Informatics Engineering (Architecture, Systems and Computer Networks), Universidade de Lisboa, Faculdade de Ciências, 2009. The Integrated Modular Avionics (IMA) specification defines a partitioned environment hosting multiple avionics functions of different criticalities on a shared computing platform. ARINC 653, one of the specifications related to the IMA concept, defines a standard interface between the software applications and the underlying operating system. Both specifications come from the world of civil aviation, but they are attracting interest from space industry partners, who have identified requirements common to those of aeronautic applications. Within the scope of this interest, the AIR architecture was defined under a contract from the European Space Agency (ESA). AIR provides temporal and spatial segregation, and foresees the use of different operating systems in each partition. Temporal segregation is achieved through the fixed cyclic scheduling of computing resources to partitions. The present work extends the foreseen partition operating system (POS) heterogeneity to generic non-real-time operating systems. This was motivated by documented difficulties in porting applications to RTOSs, and by the notion that proper integration of a non-real-time POS will not compromise the timeliness of critical real-time functions. For this purpose, Linux is used as a case study. An embedded variant of Linux is built and evaluated regarding its adequacy as a POS in the AIR architecture. To guarantee safe integration, a solution based on the Linux paravirtualization interface, paravirt-ops, is proposed. In the course of these activities, the AIR architecture definition was also subject to improvements. The most significant one, motivated by the intended increase in POS heterogeneity, was the introduction of a new component, the AIR Partition OS Adaptation Layer (PAL). The AIR PAL provides greater POS-independence to the major components of the AIR architecture, easing their independent certification efforts. Other improvements provide enhanced timeliness mechanisms, such as mode-based schedules and process deadline violation monitoring. Funding: ESA/ITI - European Space Agency Innovation Triangular Initiative (through ESTEC Contract 21217/07/NL/CB, Project AIR-II) and FCT - Fundação para a Ciência e a Tecnologia (through the Multiannual Funding Programme).
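
    A sketch of the fixed cyclic partition scheduling mentioned above: a major time frame is divided into windows, each granting the CPU to one partition. The table and timings below are an invented example in the ARINC 653 spirit, not AIR's actual configuration.

```c
#include <stdint.h>

typedef struct {
    int      partition_id;   /* which partition owns this window      */
    uint32_t duration_ms;    /* window length within the major frame  */
} sched_window_t;

/* Example major time frame of 100 ms with three partitions. */
static const sched_window_t table[] = {
    { 0, 40 },   /* partition 0: critical real-time functions   */
    { 1, 40 },   /* partition 1: another RTOS partition         */
    { 2, 20 },   /* partition 2: the non-real-time (Linux) POS  */
};
#define MAJOR_FRAME_MS 100u

/* Return the partition that owns the CPU at time t_ms (milliseconds since boot). */
int active_partition(uint32_t t_ms)
{
    uint32_t offset = t_ms % MAJOR_FRAME_MS;
    for (unsigned i = 0; i < sizeof table / sizeof table[0]; ++i) {
        if (offset < table[i].duration_ms)
            return table[i].partition_id;
        offset -= table[i].duration_ms;
    }
    return -1;   /* unreachable if the windows sum to the major frame */
}
```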

    Effective data parallel computing on multicore processors

    The rise of chip multiprocessing, or the integration of multiple general-purpose processing cores on a single chip (multicores), has impacted all computing platforms, including high-performance, server, desktop, mobile, and embedded processors. Programmers can no longer expect continued increases in software performance without developing parallel, memory-hierarchy-friendly software that can effectively exploit the chip-level multiprocessing paradigm of multicores. The goal of this dissertation is to demonstrate a design process for data parallel problems that starts with a sequential algorithm and ends with a high-performance implementation on a multicore platform. Our design process combines theoretical algorithm analysis with practical optimization techniques. Our target multicores are quad-core processors from Intel and the eight-SPE IBM Cell B.E. Target applications include Matrix Multiplication (MM), Finite Difference Time Domain (FDTD), LU Decomposition (LUD), and a Power Flow Solver based on Gauss-Seidel (PFS-GS) algorithms. These applications are popular computational methods in science and engineering problems and are characterized by unit-stride (MM, LUD, and PFS-GS) or 2-point stencil (FDTD) memory access patterns. The main contributions of this dissertation include a cache- and space-efficient algorithm model, integrated data pre-fetching and caching strategies, and in-core optimization techniques. Our multicore-efficient implementations of the applications described above outperform naïve parallel implementations by at least 2x and scale well with problem size and with the number of processing cores.
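
    A generic cache-blocked matrix multiplication of the kind of memory-hierarchy-friendly restructuring discussed above. The block size is a tuning knob chosen so a few BS x BS tiles fit in cache; the value 64 is an assumption for illustration, not a figure taken from the dissertation.

```c
#include <stddef.h>

#define BS 64   /* tile edge; tune so roughly three BS x BS tiles fit in cache */

/* C must be zero-initialized by the caller; matrices are n x n, row-major. */
void matmul_blocked(size_t n, const double *A, const double *B, double *C)
{
    for (size_t ii = 0; ii < n; ii += BS)
        for (size_t kk = 0; kk < n; kk += BS)
            for (size_t jj = 0; jj < n; jj += BS)
                /* Multiply the (ii,kk) tile of A by the (kk,jj) tile of B. */
                for (size_t i = ii; i < ii + BS && i < n; ++i)
                    for (size_t k = kk; k < kk + BS && k < n; ++k) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < jj + BS && j < n; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```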

    Memory-Aware Scheduling for Fixed Priority Hard Real-Time Computing Systems

    As a major component of a computing system, memory has been a key performance and power-consumption bottleneck in computer system design. While processor speeds have kept rising dramatically, the overall performance improvement of the entire system is limited by how fast the memory can feed instructions/data to the processing units (the so-called memory wall problem). The increasing transistor density and surging access demands from a rapidly growing number of processing cores have also significantly elevated the power consumption of the memory system. In addition, the interference of memory accesses from different applications and processing cores significantly degrades computation predictability, which is essential to ensure timing specifications in real-time system design. Recent IC technologies (such as 3D-IC technology) and emerging data-intensive real-time applications (such as Virtual Reality/Augmented Reality, Artificial Intelligence, and the Internet of Things) further amplify these challenges. We believe that it is not simply desirable but necessary to adopt a joint CPU/memory resource management framework to deal with these grave challenges. In this dissertation, we focus on studying how to schedule fixed-priority hard real-time tasks with memory impacts taken into consideration. We target the fixed-priority real-time scheduling scheme since it is one of the most commonly used strategies in practical real-time applications. Specifically, we first develop an approach that takes into consideration not only the execution-time variations with cache allocations but also the task period relationship, showing a significant improvement in the feasibility of the system. We further study the problem of how to guarantee timing constraints for hard real-time systems under CPU and memory thermal constraints. We first study the problem under an architecture model with a single core and its main memory individually packaged. We develop a thermal model that can capture the thermal interaction between the processor and memory, and incorporate the periodic resource server model into our scheduling framework to guarantee both the timing and thermal constraints. We then extend our research to multi-core architectures with processing cores and memory devices integrated into a single 3D platform. To the best of our knowledge, this is the first research that can guarantee hard deadline constraints for real-time tasks under temperature constraints for both processing cores and memory devices. Extensive simulation results demonstrate that our proposed scheduling can significantly improve the feasibility of hard real-time systems under thermal constraints.
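
    A textbook-style fixed-priority response-time iteration, extended with a crude per-preemption memory penalty (e.g. reloading cache contents evicted by interference), to show how memory effects can enter a schedulability test. This is a generic illustration under those assumptions, not the analysis developed in the dissertation.

```c
#include <math.h>
#include <stdbool.h>

typedef struct {
    double C;   /* worst-case execution time             */
    double T;   /* period (taken equal to the deadline)  */
    double M;   /* memory/cache penalty per preemption   */
} rt_task_t;

/* Tasks indexed 0..i-1 have higher priority than task i. */
bool is_schedulable(const rt_task_t *tasks, int i)
{
    double R = tasks[i].C;
    for (;;) {
        double next = tasks[i].C;
        for (int j = 0; j < i; ++j) {
            double preemptions = ceil(R / tasks[j].T);
            /* Each preemption costs the preemptor's C plus a memory penalty. */
            next += preemptions * (tasks[j].C + tasks[i].M);
        }
        if (next > tasks[i].T) return false;   /* deadline miss       */
        if (next == R)         return true;    /* fixed point reached */
        R = next;
    }
}
```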

    Graphite: A Distributed Parallel Simulator for Multicores

    This paper introduces the open-source Graphite distributed parallel multicore simulator infrastructure. Graphite is designed from the ground up for exploration of future multicore processors containing dozens, hundreds, or even thousands of cores. It provides high performance for fast design space exploration and software development for future processors. Several techniques are used to achieve this performance, including direct execution, multi-machine distribution, analytical modeling, and lax synchronization. Graphite is capable of accelerating simulations by leveraging several machines. It can distribute simulation of an off-the-shelf threaded application across a cluster of commodity Linux machines with no modification to the source code. It does this by providing a single, shared address space and a consistent single-process image across machines. Graphite is designed to be a simulation framework, allowing different component models to be easily replaced to either model different architectures or trade off accuracy for performance. We evaluate Graphite from a number of perspectives and demonstrate that it can simulate target architectures containing over 1000 cores on ten 8-core servers. Performance scales well as more machines are added, with near-linear speedup in many cases. Simulation slowdown is as low as 41x versus native execution for some applications. The Graphite infrastructure and existing models will be released as open-source software to allow the community to simulate their own architectures and to extend and improve the framework.

    Algorithm/Architecture Co-Exploration of Visual Computing: Overview and Future Perspectives

    Concurrently exploring both algorithmic and architectural optimizations is a new design paradigm. This survey paper addresses the latest research and future perspectives on the simultaneous development of video coding, processing, and computing algorithms with emerging platforms that have multiple cores and reconfigurable architectures. As the algorithms in forthcoming visual systems become increasingly complex, many applications must offer different profiles with different levels of performance. Hence, with the expectation that the visual experience in the future will become continuously better, it is critical that advanced platforms provide higher performance, better flexibility, and lower power consumption. To achieve these goals, algorithm and architecture co-design is significant for characterizing the algorithmic complexity used to optimize the targeted architecture. This paper shows that seamless weaving of the development of previously autonomous visual computing algorithms and multicore or reconfigurable architectures will unavoidably become the leading trend in the future of video technology.