649 research outputs found

    Skalabilna implementacija dekodera po normi MPEG koriŔtenjem tokovnog programskog jezika

    Get PDF
    In this paper, we describe a scalable and portable parallelized implementation of a MPEG decoder using a streaming computation paradigm, tailored to new generations of multi--core systems. A novel, hybrid approach towards parallelization of both new and legacy applications is described, where only data--intensive and performance--critical parts are implemented in the streaming domain. An architecture--independent \u27StreamIt\u27 language is used for design, optimization and implementation of parallelized segments, while the developed \u27StreamGate\u27 interface provides a communication mechanism between the implementation domains. The proposed hybrid approach was employed in re--factoring of a reference MPEG video decoder implementation; identifying the most performance--critical segments and re-implementing them in \u27StreamIt\u27 language, with \u27StreamGate\u27 interface as a communication mechanism between the host and streaming kernel. We evaluated the scalability of the decoder with respect to the number of cores, video frame formats, sizes and decomposition. Decoder performance was examined in the presence of different processor load configurations and with respect to the number of simultaneously processed frames.U ovom radu opisujemo skalabilnu i prenosivu implementaciju dekodera po normi MPEG ostvarenu koriÅ”tenjem paradigme tokovnog računarstva, prilagođenu novim generacijama viÅ”ejezgrenih računala. Opisan je novi, hibridni pristup paralelizaciji novih ili postojećih aplikacija, gdje se samo podatkovno intenzivni i računski zahtjevni dijelovi implementiraju u tokovnoj domeni. Arhitekturno neovisni jezik StreamIt koristi se za oblikovanje, optimiranje i izvedbu paraleliziranih segmenata aplikacije, dok razvijeno sučelje \u27StreamGate\u27 omogućava komunikaciju između domena implementacije. Predloženi hibridni pristup razvoju paraleliziranih aplikacija iskoriÅ”ten je u preoblikovanju referentnog dekodera video zapisa po normi MPEG; identificirani su računski zahtjevni segmenti aplikacije i ponovno implementirani u jeziku StreamIt, sa sučeljem \u27StreamGate\u27 kao poveznicom između slijedne i tokovne domene. Ispitivana su svojstva skalabilnosti s obzirom na ciljani broj jezgri, format video zapisa i veličinu okvira te dekompoziciju ulaznih podataka. Svojstva dekodera  su praćena u prisustvu različitih opterećenja ispitnog računala, i s obzirom na broj istovremeno obrađivanih okvira

    Parallel HEVC Decoding on Multi- and Many-core Architectures : A Power and Performance Analysis

    Get PDF
    The Joint Collaborative Team on Video Decoding is developing a new standard named High Efficiency Video Coding (HEVC) that aims at reducing the bitrate of H.264/AVC by another 50 %. In order to fulfill the computational demands of the new standard, in particular for high resolutions and at low power budgets, exploiting parallelism is no longer an option but a requirement. Therefore, HEVC includes several coding tools that allows to divide each picture into several partitions that can be processed in parallel, without degrading the quality nor the bitrate. In this paper we adapt one of these approaches, the Wavefront Parallel Processing (WPP) coding, and show how it can be implemented on multi- and many-core processors. Our approach, named Overlapped Wavefront (OWF), processes several partitions as well as several pictures in parallel. This has the advantage that the amount of (thread-level) parallelism stays constant during execution. In addition, performance and power results are provided for three platforms: a server Intel CPU with 8 cores, a laptop Intel CPU with 4 cores, and a TILE-Gx36 with 36 cores from Tilera. The results show that our parallel HEVC decoder is capable of achieving an average frame rate of 116 fps for 4k resolution on a standard multicore CPU. The results also demonstrate that exploiting more parallelism by increasing the number of cores can improve the energy efficiency measured in terms of Joules per frame substantially

    Design and Validation of a Software Defined Radio Testbed for DVB-T Transmission

    Get PDF
    This paper describes the design and validation of a Software Defined Radio (SDR) testbed, which can be used for Digital Television transmission using the Digital Video Broadcasting - Terrestrial (DVB-T) standard. In order to generate a DVB-T-compliant signal with low computational complexity, we design an SDR architecture that uses the C/C++ language and exploits multithreading and vectorized instructions. Then, we transmit the generated DVB-T signal in real time, using a common PC equipped with multicore central processing units (CPUs) and a commercially available SDR modem board. The proposed SDR architecture has been validated using fixed TV sets, and portable receivers. Our results show that the proposed SDR architecture for DVB-T transmission is a low-cost low-complexity solution that, in the worst case, only requires less than 22% of CPU load and less than 170 MB of memory usage, on a 3.0 GHz Core i7 processor. In addition, using the same SDR modem board, we design an off-line software receiver that also performs time synchronization and carrier frequency offset estimation and compensation

    Efficiently mapping high-performance early vision algorithms onto multicore embedded platforms

    Get PDF
    The combination of low-cost imaging chips and high-performance, multicore, embedded processors heralds a new era in portable vision systems. Early vision algorithms have the potential for highly data-parallel, integer execution. However, an implementation must operate within the constraints of embedded systems including low clock rate, low-power operation and with limited memory. This dissertation explores new approaches to adapt novel pixel-based vision algorithms for tomorrow's multicore embedded processors. It presents : - An adaptive, multimodal background modeling technique called Multimodal Mean that achieves high accuracy and frame rate performance with limited memory and a slow-clock, energy-efficient, integer processing core. - A new workload partitioning technique to optimize the execution of early vision algorithms on multi-core systems. - A novel data transfer technique called cat-tail dma that provides globally-ordered, non-blocking data transfers on a multicore system. By using efficient data representations, Multimodal Mean provides comparable accuracy to the widely used Mixture of Gaussians (MoG) multimodal method. However, it achieves a 6.2x improvement in performance while using 18% less storage than MoG while executing on a representative embedded platform. When this algorithm is adapted to a multicore execution environment, the new workload partitioning technique demonstrates an improvement in execution times of 25% with only a 125 ms system reaction time. It also reduced the overall number of data transfers by 50%. Finally, the cat-tail buffering technique reduces the data-transfer latency between execution cores and main memory by 32.8% over the baseline technique when executing Multimodal Mean. This technique concurrently performs data transfers with code execution on individual cores, while maintaining global ordering through low-overhead scheduling to prevent collisions.Ph.D.Committee Chair: Wills, Scott; Committee Co-Chair: Wills, Linda; Committee Member: Bader, David; Committee Member: Davis, Jeff; Committee Member: Hamblen, James; Committee Member: Lanterman, Aaro
    • ā€¦
    corecore