94 research outputs found

    Classification-Based Optimization of Dynamic Dataflow Programs

    Get PDF
    International audienceThis chapter reviews dataflow programming as a whole and presents a classification-based methodology to bridge the gap between predictable and dynamic dataflow modeling in order to achieve expressiveness of the programming language as well as efficiency of the implementation. The authors conduct experiments across three MPEG video decoders including one based on the new High Efficiency Video Coding standard. Those dataflow-based video decoders are executed onto two different platforms: a desktop processor and an embedded platform composed of interconnected and tiny Very Long Instruction Word-style processors. The authors show that the fully automated transformations presented can result in a 80% gain in speed compared to runtime scheduling in the more favorable case

    VLSI architecture design approaches for real-time video processing

    Get PDF
    This paper discusses the programmable and dedicated approaches for real-time video processing applications. Various VLSI architecture including the design examples of both approaches are reviewed. Finally, discussions of several practical designs in real-time video processing applications are then considered in VLSI architectures to provide significant guidelines to VLSI designers for any further real-time video processing design works

    Algorithm/Architecture Co-Exploration of Visual Computing: Overview and Future Perspectives

    Get PDF
    Concurrently exploring both algorithmic and architectural optimizations is a new design paradigm. This survey paper addresses the latest research and future perspectives on the simultaneous development of video coding, processing, and computing algorithms with emerging platforms that have multiple cores and reconfigurable architecture. As the algorithms in forthcoming visual systems become increasingly complex, many applications must have different profiles with different levels of performance. Hence, with expectations that the visual experience in the future will become continuously better, it is critical that advanced platforms provide higher performance, better flexibility, and lower power consumption. To achieve these goals, algorithm and architecture co-design is significant for characterizing the algorithmic complexity used to optimize targeted architecture. This paper shows that seamless weaving of the development of previously autonomous visual computing algorithms and multicore or reconfigurable architectures will unavoidably become the leading trend in the future of video technology

    Exploring Processor and Memory Architectures for Multimedia

    Get PDF
    Multimedia has become one of the cornerstones of our 21st century society and, when combined with mobility, has enabled a tremendous evolution of our society. However, joining these two concepts introduces many technical challenges. These range from having sufficient performance for handling multimedia content to having the battery stamina for acceptable mobile usage. When taking a projection of where we are heading, we see these issues becoming ever more challenging by increased mobility as well as advancements in multimedia content, such as introduction of stereoscopic 3D and augmented reality. The increased performance needs for handling multimedia come not only from an ongoing step-up in resolution going from QVGA (320x240) to Full HD (1920x1080) a 27x increase in less than half a decade. On top of this, there is also codec evolution (MPEG-2 to H.264 AVC) that adds to the computational load increase. To meet these performance challenges there has been processing and memory architecture advances (SIMD, out-of-order superscalarity, multicore processing and heterogeneous multilevel memories) in the mobile domain, in conjunction with ever increasing operating frequencies (200MHz to 2GHz) and on-chip memory sizes (128KB to 2-3MB). At the same time there is an increase in requirements for mobility, placing higher demands on battery-powered systems despite the steady increase in battery capacity (500 to 2000mAh). This leaves negative net result in-terms of battery capacity versus performance advances. In order to make optimal use of these architectural advances and to meet the power limitations in mobile systems, there is a need for taking an overall approach on how to best utilize these systems. The right trade-off between performance and power is crucial. On top of these constraints, the flexibility aspects of the system need to be addressed. All this makes it very important to reach the right architectural balance in the system. The first goal for this thesis is to examine multimedia applications and propose a flexible solution that can meet the architectural requirements in a mobile system. Secondly, propose an automated methodology of optimally mapping multimedia data and instructions to a heterogeneous multilevel memory subsystem. The proposed methodology uses constraint programming for solving a multidimensional optimization problem. Results from this work indicate that using today’s most advanced mobile processor technology together with a multi-level heterogeneous on-chip memory subsystem can meet the performance requirements for handling multimedia. By utilizing the automated optimal memory mapping method presented in this thesis lower total power consumption can be achieved, whilst performance for multimedia applications is improved, by employing enhanced memory management. This is achieved through reduced external accesses and better reuse of memory objects. This automatic method shows high accuracy, up to 90%, for predicting multimedia memory accesses for a given architecture

    Baseband analog front-end and digital back-end for reconfigurable multi-standard terminals

    Get PDF
    Multimedia applications are driving wireless network operators to add high-speed data services such as Edge (E-GPRS), WCDMA (UMTS) and WLAN (IEEE 802.11a,b,g) to the existing GSM network. This creates the need for multi-mode cellular handsets that support a wide range of communication standards, each with a different RF frequency, signal bandwidth, modulation scheme etc. This in turn generates several design challenges for the analog and digital building blocks of the physical layer. In addition to the above-mentioned protocols, mobile devices often include Bluetooth, GPS, FM-radio and TV services that can work concurrently with data and voice communication. Multi-mode, multi-band, and multi-standard mobile terminals must satisfy all these different requirements. Sharing and/or switching transceiver building blocks in these handsets is mandatory in order to extend battery life and/or reduce cost. Only adaptive circuits that are able to reconfigure themselves within the handover time can meet the design requirements of a single receiver or transmitter covering all the different standards while ensuring seamless inter-interoperability. This paper presents analog and digital base-band circuits that are able to support GSM (with Edge), WCDMA (UMTS), WLAN and Bluetooth using reconfigurable building blocks. The blocks can trade off power consumption for performance on the fly, depending on the standard to be supported and the required QoS (Quality of Service) leve

    Performance evaluation of MPEG-4 Video encoder on Adres

    Get PDF
    Curs 2006-2007Actualment un típic embedded system (ex. telèfon mòbil) requereix alta qualitat per portar a terme tasques com codificar/descodificar a temps real; han de consumir poc energia per funcionar hores o dies utilitzant bateries lleugeres; han de ser el suficientment flexibles per integrar múltiples aplicacions i estàndards en un sol aparell; han de ser dissenyats i verificats en un període de temps curt tot i l’augment de la complexitat. Els dissenyadors lluiten contra aquestes adversitats, que demanen noves innovacions en arquitectures i metodologies de disseny. Coarse-grained reconfigurable architectures (CGRAs) estan emergent com a candidats potencials per superar totes aquestes dificultats. Diferents tipus d’arquitectures han estat presentades en els últims anys. L’alta granularitat redueix molt el retard, l’àrea, el consum i el temps de configuració comparant amb les FPGAs. D’altra banda, en comparació amb els tradicionals processadors coarse-grained programables, els alts recursos computacionals els permet d’assolir un alt nivell de paral•lelisme i eficiència. No obstant, els CGRAs existents no estant sent aplicats principalment per les grans dificultats en la programació per arquitectures complexes. ADRES és una nova CGRA dissenyada per I’Interuniversity Micro-Electronics Center (IMEC). Combina un processador very-long instruction word (VLIW) i un coarse-grained array per tenir dues opcions diferents en un mateix dispositiu físic. Entre els seus avantatges destaquen l’alta qualitat, poca redundància en les comunicacions i la facilitat de programació. Finalment ADRES és un patró enlloc d’una arquitectura concreta. Amb l’ajuda del compilador DRESC (Dynamically Reconfigurable Embedded System Compile), és possible trobar millors arquitectures o arquitectures específiques segons l’aplicació. Aquest treball presenta la implementació d’un codificador MPEG-4 per l’ADRES. Mostra l’evolució del codi per obtenir una bona implementació per una arquitectura donada. També es presenten les característiques principals d’ADRES i el seu compilador (DRESC). Els objectius són de reduir al màxim el nombre de cicles (temps) per implementar el codificador de MPEG-4 i veure les diferents dificultats de treballar en l’entorn ADRES. Els resultats mostren que els cícles es redueixen en un 67% comparant el codi inicial i final en el mode VLIW i un 84% comparant el codi inicial en VLIW i el final en mode CGA.Nowadays, a typical embedded system requires high performance to perform tasks such as video encoding/decoding at run-time. It should consume little energy to work hours or even days using a lightweight battery. It should be flexible enough to integrate multiple applications and standards in one single device. It has to be designed and verified in short time-to-market despite substantially increased complexity. The designers are struggling to meet these huge challenges, which call for innovations of both architectures and design methodology. Coarse-grained reconfigurable architectures (CGRAs) are emerging as potential candidates to meet the above challenges. Many of them were proposed in recent years. This coarse granularity greatly reduces delay, area, power and configuration time compared with FPGAs. On the other hand, compared with traditional "coarse-grained" programmable processors, their massive computational resources enable them to achieve high parallelism and efficiency. However, existing CGRAs have yet been widely adopted mainly because of programming difficulty for such complex architecture. ADRES is a novel CGRA designed by Interuniversity Micro-Electronics Center (IMEC). It tightly couples a very-long instruction word (VLIW) processor and a coarse-grained array by providing two functional views on the same physical resources. It brings advantages such as high performance, low communication overhead and easiness of programming. Finally, ADRES is a template instead of a concrete architecture. With the retargetable compilation support from DRESC (Dynamically Reconfigurable Embedded System Compile), architectural exploration becomes possible to discover better architectures or design domain-specific architectures. In this thesis, a performance of an MPEG-4 encoder in ADRES is presented. The thesis shows the code evolution to obtain a good implementation for a given architecture. The main features of ADRES and its compiler (DRESC) are presented. The objectives are to reduce as much as possible the amount of cycles (time) spent to encode video in MPEG-4 and test different issues working with ADRES environment. The cycles decrease a 67% comparing initial and final code in VLIW and 84% between initial VLIW and CGA mode.Director/a: Moisès Serra i SerraSupervisor: Eric Delfoss

    Processor Enhancements for Media Streaming Applications

    Get PDF
    The development of more processing demanding applications on the Internet (video broadcasting) on one hand and the popularity of recent devices at the user level (digital cameras, wireless videophones, ...) on the other hand introduce challenges at several levels. Today, such devices present processing capabilities and bandwidth settings that are inefficient to manage scalable QoS requirements in a typical media delivery framework. In this paper, we present an impact study of such a scalable data representation optimized for QoS (Matching Pursuit 3D algorithms) on processor architectures to achieve the best performance and power efficiency. A review of state of the art techniques for processor architecture enhancement let us expect promising opportunities from the latest developments in the reconfigurable computing research field. We present here the first design steps of an efficient reconfigurable coprocessor especially designed to cope with future video delivery and multimedia processing requirements. Architecture perspectives are proposed with respect to low development cost constraints, backward compatibilty and easy coprocessor usage using an original strategy based on a hardware/software codesign methodolog

    Implementation And Optimizaton Of Real-time H.264 Baseline Encoder On Tms320dm642 Dsp

    Get PDF
    Tez (Yüksek Lisans) -- İstanbul Teknik Üniversitesi, Fen Bilimleri Enstitüsü, 2007Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2007Günümüzde sayısal video kodlama sayısal gözetim sistemleri, video konferans, mobil uygulamalar ve video yayını gibi bir çok uygulamada zorunlu hale gelmiştir. Uluslararası bir video sıkıştırma standardı olan H.264/MPEG-4 bölüm 10, daha önceki standartlara göre kodlama verimini iyileştirmek amacıyla geliştirilmiştir. Fakat, bu kodlama geliştirmesi beraberinde kodlama karmaşıklığının da artmasına yol açmaktadır. Bu tez çalışmasında Texas Instruments TMS320DM642 sayısal sinyal işleyici üzerinde H.264 temel profil kodlayıcı gerçeklenmiştir. DM642 DSP çekirdeği üzerindeki gerçek zamanlı H.264/AVC kodlayıcı uygulaması hata esnekliği araçları ve çeyrek piksel hareket dengeleme dışında standart tüm H.264/AVC temel profil kodlama araçlarını sunmaktadır. Çeyrek piksel hareket dengelem yerine, tüm parlaklılık ve renklik bileşenleri için tam sayı ve yarım piksel pozisyonlarında hareket kestirim ve dengeleme gerçeklenmiştir. Kullanılan DM642 DSP çekirdeği platformu, 2-seviyeli bellek/önbellek aşama düzenine sahip ve VLIW içeren yüksek performanslı sayısal işlemci olarak tasarlanmıştır. Sunulan H.264 temel kodlayıcı sistemin gerçeklenmesi ve eniyilemesi bu tezin konusudur. Üstelik, algoritma bazlı, mimari ve bellek stratejilerini içeren eniyileme çalışma fazları detaylarıyla açıklanmaktadır. H.264/AVC video kodlayıcının hem geliştirme ortamında hem de DM642 EVM donanım ortamında çalışması doğrulanmıştır. Kısaca, kodlayıcı sisteme giriş olan CIF çözünürlükte sıkıştırılmamış YUV video dizisi H.264 Annex-B dosya biçiminde ve de ekrana video çıktı verilerek sıkıştırılmaktadır. Ek olarak, kodlayıcı çıktısı H.264 referans yazılımla doğruluğu kontrol edilmiş ve uyumluluğu kanıtlanmıştır.Recently, digital video coding is mandatory in many applications such as digital surveillance systems, video conferencing, mobile applications as well as video broadcasts. The H.264/MPEG-4 Part 10, an international video compression standard, is developed for improving the coding efficiency compared to previous standards. However, the coding improvement comes with an increase in coding complexity. In this thesis, an H.264 baseline profile encoder is implemented on Texas Instruments TMS320DM642 digital signal processor. The real-time implementation of the H.264/AVC encoder on DM642 DSP core offers most of the standard H.264/AVC baseline profile coding tools except error resiliency tools and quarter-pel motion estimation. Instead of quarter-pel motion compensation, integer and half pixel position motion estimation and compensation for all luminance and chrominance components are implemented. The target platform, DM64 DSP core, is designed as a high-performance digital media processor with two-level memory/cache hierarchy and VLIW architecture. The subject of the thesis is H.264 baseline encoder system realization and optimization on the target platform. Moreover, the study of optimization phases covering algorithmic, architectural and memory strategies are clarified in details. The H.264/AVC encoder system is verified both to execute on the development workstation and DM642 EVM (Evaluation Module) hardware platform. Briefly, the uncompressed input of a YUV video sequence with CIF resolution to the encoder system is compressed to H.264 Annex-B file format and displayed on screen. Additionally, the encoder output is verified with H.264 reference software and the compliancy is proven.Yüksek LisansM.Sc

    Enhancing a Neurosurgical Imaging System with a PC-based Video Processing Solution

    Get PDF
    This work presents a PC-based prototype video processing application developed to be used with a specific neurosurgical imaging device, the OPMI® PenteroTM operating microscope, in the Department of Neurosurgery of Helsinki University Central Hospital at Töölö, Helsinki. The motivation for implementing the software was the lack of some clinically important features in the imaging system provided by the microscope. The imaging system is used as an online diagnostic aid during surgery. The microscope has two internal video cameras; one for regular white light imaging and one for near-infrared fluorescence imaging, used for indocyanine green videoangiography. The footage of the microscope’s current imaging mode is accessed via the composite auxiliary output of the device. The microscope also has an external high resolution white light video camera, accessed via a composite output of a separate video hub. The PC was chosen as the video processing platform for its unparalleled combination of prototyping and high-throughput video processing capabilities. A thorough analysis of the platform and efficient video processing methods was conducted in the thesis and the results were used in the design of the imaging station. The features found feasible during the project were incorporated into a video processing application running on a GNU/Linux distribution Ubuntu. The clinical usefulness of the implemented features was ensured beforehand by consulting the neurosurgeons using the original system. The most significant shortcomings of the original imaging system were mended in this work. The key features of the developed application include: live streaming, simultaneous streaming and recording, and playing back of upto two video streams. The playback mode provides full media player controls, with a frame-by-frame precision rewinding, in an intuitive and responsive interface. A single view and a side-by-side comparison mode are provided for the streams. The former gives more detail, while the latter can be used, for example, for before-after and anatomic-angiographic comparisons.fi=Opinnäytetyö kokotekstinä PDF-muodossa.|en=Thesis fulltext in PDF format.|sv=Lärdomsprov tillgängligt som fulltext i PDF-format
    corecore