4 research outputs found

    An Adaptive Coding Pass Scanning Algorithm for Optimal Rate Control in Biomedical Images

    Get PDF
    High-efficiency, high-quality biomedical image compression is desirable especially for the telemedicine applications. This paper presents an adaptive coding pass scanning (ACPS) algorithm for optimal rate control. It can identify the significant portions of an image and discard insignificant ones as early as possible. As a result, waste of computational power and memory space can be avoided. We replace the benchmark algorithm known as postcompression rate distortion (PCRD) by ACPS. Experimental results show that ACPS is preferable to PCRD in terms of the rate distortion curve and computation time

    GPU-oriented architecture for an end-to-end image/video codec based on JPEG2000

    Get PDF
    Modern image and video compression standards employ computationally intensive algorithms that provide advanced features to the coding system. Current standards often need to be implemented in hardware or using expensive solutions to meet the real-time requirements of some environments. Contrarily to this trend, this paper proposes an end-to-end codec architecture running on inexpensive Graphics Processing Units (GPUs) that is based on, though not compatible with, the JPEG2000 international standard for image and video compression. When executed in a commodity Nvidia GPU, it achieves real time processing of 12K video. The proposed S/W architecture utilizes four CUDA kernels that minimize memory transfers, use registers instead of shared memory, and employ a double-buffer strategy to optimize the streaming of data. The analysis of throughput indicates that the proposed codec yields results at least 10× superior on average to those achieved with JPEG2000 implementations devised for CPUs, and approximately 4× superior to those achieved with hardwired solutions of the HEVC/H.265 video compression standard

    Affectation de composantes basée sur des contraintes énergétiques dans une architecture multiprocesseurs en trois dimensions

    Full text link
    La lithographie et la loi de Moore ont permis des avancées extraordinaires dans la fabrication des circuits intégrés. De nos jours, plusieurs systèmes très complexes peuvent être embarqués sur la même puce électronique. Les contraintes de développement de ces systèmes sont tellement grandes qu’une bonne planification dès le début de leur cycle de développement est incontournable. Ainsi, la planification de la gestion énergétique au début du cycle de développement est devenue une phase importante dans la conception de ces systèmes. Pendant plusieurs années, l’idée était de réduire la consommation énergétique en ajoutant un mécanisme physique une fois le circuit créé, comme par exemple un dissipateur de chaleur. La stratégie actuelle est d’intégrer les contraintes énergétiques dès les premières phases de la conception des circuits. Il est donc essentiel de bien connaître la dissipation d’énergie avant l’intégration des composantes dans une architecture d’un système multiprocesseurs de façon à ce que chaque composante puisse fonctionner efficacement dans les limites de ses contraintes thermiques. Lorsqu’une composante fonctionne, elle consomme de l’énergie électrique qui est transformée en dégagement de chaleur. Le but de ce mémoire est de trouver une affectation efficace des composantes dans une architecture de multiprocesseurs en trois dimensions en tenant compte des limites des facteurs thermiques de ce système.Lithography and Moore’s law have led to extraordinary advances in integrated circuits manufacturing. Nowadays, many complex systems can be embedded on the same chip. Development constraints of these systems are so significant that a good planning from the beginning of the development stage is essential. Thus, the planning of energy management at the beginning of the development cycle has become important in the design of these systems. For several years, the idea was to reduce energy consumption by adding a cooling system once the circuit is created, a heat sink for example. The current strategy is to integrate energy constraints in the early stages of circuits design. It is therefore important to know the energy dissipation before the integration of the components in the architecture of a multiprocessor system so that each component can work within the limits of its thermal stresses. When a component is running, it consumes electric energy which is converted into heat. The aim of this thesis is to find an efficient assignment of components in a multiprocessor system architecture in three dimensions, taking into account the limits of its thermal factors

    High Performance Multiview Video Coding

    Get PDF
    Following the standardization of the latest video coding standard High Efficiency Video Coding in 2013, in 2014, multiview extension of HEVC (MV-HEVC) was published and brought significantly better compression performance of around 50% for multiview and 3D videos compared to multiple independent single-view HEVC coding. However, the extremely high computational complexity of MV-HEVC demands significant optimization of the encoder. To tackle this problem, this work investigates the possibilities of using modern parallel computing platforms and tools such as single-instruction-multiple-data (SIMD) instructions, multi-core CPU, massively parallel GPU, and computer cluster to significantly enhance the MVC encoder performance. The aforementioned computing tools have very different computing characteristics and misuse of the tools may result in poor performance improvement and sometimes even reduction. To achieve the best possible encoding performance from modern computing tools, different levels of parallelism inside a typical MVC encoder are identified and analyzed. Novel optimization techniques at various levels of abstraction are proposed, non-aggregation massively parallel motion estimation (ME) and disparity estimation (DE) in prediction unit (PU), fractional and bi-directional ME/DE acceleration through SIMD, quantization parameter (QP)-based early termination for coding tree unit (CTU), optimized resource-scheduled wave-front parallel processing for CTU, and workload balanced, cluster-based multiple-view parallel are proposed. The result shows proposed parallel optimization techniques, with insignificant loss to coding efficiency, significantly improves the execution time performance. This , in turn, proves modern parallel computing platforms, with appropriate platform-specific algorithm design, are valuable tools for improving the performance of computationally intensive applications
    corecore