31 research outputs found

    Recursive algorithm, architectures and FPGA implementation of the two-dimensional discrete cosine transform

    Get PDF
    In this paper, a new recursive algorithm and two types of circuit architectures are presented for the computation of the two dimensional discrete cosine transform (2-D DCT). The new algorithm permits to compute the 2-D DCT by a simple procedure of the 1-D recursive calculations involving only cosine coefficients. The recursive kernel for the proposed algorithm contains a small number of operations. Also, it requires a smaller number of pre-computed data compared to many of existing algorithms in the same category. The kernel can be easily implemented in a simple circuit block with a short critical delay path. In order to evaluate the performance improvement resulting from the new algorithm, an architecture for the 2-D DCT designed by direct mapping from the computation structure of the proposed algorithm has been implemented in a FPGA board. The results show that the reduction of the hardware consumption can easily reach 25% and the clock frequency can increase 17% compared to a system implementing a recently reported 2-D DCT recursive algorithm. For a further reduction of the hardware, another architecture has been proposed for the same 2-D DCT computation. Using one recursive computation block to perform different functions, this architecture needs only approximately one half of the hardware that is required in the first architecture, which has been confirmed by a FPGA implementation

    Remote Sensing Data Compression

    Get PDF
    A huge amount of data is acquired nowadays by different remote sensing systems installed on satellites, aircrafts, and UAV. The acquired data then have to be transferred to image processing centres, stored and/or delivered to customers. In restricted scenarios, data compression is strongly desired or necessary. A wide diversity of coding methods can be used, depending on the requirements and their priority. In addition, the types and properties of images differ a lot, thus, practical implementation aspects have to be taken into account. The Special Issue paper collection taken as basis of this book touches on all of the aforementioned items to some degree, giving the reader an opportunity to learn about recent developments and research directions in the field of image compression. In particular, lossless and near-lossless compression of multi- and hyperspectral images still remains current, since such images constitute data arrays that are of extremely large size with rich information that can be retrieved from them for various applications. Another important aspect is the impact of lossless compression on image classification and segmentation, where a reasonable compromise between the characteristics of compression and the final tasks of data processing has to be achieved. The problems of data transition from UAV-based acquisition platforms, as well as the use of FPGA and neural networks, have become very important. Finally, attempts to apply compressive sensing approaches in remote sensing image processing with positive outcomes are observed. We hope that readers will find our book useful and interestin

    Técnicas de compresión de imágenes hiperespectrales sobre hardware reconfigurable

    Get PDF
    Tesis de la Universidad Complutense de Madrid, Facultad de Informática, leída el 18-12-2020Sensors are nowadays in all aspects of human life. When possible, sensors are used remotely. This is less intrusive, avoids interferces in the measuring process, and more convenient for the scientist. One of the most recurrent concerns in the last decades has been sustainability of the planet, and how the changes it is facing can be monitored. Remote sensing of the earth has seen an explosion in activity, with satellites now being launched on a weekly basis to perform remote analysis of the earth, and planes surveying vast areas for closer analysis...Los sensores aparecen hoy en día en todos los aspectos de nuestra vida. Cuando es posible, de manera remota. Esto es menos intrusivo, evita interferencias en el proceso de medida, y además facilita el trabajo científico. Una de las preocupaciones recurrentes en las últimas décadas ha sido la sotenibilidad del planeta, y cómo menitoirzar los cambios a los que se enfrenta. Los estudios remotos de la tierra han visto un gran crecimiento, con satélites lanzados semanalmente para analizar la superficie, y aviones sobrevolando grades áreas para análisis más precisos...Fac. de InformáticaTRUEunpu

    Approximate Computing for Energy Efficiency

    Get PDF

    Energy efficient hardware acceleration of multimedia processing tools

    Get PDF
    The world of mobile devices is experiencing an ongoing trend of feature enhancement and generalpurpose multimedia platform convergence. This trend poses many grand challenges, the most pressing being their limited battery life as a consequence of delivering computationally demanding features. The envisaged mobile application features can be considered to be accelerated by a set of underpinning hardware blocks Based on the survey that this thesis presents on modem video compression standards and their associated enabling technologies, it is concluded that tight energy and throughput constraints can still be effectively tackled at algorithmic level in order to design re-usable optimised hardware acceleration cores. To prove these conclusions, the work m this thesis is focused on two of the basic enabling technologies that support mobile video applications, namely the Shape Adaptive Discrete Cosine Transform (SA-DCT) and its inverse, the SA-IDCT. The hardware architectures presented in this work have been designed with energy efficiency in mind. This goal is achieved by employing high level techniques such as redundant computation elimination, parallelism and low switching computation structures. Both architectures compare favourably against the relevant pnor art in the literature. The SA-DCT/IDCT technologies are instances of a more general computation - namely, both are Constant Matrix Multiplication (CMM) operations. Thus, this thesis also proposes an algorithm for the efficient hardware design of any general CMM-based enabling technology. The proposed algorithm leverages the effective solution search capability of genetic programming. A bonus feature of the proposed modelling approach is that it is further amenable to hardware acceleration. Another bonus feature is an early exit mechanism that achieves large search space reductions .Results show an improvement on state of the art algorithms with future potential for even greater savings

    VLSI Design

    Get PDF
    This book provides some recent advances in design nanometer VLSI chips. The selected topics try to present some open problems and challenges with important topics ranging from design tools, new post-silicon devices, GPU-based parallel computing, emerging 3D integration, and antenna design. The book consists of two parts, with chapters such as: VLSI design for multi-sensor smart systems on a chip, Three-dimensional integrated circuits design for thousand-core processors, Parallel symbolic analysis of large analog circuits on GPU platforms, Algorithms for CAD tools VLSI design, A multilevel memetic algorithm for large SAT-encoded problems, etc

    Design and Implementation of IDCT/IDST-Specific Accelerators for HEVC Standard on Heterogeneous Accelerator-Rich Platform

    Get PDF
    Having High Efficiency Video Coding (HEVC) is important for image processing, reducing bandwidth, and increasing video quality. There are different methods that can be used to implement HEVC. This thesis focuses on design and implementation of application-specific accelerators for IDCT/IDST algorithms dedicated for HEVC standard. Those algorithms are parallel-in-nature tasks which makes them suitable to be executed by heterogeneous multicore platforms. This is done using accelerators which are required for power efficient processing. In this study, Coarse-Grained Reconfigurable Arrays (CGRAs) are used for making a template for an accelerator. CGRA has one of the major roles in a Heterogeneous Accelerator-Rich Platforms (HARP) as it is capable of accelerating non-parallel loops with lower loop counts. This thesis includes various algorithms for the use of IDCT and IDST with different designs and templates, reaching a unique final architecture. The final output intended is to reach 4 points IDST together with a 4/8 points IDCT. Another feature added to the hypothesis is the use of different dimensions for the CGRA template in order to have a different type of accelerator. The many CGRAs are combined together in successive arrangement with Reduced Instructions Set Computers (RISC) over the Network-on-Chip (NoC). The aim is to study the performance of the accelerator used for the IDCT and the IDST. This can be evaluated as the data movement through NoC network along with comparison of performance of accelerator with clock cycles in order to calculate the efficiency of the system. The results show that a four point IDST and IDCT can be computed in 56 clock cycles. In addition, the 8 point IDCT can be implemented in 64 cycles. One important factor to consider during the study is the power and energy consumption which is important in this century. The dynamic power dissipation usage for the routing of data has reached a value of 4.03 mW. Whereas, the energy consumption was 1.76 μ\muJ for the 4 points system (IDCT and IDST) and 3.06 μ\muJ for the 8 points (IDCT). Processing Elements (PEs) are used for implementing the transform algorithm and units were operated at 200 MHz. Finally, these results show that 1080P image at 30 frames per second can be attained by using FPGA

    Efficient design-space exploration of custom instruction-set extensions

    Get PDF
    Customization of processors with instruction set extensions (ISEs) is a technique that improves performance through parallelization with a reasonable area overhead, in exchange for additional design effort. This thesis presents a collection of novel techniques that reduce the design effort and cost of generating ISEs by advancing automation and reconfigurability. In addition, these techniques maximize the perfomance gained as a function of the additional commited resources. Including ISEs into a processor design implies development at many levels. Most prior works on ISEs solve separate stages of the design: identification, selection, and implementation. However, the interations between these stages also hold important design trade-offs. In particular, this thesis addresses the lack of interaction between the hardware implementation stage and the two previous stages. Interaction with the implementation stage has been mostly limited to accurately measuring the area and timing requirements of the implementation of each ISE candidate as a separate hardware module. However, the need to independently generate a hardware datapath for each ISE limits the flexibility of the design and the performance gains. Hence, resource sharing is essential in order to create a customized unit with multi-function capabilities. Previously proposed resource-sharing techniques aggressively share resources amongst the ISEs, thus minimizing the area of the solution at any cost. However, it is shown that aggressively sharing resources leads to large ISE datapath latency. Thus, this thesis presents an original heuristic that can be parameterized in order to control the degree of resource sharing amongst a given set of ISEs, thereby permitting the exploration of the existing implementation trade-offs between instruction latency and area savings. In addition, this thesis introduces an innovative predictive model that is able to quickly expose the optimal trade-offs of this design space. Compared to an exhaustive exploration of the design space, the predictive model is shown to reduce by two orders of magnitude the number of executions of the resource-sharing algorithm that are required in order to find the optimal trade-offs. This thesis presents a technique that is the first one to combine the design spaces of ISE selection and resource sharing in ISE datapath synthesis, in order to offer the designer solutions that achieve maximum speedup and maximum resource utilization using the available area. Optimal trade-offs in the design space are found by guiding the selection process to favour ISE combinations that are likely to share resources with low speedup losses. Experimental results show that this combined approach unveils new trade-offs between speedup and area that are not identified by previous selection techniques; speedups of up to 238% over previous selection thecniques were obtained. Finally, multi-cycle ISEs can be pipelined in order to increase their throughput. However, it is shown that traditional ISE identification techniques do not allow this optimization due to control flow overhead. In order to obtain the benefits of overlapping loop executions, this thesis proposes to carefully insert loop control flow statements into the ISEs, thus allowing the ISE to control the iterations of the loop. The proposed ISEs broaden the scope of instruction-level parallelism and obtain higher speedups compared to traditional ISEs, primarily through pipelining, the exploitation of spatial parallelism, and reducing the overhead of control flow statements and branches. A detailed case study of a real application shows that the proposed method achieves 91% higher speedups than the state-of-the-art, with an area overhead of less than 8% in hardware implementation

    Towards Computational Efficiency of Next Generation Multimedia Systems

    Get PDF
    To address throughput demands of complex applications (like Multimedia), a next-generation system designer needs to co-design and co-optimize the hardware and software layers. Hardware/software knobs must be tuned in synergy to increase the throughput efficiency. This thesis provides such algorithmic and architectural solutions, while considering the new technology challenges (power-cap and memory aging). The goal is to maximize the throughput efficiency, under timing- and hardware-constraints

    Parallelism and the software-hardware interface in embedded systems

    Get PDF
    This thesis by publications addresses issues in the architecture and microarchitecture of next generation, high performance streaming Systems-on-Chip through quantifying the most important forms of parallelism in current and emerging embedded system workloads. The work consists of three major research tracks, relating to data level parallelism, thread level parallelism and the software-hardware interface which together reflect the research interests of the author as they have been formed in the last nine years. Published works confirm that parallelism at the data level is widely accepted as the most important performance leverage for the efficient execution of embedded media and telecom applications and has been exploited via a number of approaches the most efficient being vectorlSIMD architectures. A further, complementary and substantial form of parallelism exists at the thread level but this has not been researched to the same extent in the context of embedded workloads. For the efficient execution of such applications, exploitation of both forms of parallelism is of paramount importance. This calls for a new architectural approach in the software-hardware interface as its rigidity, manifested in all desktop-based and the majority of embedded CPU's, directly affects the performance ofvectorized, threaded codes. The author advocates a holistic, mature approach where parallelism is extracted via automatic means while at the same time, the traditionally rigid hardware-software interface is optimized to match the temporal and spatial behaviour of the embedded workload. This ultimate goal calls for the precise study of these forms of parallelism for a number of applications executing on theoretical models such as instruction set simulators and parallel RAM machines as well as the development of highly parametric microarchitectural frameworks to encapSUlate that functionality.EThOS - Electronic Theses Online ServiceGBUnited Kingdo
    corecore