450 research outputs found

    A Genetic-algorithm-based Approach to the Design of DCT Hardware Accelerators

    Get PDF
    As modern applications demand an unprecedented level of computational resources, traditional computing system design paradigms are no longer adequate to guarantee significant performance enhancement at an affordable cost. Approximate Computing (AxC) has been introduced as a potential candidate to achieve better computational performances by relaxing non-critical functional system specifications. In this article, we propose a systematic and high-abstraction-level approach allowing the automatic generation of near Pareto-optimal approximate configurations for a Discrete Cosine Transform (DCT) hardware accelerator. We obtain the approximate variants by using approximate operations, having configurable approximation degree, rather than full-precise ones. We use a genetic searching algorithm to find the appropriate tuning of the approximation degree, leading to optimal tradeoffs between accuracy and gains. Finally, to evaluate the actual HW gains, we synthesize non-dominated approximate DCT variants for two different target technologies, namely, Field Programmable Gate Arrays (FPGAs) and Application Specific Integrated Circuits (ASICs). Experimental results show that the proposed approach allows performing a meaningful exploration of the design space to find the best tradeoffs in a reasonable time. Indeed, compared to the state-of-the-art work on approximate DCT, the proposed approach allows an 18% average energy improvement while providing at the same time image quality improvement

    Approximate and timing-speculative hardware design for high-performance and energy-efficient video processing

    Get PDF
    Since the end of transistor scaling in 2-D appeared on the horizon, innovative circuit design paradigms have been on the rise to go beyond the well-established and ultraconservative exact computing. Many compute-intensive applications – such as video processing – exhibit an intrinsic error resilience and do not necessarily require perfect accuracy in their numerical operations. Approximate computing (AxC) is emerging as a design alternative to improve the performance and energy-efficiency requirements for many applications by trading its intrinsic error tolerance with algorithm and circuit efficiency. Exact computing also imposes a worst-case timing to the conventional design of hardware accelerators to ensure reliability, leading to an efficiency loss. Conversely, the timing-speculative (TS) hardware design paradigm allows increasing the frequency or decreasing the voltage beyond the limits determined by static timing analysis (STA), thereby narrowing pessimistic safety margins that conventional design methods implement to prevent hardware timing errors. Timing errors should be evaluated by an accurate gate-level simulation, but a significant gap remains: How these timing errors propagate from the underlying hardware all the way up to the entire algorithm behavior, where they just may degrade the performance and quality of service of the application at stake? This thesis tackles this issue by developing and demonstrating a cross-layer framework capable of performing investigations of both AxC (i.e., from approximate arithmetic operators, approximate synthesis, gate-level pruning) and TS hardware design (i.e., from voltage over-scaling, frequency over-clocking, temperature rising, and device aging). The cross-layer framework can simulate both timing errors and logic errors at the gate-level by crossing them dynamically, linking the hardware result with the algorithm-level, and vice versa during the evolution of the application’s runtime. Existing frameworks perform investigations of AxC and TS techniques at circuit-level (i.e., at the output of the accelerator) agnostic to the ultimate impact at the application level (i.e., where the impact is truly manifested), leading to less optimization. Unlike state of the art, the framework proposed offers a holistic approach to assessing the tradeoff of AxC and TS techniques at the application-level. This framework maximizes energy efficiency and performance by identifying the maximum approximation levels at the application level to fulfill the required good enough quality. This thesis evaluates the framework with an 8-way SAD (Sum of Absolute Differences) hardware accelerator operating into an HEVC encoder as a case study. Application-level results showed that the SAD based on the approximate adders achieve savings of up to 45% of energy/operation with an increase of only 1.9% in BD-BR. On the other hand, VOS (Voltage Over-Scaling) applied to the SAD generates savings of up to 16.5% in energy/operation with around 6% of increase in BD-BR. The framework also reveals that the boost of about 6.96% (at 50°) to 17.41% (at 75° with 10- Y aging) in the maximum clock frequency achieved with TS hardware design is totally lost by the processing overhead from 8.06% to 46.96% when choosing an unreliable algorithm to the blocking match algorithm (BMA). We also show that the overhead can be avoided by adopting a reliable BMA. This thesis also shows approximate DTT (Discrete Tchebichef Transform) hardware proposals by exploring a transform matrix approximation, truncation and pruning. The results show that the approximate DTT hardware proposal increases the maximum frequency up to 64%, minimizes the circuit area in up to 43.6%, and saves up to 65.4% in power dissipation. The DTT proposal mapped for FPGA shows an increase of up to 58.9% on the maximum frequency and savings of about 28.7% and 32.2% on slices and dynamic power, respectively compared with stat

    Bi-criteria Pipeline Mappings for Parallel Image Processing

    Get PDF
    Mapping workflow applications onto parallel platforms is a challenging problem, even for simple application patterns such as pipeline graphs. Several antagonistic criteria should be optimized, such as throughput and latency (or a combination). Typical applications include digital image processing, where images are processed in steady-state mode. In this paper, we study the mapping of a particular image processing application, the JPEG encoding. Mapping pipelined JPEG encoding onto parallel platforms is useful for instance for encoding Motion JPEG images. As the bi-criteria mapping problem is NP-complete, we concentrate on the evaluation and performance of polynomial heuristics

    Precision-Energy-Throughput Scaling Of Generic Matrix Multiplication and Convolution Kernels Via Linear Projections

    Get PDF
    Generic matrix multiplication (GEMM) and one-dimensional convolution/cross-correlation (CONV) kernels often constitute the bulk of the compute- and memory-intensive processing within image/audio recognition and matching systems. We propose a novel method to scale the energy and processing throughput of GEMM and CONV kernels for such error-tolerant multimedia applications by adjusting the precision of computation. Our technique employs linear projections to the input matrix or signal data during the top-level GEMM and CONV blocking and reordering. The GEMM and CONV kernel processing then uses the projected inputs and the results are accumulated to form the final outputs. Throughput and energy scaling takes place by changing the number of projections computed by each kernel, which in turn produces approximate results, i.e. changes the precision of the performed computation. Results derived from a voltage- and frequency-scaled ARM Cortex A15 processor running face recognition and music matching algorithms demonstrate that the proposed approach allows for 280%~440% increase of processing throughput and 75%~80% decrease of energy consumption against optimized GEMM and CONV kernels without any impact in the obtained recognition or matching accuracy. Even higher gains can be obtained if one is willing to tolerate some reduction in the accuracy of the recognition and matching applications

    1994 Science Information Management and Data Compression Workshop

    Get PDF
    This document is the proceedings from the 'Science Information Management and Data Compression Workshop,' which was held on September 26-27, 1994, at the NASA Goddard Space Flight Center, Greenbelt, Maryland. The Workshop explored promising computational approaches for handling the collection, ingestion, archival and retrieval of large quantities of data in future Earth and space science missions. It consisted of eleven presentations covering a range of information management and data compression approaches that are being or have been integrated into actual or prototypical Earth or space science data information systems, or that hold promise for such an application. The workshop was organized by James C. Tilton and Robert F. Cromp of the NASA Goddard Space Flight Center

    Error tolerant multimedia stream processing: There's plenty of room at the top (of the system stack)

    Get PDF
    There is a growing realization that the expected fault rates and energy dissipation stemming from increases in CMOS integration will lead to the abandonment of traditional system reliability in favor of approaches that offer reliability to hardware-induced errors across the application, runtime support, architecture, device and integrated-circuit (IC) layers. Commercial stakeholders of multimedia stream processing (MSP) applications, such as information retrieval, stream mining systems, and high-throughput image and video processing systems already feel the strain of inadequate system-level scaling and robustness under the always-increasing user demand. While such applications can tolerate certain imprecision in their results, today's MSP systems do not support a systematic way to exploit this aspect for cross-layer system resilience. However, research is currently emerging that attempts to utilize the error-tolerant nature of MSP applications for this purpose. This is achieved by modifications to all layers of the system stack, from algorithms and software to the architecture and device layer, and even the IC digital logic synthesis itself. Unlike conventional processing that aims for worst-case performance and accuracy guarantees, error-tolerant MSP attempts to provide guarantees for the expected performance and accuracy. In this paper we review recent advances in this field from an MSP and a system (layer-by-layer) perspective, and attempt to foresee some of the components of future cross-layer error-tolerant system design that may influence the multimedia and the general computing landscape within the next ten years. © 1999-2012 IEEE

    Lossy compression and real-time geovisualization for ultra-low bandwidth telemetry from untethered underwater vehicles

    Get PDF
    Submitted in partial fulfillment of the requirements for the degree of Master of Science at the Massachusetts Institute of Technology and the Woods Hole Oceanographic Institution September 2008Oceanographic applications of robotics are as varied as the undersea environment itself. As underwater robotics moves toward the study of dynamic processes with multiple vehicles, there is an increasing need to distill large volumes of data from underwater vehicles and deliver it quickly to human operators. While tethered robots are able to communicate data to surface observers instantly, communicating discoveries is more difficult for untethered vehicles. The ocean imposes severe limitations on wireless communications; light is quickly absorbed by seawater, and tradeoffs between frequency, bitrate and environmental effects result in data rates for acoustic modems that are routinely as low as tens of bits per second. These data rates usually limit telemetry to state and health information, to the exclusion of mission-specific science data. In this thesis, I present a system designed for communicating and presenting science telemetry from untethered underwater vehicles to surface observers. The system's goals are threefold: to aid human operators in understanding oceanographic processes, to enable human operators to play a role in adaptively responding to mission-specific data, and to accelerate mission planning from one vehicle dive to the next. The system uses standard lossy compression techniques to lower required data rates to those supported by commercially available acoustic modems (O(10)-O(100) bits per second). As part of the system, a method for compressing time-series science data based upon the Discrete Wavelet Transform (DWT) is explained, a number of low-bitrate image compression techniques are compared, and a novel user interface for reviewing transmitted telemetry is presented. Each component is motivated by science data from a variety of actual Autonomous Underwater Vehicle (AUV) missions performed in the last year.National Science Foundation Center for Subsurface Sensing and Imaging (CenSSIS ERC

    The Iray Light Transport Simulation and Rendering System

    Full text link
    While ray tracing has become increasingly common and path tracing is well understood by now, a major challenge lies in crafting an easy-to-use and efficient system implementing these technologies. Following a purely physically-based paradigm while still allowing for artistic workflows, the Iray light transport simulation and rendering system allows for rendering complex scenes by the push of a button and thus makes accurate light transport simulation widely available. In this document we discuss the challenges and implementation choices that follow from our primary design decisions, demonstrating that such a rendering system can be made a practical, scalable, and efficient real-world application that has been adopted by various companies across many fields and is in use by many industry professionals today
    • …
    corecore