729 research outputs found

    Throughput Scaling Of Convolution For Error-Tolerant Multimedia Applications

    Full text link
    Convolution and cross-correlation are the basis of filtering and pattern or template matching in multimedia signal processing. We propose two throughput scaling options for any one-dimensional convolution kernel in programmable processors by adjusting the imprecision (distortion) of computation. Our approach is based on scalar quantization, followed by two forms of tight packing in floating-point (one of which is proposed in this paper) that allow for concurrent calculation of multiple results. We illustrate how our approach can operate as an optional pre- and post-processing layer for off-the-shelf optimized convolution routines. This is useful for multimedia applications that are tolerant to processing imprecision and for cases where the input signals are inherently noisy (error tolerant multimedia applications). Indicative experimental results with a digital music matching system and an MPEG-7 audio descriptor system demonstrate that the proposed approach offers up to 175% increase in processing throughput against optimized (full-precision) convolution with virtually no effect in the accuracy of the results. Based on marginal statistics of the input data, it is also shown how the throughput and distortion can be adjusted per input block of samples under constraints on the signal-to-noise ratio against the full-precision convolution.Comment: IEEE Trans. on Multimedia, 201

    Failure Mitigation in Linear, Sesquilinear and Bijective Operations On Integer Data Streams Via Numerical Entanglement

    Full text link
    A new roll-forward technique is proposed that recovers from any single fail-stop failure in MM integer data streams (M3M\geq3) when undergoing linear, sesquilinear or bijective (LSB) operations, such as: scaling, additions/subtractions, inner or outer vector products and permutations. In the proposed approach, the MM input integer data streams are linearly superimposed to form MM numerically entangled integer data streams that are stored in-place of the original inputs. A series of LSB operations can then be performed directly using these entangled data streams. The output results can be extracted from any M1M-1 entangled output streams by additions and arithmetic shifts, thereby guaranteeing robustness to a fail-stop failure in any single stream computation. Importantly, unlike other methods, the number of operations required for the entanglement, extraction and recovery of the results is linearly related to the number of the inputs and does not depend on the complexity of the performed LSB operations. We have validated our proposal in an Intel processor (Haswell architecture with AVX2 support) via convolution operations. Our analysis and experiments reveal that the proposed approach incurs only 1.8%1.8\% to 2.8%2.8\% reduction in processing throughput in comparison to the failure-intolerant approach. This overhead is 9 to 14 times smaller than that of the equivalent checksum-based method. Thus, our proposal can be used in distributed systems and unreliable processor hardware, or safety-critical applications, where robustness against fail-stop failures becomes a necessity.Comment: Proc. 21st IEEE International On-Line Testing Symposium (IOLTS 2015), July 2015, Halkidiki, Greec

    Reliable Linear, Sesquilinear and Bijective Operations On Integer Data Streams Via Numerical Entanglement

    Get PDF
    A new technique is proposed for fault-tolerant linear, sesquilinear and bijective (LSB) operations on MM integer data streams (M3M\geq3), such as: scaling, additions/subtractions, inner or outer vector products, permutations and convolutions. In the proposed method, the MM input integer data streams are linearly superimposed to form MM numerically-entangled integer data streams that are stored in-place of the original inputs. A series of LSB operations can then be performed directly using these entangled data streams. The results are extracted from the MM entangled output streams by additions and arithmetic shifts. Any soft errors affecting any single disentangled output stream are guaranteed to be detectable via a specific post-computation reliability check. In addition, when utilizing a separate processor core for each of the MM streams, the proposed approach can recover all outputs after any single fail-stop failure. Importantly, unlike algorithm-based fault tolerance (ABFT) methods, the number of operations required for the entanglement, extraction and validation of the results is linearly related to the number of the inputs and does not depend on the complexity of the performed LSB operations. We have validated our proposal in an Intel processor (Haswell architecture with AVX2 support) via fast Fourier transforms, circular convolutions, and matrix multiplication operations. Our analysis and experiments reveal that the proposed approach incurs between 0.03%0.03\% to 7%7\% reduction in processing throughput for a wide variety of LSB operations. This overhead is 5 to 1000 times smaller than that of the equivalent ABFT method that uses a checksum stream. Thus, our proposal can be used in fault-generating processor hardware or safety-critical applications, where high reliability is required without the cost of ABFT or modular redundancy.Comment: to appear in IEEE Trans. on Signal Processing, 201

    Precision-Energy-Throughput Scaling Of Generic Matrix Multiplication and Convolution Kernels Via Linear Projections

    Get PDF
    Generic matrix multiplication (GEMM) and one-dimensional convolution/cross-correlation (CONV) kernels often constitute the bulk of the compute- and memory-intensive processing within image/audio recognition and matching systems. We propose a novel method to scale the energy and processing throughput of GEMM and CONV kernels for such error-tolerant multimedia applications by adjusting the precision of computation. Our technique employs linear projections to the input matrix or signal data during the top-level GEMM and CONV blocking and reordering. The GEMM and CONV kernel processing then uses the projected inputs and the results are accumulated to form the final outputs. Throughput and energy scaling takes place by changing the number of projections computed by each kernel, which in turn produces approximate results, i.e. changes the precision of the performed computation. Results derived from a voltage- and frequency-scaled ARM Cortex A15 processor running face recognition and music matching algorithms demonstrate that the proposed approach allows for 280%~440% increase of processing throughput and 75%~80% decrease of energy consumption against optimized GEMM and CONV kernels without any impact in the obtained recognition or matching accuracy. Even higher gains can be obtained if one is willing to tolerate some reduction in the accuracy of the recognition and matching applications

    Core Failure Mitigation in Integer Sum-of-Product Computations on Cloud Computing Systems

    Get PDF
    The decreasing mean-time-to-failure estimates in cloud computing systems indicate that multimedia applications running on such environments should be able to mitigate an increasing number of core failures at runtime. We propose a new roll-forward failure-mitigation approach for integer sumof-product computations, with emphasis on generic matrix multiplication (GEMM)and convolution/crosscorrelation (CONV) routines. Our approach is based on the production of redundant results within the numerical representation of the outputs via the use of numerical packing.This differs fromall existing roll-forward solutions that require a separate set of checksum (or duplicate) results. Our proposal imposes 37.5% reduction in the maximum output bitwidth supported in comparison to integer sum-ofproduct realizations performed on 32-bit integer representations which is comparable to the bitwidth requirement of checksummethods for multiple core failure mitigation. Experiments with state-of-the-art GEMM and CONV routines running on a c4.8xlarge compute-optimized instance of amazon web services elastic compute cloud (AWS EC2) demonstrate that the proposed approach is able to mitigate up to one quadcore failure while achieving processing throughput that is: 1) comparable to that of the conventional, failure-intolerant, integer GEMM and CONV routines, 2) substantially superior to that of the equivalent roll-forward failure-mitigation method based on checksum streams. Furthermore, when used within an image retrieval framework deployed over a cluster of AWS EC2 spot (i.e., low-cost albeit terminatable) instances, our proposal leads to: 1) 16%-23% cost reduction against the equivalent checksum-based method and 2) more than 70% cost reduction against conventional failure-intolerant processing on AWS EC2 on-demand (i.e., highercost albeit guaranteed) instances

    Generalized Numerical Entanglement For Reliable Linear, Sesquilinear And Bijective Operations On Integer Data Streams

    Get PDF
    We propose a new technique for the mitigation of fail-stop failures and/or silent data corruptions (SDCs) within linear, sesquilinear or bijective (LSB) operations on M integer data streams (M ⩾ 3). In the proposed approach, the M input streams are linearly superimposed to form M numerically entangled integer data streams that are stored in-place of the original inputs, i.e., no additional (aka. “checksum”) streams are used. An arbitrary number of LSB operations can then be performed in M processing cores using these entangled data streams. The output results can be extracted from any (M-K) entangled output streams by additions and arithmetic shifts, thereby mitigating K fail-stop failures (K ≤ ⌊(M-1)/2 ⌋ ), or detecting up to K SDCs per M-tuple of outputs at corresponding in-stream locations. Therefore, unlike other methods, the number of operations required for the entanglement, extraction and recovery of the results is linearly related to the number of the inputs and does not depend on the complexity of the performed LSB operations. Our proposal is validated within an Amazon EC2 instance (Haswell architecture with AVX2 support) via integer matrix product operations. Our analysis and experiments for failstop failure mitigation and SDC detection reveal that the proposed approach incurs 0.75% to 37.23% reduction in processing throughput in comparison to the equivalent errorintolerant processing. This overhead is found to be up to two orders of magnitude smaller than that of the equivalent checksum-based method, with increased gains offered as the complexity of the performed LSB operations is increasing. Therefore, our proposal can be used in distributed systems, unreliable multicore clusters and safety-critical applications, where robustness against failures and SDCs is a necessity

    Reliable Linear, Sesquilinear, and Bijective Operations on Integer Data Streams Via Numerical Entanglement

    Get PDF
    A new technique is proposed for fault-tolerant linear, sesquilinear and bijective (LSB) operations on MM integer data streams ( M3M \geq 3), such as: scaling, additions/subtractions, inner or outer vector products, permutations and convolutions. In the proposed method, MM input integer data streams are linearly superimposed to form MM numerically-entangled integer data streams that are stored in-place of the original inputs. LSB operations can then be performed directly using these entangled data streams. The results are extracted from the MM entangled output streams by additions and arithmetic shifts. Any soft errors affecting one disentangled output stream are guaranteed to be detectable via a post-computation reliability check. Additionally, when utilizing a separate processor core for each stream, our approach can recover all outputs after any single fail-stop failure. Importantly, unlike algorithm-based fault tolerance (ABFT) methods, the number of operations required for the entire process is linearly related to the number of inputs and does not depend on the complexity of the performed LSB operations. We have validated our proposal in an Intel processor via several types of operations: fast Fourier transforms, convolutions, and matrix multiplication operations. Our analysis and experiments reveal that the proposed approach incurs between 0.03% to 7% reduction in processing throughput for numerous LSB operations. This overhead is 5 to 1000 times smaller than that of the equivalent ABFT method that uses a checksum stream. Thus, our proposal can be used in fault-generating processor hardware or safety-critical applications, where high reliability is required without the cost of ABFT or modular redundancy

    Error tolerant multimedia stream processing: There's plenty of room at the top (of the system stack)

    Get PDF
    There is a growing realization that the expected fault rates and energy dissipation stemming from increases in CMOS integration will lead to the abandonment of traditional system reliability in favor of approaches that offer reliability to hardware-induced errors across the application, runtime support, architecture, device and integrated-circuit (IC) layers. Commercial stakeholders of multimedia stream processing (MSP) applications, such as information retrieval, stream mining systems, and high-throughput image and video processing systems already feel the strain of inadequate system-level scaling and robustness under the always-increasing user demand. While such applications can tolerate certain imprecision in their results, today's MSP systems do not support a systematic way to exploit this aspect for cross-layer system resilience. However, research is currently emerging that attempts to utilize the error-tolerant nature of MSP applications for this purpose. This is achieved by modifications to all layers of the system stack, from algorithms and software to the architecture and device layer, and even the IC digital logic synthesis itself. Unlike conventional processing that aims for worst-case performance and accuracy guarantees, error-tolerant MSP attempts to provide guarantees for the expected performance and accuracy. In this paper we review recent advances in this field from an MSP and a system (layer-by-layer) perspective, and attempt to foresee some of the components of future cross-layer error-tolerant system design that may influence the multimedia and the general computing landscape within the next ten years. © 1999-2012 IEEE

    Fault Tolerant Integer Data Computations: Algorithms and Applications

    Get PDF
    As computing units move to higher transistor integration densities and computing clusters become highly heterogeneous, studies begin to predict that, rather than being exceptions, data corruptions in memory and processor failures are likely to become more prevalent. It has therefore become imperative to improve the reliability of systems in the face of increasing soft error probabilities in memory and computing logic units of silicon CMOS integrated chips. This thesis introduces a new class of algorithms for fault tolerance in compute-intensive linear and sesquilinear (“one-and-half-linear”) data computations on integer data inputs within high-performance computing systems. The key difference between the proposed algorithms and existing fault tolerance methods is the elimination of the traditional requirement for additional hardware resources for system reliability. The first contribution of this thesis is in the detection of hardware-induced errors in integer matrix products. The proposed method of numerical packing for detecting a single error within a quadruple of matrix outputs is described in Chapter 2. The chapter includes analytic calculations of the proposed method’s computational complexity and reliability. Experimental results show that the proposed algorithm incurs comparable execution time overhead to existing algorithms for the detection and correction of a limited number of errors within generic matrix multiplication (GEMM) outputs. On the other hand, numerical packing becomes substantially more efficient in the mitigation of multiple errors. The achieved execution time gain of numerical packing is further analyzed with respect to its energy saving equivalent, thus paving the way for a new class of silent data corruption (SDC) mitigation method for integer matrix products that are fast, energy efficient, and highly reliable. A further advancement of the proposed numerical packing approach for the mitigation of core/processor failures in computing clusters (a.k.a., failstop failures) is described in Chapter 3 . The key advantage of this new packing approach is the ability to tolerate processor failures for all classes of sum-of-product computations. Because multimedia applications running on cloud computing platforms are now required to mitigate an increasing number of failures and outages at runtime, we analyze the efficiency of numerical packing within an image retrieval framework deployed over a cluster of AWS EC2 spot (i.e., low-cost albeit terminable) instances. Our results show that more than 70% reduction of cost can be achieved in comparison to conventional failure-intolerant processing based on AWS EC2 on-demand (i.e., higher-cost albeit guaranteed) instances. Finally, beyond numerical packing, we present a second approach for reliability in the case of linear and sesquilinear integer data computations by generalizing the recently-proposed concept of numerical entanglement. The proposed approach is capable of recovering from multiple fail-stop failures in a parallel/distributed computing environment. We present theoretical analysis of the computational and bit-width requirements of the proposed method in comparison to existing methods of checksum generation and processing. Our experiments with integer matrix products show that the proposed approach incurs 1.72% − 37.23% reduction in processing throughput in comparison to failure-intolerant processing while allowing for the mitigation of multiple fail-stop failures without the use of additional computing resources
    corecore