3,263 research outputs found

    High accuracy computation with linear analog optical systems: a critical study

    Get PDF
    High accuracy optical processors based on the algorithm of digital multiplication by analog convolution (DMAC) are studied for ultimate performance limitations. Variations of optical processors that perform high accuracy vector-vector inner products are studied in abstract and with specific examples. It is concluded that the use of linear analog optical processors in performing digital computations with DMAC leads to impractical requirements for the accuracy of analog optical systems and the complexity of postprocessing electronics

    CSI Neural Network: Using Side-channels to Recover Your Artificial Neural Network Information

    Get PDF
    Machine learning has become mainstream across industries. Numerous examples proved the validity of it for security applications. In this work, we investigate how to reverse engineer a neural network by using only power side-channel information. To this end, we consider a multilayer perceptron as the machine learning architecture of choice and assume a non-invasive and eavesdropping attacker capable of measuring only passive side-channel leakages like power consumption, electromagnetic radiation, and reaction time. We conduct all experiments on real data and common neural net architectures in order to properly assess the applicability and extendability of those attacks. Practical results are shown on an ARM CORTEX-M3 microcontroller. Our experiments show that the side-channel attacker is capable of obtaining the following information: the activation functions used in the architecture, the number of layers and neurons in the layers, the number of output classes, and weights in the neural network. Thus, the attacker can effectively reverse engineer the network using side-channel information. Next, we show that once the attacker has the knowledge about the neural network architecture, he/she could also recover the inputs to the network with only a single-shot measurement. Finally, we discuss several mitigations one could use to thwart such attacks.Comment: 15 pages, 16 figure

    ADaPTION: Toolbox and Benchmark for Training Convolutional Neural Networks with Reduced Numerical Precision Weights and Activation

    Full text link
    Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs) are useful for many practical tasks in machine learning. Synaptic weights, as well as neuron activation functions within the deep network are typically stored with high-precision formats, e.g. 32 bit floating point. However, since storage capacity is limited and each memory access consumes power, both storage capacity and memory access are two crucial factors in these networks. Here we present a method and present the ADaPTION toolbox to extend the popular deep learning library Caffe to support training of deep CNNs with reduced numerical precision of weights and activations using fixed point notation. ADaPTION includes tools to measure the dynamic range of weights and activations. Using the ADaPTION tools, we quantized several CNNs including VGG16 down to 16-bit weights and activations with only 0.8% drop in Top-1 accuracy. The quantization, especially of the activations, leads to increase of up to 50% of sparsity especially in early and intermediate layers, which we exploit to skip multiplications with zero, thus performing faster and computationally cheaper inference.Comment: 10 pages, 5 figure

    HIGH-SPEED CO-PROCESSORS BASED ON REDUNDANT NUMBER SYSTEMS

    Get PDF
    There is a growing demand for high-speed arithmetic co-processors for use in applications with computationally intensive tasks. For instance, Fast Fourier Transform (FFT) co-processors are used in real-time multimedia services and financial applications use decimal co-processors to perform large amounts of decimal computations. Using redundant number systems to eliminate word-wide carry propagation within interim operations is a well-known technique to increase the speed of arithmetic hardware units. Redundant number systems are mostly useful in applications where many consecutive arithmetic operations are performed prior to the final result, making it advantageous for arithmetic co-processors. This thesis discusses the implementation of two popular arithmetic co-processors based on redundant number systems: namely, the binary FFT co-processor and the decimal arithmetic co-processor. FFT co-processors consist of several consecutive multipliers and adders over complex numbers. FFT architectures are implemented based on fixed-point and floating-point arithmetic. The main advantage of floating-point over fixed-point arithmetic is the wide dynamic range it introduces. Moreover, it avoids numerical issues such as scaling and overflow/underflow concerns at the expense of higher cost. Furthermore, floating-point implementation allows for an FFT co-processor to collaborate with general purpose processors. This offloads computationally intensive tasks from the primary processor. The first part of this thesis, which is devoted to FFT co-processors, proposes a new FFT architecture that uses a new Binary-Signed Digit (BSD) carry-limited adder, a new floating-point BSD multiplier and a new floating-point BSD three-operand adder. Finally, a new unit labeled as Fused-Dot-Product-Add (FDPA) is designed to compute AB+CD+E over floating-point BSD operands. The second part of the thesis discusses decimal arithmetic operations implemented in hardware using redundant number systems. These operations are popularly used in decimal floating-point co-processors. A new signed-digit decimal adder is proposed along with a sequential decimal multiplier that uses redundant number systems to increase the operational frequency of the multiplier. New redundant decimal division and square-root units are also proposed. The architectures proposed in this thesis were all implemented using Hardware-Description-Language (Verilog) and synthesized using Synopsys Design Compiler. The evaluation results prove the speed improvement of the new arithmetic units over previous pertinent works. Consequently, the FFT and decimal co-processors designed in this thesis work with at least 10% higher speed than that of previous works. These architectures are meant to fulfill the demand for the high-speed co-processors required in various applications such as multimedia services and financial computations

    Performance Analysis of Modified Lifting Based DWT Architecture and FPGA Implementation for Speed and Power

    Get PDF
    Demand for high speed and low power architecture for DWT computation have led to design of novel algorithms and architecture In this paper we design model and implement a hardware efficient high speed and power efficient DWT architecture based on modified lifting scheme algorithm The design is interfaced with SIPO and PISO to reduce the number of I O lines on the FPGA The design is implemented on Spartan III device and is compared with lifting scheme logic The proposed design operates at frequency of 280 MHz and consumes power less than 42 mW The presynthesis and post-synthesis results are verified and suitable test vectors are used in verifying the functionality of the design The design is suitable for real time data processin
    corecore