98 research outputs found

    Attentive Decision-making and Dynamic Resetting of Continual Running SRNNs for End-to-End Streaming Keyword Spotting

    Efficient end-to-end processing of continuous, streaming signals is one of the key challenges for Artificial Intelligence (AI), in particular for energy-constrained edge applications. Spiking neural networks are explored to achieve efficient edge AI, as their low latency, sparse processing, and small network size result in low-energy operation. Spiking Recurrent Neural Networks (SRNNs) achieve good performance on isolated samples at excellent network size and energy cost. When applied to continual streaming data, like a series of concatenated keyword samples, SRNNs, like traditional RNNs, recognize successive information increasingly poorly as the network dynamics become saturated. SRNNs process concatenated streams of data in three steps: i) relevant signals have to be localized; ii) evidence then needs to be integrated to classify the signal; and finally, iii) the neural dynamics must be combined with network-state resetting events to remedy network saturation. Here we show how a streaming form of attention can aid SRNNs in localizing events in a continuous stream of signals, after which a brain-inspired decision-making circuit integrates evidence to determine the correct classification. This decision then triggers a delayed network reset, remedying network-state saturation. We demonstrate the effectiveness of this approach on streams of concatenated keywords, reporting high accuracy combined with low average network activity, as the attention signal effectively gates network activity in the absence of signals. We also show that the dynamic normalization effected by the attention mechanism enables a degree of environmental transfer learning, where the same keywords obtained in different circumstances are still correctly classified. The principles presented here also carry over to similar applications of classical RNNs and may thus be of general interest for continual running applications.
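The three-step pipeline described above (localize, integrate evidence, reset) can be sketched as a toy recurrent loop. All dimensions, weights, and thresholds below are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative only)
N_IN, N_HID, N_CLS = 8, 16, 4
W_in = rng.normal(0, 0.5, (N_HID, N_IN))
W_rec = rng.normal(0, 0.3, (N_HID, N_HID))
W_out = rng.normal(0, 0.5, (N_CLS, N_HID))

def run_stream(x_stream, attn_thresh=0.5, decision_thresh=5.0, leak=0.9):
    """Attention-gated leaky recurrent state with evidence integration
    and a post-decision reset (a sketch of the localize/integrate/reset
    idea, not the paper's SRNN)."""
    h = np.zeros(N_HID)          # recurrent state
    evidence = np.zeros(N_CLS)   # accumulated class evidence
    decisions = []
    for x in x_stream:
        # i) localize: crude energy-based attention gates all activity
        attn = float(np.linalg.norm(x) > attn_thresh)
        h = leak * h + attn * np.tanh(W_in @ x + W_rec @ h)
        # ii) integrate evidence toward a classification
        evidence += attn * (W_out @ h)
        # iii) decide, then reset the network state to avoid saturation
        if evidence.max() > decision_thresh:
            decisions.append(int(evidence.argmax()))
            h[:] = 0.0
            evidence[:] = 0.0
    return decisions

stream = [rng.normal(0, 1.0, N_IN) if 10 <= t < 30 else np.zeros(N_IN)
          for t in range(50)]
print(run_stream(stream))
```

Because the attention gate zeroes all updates when the input is silent, network activity (and hence energy) drops to nothing between signals, mirroring the gating effect reported in the abstract.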

    Principal Theorems of Differential and Integral Calculus: A Guide for Use in Lectures [Hauptsätze der Differential- und Integral-Rechnung] / compiled by Robert Fricke; Part 1

    The conventional approach of moving data to the CPU for computation has become a significant performance bottleneck for emerging scale-out data-intensive applications due to their limited data reuse. At the same time, the advancement in 3D integration technologies has made the decade-old concept of coupling compute units close to the memory — called near-memory computing (NMC) — more viable. Processing right at the “home” of data can significantly diminish the data movement problem of data-intensive applications. In this paper, we survey the prior art on NMC across various dimensions (architecture, applications, tools, etc.) and identify the key challenges and open issues with future research directions. We also provide a glimpse of our approach to near-memory computing that includes i) NMC-specific, microarchitecture-independent application characterization, ii) a compiler framework to offload NMC kernels onto our target NMC platform, and iii) an analytical model to evaluate the potential of NMC.
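The kind of first-order analytical model hinted at in iii) can be illustrated with a back-of-the-envelope energy comparison. The per-op and per-byte costs below are made-up placeholders, not measured values or the survey's model:

```python
def data_movement_energy(bytes_moved, pj_per_byte):
    """Energy (pJ) spent moving data between memory and compute."""
    return bytes_moved * pj_per_byte

def total_energy(ops, bytes_moved, pj_per_op, pj_per_byte):
    """Compute energy plus data-movement energy for one kernel run."""
    return ops * pj_per_op + data_movement_energy(bytes_moved, pj_per_byte)

# Hypothetical kernel: 1e6 ops touching 4 MB exactly once (low data reuse).
ops, bytes_moved = 1_000_000, 4 * 1024 * 1024

# Placeholder costs: moving a byte over the off-chip path is far pricier
# than one compute op; near-memory placement shortens the data path at a
# small compute-efficiency penalty (illustrative numbers only).
cpu = total_energy(ops, bytes_moved, pj_per_op=1.0, pj_per_byte=10.0)
nmc = total_energy(ops, bytes_moved, pj_per_op=1.2, pj_per_byte=1.0)

print(f"CPU-centric: {cpu / 1e6:.1f} uJ, near-memory: {nmc / 1e6:.1f} uJ")
```

With low data reuse, the data-movement term dominates the CPU-centric total, which is exactly the regime where NMC pays off; for compute-bound kernels with high reuse the comparison would flip.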

    Heuristics for scenario creation to enable general loop transformations

    Embedded system applications can have quite complex control flow graphs (CFGs). Often their control flow prohibits design-time optimizations, like advanced global loop transformations. To solve this problem, and enable far more global optimizations, we could consider paths of the CFG in isolation. However, coding all paths separately would cause tremendous code duplication. In practice we have to trade off the extra optimization opportunities against the code size. To make this trade-off, in this paper we use so-called system scenarios. These scenarios bundle similar control paths, while still allowing sufficient optimizations. The problem treated in this paper is: what are the right scenarios, i.e., which paths should be grouped together? For complex CFGs the number of possible scenarios (ways of grouping CFG paths) is huge; it grows exponentially with the number of CFG paths. Therefore heuristics are needed to quickly discover reasonable groupings. The main contribution of this paper is that we propose and evaluate three of these heuristics on both synthetic benchmarks and a real-life application.
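One simple way to bundle similar CFG paths into scenarios is a greedy pass that merges a path into the first scenario it sufficiently resembles. This toy stand-in (Jaccard similarity over sets of basic blocks, a fixed threshold) illustrates the idea of scenario grouping but is not one of the paper's three heuristics:

```python
def jaccard(a, b):
    """Similarity of two sets of basic-block ids."""
    return len(a & b) / len(a | b)

def group_paths(paths, threshold=0.5):
    """Greedy grouping: each path joins the first scenario whose
    representative shares enough basic blocks with it; otherwise it
    founds a new scenario. Avoids enumerating the exponentially many
    possible groupings."""
    scenarios = []  # list of (representative_blocks, [path_ids])
    for pid, blocks in paths.items():
        for rep, members in scenarios:
            if jaccard(rep, blocks) >= threshold:
                members.append(pid)
                break
        else:
            scenarios.append((set(blocks), [pid]))
    return [members for _, members in scenarios]

# Hypothetical CFG paths, each a set of basic-block ids
paths = {
    "p1": {"A", "B", "C", "E"},
    "p2": {"A", "B", "D", "E"},   # similar to p1 -> same scenario
    "p3": {"A", "F", "G"},        # mostly different -> own scenario
}
print(group_paths(paths))  # -> [['p1', 'p2'], ['p3']]
```

Grouping p1 and p2 means only their (large) common code is kept once, trading a little lost specialization for much less code copying — the same trade-off the paper's heuristics navigate.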

    Cross-domain modeling and optimization of high-speed visual servo systems

    High-speed visual servo systems are used in an increasing number of applications. Yet modeling and optimizing these systems remains a research challenge, largely because they consist of tightly coupled design parameters across multiple domains, including image sensors, vision algorithms, processing systems, mechanical systems, and control systems. To overcome this challenge, this work applies an axiomatic design method to the design of high-speed visual servo systems, such that cross-domain couplings are explicitly modeled and subsequently eliminated where possible. More importantly, methods are proposed to model the sample rate, measurement error, and delay of visual feedback based on design parameters across multiple domains. Lastly, methods to construct a holistic model and to perform cross-domain optimization are proposed. The proposed methods are applied to a representative case study that demonstrates the necessity of cross-domain modeling and optimization, as well as the effectiveness of the proposed methods.
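How sample rate and feedback delay emerge from parameters in different domains can be sketched with a minimal timing model. The pipeline stages and numbers below are assumptions for illustration, not the paper's model:

```python
def visual_feedback_timing(exposure_s, readout_s, processing_s):
    """Feedback delay is the sum of the pipeline stages (sensor exposure,
    sensor readout, vision processing); with the stages pipelined, the
    achievable sample rate is limited by the slowest stage. A minimal
    sketch, not the paper's cross-domain model."""
    delay = exposure_s + readout_s + processing_s
    sample_period = max(exposure_s, readout_s, processing_s)
    return delay, 1.0 / sample_period

# Hypothetical stage timings: 0.5 ms exposure, 1 ms readout, 2 ms vision
delay, rate = visual_feedback_timing(exposure_s=0.5e-3,
                                     readout_s=1.0e-3,
                                     processing_s=2.0e-3)
print(f"delay = {delay * 1e3:.1f} ms, sample rate = {rate:.0f} Hz")
```

Even this toy model shows the cross-domain coupling: shortening exposure (a sensor parameter) cuts delay, but only speeding up the bottleneck stage (here the vision algorithm) raises the sample rate available to the controller.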

    Designing energy efficient approximate multipliers for neural acceleration

    Many error-resilient applications can be approximated using multi-layer perceptrons (MLPs) with insignificant degradation in output quality. Faster and more energy-efficient execution of such an application is achieved using a neural accelerator (NA). This work exploits the error-resilience characteristics of an MLP by approximating the accelerator itself. An error-resilience analysis of the MLP is performed to obtain key constraints, which are used for designing energy-efficient approximate multipliers. A systematic methodology for the design of approximate multipliers is used, based on a graph-based netlist modification approach. Approximate versions of basic standard cells are generated and used to replace accurate cells in the synthesized netlist in a systematic, quality-controlled manner. These approximate multipliers are further used for approximating the multiply-and-accumulate (MAC) units in the neural accelerator. The results are validated by considering approximate neural replication of a robotic application, inversek2j. System-level energy savings of up to 14% are obtained for less than 7% degradation in output quality. An average application speedup of 24% is obtained over an accurate neural accelerator. The results are compared with state-of-the-art approximate multipliers, and a comparison with truncation (bit-wise scaling) is performed. Moreover, the error-healing capability of MLPs is shown by studying the impact of retraining on networks with approximate multipliers.
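The truncation (bit-wise scaling) baseline the paper compares against can be shown concretely: drop low-order operand bits before multiplying and measure the resulting error. This is generic truncation, not the paper's cell-replacement method:

```python
def truncated_mult(a, b, drop_bits):
    """Approximate unsigned multiply: zero out the low `drop_bits` of
    each operand before multiplying. A smaller effective operand width
    means a cheaper multiplier array, at the cost of bounded error."""
    a_t = (a >> drop_bits) << drop_bits
    b_t = (b >> drop_bits) << drop_bits
    return a_t * b_t

exact = 173 * 219                                   # 37887
approx = truncated_mult(173, 219, drop_bits=3)      # 36288
rel_err = abs(exact - approx) / exact
print(approx, f"relative error {rel_err:.2%}")
```

A few percent of relative error per MAC is often absorbed by an MLP's inherent error resilience — and, as the abstract notes, retraining the network with the approximate multipliers in the loop can heal much of the remaining quality loss.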

    Quantization of constrained processor data paths applied to convolutional neural networks

    Artificial Neural Networks (NNs) can effectively be used to solve many classification and regression problems, and deliver state-of-the-art performance in the application domains of natural language processing (NLP) and computer vision (CV). However, the tremendous amount of data movement and excessive convolutional workload of these networks hampers large-scale mobile and embedded productization. Therefore these models are generally mapped to energy-efficient accelerators without floating-point support. Weight and data quantization is an effective way to deploy high-precision models to efficient integer-based platforms. In this paper a quantization method for platforms without wide accumulation registers is proposed. Two constraints that maximize the bit width of weights and input data for a given accumulator size are introduced. These constraints exploit knowledge about the weight and data distributions of individual layers. Using these constraints, we propose a layer-wise quantization heuristic to find a good fixed-point network approximation. To reduce the number of configurations to consider, only solutions that fully utilize the available accumulator bits are tested. We demonstrate that 16-bit accumulators are able to obtain a Top-1 classification accuracy within 1% of the floating-point baselines on the CIFAR-10 and ILSVRC2012 image classification benchmarks.
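The accumulator constraint can be made concrete with the naive worst-case bound: summing `fan_in` products of w-bit weights and d-bit unsigned inputs needs roughly w + d + ceil(log2(fan_in)) accumulator bits. This simplified bound is an assumption for illustration, not the paper's distribution-aware constraints:

```python
from math import ceil, log2

def max_data_bits(acc_bits, weight_bits, fan_in):
    """Largest input bit width d such that fan_in worst-case
    (weight_bits x d)-bit unsigned products fit in an acc_bits
    accumulator without overflow:
        weight_bits + d + ceil(log2(fan_in)) <= acc_bits.
    A pessimistic worst-case bound; exploiting actual per-layer weight
    and data distributions (as the paper does) leaves far more bits."""
    headroom = ceil(log2(fan_in))
    return acc_bits - weight_bits - headroom

# Example: 16-bit accumulator, 6-bit weights, a layer with 512 MACs
print(max_data_bits(acc_bits=16, weight_bits=6, fan_in=512))  # -> 1
# A small 16-input layer leaves much more room for the data
print(max_data_bits(acc_bits=16, weight_bits=6, fan_in=16))   # -> 6
```

The first result shows why the worst-case bound is unusable for real layers: with 512 MACs it leaves only 1 bit for the data, which is precisely the motivation for constraints that exploit the layers' actual weight and data distributions.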

    An automated approximation methodology for arithmetic circuits

    Arithmetic circuits like adders and multipliers are key workhorses of many error-resilient applications. Prior efforts on approximating these arithmetic circuits mainly focused on manual circuit-level functional modifications. These manual approaches require high design time and effort, so only a limited number of approximate design points can be generated from the original circuit, leading to a sparsely occupied Pareto front. This work proposes an automated approximation methodology for arithmetic circuits. The proposed method approximates the gate-level standard-cell library and uses these approximate standard cells to modify the netlist of the original circuit. A heuristic design-space exploration methodology is proposed to speed up the design process. We integrate this methodology with the traditional ASIC flow and validate our results using adders and multipliers of different bit widths. We show that our methodology improves on existing state-of-the-art manual as well as automated design techniques by generating non-dominated Pareto fronts. An application case study (Sobel edge detection) is shown using approximate arithmetic circuits generated by our methodology. For the Sobel edge detector, we show up to 50% energy improvements for hardly any quality degradation (PSNR ≥ 20 dB).
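The idea of swapping accurate cells for approximate ones in a netlist can be mimicked in a bit-level simulation: build a ripple-carry adder from full-adder "cells" and replace the cells at the low-order positions with a cheaper approximate variant. The particular approximate cell below (sum computed as OR) is an illustrative choice, not one of the paper's generated cells:

```python
def full_adder(a, b, cin):
    """Accurate full-adder cell (1-bit operands)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def approx_full_adder(a, b, cin):
    """Illustrative approximate cell: sum approximated as OR (cheaper
    logic), carry kept exact. Assumed for demonstration only."""
    return (a | b | cin), (a & b) | (a & cin) | (b & cin)

def ripple_add(x, y, bits, approx_low_bits=0):
    """Ripple-carry adder in which the lowest `approx_low_bits`
    positions use the approximate cell -- mimicking quality-controlled
    netlist cell replacement."""
    carry, result = 0, 0
    for i in range(bits):
        cell = approx_full_adder if i < approx_low_bits else full_adder
        s, carry = cell((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result

exact = ripple_add(0b1011, 0b0110, bits=5)                      # 17
approx = ripple_add(0b1011, 0b0110, bits=5, approx_low_bits=2)  # 19
print(exact, approx)
```

Because only low-order cells are replaced, the error stays bounded by the weight of those positions; sweeping `approx_low_bits` generates a family of design points, which is how automated cell replacement densifies the quality-versus-energy Pareto front.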