10 research outputs found

    Mind the Scaling Factors: Resilience Analysis of Quantized Adversarially Robust CNNs

    As more deep learning algorithms enter safety-critical application domains, the importance of analyzing their resilience against hardware faults cannot be overstated. Most existing works focus on bit-flips in memory, fewer focus on compute errors, and almost none study the effect of hardware faults on adversarially trained convolutional neural networks (CNNs). In this work, we show that adversarially trained CNNs are more susceptible to failure due to hardware errors when compared to vanilla-trained models. We identify large differences between the quantization scaling factors of CNNs that are resilient to hardware faults and those that are not. As adversarially trained CNNs learn robustness against input attack perturbations, their internal weight and activation distributions open a backdoor for injecting large-magnitude hardware faults. We propose a simple weight decay remedy for adversarially trained models to maintain adversarial robustness and hardware resilience in the same CNN. We improve the fault resilience of an adversarially trained ResNet56 by 25% for large-scale bit-flip benchmarks on activation data while gaining slightly improved accuracy and adversarial robustness.
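The link between outlier weights and fault sensitivity can be illustrated with a toy sketch. This is not the paper's code; the function name, bit width, and the two weight distributions are hypothetical, chosen only to show how a single large-magnitude weight inflates a symmetric per-tensor quantization scale, and how decaying it shrinks the scale again:

```python
# Illustrative sketch (hypothetical values, not the paper's code): how
# outlier weights inflate a symmetric quantization scaling factor.

def symmetric_scale(weights, n_bits=8):
    """Per-tensor scale for symmetric signed quantization."""
    q_max = 2 ** (n_bits - 1) - 1
    return max(abs(w) for w in weights) / q_max

# A vanilla-trained layer: weights stay in a narrow range.
vanilla = [0.02, -0.05, 0.04, -0.03, 0.01]

# An adversarially trained layer: one large-magnitude outlier.
adversarial = [0.02, -0.05, 0.9, -0.03, 0.01]

s_vanilla = symmetric_scale(vanilla)
s_adv = symmetric_scale(adversarial)

# A larger scale means a flipped high-order bit in a quantized value
# maps back to a much larger real-valued perturbation.
assert s_adv > 10 * s_vanilla

# Weight decay pulls the outlier back, shrinking the scale with it.
decayed = [w * 0.5 if abs(w) > 0.1 else w for w in adversarial]
assert symmetric_scale(decayed) < s_adv
```

The point is that the fault-injection damage is bounded by the scale, so any training-time mechanism that tightens the weight distribution also tightens the worst-case fault magnitude.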

    HW-Flow-Fusion: Inter-Layer Scheduling for Convolutional Neural Network Accelerators with Dataflow Architectures

    Energy and throughput efficient acceleration of convolutional neural networks (CNN) on devices with a strict power budget is achieved by leveraging different scheduling techniques to minimize data movement and maximize data reuse. Several dataflow mapping frameworks have been developed to explore the optimal scheduling of CNN layers on reconfigurable accelerators. However, previous works usually optimize each layer individually, without leveraging the data reuse between the layers of CNNs. In this work, we present an analytical model to achieve efficient data reuse by searching for efficient scheduling of communication and computation across layers. We call this inter-layer scheduling framework HW-Flow-Fusion, as we explore the fused map-space of multiple layers sharing the available resources of the same accelerator, investigating the constraints and trade-offs of mapping the execution of multiple workloads with data dependencies. We propose a memory-efficient data reuse model, tiling, and resource partitioning strategies to fuse multiple layers without recomputation. Compared to standard single-layer scheduling, inter-layer scheduling can reduce the communication volume by 51% and 53% for selected VGG16-E and ResNet18 layers on a spatial array accelerator, and reduce the latency by 39% and 34% respectively, while also increasing the computation-to-communication ratio, which improves the memory bandwidth efficiency.
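The core saving from fusion can be sketched with a toy traffic model. This is a hypothetical simplification, not the HW-Flow-Fusion model itself: two conv layers, with sizes loosely borrowed from early VGG-style layers, where fusing keeps the intermediate feature map in on-chip buffers instead of a DRAM round trip:

```python
# Toy analytical model (hypothetical, not HW-Flow-Fusion itself) of how
# fusing two conv layers removes the off-chip round trip of the
# intermediate feature map. All sizes are in elements.

def traffic_single_layer(ifmap, inter, ofmap, w1, w2):
    # Layer 1: read input + weights, write intermediate to DRAM;
    # Layer 2: read intermediate + weights, write output.
    return (ifmap + w1 + inter) + (inter + w2 + ofmap)

def traffic_fused(ifmap, inter, ofmap, w1, w2):
    # Fused schedule: the intermediate feature map stays on-chip.
    return ifmap + w1 + w2 + ofmap

ifmap, inter, ofmap = 224 * 224 * 3, 112 * 112 * 64, 56 * 56 * 128
w1, w2 = 3 * 3 * 3 * 64, 3 * 3 * 64 * 128

t_single = traffic_single_layer(ifmap, inter, ofmap, w1, w2)
t_fused = traffic_fused(ifmap, inter, ofmap, w1, w2)

# The fused schedule avoids writing and re-reading `inter` entirely.
saving = 1 - t_fused / t_single
print(f"DRAM traffic reduced by {saving:.0%}")
```

In practice the trade-off is on-chip buffer pressure: the fused tiles of both layers must fit in the same shared memory, which is why tiling and resource partitioning strategies are needed.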

    HW-Flow: A Multi-Abstraction Level HW-CNN Codesign Pruning Methodology

    Convolutional neural networks (CNNs) have produced unprecedented accuracy for many computer vision problems in the recent past. In power and compute-constrained embedded platforms, deploying modern CNNs can present many challenges. Most CNN architectures do not run in real-time due to the high number of computational operations involved during the inference phase. This emphasizes the role of CNN optimization techniques in early design space exploration. To estimate their efficacy in satisfying the target constraints, existing techniques are either hardware (HW) agnostic, pseudo-HW-aware by considering parameter and operation counts, or HW-aware through inflexible hardware-in-the-loop (HIL) setups. In this work, we introduce HW-Flow, a framework for optimizing and exploring CNN models based on three levels of hardware abstraction: Coarse, Mid and Fine. Through these levels, CNN design and optimization can be iteratively refined towards efficient execution on the target hardware platform. We present HW-Flow in the context of CNN pruning by augmenting a reinforcement learning agent with key metrics to understand the influence of its pruning actions on the inference hardware. With a 2× reduction in energy and latency, we prune ResNet56, ResNet50, and DeepLabv3 with minimal accuracy degradation on the CIFAR-10, ImageNet, and CityScapes datasets, respectively.
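A minimal sketch of what "augmenting the agent with hardware metrics" can mean in practice: the reward the pruning agent sees combines task accuracy with a modeled latency. The reward shape, penalty factor, and MAC counts below are hypothetical, standing in for the framework's actual abstraction levels:

```python
# Hypothetical sketch of a HW-aware pruning reward: the agent's action is
# a per-layer pruning ratio; the reward trades task accuracy against a
# modeled latency, so pruning decisions "see" the target hardware.

def model_latency(layer_macs, pruning_ratios):
    # Coarse abstraction: latency proportional to remaining MAC count.
    return sum(m * (1 - r) for m, r in zip(layer_macs, pruning_ratios))

def reward(accuracy, latency, latency_budget, penalty=10.0):
    # Soft-penalize exceeding the budget instead of a hard constraint.
    over = max(0.0, latency - latency_budget)
    return accuracy - penalty * over / latency_budget

layer_macs = [90e6, 150e6, 60e6]
baseline = model_latency(layer_macs, [0, 0, 0])

# A 60% pruning ratio everywhere meets a 2x latency-reduction target.
aggressive = model_latency(layer_macs, [0.6, 0.6, 0.6])
assert aggressive <= 0.5 * baseline
```

Refining the latency model (Coarse to Mid to Fine) changes only `model_latency`; the agent's interface stays the same, which is what makes the iterative refinement across abstraction levels practical.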

    HW-FlowQ: A Multi-Abstraction Level HW-CNN Co-design Quantization Methodology

    Model compression through quantization is commonly applied to convolutional neural networks (CNNs) deployed on compute and memory-constrained embedded platforms. Different layers of the CNN can have varying degrees of numerical precision for both weights and activations, resulting in a large search space. Together with the hardware (HW) design space, this makes the challenge of finding the globally optimal HW-CNN combination for a given application daunting. To this end, we propose HW-FlowQ, a systematic approach that enables the co-design of the target hardware platform and the compressed CNN model through quantization. The search space is viewed at three levels of abstraction, allowing an iterative approach that narrows down the solution space before reaching a high-fidelity CNN hardware modeling tool, capable of capturing the effects of mixed-precision quantization strategies on different hardware architectures (processing unit counts, memory levels, cost models, dataflows) and two types of computation engines (bit-parallel vectorized, bit-serial). To combine both worlds, a multi-objective non-dominated sorting genetic algorithm (NSGA-II) is leveraged to establish a Pareto-optimal set of quantization strategies for the target HW-metrics at each abstraction level. HW-FlowQ detects optima in a discrete search space and maximizes the task-related accuracy of the underlying CNN while minimizing hardware-related costs. The Pareto-front approach keeps the design space open to a range of non-dominated solutions before refining the design to a more detailed level of abstraction. With equivalent prediction accuracy, we reduce energy and latency by 20% and 45% respectively for ResNet56 compared to existing mixed-precision search methods.
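The Pareto ranking at the heart of NSGA-II can be shown in a few lines. This is only the non-dominated filter, not the full genetic algorithm, and the (accuracy, energy) points for candidate quantization strategies are made up for illustration:

```python
# Illustrative non-dominated filter (the core of NSGA-II's Pareto
# ranking) over hypothetical (accuracy, energy) points for candidate
# mixed-precision quantization strategies.

def dominates(a, b):
    """a dominates b: no worse in both objectives, strictly better in one.
    Objectives: maximize accuracy (index 0), minimize energy (index 1)."""
    return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])

def pareto_front(points):
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

candidates = [
    (0.91, 3.0),  # high accuracy, high energy
    (0.90, 2.0),
    (0.88, 1.5),
    (0.87, 1.8),  # dominated by (0.88, 1.5): worse on both objectives
    (0.85, 1.0),
]
front = pareto_front(candidates)
assert (0.87, 1.8) not in front
assert (0.88, 1.5) in front
```

Keeping the whole front, rather than collapsing to a single scalarized optimum, is what lets the next, finer abstraction level still choose among several viable HW-CNN trade-offs.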

    RAW FX

    Over the past fifty years, cinema has undergone a progressive industrialization: careful market research and strategic commercial choices
have taken the place of art and passion.
For more than twenty years, six film studios have controlled the market, creating an oligopoly that discourages experimentation and originality and promotes a
serialized production model that recycles characters, plots, and worlds already explored, depriving new generations of the wonder and amazement that audiences
felt in past decades.
Hollywood cinema's drive to create ever more astonishing images demands a disproportionate effort from the artists who create the
visual effects, forcing extremely tight working schedules with very little regard for the well-being of the people involved.
The visual effects sector does not receive adequate economic recognition for its crucial importance in the film-making process, to the point of driving
highly acclaimed, award-winning studios into bankruptcy.
Today, the giants of the industry embrace the advantages and benefits of the Open Source philosophy in order to share their art, building a community strong enough to withstand
the heavy pressure of the mega-studios.
From this premise comes the Raw FX project, whose goal is to develop solutions for the creation of digital effects that reduce
the time and cost such operations entail, lightening the artists' workload while allowing small productions to achieve
high-level results.
The workflow of large-scale film production was analyzed to identify the appropriate area of intervention. The proposed solution, conceived for an industry always racing
against time, was designed to be built with rapid-prototyping technologies so that it can be realized in the shortest possible time; the entire
design was carried out with free or open source tools, in the hope of inspiring other projects with a similar intent.

    Development and testing of BlenderFDS, the open, community-based, user interface for NIST FDS

    Computational Fluid Dynamics (CFD) tools have increasingly begun to play an important role in risk assessments for fire safety design. NIST Fire Dynamics Simulator (FDS) appears to be the CFD modeling tool of first choice for the world-wide community of fire engineers. FDS is being developed as a CFD code only, so no user interface is available, and the input data pre-processing phase is left entirely to the user's responsibility. In real world cases, especially where complex or curved geometries are to be described, the data input phase represents an important cost for the fire engineer. Back in 2009, in the opinion of the authors, no satisfactory pre-processing tool for this purpose existed. No multi-platform, open source and freely available pre-processing tool existed at all. The lack of an open pre-processor tool motivated the development of BlenderFDS, the open, community-based, user interface for FDS. This paper describes the design process and development choices of this new tool. The results of a recent, thorough evaluation of the tool are then presented: BlenderFDS was employed for a fire safety study on Castel Thun, a fascinating medieval castle located at the foot of the Italian Alps. BlenderFDS allowed for satisfactory control over the input data and the generated namelist groups. The graphical user interface for 3D solid modeling and intense data sharing between BlenderFDS entities prevented duplication of efforts and lowered the risk of input data errors. This study demonstrates the value of a fully open tool-chain for CFD fire safety analysis. The BlenderFDS tool follows the evolution of the FDS ecosystem, both in terms of new FDS features and in terms of the FDS users' community needs. Its open, bottom-up development model appears well suited to withstand such a challenge.

    ERODE: Error Resilient Object DetEction by Recovering Bounding Box and Class Information

    Fault resilience in computer vision algorithms is paramount in critical applications such as autonomous driving or surveillance. Convolutional neural networks (CNNs) are usually used in these tasks to identify objects of interest, which are passed to other decisional algorithms and used to take specific actions. However, incorrect detections due to computation errors could pose a safety risk. In this work, we present ERODE (Error Resilient Object DetEction), a framework that can be paired with a CNN to filter the detections by identifying possible errors and restoring the correct predictions, improving the fault resilience of the system. The proposed framework leverages the temporal correlation among consecutive images and CNN outputs, using motion estimation and tracking techniques to infer whether computation errors have occurred and, in that case, produce a new set of outputs. To evaluate the performance, the precision and recall of the CNN with and without ERODE support were computed and compared on the MOT17DET dataset, using EfficientDet D0 quantized to 16 bits, with errors injected into the activations computed during inference. The experimental results show significantly reduced task accuracy degradation induced by bit-flips, proving that ERODE can increase the system's fault resilience.
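The temporal-correlation check can be sketched as follows. This is a hypothetical, heavily simplified stand-in for ERODE's recovery logic: a constant-velocity motion model predicts where a tracked box should be, and a detection that disagrees too much with the prediction (a possible fault symptom) is replaced by it. The box format, IoU threshold, and all coordinates are illustrative assumptions:

```python
# Hypothetical sketch of ERODE's core idea: use temporal correlation to
# flag an implausible detection and restore it from a motion-extrapolated
# prediction. Boxes are (x, y, w, h); constant velocity is assumed.

def predict_box(prev, velocity):
    x, y, w, h = prev
    dx, dy = velocity
    return (x + dx, y + dy, w, h)

def iou(a, b):
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    ix = max(0, min(ax2, bx2) - max(ax1, bx1))
    iy = max(0, min(ay2, by2) - max(ay1, by1))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def filter_detection(detected, prev, velocity, iou_thresh=0.3):
    """If the new detection disagrees with the tracked prediction
    (possible computation error), fall back to the prediction."""
    predicted = predict_box(prev, velocity)
    return detected if iou(detected, predicted) >= iou_thresh else predicted

prev, vel = (100, 100, 40, 40), (5, 0)
# A corrupted output (e.g. from a bit-flip) jumps across the frame:
assert filter_detection((400, 300, 40, 40), prev, vel) == (105, 100, 40, 40)
# A plausible detection near the prediction is kept unchanged:
assert filter_detection((106, 101, 40, 40), prev, vel) == (106, 101, 40, 40)
```

The same mechanism leaves genuinely fast-moving or newly appearing objects to be handled by the tracker's lifecycle logic, which this sketch omits.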

    TEMET: Truncated REconfigurable Multiplier with Error Tuning

    Approximate computing is a well-established technique to mitigate power consumption in error-tolerant domains such as image processing and machine learning. When paired with reconfigurable hardware, it enables dynamic adaptability to each specific task with improved power-accuracy trade-offs. In this work, we present a design methodology to enhance the energy and error metrics of a signed multiplier. This novel approach reduces the approximation error by leveraging a statistic-based truncation strategy. Our multiplier features 256 dynamically configurable approximation levels and run-time selection of the result precision. Our technique improves the mean relative error by up to 34% compared to the zero truncation mechanism. Compared with an exact design, we achieve a maximum of 60.1% power saving for a PSNR of 10.3 dB on a 5×5 Sobel filter. Moreover, we reduce the computation energy of LeNet by 31.5%, retaining 89.4% of the original accuracy on FashionMNIST.
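The basic accuracy-for-energy trade of a truncated multiplier can be modeled functionally. This toy model is not TEMET's statistic-based design: it simply zeroes the t least significant bits of each unsigned operand before multiplying, which is the baseline "zero truncation" the abstract compares against, with t playing the role of the run-time approximation knob:

```python
# Toy functional model (hypothetical) of a truncated multiplier: drop the
# t LSBs of each operand before multiplying, trading accuracy for (in
# hardware) a smaller partial-product array.

def truncated_mul(a, b, t):
    """Approximate product of two unsigned ints with t LSBs zeroed."""
    a_t = (a >> t) << t
    b_t = (b >> t) << t
    return a_t * b_t

def mean_relative_error(t, width=8):
    """Exhaustive mean relative error over all nonzero operand pairs."""
    errs = []
    for a in range(1, 2 ** width):
        for b in range(1, 2 ** width):
            exact = a * b
            errs.append(abs(exact - truncated_mul(a, b, t)) / exact)
    return sum(errs) / len(errs)

# Error grows with the truncation level, the run-time precision knob.
assert mean_relative_error(2) < mean_relative_error(4)
assert truncated_mul(12, 10, 0) == 120  # t = 0 is exact
```

A statistic-based strategy like the one described would replace the plain zeroing with a correction informed by the operand distribution, shrinking the mean relative error at the same truncation level.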

    NLCMAP: A Framework for the Efficient Mapping of Non-Linear Convolutional Neural Networks on FPGA Accelerators

    This paper introduces NLCMap, a framework for mapping-space exploration targeting Non-Linear Convolutional Networks (NLCNs). NLCNs [1] are a novel neural network model that improves performance in certain computer vision applications by introducing a non-linearity in the weights computation. NLCNs are more challenging to map efficiently onto hardware accelerators than traditional Convolutional Neural Networks (CNNs), due to data dependencies and additional computations. To this end, we propose NLCMap, a framework that, given an NLC layer and a generic hardware accelerator with a certain on-chip memory budget, finds the optimal mapping that minimizes the accesses to the off-chip memory, which are often the critical factor in CNN acceleration.
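The kind of search NLCMap performs can be illustrated with a much simpler stand-in: pick the output tile size for a plain conv layer that fits an on-chip buffer budget while minimizing modeled off-chip accesses. The cost model, layer dimensions, and budget below are hypothetical, and the NLC-specific data dependencies are ignored:

```python
# Illustrative tile-size search (hypothetical, far simpler than NLCMap):
# choose the output tile that fits the on-chip buffer and minimizes
# off-chip accesses for an H x W x C input, K filters of size R x R.

def offchip_accesses(H, W, C, K, R, th, tw):
    # Each (th x tw) output tile reads an input tile with an (R-1) halo,
    # re-reads all weights, and writes its output slice.
    tiles = ((H + th - 1) // th) * ((W + tw - 1) // tw)
    input_tile = (th + R - 1) * (tw + R - 1) * C
    weights = R * R * C * K
    output_tile = th * tw * K
    return tiles * (input_tile + weights + output_tile)

def best_tile(H, W, C, K, R, buffer_budget):
    best = None
    for th in range(1, H + 1):
        for tw in range(1, W + 1):
            need = ((th + R - 1) * (tw + R - 1) * C
                    + R * R * C * K + th * tw * K)
            if need > buffer_budget:
                continue  # tile working set does not fit on-chip
            cost = offchip_accesses(H, W, C, K, R, th, tw)
            if best is None or cost < best[0]:
                best = (cost, th, tw)
    return best

cost, th, tw = best_tile(H=56, W=56, C=64, K=64, R=3,
                         buffer_budget=300_000)
# The chosen tile's working set respects the buffer budget.
assert (th + 2) * (tw + 2) * 64 + 3 * 3 * 64 * 64 + th * tw * 64 <= 300_000
```

An NLC layer would add terms for the extra weight-computation traffic and constrain the legal tile orderings, which is precisely what makes its mapping space harder to explore.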

    AnaCoNGA: Analytical HW-CNN Co-Design Using Nested Genetic Algorithms

    We present AnaCoNGA, an analytical co-design methodology, which enables two genetic algorithms to evaluate the fitness of design decisions on layer-wise quantization of a neural network and hardware (HW) resource allocation. We embed a hardware architecture search (HAS) algorithm into a quantization strategy search (QSS) algorithm to evaluate the hardware design Pareto-front of each considered quantization strategy. We harness the speed and flexibility of analytical HW-modeling to enable parallel HW-CNN co-design. With this approach, the QSS is focused on seeking high-accuracy quantization strategies which are guaranteed to have efficient hardware designs at the end of the search. Through AnaCoNGA, we improve the accuracy by 2.88 p.p. with respect to a uniform 2-bit ResNet20 on CIFAR-10, and achieve a 35% and 37% improvement in latency and DRAM accesses, while reducing LUT and BRAM resources by 9% and 59% respectively, when compared to a standard edge variant of the accelerator. The nested genetic algorithm formulation also reduces the search time by 51% compared to an equivalent, sequential co-design formulation.
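The nesting structure can be shown schematically. The sketch below is a hypothetical, drastically simplified stand-in, using exhaustive loops instead of genetic algorithms and toy accuracy/latency models, but it preserves the key property: the outer quantization search only keeps strategies for which the inner hardware search found an efficient design:

```python
# Schematic nesting (hypothetical, greatly simplified from AnaCoNGA):
# an outer search over quantization strategies calls an inner search
# over hardware configurations, so every surviving strategy is scored
# with its own best-found hardware design.
import itertools

def analytical_latency(bits, pe_count):
    # Toy analytical HW model: latency ~ total bit-serial work / PEs.
    return sum(b * b for b in bits) / pe_count

def inner_has(bits, pe_options=(16, 32, 64)):
    """Inner hardware-architecture search: best latency for a strategy."""
    return min(analytical_latency(bits, pe) for pe in pe_options)

def accuracy_proxy(bits):
    # Toy proxy: more bits per layer, higher (saturating) accuracy.
    return 1.0 - sum(1.0 / (2 ** b) for b in bits) / len(bits)

def outer_qss(n_layers=3, bit_options=(2, 4, 8), latency_budget=2.0):
    best = None
    for bits in itertools.product(bit_options, repeat=n_layers):
        if inner_has(bits) > latency_budget:
            continue  # no efficient HW design exists: discard strategy
        acc = accuracy_proxy(bits)
        if best is None or acc > best[0]:
            best = (acc, bits)
    return best

acc, bits = outer_qss()
assert inner_has(bits) <= 2.0  # the winner is HW-feasible by design
```

Because the inner model is analytical rather than a hardware-in-the-loop measurement, the outer search can evaluate many candidates in parallel, which is where the reported search-time reduction comes from.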