
    Extended Bit-Plane Compression for Convolutional Neural Network Accelerators

    After the tremendous success of convolutional neural networks in image classification, object detection, speech recognition, etc., there is now rising demand for deploying these compute-intensive ML models on tightly power-constrained embedded and mobile systems at low cost, as well as for pushing throughput in data centers. This has triggered a wave of research into specialized hardware accelerators. Their performance is often constrained by I/O bandwidth, and their energy consumption is dominated by I/O transfers to off-chip memory. We introduce and evaluate a novel, hardware-friendly compression scheme for the feature maps present within convolutional neural networks. We show that an average compression ratio of 4.4x relative to uncompressed data, and a gain of 60% over existing methods, can be achieved for ResNet-34 with a compression block requiring <300 bits of sequential cells and minimal combinational logic.
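    As a rough illustration of the idea behind bit-plane compression (a sketch of the general technique, not the paper's exact algorithm; the helper names are my own), the Python snippet below decomposes 8-bit activation words into bit-planes and run-length encodes each plane. Post-ReLU feature maps are sparse and low-magnitude, so the upper planes are nearly all zero and collapse into very few runs.

```python
import numpy as np

def bit_planes(words: np.ndarray) -> np.ndarray:
    """Decompose an array of 8-bit words into 8 bit-planes.

    Returns an (8, N) array where row k holds bit k of every word.
    """
    words = words.astype(np.uint8).ravel()
    return (words[None, :] >> np.arange(8)[:, None]) & 1

def zero_rle(bits: np.ndarray) -> list:
    """Run-length encode a flat 0/1 bit stream as (value, run_length) pairs."""
    runs = []
    change = np.flatnonzero(np.diff(bits)) + 1  # positions where the bit flips
    for start, end in zip(np.r_[0, change], np.r_[change, bits.size]):
        runs.append((int(bits[start]), int(end - start)))
    return runs

# Sparse, low-magnitude activations (typical after ReLU) make the upper
# bit-planes almost all zero, which is what the compressor exploits.
fmap = np.random.default_rng(0).poisson(1.5, size=1024).clip(0, 255)
for k, plane in enumerate(bit_planes(fmap)):
    print(f"plane {k}: {len(zero_rle(plane))} runs")
```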

    Slimmable Encoders for Flexible Split DNNs in Bandwidth and Resource Constrained IoT Systems

    The execution of large deep neural networks (DNNs) on mobile edge devices requires considerable consumption of critical resources, such as energy, while imposing demands on hardware capabilities. In approaches based on edge computing, execution of the models is offloaded to a compute-capable device positioned at the edge of 5G infrastructures. The main issue with the latter class of approaches is the need to transport information-rich signals over wireless links with limited and time-varying capacity. The recent split computing paradigm attempts to resolve this impasse by distributing the execution of DNN models across the layers of the system to reduce the amount of data to be transmitted while imposing minimal computing load on mobile devices. In this context, we propose a novel split computing approach based on slimmable ensemble encoders. The key advantage of our design is the ability to adapt the computational load and transmitted data size in real time with minimal overhead and delay. This is in contrast with existing approaches, where the same adaptation requires costly context switching and model loading. Moreover, our model outperforms existing solutions in terms of compression efficacy and execution time, especially in the context of weak mobile devices. We present a comprehensive comparison with the most advanced split computing solutions, as well as an experimental evaluation on GPU-less devices.
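    To make the slimmable idea concrete, here is a minimal PyTorch sketch (an illustrative assumption, not the authors' architecture): a single convolution stores one full-width weight tensor, and at run time slicing off the first fraction of output channels yields a cheaper encoder without any context switch or model reload.

```python
import torch
import torch.nn as nn

class SlimmableConv(nn.Module):
    """Conv layer that can run at a fraction of its full channel width.

    A single weight tensor is stored; slicing the first k output channels
    at inference yields a cheaper sub-network (the slimmable idea).
    """
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, 3, 3) * 0.02)

    def forward(self, x, width_mult: float = 1.0):
        k_out = max(1, int(self.weight.shape[0] * width_mult))
        w = self.weight[:k_out]          # take the first k_out filters
        return nn.functional.conv2d(x, w, padding=1)

enc = SlimmableConv(3, 64)
x = torch.randn(1, 3, 224, 224)
for wm in (0.25, 0.5, 1.0):              # adapt compute/bandwidth at run time
    z = enc(x, width_mult=wm)            # no model reload, just a slice
    print(wm, tuple(z.shape))
```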

    Adversarial Attacks and Defenses in Machine Learning-Powered Networks: A Contemporary Survey

    Adversarial attacks and defenses in machine learning and deep neural networks have been gaining significant attention due to the rapidly growing applications of deep learning on the Internet and in related scenarios. This survey provides a comprehensive overview of recent advancements in the field of adversarial attack and defense techniques, with a focus on deep neural network-based classification models. Specifically, we conduct a comprehensive classification of recent adversarial attack methods and state-of-the-art adversarial defense techniques based on attack principles, and present them in visually appealing tables and tree diagrams. This is based on a rigorous evaluation of the existing works, including an analysis of their strengths and limitations. We also categorize the methods into counter-attack detection and robustness enhancement, with a specific focus on regularization-based methods for enhancing robustness. New avenues of attack are also explored, including search-based, decision-based, drop-based, and physical-world attacks, and a hierarchical classification of the latest defense methods is provided, highlighting the challenges of balancing training costs with performance, maintaining clean accuracy, overcoming the effect of gradient masking, and ensuring method transferability. Finally, the lessons learned and open challenges are summarized, with future research opportunities recommended. (Comment: 46 pages, 21 figures)
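    Among the gradient-based attacks a survey like this covers, the Fast Gradient Sign Method (FGSM) is the canonical one-step example. The PyTorch sketch below (with a toy model purely for runnability) perturbs an input in the direction of the sign of the loss gradient, bounded by an L-infinity budget eps.

```python
import torch
import torch.nn as nn

def fgsm(model, x, y, eps: float = 0.03):
    """Fast Gradient Sign Method: a one-step, gradient-based evasion attack.

    Perturbs the input in the direction that maximally increases the loss,
    bounded by an L-infinity budget eps.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0, 1).detach()      # keep pixels in valid range

# toy classifier, just to make the sketch self-contained and runnable
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x = torch.rand(4, 1, 28, 28)
y = torch.randint(0, 10, (4,))
x_adv = fgsm(model, x, y)
print((x_adv - x).abs().max())             # perturbation stays within eps
```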

    EBPC: Extended Bit-Plane Compression for Deep Neural Network Inference and Training Accelerators

    In the wake of the success of convolutional neural networks in image classification, object recognition, speech recognition, etc., the demand for deploying these compute-intensive ML models on embedded and mobile systems with tight power and energy constraints at low cost, as well as for boosting throughput in data centers, is growing rapidly. This has sparked a surge of research into specialized hardware accelerators. Their performance is typically limited by I/O bandwidth, power consumption is dominated by I/O transfers to off-chip memory, and on-chip memories occupy a large part of the silicon area. We introduce and evaluate a novel, hardware-friendly, and lossless compression scheme for the feature maps present within convolutional neural networks. We present hardware architectures and synthesis results for the compressor and decompressor in 65 nm. With a throughput of one 8-bit word/cycle at 600 MHz, they fit into 2.8 kGE and 3.0 kGE of silicon area, respectively - together the size of less than seven 8-bit multiply-add units at the same throughput. We show that average compression ratios of 5.1x for AlexNet, 4x for VGG-16, 2.4x for ResNet-34, and 2.2x for MobileNetV2 can be achieved - a gain of 45-70% over existing methods. Our approach also works effectively for various number formats, has a low frame-to-frame variance in the compression ratio, and achieves compression factors for gradient map compression during training that are even better than for inference.
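    A zero run-length stage that collapses whole-zero words is one natural companion to bit-plane coding for feature maps. The toy Python sketch below (token format and one-byte-per-token accounting are my own illustrative choices, not the paper's bitstream) shows why such a stage alone already compresses post-ReLU activations well.

```python
import numpy as np

def zero_word_rle(words):
    """Encode a word stream as LIT tokens plus (ZERO_RUN, length) tokens.

    Whole-zero words dominate post-ReLU feature maps, so collapsing runs
    of them into a single counter captures much of the redundancy.
    """
    tokens, i = [], 0
    while i < len(words):
        if words[i] == 0:
            j = i
            while j < len(words) and words[j] == 0:
                j += 1
            tokens.append(("ZERO_RUN", j - i))   # one token per zero run
            i = j
        else:
            tokens.append(("LIT", int(words[i])))
            i += 1
    return tokens

rng = np.random.default_rng(0)
# ~70% zeros, small magnitudes: a caricature of a post-ReLU feature map
fmap = rng.integers(1, 16, 4096) * (rng.random(4096) < 0.3)
tokens = zero_word_rle(fmap)
bits_in, bits_out = fmap.size * 8, len(tokens) * 8  # toy: one byte per token
print(f"{len(tokens)} tokens, ratio ~ {bits_in / bits_out:.1f}x")
```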

    Goal-directed cross-system interactions in brain and deep learning networks

    Deep neural networks (DNNs) have recently emerged as promising models of the mammalian ventral visual stream. However, how the ventral stream adapts to various goal-directed influences and coordinates with higher-level brain regions during learning remains poorly understood. By incorporating top-down influences involving attentional cues, linguistic labels, and novel category learning into DNN models, the thesis offers an explanation, via a theoretical modelling approach, for how the tasks we do shape representations across levels in models and in related brain regions, including the ventral visual stream, the hippocampus (HPC), and the ventromedial prefrontal cortex (vmPFC). The thesis includes three main contributions. In the first contribution, I developed a goal-directed attention mechanism that extends a general-purpose DNN with the ability to reconfigure itself to better suit the current task goal, much as the PFC modulates activity along the ventral stream. In the second contribution, I uncovered how linguistic labelling shapes semantic representation by amending an existing DNN to predict both the meaning and the categorical label of an object. Supported by simulation results involving fine-grained and coarse-grained labels, I concluded that differences in label use, whether across languages or levels of expertise, manifest as differences in the semantic representations that support label discrimination. In the third contribution, I aimed to better understand cross-brain mechanisms in a novel learning task by combining the insights on labelling and attention obtained from the preceding efforts. Integrating a DNN with a novel clustering model built on SUSTAIN, the proposed account captures human category learning behaviour and the underlying neural mechanisms across multiple interacting brain areas, involving the HPC, the vmPFC, and the ventral visual stream. By extending models of the ventral stream to incorporate goal-directed cross-system coordination, I hope the thesis can inform understanding of the neurobiology supporting object recognition and category learning, which in turn can help us advance the design of deep learning models.
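    As a loose, hypothetical sketch of what a goal-directed attention mechanism of this kind can look like (the class name GoalGate and the sigmoid-gain design are my illustrative choices, not the thesis's model), a goal vector can set per-channel multiplicative gains on a feature extractor, analogous to the PFC modulating gain along the ventral stream:

```python
import torch
import torch.nn as nn

class GoalGate(nn.Module):
    """Illustrative goal-directed attention: a goal vector sets per-channel
    multiplicative gains on feature maps, loosely analogous to top-down
    gain modulation of the ventral stream. A minimal sketch, not the
    thesis's exact model.
    """
    def __init__(self, goal_dim: int, n_channels: int):
        super().__init__()
        self.to_gain = nn.Linear(goal_dim, n_channels)

    def forward(self, features, goal):
        # features: (batch, channels, H, W); goal: (batch, goal_dim)
        gain = torch.sigmoid(self.to_gain(goal))   # per-channel gains in (0, 1)
        return features * gain[:, :, None, None]   # reweight channels per goal

gate = GoalGate(goal_dim=16, n_channels=64)
feats = torch.randn(2, 64, 14, 14)
goal = torch.randn(2, 16)
print(gate(feats, goal).shape)   # torch.Size([2, 64, 14, 14])
```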