4 research outputs found

    Sculpting Efficiency: Pruning Medical Imaging Models for On-Device Inference

    Full text link
    Applying ML advancements to healthcare can improve patient outcomes. However, the sheer operational complexity of ML models, combined with legacy hardware and multi-modal gigapixel images, poses a severe deployment limitation for real-time, on-device inference. We consider filter pruning as a solution, exploring segmentation models in cardiology and ophthalmology. Our preliminary results show a compression rate of up to 1148x with minimal loss in quality, stressing the need to consider task complexity and architectural details when using off-the-shelf models. At high compression rates, filter-pruned models exhibit faster inference on a CPU than the GPU baseline. We also demonstrate that such models' robustness and generalisability characteristics exceed that of the baseline and weight-pruned counterparts. We uncover intriguing questions and take a step towards realising cost-effective disease diagnosis, monitoring, and preventive solutions

    HASS:Hardware-aware sparsity search for dataflow DNN accelerator

    Get PDF
    Deep Neural Networks (DNNs) excel in learning hierarchical representations from raw data, such as images, audio, and text. To compute these DNN models with high performance and energy efficiency, these models are usually deployed onto customized hardware accelerators. Among various accelerator designs, dataflow architecture has shown promising performance due to its layer-pipelined structure and its scalability in data parallelism.Exploiting weights and activations sparsity can further enhance memory storage and computation efficiency. However, existing approaches focus on exploiting sparsity in non-dataflow accelerators, which cannot be applied onto dataflow accelerators because of the large hardware design space introduced. As such, this could miss opportunities to find an optimal combination of sparsity features and hardware designs.In this paper, we propose a novel approach to exploit unstructured weights and activations sparsity for dataflow accelerators, using software and hardware co-optimization. We propose a Hardware-Aware Sparsity Search (HASS) to systematically determine an efficient sparsity solution for dataflow accelerators. Over a set of models, we achieve an efficiency improvement ranging from 1.3× to 4.2× compared to existing sparse designs, which are either non-dataflow or non-hardware-aware. Particularly, the throughput of MobileNetV3 can be optimized to 4895 images per second. HASS is open-source: https://github.com/Yu-Zhewen/HAS

    Exploring the Relative Value of Collaborative Optimisation Pathways (Student Abstract)

    No full text
    Compression techniques in machine learning (ML) independently improve a model’s inference efficiency by reducing its memory footprint while aiming to maintain its quality. This paper lays groundwork in questioning the merit of a compression pipeline involving all techniques as opposed to skipping a few by considering a case study on a keyword spotting model: DS-CNN-S. In addition, it documents improvements to the model’s training and dataset infrastructure. For this model, preliminary findings suggest that a full-scale pipeline isn’t required to achieve a competent memory footprint and accuracy, but a more comprehensive study is required
    corecore