Search CORE

192 research outputs found

Vibration-based testing of bolted joints

Authors: Fidlin Alexander; Sah Si Mohamed; Tcherniak Dmitri; Thomsen Jon Juel
Publication venue: CongressLine Ltd.
Publication date: 01/01/2017

DeepliteRT: Computer Vision at the Edge

Authors: Ashfaq Saad; AskariHemmat MohammadHossein; Hoffman Alexander; Mitra Saptarshi; Saboori Ehsan; Sah Sudhakar
Publication date: 19/09/2023

The proliferation of edge devices has unlocked unprecedented opportunities for deep learning model deployment in computer vision applications. However, these complex models require considerable power, memory and compute resources that are typically not available on edge platforms. Ultra low-bit quantization presents an attractive solution to this problem by scaling down the model weights and activations from 32-bit to less than 8-bit. We implement highly optimized ultra low-bit convolution operators for ARM-based targets that outperform existing methods by up to 4.34x. Our operator is implemented within Deeplite Runtime (DeepliteRT), an end-to-end solution for the compilation, tuning, and inference of ultra low-bit models on ARM devices. Compiler passes in DeepliteRT automatically convert a fake-quantized model in full precision to a compact ultra low-bit representation, easing the process of quantized model deployment on commodity hardware. We analyze the performance of DeepliteRT on classification and detection models against optimized 32-bit floating-point, 8-bit integer, and 2-bit baselines, achieving significant speedups of up to 2.20x, 2.33x and 2.17x, respectively.
Comment: Accepted at British Machine Vision Conference (BMVC) 202
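To make "fake-quantized model to compact ultra low-bit representation" concrete, here is a minimal sketch under stated assumptions: unsigned 2-bit codes, a single per-tensor scale with zero-point 0, and a generic bit-serial dot product (bit planes plus popcount). The helper names `fake_quant_to_codes`, `bit_planes` and `bitserial_dot` are hypothetical; this is not DeepliteRT's compiler pass or its ARM kernel.

```python
def fake_quant_to_codes(values, scale, bits=2):
    """Recover integer codes from fake-quantized floats (assumed value = code * scale)."""
    qmax = (1 << bits) - 1
    # Assumption: unsigned codes, zero-point 0, so clip to [0, 2^bits - 1].
    return [min(max(round(v / scale), 0), qmax) for v in values]


def bit_planes(codes, bits):
    """Pack bit position b of every code into one integer bitmask per plane."""
    planes = []
    for b in range(bits):
        mask = 0
        for k, c in enumerate(codes):
            if (c >> b) & 1:
                mask |= 1 << k
        planes.append(mask)
    return planes


def bitserial_dot(a_codes, w_codes, bits=2):
    """Dot product as sum over plane pairs of popcount(A_i & W_j) * 2^(i + j)."""
    assert len(a_codes) == len(w_codes)
    a_planes = bit_planes(a_codes, bits)
    w_planes = bit_planes(w_codes, bits)
    total = 0
    for i, ap in enumerate(a_planes):
        for j, wp in enumerate(w_planes):
            # AND finds positions where both bits are set; popcount counts them.
            total += bin(ap & wp).count("1") << (i + j)
    return total


codes = fake_quant_to_codes([0.75, 0.25, 0.5, 0.0], scale=0.25)
print(codes, bitserial_dot(codes, [1, 2, 3, 3]))
```

The design point the sketch illustrates is that, once values are stored as bit planes, a multiply-accumulate over many elements becomes a few AND and popcount operations per plane pair, which is why sub-byte formats can beat 8-bit arithmetic on hardware with fast bitwise instructions.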

arXiv.org e-Print Archive

Accelerating Deep Learning Model Inference on Arm CPUs with Ultra-Low Bit Quantization and Runtime

Authors: Ashfaq Saad; AskariHemmat MohammadHossein; Hoffman Alexander; Mastropietro Olivier; Saboori Ehsan; Sah Sudhakar
Publication date: 18/07/2022

Deep Learning has been one of the most disruptive technological advancements in recent times. The high performance of deep learning models comes at the expense of high computational, storage and power requirements. Sensing the immediate need for accelerating and compressing these models to improve on-device performance, we introduce Deeplite Neutrino for production-ready optimization of the models and Deeplite Runtime for deployment of ultra-low bit quantized models on Arm-based platforms. We implement low-level quantization kernels for Armv7 and Armv8 architectures enabling deployment on the vast array of 32-bit and 64-bit Arm-based devices. With efficient implementations using vectorization, parallelization, and tiling, we realize speedups of up to 2x and 2.2x compared to TensorFlow Lite with XNNPACK backend on classification and detection models, respectively. We also achieve significant speedups of up to 5x and 3.2x compared to ONNX Runtime for classification and detection models, respectively.
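Tiling is one of the optimizations the abstract names. As a hedged illustration of the loop structure only (the tile size and the function name `tiled_matmul` are assumptions, and Deeplite's actual kernels are vectorized native code for Armv7/Armv8, not Python), the product below is computed block by block. In a compiled kernel, working on one tile at a time keeps the touched parts of A, B and C in cache; in pure Python it only shows how the loops are reorganized.

```python
def tiled_matmul(A, B, tile=2):
    """C = A @ B computed tile by tile (illustrative loop tiling, not an optimized kernel)."""
    n, k = len(A), len(A[0])
    m = len(B[0])
    assert len(B) == k, "inner dimensions must match"
    C = [[0] * m for _ in range(n)]
    # Outer loops walk over tiles of the output (i0, j0) and the reduction axis (k0).
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                # Inner loops stay inside one tile; min() handles ragged edges.
                for i in range(i0, min(i0 + tile, n)):
                    for kk in range(k0, min(k0 + tile, k)):
                        a = A[i][kk]
                        for j in range(j0, min(j0 + tile, m)):
                            C[i][j] += a * B[kk][j]
    return C


print(tiled_matmul([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]]))
```

Because the tile loops only reorder the same multiply-adds, the result is independent of the tile size; choosing that size to match cache and register capacity is where the speedup comes from in native code.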

arXiv.org e-Print Archive

Accelerated Proximal Iterative re-Weighted $\ell_1$ Alternating Minimization for Image Deblurring

Authors: Adam Tarmizi; Hassan Mohd Fikree; Malyshev Alexander; Mohamed Nur Syarafina; Salam Md Sah Hj
Publication date: 10/09/2023

The quadratic penalty alternating minimization (AM) method is widely used for solving the convex $\ell_1$ total variation (TV) image deblurring problem. However, quadratic penalty AM for solving the nonconvex nonsmooth $\ell_p$ ($0 < p < 1$) TV image deblurring problem is less studied. In this paper, we propose two algorithms, namely proximal iterative re-weighted $\ell_1$ AM (PIRL1-AM) and its accelerated version, accelerated proximal iterative re-weighted $\ell_1$ AM (APIRL1-AM), for solving the nonconvex nonsmooth $\ell_p$ TV image deblurring problem. The proposed algorithms are derived from the proximal iterative re-weighted $\ell_1$ (IRL1) algorithm and the proximal gradient algorithm. Numerical results show that PIRL1-AM is effective in retaining sharp edges in image deblurring, while APIRL1-AM further speeds up convergence in terms of both the number of algorithm iterations and computational time.
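The core IRL1 idea can be shown on a much simpler problem. For $0 < p < 1$, the concave penalty $|t|^p$ is majorized at the current iterate $x^k$ by a weighted $\ell_1$ term with weights $w_i = p\,(|x_i^k| + \varepsilon)^{p-1}$, where the smoothing constant $\varepsilon > 0$ is an assumption added to avoid division by zero. On the toy separable problem $\min_x \tfrac12\|x - y\|_2^2 + \lambda \sum_i |x_i|^p$ each re-weighted subproblem is solved exactly by soft-thresholding with per-entry thresholds $\lambda w_i$. The sketch below (hypothetical names `soft_threshold`, `irl1_denoise`) omits the blur operator, the TV gradient and the AM splitting, so it is not PIRL1-AM or APIRL1-AM, only the re-weighting mechanism they build on.

```python
import math


def soft_threshold(v, t):
    """Proximal operator of t*|.|: shrink v toward zero by t (exact for scalars)."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def irl1_denoise(y, lam, p=0.5, eps=1e-3, iters=50):
    """Toy IRL1 for min_x 0.5*||x - y||^2 + lam * sum_i |x_i|^p with 0 < p < 1.

    Hypothetical illustration: no blur operator, no TV, no alternating minimization.
    """
    x = list(y)  # initialise at the observation
    for _ in range(iters):
        # Re-weighting step: linearise |t|^p at the current iterate (eps is assumed).
        w = [p * (abs(xi) + eps) ** (p - 1) for xi in x]
        # The weighted-l1 subproblem is separable, so soft-thresholding solves it exactly.
        x = [soft_threshold(yi, lam * wi) for yi, wi in zip(y, w)]
    return x


print(irl1_denoise([3.0, 0.1, -2.0], lam=0.5))
```

Small entries receive large weights and are driven to zero, while large entries receive small weights and are shrunk only slightly; this reduced bias on large values is what lets nonconvex $\ell_p$ penalties preserve sharp edges better than plain $\ell_1$.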

arXiv.org e-Print Archive

DeepGEMM: Accelerated Ultra Low-Precision Inference on CPU Architectures using Lookup Tables

Authors: Ashfaq Saad; AskariHemmat MohammadHossein; Ganji Darshan C.; Hassanien Ahmed; Hoffman Alexander; Léonardon Mathieu; Mitra Saptarshi; Saboori Ehsan; Sah Sudhakar
Publication date: 18/04/2023

Much recent progress has been made in ultra low-bit quantization, promising significant improvements in latency, memory footprint and energy consumption on edge devices. Quantization methods such as Learned Step Size Quantization can achieve model accuracy that is comparable to full-precision floating-point baselines even with sub-byte quantization. However, it is extremely challenging to deploy these ultra low-bit quantized models on mainstream CPU devices because commodity SIMD (Single Instruction, Multiple Data) hardware typically supports no less than 8-bit precision. To overcome this limitation, we propose DeepGEMM, a lookup table based approach for the execution of ultra low-precision convolutional neural networks on SIMD hardware. The proposed method precomputes all possible products of weights and activations, stores them in a lookup table, and efficiently accesses them at inference time to avoid costly multiply-accumulate operations. Our 2-bit implementation outperforms corresponding 8-bit integer kernels in the QNNPACK framework by up to 1.74x on x86 platforms.
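The lookup-table idea can be sketched in scalar form: with 2-bit weights and 2-bit activations there are only 4 × 4 = 16 possible products, so they can be precomputed once and a dot product becomes a sequence of table lookups and additions. The value levels below (signed weights in {-2, -1, 0, 1}, unsigned activations in {0, 1, 2, 3}) and the names `build_lut`, `lut_dot` are assumptions for illustration; DeepGEMM's actual encoding and SIMD lookup scheme are not described in the abstract and are not reproduced here.

```python
from itertools import product

W_LEVELS = [-2, -1, 0, 1]  # assumed values of the four signed 2-bit weight codes
A_LEVELS = [0, 1, 2, 3]    # assumed values of the four unsigned 2-bit activation codes


def build_lut(w_levels, a_levels):
    """Precompute every weight x activation product; index = w_code * len(a_levels) + a_code."""
    return [w * a for w, a in product(w_levels, a_levels)]


def lut_dot(w_codes, a_codes, lut, n_a=len(A_LEVELS)):
    """Dot product using only table lookups and additions (no multiplications)."""
    return sum(lut[w * n_a + a] for w, a in zip(w_codes, a_codes))


def reference_dot(w_codes, a_codes):
    """Plain multiply-accumulate on the decoded values, for checking."""
    return sum(W_LEVELS[w] * A_LEVELS[a] for w, a in zip(w_codes, a_codes))


LUT = build_lut(W_LEVELS, A_LEVELS)
print(lut_dot([0, 3, 1, 2], [3, 2, 0, 1], LUT), reference_dot([0, 3, 1, 2], [3, 2, 0, 1]))
```

With these assumed levels every table entry lies in [-6, 3], so the 16 entries fit as 8-bit integers in a single 128-bit register, which is what makes SIMD table lookups attractive for this approach; the scalar sketch does not model that vectorization.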

arXiv.org e-Print Archive

Conditional deletion of LRRC8A in the brain reduces stroke damage independently of swelling-activated glutamate release

Authors: Balkaya Mustafa; Chen Sophie; Dohare Preeti; Fidaleo Antonio M; Mongin Alexander A; Nalwalk Julia W; Sah Rajan; Schober Alexandra L
Publication venue: Digital Commons@Becker
Publication date: 19/05/2023

The ubiquitous volume-regulated anion channels (VRACs) facilitate cell volume control and contribute to many other physiological processes. Treatment with non-specific VRAC blockers or brain-specific deletion of the essential VRAC subunit LRRC8A is highly protective in rodent models of stroke. Here, we tested the widely accepted idea that the harmful effects of VRACs are mediated by release of the excitatory neurotransmitter glutamate. We produced conditional LRRC8A knockout either exclusively in astrocytes or in the majority of brain cells. Genetically modified mice were subjected to an experimental stroke (middle cerebral artery occlusion). The astrocytic LRRC8A knockout yielded no protection. Conversely, the brain-wide LRRC8A deletion strongly reduced cerebral infarction in both heterozygous (Het) and full KO mice. Yet, despite identical protection, Het mice had full swelling-activated glutamate release, whereas KO animals showed its virtual absence. These findings suggest that LRRC8A contributes to ischemic brain injury via a mechanism other than VRAC-mediated glutamate release.

Digital Commons@Becker

Estimating bolt tightness from measured vibrations: Influence of boundary nonlinearity

Authors: Brøns Marie; Fidlin Alexander; Sah Si Mohamed; Tcherniak Dmitri; Thomsen Jon Juel
Publication date: 01/01/2018

Online Research Database In Technology