261 research outputs found

    Squeeze-and-Excitation SqueezeNext: An Efficient DNN for Hardware Deployment

    Indiana University-Purdue University Indianapolis (IUPUI)
    Convolutional neural networks (CNNs) are used in autonomous vehicles and advanced driver assistance systems (ADAS), where they have achieved great success. Before CNNs, traditional machine learning algorithms supported driver assistance systems. Architectures such as MobileNet, SqueezeNext, and SqueezeNet are currently being actively explored; they improved on earlier CNN architectures and made them more suitable for real-time embedded systems. This thesis proposes an efficient and compact CNN to improve on the performance of existing CNN architectures. The intuition behind the proposed architecture is to replace convolution layers with a more sophisticated block module and to develop a compact architecture with competitive accuracy. The thesis further explores the bottleneck module and the SqueezeNext basic block structure. The state-of-the-art SqueezeNext baseline architecture is used as a foundation to recreate and propose a high-performance SqueezeNext architecture. The proposed architecture is trained from scratch on the CIFAR-10 dataset. All training and testing results are visualized with live loss and accuracy graphs. The focus of this thesis is an adaptable and flexible model for efficient CNN performance with a minimal tradeoff between model accuracy, size, and speed. With a model size of 0.595 MB, an accuracy of 92.60%, and a satisfactory training and validation speed of 9 seconds, this model can be deployed on a real-time autonomous system platform such as Bluebox 2.0 by NXP.

    USE-Net: Incorporating Squeeze-and-Excitation blocks into U-Net for prostate zonal segmentation of multi-institutional MRI datasets

    Prostate cancer is the most common malignant tumor in men, but prostate Magnetic Resonance Imaging (MRI) analysis remains challenging. Besides whole prostate gland segmentation, the capability to differentiate between the blurry boundary of the Central Gland (CG) and Peripheral Zone (PZ) can lead to differential diagnosis, since the frequency and severity of tumors differ in these regions. To tackle the prostate zonal segmentation task, we propose a novel Convolutional Neural Network (CNN), called USE-Net, which incorporates Squeeze-and-Excitation (SE) blocks into U-Net. Specifically, the SE blocks are added after every Encoder block (Enc USE-Net) or after every Encoder and Decoder block (Enc-Dec USE-Net). This study evaluates the generalization ability of CNN-based architectures on three T2-weighted MRI datasets, each consisting of a different number of patients and heterogeneous image characteristics, collected by different institutions. The following mixed scheme is used for training/testing: (i) training on either each individual dataset or multiple prostate MRI datasets and (ii) testing on all three datasets with all possible training/testing combinations. USE-Net is compared against three state-of-the-art CNN-based architectures (i.e., U-Net, pix2pix, and Mixed-Scale Dense Network), along with a semi-automatic continuous max-flow model. The results show that training on the union of the datasets generally outperforms training on each dataset separately, allowing for both intra- and cross-dataset generalization. Enc USE-Net shows good overall generalization under any training condition, while Enc-Dec USE-Net remarkably outperforms the other methods when trained on all datasets. These findings reveal that the SE blocks' adaptive feature recalibration provides excellent cross-dataset generalization when testing is performed on samples of the datasets used during training.
    Comment: 44 pages, 6 figures, Accepted to Neurocomputing, Co-first authors: Leonardo Rundo and Changhee Ha
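
    The feature recalibration performed by an SE block can be illustrated with a minimal sketch. It uses only the Python standard library and follows the generic squeeze-and-excitation recipe (global average pooling, a two-layer bottleneck, sigmoid gates). The function name `se_block`, the weight matrices `w1` and `w2`, and the omission of bias terms are assumptions made for illustration; the abstract does not give USE-Net's exact configuration.

```python
import math

def se_block(x, w1, w2):
    """Recalibrate a feature map x of shape [C][H][W].

    w1 has shape [C // r][C] and w2 has shape [C][C // r], where r is a
    hypothetical reduction ratio. Biases are omitted for brevity.
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    # Excitation, step 2: expand back to C channels and apply a sigmoid.
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[[g * v for v in row] for row in ch] for g, ch in zip(gates, x)]
    return scaled, gates

if __name__ == "__main__":
    # Two channels of size 2x2 with illustrative (untrained) weights.
    feature_map = [[[1.0, 3.0], [5.0, 7.0]], [[2.0, 2.0], [2.0, 2.0]]]
    scaled, gates = se_block(feature_map, [[1.0, 0.0]], [[1.0], [-1.0]])
    print(gates)
```

    In this toy run the first channel receives a gate above 0.5 and the second a gate below 0.5, so the block emphasizes one channel and suppresses the other without changing the spatial layout.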

    RepViT: Revisiting Mobile CNN From ViT Perspective

    Recently, lightweight Vision Transformers (ViTs) demonstrate superior performance and lower latency compared with lightweight Convolutional Neural Networks (CNNs) on resource-constrained mobile devices. This improvement is usually attributed to the multi-head self-attention module, which enables the model to learn global representations. However, the architectural disparities between lightweight ViTs and lightweight CNNs have not been adequately examined. In this study, we revisit the efficient design of lightweight CNNs and emphasize their potential for mobile devices. We incrementally enhance the mobile-friendliness of a standard lightweight CNN, specifically MobileNetV3, by integrating the efficient architectural choices of lightweight ViTs. This ends up with a new family of pure lightweight CNNs, namely RepViT. Extensive experiments show that RepViT outperforms existing state-of-the-art lightweight ViTs and exhibits favorable latency in various vision tasks. On ImageNet, RepViT achieves over 80% top-1 accuracy with nearly 1ms latency on an iPhone 12, which is the first time for a lightweight model, to the best of our knowledge. Our largest model, RepViT-M3, obtains 81.4% accuracy with only 1.3ms latency. The code and trained models are available at https://github.com/jameslahm/RepViT.
    Comment: 9 pages, 7 figure

    Modeling Fission Gas Release at the Mesoscale using Multiscale DenseNet Regression with Attention Mechanism and Inception Blocks

    Mesoscale simulations of fission gas release (FGR) in nuclear fuel provide a powerful tool for understanding how microstructure evolution impacts FGR, but they are computationally intensive. In this study, we present an alternate, data-driven approach, using deep learning to predict instantaneous FGR flux from 2D nuclear fuel microstructure images. Four convolutional neural network (CNN) architectures with multiscale regression are trained and evaluated on simulated FGR data generated using a hybrid phase field/cluster dynamics model. All four networks show high predictive power, with R^2 values above 98%. The best performing network combines a Convolutional Block Attention Module (CBAM) and InceptionNet mechanisms to provide superior accuracy (mean absolute percentage error of 4.4%), training stability, and robustness on very low instantaneous FGR flux values.
    Comment: Submitted to Journal of Nuclear Materials, 20 pages, 10 figures, 3 table
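
    The two metrics quoted in this abstract can be computed as in the sketch below, which uses the textbook definitions of the coefficient of determination and the mean absolute percentage error. The abstract does not state the exact formulas used, so these definitions, the function names, and the sample values are assumptions for illustration only.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Each error is divided by the true value, so targets close to zero
    inflate the result; targets equal to zero are not handled here.
    """
    ratios = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)

if __name__ == "__main__":
    # Made-up values, not data from the paper.
    truth = [2.0, 4.0]
    prediction = [1.0, 5.0]
    print(r_squared(truth, prediction), mape(truth, prediction))
```

    Because the percentage error divides by the true value, small targets dominate the average, which is one reason robustness on very low flux values is worth reporting separately.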

    SleepyWheels: An Ensemble Model for Drowsiness Detection leading to Accident Prevention

    Around 40 percent of highway driving accidents in India occur because the driver falls asleep behind the steering wheel. Research on detecting driver drowsiness is ongoing, but existing approaches suffer from the complexity and cost of their models. In this paper, SleepyWheels, a revolutionary method that uses a lightweight neural network in conjunction with facial landmark identification, is proposed to identify driver fatigue in real time. SleepyWheels is successful in a wide range of test scenarios, including missing facial features when the eyes or mouth are covered, drivers' varying skin tones, camera placements, and observational angles. It can work well when emulated on real-time systems. SleepyWheels uses EfficientNetV2 and a facial landmark detector to detect drowsiness. The model is trained on a specially created dataset on driver sleepiness and achieves an accuracy of 97 percent. The model is lightweight, so it can be further deployed as a mobile application on various platforms.
    Comment: 20 page

    Machine Learning for Microcontroller-Class Hardware -- A Review

    Advancements in machine learning have opened a new opportunity to bring intelligence to low-end Internet-of-Things nodes such as microcontrollers. Conventional machine learning deployment has a high memory and compute footprint, hindering direct deployment on ultra resource-constrained microcontrollers. This paper highlights the unique requirements of enabling onboard machine learning for microcontroller-class devices. Researchers use a specialized model development workflow for resource-limited applications to ensure that the compute and latency budget is within the device limits while still maintaining the desired performance. We characterize a closed-loop, widely applicable workflow of machine learning model development for microcontroller-class devices and show that several classes of applications adopt a specific instance of it. We present both qualitative and numerical insights into different stages of model development by showcasing several use cases. Finally, we identify the open research challenges and unsolved questions demanding careful consideration moving forward.
    Comment: Accepted for publication at IEEE Sensors Journa