    Cloud-based or On-device: An Empirical Study of Mobile Deep Inference

    Modern mobile applications benefit significantly from advances in deep learning, e.g., real-time image recognition and conversational systems. Given a trained deep learning model, an application performs a series of matrix operations on the input data in order to infer possible output values. Because of computational complexity and size constraints, these trained models are often hosted in the cloud, so mobile apps must send input data over the network to use them. While cloud-based deep learning can provide reasonable response times for mobile apps, it restricts the usage scenarios, e.g., mobile apps need network access. With mobile-specific deep learning optimizations, on-device inference is now possible. However, because mobile hardware, such as the GPU and memory, can be very limited compared to its desktop counterpart, it is important to understand the feasibility of this new on-device deep learning inference architecture. In this paper, we empirically evaluate the inference performance of three Convolutional Neural Networks (CNNs) using a benchmark Android application we developed. Our measurements and analysis suggest that on-device inference can cost up to two orders of magnitude more in response time and energy than cloud-based inference, and that loading the model and computing probabilities are the two performance bottlenecks of on-device deep inference.
    Comment: Accepted at the IEEE International Conference on Cloud Engineering (IC2E) 201
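
    To make the two bottlenecks concrete, here is a minimal timing sketch in desktop PyTorch, not the paper's Android benchmark app; SqueezeNet is a stand-in for the evaluated CNNs and the file name is illustrative.

        # Separate the two costs the study identifies: loading the model
        # and the forward pass that computes class probabilities.
        import time
        import torch
        import torchvision.models as models

        model = models.squeezenet1_1(weights=None)   # mobile-sized CNN stand-in
        torch.save(model.state_dict(), "model.pt")

        t0 = time.perf_counter()
        model.load_state_dict(torch.load("model.pt"))  # bottleneck 1: model loading
        model.eval()
        t1 = time.perf_counter()

        x = torch.randn(1, 3, 224, 224)              # dummy input image
        with torch.no_grad():
            probs = torch.softmax(model(x), dim=1)   # bottleneck 2: probabilities
        t2 = time.perf_counter()

        print(f"load: {t1 - t0:.3f}s  inference: {t2 - t1:.3f}s")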

    Caffe Barista: Brewing Caffe with FPGAs in the Training Loop

    As the complexity of deep learning (DL) models increases, their compute requirements increase accordingly. Deploying a Convolutional Neural Network (CNN) involves two phases: training and inference. Because inference typically takes place on resource-constrained devices, much research has explored low-power inference on custom hardware accelerators. Training, on the other hand, is both more compute- and memory-intensive and is primarily performed on power-hungry GPUs in large-scale data centres. CNN training on FPGAs is a nascent field of research, primarily due to the lack of tools for easily prototyping and deploying hardware and/or algorithmic techniques for power-efficient CNN training. This work presents Barista, an automated toolflow that provides seamless integration of FPGAs into the training of CNNs within the popular deep learning framework Caffe. To the best of our knowledge, this is the only tool that allows such versatile and rapid deployment of hardware and algorithms for the FPGA-based training of CNNs, providing the necessary infrastructure for further research and development.
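
    The core idea, dispatching a layer's computation to an accelerator inside the training loop, can be sketched as follows; the backend names and NumPy dispatch are hypothetical and only illustrate the concept, not Barista's actual Caffe/FPGA integration.

        # Toy training step whose convolution (here a 1x1 conv, i.e. a matrix
        # multiply) is routed to a pluggable backend, mimicking how a toolflow
        # can swap an FPGA kernel in for the CPU/GPU implementation.
        import numpy as np

        def conv_cpu(x, w):
            return w @ x          # reference implementation

        def conv_fpga(x, w):
            return w @ x          # placeholder for an accelerator call

        BACKEND = {"cpu": conv_cpu, "fpga": conv_fpga}

        def train_step(x, y, w, lr=0.01, backend="fpga"):
            conv = BACKEND[backend]
            out = conv(x, w)                    # forward pass on chosen backend
            grad_out = 2 * (out - y) / y.size   # gradient of mean-squared error
            grad_w = grad_out @ x.T             # backward pass: weight gradient
            return w - lr * grad_w

        w = np.random.randn(4, 8)
        x, y = np.random.randn(8, 16), np.random.randn(4, 16)
        w = train_step(x, y, w)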

    Neural Architecture Search as Program Transformation Exploration

    Improving the performance of deep neural networks (DNNs) is important to both the compiler and neural architecture search (NAS) communities. Compilers apply program transformations in order to exploit hardware parallelism and the memory hierarchy. However, legality concerns mean they fail to exploit the natural robustness of neural networks. In contrast, NAS techniques mutate networks through operations such as the grouping or bottlenecking of convolutions, exploiting the resilience of DNNs. In this work, we express such neural architecture operations as program transformations whose legality depends on a notion of representational capacity. This allows them to be combined with existing transformations into a unified optimization framework. The unification lets us express existing NAS operations as combinations of simpler transformations and, crucially, to generate and explore new tensor convolutions. We prototyped the combined framework in TVM and found optimizations across different DNNs that significantly reduce inference time, by over 3× in the majority of cases. Furthermore, our scheme dramatically reduces NAS search time. Code is available at https://github.com/jack-willturner/nas-as-program-transformation-exploration.
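
    As a minimal sketch of one such operation, expressed here in PyTorch rather than the paper's TVM prototype: the "grouping" mutation rewrites a standard convolution into a grouped one, a transformation that is only legal if the network tolerates the reduced representational capacity.

        import torch.nn as nn

        def param_count(m):
            return sum(p.numel() for p in m.parameters())

        dense   = nn.Conv2d(64, 64, kernel_size=3, padding=1)
        grouped = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=8)

        # Grouping divides the weight tensor's input-channel dimension by
        # `groups`, cutting parameters and multiply-accumulates roughly 8x here.
        print(param_count(dense), param_count(grouped))  # 36928 vs 4672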

    Exploring the Potential of Convolutional Neural Networks in Healthcare Engineering for Skin Disease Identification

    Skin disorders affect millions of individuals worldwide, underscoring the need for swift and accurate detection to achieve optimal treatment outcomes. Convolutional Neural Networks (CNNs) have emerged as valuable tools for automating the identification of skin conditions. This paper provides an exhaustive examination of the latest advancements in CNN-driven skin condition detection. In dermatological applications, CNNs analyze intricate visual patterns and extract distinctive features from skin imaging datasets. Trained on extensive data repositories, CNNs can classify a range of skin conditions such as melanoma, psoriasis, eczema, and acne. The paper highlights key advances in CNN-based skin disease diagnosis, covering diverse CNN architectures, refinement methodologies, and data augmentation tactics. Moreover, the integration of transfer learning and ensemble approaches has further improved the performance of CNN models. Despite their substantial potential, pertinent challenges remain: comprehensively representing skin conditions and mitigating bias require access to extensive and varied data pools, understanding the decision-making processes of CNN models remains an open problem, and ethical concerns such as algorithmic bias and data privacy also warrant careful consideration. By scrutinizing the advances, obstacles, and potential of CNN-based skin disorder diagnosis, this review offers valuable insights to researchers and medical professionals. It underscores the importance of precise and effective diagnostic tools in improving patient outcomes and curbing healthcare expenditures.
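
    A hedged sketch of the transfer-learning approach the review highlights: reuse an ImageNet-pretrained CNN and retrain only a new classification head for skin-condition labels. The class count is a placeholder, and downloading the pretrained weights requires network access.

        import torch.nn as nn
        import torchvision.models as models

        NUM_CONDITIONS = 4  # e.g., melanoma, psoriasis, eczema, acne

        model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        for p in model.parameters():
            p.requires_grad = False   # freeze the pretrained feature extractor

        # Replace the final layer with a head sized for the dermatology classes;
        # only this layer is then trained on the skin-image dataset.
        model.fc = nn.Linear(model.fc.in_features, NUM_CONDITIONS)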

    A Construction Kit for Efficient Low Power Neural Network Accelerator Designs

    Implementing embedded neural network processing at the edge requires efficient hardware acceleration that couples high computational performance with low power consumption. Driven by the rapid evolution of network architectures and their algorithmic features, accelerator designs are constantly updated and improved. To evaluate and compare hardware design choices, designers can refer to a myriad of accelerator implementations in the literature. Surveys provide an overview of these works but are often limited to system-level and benchmark-specific performance metrics, making it difficult to quantitatively compare the individual effect of each optimization technique. This complicates the evaluation of optimizations for new accelerator designs and slows down research progress. This work surveys the neural network accelerator optimization approaches used in recent works and reports their individual effects on edge processing performance. It presents the optimizations and their quantitative effects as a construction kit, allowing designers to assess the design choices for each building block separately. Reported optimizations range from up to 10,000× memory savings to 33× energy reductions, giving chip designers an overview of design choices for implementing efficient low-power neural network accelerators.
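
    As a worked toy example of a single building block from such a kit: post-training 8-bit weight quantization alone yields a 4× memory saving over float32 (the surveyed savings combine many such techniques; the NumPy scheme below is illustrative).

        import numpy as np

        w = np.random.randn(1024).astype(np.float32)   # float32 weights: 4 B each
        scale = np.abs(w).max() / 127.0                # symmetric per-tensor scale
        w_q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)

        print(w.nbytes, "->", w_q.nbytes)              # 4096 -> 1024 bytes (4x)
        print("max error:", np.abs(w - w_q * scale).max())  # at most scale/2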