    What does fault tolerant Deep Learning need from MPI?

    Deep Learning (DL) algorithms have become the de facto Machine Learning (ML) algorithms for large-scale data analysis. DL algorithms are computationally expensive — even distributed DL implementations that use MPI require days of training (model learning) time on commonly studied datasets. Long-running DL applications become susceptible to faults, requiring the development of a fault tolerant system infrastructure in addition to fault tolerant DL algorithms. This raises an important question: what is needed from MPI for designing fault tolerant DL implementations? In this paper, we address this problem for permanent faults. We motivate the need for a fault tolerant MPI specification by an in-depth consideration of recent innovations in DL algorithms and their properties, which drive the need for specific fault tolerance features. We present an in-depth discussion on the suitability of different parallelism types (model, data, and hybrid); the need (or lack thereof) for checkpointing of any critical data structures; and, most importantly, consideration of several fault tolerance proposals in MPI (user-level fault mitigation (ULFM), Reinit) and their applicability to fault tolerant DL implementations. We leverage a distributed memory implementation of Caffe, currently available under the Machine Learning Toolkit for Extreme Scale (MaTEx). We implement our approaches by extending MaTEx-Caffe with a ULFM-based implementation. Our evaluation using the ImageNet dataset and the AlexNet and GoogLeNet neural network topologies demonstrates the effectiveness of the proposed fault tolerant DL implementation using OpenMPI-based ULFM.
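    The ULFM-style recovery loop the abstract alludes to can be sketched in plain Python (no MPI): on a permanent rank failure, surviving ranks shrink the communicator and continue data-parallel training on the remaining workers. The names `Communicator`, `train`, and the failure-injection mechanism are hypothetical stand-ins for illustration, not the MaTEx-Caffe or MPI API.

    ```python
    class RankFailure(Exception):
        """Stands in for MPI_ERR_PROC_FAILED on a permanent fault."""

    class Communicator:
        """Toy analogue of an MPI communicator over a set of ranks."""
        def __init__(self, ranks):
            self.ranks = list(ranks)

        def allreduce_mean(self, contributions):
            # Average gradient contributions from surviving ranks only.
            vals = [contributions[r] for r in self.ranks]
            return sum(vals) / len(vals)

        def shrink(self, failed):
            # Analogue of ULFM's MPIX_Comm_shrink: drop failed ranks, keep going.
            return Communicator(r for r in self.ranks if r not in failed)

    def train(num_ranks, steps, fail_at=None):
        comm = Communicator(range(num_ranks))
        weight = 0.0
        for step in range(steps):
            if fail_at == step:
                comm = comm.shrink({0})           # rank 0 suffers a permanent fault
            grads = {r: 1.0 for r in comm.ranks}  # each survivor's (toy) gradient
            weight -= 0.1 * comm.allreduce_mean(grads)
        return weight, len(comm.ranks)

    # Training survives a mid-run rank failure and completes on 3 of 4 ranks.
    w, survivors = train(num_ranks=4, steps=5, fail_at=2)
    ```

    The point of the sketch is the control flow, not the arithmetic: data-parallel SGD only needs the all-reduce to be re-formed over survivors, which is why the paper argues ULFM's shrink-and-continue model fits DL well.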

    Higher order feature extraction and selection for robust human gesture recognition using CSI of COTS Wi-Fi devices

    Device-free human gesture recognition (HGR) using commercial off-the-shelf (COTS) Wi-Fi devices has gained attention with recent advances in wireless technology. HGR recognizes the human activity performed by capturing the reflections of Wi-Fi signals from moving humans and storing them as raw channel state information (CSI) traces. Existing work on HGR applies noise reduction and transformation to pre-process the raw CSI traces. However, these methods fail to capture the non-Gaussian information in the raw CSI data because they are limited to linear signal representations alone. The proposed higher order statistics-based recognition (HOS-Re) model extracts higher order statistical (HOS) features from raw CSI traces and selects a robust feature subset for the recognition task. HOS-Re addresses the limitations of the existing methods by extracting third-order cumulant features that maximize the recognition accuracy. Subsequently, feature selection methods derived from information theory construct a robust and highly informative feature subset, fed as input to a multilevel support vector machine (SVM) classifier in order to measure the performance. The proposed methodology is validated using a public database, SignFi, consisting of 276 gestures with 8280 gesture instances, of which 5520 are from the laboratory and 2760 from the home environment, using 10 × 5 cross-validation. HOS-Re achieved an average recognition accuracy of 97.84%, 98.26%, and 96.34% for the lab, home, and lab + home environments respectively. The average recognition accuracy for 150 sign gestures with 7500 instances, collected from five different users, was 96.23% in the laboratory environment. This work was supported by Taylor's University through its TAYLOR'S PhD SCHOLARSHIP Programme.
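    A minimal sketch of the third-order cumulant feature the abstract describes, for a 1-D trace: for a centred signal x, C3(t1, t2) = E[x(n) x(n+t1) x(n+t2)]. The exact estimator, lag grid, and per-subcarrier handling used by HOS-Re are assumptions here; this shows only the core computation and why it retains non-Gaussian information.

    ```python
    def third_order_cumulant(x, t1, t2):
        """Biased sample estimate of the third-order cumulant at lags (t1, t2)."""
        mean = sum(x) / len(x)
        xc = [v - mean for v in x]               # centre the trace
        n = len(xc) - max(t1, t2, 0)             # valid overlap for these lags
        acc = sum(xc[k] * xc[k + t1] * xc[k + t2] for k in range(n))
        return acc / n

    # A symmetric signal has (near-)zero third-order cumulants, while a skewed
    # one does not — this asymmetry is the non-Gaussian information that
    # second-order (linear) pre-processing discards.
    symmetric = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
    c_sym = third_order_cumulant(symmetric, 0, 0)
    ```

    At lags (0, 0) the cumulant reduces to the skewness numerator; HOS-Re evaluates it over a grid of lags to form the feature vector that feeds the selection stage.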

    An enhancement of toe model by investigating the influential factors of cloud adoption security objectives

    Cloud computing (CC) is a growing technological trend and is becoming the backbone of future industrial IT infrastructure. While CC services have much to offer, they also have major downsides that clients cannot ignore. SMEs are the prime candidates for CC adoption: given their lack of resources, experience, and expertise and their limited financial capacity, CC can be most helpful to them. However, CC faces a major issue in terms of cloud security — organizations do not understand the cloud security factors at play, and data owners have doubts about their data. This paper investigates cloud security objectives to identify the influential factors for cloud adoption in SMEs, proposing an enhancement of the Technology-Organization-Environment (TOE) model with positive influential factors such as cloud security, relative advantage, cost saving, availability, SLA, capability, top management support, organizational readiness, IS knowledge, malicious insiders, government regulatory support, competitive pressure, and organization size and type, together with negative influencing factors such as technological readiness, cloud trust, and lack of standards in cloud security. Data were collected by questionnaires from a selected IT company using SaaS on a public cloud. A case study method was used to validate the enhanced TOE model, and IBM SPSS Statistics v22 was used for data analysis. The results of the analysis support the enhancement as well as all the proposed hypotheses: all enhancement factors were found to have a significant cloud security influence on the adoption of cloud computing by SMEs.
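    The hypothesis tests the study runs in SPSS rest on measuring how strongly each TOE factor is associated with adoption; as an illustration only, here is the underlying Pearson correlation computed on made-up Likert-scale responses (the variable names and data are hypothetical, not from the study's questionnaire).

    ```python
    import math

    def pearson(xs, ys):
        """Pearson correlation coefficient between two equal-length samples."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
        sy = math.sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (sx * sy)

    # Hypothetical 1-5 Likert ratings: perceived cloud security vs. adoption
    # intention for six respondents. Identical rankings give r = 1.0.
    security_score = [4, 5, 3, 4, 2, 5]
    adoption_score = [4, 5, 3, 4, 2, 5]
    r = pearson(security_score, adoption_score)
    ```

    A factor is retained in the enhanced model when its association with adoption is statistically significant; the sign of r distinguishes the positive factors (e.g. cost saving) from the negative ones (e.g. lack of standards).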

    DeViT: Decomposing Vision Transformers for Collaborative Inference in Edge Devices

    Recent years have witnessed the great success of the vision transformer (ViT), which has achieved state-of-the-art performance on multiple computer vision benchmarks. However, ViT models suffer from vast numbers of parameters and high computation cost, making deployment on resource-constrained edge devices difficult. Existing solutions mostly compress ViT models into a compact model but still cannot achieve real-time inference. To tackle this issue, we propose to explore the divisibility of the transformer structure and decompose the large ViT into multiple small models for collaborative inference on edge devices. Our objective is to achieve fast and energy-efficient collaborative inference while maintaining accuracy comparable to large ViTs. To this end, we first propose a collaborative inference framework termed DeViT to facilitate edge deployment by decomposing large ViTs. Subsequently, we design a decomposition-and-ensemble algorithm based on knowledge distillation, termed DEKD, to fuse multiple small decomposed models while dramatically reducing communication overheads, and to handle heterogeneous models by developing a feature matching module that promotes the decomposed models' imitation of the large ViT. Extensive experiments with three representative ViT backbones on four widely used datasets demonstrate that our method achieves efficient collaborative inference for ViTs and outperforms existing lightweight ViTs, striking a good trade-off between efficiency and accuracy. For example, our DeViTs improve end-to-end latency by 2.89× with only a 1.65% accuracy sacrifice on CIFAR-100 compared to the large ViT, ViT-L/16, on a GPU server. DeDeiTs surpass the recent efficient ViT, MobileViT-S, by 3.54% in accuracy on ImageNet-1K, while running 1.72× faster and requiring 55.28% less energy on the edge device. Accepted by IEEE Transactions on Mobile Computing.
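    The ensemble half of the decompose-and-ensemble idea can be sketched as follows: each small decomposed model emits class logits, and a fused prediction is taken from their averaged softmax outputs. Plain averaging is an assumption for illustration — DEKD's actual fusion also involves knowledge distillation and the feature matching module.

    ```python
    import math

    def softmax(logits):
        """Numerically stable softmax over a list of logits."""
        m = max(logits)
        exps = [math.exp(v - m) for v in logits]
        s = sum(exps)
        return [e / s for e in exps]

    def ensemble_predict(per_model_logits):
        """Average the decomposed models' class probabilities, pick the argmax."""
        probs = [softmax(l) for l in per_model_logits]
        num_classes = len(probs[0])
        fused = [sum(p[c] for p in probs) / len(probs)
                 for c in range(num_classes)]
        return max(range(num_classes), key=fused.__getitem__)

    # Three decomposed models over 3 classes; two favour class 2, one class 0,
    # so the fused prediction follows the confident majority.
    logits = [[0.1, 0.2, 2.0], [0.0, 0.1, 1.5], [1.2, 0.0, 0.3]]
    pred = ensemble_predict(logits)
    ```

    Because each edge device only exchanges a short logit vector rather than feature maps, fusion of this shape keeps communication overhead low, which is the property the paper optimizes for.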

    GRC-Sensing: An Architecture to Measure Acoustic Pollution Based on Crowdsensing

    Noise pollution is an emerging and challenging problem in all large metropolitan areas, affecting the health of citizens in multiple ways. Therefore, obtaining a detailed, real-time map of noise in cities becomes of the utmost importance for authorities to take preventive measures. Until now, these measurements were limited to occasional sampling by specialized companies that mainly focus on major roads. In this paper, we propose an alternative approach to this problem based on crowdsensing. Our proposed architecture empowers participating citizens by allowing them to seamlessly, and based on their context, sample the noise in their surrounding environment — even while using their vehicles. This allows us to provide a global and detailed view of noise levels around the city, including places traditionally not monitored due to poor accessibility. In the paper, we detail how the relevant issues in our architecture, i.e., smartphone calibration, measurement adequacy, server design, and client-server interaction, were solved, and we validate them in real scenarios to illustrate the potential of the solution achieved. This work was partially supported by Valencia's Traffic Management Department; by the "Ministerio de Economía y Competitividad, Programa Estatal de Investigación, Desarrollo e Innovación Orientada a los Retos de la Sociedad, Proyectos I+D+I 2014", Spain, under Grant TEC2014-52690-R; and by the "Universidad Laica Eloy Alfaro de Manabí" and the "Programa de Becas SENESCYT" of the Republic of Ecuador.
    Zamora-Mero, W. J.; Vera, E.; Tavares De Araujo Cesariny Calafate, C. M.; Cano, J.; Manzoni, P. (2018). GRC-Sensing: An Architecture to Measure Acoustic Pollution Based on Crowdsensing. Sensors, 18(8):1-25. https://doi.org/10.3390/s18082596
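    The per-smartphone measurement the architecture relies on can be sketched as follows: an equivalent sound level is derived from the RMS of the captured audio samples plus a per-device calibration offset. The calibration procedure is the paper's; the function name, the reference value, and the constants below are made up for illustration.

    ```python
    import math

    def sound_level_db(samples, calibration_offset_db=0.0, ref=1.0):
        """Sound level in dB relative to `ref`, plus a per-device calibration."""
        rms = math.sqrt(sum(s * s for s in samples) / len(samples))
        return 20.0 * math.log10(rms / ref) + calibration_offset_db

    # A full-scale sine wave has RMS 1/sqrt(2), i.e. about -3.01 dBFS
    # before the calibration offset is applied.
    sine = [math.sin(2 * math.pi * k / 64) for k in range(64)]
    level = sound_level_db(sine)
    ```

    The calibration offset is what makes readings from heterogeneous smartphone microphones comparable on the server side — without it, the same acoustic scene would map to different dB values on different devices.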

    Understanding Spark System Performance for Image Processing in a Heterogeneous Commodity Cluster

    In recent years, Apache Spark has seen widespread adoption in industry and institutions due to its cache mechanism for faster Big Data analytics. However, the speed advantage Spark provides, especially in a heterogeneous cluster environment, is not obtainable out-of-the-box; it requires the right combination of configuration parameters from the myriad of parameters provided by Spark's developers. Recognizing this challenge, this thesis undertakes a study to provide insight into Spark performance, particularly the impact of chosen parameter settings — parameters that are critical to fast job completion and effective utilization of resources. To this end, the study focuses on two example applications, flowerCounter and imageClustering, for processing still-image datasets of Canola plants collected during the summer of 2016 from selected plot fields using time-lapse cameras, in a heterogeneous Spark cluster environment. These applications were of initial interest to the Plant Phenotyping and Imaging Research Centre (P2IRC) at the University of Saskatchewan. The P2IRC is responsible for developing systems that will aid fast analysis of large-scale seed breeding to ensure global food security. The flowerCounter application estimates the count of flowers in the images, while the imageClustering application clusters images based on physical plant attributes. Two clusters are used for the experiments: a 12-node and a 3-node cluster (each including a master node), with the Hadoop Distributed File System (HDFS) as the storage medium for the image datasets. Experiments with the two case-study applications demonstrate that increasing the number of tasks does not always speed up job processing, due to increased communication overheads. Findings from other experiments show that numerous tasks with one core per executor and small allocated memory limit parallelism within an executor and result in inefficient use of cluster resources. Executors with large CPU and memory allocations, on the other hand, do not speed up analytics either, due to processing delays and thread concurrency. Further experimental results indicate that application processing time depends on input data storage in conjunction with locality levels, and that executor run time is largely dominated by disk I/O time, especially the read cost. With respect to horizontal node scaling, Spark scales with increasing homogeneous computing nodes, but the speed-up degrades with heterogeneous nodes. Finally, this study shows that the effectiveness of speculative task execution in mitigating the impact of slow nodes varies across the applications.
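    The executor-sizing trade-off the thesis observes — one-core executors under-use the cluster, while oversized executors suffer concurrency delays — is often handled with a back-of-envelope calculation like the one below. The 5-cores-per-executor heuristic and the 1 GB overhead figure are common rules of thumb, not values from the thesis.

    ```python
    def executors_per_node(node_cores, node_mem_gb, cores_per_executor=5,
                           overhead_gb=1):
        """Split one worker node into mid-sized executors.

        Returns (executors per node, heap GB per executor), avoiding both the
        one-core-per-executor and the one-huge-executor-per-node extremes.
        """
        n = max(1, node_cores // cores_per_executor)
        mem_per_executor = node_mem_gb // n - overhead_gb
        return n, mem_per_executor

    # A 16-core, 64 GB worker: 3 executors of 5 cores, ~20 GB heap each
    # (the leftover core and memory absorb OS and daemon overhead).
    n_exec, mem_gb = executors_per_node(16, 64)
    ```

    These two numbers map directly onto Spark's `spark.executor.cores` and `spark.executor.memory` settings; the thesis's point is that such values must be re-derived per node type in a heterogeneous cluster rather than set once cluster-wide.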