What does fault tolerant Deep Learning need from MPI?
Deep Learning (DL) algorithms have become the de facto Machine Learning (ML)
algorithm for large scale data analysis. DL algorithms are computationally
expensive - even distributed DL implementations which use MPI require days of
training (model learning) time on commonly studied datasets. Long running DL
applications become susceptible to faults - requiring development of a fault
tolerant system infrastructure, in addition to fault tolerant DL algorithms.
This raises an important question: What is needed from MPI for designing
fault tolerant DL implementations? In this paper, we address this problem for
permanent faults. We motivate the need for a fault tolerant MPI specification
by an in-depth consideration of recent innovations in DL algorithms and their
properties, which drive the need for specific fault tolerance features. We
present an in-depth discussion on the suitability of different parallelism
types (model, data and hybrid); a need (or lack thereof) for check-pointing of
any critical data structures; and most importantly, consideration for several
fault tolerance proposals (user-level fault mitigation (ULFM), Reinit) in MPI
and their applicability to fault tolerant DL implementations. We leverage a
distributed memory implementation of Caffe, currently available under the
Machine Learning Toolkit for Extreme Scale (MaTEx). We implement our approaches
by extending MaTEx-Caffe with a ULFM-based implementation. Our evaluation
using the ImageNet dataset and the AlexNet and GoogLeNet neural network topologies
demonstrates the effectiveness of the proposed fault tolerant DL implementation
using Open MPI-based ULFM.
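ULFM's recovery idiom for permanent faults — detect the failure, shrink the communicator to the survivors, and continue training — can be illustrated without MPI at all. The sketch below is a hypothetical stand-in in plain Python (the worker dicts, `partition`, and fake gradients are all invented for illustration), not MaTEx-Caffe code; real code would call ULFM routines such as `MPIX_Comm_shrink` on the revoked communicator:

```python
# Hypothetical sketch of ULFM-style "shrink and continue" recovery for
# data-parallel training. Real code would use MPI (e.g. MPIX_Comm_shrink);
# here plain Python objects stand in for ranks and communicators.

def average_gradients(worker_grads):
    """All-reduce stand-in: average one gradient value across live workers."""
    return sum(worker_grads) / len(worker_grads)

def partition(dataset, n_workers):
    """Re-shard the dataset across the surviving workers."""
    return [dataset[i::n_workers] for i in range(n_workers)]

def train_step(workers, dataset):
    """One synchronous step; failed workers are dropped (the communicator
    'shrinks') and their shards are redistributed before the all-reduce."""
    live = [w for w in workers if not w["failed"]]
    shards = partition(dataset, len(live))
    # Fake per-worker gradient: proportional to local shard size.
    grads = [len(shard) * w["scale"] for w, shard in zip(live, shards)]
    return live, average_gradients(grads)

workers = [{"failed": False, "scale": 1.0} for _ in range(4)]
data = list(range(16))

live, g1 = train_step(workers, data)   # 4 workers, 4 samples each
workers[2]["failed"] = True            # permanent fault on rank 2
live, g2 = train_step(workers, data)   # 3 survivors, shards rebalanced
```

The key property this models is that a data-parallel step needs no checkpoint of remote state: after the shrink, the survivors simply re-partition the input and the synchronous all-reduce proceeds over the smaller group.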
Higher order feature extraction and selection for robust human gesture recognition using CSI of COTS Wi-Fi devices
Device-free human gesture recognition (HGR) using commercial off-the-shelf (COTS) Wi-Fi
devices has gained attention with recent advances in wireless technology. HGR recognizes the human
activity performed by capturing the reflections of Wi-Fi signals from moving humans and storing
them as raw channel state information (CSI) traces. Existing work on HGR applies noise reduction
and transformation to pre-process the raw CSI traces. However, these methods fail to capture
the non-Gaussian information in the raw CSI data, as they deal with linear signal
representations alone. The proposed higher order statistics-based recognition (HOS-Re) model extracts
higher order statistical (HOS) features from raw CSI traces and selects a robust feature subset for the
recognition task. HOS-Re addresses the limitations of the existing methods by extracting third-order
cumulant features that maximize the recognition accuracy. Subsequently, feature selection methods
derived from information theory construct a robust and highly informative feature subset, fed as
input to the multilevel support vector machine (SVM) classifier in order to measure the performance.
The proposed methodology is validated using a public database SignFi, consisting of 276 gestures
with 8280 gesture instances, out of which 5520 are from the laboratory and 2760 from the home
environment, using 10 × 5 cross-validation. HOS-Re achieved an average recognition accuracy of
97.84%, 98.26% and 96.34% for the lab, home and lab + home environment respectively. The average
recognition accuracy for 150 sign gestures with 7500 instances, collected from five different users, was
96.23% in the laboratory environment. This work was supported by Taylor's University through its TAYLOR'S PhD SCHOLARSHIP Programme.
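The distinguishing property of third-order cumulants — they vanish for Gaussian signals but not for skewed, non-Gaussian ones — is easy to demonstrate. The sketch below is illustrative of this class of feature only (synthetic signals, not CSI traces, and not the paper's exact estimator or lag grid):

```python
import random

def third_order_cumulant(x, tau1, tau2):
    """Estimate C3(tau1, tau2) = E[x(t) x(t+tau1) x(t+tau2)].
    The mean is removed first, so for this zero-mean signal the
    third-order moment equals the third-order cumulant."""
    m = sum(x) / len(x)
    x = [v - m for v in x]
    n = len(x) - max(tau1, tau2)  # valid overlap length for the lags
    return sum(x[t] * x[t + tau1] * x[t + tau2] for t in range(n)) / n

rng = random.Random(0)
gauss = [rng.gauss(0.0, 1.0) for _ in range(100_000)]    # symmetric: C3 ~ 0
skewed = [rng.expovariate(1.0) for _ in range(100_000)]  # skewed: C3(0,0) ~ 2

c3_gauss = third_order_cumulant(gauss, 0, 0)
c3_skewed = third_order_cumulant(skewed, 0, 0)
```

Because Gaussian noise contributes (asymptotically) nothing to third-order cumulants, features built on a grid of `(tau1, tau2)` lags retain the non-Gaussian structure of the reflected signal that linear pre-processing discards.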
An enhancement of the TOE model by investigating the influential factors of cloud adoption security objectives
Cloud computing (CC) is a key trend in the development of future technological infrastructure and is growing strongly as its backbone. While CC services have much to offer, they also have significant downsides that clients cannot ignore. SMEs are the prime candidates for CC adoption: given their lack of resources, experience, and expertise, and their limited financial structure, CC can be especially helpful to them. However, CC faces a major issue in terms of security: organizations often do not understand the cloud security factors involved, and data owners harbor doubts about their data. This paper investigates cloud security objectives to identify the factors that influence cloud adoption in SMEs, proposing an enhancement of the Technology-Organization-Environment (TOE) model with positive influential factors such as cloud security, relative advantage, cost saving, availability, SLA, capability, top management support, organizational readiness, IS knowledge, malicious insiders, government regulatory support, competitive pressure, and organization size and type, and negative influential factors such as technological readiness, cloud trust, and lack of standards in cloud security. Data were collected by questionnaire from a selected IT company based on SaaS and the public cloud, and the case study method was used to validate the enhanced TOE model. IBM SPSS Statistics v22 was used for data analysis. The results support the enhancement as well as all the proposed hypotheses: every enhancement factor was found to have a significant security-related influence on the adoption of cloud computing by SMEs.
DeViT: Decomposing Vision Transformers for Collaborative Inference in Edge Devices
Recent years have witnessed the great success of vision transformer (ViT),
which has achieved state-of-the-art performance on multiple computer vision
benchmarks. However, ViT models suffer from vast amounts of parameters and high
computation cost, leading to difficult deployment on resource-constrained edge
devices. Existing solutions mostly compress ViT models to a compact model but
still cannot achieve real-time inference. To tackle this issue, we propose to
explore the divisibility of transformer structure, and decompose the large ViT
into multiple small models for collaborative inference at edge devices. Our
objective is to achieve fast and energy-efficient collaborative inference while
maintaining comparable accuracy compared with large ViTs. To this end, we first
propose a collaborative inference framework termed DeViT to facilitate edge
deployment by decomposing large ViTs. Subsequently, we design a
decomposition-and-ensemble algorithm based on knowledge distillation, termed
DEKD, to fuse multiple small decomposed models while dramatically reducing
communication overheads, and handle heterogeneous models by developing a
feature matching module to promote the imitations of decomposed models from the
large ViT. Extensive experiments for three representative ViT backbones on four
widely-used datasets demonstrate our method achieves efficient collaborative
inference for ViTs and outperforms existing lightweight ViTs, striking a good
trade-off between efficiency and accuracy. For example, our DeViTs improves
end-to-end latency by 2.89× with only a 1.65% accuracy sacrifice on
CIFAR-100 compared to the large ViT, ViT-L/16, on the GPU server. DeDeiTs
surpasses the recent efficient ViT, MobileViT-S, by 3.54% in accuracy on
ImageNet-1K, while running 1.72× faster and requiring 55.28% lower
energy consumption on the edge device. Comment: Accepted by IEEE Transactions on Mobile Computing.
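Two ingredients of the approach — ensembling the decomposed models and the distillation objective that ties them to the large ViT — can be sketched in a few lines. This is a simplified, hypothetical illustration with made-up logits: it shows only logit averaging plus the standard temperature-softened KD loss, not the full DEKD algorithm (which also includes the feature matching module):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    exps = [math.exp(z / temperature) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions — the usual
    knowledge-distillation objective."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def fuse(decomposed_logits):
    """Ensemble the decomposed models by averaging their logits."""
    return [sum(col) / len(col) for col in zip(*decomposed_logits)]

teacher = [4.0, 1.0, 0.5]                   # large-ViT logits (made up)
parts = [[3.5, 1.2, 0.4], [4.2, 0.9, 0.7]]  # two decomposed small models
student = fuse(parts)

loss_fused = kd_loss(student, teacher)      # driven toward 0 during training
```

Since only logits (one scalar per class) cross device boundaries for the fusion step, this kind of ensemble keeps the communication overhead between edge devices small relative to exchanging intermediate feature maps.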
GRC-Sensing: An Architecture to Measure Acoustic Pollution Based on Crowdsensing
Noise pollution is an emerging and challenging problem in all large metropolitan areas, affecting the health of citizens in multiple ways. Therefore, obtaining a detailed, real-time map of noise in cities becomes of the utmost importance for authorities to take preventive measures. Until now, these measurements were limited to occasional sampling by specialized companies, focused mainly on major roads. In this paper, we propose an alternative approach to this problem based on crowdsensing. Our proposed architecture empowers participating citizens by allowing them to seamlessly, and based on their context, sample the noise in their surrounding environment. This allows us to provide a global and detailed view of noise levels around the city, including places traditionally not monitored due to poor accessibility, even while citizens are using their vehicles. In the paper, we detail how the different relevant issues in our architecture, i.e., smartphone calibration, measurement adequacy, server design, and client-server interaction, were solved, and we validate them in real scenarios to illustrate the potential of the achieved solution. This work was partially supported by Valencia's Traffic Management Department; by the "Ministerio de Economia y Competitividad, Programa Estatal de Investigacion, Desarrollo e Innovacion Orientada a los Retos de la Sociedad, Proyectos I+D+I 2014", Spain, under Grant TEC2014-52690-R; and by the "Universidad Laica Eloy Alfaro de Manabi" and the "Programa de Becas SENESCYT" of the Republic of Ecuador. Zamora-Mero, W. J.; Vera, E.; Tavares De Araujo Cesariny Calafate, C. M.; Cano, J.; Manzoni, P. (2018). GRC-Sensing: An Architecture to Measure Acoustic Pollution Based on Crowdsensing. Sensors, 18(8):1-25. https://doi.org/10.3390/s18082596
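Smartphone calibration, one of the issues the architecture must solve, is commonly handled by fitting each device's readings against a reference sound-level meter. The sketch below is a generic least-squares version with invented readings — a plausible approach to per-device calibration, not necessarily the one used in GRC-Sensing:

```python
def fit_calibration(phone_db, ref_db):
    """Least-squares line ref ~ a * phone + b, one (a, b) pair per device."""
    n = len(phone_db)
    mx = sum(phone_db) / n
    my = sum(ref_db) / n
    sxx = sum((x - mx) ** 2 for x in phone_db)
    sxy = sum((x - mx) * (y - my) for x, y in zip(phone_db, ref_db))
    a = sxy / sxx
    b = my - a * mx
    return a, b

# Hypothetical readings: this phone under-reports by ~3 dB with unit slope.
phone = [50.0, 60.0, 70.0, 80.0]   # phone microphone readings (dB)
ref = [53.0, 63.0, 73.0, 83.0]     # reference sound-level meter (dB)

a, b = fit_calibration(phone, ref)
calibrated = a * 65.0 + b          # correct a new raw reading of 65 dB
```

Storing one small `(a, b)` pair per device model lets the server normalize samples from heterogeneous handsets into comparable dB values before aggregating them into the city noise map.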
Understanding Spark System Performance for Image Processing in a Heterogeneous Commodity Cluster
In recent years, Apache Spark has seen a widespread adoption in industries and institutions due to its
cache mechanism for faster Big Data analytics. However, the speed advantage Spark provides, especially in
a heterogeneous cluster environment, is not obtainable out-of-the-box; it requires the right combination of
configuration parameters from the myriads of parameters provided by Spark developers. Recognizing this
challenge, this thesis undertakes a study to provide insight on Spark performance particularly, regarding
the impact of choice parameter settings. These are parameters that are critical to fast job completion and
effective utilization of resources.
To this end, the study focuses on two specific example applications namely, flowerCounter and imageClustering,
for processing still image datasets of Canola plants collected during the Summer of 2016 from selected
plot fields using timelapse cameras in a heterogeneous Spark-cluster environment. These applications
were of initial interest to the Plant Phenotyping and Imaging Research Centre (P2IRC) at the University of
Saskatchewan. The P2IRC is responsible for developing systems that will aid fast analysis of large-scale seed
breeding to ensure global food security. The flowerCounter application estimates the count of flowers from
the images while the imageClustering application clusters images based on physical plant attributes. Two
clusters are used for the experiments: a 12-node and 3-node cluster (including a master node), with Hadoop
Distributed File System (HDFS) as the storage medium for the image datasets.
Experiments with the two case-study applications demonstrate that increasing the number of tasks does
not always speed up job processing, due to increased communication overheads. Findings from other experiments
show that numerous tasks with one core per executor and small allocated memory limit parallelism
within an executor and result in inefficient use of cluster resources. Executors with large CPU and memory allocations,
on the other hand, do not speed up analytics, due to processing delays and thread-concurrency overheads. Further
experimental results indicate that application processing time depends on input data storage in conjunction
with locality levels, and that executor run time is largely dominated by disk I/O time, especially the
read-time cost. With respect to horizontal node scaling, Spark scales with increasing homogeneous computing
nodes, but the speed-up degrades with heterogeneous nodes. Finally, this study shows that the effectiveness
of speculative task execution in mitigating the impact of slow nodes varies across the applications.
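The trade-offs above come down to a handful of Spark settings. A hypothetical starting point for a small heterogeneous cluster might look like the fragment below; the property names are standard Spark configuration keys, but the values are illustrative only, not the tuned settings from this thesis:

```
# spark-defaults.conf (illustrative values only)
spark.executor.instances   6      # several medium executors, not one giant one
spark.executor.cores       3      # >1 core so tasks share an executor's memory
spark.executor.memory      4g     # enough heap to avoid spilling, not so much that GC stalls
spark.default.parallelism  36     # ~2 tasks per core; many more only adds overhead
spark.speculation          true   # re-launch stragglers on slow heterogeneous nodes
spark.locality.wait        3s     # how long to wait for data-local scheduling
```

The point of the study is precisely that no single such configuration wins everywhere: executor sizing, task counts, and locality waits interact with the application's I/O profile and the cluster's heterogeneity, so values like these need to be validated per workload.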