FastDeepIoT: Towards Understanding and Optimizing Neural Network Execution Time on Mobile and Embedded Devices
Deep neural networks show great potential as solutions to many sensing
application problems, but their excessive resource demand slows down execution
time, posing a serious impediment to deployment on low-end devices. To address
this challenge, recent literature focused on compressing neural network size to
improve performance. We show that changing neural network size does not
proportionally affect performance attributes of interest, such as execution
time. Rather, extreme run-time nonlinearities exist over the network
configuration space. Hence, we propose a novel framework, called FastDeepIoT,
that uncovers the non-linear relation between neural network structure and
execution time, then exploits that understanding to find network configurations
that significantly improve the trade-off between execution time and accuracy on
mobile and embedded devices. FastDeepIoT makes two key contributions. First,
FastDeepIoT automatically learns an accurate and highly interpretable execution
time model for deep neural networks on the target device. This is done without
prior knowledge of either the hardware specifications or the detailed
implementation of the used deep learning library. Second, FastDeepIoT informs a
compression algorithm how to minimize execution time on the profiled device
without impacting accuracy. We evaluate FastDeepIoT using three different
sensing-related tasks on two mobile devices: Nexus 5 and Galaxy Nexus.
FastDeepIoT further reduces neural network execution time and energy
consumption compared with the state-of-the-art compression algorithms.
Accepted by SenSys '18.
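As a rough illustration of the first contribution, the sketch below fits a small, human-readable latency model to profiled layer configurations. The feature set (a MAC-count term plus per-channel and fixed-overhead terms) and every number are illustrative assumptions, not FastDeepIoT's actual model or measurements.

```python
import numpy as np

# Hypothetical on-device profiles of convolutional layers:
# (feature-map size, input channels, output channels, kernel size)
configs = np.array([
    [32, 16, 32, 3],
    [32, 32, 64, 3],
    [16, 64, 128, 3],
    [16, 128, 128, 5],
    [8, 128, 256, 3],
])
measured_ms = np.array([1.9, 3.8, 7.1, 21.5, 9.3])  # made-up latencies

def features(c):
    size, cin, cout, k = c
    # Interpretable terms: multiply-accumulate count, channel product,
    # filter count, and a constant per-invocation overhead.
    macs = size * size * cin * cout * k * k
    return np.array([macs, cin * cout, cout, 1.0])

X = np.stack([features(c) for c in configs])
coef, *_ = np.linalg.lstsq(X, measured_ms, rcond=None)
print("cost per MAC, per channel pair, per filter, fixed overhead:", coef)

# Predict the latency of a candidate layer before committing to it.
candidate = np.array([32, 16, 64, 3])
print("predicted ms:", float(features(candidate) @ coef))
```

In the paper, the learned model additionally exposes the run-time nonlinearities over the configuration space that guide the compression step.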
Balancing Privacy Protection and Interpretability in Federated Learning
Federated learning (FL) aims to collaboratively train the global model in a
distributed manner by sharing the model parameters from local clients to a
central server, thereby potentially protecting users' private information.
Nevertheless, recent studies have illustrated that FL still suffers from
information leakage as adversaries try to recover the training data by
analyzing shared parameters from local clients. To deal with this issue,
differential privacy (DP) is adopted to add noise to the gradients of local
models before aggregation. This, however, degrades the performance of
gradient-based interpretability methods, since some of the weights capturing
the salient regions in the feature map are perturbed. To overcome this problem, we
propose a simple yet effective adaptive differential privacy (ADP) mechanism
that selectively adds noisy perturbations to the gradients of client models in
FL. We also theoretically analyze the impact of gradient perturbation on the
model interpretability. Finally, extensive experiments on both IID and Non-IID
data demonstrate that the proposed ADP can achieve a good trade-off between
privacy and interpretability in FL.
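To make the selective-noise idea concrete, here is a minimal sketch assuming magnitude-based saliency, DP-SGD-style clipping, and Gaussian noise. The saliency rule, noise scale, and keep fraction are all illustrative assumptions, not the paper's calibrated ADP mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

def adaptive_dp_perturb(grad, clip=1.0, sigma=0.8, keep_frac=0.1):
    """Clip the gradient, then add noise only to the least salient entries."""
    # Standard DP-SGD-style clipping to bound per-client sensitivity.
    norm = np.linalg.norm(grad)
    grad = grad * min(1.0, clip / (norm + 1e-12))
    # Treat the largest-magnitude coordinates as salient and spare them,
    # so gradient-based saliency maps survive the perturbation.
    k = max(1, int(keep_frac * grad.size))
    salient = np.argpartition(np.abs(grad), -k)[-k:]
    noise = rng.normal(0.0, sigma * clip, size=grad.shape)
    noise[salient] = 0.0
    return grad + noise

client_grad = rng.normal(size=1000)
print(adaptive_dp_perturb(client_grad)[:5])
```

Note that sparing coordinates entirely changes the formal privacy accounting; quantifying that privacy/interpretability trade-off is precisely what the paper's theoretical analysis addresses.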
An Experimental Evaluation of Datacenter Workloads On Low-Power Embedded Micro Servers
This paper presents a comprehensive evaluation of an ultra-low-power cluster built upon Intel Edison based micro servers. The improved performance and high energy efficiency of micro servers have driven both academia and industry to explore the possibility of replacing conventional brawny servers with a larger swarm of embedded micro servers. Existing attempts mostly focus on mobile-class micro servers, whose capacities are similar to mobile phones. We, on the other hand, target sensor-class micro servers, which were originally intended for use in wearable technologies, sensor networks, and the Internet-of-Things. Although sensor-class micro servers have much less capacity, they are touted for minimal power consumption (< 1 Watt), which opens new possibilities for achieving higher energy efficiency in datacenter workloads. Our systematic evaluation of the Edison cluster and comparisons to conventional brawny clusters involve careful workload selection and laborious parameter tuning, which ensure maximum server utilization and thus fair comparisons. Results show that the Edison cluster achieves up to a 3.5× improvement in work-done-per-joule for web service applications and data-intensive MapReduce jobs. In terms of scalability, the Edison cluster scales linearly on the throughput of web service workloads, and also shows satisfactory scalability for MapReduce workloads despite coordination overhead. This research was supported in part by NSF grant 13-20209.
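As a quick illustration of the headline metric, the snippet below computes work-done-per-joule from throughput and average power draw. All numbers are invented for illustration and are not the paper's measurements.

```python
# Work-done-per-joule: completed work units divided by energy consumed.
def work_per_joule(work_units, avg_power_watts, elapsed_seconds):
    energy_joules = avg_power_watts * elapsed_seconds
    return work_units / energy_joules

# Hypothetical example: a sensor-class node serving 50,000 requests in
# 600 s at 0.9 W versus a brawny server doing 400,000 in 600 s at 95 W.
print(work_per_joule(50_000, 0.9, 600))    # ~92.6 requests per joule
print(work_per_joule(400_000, 95.0, 600))  # ~7.0 requests per joule
```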
General-Purpose Multi-Modal OOD Detection Framework
Out-of-distribution (OOD) detection identifies test samples that differ from
the training data, which is critical to ensuring the safety and reliability of
machine learning (ML) systems. While a plethora of methods have been developed
to detect uni-modal OOD samples, only a few have focused on multi-modal OOD
detection. Current contrastive learning-based methods primarily study
multi-modal OOD detection in a scenario where both a given image and its
corresponding textual description come from a new domain. However, real-world
deployments of ML systems may face more anomaly scenarios caused by multiple
factors like sensor faults, bad weather, and environmental changes. Hence, the
goal of this work is to simultaneously detect OOD samples arising from multiple
different scenarios in a fine-grained manner. To reach this goal, we propose a
general-purpose weakly-supervised OOD detection framework, called WOOD, that
combines a binary classifier and a contrastive learning component to reap the
benefits of both. In order to better distinguish the latent representations of
in-distribution (ID) and OOD samples, we adopt the Hinge loss to constrain
their similarity. Furthermore, we develop a new scoring metric to integrate the
prediction results from both the binary classifier and contrastive learning for
identifying OOD samples. We evaluate the proposed WOOD model on multiple
real-world datasets, and the experimental results demonstrate that the WOOD
model outperforms the state-of-the-art methods for multi-modal OOD detection.
Importantly, our approach is able to achieve high accuracy in OOD detection in
three different OOD scenarios simultaneously. The source code will be made
publicly available upon publication.
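The abstract names two concrete ingredients: a hinge loss constraining ID/OOD representation similarity, and a score that combines the binary classifier with the contrastive component. Below is a minimal PyTorch-style sketch of both; the margin, the prototype construction, and the weighting alpha are illustrative assumptions rather than WOOD's exact formulation.

```python
import torch
import torch.nn.functional as F

def hinge_similarity_loss(z_id, z_ood, margin=0.5):
    """Penalize ID/OOD pairs whose cosine similarity exceeds the margin."""
    sim = F.cosine_similarity(z_id, z_ood)          # one pair per row
    return F.relu(sim - margin).mean()

def ood_score(binary_logit, z, id_prototypes, alpha=0.5):
    """Blend classifier confidence with distance to ID prototypes."""
    p_ood = torch.sigmoid(binary_logit)             # classifier's OOD belief
    sims = F.cosine_similarity(z.unsqueeze(1),
                               id_prototypes.unsqueeze(0), dim=-1)
    contrastive = 1.0 - sims.max(dim=1).values      # far from every ID mode
    return alpha * p_ood + (1 - alpha) * contrastive

z_id, z_ood = torch.randn(8, 64), torch.randn(8, 64)
print(hinge_similarity_loss(z_id, z_ood))
print(ood_score(torch.randn(8), z_ood, torch.randn(5, 64)))
```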
paper2repo: GitHub Repository Recommendation for Academic Papers
GitHub has become a popular social application platform, where a large number
of users post their open source projects. In particular, an increasing number
of researchers release repositories of source code related to their research
papers in order to attract more people to follow their work. Motivated by this
trend, we describe a novel item-item cross-platform recommender system,
paper2repo, that recommends relevant repositories on GitHub that
match a given paper in an academic search system such as Microsoft Academic.
The key challenge is to identify the similarity between an input paper and its
related repositories across the two platforms. Towards that end, paper2repo integrates text encoding and
constrained graph convolutional networks (GCN) to automatically learn and map
the embeddings of papers and repositories into the same space, where proximity
offers the basis for recommendation. To make our method more practical in real
life systems, labels used for model training are computed automatically from
features of user actions on GitHub. In machine learning, such automatic
labeling is often called "distant supervision". To the authors'
knowledge, this is the first distant-supervised cross-platform (paper to
repository) matching system. We evaluate the performance of paper2repo on
real-world data sets collected from GitHub and Microsoft Academic. Results
demonstrate that it outperforms other state-of-the-art recommendation methods.
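Once papers and repositories are embedded into the same space, the recommendation step reduces to proximity search. The sketch below ranks repositories by cosine similarity to a query paper; the random embeddings are stand-ins for the text-encoder + constrained-GCN outputs described above.

```python
import numpy as np

rng = np.random.default_rng(1)
paper_emb = rng.normal(size=128)            # embedding of the query paper
repo_embs = rng.normal(size=(1000, 128))    # one row per GitHub repository

def top_k_repos(paper, repos, k=5):
    # Normalize so the dot product equals cosine similarity.
    paper = paper / np.linalg.norm(paper)
    repos = repos / np.linalg.norm(repos, axis=1, keepdims=True)
    scores = repos @ paper
    idx = np.argsort(-scores)[:k]
    return idx, scores[idx]

idx, scores = top_k_repos(paper_emb, repo_embs)
print(list(zip(idx.tolist(), scores.round(3))))
```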
Eugene: Towards deep intelligence as a service
Supported by the National Research Foundation (NRF) Singapore under its International Research Centres in Singapore Funding Initiative.
STFNets: Learning Sensing Signals from the Time-Frequency Perspective with Short-Time Fourier Neural Networks
Recent advances in deep learning motivate the use of deep neural networks in
Internet-of-Things (IoT) applications. These networks are modelled after signal
processing in the human brain, thereby leading to significant advantages at
perceptual tasks such as vision and speech recognition. IoT applications,
however, often measure physical phenomena, where the underlying physics (such
as inertia, wireless signal propagation, or the natural frequency of
oscillation) are fundamentally a function of signal frequencies, offering
better features in the frequency domain. This observation leads to a
fundamental question: For IoT applications, can one develop a new brand of
neural network structures that synthesize features inspired not only by the
biology of human perception but also by the fundamental nature of physics?
Hence, in this paper, instead of using conventional building blocks (e.g.,
convolutional and recurrent layers), we propose a new foundational neural
network building block, the Short-Time Fourier Neural Network (STFNet). It
integrates a widely-used time-frequency analysis method, the Short-Time Fourier
Transform, into data processing to learn features directly in the frequency
domain, where the physics of underlying phenomena leave better footprints.
STFNets bring additional flexibility to time-frequency analysis by offering
novel nonlinear learnable operations that are spectral-compatible. Moreover,
STFNets show that transforming signals to a domain that is more connected to
the underlying physics greatly simplifies the learning process. We demonstrate
the effectiveness of STFNets with extensive experiments. STFNets significantly
outperform the state-of-the-art deep learning models in all experiments.
STFNet, therefore, demonstrates superior capability as a fundamental building
block of deep neural networks for IoT applications across various sensor inputs.
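The core idea lends itself to a short sketch: transform the signal with a short-time Fourier transform, apply a learnable, spectral-compatible operation per frequency bin, and transform back. This is a deliberate simplification of the STFNet block (one complex gain per bin), not the paper's full architecture.

```python
import torch
import torch.nn as nn

class STFTBlock(nn.Module):
    def __init__(self, n_fft=64, hop=32):
        super().__init__()
        self.n_fft, self.hop = n_fft, hop
        # One learnable complex gain per frequency bin: a pointwise,
        # spectral-compatible operation applied directly in the STFT domain.
        self.filt = nn.Parameter(torch.ones(n_fft // 2 + 1,
                                            dtype=torch.cfloat))

    def forward(self, x):                            # x: (batch, time)
        win = torch.hann_window(self.n_fft, device=x.device)
        spec = torch.stft(x, self.n_fft, hop_length=self.hop,
                          window=win, return_complex=True)
        spec = spec * self.filt.unsqueeze(-1)        # filter each bin
        return torch.istft(spec, self.n_fft, hop_length=self.hop,
                           window=win, length=x.shape[-1])

x = torch.randn(4, 256)          # e.g., batched accelerometer windows
print(STFTBlock()(x).shape)      # torch.Size([4, 256])
```

Because the learnable operation acts on frequency bins rather than raw time steps, features align with the frequency-dependent physics (inertia, propagation, resonance) that the abstract argues sensing applications are governed by.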