Dream Distillation: A Data-Independent Model Compression Framework
Model compression is eminently suited for deploying deep learning on
IoT devices. However, existing model compression techniques rely on access to
the original or some alternate dataset. In this paper, we address the model
compression problem when no real data is available, e.g., when data is private.
To this end, we propose Dream Distillation, a data-independent model
compression framework. Our experiments show that Dream Distillation can achieve
88.5% accuracy on the CIFAR-10 test set without actually training on the
original data!
Comment: Presented at the ICML 2019 Joint Workshop on On-Device Machine Learning & Compact Deep Neural Network Representations (ODML-CDNNR).
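As a rough illustration of the data-free recipe this abstract describes, the following PyTorch sketch first optimizes random noise until the teacher confidently predicts chosen classes, then distills the student on that synthetic batch. This is a minimal sketch under stated assumptions: function names and hyperparameters are illustrative, and random class targets stand in for the paper's actual targets, which are derived from clustered teacher activation statistics.

```python
import torch
import torch.nn.functional as F

def synthesize_batch(teacher, target_classes, shape=(16, 3, 32, 32),
                     steps=200, lr=0.05):
    # DeepDream-style inversion: optimize noise so the (frozen) teacher
    # assigns the target classes. A stand-in for the paper's "dream" images.
    x = torch.randn(shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        F.cross_entropy(teacher(x), target_classes).backward()
        opt.step()
    return x.detach()

def distill_step(student, teacher, x, opt, T=4.0):
    # One KD step on synthetic data: match the teacher's softened outputs.
    with torch.no_grad():
        t_logits = teacher(x)
    loss = F.kl_div(F.log_softmax(student(x) / T, dim=1),
                    F.softmax(t_logits / T, dim=1),
                    reduction="batchmean") * T * T
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```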
Spatiotemporal Knowledge Distillation for Efficient Estimation of Aerial Video Saliency
The performance of video saliency estimation techniques has achieved
significant advances along with the rapid development of Convolutional Neural
Networks (CNNs). However, devices like cameras and drones may have limited
computational capability and storage space, making the direct deployment of
complex deep saliency models infeasible. To address this problem, this
paper proposes a dynamic saliency estimation approach for aerial videos via
spatiotemporal knowledge distillation. In this approach, five components are
involved, including two teachers, two students, and the desired spatiotemporal
model. The knowledge of spatial and temporal saliency is first separately
transferred from the two complex and redundant teachers to their simple and
compact students, and the input scenes are also downsampled from high
resolution to low resolution to remove probable data redundancy and greatly
speed up the feature extraction process. After that, the desired spatiotemporal model
is further trained by distilling and encoding the spatial and temporal saliency
knowledge of two students into a unified network. In this manner, the
inter-model redundancy can be further removed for the effective estimation of
dynamic saliency on aerial videos. Experimental results show that the proposed
approach outperforms ten state-of-the-art models in estimating visual saliency
on aerial videos, while its speed reaches up to 28,738 FPS on a GPU platform.
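The pipeline the abstract describes (teachers at high resolution, students and the fused model at low resolution) can be sketched roughly as below. A minimal sketch, assuming all five networks output saliency maps; the paper trains the students and the fused model in separate stages, which this single joint step collapses for brevity, and every name here is illustrative.

```python
import torch
import torch.nn.functional as F

def saliency_kd_loss(student_map, teacher_map):
    # Pixel-wise distillation: KL between spatially normalized saliency
    # maps (one common choice; the paper's exact loss may differ).
    s = F.log_softmax(student_map.flatten(1), dim=1)
    t = F.softmax(teacher_map.flatten(1), dim=1)
    return F.kl_div(s, t, reduction="batchmean")

def spatiotemporal_step(frames_hi, frames_lo, t_spat, t_temp,
                        s_spat, s_temp, fused_model, opt):
    with torch.no_grad():                      # frozen, complex teachers
        spat_t = t_spat(frames_hi)
        temp_t = t_temp(frames_hi)
    spat_s = s_spat(frames_lo)                 # compact students on
    temp_s = s_temp(frames_lo)                 # low-resolution input
    fused = fused_model(frames_lo)             # desired spatiotemporal model
    loss = (saliency_kd_loss(spat_s, spat_t)
            + saliency_kd_loss(temp_s, temp_t)
            + saliency_kd_loss(fused, spat_s.detach())
            + saliency_kd_loss(fused, temp_s.detach()))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```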
Edge Intelligence: The Confluence of Edge Computing and Artificial Intelligence
Along with the rapid developments in communication technologies and the surge
in the use of mobile devices, a brand-new computation paradigm, Edge Computing,
is rapidly gaining popularity. Meanwhile, Artificial Intelligence (AI) applications
are thriving with the breakthroughs in deep learning and the many improvements
in hardware architectures. Billions of data bytes, generated at the network
edge, put massive demands on data processing and structural optimization. Thus,
there exists a strong demand to integrate Edge Computing and AI, which gives
birth to Edge Intelligence. In this paper, we divide Edge Intelligence into AI
for edge (Intelligence-enabled Edge Computing) and AI on edge (Artificial
Intelligence on Edge). The former focuses on providing better solutions
to key problems in Edge Computing with the help of popular and effective AI
technologies while the latter studies how to carry out the entire process of
building AI models, i.e., model training and inference, on the edge. This paper
provides insights into this new inter-disciplinary field from a broader
perspective. It discusses the core concepts and the research road-map, which
should provide the necessary background for potential future research
initiatives in Edge Intelligence.
Comment: 13 pages, 3 figures
RDPD: Rich Data Helps Poor Data via Imitation
In many situations, we need to build and deploy separate models in related
environments with different data qualities. For example, an environment with
strong observation equipment (e.g., intensive care units) often provides
high-quality multi-modal data, which are acquired from multiple sensory devices
and have rich feature representations. On the other hand, an environment with
poor observation equipment (e.g., at home) only provides low-quality, uni-modal
data with poor feature representations. To deploy a competitive model in a
poor-data environment without requiring direct access to multi-modal data
acquired from a rich-data environment, this paper develops and presents a
knowledge distillation (KD) method (RDPD) to enhance a predictive model trained
on poor data using knowledge distilled from a high-complexity model trained on
rich, private data. We evaluated RDPD on three real-world datasets and showed
that its distilled model consistently outperformed all baselines across all
datasets, with its largest gains over a model trained only on low-quality data
(24.56% on PR-AUC and 12.21% on ROC-AUC) and over a state-of-the-art KD model
(5.91% on PR-AUC and 4.44% on ROC-AUC).
Comment: Published in IJCAI 2019
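A minimal sketch of the imitation step, assuming paired views of the same instance and a teacher pre-trained on the rich view; RDPD additionally imitates the teacher's attention, which is omitted here, and alpha/T are placeholder hyperparameters.

```python
import torch
import torch.nn.functional as F

def rdpd_step(student, teacher, x_poor, x_rich, y, opt, alpha=0.5, T=2.0):
    # The student sees only the low-quality view; the teacher scores the
    # paired rich view and supplies soft targets.
    with torch.no_grad():
        t_logits = teacher(x_rich)
    s_logits = student(x_poor)
    hard = F.cross_entropy(s_logits, y)               # ground-truth labels
    soft = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                    F.softmax(t_logits / T, dim=1),
                    reduction="batchmean") * T * T    # imitation term
    loss = alpha * hard + (1 - alpha) * soft
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```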
A Generic Network Compression Framework for Sequential Recommender Systems
Sequential recommender systems (SRS) have become the key technology in
capturing users' dynamic interests and generating high-quality recommendations.
Current state-of-the-art sequential recommender models are typically based on a
sandwich-structured deep neural network, where one or more middle (hidden)
layers are placed between the input embedding layer and output softmax layer.
In general, these models require a large number of parameters (such as using a
large embedding dimension or a deep network architecture) to obtain their
optimal performance. Despite their effectiveness, further increasing the model
size makes deployment on resource-constrained devices harder, resulting in
longer response times and a larger memory footprint. To resolve these issues,
we propose a compressed sequential recommendation framework, termed CpRec, in
which two generic model shrinking techniques are
employed. Specifically, we first propose a block-wise adaptive decomposition to
approximate the input and softmax matrices by exploiting the fact that items in
SRS obey a long-tailed distribution. To reduce the parameters of the middle
layers, we introduce three layer-wise parameter sharing schemes. We instantiate
CpRec using a deep convolutional neural network with dilated kernels,
considering both recommendation accuracy and efficiency. Through extensive
ablation studies, we demonstrate that the proposed CpRec can achieve
compression rates of up to 48x on real-world SRS datasets. Meanwhile, CpRec is
faster during training/inference and in most cases outperforms its uncompressed
counterpart.
Comment: Accepted by SIGIR 2020
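The block-wise adaptive decomposition can be pictured as follows: items are sorted by frequency and partitioned into blocks, frequent blocks get wide embeddings and rare blocks narrow ones, each projected up to the model dimension. A sketch under assumed block boundaries and widths, not CpRec's exact configuration.

```python
import torch
import torch.nn as nn

class BlockwiseEmbedding(nn.Module):
    """Long-tail-aware embedding: block k holds block_sizes[k] items at
    width block_dims[k], projected to a shared d_model."""
    def __init__(self, block_sizes, block_dims, d_model):
        super().__init__()
        self.embeds = nn.ModuleList(
            nn.Embedding(n, d) for n, d in zip(block_sizes, block_dims))
        self.projs = nn.ModuleList(
            nn.Linear(d, d_model, bias=False) for d in block_dims)
        # cumulative offsets map a global item id to (block, local id)
        self.register_buffer(
            "bounds", torch.cumsum(torch.tensor([0] + list(block_sizes)), 0))
        self.d_model = d_model

    def forward(self, item_ids):
        out = item_ids.new_zeros(*item_ids.shape, self.d_model,
                                 dtype=torch.float)
        for k, (emb, proj) in enumerate(zip(self.embeds, self.projs)):
            mask = (item_ids >= self.bounds[k]) & (item_ids < self.bounds[k + 1])
            if mask.any():
                out[mask] = proj(emb(item_ids[mask] - self.bounds[k]))
        return out

# e.g., 100k items: the 10k most frequent get width 128, the rest width 32
emb = BlockwiseEmbedding([10_000, 90_000], [128, 32], d_model=128)
vecs = emb(torch.randint(0, 100_000, (4, 20)))   # (batch, seq, d_model)
```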
Teacher-Student Architecture for Knowledge Distillation: A Survey
Although deep neural networks (DNNs) have shown a strong capacity to solve
large-scale problems in many areas, such DNNs are hard to deploy in
real-world systems due to their voluminous parameters. To tackle this issue,
Teacher-Student architectures were proposed, where simple student networks with
a few parameters can achieve comparable performance to deep teacher networks
with many parameters. Recently, Teacher-Student architectures have been
effectively and widely adopted for various knowledge distillation (KD)
objectives, including knowledge compression, knowledge expansion, knowledge
adaptation, and knowledge enhancement. With the help of Teacher-Student
architectures, current studies are able to achieve multiple distillation
objectives through lightweight and generalized student networks. Different from
existing KD surveys that primarily focus on knowledge compression, this survey
first explores Teacher-Student architectures across multiple distillation
objectives. This survey presents an introduction to various knowledge
representations and their corresponding optimization objectives. Additionally,
we provide a systematic overview of Teacher-Student architectures with
representative learning algorithms and effective distillation schemes. This
survey also summarizes recent applications of Teacher-Student architectures
across multiple purposes, including classification, recognition, generation,
ranking, and regression. Lastly, potential research directions in KD are
investigated, focusing on architecture design, knowledge quality, and
theoretical studies of regression-based learning. Through this
comprehensive survey, industry practitioners and the academic community can
gain valuable insights and guidelines for effectively designing, learning, and
applying Teacher-Student architectures to various distillation objectives.
Comment: 20 pages. arXiv admin note: substantial text overlap with arXiv:2210.1733
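To make one of the surveyed knowledge representations concrete, here is a minimal sketch of feature-based distillation in the FitNets style, where a small adapter regresses the student's intermediate features onto the teacher's; the 1x1-conv adapter and the layer choice are illustrative assumptions, not prescriptions from the survey.

```python
import torch.nn as nn
import torch.nn.functional as F

class FeatureKD(nn.Module):
    # Hint loss: align a (B, s_dim, H, W) student feature map with a
    # (B, t_dim, H, W) teacher feature map through a learned 1x1 adapter.
    def __init__(self, s_dim, t_dim):
        super().__init__()
        self.adapter = nn.Conv2d(s_dim, t_dim, kernel_size=1)

    def forward(self, s_feat, t_feat):
        return F.mse_loss(self.adapter(s_feat), t_feat.detach())
```

In practice this term is added to the student's task loss with a weighting coefficient, one instance of the "knowledge representation plus optimization objective" pairing the survey catalogs.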
Deep Learning Towards Mobile Applications
Recent years have witnessed an explosive growth of mobile devices. Mobile
devices are permeating every aspect of our daily lives. With the increasing
usage of mobile devices and intelligent applications, there is a soaring demand
for mobile applications with machine learning services. Inspired by the
tremendous success achieved by deep learning in many machine learning tasks, it
becomes a natural trend to push deep learning towards mobile applications.
However, there are many challenges to realizing deep learning in mobile
applications, including the contradiction between the miniature nature of
mobile devices and the resource requirement of deep neural networks, the
privacy and security concerns about individuals' data, and so on. To resolve
these challenges, during the past few years, great leaps have been made in this
area. In this paper, we provide an overview of the current challenges and
representative achievements in pushing deep learning onto mobile devices from
three aspects: training with mobile data, efficient inference on mobile
devices, and applications of mobile deep learning. The first two aspects cover
the primary tasks of deep learning. We then go through two of our recent
applications, which use data collected by mobile devices for mood disturbance
inference and user identification. Finally, we conclude the paper with a
discussion of the future of this area.
Comment: Conference version accepted by ICDCS'18
Face Recognition: A Novel Multi-Level Taxonomy based Survey
In a world where security issues have been gaining growing importance, face
recognition systems have attracted increasing attention in multiple application
areas, ranging from forensics and surveillance to commerce and entertainment.
To help in understanding the landscape and abstraction levels relevant to face
recognition systems, face recognition taxonomies allow a deeper dissection and
comparison of existing solutions. This paper proposes a new, more
encompassing and richer multi-level face recognition taxonomy, facilitating the
organization and categorization of available and emerging face recognition
solutions; this taxonomy may also guide researchers in the development of more
efficient face recognition solutions. The proposed multi-level taxonomy
considers levels related to the face structure, feature support and feature
extraction approach. Following the proposed taxonomy, a comprehensive survey of
representative face recognition solutions is presented. The paper concludes
with a discussion on current algorithmic and application related challenges
which may define future research directions for face recognition.
Comment: This paper is a preprint of a paper submitted to IET Biometrics. If accepted, the copy of record will be available at the IET Digital Library.
Collaborative Deep Learning Across Multiple Data Centers
Valuable training data is often owned by independent organizations and
located in multiple data centers. Most deep learning approaches require
centralizing the multi-datacenter data for performance reasons. In practice,
however, it is often infeasible to transfer all data to a centralized data
center due to not only bandwidth limitation but also the constraints of privacy
regulations. Model averaging is a conventional choice for data-parallel
training, but previous studies have claimed it is ineffective because deep
neural networks are often non-convex. In this paper, we argue that model averaging can
be effective in the decentralized environment by using two strategies, namely,
the cyclical learning rate and the increased number of epochs for local model
training. With the two strategies, we show that model averaging can provide
competitive performance in the decentralized mode compared to the
data-centralized one. In a practical environment with multiple data centers, we
conduct extensive experiments using state-of-the-art deep network architectures
on different types of data. Results demonstrate the effectiveness and
robustness of the proposed method.
Comment: Submitted to AAAI 2019
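The two strategies can be sketched as below: plain element-wise parameter averaging across data centers, paired with a triangular cyclical learning rate for the local training epochs. A minimal sketch; the schedule shape follows the common triangular policy, and all hyperparameters are placeholders rather than the paper's settings.

```python
import copy
import torch

def average_models(models):
    # Element-wise average of parameters across data centers; each center
    # then reloads the result via load_state_dict.
    avg = copy.deepcopy(models[0].state_dict())
    for key in avg:
        avg[key] = torch.stack(
            [m.state_dict()[key].float() for m in models]).mean(0)
    return avg

def cyclical_lr(step, base_lr=1e-4, max_lr=1e-2, half_cycle=1000):
    # Triangular schedule: base_lr -> max_lr -> base_lr over 2*half_cycle steps.
    pos = abs((step % (2 * half_cycle)) / half_cycle - 1)  # 1 -> 0 -> 1
    return base_lr + (max_lr - base_lr) * (1 - pos)

# Each round: every center trains locally for several epochs under
# cyclical_lr, then all centers load average_models(local_models).
```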
Knowledge Distillation for Federated Learning: a Practical Guide
Federated Learning (FL) enables the training of Deep Learning models without
centrally collecting possibly sensitive raw data. This paves the way for
stronger privacy guarantees when building predictive models. The most widely
used algorithms for FL are parameter-averaging schemes (e.g., Federated
Averaging), which, however, have well-known limits: (i) Clients must implement
the same model architecture; (ii) Transmitting model weights and model updates
implies high communication cost, which scales up with the number of model
parameters; (iii) In the presence of non-IID data distributions,
parameter-averaging aggregation schemes perform poorly due to client model
drift. Federated adaptations of regular Knowledge Distillation (KD) can solve
or mitigate these weaknesses of parameter-averaging FL algorithms, while
possibly introducing other trade-offs. In this article, we provide a review of
KD-based algorithms tailored to specific FL issues.
Comment: 9 pages, 1 figure
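As one concrete example of the family of methods such a review covers, the sketch below replaces weight averaging with distillation over a shared public set: clients share predictions rather than parameters, so heterogeneous architectures are allowed and communication scales with the public set rather than the model size. A generic sketch, not any specific algorithm from the article.

```python
import torch
import torch.nn.functional as F

def federated_distillation_round(clients, global_model, public_x, opt, T=2.0):
    # Clients (possibly with different architectures) score a public batch;
    # the global model distills from their averaged soft labels.
    with torch.no_grad():
        avg_logits = torch.stack([c(public_x) for c in clients]).mean(0)
    g_logits = global_model(public_x)
    loss = F.kl_div(F.log_softmax(g_logits / T, dim=1),
                    F.softmax(avg_logits / T, dim=1),
                    reduction="batchmean") * T * T
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```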