Federated Learning via Over-the-Air Computation
The stringent low-latency and privacy requirements of emerging
high-stakes applications involving intelligent devices, such as drones and
smart vehicles, make cloud computing inapplicable in these scenarios. Instead,
edge machine learning becomes increasingly attractive for performing training
and inference directly at network edges without sending data to a centralized
data center. This stimulates a nascent field termed federated learning, in
which a machine learning model is trained in a distributed manner on mobile
devices with limited computation, storage, energy, and bandwidth. To preserve data privacy and
address the issues of unbalanced and non-IID data points across different
devices, the federated averaging algorithm has been proposed for global model
aggregation by computing the weighted average of the locally updated models at
the selected devices. However, the limited communication bandwidth becomes the main
bottleneck for aggregating the locally computed updates. We thus propose a
novel over-the-air computation based approach for fast global model aggregation
by exploiting the superposition property of a wireless multiple-access channel.
This is achieved by joint device selection and beamforming design, which is
modeled as a sparse and low-rank optimization problem to support efficient
algorithm design. To this end, we provide a
difference-of-convex-functions (DC) representation for the sparse and low-rank
function to enhance sparsity and accurately detect the fixed-rank constraint in
the procedure of device selection. A DC algorithm is further developed to solve
the resulting DC program with global convergence guarantees. The algorithmic
advantages and admirable performance of the proposed methodologies are
demonstrated through extensive numerical results.
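The weighted aggregation step of federated averaging described above can be sketched as follows. This is a minimal illustration with toy two-device data, not the paper's over-the-air aggregation scheme; the function name and data are ours:

```python
import numpy as np

def federated_averaging(local_models, sample_counts):
    """Global aggregation: weighted average of locally updated models,
    with weights proportional to each device's local data size."""
    total = sum(sample_counts)
    return sum((n / total) * m for n, m in zip(sample_counts, local_models))

# Two devices with unbalanced (non-IID) data: 100 vs. 300 samples.
local_models = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
global_model = federated_averaging(local_models, [100, 300])
print(global_model)  # [2.5 3.5]
```

Over-the-air computation would realize this weighted sum directly in the analog superposition of the devices' transmitted signals rather than in software.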
Model Compression with Adversarial Robustness: A Unified Optimization Framework
Deep model compression has been extensively studied, and state-of-the-art
methods can now achieve high compression ratios with minimal accuracy loss.
This paper studies model compression through a different lens: could we
compress models without hurting their robustness to adversarial attacks, in
addition to maintaining accuracy? Previous literature suggested that the goals
of robustness and compactness may sometimes conflict. We propose a novel
Adversarially Trained Model Compression (ATMC) framework. ATMC constructs a
unified constrained optimization formulation, where existing compression means
(pruning, factorization, quantization) are all integrated into the constraints.
An efficient algorithm is then developed. An extensive set of experiments
demonstrates that ATMC obtains a markedly more favorable trade-off among model
size, accuracy, and robustness than currently available alternatives in
various settings. The code is publicly available at:
https://github.com/shupenggui/ATMC.
Comment: 14 pages, NeurIPS 2019. The first two authors, Gui and Wang,
contributed equally and are listed alphabetically.
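Compression constraints like the ones ATMC unifies can be illustrated by their Euclidean projection steps, keeping only the top-k weights or snapping weights to a small set of levels. This is our own sketch with hypothetical helper names, not the ATMC codebase:

```python
import numpy as np

def project_sparse(w, k):
    """Projection onto the l0 ball: keep only the k largest-magnitude weights."""
    out = np.zeros_like(w)
    keep = np.argsort(np.abs(w))[-k:]
    out[keep] = w[keep]
    return out

def project_quantized(w, bits):
    """Snap each weight to the nearest of 2**bits uniformly spaced levels."""
    levels = np.linspace(w.min(), w.max(), 2 ** bits)
    return levels[np.argmin(np.abs(w[:, None] - levels[None, :]), axis=1)]

w = np.array([0.9, -0.1, 0.05, -1.2, 0.4])
w_sparse = project_sparse(w, 2)    # keeps only -1.2 and 0.9
w_quant = project_quantized(w, 1)  # every weight becomes one of 2 levels
```

In a constrained formulation such projections enforce feasibility after each (adversarial) training step.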
Towards Open-Text Semantic Parsing via Multi-Task Learning of Structured Embeddings
Open-text (or open-domain) semantic parsers are designed to interpret any
statement in natural language by inferring a corresponding meaning
representation (MR). Unfortunately, large-scale systems cannot be easily
machine-learned due to a lack of directly supervised data. We propose here a
method that learns to assign MRs to a wide range of text (using a dictionary of
more than 70,000 words, which are mapped to more than 40,000 entities) thanks
to a training scheme that combines learning from WordNet and ConceptNet with
learning from raw text. The model learns structured embeddings of words,
entities and MRs via a multi-task training process operating on these diverse
sources of data that integrates all the learnt knowledge into a single system.
This work thus combines methods for knowledge acquisition, semantic
parsing, and word-sense disambiguation. Experiments on various tasks indicate
that our approach is indeed successful and can form a basis for more
sophisticated future systems.
StructADMM: A Systematic, High-Efficiency Framework of Structured Weight Pruning for DNNs
Weight pruning methods of DNNs have been demonstrated to achieve a good model
pruning rate without loss of accuracy, thereby alleviating the significant
computation/storage requirements of large-scale DNNs. Structured weight pruning
methods have been proposed to overcome the limitation of irregular network
structure and demonstrated actual GPU acceleration. However, in prior work the
pruning rate (degree of sparsity) and GPU acceleration are limited (to less
than 50%) when accuracy needs to be maintained. In this work, we overcome these
limitations by proposing a unified, systematic framework of structured weight
pruning for DNNs. It is a framework that can be used to induce different types
of structured sparsity, such as filter-wise, channel-wise, and shape-wise
sparsity, as well as non-structured sparsity. The proposed framework incorporates
stochastic gradient descent with ADMM, and can be understood as a dynamic
regularization method in which the regularization target is analytically
updated in each iteration. Without loss of accuracy on the AlexNet model, we
achieve 2.58X and 3.65X average measured speedup on two GPUs, clearly
outperforming the prior work. The average speedups reach 3.15X and 8.52X when
allowing a moderate accuracy loss of 2%. In this case the model compression
for convolutional layers is 15.0X, corresponding to 11.93X measured CPU
speedup. Our experiments on ResNet model and on other data sets like UCF101 and
CIFAR-10 demonstrate the consistently higher performance of our framework.
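The dynamic-regularization view of SGD-with-ADMM described above can be sketched on a toy objective. The quadratic loss, hyperparameters, and function names below are our own stand-ins for actual network training:

```python
import numpy as np

def prune_columns(W, k):
    """Structured (filter/column-wise) projection: zero out all but the
    k columns of W with the largest l2 norms."""
    Z = np.zeros_like(W)
    keep = np.argsort(np.linalg.norm(W, axis=0))[-k:]
    Z[:, keep] = W[:, keep]
    return Z

def admm_structured_prune(W, k, rho=1.0, lr=0.1, steps=100):
    W0, Z, U = W.copy(), prune_columns(W, k), np.zeros_like(W)
    for _ in range(steps):
        # W-step: gradient descent on loss + (rho/2)||W - Z + U||^2;
        # the regularization target Z is updated analytically each iteration.
        grad = 2 * (W - W0) + rho * (W - Z + U)
        W = W - lr * grad
        Z = prune_columns(W + U, k)  # Z-step: Euclidean projection
        U = U + W - Z                # dual update
    return prune_columns(W, k)       # final hard projection

rng = np.random.default_rng(0)
W_pruned = admm_structured_prune(rng.normal(size=(4, 6)), k=2)
```

Swapping `prune_columns` for a filter-wise, shape-wise, or non-structured projection changes the induced sparsity type without changing the loop.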
Representation Learning by Rotating Your Faces
The large pose discrepancy between two face images is one of the fundamental
challenges in automatic face recognition. Conventional approaches to
pose-invariant face recognition either perform face frontalization on, or learn
a pose-invariant representation from, a non-frontal face image. We argue that
it is more desirable to perform both tasks jointly to allow them to leverage
each other. To this end, this paper proposes a Disentangled Representation
learning-Generative Adversarial Network (DR-GAN) with three distinct novelties.
First, the encoder-decoder structure of the generator enables DR-GAN to learn a
representation that is both generative and discriminative, which can be used
for face image synthesis and pose-invariant face recognition. Second, this
representation is explicitly disentangled from other face variations such as
pose, through the pose code provided to the decoder and pose estimation in the
discriminator. Third, DR-GAN can take one or multiple images as the input, and
generate one unified identity representation along with an arbitrary number of
synthetic face images. Extensive quantitative and qualitative evaluation on a
number of controlled and in-the-wild databases demonstrates the superiority of
DR-GAN over the state of the art in both learning representations and rotating
large-pose face images.
Comment: IEEE Transactions on Pattern Analysis and Machine Intelligence
(TPAMI).
A Unified Framework of DNN Weight Pruning and Weight Clustering/Quantization Using ADMM
Many model compression techniques of Deep Neural Networks (DNNs) have been
investigated, including weight pruning, weight clustering and quantization,
etc. Weight pruning leverages the redundancy in the number of weights in DNNs,
while weight clustering/quantization leverages the redundancy in the number of
bit representations of weights. They can be effectively combined in order to
exploit the maximum degree of redundancy. However, the literature lacks a
systematic investigation in this direction.
In this paper, we fill this void and develop a unified, systematic framework
of DNN weight pruning and clustering/quantization using Alternating Direction
Method of Multipliers (ADMM), a powerful technique in optimization theory to
deal with non-convex optimization problems. Both DNN weight pruning and
clustering/quantization, as well as their combinations, can be solved in a
unified manner. For further performance improvement in this framework, we adopt
multiple techniques including iterative weight quantization and retraining,
joint weight clustering training and centroid updating, weight clustering
retraining, etc. The proposed framework achieves significant improvements both
in individual weight pruning and clustering/quantization problems, as well as
their combinations. For weight pruning alone, we achieve 167x weight reduction
in LeNet-5, 24.7x in AlexNet, and 23.4x in VGGNet, without any accuracy loss.
For the combination of DNN weight pruning and clustering/quantization, we
achieve 1,910x and 210x storage reduction of weight data on LeNet-5 and
AlexNet, respectively, without accuracy loss. Our codes and models are released
at the link http://bit.ly/2D3F0n
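The redundancy in bit representations that weight clustering exploits can be illustrated with a tiny 1-D k-means that alternates assignment and centroid updating, echoing the "joint weight clustering training and centroid updating" step above. This is our own sketch, not the released code:

```python
import numpy as np

def cluster_weights(w, n_clusters, iters=20):
    """Replace each weight by its cluster centroid: only the centroid
    table plus per-weight cluster indices need to be stored."""
    centroids = np.linspace(w.min(), w.max(), n_clusters)
    for _ in range(iters):
        # Assignment step: nearest centroid for each weight.
        assign = np.argmin(np.abs(w[:, None] - centroids[None, :]), axis=1)
        for c in range(n_clusters):
            if np.any(assign == c):
                centroids[c] = w[assign == c].mean()  # centroid updating
    return centroids[assign], assign

w = np.array([0.10, 0.12, 0.90, 0.88, -0.50])
w_clustered, indices = cluster_weights(w, n_clusters=3)
# Storage drops from 32 bits per weight to log2(n_clusters) bits per
# weight plus the small centroid table.
```

Combining such clustering with pruning, as in the paper, multiplies the two sources of storage reduction.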
Non-linear Dimensionality Regularizer for Solving Inverse Problems
Consider an ill-posed inverse problem of estimating causal factors from
observations, one of which is known to lie near some (unknown)
low-dimensional, non-linear manifold expressed by a predefined Mercer kernel.
Solving this problem requires simultaneous estimation of these factors and
learning the low-dimensional representation for them. In this work, we
introduce a novel non-linear dimensionality regularization technique for
solving such problems without pre-training. We re-formulate Kernel-PCA as an
energy minimization problem in which low dimensionality constraints are
introduced as regularization terms in the energy. To the best of our knowledge,
ours is the first attempt to create a dimensionality regularizer in the KPCA
framework. Our approach relies on robustly penalizing the rank of the recovered
factors directly in the implicit feature space to create their
low-dimensional approximations in closed form. Our approach performs robust
KPCA in the presence of missing data and noise. We demonstrate state-of-the-art
results on predicting missing entries in the standard oil flow dataset.
Additionally, we evaluate our method on the challenging problem of Non-Rigid
Structure from Motion, where our approach delivers promising results on the CMU
mocap dataset despite the presence of significant occlusions and noise.
CirCNN: Accelerating and Compressing Deep Neural Networks Using Block-Circulant Weight Matrices
Large-scale deep neural networks (DNNs) are both compute and memory
intensive. As the size of DNNs continues to grow, it is critical to improve the
energy efficiency and performance while maintaining accuracy. For DNNs, the
model size is an important factor affecting performance, scalability and energy
efficiency. Weight pruning achieves good compression ratios but suffers from
three drawbacks: 1) the irregular network structure after pruning; 2) the
increased training complexity; and 3) the lack of rigorous guarantee of
compression ratio and inference accuracy. To overcome these limitations, this
paper proposes CirCNN, a principled approach to represent weights and process
neural networks using block-circulant matrices. CirCNN utilizes the Fast
Fourier Transform (FFT)-based fast multiplication, simultaneously reducing the
computational complexity (both in inference and training) from O(n^2) to
O(n log n) and the storage complexity from O(n^2) to O(n), with negligible
accuracy loss. Compared to other approaches, CirCNN is distinct due to its
mathematical rigor: it can converge to the same effectiveness as DNNs without
compression. The CirCNN architecture is a universal DNN inference engine that
can be implemented on various hardware/software platforms with a configurable
network architecture. To demonstrate the performance and energy efficiency, we test
CirCNN in FPGA, ASIC and embedded processors. Our results show that CirCNN
architecture achieves very high energy efficiency and performance with a small
hardware footprint. Based on the FPGA implementation and ASIC synthesis
results, CirCNN achieves 6-102X energy efficiency improvements compared with
the best state-of-the-art results.
Comment: 14 pages, 15 figures, conference.
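The FFT-based fast multiplication at the core of CirCNN follows from the fact that a circulant matrix-vector product is a circular convolution, computable in O(n log n); a minimal sketch:

```python
import numpy as np

def circulant_matvec(c, x):
    """Multiply the circulant matrix with first column c by x in
    O(n log n): C @ x = ifft(fft(c) * fft(x))."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

# Check against the explicit O(n^2) circulant matrix C[i, j] = c[(i - j) % n].
c = np.array([1.0, 2.0, 3.0, 4.0])
x = np.array([1.0, 0.0, -1.0, 2.0])
C = np.array([[c[(i - j) % 4] for j in range(4)] for i in range(4)])
print(np.allclose(C @ x, circulant_matvec(c, x)))  # True
```

A block-circulant weight matrix applies this trick block by block, so only the first column of each block (O(n) values) needs to be stored.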
Towards Ultra-High Performance and Energy Efficiency of Deep Learning Systems: An Algorithm-Hardware Co-Optimization Framework
Hardware accelerations of deep learning systems have been extensively
investigated in industry and academia. The aim of this paper is to achieve
ultra-high energy efficiency and performance for hardware implementations of
deep neural networks (DNNs). An algorithm-hardware co-optimization framework is
developed, which is applicable to different DNN types, sizes, and application
scenarios. The algorithm part adopts the general block-circulant matrices to
achieve a fine-grained tradeoff between accuracy and compression ratio. It
applies to both fully-connected and convolutional layers and contains a
mathematically rigorous proof of the effectiveness of the method. The proposed
algorithm reduces the computational complexity per layer from O(n^2) to
O(n log n) and the storage complexity from O(n^2) to O(n), both for training and
inference. The hardware part consists of highly efficient Field Programmable
Gate Array (FPGA)-based implementations using effective reconfiguration, batch
processing, deep pipelining, resource re-using, and hierarchical control.
Experimental results demonstrate that the proposed framework achieves at least
152X speedup and 71X energy efficiency gain compared with IBM TrueNorth
processor under the same test accuracy. It achieves at least 31X energy
efficiency gain compared with the reference FPGA-based work.
Comment: 6 figures, AAAI Conference on Artificial Intelligence, 201
Effective Image Retrieval via Multilinear Multi-index Fusion
Multi-index fusion has demonstrated impressive performance in retrieval tasks
by integrating different visual representations in a unified framework.
However, previous works mainly consider propagating similarities via neighbor
structure, ignoring the high-order information among different visual
representations. In this paper, we propose a new multi-index fusion scheme for
image retrieval. By formulating this procedure as a multilinear based
optimization problem, the complementary information hidden in different indexes
can be explored more thoroughly. Specifically, we first build multiple indexes
from various visual representations. Then a so-called index-specific functional
matrix, which aims to propagate similarities, is introduced for updating the
original index. The functional matrices are then optimized in a unified tensor
space to achieve a refinement, such that relevant images are pushed closer
together. The optimization problem can be efficiently solved by the augmented
Lagrangian method with theoretical convergence guarantee. Unlike the
traditional multi-index fusion scheme, our approach embeds the multi-index
subspace structure into the new indexes with a sparsity constraint, so it incurs
little additional memory consumption in the online query stage. Experimental
evaluation on three benchmark datasets reveals that the proposed approach
achieves the state-of-the-art performance, i.e., N-score 3.94 on UKBench, mAP
94.1\% on Holiday and 62.39\% on Market-1501.
Comment: 12 pages.