NeuralPower: Predict and Deploy Energy-Efficient Convolutional Neural Networks
"How much energy is consumed for an inference made by a convolutional neural
network (CNN)?" With the increased popularity of CNNs deployed on the
wide spectrum of platforms (from mobile devices to workstations), the answer to
this question has drawn significant attention. From lengthening the battery life
of mobile devices to reducing the energy bill of a datacenter, it is important
to understand the energy efficiency of CNNs when serving inferences, before
actually training the model. In this work, we propose
NeuralPower: a layer-wise predictive framework based on sparse polynomial
regression, for predicting the serving energy consumption of a CNN deployed on
any GPU platform. Given the architecture of a CNN, NeuralPower provides an
accurate prediction and breakdown for power and runtime across all layers in
the whole network, helping machine learners quickly identify the power,
runtime, or energy bottlenecks. We also propose the "energy-precision ratio"
(EPR) metric to guide machine learners in selecting an energy-efficient CNN
architecture that better trades off the energy consumption and prediction
accuracy. The experimental results show that the prediction accuracy of the
proposed NeuralPower outperforms the best published model to date, yielding an
improvement in accuracy of up to 68.5%. We also assess the accuracy of
predictions at the network level, by predicting the runtime, power, and energy
of state-of-the-art CNN architectures, achieving an average accuracy of 88.24%
in runtime, 88.34% in power, and 97.21% in energy. We comprehensively
corroborate the effectiveness of NeuralPower as a powerful framework for
machine learners by testing it on different GPU platforms and Deep Learning
software tools.
Comment: Accepted as a conference paper at ACML 2017.
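As a rough illustration of the layer-wise approach the abstract describes, the sketch below fits sparse polynomial regression models (polynomial features plus a Lasso penalty, via scikit-learn) for per-layer power and runtime, then sums power times runtime over layers to estimate energy. The feature set, polynomial degree, and data here are placeholders, not NeuralPower's actual models.

```python
# Hypothetical sketch of NeuralPower-style layer-wise prediction:
# sparse polynomial regression maps per-layer configuration features
# to power (W) and runtime (ms); energy is the sum of power * runtime.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Lasso

# Per-layer features, e.g. (batch, in_ch, out_ch, kernel, spatial) -- assumed.
X_train = np.random.rand(200, 5)
power_train = np.random.rand(200)    # measured per-layer power (W) -- toy data
runtime_train = np.random.rand(200)  # measured per-layer runtime (ms) -- toy data

power_model = make_pipeline(PolynomialFeatures(degree=2), Lasso(alpha=0.01))
runtime_model = make_pipeline(PolynomialFeatures(degree=2), Lasso(alpha=0.01))
power_model.fit(X_train, power_train)
runtime_model.fit(X_train, runtime_train)

def predict_network_energy(layer_features):
    """Predict total energy (mJ) as the sum over layers of power * runtime."""
    p = power_model.predict(layer_features)    # W
    t = runtime_model.predict(layer_features)  # ms
    return float(np.sum(p * t))                # W * ms = mJ
```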
DPP-Net: Device-aware Progressive Search for Pareto-optimal Neural Architectures
Recent breakthroughs in Neural Architectural Search (NAS) have achieved
state-of-the-art performances in applications such as image classification and
language modeling. However, these techniques typically ignore device-related
objectives such as inference time, memory usage, and power consumption.
Optimizing neural architecture for device-related objectives is immensely
crucial for deploying deep networks on portable devices with limited computing
resources. We propose DPP-Net: Device-aware Progressive Search for
Pareto-optimal Neural Architectures, optimizing for both device-related (e.g.,
inference time and memory usage) and device-agnostic (e.g., accuracy and model
size) objectives. DPP-Net employs a compact search space inspired by current
state-of-the-art mobile CNNs, and further improves search efficiency by
adopting progressive search (Liu et al. 2017). Experimental results on CIFAR-10
demonstrate the effectiveness of Pareto-optimal networks found by
DPP-Net, for three different devices: (1) a workstation with Titan X GPU, (2)
an NVIDIA Jetson TX1 embedded system, and (3) a mobile phone with an ARM Cortex-A53.
Compared to CondenseNet and NASNet (Mobile), DPP-Net achieves better
performances: higher accuracy and shorter inference time on various devices.
Additional experimental results show that models found by DPP-Net also achieve
considerably good performance on ImageNet.
Comment: 13 pages, 9 figures, ECCV 2018 camera-ready.
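For intuition about the Pareto-optimality criterion DPP-Net searches for, here is a minimal, generic sketch of extracting the Pareto front from measured (error, inference-time) pairs. It is not DPP-Net's search procedure, and the candidate numbers are made up.

```python
# Keep a candidate only if no other candidate is at least as good on both
# objectives and strictly better on one (lower is better for both here).
def pareto_front(candidates):
    """candidates: list of (name, error, latency_ms) tuples."""
    front = []
    for name, err, lat in candidates:
        dominated = any(
            (e <= err and l <= lat) and (e < err or l < lat)
            for _, e, l in candidates
        )
        if not dominated:
            front.append((name, err, lat))
    return front

models = [("A", 0.06, 12.0), ("B", 0.05, 20.0), ("C", 0.07, 10.0), ("D", 0.06, 25.0)]
print(pareto_front(models))  # A, B, C survive; D is dominated by A
```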
M3A: Model, MetaModel, and Anomaly Detection in Web Searches
'Alice' is submitting one web search per five minutes, for three hours in a
row - is it normal? How to detect abnormal search behaviors, among Alice and
other users? Is there any distinct pattern in Alice's (or other users') search
behavior? We studied what is probably the largest, publicly available, query
log that contains more than 30 million queries from 0.6 million users. In this
paper, we present a novel, user- and group-level framework, M3A: Model,
MetaModel and Anomaly detection. For each user, we discover and explain a
surprising, bi-modal pattern of the inter-arrival time (IAT) of landed queries
(queries with user click-through). Specifically, the model Camel-Log is
proposed to describe such an IAT distribution; we then notice the correlations
among its parameters at the group level. Thus, we further propose the metamodel
Meta-Click, to capture and explain the two-dimensional, heavy-tail distribution
of the parameters. Combining Camel-Log and Meta-Click, the proposed M3A has the
following strong points: (1) the accurate modeling of marginal IAT
distribution, (2) quantitative interpretations, and (3) anomaly detection.
Comment: 10 pages, 10 figures, 3 tables.
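The exact Camel-Log form is not given in this listing, so as a loose stand-in the sketch below fits a two-component mixture in log-IAT space, which reproduces the kind of bi-modal, heavy-tailed inter-arrival-time shape the abstract describes. The component family and toy data are assumptions, not the paper's model.

```python
# Two Gaussians in log space == two lognormal modes in IAT space:
# a simple stand-in for a bi-modal inter-arrival-time distribution.
import numpy as np
from sklearn.mixture import GaussianMixture

iat_seconds = np.random.lognormal(mean=1.0, sigma=0.5, size=500)  # toy IATs
log_iat = np.log(iat_seconds).reshape(-1, 1)

mix = GaussianMixture(n_components=2).fit(log_iat)
print("log-space means:", mix.means_.ravel())
print("mixture weights:", mix.weights_)
```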
COCO-GAN: Generation by Parts via Conditional Coordinating
Humans can only interact with part of the surrounding environment due to
biological restrictions. Therefore, we learn to reason the spatial
relationships across a series of observations to piece together the surrounding
environment. Inspired by such behavior and the fact that machines also have
computational constraints, we propose COnditional COordinate GAN (COCO-GAN),
whose generator generates images by parts based on their spatial coordinates
as the condition. In turn, the discriminator learns to judge realism across
multiple assembled patches by global coherence, local appearance, and
edge-crossing continuity. Although the full images are never generated during
training, we show that COCO-GAN can produce state-of-the-art-quality full
images during
inference. We further demonstrate a variety of novel applications enabled by
teaching the network to be aware of coordinates. First, we perform
extrapolation to the learned coordinate manifold and generate off-the-boundary
patches. Combining with the originally generated full image, COCO-GAN can
produce images that are larger than training samples, which we call
"beyond-boundary generation". We then showcase panorama generation within a
cylindrical coordinate system that inherently preserves horizontally cyclic
topology. On the computation side, COCO-GAN has a built-in divide-and-conquer
paradigm that reduces memory requisition during training and inference,
provides high parallelism, and can generate parts of images on demand.
Comment: Accepted to ICCV'19 (oral). All images are compressed due to the size
limit; please access the full-resolution version via Google Drive:
http://bit.ly/COCO-GAN-ful
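To make the "generation by parts" idea concrete, here is a toy sketch in which a generator receives a shared latent code plus each patch's spatial coordinate, emits one patch at a time, and the full image is only assembled afterward. The tiny MLP and image sizes are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

PATCH, GRID, Z_DIM = 8, 4, 16  # 4x4 grid of 8x8 patches -> 32x32 image

gen = nn.Sequential(
    nn.Linear(Z_DIM + 2, 64), nn.ReLU(),
    nn.Linear(64, PATCH * PATCH * 3), nn.Tanh(),
)

z = torch.randn(1, Z_DIM)  # one latent shared by all patches of the image
rows = []
for i in range(GRID):
    row = []
    for j in range(GRID):
        coord = torch.tensor([[i / (GRID - 1), j / (GRID - 1)]])  # in [0, 1]
        patch = gen(torch.cat([z, coord], dim=1)).view(3, PATCH, PATCH)
        row.append(patch)
    rows.append(torch.cat(row, dim=2))   # stitch patches along width
full_image = torch.cat(rows, dim=1)      # stitch rows along height
print(full_image.shape)                  # torch.Size([3, 32, 32])
```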
Graph Autoencoders with Deconvolutional Networks
Recent studies have indicated that Graph Convolutional Networks (GCNs) act as
a low-pass filter in the spectral domain and encode smoothed node
representations. In this paper, we consider their opposite, namely Graph
Deconvolutional Networks (GDNs) that reconstruct graph signals from smoothed
node representations. We motivate the design of Graph Deconvolutional Networks
via a combination of inverse filters in the spectral domain and de-noising
layers in the wavelet domain, as the inverse operation results in a high-pass
filter and may amplify the noise. Based on the proposed GDN, we further propose
a graph autoencoder framework that first encodes smoothed graph representations
with GCN and then decodes accurate graph signals with GDN. We demonstrate the
effectiveness of the proposed method on several tasks, including unsupervised
graph-level representation learning, social recommendation, and graph generation.
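A small spectral sketch may help: treat the GCN encoder as a low-pass filter h(lambda) = 1 - lambda/2 on the normalized graph Laplacian, and the GDN decoder as its regularized inverse, which is a high-pass filter. The filter forms and the regularization constant are assumptions chosen for illustration; the wavelet-domain de-noising layers are omitted.

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(1)
L = np.eye(4) - A / np.sqrt(np.outer(d, d))    # normalized Laplacian
lam, U = np.linalg.eigh(L)

x = np.random.rand(4)                          # node signal
smooth = U @ np.diag(1 - lam / 2) @ U.T @ x    # GCN-like low-pass encode
recon = U @ np.diag(1 / (1 - lam / 2 + 1e-3)) @ U.T @ smooth  # inverse decode
print(np.abs(x - recon).max())                 # small error, up to regularization
```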
Improving Adversarial Robustness via Guided Complement Entropy
Adversarial robustness has emerged as an important topic in deep learning as
carefully crafted attack samples can significantly disturb the performance of a
model. Many recent methods have proposed to improve adversarial robustness by
utilizing adversarial training or model distillation, which adds additional
procedures to model training. In this paper, we propose a new training paradigm
called Guided Complement Entropy (GCE) that is capable of achieving
"adversarial defense for free," which involves no additional procedures in the
process of improving adversarial robustness. In addition to maximizing model
probabilities on the ground-truth class like cross-entropy, we neutralize its
probabilities on the incorrect classes along with a "guided" term to balance
between these two terms. We show in the experiments that our method achieves
better model robustness with even better performance compared to the commonly
used cross-entropy training objective. We also show that our method can be used
orthogonally to adversarial training across well-known methods, with noticeable
robustness gains. To the best of our knowledge, our approach is the first one
that improves model robustness without compromising performance.
Comment: ICCV'19 camera-ready.
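A hedged sketch of a GCE-style objective follows: it maximizes the entropy of the predicted distribution over the incorrect classes while weighting each sample by the ground-truth probability raised to a guiding exponent alpha. The sign convention and the alpha value are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def guided_complement_entropy(logits, target, alpha=0.2, eps=1e-12):
    probs = F.softmax(logits, dim=1)
    p_g = probs.gather(1, target.unsqueeze(1)).squeeze(1)   # ground-truth prob
    comp = probs.scatter(1, target.unsqueeze(1), 0.0)       # zero out true class
    comp = comp / (1.0 - p_g + eps).unsqueeze(1)            # renormalize rest
    entropy = -(comp * (comp + eps).log()).sum(dim=1)       # complement entropy
    return -(p_g ** alpha * entropy).mean()                 # minimize => flatten

logits = torch.randn(8, 10, requires_grad=True)
target = torch.randint(0, 10, (8,))
loss = guided_complement_entropy(logits, target)
loss.backward()
```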
HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable Hyper Projections
Achieving state-of-the-art performance on natural language understanding
tasks typically relies on fine-tuning a fresh model for every task.
Consequently, this approach leads to a higher overall parameter cost, along
with higher technical maintenance for serving multiple models. Learning a
single multi-task model that is able to do well for all the tasks has been a
challenging and yet attractive proposition. In this paper, we propose
HyperGrid, a new approach for highly effective multi-task learning.
The proposed approach is based on a decomposable hypernetwork that learns
grid-wise projections that help to specialize regions in weight matrices for
different tasks. In order to construct the proposed hypernetwork, our method
learns the interactions and composition between a global (task-agnostic) state
and a local task-specific state. We apply our proposed HyperGrid to
the current state-of-the-art T5 model, demonstrating strong performance across
the GLUE and SuperGLUE benchmarks when using only a single multi-task model.
Our method helps bridge the gap between fine-tuning and multi-task learning
approaches.
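One way to picture grid-wise decomposable projections: a small hypernetwork maps a task embedding to row and column gates whose outer product forms a grid that rescales regions of a shared weight matrix. The sketch below follows that reading; the dimensions, sigmoid gating, and single-layer setup are assumptions rather than the paper's exact construction.

```python
import torch
import torch.nn as nn

D_IN, D_OUT, D_TASK = 64, 64, 8
W = nn.Parameter(torch.randn(D_OUT, D_IN) * 0.02)  # shared weight matrix
row_hyper = nn.Linear(D_TASK, D_OUT)               # gates for output dims
col_hyper = nn.Linear(D_TASK, D_IN)                # gates for input dims

def task_projection(x, task_emb):
    r = torch.sigmoid(row_hyper(task_emb))          # (D_OUT,)
    c = torch.sigmoid(col_hyper(task_emb))          # (D_IN,)
    grid = torch.outer(r, c)                        # grid of per-region gates
    return x @ (W * grid).T                         # task-specialized projection

x = torch.randn(4, D_IN)
task_emb = torch.randn(D_TASK)
print(task_projection(x, task_emb).shape)           # torch.Size([4, 64])
```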
Sparse Sinkhorn Attention
We propose Sparse Sinkhorn Attention, a new efficient and sparse method for
learning to attend. Our method is based on differentiable sorting of internal
representations. Concretely, we introduce a meta sorting network that learns to
generate latent permutations over sequences. Given sorted sequences, we are
then able to compute quasi-global attention with only local windows, improving
the memory efficiency of the attention module. To this end, we propose new
algorithmic innovations such as Causal Sinkhorn Balancing and SortCut, a
dynamic sequence truncation method for tailoring Sinkhorn Attention for
encoding and/or decoding purposes. Via extensive experiments on algorithmic
seq2seq sorting, language modeling, pixel-wise image generation, document
classification and natural language inference, we demonstrate that our memory
efficient Sinkhorn Attention method is competitive with vanilla attention and
consistently outperforms recently proposed efficient Transformer models such as
Sparse Transformers.
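The core Sinkhorn step can be sketched in a few lines: iteratively normalize a block-score matrix over rows and columns in log space until it is approximately doubly stochastic, giving a relaxed permutation for sorting blocks before local windowed attention. The iteration count and the scoring input are assumptions; Causal Sinkhorn Balancing and SortCut are omitted.

```python
import torch

def sinkhorn(log_scores, n_iters=8):
    """Alternate row/column normalization in log space (Sinkhorn iterations)."""
    for _ in range(n_iters):
        log_scores = log_scores - torch.logsumexp(log_scores, dim=-1, keepdim=True)
        log_scores = log_scores - torch.logsumexp(log_scores, dim=-2, keepdim=True)
    return log_scores.exp()  # approximately doubly stochastic

blocks = 6
scores = torch.randn(blocks, blocks)  # stand-in for the meta sorting network output
P = sinkhorn(scores)
print(P.sum(0), P.sum(1))             # rows and columns each sum to ~1
# P @ block_representations would softly reorder blocks before local attention.
```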
Complement Objective Training
Learning with a primary objective, such as softmax cross entropy for
classification and sequence generation, has been the norm for training deep
neural networks for years. Although being a widely-adopted approach, using
cross entropy as the primary objective exploits mostly the information from the
ground-truth class for maximizing data likelihood, and largely ignores
information from the complement (incorrect) classes. We argue that, in addition
to the primary objective, training also using a complement objective that
leverages information from the complement classes can be effective in improving
model performance. This motivates us to study a new training paradigm that
maximizes the likelihood of the ground-truth class while neutralizing the
probabilities of the complement classes. We conduct extensive experiments on
multiple tasks ranging from computer vision to natural language understanding.
The experimental results confirm that, compared to the conventional training
with just one primary objective, training also with the complement objective
further improves the performance of the state-of-the-art models across all
tasks. In addition to the accuracy improvement, we also show that models
trained with both primary and complement objectives are more robust to
single-step adversarial attacks.
Comment: ICLR'19 camera-ready.
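Reading the abstract as an alternating schedule, the toy loop below takes one optimizer step on the primary cross-entropy objective and one step that maximizes the entropy of the complement classes (the GCE sketch above with no guiding factor); the per-batch alternation is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(20, 5)                       # toy classifier
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(3):                             # toy training batches
    x, y = torch.randn(16, 20), torch.randint(0, 5, (16,))
    opt.zero_grad()
    F.cross_entropy(model(x), y).backward()    # primary objective step
    opt.step()

    opt.zero_grad()
    probs = F.softmax(model(x), dim=1)
    p_g = probs.gather(1, y.unsqueeze(1))
    comp = probs.scatter(1, y.unsqueeze(1), 0.0) / (1 - p_g + 1e-12)
    comp_entropy = -(comp * (comp + 1e-12).log()).sum(1).mean()
    (-comp_entropy).backward()                 # complement objective step
    opt.step()
```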
Adversarial Robustness Across Representation Spaces
Adversarial robustness corresponds to the susceptibility of deep neural
networks to imperceptible perturbations made at test time. In the context of
image tasks, many algorithms have been proposed to make neural networks robust
to adversarial perturbations made to the input pixels. These perturbations are
typically measured in an $\ell_p$ norm. However, robustness often holds only
for the specific attack used for training. In this work we extend the above
setting to consider the problem of training of deep neural networks that can be
made simultaneously robust to perturbations applied in multiple natural
representation spaces. For the case of image data, examples include the
standard pixel representation as well as the representation in the discrete
cosine transform~(DCT) basis. We design a theoretically sound algorithm with
formal guarantees for the above problem. Furthermore, our guarantees also hold
when the goal is to require robustness with respect to multiple $\ell_p$-norm
based attacks. We then derive an efficient practical implementation and
demonstrate the effectiveness of our approach on standard datasets for image
classification.
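To illustrate what a perturbation in a non-pixel representation space looks like, the sketch below perturbs an image's DCT coefficients under an l-infinity budget and maps back to pixels; a defense in the paper's spirit would have to be robust to such attacks in several bases simultaneously. The random sign direction stands in for a gradient-based attack, and the budget is arbitrary.

```python
import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(0)
x = rng.random((32, 32))                   # toy grayscale image in [0, 1]

eps = 0.05                                 # l_inf budget in the DCT basis
coeffs = dctn(x, norm="ortho")
delta = eps * np.sign(rng.standard_normal(coeffs.shape))  # stand-in attack direction
x_adv = np.clip(idctn(coeffs + delta, norm="ortho"), 0.0, 1.0)

print(np.abs(coeffs - dctn(x_adv, norm="ortho")).max())   # perturbation in DCT space
print(np.abs(x - x_adv).max())                            # induced pixel-space change
```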