Utilizing Class Information for Deep Network Representation Shaping
Statistical characteristics of deep network representations, such as sparsity
and correlation, are known to be relevant to the performance and
interpretability of deep learning. When a statistical characteristic is
desired, often an adequate regularizer can be designed and applied during the
training phase. Typically, such a regularizer aims to manipulate a statistical
characteristic over all classes together. For classification tasks, however, it
might be advantageous to enforce the desired characteristic per class such that
different classes can be better distinguished. Motivated by this idea, we design
two class-wise regularizers that explicitly utilize class information: the
class-wise Covariance Regularizer (cw-CR) and the class-wise Variance
Regularizer (cw-VR). cw-CR aims to reduce the covariance of representations
computed from samples of the same class, encouraging feature independence.
cw-VR is similar, but targets variance instead of covariance to improve feature
compactness. For completeness, their counterparts that do not use
class information, Covariance Regularizer (CR) and Variance Regularizer (VR),
are considered together. The four regularizers are conceptually simple and
computationally very efficient, and the visualization shows that the
regularizers indeed perform distinct representation shaping. In terms of
classification performance, significant improvements over the baseline and
L1/L2 weight regularization methods were found for 21 out of 22 tasks over
popular benchmark datasets. In particular, cw-VR achieved the best performance
for 13 tasks including ResNet-32/110.
Comment: Published in AAAI 201
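As a rough illustration of the class-wise idea, the NumPy sketch below (all function names hypothetical, not the paper's implementation) computes cw-VR- and cw-CR-style penalties by grouping representations by label: per-class variance for compactness, and per-class off-diagonal covariance for independence.

```python
import numpy as np

def cw_variance_penalty(reps, labels):
    """cw-VR-style penalty (sketch): per-class variance of each representation
    dimension, averaged over features and classes (feature compactness)."""
    total, n_classes = 0.0, 0
    for c in np.unique(labels):
        class_reps = reps[labels == c]          # samples of class c only
        total += class_reps.var(axis=0).mean()  # variance per feature, averaged
        n_classes += 1
    return total / n_classes

def cw_covariance_penalty(reps, labels):
    """cw-CR-style penalty (sketch): mean absolute off-diagonal covariance
    within each class (feature independence)."""
    total, n_classes = 0.0, 0
    for c in np.unique(labels):
        class_reps = reps[labels == c]
        cov = np.cov(class_reps, rowvar=False)  # feature-by-feature covariance
        off_diag = cov - np.diag(np.diag(cov))
        total += np.abs(off_diag).mean()
        n_classes += 1
    return total / n_classes

# Two tight clusters: the class-wise variance is small even though
# the variance over all classes together is large.
reps = np.array([[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [5.1, 5.1]])
labels = np.array([0, 0, 1, 1])
print(cw_variance_penalty(reps, labels) < reps.var(axis=0).mean())  # True
```

In training, such a penalty would be added to the task loss with a weighting coefficient, as is standard for regularizers.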
DDP-GCN: Multi-Graph Convolutional Network for Spatiotemporal Traffic Forecasting
Traffic speed forecasting is one of the core problems in Intelligent
Transportation Systems. For more accurate predictions, recent studies have started
using not only the temporal speed patterns but also the spatial information on
the road network through the graph convolutional networks. Even though the road
network is highly complex due to its non-Euclidean and directional
characteristics, previous approaches mainly focus on modeling the spatial
dependencies with distance alone. In this paper, we identify two essential
spatial dependencies in traffic forecasting in addition to distance, namely
direction and positional relationship, which we use to design basic graph
elements as the smallest building blocks. Using these building blocks, we
propose DDP-GCN (Distance,
Direction, and Positional relationship Graph Convolutional Network) to
incorporate the three spatial relationships into the prediction network for traffic
forecasting. We evaluate the proposed model with two large-scale real-world
datasets, and find a 7.40% average improvement for 1-hour forecasting in highly
complex urban networks.
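The core multi-graph idea can be sketched in a few lines: propagate node features over each relationship graph (distance, direction, positional relationship) separately and combine the results. The sketch below (hypothetical names, NumPy only) shows one such convolution step and omits DDP-GCN's actual normalization, nonlinearities, and temporal components.

```python
import numpy as np

def multi_graph_conv(x, adjs, weights):
    """One multi-graph convolution step (sketch).
    x:       node features, shape (n_nodes, in_dim)
    adjs:    list of (n_nodes, n_nodes) adjacency matrices, one per
             spatial relationship (e.g. distance, direction, position)
    weights: one (in_dim, out_dim) projection matrix per graph
    Aggregates over each graph separately, then sums the results."""
    out = np.zeros((x.shape[0], weights[0].shape[1]))
    for adj, w in zip(adjs, weights):
        out += adj @ x @ w  # propagate along this graph, then project
    return out

# Toy check: with identity graphs and identity projections,
# two relationship graphs simply double the input features.
x = np.eye(3)
out = multi_graph_conv(x, [np.eye(3), np.eye(3)], [np.eye(3), np.eye(3)])
print(out.shape)  # (3, 3)
```

In practice each adjacency matrix would be row-normalized and the step stacked with activations, but the summation over relationship-specific graphs is the essential structural choice.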
A Differentiable Framework for End-to-End Learning of Hybrid Structured Compression
Filter pruning and low-rank decomposition are two of the foundational
techniques for structured compression. Although recent efforts have explored
hybrid approaches aiming to integrate the advantages of both techniques, their
performance gains have been modest at best. In this study, we develop a
\textit{Differentiable Framework~(DF)} that can express filter selection, rank
selection, and budget constraints in a single analytical formulation. Within
the framework, we introduce DML-S for filter selection, integrating scheduling
into existing mask learning techniques. Additionally, we present DTL-S for rank
selection, utilizing a singular value thresholding operator. The framework with
DML-S and DTL-S offers a hybrid structured compression methodology that
facilitates end-to-end learning through gradient-based optimization.
Experimental results demonstrate the efficacy of DF, surpassing
state-of-the-art structured compression methods. Our work establishes a robust
and versatile avenue for advancing structured compression techniques.
Comment: 11 pages, 5 figures, 6 tables
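The singular value thresholding operator mentioned for rank selection can be sketched directly: soft-threshold the singular values of a weight matrix, so that small singular values are driven to zero and the effective rank shrinks. The NumPy sketch below (hypothetical function name; DTL-S itself learns the threshold during training) shows the operator in isolation.

```python
import numpy as np

def singular_value_threshold(w, tau):
    """Singular value thresholding (sketch): subtract tau from each
    singular value and clip at zero. Singular values that reach zero
    no longer contribute, reducing the matrix's effective rank."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)          # soft threshold
    w_low_rank = u @ np.diag(s_shrunk) @ vt      # reconstruct
    return w_low_rank, int((s_shrunk > 0).sum())

# A matrix with one dominant singular value (10.0) and one small
# one (0.5): thresholding at tau=1.0 keeps only the dominant one.
w = np.diag([10.0, 0.5])
w_lr, rank = singular_value_threshold(w, tau=1.0)
print(rank)  # 1
```

Because soft thresholding is (sub)differentiable in the singular values, an operator of this form fits naturally into end-to-end gradient-based training, which is the property the framework exploits.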
Basic Enhancement Strategies When Using Bayesian Optimization for Hyperparameter Tuning of Deep Neural Networks
Compared to traditional machine learning models, deep neural networks (DNNs) are known to be highly sensitive to the choice of hyperparameters. While the required time and effort for manual tuning have been rapidly decreasing for well-developed and commonly used DNN architectures, DNN hyperparameter optimization will undoubtedly continue to be a major burden whenever a new DNN architecture needs to be designed, a new task needs to be solved, a new dataset needs to be addressed, or an existing DNN needs to be improved further. For hyperparameter optimization of general machine learning problems, numerous automated solutions have been developed, with some of the most popular based on Bayesian Optimization (BO). In this work, we analyze four fundamental strategies for enhancing BO when it is used for DNN hyperparameter optimization. Specifically, diversification, early termination, parallelization, and cost function transformation are investigated. Based on the analysis, we provide a simple yet robust algorithm for DNN hyperparameter optimization: DEEP-BO (Diversified, Early-termination-Enabled, and Parallel Bayesian Optimization). When evaluated over six DNN benchmarks, DEEP-BO mostly outperformed well-known solutions including GP-Hedge, BOHB, and the speed-up variants that use the Median Stopping Rule or Learning Curve Extrapolation. In fact, DEEP-BO consistently provided the top, or at least close to the top, performance over all the benchmark types that we tested. This indicates that DEEP-BO is a robust solution compared to the existing ones. The DEEP-BO code is publicly available at https://github.com/snu-adsl/DEEP-BO
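Of the four strategies, early termination is the easiest to illustrate. The sketch below (hypothetical names, NumPy only) is in the spirit of the Median Stopping Rule cited above as a speed-up variant, not DEEP-BO's exact criterion: a trial is stopped when its best loss so far is worse than the median of the other trials' running averages at the same step.

```python
import numpy as np

def should_terminate_early(curve, peer_curves, step):
    """Median-stopping-style early termination check (sketch).
    curve:       this trial's per-step losses (lower is better)
    peer_curves: loss curves of the other trials
    Stop when the trial's best loss so far is worse than the median
    of the peers' running-average losses at the same step."""
    best_so_far = min(curve[:step + 1])
    peer_avgs = [np.mean(p[:step + 1]) for p in peer_curves]
    return bool(best_so_far > np.median(peer_avgs))

# A clearly lagging trial is flagged; a competitive one is not.
peers = [[1.0, 0.8, 0.6], [0.9, 0.7, 0.5]]
print(should_terminate_early([2.0, 1.9, 1.8], peers, step=2))  # True
print(should_terminate_early([0.9, 0.6, 0.4], peers, step=2))  # False
```

Checks like this free up the optimization budget for more promising hyperparameter configurations, which is the motivation behind the early-termination strategy.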
On-Off Pattern Encoding and Path-Count Encoding as Deep Neural Network Representations
Understanding the encoded representation of Deep Neural Networks (DNNs) has
been a fundamental yet challenging objective. In this work, we focus on two
possible directions for analyzing representations of DNNs by studying simple
image classification tasks. Specifically, we consider \textit{On-Off pattern}
and \textit{PathCount} for investigating how information is stored in deep
representations. The On-Off pattern of a neuron is `on' or `off'
depending on whether the neuron's activation after ReLU is non-zero or zero.
PathCount is the number of paths that transmit non-zero energy from the input
to a neuron. We investigate how neurons in the network encode information by
replacing each layer's activation with the On-Off pattern or PathCount and
evaluating the effect on classification performance. We also examine the
correlation between representations and PathCount. Finally, we show a possible
way to improve an existing DNN interpretation method, Class Activation Map
(CAM), by directly utilizing On-Off patterns or PathCount.
Comment: 8 pages, 4 figures
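The two quantities can be sketched concretely. The NumPy code below (hypothetical names) computes the On-Off pattern exactly as defined above, and one plausible reading of PathCount: counting input-to-neuron paths that pass only through non-zero weights and `on' neurons, layer by layer. The paper's precise path definition may differ.

```python
import numpy as np

def on_off_pattern(pre_acts):
    """On-Off pattern (sketch): 1 where the post-ReLU activation is
    non-zero, 0 where it is zero."""
    return (np.maximum(pre_acts, 0.0) > 0).astype(int)

def path_counts(weight_masks, on_off_per_layer, n_inputs):
    """PathCount (sketch): number of paths from the input to each neuron
    through non-zero weights and 'on' neurons.
    weight_masks[l][i, j] = 1 iff the weight from neuron j in layer l
    to neuron i in layer l+1 is non-zero."""
    counts = np.ones(n_inputs)                  # one trivial path per input
    for mask, on_off in zip(weight_masks, on_off_per_layer):
        counts = on_off * (mask @ counts)       # 'off' neurons kill all paths
    return counts

print(on_off_pattern(np.array([-1.2, 0.0, 0.7, 3.0])))  # [0 0 1 1]
```

For example, with two inputs fully connected to two hidden neurons of which only one is `on', that neuron accumulates two paths while the `off' neuron accumulates zero; an output neuron connected to both then inherits two paths.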
Evaluating Feature Attribution Methods for Electrocardiogram
The performance of cardiac arrhythmia detection with electrocardiograms (ECGs)
has been considerably improved since the introduction of deep learning models.
In practice, high performance alone is not sufficient, and a proper
explanation is also required. Recently, researchers have started adopting
feature attribution methods to address this requirement, but it has been
unclear which of the methods are appropriate for ECG. In this work, we identify
and customize three evaluation metrics for feature attribution methods based on
the characteristics of ECG: localization score, pointing game, and degradation
score. Using the three evaluation metrics, we evaluate and analyze eleven
widely-used feature attribution methods. We find that some of the feature
attribution methods are considerably more suitable for explaining ECG, with
Grad-CAM outperforming the second-best method by a large margin.
Comment: 5 pages, 3 figures. Code is available at
https://github.com/SNU-DRL/Attribution-EC
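Of the three metrics, the pointing game is the simplest to illustrate. The sketch below (hypothetical names, not the repository's implementation) shows the standard form of the metric adapted to a 1-D signal: an attribution map scores a hit when its maximally attributed sample falls inside the ground-truth region, e.g. an annotated arrhythmic beat in an ECG.

```python
import numpy as np

def pointing_game_hit(attribution, truth_mask):
    """Pointing game (sketch): hit iff the sample with the highest
    attribution lies inside the ground-truth region.
    attribution: per-sample attribution scores for one ECG signal
    truth_mask:  1 inside the annotated region, 0 elsewhere"""
    return bool(truth_mask[np.argmax(attribution)])

attr = np.array([0.1, 0.2, 0.9, 0.3])   # peak at index 2
mask = np.array([0, 0, 1, 1])           # annotated region: indices 2-3
print(pointing_game_hit(attr, mask))    # True
```

Averaging hits over a dataset yields the pointing-game accuracy used to compare attribution methods.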