Techniques for Interpretable Machine Learning
Interpretable machine learning tackles the important problem that humans
cannot understand the behaviors of complex machine learning models and how
these models arrive at a particular decision. Although many approaches have
been proposed, a comprehensive understanding of the achievements and challenges
is still lacking. We provide a survey covering existing techniques to increase
the interpretability of machine learning models. We also discuss crucial issues
that the community should consider in future work such as designing
user-friendly explanations and developing comprehensive evaluation metrics to
further push forward the area of interpretable machine learning.
Comment: Accepted by Communications of the ACM (CACM), Review Article
Dynamic Sparse Graph for Efficient Deep Learning
We propose to execute deep neural networks (DNNs) with dynamic and sparse
graph (DSG) structure to compress memory and accelerate execution during
both training and inference. The great success of DNNs motivates the pursuit
of lightweight models for deployment on embedded devices. However, most
previous studies optimize for inference while neglecting training or even
complicating it. Training is far more intractable, since (i) during training,
neurons rather than weights dominate the memory cost, unlike in inference;
(ii) dynamic activations invalidate previous sparse acceleration based on
one-off optimization of fixed weights; (iii) batch normalization (BN) is
critical for maintaining accuracy, while its activation reorganization damages
sparsity. To address these issues, DSG activates only a small fraction of
neurons with high
selectivity at each iteration via a dimension-reduction search (DRS) and
obtains the BN compatibility via a double-mask selection (DMS). Experiments
show significant memory saving (1.7-4.5x) and operation reduction (2.3-4.4x)
with little accuracy loss on various benchmarks.
Comment: ICLR 2019
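The core selection step can be sketched as hard top-k neuron masking, with the mask re-applied after batch normalization so sparsity survives BN's reorganization. The magnitude-based scoring below is an assumption standing in for the paper's cheaper dimension-reduction search (DRS), and the dense-then-remask step is a simplified stand-in for double-mask selection (DMS):

```python
import numpy as np

def topk_neuron_mask(pre_activations, keep_ratio=0.25):
    """Keep only the top-k neurons per sample by pre-activation
    magnitude and zero the rest -- a simplified stand-in for DRS,
    which scores neurons in a reduced space to keep the search cheap."""
    k = max(1, int(keep_ratio * pre_activations.shape[1]))
    # Indices of the k largest-magnitude neurons in each row.
    idx = np.argpartition(np.abs(pre_activations), -k, axis=1)[:, -k:]
    mask = np.zeros(pre_activations.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=1)
    return pre_activations * mask, mask

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64))
sparse_x, mask = topk_neuron_mask(x)
# DMS, simplified: batch-normalize densely, then re-apply the same
# mask so the selected sparsity pattern survives BN.
bn = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-5)
sparse_bn = bn * mask
```

With `keep_ratio=0.25`, only 16 of 64 neurons per sample remain active before and after the BN step.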
Inductive Guided Filter: Real-time Deep Image Matting with Weakly Annotated Masks on Mobile Devices
Recently, significant progress has been achieved in deep image matting. Most
of the classical image matting methods are time-consuming and require an ideal
trimap, which is difficult to obtain in practice. A highly efficient image
matting method based on a weakly annotated mask is in demand for mobile applications.
In this paper, we propose a novel method based on Deep Learning and Guided
Filter, called Inductive Guided Filter, which can tackle the real-time general
image matting task on mobile devices. We design a lightweight hourglass network
to parameterize the original Guided Filter method that takes an image and a
weakly annotated mask as input. Further, the use of Gabor loss is proposed for
training networks for complicated textures in image matting. Moreover, we
create an image matting dataset MAT-2793 with a variety of foreground objects.
Experimental results demonstrate that our proposed method massively reduces
running time while maintaining robust accuracy.
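The Gabor-loss idea can be sketched as comparing filter-bank responses of the predicted and ground-truth mattes, which emphasizes fine texture. The kernel parameters, the number of orientations, and the mean-squared form below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gabor_kernel(ksize=9, sigma=2.0, theta=0.0, lam=4.0):
    """Real part of a Gabor filter; parameter values are illustrative."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)
    return (np.exp(-(xr**2 + (-x * np.sin(theta) + y * np.cos(theta))**2)
                   / (2 * sigma**2))
            * np.cos(2 * np.pi * xr / lam))

def gabor_loss(pred, target, n_orientations=4):
    """Mean squared difference between Gabor responses of predicted and
    ground-truth mattes, averaged over orientations -- a sketch of a
    texture-sensitive matting loss."""
    loss = 0.0
    for i in range(n_orientations):
        k = gabor_kernel(theta=i * np.pi / n_orientations)
        # Valid-mode correlation via sliding windows.
        rp = np.einsum('ijkl,kl->ij', sliding_window_view(pred, k.shape), k)
        rt = np.einsum('ijkl,kl->ij', sliding_window_view(target, k.shape), k)
        loss += np.mean((rp - rt) ** 2)
    return loss / n_orientations
```

Identical inputs give zero loss; texture mismatches raise it.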
PatchGuard: A Provably Robust Defense against Adversarial Patches via Small Receptive Fields and Masking
Localized adversarial patches aim to induce misclassification in machine
learning models by arbitrarily modifying pixels within a restricted region of
an image. Such attacks can be realized in the physical world by attaching the
adversarial patch to the object to be misclassified, and defending against such
attacks remains an open problem. In this paper, we propose a general
defense framework called PatchGuard that can achieve high provable robustness
while maintaining high clean accuracy against localized adversarial patches.
The cornerstone of PatchGuard involves the use of CNNs with small receptive
fields to impose a bound on the number of features corrupted by an adversarial
patch. Given a bounded number of corrupted features, the problem of designing
an adversarial patch defense reduces to that of designing a secure feature
aggregation mechanism. Towards this end, we present our robust masking defense
that robustly detects and masks corrupted features to recover the correct
prediction. Notably, we can prove the robustness of our defense against any
adversary within our threat model. Our extensive evaluation on ImageNet,
ImageNette (a 10-class subset of ImageNet), and CIFAR-10 datasets demonstrates
that our defense achieves state-of-the-art performance in terms of both
provable robust accuracy and clean accuracy.
Comment: USENIX Security Symposium 2021; extended technical report
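A minimal sketch of the secure-aggregation idea: because small receptive fields bound the features a patch can touch, a suspiciously dominant window of class evidence can be masked out before aggregation. The non-negative evidence map, window size, and dominance factor `tau` below are assumptions of this sketch; PatchGuard's actual detection rule is more careful:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def robust_masking(evidence, window=2, tau=2.0):
    """`evidence` is a non-negative (H, W, C) map of per-location class
    evidence from a small-receptive-field CNN, so a localized patch can
    only corrupt one `window` x `window` region of it. Per class, if a
    single window contributes far more than a typical window (factor
    `tau`), mask it out before summing."""
    total = evidence.sum(axis=(0, 1))                       # (C,)
    wins = sliding_window_view(evidence, (window, window), axis=(0, 1))
    win_sums = wins.sum(axis=(-2, -1))                      # (H-w+1, W-w+1, C)
    masked = total.copy()
    for c in range(evidence.shape[-1]):
        peak = win_sums[..., c].max()
        if peak > tau * win_sums[..., c].mean():
            masked[c] = total[c] - peak                     # drop suspicious window
    return masked
```

On a uniform evidence map with a high-evidence 2x2 "patch" injected for one class, only that class's dominant window is removed.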
Every Pixel Counts ++: Joint Learning of Geometry and Motion with 3D Holistic Understanding
Learning to estimate 3D geometry in a single frame and optical flow from
consecutive frames by watching unlabeled videos via deep convolutional network
has made significant progress recently. Current state-of-the-art (SoTA) methods
treat the two tasks independently. One typical assumption of existing depth
estimation methods is that scenes contain no independently moving objects,
while object motion can be easily modeled using optical flow. In this paper,
we propose to address the two tasks as a whole, i.e. to jointly understand
per-pixel 3D geometry and motion. This eliminates the need of static scene
assumption and enforces the inherent geometrical consistency during the
learning process, yielding significantly improved results for both tasks. We
call our method "Every Pixel Counts++" or "EPC++". Specifically, during
training, given two consecutive frames from a video, we adopt three parallel
networks to predict the camera motion (MotionNet), dense depth map (DepthNet),
and per-pixel optical flow between two frames (OptFlowNet) respectively. The
three types of information are fed into a holistic 3D motion parser (HMP), and
per-pixel 3D motion of both the rigid background and moving objects is
disentangled and recovered. Comprehensive experiments were conducted on
datasets with different scenes, including driving scenario (KITTI 2012 and
KITTI 2015 datasets), mixed outdoor/indoor scenes (Make3D) and synthetic
animation (MPI Sintel dataset). Performance on the five tasks of depth
estimation, optical flow estimation, odometry, moving object segmentation and
scene flow estimation shows that our approach outperforms other SoTA methods.
Code will be available at: https://github.com/chenxuluo/EPC.
Comment: Chenxu Luo, Zhenheng Yang, and Peng Wang contributed equally; TPAMI submission
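The rigid-background part of such a holistic parser rests on standard projective geometry: back-project each pixel with its depth, apply the camera motion, and re-project; the residual against predicted optical flow then isolates independently moving objects. A minimal sketch (the intrinsics `K` and pose `(R, t)` in the test are illustrative values):

```python
import numpy as np

def rigid_flow(depth, K, R, t):
    """Optical flow induced by camera motion alone, computed from a
    per-pixel depth map, camera intrinsics K, and camera motion (R, t)."""
    H, W = depth.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    pix = np.stack([xs, ys, np.ones_like(xs)]).reshape(3, -1)
    cam = np.linalg.inv(K) @ pix * depth.reshape(1, -1)   # back-project to 3-D
    cam2 = R @ cam + t.reshape(3, 1)                      # apply camera motion
    pix2 = K @ cam2                                       # re-project
    pix2 = pix2[:2] / pix2[2:3]
    return (pix2 - pix[:2]).reshape(2, H, W)
```

For a pure lateral translation over constant depth, the induced flow is uniform: fx * tx / z pixels horizontally.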
Deep Adversarial Training for Multi-Organ Nuclei Segmentation in Histopathology Images
Nuclei segmentation is a fundamental task that is critical for various
computational pathology applications including nuclei morphology analysis, cell
type classification, and cancer grading. Conventional vision-based methods for
nuclei segmentation struggle in challenging cases and deep learning approaches
have proven to be more robust and generalizable. However, CNNs require large
amounts of labeled histopathology data. Moreover, conventional CNN-based
approaches lack structured prediction capabilities which are required to
distinguish overlapping and clumped nuclei. Here, we present an approach to
nuclei segmentation that overcomes these challenges by utilizing a conditional
generative adversarial network (cGAN) trained with synthetic and real data. We
generate a large dataset of H&E training images with perfect nuclei
segmentation labels using an unpaired GAN framework. This synthetic data along
with real histopathology data from six different organs are used to train a
conditional GAN with spectral normalization and gradient penalty for nuclei
segmentation. This adversarial regression framework enforces higher order
consistency when compared to conventional CNN models. We demonstrate that this
nuclei segmentation approach generalizes across different organs, sites,
patients and disease states, and outperforms conventional approaches,
especially in isolating individual and overlapping nuclei.
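Spectral normalization, one of the two stabilizers the critic uses, can be sketched with power iteration: estimate the weight matrix's largest singular value and divide it out, bounding the layer's Lipschitz constant. The iteration count and the toy matrix size are illustrative:

```python
import numpy as np

def spectral_normalize(W, n_iters=50, seed=0):
    """Estimate the leading singular value of W by power iteration and
    divide it out, so the normalized matrix has spectral norm ~1."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v) + 1e-12
        u = W @ v
        u /= np.linalg.norm(u) + 1e-12
    sigma = u @ W @ v            # estimated leading singular value
    return W / sigma
```

After normalization, the largest singular value of the weight matrix is approximately one.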
Deep Joint Task Learning for Generic Object Extraction
This paper investigates how to extract objects of interest without relying on
hand-crafted features or sliding-window approaches, aiming to jointly solve
two sub-tasks: (i) rapidly localizing salient objects in images, and (ii)
accurately segmenting the objects based on the localizations. We present a
general joint task learning framework, in which each task (either object
localization or object segmentation) is tackled via a multi-layer convolutional
neural network, and the two networks work collaboratively to boost performance.
In particular, we propose to incorporate latent variables bridging the two
networks in a joint optimization manner. The first network directly predicts
the positions and scales of salient objects from raw images, and the latent
variables adjust the object localizations to feed the second network that
produces pixelwise object masks. An EM-type method is presented for the
optimization, iterating between two steps: (i) using the two networks, it
estimates the latent variables via an MCMC-based sampling method; (ii)
it optimizes the parameters of the two networks jointly via backpropagation,
with the latent variables fixed. Extensive experiments suggest that our
framework significantly outperforms other state-of-the-art approaches in both
accuracy and efficiency (e.g., 1000 times faster than competing approaches).
Comment: 9 pages, 4 figures, NIPS 2014
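The alternating scheme can be illustrated on a 1-D toy problem: a random-sampling E-step stands in for the MCMC estimation of the latent adjustment, and a gradient-style M-step stands in for backpropagation. All quantities below (the target, step size, sample count) are illustrative, not from the paper:

```python
import numpy as np

def em_joint_training(target=3.0, steps=20, lr=0.5, seed=0):
    """Toy EM-type loop: theta plays the localization network's output,
    the latent offset z refines it (E-step, sampled), and theta is then
    updated toward the refined position (M-step). `target` plays the
    ideal localization implied by the segmentation objective."""
    rng = np.random.default_rng(seed)
    theta = 0.0
    for _ in range(steps):
        # E-step: sample candidate latent adjustments, keep the one
        # that best explains the downstream objective.
        cand = rng.normal(scale=1.0, size=16)
        z = cand[np.argmin((theta + cand - target) ** 2)]
        # M-step: update parameters with the latent variable fixed.
        theta += lr * ((theta + z) - theta)
    return theta
```

The alternation converges toward the target localization even though neither step solves the problem alone.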
Crossbar-aware neural network pruning
Crossbar-architecture-based devices have been widely adopted in neural
network accelerators, taking advantage of their high efficiency on
vector-matrix multiplication (VMM) operations. However, in the case of
convolutional neural networks (CNNs), the efficiency is compromised
dramatically due to the large amounts of data reuse. Although some mapping
methods have been designed to achieve a balance between the execution
throughput and resource overhead, the resource cost remains huge if the
throughput is to be maintained.
Network pruning is a promising and widely studied approach to shrinking the
model size. However, previous work did not consider the crossbar architecture
and the corresponding mapping method, so it cannot be directly utilized by
crossbar-based neural network accelerators. Tightly combining the crossbar
structure and its mapping, this paper proposes a crossbar-aware pruning
framework based on a formulated L0-norm constrained optimization problem.
Specifically, we design an L0-norm constrained gradient descent (LGD) with
relaxant probabilistic projection (RPP) to solve this problem. Two grains of
sparsity are successfully achieved: i) intuitive crossbar-grain sparsity and
ii) column-grain sparsity with output recombination, based on which we further
propose an input feature map (FM) reordering method to improve model
accuracy. We evaluate our crossbar-aware pruning framework on the medium-scale
CIFAR-10 dataset and the large-scale ImageNet dataset with VGG and ResNet models.
Our method is able to reduce the crossbar overhead by 44%-72% with little
accuracy degradation. This work greatly reduces resource and energy costs,
providing a new co-design solution for mapping CNNs onto various crossbar
devices with significantly higher efficiency.
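The crossbar-grain L0 constraint can be sketched as a hard top-k projection over crossbar-sized column groups; the paper's relaxant probabilistic projection (RPP) softens exactly this step during training. The group scoring by L2 norm and the sizes below are illustrative assumptions:

```python
import numpy as np

def crossbar_prune(W, xbar_cols=4, keep=2):
    """Partition weight columns into crossbar-sized groups, score each
    group by its L2 norm, and keep only the `keep` strongest groups --
    a hard L0 constraint at crossbar granularity."""
    n_groups = W.shape[1] // xbar_cols
    groups = W[:, :n_groups * xbar_cols].reshape(W.shape[0], n_groups, xbar_cols)
    scores = np.linalg.norm(groups, axis=(0, 2))        # one score per group
    mask = np.zeros(n_groups, dtype=bool)
    mask[np.argsort(scores)[-keep:]] = True             # keep top-k groups
    pruned = groups * mask[None, :, None]
    return pruned.reshape(W.shape[0], n_groups * xbar_cols), mask
```

Whole column groups are zeroed, so the corresponding crossbars can be skipped entirely rather than sparsified element-wise.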
MCNE: An End-to-End Framework for Learning Multiple Conditional Network Representations of Social Networks
Recently, Network Representation Learning (NRL) techniques, which
represent graph structure via low-dimensional vectors to support
social-oriented applications, have attracted wide attention. Though large efforts have been
made, they may fail to describe the multiple aspects of similarity between
social users, since each node is represented by only a single vector covering
one unique aspect. To that end, in this paper, we propose a novel
end-to-end framework named MCNE to learn multiple conditional network
representations, so that various preferences for multiple behaviors could be
fully captured. Specifically, we first design a binary mask layer to divide the
single vector into conditional embeddings for multiple behaviors. Then, we
introduce an attention network to model the interactions among multiple
preferences, and further adapt the message sending and receiving operations of
graph neural networks, so that multi-aspect preference information from
high-order neighbors is captured. Finally, we utilize the
Bayesian Personalized Ranking loss function to learn the preference similarity
on each behavior, and jointly learn multiple conditional node embeddings via a
multi-task learning framework. Extensive experiments on public datasets
validate that our MCNE framework could significantly outperform several
state-of-the-art baselines, and further support the visualization and transfer
learning tasks with excellent interpretability and robustness.
Comment: Accepted by KDD 2019 Research Track. In Proceedings of the 25th ACM
SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'19)
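The binary mask layer can be sketched as masks carving one base embedding into per-behavior conditional embeddings, so each behavior attends to its own subset of dimensions. The fixed random hard partition below is an assumption for illustration; in MCNE the masks are learned and need not be disjoint:

```python
import numpy as np

def conditional_embeddings(base, n_behaviors, seed=0):
    """Split a single base embedding into per-behavior conditional
    embeddings via binary masks. Here each dimension is assigned to
    exactly one behavior, so the masks partition the vector."""
    rng = np.random.default_rng(seed)
    d = base.shape[-1]
    owner = rng.integers(0, n_behaviors, size=d)        # dim -> behavior
    masks = np.stack([(owner == b).astype(float) for b in range(n_behaviors)])
    return base[None, :] * masks, masks                 # (n_behaviors, d)
```

Because the masks partition the dimensions, the conditional embeddings sum back to the base vector.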
ResUNet-a: a deep learning framework for semantic segmentation of remotely sensed data
Scene understanding of high resolution aerial images is of great importance
for the task of automated monitoring in various remote sensing applications.
Due to the large within-class and small between-class variance in pixel values
of objects of interest, this remains a challenging task. In recent years, deep
convolutional neural networks have started being used in remote sensing
applications and demonstrate state-of-the-art performance for pixel-level
classification of objects. Here we propose a reliable
framework that delivers strong results for the task of semantic segmentation of
monotemporal very high resolution aerial images. Our framework consists of a
novel deep learning architecture, ResUNet-a, and a novel loss function based on
the Dice loss. ResUNet-a uses a UNet encoder/decoder backbone in combination
with residual connections, atrous convolutions, pyramid scene parsing pooling,
and multi-task inference. ResUNet-a infers sequentially the boundary of the
objects, the distance transform of the segmentation mask, the segmentation mask,
and a colored reconstruction of the input. Each of the tasks is conditioned on
the inference of the previous ones, establishing a conditioned
relationship between the various tasks, as described through the
architecture's computation graph. We analyse the performance of several
flavours of the Generalized Dice loss for semantic segmentation, and we
introduce a novel variant loss function for semantic segmentation of objects
that has excellent convergence properties and behaves well even in the
presence of highly imbalanced classes. The performance of our modeling
framework is evaluated on the ISPRS 2D Potsdam dataset. Results show
state-of-the-art performance with an average F1 score of 92.9% over all
classes for our best model.
Comment: Accepted for publication in the ISPRS Journal of Photogrammetry and
Remote Sensing
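A Dice-family loss of the kind analysed can be sketched as follows; the Tanimoto-style denominator shown is one common variant of the Generalized Dice loss, not necessarily the paper's exact formulation:

```python
import numpy as np

def tanimoto_dice_loss(pred, target, eps=1e-7):
    """Dice-style loss with a Tanimoto (Jaccard-like) denominator:
    1 - mean_c[ sum(p*t) / sum(p^2 + t^2 - p*t) ]. `pred` holds class
    probabilities and `target` one-hot labels, both shaped (N, C);
    the per-class normalization behaves well under class imbalance."""
    num = (pred * target).sum(axis=0)
    den = (pred ** 2 + target ** 2 - pred * target).sum(axis=0)
    return float(1.0 - (num / (den + eps)).mean())
```

A perfect prediction drives the loss to (numerically) zero, while an uninformative uniform prediction is penalized.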