Boosting CNN beyond Label in Inverse Problems
Convolutional neural networks (CNNs) have been extensively used for inverse
problems. However, their prediction error for unseen test data is difficult to
estimate a priori, since the networks are trained using only selected data and
their architectures are largely treated as black boxes. This poses a
fundamental challenge for neural networks in unsupervised learning or
improvement beyond the label. In this paper, we show that recent unsupervised
learning methods such as Noise2Noise, Stein's unbiased risk estimator
(SURE)-based denoisers, and Noise2Void are closely related to each other in
their formulation of an unbiased estimator of the prediction error, but each
is associated with its own limitations. Based on these observations, we
provide a novel boosting estimator for the prediction error. In particular, by
employing a combinatorial convolutional frame representation of the
encoder-decoder CNN and synergistically combining it with batch normalization,
we provide a closed-form formulation for the unbiased estimator of the
prediction error that can be minimized for neural network training beyond the
label. Experimental results show that the resulting algorithm, which we call
Noise2Boosting, provides consistent improvement in various inverse problems
under both supervised and unsupervised learning settings.
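As background for the SURE-based line of work the abstract refers to, the
following minimal numpy sketch (an illustration, not the paper's method) shows
how Stein's unbiased risk estimator lets one estimate a denoiser's
mean-squared error without access to the clean signal, using a Monte Carlo
divergence estimate; the moving-average denoiser is a hypothetical stand-in
for a CNN.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise(y, k=5):
    # Moving-average denoiser: a hypothetical stand-in for a CNN.
    return np.convolve(y, np.ones(k) / k, mode="same")

def mc_sure(y, sigma, denoiser, eps=1e-3):
    """Monte Carlo SURE: unbiased estimate of the per-sample MSE of
    `denoiser` on y = x + N(0, sigma^2) noise, without the clean x."""
    n = y.size
    f_y = denoiser(y)
    b = rng.standard_normal(n)                        # random probe direction
    div = b @ (denoiser(y + eps * b) - f_y) / eps     # divergence estimate
    return np.sum((f_y - y) ** 2) / n - sigma ** 2 + 2 * sigma ** 2 * div / n

sigma = 0.1
x = np.sin(np.linspace(0, 4 * np.pi, 2000))           # clean signal
y = x + sigma * rng.standard_normal(x.size)           # noisy observation
print(mc_sure(y, sigma, denoise))      # close to the oracle MSE below
print(np.mean((denoise(y) - x) ** 2))  # oracle MSE (needs the clean x)
```

Minimizing such a label-free risk estimate over network parameters is the
basic idea that training "beyond the label" builds on.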
Text Classification Algorithms: A Survey
In recent years, there has been an exponential growth in the number of
complex documents and texts that require a deeper understanding of machine
learning methods to be able to accurately classify texts in many applications.
Many machine learning approaches have achieved impressive results in natural
language processing. The success of these learning algorithms relies on their
capacity to understand complex models and non-linear relationships within data.
However, finding suitable structures, architectures, and techniques for text
classification is a challenge for researchers. In this paper, a brief overview
of text classification algorithms is presented. This overview covers different
text feature extraction methods, dimensionality reduction methods, existing
algorithms and techniques, and evaluation methods. Finally, the limitations of
each technique and their applications to real-world problems are discussed.
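The pipeline such surveys cover (feature extraction, then classification, then
evaluation) can be sketched minimally; the corpus, labels, and
nearest-centroid classifier below are illustrative assumptions, not from the
survey: TF-IDF features followed by cosine similarity to class centroids.

```python
import numpy as np
from collections import Counter

# Toy corpus and labels (illustrative only).
docs = ["the cat sat on the mat", "dogs and cats are pets",
        "stock prices rose today", "the market fell on bad news"]
labels = ["animals", "animals", "finance", "finance"]

vocab = sorted({w for d in docs for w in d.split()})
idx = {w: i for i, w in enumerate(vocab)}

def tf_idf(corpus):
    # Term-frequency matrix over the fixed vocabulary.
    tf = np.zeros((len(corpus), len(vocab)))
    for r, d in enumerate(corpus):
        for w, c in Counter(d.split()).items():
            if w in idx:
                tf[r, idx[w]] = c
    # Smoothed inverse document frequency, then L2 normalization.
    idf = np.log((1 + len(corpus)) / (1 + (tf > 0).sum(axis=0))) + 1
    m = tf * idf
    return m / np.linalg.norm(m, axis=1, keepdims=True)

X = tf_idf(docs)
centroids = {c: X[[i for i, l in enumerate(labels) if l == c]].mean(axis=0)
             for c in sorted(set(labels))}

def classify(text):
    # Cosine similarity (dot product of normalized vectors) to centroids.
    # For brevity the query reuses tf_idf over a one-document "corpus".
    v = tf_idf([text])[0]
    return max(centroids, key=lambda c: float(v @ centroids[c]))

print(classify("the cat and the dog"))  # -> animals
```

Real systems swap each stage for the stronger alternatives the survey
discusses (word embeddings, dimensionality reduction, neural classifiers).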
Missing Data Reconstruction in Remote Sensing image with a Unified Spatial-Temporal-Spectral Deep Convolutional Neural Network
Because of the internal malfunction of satellite sensors and poor atmospheric
conditions such as thick cloud, the acquired remote sensing data often suffer
from missing information, i.e., the data usability is greatly reduced. In this
paper, a novel method of missing information reconstruction in remote sensing
images is proposed. The unified spatial-temporal-spectral framework based on a
deep convolutional neural network (STS-CNN) employs a unified deep
convolutional neural network combined with spatial-temporal-spectral
supplementary information. In addition, whereas most existing methods can only
deal with a single missing information reconstruction task, the proposed
approach can solve three typical missing information reconstruction
tasks: 1) dead lines in Aqua MODIS band 6; 2) the Landsat ETM+ Scan Line
Corrector (SLC)-off problem; and 3) thick cloud removal. It should be noted
that the proposed model can use multi-source data (spatial, spectral, and
temporal) as the input of the unified framework. The results of both simulated
and real-data experiments demonstrate that the proposed model exhibits high
effectiveness in the three missing information reconstruction tasks listed
above.
Comment: To be published in IEEE Transactions on Geoscience and Remote Sensing.
RARE: Image Reconstruction using Deep Priors Learned without Ground Truth
Regularization by denoising (RED) is an image reconstruction framework that
uses an image denoiser as a prior. Recent work has shown the state-of-the-art
performance of RED with learned denoisers corresponding to pre-trained
convolutional neural nets (CNNs). In this work, we propose to broaden the
current denoiser-centric view of RED by considering priors corresponding to
networks trained for more general artifact-removal. The key benefit of the
proposed family of algorithms, called regularization by artifact-removal
(RARE), is that it can leverage priors learned on datasets containing only
undersampled measurements. This makes RARE applicable to problems where it is
practically impossible to have fully-sampled groundtruth data for training. We
validate RARE on both simulated and experimentally collected data by
reconstructing a free-breathing whole-body 3D MRI into ten respiratory phases
from heavily undersampled k-space measurements. Our results corroborate the
potential of learning regularizers for iterative inversion directly on
undersampled and noisy measurements.
Comment: In press at the IEEE Journal of Selected Topics in Signal Processing.
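The RED-style iteration that RARE builds on can be sketched as a gradient step
coupling a data-fit term with the residual prior x - D(x); in the toy
inpainting problem below, the smoothing operator is a hypothetical stand-in
for a trained artifact-removal network, and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def artifact_removal(x):
    # Smoothing operator: a hypothetical stand-in for a trained
    # artifact-removal network D(.).
    return np.convolve(x, np.ones(5) / 5, mode="same")

def red_reconstruct(y, mask, tau=0.5, gamma=0.5, iters=200):
    """RED-style iteration: x <- x - gamma * (data-fit gradient
    + tau * (x - D(x))), here for masked 1-D observations."""
    x = y.copy()
    for _ in range(iters):
        grad_fit = mask * (x - y)               # grad of 0.5 * ||M x - y||^2
        grad_reg = tau * (x - artifact_removal(x))
        x -= gamma * (grad_fit + grad_reg)
    return x

truth = np.sin(np.linspace(0, 2 * np.pi, 200))
mask = (rng.random(200) < 0.4).astype(float)    # observe only 40% of samples
y = mask * truth
x_hat = red_reconstruct(y, mask)
print(np.mean((y - truth) ** 2), np.mean((x_hat - truth) ** 2))
```

The fixed point balances consistency with the measured samples against
agreement with the prior, which is what lets a prior trained only on
artifact-corrupted data still regularize the inversion.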
Robust Visual Knowledge Transfer via EDA
We address the problem of visual knowledge adaptation by leveraging labeled
patterns from source domain and a very limited number of labeled instances in
target domain to learn a robust classifier for visual categorization. This
paper proposes a new extreme learning machine based cross-domain network
learning framework, called Extreme Learning Machine (ELM) based Domain
Adaptation (EDA). It allows us to learn a category transformation and an ELM
classifier with random projection by minimizing the l_(2,1)-norm of the network
output weights and the learning error simultaneously. The unlabeled target
data, as useful knowledge, is also integrated as a fidelity term to guarantee
the stability during cross domain learning. It minimizes the matching error
between the learned classifier and a base classifier, such that many existing
classifiers can be readily incorporated as base classifiers. The network output
weights can not only be determined analytically but are also transferable.
Additionally, a manifold regularization with a Laplacian graph is incorporated,
which is beneficial for semi-supervised learning. We also propose a multi-view
extension of the model, referred to as MvEDA. Experiments on benchmark visual
datasets for video event recognition and object recognition demonstrate that
our EDA methods outperform existing cross-domain learning methods.
Comment: This paper has been accepted for publication in IEEE Transactions on
Image Processing.
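The "analytically determined output weights" of an ELM can be illustrated with
a minimal sketch (a plain ridge-regularized ELM on toy data; the paper's
l_(2,1)-regularized, cross-domain formulation is more involved and is not
reproduced here).

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, Y, n_hidden=50, reg=1e-2):
    # Random input weights and biases are never trained.
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)                 # random hidden-layer features
    # Output weights in closed form: (H'H + reg*I)^{-1} H'Y
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ Y)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy two-class problem: inside vs. outside the unit circle.
X = rng.standard_normal((400, 2))
y = (np.linalg.norm(X, axis=1) > 1.0).astype(int)
Y = np.eye(2)[y]                           # one-hot targets
W, b, beta = elm_train(X, Y)
pred = elm_predict(X, W, b, beta).argmax(axis=1)
print((pred == y).mean())                  # training accuracy
```

Because the output weights solve a linear system in closed form, they can be
recomputed cheaply when the training objective is extended, which is the
property EDA exploits for cross-domain transfer.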
Learning to Hash for Indexing Big Data - A Survey
The explosive growth in big data has attracted much attention in designing
efficient indexing and search methods recently. In many critical applications
such as large-scale search and pattern matching, finding the nearest neighbors
to a query is a fundamental research problem. However, the straightforward
solution using exhaustive comparison is infeasible due to the prohibitive
computational complexity and memory requirement. In response, Approximate
Nearest Neighbor (ANN) search based on hashing techniques has become popular
due to its promising performance in both efficiency and accuracy. Prior
randomized hashing methods, e.g., Locality-Sensitive Hashing (LSH), explore
data-independent hash functions with random projections or permutations.
Although they have elegant theoretical guarantees on search quality in certain
metric spaces, the performance of randomized hashing has been shown to be
insufficient in many real-world applications. As a remedy, new approaches incorporating
data-driven learning methods in development of advanced hash functions have
emerged. Such learning to hash methods exploit information such as data
distributions or class labels when optimizing the hash codes or functions.
Importantly, the learned hash codes are able to preserve, in the hash code
space, the proximity of neighboring data in the original feature space. The
goal of this paper is to provide readers with a systematic understanding of
the insights, pros, and cons of the emerging techniques. We provide a comprehensive
survey of the learning to hash framework and representative techniques of
various types, including unsupervised, semi-supervised, and supervised. In
addition, we also summarize recent hashing approaches utilizing the deep
learning models. Finally, we discuss future directions and trends of research
in this area.
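The data-independent random-projection hashing (LSH) contrasted above can be
sketched in a few lines; dimensions and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_hash(X, planes):
    # Data-independent hash: sign of random projections -> binary code.
    return (X @ planes.T > 0).astype(np.uint8)

def hamming(a, b):
    return int((a != b).sum())

d, n_bits = 32, 16
planes = rng.standard_normal((n_bits, d))    # random hyperplanes

base = rng.standard_normal(d)
near = base + 0.05 * rng.standard_normal(d)  # a true near neighbor
far = rng.standard_normal(d)                 # an unrelated point

codes = lsh_hash(np.stack([base, near, far]), planes)
print(hamming(codes[0], codes[1]), hamming(codes[0], codes[2]))
# Near neighbors collide on most bits; unrelated points differ on ~half.
```

Learning-to-hash methods replace the random `planes` with projections
optimized on data distributions or labels, which is the shift this survey
documents.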
Scene Parsing with Integration of Parametric and Non-parametric Models
We adopt Convolutional Neural Networks (CNNs) to be our parametric model to
learn discriminative features and classifiers for local patch classification.
Based on the occurrence frequency distribution of classes, an ensemble of CNNs
(CNN-Ensemble) is learned, in which each CNN component focuses on learning
different and complementary visual patterns. The local beliefs of pixels are
output by CNN-Ensemble. Considering that visually similar pixels are
indistinguishable under local context, we leverage the global scene semantics
to alleviate the local ambiguity. The global scene constraint is mathematically
achieved by adding a global energy term to the labeling energy function, and it
is practically estimated in a non-parametric framework. A large margin based
CNN metric learning method is also proposed for better global belief
estimation. In the end, the integration of local and global beliefs gives rise
to the class likelihood of pixels, based on which maximum marginal inference is
performed to generate the label prediction maps. Even without any
post-processing, we achieve state-of-the-art results on the challenging
SiftFlow and Barcelona benchmarks.
Comment: 13 pages, 6 figures, IEEE Transactions on Image Processing (T-IP), 201
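The fusion of local beliefs with a global scene term can be illustrated with a
toy energy combination (all numbers are hypothetical; in the paper the global
term is estimated in a non-parametric framework rather than given directly).

```python
import numpy as np

# Local beliefs from the CNN ensemble (rows: pixels, cols: classes) and a
# global scene prior over classes; values are hypothetical.
local = np.array([[0.6, 0.3, 0.1],
                  [0.4, 0.4, 0.2],
                  [0.2, 0.3, 0.5]])
scene_prior = np.array([0.5, 0.1, 0.4])

lam = 1.0
# Energies are negative log-beliefs; the global term adds a shared
# per-class penalty to every pixel's labeling energy.
energy = -np.log(local) - lam * np.log(scene_prior)
labels = energy.argmin(axis=1)        # per-pixel maximum-marginal decision
print(labels)                         # -> [0 0 2]
```

Note how the global prior overrides the locally ambiguous second pixel, which
is the role the scene constraint plays in resolving local ambiguity.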
Stochastic Multiple Choice Learning for Training Diverse Deep Ensembles
Many practical perception systems exist within larger processes that include
interactions with users or additional components capable of evaluating the
quality of predicted solutions. In these contexts, it is beneficial to provide
these oracle mechanisms with multiple highly likely hypotheses rather than a
single prediction. In this work, we pose the task of producing multiple outputs
as a learning problem over an ensemble of deep networks -- introducing a novel
stochastic gradient descent based approach to minimize the loss with respect to
an oracle. Our method is simple to implement, agnostic to both architecture and
loss function, and parameter-free. Our approach achieves lower oracle error
compared to existing methods on a wide range of tasks and deep architectures.
We also show qualitatively that the diverse solutions produced often provide
interpretable representations of task ambiguity.
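The oracle loss described above, in which only the ensemble member that best
explains each example receives a gradient, can be sketched on a toy ambiguous
regression task; the one-dimensional linear members and data are illustrative
assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_members = 4
W = rng.standard_normal((n_members, 1))   # ensemble of 1-D linear models

def oracle_sgd_step(x, target, W, lr=0.1):
    """One stochastic multiple-choice step: every member predicts, but only
    the member with the lowest loss on this example (the oracle choice)
    receives a gradient update."""
    preds = (W @ x).ravel()
    best = int(np.argmin((preds - target) ** 2))
    W[best] -= lr * 2 * (preds[best] - target) * x
    return W

# Ambiguous task: the same input legitimately maps to +1 or -1.
x = np.array([1.0])
for _ in range(500):
    W = oracle_sgd_step(x, rng.choice([1.0, -1.0]), W)

preds = np.sort((W @ x).ravel())
print(preds.round(2))   # extreme members settle near the two modes, -1 and +1
```

The winner-take-all gradient makes members specialize on different plausible
outputs, which is what produces the diverse hypothesis sets the paper targets.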
A novel method for extracting interpretable knowledge from a spiking neural classifier with time-varying synaptic weights
This paper presents a novel method for information interpretability in an
MC-SEFRON classifier. To develop a method to extract knowledge stored in a
trained classifier, first, the binary-class SEFRON classifier developed earlier
is extended to handle multi-class problems. MC-SEFRON uses the population
encoding scheme to encode the real-valued input data into spike patterns.
MC-SEFRON is trained using the same supervised learning rule used in the
SEFRON. After training, the proposed method extracts the knowledge for a given
class stored in the classifier by mapping the weighted postsynaptic potential
in the time domain to the feature domain as Feature Strength Functions (FSFs).
A set of FSFs corresponding to each output class represents the extracted
knowledge from the classifier. This knowledge encoding method is derived to
maintain consistency between the classification in the time domain and the
feature domain. The correctness of the FSF is quantitatively measured by using
FSF directly for classification tasks. For a given input, each FSF is sampled
at the input value to obtain the corresponding feature strength value (FSV).
Then the aggregated FSVs obtained for each class are used to determine the
output class labels during classification. FSVs are also used to interpret the
predictions during the classification task. Using ten UCI datasets and the
MNIST dataset, the knowledge extraction method, interpretation and the
reliability of the FSFs are demonstrated. The studies show that, on average,
the difference between the classification accuracies obtained using the FSFs
directly and those obtained by MC-SEFRON is only around 0.9% and 0.1% for the
UCI datasets and the MNIST dataset, respectively. This clearly shows that the
knowledge represented by the FSFs is acceptably reliable, and the
interpretability of classification using the classifier's knowledge is
justified.
Comment: 16 pages, 6 figures
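The FSF-based classification rule described above (sample each feature
strength function at the input value to get an FSV, aggregate FSVs per class,
pick the maximum) can be sketched with hypothetical FSF curves standing in for
those extracted from a trained MC-SEFRON.

```python
import numpy as np

# Hypothetical feature strength functions (FSFs): one curve per
# (class, feature) pair; the Gaussian shapes are illustrative only.
grid = np.linspace(0.0, 1.0, 101)
fsf = {
    "A": [np.exp(-(grid - 0.2) ** 2 / 0.02), np.exp(-(grid - 0.8) ** 2 / 0.02)],
    "B": [np.exp(-(grid - 0.7) ** 2 / 0.02), np.exp(-(grid - 0.3) ** 2 / 0.02)],
}

def classify(x):
    """Sample each FSF at the input value (giving an FSV), aggregate the
    FSVs per class, and return the class with the largest total."""
    scores = {cls: sum(np.interp(xi, grid, curve)
                       for xi, curve in zip(x, curves))
              for cls, curves in fsf.items()}
    return max(scores, key=scores.get)

print(classify([0.2, 0.8]))  # -> A  (both features match class A's FSFs)
print(classify([0.7, 0.3]))  # -> B
```

Because each prediction decomposes into per-feature strength values, the same
curves that classify an input also explain which features drove the decision.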
Bilinear CNNs for Fine-grained Visual Recognition
We present a simple and effective architecture for fine-grained visual
recognition called Bilinear Convolutional Neural Networks (B-CNNs). These
networks represent an image as a pooled outer product of features derived from
two CNNs and capture localized feature interactions in a translationally
invariant manner. B-CNNs belong to the class of orderless texture
representations but unlike prior work they can be trained in an end-to-end
manner. Our most accurate model obtains 84.1%, 79.4%, 86.9%, and 91.3%
per-image accuracy on the Caltech-UCSD Birds [67], NABirds [64], FGVC
Aircraft [42], and Stanford Cars [33] datasets, respectively, and runs at 30
frames per second on an NVIDIA Titan X GPU. We then present a systematic
analysis of these networks and
show that (1) the bilinear features are highly redundant and can be reduced by
an order of magnitude in size without significant loss in accuracy, (2) they
are also effective for other image classification tasks such as texture and
scene recognition, and (3) they can be trained from scratch on the ImageNet dataset,
offering consistent improvements over the baseline architecture. Finally, we
present visualizations of these models on various datasets using top
activations of neural units and gradient-based inversion techniques. The source
code for the complete system is available at http://vis-www.cs.umass.edu/bcnn
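The bilinear pooling at the core of B-CNNs (an outer product of the two
streams' features at each location, sum-pooled over locations) can be sketched
as follows; the feature sizes are illustrative, and the signed square root
plus L2 normalization follows common B-CNN practice.

```python
import numpy as np

rng = np.random.default_rng(0)

def bilinear_pool(fa, fb):
    """Bilinear pooling: outer product of the two streams' feature vectors
    at each location, summed over locations (hence orderless), followed by
    signed sqrt and L2 normalization."""
    B = fa.T @ fb                       # (Ca, Cb): sum of outer products
    v = B.reshape(-1)
    v = np.sign(v) * np.sqrt(np.abs(v))
    return v / (np.linalg.norm(v) + 1e-12)

n_loc, ca, cb = 49, 8, 8                # toy sizes (real B-CNNs use conv maps)
fa = rng.standard_normal((n_loc, ca))   # stream-A features at 49 locations
fb = rng.standard_normal((n_loc, cb))   # stream-B features
desc = bilinear_pool(fa, fb)
print(desc.shape)                       # (64,): input to a linear classifier

# Orderless: permuting spatial locations leaves the descriptor unchanged.
perm = rng.permutation(n_loc)
print(np.allclose(desc, bilinear_pool(fa[perm], fb[perm])))  # True
```

The permutation check at the end illustrates why B-CNNs belong to the class of
orderless texture representations.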