5,211 research outputs found
Discovering Class-Specific Pixels for Weakly-Supervised Semantic Segmentation
We propose an approach to discover class-specific pixels for the
weakly-supervised semantic segmentation task. We show that properly combining
saliency and attention maps allows us to obtain reliable cues capable of
significantly boosting the performance. First, we propose a simple yet powerful
hierarchical approach to discover the class-agnostic salient regions, obtained
using a salient object detector, which otherwise would be ignored. Second, we
use fully convolutional attention maps to reliably localize the class-specific
regions in a given image. We combine these two cues to discover class-specific
pixels which are then used as an approximate ground truth for training a CNN.
While solving the weakly supervised semantic segmentation task, we ensure that
the image-level classification task is also solved in order to enforce the CNN
to assign at least one pixel to each object present in the image.
Experimentally, on the PASCAL VOC12 val and test sets, we obtain the mIoU of
60.8% and 61.9%, achieving the performance gains of 5.1% and 5.2% compared to
the published state-of-the-art results. The code is made publicly available
Robust Real-Time Recognition of Action Sequences Using a Multi-Camera Network
Real-time identification of human activities in urban environments is increasingly becoming important in the context of public safety and national security. Distributed camera networks that provide multiple views of a scene are ideally suited for real-time action recognition. However, deployments of multi-camera based real-time action recognition systems have thus far been inhibited because of several practical issues and restrictive assumptions that are typically made such as the knowledge of a subjects orientation with respect to the cameras, the duration of each action and the conformation of a network deployment during the testing phase to that of a training deployment. In reality, action recognition involves classification of continuously streaming data from multiple views which consists of an interleaved sequence of various human actions. While there has been extensive research on machine learning techniques for action recognition from a single view, the issues arising in the fusion of data from multiple views for reliable action recognition have not received as much attention. In this thesis, I have developed a fusion framework for human action recognition using a multi-camera network that addresses these practical issues of unknown subject orientation, unknown view configurations, action interleaving and variable duration actions.;The proposed framework consists of two components: (1) a score-fusion technique that utilizes underlying view-specific supervised learning classifiers to classify an action within a given set of frames and (2) a sliding window technique that is used to parse a sequence of frames into multiple actions. The use of a score-fusion technique as opposed to a feature-level fusion of data from multiple views allows us to robustly classify actions even when camera configurations are arbitrary and different from training phase and at the same time reduces the required network bandwidth for data transmission permitting wireless deployments. Moreover, the proposed framework is independent of the underlying classifier that is used to generate scores for each action snippet and thus offers more flexibility compared to sequential approaches like Hidden Markov Models. The amount of training and parameterization is also significantly lower compared to HMM-based approaches. This Real-Time recognition system has been tested on 4 classifiers which are Linear Discriminant Analysis, Multinomial Naive Bayes, Logistic Regression and Support Vector Machines. Over 90% accuracy has been achieved by this system in Real-Time recognizing variable duration actions performed by the subject. The performance of the system is also shown to be robust to camera failures
Multimodal Subspace Support Vector Data Description
In this paper, we propose a novel method for projecting data from multiple
modalities to a new subspace optimized for one-class classification. The
proposed method iteratively transforms the data from the original feature space
of each modality to a new common feature space along with finding a joint
compact description of data coming from all the modalities. For data in each
modality, we define a separate transformation to map the data from the
corresponding feature space to the new optimized subspace by exploiting the
available information from the class of interest only. We also propose
different regularization strategies for the proposed method and provide both
linear and non-linear formulations. The proposed Multimodal Subspace Support
Vector Data Description outperforms all the competing methods using data from a
single modality or fusing data from all modalities in four out of five
datasets.Comment: 26 pages manuscript (6 tables, 2 figures), 24 pages supplementary
material (27 tables, 10 figures). The manuscript and supplementary material
are combined as a single .pdf (50 pages) fil
Semantic Segmentation of Pathological Lung Tissue with Dilated Fully Convolutional Networks
Early and accurate diagnosis of interstitial lung diseases (ILDs) is crucial
for making treatment decisions, but can be challenging even for experienced
radiologists. The diagnostic procedure is based on the detection and
recognition of the different ILD pathologies in thoracic CT scans, yet their
manifestation often appears similar. In this study, we propose the use of a
deep purely convolutional neural network for the semantic segmentation of ILD
patterns, as the basic component of a computer aided diagnosis (CAD) system for
ILDs. The proposed CNN, which consists of convolutional layers with dilated
filters, takes as input a lung CT image of arbitrary size and outputs the
corresponding label map. We trained and tested the network on a dataset of 172
sparsely annotated CT scans, within a cross-validation scheme. The training was
performed in an end-to-end and semi-supervised fashion, utilizing both labeled
and non-labeled image regions. The experimental results show significant
performance improvement with respect to the state of the art
Simultaneous Spectral-Spatial Feature Selection and Extraction for Hyperspectral Images
In hyperspectral remote sensing data mining, it is important to take into
account of both spectral and spatial information, such as the spectral
signature, texture feature and morphological property, to improve the
performances, e.g., the image classification accuracy. In a feature
representation point of view, a nature approach to handle this situation is to
concatenate the spectral and spatial features into a single but high
dimensional vector and then apply a certain dimension reduction technique
directly on that concatenated vector before feed it into the subsequent
classifier. However, multiple features from various domains definitely have
different physical meanings and statistical properties, and thus such
concatenation hasn't efficiently explore the complementary properties among
different features, which should benefit for boost the feature
discriminability. Furthermore, it is also difficult to interpret the
transformed results of the concatenated vector. Consequently, finding a
physically meaningful consensus low dimensional feature representation of
original multiple features is still a challenging task. In order to address the
these issues, we propose a novel feature learning framework, i.e., the
simultaneous spectral-spatial feature selection and extraction algorithm, for
hyperspectral images spectral-spatial feature representation and
classification. Specifically, the proposed method learns a latent low
dimensional subspace by projecting the spectral-spatial feature into a common
feature space, where the complementary information has been effectively
exploited, and simultaneously, only the most significant original features have
been transformed. Encouraging experimental results on three public available
hyperspectral remote sensing datasets confirm that our proposed method is
effective and efficient
- …