Quantum support vector data description for anomaly detection
Anomaly detection is a critical problem in data analysis and pattern
recognition, finding applications in various domains. We introduce quantum
support vector data description (QSVDD), an unsupervised learning algorithm
designed for anomaly detection. QSVDD utilizes a shallow-depth quantum circuit
to learn a minimum-volume hypersphere that tightly encloses normal data,
tailored for the constraints of noisy intermediate-scale quantum (NISQ)
computing. Simulation results on the MNIST and Fashion MNIST image datasets
demonstrate that QSVDD outperforms both quantum autoencoder and deep
learning-based approaches under similar training conditions. Notably, QSVDD
offers the advantage of training an extremely small number of model parameters,
which grows logarithmically with the number of input qubits. This enables
efficient learning with a simple training landscape, presenting a compact
quantum machine learning model with strong performance for anomaly detection.
Comment: 14 pages, 5 figures
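The hypersphere idea behind support vector data description can be illustrated classically. The sketch below is a minimal classical stand-in under stated assumptions, not QSVDD itself: the feature vectors are assumed fixed (in QSVDD they would come from the trained shallow quantum circuit, which is not modelled here), the centre is the mean of the normal samples, and the radius is a distance quantile (the `quantile` parameter and the helper names `fit_hypersphere` and `anomaly_score` are hypothetical). A point scores positive when it lies outside the sphere.

```python
import numpy as np


def fit_hypersphere(normal_feats, quantile=0.95):
    """Centre = mean of normal features; radius = distance quantile.

    Assumption: features are fixed vectors standing in for the quantum
    circuit's output; nothing is trained here.
    """
    center = normal_feats.mean(axis=0)
    dists = np.linalg.norm(normal_feats - center, axis=1)
    radius = np.quantile(dists, quantile)
    return center, radius


def anomaly_score(x, center, radius):
    """Squared distance to the centre minus squared radius (> 0 means outside)."""
    return np.sum((x - center) ** 2, axis=-1) - radius ** 2


rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(500, 4))      # toy "normal" training data
c, R = fit_hypersphere(normal)
print(anomaly_score(np.zeros(4), c, R) < 0)       # near the centre: inside
print(anomaly_score(np.full(4, 10.0), c, R) > 0)  # far away: flagged as anomalous
```

A learned method such as QSVDD instead trains the feature map so that normal data collapse towards the centre; the scoring rule on top stays the same distance-to-centre test.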
Continuous 3D Label Stereo Matching using Local Expansion Moves
We present an accurate stereo matching method using local expansion moves
based on graph cuts. This new move-making scheme is used to efficiently infer
per-pixel 3D plane labels on a pairwise Markov random field (MRF) that
effectively combines recently proposed slanted patch matching and curvature
regularization terms. The local expansion moves are presented as many
alpha-expansions defined for small grid regions. The local expansion moves
extend traditional expansion moves in two ways: localization and spatial
propagation. By localization, we use different candidate alpha-labels according
to the locations of local alpha-expansions. By spatial propagation, we design
our local alpha-expansions to propagate currently assigned labels to nearby
regions. With this localization and spatial propagation, our method can
efficiently infer MRF models with a continuous label space using randomized
search. Our method has several advantages over previous approaches based on
fusion moves or belief propagation: it produces submodular moves that guarantee
subproblem optimality; it helps find good, smooth, piecewise linear disparity
maps; it is suitable for parallelization; and it can use cost-volume filtering
techniques to accelerate the matching cost computations. Even with a simple
pairwise MRF, our method is shown to achieve the best performance on the
Middlebury stereo benchmark V2 and V3.
Comment: 14 pages. An extended version of our preliminary conference paper
[39], Taniai et al. "Graph Cut based Continuous Stereo Matching using Locally
Shared Labels" in the proceedings of IEEE Conference on Computer Vision and
Pattern Recognition (CVPR 2014). Our results were submitted to Middlebury
Stereo Benchmark Version 2 on April 22, 2015, and to Version 3 on July 4,
201
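The two ingredients of local expansion moves, localization and spatial propagation, can be sketched in a few lines. This is a greedy, unary-only simplification, not the method described above: it assumes the usual slanted-plane parameterisation d = a*x + b*y + c for a 3D plane label, draws each cell's candidate alpha-label from labels currently assigned in that cell, offers it to the cell and its neighbouring cells, and lets each pixel accept it independently if its own matching cost drops. The pairwise smoothness term and the graph-cut solution of each expansion are omitted. All function names and the toy cost are hypothetical.

```python
import numpy as np


def plane_disparity(label, x, y):
    """Disparity at pixel (x, y) under a plane label (a, b, c): d = a*x + b*y + c."""
    a, b, c = label
    return a * x + b * y + c


def total_cost(labels, unary_cost):
    """Sum of per-pixel matching costs for an H x W x 3 array of plane labels."""
    h, w, _ = labels.shape
    return sum(unary_cost(x, y, plane_disparity(labels[y, x], x, y))
               for y in range(h) for x in range(w))


def local_expansion_sweep(labels, unary_cost, cell=4, rng=None):
    """One sweep of a greedy, unary-only simplification of local expansion moves.

    Localization: every grid cell gets its own candidate alpha-label.
    Spatial propagation: the candidate is copied from a label currently assigned
    inside the cell and offered to the cell and its neighbouring cells.
    Simplification (assumption): pixels accept alpha independently if their own
    cost drops; no pairwise term and no graph cut are used.
    """
    rng = np.random.default_rng(rng)
    h, w, _ = labels.shape
    for cy in range(0, h, cell):
        for cx in range(0, w, cell):
            # Candidate label sampled from the centre cell's current labels.
            sy = rng.integers(cy, min(cy + cell, h))
            sx = rng.integers(cx, min(cx + cell, w))
            alpha = labels[sy, sx].copy()
            # Expansion region: centre cell plus surrounding cells, clipped to the image.
            for y in range(max(cy - cell, 0), min(cy + 2 * cell, h)):
                for x in range(max(cx - cell, 0), min(cx + 2 * cell, w)):
                    current = unary_cost(x, y, plane_disparity(labels[y, x], x, y))
                    proposed = unary_cost(x, y, plane_disparity(alpha, x, y))
                    if proposed < current:
                        labels[y, x] = alpha
    return labels


# Toy example: the true disparity map is a single slanted plane.
true_plane = np.array([0.1, -0.05, 8.0])


def cost(x, y, d):
    """Hypothetical matching cost: absolute error to the true plane's disparity."""
    return abs(d - plane_disparity(true_plane, x, y))


rng = np.random.default_rng(1)
labels = rng.normal([0.0, 0.0, 8.0], [0.2, 0.2, 3.0], size=(16, 16, 3))
labels[5, 5] = true_plane  # one correct seed label
before = total_cost(labels, cost)
for _ in range(5):
    local_expansion_sweep(labels, cost, cell=4, rng=rng)
after = total_cost(labels, cost)
print(before, after)
```

Because a pixel only switches when its cost decreases, the total cost never increases from sweep to sweep; the graph-cut version gives a comparable guarantee for the full pairwise energy.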
Volumetric Super-Resolution of Multispectral Data
Most multispectral remote sensors (e.g. QuickBird, IKONOS, and Landsat 7
ETM+) provide low-spatial high-spectral resolution multispectral (MS) or
high-spatial low-spectral resolution panchromatic (PAN) images, separately. In
order to reconstruct a high-spatial/high-spectral resolution multispectral
image volume, either the information in MS and PAN images is fused (i.e.,
pansharpening) or super-resolution reconstruction (SRR) is applied to MS
images alone, captured on different dates. Existing methods do not exploit the
temporal information of MS images and the high spatial resolution of PAN images
together to improve the resolution. In this paper, we propose a multiframe SRR
algorithm using pansharpened MS images, taking advantage of both the temporal
and spatial information available in multispectral imagery, in order to exceed
the spatial resolution of the given PAN images. We first apply pansharpening to a set of
multispectral images and their corresponding PAN images captured on different
dates. Then, we use the pansharpened multispectral images as input to the
proposed wavelet-based multiframe SRR method to yield full volumetric SRR. The
proposed SRR method is obtained by deriving the subband relations between
multitemporal MS volumes. We demonstrate the results on Landsat 7 ETM+ images,
comparing our method with conventional techniques.
Comment: arXiv admin note: text overlap with arXiv:1705.0125
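The abstract does not name the pansharpening method used in the first stage. As an illustration of that stage only, the sketch below swaps in the Brovey transform, a standard pansharpening technique, plus a nearest-neighbour upsampling step; the helper names and the image sizes are hypothetical. Each date is pansharpened independently and the results are stacked into the multitemporal input that the wavelet-based SRR step (not shown) would consume.

```python
import numpy as np


def upsample_nearest(ms, factor):
    """Nearest-neighbour upsampling of a (bands, h, w) cube by an integer factor."""
    return ms.repeat(factor, axis=1).repeat(factor, axis=2)


def brovey_pansharpen(ms_up, pan, eps=1e-8):
    """Brovey transform (assumed here): scale every band by PAN / mean of the MS bands.

    ms_up: (bands, H, W) multispectral cube already on the PAN grid.
    pan:   (H, W) panchromatic image from the same date.
    """
    intensity = ms_up.mean(axis=0)
    return ms_up * (pan / (intensity + eps))[np.newaxis, :, :]


# Four hypothetical acquisition dates: 3-band MS at 8x8, PAN at 16x16.
rng = np.random.default_rng(0)
dates = [(rng.random((3, 8, 8)), rng.random((16, 16))) for _ in range(4)]
stack = np.stack([brovey_pansharpen(upsample_nearest(ms, 2), pan) for ms, pan in dates])
print(stack.shape)  # (4, 3, 16, 16): dates x bands x rows x cols
```

By construction the band mean of each Brovey output equals the PAN image (up to `eps`), which is how PAN detail is injected into every band.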
Pattern recognition of Xe double beta decay events and background discrimination in a high pressure Xenon TPC
High pressure gas detectors offer advantages for the detection of rare
events, where background reduction is crucial. For the neutrinoless double beta
decay of 136Xe, a high pressure xenon gas Time Projection Chamber (TPC) combines
good energy resolution with detailed topological information for each event.
The ionization topology of the double beta decay event of 136Xe in gaseous
xenon has a characteristic shape defined by the two straggling electron tracks,
each ending in a blob of higher ionization charge density. With a properly
pixelized readout, this topological information is invaluable to perform
powerful background discrimination. In this study we carry out detailed
simulations of the signal topology, as well as the competing topologies from
gamma events that typically compose the background at these energies. We define
observables based on graph theory concepts and develop automated discrimination
algorithms that reduce the background level by around three orders of
magnitude while keeping a signal efficiency of 40%. This result supports the
competitiveness of current or future double beta experiments based on gas TPCs,
such as the Neutrino Xenon TPC (NEXT) currently under construction in the
Laboratorio Subterraneo de Canfranc (LSC).
Comment: 26 pages, 9 figures, accepted for publication in Journal of Physics
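A graph-based blob observable of the kind described can be sketched as follows. This is a hypothetical illustration, not the paper's algorithm: charged voxels are nodes joined by 26-connectivity, the two track extremes are found by a double breadth-first search (exact for tree-like tracks, a heuristic on general graphs, and assuming a single connected track), and the charge within a hypothetical `radius` of each extreme is summed. A double beta event should show two energetic blobs, while a single-electron background track shows only one, so the weaker blob energy can be compared with a threshold.

```python
import itertools
import math
from collections import deque


def neighbours(v, hits):
    """26-connected neighbouring voxels of v that recorded charge."""
    x, y, z = v
    for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
        n = (x + dx, y + dy, z + dz)
        if n != v and n in hits:
            yield n


def farthest(start, hits):
    """Breadth-first search over the voxel graph; the last voxel dequeued is farthest."""
    seen = {start}
    queue = deque([start])
    last = start
    while queue:
        last = queue.popleft()
        for n in neighbours(last, hits):
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return last


def blob_energies(hits, radius=2.0):
    """Charge within `radius` of the two track extremes, sorted (weaker, stronger)."""
    end_a = farthest(next(iter(hits)), hits)  # double BFS: approximate graph diameter
    end_b = farthest(end_a, hits)

    def blob(centre):
        return sum(q for v, q in hits.items() if math.dist(v, centre) <= radius)

    return sorted((blob(end_a), blob(end_b)))


def straight_track(n, end_charges):
    """Toy track along x: unit charge per voxel plus extra charge at both ends."""
    hits = {(i, 0, 0): 1.0 for i in range(n)}
    hits[(0, 0, 0)] += end_charges[0]
    hits[(n - 1, 0, 0)] += end_charges[1]
    return hits


print(blob_energies(straight_track(20, (5.0, 5.0))))  # [8.0, 8.0]: two blobs, signal-like
print(blob_energies(straight_track(20, (0.0, 5.0))))  # [3.0, 8.0]: one blob, background-like
```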
CNN-based Cost Volume Analysis as Confidence Measure for Dense Matching
Due to its capability to identify erroneous disparity assignments in dense
stereo matching, confidence estimation is beneficial for a wide range of
applications, e.g. autonomous driving, which needs a high degree of confidence
as a mandatory prerequisite. In particular, the introduction of deep learning
based methods, with their significantly improved accuracy, has made this field
increasingly popular in recent years. Despite this remarkable
development, most of these methods rely on features learned from disparity maps
only, not taking into account the corresponding 3-dimensional cost volumes.
However, it has already been demonstrated that, for conventional methods based
on hand-crafted features, this additional information can be used to further
increase the accuracy. In order to combine the advantages of deep learning and
cost volume based features, in this paper, we propose a novel Convolutional
Neural Network (CNN) architecture to directly learn features for confidence
estimation from volumetric 3D data. An extensive evaluation on three datasets
using three common dense stereo matching techniques demonstrates the generality
and state-of-the-art accuracy of the proposed method.
Comment: The IEEE International Conference on Computer Vision (ICCV) Workshops
(2019)
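To make "volumetric 3D data" concrete, the sketch below extracts the k x k x D cost-volume neighbourhood of a pixel, which is the shape of input a 3D CNN can consume, and computes one classic hand-crafted cost-curve confidence, the naive peak ratio (second-smallest over smallest cost), of the kind conventional methods use. It is a minimal illustration under assumptions, not the proposed network: the H x W x D layout, edge padding, patch size `k`, and the helper names are all hypothetical choices.

```python
import numpy as np


def cost_volume_patch(cost_volume, y, x, k=5):
    """k x k x D neighbourhood of pixel (y, x) from an H x W x D cost volume (edge-padded)."""
    r = k // 2
    padded = np.pad(cost_volume, ((r, r), (r, r), (0, 0)), mode="edge")
    return padded[y:y + k, x:x + k, :]


def naive_peak_ratio(cost_curve, eps=1e-8):
    """Hand-crafted confidence: second-smallest cost / smallest cost (higher = more confident)."""
    c1, c2 = np.sort(cost_curve)[:2]
    return c2 / (c1 + eps)


rng = np.random.default_rng(0)
volume = rng.random((10, 12, 32)) + 1.0   # H x W x D, costs in [1, 2)
volume[4, 6, 7] = 0.1                     # one pixel with a sharp minimum at disparity 7
patch = cost_volume_patch(volume, 0, 0)
print(patch.shape)                        # (5, 5, 32)
print(volume.argmin(axis=2)[4, 6])        # 7
print(naive_peak_ratio(volume[4, 6]) > naive_peak_ratio(volume[0, 0]))  # True
```

A hand-crafted measure like this looks at one cost curve at a time; feeding the whole patch to a learned 3D model lets neighbouring cost curves contribute as well.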
Adaptive Wing Loss for Robust Face Alignment via Heatmap Regression
Heatmap regression with a deep network has become one of the mainstream
approaches to localize facial landmarks. However, the loss function for heatmap
regression is rarely studied. In this paper, we analyze the ideal loss function
properties for heatmap regression in face alignment problems. Then we propose a
novel loss function, named Adaptive Wing loss, that is able to adapt its shape
to different types of ground truth heatmap pixels. This adaptability penalizes
errors on foreground pixels more heavily than errors on background pixels. To
address the imbalance between foreground and background pixels, we also propose
Weighted Loss Map, which assigns high weights to foreground and difficult
background pixels to help the training process focus more on pixels that are crucial to
landmark localization. To further improve face alignment accuracy, we introduce
boundary prediction and CoordConv with boundary coordinates. Extensive
experiments on different benchmarks, including COFW, 300W and WFLW, show our
approach outperforms the state-of-the-art by a significant margin on various
evaluation metrics. The Adaptive Wing loss also helps other heatmap
regression tasks. Code will be made publicly available at
https://github.com/protossw512/AdaptiveWingLoss.
Comment: [v2] Camera-ready version for ICCV 2019. [v3] Corrected AUC(fr10%) on
table
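A NumPy sketch of an Adaptive Wing style loss and a weighted loss map follows. It assumes the piecewise form in which the exponent (alpha - y) depends on the ground-truth pixel value y, with a logarithmic piece for small errors and a linear piece beyond `theta`, where A and C make both pieces meet with equal value and slope. The hyperparameter values (omega=14, theta=0.5, epsilon=1, alpha=2.1) and the map settings (weight 10 on the 3x3-dilated heatmap above 0.2) are assumed defaults to check against the paper; the function names are hypothetical and this is not the released code.

```python
import numpy as np


def adaptive_wing_loss(pred, target, omega=14.0, theta=0.5, epsilon=1.0, alpha=2.1):
    """Element-wise Adaptive Wing style loss for heatmaps with targets in [0, 1].

    Hyperparameter defaults are assumptions, not verified against the paper.
    """
    delta = np.abs(target - pred)
    p = alpha - target                      # exponent adapts to the ground-truth pixel value
    r = theta / epsilon
    # A and C make the linear piece meet the log piece with equal value and slope at theta.
    A = omega * p * r ** (p - 1.0) / (epsilon * (1.0 + r ** p))
    C = theta * A - omega * np.log1p(r ** p)
    return np.where(delta < theta,
                    omega * np.log1p((delta / epsilon) ** p),
                    A * delta - C)


def weighted_loss_map(target, weight=10.0, threshold=0.2):
    """`weight` where the 3x3 grey-dilated heatmap >= threshold, 1 elsewhere (assumed settings)."""
    h, w = target.shape
    padded = np.pad(target, 1, mode="constant")
    dilated = np.max([padded[dy:dy + h, dx:dx + w]
                      for dy in range(3) for dx in range(3)], axis=0)
    return np.where(dilated >= threshold, weight, 1.0)


target = np.zeros((8, 8))
target[4, 4] = 1.0                          # a single landmark peak
pred = np.full((8, 8), 0.1)
loss = (adaptive_wing_loss(pred, target) * weighted_loss_map(target)).mean()
print(loss)
```

Because the exponent is larger on background pixels (y near 0) than on foreground pixels (y near 1), small background errors are penalized less, which is the adaptive behaviour the abstract describes.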
Two-Stream 3D Convolutional Neural Network for Skeleton-Based Action Recognition
It remains a challenge to efficiently extract spatiotemporal information
from skeleton sequences for 3D human action recognition. Although most recent
action recognition methods are based on Recurrent Neural Networks, which achieve
outstanding performance, one of the shortcomings of these methods is their
tendency to overemphasize the temporal information. Since a 3D convolutional
neural network (3D CNN) is a powerful tool for simultaneously learning features
from both spatial and temporal dimensions by capturing the correlations between
three-dimensional signals, this paper proposes a novel two-stream model using
3D CNN. To the best of our knowledge, this is the first application of 3D CNN in
skeleton-based action recognition. Our method consists of three stages. First,
skeleton joints are mapped into a 3D coordinate space, and the spatial and
temporal information is then encoded, respectively. Second, 3D CNN models are
separately adopted to extract deep features from the two streams. Third, to
enhance the ability of deep features to capture global relationships, we extend
every stream into a multitemporal version. Extensive experiments on the SmartHome
dataset and the large-scale NTU RGB-D dataset demonstrate that our method
outperforms most RNN-based methods, which verifies the complementary property
between spatial and temporal information and the robustness to noise.
Comment: 5 pages, 6 figures, 3 tables
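The abstract does not specify how the two streams are encoded. The sketch below is one hypothetical reading of the first and third stages, labelled as such: the spatial stream voxelises all joint positions of a T x J x 3 sequence into an occupancy grid, the temporal stream voxelises frame-to-frame joint displacements, and the "multitemporal" views are obtained by sub-sampling frames at several strides. The grid size, strides, and function names are assumptions; the 3D CNNs that would consume these volumes are not shown.

```python
import numpy as np


def voxelize(points, grid=16):
    """Occupancy counts of N x 3 points in a grid^3 volume after per-axis min-max scaling."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    scaled = (points - lo) / np.maximum(hi - lo, 1e-8)        # each axis into [0, 1]
    idx = np.minimum((scaled * grid).astype(int), grid - 1)    # cell indices 0..grid-1
    vol = np.zeros((grid, grid, grid), dtype=np.float32)
    np.add.at(vol, (idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)     # accumulate repeated cells
    return vol


def two_stream_inputs(seq, grid=16):
    """Hypothetical encoding of a T x J x 3 sequence into (spatial, temporal) volumes."""
    spatial = voxelize(seq.reshape(-1, 3), grid)               # where the joints are
    motion = np.diff(seq, axis=0).reshape(-1, 3)               # how the joints move
    temporal = voxelize(motion, grid)
    return spatial, temporal


def multitemporal(seq, strides=(1, 2, 4)):
    """Hypothetical multitemporal views: the sequence sub-sampled at several strides."""
    return [seq[::s] for s in strides]


rng = np.random.default_rng(0)
seq = rng.normal(size=(30, 25, 3))  # 30 frames, 25 joints (arbitrary toy sizes)
views = [two_stream_inputs(v) for v in multitemporal(seq)]
print([len(v) for v in multitemporal(seq)])  # [30, 15, 8]
```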
Two Stream 3D Semantic Scene Completion
Inferring the 3D geometry and the semantic meaning of occluded surfaces is a
very challenging task. Recently, the first end-to-end learning approach that
completes a scene from a single depth image has been proposed.
The approach voxelizes the scene and predicts for each voxel whether it is
occupied and, if so, its semantic class label. In this work, we propose a
two stream approach that leverages depth information and semantic information,
which is inferred from the RGB image, for this task. The approach constructs an
incomplete 3D semantic tensor, which uses a compact three-channel encoding for
the inferred semantic information, and uses a 3D CNN to infer the complete 3D
semantic tensor. In our experimental evaluation, we show that the proposed two
stream approach substantially outperforms the state-of-the-art for semantic
scene completion.
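Building an incomplete 3D semantic tensor from a depth map and per-pixel semantics can be sketched as follows. The abstract does not specify the compact three-channel encoding, so `class_code` below is a hypothetical stand-in that gives every class a distinct non-zero 3-vector (all zeros marks an empty voxel). The pinhole back-projection is standard; the grid size, voxel size, origin, class count, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project an H x W depth map to camera-frame 3D points (pinhole model)."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)


def class_code(num_classes):
    """Hypothetical compact encoding: each class id -> a distinct non-zero 3-vector."""
    levels = int(np.ceil(num_classes ** (1.0 / 3.0)))
    ids = np.arange(num_classes)
    digits = np.stack([ids // levels**2, (ids // levels) % levels, ids % levels], axis=1)
    return (digits + 1) / levels  # values in (0, 1]; all zeros is reserved for empty voxels


def semantic_tensor(depth, labels, intrinsics, num_classes=12,
                    grid=(60, 36, 60), voxel=0.08, origin=(-2.4, -1.44, 0.0)):
    """Incomplete semantic volume in which observed surface voxels carry their class code.

    grid, voxel (metres) and origin are assumptions. If several pixels fall into
    the same voxel, the last one written wins.
    """
    pts = depth_to_points(depth, *intrinsics)
    cls = labels.reshape(-1)
    valid = pts[:, 2] > 0                                   # drop missing depth
    idx = np.floor((pts[valid] - np.asarray(origin)) / voxel).astype(int)
    cls = cls[valid]
    inside = np.all((idx >= 0) & (idx < np.asarray(grid)), axis=1)
    idx, cls = idx[inside], cls[inside]
    tensor = np.zeros(tuple(grid) + (3,), dtype=np.float32)
    tensor[idx[:, 0], idx[:, 1], idx[:, 2]] = class_code(num_classes)[cls]
    return tensor


depth = np.full((4, 4), 2.0)                     # toy 4x4 depth map, 2 m everywhere
labels = np.arange(16).reshape(4, 4) % 12        # toy per-pixel class ids
vol = semantic_tensor(depth, labels, intrinsics=(10.0, 10.0, 1.5, 1.5))
print(vol.shape, np.any(vol != 0, axis=-1).sum())
```

The completion network then has to fill in the voxels this tensor leaves empty, that is, the occluded part of the scene.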
A discussion on the validation tests employed to compare human action recognition methods using the MSR Action3D dataset
This paper aims to determine which is the best human action recognition
method based on features extracted from RGB-D devices, such as the Microsoft
Kinect. We reviewed all the papers that refer to MSR Action3D, the most
widely used dataset that includes depth information acquired from an RGB-D
device. We found that the validation method differs from one work to another,
so a direct comparison among works cannot be made.
However, almost all of the works compare their results with others without
taking this issue into account. Therefore, we present different rankings
according to the methodology used for validation in order to clarify the
existing confusion.
Comment: 16 pages and 7 tables
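The following sketch illustrates why accuracies obtained under different validation methods are not directly comparable. It enumerates three subject-based protocols for a dataset with 10 subjects (our understanding of MSR Action3D's size; treat it as an assumption): a single fixed split with odd-numbered subjects for training and even-numbered ones for testing, leave-one-subject-out, and every way of choosing 5 training subjects out of 10. The protocol names and functions are illustrative, not taken from the paper.

```python
from itertools import combinations

SUBJECTS = tuple(range(1, 11))  # assumption: 10 subjects in MSR Action3D


def fixed_cross_subject():
    """Single split: odd-numbered subjects train, even-numbered subjects test."""
    train = tuple(s for s in SUBJECTS if s % 2 == 1)
    test = tuple(s for s in SUBJECTS if s % 2 == 0)
    return [(train, test)]


def leave_one_subject_out():
    """Ten splits, each holding out one subject for testing."""
    return [(tuple(s for s in SUBJECTS if s != held), (held,)) for held in SUBJECTS]


def all_half_splits():
    """Every choice of 5 training subjects out of 10: C(10, 5) = 252 splits."""
    return [(train, tuple(s for s in SUBJECTS if s not in train))
            for train in combinations(SUBJECTS, 5)]


for name, protocol in [("fixed cross-subject", fixed_cross_subject),
                       ("leave-one-subject-out", leave_one_subject_out),
                       ("all 5/5 subject splits", all_half_splits)]:
    print(f"{name}: {len(protocol())} split(s)")
```

A method's accuracy on one fixed split, its mean over 10 held-out subjects, and its mean over 252 splits measure different things, so ranking works together only makes sense within one protocol.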
An Invariant Model of the Significance of Different Body Parts in Recognizing Different Actions
In this paper, we show that different body parts do not play equally
important roles in recognizing a human action in video data. We investigate to
what extent a body part plays a role in recognition of different actions and
hence propose a generic method of assigning weights to different body points.
The approach is inspired by the strong evidence in the applied perception
community that humans perform recognition in a foveated manner; that is, they
recognize events or objects by focusing only on visually significant aspects.
An important contribution of our method is that the computation of the weights
assigned to body parts is invariant to viewing directions and camera parameters
in the input data. We have performed extensive experiments to validate the
proposed approach and demonstrate its significance. In particular, results show
that considerable improvement in performance is gained by taking into account
the relative importance of different body parts as defined by our approach.
Comment: arXiv admin note: substantial text overlap with arXiv:1705.04641,
arXiv:1705.05741, arXiv:1705.0443
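The sketch below shows, under loud assumptions, what "weights computed invariantly to viewing direction" can mean in practice. It is a hypothetical heuristic, not the paper's method: in each frame of a T x J x 3 skeleton sequence, every joint's distance to the body centroid is divided by that frame's mean distance, and a joint's weight is how much this normalised distance varies over time. Centroid distances do not change under rotation or translation, and the per-frame normalisation removes uniform scale. Full invariance to camera parameters (e.g. perspective projection) is not covered. The function name is hypothetical.

```python
import numpy as np


def joint_weights(seq, eps=1e-8):
    """Hypothetical per-joint importance weights from a T x J x 3 skeleton sequence.

    Invariant (by construction) to rotation, translation and uniform scaling of
    the whole sequence; not to perspective effects.
    """
    centred = seq - seq.mean(axis=1, keepdims=True)           # remove per-frame translation
    dist = np.linalg.norm(centred, axis=2)                     # T x J, rotation-invariant
    dist = dist / (dist.mean(axis=1, keepdims=True) + eps)     # remove per-frame scale
    activity = dist.std(axis=0)                                # how much each joint varies
    return activity / (activity.sum() + eps)                   # weights sum to 1


rng = np.random.default_rng(0)
seq = rng.normal(size=(40, 15, 3))                             # toy 40-frame, 15-joint sequence
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))                   # random orthogonal matrix
moved = 2.5 * seq @ q.T + np.array([1.0, -3.0, 0.5])           # same motion, different "camera"
print(np.allclose(joint_weights(seq), joint_weights(moved), atol=1e-6))
```

A joint that stays rigidly placed relative to the body receives weight near zero, while joints that move a lot relative to the body (hands, feet) receive more, mirroring the idea that only some body parts are significant for a given action.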