
    Quantum support vector data description for anomaly detection

    Anomaly detection is a critical problem in data analysis and pattern recognition, finding applications in various domains. We introduce quantum support vector data description (QSVDD), an unsupervised learning algorithm designed for anomaly detection. QSVDD utilizes a shallow-depth quantum circuit to learn a minimum-volume hypersphere that tightly encloses normal data, tailored to the constraints of noisy intermediate-scale quantum (NISQ) computing. Simulation results on the MNIST and Fashion MNIST image datasets demonstrate that QSVDD outperforms both quantum autoencoder and deep learning-based approaches under similar training conditions. Notably, QSVDD requires training only an extremely small number of model parameters, which grows logarithmically with the number of input qubits. This enables efficient learning with a simple training landscape, presenting a compact quantum machine learning model with strong performance for anomaly detection.
    Comment: 14 pages, 5 figures
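
    For context, the classical support vector data description (SVDD) objective that QSVDD adapts is sketched below; in the quantum variant, the feature map would be realized by the shallow parameterized circuit, whose details are not reproduced here.

```latex
% Classical SVDD (Tax & Duin): the smallest hypersphere with center a
% and radius R enclosing the normal training data, with slacks \xi_i.
\begin{aligned}
\min_{R,\,a,\,\xi}\quad & R^2 + C \sum_{i=1}^{N} \xi_i \\
\text{s.t.}\quad & \lVert \Phi(x_i) - a \rVert^2 \le R^2 + \xi_i,
\qquad \xi_i \ge 0, \quad i = 1,\dots,N.
\end{aligned}
% A test point x is flagged anomalous when
% \lVert \Phi(x) - a \rVert^2 > R^2.
```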

    Continuous 3D Label Stereo Matching using Local Expansion Moves

    We present an accurate stereo matching method using local expansion moves based on graph cuts. This new move-making scheme is used to efficiently infer per-pixel 3D plane labels on a pairwise Markov random field (MRF) that effectively combines recently proposed slanted patch matching and curvature regularization terms. The local expansion moves are presented as many alpha-expansions defined for small grid regions. They extend traditional expansion moves in two ways: localization and spatial propagation. By localization, we use different candidate alpha-labels according to the locations of local alpha-expansions. By spatial propagation, we design our local alpha-expansions to propagate currently assigned labels to nearby regions. With this localization and spatial propagation, our method can efficiently infer MRF models with a continuous label space using randomized search. Our method has several advantages over previous approaches based on fusion moves or belief propagation: it produces submodular moves, guaranteeing subproblem optimality; it helps find good, smooth, piecewise linear disparity maps; it is suitable for parallelization; and it can use cost-volume filtering techniques to accelerate the matching cost computations. Even using a simple pairwise MRF, our method is shown to achieve the best performance on the Middlebury stereo benchmark V2 and V3.
    Comment: 14 pages. An extended version of our preliminary conference paper [39], Taniai et al., "Graph Cut based Continuous Stereo Matching using Locally Shared Labels", in the proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2014). Our results were submitted to the Middlebury Stereo Benchmark Version 2 on April 22, 2015, and to Version 3 on July 4, 2015.
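
    As an illustration of the move-making scheme, here is a minimal sketch of the local expansion loop; it is not the authors' code. The callables propose_candidates and binary_graphcut are hypothetical stand-ins, the latter for the submodular binary subproblem solver (e.g. a max-flow library such as PyMaxflow).

```python
import numpy as np

def local_expansion(labels, cell, propose_candidates, binary_graphcut,
                    n_iters=10, margin=8):
    """labels: (H, W, 3) per-pixel 3D plane labels; cell: grid cell size."""
    H, W, _ = labels.shape
    rng = np.random.default_rng(0)
    for _ in range(n_iters):
        for y0 in range(0, H, cell):
            for x0 in range(0, W, cell):
                # Localization: candidate alpha labels are proposed from
                # planes currently assigned in this region (plus random
                # perturbations), so candidates vary with location.
                for alpha in propose_candidates(labels, y0, x0, cell, rng):
                    # Spatial propagation: expand over a window slightly
                    # larger than the cell so good labels can spread to
                    # neighboring regions across iterations.
                    ys = slice(max(0, y0 - margin), min(H, y0 + cell + margin))
                    xs = slice(max(0, x0 - margin), min(W, x0 + cell + margin))
                    # Boolean mask: which pixels switch to alpha.
                    switch = binary_graphcut(labels[ys, xs], alpha)
                    labels[ys, xs][switch] = alpha
    return labels
```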

    Volumetric Super-Resolution of Multispectral Data

    Most multispectral remote sensors (e.g. QuickBird, IKONOS, and Landsat 7 ETM+) provide either low-spatial high-spectral resolution multispectral (MS) images or high-spatial low-spectral resolution panchromatic (PAN) images, separately. In order to reconstruct a high-spatial/high-spectral resolution multispectral image volume, either the information in the MS and PAN images is fused (i.e. pansharpening) or super-resolution reconstruction (SRR) is used with only MS images captured on different dates. Existing methods do not utilize the temporal information of MS images and the high spatial resolution of PAN images together to improve the resolution. In this paper, we propose a multiframe SRR algorithm using pansharpened MS images, taking advantage of both the temporal and spatial information available in multispectral imagery, in order to exceed the spatial resolution of the given PAN images. We first apply pansharpening to a set of multispectral images and their corresponding PAN images captured on different dates. Then, we use the pansharpened multispectral images as input to the proposed wavelet-based multiframe SRR method to yield full volumetric SRR. The proposed SRR method is obtained by deriving the subband relations between the multitemporal MS volumes. We demonstrate the results on Landsat 7 ETM+ images, comparing our method to conventional techniques.
    Comment: arXiv admin note: text overlap with arXiv:1705.0125
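
    To make the wavelet-domain step concrete, here is a generic multiframe fusion sketch using PyWavelets. The simple mean/max subband rules below are placeholders, not the subband relations derived in the paper, and the pansharpened frames are assumed to be co-registered.

```python
import numpy as np
import pywt

def fuse_frames_wavelet(frames, wavelet="db2"):
    """frames: list of (H, W) co-registered pansharpened band images."""
    coeffs = [pywt.dwt2(f, wavelet) for f in frames]
    # Average the low-frequency approximation across acquisition dates.
    cA = np.mean([c[0] for c in coeffs], axis=0)
    # Keep the strongest high-frequency detail at each position.
    details = []
    for k in range(3):  # horizontal, vertical, diagonal subbands
        stack = np.stack([c[1][k] for c in coeffs])
        idx = np.abs(stack).argmax(axis=0)
        details.append(np.take_along_axis(stack, idx[None], axis=0)[0])
    return pywt.idwt2((cA, tuple(details)), wavelet)
```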

    Pattern recognition of 136Xe double beta decay events and background discrimination in a high pressure Xenon TPC

    High pressure gas detectors offer advantages for the detection of rare events, where background reduction is crucial. For the neutrinoless double beta decay of 136Xe, a high pressure xenon gas Time Projection Chamber (TPC) combines good energy resolution with detailed topological information for each event. The ionization topology of a 136Xe double beta decay event in gaseous xenon has a characteristic shape defined by the two straggling electron tracks ending in two blobs of higher ionization charge density. With a properly pixelized readout, this topological information is invaluable for powerful background discrimination. In this study we carry out detailed simulations of the signal topology, as well as the competing topologies from gamma events that typically compose the background at these energies. We define observables based on graph theory concepts and develop automated discrimination algorithms which reduce the background level by around three orders of magnitude while keeping a signal efficiency of 40%. This result supports the competitiveness of current and future double beta decay experiments based on gas TPCs, such as the Neutrino Xenon TPC (NEXT) currently under construction at the Laboratorio Subterraneo de Canfranc (LSC).
    Comment: 26 pages, 9 figures, accepted for publication in Journal of Physics
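
    A rough sketch of how such graph-based blob observables could be computed with networkx is shown below; the connectivity and blob radii are illustrative parameters, not the values used in the study, and the hit graph is assumed connected.

```python
import numpy as np
import networkx as nx

def blob_energies(xyz, charge, connect_r=15.0, blob_r=20.0):
    """xyz: (N, 3) voxelized hit positions [mm]; charge: (N,) deposits."""
    n = len(xyz)
    G = nx.Graph()
    G.add_nodes_from(range(n))
    d = np.linalg.norm(xyz[:, None] - xyz[None, :], axis=-1)
    for i in range(n):
        for j in range(i + 1, n):
            if d[i, j] < connect_r:
                G.add_edge(i, j)
    # Two BFS passes approximate the track's two extremes (graph diameter).
    end_a = max(nx.shortest_path_length(G, 0).items(), key=lambda kv: kv[1])[0]
    end_b = max(nx.shortest_path_length(G, end_a).items(), key=lambda kv: kv[1])[0]
    # "Blob" energy: total charge within blob_r of each track end.
    e_a = charge[d[end_a] < blob_r].sum()
    e_b = charge[d[end_b] < blob_r].sum()
    return sorted((e_a, e_b))
```

    The discriminating idea is that a double beta event has two energetic blobs, so cutting on the smaller of the two energies rejects single-electron gamma backgrounds, which have only one.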

    CNN-based Cost Volume Analysis as Confidence Measure for Dense Matching

    Due to its capability to identify erroneous disparity assignments in dense stereo matching, confidence estimation is beneficial for a wide range of applications, e.g. autonomous driving, which needs a high degree of confidence as a mandatory prerequisite. In particular, the introduction of deep learning based methods has resulted in an increasing popularity of this field in recent years, owing to significantly improved accuracy. Despite this remarkable development, most of these methods rely on features learned from disparity maps only, not taking into account the corresponding 3-dimensional cost volumes. However, it has already been demonstrated that, with conventional methods based on hand-crafted features, this additional information can be used to further increase the accuracy. In order to combine the advantages of deep learning and cost volume based features, in this paper we propose a novel Convolutional Neural Network (CNN) architecture to directly learn features for confidence estimation from volumetric 3D data. An extensive evaluation on three datasets using three common dense stereo matching techniques demonstrates the generality and state-of-the-art accuracy of the proposed method.
    Comment: The IEEE International Conference on Computer Vision (ICCV) Workshops (2019)
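
    A minimal PyTorch stand-in for a confidence network operating on cost-volume patches is sketched below; the layer sizes and patch shape are assumptions, not the architecture proposed in the paper.

```python
import torch
import torch.nn as nn

class CostVolumeConfidence(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),  # pool over disparity and space
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(32, 1), nn.Sigmoid())

    def forward(self, cost_patch):
        # cost_patch: (B, 1, D, H, W) local slice of the matching cost volume
        return self.head(self.features(cost_patch))  # confidence in [0, 1]

conf = CostVolumeConfidence()(torch.randn(8, 1, 64, 13, 13))  # -> (8, 1)
```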

    Adaptive Wing Loss for Robust Face Alignment via Heatmap Regression

    Heatmap regression with a deep network has become one of the mainstream approaches to localize facial landmarks. However, the loss function for heatmap regression is rarely studied. In this paper, we analyze the ideal loss function properties for heatmap regression in face alignment problems. We then propose a novel loss function, named Adaptive Wing loss, that is able to adapt its shape to different types of ground truth heatmap pixels. This adaptability penalizes the loss more on foreground pixels and less on background pixels. To address the imbalance between foreground and background pixels, we also propose a Weighted Loss Map, which assigns high weights to foreground and difficult background pixels to help the training process focus on pixels that are crucial to landmark localization. To further improve face alignment accuracy, we introduce boundary prediction and CoordConv with boundary coordinates. Extensive experiments on different benchmarks, including COFW, 300W and WFLW, show that our approach outperforms the state-of-the-art by a significant margin on various evaluation metrics. Besides, the Adaptive Wing loss also helps other heatmap regression tasks. Code will be made publicly available at https://github.com/protossw512/AdaptiveWingLoss.
    Comment: [v2] Camera-ready version for ICCV 2019. [v3] Corrected AUC(fr10%) on table
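
    A sketch of the piecewise loss as commonly formulated is given below: the pixel-adaptive exponent alpha - y is smaller on foreground pixels (y near 1), keeping gradients alive for small errors there, and larger on background pixels (y near 0), making the loss more tolerant. The default constants are commonly cited values and should be treated as assumptions here; A and C are chosen so the two pieces join smoothly at theta.

```python
import torch

def adaptive_wing_loss(pred, target, omega=14.0, theta=0.5, eps=1.0, alpha=2.1):
    """pred, target: heatmaps in [0, 1] of identical shape."""
    diff = (target - pred).abs()
    p = alpha - target  # pixel-adaptive exponent
    # Linear-piece slope/offset matching value and derivative at theta.
    A = omega * (1 / (1 + (theta / eps) ** p)) * p * (theta / eps) ** (p - 1) / eps
    C = theta * A - omega * torch.log1p((theta / eps) ** p)
    loss = torch.where(diff < theta,
                       omega * torch.log1p((diff / eps) ** p),  # nonlinear piece
                       A * diff - C)                            # linear piece
    return loss.mean()
```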

    Two-Stream 3D Convolutional Neural Network for Skeleton-Based Action Recognition

    It remains a challenge to efficiently extract spatial-temporal information from skeleton sequences for 3D human action recognition. Although most recent action recognition methods are based on Recurrent Neural Networks (RNNs) and present outstanding performance, one of the shortcomings of these methods is the tendency to overemphasize temporal information. Since a 3D convolutional neural network (3D CNN) is a powerful tool to simultaneously learn features from both the spatial and temporal dimensions by capturing the correlations between three-dimensional signals, this paper proposes a novel two-stream model using a 3D CNN. To the best of our knowledge, this is the first application of a 3D CNN to skeleton-based action recognition. Our method consists of three stages. First, skeleton joints are mapped into a 3D coordinate space to encode the spatial and temporal information, respectively. Second, 3D CNN models are separately adopted to extract deep features from the two streams. Third, to enhance the ability of the deep features to capture global relationships, we extend each stream into a multi-temporal version. Extensive experiments on the SmartHome dataset and the large-scale NTU RGB-D dataset demonstrate that our method outperforms most RNN-based methods, verifying the complementary property between spatial and temporal information and the robustness to noise.
    Comment: 5 pages, 6 figures, 3 tables
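
    One plausible encoding of a skeleton sequence into a voxel volume suitable for a 3D CNN stream is sketched below; this is an assumption for illustration, not the paper's exact mapping.

```python
import numpy as np

def skeleton_to_volume(seq, grid=32):
    """seq: (T, J, 3) joint coordinates over T frames and J joints."""
    vol = np.zeros((grid, grid, grid), dtype=np.float32)
    lo, hi = seq.min(axis=(0, 1)), seq.max(axis=(0, 1))
    idx = ((seq - lo) / (hi - lo + 1e-6) * (grid - 1)).astype(int)
    T = len(seq)
    for t, frame in enumerate(idx):
        for x, y, z in frame:
            # Later frames overwrite with higher intensity, so the voxel
            # value loosely encodes temporal order.
            vol[x, y, z] = (t + 1) / T
    return vol  # input to one stream's 3D CNN
```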

    Two Stream 3D Semantic Scene Completion

    Inferring the 3D geometry and the semantic meaning of occluded surfaces is a very challenging task. Recently, a first end-to-end learning approach has been proposed that completes a scene from a single depth image. The approach voxelizes the scene and predicts, for each voxel, whether it is occupied and, if so, its semantic class label. In this work, we propose a two-stream approach that leverages depth information and semantic information, inferred from the RGB image, for this task. The approach constructs an incomplete 3D semantic tensor, which uses a compact three-channel encoding for the inferred semantic information, and uses a 3D CNN to infer the complete 3D semantic tensor. In our experimental evaluation, we show that the proposed two-stream approach substantially outperforms the state-of-the-art for semantic scene completion.
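
    A sketch of how the incomplete 3D semantic tensor might be assembled, by back-projecting depth pixels into a voxel grid and writing a three-channel class code, is given below. The grid dimensions, voxel size, code table, and camera model are placeholders, not the paper's settings.

```python
import numpy as np

def incomplete_semantic_tensor(depth, sem_rgb, K, grid=(60, 36, 60), voxel=0.08):
    """depth: (H, W) metric depth; sem_rgb: (H, W, 3) per-pixel 3-channel
    class codes inferred from the RGB image; K: 3x3 camera intrinsics."""
    H, W = depth.shape
    tensor = np.zeros(grid + (3,), dtype=np.float32)
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    # Back-project pixels to camera-space points (pinhole model).
    z = depth
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    # A real implementation would also apply a camera-to-grid offset and
    # extrinsics; omitted here for brevity.
    ijk = np.round(np.stack([x, y, z], -1) / voxel).astype(int)
    ok = ((ijk >= 0) & (ijk < np.array(grid))).all(-1) & (z > 0)
    tensor[ijk[ok, 0], ijk[ok, 1], ijk[ok, 2]] = sem_rgb[ok]
    return tensor  # input to the completing 3D CNN
```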

    A discussion on the validation tests employed to compare human action recognition methods using the MSR Action3D dataset

    This paper aims to determine which is the best human action recognition method based on features extracted from RGB-D devices, such as the Microsoft Kinect. A review of all the papers that make reference to MSR Action3D, the most used dataset that includes depth information acquired from an RGB-D device, has been performed. We found that the validation method used by each work differs from the others, so a direct comparison among works cannot be made. However, almost all the works present their results comparing them without taking this issue into account. Therefore, we present different rankings according to the methodology used for validation, in order to clarify the existing confusion.
    Comment: 16 pages and 7 tables

    An Invariant Model of the Significance of Different Body Parts in Recognizing Different Actions

    In this paper, we show that different body parts do not play equally important roles in recognizing a human action in video data. We investigate to what extent a body part plays a role in the recognition of different actions and hence propose a generic method of assigning weights to different body points. The approach is inspired by the strong evidence in the applied perception community that humans perform recognition in a foveated manner, that is, they recognize events or objects by focusing only on visually significant aspects. An important contribution of our method is that the computation of the weights assigned to body parts is invariant to viewing directions and camera parameters in the input data. We have performed extensive experiments to validate the proposed approach and demonstrate its significance. In particular, the results show that a considerable improvement in performance is gained by taking into account the relative importance of different body parts as defined by our approach.
    Comment: arXiv admin note: substantial text overlap with arXiv:1705.04641, arXiv:1705.05741, arXiv:1705.0443