59 research outputs found

    Multiple depth maps integration for 3D reconstruction using geodesic graph cuts

    Depth images, in particular depth maps estimated from stereo vision, may contain a substantial number of outliers, leading to inaccurate 3D modelling and reconstruction. To address this challenging issue, a graph-cut based approach for integrating multiple depth maps is proposed in this paper to obtain smooth and watertight surfaces. First, confidence maps for the depth images are estimated to suppress noise, based on which reliable patches covering the object surface are determined. These patches are then exploited to estimate the path weight for 3D geodesic distance computation, where an adaptive regional term is introduced to deal with the “shorter-cuts” problem caused by the minimal surface bias. Finally, the adaptive regional term and the boundary term constructed from the patches are combined in the graph-cut framework for more accurate and smoother 3D modelling. We demonstrate the superior performance of our algorithm on the well-known Middlebury multi-view database and on real-world multiple depth images captured by Kinect. The experimental results show that our method preserves object protrusions and details while maintaining surface smoothness.
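
    The core of the approach is a minimum s-t cut over a graph whose terminal links carry regional (data) costs and whose neighbour links carry boundary (smoothness) costs. The sketch below shows that machinery on a toy pixel graph with scalar weights; the affinity values and the networkx solver are illustrative stand-ins for the paper's geodesic patch-based weights, not its actual formulation.

```python
import networkx as nx

def min_cut_labels(regional, boundary):
    """Label pixels object/background via a small s-t minimum cut.

    regional[p] = (object_affinity, background_affinity): the s-link and
    t-link capacities; a pixel stays on the side whose link is more
    expensive to cut. boundary[(p, q)] = smoothness weight between
    neighbouring pixels. All weights here are hypothetical scalars.
    """
    g = nx.DiGraph()
    for p, (obj_aff, bg_aff) in regional.items():
        g.add_edge("s", p, capacity=obj_aff)   # t-link to the source
        g.add_edge(p, "t", capacity=bg_aff)    # t-link to the sink
    for (p, q), w in boundary.items():
        # n-links penalise cutting between similar neighbours
        g.add_edge(p, q, capacity=w)
        g.add_edge(q, p, capacity=w)
    _, (source_side, _) = nx.minimum_cut(g, "s", "t")
    return {p: ("object" if p in source_side else "background")
            for p in regional}
```

    The paper's adaptive regional term would modulate the t-link capacities per region to counteract the minimal-surface ("shorter-cuts") bias.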

    Feature aggregation and region-aware learning for detection of splicing forgery

    Detection of image splicing forgery has become an increasingly difficult task due to the scale variations of the forged areas and the traces of manipulation concealed by post-processing techniques. Most existing methods fail to jointly exploit multi-scale local and global information and ignore the correlations between tampered and authentic regions across images, which limits the detection of multi-scale tampered regions. To tackle these challenges, we propose in this paper a novel method based on feature aggregation and region-aware learning to detect manipulated areas of varying scales. Specifically, we first integrate multi-level adjacent features using a feature selection mechanism to improve feature representation. Second, a cross-domain correlation aggregation module is devised to perform correlation enhancement of local features from a CNN and global representations from a Transformer, allowing a complementary fusion of dual-domain information. Third, a region-aware learning mechanism is designed to improve feature discrimination by comparing the similarities and differences of features between different regions. Extensive evaluations on benchmark datasets indicate the effectiveness of the method in detecting multi-scale spliced regions.
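
    The idea of correlation-based fusion of local and global features can be illustrated in a few lines: weight each local (CNN-like) feature vector by its cosine correlation with the global (Transformer-like) representation, then concatenate the two streams. This is a toy numpy sketch of the general dual-domain fusion idea, not the paper's cross-domain correlation aggregation module.

```python
import numpy as np

def correlate_and_fuse(local_feat, global_feat):
    """Re-weight local features by their cosine correlation with a
    global representation, then concatenate both streams.

    local_feat: (positions, dim) CNN-like features;
    global_feat: (dim,) Transformer-like summary vector.
    """
    g = global_feat / (np.linalg.norm(global_feat) + 1e-9)
    l = local_feat / (np.linalg.norm(local_feat, axis=1, keepdims=True) + 1e-9)
    w = l @ g                             # cosine score per position
    fused = local_feat * w[:, None]       # correlation-weighted local stream
    glob = np.broadcast_to(global_feat, local_feat.shape)
    return np.concatenate([fused, glob], axis=1)
```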

    Exposing image forgery by detecting traces of feather operation

    Powerful digital image editing tools make it very easy to produce a convincing image forgery. The feather operation is often necessary when tampering with an image via a copy–paste operation, because it helps the boundary of the pasted object blend smoothly and unobtrusively with its surroundings. We propose a blind technique capable of detecting traces of the feather operation to expose image forgeries. We model the feather operation, under which the pixels of a feathered region exhibit similarity in their gradient phase angle and feather radius. An effective scheme is designed to estimate each feathered pixel's gradient phase angle and feather radius, and a pixel's similarity to its neighbouring pixels is defined and used to distinguish feathered pixels from un-feathered ones. A degree of image credibility is also defined, which is more informative for evaluating the authenticity of an image than a simple yes-or-no decision. Results of experiments on several forgeries demonstrate the effectiveness of the technique.
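
    The quantities the detector relies on are per-pixel gradient phase angles and a similarity measure between neighbouring pixels. A minimal numpy sketch of those two ingredients, using central differences as a simplified stand-in for the paper's estimator:

```python
import numpy as np

def gradient_phase(img):
    """Per-pixel gradient phase angle (radians) via central differences.
    Feathered pixels along a blended boundary share similar angles."""
    gy, gx = np.gradient(img.astype(float))
    return np.arctan2(gy, gx)

def phase_similarity(phase, p, q):
    """Similarity of two pixels' phase angles in [0, 1] (1 = identical),
    using the wrap-around angular distance."""
    d = float(np.abs(phase[p] - phase[q]))
    d = min(d, 2 * np.pi - d)
    return 1.0 - d / np.pi
```

    Thresholding such a similarity against a pixel's neighbours is one simple way to separate feathered from un-feathered pixels; the paper additionally estimates a feather radius per pixel.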

    Cognitive fusion of thermal and visible imagery for effective detection and tracking of pedestrians in videos

    BACKGROUND In this paper, we present an efficient framework to cognitively detect and track salient objects from videos. In general, the colored visible image in red-green-blue (RGB) has better distinguishability in human visual perception, yet it suffers from the effects of illumination noise and shadows. On the contrary, the thermal image is less sensitive to these noise effects, though its distinguishability varies according to environmental settings. To this end, cognitive fusion of these two modalities provides an effective solution to this problem. METHODS First, a background model is extracted, followed by two-stage background subtraction for foreground detection in the visible and thermal images. To deal with cases of occlusion or overlap, knowledge-based forward tracking and backward tracking are employed to identify separate objects even when foreground detection fails. RESULTS To evaluate the proposed method, the publicly available color-thermal benchmark dataset OTCBVS is employed. For foreground detection, objective and subjective analyses against several state-of-the-art methods have been conducted on our manually segmented ground truth. For object tracking, comprehensive qualitative experiments have also been conducted on all video sequences. CONCLUSIONS The promising results show that the proposed fusion-based approach can successfully detect and track multiple human objects in most scenes regardless of light changes or occlusion.
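
    A minimal sketch of the detection side: build a per-modality background model, subtract it from the current frame, and fuse the two binary masks. The median background model, the noise-scaled threshold `k`, and the OR fusion rule are illustrative assumptions, not the paper's exact two-stage procedure.

```python
import numpy as np

def fused_foreground(vis_frames, thr_frames, frame_idx, k=2.5):
    """Foreground mask for frame `frame_idx`, fusing visible and thermal.

    Stage 1: temporal-median background model per modality.
    Stage 2: threshold the absolute deviation at k times a global
    noise estimate. The two masks are fused by union (OR).
    """
    masks = []
    for frames in (vis_frames, thr_frames):
        stack = np.asarray(frames, dtype=float)
        bg = np.median(stack, axis=0)          # stage 1: background model
        diff = np.abs(stack[frame_idx] - bg)
        noise = diff.std() + 1e-9
        masks.append(diff > k * noise)         # stage 2: subtraction
    return masks[0] | masks[1]                 # fusion of the modalities
```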

    Fusion of block and keypoints based approaches for effective copy-move image forgery detection

    Keypoint-based and block-based methods are the two main categories of techniques for detecting copy-move forged images, one of the most common digital image forgery schemes. In general, block-based methods suffer from high computational cost, due to the large number of image blocks used, and fail to handle geometric transformations. On the contrary, keypoint-based approaches can overcome these two drawbacks yet are found difficult to apply to smooth regions. As a result, a fusion of these two approaches is proposed for effective copy-move forgery detection. First, our scheme adaptively determines an appropriate initial region size to segment the image into non-overlapping regions. Feature points are then extracted from the image as keypoints using the scale invariant feature transform (SIFT). The ratio between the number of keypoints and the total number of pixels in a region is used to classify it as a smooth or non-smooth (keypoint) region. Accordingly, a block-based approach using Zernike moments and a keypoint-based approach using SIFT, along with filtering and post-processing, are respectively applied to these two kinds of regions for effective forgery detection. Experimental results show that the proposed fusion scheme outperforms the keypoint-based method in reliability of detection and the block-based method in efficiency.
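
    The routing rule that decides which branch handles which region is simple: compare the keypoint-to-pixel ratio of each region against a threshold. The sketch below assumes keypoints (e.g. from SIFT) have already been detected; the fixed grid and the threshold value are illustrative assumptions, since the paper determines region sizes adaptively.

```python
import numpy as np

def classify_regions(h, w, keypoints, grid=4, ratio_thr=0.002):
    """Split an h x w image into grid x grid non-overlapping regions and
    classify each as 'smooth' or 'keypoint' by its keypoint density.

    keypoints: iterable of (y, x) detected keypoint coordinates.
    Smooth regions would go to the Zernike-moment (block) branch,
    keypoint regions to the SIFT branch.
    """
    counts = np.zeros((grid, grid), dtype=int)
    rh, rw = h / grid, w / grid
    for y, x in keypoints:
        counts[min(int(y // rh), grid - 1), min(int(x // rw), grid - 1)] += 1
    ratio = counts / (rh * rw)          # keypoints per pixel, per region
    return np.where(ratio >= ratio_thr, "keypoint", "smooth")
```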

    Multiscale 2-D singular spectrum analysis and principal component analysis for spatial–spectral noise-robust feature extraction and classification of hyperspectral images

    For hyperspectral images (HSI), most feature extraction and data classification methods rely on corrected datasets, in which the noisy and water absorption bands are removed. This results in not only an extra working burden but also information loss from the removed bands. To tackle these issues, in this article we propose a novel spatial-spectral feature extraction framework, multiscale 2-D singular spectrum analysis (2-D-SSA) with principal component analysis (PCA), denoted 2-D-MSSP, for noise-robust feature extraction and data classification of HSI. First, multiscale 2-D-SSA is applied to exploit the multiscale spatial features in each spectral band of the HSI by extracting the varying trends within defined windows. Taking the extracted trend signals at each scale level as the input features, PCA is applied in the spectral domain for dimensionality reduction and spatial-spectral feature extraction. The derived spatial-spectral features at each scale are separately classified and then fused at the decision level. As our 2-D-MSSP method extracts features and simultaneously removes noise in both the spatial and spectral domains, it is noise-robust for classification of HSI, even on uncorrected datasets. Experiments on three publicly available datasets have fully validated the efficacy and robustness of the proposed approach, benchmarked against 10 state-of-the-art classifiers, including six spatial-spectral methods and four deep learning classifiers. In addition, both quantitative and qualitative assessments have validated the efficacy of our approach in noise-robust classification of HSI even with limited training samples, especially in classifying uncorrected data without filtering noisy bands.
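
    The spectral-domain PCA step of such a pipeline can be sketched directly: flatten the (H, W, B) cube into pixels-by-bands, centre it, and project onto the leading right singular vectors. For brevity this operates on the raw bands rather than on 2-D-SSA trend features, which is where the paper applies it.

```python
import numpy as np

def spectral_pca(cube, n_components):
    """Project a (H, W, B) hyperspectral cube onto its leading spectral
    principal components, returning a (H, W, n_components) cube."""
    h, w, b = cube.shape
    x = cube.reshape(-1, b).astype(float)
    x -= x.mean(axis=0)                         # centre each band
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return (x @ vt[:n_components].T).reshape(h, w, n_components)
```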

    Dynamic Non-Rigid Objects Reconstruction with a Single RGB-D Sensor

    This paper addresses the 3D reconstruction problem for dynamic non-rigid objects with a single RGB-D sensor. This is a challenging task, as we must consider the almost inevitable accumulation error in previous sequential fusion methods and the possible failure of surface tracking over a long sequence. Therefore, we propose a global non-rigid registration framework and tackle the drifting problem via an explicit loop closure. Our novel scheme starts with a fusion step to obtain multiple partial scans from the input sequence, followed by a pairwise non-rigid registration and loop detection step to obtain correspondences between neighboring partial pieces and those pieces that form a loop. Then, we perform a global registration procedure to align all the pieces together into a consistent canonical space, guided by the matches we have established. Finally, our proposed model-update step helps fix potential misalignments that remain after the global registration. Both geometric and appearance constraints are enforced during alignment; therefore, we are able to recover a model with accurate geometry as well as high-fidelity color maps for the mesh. Experiments on both synthetic and various real datasets have demonstrated the capability of our approach to reconstruct complete and watertight deformable objects.
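
    Non-rigid registration itself is too involved for a short sketch, but the alignment primitive underneath it can be illustrated with the rigid case: given matched 3D point sets from two partial scans, the least-squares rotation and translation follow from the Kabsch/Procrustes solution. This rigid simplification stands in for the per-piece alignment that the global non-rigid step generalizes.

```python
import numpy as np

def rigid_align(src, dst):
    """Least-squares rigid alignment (Kabsch) of matched 3D point sets.

    Returns (r, t) such that dst ~= src @ r.T + t.
    """
    sc, dc = src.mean(axis=0), dst.mean(axis=0)
    h = (src - sc).T @ (dst - dc)               # cross-covariance
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))      # guard against reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return r, dc - r @ sc
```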

    Leveraging Graph-based Cross-modal Information Fusion for Neural Sign Language Translation

    Sign Language (SL), as the mother tongue of the deaf community, is a special visual language that most hearing people cannot understand. In recent years, neural Sign Language Translation (SLT), as a possible way of bridging the communication gap between deaf and hearing people, has attracted widespread academic attention. We found that the current mainstream end-to-end neural SLT models, which try to learn language knowledge in a weakly supervised manner, cannot mine enough semantic information under low-resource data conditions. Therefore, we propose to introduce additional word-level semantic knowledge from sign language linguistics to improve current end-to-end neural SLT models. Concretely, we propose a novel neural SLT model with multi-modal feature fusion based on a dynamic graph, in which the cross-modal information, i.e. text and video, is first assembled into a dynamic graph according to its correlation, and the graph is then processed by a multi-modal graph encoder to generate multi-modal embeddings for use in the subsequent neural translation models. To the best of our knowledge, we are the first to introduce graph neural networks for fusing multi-modal information into neural sign language translation models. Moreover, we conducted experiments on the publicly available popular SLT dataset RWTH-PHOENIX-Weather-2014T, and the quantitative results show that our method can improve the model.
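
    The "assemble a dynamic graph from cross-modal correlation" step can be sketched with cosine similarity: text-token and video-frame embeddings become graph nodes, and edges connect pairs whose similarity exceeds a threshold. The threshold and edge-weighting choices here are hypothetical illustrations, not the paper's exact construction.

```python
import numpy as np

def dynamic_graph(text_emb, video_emb, thr=0.5):
    """Weighted adjacency matrix over text and video nodes.

    text_emb: (n_text, d); video_emb: (n_video, d). Cross-modal edges
    are kept where cosine similarity exceeds `thr`; nodes 0..n_text-1
    are text, the rest are video frames.
    """
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    sim = t @ v.T                                 # (n_text, n_video)
    n, m = sim.shape
    adj = np.zeros((n + m, n + m))
    adj[:n, n:] = np.where(sim > thr, sim, 0.0)   # text -> video edges
    adj[n:, :n] = adj[:n, n:].T                   # undirected graph
    return adj
```

    A multi-modal graph encoder (e.g. a GNN layer) would then propagate information over this adjacency to produce fused embeddings.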

    Novel two dimensional singular spectrum analysis for effective feature extraction and data classification in hyperspectral imaging

    Feature extraction is of high importance for effective data classification in hyperspectral imaging (HSI). Considering the high correlation among band images, spectral-domain feature extraction is widely employed. For effective spatial information extraction, a 2-D extension to singular spectrum analysis (SSA), a recent technique for generic data mining and temporal signal analysis, is proposed. With 2D-SSA applied to HSI, each band image is decomposed into a varying trend, oscillations and noise. Using the trend and selected oscillations as features, the reconstructed signal, with noise highly suppressed, becomes more robust and effective for data classification. Three publicly available datasets for HSI remote sensing data classification are used in our experiments. Comprehensive results using a support vector machine (SVM) classifier have quantitatively evaluated the efficacy of the proposed approach. Benchmarked against several state-of-the-art methods, including 2-D empirical mode decomposition (2D-EMD), our proposed 2D-SSA approach generates the best results in most cases. Unlike 2D-EMD, which requires sequential transforms to obtain a detailed decomposition, 2D-SSA extracts all components simultaneously, so the execution time for feature extraction can also be dramatically reduced. The superiority of 2D-SSA in terms of enhanced discrimination ability is further validated when a relatively weak classifier, k-nearest neighbour (k-NN), is used for data classification. In addition, the combination of 2D-SSA with 1D-PCA (2D-SSA-PCA) generates the best results among several other approaches, demonstrating the great potential of combining 2D-SSA with other approaches for effective spatial-spectral feature extraction and dimension reduction in HSI.
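
    SSA's embed/decompose/reconstruct pipeline is easiest to see in one dimension: embed the series into a Hankel trajectory matrix, take the SVD, and map each rank-1 term back to a series by anti-diagonal averaging; trend, oscillations and noise then correspond to groups of components. 2D-SSA applies the same pipeline with 2-D windows over each band image. A 1-D sketch:

```python
import numpy as np

def ssa_components(signal, window):
    """1-D SSA: one reconstructed series per singular value.

    Returns an array of shape (min(window, n - window + 1), n) whose
    rows sum back to the original signal exactly.
    """
    n = len(signal)
    k = n - window + 1
    # Hankel trajectory matrix: traj[i, j] = signal[i + j]
    traj = np.array([signal[i:i + window] for i in range(k)]).T
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    comps = []
    for i in range(len(s)):
        x = s[i] * np.outer(u[:, i], vt[i])      # rank-1 piece
        # anti-diagonal (Hankel) averaging back to a 1-D series
        comp = np.array([np.mean(x[::-1, :].diagonal(j - window + 1))
                         for j in range(n)])
        comps.append(comp)
    return np.array(comps)
```

    Keeping only the first component (or the first few) yields the denoised trend used as a feature.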