
    Weighted Nuclear Norm Minimization Based Tongue Specular Reflection Removal

    In computational tongue diagnosis, specular reflection is generally inevitable in tongue image acquisition; it adversely affects feature extraction and tends to degrade diagnosis performance. In this paper, we propose a two-stage (detection-then-inpainting) approach to address this issue: (i) considering both highlight reflection and sub-reflection areas, a superpixel-based segmentation method is adopted to detect the specular reflection areas; (ii) by extending the weighted nuclear norm minimization (WNNM) model, a nonlocal inpainting method is proposed for specular reflection removal. Experimental results on synthetic and real images show that the proposed method is accurate in detecting the specular reflection areas and effective in restoring the tongue image with more natural texture information of the tongue body.
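    At the heart of the inpainting stage is the WNNM proximal operator. As a rough illustration (not the paper's full nonlocal pipeline), a single WNNM step soft-thresholds the singular values of a patch-group matrix with weights inversely proportional to their magnitude; the weight rule `w_i = C*sqrt(n)/(sigma_i + eps)` and the values of `C` and `eps` are common choices assumed here, not taken from the abstract.

    ```python
    import numpy as np

    def wnnm_step(Y, C=2.0, eps=1e-8):
        """One weighted nuclear norm minimization (WNNM) proximal step.

        Each singular value is soft-thresholded by a weight inversely
        proportional to its magnitude, so large (signal) components are
        shrunk less than small (noise) components.
        """
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        n = min(Y.shape)
        w = C * np.sqrt(n) / (s + eps)     # small sigma -> large threshold
        s_hat = np.maximum(s - w, 0.0)     # weighted soft-thresholding
        return U @ np.diag(s_hat) @ Vt
    ```

    In a full pipeline this step would be applied to matrices of similar (nonlocal) patches gathered around each detected reflection area, then the denoised patches aggregated back into the image.
    
    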

    HRCTNet: a hybrid network with high-resolution representation for object detection in UAV image

    Object detection in unmanned aerial vehicle (UAV) images has attracted increasing attention from researchers in recent years. However, small object detection is challenging for conventional detection methods because little location and semantic information can be extracted from the feature maps of UAV images. To remedy this problem, three new feature extraction modules are proposed in this paper to refine the feature maps for small objects in UAV images: the Small-Kernel-Block (SKBlock), the Large-Kernel-Block (LKBlock), and the Conv-Trans-Block (CTBlock). Based on these three modules, a novel backbone called High-Resolution Conv-Trans Network (HRCTNet) is proposed. Additionally, the activation function Acon is deployed in our network to reduce the possibility of dying ReLU and to remove redundant features. Given the extremely imbalanced labels in UAV image datasets, the PolyLoss loss function is adopted to train HRCTNet. To verify the effectiveness of the proposed HRCTNet, experiments have been conducted on several datasets. On the VisDrone dataset, HRCTNet achieves 49.5% AP50 and 29.1% AP. On the COCO dataset, with limited FLOPs, HRCTNet achieves 37.9% AP and 24.1% APS. The experimental results demonstrate that HRCTNet outperforms existing methods for object detection in UAV images.
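    For reference, the simplest PolyLoss variant (Poly-1) adds a single term epsilon * (1 - p_t) to cross-entropy, which up-weights hard, low-confidence examples. The sketch below is a minimal single-sample version under that assumption; the abstract does not state which PolyLoss variant HRCTNet actually uses, and `epsilon` is a tunable hyperparameter.

    ```python
    import numpy as np

    def poly1_cross_entropy(logits, target, epsilon=1.0):
        """Poly-1 loss: cross-entropy plus epsilon * (1 - p_t).

        Boosting the leading polynomial term of CE's Taylor expansion
        puts extra weight on low-confidence (hard) examples; epsilon=0
        recovers plain cross-entropy.
        """
        z = logits - logits.max()            # numerical stability
        probs = np.exp(z) / np.exp(z).sum()
        pt = probs[target]                   # probability of the true class
        ce = -np.log(pt)
        return ce + epsilon * (1.0 - pt)
    ```
    
    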

    Spatial–Temporal Graph Transformer With Sign Mesh Regression for Skinned-Based Sign Language Production

    Sign language production aims to automatically generate coordinated sign language videos from spoken language. As a typical sequence-to-sequence task, it is mostly handled by existing methods that regard the skeletons as a whole sequence but do not take the rich graph information among joints and edges into consideration. In this paper, we propose a novel method named Spatial-Temporal Graph Transformer (STGT) to deal with this problem. Specifically, guided by kinesiology, we first design a novel graph representation to obtain graph features from skeletons. The spatial-temporal graph self-attention then utilizes the graph topology to capture intra-frame and inter-frame correlations, respectively. Our key innovation is that the attention maps are calculated on the spatial and temporal dimensions in turn, while graph convolution is used to strengthen the short-term features of the skeletal structure. Finally, since the generated skeletons are so far represented only as skeleton points and lines, we design a sign mesh regression module that renders them into skinned animations including body and hand postures, so that the generated sign language videos can be visualized. Compared with state-of-the-art baselines on RWTH-PHOENIX-Weather-2014T in the experiment section, STGT obtains the highest BLEU and ROUGE scores, which indicates that our method produces the most accurate and intuitive sign language videos.
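    The idea of attention that "utilizes graph topology" can be illustrated with a single-head spatial self-attention restricted by the skeleton's adjacency matrix. This is a minimal sketch, not the authors' exact STGT layer: queries and keys are collapsed into the features themselves for brevity, and the adjacency matrix `A` (with self-loops) is an assumed input.

    ```python
    import numpy as np

    def graph_self_attention(X, A):
        """Self-attention over joints, masked by skeleton adjacency.

        X : (J, d) joint features; A : (J, J) 0/1 adjacency with self-loops.
        Attention scores are kept only between connected joints, so the
        spatial attention map respects the skeleton topology.
        """
        d = X.shape[1]
        scores = X @ X.T / np.sqrt(d)             # query = key = X for brevity
        scores = np.where(A > 0, scores, -1e9)    # mask non-adjacent joint pairs
        scores -= scores.max(axis=1, keepdims=True)
        w = np.exp(scores)
        w /= w.sum(axis=1, keepdims=True)         # row-wise softmax
        return w @ X                              # aggregate neighbor features
    ```

    In a full spatial-temporal layer, a second attention pass of the same shape would then run along the time axis for each joint.
    
    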

    A Pyramid Semi-Autoregressive Transformer with Rich Semantics for Sign Language Production

    As a typical sequence-to-sequence task, sign language production (SLP) aims to automatically translate spoken language sentences into the corresponding sign language sequences. Existing SLP methods can be classified into two categories: autoregressive and non-autoregressive. Autoregressive methods suffer from high latency and from error accumulation caused by the long-term dependence between the current output and the previous poses, while non-autoregressive methods suffer from repetition and omission during the parallel decoding process. To remedy these issues, we propose a novel method named Pyramid Semi-Autoregressive Transformer with Rich Semantics (PSAT-RS) in this paper. In PSAT-RS, we first introduce a pyramid semi-autoregressive mechanism that divides the target sequence into groups in a coarse-to-fine manner, which globally keeps the autoregressive property while locally generating target frames in parallel. Meanwhile, a relaxed masked attention mechanism is adopted so that the decoder not only captures the pose sequences in the previous groups but also attends to the current group. Finally, considering the importance of spatial-temporal information, we design a Rich Semantics embedding (RS) module to encode the sequential information on both the time dimension and spatial displacement into the same high-dimensional space. This significantly improves the coordination of joint motion, making the generated sign language videos more natural. Results of our experiments conducted on the RWTH-PHOENIX-Weather-2014T and CSL datasets show that the proposed PSAT-RS is competitive with state-of-the-art autoregressive and non-autoregressive SLP models, achieving a better trade-off between speed and accuracy.
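    The semi-autoregressive decoding pattern can be made concrete as an attention mask: frames are split into consecutive groups, each frame sees all frames in previous groups (autoregressive between groups) and every frame inside its own group (parallel within a group). The sketch below builds such a mask under that reading of the "relaxed masked attention" described above; a fixed uniform group size is an assumption, since the paper's pyramid scheme varies group sizes coarse-to-fine.

    ```python
    import numpy as np

    def relaxed_group_mask(T, group_size):
        """(T, T) attention mask for semi-autoregressive decoding.

        Entry [i, j] = 1 if frame i may attend to frame j, i.e. if j's
        group index is <= i's group index; 0 otherwise. group_size=1
        degenerates to the standard causal (lower-triangular) mask,
        group_size=T to fully parallel non-autoregressive decoding.
        """
        groups = np.arange(T) // group_size        # group index of each frame
        return (groups[None, :] <= groups[:, None]).astype(float)
    ```
    
    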

    Down image recognition based on deep convolutional neural network

    Because of the scale and the various shapes of down in the image, it is difficult for traditional image recognition methods, and even for a Traditional Convolutional Neural Network (TCNN), to correctly recognize the type of a down image with the required recognition accuracy. To deal with these problems, a Deep Convolutional Neural Network (DCNN) for down image classification is constructed, and a new weight initialization method is proposed. Firstly, the salient regions of a down image are cut from the image using a visual saliency model. Then, these salient regions are used to train a sparse autoencoder and obtain a collection of convolutional filters that accord with the statistical characteristics of the dataset. Finally, a DCNN with the Inception module and its variants is constructed, and the depth of the network is increased to improve the recognition accuracy. The experimental results indicate that, when recognizing down in images, the constructed DCNN increases the recognition accuracy by 2.7% compared to the TCNN, and the convergence rate of the proposed DCNN with the new weight initialization method is improved by 25.5% compared to the TCNN. Keywords: Deep convolutional neural network, Weight initialization, Sparse autoencoder, Visual saliency model, Image recognition
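    The weight-initialization idea, training a sparse autoencoder on salient patches and reusing its weights as convolution filters, can be sketched as below. This is a minimal illustration under assumed details (tied weights, tanh units, an L1 sparsity penalty with coefficient `beta`, plain gradient descent), not the paper's exact training recipe.

    ```python
    import numpy as np

    def init_conv_filters(patches, n_filters, lr=0.02, steps=200, beta=1e-3, seed=0):
        """Initialize k x k conv filters from a tiny tied-weight sparse autoencoder.

        patches : (N, k*k) flattened grayscale patches from salient regions.
        The encoder reconstructs the patches under an L1 penalty on the
        hidden code; its weight rows are then reshaped into filters that
        reflect the statistics of the dataset.
        """
        rng = np.random.default_rng(seed)
        N, d = patches.shape
        W = rng.normal(scale=0.1, size=(n_filters, d))   # tied encoder/decoder weights
        for _ in range(steps):
            H = np.tanh(patches @ W.T)                   # hidden code (N, n_filters)
            R = H @ W                                    # reconstruction (N, d)
            err = R - patches
            dH = err @ W.T + beta * np.sign(H)           # grad through decoder + sparsity
            dW = H.T @ err + (dH * (1 - H**2)).T @ patches
            W -= lr * dW / N                             # gradient-descent update
        k = int(np.sqrt(d))
        return W.reshape(n_filters, k, k)
    ```

    The returned filter bank would then seed the first convolutional layer before end-to-end training of the DCNN.
    
    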

    Using compaction simulation experiment to recover burial history: Taking the fourth Member of Shahejie Formation in Leijia area, Western Depression of Liaohe River as an example

    The Leijia area in the Western Depression of the Liaohe River has good oil and gas exploration and development potential, but the burial history of the area is not yet clearly understood. This paper uses the recovery principle of subsidence history to restore the original thickness of the remaining strata in the study area. Firstly, the sonic curve method and the adjacent-layer comparison method are used to restore the denuded thickness of the strata. Samples of different lithologies (sandstone, argillaceous limestone, limestone mudstone, and mudstone) were then subjected to compaction simulation experiments, from which the porosity-depth curve of each sample was obtained. The existing burial data are used to calculate the thickness of the stratum framework and, according to the constant-volume formula of the framework, each single lithological layer is taken as the smallest unit for layer-by-layer backstripping calculations, yielding the sedimentary thickness and burial depth of each group and member in different geological periods. The burial history analysis shows that, during the burial of the Cenozoic strata in the Leijia area, the sedimentation rate of the Paleogene Shahejie Formation was higher and its sedimentary strata were thicker, although the second Member of the Shahejie Formation was uplifted and eroded and its strata are thin; during the deposition of the Dongying Formation, sedimentation slowed and the strata were uplifted and eroded at the end; in the Neogene and Quaternary, the deposition rate was low and the deposited strata were not thick.
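    The constant-framework-volume backstripping step can be sketched as follows. The sketch assumes a textbook exponential porosity-depth law phi(z) = phi0 * exp(-c*z) (the paper instead derives its curves from the compaction experiments): the framework (grain) thickness of an interval [z_top, z_bot] is h_s = (z_bot - z_top) - (phi0/c) * (exp(-c*z_top) - exp(-c*z_bot)), and this quantity is held constant while the layer's thickness at a shallower burial depth is solved iteratively.

    ```python
    import numpy as np

    def decompact(z_top, z_bot, new_top, phi0, c, tol=1e-6):
        """Restore a layer's thickness when moved to a shallower burial depth.

        Holds the framework (grain) thickness constant and solves for the
        new base depth by fixed-point iteration; the difference between
        total and framework thickness is the depth-integrated porosity.
        """
        def framework(zt, zb):
            return (zb - zt) - phi0 / c * (np.exp(-c * zt) - np.exp(-c * zb))

        hs = framework(z_top, z_bot)             # conserved grain thickness
        zb = new_top + (z_bot - z_top)           # initial guess for new base
        for _ in range(100):
            zb_next = new_top + hs + phi0 / c * (np.exp(-c * new_top) - np.exp(-c * zb))
            if abs(zb_next - zb) < tol:
                break
            zb = zb_next
        return zb - new_top                      # decompacted thickness
    ```

    Applying this to each single lithological layer from the top down, with that layer's own porosity-depth curve, reproduces the layer-by-layer backstripping described above.
    
    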