
    Bit Rate Estimation for Cost Function of H.264/AVC


    Large Separable Kernel Attention: Rethinking the Large Kernel Attention Design in CNN

    Visual Attention Networks (VAN) with Large Kernel Attention (LKA) modules have been shown to provide remarkable performance that surpasses Vision Transformers (ViTs) on a range of vision-based tasks. However, the depth-wise convolutional layer in these LKA modules incurs a quadratic increase in computational and memory footprint as the convolutional kernel size grows. To mitigate these problems and to enable the use of extremely large convolutional kernels in the attention modules of VAN, we propose a family of Large Separable Kernel Attention modules, termed LSKA. LSKA decomposes the 2-D convolutional kernel of the depth-wise convolutional layer into cascaded horizontal and vertical 1-D kernels. In contrast to the standard LKA design, the proposed decomposition enables the direct use of depth-wise convolutional layers with large kernels in the attention module, without requiring any extra blocks. We demonstrate that the proposed LSKA module in VAN achieves performance comparable with the standard LKA module while incurring lower computational complexity and memory footprint. We also find that the proposed LSKA design biases the VAN more toward the shape of the object than toward texture as the kernel size increases. Additionally, we benchmark the robustness of LKA and LSKA in VAN, ViTs, and the recent ConvNeXt on five corrupted versions of the ImageNet dataset that are largely unexplored in previous works. Our extensive experimental results show that the proposed LSKA module in VAN provides a significant reduction in computational complexity and memory footprint with increasing kernel size, outperforms ViTs and ConvNeXt, and delivers performance similar to the LKA module in VAN on object recognition, object detection, semantic segmentation, and robustness tests.
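
    The core idea is the kernel decomposition itself: a k x k depth-wise convolution is replaced by a cascade of 1 x k and k x 1 depth-wise convolutions, cutting the per-channel weight count from k^2 to 2k. The minimal PyTorch sketch below illustrates only that decomposition; it is not the authors' LSKA implementation, and the module name, default kernel size, and the point-wise projection are illustrative assumptions.

        import torch
        import torch.nn as nn

        class SeparableDepthwiseAttention(nn.Module):
            """Sketch of the separable decomposition: a k x k depth-wise kernel
            replaced by cascaded 1 x k and k x 1 depth-wise kernels."""

            def __init__(self, channels: int, kernel_size: int = 35):
                super().__init__()
                pad = kernel_size // 2
                # Horizontal 1-D depth-wise kernel (1 x k)
                self.conv_h = nn.Conv2d(channels, channels, (1, kernel_size),
                                        padding=(0, pad), groups=channels)
                # Vertical 1-D depth-wise kernel (k x 1)
                self.conv_v = nn.Conv2d(channels, channels, (kernel_size, 1),
                                        padding=(pad, 0), groups=channels)
                # Point-wise projection producing the attention map (assumed detail)
                self.proj = nn.Conv2d(channels, channels, 1)

            def forward(self, x: torch.Tensor) -> torch.Tensor:
                attn = self.conv_v(self.conv_h(x))  # cascade approximates a k x k kernel
                return x * self.proj(attn)          # attention applied as element-wise gating

        # Per-channel depth-wise weights: k*k = 1225 for k = 35 vs. 2*k = 70 after decomposition.
        y = SeparableDepthwiseAttention(64)(torch.randn(1, 64, 56, 56))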

    Single-nucleotide polymorphisms and haplotype of CYP2E1 gene associated with systemic lupus erythematosus in Chinese population

    Introduction: Cytochrome P-450 2E1 (CYP2E1) is an important member of the CYP superfamily, which is involved in the metabolism and activation of many low-molecular-weight toxic compounds. We investigated the possible association of CYP2E1 tag single-nucleotide polymorphisms (SNPs) with susceptibility to systemic lupus erythematosus (SLE) in a Chinese Han population. Methods: The coding and flanking regions of the CYP2E1 gene were scanned for polymorphisms and tag SNPs were selected. A two-stage case-control study was performed to genotype a total of 876 SLE patients and 680 geographically matched healthy controls (265 cases and 288 controls in stage I; 611 cases and 392 controls in stage II). SLE associations of alleles, genotypes and haplotypes were tested by age- and sex-adjusted logistic regression. Gene transcription quantitation was carried out on peripheral blood mononuclear cell (PBMC) samples from 120 healthy controls. Results: Tag SNP rs2480256 was significantly associated with SLE in both stages of the study. The "A" allele was associated with slightly higher risk (odds ratio (OR) = 1.165, 95% confidence interval (CI) 1.073 to 1.265, P = 2.75E-4) and "A/A" genotype carriers had an even higher SLE risk (OR = 1.464, 95% CI 1.259 to 1.702, P = 7.48E-7). When combined with another tag SNP, rs8192772, we identified the haplotype "rs8192772-rs2480256/TA" as overrepresented in SLE patients (OR 1.407, 95% CI 1.182 to 1.675, P = 0.0001) and the haplotype "TG" as overrepresented in the controls (OR 0.771, 95% CI 0.667 to 0.890, P = 0.0004). The gene transcription quantitation analysis further supported the dominant effect of rs2480256, as the "A/A" genotype showed the highest transcription. Conclusions: Our results suggest the involvement of CYP2E1 as a susceptibility gene for SLE in the Chinese population.
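
    For readers unfamiliar with how effect sizes of this kind are obtained: an allelic odds ratio and its Wald 95% confidence interval can be computed from a 2x2 allele count table, as in the minimal Python sketch below. The counts are hypothetical, and the study itself reports ORs from age- and sex-adjusted logistic regression rather than this crude unadjusted calculation.

        import math

        def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
            """Unadjusted odds ratio and Wald 95% CI from a 2x2 allele table:
            a/b = risk/other allele counts in cases, c/d = the same in controls."""
            or_ = (a * d) / (b * c)
            se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
            lower = math.exp(math.log(or_) - z * se_log_or)
            upper = math.exp(math.log(or_) + z * se_log_or)
            return or_, (lower, upper)

        # Hypothetical counts (not the study's data), purely to show the calculation:
        print(odds_ratio_ci(980, 772, 680, 624))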

    Bidirectionally Deformable Motion Modulation For Video-based Human Pose Transfer

    Video-based human pose transfer is a video-to-video generation task that animates a plain source human image based on a series of target human poses. Given the difficulty of transferring highly structured patterns on garments and handling discontinuous poses, existing methods often generate unsatisfactory results such as distorted textures and flickering artifacts. To address these issues, we propose a novel Deformable Motion Modulation (DMM) that utilizes geometric kernel offsets with adaptive weight modulation to simultaneously perform feature alignment and style transfer. Unlike the standard style modulation used in style transfer, the proposed modulation mechanism adaptively reconstructs smoothed frames from style codes according to the object shape through an irregular receptive field of view. To enhance spatio-temporal consistency, we leverage bidirectional propagation to extract hidden motion information from a warped image sequence generated by noisy poses. The proposed feature propagation significantly enhances motion prediction ability through forward and backward propagation. Both quantitative and qualitative experimental results demonstrate superiority over state-of-the-art methods in terms of image fidelity and visual continuity. The source code is publicly available at github.com/rocketappslab/bdmm. Comment: ICCV 202
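
    The phrase "geometric kernel offset with adaptive weight modulation" describes an operation in the spirit of a modulated deformable convolution. The sketch below is an illustrative approximation built on torchvision's deform_conv2d, not the released BDMM code; the class name DeformableModulation and the offset/mask predictor layers are assumptions made for illustration.

        import torch
        import torch.nn as nn
        from torchvision.ops import deform_conv2d

        class DeformableModulation(nn.Module):
            """Illustrative modulated deformable 3x3 convolution: sampling offsets and
            per-sample modulation weights are predicted from the input feature."""

            def __init__(self, channels: int, k: int = 3):
                super().__init__()
                self.k = k
                self.weight = nn.Parameter(torch.randn(channels, channels, k, k) * 0.01)
                self.offset_pred = nn.Conv2d(channels, 2 * k * k, 3, padding=1)  # (x, y) offsets
                self.mask_pred = nn.Conv2d(channels, k * k, 3, padding=1)        # modulation scalars

            def forward(self, x: torch.Tensor) -> torch.Tensor:
                offset = self.offset_pred(x)             # irregular sampling grid per position
                mask = torch.sigmoid(self.mask_pred(x))  # adaptive weight modulation
                return deform_conv2d(x, offset, self.weight,
                                     padding=self.k // 2, mask=mask)

        out = DeformableModulation(64)(torch.randn(1, 64, 32, 32))  # same spatial size as input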

    Six-Digit Stroke-based Chinese Input Method

    During the last three decades, more than one thousand Chinese input methods have been developed. However, people are still looking for better input methods in terms of ease of use, ease of memorization, high input speed and small-keypad implementation on handheld devices. The well-known stroke-based Chinese input method using only five basic stroke types achieves a low learning curve and a small numeric keypad implementation, but its input speed is limited for complex Chinese characters with many strokes. To tackle this problem, simplified stroke-based Chinese character and phrase coding methods using (3+3) rules are proposed in this paper. The proposed method uses only the first 3 stroke codes and the last 3 stroke codes, representing the first and last radical information of the character, to achieve a lower average code length and a higher hit rate of the first character on the candidate list. To further enhance input speed, a very user-friendly (3+3) phrase coding rule is also proposed for inputting Chinese phrases, covering 2-character, 3-character and longer phrases. Three special key-assignment designs are developed for practical implementation of the proposed Chinese character and phrase input method on a conventional QWERTY keyboard, a PC numeric keypad and a mobile phone 12-key keypad. Experimental results show that the proposed character coding achieves a lower average code length and a higher hit rate of the first character compared with the conventional stroke-based method and some well-known Chinese input methods. The proposed coding rules are also very easy to use and remember.
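
    The (3+3) character rule can be illustrated with a short, hypothetical Python sketch: each of the five basic stroke types is mapped to one digit, and only the first three and last three stroke codes of a character are kept. The real method encodes the first and last radical information rather than simply truncating the raw stroke sequence, and the stroke-to-digit mapping and sample sequence shown here are assumptions.

        # Five basic stroke types, each assumed mapped to one digit:
        # 1 = horizontal, 2 = vertical, 3 = left-falling, 4 = dot/right-falling, 5 = turning
        def six_digit_code(strokes: list[int]) -> str:
            """(3+3) rule sketch: keep the first three and last three stroke codes;
            characters with six or fewer strokes use their full stroke sequence."""
            if len(strokes) <= 6:
                return "".join(map(str, strokes))
            return "".join(map(str, strokes[:3] + strokes[-3:]))

        # Hypothetical stroke sequence of a many-stroke character:
        print(six_digit_code([1, 2, 5, 1, 1, 3, 4, 2, 5, 1, 1, 4]))  # -> '125114'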

    Multiple Block-Size Search Algorithm for Fast Block Motion Estimation

    Although variable block-size motion estimation provides significant improvements in video quality and coding efficiency, it requires much higher computational complexity than fixed block-size motion estimation, because current motion estimation algorithms are mainly designed for a fixed block size. Current variable block-size motion estimation implementations simply apply these existing algorithms independently for each block size to find the best block size and the corresponding motion vector. Substantial computation is wasted because the reuse of distortion data among motion searches of different block sizes is not considered. In this paper, a motion estimation algorithm intrinsically designed for variable block-size video coding is presented. The proposed multiple block-size search (MBSS) algorithm unifies the motion searches for different block sizes into a single search process instead of performing the search independently for each block size. In this unified search, the suboptimal motion vectors of the different block sizes are used to determine the next search steps. Its prediction quality is comparable with that obtained by performing the motion search for each block size independently, while the computational load is substantially reduced. Experimental results show that the prediction quality of MBSS is similar to that of full search. Keywords: block matching, motion estimation, video coding, search pattern, directional search.
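
    The distortion-data reuse that independent per-block-size searches waste can be sketched as follows: at a candidate motion vector, the SADs of the sixteen 4x4 sub-blocks of a macroblock are computed once and then summed to obtain the SADs of all larger partitions. This is a generic NumPy illustration of the reuse principle, not the MBSS algorithm itself; function and variable names are hypothetical.

        import numpy as np

        def sad_4x4_grid(cur: np.ndarray, ref: np.ndarray, mv: tuple[int, int]) -> np.ndarray:
            """SAD of each of the sixteen 4x4 sub-blocks of a 16x16 macroblock
            for one candidate motion vector mv = (dy, dx)."""
            dy, dx = mv
            diff = np.abs(cur.astype(int) - ref[dy:dy + 16, dx:dx + 16].astype(int))
            return diff.reshape(4, 4, 4, 4).sum(axis=(1, 3))  # 4x4 grid of sub-block SADs

        cur = np.random.randint(0, 256, (16, 16), dtype=np.uint8)
        ref = np.random.randint(0, 256, (48, 48), dtype=np.uint8)
        s = sad_4x4_grid(cur, ref, (8, 8))
        # Larger partitions reuse the same sub-block SADs instead of recomputing them:
        sad_8x8 = s.reshape(2, 2, 2, 2).sum(axis=(1, 3))  # four 8x8 partition SADs
        sad_16x16 = s.sum()                               # the 16x16 macroblock SAD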