    Rethinking Query, Key, and Value Embedding in Vision Transformer under Tiny Model Constraints

    A vision transformer (ViT) is a dominant model in the computer vision field. Despite numerous studies that mainly focus on dealing with inductive bias and complexity, the problem of finding better transformer networks remains. For example, conventional transformer-based models usually use a projection layer for each query (Q), key (K), and value (V) embedding before multi-head self-attention. Insufficient consideration of the semantics of the Q, K, and V embeddings may lead to a performance drop. In this paper, we propose three types of structures for Q, K, and V embedding. The first structure uses two layers with ReLU, a non-linear embedding for Q, K, and V. The second shares one of the non-linear layers to share knowledge among Q, K, and V. The third shares all non-linear layers, with trainable code parameters whose values determine which embedding process is performed among Q, K, and V. We demonstrate the superior image classification performance of the proposed approaches in experiments against several state-of-the-art approaches. The proposed method achieved 71.4% with few parameters (3.1M) on the ImageNet-1k dataset, compared to 69.9% for the original XCiT-N12 transformer model. Additionally, the method achieved 93.3% with only 2.9M parameters in transfer learning, averaged over the CIFAR-10, CIFAR-100, Stanford Cars, and STL-10 datasets, which is better than the 92.2% accuracy of the original XCiT-N12 model.
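    The three embedding variants above can be illustrated with a minimal numpy sketch. All dimensions, weight names, and the exact mechanism by which the "code parameters" modulate the shared layers are assumptions for illustration; the paper's actual architecture may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16          # token dimension (toy size, not the paper's)
n = 4           # number of tokens
x = rng.normal(size=(n, d))

def relu(a):
    return np.maximum(a, 0.0)

# Variant 1: independent two-layer non-linear embeddings for Q, K, V.
W1 = {k: rng.normal(size=(d, d)) for k in "QKV"}
W2 = {k: rng.normal(size=(d, d)) for k in "QKV"}
emb_independent = {k: relu(x @ W1[k]) @ W2[k] for k in "QKV"}

# Variant 2: the first non-linear layer is shared across Q, K, V.
W_shared = rng.normal(size=(d, d))
h = relu(x @ W_shared)
emb_shared = {k: h @ W2[k] for k in "QKV"}

# Variant 3: all layers shared, with a trainable per-role "code" vector
# modulating the hidden representation (one plausible reading of the
# paper's "code parameters").
codes = {k: rng.normal(size=(d,)) for k in "QKV"}
W_out = rng.normal(size=(d, d))
emb_coded = {k: relu((x @ W_shared) * codes[k]) @ W_out for k in "QKV"}

# Standard scaled dot-product attention over the embedded Q, K, V.
def attention(q, k, v):
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

out = attention(emb_shared["Q"], emb_shared["K"], emb_shared["V"])
print(out.shape)  # (4, 16)
```

    Sharing layers (variants 2 and 3) is what drives the parameter savings the abstract reports, since one hidden projection serves all three roles.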

    Rewarded meta-pruning: Meta Learning with Rewards for Channel Pruning

    Convolutional neural networks (CNNs) have a large number of parameters and require significant hardware resources to compute, so edge devices struggle to run high-level networks. This paper proposes a novel method to reduce the parameters and FLOPs of deep learning models for computational efficiency. We introduce accuracy and efficiency coefficients to control the trade-off between the accuracy of the network and its computing efficiency. The proposed rewarded meta-pruning algorithm trains a network to generate weights for a pruned model, chosen based on the approximate parameters of the final model, by controlling the interactions using a reward function. The reward function allows more control over the metrics of the final pruned model. Extensive experiments demonstrate the superior performance of the proposed method over state-of-the-art methods in pruning the ResNet-50, MobileNetV1, and MobileNetV2 networks.
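    A reward of this kind can be sketched as a weighted sum of accuracy and compute saved. The function name, the coefficient names `alpha` and `beta`, and the linear form are hypothetical; the paper's exact reward may differ.

```python
# Hedged sketch of a pruning reward balancing accuracy against compute,
# in the spirit of the paper's accuracy and efficiency coefficients.
def pruning_reward(accuracy, flops, baseline_flops, alpha=1.0, beta=0.5):
    """Higher accuracy raises the reward; FLOPs relative to the
    unpruned baseline lower it."""
    efficiency = 1.0 - flops / baseline_flops   # fraction of compute saved
    return alpha * accuracy + beta * efficiency

# A pruned model at 72% accuracy using 2 GFLOPs vs. a 4-GFLOP baseline:
r = pruning_reward(accuracy=0.72, flops=2e9, baseline_flops=4e9)
print(round(r, 3))  # 0.97
```

    Raising `beta` relative to `alpha` pushes the search toward smaller pruned models, which is the control over final-model metrics the abstract describes.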

    Semantic Map Guided Synthesis of Wireless Capsule Endoscopy Images using Diffusion Models

    Wireless capsule endoscopy (WCE) is a non-invasive method for visualizing the gastrointestinal (GI) tract, crucial for diagnosing GI tract diseases. However, interpreting WCE results can be time-consuming and tiring. Existing studies have employed deep neural networks (DNNs) for automatic GI tract lesion detection, but acquiring sufficient training examples, particularly due to privacy concerns, remains a challenge. Public WCE databases lack diversity and quantity. To address this, we propose a novel approach leveraging generative models, specifically the diffusion model (DM), for generating diverse WCE images. Our model incorporates a semantic map produced by the visualization scale (VS) engine, enhancing the controllability and diversity of the generated images. We evaluate our approach using visual inspection and visual Turing tests, demonstrating its effectiveness in generating realistic and diverse WCE images.
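    One common way to condition a diffusion model on a semantic map is to stack the map with the noisy image as extra input channels to the denoiser. The toy shapes, noise schedule value, and channel-concatenation mechanism below are assumptions for illustration, not the paper's stated design.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy shapes: a 1-channel 8x8 "WCE frame" and a 1-channel semantic map.
image = rng.uniform(size=(1, 8, 8))
semantic_map = (rng.uniform(size=(1, 8, 8)) > 0.5).astype(float)

# Forward diffusion step: q(x_t | x_0) adds Gaussian noise scaled by the
# cumulative schedule term alpha_bar_t.
def noise(x0, alpha_bar_t, rng):
    eps = rng.normal(size=x0.shape)
    x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1 - alpha_bar_t) * eps
    return x_t, eps

x_t, eps = noise(image, alpha_bar_t=0.6, rng=rng)

# Semantic conditioning by channel concatenation: the denoiser sees the
# noisy image stacked with the map, so generation can follow the layout.
denoiser_input = np.concatenate([x_t, semantic_map], axis=0)
print(denoiser_input.shape)  # (2, 8, 8)
```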

    Co-occurrence matrix analysis-based semi-supervised training for object detection

    One of the most important factors in training object recognition networks using convolutional neural networks (CNNs) is the provision of annotated data backed by human judgment. In object detection or semantic segmentation in particular, the annotation process requires considerable human effort. In this paper, we propose a semi-supervised learning (SSL)-based training methodology for object detection, which automatically labels un-annotated data by applying a network previously trained on an annotated dataset. Because a label inferred by the trained network depends on the learned parameters, it is often meaningless for re-training the network. To transfer a valuable inferred label to the unlabeled data, we propose a re-alignment method based on co-occurrence matrix analysis that takes into account the one-hot-vector encoding of the estimated label and the correlation between the objects in the image. We used the MS-COCO detection dataset to verify the performance of the proposed SSL method, with deformable neural networks (D-ConvNets) as the object detector for basic training. The performance of existing state-of-the-art detectors (D-ConvNets, YOLO v2, and the single shot multi-box detector (SSD)) can be improved by the proposed SSL method without using additional model parameters or modifying the network architecture. Comment: Submitted to the International Conference on Image Processing (ICIP) 201
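    The co-occurrence re-alignment idea can be sketched as follows: statistics from the annotated set say which classes tend to appear together, and those statistics re-weight the detector's soft predictions on an unlabeled image. The class list, counts, and the multiplicative re-weighting rule are all hypothetical; this is a simplified reading of the paper's analysis, not its exact procedure.

```python
import numpy as np

# Class co-occurrence counts from the annotated set (hypothetical numbers):
# entry [i, j] = how often classes i and j appear in the same image.
classes = ["person", "bicycle", "car"]
C = np.array([[50, 20,  5],
              [20, 25,  2],
              [ 5,  2, 40]], dtype=float)
P = C / C.sum(axis=1, keepdims=True)   # row-normalised co-occurrence

# Detector scores for one unlabeled image (hypothetical soft predictions):
scores = np.array([0.9, 0.1, 0.4])

# Re-align each class score toward classes that usually co-occur with the
# confident detections, then renormalise to [0, 1].
realigned = scores * (P.T @ scores)
realigned /= realigned.max()
print(classes[int(np.argmax(realigned))])  # person
```

    A label whose score survives this re-weighting is supported both by the detector and by the dataset's object statistics, which is the kind of pseudo-label the abstract argues is worth transferring.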

    The expression and cellular localization of phospholipase D isozymes in the developing mouse testis

    To examine the involvement of phospholipase D (PLD) isozymes in postnatal testis development, the expression of PLD1 and PLD2 was examined in the mouse testis at postnatal weeks 1, 2, 4, and 8 using Western blot analysis and immunohistochemistry. The expression of both PLD1 and PLD2 increased gradually with development from postnatal week 1 to 8. Immunohistochemically, PLD immunoreactivity was detected in some germ cells in the testis and in interstitial Leydig cells at postnatal week 1. PLD was mainly detected in the spermatocytes and residual bodies of spermatids in the testis from 8 weeks after birth. The intense immunostaining of PLD in Leydig cells remained unchanged through postnatal week 8. These findings suggest that PLD isozymes are involved in spermatogenesis in the mouse testis.