32 research outputs found
Rethinking Query, Key, and Value Embedding in Vision Transformer under Tiny Model Constraints
A vision transformer (ViT) is the dominant model in the computer vision
field. Despite numerous studies that mainly focus on dealing with inductive
bias and complexity, there remains the problem of finding better transformer
networks. For example, conventional transformer-based models usually use a
projection layer for each query (Q), key (K), and value (V) embedding before
multi-head self-attention. Insufficient consideration of the semantics of the
Q, K, and V embeddings may lead to a performance drop. In this paper, we propose
three types of structures for Q, K, and V embedding. The first structure
utilizes two layers with ReLU, a non-linear embedding, for each of Q, K, and
V. The second shares one of the non-linear layers to share
knowledge among Q, K, and V. The third proposed structure shares all
non-linear layers, with code parameters. The codes are trainable, and their
values determine which embedding process is performed among Q, K, and V. Hence,
we demonstrate the superior image classification performance of the proposed
approaches in experiments compared to several state-of-the-art approaches. The
proposed method achieved higher accuracy with fewer parameters on the
ImageNet-1k dataset than the original XCiT-N12 transformer model. Additionally,
in transfer learning, the method achieved a better average accuracy with fewer
parameters than the original XCiT-N12 model across the CIFAR-10, CIFAR-100,
Stanford Cars, and STL-10 datasets.
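The three embedding structures can be sketched with toy NumPy layers. The
dimensions, initializations, and in particular the way the trainable codes
modulate the shared layers are assumptions for illustration; the paper's exact
formulation may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # embedding dimension (assumed for the sketch)

def relu(x):
    return np.maximum(x, 0.0)

def two_layer(x, w1, w2):
    # Non-linear embedding: Linear -> ReLU -> Linear
    return relu(x @ w1) @ w2

x = rng.normal(size=(4, d))  # 4 tokens of dimension d

# Type 1: independent two-layer MLPs for Q, K, and V.
w1 = {k: rng.normal(size=(d, d)) for k in "qkv"}
w2 = {k: rng.normal(size=(d, d)) for k in "qkv"}
q1, k1, v1 = (two_layer(x, w1[k], w2[k]) for k in "qkv")

# Type 2: the first non-linear layer is shared across Q, K, V;
# the second layers remain separate.
w_shared = rng.normal(size=(d, d))
q2, k2, v2 = (relu(x @ w_shared) @ w2[k] for k in "qkv")

# Type 3: all layers are shared; a trainable "code" vector per role
# modulates the input so the shared layers produce role-specific
# embeddings (one plausible reading of the abstract, not the paper's
# confirmed mechanism).
codes = {k: rng.normal(size=(d,)) for k in "qkv"}
q3, k3, v3 = (two_layer(x * codes[k], w_shared, w2["q"]) for k in "qkv")
```

Each variant yields (4, d) outputs for Q, K, and V; the variants differ only in
how many parameters the three roles share.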
Rewarded meta-pruning: Meta Learning with Rewards for Channel Pruning
Convolutional neural networks (CNNs) have large numbers of parameters and
require significant hardware resources to compute, so edge devices
struggle to run large networks. This paper proposes a novel method to
reduce the parameters and FLOPs for computational efficiency in deep learning
models. We introduce accuracy and efficiency coefficients to control the
trade-off between the accuracy of the network and its computing efficiency. The
proposed rewarded meta-pruning algorithm trains a network that generates weights
for the pruned model, which is selected according to the approximate parameter
count of the final model, with the interactions controlled by a reward function.
The reward function gives finer control over the metrics of the final pruned model.
Extensive experiments demonstrate superior performances of the proposed method
over the state-of-the-art methods in pruning ResNet-50, MobileNetV1, and
MobileNetV2 networks.
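One plausible reading of the accuracy/efficiency trade-off is a reward that
scores a pruned candidate by its accuracy plus a bounded efficiency term. The
functional form and the coefficient names `alpha` and `beta` below are
assumptions for illustration, not the paper's definition.

```python
def pruning_reward(accuracy, flops, params, flops_budget, params_budget,
                   alpha=1.0, beta=0.5):
    """Toy reward trading off accuracy against compute footprint.

    alpha weights accuracy, beta weights efficiency (both assumed names).
    Efficiency is the mean of the FLOPs and parameter budget ratios,
    capped at 1 so that shrinking far below budget is not over-rewarded.
    """
    efficiency = (flops_budget / max(flops, 1e-9) +
                  params_budget / max(params, 1e-9)) / 2.0
    return alpha * accuracy + beta * min(efficiency, 1.0)

# A candidate that meets the budget outscores a slightly more accurate
# candidate that blows the budget.
r_within = pruning_reward(0.74, flops=1.8e9, params=1.2e7,
                          flops_budget=2.0e9, params_budget=1.5e7)
r_over = pruning_reward(0.76, flops=4.0e9, params=2.5e7,
                        flops_budget=2.0e9, params_budget=1.5e7)
```

Tuning `alpha` and `beta` shifts the search toward accuracy-preserving or
footprint-minimizing pruned models, which mirrors the abstract's "accuracy and
efficiency coefficients".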
Semantic Map Guided Synthesis of Wireless Capsule Endoscopy Images using Diffusion Models
Wireless capsule endoscopy (WCE) is a non-invasive method for visualizing the
gastrointestinal (GI) tract, crucial for diagnosing GI tract diseases. However,
interpreting WCE results can be time-consuming and tiring. Existing studies
have employed deep neural networks (DNNs) for automatic GI tract lesion
detection, but acquiring sufficient training examples, particularly due to
privacy concerns, remains a challenge. Public WCE databases lack diversity and
quantity. To address this, we propose a novel approach leveraging generative
models, specifically the diffusion model (DM), for generating diverse WCE
images. Our model incorporates semantic maps produced by a visualization scale
(VS) engine, enhancing the controllability and diversity of the generated images.
We evaluate our approach using visual inspection and visual Turing tests,
demonstrating its effectiveness in generating realistic and diverse WCE images.
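As a rough illustration of semantic-map-guided diffusion, the sketch below
applies the standard forward-noising step and feeds the denoiser the noisy
image stacked with the semantic map. Channel concatenation is a common
conditioning scheme, but the paper's VS-engine-based mechanism is not specified
here, so this is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
H = W = 16
image = rng.normal(size=(3, H, W))  # toy stand-in for a WCE image
# Toy semantic map with a few discrete region labels (assumed format).
semantic_map = rng.integers(0, 4, size=(1, H, W)).astype(float)

def q_sample(x0, t, betas):
    # Forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps
    alphas_bar = np.cumprod(1.0 - betas)
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

betas = np.linspace(1e-4, 0.02, 100)  # linear noise schedule
x_t = q_sample(image, t=50, betas=betas)

# The denoising network would receive the noisy image concatenated with
# the semantic map along the channel axis.
denoiser_input = np.concatenate([x_t, semantic_map], axis=0)
print(denoiser_input.shape)  # (4, 16, 16)
```

Conditioning this way lets the semantic map steer which regions the generated
image depicts, which is what makes the synthesis controllable.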
Co-occurrence matrix analysis-based semi-supervised training for object detection
One of the most important factors in training object recognition networks
using convolutional neural networks (CNNs) is the provision of annotated data
based on human judgment. In particular, in object detection or semantic
segmentation, the annotation process requires considerable human effort. In
this paper, we propose a semi-supervised learning (SSL)-based training
methodology for object detection, which makes use of automatic labeling of
unannotated data using a network previously trained on an annotated
dataset. Because a label inferred by the trained network depends on the
learned parameters, it is often uninformative for re-training the network. To
transfer a valuable inferred label to the unlabeled data, we propose a
re-alignment method based on co-occurrence matrix analysis that takes into
account one-hot-vector encoding of the estimated label and the correlation
between the objects in the image. We used an MS-COCO detection dataset to
verify the performance of the proposed SSL method, with deformable convolutional
networks (D-ConvNets) as the object detector for basic training. The performance
of existing state-of-the-art detectors (D-ConvNets, YOLO v2, and the single-shot
multi-box detector (SSD)) can be improved by the proposed SSL method without
additional model parameters or modifications to the network architecture.
Comment: Submitted to International Conference on Image Processing (ICIP) 201
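A co-occurrence-based re-alignment can be sketched as re-weighting an uncertain
pseudo-label by how often its class co-occurs with confidently detected classes
in the annotated set. The thresholds, normalization, and toy counts below are
assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

classes = ["person", "bicycle", "car"]
# Toy co-occurrence counts estimated from an annotated dataset.
cooc = np.array([[0.0, 8.0, 5.0],
                 [8.0, 0.0, 1.0],
                 [5.0, 1.0, 0.0]])
cooc_prob = cooc / cooc.sum(axis=1, keepdims=True)

def realign(scores, confident_idx):
    """Re-align an uncertain detection's class scores.

    scores: one-hot-like class scores for the uncertain detection.
    confident_idx: index of a class detected with high confidence
    elsewhere in the same image.
    """
    prior = cooc_prob[confident_idx]   # classes likely to co-occur with it
    adjusted = scores * (1.0 + prior)  # boost contextually plausible classes
    return adjusted / adjusted.sum()   # renormalize to a distribution

scores = np.array([0.2, 0.45, 0.35])   # uncertain detection on unlabeled data
confident_idx = 0                      # "person" was detected confidently
adjusted = realign(scores, confident_idx)
```

Here the "bicycle" score is boosted because bicycles frequently co-occur with
people in the toy counts, making the transferred pseudo-label more informative
for re-training.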
The expression and cellular localization of phospholipase D isozymes in the developing mouse testis
To examine the involvement of phospholipase D (PLD) isozymes in postnatal testis development, the expression of PLD1 and PLD2 was examined in the mouse testis at postnatal weeks 1, 2, 4, and 8 using Western blot analysis and immunohistochemistry. The expression of both PLD1 and PLD2 increased gradually with development from postnatal week 1 to 8. Immunohistochemically, PLD immunoreactivity was detected in some germ cells in the testis and in interstitial Leydig cells at postnatal week 1. PLD was mainly detected in the spermatocytes and in the residual bodies of spermatids in the testis at 8 weeks after birth. The intense immunostaining of PLD in Leydig cells remained unchanged through postnatal week 8. These findings suggest that PLD isozymes are involved in the spermatogenesis of the mouse testis.