A Novel Weight-Shared Multi-Stage CNN for Scale Robustness
Convolutional neural networks (CNNs) have demonstrated remarkable results in
image classification for benchmark tasks and practical applications. The CNNs
with deeper architectures have recently achieved even higher performance, thanks
to their robustness to the translation (parallel shift) of objects in images as
well as their numerous parameters and the resulting high expressive power.
However, CNNs have only limited robustness to other geometric transformations
such as scaling and rotation. This caps further performance improvements of deep
CNNs, and no established solution exists. This study focuses on scale transformation
there is no established solution. This study focuses on scale transformation
and proposes a network architecture called the weight-shared multi-stage
network (WSMS-Net), which consists of multiple stages of CNNs. The proposed
WSMS-Net is easily combined with existing deep CNNs such as ResNet and DenseNet
and enables them to acquire robustness to object scaling. Experimental results
on the CIFAR-10, CIFAR-100, and ImageNet datasets demonstrate that existing
deep CNNs combined with the proposed WSMS-Net achieve higher accuracies for
image classification tasks with only a minor increase in the number of
parameters and computation time.
Comment: accepted version, 13 pages
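The weight-sharing idea can be sketched in a toy, single-channel form. The two-stage structure, function names, and pooling choices below are our simplifications for illustration, not the paper's exact architecture:

```python
import numpy as np

def conv2d(x, k):
    """Valid-mode 2-D correlation on a single channel (loop version, for clarity)."""
    h, w = x.shape
    kh, kw = k.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def avg_pool2(x):
    """2x2 average pooling, halving the spatial resolution."""
    h, w = x.shape
    return x[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def wsms_forward(x, kernel):
    """Toy two-stage weight-shared forward pass: the SAME kernel filters the
    input at the original and half resolution, and the globally pooled
    features of both stages are concatenated, so an object appearing at a
    different scale can still match the shared filter at one of the stages."""
    f1 = conv2d(x, kernel)               # stage 1: original scale
    f2 = conv2d(avg_pool2(x), kernel)    # stage 2: half scale, shared weights
    return np.array([f1.mean(), f2.mean()])
```

Because the stages share one set of weights, adding a stage costs almost no extra parameters, consistent with the "minor increase" reported in the abstract.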
Deep Curvilinear Editing: Commutative and Nonlinear Image Manipulation for Pretrained Deep Generative Model
Semantic editing of images is a fundamental goal of computer vision.
Although deep learning methods, such as generative adversarial networks (GANs),
are capable of producing high-quality images, they often do not have an
inherent way of editing generated images semantically. Recent studies have
investigated a way of manipulating the latent variable to determine the images
to be generated. However, methods that assume linear semantic arithmetic are
limited in the quality of the edits they produce, whereas methods that discover
nonlinear semantic pathways yield non-commutative editing, whose result depends
on the order in which the edits are applied. This study proposes a
novel method called deep curvilinear editing (DeCurvEd) to determine semantic
commuting vector fields on the latent space. We theoretically demonstrate that
owing to commutativity, the editing of multiple attributes depends only on the
quantities and not on the order. Furthermore, we experimentally demonstrate
that compared to previous methods, the nonlinear and commutative nature of
DeCurvEd facilitates the disentanglement of image attributes and provides
higher-quality editing.
Comment: 15 pages
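The key property, edits as flows of commuting vector fields, can be illustrated with a toy invertible map standing in for the learned flow. The map `f` below is an arbitrary assumption (DeCurvEd learns it from data); what matters is that each edit is a straight step in the warped coordinates:

```python
import numpy as np

def f(z):
    """Toy invertible map (one affine coupling step) standing in for the
    learned flow; DeCurvEd trains this map, so the form here is an assumption."""
    z0, z1 = z
    return np.array([z0, z1 * np.exp(np.tanh(z0)) + z0 ** 2])

def f_inv(y):
    y0, y1 = y
    return np.array([y0, (y1 - y0 ** 2) * np.exp(-np.tanh(y0))])

def edit(z, attr, amount):
    """A curvilinear edit: a straight step along axis `attr` in the warped
    space, i.e. the flow of one of the commuting vector fields."""
    y = f(z)
    y[attr] += amount
    return f_inv(y)

z = np.array([0.5, -1.0])
a = edit(edit(z, 0, 1.0), 1, -0.5)   # edit attribute 0, then attribute 1
b = edit(edit(z, 1, -0.5), 0, 1.0)   # the reverse order
```

Because each edit is an addition in the warped coordinates, `a` and `b` coincide exactly: the result depends only on the edit amounts, not on their order, which is the commutativity the abstract refers to.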
FINDE: Neural Differential Equations for Finding and Preserving Invariant Quantities
Many real-world dynamical systems are associated with first integrals (a.k.a.
invariant quantities), which are quantities that remain unchanged over time.
The discovery and understanding of first integrals are fundamental and
important topics both in the natural sciences and in industrial applications.
First integrals arise from the conservation laws of system energy, momentum,
and mass, and from constraints on states; these are typically related to
specific geometric structures of the governing equations. Existing neural
networks designed to ensure such first integrals have shown excellent accuracy
in modeling from data. However, these models assume that the underlying
structures are known in advance, whereas in most situations where neural
networks learn unknown systems, these structures are also unknown. This
limitation must be overcome for scientific discovery and modeling of unknown
systems. To this end, we propose the first integral-preserving neural
differential equation (FINDE). By leveraging
the projection method and the discrete gradient method, FINDE finds and
preserves first integrals from data, even in the absence of prior knowledge
about underlying structures. Experimental results demonstrate that FINDE can
predict future states of target systems much longer and find various quantities
consistent with well-known first integrals in a unified manner.
Comment: 25 pages
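A minimal sketch of the projection method that FINDE builds on, using a known first integral in place of the learned one (function names are ours): after each unconstrained integration step, the state is projected back onto the level set of the first integral H.

```python
import numpy as np

def projected_euler(x0, field, H, grad_H, dt, steps):
    """Sketch of the projection method: after each explicit Euler step,
    project the state back onto the level set of the first integral H.
    In FINDE, H itself is a learned network; here it is given in closed form."""
    x = np.asarray(x0, dtype=float)
    h0 = H(x)
    for _ in range(steps):
        x = x + dt * field(x)                     # unconstrained Euler step
        g = grad_H(x)
        x = x - (H(x) - h0) * g / np.dot(g, g)    # one Newton-style projection
    return x

# Harmonic oscillator: dq/dt = p, dp/dt = -q, with energy H = (q^2 + p^2) / 2.
field = lambda x: np.array([x[1], -x[0]])
H = lambda x: 0.5 * (x[0] ** 2 + x[1] ** 2)
grad_H = lambda x: x
x_end = projected_euler([1.0, 0.0], field, H, grad_H, dt=0.1, steps=200)
```

Without the projection line, explicit Euler inflates the oscillator's energy by a factor of (1 + dt²) every step; with it, the energy stays essentially pinned at its initial value, which is why preserving first integrals extends how far into the future the model can predict.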
Data Augmentation using Random Image Cropping and Patching for Deep CNNs
Deep convolutional neural networks (CNNs) have achieved remarkable results in
image processing tasks. However, their high expressive power entails a risk of
overfitting. Consequently, data augmentation techniques have been proposed to
prevent overfitting while enriching datasets. Recent CNN architectures with
more parameters are rendering traditional data augmentation techniques
insufficient. In this study, we propose a new data augmentation technique
called random image cropping and patching (RICAP) which randomly crops four
images and patches them to create a new training image. Moreover, RICAP mixes
the class labels of the four images, resulting in an advantage similar to label
smoothing. We evaluated RICAP with current state-of-the-art CNNs (e.g., the
shake-shake regularization model) by comparison with competitive data
augmentation techniques such as cutout and mixup. RICAP achieves a new
state-of-the-art test error on CIFAR-10. We also confirmed that
deep CNNs with RICAP achieve better results on classification tasks using
CIFAR-100 and ImageNet and an image-caption retrieval task using Microsoft
COCO.
Comment: accepted version, 16 pages
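The RICAP procedure described above can be sketched in NumPy. The parameter names and the Beta-distributed patch boundary below are our assumptions based on the abstract and on mixup-style methods, not the authors' exact implementation:

```python
import numpy as np

def ricap(images, labels, num_classes, beta=0.3, rng=None):
    """Sketch of random image cropping and patching (RICAP).

    Crops a patch from each of four randomly chosen training images, tiles
    the patches into one new image, and mixes the class labels in proportion
    to the patch areas."""
    rng = np.random.default_rng() if rng is None else rng
    n, h, w, _ = images.shape
    # The boundary position splitting the image into four patches is random.
    bx = int(round(rng.beta(beta, beta) * w))
    by = int(round(rng.beta(beta, beta) * h))
    corners = [((by, bx), (0, 0)), ((by, w - bx), (0, bx)),
               ((h - by, bx), (by, 0)), ((h - by, w - bx), (by, bx))]
    new_images = np.zeros_like(images)
    new_labels = np.zeros((n, num_classes))
    for (ph, pw), (y0, x0) in corners:
        idx = rng.integers(0, n, size=n)          # source image per sample
        ys = rng.integers(0, h - ph + 1, size=n)  # random crop position
        xs = rng.integers(0, w - pw + 1, size=n)
        for i in range(n):
            new_images[i, y0:y0 + ph, x0:x0 + pw] = \
                images[idx[i], ys[i]:ys[i] + ph, xs[i]:xs[i] + pw]
            new_labels[i, labels[idx[i]]] += (ph * pw) / (h * w)  # area weight
    return new_images, new_labels
```

The four area weights always sum to one, so the mixed label is a valid distribution over classes; this soft target is the source of the label-smoothing-like effect mentioned in the abstract.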