Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation
Convolutional neural networks have been widely deployed in various
application scenarios. To extend their applicability to accuracy-critical
domains, researchers have investigated ways to boost accuracy through deeper
or wider network structures, which incur exponentially higher computational
and storage costs and longer response times. In this paper, we propose a
general training framework named self distillation, which notably enhances
the performance (accuracy) of convolutional neural networks by shrinking the
network rather than enlarging it. Unlike traditional knowledge distillation,
a knowledge transfer methodology between networks that forces student
networks to approximate the softmax outputs of pre-trained teacher networks,
the proposed self distillation framework distills knowledge within the
network itself. The network is first divided into several sections, and the
knowledge in its deeper portion is then squeezed into the shallow ones.
Experiments further demonstrate the generality of the proposed self
distillation framework: the average accuracy improvement is 2.65%, ranging
from 0.61% for ResNeXt to 4.07% for VGG19. In addition, it provides the
flexibility of depth-wise scalable inference on resource-limited edge
devices. Our code will be released on GitHub soon. Comment: 10 pages
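The distillation of a deeper section into a shallow one can be sketched as a combination of a hard-label cross-entropy and a softened KL term; the weighting `alpha`, temperature `T`, and function names below are illustrative assumptions, not the paper's exact objective:

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax over a list of logits.
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def self_distill_loss(shallow_logits, deep_logits, label, alpha=0.5, T=3.0):
    # Cross-entropy of the shallow branch against the ground-truth label.
    probs = softmax(shallow_logits)
    ce = -math.log(probs[label])
    # KL(teacher || student): the deepest section acts as the teacher
    # for the shallow branch, both softened by temperature T.
    teacher = softmax(deep_logits, T)
    student = softmax(shallow_logits, T)
    kl = sum(t * math.log(t / s) for t, s in zip(teacher, student))
    # T^2 rescaling keeps gradient magnitudes comparable across temperatures.
    return (1 - alpha) * ce + alpha * (T * T) * kl
```

When the shallow branch already matches the deepest section, the KL term vanishes and only the supervised loss remains.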
Learn from Unpaired Data for Image Restoration: A Variational Bayes Approach
Collecting paired training data is difficult in practice, but unpaired
samples are widely available. Current approaches aim to synthesize training
data from unpaired samples by exploiting the relationship between the
corrupted and clean data. This work proposes LUD-VAE, a deep generative
method that learns the joint probability density function from data sampled
from the marginal distributions. Our approach is based on a carefully
designed probabilistic graphical model in which the clean and corrupted data
domains are conditionally independent. Using variational inference, we
maximize the evidence lower bound (ELBO) to estimate the joint probability
density function. Furthermore, we show that the ELBO is computable without
paired samples under the inference-invariant assumption. This property
provides the mathematical rationale for our approach in the unpaired
setting. Finally, we apply our method to real-world image denoising and
super-resolution tasks, training the models on synthetic data generated by
LUD-VAE. Experimental results validate the advantages of our method over
other learnable approaches.
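For orientation, the standard ELBO for a latent-variable model of this shape, with the conditional independence stated above factorizing the likelihood, reads (a generic VAE sketch, not necessarily the paper's exact objective; here $x$ is the clean sample, $y$ the corrupted one, and $z$ the latent variable):

```latex
\log p_\theta(x, y) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x, y)}\!\left[\log p_\theta(x \mid z) + \log p_\theta(y \mid z)\right]
- \mathrm{KL}\!\left(q_\phi(z \mid x, y) \,\|\, p(z)\right),
```

where the split $\log p_\theta(x, y \mid z) = \log p_\theta(x \mid z) + \log p_\theta(y \mid z)$ follows from the conditional independence of the clean and corrupted domains given $z$.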
YOLOrtho -- A Unified Framework for Teeth Enumeration and Dental Disease Detection
Detecting dental diseases in panoramic X-ray images is a standard
procedure for dentists. Normally, a dentist needs to identify diseases and
find the infected teeth. While numerous machine learning models adopting
this two-step procedure have been developed, there has been no end-to-end
model that identifies teeth and their associated diseases at the same time.
To fill this gap, we develop YOLOrtho, a unified framework for teeth
enumeration and dental disease detection. We develop our model on the Dentex
Challenge 2023 data, which consists of three distinct types of annotated
data: the first part is labeled with quadrant only, the second with quadrant
and enumeration, and the third with quadrant, enumeration, and disease. To
further improve detection, we make use of the public Tufts Dental dataset.
To fully utilize the data and learn teeth detection and disease
identification simultaneously, we formulate diseases as attributes attached
to their corresponding teeth. Because teeth enumeration depends on
positional relations, we replace convolution layers with CoordConv to
provide the model with more positional information. We also adjust the model
architecture, inserting an additional upsampling layer in the FPN to better
detect large objects. Finally, we propose a post-processing strategy for the
teeth layout that corrects teeth enumeration via linear sum assignment.
Experimental results show that our model outperforms a large diffusion-based
model.
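The linear-sum-assignment correction can be sketched as follows: given a cost matrix between detected teeth and enumeration slots, find the assignment minimizing total cost. Production code would use `scipy.optimize.linear_sum_assignment`; this brute-force version (function name and cost semantics are illustrative assumptions) shows the idea for small matrices:

```python
from itertools import permutations

def assign_labels(cost):
    """Brute-force linear sum assignment, feasible for small cost matrices.

    cost[i][j] is the cost of assigning detected tooth i to enumeration
    slot j (e.g. distance from the tooth's position to the slot's expected
    layout position). Returns slot index chosen for each tooth.
    """
    n = len(cost)
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        c = sum(cost[i][perm[i]] for i in range(n))
        if c < best_cost:
            best_cost, best_perm = c, perm
    return list(best_perm)
```

Because every tooth must receive a distinct enumeration, the assignment repairs duplicate or swapped labels that a per-detection classifier can produce.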
Flare-Aware Cross-modal Enhancement Network for Multi-spectral Vehicle Re-identification
Multi-spectral vehicle re-identification aims to address the challenge of
identifying vehicles in complex lighting conditions by incorporating
complementary visible and infrared information. However, in harsh environments,
the discriminative cues in RGB and NIR modalities are often lost due to strong
flares from vehicle lamps or sunlight, and existing multi-modal fusion methods
are limited in their ability to recover these important cues. To address this
problem, we propose a Flare-Aware Cross-modal Enhancement Network (FACENet)
that adaptively restores flare-corrupted RGB and NIR features with guidance
from the flare-immunized thermal infrared (TI) spectrum. First, to reduce the influence of
locally degraded appearance due to intense flare, we propose a Mutual Flare
Mask Prediction module to jointly obtain flare-corrupted masks in RGB and NIR
modalities in a self-supervised manner. Second, to use the flare-immunized TI
information to enhance the masked RGB and NIR, we propose a Flare-Aware
Cross-modal Enhancement module that adaptively guides feature extraction of
masked RGB and NIR spectra with prior flare-immunized knowledge from the TI
spectrum. Third, to extract common informative semantic information from RGB
and NIR, we propose an Inter-modality Consistency loss that enforces semantic
consistency between the two modalities. Finally, to evaluate the proposed
FACENet in handling intense flare, we introduce a new multi-spectral vehicle
re-ID dataset, called WMVEID863, with additional challenges such as motion
blur, significant background changes, and particularly intense flare
degradation. Comprehensive experiments on both the newly collected dataset and
public benchmark multi-spectral vehicle re-ID datasets demonstrate the superior
performance of the proposed FACENet compared to state-of-the-art methods,
especially in handling strong flares. The code and dataset will be released
soon.
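One common way to enforce the semantic consistency described above is to penalize the angular distance between pooled RGB and NIR feature vectors; the abstract does not give the exact form of the loss, so this cosine-based version is purely an illustrative assumption:

```python
import math

def consistency_loss(feat_rgb, feat_nir):
    """Hypothetical inter-modality consistency loss:
    1 - cosine similarity between pooled RGB and NIR feature vectors.
    The paper's exact formulation may differ."""
    dot = sum(a * b for a, b in zip(feat_rgb, feat_nir))
    norm_rgb = math.sqrt(sum(a * a for a in feat_rgb))
    norm_nir = math.sqrt(sum(b * b for b in feat_nir))
    return 1.0 - dot / (norm_rgb * norm_nir)
```

The loss is zero when the two modalities produce parallel feature vectors and grows toward its maximum as their semantics diverge.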
Twisting and tweezing the spin wave: on vortices, skyrmions, helical waves, and the magnonic spiral phase plate
Spin waves are the low-energy excitations of magnetically ordered materials.
They are key elements in the stability analysis of the ordered phase and
have a wealth of technological applications. Recently, we showed that spin
waves in a magnetic nanowire may carry a definite amount of orbital angular
momentum along the propagation direction. This helical character of the spin
waves, in addition to their chiral character, is related to the spatial
modulation of the spin-wave phase across the wire. It remains a challenge,
however, to generate and control such modes with conventional magnetic
fields. Here, we make the first proposal for a \textit{magnetic} spiral
phase plate, obtained by appropriately combining two magnetic materials with
different spin-wave velocities. Full-numerical micromagnetic simulations
demonstrate that, despite the complicated structure of the demagnetization
fields, a homogeneous spin wave passing through the spiral phase plate
attains the required twist and propagates onward with the desired orbital
angular momentum. While excitations of the ordered phase may carry a twist,
the magnetization itself can also be twisted by internal fields, forming
what is known as a magnetic vortex. We point out the differences between
these two types of magnetic phenomena and discuss their possible
interaction. Comment: 6 pages, 5 figures
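By analogy with optical spiral phase plates (a sketch under the assumption that the two materials support wavenumbers $k_1$ and $k_2$ at the same frequency), an azimuthally varying plate thickness $h(\phi)$ imprints the phase difference needed for an $\ell$-fold twist:

```latex
\Delta\varphi(\phi) = (k_1 - k_2)\, h(\phi) = \ell \phi
\quad\Longrightarrow\quad
h(\phi) = \frac{\ell \phi}{k_1 - k_2},
```

so the transmitted spin wave acquires an azimuthal phase factor $e^{i\ell\phi}$, i.e. orbital angular momentum $\ell\hbar$ per magnon along the propagation direction. The symbols and the linear thickness profile are illustrative; the paper's micromagnetic design may differ in detail.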