Invertible Rescaling Network and Its Extensions
Image rescaling is a commonly used bidirectional operation, which first
downscales high-resolution images to fit various display screens or to reduce
storage and bandwidth costs, and afterward upscales the corresponding
low-resolution images to recover the original resolution or the details lost
when zooming in. However, the non-injective downscaling mapping discards
high-frequency content, making the inverse restoration task ill-posed. This
can be abstracted as a general image
degradation-restoration problem with information loss. In this work, we propose
a novel invertible framework to handle this general problem, which models the
bidirectional degradation and restoration from a new perspective, i.e.
invertible bijective transformation. Invertibility allows the framework to
model the information lost during degradation as a distribution, which
mitigates the ill-posedness of the subsequent restoration. Specifically, we
develop invertible models that generate valid degraded images while
transforming the distribution of the lost content into a fixed latent
distribution during the forward degradation. Restoration is then made
tractable by applying the inverse transformation on the generated degraded
image together with a randomly-drawn latent variable. We start from image
rescaling and instantiate the model as Invertible Rescaling Network (IRN),
which can be easily extended to the similar decolorization-colorization task.
We further propose to combine the invertible framework with existing
degradation methods such as image compression for wider applications.
Experimental results demonstrate the significant improvement of our model over
existing methods in terms of both quantitative and qualitative evaluations of
upscaling and colorizing reconstruction from downscaled and decolorized images,
and rate-distortion of image compression. Comment: Accepted by IJC
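The lossless split at the heart of such an invertible downscaling can be illustrated with a 2x2 Haar-like transform. The sketch below is a toy illustration in plain Python, not the authors' IRN: IRN uses learned invertible blocks and additionally maps the residuals to a fixed latent (Gaussian) distribution, whereas this sketch only shows the exact bijection between a high-resolution image and a (low-resolution image, high-frequency residuals) pair.

```python
# Toy illustration (not the authors' IRN): a 2x2 Haar-like transform is an
# exact bijection between a high-resolution image and (low-resolution image,
# high-frequency residuals). Assumes a grayscale image with even dimensions,
# stored as a list of rows.

def haar_downscale(img):
    """Split an HxW image into an (H/2)x(W/2) LR image plus residuals."""
    h, w = len(img), len(img[0])
    lr, residuals = [], []
    for i in range(0, h, 2):
        lr_row, res_row = [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            lr_row.append((a + b + c + d) / 4.0)   # low-frequency average
            res_row.append((a - b, a - c, a - d))  # high-frequency detail
        lr.append(lr_row)
        residuals.append(res_row)
    return lr, residuals

def haar_upscale(lr, residuals):
    """Invert haar_downscale exactly from the LR image and residuals."""
    img = []
    for lr_row, res_row in zip(lr, residuals):
        top, bot = [], []
        for m, (ab, ac, ad) in zip(lr_row, res_row):
            a = m + (ab + ac + ad) / 4.0  # solve 4m = a + b + c + d
            top += [a, a - ab]
            bot += [a - ac, a - ad]
        img += [top, bot]
    return img
```

Because the transform is bijective, `haar_upscale(*haar_downscale(img))` reproduces `img` exactly; at restoration time IRN replaces the stored residuals with samples drawn from the fixed latent distribution.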
Your Transformer May Not be as Powerful as You Expect
Relative Positional Encoding (RPE), which encodes the relative distance
between any pair of tokens, is one of the most successful modifications to the
original Transformer. To our knowledge, however, the theoretical understanding
of RPE-based Transformers remains largely unexplored. In this work, we mathematically
analyze the power of RPE-based Transformers regarding whether the model is
capable of approximating any continuous sequence-to-sequence functions. One may
naturally assume the answer is in the affirmative -- RPE-based Transformers are
universal function approximators. However, we present a negative result by
showing there exist continuous sequence-to-sequence functions that RPE-based
Transformers cannot approximate no matter how deep and wide the neural network
is. One key reason is that most RPEs are applied inside the softmax attention,
which always generates a right stochastic matrix. This restricts the network
from capturing positional information through the RPEs and limits its capacity. To
overcome the problem and make the model more powerful, we first present
sufficient conditions for RPE-based Transformers to achieve universal function
approximation. With the theoretical guidance, we develop a novel attention
module, called Universal RPE-based (URPE) Attention, which satisfies the
conditions. Therefore, the corresponding URPE-based Transformers become
universal function approximators. Extensive experiments covering typical
architectures and tasks demonstrate that our model is parameter-efficient and
can achieve superior performance to strong baselines in a wide range of
applications. The code will be made publicly available at
https://github.com/lsj2408/URPE. Comment: 22 pages; NeurIPS 2022, Camera Ready Version
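The row-stochasticity argument can be seen numerically in a few lines. The demo below is a simplified sketch, not the paper's construction: because every softmax row sums to 1, attention over a constant value sequence produces the same output at every position no matter what relative positional scores are added inside the softmax, while an elementwise positional gate on the attention matrix (here a hand-picked Toeplitz matrix rather than a learned one) restores position sensitivity.

```python
import math

# Illustrative demo: softmax attention is right stochastic, so on constant
# values its output is position-independent; gating the attention matrix
# elementwise (URPE-style) breaks row-stochasticity. The gate below is a
# hand-picked Toeplitz matrix, standing in for the learned one.

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def attend(scores, values):
    """Plain softmax attention: scores is n x n logits, values length n."""
    return [sum(a * v for a, v in zip(softmax(row), values)) for row in scores]

def attend_gated(scores, gate, values):
    """URPE-style: elementwise gate on the softmax attention matrix."""
    return [sum(g * a * v for g, a, v in zip(grow, softmax(row), values))
            for grow, row in zip(gate, scores)]

n = 4
rpe_scores = [[-abs(i - j) for j in range(n)] for i in range(n)]  # relative scores
values = [1.0] * n                                               # constant values

plain = attend(rpe_scores, values)  # identical at every position
gate = [[1.0 / (1 + abs(i - j)) for j in range(n)] for i in range(n)]  # Toeplitz
gated = attend_gated(rpe_scores, gate, values)  # varies across positions
```

Here `plain` is exactly `[1.0, 1.0, 1.0, 1.0]` regardless of the relative scores, while `gated` differs across positions, which is the intuition behind why the gated attention can separate positions that plain softmax attention cannot.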
One Transformer Can Understand Both 2D & 3D Molecular Data
Unlike vision and language data, which usually come in a single format, molecules
can naturally be characterized using different chemical formulations. One can
view a molecule as a 2D graph or define it as a collection of atoms located in
a 3D space. For molecular representation learning, most previous works designed
neural networks only for a particular data format, making the learned models
likely to fail for other data formats. We believe a general-purpose neural
network model for chemistry should be able to handle molecular tasks across
data modalities. To achieve this goal, in this work, we develop a novel
Transformer-based Molecular model called Transformer-M, which can take
molecular data of 2D or 3D formats as input and generate meaningful semantic
representations. Using the standard Transformer as the backbone architecture,
Transformer-M introduces two separate channels to encode 2D and 3D structural
information and incorporates them with the atom features in the network modules.
When the input data is in a particular format, the corresponding channel will
be activated, and the other will be disabled. By training on 2D and 3D
molecular data with properly designed supervised signals, Transformer-M
automatically learns to leverage knowledge from different data modalities and
correctly capture the representations. We conducted extensive experiments for
Transformer-M. All empirical results show that Transformer-M can simultaneously
achieve strong performance on 2D and 3D tasks, suggesting its broad
applicability. The code and models will be made publicly available at
https://github.com/lsj2408/Transformer-M. Comment: 20 pages; ICLR 2023, Camera Ready Version; Code:
https://github.com/lsj2408/Transformer-
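The channel-activation idea can be sketched schematically. The snippet below uses hypothetical names and toy structural encodings (bond indicators for 2D, negative interatomic distances for 3D) rather than the released Transformer-M code; it only illustrates how an attention bias can be assembled from whichever channels the input format provides.

```python
# Schematic sketch (hypothetical names, not the released Transformer-M code):
# a shared backbone receives an attention bias built from a 2D channel when a
# graph is given, a 3D channel when coordinates are given, or both.

def structural_bias_2d(num_atoms, edges):
    """Toy 2D channel: bias attention toward bonded atom pairs."""
    bias = [[0.0] * num_atoms for _ in range(num_atoms)]
    for i, j in edges:
        bias[i][j] = bias[j][i] = 1.0
    return bias

def structural_bias_3d(coords):
    """Toy 3D channel: bias attention by negative interatomic distance."""
    n = len(coords)
    dist = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    return [[-dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]

def attention_bias(num_atoms, edges=None, coords=None):
    """Activate whichever channel(s) the input format provides."""
    bias = [[0.0] * num_atoms for _ in range(num_atoms)]
    if edges is not None:
        b2 = structural_bias_2d(num_atoms, edges)
        bias = [[x + y for x, y in zip(r, s)] for r, s in zip(bias, b2)]
    if coords is not None:
        b3 = structural_bias_3d(coords)
        bias = [[x + y for x, y in zip(r, s)] for r, s in zip(bias, b3)]
    return bias
```

With only `edges` supplied, the 3D channel contributes nothing, and vice versa; in the actual model the two channels are learned modules trained jointly on 2D and 3D data.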
Consensus under Misaligned Orientations
This paper presents a consensus algorithm under misaligned orientations,
where the misalignment is defined as (i) misalignment of the local coordinate
frames with respect to the global coordinate frame, (ii) biases in the control
or sensing directions, or (iii) misaligned virtual global coordinate frames.
After providing a mathematical formulation, we give sufficient conditions for
consensus or for divergence. Beyond the stability analysis, we also
characterize the convergence behavior in terms of the locations of the
eigenvalues. Through a number of numerical simulations, we examine the
behaviors of the misaligned consensus dynamics. Comment: 23 pages, 9 figures
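The flavor of such dynamics can be reproduced with a minimal planar simulation. This is a sketch under simplifying assumptions (complete graph, Euler integration, agents as complex numbers), not the paper's exact model: each agent's control direction is rotated by a misalignment angle, and for small angles consensus survives, consistent with an eigenvalue-based stability analysis.

```python
import cmath

# Minimal simulation sketch (not the paper's exact model): planar agents with
# control directions rotated by misalignment angles theta_i,
#   z_i' = exp(1j * theta_i) * sum_j (z_j - z_i),
# over a complete graph. Small misalignment still yields consensus; angles
# approaching pi/2 and beyond can destroy convergence.

def simulate(z, thetas, dt=0.01, steps=2000):
    z = list(z)
    n = len(z)
    for _ in range(steps):
        total = sum(z)  # sum_j z_j, reused for every agent (complete graph)
        z = [zi + dt * cmath.exp(1j * th) * (total - n * zi)
             for zi, th in zip(z, thetas)]
    return z

def spread(z):
    """Largest distance of any agent from the centroid."""
    c = sum(z) / len(z)
    return max(abs(zi - c) for zi in z)

agents = [0 + 0j, 1 + 0j, 0.5 + 1j, -1 + 0.5j]
aligned = simulate(agents, [0.0] * 4)                 # classical consensus
misaligned = simulate(agents, [0.3, -0.2, 0.4, 0.1])  # small misalignment
```

Both runs collapse the initial spread by many orders of magnitude; replacing the angles with values near pi/2 makes the contraction per step marginal or lost, which is the divergence regime the paper's conditions delineate.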
Feature extraction method based on VMD and MFDFA for fault diagnosis of reciprocating compressor valve
Aiming at the nonlinearity, nonstationarity and multi-component coupling characteristics of reciprocating compressor vibration signals, an integrated feature extraction method based on variational mode decomposition (VMD) and multi-fractal detrended fluctuation analysis (MFDFA) is proposed for fault diagnosis of reciprocating compressor valves. Firstly, to eliminate noise interference, a novel VMD method with superior anti-interference performance was utilized to obtain several components of the quasi-orthogonal band-limited intrinsic mode function (BLIMF) from a strongly non-stationary vibration signal, and a consistent number K of BLIMFs was selected based on a novel criterion for all fault states. Secondly, the MFDFA method, which can describe the multi-fractal structure of non-stationary time series, was applied to analyze each BLIMF component, and the parameters of MFDFA were employed as eigenvectors to reflect the structural characteristics and local scale behavior of the vibration signal. Then, principal component analysis (PCA) was introduced to refine the eigenvectors for higher recognition efficiency and accuracy. Finally, the vibration signals of four types of reciprocating compressor valve faults were analyzed by this method, and the faults were identified correctly by the BTSVM and CNN pattern classifiers. A further comparison with other feature extraction methods verifies the superiority of the proposed method.
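The core MFDFA computation in the middle of this pipeline can be sketched compactly. The snippet below is illustrative only and covers just the order-q fluctuation function F_q(s), not the full VMD + MFDFA + PCA chain: build the cumulative profile of a signal, linearly detrend it in non-overlapping windows of scale s, and aggregate the windowed RMS fluctuations; the generalized Hurst exponent h(q) is then the slope of log F_q(s) versus log s.

```python
import random

# Illustrative core of MFDFA (not the paper's full pipeline): order-q
# fluctuation F_q(s) of a signal at window scale s, with linear detrending.

def _linear_detrend_rms(seg):
    """RMS residual of a least-squares line fit to one window."""
    n = len(seg)
    xs = range(n)
    mx, my = (n - 1) / 2.0, sum(seg) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, seg))
    slope = sxy / sxx
    resid2 = sum((y - (my + slope * (x - mx))) ** 2 for x, y in zip(xs, seg))
    return (resid2 / n) ** 0.5

def fluctuation(signal, s, q=2.0):
    """F_q(s): order-q fluctuation of the signal at window scale s."""
    mean = sum(signal) / len(signal)
    profile, acc = [], 0.0
    for x in signal:                      # cumulative (integrated) profile
        acc += x - mean
        profile.append(acc)
    rms = [_linear_detrend_rms(profile[i:i + s])
           for i in range(0, len(profile) - s + 1, s)]
    return (sum(f ** q for f in rms) / len(rms)) ** (1.0 / q)
```

For uncorrelated noise F_2(s) grows roughly like s^0.5, so the fluctuation at a larger scale clearly exceeds that at a smaller one, while a constant signal yields zero fluctuation; the multi-fractal spectrum used as features in the paper comes from evaluating this for a range of q values.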
DeltaNet: Conditional Medical Report Generation for COVID-19 Diagnosis
Fast screening and diagnosis are critical in COVID-19 patient treatment. In
addition to the gold standard RT-PCR, radiological imaging like X-ray and CT
also works as an important means in patient screening and follow-up. However,
due to the excessive number of patients, writing reports becomes a heavy burden
for radiologists. To reduce the workload of radiologists, we propose DeltaNet
to generate medical reports automatically. Different from typical image
captioning approaches that generate reports with an encoder and a decoder,
DeltaNet applies a conditional generation process. In particular, given a
medical image, DeltaNet employs three steps to generate a report: 1) first
retrieving related medical reports, i.e., the historical reports from the same
or similar patients; 2) then comparing the retrieved images with the current image to
find the differences; 3) finally generating a new report to accommodate
identified differences based on the conditional report. We evaluate DeltaNet on
a COVID-19 dataset, where DeltaNet outperforms state-of-the-art approaches.
Besides COVID-19, the proposed DeltaNet can be applied to other diseases as
well. We validate its generalization capabilities on the public IU-Xray and
MIMIC-CXR datasets for chest-related diseases. Code is available at
\url{https://github.com/LX-doctorAI1/DeltaNet}
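Step 1 of this pipeline, retrieval of related historical reports, can be sketched with a deliberately simple stand-in. The snippet below is a toy illustration only: it ranks reports by cosine similarity over bag-of-words vectors, whereas DeltaNet's retrieval, comparison, and conditional-generation steps are learned models described in the paper.

```python
import math
from collections import Counter

# Toy sketch of the retrieval step only (not DeltaNet's learned encoders):
# rank historical reports by cosine similarity of bag-of-words vectors.

def bow(text):
    """Bag-of-words term counts for a report."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_report, historical_reports, k=1):
    """Return the k historical reports most similar to the query."""
    q = bow(query_report)
    scored = sorted(historical_reports,
                    key=lambda r: cosine(q, bow(r)),
                    reverse=True)
    return scored[:k]
```

In the full system the retrieved reports condition the generator, which then only has to describe the differences between the retrieved and current images rather than write each report from scratch.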