Multi-Energy Blended CBCT Spectral Imaging Using a Spectral Modulator with Flying Focal Spot (SMFFS)
Cone-beam CT (CBCT) spectral imaging has great potential in medical and
industrial applications, but it is very challenging because scatter and spectral effects are tightly entangled. In this work, we present the first attempt to
develop a stationary spectral modulator with flying focal spot (SMFFS)
technology as a promising, low-cost approach to accurately solving the X-ray
scattering problem and physically enabling spectral imaging in a unified
framework, with no significant misalignment in the data sampling of spectral
projections. Based on an in-depth analysis of optimal energy separation from
different combinations of modulator materials and thicknesses, we present a
practical design of a mixed two-dimensional spectral modulator that can
generate multi-energy blended CBCT spectral projections. To deal with the
entangled scatter-spectral challenge, we propose a novel scatter-decoupled material decomposition (SDMD) method that takes advantage of the scatter
similarity in SMFFS. A Monte Carlo simulation is conducted to validate the
strong similarity of X-ray scatter distributions across the flying focal spot
positions. Both numerical simulations using a clinical abdominal CT dataset,
and physics experiments on a tabletop CBCT system using a GAMMEX multi-energy
CT phantom, are carried out to demonstrate the feasibility of our proposed SDMD
method for CBCT spectral imaging with SMFFS. In the physics experiments, the
mean relative errors in selected ROIs for virtual monochromatic images (VMI) are 0.9% for SMFFS, and 5.3% and 16.9% for an 80/120 kV dual-energy cone-beam scan
with and without scatter correction, respectively. Our preliminary results show
that SMFFS can effectively improve the quantitative imaging performance of
CBCT.
Comment: 10 pages, 13 figures
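To make the decomposition step concrete, here is a minimal Python sketch of per-pixel two-basis material decomposition from low- and high-energy log projections. It assumes scatter has already been estimated and removed (the step SDMD enables by exploiting scatter similarity across focal spot positions); the attenuation coefficients, function names, and toy inputs are illustrative assumptions, not the authors' implementation.

    import numpy as np

    # Illustrative effective linear attenuation coefficients (1/cm) for two
    # basis materials (water, bone) at two effective energies.
    A = np.array([[0.20, 0.50],   # low energy:  [mu_water, mu_bone]
                  [0.15, 0.30]])  # high energy: [mu_water, mu_bone]

    def decompose(p_low, p_high):
        # Solve the 2x2 system A @ [t_water, t_bone] = [p_low, p_high]
        # independently for every pixel of the scatter-corrected log projections.
        b = np.stack([p_low.ravel(), p_high.ravel()])  # (2, n_pixels)
        t = np.linalg.solve(A, b)                      # basis path lengths (cm)
        return t[0].reshape(p_low.shape), t[1].reshape(p_low.shape)

    # Toy usage: a virtual monochromatic image at energy E then follows as
    # mu_water(E) * water + mu_bone(E) * bone.
    water, bone = decompose(np.full((2, 2), 0.9), np.full((2, 2), 0.6))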
LumiGAN: Unconditional Generation of Relightable 3D Human Faces
Unsupervised learning of 3D human faces from unstructured 2D image data is an
active research area. While recent works have achieved an impressive level of
photorealism, they commonly lack control of lighting, which prevents the
generated assets from being deployed in novel environments. To this end, we
introduce LumiGAN, an unconditional Generative Adversarial Network (GAN) for 3D
human faces with a physically based lighting module that enables relighting
under novel illumination at inference time. Unlike prior work, LumiGAN can
create realistic shadow effects using an efficient visibility formulation that
is learned in a self-supervised manner. LumiGAN generates plausible physical
properties for relightable faces, including surface normals, diffuse albedo,
and specular tint without any ground truth data. In addition to relightability,
we demonstrate significantly improved geometry generation compared to
state-of-the-art non-relightable 3D GANs and notably better photorealism than
existing relightable GANs.
Comment: Project page: https://boyangdeng.com/projects/lumiga
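As a rough illustration of how the generated buffers support relighting, the following Python sketch shades an image from per-pixel normals, diffuse albedo, specular tint, and a visibility map under one directional light. It is a generic Lambertian-plus-Blinn-Phong model with an assumed fixed viewer; LumiGAN's actual lighting module and learned visibility formulation differ.

    import numpy as np

    def relight(albedo, normals, visibility, spec_tint, light_dir, light_rgb,
                shininess=32.0):
        # albedo, spec_tint: (H, W, 3); normals: (H, W, 3) unit vectors;
        # visibility: (H, W, 1) in [0, 1]; light_dir: (3,) unit vector.
        n_dot_l = np.clip(normals @ light_dir, 0.0, None)[..., None]
        half = light_dir + np.array([0.0, 0.0, 1.0])   # viewer assumed on +z
        half /= np.linalg.norm(half)
        n_dot_h = np.clip(normals @ half, 0.0, None)[..., None]
        diffuse = albedo * n_dot_l
        specular = spec_tint * n_dot_h ** shininess
        return (diffuse + specular) * visibility * light_rgb

    # Toy usage: camera-facing normals under a head-on white light.
    H = W = 4
    img = relight(np.full((H, W, 3), 0.8),
                  np.dstack([np.zeros((H, W)), np.zeros((H, W)), np.ones((H, W))]),
                  np.ones((H, W, 1)),
                  np.full((H, W, 3), 0.04),
                  np.array([0.0, 0.0, 1.0]),
                  np.array([1.0, 1.0, 1.0]))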
Higher superconducting transition temperature by breaking the universal pressure relation
By investigating the bulk superconducting state via dc magnetization
measurements, we have discovered a common resurgence of the superconductive
transition temperatures (Tcs) of the monolayer Bi2Sr2CuO6+δ (Bi2201) and bilayer Bi2Sr2CaCu2O8+δ (Bi2212) to beyond the maximum Tcs (Tc-maxs)
predicted by the universal relation between Tc and doping (p) or pressure (P)
at higher pressures. The Tc of under-doped Bi2201 initially increases from 9.6 K at ambient pressure to a peak of ~ 23 K at ~ 26 GPa and then drops as expected from
the universal Tc-P relation. However, at pressures above ~ 40 GPa, Tc rises
rapidly without any sign of saturation up to ~ 30 K at ~ 51 GPa. Similarly, the
Tc for the slightly overdoped Bi2212 increases after passing a broad valley
between 20-36 GPa and reaches ~ 90 K without any sign of saturation at ~ 56
GPa. We have therefore attributed this Tc-resurgence to a possible
pressure-induced electronic transition in the cuprate compounds due to a charge
transfer between the Cu 3d_{x^2-y^2} and the O 2p bands projected from a
hybrid bonding state, leading to an increase of the density of states at the
Fermi level, in agreement with our density functional theory calculations.
Similar Tc-P behavior has also been reported in the trilayer
Bi2Sr2Ca2Cu3O10+δ (Bi2223). These observations suggest that higher Tcs
than those previously reported for the layered cuprate high temperature
superconductors can be achieved by breaking away from the universal Tc-P
relation through the application of higher pressures.
Comment: 13 pages, including 5 figures
Improving Detection in Aerial Images by Capturing Inter-Object Relationships
In many image domains, the spatial distribution of objects in a scene
exhibits meaningful patterns governed by their semantic relationships. In most
modern detection pipelines, however, the detection proposals are processed
independently, overlooking the underlying relationships between objects. In
this work, we introduce a transformer-based approach to capture these
inter-object relationships to refine classification and regression outcomes for
detected objects. Building on two-stage detectors, we tokenize the region of
interest (RoI) proposals to be processed by a transformer encoder. Specific
spatial and geometric relations are incorporated into the attention weights and
adaptively modulated and regularized. Experimental results demonstrate that the
proposed method achieves consistent performance improvement on three benchmarks
including DOTA-v1.0, DOTA-v1.5, and HRSC 2016, especially ranking first on both
DOTA-v1.5 and HRSC 2016. Specifically, our method yields gains of 1.59 mAP on DOTA-v1.0, 4.88 mAP on DOTA-v1.5, and 2.1 mAP on HRSC 2016 over the baselines.
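A minimal Python (PyTorch) sketch of the general idea, relation-aware self-attention over RoI tokens in which a pairwise geometric term biases the attention logits, is given below. The offset-based relation features, layer sizes, and class name are assumptions for illustration; the paper's specific modulation and regularization of the attention weights are not reproduced.

    import torch
    import torch.nn as nn

    class GeometryBiasedAttention(nn.Module):
        def __init__(self, dim, geo_dim=4):
            super().__init__()
            self.qkv = nn.Linear(dim, 3 * dim)
            self.geo_mlp = nn.Sequential(nn.Linear(geo_dim, dim), nn.ReLU(),
                                         nn.Linear(dim, 1))
            self.scale = dim ** -0.5

        def forward(self, tokens, boxes):
            # tokens: (N, dim) RoI features; boxes: (N, 4) as (cx, cy, w, h).
            q, k, v = self.qkv(tokens).chunk(3, dim=-1)
            rel = boxes[:, None, :] - boxes[None, :, :]   # (N, N, 4) offsets
            bias = self.geo_mlp(rel).squeeze(-1)          # (N, N) geometric bias
            attn = (q @ k.t()) * self.scale + bias        # biased logits
            return attn.softmax(dim=-1) @ v               # refined RoI features

    # Toy usage: 5 proposals with 64-dim features.
    out = GeometryBiasedAttention(64)(torch.randn(5, 64), torch.rand(5, 4))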
Going Deeper With Directly-Trained Larger Spiking Neural Networks
Spiking neural networks (SNNs) offer bio-plausible coding of spatio-temporal information and event-driven signal processing, and are well suited for energy-efficient implementation in neuromorphic hardware. However,
the unique working mode of SNNs makes them more difficult to train than
traditional networks. Currently, there are two main routes to explore the
training of deep SNNs with high performance. The first is to convert a
pre-trained ANN model to its SNN version, which usually requires a long coding
window for convergence and cannot exploit the spatio-temporal features during
training for solving temporal tasks. The other is to directly train SNNs in the
spatio-temporal domain. But due to the binary spike activity of the firing
function and the problem of gradient vanishing or explosion, current methods
are restricted to shallow architectures and thereby struggle to harness large-scale datasets (e.g., ImageNet). To this end, we propose a
threshold-dependent batch normalization (tdBN) method based on the emerging
spatio-temporal backpropagation, termed "STBP-tdBN", enabling direct training
of a very deep SNN and the efficient implementation of its inference on
neuromorphic hardware. With the proposed method and elaborated shortcut
connection, we significantly extend directly-trained SNNs from a shallow
structure (<10 layers) to a very deep structure (50 layers). Furthermore, we
theoretically analyze the effectiveness of our method based on "Block Dynamical
Isometry" theory. Finally, we report superior accuracy results including 93.15
% on CIFAR-10, 67.8 % on DVS-CIFAR10, and 67.05% on ImageNet with very few
timesteps. To our best knowledge, it's the first time to explore the
directly-trained deep SNNs with high performance on ImageNet.Comment: 12 pages, 6 figures, conference or other essential inf
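The core normalization is easy to sketch. Below is a hedged Python (PyTorch) rendering of threshold-dependent batch normalization on spiking pre-activations of shape (T, N, C, H, W): each channel is normalized jointly over the time, batch, and spatial dimensions and rescaled toward the firing threshold. Learnable affine parameters and the exact training integration are omitted; names and defaults are illustrative.

    import torch

    def td_batch_norm(x, v_th=1.0, alpha=1.0, eps=1e-5):
        # x: (T, N, C, H, W) membrane pre-activations over T timesteps.
        dims = (0, 1, 3, 4)                  # all dims except the channel dim
        mean = x.mean(dim=dims, keepdim=True)
        var = x.var(dim=dims, unbiased=False, keepdim=True)
        # Normalize, then scale to the order of the firing threshold v_th.
        return alpha * v_th * (x - mean) / torch.sqrt(var + eps)

    # Toy usage: T=4 timesteps, batch of 2, 8 channels, 16x16 feature maps.
    y = td_batch_norm(torch.randn(4, 2, 8, 16, 16))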
Fine-grained Recognition with Learnable Semantic Data Augmentation
Fine-grained image recognition is a longstanding computer vision challenge
that focuses on differentiating objects belonging to multiple subordinate
categories within the same meta-category. Since images belonging to the same
meta-category usually share similar visual appearances, mining discriminative
visual cues is the key to distinguishing fine-grained categories. Although
commonly used image-level data augmentation techniques have achieved great
success in generic image classification problems, they are rarely applied in
fine-grained scenarios, because their random editing-region behavior is prone
to destroy the discriminative visual cues residing in the subtle regions. In
this paper, we propose diversifying the training data at the feature-level to
alleviate the discriminative region loss problem. Specifically, we produce
diversified augmented samples by translating image features along semantically
meaningful directions. The semantic directions are estimated with a covariance
prediction network, which predicts a sample-wise covariance matrix to adapt to
the large intra-class variation inherent in fine-grained images. Furthermore,
the covariance prediction network is jointly optimized with the classification
network in a meta-learning manner to alleviate the degenerate solution problem.
Experiments on four competitive fine-grained recognition benchmarks
(CUB-200-2011, Stanford Cars, FGVC-Aircraft, NABirds) demonstrate that our
method significantly improves the generalization performance on several popular
classification networks (e.g., ResNets, DenseNets, EfficientNets, RegNets and
ViT). Combined with a recently proposed method, our semantic data augmentation
approach achieves state-of-the-art performance on the CUB-200-2011 dataset. The
source code will be released.
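The feature-level augmentation admits a compact sketch. The Python (PyTorch) code below translates each feature along a random direction drawn from a per-sample Gaussian whose covariance is predicted from the feature itself; for brevity the covariance is diagonal and the meta-learned optimization of the covariance network is omitted, so this is an illustrative assumption rather than the paper's exact procedure.

    import torch
    import torch.nn as nn

    class SemanticAugment(nn.Module):
        def __init__(self, feat_dim):
            super().__init__()
            # Predicts a per-sample diagonal covariance (Softplus keeps it positive).
            self.cov_net = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.Softplus())

        def forward(self, feats, strength=0.5):
            # feats: (B, feat_dim). Sample a semantic direction per feature and
            # translate along it; strength controls the augmentation magnitude.
            std = self.cov_net(feats).sqrt()
            return feats + strength * std * torch.randn_like(feats)

    # Toy usage: augment a batch of 16 features of dimension 128 during training.
    aug = SemanticAugment(128)(torch.randn(16, 128))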
Rethinking Triplet Loss for Domain Adaptation
The gap in data distribution motivates domain adaptation research. In this area, image classification intrinsically requires the source and target features to be co-located if they are of the same class. However, many works only take a global view of the domain gap, i.e., they make the data distributions globally overlap, which does not necessarily lead to feature co-location at the class level. To resolve this problem, we study metric learning in the context of domain adaptation. Specifically, we introduce a similarity guided constraint (SGC). In the implementation, SGC takes the form of a triplet loss, integrated into the network as an additional objective term. Here, an image triplet consists of two images of the same class and another image of a different class. Albeit simple, the working mechanism of our method is interesting and insightful. Importantly, images in the triplets are sampled from both the source and target domains. From a micro perspective, by enforcing this constraint on every possible triplet, images from different domains but of the same class are mapped nearby, and those of different classes are kept far apart. From a macro perspective, our method ensures that cross-domain similarities are preserved, leading to intra-class compactness and inter-class separability. Extensive experiments on four datasets show that our method yields significant improvement over the baselines and achieves accuracy competitive with the state of the art.
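Since SGC is implemented as a triplet loss over cross-domain triplets, a minimal Python (PyTorch) sketch is straightforward: the anchor and the same-class positive come from different domains, and a margin pushes the different-class negative away. The sampling strategy and margin value here are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def cross_domain_triplet(anchor, positive, negative, margin=0.3):
        # anchor: source-domain features (B, D); positive: target-domain
        # features of the same class; negative: features of a different class.
        d_pos = F.pairwise_distance(anchor, positive)   # pull same class together
        d_neg = F.pairwise_distance(anchor, negative)   # push different class apart
        return F.relu(d_pos - d_neg + margin).mean()

    # Toy usage: batch of 8 triplets with 256-dim features.
    loss = cross_domain_triplet(torch.randn(8, 256), torch.randn(8, 256),
                                torch.randn(8, 256))

PyTorch's built-in nn.TripletMarginLoss computes the same objective once the triplets are sampled.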
Pixel Adapter: A Graph-Based Post-Processing Approach for Scene Text Image Super-Resolution
Current scene text image super-resolution approaches primarily focus on extracting robust features, acquiring text information, and designing complex training
strategies to generate super-resolution images. However, the upsampling module,
which is crucial in the process of converting low-resolution images to
high-resolution ones, has received little attention in existing works. To address this issue, we propose the Pixel Adapter Module (PAM), which uses graph attention to correct pixel distortion caused by upsampling. The PAM effectively
captures local structural information by allowing each pixel to interact with
its neighbors and update features. Unlike previous graph attention mechanisms,
our approach achieves 2-3 orders of magnitude improvement in efficiency and
memory utilization by eliminating the dependency on sparse adjacency matrices
and introducing a sliding window approach for efficient parallel computation.
Additionally, we introduce the MLP-based Sequential Residual Block (MSRB) for
robust feature extraction from text images, and a Local Contour Awareness loss to enhance the model's perception of details.
Comprehensive experiments on TextZoom demonstrate that our proposed method
generates high-quality super-resolution images, surpassing existing methods in
recognition accuracy. For single-stage and multi-stage strategies, we achieved
improvements of 0.7% and 2.6%, respectively, increasing the performance from 52.6% and 53.7% to 53.3% and 56.3%. The code is available at
https://github.com/wenyu1009/RTSRN
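To illustrate the sliding-window idea, the Python (PyTorch) sketch below lets every pixel attend to its k x k neighborhood, gathered with unfold so that all windows run in parallel without any sparse adjacency matrix. The dot-product scoring and the absence of learned projections are simplifying assumptions; PAM's actual update rule is richer.

    import torch
    import torch.nn.functional as F

    def window_pixel_attention(feats, k=3):
        # feats: (N, C, H, W) upsampled features; each pixel attends to its
        # k x k neighborhood (zero-padded at the borders).
        n, c, h, w = feats.shape
        neigh = F.unfold(feats, k, padding=k // 2)      # (N, C*k*k, H*W)
        neigh = neigh.view(n, c, k * k, h * w)
        center = feats.view(n, c, 1, h * w)             # query = center pixel
        attn = (center * neigh).sum(1, keepdim=True) / c ** 0.5
        attn = attn.softmax(dim=2)                      # weights over the window
        out = (attn * neigh).sum(dim=2)                 # (N, C, H*W)
        return out.view(n, c, h, w)

    # Toy usage: refine upsampled 32x32 features with 3x3 pixel attention.
    refined = window_pixel_attention(torch.randn(2, 16, 32, 32))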