    Multi-Energy Blended CBCT Spectral Imaging Using a Spectral Modulator with Flying Focal Spot (SMFFS)

    Cone-beam CT (CBCT) spectral imaging has great potential in medical and industrial applications, but it is very challenging because scatter and spectral effects are severely entangled. In this work, we present the first attempt to develop stationary spectral modulator with flying focal spot (SMFFS) technology as a promising, low-cost approach that accurately solves the X-ray scattering problem and physically enables spectral imaging in a unified framework, with no significant misalignment in the data sampling of spectral projections. Based on an in-depth analysis of optimal energy separation from different combinations of modulator materials and thicknesses, we present a practical design of a mixed two-dimensional spectral modulator that can generate multi-energy blended CBCT spectral projections. To deal with the entangled scatter-spectral challenge, we propose a novel scatter-decoupled material decomposition (SDMD) method that takes advantage of the scatter similarity in SMFFS. A Monte Carlo simulation is conducted to validate the strong similarity of X-ray scatter distributions across the flying focal spot positions. Both numerical simulations using a clinical abdominal CT dataset and physics experiments on a tabletop CBCT system using a GAMMEX multi-energy CT phantom are carried out to demonstrate the feasibility of our proposed SDMD method for CBCT spectral imaging with SMFFS. In the physics experiments, the mean relative error in selected ROIs for virtual monochromatic images (VMI) is 0.9% for SMFFS, and 5.3% and 16.9% for an 80/120 kV dual-energy cone-beam scan with and without scatter correction, respectively. Our preliminary results show that SMFFS can effectively improve the quantitative imaging performance of CBCT.
    Comment: 10 pages, 13 figures
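
    A minimal NumPy sketch of the scatter-similarity argument behind SDMD (an illustration, not the authors' pipeline): two interleaved projections from the flying focal spot positions share one slowly varying scatter field, so their difference is scatter-free to first order. All arrays and magnitudes below are assumed for illustration.

        import numpy as np

        rng = np.random.default_rng(0)
        shape = (64, 64)                            # toy detector grid

        primary_a = rng.uniform(0.4, 1.0, shape)    # primary fluence, focal spot A
        primary_b = rng.uniform(0.4, 1.0, shape)    # primary fluence, focal spot B
        scatter = 0.3 * np.ones(shape)              # slowly varying scatter, shared

        p_a = primary_a + scatter                   # measured projection, spot A
        p_b = primary_b + scatter + 1e-3 * rng.standard_normal(shape)

        diff = p_a - p_b                            # scatter cancels to first order
        print(np.abs(diff - (primary_a - primary_b)).max())   # ~1e-3 residual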

    LumiGAN: Unconditional Generation of Relightable 3D Human Faces

    Unsupervised learning of 3D human faces from unstructured 2D image data is an active research area. While recent works have achieved an impressive level of photorealism, they commonly lack control of lighting, which prevents the generated assets from being deployed in novel environments. To this end, we introduce LumiGAN, an unconditional Generative Adversarial Network (GAN) for 3D human faces with a physically based lighting module that enables relighting under novel illumination at inference time. Unlike prior work, LumiGAN can create realistic shadow effects using an efficient visibility formulation that is learned in a self-supervised manner. LumiGAN generates plausible physical properties for relightable faces, including surface normals, diffuse albedo, and specular tint, without any ground-truth data. In addition to relightability, we demonstrate significantly improved geometry generation compared to state-of-the-art non-relightable 3D GANs and notably better photorealism than existing relightable GANs.
    Comment: Project page: https://boyangdeng.com/projects/lumiga
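
    The relighting step described above can be pictured as standard diffuse shading with a learned per-light visibility term. The PyTorch sketch below is a minimal version of that computation under stated assumptions: the tensor shapes, names, and discrete light set are illustrative, not LumiGAN's actual module.

        import torch
        import torch.nn.functional as F

        N, L = 1024, 16                                  # surface points, light directions
        normals = F.normalize(torch.randn(N, 3), dim=-1) # predicted surface normals
        albedo = torch.rand(N, 3)                        # diffuse albedo in [0, 1]
        light_dirs = F.normalize(torch.randn(L, 3), dim=-1)
        light_rgb = torch.rand(L, 3)                     # per-direction radiance
        visibility = torch.rand(N, L)                    # learned visibility in [0, 1]

        cos = (normals @ light_dirs.T).clamp(min=0.0)    # (N, L) clamped n.l
        shaded = albedo * ((visibility * cos) @ light_rgb)   # (N, 3) diffuse color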

    Higher superconducting transition temperature by breaking the universal pressure relation

    By investigating the bulk superconducting state via dc magnetization measurements, we have discovered a common resurgence of the superconducting transition temperatures (Tcs) of the monolayer Bi2Sr2CuO6+δ (Bi2201) and bilayer Bi2Sr2CaCu2O8+δ (Bi2212) beyond the maximum Tcs (Tc-maxs) predicted by the universal relation between Tc and doping (p) or pressure (P) at higher pressures. The Tc of under-doped Bi2201 initially increases from 9.6 K at ambient pressure to a peak of ~23 K at ~26 GPa and then drops as expected from the universal Tc-P relation. However, at pressures above ~40 GPa, Tc rises rapidly without any sign of saturation, reaching ~30 K at ~51 GPa. Similarly, the Tc of the slightly overdoped Bi2212 increases after passing a broad valley between 20 and 36 GPa and reaches ~90 K without any sign of saturation at ~56 GPa. We therefore attribute this Tc resurgence to a possible pressure-induced electronic transition in the cuprate compounds due to a charge transfer between the Cu 3d_(x^2-y^2) and the O 2p bands projected from a hybrid bonding state, leading to an increase of the density of states at the Fermi level, in agreement with our density functional theory calculations. Similar Tc-P behavior has also been reported in the trilayer Bi2Sr2Ca2Cu3O10+δ (Bi2223). These observations suggest that Tcs higher than those previously reported for the layered cuprate high-temperature superconductors can be achieved by breaking away from the universal Tc-P relation through the application of higher pressures.
    Comment: 13 pages, including 5 figures

    Improving Detection in Aerial Images by Capturing Inter-Object Relationships

    In many image domains, the spatial distribution of objects in a scene exhibits meaningful patterns governed by their semantic relationships. In most modern detection pipelines, however, detection proposals are processed independently, overlooking the underlying relationships between objects. In this work, we introduce a transformer-based approach that captures these inter-object relationships to refine the classification and regression outcomes for detected objects. Building on two-stage detectors, we tokenize the region of interest (RoI) proposals for processing by a transformer encoder. Specific spatial and geometric relations are incorporated into the attention weights and adaptively modulated and regularized. Experimental results demonstrate that the proposed method achieves consistent performance improvements on three benchmarks: DOTA-v1.0, DOTA-v1.5, and HRSC 2016, ranking first on both DOTA-v1.5 and HRSC 2016. Specifically, our method improves over the baselines by 1.59 mAP on DOTA-v1.0, 4.88 mAP on DOTA-v1.5, and 2.1 mAP on HRSC 2016.
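
    One way to read "spatial and geometric relations incorporated into the attention weights" is as an additive geometric bias on the attention logits between RoI tokens. The PyTorch sketch below shows that idea for a single head; the log-distance bias and shared projections are illustrative assumptions, not the paper's exact formulation.

        import torch

        def geometry_biased_attention(tokens, centers, scale):
            """tokens: (N, d) RoI features; centers: (N, 2) box centers."""
            d = tokens.shape[-1]
            q = k = v = tokens                        # single head, shared projection
            logits = q @ k.T / d ** 0.5               # (N, N) content scores
            dist = torch.cdist(centers, centers)      # (N, N) pairwise center distances
            logits = logits - scale * torch.log1p(dist)   # nearer boxes attend more
            return torch.softmax(logits, dim=-1) @ v

        tokens = torch.randn(8, 256)                  # 8 tokenized proposals
        centers = torch.rand(8, 2) * 1024             # box centers in pixels
        out = geometry_biased_attention(tokens, centers, scale=torch.tensor(1.0))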

    Going Deeper With Directly-Trained Larger Spiking Neural Networks

    Spiking neural networks (SNNs) are promising for bio-plausible coding of spatio-temporal information and event-driven signal processing, making them well suited for energy-efficient implementation in neuromorphic hardware. However, the unique working mode of SNNs makes them more difficult to train than traditional networks. Currently, there are two main routes to training deep SNNs with high performance. The first is to convert a pre-trained ANN model to its SNN version, which usually requires a long coding window for convergence and cannot exploit spatio-temporal features during training for solving temporal tasks. The other is to directly train SNNs in the spatio-temporal domain. However, due to the binary spike activity of the firing function and the problem of vanishing or exploding gradients, current methods are restricted to shallow architectures and therefore struggle to harness large-scale datasets (e.g., ImageNet). To this end, we propose a threshold-dependent batch normalization (tdBN) method based on the emerging spatio-temporal backpropagation, termed "STBP-tdBN", enabling direct training of very deep SNNs and efficient implementation of their inference on neuromorphic hardware. With the proposed method and elaborated shortcut connections, we significantly extend directly-trained SNNs from shallow structures (<10 layers) to very deep structures (50 layers). Furthermore, we theoretically analyze the effectiveness of our method based on "Block Dynamical Isometry" theory. Finally, we report superior accuracy: 93.15% on CIFAR-10, 67.8% on DVS-CIFAR10, and 67.05% on ImageNet with very few timesteps. To the best of our knowledge, this is the first work to achieve high performance on ImageNet with directly-trained deep SNNs.
    Comment: 12 pages, 6 figures, conference or other essential info
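
    A sketch of the tdBN idea as the abstract describes it: pre-activations are normalized jointly over the batch and time dimensions, then rescaled by the firing threshold V_th so their range matches the spiking dynamics. The alpha factor and tensor layout are assumptions based on the description, not the reference implementation.

        import torch

        def td_batch_norm(x, gamma, beta, v_th=1.0, alpha=1.0, eps=1e-5):
            """x: (T, N, C, H, W) pre-activations over T timesteps."""
            mean = x.mean(dim=(0, 1, 3, 4), keepdim=True)            # per-channel stats
            var = x.var(dim=(0, 1, 3, 4), unbiased=False, keepdim=True)
            x_hat = (x - mean) / torch.sqrt(var + eps)
            return alpha * v_th * gamma * x_hat + beta               # threshold-scaled

        x = torch.randn(4, 8, 16, 32, 32)                            # T=4, batch of 8
        gamma = torch.ones(1, 1, 16, 1, 1)                           # learnable scale
        beta = torch.zeros(1, 1, 16, 1, 1)                           # learnable shift
        y = td_batch_norm(x, gamma, beta)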

    Fine-grained Recognition with Learnable Semantic Data Augmentation

    Fine-grained image recognition is a longstanding computer vision challenge that focuses on differentiating objects belonging to multiple subordinate categories of the same meta-category. Since images belonging to the same meta-category usually share similar visual appearances, mining discriminative visual cues is the key to distinguishing fine-grained categories. Although commonly used image-level data augmentation techniques have achieved great success in generic image classification, they are rarely applied in fine-grained scenarios because their random editing regions are prone to destroying the discriminative visual cues that reside in subtle regions. In this paper, we propose diversifying the training data at the feature level to alleviate the discriminative region loss problem. Specifically, we produce diversified augmented samples by translating image features along semantically meaningful directions. The semantic directions are estimated with a covariance prediction network, which predicts a sample-wise covariance matrix to adapt to the large intra-class variation inherent in fine-grained images. Furthermore, the covariance prediction network is jointly optimized with the classification network in a meta-learning manner to alleviate the degenerate solution problem. Experiments on four competitive fine-grained recognition benchmarks (CUB-200-2011, Stanford Cars, FGVC-Aircraft, NABirds) demonstrate that our method significantly improves the generalization performance of several popular classification networks (e.g., ResNets, DenseNets, EfficientNets, RegNets, and ViT). Combined with a recently proposed method, our semantic data augmentation approach achieves state-of-the-art performance on the CUB-200-2011 dataset. The source code will be released.
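
    The feature-level augmentation can be sketched as translating each deep feature by noise drawn from a sample-wise covariance. In the PyTorch sketch below, a stand-in cov_net predicts a diagonal covariance; this is an assumption for illustration, whereas the paper optimizes the covariance prediction network jointly with the classifier in a meta-learning loop.

        import torch

        feat_dim = 128
        cov_net = torch.nn.Sequential(               # stand-in: predicts diagonal covariance
            torch.nn.Linear(feat_dim, feat_dim), torch.nn.Softplus())

        def augment(features, strength=0.5):
            """features: (N, d) deep features -> (N, d) augmented features."""
            var = cov_net(features)                  # (N, d) per-sample variances
            noise = torch.randn_like(features) * (strength * var).sqrt()
            return features + noise                  # translate along semantic directions

        x = torch.randn(32, feat_dim)
        x_aug = augment(x)                           # diversified training features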

    Rethinking Triplet Loss for Domain Adaptation

    The gap in data distribution motivates domain adaptation research. In this area, image classification intrinsically requires the source and target features to be co-located if they are of the same class. However, many works take only a global view of the domain gap: they make the data distributions globally overlap, which does not necessarily lead to feature co-location at the class level. To resolve this problem, we study metric learning in the context of domain adaptation. Specifically, we introduce a similarity guided constraint (SGC). In the implementation, SGC takes the form of a triplet loss, integrated into the network as an additional objective term. Here, an image triplet consists of two images of the same class and a third image of a different class. Albeit simple, the working mechanism of our method is interesting and insightful. Importantly, images in the triplets are sampled from both the source and target domains. From a micro perspective, by enforcing this constraint on every possible triplet, images from different domains but of the same class are mapped nearby, while those of different classes are kept far apart. From a macro perspective, our method ensures that cross-domain similarities are preserved, leading to intra-class compactness and inter-class separability. Extensive experiments on four datasets show that our method yields significant improvement over the baselines and achieves accuracy competitive with the state of the art.
    This research was conducted by the Australian Research Council Centre of Excellence for Robotic Vision (project number CE140100016). Liang Zheng is the recipient of an Australian Research Council Discovery Early Career Award (project number DE200101283) funded by the Australian Government. Jianbin Jiao is supported by the NSFC under Grant 61771447. This article was recommended by Associate Editor H. Men
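
    A minimal sketch of the similarity guided constraint as a cross-domain triplet loss: the anchor and positive share a class but come from different domains, while the negative belongs to another class. The margin value and batch construction below are assumptions for illustration.

        import torch
        import torch.nn.functional as F

        def cross_domain_triplet(anchor, positive, negative, margin=0.3):
            """Each argument: (N, d) embeddings; returns a scalar loss."""
            d_pos = F.pairwise_distance(anchor, positive)   # same class, pull together
            d_neg = F.pairwise_distance(anchor, negative)   # different class, push apart
            return F.relu(d_pos - d_neg + margin).mean()

        a = torch.randn(16, 256)   # source-domain anchors
        p = torch.randn(16, 256)   # same-class, target-domain positives
        n = torch.randn(16, 256)   # different-class negatives
        loss = cross_domain_triplet(a, p, n)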

    Pixel Adapter: A Graph-Based Post-Processing Approach for Scene Text Image Super-Resolution

    Current scene text image super-resolution approaches primarily focus on extracting robust features, acquiring text information, and designing complex training strategies to generate super-resolution images. However, the upsampling module, which is crucial for converting low-resolution images to high-resolution ones, has received little attention in existing works. To fill this gap, we propose the Pixel Adapter Module (PAM), based on graph attention, to address the pixel distortion caused by upsampling. The PAM effectively captures local structural information by allowing each pixel to interact with its neighbors and update its features. Unlike previous graph attention mechanisms, our approach achieves a 2-3 orders of magnitude improvement in efficiency and memory utilization by eliminating the dependency on sparse adjacency matrices and introducing a sliding-window approach for efficient parallel computation. Additionally, we introduce the MLP-based Sequential Residual Block (MSRB) for robust feature extraction from text images, and a Local Contour Awareness loss (L_lca) to enhance the model's perception of details. Comprehensive experiments on TextZoom demonstrate that our proposed method generates high-quality super-resolution images, surpassing existing methods in recognition accuracy. For single-stage and multi-stage strategies, we achieve improvements of 0.7% and 2.6%, respectively, increasing performance from 52.6% and 53.7% to 53.3% and 56.3%. The code is available at https://github.com/wenyu1009/RTSRN.
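
    The sliding-window idea that replaces sparse adjacency matrices can be sketched with unfold: each pixel attends densely to its k x k neighborhood, so the whole map is processed in parallel. The single-head form and names below are assumptions, not the released RTSRN code.

        import torch
        import torch.nn.functional as F

        def window_pixel_attention(x, k=3):
            """x: (B, C, H, W) -> (B, C, H, W); each pixel mixes its k*k neighbors."""
            B, C, H, W = x.shape
            neigh = F.unfold(x, k, padding=k // 2)           # (B, C*k*k, H*W)
            neigh = neigh.view(B, C, k * k, H * W)           # neighbor features
            q = x.view(B, C, 1, H * W)                       # query = center pixel
            attn = torch.softmax((q * neigh).sum(1, keepdim=True) / C ** 0.5, dim=2)
            out = (attn * neigh).sum(2)                      # weighted neighbor sum
            return out.view(B, C, H, W)

        y = window_pixel_attention(torch.randn(2, 16, 24, 24))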