More ferroelectrics discovered by switching spectroscopy piezoresponse force microscopy?
The local hysteresis loop obtained by switching spectroscopy piezoresponse
force microscopy (SS-PFM) is usually regarded as a typical signature of
ferroelectric switching. However, such hysteresis loops have also been observed
in a broad variety of non-ferroelectric materials over the past several years,
which casts doubt on the viewpoint that the local hysteresis loops in SS-PFM
originate from ferroelectricity. It is therefore crucial to explore the
mechanism behind the local hysteresis loops obtained in SS-PFM testing. Here we
propose that non-ferroelectric materials can also exhibit amplitude butterfly
loops and phase hysteresis loops in SS-PFM testing, owing to the Maxwell force,
as long as the material shows macroscopic D-E hysteresis loops under cyclic
electric-field loading, whatever the inherent physical mechanism. To verify
this viewpoint, both macroscopic D-E and microscopic SS-PFM tests were
conducted on a soda-lime glass and on the non-ferroelectric dielectric material
Ba0.4Sr0.6TiO3. The results show that both materials exhibit D-E hysteresis
loops and SS-PFM phase hysteresis loops, which supports our viewpoint.
Comment: 12 pages, 4 figures
ALR-GAN: Adaptive Layout Refinement for Text-to-Image Synthesis
We propose a novel Text-to-Image Generation Network, Adaptive Layout
Refinement Generative Adversarial Network (ALR-GAN), to adaptively refine the
layout of synthesized images without any auxiliary information. The ALR-GAN
includes an Adaptive Layout Refinement (ALR) module and a Layout Visual
Refinement (LVR) loss. The ALR module aligns the layout structure (which refers
to locations of objects and background) of a synthesized image with that of its
corresponding real image. In the ALR module, we propose an Adaptive Layout
Refinement (ALR) loss to balance the matching of hard and easy features, for
more efficient layout structure matching. Based on the refined layout
structure, the LVR loss further refines the visual representation within the
layout area. Experimental results on two widely-used datasets show that ALR-GAN
performs competitively on the Text-to-Image generation task.
Comment: Accepted by TM
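As an illustrative sketch only, a matching loss that balances hard and easy features, in the spirit of the ALR loss described above, might re-weight per-feature errors so that hard (large-error) features dominate; the weighting scheme and the `gamma` exponent here are assumptions, not the authors' formulation:

```python
import numpy as np

def layout_matching_loss(synth, real, gamma=2.0):
    """Hypothetical balanced matching loss: per-feature squared errors are
    re-weighted so that hard (large-error) features contribute more.
    A generic sketch, not the paper's exact ALR loss."""
    err = (synth - real) ** 2            # per-feature matching error
    w = err ** gamma                     # hard features get larger weights
    w = w / (w.sum() + 1e-8)             # normalise weights to sum to 1
    return float((w * err).sum())

rng = np.random.default_rng(0)
real = rng.standard_normal(16)           # layout features of a real image
close = real + 0.01 * rng.standard_normal(16)   # easy match
far = real + 1.0 * rng.standard_normal(16)      # hard match
print(layout_matching_loss(close, real) < layout_matching_loss(far, real))  # True
```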
MHSA-Net: Multi-Head Self-Attention Network for Occluded Person Re-Identification
This paper presents a novel person re-identification model, named Multi-Head
Self-Attention Network (MHSA-Net), to prune unimportant information and capture
key local information from person images. MHSA-Net contains two main novel
components: Multi-Head Self-Attention Branch (MHSAB) and Attention Competition
Mechanism (ACM). The MHSAB adaptively captures key local person information,
and then produces effective diversity embeddings of an image for the person
matching. The ACM further helps filter out attention noise and non-key
information. Through extensive ablation studies, we verified that the Multi-Head
Self-Attention Branch and the Attention Competition Mechanism both contribute
to the performance improvement of MHSA-Net. Our MHSA-Net
achieves state-of-the-art performance especially on images with occlusions. We
have released our models (and will release the source code after the paper is
accepted) at https://github.com/hongchenphd/MHSA-Net.
Comment: Submitted to IEEE Transactions on Image Processing (TIP)
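The core mechanism the abstract names, multi-head self-attention, can be sketched minimally as follows; the random projections stand in for learned weights, and this is a generic illustration of the mechanism rather than the MHSA-Net architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, num_heads, rng=None):
    """Minimal multi-head self-attention over a sequence of feature vectors.

    x: (seq_len, d_model). Random projections stand in for learned weights."""
    seq_len, d_model = x.shape
    assert d_model % num_heads == 0
    d_head = d_model // num_heads
    rng = rng or np.random.default_rng(0)
    heads = []
    for _ in range(num_heads):
        Wq = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
        Wk = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
        Wv = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        attn = softmax(q @ k.T / np.sqrt(d_head))   # (seq_len, seq_len)
        heads.append(attn @ v)                      # (seq_len, d_head)
    return np.concatenate(heads, axis=-1)           # (seq_len, d_model)

feat = np.random.default_rng(1).standard_normal((6, 8))  # e.g. 6 local parts
out = multi_head_self_attention(feat, num_heads=2)
print(out.shape)  # (6, 8)
```

Each head can attend to a different local region, which is what allows such a branch to produce diverse embeddings of the same image.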
Fine-grained Text and Image Guided Point Cloud Completion with CLIP Model
This paper focuses on the recently popular task of point cloud completion
guided by multimodal information. Although existing methods have achieved
excellent performance by fusing auxiliary images, there are still some
deficiencies, including the poor generalization ability of the model and
insufficient fine-grained semantic information in the extracted features. In this
work, we propose a novel multimodal fusion network for point cloud completion,
which can simultaneously fuse visual and textual information to predict the
semantic and geometric characteristics of incomplete shapes effectively.
Specifically, to overcome the lack of prior information caused by the
small-scale dataset, we employ a pre-trained vision-language model that is
trained on a large number of image-text pairs; the textual and visual encoders
of this large-scale model therefore have stronger generalization ability.
Then, we propose a multi-stage feature fusion strategy to fuse the textual and
visual features into the backbone network progressively. Meanwhile, to further
explore the effectiveness of fine-grained text descriptions for point cloud
completion, we also build a text corpus with fine-grained descriptions, which
can provide richer geometric details for 3D shapes. The rich text descriptions
can be used for training and evaluating our network. Extensive quantitative and
qualitative experiments demonstrate the superior performance of our method
compared to state-of-the-art point cloud completion networks.
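As a hedged illustration of the multi-stage fusion idea (not the paper's network), one stage might concatenate point, text, and image features and feed a projected residual back into the point branch; all dimensions, the random weights, and the `tanh` projection are assumptions made for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_stage(point_feat, text_feat, image_feat, w):
    """One hypothetical fusion stage: project the concatenated modalities
    back to the point-feature width and add it as a residual."""
    joint = np.concatenate([point_feat, text_feat, image_feat])  # (d_p+d_t+d_i,)
    return point_feat + np.tanh(w @ joint)                       # (d_p,)

d_p, d_t, d_i = 16, 8, 8
point_feat = rng.standard_normal(d_p)
text_feat = rng.standard_normal(d_t)    # e.g. from a pretrained text encoder
image_feat = rng.standard_normal(d_i)   # e.g. from a pretrained image encoder

# Progressive fusion across several backbone stages, each with its own weights.
for _ in range(3):
    w = rng.standard_normal((d_p, d_p + d_t + d_i)) / np.sqrt(d_p + d_t + d_i)
    point_feat = fuse_stage(point_feat, text_feat, image_feat, w)

print(point_feat.shape)  # (16,)
```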
SSPNet: Predicting visual saliency shifts
When images undergo quality degradation caused by editing, compression or transmission, their saliency tends to shift away from its original position. Saliency shifts indicate changes in visual behaviour and therefore contain vital information about the perception of visual content and its distortions. Given a pristine image and a distorted version of it, we want to be able to detect the saliency shifts induced by the distortions. The resulting saliency shift map (SSM) can be used to identify the region and degree of visual distraction caused by distortions, and consequently to perceptually optimise image coding or enhancement algorithms. To this end, we first create the largest-of-its-kind eye-tracking database, comprising 60 pristine images and their associated 540 distorted versions, viewed by 96 subjects. We then propose a computational model to predict the SSM, utilising transformers and convolutional neural networks. Experimental results demonstrate that the proposed model is highly effective in detecting distortion-induced saliency shifts in natural images.
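In its simplest form, a saliency shift map of the kind described can be illustrated as a signed difference between normalised saliency maps of the pristine and distorted images; this toy computation conveys the general idea only and is not the proposed transformer-based model:

```python
import numpy as np

def saliency_shift_map(sal_pristine, sal_distorted):
    """Signed difference between two saliency maps, each normalised to [0, 1].

    Positive values mark regions that attract *more* attention after
    distortion; negative values mark regions that lose attention."""
    norm = lambda s: (s - s.min()) / (s.max() - s.min() + 1e-8)
    return norm(sal_distorted) - norm(sal_pristine)

# Toy maps: a blocky artefact draws attention to the top-left corner.
pristine = np.zeros((4, 4)); pristine[2, 2] = 1.0
distorted = np.zeros((4, 4)); distorted[0, 0] = 1.0
ssm = saliency_shift_map(pristine, distorted)
print(ssm[0, 0], ssm[2, 2])   # strong positive shift at (0,0), negative at (2,2)
```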
Blind image quality assessment via adaptive graph attention
Recent advancements in blind image quality assessment (BIQA) are primarily propelled by deep learning technologies. While leveraging transformers can effectively capture long-range dependencies and contextual details in images, the significance of local information in image quality assessment can be undervalued. To address this challenging problem, we propose a novel feature enhancement framework tailored for BIQA. Specifically, we devise an Adaptive Graph Attention (AGA) module to simultaneously augment both local and contextual information. It not only refines the post-transformer features into an adaptive graph, facilitating local information enhancement, but also exploits interactions amongst diverse feature channels. The proposed technique can better reduce redundant information introduced during feature updates compared to traditional convolution layers, streamlining the self-updating process for feature maps. Experimental results show that our proposed model outperforms state-of-the-art BIQA models in predicting the perceived quality of images. The code of the model will be made publicly available.
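As a generic sketch of graph-style attention over feature nodes (not the AGA module itself), one update step might build a k-nearest-neighbour graph from feature similarity and aggregate each node's neighbours with attention weights; the value of `k` and the dot-product similarity are assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def graph_attention_update(feats, k=3):
    """One hypothetical adaptive-graph-attention step: connect each node to
    its k most similar nodes, then update it as an attention-weighted
    average of those neighbours. A generic sketch, not the AGA module."""
    n = len(feats)
    sim = feats @ feats.T                      # pairwise dot-product similarity
    updated = np.empty_like(feats)
    for i in range(n):
        nbrs = np.argsort(sim[i])[-k:]         # top-k similar nodes (usually incl. self)
        w = softmax(sim[i, nbrs])              # attention weights over neighbours
        updated[i] = w @ feats[nbrs]
    return updated

feats = np.random.default_rng(0).standard_normal((5, 4))  # 5 feature nodes
out = graph_attention_update(feats)
print(out.shape)  # (5, 4)
```

Because the graph is rebuilt from the features themselves, the neighbourhood structure adapts as the feature maps are updated.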
Cross-Modal Semantic Matching Generative Adversarial Networks for Text-to-Image Synthesis
Synthesizing photo-realistic images from text descriptions is a challenging image generation problem. Although many recent approaches have significantly advanced the performance of text-to-image generation, guaranteeing semantic matching between the text description and the synthesized image remains very challenging. In this paper, we propose a new model, Cross-modal Semantic Matching Generative Adversarial Networks (CSM-GAN), to improve the semantic consistency between a text description and the synthesized image for fine-grained text-to-image generation. Two new modules are proposed in CSM-GAN: a Text Encoder Module (TEM) and a Textual-Visual Semantic Matching Module (TVSMM). TVSMM aims to make a synthesized image and its corresponding text description closer, in the global semantic embedding space, than mismatched pairs. This improves semantic consistency and, consequently, the generalizability of CSM-GAN. In TEM, we introduce Text Convolutional Neural Networks (Text_CNNs) to capture and highlight local visual features in textual descriptions. Thorough experiments on two public benchmark datasets demonstrate the superiority of CSM-GAN over representative state-of-the-art methods.
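The matching objective TVSMM pursues, pulling matched image-text pairs closer than mismatched ones in a shared embedding space, is commonly realised with a hinge-style loss; the following is a generic formulation with an assumed margin, not the paper's exact objective:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def matching_hinge_loss(img, txt_pos, txt_neg, margin=0.2):
    """Hinge loss pushing the matched image-text pair to be more similar
    than a mismatched pair by at least `margin` (a generic formulation)."""
    return max(0.0, margin - cosine(img, txt_pos) + cosine(img, txt_neg))

rng = np.random.default_rng(0)
img = rng.standard_normal(8)                      # image embedding
txt_match = img + 0.05 * rng.standard_normal(8)   # near-duplicate: matched text
txt_mismatch = rng.standard_normal(8)             # unrelated: mismatched text
loss = matching_hinge_loss(img, txt_match, txt_mismatch)
print(loss)
```

Minimising such a loss over many (matched, mismatched) triples is what tightens the global semantic embedding space.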