Transcending the Limit of Local Window: Advanced Super-Resolution Transformer with Adaptive Token Dictionary
Single image super-resolution (SR) is a classic computer vision problem that
involves estimating high-resolution (HR) images from low-resolution (LR) ones.
Although deep neural networks (DNNs), especially Transformers for
super-resolution, have advanced significantly in recent years, challenges
remain, particularly the limited receptive field caused by window-based
self-attention. To address this issue, we introduce a group of auxiliary
adaptive token dictionaries into the SR Transformer and establish the ATD-SR
method. The introduced token dictionary learns prior information from the
training data and adapts the learned prior to a specific test image through an
adaptive refinement step. This refinement strategy not only provides global
information to all input tokens but also groups image tokens into categories.
Based on these category partitions, we further propose a category-based
self-attention mechanism that leverages distant but similar tokens to enhance
input features. Experimental results show that our method achieves the best
performance on various single image super-resolution benchmarks.
Comment: 15 pages, 9 figures
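The abstract describes two steps. First, tokens attend to a dictionary to pick up global information and are grouped by the entry they match best. Second, attention is computed within each group rather than within a local window. The plain-Python sketch below illustrates only that idea under simplifying assumptions: dot-product similarity, single-head attention with no learned projections, and a fixed dictionary (the adaptive refinement step is omitted). The helper names are hypothetical. It is not the authors' ATD-SR implementation.

```python
import math
from collections import defaultdict


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def dictionary_attention(tokens, dictionary):
    """Each token attends to every dictionary entry (hypothetical, no projections).

    Returns a global context vector per token and a category label, taken as
    the dictionary entry that receives the largest attention weight.
    """
    contexts, labels = [], []
    for t in tokens:
        weights = softmax([dot(t, d) for d in dictionary])
        contexts.append(weighted_sum(weights, dictionary))
        labels.append(max(range(len(dictionary)), key=lambda k: weights[k]))
    return contexts, labels


def category_attention(tokens, labels):
    """Self-attention restricted to tokens that share a category label.

    Tokens that are far apart in the image can still attend to each other
    if they share a label, which a fixed local window would not allow.
    """
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)
    scale = 1.0 / math.sqrt(len(tokens[0]))
    out = [None] * len(tokens)
    for members in groups.values():
        for i in members:
            weights = softmax([dot(tokens[i], tokens[j]) * scale for j in members])
            out[i] = weighted_sum(weights, [tokens[j] for j in members])
    return out


# Toy example: tokens 0 and 2 share a category even if spatially distant.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
dictionary = [[1.0, 0.0], [0.0, 1.0]]
_, labels = dictionary_attention(tokens, dictionary)
enhanced = category_attention(tokens, labels)
```

Grouping by the argmax of the token-to-dictionary attention ties the two steps together. The same similarity scores that supply global context also decide which tokens attend to each other.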
Empowering Collaborative Filtering with Principled Adversarial Contrastive Loss
Contrastive learning (CL) has achieved impressive performance in
self-supervised learning tasks, showing superior generalization ability.
Inspired by this success, adopting CL in collaborative filtering (CF) has become
prevalent in semi-supervised top-K recommendation. The basic idea is to
routinely conduct heuristic-based data augmentation and apply contrastive
losses (e.g., InfoNCE) to the augmented views. Yet some CF-specific challenges
make this adoption suboptimal, such as the out-of-distribution issue, the
risk of false negatives, and the nature of top-K evaluation. These challenges
require a CL-based CF scheme to focus more on mining hard negatives and on
distinguishing false negatives among the vast unlabeled user-item interactions
to obtain informative contrast signals. Worse still, contrastive loss in CF
methods is poorly understood, especially with respect to its generalization
ability. To bridge this gap, we delve into the reasons underpinning the success
of contrastive loss in CF and propose a principled Adversarial InfoNCE loss
(AdvInfoNCE), a variant of InfoNCE specially tailored for CF methods.
AdvInfoNCE adaptively explores and assigns hardness to each negative instance
in an adversarial fashion and further utilizes a fine-grained hardness-aware
ranking criterion to strengthen the recommender's generalization ability. By
training CF models with AdvInfoNCE on both synthetic and real-world benchmark
datasets, we validate its effectiveness and demonstrate its ability to
generalize and mitigate out-of-distribution problems. Given the theoretical
guarantees and empirical superiority of AdvInfoNCE over most contrastive loss
functions, we advocate its adoption as a standard loss in recommender systems,
particularly for out-of-distribution tasks. Code is available at
https://github.com/LehengTHU/AdvInfoNCE.
Comment: Accepted to NeurIPS 202
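For reference, the standard InfoNCE loss for one user, with positive score s⁺, negative scores s_j, and temperature τ, is −log(exp(s⁺/τ) / (exp(s⁺/τ) + Σ_j exp(s_j/τ))). The sketch below computes it with a numerically stable log-sum-exp. For illustration only, it models "hardness" as an additive offset δ_j on each negative's logit. This is a simplified stand-in and not necessarily the exact AdvInfoNCE formulation. The adversarial optimization of δ, including any constraints on it, is omitted, and the names `hardness` and `hardness_gradient` are hypothetical. The gradient helper shows that ∂L/∂δ_j equals the softmax probability of negative j. An adversary ascending this gradient therefore raises the offsets of negatives the model already scores highly the most.

```python
import math


def logsumexp(values):
    m = max(values)  # shift by the max to avoid overflow in exp
    return m + math.log(sum(math.exp(v - m) for v in values))


def info_nce(pos_score, neg_scores, tau=1.0, hardness=None):
    """InfoNCE for one user; `hardness` is a hypothetical per-negative offset.

    With all offsets at zero this is plain InfoNCE:
        -log( exp(s+/tau) / (exp(s+/tau) + sum_j exp(s_j/tau)) )
    """
    if hardness is None:
        hardness = [0.0] * len(neg_scores)
    pos_logit = pos_score / tau
    neg_logits = [s / tau + d for s, d in zip(neg_scores, hardness)]
    return logsumexp([pos_logit] + neg_logits) - pos_logit


def hardness_gradient(pos_score, neg_scores, tau=1.0, hardness=None):
    """dLoss/d(offset_j): the softmax probability assigned to negative j."""
    if hardness is None:
        hardness = [0.0] * len(neg_scores)
    logits = [pos_score / tau] + [s / tau + d for s, d in zip(neg_scores, hardness)]
    lse = logsumexp(logits)
    return [math.exp(v - lse) for v in logits[1:]]


plain = info_nce(1.0, [0.0, 0.5])
harder = info_nce(1.0, [0.0, 0.5], hardness=[0.0, 1.0])  # second negative made harder
```

Raising an offset increases that negative's share of the denominator and thus the loss. This is the sense in which a larger offset marks a negative as "harder" in this sketch.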
Video Super-Resolution Transformer with Masked Inter&Intra-Frame Attention
Recently, Vision Transformers have achieved great success in recovering missing
details in low-resolution sequences, i.e., the video super-resolution (VSR)
task. Despite their superior VSR accuracy, their heavy computational burden
and large memory footprint hinder the deployment of Transformer-based VSR
models on constrained devices. In this paper, we address this issue by
proposing a novel feature-level masked processing framework: VSR with Masked
Intra- and Inter-frame Attention (MIA-VSR). The core of MIA-VSR is leveraging
feature-level temporal continuity between adjacent frames to reduce redundant
computations and make more rational use of previously enhanced SR features.
Concretely, we propose an intra-frame and inter-frame attention block that
takes the respective roles of past features and input features into
consideration and only exploits previously enhanced features to provide
supplementary information. In addition, an adaptive block-wise mask prediction
module is developed to skip unimportant computations according to the feature
similarity between adjacent frames. We conduct detailed ablation studies to
validate our contributions and compare the proposed method with recent
state-of-the-art VSR approaches. The experimental results demonstrate that
MIA-VSR improves memory and computational efficiency over state-of-the-art
methods without sacrificing PSNR accuracy. The code is available at
https://github.com/LabShuHangGU/MIA-VSR.
Comment: Accepted by CVPR 202
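The block-wise skipping idea can be illustrated with a fixed rule. In the sketch below, a block is recomputed only when the cosine similarity between its current and previous features falls below a threshold. Otherwise the previously enhanced output for that block is reused. This rule is an assumption for illustration; MIA-VSR instead predicts the mask adaptively with a learned module. The names `block_mask`, `masked_update`, and `enhance`, the 1-D feature layout, and the 0.99 threshold are all hypothetical.

```python
import math


def split_blocks(feature, block):
    """Split a 1-D feature list into consecutive blocks of length `block`."""
    return [feature[i:i + block] for i in range(0, len(feature), block)]


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0 if na == nb else 0.0  # treat two all-zero blocks as identical
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def block_mask(prev_feat, curr_feat, block=4, threshold=0.99):
    """True = recompute the block, False = reuse the previous enhanced output."""
    pairs = zip(split_blocks(prev_feat, block), split_blocks(curr_feat, block))
    return [cosine(p, c) < threshold for p, c in pairs]


def masked_update(prev_feat, prev_out, curr_feat, enhance, block=4, threshold=0.99):
    """Run the (hypothetical) `enhance` function only on blocks that changed."""
    mask = block_mask(prev_feat, curr_feat, block, threshold)
    out = []
    for need, cur, old in zip(mask, split_blocks(curr_feat, block),
                              split_blocks(prev_out, block)):
        out.extend(enhance(cur) if need else old)
    return out, mask


prev_feat = [1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 3.0, 4.0]
curr_feat = [1.0, 1.0, 1.0, 1.0, 4.0, 3.0, 2.0, 1.0]  # only the second block changed
prev_out = [9.0] * 8
out, mask = masked_update(prev_feat, prev_out, curr_feat,
                          enhance=lambda b: [2 * x for x in b])
```

Here only the second block is passed to `enhance`. The first block's earlier output is copied through, which is where the computation savings come from.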
Design and Application of Intelligent Transportation Multi-Source Data Collaboration Framework Based on Digital Twins
Growing urban traffic problems require transportation systems to handle large amounts of data. To address the redundancy of data types and the low coordination rate in current intelligent transportation systems (ITS), this paper proposes an improved digital twin architecture for ITS. Based on this architecture, a framework for collaboration between dynamic and static data in ITS is constructed. This paper describes the various collaboration methods and their scopes, and designs the framework and interfaces for data mapping. Finally, the effectiveness of the framework is verified through case studies that mine the spatiotemporal distribution characteristics of the data, capture human travel characteristics, and visualize intersections using digital twins. This paper provides a new data fusion approach for digital twin systems in ITS, and the framework covers all data types in digital twin systems for cross-integration analysis.
Experimental Study on the Comparison between Network Microstructure Titanium Matrix Composites and Ti6Al4V on EDM Milling
Network microstructure titanium matrix composites (NMTMCs), with Ti6Al4V as the matrix and network-distributed TiB whiskers (TiBw) as the reinforcement, show remarkable potential for diverse applications owing to their superior physical properties. Because titanium matrix composites are difficult to machine, electrical discharge machining (EDM) is one of the preferred machining techniques for NMTMCs. Nevertheless, the degraded surface quality and the recast layer significantly affect the performance of EDM-machined workpieces. Therefore, to improve surface quality and suppress defects in NMTMCs, this study conducted comparative EDM milling experiments on NMTMCs and Ti6Al4V to analyze the effects of discharge capacitance, charging current, and pulse interval on the surface roughness, recast layer thickness, recast layer uniformity, and surface microcrack density of both materials. The results indicated that machining energy significantly influences workpiece surface quality. Furthermore, comparative experiments on the influence of the network reinforcement on EDM milling revealed that NMTMCs have a higher melting point, which leads to an accumulation phenomenon in low-energy machining, where the reinforcement cannot be completely removed. The residual reinforcement in the recast layer exerted an adsorption effect on the molten metal, which affected the thermal conductivity and uniformity within the recast layer. Finally, specific guidelines are proposed for optimizing surface roughness, recast layer thickness, and uniformity while minimizing microcrack density; following them yields a surface roughness of Ra 0.9 μm, an average recast layer thickness of 6 μm with a range of 8 μm, and a surface microcrack density of 0.08 μm⁻¹.
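Two of the reported metrics can be made concrete. Ra is the arithmetic mean of the absolute deviations of a profile's heights from its mean line. A density reported in μm⁻¹ is consistent with total crack length divided by inspected area. The helpers below assume those definitions, since the abstract does not give the paper's exact measurement procedure, and use toy numbers only. For instance, 200 μm of total crack length in a 50 μm × 50 μm field gives 0.08 μm⁻¹.

```python
def arithmetic_mean_roughness(heights_um):
    """Ra: mean absolute deviation of sampled profile heights from their mean line."""
    mean = sum(heights_um) / len(heights_um)
    return sum(abs(z - mean) for z in heights_um) / len(heights_um)


def microcrack_density(crack_lengths_um, area_um2):
    """Assumed definition: total crack length per unit area, in um^-1."""
    return sum(crack_lengths_um) / area_um2


ra = arithmetic_mean_roughness([1.0, -1.0, 1.0, -1.0])          # toy profile, um
density = microcrack_density([100.0, 60.0, 40.0], 50.0 * 50.0)  # toy cracks in a 50 x 50 um field
```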
Enhancement of Imaging Quality of Interferenceless Coded Aperture Correlation Holography Based on Physics-Informed Deep Learning
Interferenceless coded aperture correlation holography (I-COACH) was recently introduced for recording incoherent holograms without two-wave interference. In I-COACH, the light radiated from an object is modulated by a pseudo-randomly coded phase mask and recorded as a hologram by a digital camera without interfering with any other beam. The image is reconstructed by correlating the object hologram with the point spread hologram. However, images reconstructed with the conventional correlation algorithm suffer from severe background noise, which leads to poor imaging quality. In this work, by effectively combining speckle correlation with a neural network, we propose a high-quality reconstruction strategy based on physics-informed deep learning. Specifically, this method takes the autocorrelation of the speckle image as the network input, switching from a direct mapping between the object and the image to a mapping between their autocorrelations. Incorporating prior physical knowledge improves the interpretability of the neural network, thereby reducing its data dependence and computational cost. In addition, once the final model is obtained, image reconstruction requires only a single camera exposure. Experimental results demonstrate that the background noise is effectively suppressed and the resolution of the reconstructed images is improved threefold.
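Under the usual incoherent, shift-invariant model, the recorded hologram is the object convolved with the point spread hologram (PSH). Conventional reconstruction cross-correlates the hologram with the PSH, so every object point returns the PSH autocorrelation: a sharp peak plus nonzero side lobes. This is one way to see where the background noise described in the abstract comes from. The 1-D, circular, plain-Python sketch below uses toy PSH values and hypothetical function names. It also checks the identity behind autocorrelation-based inputs: the hologram's autocorrelation equals the object's autocorrelation convolved with the PSH's, independent of where the object sits. The physics-informed network itself is not reproduced.

```python
def circular_convolve(x, h):
    """(x * h)[k] = sum_m x[m] h[k - m], indices modulo N."""
    n = len(x)
    return [sum(x[m] * h[(k - m) % n] for m in range(n)) for k in range(n)]


def circular_correlate(a, b):
    """corr[k] = sum_i a[i + k] b[i], indices modulo N."""
    n = len(a)
    return [sum(a[(i + k) % n] * b[i] for i in range(n)) for k in range(n)]


def autocorrelation(x):
    return circular_correlate(x, x)


# Toy pseudo-random PSH and a single-point object at position 5.
psh = [3, 0, 1, 0, 0, 2, 0, 1]
point = [0, 0, 0, 0, 0, 1, 0, 0]
hologram = circular_convolve(point, psh)             # PSH shifted to position 5
reconstruction = circular_correlate(hologram, psh)   # peak at 5, side lobes elsewhere

# Two-point object: the hologram's autocorrelation factorizes.
obj = [0, 1, 0, 0, 2, 0, 0, 0]
lhs = autocorrelation(circular_convolve(obj, psh))
rhs = circular_convolve(autocorrelation(obj), autocorrelation(psh))
```

The off-peak values of `reconstruction` are the PSH autocorrelation side lobes. Because `lhs == rhs` holds for any object, the autocorrelation representation depends on the object's structure but not on its position.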
Flexible Image Reconstruction in the Orbital Angular Momentum Holography with Binarized Airy Lens
Orbital angular momentum (OAM) holography has paved the way toward holographic information systems with ultrahigh capacity. However, its practical applicability is limited by the complicated optical setup and the non-adjustable intensity and position of the reconstructed image. Here, a decoding method is proposed that uses a binarized phase map derived from an autofocusing Airy beam. By adjusting the parameters of the phase map, the position and intensity distribution of the reconstructed image become flexibly adjustable. In addition, the cross-talk between different image channels is effectively reduced thanks to the abrupt autofocusing capability of Airy beams. As a result, the quality and practicability of OAM holography are greatly enhanced.
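One simple way to turn an abruptly autofocusing Airy profile into a binary phase element is to keep only the sign of its radial amplitude Ai((r₀ − r)/w). The phase is then 0 where the amplitude is positive and π where it is negative. The sketch below assumes that construction, which is not necessarily the paper's exact phase map. It omits any vortex phase used for OAM decoding, and the parameter names `r0` (main-ring radius) and `w` (ring-width scale) are hypothetical. Ai is evaluated from its Maclaurin series, which is accurate for the moderate arguments used here. Since Ai(x) > 0 for all x ≥ 0, the series is needed only for negative x.

```python
import math

# Ai(0) and -Ai'(0) from their closed forms via the Gamma function.
C1 = 1.0 / (3 ** (2 / 3) * math.gamma(2 / 3))
C2 = 1.0 / (3 ** (1 / 3) * math.gamma(1 / 3))


def airy_ai(x, terms=60):
    """Ai(x) = C1*f(x) - C2*g(x) from the Maclaurin series (fine for -8 <= x <= 2)."""
    x3 = x ** 3
    f_term, g_term = 1.0, x
    f_sum, g_sum = f_term, g_term
    for k in range(terms):
        f_term *= x3 / ((3 * k + 2) * (3 * k + 3))  # f'' = x f
        g_term *= x3 / ((3 * k + 3) * (3 * k + 4))  # g'' = x g
        f_sum += f_term
        g_sum += g_term
    return C1 * f_sum - C2 * g_sum


def binarized_airy_phase(r, r0, w):
    """Hypothetical binary phase (0 or pi) from the sign of Ai((r0 - r) / w)."""
    x = (r0 - r) / w
    if x >= 0:  # Ai is positive on the whole non-negative axis
        return 0.0
    return 0.0 if airy_ai(x) >= 0 else math.pi


phases = [binarized_airy_phase(r, r0=0.2, w=0.1) for r in (0.0, 0.3, 0.5)]
```

Changing `r0` or `w` moves and rescales the 0/π rings. This sketch only illustrates the kind of parameter dependence the abstract describes; it does not model the resulting focus position.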