10 research outputs found

    Transcending the Limit of Local Window: Advanced Super-Resolution Transformer with Adaptive Token Dictionary

    Single Image Super-Resolution is a classic computer vision problem that involves estimating high-resolution (HR) images from low-resolution (LR) ones. Although deep neural networks (DNNs), especially Transformers for super-resolution, have seen significant advancements in recent years, challenges remain, particularly the limited receptive field caused by window-based self-attention. To address this issue, we introduce a group of auxiliary Adaptive Token Dictionaries to the SR Transformer and establish the ATD-SR method. The introduced token dictionary can learn prior information from training data and adapt the learned prior to a specific test image through an adaptive refinement step. The refinement strategy not only provides global information to all input tokens but also groups image tokens into categories. Based on the category partitions, we further propose a category-based self-attention mechanism designed to leverage distant but similar tokens for enhancing input features. The experimental results show that our method achieves the best performance on various single image super-resolution benchmarks. Comment: 15 pages, 9 figures

    Empowering Collaborative Filtering with Principled Adversarial Contrastive Loss

    Contrastive Learning (CL) has achieved impressive performance in self-supervised learning tasks, showing superior generalization ability. Inspired by this success, adopting CL in collaborative filtering (CF) is prevailing in semi-supervised top-K recommendations. The basic idea is to routinely conduct heuristic-based data augmentation and apply contrastive losses (e.g., InfoNCE) on the augmented views. Yet some CF-specific challenges make this adoption suboptimal, such as the out-of-distribution issue, the risk of false negatives, and the nature of top-K evaluation. They require the CL-based CF scheme to focus more on mining hard negatives and distinguishing false negatives from the vast unlabeled user-item interactions, in order to obtain informative contrast signals. Worse still, there is limited understanding of contrastive loss in CF methods, especially w.r.t. its generalization ability. To bridge the gap, we delve into the reasons underpinning the success of contrastive loss in CF, and propose a principled Adversarial InfoNCE loss (AdvInfoNCE), a variant of InfoNCE specially tailored for CF methods. AdvInfoNCE adaptively explores and assigns hardness to each negative instance in an adversarial fashion and further utilizes a fine-grained hardness-aware ranking criterion to empower the recommender's generalization ability. Training CF models with AdvInfoNCE, we validate its effectiveness on both synthetic and real-world benchmark datasets, thus showing its generalization ability to mitigate out-of-distribution problems. Given the theoretical guarantees and empirical superiority of AdvInfoNCE over most contrastive loss functions, we advocate its adoption as a standard loss in recommender systems, particularly for out-of-distribution tasks. Code is available at https://github.com/LehengTHU/AdvInfoNCE. Comment: Accepted to NeurIPS 202
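
    For readers unfamiliar with the baseline loss named in this abstract, the following is a minimal sketch of the standard InfoNCE loss for one user, one positive item, and a set of sampled negatives. It is not the authors' AdvInfoNCE: the function name `info_nce`, the cosine-similarity scoring, and the temperature value are illustrative assumptions, and the adversarial hardness weighting described above is deliberately left out.

```python
import numpy as np

def info_nce(user, pos_item, neg_items, temperature=0.1):
    """Standard InfoNCE for one (user, positive) pair and N negatives.

    Hypothetical helper for illustration only; embeddings are scored by
    cosine similarity, which is an assumption, not taken from the abstract.
    """
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    u = normalize(np.asarray(user, dtype=float))
    p = normalize(np.asarray(pos_item, dtype=float))
    n = normalize(np.asarray(neg_items, dtype=float))

    pos_logit = (u @ p) / temperature      # scalar score of the positive
    neg_logits = (n @ u) / temperature     # shape (N,), scores of negatives
    logits = np.concatenate(([pos_logit], neg_logits))

    # Numerically stable log-sum-exp over positive and negative scores.
    m = logits.max()
    log_denominator = m + np.log(np.exp(logits - m).sum())

    # -log( exp(pos) / sum(exp(all)) )
    return log_denominator - pos_logit
```

    The loss is near zero when the positive item is far more similar to the user than every negative, and grows when a negative outscores the positive, which is why hard negatives dominate the gradient.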

    Video Super-Resolution Transformer with Masked Inter&Intra-Frame Attention

    Recently, Vision Transformer has achieved great success in recovering missing details in low-resolution sequences, i.e., the video super-resolution (VSR) task. Despite its superiority in VSR accuracy, the heavy computational burden and large memory footprint hinder the deployment of Transformer-based VSR models on constrained devices. In this paper, we address this issue by proposing a novel feature-level masked processing framework: VSR with Masked Intra and inter frame Attention (MIA-VSR). The core of MIA-VSR is leveraging feature-level temporal continuity between adjacent frames to reduce redundant computations and make more rational use of previously enhanced SR features. Concretely, we propose an intra-frame and inter-frame attention block which takes the respective roles of past features and input features into consideration and only exploits previously enhanced features to provide supplementary information. In addition, an adaptive block-wise mask prediction module is developed to skip unimportant computations according to feature similarity between adjacent frames. We conduct detailed ablation studies to validate our contributions and compare the proposed method with recent state-of-the-art VSR approaches. The experimental results demonstrate that MIA-VSR improves memory and computation efficiency over state-of-the-art methods without trading off PSNR accuracy. The code is available at https://github.com/LabShuHangGU/MIA-VSR. Comment: Accepted by CVPR 202

    Design and Application of Intelligent Transportation Multi-Source Data Collaboration Framework Based on Digital Twins

    Growing urban traffic problems mean that transportation systems require large amounts of data. To address the current problems of data-type redundancy and low coordination rate in intelligent transportation systems (ITS), this paper proposes an improved digital twin architecture applicable to ITS. Based on the improved digital twin architecture, a framework for dynamic and static data collaboration in ITS is constructed. For each collaboration method, this paper describes its approach and scope, and designs the framework and interfaces for data mapping. Finally, the effectiveness of the framework is verified by case studies that mine the spatiotemporal distribution characteristics of data, capture human travel characteristics, and visualize intersections using digital twins. This paper provides a new data fusion idea for digital twin systems in ITS, and the framework covers all data types in digital twin systems for cross-integration analysis.

    Experimental Study on the Comparison between Network Microstructure Titanium Matrix Composites and Ti6Al4V on EDM Milling

    Network microstructure titanium matrix composites (NMTMCs), featuring Ti6Al4V as the matrix and network-distributed TiB whiskers (TiBw) as reinforcement, exhibit remarkable potential for diverse applications due to their superior physical properties. Because titanium matrix composites are difficult to machine, electrical discharge machining (EDM) is one of the preferred machining techniques for NMTMCs. Nevertheless, the compromised surface quality and the recast layer significantly impact the performance of workpieces machined by EDM. Therefore, to enhance the surface quality and restrain the defects of NMTMCs, this study conducted comparative EDM milling experiments between NMTMCs and Ti6Al4V to analyze the effects of discharge capacitance, charging current, and pulse interval on the surface roughness, recast layer thickness, recast layer uniformity, and surface microcrack density of both materials. The results indicated that machining energy significantly influences workpiece surface quality. Furthermore, comparative experiments exploring the influence of network reinforcement on EDM milling revealed that NMTMCs have a higher melting point, leading to an accumulation phenomenon in low-energy machining where the reinforcement could not be completely removed. The residual reinforcement in the recast layer had an adsorption effect on molten metal, affecting the thermal conductivity and uniformity within the recast layer. Finally, specific guidelines are put forward for optimizing the material's surface roughness, recast layer thickness, and uniformity, along with minimizing microcrack density; these achieve a roughness of Ra 0.9 μm, an average recast layer thickness of 6 μm with a range of 8 μm, and a surface microcrack density of 0.08 μm⁻¹.

    Enhancement of Imaging Quality of Interferenceless Coded Aperture Correlation Holography Based on Physics-Informed Deep Learning

    Interferenceless coded aperture correlation holography (I-COACH) was recently introduced for recording incoherent holograms without two-wave interference. In I-COACH, the light radiated from an object is modulated by a pseudo-randomly-coded phase mask and recorded as a hologram by a digital camera without interfering with any other beams. The image reconstruction is conducted by correlating the object hologram with the point spread hologram. However, the image reconstructed by the conventional correlation algorithm suffers from serious background noise, which leads to poor imaging quality. In this work, via an effective combination of speckle correlation and a neural network, we propose a high-quality reconstruction strategy based on physics-informed deep learning. Specifically, this method takes the autocorrelation of the speckle image as the input of the network, and switches from establishing a direct mapping between the object and the image to a mapping between the autocorrelations of the two. This method improves the interpretability of neural networks through prior physics knowledge, thereby reducing the data dependence and computational cost. In addition, once a final model is obtained, the image reconstruction can be completed with one camera exposure. Experimental results demonstrate that the background noise can be effectively suppressed, and the resolution of the reconstructed images can be enhanced threefold.
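
    The network input mentioned in this abstract, the autocorrelation of a speckle image, can be sketched with the Wiener-Khinchin relation (inverse FFT of the power spectrum). This is a generic illustration, not the authors' preprocessing: the function name `autocorrelation`, the mean subtraction, the circular (periodic) boundary, and the peak normalization are all assumptions.

```python
import numpy as np

def autocorrelation(image):
    """Circular 2-D autocorrelation of an image via the FFT.

    Hypothetical helper for illustration; mean removal and peak
    normalization are assumptions, not taken from the abstract.
    """
    img = np.asarray(image, dtype=float)
    img = img - img.mean()                # remove the constant background

    spectrum = np.fft.fft2(img)
    power = np.abs(spectrum) ** 2         # power spectrum
    acorr = np.fft.ifft2(power).real      # Wiener-Khinchin: inverse FFT of power

    acorr = np.fft.fftshift(acorr)        # move zero lag to the array centre
    peak = acorr.max()
    return acorr / peak if peak > 0 else acorr
```

    After `fftshift`, the zero-lag peak sits at index `(H // 2, W // 2)`, and the map is point-symmetric about that peak, as expected for the autocorrelation of a real-valued image.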

    Flexible Image Reconstruction in the Orbital Angular Momentum Holography with Binarized Airy Lens

    Orbital angular momentum (OAM) holography has marked a path to achieving ultrahigh-capacity holographic information systems. However, the practical applicability of OAM holography is limited by the complicated optical setup and the unadjustable image intensity and position. Here, a decoding method is proposed that uses a binarized phase map derived from an autofocusing Airy beam. By adjusting the parameters of the phase map, the position and intensity distribution of the reconstructed image become flexibly adjustable. In addition, the cross-talk between different image channels can be effectively reduced thanks to the abruptly autofocusing capability of Airy beams. As a result, the quality and practicability of OAM holography can be greatly enhanced.