
    GAGA: Deciphering Age-path of Generalized Self-paced Regularizer

    Self-paced learning (SPL) is an important machine learning paradigm that mimics the cognitive process of humans and animals. The SPL regime involves a self-paced regularizer and a gradually increasing age parameter; the age parameter plays a key role in SPL, but where to optimally terminate its growth is still non-trivial to determine. A natural idea is to compute the solution path w.r.t. the age parameter (i.e., the age-path). However, current age-path algorithms are either limited to the simplest regularizer or lack solid theoretical understanding and computational efficiency. To address this challenge, we propose a novel Generalized Age-path Algorithm (GAGA) for SPL with various self-paced regularizers, based on ordinary differential equations (ODEs) and sets control, which can learn the entire solution spectrum w.r.t. a range of age parameters. To the best of our knowledge, GAGA is the first exact path-following algorithm tackling the age-path for general self-paced regularizers. Finally, the algorithmic steps for classic SVM and Lasso are described in detail. We demonstrate the performance of GAGA on real-world datasets and find a considerable speedup of our algorithm over competing baselines.
    Comment: 33 pages. Published as a conference paper at NeurIPS 202
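    For orientation, here is a minimal sketch of the classic alternating SPL scheme with the hard self-paced regularizer, swept over a grid of age parameters. This is not the GAGA path-following algorithm itself (which tracks the exact solution path via ODEs); the function names, toy data, and the least-squares loss are illustrative assumptions.

```python
import numpy as np

def self_paced_least_squares(X, y, lam, n_iters=20):
    """Alternating SPL with the hard regularizer f(v; lam) = -lam * sum(v).

    Under this regularizer the sample weights have the closed form
    v_i = 1 if per-sample loss < lam else 0. Here lam is the age
    parameter: growing it gradually admits "harder" samples.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        losses = (X @ w - y) ** 2                 # per-sample squared loss
        v = (losses < lam).astype(float)          # closed-form easy/hard weights
        if v.sum() == 0:                          # age too small: no sample admitted
            return w
        Xv = X * v[:, None]                       # rows scaled by weights
        # weighted normal equations (X^T V X) w = X^T V y, with a tiny ridge
        w = np.linalg.solve(Xv.T @ X + 1e-8 * np.eye(d), Xv.T @ y)
    return w

# grid-based approximation of the age-path (GAGA computes the exact path)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
path = {lam: self_paced_least_squares(X, y, lam) for lam in (0.5, 2.0, 8.0)}
```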

    SGE: Structured Light System Based on Gray Code with an Event Camera

    Fast and accurate depth sensing has long been a significant research challenge. The event camera, a device that responds quickly to intensity changes, provides a new solution for structured light (SL) systems. In this paper, we introduce Gray code into event-based SL systems for the first time. Our setup includes an event camera and a Digital Light Processing (DLP) projector, enabling depth estimation through high-speed projection and decoding of Gray code patterns. By employing spatio-temporal encoding for point matching, our method is immune to timestamp noise, realizing high-speed depth estimation without loss of accuracy. The binary nature of events and Gray code minimizes data redundancy, enabling us to fully utilize the sensor bandwidth. Experimental results show that our approach achieves accuracy comparable to state-of-the-art scanning methods while surpassing them in data acquisition speed by up to 41 times. Our proposed approach offers a highly promising solution for ultra-fast, real-time, high-precision dense depth estimation. Code and dataset will be made publicly available.
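    The Gray code machinery the abstract relies on is standard and easy to sketch: generate bit-plane patterns whose consecutive projector columns differ in one bit, then decode per-pixel binarized captures back to column indices. The event-camera capture and triangulation pipeline is not reproduced here; function names and the toy shapes are illustrative assumptions.

```python
import numpy as np

def gray_patterns(width: int, num_bits: int):
    """Bit-planes (MSB first) of the Gray codeword of each projector column.

    Consecutive columns differ in exactly one bit, so a single bit
    error displaces the match by at most one column.
    """
    cols = np.arange(width)
    gray = cols ^ (cols >> 1)                     # binary -> Gray
    return [((gray >> (num_bits - 1 - k)) & 1).astype(np.uint8)
            for k in range(num_bits)]

def decode_columns(bitplanes):
    """Recover projector columns from per-pixel binarized captures (MSB first)."""
    g = np.zeros(bitplanes[0].shape, dtype=np.int64)
    for b in bitplanes:
        g = (g << 1) | b                          # reassemble the Gray codeword
    n, mask = g.copy(), g >> 1
    while mask.any():                             # Gray -> binary: n = g ^ g>>1 ^ g>>2 ...
        n ^= mask
        mask >>= 1
    return n

# round-trip sanity check on a toy 1024-column pattern set
pats = gray_patterns(1024, 10)
captures = [np.tile(p, (4, 1)) for p in pats]     # pretend 4-row camera captures
assert np.array_equal(decode_columns(captures), np.tile(np.arange(1024), (4, 1)))
```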

    Transcending the Limit of Local Window: Advanced Super-Resolution Transformer with Adaptive Token Dictionary

    Single image super-resolution is a classic computer vision problem that involves estimating high-resolution (HR) images from low-resolution (LR) ones. Although deep neural networks (DNNs), especially Transformers for super-resolution, have seen significant advancements in recent years, challenges remain, particularly the limited receptive field caused by window-based self-attention. To address this issue, we introduce a group of auxiliary Adaptive Token Dictionaries into the SR Transformer, establishing the ATD-SR method. The introduced token dictionary learns prior information from training data and adapts the learned prior to the specific test image through an adaptive refinement step. The refinement strategy not only provides global information to all input tokens but also groups image tokens into categories. Based on these category partitions, we further propose a category-based self-attention mechanism designed to leverage distant but similar tokens to enhance input features. Experimental results show that our method achieves the best performance on various single image super-resolution benchmarks.
    Comment: 15 pages, 9 figures
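    The core idea of attending from image tokens to a small learned dictionary can be sketched in a few lines of PyTorch. This is a heavy simplification of ATD-SR (no adaptive refinement step, no window attention, no SR head); the class name, dictionary size, and the argmax-based category assignment are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TokenDictionaryAttention(nn.Module):
    """Cross-attention from image tokens to a learned token dictionary.

    Queries come from image tokens, keys/values from the dictionary,
    so every token can read global prior information regardless of
    any local attention window.
    """
    def __init__(self, dim: int, dict_size: int = 64):
        super().__init__()
        self.dictionary = nn.Parameter(torch.randn(dict_size, dim) * 0.02)
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, tokens):                     # tokens: (B, N, C)
        q = self.q(tokens)                         # (B, N, C)
        k = self.k(self.dictionary)                # (M, C)
        v = self.v(self.dictionary)                # (M, C)
        attn = torch.softmax(q @ k.t() * self.scale, dim=-1)   # (B, N, M)
        categories = attn.argmax(dim=-1)           # (B, N): crude token grouping
        return tokens + attn @ v, categories

x = torch.randn(2, 4096, 96)                       # e.g. a 64x64 feature map, flattened
out, cats = TokenDictionaryAttention(96)(x)
```

    In the paper, the category partition feeds a category-based self-attention among distant but similar tokens; the argmax above only hints at how such a grouping could be obtained.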

    Post-Quantum κ-to-1 Trapdoor Claw-free Functions from Extrapolated Dihedral Cosets

    Noisy trapdoor claw-free functions (NTCFs) are a powerful post-quantum cryptographic tool that can efficiently constrain the actions of untrusted quantum devices. However, the original NTCF is essentially a 2-to-1 one-way function (NTCF^1_2). In this work, we extend NTCF^1_2 to achieve many-to-one trapdoor claw-free functions with polynomially bounded preimage size. Specifically, we focus on a significant extrapolation of NTCF^1_2 by drawing on extrapolated dihedral cosets, thereby giving a model of NTCF^1_κ, where κ is a polynomially bounded integer. We then present an efficient construction of NTCF^1_κ assuming the quantum hardness of the learning with errors (LWE) problem. We point out that NTCFs can be used to bridge the LWE problem and the dihedral coset problem (DCP). By leveraging NTCF^1_2 (resp. NTCF^1_κ), our work reveals a new quantum reduction path from the LWE problem to the DCP (resp. extrapolated DCP). Finally, we demonstrate that NTCF^1_κ can naturally be reduced to NTCF^1_2, thereby achieving the same application for proving quantumness.
    Comment: 34 pages, 7 figures
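    To make the 2-to-1 claw structure concrete, here is a toy, noiseless, and deliberately insecure illustration of the LWE-style branch construction f_b(x) = Ax + b·As (mod q). The real NTCF adds a discrete Gaussian error term and cryptographically sized parameters; every parameter below is a hypothetical stand-in for exposition only.

```python
import numpy as np

# Toy parameters -- far too small to be secure; illustration only.
q, n, m = 97, 4, 8
rng = np.random.default_rng(0)

A = rng.integers(0, q, size=(m, n))      # public matrix
s = rng.integers(0, q, size=n)           # secret shift, serves as the trapdoor

def f(b: int, x: np.ndarray) -> np.ndarray:
    """Noiseless toy of f_b(x) = A x + b * (A s)  (mod q).

    The genuine construction adds an error term e so the two branches
    are only statistically close; we drop it so claws check exactly.
    """
    return (A @ x + b * (A @ s)) % q

# A claw is a pair (x0, x1) with f(0, x0) == f(1, x1): take x1 = x0 - s.
x0 = rng.integers(0, q, size=n)
x1 = (x0 - s) % q
assert np.array_equal(f(0, x0), f(1, x1))   # both branches hit the same image

# With the trapdoor s, claws are immediate; without it, finding one in
# the noisy construction would require solving LWE.
```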

    AdaFuse: Adaptive Medical Image Fusion Based on Spatial-Frequential Cross Attention

    Multi-modal medical image fusion is essential for precise clinical diagnosis and surgical navigation, since it merges the complementary information of multiple modalities into a single image. The quality of the fused image depends on the extracted single-modality features as well as on the fusion rules for multi-modal information. Although existing deep learning-based fusion methods can fully exploit the semantic features of each modality, they cannot distinguish the effective low- and high-frequency information of each modality and fuse it adaptively. To address this issue, we propose AdaFuse, in which multi-modal image information is fused adaptively through a frequency-guided attention mechanism based on the Fourier transform. Specifically, we propose the cross-attention fusion (CAF) block, which adaptively fuses features of the two modalities in the spatial and frequency domains by exchanging key and query values, and then calculates the cross-attention scores between the spatial and frequency features to further guide the spatial-frequential information fusion. The CAF block enhances the high-frequency features of the different modalities so that the details in fused images can be retained. Moreover, we design a novel loss function composed of a structure loss and a content loss to preserve both low- and high-frequency information. Extensive comparison experiments on several datasets demonstrate that the proposed method outperforms state-of-the-art methods in terms of both visual quality and quantitative metrics. Ablation experiments also validate the effectiveness of the proposed loss and fusion strategy.
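    The key/query-exchange idea behind the CAF block can be sketched compactly: one modality queries the other's keys and values, and an FFT-derived amplitude gate re-weights the result to emphasise high-frequency detail. This is a toy simplification of AdaFuse, not its actual block; the class name, the 1-D FFT gate, and the token-shaped inputs are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Toy fusion block: modality A attends to modality B (exchanged
    query/key roles), then an FFT-amplitude gate boosts tokens that
    carry strong frequency content."""
    def __init__(self, dim: int):
        super().__init__()
        self.qa = nn.Linear(dim, dim)              # queries from modality A
        self.kb = nn.Linear(dim, dim)              # keys from modality B
        self.vb = nn.Linear(dim, dim)              # values from modality B
        self.scale = dim ** -0.5

    def forward(self, fa, fb):                     # fa, fb: (B, N, C) token features
        q = self.qa(fa)
        k, v = self.kb(fb), self.vb(fb)
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)  # (B, N, N)
        fused = fa + attn @ v                      # A enriched with B's content
        # frequency guidance: normalised FFT amplitude along the token axis
        amp = torch.fft.fft(fused, dim=1).abs()
        gate = amp / (amp.amax(dim=1, keepdim=True) + 1e-6)
        return fused * gate

fa, fb = torch.randn(1, 256, 64), torch.randn(1, 256, 64)
fused = CrossAttentionFusion(64)(fa, fb)           # (1, 256, 64)
```

    A symmetric pass with fa and fb swapped (B querying A) would mirror the bidirectional exchange the abstract describes.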