100 research outputs found

    Contrast, Attend and Diffuse to Decode High-Resolution Images from Brain Activities

    Full text link
    Decoding visual stimuli from neural responses recorded by functional Magnetic Resonance Imaging (fMRI) presents an intriguing intersection between cognitive neuroscience and machine learning, promising advancements in understanding human visual perception and building non-invasive brain-machine interfaces. However, the task is challenging due to the noisy nature of fMRI signals and the intricate pattern of brain visual representations. To mitigate these challenges, we introduce a two-phase fMRI representation learning framework. The first phase pre-trains an fMRI feature learner with a proposed Double-contrastive Mask Auto-encoder to learn denoised representations. The second phase tunes the feature learner to attend to neural activation patterns most informative for visual reconstruction with guidance from an image auto-encoder. The optimized fMRI feature learner then conditions a latent diffusion model to reconstruct image stimuli from brain activities. Experimental results demonstrate our model's superiority in generating high-resolution and semantically accurate images, substantially exceeding previous state-of-the-art methods by 39.34% in the 50-way-top-1 semantic classification accuracy. Our research invites further exploration of the decoding task's potential and contributes to the development of non-invasive brain-machine interfaces.Comment: 17 pages, 6 figures, conferenc

    Learning to Improve Image Compression without Changing the Standard Decoder

    Full text link
    In recent years we have witnessed an increasing interest in applying Deep Neural Networks (DNNs) to improve the rate-distortion performance in image compression. However, the existing approaches either train a post-processing DNN on the decoder side, or propose learning for image compression in an end-to-end manner. This way, the trained DNNs are required in the decoder, leading to the incompatibility to the standard image decoders (e.g., JPEG) in personal computers and mobiles. Therefore, we propose learning to improve the encoding performance with the standard decoder. In this paper, We work on JPEG as an example. Specifically, a frequency-domain pre-editing method is proposed to optimize the distribution of DCT coefficients, aiming at facilitating the JPEG compression. Moreover, we propose learning the JPEG quantization table jointly with the pre-editing network. Most importantly, we do not modify the JPEG decoder and therefore our approach is applicable when viewing images with the widely used standard JPEG decoder. The experiments validate that our approach successfully improves the rate-distortion performance of JPEG in terms of various quality metrics, such as PSNR, MS-SSIM and LPIPS. Visually, this translates to better overall color retention especially when strong compression is applied. The codes are available at https://github.com/YannickStruempler/LearnedJPEG.Comment: Accepted to ECCV AIM Worksho

    Audio-Visual Egocentric Action Recognition

    Get PDF

    Image Synthesis under Limited Data: A Survey and Taxonomy

    Full text link
    Deep generative models, which target reproducing the given data distribution to produce novel samples, have made unprecedented advancements in recent years. Their technical breakthroughs have enabled unparalleled quality in the synthesis of visual content. However, one critical prerequisite for their tremendous success is the availability of a sufficient number of training samples, which requires massive computation resources. When trained on limited data, generative models tend to suffer from severe performance deterioration due to overfitting and memorization. Accordingly, researchers have devoted considerable attention to develop novel models that are capable of generating plausible and diverse images from limited training data recently. Despite numerous efforts to enhance training stability and synthesis quality in the limited data scenarios, there is a lack of a systematic survey that provides 1) a clear problem definition, critical challenges, and taxonomy of various tasks; 2) an in-depth analysis on the pros, cons, and remain limitations of existing literature; as well as 3) a thorough discussion on the potential applications and future directions in the field of image synthesis under limited data. In order to fill this gap and provide a informative introduction to researchers who are new to this topic, this survey offers a comprehensive review and a novel taxonomy on the development of image synthesis under limited data. In particular, it covers the problem definition, requirements, main solutions, popular benchmarks, and remain challenges in a comprehensive and all-around manner.Comment: 230 references, 25 pages. GitHub: https://github.com/kobeshegu/awesome-few-shot-generatio
    corecore