1,051 research outputs found

    Few-shot Neural Radiance Fields Under Unconstrained Illumination

    In this paper, we introduce a new challenge for synthesizing novel view images in practical environments with limited input multi-view images and varying lighting conditions. Neural radiance fields (NeRF), one of the pioneering works for this task, demand an extensive set of multi-view images taken under constrained illumination, which is often unattainable in real-world settings. While some previous works have managed to synthesize novel views given images with different illumination, their performance still relies on a substantial number of input multi-view images. To address this problem, we suggest ExtremeNeRF, which utilizes multi-view albedo consistency, supported by geometric alignment. Specifically, we extract intrinsic image components that should be illumination-invariant across different views, enabling direct appearance comparison between the input and novel view under unconstrained illumination. We offer thorough experimental results for task evaluation, employing the newly created NeRF Extreme benchmark, the first in-the-wild benchmark for novel view synthesis under multiple viewing directions and varying illuminations. Comment: Project Page: https://seokyeong94.github.io/ExtremeNeRF

    DyAnNet: A Scene Dynamicity Guided Self-Trained Video Anomaly Detection Network

    Unsupervised approaches for video anomaly detection may not perform as well as supervised approaches. However, learning unknown types of anomalies using an unsupervised approach is more practical than a supervised approach, as annotation is an extra burden. In this paper, we use isolation tree-based unsupervised clustering to partition the deep feature space of the video segments. The RGB stream generates a pseudo anomaly score and the flow stream generates a pseudo dynamicity score of a video segment. These scores are then fused using a majority voting scheme to generate preliminary bags of positive and negative segments. However, these bags may not be accurate, as the scores are generated using only the current segment, which does not represent the global behavior of a typical anomalous event. We then use a refinement strategy based on a cross-branch feed-forward network designed using a popular I3D network to refine both scores. The bags are then refined through a segment re-mapping strategy. The intuition behind adding the dynamicity score of a segment to the anomaly score is to enhance the quality of the evidence. The method has been evaluated on three popular video anomaly datasets, i.e., UCF-Crime, CCTV-Fights, and UBI-Fights. Experimental results reveal that the proposed framework achieves accuracy competitive with state-of-the-art video anomaly detection methods. Comment: 10 pages, 8 figures, and 4 tables. (Accepted at WACV 2023)

    Person Re-identification in Videos by Analyzing Spatio-temporal Tubes

    Typical person re-identification frameworks search for k best matches in a gallery of images that are often collected in varying conditions. The gallery usually contains image sequences for video re-identification applications. However, such a process is time-consuming, as video re-identification involves carrying out the matching process multiple times. In this paper, we propose a new method that extracts spatio-temporal frame sequences, or tubes, of moving persons and performs the re-identification quickly. Initially, we apply a binary classifier to remove noisy images from the input query tube. In the next step, we use a key-pose detection-based query minimization technique. Finally, a hierarchical re-identification framework is proposed and used to rank the output tubes. Experiments with publicly available video re-identification datasets reveal that our framework is better than existing methods. It ranks the tubes with an average increase in the CMC accuracy of 6-8% across multiple datasets. Also, our method significantly reduces the number of false positives. A new video re-identification dataset, named Tube-based Re-identification Video Dataset (TRiViD), has been prepared with the aim of helping the re-identification research community.

    MAIR: Multi-view Attention Inverse Rendering with 3D Spatially-Varying Lighting Estimation

    We propose a scene-level inverse rendering framework that uses multi-view images to decompose the scene into geometry, an SVBRDF, and 3D spatially-varying lighting. Because multi-view images provide a variety of information about the scene, multi-view images in object-level inverse rendering have been taken for granted. However, owing to the absence of a multi-view HDR synthetic dataset, scene-level inverse rendering has mainly been studied using single-view images. We were able to successfully perform scene-level inverse rendering using multi-view images by expanding the OpenRooms dataset, designing efficient pipelines to handle multi-view images, and splitting spatially-varying lighting. Our experiments show that the proposed method not only achieves better performance than single-view-based methods, but also achieves robust performance on unseen real-world scenes. Also, our sophisticated 3D spatially-varying lighting volume allows for photorealistic object insertion in any 3D location. Comment: Accepted by CVPR 2023; Project Page is https://bring728.github.io/mair.project

    Synchronizing Vision and Language: Bidirectional Token-Masking AutoEncoder for Referring Image Segmentation

    Referring Image Segmentation (RIS) aims to segment target objects expressed in natural language within a scene at the pixel level. Various recent RIS models have achieved state-of-the-art performance by generating contextual tokens to model multimodal features from pretrained encoders and effectively fusing them using transformer-based cross-modal attention. While these methods match language features with image features to effectively identify likely target objects, they often struggle to correctly understand contextual information in complex and ambiguous sentences and scenes. To address this issue, we propose a novel bidirectional token-masking autoencoder (BTMAE) inspired by the masked autoencoder (MAE). The proposed model learns the context of image-to-language and language-to-image by reconstructing missing features in both image and language features at the token level. In other words, the image and language features complement each other, with a focus on enabling the network to understand interconnected deep contextual information between the two modalities. This learning method enhances the robustness of RIS performance in complex sentences and scenes. Our BTMAE achieves state-of-the-art performance on three popular datasets, and we demonstrate the effectiveness of the proposed method through various ablation studies.

    The impact of baryonic physics and massive neutrinos on weak lensing peak statistics

    We study the impact of baryonic processes and massive neutrinos on weak lensing peak statistics that can be used to constrain cosmological parameters. We use the BAHAMAS suite of cosmological simulations, which self-consistently include baryonic processes and the effect of massive neutrino free-streaming on the evolution of structure formation. We construct synthetic weak lensing catalogues by ray-tracing through light-cones, and use the aperture mass statistic for the analysis. The peaks detected on the maps reflect the cumulative signal from massive bound objects and general large-scale structure. We present the first study of weak lensing peaks in simulations that include both baryonic physics and massive neutrinos (summed neutrino mass $M_{\nu}$ = 0.06, 0.12, 0.24, and 0.48 eV assuming normal hierarchy), so that the uncertainty due to physics beyond the gravity of dark matter can be factored into constraints on cosmological models. Assuming a fiducial model of baryonic physics, we also investigate the correlation between peaks and massive haloes, over a range of summed neutrino mass values. As higher neutrino mass tends to suppress the formation of massive structures in the Universe, the halo mass function and lensing peak counts are therefore modified as a function of $M_{\nu}$. Over most of the S/N range, the impact of fiducial baryonic physics is greater (less) than neutrinos for 0.06 and 0.12 (0.24 and 0.48) eV models. Both baryonic physics and massive neutrinos should be accounted for when deriving cosmological parameters from weak lensing observations.

    On stable higher spin states in Heterotic String Theories

    We study properties of 1/2 BPS Higher Spin states in heterotic compactifications with extended supersymmetry. We also analyze non-BPS Higher Spin states and give explicit expressions for physical vertex operators of the first two massive levels. We then study on-shell tri-linear couplings of these Higher Spin states and confirm that BPS states with arbitrary spin cannot decay into lower spin states in perturbation theory. Finally, we consider scattering of vector bosons off higher spin BPS states and extract form factors and polarization effects in various limits. Comment: 38 pages

    A compendium and functional characterization of mammalian genes involved in adaptation to Arctic or Antarctic environments

    Many mammals are well adapted to surviving in extremely cold environments. These species have likely accumulated genetic changes that help them efficiently cope with low temperatures. It is not known whether the same genes related to cold adaptation in one species would be under selection in another species. The aims of this study therefore were: to create a compendium of mammalian genes related to adaptations to a low-temperature environment; to identify genes related to cold tolerance that have been subjected to independent positive selection in several species; and to determine promising candidate genes/pathways/organs for further empirical research on cold adaptation in mammals.