Few-shot Neural Radiance Fields Under Unconstrained Illumination
In this paper, we introduce a new challenge for synthesizing novel view
images in practical environments with limited input multi-view images and
varying lighting conditions. Neural radiance fields (NeRF), one of the
pioneering works for this task, demand an extensive set of multi-view images
taken under constrained illumination, which is often unattainable in real-world
settings. While some previous works have managed to synthesize novel views
given images with different illumination, their performance still relies on a
substantial number of input multi-view images. To address this problem, we
suggest ExtremeNeRF, which utilizes multi-view albedo consistency, supported by
geometric alignment. Specifically, we extract intrinsic image components that
should be illumination-invariant across different views, enabling direct
appearance comparison between the input and novel view under unconstrained
illumination. We offer thorough experimental results for task evaluation,
employing the newly created NeRF Extreme benchmark, the first in-the-wild
benchmark for novel view synthesis under multiple viewing directions and
varying illuminations.
Comment: Project Page: https://seokyeong94.github.io/ExtremeNeRF
DyAnNet: A Scene Dynamicity Guided Self-Trained Video Anomaly Detection Network
Unsupervised approaches for video anomaly detection may not perform as well
as supervised approaches. However, learning unknown types of anomalies using an
unsupervised approach is more practical than a supervised approach as
annotation is an extra burden. In this paper, we use isolation tree-based
unsupervised clustering to partition the deep feature space of the video
segments. The RGB stream generates a pseudo anomaly score and the flow stream
generates a pseudo dynamicity score of a video segment. These scores are then
fused using a majority voting scheme to generate preliminary bags of positive
and negative segments. However, these bags may not be accurate as the scores
are generated only using the current segment which does not represent the
global behavior of a typical anomalous event. We then use a refinement strategy
based on a cross-branch feed-forward network designed using a popular I3D
network to refine both scores. The bags are then refined through a segment
re-mapping strategy. The intuition of adding the dynamicity score of a segment
with the anomaly score is to enhance the quality of the evidence. The method
has been evaluated on three popular video anomaly datasets, i.e., UCF-Crime,
CCTV-Fights, and UBI-Fights. Experimental results reveal that the proposed
framework achieves competitive accuracy compared with state-of-the-art
video anomaly detection methods.
Comment: 10 pages, 8 figures, and 4 tables. (Accepted at WACV 2023)
Person Re-identification in Videos by Analyzing Spatio-temporal Tubes
Typical person re-identification frameworks search for the k best matches in a gallery of images that are often collected under varying conditions. For video re-identification applications, the gallery usually contains image sequences. However, such a process is time-consuming, as video re-identification involves carrying out the matching process multiple times. In this paper, we propose a new method that extracts spatio-temporal frame sequences, or tubes, of moving persons and performs re-identification quickly. Initially, we apply a binary classifier to remove noisy images from the input query tube. In the next step, we use a key-pose detection-based query minimization technique. Finally, a hierarchical re-identification framework is proposed and used to rank the output tubes. Experiments with publicly available video re-identification datasets reveal that our framework is better than existing methods. It ranks the tubes with an average increase in CMC accuracy of 6-8% across multiple datasets. Also, our method significantly reduces the number of false positives. A new video re-identification dataset, named Tube-based Re-identification Video Dataset (TRiViD), has been prepared to help the re-identification research community.
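The CMC (cumulative matching characteristic) accuracy quoted in the abstract above can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function name `cmc_curve`, the input layout (one ranked list of gallery identities per query, best match first), and the toy identities are assumptions made for illustration only.

```python
from typing import List, Sequence


def cmc_curve(
    ranked_ids: Sequence[Sequence[str]],
    true_ids: Sequence[str],
    max_rank: int,
) -> List[float]:
    """Return CMC accuracy at ranks 1..max_rank.

    ranked_ids[i] holds the gallery identities returned for query i,
    ordered from best match to worst (hypothetical input layout).
    true_ids[i] is the correct identity for query i.
    """
    if len(ranked_ids) != len(true_ids) or not true_ids:
        raise ValueError("need one non-empty ranking per query")

    hits = [0] * max_rank
    for ranking, truth in zip(ranked_ids, true_ids):
        for rank, candidate in enumerate(ranking[:max_rank]):
            if candidate == truth:
                # A correct match at this rank also counts for all larger ranks.
                for k in range(rank, max_rank):
                    hits[k] += 1
                break
    return [h / len(true_ids) for h in hits]


# Toy example: three queries, each with a ranked list of three gallery identities.
rankings = [["a", "b", "c"], ["c", "b", "a"], ["b", "a", "c"]]
truths = ["a", "b", "c"]
curve = cmc_curve(rankings, truths, max_rank=3)
```

Here `curve[0]` is the rank-1 accuracy and `curve[k - 1]` the rank-k accuracy; an "increase in CMC accuracy" compares such values between two ranking methods at the same rank.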
MAIR: Multi-view Attention Inverse Rendering with 3D Spatially-Varying Lighting Estimation
We propose a scene-level inverse rendering framework that uses multi-view
images to decompose the scene into geometry, an SVBRDF, and 3D spatially-varying
lighting. Because multi-view images provide a variety of information about the
scene, multi-view images in object-level inverse rendering have been taken for
granted. However, owing to the absence of a multi-view HDR synthetic dataset,
scene-level inverse rendering has mainly been studied using single-view images.
We were able to successfully perform scene-level inverse rendering using
multi-view images by expanding the OpenRooms dataset, designing efficient
pipelines to handle multi-view images, and splitting spatially-varying
lighting. Our experiments show that the proposed method not only achieves
better performance than single-view-based methods, but also achieves robust
performance on unseen real-world scenes. Also, our sophisticated 3D
spatially-varying lighting volume allows for photorealistic object insertion at
any 3D location.
Comment: Accepted by CVPR 2023; Project Page: https://bring728.github.io/mair.project
Synchronizing Vision and Language: Bidirectional Token-Masking AutoEncoder for Referring Image Segmentation
Referring Image Segmentation (RIS) aims to segment target objects expressed
in natural language within a scene at the pixel level. Various recent RIS
models have achieved state-of-the-art performance by generating contextual
tokens to model multimodal features from pretrained encoders and effectively
fusing them using transformer-based cross-modal attention. While these methods
match language features with image features to effectively identify likely
target objects, they often struggle to correctly understand contextual
information in complex and ambiguous sentences and scenes. To address this
issue, we propose a novel bidirectional token-masking autoencoder (BTMAE)
inspired by the masked autoencoder (MAE). The proposed model learns the context
of image-to-language and language-to-image by reconstructing missing features
in both image and language features at the token level. In other words, this
approach involves mutually complementing across the features of images and
language, with a focus on enabling the network to understand interconnected
deep contextual information between the two modalities. This learning method
enhances the robustness of RIS performance in complex sentences and scenes. Our
BTMAE achieves state-of-the-art performance on three popular datasets, and we
demonstrate the effectiveness of the proposed method through various ablation
studies.
The impact of baryonic physics and massive neutrinos on weak lensing peak statistics
We study the impact of baryonic processes and massive neutrinos on weak lensing peak statistics that can be used to constrain cosmological parameters. We use the BAHAMAS suite of cosmological simulations, which self-consistently include baryonic processes and the effect of massive neutrino free-streaming on the evolution of structure formation. We construct synthetic weak lensing catalogues by ray-tracing through light-cones, and use the aperture mass statistic for the analysis. The peaks detected on the maps reflect the cumulative signal from massive bound objects and general large-scale structure. We present the first study of weak lensing peaks in simulations that include both baryonic physics and massive neutrinos (summed neutrino masses of 0.06, 0.12, 0.24, and 0.48 eV, assuming a normal hierarchy), so that the uncertainty due to physics beyond the gravity of dark matter can be factored into constraints on cosmological models. Assuming a fiducial model of baryonic physics, we also investigate the correlation between peaks and massive haloes, over a range of summed neutrino mass values. As higher neutrino mass tends to suppress the formation of massive structures in the Universe, the halo mass function and lensing peak counts are modified as a function of the summed neutrino mass. Over most of the S/N range, the impact of fiducial baryonic physics is greater (less) than that of neutrinos for the 0.06 and 0.12 (0.24 and 0.48) eV models. Both baryonic physics and massive neutrinos should be accounted for when deriving cosmological parameters from weak lensing observations.
On stable higher spin states in Heterotic String Theories
We study properties of 1/2 BPS Higher Spin states in heterotic
compactifications with extended supersymmetry. We also analyze non-BPS Higher
Spin states and give explicit expressions for physical vertex operators of the
first two massive levels. We then study on-shell tri-linear couplings of these
Higher Spin states and confirm that BPS states with arbitrary spin cannot decay
into lower spin states in perturbation theory. Finally, we consider scattering
of vector bosons off higher spin BPS states and extract form factors and
polarization effects in various limits.
Comment: 38 pages
A compendium and functional characterization of mammalian genes involved in adaptation to Arctic or Antarctic environments
Many mammals are well adapted to surviving in extremely cold environments. These species have likely accumulated genetic changes that help them efficiently cope with low temperatures. It is not known whether the same genes related to cold adaptation in one species would be under selection in another species. The aims of this study were therefore: to create a compendium of mammalian genes related to adaptations to a low temperature environment; to identify genes related to cold tolerance that have been subjected to independent positive selection in several species; and to determine promising candidate genes/pathways/organs for further empirical research on cold adaptation in mammals.