Highlighted depth-of-field photography: Shining light on focus
We present a photographic method to enhance intensity differences between
objects at varying distances from the focal plane. By combining a unique
capture procedure with simple image processing techniques, the detected
brightness of an object is decreased in proportion to its degree of defocus. A
camera-projector system casts distinct grid patterns onto a scene to generate
a spatial distribution of point reflections. These point reflections relay a
relative measure of defocus that is utilized in postprocessing to generate a
highlighted DOF photograph. Trade-offs between three different
projector-processing pairs are analyzed, and a model is developed to help describe a
new intensity-dependent depth of field that is controlled by the pattern of
illumination. Results are presented for a primary single snapshot design as
well as a scanning method and a comparison method. As an application,
automatic matting results are presented.
Alfred P. Sloan Foundation
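The capture-and-attenuate idea above can be sketched as follows; the function name, array names, and the linear attenuation rule are illustrative assumptions, not the paper's actual processing pipeline.

```python
import numpy as np

def highlight_dof(image, dot_response, sharp_response, strength=1.0):
    """Attenuate brightness in proportion to an estimated degree of defocus.

    image          : HxW grayscale capture of the scene (float, 0..1)
    dot_response   : HxW map of measured point-reflection peak intensities
    sharp_response : expected peak intensity of an in-focus point reflection
    All names and the linear rule are illustrative, not the paper's method.
    """
    # A defocused point spreads its energy, so its peak intensity drops;
    # the ratio of measured to in-focus peak gives a relative focus measure.
    focus = np.clip(dot_response / sharp_response, 0.0, 1.0)
    # Decrease detected brightness proportional to the degree of defocus.
    return image * (1.0 - strength * (1.0 - focus))
```

With `strength=1.0`, an in-focus region keeps its brightness while a region whose point reflection has dropped to half intensity is dimmed to half.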
Cross-Domain Grouping and Alignment for Domain Adaptive Semantic Segmentation
Existing techniques to adapt semantic segmentation networks across the source
and target domains within deep convolutional neural networks (CNNs) deal with
all the samples from the two domains in a global or category-aware manner. They
do not consider inter-class variation within the target domain itself or within
an estimated category, which limits their ability to encode domains with a
multi-modal data distribution. To overcome this limitation, we introduce a
learnable clustering module, and a novel domain adaptation framework called
cross-domain grouping and alignment. To cluster the samples across domains with
an aim to maximize the domain alignment without forgetting precise segmentation
ability on the source domain, we present two loss functions that encourage
semantic consistency and orthogonality among the clusters. We also present a
loss that addresses the class-imbalance problem, another limitation of previous
methods. Our experiments show that our method
consistently boosts the adaptation performance in semantic segmentation,
outperforming state-of-the-art methods on various domain adaptation settings.
Comment: AAAI 202
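As a rough illustration of an orthogonality objective over soft cluster assignments (the function name and exact formulation are assumptions; the paper's losses may differ):

```python
import numpy as np

def orthogonality_loss(assign):
    """Penalize overlap between clusters given soft assignments.

    assign : (N, K) soft cluster-assignment probabilities for N samples.
    Hypothetical stand-in for an orthogonality objective: drive the Gram
    matrix of per-cluster assignment vectors toward the identity.
    """
    # Column-normalize so each cluster's assignment vector has unit norm.
    cols = assign / (np.linalg.norm(assign, axis=0, keepdims=True) + 1e-8)
    gram = cols.T @ cols                      # (K, K) cluster-overlap matrix
    off_diag = gram - np.diag(np.diag(gram))  # keep only cross-cluster terms
    return np.sum(off_diag ** 2)              # zero when clusters are disjoint
```

The loss vanishes when every sample belongs to exactly one cluster and grows as clusters collapse onto the same samples, encouraging the clusters to capture distinct modes of the data distribution.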
DyAnNet: A Scene Dynamicity Guided Self-Trained Video Anomaly Detection Network
Unsupervised approaches for video anomaly detection may not perform as well as
supervised approaches. However, learning unknown types of anomalies using an
unsupervised approach is more practical than a supervised approach as
annotation is an extra burden. In this paper, we use isolation tree-based
unsupervised clustering to partition the deep feature space of the video
segments. The RGB stream generates a pseudo anomaly score and the flow stream
generates a pseudo dynamicity score of a video segment. These scores are then
fused using a majority voting scheme to generate preliminary bags of positive
and negative segments. However, these bags may not be accurate, as the scores
are generated using only the current segment, which does not represent the
global behavior of a typical anomalous event. We then use a refinement strategy
based on a cross-branch feed-forward network designed using a popular I3D
network to refine both scores. The bags are then refined through a segment
re-mapping strategy. The intuition behind combining the dynamicity score of a
segment with the anomaly score is to enhance the quality of the evidence. The method
has been evaluated on three popular video anomaly datasets, i.e., UCF-Crime,
CCTV-Fights, and UBI-Fights. Experimental results reveal that the proposed
framework achieves competitive accuracy as compared to the state-of-the-art
video anomaly detection methods.
Comment: 10 pages, 8 figures, and 4 tables. (Accepted at WACV 2023)
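The score-fusion step can be sketched as below; the thresholds, the function name, and the reading of "majority voting" as agreement between the two streams are assumptions for illustration only:

```python
def preliminary_bags(anomaly_scores, dynamicity_scores,
                     anomaly_thr=0.5, dynamicity_thr=0.5):
    """Fuse per-segment pseudo scores into positive/negative bags.

    Each stream casts a vote for a segment whose score exceeds its
    threshold; with two streams, "majority" is taken here to mean both
    streams agree (thresholds are illustrative, not the paper's values).
    """
    positive, negative = [], []
    for i, (a, d) in enumerate(zip(anomaly_scores, dynamicity_scores)):
        votes = (a > anomaly_thr) + (d > dynamicity_thr)
        (positive if votes == 2 else negative).append(i)
    return positive, negative
```

Segments landing in the positive bag are treated as likely anomalous and those in the negative bag as likely normal, before the refinement stage revises both.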
Person Re-identification in Videos by Analyzing Spatio-temporal Tubes
Typical person re-identification frameworks search for the k best matches in a gallery of images that are often collected under varying conditions. For video re-identification applications, the gallery usually contains image sequences. However, such a process is time consuming, as video re-identification involves carrying out the matching process multiple times. In this paper, we propose a new method that extracts spatio-temporal frame sequences, or tubes, of moving persons and performs the re-identification quickly. Initially, we apply a binary classifier to remove noisy images from the input query tube. In the next step, we use a key-pose detection-based query minimization technique. Finally, a hierarchical re-identification framework is proposed and used to rank the output tubes. Experiments with publicly available video re-identification datasets reveal that our framework outperforms existing methods, ranking the tubes with an average increase in CMC accuracy of 6-8% across multiple datasets. Also, our method significantly reduces the number of false positives. A new video re-identification dataset, named the Tube-based Re-identification Video Dataset (TRiViD), has been prepared to help the re-identification research community.
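Since the reported gains are in CMC accuracy, a minimal sketch of a rank-k CMC computation over ranked tube lists may help (the function name and signature are hypothetical, not the paper's evaluation code):

```python
def cmc_accuracy(ranked_gallery_ids, query_ids, k):
    """Rank-k CMC accuracy: the fraction of queries whose true identity
    appears among the top-k ranked gallery matches (tubes, in this setting).

    ranked_gallery_ids : one ranked list of gallery identities per query
    query_ids          : ground-truth identity per query
    """
    hits = sum(true_id in ranked[:k]
               for ranked, true_id in zip(ranked_gallery_ids, query_ids))
    return hits / len(query_ids)
```

A 6-8% increase at a given rank k means that many more queries, on average, find their true identity within the top-k returned tubes.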
MAIR: Multi-view Attention Inverse Rendering with 3D Spatially-Varying Lighting Estimation
We propose a scene-level inverse rendering framework that uses multi-view
images to decompose the scene into geometry, an SVBRDF, and 3D spatially-varying
lighting. Because multi-view images provide a variety of information about the
scene, their use in object-level inverse rendering has been taken for granted.
However, owing to the absence of a multi-view HDR synthetic dataset,
scene-level inverse rendering has mainly been studied using single-view images.
We were able to successfully perform scene-level inverse rendering using
multi-view images by expanding the OpenRooms dataset, designing efficient
pipelines to handle multi-view images, and splitting spatially-varying
lighting. Our experiments show that the proposed method not only achieves
better performance than single-view-based methods, but also achieves robust
performance on unseen real-world scenes. Also, our sophisticated 3D
spatially-varying lighting volume allows for photorealistic object insertion in
any 3D location.
Comment: Accepted by CVPR 2023; project page:
https://bring728.github.io/mair.project
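A lighting volume queried at arbitrary 3D locations can be sketched as a trilinear lookup into a voxel grid; this simplified per-voxel representation and the function below are assumptions for illustration, not MAIR's actual lighting volume:

```python
import numpy as np

def sample_lighting(volume, point, vmin, vmax):
    """Trilinearly sample a 3D spatially-varying lighting volume.

    volume : (X, Y, Z, C) grid of lighting coefficients (e.g. per-voxel
             RGB ambient terms) -- a simplification for illustration
    point  : (3,) query position in world coordinates
    vmin, vmax : world-space bounds covered by the volume
    """
    res = np.array(volume.shape[:3])
    g = (np.asarray(point, dtype=float) - vmin) / (vmax - vmin)  # to [0, 1]
    g = g * (res - 1)                                            # grid coords
    lo = np.floor(g).astype(int)
    hi = np.minimum(lo + 1, res - 1)
    f = g - lo                                                   # fractions
    out = np.zeros(volume.shape[3])
    # Blend the eight surrounding voxels with trilinear weights.
    for dx, wx in ((0, 1 - f[0]), (1, f[0])):
        for dy, wy in ((0, 1 - f[1]), (1, f[1])):
            for dz, wz in ((0, 1 - f[2]), (1, f[2])):
                idx = np.where([dx, dy, dz], hi, lo)
                out += wx * wy * wz * volume[idx[0], idx[1], idx[2]]
    return out
```

Sampling lighting at an arbitrary insertion point this way is what makes the volume usable for relighting an object placed at any 3D location.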