285 research outputs found
Reflectance Hashing for Material Recognition
We introduce a novel method for using reflectance to identify materials.
Reflectance offers a unique signature of the material but is challenging to
measure and use for recognizing materials due to its high-dimensionality. In
this work, one-shot reflectance is captured using a unique optical camera
measuring reflectance disks, where the pixel coordinates correspond to
surface viewing angles. The reflectance has class-specific structure, and angular
gradients computed in this reflectance space reveal the material class.
These reflectance disks encode discriminative information for efficient and
accurate material recognition. We introduce a framework called reflectance
hashing that models the reflectance disks with dictionary learning and binary
hashing. We demonstrate the effectiveness of reflectance hashing for material
recognition with a number of real-world materials.
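The binary hashing step can be illustrated with a minimal sketch. It assumes a generic random-hyperplane hash over a feature vector, which is a stand-in technique and not the dictionary-learning pipeline of the abstract; the names `make_hyperplanes`, `binary_code`, `hamming`, and `classify` are hypothetical.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (assumed stand-in for a learned hash)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def binary_code(features, hyperplanes):
    """One bit per hyperplane: 1 if the features lie on its non-negative side."""
    code = 0
    for plane in hyperplanes:
        dot = sum(f * p for f, p in zip(features, plane))
        code = (code << 1) | (1 if dot >= 0.0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


def classify(query, database, hyperplanes):
    """Return the label of the stored code closest to the query in Hamming distance.

    database is a list of (code, label) pairs; ties go to the earliest entry.
    """
    q = binary_code(query, hyperplanes)
    return min(database, key=lambda item: hamming(q, item[0]))[1]
```

Comparing compact binary codes with Hamming distance is what makes lookup cheap: each comparison is an XOR and a bit count rather than a distance in the original high-dimensional reflectance space.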
Tracking with Local Spatio-Temporal Motion Patterns in Extremely Crowded Scenes
Tracking individuals in extremely crowded scenes is a challenging task, primarily due to the motion and appearance variability produced by the large number of people within the scene. The individual pedestrians, however, collectively form a crowd that exhibits a spatially and temporally structured pattern within the scene. In this paper, we extract this steady-state but dynamically evolving motion of the crowd and leverage it to track individuals in videos of the same scene. We capture the spatial and temporal variations in the crowd’s motion by training a collection of hidden Markov models on the motion patterns within the scene. Using these models, we predict the local spatio-temporal motion patterns that describe the pedestrian movement at each space-time location in the video. Based on these predictions, we hypothesize the target’s movement between frames as it travels through the local space-time volume. In addition, we robustly model the individual’s unique motion and appearance to discern them from surrounding pedestrians. The results show that we may track individuals in scenes that present extreme difficulty to previous techniques.
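How a trained hidden Markov model yields a prediction can be sketched as follows. This is a minimal illustration that assumes discrete observation symbols, whereas the abstract's observations are local spatio-temporal motion patterns; the function name and the parameter layout (`init`, `trans`, `emit`) are hypothetical.

```python
def _normalize(values):
    """Scale non-negative values so they sum to 1."""
    total = sum(values)
    if total <= 0.0:
        raise ValueError("observation sequence has zero probability under the model")
    return [v / total for v in values]


def predict_next_observation(init, trans, emit, observations):
    """Predictive distribution over the next symbol, given past observations.

    init[i]     : probability of starting in hidden state i
    trans[i][j] : probability of moving from state i to state j
    emit[j][k]  : probability of emitting symbol k in state j
    """
    n = len(init)
    # Filtered belief over hidden states after the first observation.
    belief = _normalize([init[s] * emit[s][observations[0]] for s in range(n)])
    for obs in observations[1:]:
        # Propagate the belief one step, then weight by the new observation.
        predicted = [sum(belief[i] * trans[i][j] for i in range(n)) for j in range(n)]
        belief = _normalize([predicted[j] * emit[j][obs] for j in range(n)])
    # One more propagation gives the state distribution at the next time step.
    next_state = [sum(belief[i] * trans[i][j] for i in range(n)) for j in range(n)]
    n_symbols = len(emit[0])
    return [sum(next_state[j] * emit[j][k] for j in range(n)) for k in range(n_symbols)]
```

In a tracker of the kind described, one such model per spatial location could supply a prior on where a target is likely to move next, which is then combined with the target's own appearance and motion cues.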
DeepShaRM: Multi-View Shape and Reflectance Map Recovery Under Unknown Lighting
Geometry reconstruction of textureless, non-Lambertian objects under unknown
natural illumination (i.e., in the wild) remains challenging as correspondences
cannot be established and the reflectance cannot be expressed in simple
analytical forms. We derive a novel multi-view method, DeepShaRM, that achieves
state-of-the-art accuracy on this challenging task. Unlike past methods that
formulate this as inverse-rendering, i.e., estimation of reflectance,
illumination, and geometry from images, our key idea is to realize that
reflectance and illumination need not be disentangled and can instead be estimated as
a compound reflectance map. We introduce a novel deep reflectance map
estimation network that recovers the camera-view reflectance maps from the
surface normals of the current geometry estimate and the input multi-view
images. The network also explicitly estimates per-pixel confidence scores to
handle global light transport effects. A deep shape-from-shading network then
updates the geometry estimate expressed with a signed distance function using
the recovered reflectance maps. By alternating between these two, and, most
important, by bypassing the ill-posed problem of reflectance and illumination
decomposition, the method accurately recovers object geometry in these
challenging settings. Extensive experiments on both synthetic and real-world
data clearly demonstrate its state-of-the-art accuracy. Comment: 3DV 202
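The idea of a compound reflectance map can be made concrete with a small sketch: appearance is looked up directly from the surface normal, so reflectance and illumination never have to be separated. The square grid indexed by the normal's x and y components, the nearest-neighbor lookup, and the names `normal_to_pixel` and `shade` are illustrative assumptions, not the parametrization used by DeepShaRM.

```python
def normal_to_pixel(normal, size):
    """Map a camera-facing unit normal (nx, ny, nz) to a (row, col) in a size x size map.

    Assumes nx and ny in [-1, 1] span the columns and rows, with +y pointing up.
    """
    nx, ny, _ = normal
    col = int(round((nx + 1.0) / 2.0 * (size - 1)))
    row = int(round((1.0 - ny) / 2.0 * (size - 1)))
    # Clamp to guard against values slightly outside [-1, 1].
    col = min(size - 1, max(0, col))
    row = min(size - 1, max(0, row))
    return row, col


def shade(normals, reflectance_map):
    """Predict one intensity per surface normal by nearest-neighbor lookup."""
    size = len(reflectance_map)
    shaded = []
    for normal in normals:
        row, col = normal_to_pixel(normal, size)
        shaded.append(reflectance_map[row][col])
    return shaded
```

Under this view, the alternation in the abstract amounts to fitting the map from the current normals and images, then adjusting the geometry so that its normals reproduce the images through the map.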
Correspondences of the Third Kind: Camera Pose Estimation from Object Reflection
Computer vision has long relied on two kinds of correspondences: pixel
correspondences in images and 3D correspondences on object surfaces. Is there
another kind, and if there is, what can it do for us? In this paper, we
introduce correspondences of the third kind, which we call reflection correspondences,
and show that they can help estimate camera pose by just looking at objects
without relying on the background. Reflection correspondences are point
correspondences in the reflected world, i.e., the scene reflected by the object
surface. The object geometry and reflectance alter the scene geometrically and
radiometrically, respectively, causing incorrect pixel correspondences.
Geometry recovered from each image is also hampered by distortions, namely
generalized bas-relief ambiguity, leading to erroneous 3D correspondences. We
show that reflection correspondences can resolve the ambiguities arising from
these distortions. We introduce a neural correspondence estimator and a RANSAC
algorithm that fully leverages all three kinds of correspondences for robust
and accurate joint camera pose and object shape estimation just from the object
appearance. The method expands the horizon of numerous downstream tasks,
including camera pose estimation for appearance modeling (e.g., NeRF) and
motion estimation of reflective objects (e.g., cars on the road), to name a
few, as it relieves the requirement of overlapping background.
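The "reflected world" can be illustrated with the standard mirror-reflection formula r = 2(n·v)n − v, which gives the direction in which a surface point with normal n sees the scene when viewed from direction v. The sketch below assumes an ideal mirror and a unit view vector pointing from the surface toward the camera; the function name `reflect` is hypothetical, and general reflectance, as addressed in the abstract, is not modeled.

```python
import math


def reflect(view, normal):
    """Mirror the view direction about the surface normal: r = 2 (n . v) n - v.

    view   : unit vector from the surface point toward the camera
    normal : surface normal (normalized here for safety)
    """
    length = math.sqrt(sum(c * c for c in normal))
    if length == 0.0:
        raise ValueError("normal must be non-zero")
    n = [c / length for c in normal]
    n_dot_v = sum(a * b for a, b in zip(n, view))
    return tuple(2.0 * n_dot_v * a - b for a, b in zip(n, view))
```

Two pixels in different images form a reflection correspondence when their reflected directions point at the same part of the surrounding scene, which is why such matches constrain camera pose even without a shared background.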
DeePoint: Pointing Recognition and Direction Estimation From A Fixed View
In this paper, we realize automatic visual recognition and direction
estimation of pointing. We introduce the first neural pointing understanding
method based on two key contributions. The first is the introduction of a
first-of-its-kind large-scale dataset for pointing recognition and direction
estimation, which we refer to as the DP Dataset. DP Dataset consists of more
than 2 million frames of over 33 people pointing in various styles annotated
for each frame with pointing timings and 3D directions. The second is DeePoint,
a novel deep network model for joint recognition and 3D direction estimation of
pointing. DeePoint is a Transformer-based network which fully leverages the
spatio-temporal coordination of the body parts, not just the hands. Through
extensive experiments, we demonstrate the accuracy and efficiency of DeePoint.
We believe DP Dataset and DeePoint will serve as a sound foundation for visual
human intention understanding.