Multi-Scale 3D Scene Flow from Binocular Stereo Sequences
Scene flow methods estimate the three-dimensional motion field for points in the world, using multi-camera video data. Such methods combine multi-view reconstruction with motion estimation. This paper describes an alternative formulation for dense scene flow estimation that provides reliable results using only two cameras, by fusing stereo and optical flow estimation into a single coherent framework. Internally, the proposed algorithm generates probability distributions for optical flow and disparity. Taking into account the uncertainty in these intermediate stages allows more reliable estimation of the 3D scene flow than previous methods. To handle the aperture problems inherent in the estimation of optical flow and disparity, a multi-scale method along with a novel region-based technique is used within a regularized solution. This combined approach both preserves discontinuities and prevents over-regularization, two problems commonly associated with basic multi-scale approaches. Experiments with synthetic and real test data demonstrate the strength of the proposed approach.
National Science Foundation (CNS-0202067, IIS-0208876); Office of Naval Research (N00014-03-1-0108)
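As a minimal illustration of the final geometric step, the sketch below back-projects a pixel at two time instants under a standard rectified-stereo model and differences the resulting 3D points. This is only the deterministic core, not the paper's probabilistic formulation; the function name and camera parameters (`f`, `b`, `cx`, `cy`) are illustrative.

```python
import numpy as np

def scene_flow_point(x, y, d, u, v, dd, f, b, cx, cy):
    """Combine disparity d, optical flow (u, v), and disparity change dd
    into a 3D scene flow vector for the pixel (x, y).

    f: focal length in pixels; b: stereo baseline in metres;
    (cx, cy): principal point. Illustrative names, rectified stereo assumed.
    """
    def backproject(px, py, disp):
        Z = f * b / disp               # depth from disparity
        X = (px - cx) * Z / f
        Y = (py - cy) * Z / f
        return np.array([X, Y, Z])

    P0 = backproject(x, y, d)                  # 3D point at time t
    P1 = backproject(x + u, y + v, d + dd)     # 3D point at time t+1
    return P1 - P0                             # 3D motion vector
```

With zero flow and zero disparity change the scene flow is zero; a disparity that doubles between frames yields a purely depth-directed motion.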
Visual perception of solid shape from occluding contours
The relative motion of object and observer induces a motion field in the observer's visual image that is smooth everywhere except along the object's occluding contours. Thus, occluding contours and smooth motion fields can be viewed as complementary and as separate sources of information about an object's shape. I studied how the human visual system perceives solid shape from the occluding contours of rotating objects and from the smooth motion field induced by moving planar surface patches.
I propose a three-stage model for the perception of solid shape from the occluding contours of a rotating object. First, the object's motion is determined. I argue that this is only possible using points of correspondence and only when the object's axis of rotation is frontoparallel. In the second stage, the motion field along the contour is used to compute relative depth and surface curvature along the rim, the contour's pre-image. Third, local shape descriptors are propagated inside the figure to yield a global percept of solid shape. To determine which shape descriptors are computed by human subjects, I used a novel task in which subjects have to discriminate between flat ellipses and solid ellipsoids with varying thickness. I found that discriminability is proportional to the inverse of radial curvature but is not proportional to Gaussian or mean curvature. Certain slants of the axis of rotation decrease discriminability. Subjects who could discriminate ellipsoids and ellipses perceived the ellipsoids' angular velocity more veridically than did subjects who could not discriminate the two.
Any smooth motion field can locally be described by divergence, curl, and deformation. If the motion field is induced by a rotating plane, the amount of deformation is proportional to the plane's slant and its angular velocity. Similarly, for translating planes, deformation is proportional to slant and image motion.
Slant judgments of human observers were, to a first-order approximation, proportional to deformation per se; that is, observers do not take object motion into account. Recent psychophysical evidence suggests that human subjects need motion discontinuities to do so. Thus, contours might be necessary to correctly perceive slant from smooth motion fields.
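The first-order decomposition mentioned above can be written out directly. Assuming a locally affine motion field with spatial derivatives u_x, u_y, v_x, v_y, the three invariants are computed as follows (a standard formulation; the function name is illustrative):

```python
import numpy as np

def first_order_components(u_x, u_y, v_x, v_y):
    """Decompose the local velocity gradient of a 2D motion field (u, v)
    into divergence, curl, and deformation magnitude."""
    div = u_x + v_y                          # isotropic expansion
    curl = v_x - u_y                         # rigid image rotation
    deformation = np.hypot(u_x - v_y, u_y + v_x)  # shear magnitude
    return div, curl, deformation
```

A pure image rotation has zero deformation, while a pure shear has zero divergence and curl; only the deformation component carries the slant information discussed above.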
Vision-based techniques for gait recognition
Global security concerns have driven a proliferation of video surveillance devices. Intelligent surveillance systems seek to discover possible threats automatically and raise alerts. Being able to identify the surveyed subject can help determine its threat level. The current generation of devices provides digital video data to be analysed for time-varying features that assist in the identification process. Commonly, people queue up to access a facility and approach a video camera in full frontal view. In this environment, a variety of biometrics are available - for example, gait, which includes temporal features such as stride period. Gait can be measured unobtrusively at a distance. The video data will also include facial features, which are short-range biometrics. In this way, one can combine biometrics naturally using a single set of data. In this paper we survey current techniques for gait recognition and modelling, together with the environments in which the research was conducted. We also discuss in detail the issues arising from deriving gait data, such as perspective and occlusion effects, together with the associated computer vision challenges of reliably tracking human movement. After highlighting these issues and challenges related to gait processing, we discuss frameworks that combine gait with other biometrics. We then provide motivations for a novel paradigm in biometrics-based human recognition: the use of the fronto-normal view of gait as a far-range biometric combined with biometrics operating at a near distance.
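As one concrete example of a temporal gait feature, stride period can be estimated from any periodic gait signal (such as silhouette width over time) via autocorrelation. The following is a rough sketch assuming a clean, periodic signal; the peak-picking is deliberately simple and the function name is illustrative:

```python
import numpy as np

def stride_period(signal, fps):
    """Estimate the dominant period (in seconds) of a periodic gait signal,
    e.g. silhouette width per frame, via autocorrelation."""
    s = np.asarray(signal, dtype=float)
    s = s - s.mean()
    ac = np.correlate(s, s, mode='full')[len(s) - 1:]  # non-negative lags
    rising = np.where(np.diff(ac) > 0)[0]  # where autocorrelation turns upward
    if len(rising) == 0:
        return None                        # no periodicity detected
    start = rising[0]                      # skip the zero-lag peak
    lag = start + np.argmax(ac[start:])    # first dominant repeat
    return lag / fps
```

For a 2 Hz oscillation sampled at 30 frames per second this recovers a period of about 0.5 s.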
Registration using Graphics Processor Unit
Data point set registration is an important operation in coordinate metrology. Registration is the operation by which sampled point clouds are aligned with a CAD model by a 4x4 homogeneous transformation (e.g., rotation and translation). This alignment permits validation of the produced artifact's geometry. State-of-the-art metrology systems are now capable of generating thousands, if not millions, of data points during an inspection operation, demanding increased computational power to fully utilize these larger data sets. The registration process is an iterative nonlinear optimization whose execution time is directly related to the number of points processed and the CAD model's complexity. The objective function to be minimized is the sum of the squared distances between each point in the point cloud and the closest surface in the CAD model. A brute-force approach to registration, which is often used, is to compute the minimum distance between each point and each surface in the CAD model. As point cloud sizes and CAD model complexity increase, this approach becomes intractable and inefficient. Highly efficient numerical and analytical gradient-based algorithms exist, whose goal is to converge to an optimal solution in minimum time. This thesis presents a new approach that performs the registration process efficiently by employing readily available computer hardware, the graphics processing unit (GPU). The data point set registration time on the GPU shows a significant improvement (around 15-20 times) over typical CPU performance. Efficient GPU programming decreases the complexity of the steps and improves the rate of convergence of the existing algorithms. The experiments reveal the exponentially increasing execution time on the CPU versus the linear performance of the GPU across various aspects of the algorithm. The importance of the CPU in GPU programming is also highlighted.
Future work discusses possible extensions of the GPU approach to higher-order and more complex coordinate metrology algorithms.
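For intuition about the registration objective, the correspondence-based special case has a closed-form solution (the Kabsch/Horn method): given matched point pairs, the rigid transform minimizing the sum of squared distances follows from an SVD. This is a CPU sketch of that standard building block, not the thesis's GPU implementation; names are illustrative.

```python
import numpy as np

def register_svd(P, Q):
    """Closed-form rigid registration: find R, t minimizing
    sum ||R @ p_i + t - q_i||^2 over corresponding rows of P and Q."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)              # cross-covariance of centred sets
    U, _, Vt = np.linalg.svd(H)
    # correct an improper (reflection) solution if one arises
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = cQ - R @ cP
    return R, t
```

In practice this step is wrapped in an iterative closest-point loop: re-establish nearest-surface correspondences, solve for R and t, and repeat until the objective converges.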
Characterization of multiphase flows integrating X-ray imaging and virtual reality
Multiphase flows are used in a wide variety of industries, from energy production to pharmaceutical manufacturing. However, because of the complexity of the flows and difficulty measuring them, it is challenging to characterize the phenomena inside a multiphase flow. To help overcome this challenge, researchers have used numerous types of noninvasive measurement techniques to record the phenomena that occur inside the flow. One technique that has shown much success is X-ray imaging. While capable of high spatial resolutions, X-ray imaging generally has poor temporal resolution.
This research improves the characterization of multiphase flows in three ways. First, an X-ray image intensifier is modified to use a high-speed camera to push the temporal limits of what is possible with current tube source X-ray imaging technology. Using this system, sample flows were imaged at 1000 frames per second without a reduction in spatial resolution. Next, the sensitivity of X-ray computed tomography (CT) measurements to changes in acquisition parameters is analyzed. While in theory CT measurements should be stable over a range of acquisition parameters, previous research has indicated otherwise. The analysis of this sensitivity shows that, while raw CT values are strongly affected by changes to acquisition parameters, if proper calibration techniques are used, acquisition parameters do not significantly influence the results for multiphase flow imaging. Finally, two algorithms are analyzed for their suitability to reconstruct an approximate tomographic slice from only two X-ray projections. These algorithms increase the spatial error in the measurement, as compared to traditional CT; however, they allow for very high temporal resolutions for 3D imaging. The only limit on the speed of this measurement technique is the image intensifier-camera setup, which was shown to be capable of imaging at a rate of at least 1000 FPS.
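As background for why raw intensities are sensitive to acquisition parameters while calibrated values are not, X-ray projections are conventionally dark- and flat-field corrected and then linearized with the Beer-Lambert law. A minimal sketch (names illustrative; not the specific calibration procedure used in this work):

```python
import numpy as np

def normalized_attenuation(raw, flat, dark):
    """Dark- and flat-field correction followed by Beer-Lambert
    linearization: A = -ln((I - I_dark) / (I_flat - I_dark)).
    A is proportional to the integrated attenuation along each ray,
    largely independent of source intensity and detector offset."""
    return -np.log((raw - dark) / (flat - dark))
```

A pixel reading equal to the flat field yields zero attenuation, so changes in tube current or exposure cancel out of the calibrated value.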
While advances in measurement techniques for multiphase flows are one part of improving multiphase flow characterization, the challenge extends beyond measurement techniques. For improved measurement techniques to be useful, the data must be accessible to scientists in a way that maximizes comprehension of the phenomena. To this end, this work also presents a system that uses the Microsoft Kinect sensor to provide natural, non-contact interaction with multiphase flow data. Furthermore, this system is constructed so that it is trivial to add natural, non-contact interaction to immersive visualization applications. Therefore, multiple visualization applications can be built that are optimized for specific types of data, yet all leverage the same natural interaction. Finally, the research concludes by proposing a system that integrates the improved X-ray measurements with the Kinect interaction system and a cave automatic virtual environment (CAVE) to present scientists with multiphase flow measurements in an intuitive and inherently three-dimensional manner.
Optical flow estimation via steered-L1 norm
Global variational methods for estimating optical flow are among the best performing due to the subpixel accuracy and the ‘fill-in’ effect they provide. The fill-in effect allows optical flow displacements to be estimated even in low-texture and untextured areas of the image; the estimation of such displacements is induced by the smoothness term. The L1 norm provides a robust regularisation term for the optical flow energy function with very good edge-preserving performance. However, this norm suffers from several issues, among them its isotropic nature, which reduces the fill-in effect and eventually the accuracy of estimation in areas near motion boundaries. In this paper we propose an enhancement to the L1 norm that improves the fill-in effect of this smoothness term. To do so, we analyse the structure tensor matrix and use its eigenvectors to steer the smoothness term into components that are ‘orthogonal to’ and ‘aligned with’ image structures. This is done in a primal-dual formulation. Results show a reduced end-point error and improved accuracy compared to the conventional L1 norm.
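The steering idea can be made concrete: the eigenvectors of the 2x2 structure tensor point across and along local image structure, and the smoothness penalty can then be weighted separately in each direction. Below is a minimal sketch of the direction-extraction step, using per-pixel gradients without the usual neighbourhood smoothing; the function name is illustrative.

```python
import numpy as np

def structure_tensor_directions(img):
    """Per-pixel eigenvectors of the structure tensor
    J = [[Ix^2, Ix*Iy], [Ix*Iy, Iy^2]] (no smoothing, for brevity).
    Returns unit vectors n (across image structure, i.e. the gradient
    direction) and v (along image structure)."""
    Iy, Ix = np.gradient(img.astype(float))       # axis 0 is y, axis 1 is x
    Jxx, Jxy, Jyy = Ix * Ix, Ix * Iy, Iy * Iy
    # closed-form principal orientation of a symmetric 2x2 matrix
    theta = 0.5 * np.arctan2(2 * Jxy, Jxx - Jyy)
    n = np.stack([np.cos(theta), np.sin(theta)], axis=-1)   # across edges
    v = np.stack([-np.sin(theta), np.cos(theta)], axis=-1)  # along edges
    return n, v
```

A steered smoothness term then penalizes the flow gradient projected onto v more heavily than its projection onto n, so smoothing propagates along edges rather than across them.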
Cognitive Robots for Social Interactions
One of my goals is to work towards developing cognitive robots, especially with regard to improving the functionalities that facilitate interaction with human beings and their surrounding objects. Any cognitive system designed to serve human beings must be capable of processing social signals and, eventually, of efficiently predicting and planning appropriate responses.
My main focus during my PhD study is to bridge the gap between the motoric space and the visual space. The discovery of the mirror neurons ([RC04]) shows that the visual perception of human motion (visual space) is directly associated with the motor control of the human body (motor space). This discovery poses a large number of challenges in different fields such as computer vision, robotics and neuroscience. One of the fundamental challenges is understanding the mapping between 2D visual space and 3D motoric control, and further developing building blocks (primitives) of human motion in the visual space as well as in the motor space.
First, I present my study on the visual-motoric mapping of human actions. This study aims at mapping human actions in 2D videos to a 3D skeletal representation. Second, I present an automatic algorithm to decompose motion capture (MoCap) sequences into synergies, along with the times at which they are executed (or "activated") for each joint. Third, I propose using Granger causality as a tool to study coordinated actions performed by at least two agents; recent scientific studies suggest that the above "action mirroring circuit" might be tuned to action coordination rather than single-action mirroring. Fourth, I present the extraction of key poses in visual space, which facilitate further study of the "action mirroring circuit". I conclude the dissertation by describing future directions for the study of cognitive robotics.
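The Granger-causality idea in the third contribution can be sketched numerically: a series x "Granger-causes" y if adding lagged values of x to an autoregressive model of y reduces the prediction error. A minimal version without significance testing; the function name and lag order are illustrative.

```python
import numpy as np

def granger_gain(x, y, p=2):
    """Compare the residual variance of an AR(p) model of y alone with a
    model that also includes p lags of x. Returns the log variance ratio;
    values well above zero suggest x Granger-causes y."""
    n = len(y)
    Ly = np.array([y[t - p:t] for t in range(p, n)])   # p lags of y
    Lx = np.array([x[t - p:t] for t in range(p, n)])   # p lags of x
    Y = y[p:]
    A1 = np.column_stack([np.ones(n - p), Ly])         # y's own past only
    A2 = np.column_stack([A1, Lx])                     # plus x's past
    r1 = Y - A1 @ np.linalg.lstsq(A1, Y, rcond=None)[0]
    r2 = Y - A2 @ np.linalg.lstsq(A2, Y, rcond=None)[0]
    return float(np.log(np.var(r1) / np.var(r2)))
```

Applied to two performers' joint trajectories, a clearly positive gain in one direction but not the other would indicate leader-follower coordination rather than independent action.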