
    Study Of Human Activity In Video Data With An Emphasis On View-invariance

    The perception and understanding of human motion and action is an important area of research in computer vision that plays a crucial role in applications such as surveillance, HCI, and ergonomics. In this thesis, we focus on the recognition of actions under varying viewpoints and different, unknown camera intrinsic parameters. The challenges to be addressed include perspective distortions, differences in viewpoint, anthropometric variations, and the large number of degrees of freedom of articulated bodies. In addition, we are interested in methods that require little or no training. Current solutions to action recognition usually assume that a large dataset of actions is available so that a classifier can be trained. This means that in order to define a new action, the user has to record a number of videos from different viewpoints with varying camera intrinsic parameters and then retrain the classifier, which is not very practical from a development point of view. We propose algorithms that overcome these challenges and require just a few instances of the action from any viewpoint with any intrinsic camera parameters.

    Our first algorithm is based on the rank constraint on the family of planar homographies associated with triplets of body points. We represent an action as a sequence of poses and decompose each pose into triplets, so that a pose transition is broken down into the movements of body-point planes. In this way, we transform the non-rigid motion of the body points into a rigid motion of body-point planes. We use the fact that the family of homographies associated with two identical poses has rank 4 to gauge the similarity of pose between two subjects observed by different perspective cameras and from different viewpoints. This method requires only one instance of the action.

    We then show that it is possible to extend the concept of triplets to line segments. In particular, we establish that looking at the movement of line segments instead of triplets provides more redundancy in the data and thus leads to better results. We demonstrate this concept using "fundamental ratios." We decompose a human body pose into line segments instead of triplets and examine the movements of the line segments. This method needs only three instances of the action. If a larger dataset is available, we can also weight the line segments for better accuracy.

    The last method is based on the concept of "projective depth." Given a plane, we can find the depth of a point relative to that plane. We propose three different ways of using projective depth: (i) triplets - the three points of a triplet together with the epipole define a plane, and the movement of points relative to these body planes can be used to recognize actions; (ii) ground plane - if we are able to extract the ground plane, we can find the projective depth of the body points with respect to it, so that action recognition translates into curve matching; and (iii) mirror person - we can use the mirror view of the person to extract mirror-symmetric planes. This method also needs only one instance of the action.

    Extensive experiments are reported on view invariance, robustness to noisy localization and occlusion of body points, and action recognition. The experimental results are very promising and demonstrate the efficiency of our proposed invariants.
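
    The rank constraint at the heart of the first algorithm can be sketched numerically. A minimal sketch, assuming the per-triplet homographies between the two views have already been estimated (the `triplet_homographies` input below is a hypothetical placeholder), stacks each 3x3 homography as a 9-vector and inspects the singular values of the stacked matrix; for two matching poses the numerical rank should not exceed 4.

```python
import numpy as np

def homography_family_rank(homographies, tol=1e-6):
    """Return the numerical rank of a family of 3x3 homographies.

    Each homography (one per body-point triplet) is flattened to a
    9-vector; for two identical poses seen from different views the
    stacked family is expected to have rank <= 4.
    """
    # Normalize each homography to unit Frobenius norm so the arbitrary
    # projective scale does not affect the rank test.
    rows = [H.reshape(-1) / np.linalg.norm(H) for H in homographies]
    M = np.vstack(rows)
    s = np.linalg.svd(M, compute_uv=False)
    rank = int(np.sum(s > tol * s[0]))
    return rank, s

# Hypothetical usage: triplet_homographies estimated from corresponding
# body-point triplets in the two videos.
# rank, _ = homography_family_rank(triplet_homographies)
# poses_match = rank <= 4
```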

    Metric 3D-reconstruction from Unordered and Uncalibrated Image Collections

    In this thesis the problem of Structure from Motion (SfM) for uncalibrated and unordered image collections is considered. The proposed framework is an adaptation of the framework for calibrated SfM proposed by Olsson and Enqvist (2011) to the uncalibrated case. The Olsson-Enqvist framework consists of three main steps: pairwise relative rotation estimation, rotation averaging, and geometry estimation with known rotations. For this to work with uncalibrated images we also perform auto-calibration during the first step. There is a well-known degeneracy for pairwise auto-calibration which occurs when the two principal axes meet in a point, and which is unfortunately common for real images. To mitigate this, the rotation estimation is instead performed by estimating image triplets, for which the degenerate configurations are less likely to occur in practice. This is followed by estimation of the pairs that did not obtain a successful relative rotation in the previous step. The framework is successfully applied to an uncalibrated and unordered collection of images of the cathedral in Lund. It is also applied to the well-known Oxford dinosaur sequence, which consists of turntable motion. Image pairs from the turntable motion are in a degenerate configuration for auto-calibration since they both view the same point on the rotation axis.
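
    As an illustration of the rotation-averaging step of such a pipeline, the sketch below shows one simple chordal L2 update (not the thesis implementation): each neighbouring camera proposes an absolute rotation through its pairwise relative rotation, the proposals are averaged, and the mean is projected back onto SO(3). The convention R_ij = R_j R_i^T for relative rotations is an assumption made for the example.

```python
import numpy as np

def project_to_so3(M):
    """Project a 3x3 matrix onto the nearest rotation in Frobenius norm."""
    U, _, Vt = np.linalg.svd(M)
    R = U @ Vt
    if np.linalg.det(R) < 0:      # enforce a proper rotation, det(R) = +1
        R = U @ np.diag([1.0, 1.0, -1.0]) @ Vt
    return R

def chordal_update(i, R_abs, neighbours):
    """One chordal L2 update of the absolute rotation of camera i.

    `neighbours` is a list of (j, R_ij) pairs, where the relative rotation
    is assumed to satisfy R_ij = R_abs[j] @ R_abs[i].T.  Each neighbour j
    then proposes R_abs[i] = R_ij.T @ R_abs[j]; the proposals are averaged
    and projected back onto SO(3).
    """
    proposals = [R_ij.T @ R_abs[j] for j, R_ij in neighbours]
    return project_to_so3(np.mean(proposals, axis=0))
```

    Sweeping such updates over all cameras until the rotations stop changing is one elementary way to realise rotation averaging; practical pipelines typically use more robust formulations.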

    Estimating Epipolar Geometry With The Use of a Camera Mounted Orientation Sensor

    Context: Image processing and computer vision are rapidly becoming more commonplace, and the amount of information about a scene, such as its 3D geometry, that can be obtained from one or more images is steadily increasing due to growing resolutions and availability of imaging sensors and an active research community. In parallel, advances in hardware design and manufacturing allow devices such as gyroscopes, accelerometers, magnetometers and GPS receivers to be included alongside imaging devices at the consumer level.
    Aims: This work investigates the use of orientation sensors in computer vision as sources of data to aid image processing and the determination of a scene's geometry, in particular the epipolar geometry of a pair of images, and devises a hybrid methodology from two sets of previous works in order to exploit the information available from orientation sensors alongside data gathered from image processing techniques.
    Method: A readily available consumer-level orientation sensor was used alongside a digital camera to capture images of a set of scenes and record the orientation of the camera. The fundamental matrix of these image pairs was calculated using a variety of techniques, both incorporating data from the orientation sensor and excluding its use.
    Results: Some methodologies could not produce an acceptable result for the fundamental matrix on certain image pairs. A method described in the literature that used an orientation sensor always produced a result; however, in cases where the hybrid or purely computer vision methods also produced a result, it was found to be the least accurate.
    Conclusion: Results from this work show that using an orientation sensor to capture information alongside an imaging device can improve both the accuracy and the reliability of calculations of the scene's geometry; however, noise from the orientation sensor can limit this accuracy, and further research would be needed to determine the magnitude of this problem and methods of mitigation.
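
    The benefit of a sensor-supplied rotation can be illustrated with a small sketch (an illustrative construction under stated assumptions, not one of the specific methods compared in the thesis): if the relative rotation R between the two views is taken from the orientation sensor and the camera intrinsics K are known, the epipolar constraint x2^T [t]_x R x1 = 0 becomes linear in the translation direction t, which can be recovered from a handful of correspondences as a null vector, after which F = K^{-T} [t]_x R K^{-1}.

```python
import numpy as np

def translation_from_known_rotation(x1, x2, K, R):
    """Recover the translation direction t (up to scale) given the relative
    rotation R (e.g. from an orientation sensor) and intrinsics K.

    x1, x2 are Nx2 arrays of matching pixel coordinates.  Each match gives
    one linear equation ((R @ x1_h) x x2_h) . t = 0, so t is the null
    vector of the stacked constraint matrix.
    """
    def normalize(x):
        xh = np.hstack([x, np.ones((len(x), 1))])   # homogeneous pixel coords
        return (np.linalg.inv(K) @ xh.T).T           # normalized image rays
    r1, r2 = normalize(x1), normalize(x2)
    A = np.cross(r1 @ R.T, r2)                       # rows: (R r1_i) x r2_i
    _, _, Vt = np.linalg.svd(A)
    t = Vt[-1]
    return t / np.linalg.norm(t)

def skew(t):
    """Skew-symmetric cross-product matrix [t]_x."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

# Hypothetical usage: F follows from the sensor rotation and recovered t.
# t = translation_from_known_rotation(pts1, pts2, K, R_sensor)
# F = np.linalg.inv(K).T @ skew(t) @ R_sensor @ np.linalg.inv(K)
```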

    Towards automated capture of 3D foot geometry for custom orthoses

    This thesis presents a novel method of capturing 3D foot geometry from images for custom shoe insole manufacture. Orthopedic footwear plays an important role in the treatment and prevention of foot conditions associated with diabetes. Through the use of customized shoe insoles, a podiatrist can better distribute the pressure around the foot and can also correct the biomechanics of the foot. Various foot scanners are used to obtain the geometry of the plantar surface of the foot, but they are expensive and generic in nature. The focus of this thesis is to build 3D foot structure from a pair of calibrated images. The process begins with selecting a pair of good images of the foot, obtained from the scanner utility frame. The next step involves identifying corners, or features, in the images. Correlation between the selected features forms the fundamental part of the epipolar analysis, and rigorous techniques are implemented for robust feature matching. A 3D point cloud is then obtained by applying the 8-point algorithm and a linear 3D triangulation method. The advantages of this system are quick capture of foot geometry and minimal intervention from the user. A reconstructed 3D point cloud of the foot is presented to verify this method as inexpensive and well suited to the needs of the podiatrist.
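
    A compact sketch of the reconstruction steps described above, using OpenCV's standard routines, is shown below (the matched point arrays and a shared intrinsic matrix K are assumed inputs; this is an illustrative outline rather than the exact implementation in the thesis).

```python
import cv2
import numpy as np

def reconstruct_foot_points(pts1, pts2, K):
    """8-point epipolar geometry plus linear triangulation for one
    calibrated image pair.  pts1, pts2: Nx2 float32 arrays of matches."""
    # Fundamental matrix via the (normalized) 8-point algorithm.
    F, _ = cv2.findFundamentalMat(pts1, pts2, cv2.FM_8POINT)

    # Essential matrix and relative pose of the second camera.
    E = K.T @ F @ K
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)

    # Projection matrices: first camera at the origin, second at [R | t].
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])

    # Linear triangulation to homogeneous 3D points, then dehomogenize.
    X_h = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    return (X_h[:3] / X_h[3]).T    # Nx3 point cloud
```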

    Video Stabilization Algorithm from Low Frame Rate Video for Hyperlapse Applications

    There are several methods one can use to visualize image sequences. One such method, called timelapse, is based on synthesizing a video from the image sequence. One subcategory of timelapse is the so-called hyperlapse, which is defined as a timelapse with camera movement over a large distance. A problem with combining camera movement with increased playback speed is that camera shakes appear magnified. One way to minimize this problem is to stabilize the video using estimated relative camera movement; such estimates can be obtained with computer vision methods based on epipolar geometry. Choosing how to compensate for camera shakes and how to calculate a new, smoother camera path is essential to the video stabilization algorithm. One aim of this thesis is to create such a video stabilization algorithm. Another aim is to examine how performance degrades with decreased frame rate of the input sequence. Along with this thesis we have collected a set of benchmark image sequences. Several different video stabilization algorithms have been developed in the project. These have all been tested on the benchmark data sets and evaluated with promising results. In today's society we are increasingly eager to document our experiences and everyday lives and to share them with others through social media. A new way of doing this has been developed by Narrative, whose small camera, which can be clipped to your clothes, offers a tool for documenting events without any effort on your part. But if you want to present the images as a video, is that possible? That is the question behind our thesis project.
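
    One common way to realise the "calculate a new, smoother camera path" step is sketched below, under the simplifying assumption that the inter-frame motion has already been reduced to per-frame (dx, dy, dtheta) estimates (in the thesis the motion estimates come from epipolar geometry): accumulate the motion into a trajectory, low-pass filter it with a moving average, and shift each raw transform by the difference.

```python
import numpy as np

def smooth_transforms(transforms, radius=15):
    """Classic trajectory smoothing for video stabilization.

    transforms: (N, 3) array of per-frame motion (dx, dy, dtheta) between
    consecutive frames.  Returns corrected per-frame motion that follows
    a moving-average-smoothed camera path.
    """
    trajectory = np.cumsum(transforms, axis=0)          # accumulated camera path
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    padded = np.pad(trajectory, ((radius, radius), (0, 0)), mode='edge')
    smoothed = np.vstack([np.convolve(padded[:, i], kernel, mode='valid')
                          for i in range(trajectory.shape[1])]).T
    # Nudge each raw transform so the resulting path follows the smoothed one.
    return transforms + (smoothed - trajectory)
```

    A larger smoothing radius gives a steadier path at the cost of larger correcting warps applied to each frame.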

    Analysis and Exploitation of Automatically Generated Scene Structure from Aerial Imagery

    Recent advancements in the field of computer vision, along with ever-increasing computational power, have opened up opportunities in automated photogrammetry. Many researchers have focused on using these powerful computer vision algorithms to extract three-dimensional point clouds of scenes from multi-view imagery, with the ultimate goal of creating a photo-realistic scene model. However, geographically accurate three-dimensional scene models have the potential to be exploited for much more than visualization. This work looks at utilizing automatically generated scene structure from near-nadir aerial imagery to identify and classify objects within the structure through the analysis of spatial-spectral information. The restriction to this type of imagery is imposed because such aerial imagery is commonly available. Popular third-party computer vision algorithms are used to generate the scene structure. A voxel-based approach for surface estimation is developed using Manhattan-world assumptions, and a surface estimation confidence metric is also presented. This approach provides the basis for further analysis of surface materials incorporating spectral information. Two cases of spectral analysis are examined: when additional hyperspectral imagery of the reconstructed scene is available, and when only R,G,B spectral information can be obtained. A method for registering the surface estimate to hyperspectral imagery through orthorectification is developed. Atmospherically corrected hyperspectral imagery is used to assign reflectance values to estimated surface facets for physical simulation with DIRSIG. A spatial-spectral region-growing segmentation algorithm is developed for the R,G,B-limited case in order to identify possible materials for user attribution. Finally, an analysis of the geographic accuracy of the automatically generated three-dimensional structure is performed. An end-to-end, semi-automated workflow is developed, described, and made available for use.
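
    As a rough illustration of the voxel-based surface-estimation idea (a minimal sketch under Manhattan-world-style axis alignment, not the algorithm developed in this work): bin the reconstructed point cloud into axis-aligned voxels and, for each (x, y) column, keep the highest sufficiently occupied voxel as a surface cell, with the supporting point count serving as a crude confidence value.

```python
import numpy as np

def voxel_surface(points, voxel_size=0.5, min_points=3):
    """Estimate a heightfield-style surface from an Nx3 point cloud.

    Returns a dict mapping (ix, iy) voxel columns to (iz_top, count),
    where iz_top is the highest voxel in the column holding at least
    `min_points` points and count is the number of supporting points.
    """
    idx = np.floor(points / voxel_size).astype(int)
    columns = {}
    for ix, iy, iz in idx:
        counts = columns.setdefault((ix, iy), {})
        counts[iz] = counts.get(iz, 0) + 1
    surface = {}
    for key, counts in columns.items():
        dense = [iz for iz, c in counts.items() if c >= min_points]
        if dense:                      # ignore sparsely supported columns
            iz_top = max(dense)
            surface[key] = (iz_top, counts[iz_top])
    return surface
```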

    Relating Multimodal Imagery Data in 3D

    This research develops and improves the fundamental mathematical approaches and techniques required to relate imagery and imagery-derived multimodal products in 3D. Image registration, in a 2D sense, will always be limited by the 3D effects of viewing geometry on the target. Effects such as occlusion, parallax, shadowing, and terrain/building elevation can therefore often be mitigated with even a modest amount of 3D target modeling. Additionally, the imaged scene may appear radically different depending on the sensed modality of interest; this is evident from the differences between visible, infrared, polarimetric, and radar imagery of the same site. This thesis develops a "model-centric" approach to relating multimodal imagery in a 3D environment. By correctly modeling a site of interest, both geometrically and physically, it is possible to remove or mitigate some of the most difficult challenges associated with multimodal image registration. In order to accomplish this, the mathematical framework necessary to relate imagery to geometric models is thoroughly examined. Since geometric models may need to be generated to apply this model-centric approach, this research develops methods to derive 3D models from imagery and LIDAR data. Of critical note is the implementation of complementary techniques for relating multimodal imagery that utilize the geometric model in concert with physics-based modeling to simulate scene appearance under diverse imaging scenarios. Finally, the often neglected final phase of mapping localized image registration results back to the world-coordinate-system model for data archival is addressed. In short, once a target site is properly modeled, both geometrically and physically, it is possible to orient the 3D model to the same viewing perspective as a captured image to enable proper registration. If done accurately, the synthetic model's physical appearance can simulate the imaged modality of interest while simultaneously removing the 3D ambiguity between the model and the captured image. Once registered, the captured image can be archived as a texture map on the geometric site model. In this way, the 3D information that was lost when the image was acquired can be regained and properly related to other datasets for data fusion and analysis.
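
    The core geometric operation behind this model-centric registration, orienting the 3D site model to the viewing perspective of a captured image, reduces to projecting model vertices through the camera's pose and intrinsics. A minimal pinhole-projection sketch follows (K, R, t and the vertex array are assumed inputs, not values from this research; a real renderer would add a depth buffer to resolve occlusion before texture mapping).

```python
import numpy as np

def project_model(vertices, K, R, t):
    """Project Nx3 world-coordinate model vertices into pixel coordinates
    with a pinhole camera, x ~ K (R X + t).

    Returns (pixels, in_front) where pixels is Nx2 and in_front marks
    vertices lying in front of the camera.
    """
    cam = vertices @ R.T + t          # world -> camera coordinates
    in_front = cam[:, 2] > 0          # keep points with positive depth
    proj = cam @ K.T                  # apply intrinsics
    pixels = proj[:, :2] / proj[:, 2:3]
    return pixels, in_front
```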