1,196 research outputs found

    Combined object recognition approaches for mobile robotics

    Get PDF
    There are numerous solutions to simple object recognition problems when the machine is operating under strict environmental conditions (such as lighting). Object recognition in real-world environments poses greater difficulty however. Ideally mobile robots will function in real-world environments without the aid of fiduciary identifiers. More robust methods are therefore needed to perform object recognition reliably. A combined approach of multiple techniques improves recognition results. Active vision and peripheral-foveal vision—systems that are designed to improve the information gathered for the purposes of object recognition—are examined. In addition to active vision and peripheral-foveal vision, five object recognition methods that either make use of some form of active vision or could leverage active vision and/or peripheral-foveal vision systems are also investigated: affine-invariant image patches, perceptual organization, 3D morphable models (3DMMs), active viewpoint, and adaptive color segmentation. The current state-of-the-art in these areas of vision research and observations on areas of future research are presented. Examples of state-of-theart methods employed in other vision applications that have not been used for object recognition are also mentioned. Lastly, the future direction of the research field is hypothesized

    A very simple framework for 3D human poses estimation using a single 2D image: Comparison of geometric moments descriptors.

    Get PDF
    In this paper, we propose a framework in order to automatically extract the 3D pose of an individual from a single silhouette image obtained with a classical low-cost camera without any depth information. By pose, we mean the configuration of human bones in order to reconstruct a 3D skeleton representing the 3D posture of the detected human. Our approach combines prior learned correspondences between silhouettes and skeletons extracted from simulated 3D human models publicly available on the internet. The main advantages of such approach are that silhouettes can be very easily extracted from video, and 3D human models can be animated using motion capture data in order to quickly build any movement training data. In order to match detected silhouettes with simulated silhouettes, we compared geometrics invariants moments. According to our results, we show that the proposed method provides very promising results with a very low time processing

    Study Of Human Activity In Video Data With An Emphasis On View-invariance

    Get PDF
    The perception and understanding of human motion and action is an important area of research in computer vision that plays a crucial role in various applications such as surveillance, HCI, ergonomics, etc. In this thesis, we focus on the recognition of actions in the case of varying viewpoints and different and unknown camera intrinsic parameters. The challenges to be addressed include perspective distortions, differences in viewpoints, anthropometric variations, and the large degrees of freedom of articulated bodies. In addition, we are interested in methods that require little or no training. The current solutions to action recognition usually assume that there is a huge dataset of actions available so that a classifier can be trained. However, this means that in order to define a new action, the user has to record a number of videos from different viewpoints with varying camera intrinsic parameters and then retrain the classifier, which is not very practical from a development point of view. We propose algorithms that overcome these challenges and require just a few instances of the action from any viewpoint with any intrinsic camera parameters. Our first algorithm is based on the rank constraint on the family of planar homographies associated with triplets of body points. We represent action as a sequence of poses, and decompose the pose into triplets. Therefore, the pose transition is broken down into a set of movement of body point planes. In this way, we transform the non-rigid motion of the body points into a rigid motion of body point iii planes. We use the fact that the family of homographies associated with two identical poses would have rank 4 to gauge similarity of the pose between two subjects, observed by different perspective cameras and from different viewpoints. This method requires only one instance of the action. We then show that it is possible to extend the concept of triplets to line segments. In particular, we establish that if we look at the movement of line segments instead of triplets, we have more redundancy in data thus leading to better results. We demonstrate this concept on “fundamental ratios.” We decompose a human body pose into line segments instead of triplets and look at set of movement of line segments. This method needs only three instances of the action. If a larger dataset is available, we can also apply weighting on line segments for better accuracy. The last method is based on the concept of “Projective Depth”. Given a plane, we can find the relative depth of a point relative to the given plane. We propose three different ways of using “projective depth:” (i) Triplets - the three points of a triplet along with the epipole defines the plane and the movement of points relative to these body planes can be used to recognize actions; (ii) Ground plane - if we are able to extract the ground plane, we can find the “projective depth” of the body points with respect to it. Therefore, the problem of action recognition would translate to curve matching; and (iii) Mirror person - We can use the mirror view of the person to extract mirror symmetric planes. This method also needs only one instance of the action. Extensive experiments are reported on testing view invariance, robustness to noisy localization and occlusions of body points, and action recognition. The experimental results are very promising and demonstrate the efficiency of our proposed invariants. i

    Radially-Distorted Conjugate Translations

    Full text link
    This paper introduces the first minimal solvers that jointly solve for affine-rectification and radial lens distortion from coplanar repeated patterns. Even with imagery from moderately distorted lenses, plane rectification using the pinhole camera model is inaccurate or invalid. The proposed solvers incorporate lens distortion into the camera model and extend accurate rectification to wide-angle imagery, which is now common from consumer cameras. The solvers are derived from constraints induced by the conjugate translations of an imaged scene plane, which are integrated with the division model for radial lens distortion. The hidden-variable trick with ideal saturation is used to reformulate the constraints so that the solvers generated by the Grobner-basis method are stable, small and fast. Rectification and lens distortion are recovered from either one conjugately translated affine-covariant feature or two independently translated similarity-covariant features. The proposed solvers are used in a \RANSAC-based estimator, which gives accurate rectifications after few iterations. The proposed solvers are evaluated against the state-of-the-art and demonstrate significantly better rectifications on noisy measurements. Qualitative results on diverse imagery demonstrate high-accuracy undistortions and rectifications. The source code is publicly available at https://github.com/prittjam/repeats

    Towards automated capture of 3D foot geometry for custom orthoses

    Get PDF
    This thesis presents a novel method of capturing 3D foot geometry from images for custom shoe insole manufacture. Orthopedic footwear plays an important role as a treatment and prevention of foot conditions associated with diabetes. Through the use of customized shoe insoles, a podiatrist can provide a means to better distribute the pressure around the foot, and can also correct the biomechanics of the foot. Different foot scanners are used to obtain the geometric plantar surface of foot, but are expensive and more generic in nature. The focus of this thesis is to build 3D foot structure from a pair of calibrated images. The process begins with considering a pair of good images of the foot, obtained from the scanner utility frame. The next step involves identifying corners or features in the images. Correlation between the selected features forms the fundamental part of epipolar analysis. Rigorous techniques are implemented for robust feature matching. A 3D point cloud is then obtained by applying the 8-point algorithm and linear 3D triangulation method. The advantage of this system is quick capture of foot geometry and minimal intervention from the user. A reconstructed 3D point cloud of foot is presented to verify this method as inexpensive and more suited to the needs of the podiatrist
    • …
    corecore