
    Inverse scale invariant feature transform models for object recognition and image tagging.

    This thesis presents three novel image models based on Scale Invariant Feature Transform (SIFT) features and the k-Nearest Neighbors (k-NN) machine learning methodology. While SIFT features characterize an image with distinctive keypoints, the k-NN step filters and normalizes those keypoints with a twofold goal: (i) compressing the image representation, and (ii) reducing the bias induced by the variance in keypoint counts among object classes. Object recognition is approached as a supervised machine learning problem, and the models are formulated using Support Vector Machines (SVMs). These object recognition models have been tested for single and multiple object detection, and for asymmetrical rotational recognition. Finally, a hierarchical probabilistic framework with a basic object classification methodology is formulated as a multi-class learning framework and tested for automatic image annotation generation. The object recognition models were evaluated using recognition rate (rank 1), whereas the annotation task was evaluated using the well-known Information Retrieval measures: precision, recall, average precision, and average recall. The original print copy of this thesis may be available here: http://wizard.unbc.ca/record=b163702
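    The abstract does not give the exact filtering algorithm, but the idea of using k-NN structure to compress and normalize a variable-size set of SIFT keypoints can be sketched as follows. This is an illustrative interpretation, not the thesis's method: each descriptor is scored by the mean distance to its k nearest neighbors within the same image, and only a fixed `budget` of the most central descriptors is kept, so every image contributes the same number of keypoints regardless of class.

    ```python
    import numpy as np

    def knn_filter_keypoints(descriptors: np.ndarray, k: int = 5,
                             budget: int = 100) -> np.ndarray:
        """Illustrative k-NN keypoint filter (a sketch, not the thesis's exact algorithm).

        Scores each SIFT descriptor by its mean Euclidean distance to its k
        nearest neighbours within the same image, then keeps the `budget`
        most central descriptors.  Fixing the per-image keypoint count removes
        the class bias caused by images with many more keypoints than others.
        """
        n = len(descriptors)
        if n <= budget:
            return descriptors
        # Pairwise Euclidean distances between all descriptors in the image.
        diff = descriptors[:, None, :] - descriptors[None, :, :]
        dist = np.linalg.norm(diff, axis=-1)
        np.fill_diagonal(dist, np.inf)              # exclude self-distance
        knn_score = np.sort(dist, axis=1)[:, :k].mean(axis=1)
        keep = np.argsort(knn_score)[:budget]       # most central keypoints
        return descriptors[keep]
    ```

    The fixed-size descriptor sets produced this way could then be fed to a standard SVM classifier, matching the supervised formulation described above.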

    Detect-and-Track: Efficient Pose Estimation in Videos

    This paper addresses the problem of estimating and tracking human body keypoints in complex, multi-person video. We propose an extremely lightweight yet highly effective approach that builds upon the latest advancements in human detection and video understanding. Our method operates in two stages: keypoint estimation in frames or short clips, followed by lightweight tracking to generate keypoint predictions linked over the entire video. For frame-level pose estimation we experiment with Mask R-CNN, as well as our own proposed 3D extension of this model, which leverages temporal information over small clips to generate more robust frame predictions. We conduct extensive ablative experiments on the newly released multi-person video pose estimation benchmark, PoseTrack, to validate various design choices of our model. Our approach achieves an accuracy of 55.2% on the validation set and 51.8% on the test set using the Multi-Object Tracking Accuracy (MOTA) metric, and achieves state-of-the-art performance on the ICCV 2017 PoseTrack keypoint tracking challenge. Comment: In CVPR 2018. Ranked first in ICCV 2017 PoseTrack challenge (keypoint tracking in videos). Code: https://github.com/facebookresearch/DetectAndTrack and webpage: https://rohitgirdhar.github.io/DetectAndTrack
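    The second stage described above — lightweight tracking that links per-frame predictions into tracks — can be sketched with a simple greedy frame-to-frame matcher. This is a simplified illustration, not the paper's implementation: detections in consecutive frames are matched by bounding-box IoU, highest-overlap pairs first, and unmatched detections start new track ids.

    ```python
    def iou(a, b):
        """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter) if inter else 0.0

    def link_tracks(frames, thresh=0.3):
        """Greedy frame-to-frame linking: assign each detection the track id of
        its best-overlapping match in the previous frame, or a fresh id."""
        next_id = 0
        prev = []                     # (track_id, box) pairs from previous frame
        tracks = []                   # per-frame list of assigned track ids
        for boxes in frames:
            # Candidate (score, prev_track_id, detection_index) pairs, best first.
            cands = sorted(
                ((iou(pb, b), tid, j) for tid, pb in prev
                                      for j, b in enumerate(boxes)),
                reverse=True)
            ids = [None] * len(boxes)
            used_tid = set()
            for score, tid, j in cands:
                if score < thresh or tid in used_tid or ids[j] is not None:
                    continue
                ids[j] = tid          # continue an existing track
                used_tid.add(tid)
            for j in range(len(boxes)):
                if ids[j] is None:    # unmatched detection starts a new track
                    ids[j] = next_id
                    next_id += 1
            prev = list(zip(ids, boxes))
            tracks.append(ids)
        return tracks
    ```

    For example, a person whose box drifts slightly between frames keeps its id, while a newly appearing person receives a new one. Replacing the greedy pass with optimal bipartite matching (e.g. the Hungarian algorithm) is a natural refinement of the same idea.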