Inverse scale invariant feature transform models for object recognition and image tagging.
This thesis presents three novel image models based on Scale Invariant Feature Transform (SIFT) features and the k-Nearest Neighbors (k-NN) machine learning methodology. While SIFT features characterize an image with distinctive keypoints, k-NN is used to filter and normalize those keypoints with a two-fold goal: (i) compressing the image size, and (ii) reducing the bias induced by the variance in keypoint counts among object classes. Object recognition is approached as a supervised machine learning problem, and the models are formulated using Support Vector Machines (SVMs). These object recognition models have been tested for single and multiple object detection, and for asymmetrical rotational recognition. Finally, a hierarchical probabilistic framework with a basic object classification methodology is formulated as a multi-class learning framework; this framework has been tested for automatic image annotation generation. The object recognition models were evaluated using recognition rate (rank 1), whereas the annotation task was evaluated using the well-known Information Retrieval measures: precision, recall, average precision and average recall. The original print copy of this thesis may be available here: http://wizard.unbc.ca/record=b163702
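The abstract does not specify the exact k-NN filtering rule, so the following is only a minimal sketch of the idea of capping each image at a fixed keypoint budget (so every class contributes a comparable number of keypoints). The selection criterion here (keep keypoints whose mean distance to their k nearest neighbours in descriptor space is smallest, i.e. the densest ones) and the `budget` and `k` parameters are assumptions for illustration, not the thesis's actual method:

```python
import numpy as np

def knn_filter_keypoints(descriptors, k=3, budget=50):
    """Keep at most `budget` descriptors per image.

    Preference goes to descriptors whose mean distance to their k
    nearest neighbours is smallest (the densest region of descriptor
    space). This normalizes keypoint counts across images, which is
    the stated goal of the thesis's k-NN step; the exact rule used
    there is not given in the abstract, so this one is assumed.
    """
    descriptors = np.asarray(descriptors, dtype=float)
    n = len(descriptors)
    if n <= budget:
        return descriptors
    # pairwise Euclidean distances between all descriptors
    diff = descriptors[:, None, :] - descriptors[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)  # ignore self-distance
    # mean distance from each keypoint to its k nearest neighbours
    knn_mean = np.sort(dist, axis=1)[:, :k].mean(axis=1)
    keep = np.argsort(knn_mean)[:budget]
    return descriptors[keep]
```

With every image reduced to the same budget, the resulting descriptor sets can be fed uniformly into a downstream SVM classifier, as the thesis does.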
Detect-and-Track: Efficient Pose Estimation in Videos
This paper addresses the problem of estimating and tracking human body
keypoints in complex, multi-person video. We propose an extremely lightweight
yet highly effective approach that builds upon the latest advancements in human
detection and video understanding. Our method operates in two stages: keypoint
estimation in frames or short clips, followed by lightweight tracking to
generate keypoint predictions linked over the entire video. For frame-level
pose estimation we experiment with Mask R-CNN, as well as our own proposed 3D
extension of this model, which leverages temporal information over small clips
to generate more robust frame predictions. We conduct extensive ablative
experiments on the newly released multi-person video pose estimation benchmark,
PoseTrack, to validate various design choices of our model. Our approach
achieves an accuracy of 55.2% on the validation and 51.8% on the test set using
the Multi-Object Tracking Accuracy (MOTA) metric, and achieves state of the art
performance on the ICCV 2017 PoseTrack keypoint tracking challenge.

Comment: In CVPR 2018. Ranked first in ICCV 2017 PoseTrack challenge (keypoint tracking in videos). Code: https://github.com/facebookresearch/DetectAndTrack and webpage: https://rohitgirdhar.github.io/DetectAndTrack
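The second stage described above (lightweight tracking that links per-frame keypoint predictions into tracks over the video) can be sketched with a simple greedy frame-to-frame matcher. This is only an illustrative stand-in: the paper's actual linking uses matching between detections in consecutive frames, and the greedy nearest-centroid rule and `max_dist` threshold here are assumptions, not the published algorithm:

```python
import math

def link_tracks(frames, max_dist=50.0):
    """Greedily link per-frame detections into tracks.

    `frames` is a list of frames; each frame is a list of (x, y)
    centroids of detected people (e.g. keypoint centroids from a
    frame-level pose estimator). Each active track is extended with
    the nearest unmatched detection within `max_dist`; leftover
    detections start new tracks. Tracks that find no match are
    terminated.
    """
    tracks = []   # each track: list of (frame_index, (x, y))
    active = []   # indices into `tracks` that can still be extended
    for t, detections in enumerate(frames):
        unmatched = list(range(len(detections)))
        next_active = []
        for ti in active:
            last = tracks[ti][-1][1]
            best, best_d = None, max_dist
            for di in unmatched:
                d = math.dist(last, detections[di])
                if d < best_d:
                    best, best_d = di, d
            if best is not None:
                tracks[ti].append((t, detections[best]))
                unmatched.remove(best)
                next_active.append(ti)
        for di in unmatched:
            tracks.append([(t, detections[di])])
            next_active.append(len(tracks) - 1)
        active = next_active
    return tracks
```

For example, two people moving slowly across three frames yield two tracks, one of which ends when its person leaves the scene. Metrics such as MOTA, used in the paper's evaluation, then score how well such links match ground-truth identities.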