Automated annotation of landmark images using community contributed datasets and web resources
A novel solution to the challenge of automatic image annotation is described. Given an image with GPS data of its location of capture, our system returns a semantically-rich annotation comprising tags which both identify the landmark in the image and provide an interesting fact about it, e.g. "A view of the Eiffel Tower, which was built in 1889 for an international exhibition in Paris". This exploits visual and textual web mining in combination with content-based image analysis and natural language processing. In the first stage, an input image is matched to a set of community contributed images (with keyword tags) on the basis of its GPS information and image classification techniques. The depicted landmark is inferred from the keyword tags for the matched set. The system then takes advantage of the information written about landmarks available on the web at large to extract a fact about the landmark in the image. We report component evaluation results from an implementation of our solution on a mobile device. Image localisation and matching offers 93.6% classification accuracy; the selection of appropriate tags for use in annotation performs well (F1M of 0.59), and it subsequently automatically identifies a correct toponym for use in captioning and fact extraction in 69.0% of the tested cases; finally, the fact extraction returns an interesting caption in 78% of cases.
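The first stage described above can be pictured with a small sketch: filter community contributed photos by GPS proximity to the query image and vote over their keyword tags to propose a landmark name. The data, radius, and tag-voting scheme below are illustrative assumptions only, not the authors' implementation (which additionally uses content-based image matching).

```python
# Hypothetical sketch of the GPS-plus-tags matching stage described above.
# The photo data, distance threshold, and tag voting are illustrative
# assumptions, not the paper's exact method.
from collections import Counter
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two GPS points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def infer_landmark(query_gps, community_photos, radius_km=1.0):
    """Vote over the tags of community photos taken near the query location."""
    votes = Counter()
    for photo in community_photos:
        if haversine_km(*query_gps, *photo["gps"]) <= radius_km:
            votes.update(photo["tags"])
    return votes.most_common(3)  # top candidate landmark names

photos = [
    {"gps": (48.8584, 2.2945), "tags": ["eiffel tower", "paris"]},
    {"gps": (48.8606, 2.3376), "tags": ["louvre", "paris"]},
]
print(infer_landmark((48.8583, 2.2950), photos))
```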
Active learning of an action detector on untrimmed videos
Collecting and annotating videos of realistic human actions is tedious, yet critical for training action recognition systems. We propose a method to actively request the most useful video annotations among a large set of unlabeled videos. Predicting the utility of annotating unlabeled video is not trivial, since any given clip may contain multiple actions of interest, and it need not be trimmed to temporal regions of interest. To deal with this problem, we propose a detection-based active learner to train action category models. We develop a voting-based framework to localize likely intervals of interest in an unlabeled clip, and use them to estimate the total reduction in uncertainty that annotating that clip would yield. On three datasets, we show our approach can learn accurate action detectors more efficiently than alternative active learning strategies that fail to accommodate the "untrimmed" nature of real video data.
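As a rough illustration of the selection criterion, the sketch below scores each unlabeled clip by the detector's uncertainty summed over its candidate intervals and requests annotation for the highest-scoring clip. The interval proposals and probabilities are placeholders; the paper's voting-based interval localization is not reproduced here.

```python
# Illustrative sketch of the clip-selection idea: score each unlabeled clip by
# the classifier uncertainty summed over its candidate intervals, and request
# annotation for the clip whose labeling would remove the most uncertainty.
# The intervals and probabilities are assumed placeholder values.
import math

def interval_entropy(p_action):
    """Binary entropy of the detector's action probability for one interval."""
    p = min(max(p_action, 1e-6), 1 - 1e-6)
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def select_clip_to_annotate(unlabeled_clips):
    """Pick the clip whose candidate intervals carry the most total uncertainty."""
    def total_uncertainty(clip):
        return sum(interval_entropy(p) for p in clip["interval_probs"])
    return max(unlabeled_clips, key=total_uncertainty)

clips = [
    {"id": "vid_01", "interval_probs": [0.51, 0.48, 0.95]},
    {"id": "vid_02", "interval_probs": [0.05, 0.97]},
]
print(select_clip_to_annotate(clips)["id"])  # vid_01: the most ambiguous clip
```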
VITALAS at TRECVID-2008
In this paper, we present our experiments on the TRECVID 2008 high-level feature extraction task. This is our first year of participation in TRECVID, and our system adopts several popular approaches proposed by other groups in previous years. We propose two advanced low-level features: a new Gabor texture descriptor and a Compact-SIFT codeword histogram. Our system applies the well-known LIBSVM to train SVM classifiers as the basic classifiers. In the fusion step, several methods are employed, such as voting, SVM-based fusion, HCRF, and Bootstrap Average AdaBoost (BAAB).
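A minimal sketch of such a per-feature SVM pipeline with simple late fusion is shown below, using scikit-learn's SVC (which wraps LIBSVM). The synthetic features and the soft-voting fusion are stand-ins for the paper's Gabor / Compact-SIFT descriptors and its SVM-based, HCRF, and BAAB fusion variants.

```python
# Rough sketch of a per-feature SVM plus voting-fusion pipeline.
# Features are synthetic stand-ins; fusion is a plain average of probabilities.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_train, n_test = 200, 20
y_train = rng.integers(0, 2, n_train)

# One synthetic feature matrix per low-level descriptor (e.g. texture, SIFT).
features_train = {name: rng.normal(size=(n_train, 64)) for name in ("gabor", "csift")}
features_test = {name: rng.normal(size=(n_test, 64)) for name in ("gabor", "csift")}

# Train one SVM per feature type.
classifiers = {
    name: SVC(kernel="rbf", probability=True).fit(X, y_train)
    for name, X in features_train.items()
}

# Fuse by averaging the per-classifier concept probabilities (soft voting).
fused = np.mean(
    [clf.predict_proba(features_test[name])[:, 1] for name, clf in classifiers.items()],
    axis=0,
)
print((fused > 0.5).astype(int))
```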
Automatic Action Annotation in Weakly Labeled Videos
Manual spatio-temporal annotation of human actions in videos is laborious, requires several annotators, and contains human biases. In this paper, we present a weakly supervised approach to automatically obtain spatio-temporal annotations of an actor in action videos. We first obtain a large number of action proposals in each video. To capture a few of the most representative action proposals in each video and evade processing thousands of them, we rank them using optical flow and saliency in a 3D-MRF-based framework and select a few proposals using a MAP-based proposal subset selection method. We demonstrate that this ranking preserves the high-quality action proposals. Several such proposals are generated for each video of the same action. Our next challenge is to iteratively select one proposal from each video so that all proposals are globally consistent. We formulate this as a Generalized Maximum Clique Graph problem using shape, global, and fine-grained similarity of proposals across the videos. The output of our method is the most action-representative proposals from each video. Our method can also annotate multiple instances of the same action in a video. We have validated our approach on three challenging action datasets: UCF Sports, sub-JHMDB, and THUMOS'13, and have obtained promising results compared to several baseline methods. Moreover, on UCF Sports, we demonstrate that action classifiers trained on these automatically obtained spatio-temporal annotations have performance comparable to classifiers trained on ground-truth annotations.
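To make the cross-video selection step concrete, the sketch below greedily picks one proposal per video so that the chosen set is mutually similar. The paper formulates this as a Generalized Maximum Clique problem; the greedy pass and the random descriptors here are only hedged stand-ins for that formulation and for the shape, global, and fine-grained similarities it uses.

```python
# Hedged sketch of the cross-video selection step: choose one action proposal
# per video so that the chosen set is mutually similar. A greedy pass stands in
# for the paper's Generalized Maximum Clique formulation; descriptors are random.
import numpy as np

def similarity(a, b):
    """Cosine similarity between two proposal descriptors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def greedy_select(per_video_proposals):
    """Greedily pick one proposal per video, maximising similarity to picks so far."""
    chosen = [per_video_proposals[0][0]]  # seed with the first video's first proposal
    for proposals in per_video_proposals[1:]:
        best = max(proposals, key=lambda p: sum(similarity(p, c) for c in chosen))
        chosen.append(best)
    return chosen

rng = np.random.default_rng(1)
videos = [[rng.normal(size=32) for _ in range(5)] for _ in range(4)]  # 4 videos, 5 proposals each
picked = greedy_select(videos)
print(len(picked), "proposals selected, one per video")
```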