396 research outputs found

    Imagination Based Sample Construction for Zero-Shot Learning

    Full text link
    Zero-shot learning (ZSL) which aims to recognize unseen classes with no labeled training sample, efficiently tackles the problem of missing labeled data in image retrieval. Nowadays there are mainly two types of popular methods for ZSL to recognize images of unseen classes: probabilistic reasoning and feature projection. Different from these existing types of methods, we propose a new method: sample construction to deal with the problem of ZSL. Our proposed method, called Imagination Based Sample Construction (IBSC), innovatively constructs image samples of target classes in feature space by mimicking human associative cognition process. Based on an association between attribute and feature, target samples are constructed from different parts of various samples. Furthermore, dissimilarity representation is employed to select high-quality constructed samples which are used as labeled data to train a specific classifier for those unseen classes. In this way, zero-shot learning is turned into a supervised learning problem. As far as we know, it is the first work to construct samples for ZSL thus, our work is viewed as a baseline for future sample construction methods. Experiments on four benchmark datasets show the superiority of our proposed method.Comment: Accepted as a short paper in ACM SIGIR 201

    TagBook: A Semantic Video Representation without Supervision for Event Detection

    Get PDF
    We consider the problem of event detection in video for scenarios where only few, or even zero examples are available for training. For this challenging setting, the prevailing solutions in the literature rely on a semantic video representation obtained from thousands of pre-trained concept detectors. Different from existing work, we propose a new semantic video representation that is based on freely available social tagged videos only, without the need for training any intermediate concept detectors. We introduce a simple algorithm that propagates tags from a video's nearest neighbors, similar in spirit to the ones used for image retrieval, but redesign it for video event detection by including video source set refinement and varying the video tag assignment. We call our approach TagBook and study its construction, descriptiveness and detection performance on the TRECVID 2013 and 2014 multimedia event detection datasets and the Columbia Consumer Video dataset. Despite its simple nature, the proposed TagBook video representation is remarkably effective for few-example and zero-example event detection, even outperforming very recent state-of-the-art alternatives building on supervised representations.Comment: accepted for publication as a regular paper in the IEEE Transactions on Multimedi

    Adaptive Tag Selection for Image Annotation

    Full text link
    Not all tags are relevant to an image, and the number of relevant tags is image-dependent. Although many methods have been proposed for image auto-annotation, the question of how to determine the number of tags to be selected per image remains open. The main challenge is that for a large tag vocabulary, there is often a lack of ground truth data for acquiring optimal cutoff thresholds per tag. In contrast to previous works that pre-specify the number of tags to be selected, we propose in this paper adaptive tag selection. The key insight is to divide the vocabulary into two disjoint subsets, namely a seen set consisting of tags having ground truth available for optimizing their thresholds and a novel set consisting of tags without any ground truth. Such a division allows us to estimate how many tags shall be selected from the novel set according to the tags that have been selected from the seen set. The effectiveness of the proposed method is justified by our participation in the ImageCLEF 2014 image annotation task. On a set of 2,065 test images with ground truth available for 207 tags, the benchmark evaluation shows that compared to the popular top-kk strategy which obtains an F-score of 0.122, adaptive tag selection achieves a higher F-score of 0.223. Moreover, by treating the underlying image annotation system as a black box, the new method can be used as an easy plug-in to boost the performance of existing systems

    3D Object Detection for Autonomous Driving: A Survey

    Full text link
    Autonomous driving is regarded as one of the most promising remedies to shield human beings from severe crashes. To this end, 3D object detection serves as the core basis of such perception system especially for the sake of path planning, motion prediction, collision avoidance, etc. Generally, stereo or monocular images with corresponding 3D point clouds are already standard layout for 3D object detection, out of which point clouds are increasingly prevalent with accurate depth information being provided. Despite existing efforts, 3D object detection on point clouds is still in its infancy due to high sparseness and irregularity of point clouds by nature, misalignment view between camera view and LiDAR bird's eye of view for modality synergies, occlusions and scale variations at long distances, etc. Recently, profound progress has been made in 3D object detection, with a large body of literature being investigated to address this vision task. As such, we present a comprehensive review of the latest progress in this field covering all the main topics including sensors, fundamentals, and the recent state-of-the-art detection methods with their pros and cons. Furthermore, we introduce metrics and provide quantitative comparisons on popular public datasets. The avenues for future work are going to be judiciously identified after an in-deep analysis of the surveyed works. Finally, we conclude this paper.Comment: 3D object detection, Autonomous driving, Point cloud

    Socializing the Semantic Gap: A Comparative Survey on Image Tag Assignment, Refinement and Retrieval

    Get PDF
    Where previous reviews on content-based image retrieval emphasize on what can be seen in an image to bridge the semantic gap, this survey considers what people tag about an image. A comprehensive treatise of three closely linked problems, i.e., image tag assignment, refinement, and tag-based image retrieval is presented. While existing works vary in terms of their targeted tasks and methodology, they rely on the key functionality of tag relevance, i.e. estimating the relevance of a specific tag with respect to the visual content of a given image and its social context. By analyzing what information a specific method exploits to construct its tag relevance function and how such information is exploited, this paper introduces a taxonomy to structure the growing literature, understand the ingredients of the main works, clarify their connections and difference, and recognize their merits and limitations. For a head-to-head comparison between the state-of-the-art, a new experimental protocol is presented, with training sets containing 10k, 100k and 1m images and an evaluation on three test sets, contributed by various research groups. Eleven representative works are implemented and evaluated. Putting all this together, the survey aims to provide an overview of the past and foster progress for the near future.Comment: to appear in ACM Computing Survey
    • …
    corecore