
    Frame- and Segment-Level Features and Candidate Pool Evaluation for Video Caption Generation

    We present our submission to the Microsoft Video to Language Challenge of generating short captions describing videos in the challenge dataset. Our model is based on the encoder–decoder pipeline popular in image and video captioning systems. We propose to utilize two different kinds of video features: one to capture the video content in terms of objects and attributes, and the other to capture motion and action information. Using these diverse features, we train models specializing in two separate input sub-domains. We then train an evaluator model that picks the best caption from the pool of candidates generated by these domain-expert models. We argue that, due to the diversity of the dataset, this approach is better suited to the current video captioning task than using a single model. The efficacy of our method is demonstrated by the fact that it was rated best in the MSR Video to Language Challenge according to human evaluation. Additionally, we ranked second on the automatic evaluation metrics table.
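    The candidate-pool selection step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy scoring function and feature tags are assumptions standing in for a learned evaluator model.

    ```python
    # Minimal sketch of candidate-pool caption selection (illustrative only):
    # several "expert" models each propose a caption, and an evaluator model
    # scores each (video, caption) pair; the highest-scoring caption wins.

    def evaluator_score(video_features, caption):
        """Toy stand-in for a learned evaluator: reward captions whose
        words overlap with the video's feature tags."""
        tags = set(video_features)
        words = set(caption.lower().split())
        return len(tags & words) / max(len(words), 1)

    def select_caption(video_features, candidate_pool):
        """Pick the best caption from the pool of expert-model outputs."""
        return max(candidate_pool,
                   key=lambda c: evaluator_score(video_features, c))

    # One expert captures objects/attributes, the other motion/actions.
    candidates = [
        "a dog sitting on grass",        # object/attribute expert
        "a dog running across a field",  # motion/action expert
    ]
    best = select_caption(["dog", "running", "field"], candidates)
    print(best)  # the motion-oriented caption matches more feature tags
    ```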

    Definition of enriched relevance feedback in PicSOM : deliverable D1.3.1 of FP7 project nº 216529 PinView

    This report defines and implements communication principles and data formats for transferring enriched relevance feedback to the PicSOM content-based image retrieval system used in the PinView project. The modalities of enriched relevance feedback include recorded eye movements, pointer and keyboard events, and audio, including speech. The communication is based on AJAX technology, in which the client and server exchange XML-formatted content using the XMLHttpRequest method.
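    An XML payload carrying such enriched relevance feedback could look like the sketch below. The element and attribute names (`relevancefeedback`, `fixation`, `click`) are illustrative assumptions, not PicSOM's actual schema; the point is the round trip of structured feedback between client and server.

    ```python
    # Sketch of serializing and parsing an XML relevance-feedback message,
    # in the spirit of the client/server exchange described above.
    # Element and attribute names are hypothetical, not PicSOM's schema.
    import xml.etree.ElementTree as ET

    def build_feedback_xml(image_id, fixations, clicks):
        """Client side: serialize eye-movement fixations and pointer
        clicks recorded for one image."""
        root = ET.Element("relevancefeedback", image=image_id)
        for x, y, dur in fixations:
            ET.SubElement(root, "fixation", x=str(x), y=str(y), ms=str(dur))
        for x, y in clicks:
            ET.SubElement(root, "click", x=str(x), y=str(y))
        return ET.tostring(root, encoding="unicode")

    def parse_feedback_xml(payload):
        """Server side: recover the click coordinates from the payload."""
        root = ET.fromstring(payload)
        return [(int(c.get("x")), int(c.get("y")))
                for c in root.findall("click")]

    xml_msg = build_feedback_xml("img042", [(120, 80, 230)], [(64, 64)])
    print(parse_feedback_xml(xml_msg))  # [(64, 64)]
    ```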

    Evaluation of pointer click relevance feedback in PicSOM : deliverable D1.2 of FP7 project nº 216529 PinView

    This report presents the results of a series of experiments in which knowledge of the most relevant parts of images is given as additional information to a content-based image retrieval system. The most relevant parts have been identified by search-task-dependent pointer clicks on the images. As such, they provide a rudimentary form of explicit enriched relevance feedback and to some extent mimic genuine implicit eye-movement measurements, which are an essential ingredient of the PinView project.

    Techniques for image classification, object detection and object segmentation

    In this paper we document the techniques we used to participate in the PASCAL NoE VOC Challenge 2007 image analysis performance evaluation campaign. We took part in three of the image analysis competitions: image classification, object detection, and object segmentation. In the classification task our method performed comparatively well, ranking 4th best of 19 submissions. In contrast, our detection results were quite modest. Our method's segmentation accuracy was the best of all submissions. Our approach to the classification task is based on fusing classifications from numerous global image features, including histograms of local features. The object detection method combines a similar classification of automatically extracted image segments with the previously obtained scene-type classifications. The object segmentations are obtained in a straightforward fashion from the detection results.
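    The fusion of classifications from several global image features can be sketched as a simple late fusion of per-class scores. The feature names and the use of plain averaging are illustrative assumptions; the paper's actual fusion method may differ.

    ```python
    # Sketch of late fusion across feature-specific classifiers: each
    # global image feature yields a per-class score, and the fused
    # decision averages the scores. Feature names are illustrative.

    def fuse(scores_per_feature):
        """Average per-class scores across feature-specific classifiers."""
        classes = scores_per_feature[0].keys()
        n = len(scores_per_feature)
        return {c: sum(s[c] for s in scores_per_feature) / n
                for c in classes}

    color_hist = {"car": 0.6, "person": 0.4}   # global color histogram
    edge_hist  = {"car": 0.7, "person": 0.3}   # global edge histogram
    local_bow  = {"car": 0.5, "person": 0.5}   # histogram of local features
    fused = fuse([color_hist, edge_hist, local_bow])
    print(max(fused, key=fused.get))  # "car"
    ```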

    Specification of information interfaces in PinView : deliverable D8.1 of FP7 project nº 216529 PinView

    This report defines the information interfaces for the PinView project in order to facilitate the project's planned research. Successful collaborative research between the multiple project sites requires that the individual efforts can directly support each other. The report contains definitions of the file system structure used, of various file formats, and of the data transfer between the project sites. The report will be updated regularly during the project.

    Object Recognition and Finnish Cinema (Objektintunnistus ja suomalainen elokuva)
