
    ANVIL: a system for the retrieval of captioned images using NLP techniques

    ANVIL is a system designed for the retrieval of images annotated with short captions. It uses NLP techniques to extract dependency structures from captions and queries, and then applies a robust matching algorithm to recursively explore and compare them. There are currently two main interfaces to ANVIL: a list-based display and a 2D spatial layout that allows users to interact with and navigate between similar images. ANVIL was designed to operate as part of a publicly accessible, WWW-based image retrieval server. Consequently, product-level engineering standards were required. This paper examines both the research aspects of the system and some of the design and evaluation issues.
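    A minimal sketch of the general idea (using spaCy as an assumed stand-in parser; this is not ANVIL's actual matching algorithm): extract dependency structures from a caption and a query, then recursively compare the two subtrees.

    import spacy

    # Assumes the small English model is installed:
    #   python -m spacy download en_core_web_sm
    nlp = spacy.load("en_core_web_sm")

    def subtree_score(q_tok, c_tok):
        """Recursively score how well the query subtree rooted at q_tok
        matches the caption subtree rooted at c_tok."""
        score = 1.0 if q_tok.lemma_ == c_tok.lemma_ else 0.0
        for q_child in q_tok.children:
            # Take the best-matching caption child for each query child, if any.
            score += max(
                (subtree_score(q_child, c_child) for c_child in c_tok.children),
                default=0.0,
            )
        return score

    def match(query, caption):
        q_root = next(t for t in nlp(query) if t.dep_ == "ROOT")
        c_root = next(t for t in nlp(caption) if t.dep_ == "ROOT")
        return subtree_score(q_root, c_root)

    print(match("a dog chasing a ball",
                "A small dog chases a red ball on the beach"))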

    Automated annotation of landmark images using community contributed datasets and web resources

    A novel solution to the challenge of automatic image annotation is described. Given an image with GPS data of its location of capture, our system returns a semantically-rich annotation comprising tags which both identify the landmark in the image, and provide an interesting fact about it, e.g. "A view of the Eiffel Tower, which was built in 1889 for an international exhibition in Paris". This exploits visual and textual web mining in combination with content-based image analysis and natural language processing. In the first stage, an input image is matched to a set of community contributed images (with keyword tags) on the basis of its GPS information and image classification techniques. The depicted landmark is inferred from the keyword tags for the matched set. The system then takes advantage of the information written about landmarks available on the web at large to extract a fact about the landmark in the image. We report component evaluation results from an implementation of our solution on a mobile device. Image localisation and matching offers 93.6% classification accuracy; the selection of appropriate tags for use in annotation performs well (F1M of 0.59), and it subsequently automatically identifies a correct toponym for use in captioning and fact extraction in 69.0% of the tested cases; finally the fact extraction returns an interesting caption in 78% of cases.
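    An illustrative sketch of the first stage only (hypothetical community data, threshold, and function names; not the paper's implementation, which also uses image classification): match the input photo's GPS position against community-contributed tagged photos and vote over their tags to infer the landmark.

    from collections import Counter
    from math import asin, cos, radians, sin, sqrt

    def haversine_km(lat1, lon1, lat2, lon2):
        """Great-circle distance between two points in kilometres."""
        lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
        a = (sin((lat2 - lat1) / 2) ** 2
             + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
        return 2 * 6371.0 * asin(sqrt(a))

    # Hypothetical community-contributed photos: (lat, lon, keyword tags).
    community = [
        (48.8584, 2.2945, ["eiffel tower", "paris"]),
        (48.8583, 2.2950, ["eiffel tower", "night"]),
        (48.8606, 2.3376, ["louvre", "paris"]),
    ]

    def infer_landmark(lat, lon, radius_km=0.3):
        """Vote over the tags of all community photos taken nearby."""
        tags = Counter()
        for p_lat, p_lon, p_tags in community:
            if haversine_km(lat, lon, p_lat, p_lon) <= radius_km:
                tags.update(p_tags)
        return tags.most_common(1)[0][0] if tags else None

    print(infer_landmark(48.8585, 2.2946))  # -> "eiffel tower"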

    Digital Image Access & Retrieval

    The 33rd Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March of 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation, with the bulk of the conference focusing on indexing and retrieval.

    Video Classification: A Literature Survey

    A vast number of videos are now available from many sources, but viewers want videos that match their interests, and this need has motivated work on video classification. This paper presents a survey of the video classification literature. There are three main approaches by which video classification can be performed, with features derived from three different modalities: audio, text, and visual. Classification is then performed on these features. Finally, the different approaches are compared, and the advantages and disadvantages of each approach or method are described along with appropriate applications.
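    A hedged sketch of the late-fusion setup this kind of multimodal classification often uses (all data here is synthetic and the feature extractors are stubbed; this is illustrative, not taken from the survey): one classifier per modality, with the predicted probabilities averaged.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n_videos, n_classes = 200, 3
    labels = rng.integers(0, n_classes, n_videos)

    # Stand-ins for real per-modality features (e.g. audio MFCCs,
    # text/caption embeddings, frame-level visual descriptors).
    features = {
        "audio": rng.normal(size=(n_videos, 32)),
        "text": rng.normal(size=(n_videos, 64)),
        "visual": rng.normal(size=(n_videos, 128)),
    }

    # Train one classifier per modality, then fuse by averaging probabilities.
    probs = []
    for name, X in features.items():
        clf = LogisticRegression(max_iter=1000).fit(X, labels)
        probs.append(clf.predict_proba(X))
    fused = np.mean(probs, axis=0)
    print("fused class for video 0:", fused[0].argmax())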

    Semantic Metadata Extraction from Dense Video Captioning

    Annotation of multimedia data by humans is time-consuming and costly, while reliable automatic generation of semantic metadata is a major challenge. We propose a framework to extract semantic metadata from automatically generated video captions. As metadata, we consider entities, the entities' properties, relations between entities, and the video category. We employ two state-of-the-art dense video captioning models, one with a masked transformer (MT) and one with parallel decoding (PVDC), to generate captions for videos of the ActivityNet Captions dataset. Our experiments show that it is possible to extract entities, their properties, relations between entities, and the video category from the generated captions. We observe that the quality of the extracted information is mainly influenced by the quality of the event localization in the video as well as the performance of the event caption generation.
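    A minimal sketch of this kind of extraction step (assuming spaCy and its small English model as the NLP backend; not the paper's actual pipeline): pull entities, adjectival properties, and subject-verb-object relations out of a generated caption.

    import spacy

    nlp = spacy.load("en_core_web_sm")

    def extract_metadata(caption):
        doc = nlp(caption)
        # Entities: head nouns of the caption's noun chunks.
        entities = [chunk.root.lemma_ for chunk in doc.noun_chunks]
        # Properties: adjectival modifiers attached to those nouns.
        properties = {tok.head.lemma_: tok.lemma_
                      for tok in doc if tok.dep_ == "amod"}
        # Relations: (subject, verb, object) triples.
        relations = []
        for tok in doc:
            if tok.dep_ == "nsubj":
                obj = next((c.lemma_ for c in tok.head.children
                            if c.dep_ in ("dobj", "obj")), None)
                relations.append((tok.lemma_, tok.head.lemma_, obj))
        return entities, properties, relations

    print(extract_metadata("A young woman is riding a brown horse on the beach."))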