636 research outputs found

    Automated annotation of landmark images using community contributed datasets and web resources

    A novel solution to the challenge of automatic image annotation is described. Given an image with GPS data of its location of capture, our system returns a semantically rich annotation comprising tags which both identify the landmark in the image and provide an interesting fact about it, e.g. "A view of the Eiffel Tower, which was built in 1889 for an international exhibition in Paris". This exploits visual and textual web mining in combination with content-based image analysis and natural language processing. In the first stage, an input image is matched to a set of community-contributed images (with keyword tags) on the basis of its GPS information and image classification techniques. The depicted landmark is inferred from the keyword tags for the matched set. The system then takes advantage of the information written about landmarks available on the web at large to extract a fact about the landmark in the image. We report component evaluation results from an implementation of our solution on a mobile device. Image localisation and matching offers 93.6% classification accuracy; the selection of appropriate tags for use in annotation performs well (F1M of 0.59), and it subsequently automatically identifies a correct toponym for use in captioning and fact extraction in 69.0% of the tested cases; finally, fact extraction returns an interesting caption in 78% of cases.
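The first stage described above (filter community photos by GPS proximity, then infer the landmark from their keyword tags) could be sketched roughly as follows. This is a toy illustration, not the paper's implementation: the function and field names are hypothetical, and the visual matching, captioning, and fact-extraction stages are omitted.

```python
import math
from collections import Counter

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two GPS points, in kilometres.
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def infer_landmark(query_gps, community_photos, radius_km=1.0):
    # Keep community photos captured near the query location, then vote
    # on their keyword tags; the most frequent tag stands in for the
    # landmark name (visual matching is omitted in this sketch).
    votes = Counter()
    for photo in community_photos:
        if haversine_km(*query_gps, *photo["gps"]) <= radius_km:
            votes.update(photo["tags"])
    return votes.most_common(1)[0][0] if votes else None

photos = [
    {"gps": (48.8584, 2.2945), "tags": ["eiffel tower", "paris"]},
    {"gps": (48.8583, 2.2944), "tags": ["eiffel tower"]},
    {"gps": (51.5007, -0.1246), "tags": ["big ben", "london"]},
]
print(infer_landmark((48.8585, 2.2946), photos))  # eiffel tower
```

In the real system the matched set is refined with content-based image analysis rather than tag frequency alone; the GPS filter simply keeps that matching tractable.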

    Learning about Large Scale Image Search: Lessons from Global Scale Hotel Recognition to Fight Sex Trafficking

    Hotel recognition is a sub-domain of scene recognition that involves determining what hotel is seen in a photograph taken in a hotel. The hotel recognition task is a challenging computer vision task due to the properties of hotel rooms, including low visual similarity between rooms in the same hotel and high visual similarity between rooms in different hotels, particularly those from the same chain. Building accurate approaches for hotel recognition is important to investigations of human trafficking. Images of human trafficking victims are often shared by traffickers among criminal networks and posted in online advertisements. These images are often taken in hotels. Using hotel recognition approaches to determine the hotel a victim was photographed in can assist in investigations and prosecutions of human traffickers. In this dissertation, I present an application for the ongoing capture of hotel imagery by the public, a large-scale curated dataset of hotel room imagery, deep learning approaches to hotel recognition based on this imagery, a visualization approach that provides insight into what networks trained on image similarity are learning, and an approach to image search focused on specific objects in scenes. Taken together, these contributions have resulted in a first-in-the-world system that offers a solution to answering the question, 'What hotel was this photograph taken in?' at a global scale.
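At query time, recognition systems trained on image similarity like this typically reduce to nearest-neighbour search in a learned embedding space: embed the query photo, rank a gallery of known hotel images by similarity, and read the hotel identity off the top match. A minimal sketch, assuming embeddings already exist (names are hypothetical; this is not the dissertation's code):

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rank_hotels(query_emb, gallery):
    # Rank gallery images by similarity to the query embedding;
    # the top-ranked image gives the predicted hotel identity.
    return sorted(gallery, key=lambda g: cosine(query_emb, g["emb"]), reverse=True)

gallery = [
    {"hotel": "A", "emb": [0.9, 0.1, 0.0]},
    {"hotel": "B", "emb": [0.0, 1.0, 0.2]},
    {"hotel": "C", "emb": [0.8, 0.2, 0.1]},
]
ranked = rank_hotels([1.0, 0.0, 0.0], gallery)
print(ranked[0]["hotel"])  # A
```

The hard part, which the dissertation addresses with deep metric learning, is producing embeddings where rooms from the same hotel land close together despite the low within-hotel and high between-hotel visual similarity noted above.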

    Information retrieval challenges of maintaining a context-aware human digital memory

    The volume of personal digital data captured by today's content creation devices, such as digital cameras, digital video recorders and sensecams, poses many challenges for organising and retrieving content for users. By utilising content and contextual analysis along with an understanding of the usage scenarios involved, it is possible to develop effective information retrieval technologies for these personal archives. In this talk I will discuss how we, at the Centre for Digital Video Processing, Dublin City University, have employed both content and contextual analysis to automatically organise human digital memory (sensecam) collections, and I will focus specifically on how we have employed techniques from photo and video retrieval in the novel domain of human digital memories.

    Texture Synthesis Guided Deep Hashing for Texture Image Retrieval

    With the large-scale explosion of images and videos over the internet, efficient hashing methods have been developed to facilitate memory and time efficient retrieval of similar images. However, none of the existing works uses hashing to address texture image retrieval, mostly because of the lack of sufficiently large texture image databases. Our work addresses this problem by developing a novel deep learning architecture that generates binary hash codes for input texture images. For this, we first pre-train a Texture Synthesis Network (TSN) which takes a texture patch as input and outputs an enlarged view of the texture by injecting newer texture content. Thus it signifies that the TSN encodes the learnt texture specific information in its intermediate layers. In the next stage, a second network gathers the multi-scale feature representations from the TSN's intermediate layers using channel-wise attention, combines them in a progressive manner to a dense continuous representation which is finally converted into a binary hash code with the help of individual and pairwise label information. The new enlarged texture patches also help in data augmentation to alleviate the problem of insufficient texture data and are used to train the second stage of the network. Experiments on three public texture image retrieval datasets indicate the superiority of our texture synthesis guided hashing approach over current state-of-the-art methods.

    Comment: IEEE Winter Conference on Applications of Computer Vision (WACV), 2019. Video presentation: https://www.youtube.com/watch?v=tXaXTGhzaJ
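The final step the abstract describes, converting a dense continuous representation into a binary hash code and retrieving by bitwise comparison, can be illustrated with a toy sketch. Here simple sign binarization stands in for the learned hashing layer, and all names are hypothetical:

```python
def binarize(features, threshold=0.0):
    # Convert a continuous feature vector into a binary hash code by
    # thresholding each dimension (sign binarization); a trained network
    # would learn the mapping instead.
    return tuple(1 if f > threshold else 0 for f in features)

def hamming(a, b):
    # Number of differing bits between two hash codes.
    return sum(x != y for x, y in zip(a, b))

def retrieve(query_feat, database):
    # Rank database textures by Hamming distance between hash codes;
    # comparing short bit strings is far cheaper in memory and time
    # than comparing the original float vectors.
    q = binarize(query_feat)
    return sorted(database, key=lambda d: hamming(q, binarize(d["feat"])))

db = [
    {"name": "brick",  "feat": [0.8, -0.3, 0.5, -0.9]},
    {"name": "fabric", "feat": [-0.7, 0.6, -0.2, 0.4]},
    {"name": "wood",   "feat": [0.9, -0.1, 0.6, 0.4]},
]
print(retrieve([0.7, -0.2, 0.4, -0.8], db)[0]["name"])  # brick
```

The point of hashing for retrieval is exactly this trade: a small, fixed-length binary code per image in exchange for near-instant similarity ranking over large databases.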