83,650 research outputs found

    Automatic human face detection for content-based image annotation

    In this paper, an automatic human face detection approach using colour analysis is applied to content-based image annotation. In the detection stage, a probable face region is first found by an adaptive boosting (AdaBoost) algorithm and then combined with a colour-filtering classifier to enhance detection accuracy. An initial experimental benchmark shows that the proposed scheme can be applied efficiently to image annotation with higher fidelity.
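
    A minimal sketch of this two-stage idea, assuming OpenCV's Haar-cascade detector (itself AdaBoost-based) as the boosted stage and a generic YCrCb skin-colour filter as the colour classifier; the cascade file and the colour thresholds are illustrative assumptions, not the paper's parameters:

```python
# Sketch: a boosted detector proposes face regions, then a simple skin-colour
# filter rejects candidates with too little skin-coloured area.
import cv2

def detect_faces_with_colour_filter(bgr_image, min_skin_ratio=0.4):
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    candidates = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb)
    # Commonly used skin range in the Cr/Cb plane (assumed, dataset dependent).
    skin_mask = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))

    faces = []
    for (x, y, w, h) in candidates:
        region = skin_mask[y:y + h, x:x + w]
        if region.size and (region > 0).mean() >= min_skin_ratio:
            faces.append((x, y, w, h))
    return faces
```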

    Automatic Image Annotation Using CMRM with Scene Information

    Searching for digital images in a disorganized image collection is a challenging problem. One step of image search is automatic image annotation: the process of automatically assigning relevant text keywords to a given image, reflecting its content. In the past decade many automatic image annotation methods have been proposed and have achieved promising results, yet their annotation predictions are still far from accurate. To tackle this problem, we propose an automatic annotation method that combines a relevance model with scene information. CMRM, proposed by [5], is an automatic image annotation method based on the relevance-model approach. It assumes that the regions of an image can be described using a small vocabulary of blobs, which are generated by segmentation, feature extraction, and clustering. Given a training set of annotated images, the method predicts the probability of generating a word given the blobs in an image. To improve the annotation accuracy of CMRM, we incorporate scene information into the model; the resulting method is called scene-CMRM. The global image region can be represented by features that indicate the type of scene shown in the image, so CMRM's annotation predictions can be made more accurate by conditioning on that scene type. Our experiments show that the method yields predictions with better precision than CMRM, where precision is the percentage of words that are correctly predicted.
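
    A rough sketch of the CMRM-style scoring step described above (CMRM being the Cross-Media Relevance Model of [5]): for a test image represented as a bag of blobs, each training image J contributes P(w|J) times the product of P(b|J) over the test blobs, with collection-level smoothing. The data layout and smoothing constants are assumptions, and the scene-information extension is not shown:

```python
# Minimal CMRM-style scoring: score a candidate word for a test image's blobs.
from collections import Counter

def cmrm_score(word, test_blobs, training_set, alpha=0.1, beta=0.9):
    """training_set: list of (blob_list, word_list) pairs for annotated images."""
    # Collection-level counts used for smoothing.
    coll_words, coll_blobs = Counter(), Counter()
    for blobs, words in training_set:
        coll_words.update(words)
        coll_blobs.update(blobs)
    n_words = sum(coll_words.values())
    n_blobs = sum(coll_blobs.values())

    score = 0.0
    for blobs, words in training_set:
        j_words, j_blobs = Counter(words), Counter(blobs)
        j_size = len(words) + len(blobs)
        # Smoothed P(w | J)
        p_w = (1 - alpha) * j_words[word] / j_size + alpha * coll_words[word] / n_words
        # Product of smoothed P(b | J) over the test image's blobs
        p_b = 1.0
        for b in test_blobs:
            p_b *= (1 - beta) * j_blobs[b] / j_size + beta * coll_blobs[b] / n_blobs
        score += p_w * p_b  # uniform prior P(J) dropped as a constant factor
    return score
```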

    Region-based annotation tool using partition trees

    This paper presents an annotation tool for the manual, region-based annotation of still images. The selection of regions is achieved by navigating through a Partition Tree, a data structure that offers a multiscale representation of the image. The user interface provides a framework for the annotation of both atomic and composite semantic classes and generates an MPEG-7 XML-compliant file.
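
    An illustrative sketch (not the authors' tool) of the underlying data structure: a Partition Tree node that covers a set of pixels, holds child nodes at finer scales, and can receive a semantic label. MPEG-7 XML serialisation is omitted:

```python
# A Partition Tree node: navigating the tree lets an annotator pick regions at
# different scales before assigning semantic class labels to them.
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class PartitionNode:
    pixels: Set[int]                                    # pixels (or superpixel ids) covered
    children: List["PartitionNode"] = field(default_factory=list)
    label: Optional[str] = None                         # class assigned by the annotator

    def annotate(self, label: str) -> None:
        self.label = label

    def labelled_regions(self):
        """Yield (label, pixels) for every annotated node in the subtree."""
        if self.label is not None:
            yield self.label, self.pixels
        for child in self.children:
            yield from child.labelled_regions()

# Example: a tiny two-level tree where one child region gets a label.
root = PartitionNode(pixels={0, 1, 2, 3})
sky, ground = PartitionNode(pixels={0, 1}), PartitionNode(pixels={2, 3})
root.children = [sky, ground]
sky.annotate("sky")
print(list(root.labelled_regions()))
```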

    Fluid Annotation: A Human-Machine Collaboration Interface for Full Image Annotation

    We introduce Fluid Annotation, an intuitive human-machine collaboration interface for annotating the class label and outline of every object and background region in an image. Fluid Annotation is based on three principles. (I) Strong machine-learning aid: we start from the output of a strong neural network model, which the annotator can edit by correcting the labels of existing regions, adding new regions to cover missing objects, and removing incorrect regions; the edit operations are also assisted by the model. (II) Full image annotation in a single pass: as opposed to performing a series of small annotation tasks in isolation, we propose a unified interface for annotating the full image in a single pass. (III) Empower the annotator: we let the annotator choose what to annotate and in which order, so effort is concentrated on what the machine does not already know, i.e. on the errors it made, which helps use the annotation budget effectively. Through extensive experiments on the COCO+Stuff dataset, we demonstrate that Fluid Annotation leads to accurate annotations very efficiently, taking three times less annotation time than the popular LabelMe interface. Comment: ACM MultiMedia 2018. Live demo is available at fluidann.appspot.co
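
    A toy sketch of the three edit operations the interface describes, applied to a machine-proposed annotation held as a list of (region id, label) pairs; the data structures and names here are assumptions for illustration only:

```python
# Edit operations on a model proposal: relabel a region, add a missing region,
# remove an incorrect one. The annotator touches only what the model got wrong.
def change_label(annotation, region_id, new_label):
    return [(r, new_label if r == region_id else lbl) for r, lbl in annotation]

def add_region(annotation, region_id, label):
    return annotation + [(region_id, label)]

def remove_region(annotation, region_id):
    return [(r, lbl) for r, lbl in annotation if r != region_id]

# Starting from a model's proposal, fix only the errors it made.
proposal = [("m1", "person"), ("m2", "grass"), ("m3", "dog")]
proposal = change_label(proposal, "m2", "sand")
proposal = remove_region(proposal, "m3")
proposal = add_region(proposal, "m4", "frisbee")
```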

    Visual-Linguistic Semantic Alignment: Fusing Human Gaze and Spoken Narratives for Image Region Annotation

    Advanced image-based application systems such as image retrieval and visual question answering depend heavily on semantic image region annotation. However, improvements in image region annotation are limited because of our inability to understand how humans, the end users, process these images and image regions. In this work, we expand a framework for capturing image region annotations where interpreting an image is influenced by the end user's visual perception skills, conceptual knowledge, and task-oriented goals. Human image understanding is reflected by individuals' visual and linguistic behaviors, but the meaningful computational integration and interpretation of their multimodal representations (e.g. gaze, text) remain a challenge. Our work explores the hypothesis that eye movements can help us understand experts' perceptual processes and that spoken language descriptions can reveal conceptual elements of image inspection tasks. We propose that there exists a meaningful relation between gaze, spoken narratives, and image content. Using unsupervised bitext alignment, we create meaningful mappings between participants' eye movements (which reveal key areas of images) and spoken descriptions of those images. The resulting alignments are then used to annotate image regions with concept labels. Our alignment accuracy exceeds baseline alignments that are obtained using both simultaneous and a fixed-delay temporal correspondence. Additionally, comparison of alignment accuracy between a method that identifies clusters in the images based on eye movements and a method that identifies clusters using image features shows that the two approaches perform well on different types of images and concept labels. This suggests that an image annotation framework could integrate information from more than one technique to handle heterogeneous images. The resulting alignments can be used to create a database of low-level image features and high-level semantic annotations corresponding to perceptually important image regions. We demonstrate the applicability of the proposed framework with two datasets: one consisting of general-domain images and another with images from the domain of medicine. This work is an important contribution toward the highly challenging problem of fusing human-elicited multimodal data sources, a problem that will become increasingly important as low-resource scenarios become more common.
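
    A hedged sketch of the bitext-alignment idea: treat each viewing session as a "sentence pair" of gaze tokens (e.g. fixation-cluster ids) and narrative words, then learn translation probabilities with an IBM Model 1 style EM procedure. This is a generic alignment sketch under those assumptions, not the authors' exact pipeline:

```python
# IBM Model 1 style EM over (gaze_tokens, words) pairs: learns P(word | gaze_token),
# which can then label the image region behind each gaze token with a concept word.
from collections import defaultdict

def ibm_model1(pairs, iterations=10):
    """pairs: list of (gaze_tokens, words); returns dict keyed by (word, gaze_token)."""
    vocab = {w for _, ws in pairs for w in ws}
    t = defaultdict(lambda: 1.0 / len(vocab))      # uniform initialisation
    for _ in range(iterations):
        count, total = defaultdict(float), defaultdict(float)
        for gaze, ws in pairs:                     # E-step: expected alignment counts
            for w in ws:
                norm = sum(t[(w, g)] for g in gaze)
                for g in gaze:
                    delta = t[(w, g)] / norm
                    count[(w, g)] += delta
                    total[g] += delta
        for (w, g), c in count.items():            # M-step: renormalise
            t[(w, g)] = c / total[g]
    return t

# Usage: annotate a gaze token with its most probable narrative word.
pairs = [(["region_A", "region_B"], ["lung", "nodule"]),
         (["region_A"], ["lung"])]
probs = ibm_model1(pairs)
print(max(["lung", "nodule"], key=lambda w: probs[(w, "region_B")]))
```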

    Del píxel a las resonancias visuales: la imagen con voz propia

    The objective of our research is to develop a series of computer vision programs that search for analogies in large datasets (in this case, collections of images of abstract paintings) based solely on their visual content, without textual annotation. We have programmed an algorithm based on a specific model of image description used in computer vision: a regular grid is placed over the image and a pixel region is selected around each node. Dense features computed over this regular grid with overlapping patches are used to represent the images. By analysing the distances between the whole set of image descriptors, we are able to group them according to their similarity, and each resulting group determines what we call a 'visual word'. This model is known as the Bag-of-Words representation. Given the frequency with which each visual word occurs in each image, we apply pLSA (Probabilistic Latent Semantic Analysis), a statistical model that classifies images fully automatically, without any textual annotation, according to their formal patterns.
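
    A compact sketch of the Bag-of-Visual-Words construction described above, assuming grayscale images as NumPy arrays, raw grid patches as the dense descriptors, and k-means for the visual vocabulary; the pLSA step would then factor the resulting histogram matrix (a topic model such as scikit-learn's LatentDirichletAllocation could stand in for pLSA):

```python
# Dense patches on a regular grid -> k-means visual vocabulary -> per-image
# visual-word histograms, ready for a topic model such as pLSA.
import numpy as np
from sklearn.cluster import KMeans

def dense_patches(gray, step=8, size=16):
    """Extract flattened pixel patches centred on a regular grid."""
    h, w = gray.shape
    patches = []
    for y in range(0, h - size + 1, step):
        for x in range(0, w - size + 1, step):
            patches.append(gray[y:y + size, x:x + size].ravel())
    return np.array(patches, dtype=np.float32)

def bow_histograms(images, n_words=200):
    per_image = [dense_patches(img) for img in images]
    kmeans = KMeans(n_clusters=n_words, n_init=4).fit(np.vstack(per_image))
    hists = []
    for patches in per_image:
        words = kmeans.predict(patches)
        hists.append(np.bincount(words, minlength=n_words))
    return np.array(hists)   # rows: images, columns: visual-word frequencies
```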