1,637 research outputs found

    Landmark Image Retrieval Using Visual Synonyms

    In this paper, we consider the incoherence problem of the visual words in bag-of-words vocabularies. Different from existing work, which performs assignment of words based solely on closeness in descriptor space, we focus on identifying pairs of independent, distant words - the visual synonyms - that are still likely to host image patches with similar appearance. To study this problem we focus on landmark images, where we can examine whether image geometry is an appropriate vehicle for detecting visual synonyms. We propose an algorithm for the extraction of visual synonyms in landmark images. To show the merit of visual synonyms, we perform two experiments. We examine closeness of synonyms in descriptor space and we show a first application of visual synonyms in a landmark image retrieval setting. Using visual synonyms, we perform on par with the state-of-the-art, but with six times fewer visual words.
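    The core idea above - patches that a geometric check says depict the same surface point, yet were quantized to different visual words, vote for a synonym pair - can be sketched as follows. This is a minimal illustration, not the authors' algorithm; the function name, input shapes, and the assumption that geometric verification (e.g. homography inliers) has already produced patch matches are all hypothetical.

    ```python
    # Hypothetical sketch: collect visual-synonym candidates from patch
    # matches between two geometrically aligned landmark images.

    def synonym_candidates(assignments_a, assignments_b, matches):
        """assignments_*: dict mapping patch index -> visual word id
        for each image; matches: (i, j) patch pairs judged to be the
        same surface point by a prior geometric check."""
        votes = {}
        for i, j in matches:
            wa, wb = assignments_a[i], assignments_b[j]
            if wa != wb:  # same appearance, different word: synonym vote
                key = tuple(sorted((wa, wb)))
                votes[key] = votes.get(key, 0) + 1
        return votes
    ```

    Pairs accumulating many votes across image pairs would then be merged into one effective word, which is how the vocabulary can shrink several-fold without hurting retrieval.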

    Location Estimation of a Photo: A Geo-signature MapReduce Workflow

    Location estimation of a photo, a new branch of image retrieval, is the task of finding the location where the photo was taken. A large number of photos are shared on social multimedia sites, and the location of photos without geo-tags can be estimated with the help of the millions of geo-tagged photos available there. Recent research on photo location estimation exists; however, most of it neglects to define the uniqueness of a place, that which totally distinguishes it from other places. In this paper, we design a workflow named G-sigMR (Geo-signature MapReduce) to improve recognition performance. Our workflow generates a unique description of each location, named a Geo-signature, which is summarized from visual synonyms using the MapReduce structure for indexing into a large-scale dataset. To validate it for image retrieval, G-sigMR was quantitatively evaluated on the standard benchmark for location estimation and compared with other well-known approaches (IM2GPS, SC, CS, MSER, VSA and VCG) in terms of average recognition rate. In the results, G-sigMR outperformed the previous approaches.
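    The map/reduce shape described above can be sketched in a few lines. This is an assumption-laden illustration, not the G-sigMR implementation: the function names and the representation of a Geo-signature as a per-location histogram of visual words are invented for the example.

    ```python
    from collections import defaultdict

    # Hypothetical sketch of a MapReduce-style pass that summarizes
    # geo-tagged observations into per-location signatures.

    def map_phase(photos):
        # photos: iterable of (location_id, visual_word) observations;
        # the mapper simply emits key/value pairs keyed by location.
        for loc, word in photos:
            yield loc, word

    def reduce_phase(mapped):
        # The reducer aggregates each location's visual words into a
        # histogram that serves as that location's signature.
        sig = defaultdict(lambda: defaultdict(int))
        for loc, word in mapped:
            sig[loc][word] += 1
        return {loc: dict(hist) for loc, hist in sig.items()}
    ```

    In a real MapReduce deployment the shuffle between the two phases would group keys across machines; here the grouping happens inside the reducer for brevity.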

    Automatic Image Tagging based on Context Information

    People love to take images, but are not so willing to annotate the images afterwards with relevant tags. Manually tagging images is both subjective (dependent on the annotator) and time consuming. It would be nice if the tagging process could be done automatically. A requirement for effective searching and retrieval of images in rapidly growing online image databases is that each image has accurate and useful annotation. This thesis shows that automatic tagging of images with relevant tags is possible by using a combination of the capture location, the date/time when the image was captured and an image category. The use of image categories (together with location and date/time) ensures that many relevant tags are returned and restricts the occurrence of noisy tags to a very low level despite using a noisy image database (Flickr). Other methods used for further restricting noise are to restrict usage of more than one image from the same user (as basis for tagging the query image) and a dynamic approach for using many images when possible, and fewer images when not many relevant images are found. The designed system is able to tag an image as long as a sufficient number of geo-referenced and already tagged images relevant to the query image are available on Flickr. The query image must also have been geo-referenced and it is assumed that the user provides an image category. Images are processed based on which category the image belongs to, i.e. an image is processed with the best method to handle images belonging to that specific category. In short, this means that images of objects or places are processed differently than images from events. The evaluation of the system indicates that usage of image categories is very helpful when tagging images. The system finds more relevant tags and fewer noisy tags than baseline systems using only location. It also performs well compared to a system using both location and content-based image analysis.
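    The noise-restriction step described above - aggregate candidate tags from nearby, already-tagged photos while using at most one photo per user - can be sketched as below. The function name, input format, and the choice to count each tag once per photo are illustrative assumptions, not the thesis's actual design.

    ```python
    from collections import Counter

    # Hypothetical sketch: suggest tags for a query photo from nearby
    # geo-referenced, category-matched photos, one photo per user.

    def suggest_tags(nearby_photos, top_k=5):
        """nearby_photos: list of (user_id, tags) for already-tagged
        photos near the query location, pre-filtered by category."""
        seen_users = set()
        counts = Counter()
        for user, tags in nearby_photos:
            if user in seen_users:
                continue  # at most one photo per user limits noise
            seen_users.add(user)
            counts.update(set(tags))  # count each tag once per photo
        return [tag for tag, _ in counts.most_common(top_k)]
    ```

    The `top_k` cut-off stands in for the thesis's dynamic approach of using many images when available and fewer when relevant matches are scarce.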

    Content-Based Visual Landmark Search via Multimodal Hypergraph Learning

    Formerly IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics.

    Voice processing in dementia: a neuropsychological and neuroanatomical analysis

    Voice processing in neurodegenerative disease is poorly understood. Here we undertook a systematic investigation of voice processing in a cohort of patients with clinical diagnoses representing two canonical dementia syndromes: temporal variant frontotemporal lobar degeneration (n = 14) and Alzheimer's disease (n = 22). Patient performance was compared with a healthy matched control group (n = 35). All subjects had a comprehensive neuropsychological assessment including measures of voice perception (vocal size, gender, speaker discrimination) and voice recognition (familiarity, identification, naming and cross-modal matching) and equivalent measures of face and name processing. Neuroanatomical associations of voice processing performance were assessed using voxel-based morphometry. Both disease groups showed deficits on all aspects of voice recognition and impairment was more severe in the temporal variant frontotemporal lobar degeneration group than the Alzheimer's disease group. Face and name recognition were also impaired in both disease groups and name recognition was significantly more impaired than other modalities in the temporal variant frontotemporal lobar degeneration group. The Alzheimer's disease group showed additional deficits of vocal gender perception and voice discrimination. The neuroanatomical analysis across both disease groups revealed common grey matter associations of familiarity, identification and cross-modal recognition in all modalities in the right temporal pole and anterior fusiform gyrus; while in the Alzheimer's disease group, voice discrimination was associated with grey matter in the right inferior parietal lobe. The findings suggest that impairments of voice recognition are significant in both these canonical dementia syndromes but particularly severe in temporal variant frontotemporal lobar degeneration, whereas impairments of voice perception may show relative specificity for Alzheimer's disease. The right anterior temporal lobe is likely to have a critical role in the recognition of voices and other modalities of person knowledge.

    Identifying related landmark tags in urban scenes using spatial and semantic clustering

    There is considerable interest in developing landmark saliency models as a basis for describing urban landscapes, and in constructing wayfinding instructions, for text and spoken dialogue based systems. The challenge lies in knowing the truthfulness of such models; is what the model considers salient the same as what is perceived by the user? This paper presents a web based experiment in which users were asked to tag and label the most salient features from urban images for the purposes of navigation and exploration. In order to rank landmark popularity in each scene it was necessary to determine which tags related to the same object (e.g. tags relating to a particular café). Existing clustering techniques did not perform well for this task, and it was therefore necessary to develop a new spatial-semantic clustering method which considered the proximity of nearby tags and the similarity of their label content. The annotation similarity was initially calculated using trigrams in conjunction with a synonym list, generating a set of networks formed from the links between related tags. These networks were used to build related word lists encapsulating conceptual connections (e.g. church tower related to clock) so that during a secondary pass of the data related network segments could be merged. This approach gives interesting insight into the partonomic relationships between the constituent parts of landmarks and the range and frequency of terms used to describe them. The knowledge gained from this will be used to help calibrate a landmark saliency model, and to gain a deeper understanding of the terms typically associated with different types of landmarks.
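    The trigram-based label similarity at the heart of the clustering step above can be sketched as a Jaccard overlap of character trigrams. This is a minimal illustration under stated assumptions: the padding scheme, the Jaccard measure, and the threshold-plus-proximity linking rule in the comment are choices made for the example, not necessarily those of the paper.

    ```python
    # Hypothetical sketch: character-trigram similarity between tag labels.

    def trigrams(label):
        s = f"  {label.lower()} "  # pad so short labels still yield trigrams
        return {s[i:i + 3] for i in range(len(s) - 2)}

    def label_similarity(a, b):
        ta, tb = trigrams(a), trigrams(b)
        return len(ta & tb) / len(ta | tb)  # Jaccard over trigram sets

    # In the clustering pass, two tags would be linked into the same
    # network segment when this similarity exceeds a threshold AND their
    # image positions are spatially close.
    ```

    A synonym list, as the paper describes, would catch related labels that share no trigrams at all (e.g. "café" and "coffee shop"), which pure string similarity cannot.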