15 research outputs found
Multimedia geocoding: the RECOD 2014 approach
This work describes the approach proposed by the RECOD team for the Placing Task of MediaEval 2014. This task requires the definition of automatic schemes to assign geographical locations to images and videos. Our approach uses as much evidence as possible (textual, visual, and/or audio descriptors) to geocode a given image or video. We estimate the location of test items by clustering the geographic coordinates of top-ranked items in one or more ranked lists defined in terms of different criteria. Funding: FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo); CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico); CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior). Grants: 2013/08645-0; 2013/11359-0; 306580/2012-8; 484254/2012-0. MediaEval 2014 Worksho
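The clustering step mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm the RECOD team used, so everything here is an assumption made for illustration: the greedy distance-threshold clustering, the function names `haversine_km` and `estimate_location`, and the parameters `top_k` and `radius_km` are all hypothetical.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def estimate_location(ranked_coords, top_k=5, radius_km=50.0):
    """Estimate a location from the coordinates of the top-ranked items.

    Hypothetical procedure (not taken from the paper): greedily group the
    top_k coordinates around seed points, then return the centroid of the
    largest group.
    """
    top = ranked_coords[:top_k]
    if not top:
        raise ValueError("ranked_coords must not be empty")

    clusters = []  # each cluster is a list of points; its first point is the seed
    for point in top:
        for cluster in clusters:
            if haversine_km(point, cluster[0]) <= radius_km:
                cluster.append(point)
                break
        else:
            clusters.append([point])

    # max() keeps the first of several equally large clusters, i.e. the one
    # whose seed was ranked highest.
    best = max(clusters, key=len)

    # Plain arithmetic mean; adequate for small clusters away from the
    # antimeridian and the poles, which is all this sketch aims for.
    lat = sum(p[0] for p in best) / len(best)
    lon = sum(p[1] for p in best) / len(best)
    return (lat, lon)
```

With a ranked list in which three of the five top items lie near one another, the estimate falls inside that group rather than at the single top-ranked item, which is the intuition behind using several top-ranked items instead of one.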
Towards better exploiting convolutional neural networks for remote sensing scene classification
We present an analysis of three possible strategies for exploiting the power of existing convolutional neural networks (ConvNets or CNNs) in scenarios different from those for which they were trained: full training, fine tuning, and using ConvNets as feature extractors. In many applications, remote sensing in particular, it is not feasible to fully design and train a new ConvNet, as this usually requires a considerable amount of labeled data and incurs high computational costs. Therefore, it is important to understand how to better use existing ConvNets. We perform experiments with six popular ConvNets using three remote sensing datasets. We also compare ConvNets in each strategy with existing descriptors and with state-of-the-art baselines. Results indicate that fine tuning tends to be the best-performing strategy. In fact, using the features from the fine-tuned ConvNet with a linear SVM obtains the best results. We also achieved state-of-the-art results for the three datasets used.
Unsupervised distance learning by reciprocal kNN distance for image retrieval
This paper presents a novel unsupervised learning approach that takes into account the intrinsic dataset structure, which is represented in terms of the reciprocal neighborhood references found in different ranked lists. The proposed Reciprocal kNN Distance defines a more effective distance between two images, and is used to improve the effectiveness of image retrieval systems. Several experiments were conducted for different image retrieval tasks involving shape, color, and texture descriptors. The proposed approach is also evaluated on multimodal retrieval tasks, considering visual and textual descriptors. Experimental results demonstrate the effectiveness of the proposed approach. The Reciprocal kNN Distance yields better results in terms of effectiveness than various state-of-the-art algorithms. Copyright © 2014 ACM. Funding: FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo); CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico); CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior). Grants: 2013/08645-0; 306580/2012-8; 484254/2012-0. 4. International Conference on Multimedia Retrieva
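The idea of reciprocal neighborhood references can be made concrete with a small sketch. The abstract does not give the formula of the Reciprocal kNN Distance, so the code below is not the paper's definition. It is a deliberately simplified stand-in: the new distance between two items is the worse of the two positions each holds in the other's ranked list. Under this definition, two items are reciprocal k nearest neighbors exactly when the value is at most k. The function names are hypothetical.

```python
def rank_positions(dist):
    """Return pos, where pos[i][j] is the position of item j in item i's ranked list.

    dist is a square matrix of pairwise distances. Each item is forced to
    position 0 in its own list, so true neighbors start at position 1.
    Ties are broken by item index.
    """
    n = len(dist)
    pos = [[0] * n for _ in range(n)]
    for i in range(n):
        ranked = sorted(range(n), key=lambda j: (j != i, dist[i][j], j))
        for r, j in enumerate(ranked):
            pos[i][j] = r
    return pos


def reciprocal_rank_distance(dist):
    """Simplified reciprocal-neighborhood distance (an illustrative assumption).

    The value for (i, j) is max(position of j in i's list, position of i in
    j's list). It is symmetric, and it is small only when each item ranks
    the other highly.
    """
    pos = rank_positions(dist)
    n = len(dist)
    return [[max(pos[i][j], pos[j][i]) for j in range(n)] for i in range(n)]
```

For four points on a line at 0, 1, 3, and 10, the point at 3 is the nearest neighbor of the point at 10, but the point at 10 is the farthest neighbor of the point at 3. The original distance treats the pair identically in both directions, while the rank-based value records that the neighborhood relation is not reciprocated.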
A rank aggregation framework for video multimodal geocoding
This paper proposes a rank aggregation framework for video multimodal geocoding. Textual and visual descriptions associated with videos are used to define ranked lists. These ranked lists are later combined, and the resulting ranked list is used to define appropriate locations for videos. An architecture that implements the proposed framework is designed. In this architecture, there are specific modules for each modality (e.g., textual and visual) that can be developed and evolved independently. Another component is a data fusion module responsible for seamlessly combining the ranked lists defined for each modality. We have validated the proposed framework in the context of the MediaEval 2012 Placing Task, whose objective is to automatically assign geographical coordinates to videos. The results show that our multimodal approach improves the geocoding results when compared to methods that rely on a single modality (either textual or visual descriptors). We also show that the proposed multimodal approach yields results comparable to the best submissions to the Placing Task in 2012 using no extra information besides the available development/training data. Another contribution of this work is a new effectiveness evaluation measure. The proposed measure is based on distance scores that summarize how effective a designed/tested approach is, considering its overall result for a test dataset. Funding: CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior); FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo); CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico). Grants: 2011/11171-5; 2009/10554-8; 306580/2012-8; 484254/2012-
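The data fusion step described above, in which per-modality ranked lists are combined into a single list, can be sketched as follows. The abstract does not name the aggregation method used in the framework, so this example substitutes a Borda count, a standard rank aggregation rule, purely for illustration. The function name `borda_aggregate` and the item identifiers are hypothetical.

```python
def borda_aggregate(ranked_lists):
    """Combine several ranked lists into one with a Borda count.

    Each list awards len(list) points to its first item, one point fewer to
    the next, and so on. Items missing from a list receive no points from it.
    Ties in total score are broken by item identifier so the output is
    deterministic.
    """
    scores = {}
    for ranked in ranked_lists:
        n = len(ranked)
        for position, item in enumerate(ranked):
            scores[item] = scores.get(item, 0) + (n - position)
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical ranked lists of training videos for one test video,
# one list per modality.
textual = ["a", "b", "c"]
visual = ["b", "c", "a"]
combined = borda_aggregate([textual, visual])
```

Here `combined` is `["b", "a", "c"]`: item `b` is ranked well by both modalities and overtakes `a`, which only the textual list favors. Because the fusion function sees nothing but ranked lists, each modality module can change independently, which matches the modular design the abstract describes.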