Multi-modal gated recurrent units for image description
Using a natural language sentence to describe the content of an image is a
challenging but very important task. It is challenging because a description
must not only capture objects contained in the image and the relationships
among them, but also be relevant and grammatically correct. In this paper, we
propose a multi-modal embedding model based on gated recurrent units (GRU)
that can generate a variable-length description for a given image. In the training step,
we apply the convolutional neural network (CNN) to extract the image feature.
The feature is then fed into the multi-modal GRU together with the
corresponding sentence representations, and the multi-modal GRU learns the
inter-modal relations between image and sentence. In the testing step, when
an image is fed into our multi-modal GRU model, a sentence describing the
image content is generated. The experimental results demonstrate that our
multi-modal GRU model achieves state-of-the-art performance on the Flickr8K,
Flickr30K and MS COCO datasets.
Comment: 25 pages, 7 figures, 6 tables
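The gating mechanism at the heart of this abstract can be illustrated with a minimal, scalar-valued sketch of a single GRU step. This is not the paper's implementation: the function name and the dict-of-weights layout are hypothetical, the state is a single number rather than a vector, and in the described model the input at each step would be built from the CNN image feature and word embeddings rather than a raw scalar.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h, W, U, b):
    """One scalar GRU step (toy illustration of the gating equations).

    x: current input; h: previous hidden state.
    W, U, b: dicts of scalar weights/biases keyed by 'z', 'r', 'h'.
    """
    # Update gate: how much of the new candidate state to adopt.
    z = sigmoid(W["z"] * x + U["z"] * h + b["z"])
    # Reset gate: how much of the previous state feeds the candidate.
    r = sigmoid(W["r"] * x + U["r"] * h + b["r"])
    # Candidate hidden state, computed from the input and the reset state.
    h_tilde = math.tanh(W["h"] * x + U["h"] * (r * h) + b["h"])
    # Interpolate between the old state and the candidate.
    return (1.0 - z) * h + z * h_tilde
```

With all weights zero, both gates evaluate to 0.5 and the candidate to 0, so the state simply halves at each step; with nonzero weights the gates let the unit decide, per step, how much history to keep versus overwrite — the property that makes GRUs suitable for generating variable-length sentences.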
Remote Sensing Scene Classification with Masked Image Modeling (MIM)
Remote sensing scene classification has been extensively studied for its
critical roles in geological survey, oil exploration, traffic management,
earthquake prediction, wildfire monitoring, and intelligence monitoring. In the
past, the Machine Learning (ML) methods for performing the task mainly used the
backbones pretrained in the manner of supervised learning (SL). As Masked Image
Modeling (MIM), a self-supervised learning (SSL) technique, has been shown as a
better way for learning visual feature representation, it presents a new
opportunity for improving ML performance on the scene classification task. This
research aims to explore the potential of MIM pretrained backbones on four
well-known classification datasets: Merced, AID, NWPU-RESISC45, and Optimal-31.
Compared to the published benchmarks, we show that the MIM-pretrained Vision
Transformer (ViT) backbones outperform the alternatives (by up to 18% in top-1
accuracy) and that the MIM technique learns better feature representations
than its supervised-learning counterparts (by up to 5% in top-1 accuracy).
Moreover, we show that general-purpose MIM-pretrained ViTs achieve performance
competitive with the specially designed yet complicated Transformer for
Remote Sensing (TRS) framework. Our experimental results also provide a
performance baseline for future studies.
Comment: arXiv admin note: text overlap with arXiv:2301.1205
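The masking step that gives Masked Image Modeling its name can be sketched in a few lines: the image is split into ViT-style patches, a fixed fraction of the patch indices is hidden at random, and the pretraining objective reconstructs the hidden patches. The function below is a hypothetical illustration of that sampling step only, not the paper's code; the patch count and mask ratio in the usage note are common ViT/MAE-style defaults, not values taken from the abstract.

```python
import random

def random_patch_mask(num_patches, mask_ratio, seed=None):
    """Return a boolean list over patch indices.

    True marks a patch that is hidden from the encoder and must be
    reconstructed by the MIM objective; False marks a visible patch.
    """
    rng = random.Random(seed)  # seedable for reproducible masking
    n_masked = int(num_patches * mask_ratio)
    indices = list(range(num_patches))
    rng.shuffle(indices)                 # uniform random choice of patches
    masked = set(indices[:n_masked])
    return [i in masked for i in range(num_patches)]
```

For example, a 224 x 224 image with 16 x 16 patches yields 196 patches, and a 0.75 mask ratio hides 147 of them, leaving only a quarter of the image visible to the encoder — the self-supervised signal that replaces labels during pretraining.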
Automatic Caption Generation for Aerial Images: A Survey
Aerial images have long attracted attention from the research community. Generating a caption that describes the content of an aerial image in a comprehensive way is a less-studied but important task, with applications in agriculture, defence, disaster management and many other areas. Although various approaches have been followed for natural-image caption generation, generating a caption for an aerial image remains challenging due to its special nature. The use of emerging techniques from the Artificial Intelligence (AI) and Natural Language Processing (NLP) domains has resulted in captions of acceptable quality for aerial images. However, much remains to be done to fully utilize the potential of the aerial image caption generation task. This paper presents a detailed survey of the various approaches followed by researchers for aerial image caption generation. The datasets available for experimentation, the criteria used for performance evaluation, and future directions are also discussed.
Deep Learning for Aerial Scene Understanding in High Resolution Remote Sensing Imagery from the Lab to the Wild
This thesis presents the application of deep learning to aerial scene understanding, e.g. aerial scene recognition, multi-label object classification, and semantic segmentation. Beyond training deep networks under laboratory conditions, the thesis also offers learning strategies for practical scenarios, e.g. where data are collected without constraints or annotations are scarce.