
    Overview of the ImageCLEFmed 2019 concept detection task

    This paper describes the ImageCLEF 2019 Concept Detection Task, the third edition of the medical caption task since it was first proposed in ImageCLEF 2017. Concept detection from medical images remains a challenging task. In 2019, the format changed to a single subtask, and it is part of the medical tasks alongside the tuberculosis and visual question answering tasks. To reduce noisy labels and limit variety, the data set focuses solely on radiology images extracted from the biomedical open-access literature (PubMed Central), rather than on general biomedical figures. The development data consist of 56,629 training and 14,157 validation images, with corresponding Unified Medical Language System (UMLS®) concepts extracted from the image captions. Participation was higher in 2019, both in the number of participating teams and in the number of submitted runs. The teams mostly used deep learning techniques: long short-term memory (LSTM) recurrent neural networks (RNNs), adversarial auto-encoders, convolutional neural network (CNN) image encoders, and transfer-learning-based multi-label classification models were the most frequently used approaches. Evaluation uses F1-scores computed per image and averaged across all 10,000 test images.
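    As a worked illustration of this evaluation protocol, the sketch below computes a set-based F1-score per image and averages it over the test collection. It is not the official ImageCLEF evaluation script; the function names, image IDs, and UMLS concept identifiers are illustrative assumptions.

        # Per-image F1 between predicted and reference UMLS concept sets,
        # averaged over all test images (sketch, not the official scorer).
        def image_f1(predicted: set, reference: set) -> float:
            if not predicted and not reference:
                return 1.0  # assumption: empty prediction for an empty reference counts as perfect
            if not predicted or not reference:
                return 0.0
            tp = len(predicted & reference)           # correctly predicted concepts
            precision = tp / len(predicted)
            recall = tp / len(reference)
            return 0.0 if tp == 0 else 2 * precision * recall / (precision + recall)

        def mean_f1(predictions: dict, references: dict) -> float:
            # Images with no submitted prediction are scored against an empty set.
            return sum(image_f1(predictions.get(i, set()), refs)
                       for i, refs in references.items()) / len(references)

        # Hypothetical image IDs and UMLS concept unique identifiers (CUIs):
        refs  = {"img001": {"C0024109", "C0817096"}, "img002": {"C0037303"}}
        preds = {"img001": {"C0024109"}, "img002": {"C0037303", "C0040405"}}
        print(round(mean_f1(preds, refs), 4))  # 0.6667 for this toy example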

    Exploring Transformer and Multilabel Classification for Remote Sensing Image Captioning

    High-resolution remote sensing images are now available thanks to the progress of remote sensing technology. Compared with popular remote sensing tasks such as scene classification, image captioning provides comprehensible information about such images by summarizing the image content in human-readable text. Most existing remote sensing image captioning methods are based on deep-learning encoder–decoder frameworks that use a convolutional neural network or a recurrent neural network as the backbone. Such frameworks have a limited capability to analyze sequential data and to cope with the scarcity of captioned remote sensing training images. The recently introduced Transformer architecture exploits self-attention to obtain superior performance on sequence-analysis tasks. Inspired by this, in this work we employ a Transformer as the encoder–decoder for remote sensing image captioning. Moreover, to deal with the limited training data, an auxiliary decoder is used that further helps the encoder during training. The auxiliary decoder is trained for multilabel scene classification because of its conceptual similarity to image captioning and its ability to highlight semantic classes. To the best of our knowledge, this is the first work exploiting multilabel classification to improve remote sensing image captioning. Experimental results on the University of California (UC)-Merced caption dataset show the efficacy of the proposed method. The implementation details can be found at https://gitlab.lrz.de/ai4eo/captioningMultilabel.
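    To make the architecture concrete, the following PyTorch sketch pairs a Transformer encoder-decoder captioner with an auxiliary multilabel classification head on the encoder output, in the spirit of the auxiliary decoder described above. It is a minimal sketch under assumed hyperparameters (d_model, number of classes, feature shapes), not the authors' released implementation at the GitLab link; positional encodings are omitted for brevity.

        import torch
        import torch.nn as nn

        class CaptionerWithAuxHead(nn.Module):
            def __init__(self, vocab_size, num_classes, d_model=256, nhead=8, num_layers=2):
                super().__init__()
                self.token_embed = nn.Embedding(vocab_size, d_model)
                enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
                dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
                self.encoder = nn.TransformerEncoder(enc_layer, num_layers)
                self.decoder = nn.TransformerDecoder(dec_layer, num_layers)
                self.caption_head = nn.Linear(d_model, vocab_size)      # next-token logits
                self.multilabel_head = nn.Linear(d_model, num_classes)  # auxiliary scene-class logits

            def forward(self, image_features, caption_tokens):
                # image_features: (batch, num_regions, d_model) visual features, e.g. CNN/patch embeddings
                memory = self.encoder(image_features)
                # Auxiliary multilabel prediction from the pooled encoder memory
                class_logits = self.multilabel_head(memory.mean(dim=1))
                # Caption decoding with a causal mask over the target tokens
                tgt = self.token_embed(caption_tokens)
                causal = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
                decoded = self.decoder(tgt, memory, tgt_mask=causal)
                return self.caption_head(decoded), class_logits

        # Joint training would combine a captioning cross-entropy loss with a weighted
        # multilabel binary cross-entropy term on class_logits, e.g.:
        model = CaptionerWithAuxHead(vocab_size=5000, num_classes=17)
        feats = torch.randn(2, 49, 256)            # hypothetical image features for 2 images
        tokens = torch.randint(0, 5000, (2, 12))   # hypothetical caption token IDs
        cap_logits, cls_logits = model(feats, tokens)

    In this setup the auxiliary branch only adds a classification loss on the encoder during training; at inference time it can be dropped and captions generated from the Transformer decoder alone.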