Search CORE

407 research outputs found

RCDT: Relational Remote Sensing Change Detection with Transformer

Author: Huang Xiao
Lu Kaixuan
Publication venue
Publication date: 09/12/2022
Field of study

Deep learning based change detection methods have received wide attentoion, thanks to their strong capability in obtaining rich features from images. However, existing AI-based CD methods largely rely on three functionality-enhancing modules, i.e., semantic enhancement, attention mechanisms, and correspondence enhancement. The stacking of these modules leads to great model complexity. To unify these three modules into a simple pipeline, we introduce Relational Change Detection Transformer (RCDT), a novel and simple framework for remote sensing change detection tasks. The proposed RCDT consists of three major components, a weight-sharing Siamese Backbone to obtain bi-temporal features, a Relational Cross Attention Module (RCAM) that implements offset cross attention to obtain bi-temporal relation-aware features, and a Features Constrain Module (FCM) to achieve the final refined predictions with high-resolution constraints. Extensive experiments on four different publically available datasets suggest that our proposed RCDT exhibits superior change detection performance compared with other competing methods. The therotical, methodogical, and experimental knowledge of this study is expected to benefit future change detection efforts that involve the cross attention mechanism.Comment: 18 pages, 11 figures

arXiv.org e-Print Archive

Learning Social Image Embedding with Deep Multimodal Attention Networks

Author: He Yueying
Huang Feiran
Li Zhoujun
Mei Tao
Zhang Xiaoming
Zhao Zhonghua
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2017
Field of study

Learning social media data embedding by deep models has attracted extensive research interest as well as boomed a lot of applications, such as link prediction, classification, and cross-modal search. However, for social images which contain both link information and multimodal contents (e.g., text description, and visual content), simply employing the embedding learnt from network structure or data content results in sub-optimal social image representation. In this paper, we propose a novel social image embedding approach called Deep Multimodal Attention Networks (DMAN), which employs a deep model to jointly embed multimodal contents and link information. Specifically, to effectively capture the correlations between multimodal contents, we propose a multimodal attention network to encode the fine-granularity relation between image regions and textual words. To leverage the network structure for embedding learning, a novel Siamese-Triplet neural network is proposed to model the links among images. With the joint deep model, the learnt embedding can capture both the multimodal contents and the nonlinear network information. Extensive experiments are conducted to investigate the effectiveness of our approach in the applications of multi-label classification and cross-modal search. Compared to state-of-the-art image embeddings, our proposed DMAN achieves significant improvement in the tasks of multi-label classification and cross-modal search

arXiv.org e-Print Archive

Crossref

A Taxonomy of Deep Convolutional Neural Nets for Computer Vision

Author: Babu R. Venkatesh
Kruthiventi Srinivas S S
Mopuri Konda Reddy
Prabhu Nikita
Sarvadevabhatla Ravi Kiran
Srinivas Suraj
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2016
Field of study

Traditional architectures for solving computer vision problems and the degree of success they enjoyed have been heavily reliant on hand-crafted features. However, of late, deep learning techniques have offered a compelling alternative -- that of automatically learning problem-specific features. With this new paradigm, every problem in computer vision is now being re-examined from a deep learning perspective. Therefore, it has become important to understand what kind of deep networks are suitable for a given problem. Although general surveys of this fast-moving paradigm (i.e. deep-networks) exist, a survey specific to computer vision is missing. We specifically consider one form of deep networks widely used in computer vision - convolutional neural networks (CNNs). We start with "AlexNet" as our base CNN and then examine the broad variations proposed over time to suit different applications. We hope that our recipe-style survey will serve as a guide, particularly for novice practitioners intending to use deep-learning techniques for computer vision.Comment: Published in Frontiers in Robotics and AI (http://goo.gl/6691Bm

arXiv.org e-Print Archive

Frontiers - Publisher Connector

High-resolution triplet network with dynamic multiscale feature for change detection on satellite images

Author: Bai Yunpeng
Hou Xuan
Li Ying
Shang Changjing
Shen Qiang
Publication venue
Publication date: 01/07/2021
Field of study

Aberystwyth Research Portal

Going Deeper into Action Recognition: A Survey

Author: Harandi Mehrtash
Herath Samitha
Porikli Fatih
Publication venue
Publication date: 01/01/2017
Field of study

Understanding human actions in visual data is tied to advances in complementary research areas including object recognition, human dynamics, domain adaptation and semantic segmentation. Over the last decade, human action analysis evolved from earlier schemes that are often limited to controlled environments to nowadays advanced solutions that can learn from millions of videos and apply to almost all daily activities. Given the broad range of applications from video surveillance to human-computer interaction, scientific milestones in action recognition are achieved more rapidly, eventually leading to the demise of what used to be good in a short time. This motivated us to provide a comprehensive review of the notable steps taken towards recognizing human actions. To this end, we start our discussion with the pioneering methods that use handcrafted representations, and then, navigate into the realm of deep learning based approaches. We aim to remain objective throughout this survey, touching upon encouraging improvements as well as inevitable fallbacks, in the hope of raising fresh questions and motivating new research directions for the reader

arXiv.org e-Print Archive

The Australian National University