MirrorNet: Bio-Inspired Camouflaged Object Segmentation
Camouflaged objects are generally difficult to detect in their natural
environment, even for human beings. In this paper, we propose a novel
bio-inspired network, named MirrorNet, that leverages both instance
segmentation and a mirror stream for camouflaged object segmentation.
Unlike existing segmentation networks, our proposed network
possesses two segmentation streams: the main stream and the mirror stream,
corresponding to the original image and its flipped image, respectively. The
output from the mirror stream is then fused into the main stream's result to
produce the final camouflage map and boost segmentation accuracy. Extensive
experiments conducted on the public CAMO dataset demonstrate the effectiveness
of our proposed network. Our proposed method achieves 89% accuracy,
outperforming the state of the art.
Project Page: https://sites.google.com/view/ltnghia/research/camo
Comment: Under Review
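The two-stream idea above can be sketched in a few lines: run the same segmentation model on the image and on its horizontal flip, flip the mirror prediction back, and fuse the two maps. Here a toy thresholding function and simple averaging stand in for MirrorNet's actual network and fusion step; they are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def segment(image):
    # Toy stand-in for a real segmentation network: a fixed
    # intensity threshold yields a binary foreground map.
    return (image > 0.5).astype(float)

def mirror_fused_segmentation(image):
    """Two-stream sketch: the main stream sees the original image,
    the mirror stream sees its horizontal flip; the mirror
    prediction is flipped back into alignment and fused by averaging."""
    main_map = segment(image)
    mirror_map = segment(image[:, ::-1])[:, ::-1]  # predict on flip, re-align
    return 0.5 * (main_map + mirror_map)           # simple fusion

rng = np.random.default_rng(0)
img = rng.random((4, 6))
camouflage_map = mirror_fused_segmentation(img)
```

In the real network the two streams share no such symmetry guarantee, so the fused map can differ from either single-stream output; with this pointwise toy model the fusion is exact but the data flow is the same.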
Learning Transferable Representations for Visual Recognition
In the last half-decade, a new renaissance of machine learning has originated from the application of convolutional neural networks to visual recognition tasks. It is believed that a combination of big curated data and novel deep learning techniques can lead to unprecedented results. However, the increasingly large training data is still a drop in the ocean compared with scenarios in the wild. In this thesis, we focus on learning transferable representations in neural networks to ensure the models stay robust even when given different data distributions. We present three exemplar topics in three chapters, respectively: zero-shot learning, domain adaptation, and generalizable adversarial attack. By zero-shot learning, we enable models to predict labels not seen in the training phase. By domain adaptation, we improve a model's performance on the target domain by mitigating its discrepancy from a labeled source model, without any target annotation. Finally, the generalizable adversarial attack focuses on learning an adversarial camouflage that ideally would work in every possible scenario. Despite sharing the same transfer-learning philosophy, each of the proposed topics poses a unique challenge requiring a unique solution. In each chapter, we introduce the problem as well as present our solution to it. We also discuss other researchers' approaches and compare our solution to theirs in the experiments.
DPDnet: A Robust People Detector using Deep Learning with an Overhead Depth Camera
In this paper we propose a method based on deep learning that detects
multiple people from a single overhead depth image with high reliability. Our
neural network, called DPDnet, is based on two fully-convolutional
encoder-decoder neural blocks based on residual layers. The Main Block takes a
depth image as input and generates a pixel-wise confidence map, where each
detected person in the image is represented by a Gaussian-like distribution.
The Refinement Block combines the depth image and the output from the Main
Block to refine the confidence map. Both blocks are trained simultaneously
end-to-end using depth images and head-position labels. The experimental work
shows that DPDnet outperforms state-of-the-art methods, with accuracies greater
than 99% on three different publicly available datasets, without retraining or
fine-tuning. In addition, the computational complexity of our proposal is
independent of the number of people in the scene, and the method runs in real
time on conventional GPUs.
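The Gaussian-style confidence map described above can be illustrated as follows. The map construction mirrors the abstract's description; the peak decoder is a generic local-maximum step, a hypothetical post-processing rather than anything DPDnet specifies.

```python
import numpy as np

def confidence_map(shape, heads, sigma=2.0):
    """Pixel-wise confidence map: each detected person contributes
    a Gaussian-like bump centred on the head position."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    conf = np.zeros(shape)
    for cy, cx in heads:
        bump = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))
        conf = np.maximum(conf, bump)
    return conf

def decode_heads(conf, thresh=0.5):
    """Recover head positions as thresholded strict local maxima
    in a 3x3 neighbourhood (illustrative decoding step)."""
    padded = np.pad(conf, 1, constant_values=-np.inf)
    h, w = conf.shape
    is_peak = conf >= thresh
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            is_peak &= conf > padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    return [tuple(p) for p in np.argwhere(is_peak)]

heads = [(5, 5), (10, 20)]
conf = confidence_map((16, 32), heads)
```

Because each person is a single smooth bump, the number of decoded peaks equals the number of people regardless of scene density, which is consistent with the abstract's claim that the cost does not grow with crowd size.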
GourmetNet: Food Segmentation Using Multi-Scale Waterfall Features With Spatial and Channel Attention
Deep learning and computer vision are extensively used to solve problems in a wide range of domains, from automotive and manufacturing to healthcare and surveillance. Research in deep learning for food images has mainly been limited to food identification and detection. Food segmentation is an important problem, as it is the first step toward nutrition monitoring and food volume and calorie estimation. This research is intended to expand the horizons of deep learning and semantic segmentation by proposing a novel single-pass, end-to-end trainable network for food segmentation. Our novel architecture incorporates both channel-attention and spatial-attention information in an expanded multi-scale feature representation using the WASPv2 module. The refined features are processed with an advanced multi-scale waterfall module that combines the benefits of cascade filtering and pyramid representations without requiring a separate decoder or postprocessing.
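As a rough illustration of the two attention types mentioned above (a squeeze-and-excitation-style sketch, not the WASPv2 module itself), channel attention reweights whole feature channels while spatial attention reweights individual locations:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    """Weight each channel of a (C, H, W) feature map by a squashed
    global average of its activations."""
    weights = sigmoid(feat.mean(axis=(1, 2)))  # shape (C,)
    return feat * weights[:, None, None]

def spatial_attention(feat):
    """Weight each spatial location by a squashed channel-wise mean,
    highlighting informative regions."""
    mask = sigmoid(feat.mean(axis=0))          # shape (H, W)
    return feat * mask[None, :, :]

feat = np.ones((3, 4, 5))
refined = spatial_attention(channel_attention(feat))
```

In a trained network the squashed statistics would be produced by learned layers rather than raw means, but the multiplicative gating along the channel and spatial axes is the same mechanism.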
Towards Large-Scale Small Object Detection: Survey and Benchmarks
With the rise of deep convolutional neural networks, object detection has
achieved prominent advances in past years. However, such prosperity could not
camouflage the unsatisfactory situation of Small Object Detection (SOD), one of
the notoriously challenging tasks in computer vision, owing to the poor visual
appearance and noisy representation caused by the intrinsic structure of small
targets. In addition, the lack of a large-scale dataset for benchmarking small
object detection methods remains a bottleneck. In this paper, we first conduct a
thorough review of small object detection. Then, to catalyze the development of
SOD, we construct two large-scale Small Object Detection dAtasets (SODA),
SODA-D and SODA-A, which focus on the Driving and Aerial scenarios
respectively. SODA-D includes 24,828 high-quality traffic images and 278,433
instances of nine categories. For SODA-A, we harvest 2,513 high-resolution
aerial images and annotate 872,069 instances over nine classes. The proposed
datasets are, to our knowledge, the first attempt at large-scale benchmarks
with a vast collection of exhaustively annotated instances tailored for
multi-category SOD. Finally, we evaluate the performance of mainstream methods
on SODA. We expect the released benchmarks to facilitate the development of
SOD and spawn more breakthroughs in this field. Datasets and code are
available at: \url{https://shaunyuan22.github.io/SODA}
MSGNet: multi-source guidance network for fish segmentation in underwater videos
Fish segmentation in underwater videos provides basic data for fish measurements, which is vital information that supports fish habitat monitoring and fishery resource surveys. However, because of water turbidity and insufficient lighting, fish segmentation in underwater videos has low accuracy and poor robustness. Most previous work has utilized static fish appearance information while ignoring fish motion in underwater videos. Considering that motion contains more detail, this paper proposes a method that simultaneously combines appearance and motion information to guide fish segmentation in underwater videos. First, underwater videos are preprocessed to highlight fish in motion and to obtain high-quality underwater optical flow. Then, a multi-source guidance network (MSGNet) is presented to segment fish in complex underwater videos with degraded visual features. To enhance both fish appearance and motion information, a non-local-based multiple co-attention guidance module (M-CAGM) is applied in the encoder stage, in which the appearance and motion features from the intra-frame salient fish and the moving fish in video sequences are reciprocally enhanced. In addition, a feature adaptive fusion module (FAFM) is introduced in the decoder stage to avoid errors accumulating over video sequences due to blurred fish or inaccurate optical flow. Experiments based on three publicly available datasets were designed to test the performance of the proposed model. The mean pixel accuracy (mPA) and mean intersection over union (mIoU) of MSGNet were 91.89% and 88.91%, respectively, on the mixed dataset. Compared with those of advanced underwater fish segmentation and video object segmentation models, the mPA and mIoU of the proposed model improved significantly. The results showed that MSGNet achieves excellent segmentation performance in complex underwater videos and can provide an effective segmentation solution for fishery resource assessment and ocean observation.
The proposed model and code are available via GitHub.
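The adaptive-fusion idea can be caricatured in a few lines: weight motion features by a per-pixel confidence derived from optical-flow magnitude, so that appearance dominates wherever the flow is weak or unreliable. The weighting scheme below is an illustrative assumption, not the paper's FAFM.

```python
import numpy as np

def adaptive_fuse(appearance, motion, flow_magnitude):
    """Blend appearance and motion features per pixel: strong flow
    shifts weight toward the motion cue, weak flow toward appearance."""
    alpha = flow_magnitude / (flow_magnitude.max() + 1e-8)  # motion confidence in [0, 1]
    return alpha * motion + (1.0 - alpha) * appearance

appearance = np.full((4, 4), 0.2)
motion = np.full((4, 4), 0.9)
flow_mag = np.zeros((4, 4))
flow_mag[1, 1] = 5.0  # motion is only trusted at one pixel
fused = adaptive_fuse(appearance, motion, flow_mag)
```

This captures the failure mode the abstract targets: where optical flow is blurred or inaccurate (low magnitude here), the fused feature falls back to the appearance stream instead of propagating flow errors through the video sequence.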