776 research outputs found
Survey on video anomaly detection in dynamic scenes with moving cameras
The increasing popularity of compact and inexpensive cameras, e.g., dash
cameras, body cameras, and cameras equipped on robots, has sparked a growing
interest in detecting anomalies within dynamic scenes recorded by moving
cameras. However, existing reviews primarily concentrate on Video Anomaly
Detection (VAD) methods assuming static cameras. The VAD literature with moving
cameras remains fragmented, lacking comprehensive reviews to date. To address
this gap, we endeavor to present the first comprehensive survey on Moving
Camera Video Anomaly Detection (MC-VAD). We delve into the research papers
related to MC-VAD, critically assessing their limitations and highlighting
associated challenges. Our exploration encompasses three application domains:
security, urban transportation, and marine environments, which in turn cover
six specific tasks. We compile an extensive list of 25 publicly-available
datasets spanning four distinct environments: underwater, water surface,
ground, and aerial. We summarize the types of anomalies these datasets
correspond to or contain, and present five main categories of approaches for
detecting such anomalies. Lastly, we identify future research directions and
discuss novel contributions that could advance the field of MC-VAD. With this
survey, we aim to offer a valuable reference for researchers and practitioners
striving to develop and advance state-of-the-art MC-VAD methods. Comment: Under review
AquaSAM: Underwater Image Foreground Segmentation
The Segment Anything Model (SAM) has revolutionized natural image
segmentation; nevertheless, its performance on underwater images is still
restricted. This work presents AquaSAM, the first attempt to extend the success
of SAM on underwater images with the purpose of creating a versatile method for
the segmentation of various underwater targets. To achieve this, we begin by
classifying and extracting various labels automatically in the SUIM dataset.
Subsequently, we develop a straightforward fine-tuning method to adapt SAM to
general foreground underwater image segmentation. Through extensive experiments
involving eight segmentation tasks, such as human divers, we demonstrate that
AquaSAM outperforms the default SAM model, especially on hard tasks such as coral
reefs. AquaSAM achieves an average improvement of 7.13% in Dice Similarity
Coefficient (DSC) and 8.27% in mIoU on underwater segmentation tasks.
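For readers unfamiliar with the two metrics reported above, the following minimal sketch shows how DSC and IoU are commonly computed for a pair of binary masks. It is not taken from the AquaSAM paper: the function name `dice_and_iou`, the toy masks, and the convention for two empty masks are illustrative assumptions. mIoU would be the mean of the IoU values over classes or tasks.

```python
def dice_and_iou(pred, target):
    """Return (DSC, IoU) for two equally sized binary masks (nested lists of 0/1)."""
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")

    # Pixels marked as foreground in both masks.
    intersection = sum(1 for p, t in zip(flat_pred, flat_target) if p and t)
    pred_area = sum(1 for p in flat_pred if p)
    target_area = sum(1 for t in flat_target if t)
    union = pred_area + target_area - intersection

    if union == 0:
        # Both masks are empty; treating this as a perfect match is an assumption.
        return 1.0, 1.0

    dice = 2 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


# Hypothetical 2x3 masks used only for illustration.
pred_mask = [[1, 1, 0],
             [0, 1, 0]]
true_mask = [[1, 0, 0],
             [0, 1, 1]]
dice, iou = dice_and_iou(pred_mask, true_mask)
print(f"DSC={dice:.3f}, IoU={iou:.3f}")
```

Because DSC = 2·IoU / (1 + IoU) for a single pair of masks, the two metrics always rank a single prediction the same way, but their averages over many tasks can differ, which is why both are often reported.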
Sea-Surface Object Detection Based on Electro-Optical Sensors: A Review
Sea-surface object detection is critical for the navigation safety of autonomous ships. Electro-optical (EO) sensors, such as video cameras, complement on-board radar in detecting small obstacle
sea-surface objects. Traditionally, researchers have used horizon detection, background subtraction, and
foreground segmentation techniques to detect sea-surface objects. Recently, deep learning-based object
detection technologies have been gradually applied to sea-surface object detection. This article presents a comprehensive overview of sea-surface object-detection approaches in which the advantages
and drawbacks of each technique are compared, covering four essential aspects: EO sensors and image
types, traditional object-detection methods, deep learning methods, and maritime dataset collection. In
particular, sea-surface object-detection methods based on deep learning are thoroughly analyzed and
compared with highly influential public datasets introduced as benchmarks to verify the effectiveness of
these approaches. The arti
WaterScenes: A Multi-Task 4D Radar-Camera Fusion Dataset and Benchmark for Autonomous Driving on Water Surfaces
Autonomous driving on water surfaces plays an essential role in executing
hazardous and time-consuming missions, such as maritime surveillance, survivor
rescue, environmental monitoring, hydrographic mapping and waste cleaning. This
work presents WaterScenes, the first multi-task 4D radar-camera fusion dataset
for autonomous driving on water surfaces. Equipped with a 4D radar and a
monocular camera, our Unmanned Surface Vehicle (USV) proffers all-weather
solutions for discerning object-related information, including color, shape,
texture, range, velocity, azimuth, and elevation. Focusing on typical static
and dynamic objects on water surfaces, we label the camera images and radar
point clouds at pixel-level and point-level, respectively. In addition to basic
perception tasks, such as object detection, instance segmentation and semantic
segmentation, we also provide annotations for free-space segmentation and
waterline segmentation. Leveraging the multi-task and multi-modal data, we
conduct numerous experiments on the single modality of radar and camera, as
well as the fused modalities. Results demonstrate that 4D radar-camera fusion
can considerably enhance the robustness of perception on water surfaces,
especially in adverse lighting and weather conditions. The WaterScenes dataset is
publicly available at https://waterscenes.github.io
Knowledge-Driven Semantic Segmentation for Waterway Scene Perception
Semantic segmentation, one of the most popular scene perception techniques, has been widely studied for autonomous vehicles. However, deep learning-based solutions depend on the volume and quality of the data, and knowledge about a specific scene may not be incorporated. A novel knowledge-driven semantic segmentation method is proposed for waterway scene perception. Based on the knowledge that water is irregular and dynamically changing, a Life Time of Feature (LToF) detector is designed to distinguish the water region from the surrounding scene. Using a Bayesian framework, the detector, acting as the likelihood function, is combined with U-Net based semantic segmentation to achieve an optimized solution. Finally, two public datasets and typical semantic segmentation networks, namely FlowNet, DeepLab, and DVSNet, are selected to evaluate the proposed method. The sensitivity of these methods and of the proposed method to the dataset is also discussed.
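The Bayesian combination described above can be illustrated with a per-pixel application of Bayes' rule. This is only a sketch under stated assumptions, not the authors' implementation: it assumes the segmentation network supplies a prior probability that a pixel is water and that the detector supplies likelihoods of the observed feature under "water" and "not water". The function name `fuse_water_posterior` and all numeric values are hypothetical.

```python
def fuse_water_posterior(prior_water, lik_water, lik_not_water):
    """Per-pixel Bayes' rule: P(water | obs) from a prior and two likelihoods.

    prior_water   -- P(water), e.g. a segmentation network's softmax output (assumed)
    lik_water     -- P(obs | water), e.g. from a feature-lifetime detector (assumed)
    lik_not_water -- P(obs | not water)
    """
    numerator = lik_water * prior_water
    evidence = numerator + lik_not_water * (1.0 - prior_water)
    if evidence == 0.0:
        # The observation is impossible under both hypotheses; fall back to the prior.
        return prior_water
    return numerator / evidence


def fuse_map(prior_map, lik_water_map, lik_not_water_map):
    """Apply the per-pixel fusion to 2D maps given as nested lists."""
    return [
        [fuse_water_posterior(p, lw, ln) for p, lw, ln in zip(p_row, lw_row, ln_row)]
        for p_row, lw_row, ln_row in zip(prior_map, lik_water_map, lik_not_water_map)
    ]


# Hypothetical 1x2 example: the detector strengthens an uncertain prior for the
# first pixel and weakens it for the second.
posterior = fuse_map([[0.6, 0.6]], [[0.9, 0.1]], [[0.2, 0.8]])
print(posterior)
```

In this toy example a prior of 0.6 moves up when the detector's evidence favours water and down when it does not, which is the intended effect of treating the detector as a likelihood term.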
A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community
In recent years, deep learning (DL), a re-branding of neural networks (NNs),
has risen to the top in numerous areas, namely computer vision (CV), speech
recognition, natural language processing, etc. Whereas remote sensing (RS)
possesses a number of unique challenges, primarily related to sensors and
applications, inevitably RS draws from many of the same theories as CV; e.g.,
statistics, fusion, and machine learning, to name a few. This means that the RS
community should be aware of, if not at the leading edge of, advancements
like DL. Herein, we provide the most comprehensive survey of state-of-the-art
RS DL research. We also review recent new developments in the DL field that can
be used in DL for RS. Namely, we focus on theories, tools and challenges for
the RS community. Specifically, we focus on unsolved challenges and
opportunities as they relate to (i) inadequate data sets, (ii)
human-understandable solutions for modelling physical phenomena, (iii) Big
Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and
learning algorithms for spectral, spatial and temporal data, (vi) transfer
learning, (vii) an improved theoretical understanding of DL systems, (viii)
high barriers to entry, and (ix) training and optimizing the DL. Comment: 64 pages, 411 references. To appear in Journal of Applied Remote
Sensing
Semantic Segmentation of Fish and Underwater Environments Using Deep Convolutional Neural Networks and Learned Active Contours
The conservation of marine resources requires constant monitoring of the underwater environment by researchers. For this purpose, automated visual monitoring systems are of great interest, especially those that can describe the environment using semantic segmentation based on deep learning. Although such systems have been used successfully in several applications, such as biomedical ones, obtaining optimal results in underwater environments is still a challenge due to the heterogeneity of water and lighting conditions and the scarcity of labeled datasets. Moreover, existing deep learning techniques for semantic segmentation provide only low-resolution results, lacking sufficient spatial detail for high-performance monitoring. To address these challenges, a combined loss function based on active contour theory and level set methods is proposed to refine the spatial segmentation resolution and quality. To evaluate the method, a new underwater dataset with pixel annotations for three classes (fish, seafloor, and water) was created using images from publicly accessible datasets such as SUIM, RockFish, and DeepFish. The performance of convolutional neural network (CNN) architectures, such as UNet and DeepLabV3+, trained with different loss functions (cross entropy, dice, and active contours) was compared, finding that the proposed combined loss function improved the segmentation results by around 3% in both Intersection over Union (IoU) and Hausdorff Distance (HD). 2022-2
Autonomous temporal pseudo-labeling for fish detection
The first major step in training an object detection model for different classes from the available datasets is the gathering of meaningful and properly annotated data. This recurring task determines the length of any project and, more importantly, the quality of the resulting models. This obstacle is amplified when the data available for the new classes are scarce or incompatible, as in the case of fish detection in the open sea. This issue was tackled using a mixed and reversed approach: a network was initialized with a noisy dataset of the same species as our classes (fish), although in different scenarios and conditions (fish from Australian marine fauna), and the target footage (fish from Portuguese marine fauna; Atlantic Ocean) was gathered for the application without annotations. Using the temporal information of the detected objects and augmentation techniques during later training, it was possible to generate highly accurate labels from the targeted footage. Furthermore, the data selection method retained the samples of each unique situation, filtering out repetitive data that would bias the training process. The obtained results validate the proposed method of automating the labeling process, resorting directly to the final application as the source of training data. The presented method achieved a mean average precision of 93.11% on our own data and 73.61% on unseen data, an increase of 24.65% and 25.53% over the baseline of the noisy dataset, respectively.
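One generic way to use temporal information when turning raw detections into pseudo-labels is to keep only boxes that persist across consecutive frames. The sketch below illustrates that idea; it is not the method of the paper above, and the function names, the IoU threshold of 0.5, and the minimum chain length of 3 frames are all illustrative assumptions.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def _chain_lengths(frames, iou_thr):
    """For each box, the longest run of overlapping boxes ending at that box."""
    lengths = []
    for i, boxes in enumerate(frames):
        current = []
        for box in boxes:
            best = 0
            if i > 0:
                for k, prev in enumerate(frames[i - 1]):
                    if box_iou(box, prev) >= iou_thr:
                        best = max(best, lengths[i - 1][k])
            current.append(best + 1)
        lengths.append(current)
    return lengths


def temporally_consistent(frames, iou_thr=0.5, min_len=3):
    """Keep boxes that lie on a chain of at least min_len consecutive frames.

    frames -- list (one entry per frame) of lists of boxes.
    """
    backward = _chain_lengths(frames, iou_thr)
    # Running the same pass on the reversed sequence gives the run starting at each box.
    forward = _chain_lengths(frames[::-1], iou_thr)[::-1]
    return [
        [box for box, b, f in zip(boxes, backward[i], forward[i]) if b + f - 1 >= min_len]
        for i, boxes in enumerate(frames)
    ]


# Hypothetical detections: one object drifting right, plus a one-frame false positive.
detections = [
    [(0, 0, 10, 10), (50, 50, 60, 60)],
    [(1, 0, 11, 10)],
    [(2, 0, 12, 10)],
]
print(temporally_consistent(detections))
```

In the toy input, the isolated box in the first frame has no overlapping counterpart in the next frame, so it is dropped, while the drifting box is kept in all three frames.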