LiDAR-assisted Large-scale Privacy Protection in Street-view Cycloramas
Privacy has recently gained importance in several domains, especially for
street-view images. The conventional approach is to automatically
detect and blur sensitive information from these images. However, the
processing cost of blurring increases with the ever-growing resolution of
images. We propose a system that is cost-effective even after increasing the
resolution by a factor of 2.5. The new system utilizes depth data obtained from
LiDAR to significantly reduce the search space for detection, thereby reducing
the processing cost. In addition, we evaluate several detectors on the reduced
detection space and provide an alternative to the existing HoG-SVM-Deep system,
based on state-of-the-art deep learning detectors, that is faster and performs
better.

Comment: Accepted at Electronic Imaging 2019.
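As an illustration of the depth-gating idea, here is a minimal sketch that masks out image regions whose LiDAR depth exceeds a detection range, so the expensive detector only scans nearby areas. The function name, the aligned per-pixel depth map, and the threshold values are assumptions for illustration, not details from the paper.

```python
import numpy as np

def depth_gated_regions(image, depth_map, max_range_m=15.0, min_pixels=32):
    """Restrict the detector's search space using LiDAR depth.

    Hypothetical sketch: assumes `depth_map` is a per-pixel depth image in
    metres, aligned with `image`, with NaN where there is no LiDAR return.
    Sensitive objects (faces, license plates) only need blurring when they
    are close enough to be recognizable, so far-range pixels are skipped.
    """
    near = np.nan_to_num(depth_map, nan=np.inf) <= max_range_m
    ys, xs = np.nonzero(near)
    if xs.size < min_pixels:
        return []  # nothing in range that could contain sensitive detail
    # Single bounding box around all near-range pixels; a real system
    # would cluster them into several tighter regions of interest.
    crop = image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    return [crop]

# The (expensive) detector then runs only on the returned crops instead
# of the full-resolution cyclorama.
```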
Aggregated Deep Local Features for Remote Sensing Image Retrieval
Remote Sensing Image Retrieval remains a challenging topic due to the special
nature of Remote Sensing Imagery. Such images contain a variety of semantic
objects, which complicates the retrieval task. In this paper,
we present an image retrieval pipeline that uses attentive, local convolutional
features and aggregates them using the Vector of Locally Aggregated Descriptors
(VLAD) to produce a global descriptor. We study various system parameters such
as the multiplicative and additive attention mechanisms and descriptor
dimensionality. We propose a query expansion method that requires no external
inputs. Experiments demonstrate that even without training, the local
convolutional features and global representation outperform other systems.
After system tuning, we can achieve state-of-the-art or competitive results.
Furthermore, we observe that our query expansion method increases overall
system performance by about 3%, using only the top-three retrieved images.
Finally, we show how dimensionality reduction produces compact descriptors with
increased retrieval performance and fast retrieval computation times, e.g. 50%
faster than the current systems.

Comment: Published in Remote Sensing. The first two authors have equal
contribution.
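The VLAD aggregation step itself is standard and can be sketched directly. The snippet below assumes (N, D) local descriptors for one image and K visual-word centroids obtained from k-means on a training set; it omits the attention weighting described above.

```python
import numpy as np

def vlad(descriptors, centroids):
    """Vector of Locally Aggregated Descriptors.

    descriptors: (N, D) local features of one image
    centroids:   (K, D) visual words from k-means on a training set
    Returns a (K*D,) global descriptor.
    """
    K, _ = centroids.shape
    # Assign each local descriptor to its nearest visual word.
    dists = np.linalg.norm(descriptors[:, None, :] - centroids[None, :, :], axis=2)
    assignment = dists.argmin(axis=1)
    v = np.zeros_like(centroids, dtype=float)
    for k in range(K):
        members = descriptors[assignment == k]
        if len(members):
            v[k] = (members - centroids[k]).sum(axis=0)  # residual sum
    v = v.reshape(-1)
    # Signed square-root (power) and L2 normalization, standard for VLAD.
    v = np.sign(v) * np.sqrt(np.abs(v))
    return v / (np.linalg.norm(v) + 1e-12)
```

Under this representation, a query expansion scheme like the one described above amounts to averaging the query descriptor with those of the top-three retrieved images and issuing a second query.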
Bootstrapped CNNs for Building Segmentation on RGB-D Aerial Imagery
Detection of buildings and other objects from aerial images has various
applications in urban planning and map making. Automated building detection
from aerial imagery is a challenging task, as it is prone to varying lighting
conditions, shadows and occlusions. Convolutional Neural Networks (CNNs) are
robust against some of these variations, although they fail to distinguish easy
from difficult examples. We train a detection network on RGB-D images, using
the DenseNet CNN architecture, to obtain a segmentation mask. First, we
improve the performance of the model by applying a statistical re-sampling
technique called Bootstrapping and demonstrate that more informative examples
are retained. Second, the proposed method outperforms the non-bootstrapped
version while utilizing only one-sixth of the original training data, and it
obtains a precision-recall break-even point of 95.10% on our aerial imagery
dataset.

Comment: Published in ISPRS Annals of the Photogrammetry, Remote Sensing and
Spatial Information Sciences.
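A minimal, PyTorch-style sketch of loss-based bootstrapping follows: score a pool of training examples with the current model and keep only the most informative (highest-loss) fraction for the next round. The helper name and the exact selection rule are assumptions; the paper's re-sampling scheme may differ in detail.

```python
import torch

def bootstrap_batch(model, images, masks, criterion, keep_ratio=1 / 6):
    """Loss-based bootstrapping sketch (hypothetical helper).

    `criterion` must be built with reduction='none', e.g.
    torch.nn.CrossEntropyLoss(reduction='none'), so that a per-pixel
    loss map is returned for each example.
    """
    model.eval()
    with torch.no_grad():
        logits = model(images)                       # (B, C, H, W)
        per_pixel = criterion(logits, masks)         # (B, H, W)
        per_example = per_pixel.flatten(1).mean(1)   # (B,)
    k = max(1, int(keep_ratio * len(images)))
    hard = per_example.topk(k).indices               # hardest examples
    return images[hard], masks[hard]
```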
Homography Estimation in Complex Topological Scenes
Surveillance videos and images are used for a broad set of applications,
ranging from traffic analysis to crime detection. Extrinsic camera calibration
data is important for most analysis applications. However, security cameras are
susceptible to environmental conditions and small camera movements, resulting
in a need for an automated re-calibration method that can account for these
varying conditions. In this paper, we present an automated camera-calibration
process leveraging a dictionary-based approach that requires no prior
knowledge of any camera settings. The method consists of a custom
implementation of a Spatial Transformer Network (STN) and a novel topological
loss function. Experiments reveal that the proposed method improves the IoU
metric by up to 12% w.r.t. a state-of-the-art model across five synthetic
datasets and the World Cup 2014 dataset.

Comment: Will be published in Intelligent Vehicle Symposium 202
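The IoU metric used for evaluation compares binary masks, e.g. the field template warped by the estimated homography against the ground-truth layout. A minimal version:

```python
import numpy as np

def mask_iou(pred_mask, gt_mask):
    """IoU between two binary masks, e.g. the field template warped by
    the estimated homography versus the ground-truth layout."""
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return np.logical_and(pred, gt).sum() / union
```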
Adversarial Loss for Semantic Segmentation of Aerial Imagery
Automatic building extraction from aerial imagery has several applications in
urban planning, disaster management, and change detection. In recent years,
several works have adopted deep convolutional neural networks (CNNs) for
building extraction, since they produce rich features that are invariant
against lighting conditions, shadows, etc. Although several advances have been
made, building extraction from aerial imagery still presents multiple
challenges. Most deep learning segmentation methods optimize the per-pixel
loss with respect to the ground truth without knowledge of the context. This
often produces imperfect outputs with missing or unrefined regions. In this
work, we propose a novel loss function, combining adversarial and
cross-entropy losses, that learns both local and global context for semantic
segmentation. The newly proposed loss function
deployed on the DeepLab v3+ network obtains state-of-the-art results on the
Massachusetts buildings dataset. The loss function improves the structure and
refines the edges of buildings without requiring any of the commonly used
post-processing methods, such as Conditional Random Fields. We also perform
ablation studies to understand the impact of the adversarial loss. Finally, the
proposed method achieves a relaxed F1 score of 95.59% on the Massachusetts
buildings dataset, compared to the previous best F1 of 94.88%.

Comment: IEEE Symposium on Information Theory and Signal Processing in the
Benelux (May 2019).
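A hedged, PyTorch-style sketch of such a combined objective is shown below: the per-pixel cross-entropy supplies the local term, while an adversarial term from a discriminator scores global plausibility. The discriminator interface and the weighting factor are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def combined_loss(logits, target, discriminator, adv_weight=0.01):
    """Sketch of a cross-entropy + adversarial segmentation objective.

    `discriminator` is assumed to map a (B, C, H, W) soft segmentation to
    a (B, 1) realness logit and is trained separately to distinguish
    predictions from ground-truth masks; `adv_weight` is a hypothetical
    balancing factor.
    """
    ce = F.cross_entropy(logits, target)      # local, per-pixel term
    probs = torch.softmax(logits, dim=1)
    realness = discriminator(probs)           # global plausibility score
    # Generator-side adversarial term: reward segmentations the
    # discriminator considers realistic.
    adv = F.binary_cross_entropy_with_logits(realness, torch.ones_like(realness))
    return ce + adv_weight * adv
```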
Density-Guided Label Smoothing for Temporal Localization of Driving Actions
Temporal localization of driving actions plays a crucial role in advanced
driver-assistance systems and naturalistic driving studies. However, this is a
challenging task due to strict requirements for robustness, reliability and
accurate localization. In this work, we focus on improving the overall
performance by efficiently utilizing video action recognition networks and
adapting these to the problem of action localization. To this end, we first
develop a density-guided label smoothing technique based on label probability
distributions to facilitate better learning from boundary video-segments that
typically include multiple labels. Second, we design a post-processing step to
efficiently fuse information from video-segments and multiple camera views into
scene-level predictions, which facilitates elimination of false positives. Our
methodology yields a competitive performance on the A2 test set of the
naturalistic driving action recognition track of the 2022 NVIDIA AI City
Challenge, with an F1 score of 0.271.
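To illustrate the idea, here is a minimal sketch of density-guided soft targets: a boundary segment's label mass is distributed according to how densely each action occupies its frames. The function and its interface are hypothetical.

```python
import numpy as np

def density_smoothed_target(frame_labels, num_classes):
    """Hypothetical sketch of density-guided soft targets.

    A boundary segment whose frames carry several action labels gets a
    soft target whose mass reflects how densely each action occupies it.
    """
    target = np.zeros(num_classes)
    for c in frame_labels:
        target[c] += 1.0
    return target / target.sum()

# A segment straddling two actions, 12 frames of class 3 and 4 of class 7:
# density_smoothed_target([3] * 12 + [7] * 4, num_classes=18)
# -> 0.75 mass on class 3, 0.25 on class 7
```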
Dual Embedding Expansion for Vehicle Re-identification
Vehicle re-identification plays a crucial role in the management of
transportation infrastructure and traffic flow. However, this is a challenging
task due to large view-point variations in appearance, as well as
environmental and instance-related factors. Modern systems deploy CNNs to
produce unique
representations from the images of each vehicle instance. Most work focuses on
leveraging new losses and network architectures to improve the descriptiveness
of these representations. In contrast, our work concentrates on re-ranking and
embedding expansion techniques. We propose an efficient approach for combining
the outputs of multiple models at various scales while exploiting tracklet and
neighbor information, called dual embedding expansion (DEx). Additionally, a
comparative study of several common image retrieval techniques is presented in
the context of vehicle re-ID. Our system yields competitive performance in the
2020 NVIDIA AI City Challenge with promising results. We demonstrate that DEx,
when combined with other re-ranking techniques, can produce an even larger
gain without any additional attribute labels or manual supervision.
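The core expansion step can be sketched as neighbour-averaging in embedding space. The snippet below shows only that step; DEx additionally exploits tracklet membership and fuses the outputs of multiple models and scales.

```python
import numpy as np

def expand_embeddings(emb, k=2):
    """Neighbour-based embedding expansion (core step only).

    Replaces each L2-normalized embedding by the renormalized sum of
    itself and its k nearest neighbours under cosine similarity.
    """
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = emb @ emb.T                        # cosine similarities
    np.fill_diagonal(sims, -np.inf)           # exclude self
    neighbours = np.argsort(-sims, axis=1)[:, :k]
    expanded = emb + emb[neighbours].sum(axis=1)
    return expanded / np.linalg.norm(expanded, axis=1, keepdims=True)
```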
Improving Catheter Segmentation & Localization in 3D Cardiac Ultrasound Using Direction-Fused FCN
Fast and accurate catheter detection in cardiac catheterization using
harmless 3D ultrasound (US) can improve the efficiency and outcome of the
intervention. However, the low image quality of US requires extra training for
sonographers to localize the catheter. In this paper, we propose a catheter
detection method based on a pre-trained VGG network that exploits 3D
information through re-organized cross-sections to segment the catheter with a
shared fully convolutional network (FCN), called a Direction-Fused FCN
(DF-FCN). Based on the segmented image of the DF-FCN, the catheter can be localized
by model fitting. Our experiments show that the proposed method can
successfully detect an ablation catheter in a challenging ex-vivo 3D US
dataset, which was collected on a porcine heart. Extensive analysis shows
that the proposed method achieves a Dice score of 57.7%, which offers at least
an 11.8% improvement compared to state-of-the-art instrument detection
methods. Due to the improved segmentation performance of the DF-FCN, the
catheter can be localized with an error of only 1.4 mm.

Comment: Accepted at ISBI 2019.
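A minimal sketch of the direction-fusion idea, assuming a 2D network `fcn_2d` that maps a cross-section to a per-pixel probability map: slice the volume along each axis, segment every slice, and average the three re-assembled volumes. The actual DF-FCN fuses directional cross-sections inside the shared network rather than averaging its outputs, so this is only an approximation of the idea.

```python
import numpy as np

def direction_fused_prediction(volume, fcn_2d):
    """Sketch of direction fusion for a 3D US volume.

    Slices the volume along each of the three axes, segments every 2D
    cross-section with a shared FCN (`fcn_2d`, assumed to return a
    per-pixel probability map of the same shape as its input slice), and
    averages the three re-assembled probability volumes.
    """
    fused = np.zeros(volume.shape, dtype=float)
    for axis in range(3):
        slices = [fcn_2d(np.take(volume, i, axis=axis))
                  for i in range(volume.shape[axis])]
        fused += np.stack(slices, axis=axis)
    return fused / 3.0
```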