1,090 research outputs found
Advances in Object and Activity Detection in Remote Sensing Imagery
The recent revolution in deep learning has enabled considerable development in the fields of object and activity detection. Visual object detection tries to find objects of target classes with precise localisation in an image and assign each object instance a corresponding class label. At the same time, activity recognition aims to determine the actions or activities of an agent or group of agents based on sensor or video observation data. It is a very important and challenging problem to detect, identify, track, and understand the behaviour of objects through images and videos taken by various cameras. Together, objects and their activity recognition in imaging data captured by remote sensing platforms is a highly dynamic and challenging research topic. During the last decade, there has been significant growth in the number of publications in the field of object and activity recognition. In particular, many researchers have proposed application domains to identify objects and their specific behaviours from air and spaceborne imagery. This Special Issue includes papers that explore novel and challenging topics for object and activity detection in remote sensing images and videos acquired by diverse platforms
RS2G: Data-Driven Scene-Graph Extraction and Embedding for Robust Autonomous Perception and Scenario Understanding
Human drivers naturally reason about interactions between road users to
understand and safely navigate through traffic. Thus, developing autonomous
vehicles necessitates the ability to mimic such knowledge and model
interactions between road users to understand and navigate unpredictable,
dynamic environments. However, since real-world scenarios often differ from
training datasets, effectively modeling the behavior of various road users in
an environment remains a significant research challenge. This reality
necessitates models that generalize to a broad range of domains and explicitly
model interactions between road users and the environment to improve scenario
understanding. Graph learning methods address this problem by modeling
interactions using graph representations of scenarios. However, existing
methods cannot effectively transfer knowledge gained from the training domain
to real-world scenarios. This constraint is caused by the domain-specific rules
used for graph extraction that can vary in effectiveness across domains,
limiting generalization ability. To address these limitations, we propose
RoadScene2Graph (RS2G): a data-driven graph extraction and modeling approach
that learns to extract the best graph representation of a road scene for
solving autonomous scene understanding tasks. We show that RS2G enables better
performance at subjective risk assessment than rule-based graph extraction
methods and deep-learning-based models. RS2G also improves generalization and
Sim2Real transfer learning, which denotes the ability to transfer knowledge
gained from simulation datasets to unseen real-world scenarios. We also present
ablation studies showing how RS2G produces a more useful graph representation
for downstream classifiers. Finally, we show how RS2G can identify the relative
importance of rule-based graph edges and enables intelligent graph sparsity
tuning
Multimodal perception for autonomous driving
Mención Internacional en el tÃtulo de doctorAutonomous driving is set to play an important role among intelligent
transportation systems in the coming decades. The advantages
of its large-scale implementation –reduced accidents, shorter commuting
times, or higher fuel efficiency– have made its development a priority
for academia and industry. However, there is still a long way to
go to achieve full self-driving vehicles, capable of dealing with any
scenario without human intervention. To this end, advances in control,
navigation and, especially, environment perception technologies
are yet required. In particular, the detection of other road users that
may interfere with the vehicle’s trajectory is a key element, since it
allows to model the current traffic situation and, thus, to make decisions
accordingly.
The objective of this thesis is to provide solutions to some of
the main challenges of on-board perception systems, such as extrinsic
calibration of sensors, object detection, and deployment on
real platforms. First, a calibration method for obtaining the relative
transformation between pairs of sensors is introduced, eliminating
the complex manual adjustment of these parameters. The algorithm
makes use of an original calibration pattern and supports LiDARs,
and monocular and stereo cameras. Second, different deep learning
models for 3D object detection using LiDAR data in its bird’s eye
view projection are presented. Through a novel encoding, the use
of architectures tailored to image detection is proposed to process
the 3D information of point clouds in real time. Furthermore, the
effectiveness of using this projection together with image features is
analyzed. Finally, a method to mitigate the accuracy drop of LiDARbased
detection networks when deployed in ad-hoc configurations is
introduced. For this purpose, the simulation of virtual signals mimicking
the specifications of the desired real device is used to generate
new annotated datasets that can be used to train the models.
The performance of the proposed methods is evaluated against
other existing alternatives using reference benchmarks in the field of
computer vision (KITTI and nuScenes) and through experiments in
open traffic with an automated vehicle. The results obtained demonstrate
the relevance of the presented work and its suitability for commercial
use.La conducción autónoma está llamada a jugar un papel importante en
los sistemas inteligentes de transporte de las próximas décadas. Las
ventajas de su implementación a larga escala –disminución de accidentes,
reducción del tiempo de trayecto, u optimización del consumo–
han convertido su desarrollo en una prioridad para la academia y
la industria. Sin embargo, todavÃa hay un largo camino por delante
hasta alcanzar una automatización total, capaz de enfrentarse a cualquier
escenario sin intervención humana. Para ello, aún se requieren
avances en las tecnologÃas de control, navegación y, especialmente,
percepción del entorno. Concretamente, la detección de otros usuarios
de la carretera que puedan interferir en la trayectoria del vehÃculo
es una pieza fundamental para conseguirlo, puesto que permite modelar
el estado actual del tráfico y tomar decisiones en consecuencia.
El objetivo de esta tesis es aportar soluciones a algunos de los
principales retos de los sistemas de percepción embarcados, como
la calibración extrÃnseca de los sensores, la detección de objetos, y su
despliegue en plataformas reales. En primer lugar, se introduce un
método para la obtención de la transformación relativa entre pares
de sensores, eliminando el complejo ajuste manual de estos parámetros.
El algoritmo hace uso de un patrón de calibración propio y da
soporte a cámaras monoculares, estéreo, y LiDAR. En segundo lugar,
se presentan diferentes modelos de aprendizaje profundo para la detección
de objectos en 3D utilizando datos de escáneres LiDAR en su
proyección en vista de pájaro. A través de una nueva codificación, se
propone la utilización de arquitecturas de detección en imagen para
procesar en tiempo real la información tridimensional de las nubes
de puntos. Además, se analiza la efectividad del uso de esta proyección
junto con caracterÃsticas procedentes de imágenes. Por último,
se introduce un método para mitigar la pérdida de precisión de las
redes de detección basadas en LiDAR cuando son desplegadas en
configuraciones ad-hoc. Para ello, se plantea la simulación de señales
virtuales con las caracterÃsticas del modelo real que se quiere utilizar,
generando asà nuevos conjuntos anotados para entrenar los modelos.
El rendimiento de los métodos propuestos es evaluado frente a
otras alternativas existentes haciendo uso de bases de datos de referencia
en el campo de la visión por computador (KITTI y nuScenes),
y mediante experimentos en tráfico abierto empleando un vehÃculo
automatizado. Los resultados obtenidos demuestran la relevancia de
los trabajos presentados y su viabilidad para un uso comercial.Programa de Doctorado en IngenierÃa Eléctrica, Electrónica y Automática por la Universidad Carlos III de MadridPresidente: Jesús GarcÃa Herrero.- Secretario: Ignacio Parra Alonso.- Vocal: Gustavo Adolfo Peláez Coronad
Anomaly Detection in Autonomous Driving: A Survey
Nowadays, there are outstanding strides towards a future with autonomous
vehicles on our roads. While the perception of autonomous vehicles performs
well under closed-set conditions, they still struggle to handle the unexpected.
This survey provides an extensive overview of anomaly detection techniques
based on camera, lidar, radar, multimodal and abstract object level data. We
provide a systematization including detection approach, corner case level,
ability for an online application, and further attributes. We outline the
state-of-the-art and point out current research gaps.Comment: Daniel Bogdoll and Maximilian Nitsche contributed equally. Accepted
for publication at CVPR 2022 WAD worksho
A Dual Sensor Computational Camera for High Quality Dark Videography
Videos captured under low light conditions suffer from severe noise. A
variety of efforts have been devoted to image/video noise suppression and made
large progress. However, in extremely dark scenarios, extensive photon
starvation would hamper precise noise modeling. Instead, developing an imaging
system collecting more photons is a more effective way for high-quality video
capture under low illuminations. In this paper, we propose to build a
dual-sensor camera to additionally collect the photons in NIR wavelength, and
make use of the correlation between RGB and near-infrared (NIR) spectrum to
perform high-quality reconstruction from noisy dark video pairs. In hardware,
we build a compact dual-sensor camera capturing RGB and NIR videos
simultaneously. Computationally, we propose a dual-channel multi-frame
attention network (DCMAN) utilizing spatial-temporal-spectral priors to
reconstruct the low-light RGB and NIR videos. In addition, we build a
high-quality paired RGB and NIR video dataset, based on which the approach can
be applied to different sensors easily by training the DCMAN model with
simulated noisy input following a physical-process-based CMOS noise model. Both
experiments on synthetic and real videos validate the performance of this
compact dual-sensor camera design and the corresponding reconstruction
algorithm in dark videography
Towards a Common Software/Hardware Methodology for Future Advanced Driver Assistance Systems
The European research project DESERVE (DEvelopment platform for Safe and Efficient dRiVE, 2012-2015) had the aim of designing and developing a platform tool to cope with the continuously increasing complexity and the simultaneous need to reduce cost for future embedded Advanced Driver Assistance Systems (ADAS). For this purpose, the DESERVE platform profits from cross-domain software reuse, standardization of automotive software component interfaces, and easy but safety-compliant integration of heterogeneous modules. This enables the development of a new generation of ADAS applications, which challengingly combine different functions, sensors, actuators, hardware platforms, and Human Machine Interfaces (HMI). This book presents the different results of the DESERVE project concerning the ADAS development platform, test case functions, and validation and evaluation of different approaches. The reader is invited to substantiate the content of this book with the deliverables published during the DESERVE project. Technical topics discussed in this book include:Modern ADAS development platforms;Design space exploration;Driving modelling;Video-based and Radar-based ADAS functions;HMI for ADAS;Vehicle-hardware-in-the-loop validation system
- …