29 research outputs found
A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community
In recent years, deep learning (DL), a re-branding of neural networks (NNs),
has risen to the top in numerous areas, namely computer vision (CV), speech
recognition, natural language processing, etc. Whereas remote sensing (RS)
possesses a number of unique challenges, primarily related to sensors and
applications, inevitably RS draws from many of the same theories as CV; e.g.,
statistics, fusion, and machine learning, to name a few. This means that the RS
community should be aware of, if not at the leading edge of, of advancements
like DL. Herein, we provide the most comprehensive survey of state-of-the-art
RS DL research. We also review recent new developments in the DL field that can
be used in DL for RS. Namely, we focus on theories, tools and challenges for
the RS community. Specifically, we focus on unsolved challenges and
opportunities as it relates to (i) inadequate data sets, (ii)
human-understandable solutions for modelling physical phenomena, (iii) Big
Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and
learning algorithms for spectral, spatial and temporal data, (vi) transfer
learning, (vii) an improved theoretical understanding of DL systems, (viii)
high barriers to entry, and (ix) training and optimizing the DL.Comment: 64 pages, 411 references. To appear in Journal of Applied Remote
Sensin
Automatic object classification for surveillance videos.
PhDThe recent popularity of surveillance video systems, specially located in urban
scenarios, demands the development of visual techniques for monitoring purposes.
A primary step towards intelligent surveillance video systems consists on automatic
object classification, which still remains an open research problem and the keystone
for the development of more specific applications.
Typically, object representation is based on the inherent visual features. However,
psychological studies have demonstrated that human beings can routinely categorise
objects according to their behaviour. The existing gap in the understanding
between the features automatically extracted by a computer, such as appearance-based
features, and the concepts unconsciously perceived by human beings but
unattainable for machines, or the behaviour features, is most commonly known
as semantic gap. Consequently, this thesis proposes to narrow the semantic gap
and bring together machine and human understanding towards object classification.
Thus, a Surveillance Media Management is proposed to automatically detect and
classify objects by analysing the physical properties inherent in their appearance
(machine understanding) and the behaviour patterns which require a higher level of
understanding (human understanding). Finally, a probabilistic multimodal fusion
algorithm bridges the gap performing an automatic classification considering both
machine and human understanding.
The performance of the proposed Surveillance Media Management framework
has been thoroughly evaluated on outdoor surveillance datasets. The experiments
conducted demonstrated that the combination of machine and human understanding
substantially enhanced the object classification performance. Finally, the inclusion
of human reasoning and understanding provides the essential information to bridge
the semantic gap towards smart surveillance video systems
Very High Resolution (VHR) Satellite Imagery: Processing and Applications
Recently, growing interest in the use of remote sensing imagery has appeared to provide synoptic maps of water quality parameters in coastal and inner water ecosystems;, monitoring of complex land ecosystems for biodiversity conservation; precision agriculture for the management of soils, crops, and pests; urban planning; disaster monitoring, etc. However, for these maps to achieve their full potential, it is important to engage in periodic monitoring and analysis of multi-temporal changes. In this context, very high resolution (VHR) satellite-based optical, infrared, and radar imaging instruments provide reliable information to implement spatially-based conservation actions. Moreover, they enable observations of parameters of our environment at greater broader spatial and finer temporal scales than those allowed through field observation alone. In this sense, recent very high resolution satellite technologies and image processing algorithms present the opportunity to develop quantitative techniques that have the potential to improve upon traditional techniques in terms of cost, mapping fidelity, and objectivity. Typical applications include multi-temporal classification, recognition and tracking of specific patterns, multisensor data fusion, analysis of land/marine ecosystem processes and environment monitoring, etc. This book aims to collect new developments, methodologies, and applications of very high resolution satellite data for remote sensing. The works selected provide to the research community the most recent advances on all aspects of VHR satellite remote sensing
Vulnerable road users and connected autonomous vehicles interaction: a survey
There is a group of users within the vehicular traffic ecosystem known as Vulnerable Road Users (VRUs). VRUs include pedestrians, cyclists, motorcyclists, among others. On the other hand, connected autonomous vehicles (CAVs) are a set of technologies that combines, on the one hand, communication technologies to stay always ubiquitous connected, and on the other hand, automated technologies to assist or replace the human driver during the driving process. Autonomous vehicles are being visualized as a viable alternative to solve road accidents providing a general safe environment for all the users on the road specifically to the most vulnerable. One of the problems facing autonomous vehicles is to generate mechanisms that facilitate their integration not only within the mobility environment, but also into the road society in a safe and efficient way. In this paper, we analyze and discuss how this integration can take place, reviewing the work that has been developed in recent years in each of the stages of the vehicle-human interaction, analyzing the challenges of vulnerable users and proposing solutions that contribute to solving these challenges.This work was partially funded by the Ministry of Economy, Industry, and Competitiveness
of Spain under Grant: Supervision of drone fleet and optimization of commercial operations flight
plans, PID2020-116377RB-C21.Peer ReviewedPostprint (published version
Geo-rectification and cloud-cover correction of multi-temporal Earth observation imagery
Over the past decades, improvements in remote sensing technology have led to mass proliferation of aerial imagery. This, in turn, opened vast new possibilities relating to land cover classification, cartography, and so forth.
As applications in these fields became increasingly more complex, the amount of data required also rose accordingly and so, to satisfy these new needs, automated systems had to be developed. Geometric distortions in raw imagery must be rectified, otherwise the high accuracy requirements of the newest applications will not be attained.
This dissertation proposes an automated solution for the pre-stages of multi-spectral satellite imagery classification, focusing on Fast Fourier Shift theorem based geo-rectification and multi-temporal cloud-cover correction.
By automatizing the first stages of image processing, automatic classifiers can take advantage of a larger supply of image data, eventually allowing for the creation of semi-real-time mapping applications
QUIS-CAMPI: Biometric Recognition in Surveillance Scenarios
The concerns about individuals security have justified the increasing number of surveillance
cameras deployed both in private and public spaces. However, contrary to popular belief,
these devices are in most cases used solely for recording, instead of feeding intelligent analysis
processes capable of extracting information about the observed individuals. Thus, even though
video surveillance has already proved to be essential for solving multiple crimes, obtaining relevant
details about the subjects that took part in a crime depends on the manual inspection
of recordings. As such, the current goal of the research community is the development of
automated surveillance systems capable of monitoring and identifying subjects in surveillance
scenarios. Accordingly, the main goal of this thesis is to improve the performance of biometric
recognition algorithms in data acquired from surveillance scenarios. In particular, we aim at
designing a visual surveillance system capable of acquiring biometric data at a distance (e.g.,
face, iris or gait) without requiring human intervention in the process, as well as devising biometric
recognition methods robust to the degradation factors resulting from the unconstrained
acquisition process.
Regarding the first goal, the analysis of the data acquired by typical surveillance systems
shows that large acquisition distances significantly decrease the resolution of biometric samples,
and thus their discriminability is not sufficient for recognition purposes. In the literature,
diverse works point out Pan Tilt Zoom (PTZ) cameras as the most practical way for acquiring
high-resolution imagery at a distance, particularly when using a master-slave configuration. In
the master-slave configuration, the video acquired by a typical surveillance camera is analyzed
for obtaining regions of interest (e.g., car, person) and these regions are subsequently imaged
at high-resolution by the PTZ camera. Several methods have already shown that this configuration
can be used for acquiring biometric data at a distance. Nevertheless, these methods
failed at providing effective solutions to the typical challenges of this strategy, restraining its
use in surveillance scenarios. Accordingly, this thesis proposes two methods to support the development
of a biometric data acquisition system based on the cooperation of a PTZ camera
with a typical surveillance camera. The first proposal is a camera calibration method capable
of accurately mapping the coordinates of the master camera to the pan/tilt angles of the PTZ
camera. The second proposal is a camera scheduling method for determining - in real-time -
the sequence of acquisitions that maximizes the number of different targets obtained, while
minimizing the cumulative transition time. In order to achieve the first goal of this thesis,
both methods were combined with state-of-the-art approaches of the human monitoring field
to develop a fully automated surveillance capable of acquiring biometric data at a distance and
without human cooperation, designated as QUIS-CAMPI system.
The QUIS-CAMPI system is the basis for pursuing the second goal of this thesis. The analysis
of the performance of the state-of-the-art biometric recognition approaches shows that these
approaches attain almost ideal recognition rates in unconstrained data. However, this performance
is incongruous with the recognition rates observed in surveillance scenarios. Taking into
account the drawbacks of current biometric datasets, this thesis introduces a novel dataset comprising
biometric samples (face images and gait videos) acquired by the QUIS-CAMPI system at a
distance ranging from 5 to 40 meters and without human intervention in the acquisition process.
This set allows to objectively assess the performance of state-of-the-art biometric recognition
methods in data that truly encompass the covariates of surveillance scenarios. As such, this set
was exploited for promoting the first international challenge on biometric recognition in the wild. This thesis describes the evaluation protocols adopted, along with the results obtained
by the nine methods specially designed for this competition. In addition, the data acquired by
the QUIS-CAMPI system were crucial for accomplishing the second goal of this thesis, i.e., the
development of methods robust to the covariates of surveillance scenarios. The first proposal
regards a method for detecting corrupted features in biometric signatures inferred by a redundancy
analysis algorithm. The second proposal is a caricature-based face recognition approach
capable of enhancing the recognition performance by automatically generating a caricature
from a 2D photo. The experimental evaluation of these methods shows that both approaches
contribute to improve the recognition performance in unconstrained data.A crescente preocupação com a segurança dos indivĂduos tem justificado o crescimento
do nĂşmero de câmaras de vĂdeo-vigilância instaladas tanto em espaços privados como pĂşblicos.
Contudo, ao contrário do que normalmente se pensa, estes dispositivos são, na maior parte dos
casos, usados apenas para gravação, não estando ligados a nenhum tipo de software inteligente
capaz de inferir em tempo real informações sobre os indivĂduos observados. Assim, apesar de a
vĂdeo-vigilância ter provado ser essencial na resolução de diversos crimes, o seu uso está ainda
confinado Ă disponibilização de vĂdeos que tĂŞm que ser manualmente inspecionados para extrair
informações relevantes dos sujeitos envolvidos no crime. Como tal, atualmente, o principal
desafio da comunidade cientĂfica Ă© o desenvolvimento de sistemas automatizados capazes de
monitorizar e identificar indivĂduos em ambientes de vĂdeo-vigilância.
Esta tese tem como principal objetivo estender a aplicabilidade dos sistemas de reconhecimento
biomĂ©trico aos ambientes de vĂdeo-vigilância. De forma mais especifica, pretende-se
1) conceber um sistema de vĂdeo-vigilância que consiga adquirir dados biomĂ©tricos a longas distâncias
(e.g., imagens da cara, Ăris, ou vĂdeos do tipo de passo) sem requerer a cooperação dos
indivĂduos no processo; e 2) desenvolver mĂ©todos de reconhecimento biomĂ©trico robustos aos
fatores de degradação inerentes aos dados adquiridos por este tipo de sistemas.
No que diz respeito ao primeiro objetivo, a análise aos dados adquiridos pelos sistemas tĂpicos
de vĂdeo-vigilância mostra que, devido Ă distância de captura, os traços biomĂ©tricos amostrados
não são suficientemente discriminativos para garantir taxas de reconhecimento aceitáveis.
Na literatura, vários trabalhos advogam o uso de câmaras Pan Tilt Zoom (PTZ) para adquirir
imagens de alta resolução à distância, principalmente o uso destes dispositivos no modo masterslave.
Na configuração master-slave um módulo de análise inteligente seleciona zonas de interesse
(e.g. carros, pessoas) a partir do vĂdeo adquirido por uma câmara de vĂdeo-vigilância
e a câmara PTZ é orientada para adquirir em alta resolução as regiões de interesse. Diversos
métodos já mostraram que esta configuração pode ser usada para adquirir dados biométricos
à distância, ainda assim estes não foram capazes de solucionar alguns problemas relacionados
com esta estratĂ©gia, impedindo assim o seu uso em ambientes de vĂdeo-vigilância. Deste modo,
esta tese propõe dois métodos para permitir a aquisição de dados biométricos em ambientes de
vĂdeo-vigilância usando uma câmara PTZ assistida por uma câmara tĂpica de vĂdeo-vigilância. O
primeiro é um método de calibração capaz de mapear de forma exata as coordenadas da câmara
master para o ângulo da câmara PTZ (slave) sem o auxĂlio de outros dispositivos Ăłticos. O
segundo método determina a ordem pela qual um conjunto de sujeitos vai ser observado pela
câmara PTZ. O método proposto consegue determinar em tempo-real a sequência de observações
que maximiza o nĂşmero de diferentes sujeitos observados e simultaneamente minimiza o
tempo total de transição entre sujeitos. De modo a atingir o primeiro objetivo desta tese, os
dois métodos propostos foram combinados com os avanços alcançados na área da monitorização
de humanos para assim desenvolver o primeiro sistema de vĂdeo-vigilância completamente automatizado
e capaz de adquirir dados biométricos a longas distâncias sem requerer a cooperação
dos indivĂduos no processo, designado por sistema QUIS-CAMPI.
O sistema QUIS-CAMPI representa o ponto de partida para iniciar a investigação relacionada
com o segundo objetivo desta tese. A análise do desempenho dos métodos de reconhecimento
biométrico do estado-da-arte mostra que estes conseguem obter taxas de reconhecimento
quase perfeitas em dados adquiridos sem restrições (e.g., taxas de reconhecimento
maiores do que 99% no conjunto de dados LFW). Contudo, este desempenho nĂŁo Ă© corroborado pelos resultados observados em ambientes de vĂdeo-vigilância, o que sugere que os conjuntos
de dados atuais nĂŁo contĂŞm verdadeiramente os fatores de degradação tĂpicos dos ambientes de
vĂdeo-vigilância. Tendo em conta as vulnerabilidades dos conjuntos de dados biomĂ©tricos atuais,
esta tese introduz um novo conjunto de dados biomĂ©tricos (imagens da face e vĂdeos do tipo de
passo) adquiridos pelo sistema QUIS-CAMPI a uma distância máxima de 40m e sem a cooperação
dos sujeitos no processo de aquisição. Este conjunto permite avaliar de forma objetiva o desempenho
dos mĂ©todos do estado-da-arte no reconhecimento de indivĂduos em imagens/vĂdeos
capturados num ambiente real de vĂdeo-vigilância. Como tal, este conjunto foi utilizado para
promover a primeira competição de reconhecimento biométrico em ambientes não controlados.
Esta tese descreve os protocolos de avaliação usados, assim como os resultados obtidos por 9
métodos especialmente desenhados para esta competição. Para além disso, os dados adquiridos
pelo sistema QUIS-CAMPI foram essenciais para o desenvolvimento de dois métodos para
aumentar a robustez aos fatores de degradação observados em ambientes de vĂdeo-vigilância. O
primeiro Ă© um mĂ©todo para detetar caracterĂsticas corruptas em assinaturas biomĂ©tricas atravĂ©s
da análise da redundância entre subconjuntos de caracterĂsticas. O segundo Ă© um mĂ©todo de
reconhecimento facial baseado em caricaturas automaticamente geradas a partir de uma Ăşnica
foto do sujeito. As experiências realizadas mostram que ambos os métodos conseguem reduzir
as taxas de erro em dados adquiridos de forma nĂŁo controlada
Advances in semantic-guided and feedback-based approaches for video analysis
Tesis doctoral inédita. Universidad Autónoma de Madrid, Escuela Politécnica Superior, septiembre 201
Texture and Colour in Image Analysis
Research in colour and texture has experienced major changes in the last few years. This book presents some recent advances in the field, specifically in the theory and applications of colour texture analysis. This volume also features benchmarks, comparative evaluations and reviews
Multimedia
The nowadays ubiquitous and effortless digital data capture and processing capabilities offered by the majority of devices, lead to an unprecedented penetration of multimedia content in our everyday life. To make the most of this phenomenon, the rapidly increasing volume and usage of digitised content requires constant re-evaluation and adaptation of multimedia methodologies, in order to meet the relentless change of requirements from both the user and system perspectives. Advances in Multimedia provides readers with an overview of the ever-growing field of multimedia by bringing together various research studies and surveys from different subfields that point out such important aspects. Some of the main topics that this book deals with include: multimedia management in peer-to-peer structures & wireless networks, security characteristics in multimedia, semantic gap bridging for multimedia content and novel multimedia applications