13 research outputs found
Incremental Learning Through Unsupervised Adaptation in Video Face Recognition
Programa Oficial de Doutoramento en Investigación en Tecnoloxías da Información. 524V01

[Abstract]
In the last decade, deep learning has brought an unprecedented leap forward for
general classification problems in computer vision. One of the keys to this success is the
availability of extensive, richly annotated datasets to use as training samples.
In some sense, a deep learning network summarises this enormous amount of data
into handy vector representations. For this reason, when the differences between
the training datasets and the data acquired during operation (due to factors such as
the acquisition context) are highly marked, end-to-end deep learning methods are
susceptible to performance degradation.
While the immediate solution to mitigate these problems is to resort to additional
data collection and its corresponding annotation procedure, this solution
is far from optimal. The immeasurable possible variations of the visual world can
turn the collection and annotation of data into an endless task. Even more so when
there are specific applications in which this additional action is difficult or simply not
possible to perform due to, among other reasons, cost-related problems or privacy
issues.
This Thesis proposes to tackle all these problems from the adaptation point of
view. Thus, the central hypothesis assumes that it is possible to use operational
data with almost no supervision to improve the performance we would achieve with
general-purpose recognition systems. To do so, and as a proof-of-concept, the field
of study of this Thesis is restricted to face recognition, a paradigmatic application
in which the context of acquisition can be especially relevant.
This work begins by examining the intrinsic differences between some of the
contexts in which face recognition is needed and how they directly affect performance. To do so, we
compare different datasets, and their contexts, against each other using some of the
most advanced feature representations available, in order to determine the actual need for
adaptation.
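The cross-context comparison described above can be sketched as a simple verification experiment: a fixed embedding is scored with cosine similarity in two contexts, and the drop in accuracy signals the need for adaptation. The embeddings, threshold, and contexts below are toy values for illustration, not data from the Thesis.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def verification_accuracy(pairs, labels, threshold=0.5):
    """Fraction of pairs correctly classified as same/different identity."""
    correct = 0
    for (u, v), same in zip(pairs, labels):
        pred = cosine(u, v) >= threshold
        correct += (pred == same)
    return correct / len(pairs)

# Toy embeddings: the same fixed descriptor evaluated in two acquisition contexts.
controlled = [([1.0, 0.0], [0.9, 0.1]), ([0.0, 1.0], [1.0, 0.0])]
surveillance = [([1.0, 0.0], [0.3, 0.9]), ([0.0, 1.0], [0.9, 0.2])]
labels = [True, False]  # first pair: same identity; second pair: different

print(verification_accuracy(controlled, labels))    # 1.0 in the training-like context
print(verification_accuracy(surveillance, labels))  # 0.5 once the context shifts
```

The gap between the two accuracies is the quantity of interest: when it is large, a general-purpose representation alone is not enough and adaptation is warranted.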
From this point, we move on to present the novel method that represents the central
contribution of the Thesis: the Dynamic Ensemble of SVM (De-SVM). This
method implements the adaptation capabilities by performing unsupervised incremental
learning, using its own predictions as pseudo-labels for the update decision
(the self-training strategy). Experiments are performed under video surveillance
conditions, a paradigmatic example of a very specific context in which labelling
processes are particularly complicated. The core ideas of De-SVM are tested on
different face recognition sub-problems: face verification and the more complex
closed- and open-set face recognition.
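The self-training strategy itself can be illustrated with a minimal sketch: the model accepts its own confident predictions as pseudo-labels and updates, while rejecting low-confidence samples as potential impostors. Note this is an illustration of self-training only, not of De-SVM: the SVM ensemble is replaced here by a simple nearest-template classifier, and all names and thresholds are hypothetical.

```python
import math

def dist(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

class SelfTrainingRecognizer:
    """Gallery of per-identity templates, updated with confident self-predictions."""

    def __init__(self, gallery):
        # gallery: {identity: initial template vector}
        self.templates = {k: list(v) for k, v in gallery.items()}
        self.counts = {k: 1 for k in gallery}

    def predict(self, x):
        """Return (identity, confidence); confidence decays with distance."""
        ident = min(self.templates, key=lambda k: dist(x, self.templates[k]))
        conf = 1.0 / (1.0 + dist(x, self.templates[ident]))
        return ident, conf

    def update(self, x, threshold=0.5):
        """Self-training step: use the model's own prediction as a pseudo-label."""
        ident, conf = self.predict(x)
        if conf < threshold:
            return None  # reject: ambiguous sample or possible impostor
        # Incrementally fold the accepted sample into the identity's template.
        n = self.counts[ident]
        self.templates[ident] = [
            (t * n + xi) / (n + 1) for t, xi in zip(self.templates[ident], x)
        ]
        self.counts[ident] = n + 1
        return ident

rec = SelfTrainingRecognizer({"alice": [0.0, 0.0], "bob": [4.0, 4.0]})
print(rec.update([0.4, 0.2]))    # confident -> "alice", template adapts
print(rec.update([10.0, 10.0]))  # far from everyone -> None (rejected)
```

The key trade-off, which De-SVM must also manage, is visible even in this toy: a threshold that is too low corrupts the templates with impostor samples, while one that is too high prevents any knowledge acquisition.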
The experiments have shown promising behaviour in terms of both unsupervised
knowledge acquisition and robustness against impostors, surpassing the performance
achieved by state-of-the-art non-adaptive methods.

Funding and Technical Resources

For the successful development of this Thesis, it was necessary to rely on a series of indispensable means, included in the following list:
• Working material, human and financial support provided primarily by the CITIC and
the Computer Architecture Group of the University of A Coruña and the CiTIUS
of the University of Santiago de Compostela, along with a PhD grant funded by
the Xunta de Galicia and the European Social Fund.
• Access to bibliographical material through the library of the University of A
Coruña.
• Additional funding through the following research projects:
  State funding by the Ministry of Economy and Competitiveness of Spain
  (project TIN2017-90135-R MINECO, FEDER)
Evaluation and Understandability of Face Image Quality Assessment
Face image quality assessment (FIQA) has been an area of interest to researchers as a way to improve face recognition accuracy. By filtering out low-quality images, we can reduce various difficulties faced in unconstrained face recognition, such as failure in face or facial landmark detection, or a low presence of useful facial information. In the last decade or so, researchers have proposed different methods to assess face image quality, spanning from the fusion of quality measures to learning-based methods. Different approaches have their own strengths and weaknesses, but it is hard to perform a comparative assessment of these methods without a database containing a wide variety of face quality and a suitable training protocol that can efficiently utilize such a large-scale dataset. In this thesis, we focus on developing an evaluation platform using a large-scale face database containing wide-ranging face image quality, and we try to deconstruct the reasons behind the predicted scores of learning-based face image quality assessment methods.

The contributions of this thesis are two-fold. Firstly, (i) a carefully crafted large-scale database dedicated entirely to face image quality assessment is proposed; (ii) a learning-to-rank based large-scale training protocol is developed; and (iii) a comprehensive study of 15 face image quality assessment methods, using 12 different feature types and relative-ranking based label generation schemes, is performed. Evaluation results show various insights about the assessment methods, which indicate the significance of the proposed database and the training protocol. Secondly, we have seen that in the last few years researchers have tried various learning-based approaches to assess face image quality. Most of these methods offer either a quality bin or a score summary as a measure of the biometric quality of the face image. However, to the best of our knowledge, so far there has been no investigation into the explainable reasons behind the predicted scores. In this thesis, we propose a method to provide a clear and concise understanding of the predicted quality score of a learning-based face image quality assessment method. It is believed that this approach can be integrated into the FBI's understandable template and can help improve the image acquisition process by providing information on which quality factors need to be addressed.
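The learning-to-rank idea behind such a training protocol can be illustrated with a minimal pairwise sketch: the model only needs to score the better-quality image of each pair above the worse one, rather than predict an absolute quality value. This is a generic pairwise logistic-loss example with toy features, not the actual protocol developed in the thesis.

```python
import math

def score(w, x):
    """Linear quality score from a feature vector (stand-in for a learned model)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train_pairwise(pairs, dim, lr=0.1, epochs=200):
    """Learning-to-rank: each pair (better, worse) should get s(better) > s(worse)."""
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            margin = score(w, better) - score(w, worse)
            # Gradient of the logistic pairwise loss log(1 + exp(-margin)).
            g = -1.0 / (1.0 + math.exp(margin))
            for i in range(dim):
                w[i] -= lr * g * (better[i] - worse[i])
    return w

# Toy features: the first component correlates with quality by construction.
pairs = [([0.9, 0.2], [0.1, 0.3]), ([0.8, 0.5], [0.2, 0.4])]
w = train_pairwise(pairs, dim=2)
print(score(w, [0.9, 0.2]) > score(w, [0.1, 0.3]))  # True: ranking learned
```

Relative labels of this kind are far cheaper to obtain than absolute quality scores, which is what makes ranking-based label generation attractive at large scale.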
A Survey of Face Recognition
Recent years have witnessed the breakthrough of face recognition (FR) with deep
convolutional neural networks. Dozens of papers in the field of FR are
published every year. Some of them have been applied in industry and
play an important role in everyday life, such as device unlocking, mobile payment,
and so on. This paper provides an introduction to face recognition, including
its history, pipeline, algorithms based on conventional manually designed
features or deep learning, mainstream training and evaluation datasets, and
related applications. We have analyzed and compared as many state-of-the-art works
as possible, and also carefully designed a set of experiments to find the
effect of backbone size and data distribution. This survey is a companion material
for the tutorial The Practical Face Recognition Technology in the Industrial
World at FG2023.
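The matching stage at the end of the face recognition pipeline discussed in the survey can be sketched as follows, assuming embeddings have already been produced by some backbone. Probes are compared against an enrolled gallery with cosine similarity, with a rejection threshold for unknown faces; the gallery contents and the threshold are illustrative values.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def identify(probe, gallery, threshold=0.4):
    """Match a probe embedding against enrolled identities.

    Returns the best-matching identity, or None (open-set rejection) when no
    enrolled identity is similar enough.
    """
    p = l2_normalize(probe)
    best_id, best_sim = None, -1.0
    for ident, emb in gallery.items():
        g = l2_normalize(emb)
        sim = sum(a * b for a, b in zip(p, g))  # cosine similarity
        if sim > best_sim:
            best_id, best_sim = ident, sim
    return best_id if best_sim >= threshold else None

gallery = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
print(identify([0.9, 0.1, 0.0], gallery))  # alice
print(identify([0.0, 0.0, 1.0], gallery))  # None (unknown face)
```

In deployed systems the backbone choice and training data distribution, the subjects of the survey's experiments, determine how well separated these similarity scores are.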
Long Range Automated Persistent Surveillance
This dissertation addresses long range automated persistent surveillance with focus on three topics: sensor planning, size preserving tracking, and high magnification imaging.
For successful camera handoff, sufficient overlap between adjacent cameras' fields of view should be reserved so that handoff can be executed before the object of interest becomes unidentifiable or untraceable. We design a sensor planning algorithm that not only maximizes coverage but also ensures uniform and sufficient overlap between cameras' fields of view for an optimal handoff success rate. This algorithm works for environments with multiple dynamic targets using different types of cameras. Significantly improved handoff success rates are illustrated via experiments using floor plans of various scales.
Size-preserving tracking automatically adjusts the camera's zoom for a consistent view of the object of interest. Target scale estimation is carried out based on the paraperspective projection model, which compensates for the center offset and accounts for system latency and tracking errors. A computationally efficient foreground segmentation strategy, 3D affine shapes, is proposed. 3D affine shapes allow a direct, real-time implementation and improved flexibility in accommodating the target's 3D motion, including off-plane rotations. The effectiveness of the scale estimation and foreground segmentation algorithms is validated via both offline and real-time tracking of pedestrians at various resolution levels.
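A damped proportional zoom correction is one simple way to realise the size-preserving idea; the sketch below is a hypothetical illustration of that control loop, not the dissertation's paraperspective estimator. The gain below 1 reflects the need to tolerate the system latency and scale-estimation noise mentioned above.

```python
def adjust_zoom(zoom, measured_size, desired_size, gain=0.5, zoom_limits=(1.0, 30.0)):
    """Proportional zoom correction keeping a target at constant apparent size.

    measured_size / desired_size are apparent target sizes in pixels; gain < 1
    damps the correction so latency and noisy scale estimates do not cause
    oscillation; the result is clamped to the lens's zoom range.
    """
    error_ratio = desired_size / measured_size
    new_zoom = zoom * (1.0 + gain * (error_ratio - 1.0))
    lo, hi = zoom_limits
    return max(lo, min(hi, new_zoom))

# Target appears half the desired size: zoom in, damped by the gain.
print(adjust_zoom(zoom=4.0, measured_size=50, desired_size=100))  # 6.0
```

Run once per frame with fresh scale estimates, such a loop converges toward the desired apparent size instead of overshooting on each noisy measurement.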
Face image quality assessment and enhancement compensate for the degradation in face recognition rates caused by high system magnifications and long observation distances. A class of adaptive sharpness measures is proposed to evaluate and predict this degradation. A wavelet-based enhancement algorithm with automated frame selection is developed and proves effective, considerably elevating the face recognition rate for severely blurred long-range face images.
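A crude example conveys the kind of quantity a sharpness measure computes: mean squared intensity gradient, which drops as blur spreads edges out. This is a deliberately simple, non-adaptive stand-in for illustration; the adaptive measures proposed in the dissertation are more sophisticated than this.

```python
def sharpness(img):
    """Mean squared first-difference of pixel intensities: higher means sharper."""
    h, w = len(img), len(img[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:  # horizontal gradient
                total += (img[y][x + 1] - img[y][x]) ** 2
                count += 1
            if y + 1 < h:  # vertical gradient
                total += (img[y + 1][x] - img[y][x]) ** 2
                count += 1
    return total / count

# A crisp edge versus the same edge blurred into a ramp:
sharp_edge = [[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]]
blurred_edge = [[0.0, 1 / 3, 2 / 3, 1.0], [0.0, 1 / 3, 2 / 3, 1.0]]
print(sharpness(sharp_edge) > sharpness(blurred_edge))  # True
```

A measure like this can both rank frames for automated frame selection and predict when recognition performance is about to degrade.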
Enhancing person annotation for personal photo management using content and context based technologies
Rapid technological growth and the decreasing cost of photo capture mean that we are all taking more digital photographs than ever before. However, the lack of technology for automatically organising personal photo archives has left many users with poorly annotated photos, causing them great frustration when such photo collections are browsed or searched at a later time. As a result, there has recently been significant research interest in technologies for supporting effective annotation.
This thesis addresses an important sub-problem of the broad annotation problem, namely "person annotation" associated with personal digital photo management. Solutions to this problem are provided using content analysis tools in combination with context data within the experimental photo management framework, called “MediAssist”. Readily available image metadata, such as location and date/time, are captured from digital cameras with in-built GPS functionality, and thus provide knowledge about when and where the photos were taken. Such information is then used to identify the "real-world" events corresponding to certain activities in the photo capture process. The
problem of enabling effective person annotation is formulated in such a way that both "within-event" and "cross-event" relationships of persons' appearances are captured.
The research reported in the thesis is built upon a firm foundation of content-based analysis technologies, namely face detection, face recognition, and body-patch matching together with data fusion.
Two annotation models are investigated in this thesis, namely progressive and non-progressive. The effectiveness of each model is evaluated against varying proportions of initial annotation, and against the type of initial annotation, based on individual and combined face, body-patch and person-context information sources. The results reported in the thesis strongly validate the use of multiple information sources for person annotation, whilst emphasising the advantage of event-based photo analysis in real-life photo management systems.
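The combination of face, body-patch and person-context sources can be sketched as score-level fusion: a weighted sum whose weights are renormalised over whichever sources are actually available for a given photo. The weights and scores here are illustrative, not values learned in the thesis.

```python
def fuse_scores(scores, weights=None):
    """Weighted-sum fusion of per-source match scores in [0, 1].

    scores: {source_name: score}; missing sources (e.g. a failed face
    detection) are simply left out, and the weights of the available
    sources are renormalised so fused scores stay comparable.
    """
    if weights is None:
        weights = {"face": 0.5, "body_patch": 0.3, "context": 0.2}
    total_w = sum(weights[s] for s in scores)
    return sum(weights[s] * v for s, v in scores.items()) / total_w

# All three sources available:
print(round(fuse_scores({"face": 0.8, "body_patch": 0.6, "context": 0.9}), 2))  # 0.76
# Face detection failed; fuse the remaining evidence:
print(round(fuse_scores({"body_patch": 0.6, "context": 0.9}), 2))  # 0.72
```

Renormalising over the available sources is what lets body-patch and context evidence carry an annotation suggestion when the face itself is undetectable.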
Seamless Multimodal Biometrics for Continuous Personalised Wellbeing Monitoring
Artificially intelligent perception is increasingly present in the lives of
every one of us. Vehicles are no exception, (...) In the near future, pattern
recognition will have an even stronger role in vehicles, as self-driving cars
will require automated ways to understand what is happening around (and within)
them and act accordingly. (...) This doctoral work focused on advancing
in-vehicle sensing through the research of novel computer vision and pattern
recognition methodologies for both biometrics and wellbeing monitoring. The
main focus has been on electrocardiogram (ECG) biometrics, a trait well-known
for its potential for seamless driver monitoring. Major efforts were devoted to
achieving improved performance in identification and identity verification in
off-the-person scenarios, well-known for increased noise and variability. Here,
end-to-end deep learning ECG biometric solutions were proposed and important
topics were addressed such as cross-database and long-term performance,
waveform relevance through explainability, and interlead conversion. Face
biometrics, a natural complement to the ECG in seamless unconstrained
scenarios, was also studied in this work. The open challenges of masked face
recognition and interpretability in biometrics were tackled in an effort to
evolve towards algorithms that are more transparent, trustworthy, and robust to
significant occlusions. Within the topic of wellbeing monitoring, improved
solutions to multimodal emotion recognition in groups of people and
activity/violence recognition in in-vehicle scenarios were proposed. Lastly,
we also proposed a novel way to learn template security within end-to-end
models, dismissing additional separate encryption processes, and a
self-supervised learning approach tailored to sequential data, in order to
ensure data security and optimal performance. (...)

Comment: Doctoral thesis presented and approved on the 21st of December 2022 at the University of Porto.
Attention Restraint, Working Memory Capacity, and Mind Wandering: Do Emotional Valence or Intentionality Matter?
Attention restraint appears to mediate the relationship between working memory capacity (WMC) and mind wandering (Kane et al., 2016). Prior work has identified two dimensions of mind wandering: emotional valence and intentionality. However, less is known about how WMC and attention restraint correlate with these dimensions. The current study examined the relationship between WMC, attention restraint, and mind wandering by emotional valence and intentionality. A confirmatory factor analysis demonstrated that WMC and attention restraint were strongly correlated, but only attention restraint was related to overall mind wandering, consistent with prior findings. However, when examining the emotional valence of mind wandering, attention restraint and WMC were related to negatively and positively valenced, but not neutral, mind wandering. Attention restraint was also related to intentional but not unintentional mind wandering. These results suggest that WMC and attention restraint predict some, but not all, types of mind wandering.