Deep learning architectures for Computer Vision
Deep learning has become part of many state-of-the-art systems in multiple disciplines (especially in computer vision and speech processing). In this thesis, Convolutional Neural Networks are used to solve the problem of recognizing people in images, both for verification and identification. Two different architectures, AlexNet and VGG19, both winners of the ILSVRC, have been fine-tuned and tested with four datasets: Labeled Faces in the Wild, FaceScrub, YouTubeFaces and Google UPC, a dataset generated at the UPC. Finally, with the features extracted from these fine-tuned networks, several verification algorithms have been tested, including Support Vector Machines, Joint Bayesian and the Advanced Joint Bayesian formulation. The results of this work show that an Area Under the Receiver Operating Characteristic curve of 99.6% can be obtained, close to state-of-the-art performance.
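The verification metric reported in this abstract, the area under the ROC curve, can be illustrated with a minimal sketch. It assumes each image pair has already been given a similarity score (higher meaning "more likely the same person") and a binary same/different label. The function name `roc_auc` and the toy scores are hypothetical and are not taken from the thesis, which does not describe how its scores were computed.

```python
import numpy as np

def roc_auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    scores: similarity score per pair (assumed: higher = more likely same person).
    labels: 1 for a same-person pair, 0 for a different-person pair.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative pair")

    # Assign 1-based average ranks so that tied scores count as half a win.
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(scores.size, dtype=float)
    i = 0
    while i < scores.size:
        j = i
        while j + 1 < scores.size and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1

    # U counts how often a positive pair outranks a negative pair.
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)

# Toy example: both same-person pairs score above both different-person pairs.
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))
```

An AUC of 1.0 means every same-person pair scores above every different-person pair, while 0.5 corresponds to chance-level ranking.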
Learning to See the Wood for the Trees: Deep Laser Localization in Urban and Natural Environments on a CPU
Localization in challenging, natural environments such as forests or
woodlands is an important capability for many applications from guiding a robot
navigating along a forest trail to monitoring vegetation growth with handheld
sensors. In this work we explore laser-based localization in both urban and
natural environments, which is suitable for online applications. We propose a
deep learning approach capable of learning meaningful descriptors directly from
3D point clouds by comparing triplets (anchor, positive and negative examples).
The approach learns a feature space representation for a set of segmented point
clouds that are matched between the current and previous observations. Our
learning method is tailored towards loop closure detection resulting in a small
model which can be deployed using only a CPU. The proposed learning method
would allow the full pipeline to run on robots with limited computational
payload such as drones, quadrupeds or UGVs.
Comment: Accepted for publication at RA-L/ICRA 2019. More info:
https://ori.ox.ac.uk/esm-localizatio
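The triplet comparison mentioned in this abstract (anchor, positive and negative examples) is commonly trained with a margin-based triplet loss. The sketch below shows that generic loss on descriptors that have already been computed. The squared Euclidean distance, the `margin` value and the function name `triplet_loss` are assumptions made for illustration; the abstract does not state the exact loss or network used by the authors.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic margin-based triplet loss over a batch of descriptors.

    Each argument has shape (batch, dim). The loss is zero for a triplet once
    the anchor is closer to the positive than to the negative by at least
    `margin` (an assumed hyperparameter), measured in squared Euclidean distance.
    """
    anchor = np.asarray(anchor, dtype=float)
    positive = np.asarray(positive, dtype=float)
    negative = np.asarray(negative, dtype=float)

    d_pos = np.sum((anchor - positive) ** 2, axis=-1)  # anchor-to-positive distance
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)  # anchor-to-negative distance

    # Hinge: only triplets that violate the margin contribute to the loss.
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))

# Toy descriptors: the first triplet is already well separated, the second is not.
a = [[0.0, 0.0], [0.0, 0.0]]
p = [[0.0, 0.0], [1.0, 0.0]]
n = [[2.0, 0.0], [0.0, 0.0]]
print(triplet_loss(a, p, n))
```

Minimizing a loss of this kind pulls descriptors of matching segments together and pushes non-matching ones apart, which is the property a loop closure detector needs when comparing current and previous observations.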
LO-Net: Deep Real-time Lidar Odometry
We present a novel deep convolutional network pipeline, LO-Net, for real-time
lidar odometry estimation. Unlike most existing lidar odometry (LO) methods,
which go through individually designed feature selection, feature matching, and
pose estimation stages, LO-Net can be trained in an end-to-end manner. With a
new mask-weighted geometric constraint loss, LO-Net can effectively learn
feature representation for LO estimation, and can implicitly exploit the
sequential dependencies and dynamics in the data. We also design a scan-to-map
module, which uses the geometric and semantic information learned in LO-Net, to
improve the estimation accuracy. Experiments on benchmark datasets demonstrate
that LO-Net outperforms existing learning-based approaches and achieves accuracy
similar to that of the state-of-the-art geometry-based approach, LOAM.
- …