84 research outputs found
Face Hallucination via Deep Neural Networks.
We firstly address aligned low-resolution (LR) face images (i.e. 16X16 pixels) by designing a discriminative generative network, named URDGN. URDGN is composed of two networks: a generative model and a discriminative model.
We introduce a pixel-wise L2 regularization term to the generative model and exploit the feedback of the discriminative network to make the upsampled face images more similar to real ones.
We present an end-to-end transformative discriminative neural network (TDN) devised for super-resolving unaligned tiny face images. TDN embeds spatial transformation layers to enforce local receptive fields to line-up with similar spatial supports. To upsample noisy unaligned LR face images, we propose decoder-encoder-decoder networks. A transformative discriminative decoder network is employed to upsample and denoise LR inputs simultaneously. Then we project the intermediate HR faces to aligned and noise-free LR faces by a transformative encoder network. Finally, high-quality hallucinated HR images are generated by our second decoder. Furthermore, we present an end-to-end multiscale transformative discriminative neural network (MTDN) to super-resolve unaligned LR face images of different resolutions in a unified framework.
We propose a method that explicitly incorporates structural information of faces into the face super-resolution process by using a multi-task convolutional neural network (CNN). Our method not only uses low-level information (i.e. intensity similarity), but also middle-level information (i.e. face structure) to further explore spatial constraints of facial components from LR inputs images.
We demonstrate that supplementing residual images or feature maps with additional facial attribute information can significantly reduce the ambiguity in face super-resolution. To explore this idea, we develop an attribute-embedded upsampling network. In this manner, our method is able to super-resolve LR faces by a large upscaling factor while reducing the uncertainty of one-to-many mappings remarkably.
We further push the boundaries of hallucinating a tiny, non-frontal face image to understand how much of this is possible by leveraging the availability of large datasets and deep networks. To this end, we introduce a novel Transformative Adversarial Neural Network (TANN) to jointly frontalize very LR out-of-plane rotated face images (including profile views) and aggressively super-resolve them by 8X, regardless of their original poses and without using any 3D information. Besides recovering an HR face images from an LR version, this thesis also addresses the task of restoring realistic faces from stylized portrait images, which can also be regarded as face hallucination
Semantic Foreground Inpainting from Weak Supervision
Semantic scene understanding is an essential task for self-driving vehicles
and mobile robots. In our work, we aim to estimate a semantic segmentation map,
in which the foreground objects are removed and semantically inpainted with
background classes, from a single RGB image. This semantic foreground
inpainting task is performed by a single-stage convolutional neural network
(CNN) that contains our novel max-pooling as inpainting (MPI) module, which is
trained with weak supervision, i.e., it does not require manual background
annotations for the foreground regions to be inpainted. Our approach is
inherently more efficient than the previous two-stage state-of-the-art method,
and outperforms it by a margin of 3% IoU for the inpainted foreground regions
on Cityscapes. The performance margin increases to 6% IoU, when tested on the
unseen KITTI dataset. The code and the manually annotated datasets for testing
are shared with the research community at
https://github.com/Chenyang-Lu/semantic-foreground-inpainting.Comment: RA-L and ICRA'2
Lidar-based scene understanding for autonomous driving using deep learning
With over 1.35 million fatalities related to traffic accidents worldwide, autonomous driving was foreseen at the beginning of this century as a feasible solution to improve security in our roads. Nevertheless, it is meant to disrupt our transportation paradigm, allowing to reduce congestion, pollution, and costs, while increasing the accessibility, efficiency, and reliability of the transportation for both people and goods. Although some advances have gradually been transferred into commercial vehicles in the way of Advanced Driving Assistance Systems (ADAS) such as adaptive cruise control, blind spot detection or automatic parking, however, the technology is far from mature. A full understanding of the scene is actually needed so that allowing the vehicles to be aware of the surroundings, knowing the existing elements of the scene, as well as their motion, intentions and interactions.
In this PhD dissertation, we explore new approaches for understanding driving scenes from 3D LiDAR point clouds by using Deep Learning methods. To this end, in Part I we analyze the scene from a static perspective using independent frames to detect the neighboring vehicles. Next, in Part II we develop new ways for understanding the dynamics of the scene. Finally, in Part III we apply all the developed methods to accomplish higher level challenges such as segmenting moving obstacles while obtaining their rigid motion vector over the ground.
More specifically, in Chapter 2 we develop a 3D vehicle detection pipeline based on a multi-branch deep-learning architecture and propose a Front (FR-V) and a Bird’s Eye view (BE-V) as 2D representations of the 3D point cloud to serve as input for training our models. Later on, in Chapter 3 we apply and further test this method on two real uses-cases, for pre-filtering moving
obstacles while creating maps to better localize ourselves on subsequent days, as well as for vehicle tracking. From the dynamic perspective, in Chapter 4 we learn from the 3D point cloud a novel dynamic feature that resembles optical flow from RGB images. For that, we develop a new approach to leverage RGB optical flow as pseudo ground truth for training purposes but allowing the use of only 3D LiDAR data at inference time. Additionally, in Chapter 5 we explore the benefits of combining classification and regression learning problems to face the optical flow estimation task in a joint coarse-and-fine manner. Lastly, in Chapter 6 we gather the previous methods and demonstrate that with these independent tasks we can guide the learning of higher challenging problems such as segmentation and motion estimation of moving vehicles from our own moving perspective.Con más de 1,35 millones de muertes por accidentes de tráfico en el mundo, a principios de siglo se predijo que la conducción autónoma serÃa una solución viable para mejorar la seguridad en nuestras carreteras. Además la conducción autónoma está destinada a cambiar nuestros paradigmas de transporte, permitiendo reducir la congestión del tráfico, la contaminación y el coste, a la vez que aumentando la accesibilidad, la eficiencia y confiabilidad del transporte tanto de personas como de mercancÃas. Aunque algunos avances, como el control de crucero adaptativo, la detección de puntos ciegos o el estacionamiento automático, se han transferido gradualmente a vehÃculos comerciales en la forma de los Sistemas Avanzados de Asistencia a la Conducción (ADAS), la tecnologÃa aún no ha alcanzado el suficiente grado de madurez. Se necesita una comprensión completa de la escena para que los vehÃculos puedan entender el entorno, detectando los elementos presentes, asà como su movimiento, intenciones e interacciones. En la presente tesis doctoral, exploramos nuevos enfoques para comprender escenarios de conducción utilizando nubes de puntos en 3D capturadas con sensores LiDAR, para lo cual empleamos métodos de aprendizaje profundo. Con este fin, en la Parte I analizamos la escena desde una perspectiva estática para detectar vehÃculos. A continuación, en la Parte II, desarrollamos nuevas formas de entender las dinámicas del entorno. Finalmente, en la Parte III aplicamos los métodos previamente desarrollados para lograr desafÃos de nivel superior, como segmentar obstáculos dinámicos a la vez que estimamos su vector de movimiento sobre el suelo. EspecÃficamente, en el CapÃtulo 2 detectamos vehÃculos en 3D creando una arquitectura de aprendizaje profundo de dos ramas y proponemos una vista frontal (FR-V) y una vista de pájaro (BE-V) como representaciones 2D de la nube de puntos 3D que sirven como entrada para entrenar nuestros modelos. Más adelante, en el CapÃtulo 3 aplicamos y probamos aún más este método en dos casos de uso reales, tanto para filtrar obstáculos en movimiento previamente a la creación de mapas sobre los que poder localizarnos mejor en los dÃas posteriores, como para el seguimiento de vehÃculos. Desde la perspectiva dinámica, en el CapÃtulo 4 aprendemos de la nube de puntos en 3D una caracterÃstica dinámica novedosa que se asemeja al flujo óptico sobre imágenes RGB. Para ello, desarrollamos un nuevo enfoque que aprovecha el flujo óptico RGB como pseudo muestras reales para entrenamiento, usando solo information 3D durante la inferencia. Además, en el CapÃtulo 5 exploramos los beneficios de combinar los aprendizajes de problemas de clasificación y regresión para la tarea de estimación de flujo óptico de manera conjunta. Por último, en el CapÃtulo 6 reunimos los métodos anteriores y demostramos que con estas tareas independientes podemos guiar el aprendizaje de problemas de más alto nivel, como la segmentación y estimación del movimiento de vehÃculos desde nuestra propia perspectivaAmb més d’1,35 milions de morts per accidents de trà nsit al món, a principis de segle es va
predir que la conducció autònoma es convertiria en una solució viable per millorar la seguretat
a les nostres carreteres. D’altra banda, la conducció autònoma està destinada a canviar els
paradigmes del transport, fent possible aixà reduir la densitat del trà nsit, la contaminació i
el cost, alhora que augmentant l’accessibilitat, l’eficiència i la confiança del transport tant de
persones com de mercaderies. Encara que alguns avenços, com el control de creuer adaptatiu,
la detecció de punts cecs o l’estacionament automà tic, s’han transferit gradualment a vehicles
comercials en forma de Sistemes Avançats d’Assistència a la Conducció (ADAS), la tecnologia
encara no ha arribat a aconseguir el grau suficient de maduresa. És necessà ria, doncs, una
total comprensió de l’escena de manera que els vehicles puguin entendre l’entorn, detectant els
elements presents, aixà com el seu moviment, intencions i interaccions.
A la present tesi doctoral, explorem nous enfocaments per tal de comprendre les diferents
escenes de conducció utilitzant núvols de punts en 3D capturats amb sensors LiDAR, mitjançant
l’ús de mètodes d’aprenentatge profund. Amb aquest objectiu, a la Part I analitzem l’escena des
d’una perspectiva està tica per a detectar vehicles. A continuació, a la Part II, desenvolupem
noves formes d’entendre les dinà miques de l’entorn. Finalment, a la Part III apliquem els
mètodes prèviament desenvolupats per a aconseguir desafiaments d’un nivell superior, com, per
exemple, segmentar obstacles dinà mics al mateix temps que estimem el seu vector de moviment
respecte al terra.
Concretament, al CapÃtol 2 detectem vehicles en 3D creant una arquitectura d’aprenentatge
profund amb dues branques, i proposem una vista frontal (FR-V) i una vista d’ocell (BE-V)
com a representacions 2D del núvol de punts 3D que serveixen com a punt de partida per
entrenar els nostres models. Més endavant, al CapÃtol 3 apliquem i provem de nou aquest
mètode en dos casos d’ús reals, tant per filtrar obstacles en moviment prèviament a la creació
de mapes en els quals poder localitzar-nos millor en dies posteriors, com per dur a terme
el seguiment de vehicles. Des de la perspectiva dinà mica, al CapÃtol 4 aprenem una nova
caracterÃstica dinà mica del núvol de punts en 3D que s’assembla al flux òptic sobre imatges
RGB. Per a fer-ho, desenvolupem un nou enfocament que aprofita el flux òptic RGB com pseudo
mostres reals per a entrenament, utilitzant només informació 3D durant la inferència. Després,
al CapÃtol 5 explorem els beneficis que s’obtenen de combinar els aprenentatges de problemes
de classificació i regressió per la tasca d’estimació de flux òptic de manera conjunta. Finalment,
al CapÃtol 6 posem en comú els mètodes anteriors i demostrem que mitjançant aquests processos
independents podem abordar l’aprenentatge de problemes més complexos, com la segmentació
i estimació del moviment de vehicles des de la nostra pròpia perspectiva
A Unified Framework to Super-Resolve Face Images of Varied Low Resolutions
The existing face image super-resolution (FSR) algorithms usually train a
specific model for a specific low input resolution for optimal results. By
contrast, we explore in this work a unified framework that is trained once and
then used to super-resolve input face images of varied low resolutions. For
that purpose, we propose a novel neural network architecture that is composed
of three anchor auto-encoders, one feature weight regressor and a final image
decoder. The three anchor auto-encoders are meant for optimal FSR for three
pre-defined low input resolutions, or named anchor resolutions, respectively.
An input face image of an arbitrary low resolution is firstly up-scaled to the
target resolution by bi-cubic interpolation and then fed to the three
auto-encoders in parallel. The three encoded anchor features are then fused
with weights determined by the feature weight regressor. At last, the fused
feature is sent to the final image decoder to derive the super-resolution
result. As shown by experiments, the proposed algorithm achieves robust and
state-of-the-art performance over a wide range of low input resolutions by a
single framework. Code and models will be made available after the publication
of this work
ST-GAN: Spatial Transformer Generative Adversarial Networks for Image Compositing
We address the problem of finding realistic geometric corrections to a
foreground object such that it appears natural when composited into a
background image. To achieve this, we propose a novel Generative Adversarial
Network (GAN) architecture that utilizes Spatial Transformer Networks (STNs) as
the generator, which we call Spatial Transformer GANs (ST-GANs). ST-GANs seek
image realism by operating in the geometric warp parameter space. In
particular, we exploit an iterative STN warping scheme and propose a sequential
training strategy that achieves better results compared to naive training of a
single generator. One of the key advantages of ST-GAN is its applicability to
high-resolution images indirectly since the predicted warp parameters are
transferable between reference frames. We demonstrate our approach in two
applications: (1) visualizing how indoor furniture (e.g. from product images)
might be perceived in a room, (2) hallucinating how accessories like glasses
would look when matched with real portraits.Comment: Accepted to CVPR 2018 (website & code:
https://chenhsuanlin.bitbucket.io/spatial-transformer-GAN/
- …