Deep Learning for Vanishing Point Detection Using an Inverse Gnomonic Projection
We present a novel approach for vanishing point detection from uncalibrated
monocular images. In contrast to state-of-the-art methods, we make no a priori
assumptions about the observed scene. Our method is based on a convolutional
neural network (CNN) which does not use natural images, but a Gaussian sphere
representation arising from an inverse gnomonic projection of lines detected in
an image. This allows us to rely on synthetic data for training, eliminating
the need for labelled images. Our method achieves competitive performance on
three horizon estimation benchmark datasets. We further highlight additional
use cases for our vanishing point detection algorithm.

Comment: Accepted for publication at the German Conference on Pattern Recognition
(GCPR) 2017. This research was supported by the German Research Foundation (DFG)
within Priority Research Programme 1894 "Volunteered Geographic Information:
Interpretation, Visualisation and Social Computing".
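To make the Gaussian-sphere representation concrete: under a pinhole model, each detected line segment spans an interpretation plane through the camera center, and the inverse gnomonic projection maps the line to the great circle where that plane cuts the unit sphere. A minimal sketch (the focal length and coordinates below are illustrative assumptions, not values from the paper):

```python
import numpy as np

def line_to_great_circle_normal(p1, p2, f=1.0):
    """Map an image line segment to the unit normal of its great circle
    on the Gaussian sphere (inverse gnomonic projection). p1, p2 are
    (x, y) endpoints centered at the principal point; f is an assumed
    focal length (hypothetical value)."""
    a = np.array([p1[0], p1[1], f])
    b = np.array([p2[0], p2[1], f])
    n = np.cross(a, b)               # normal of the interpretation plane
    return n / np.linalg.norm(n)

# Two segments that converge toward a common vanishing point:
n1 = line_to_great_circle_normal((-1.0, 0.0), (0.0, 0.1))
n2 = line_to_great_circle_normal((-1.0, 1.0), (0.0, 0.9))

# Their great circles intersect at the vanishing direction on the sphere:
v = np.cross(n1, n2)
v = v / np.linalg.norm(v)
```

Segments sharing a vanishing point yield great circles that intersect at a common direction on the sphere, which is the structure the CNN's spherical input representation exposes.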
Pose estimation system based on monocular cameras
Our world is full of wonders. It is filled with mysteries and challenges which,
through the ages, have inspired and called upon human civilization to grow, whether
philosophically or sociologically. In time, humans reached their own physical limitations;
nevertheless, we created technology to help us overcome them. Like the ancient
undiscovered lands, we are drawn to the discovery and innovation of our time. All of
this is possible thanks to a very human characteristic: our imagination.
The world that surrounds us is mostly discovered already, but with the power of
computer vision (CV) and augmented reality (AR) we are able to live in multiple hidden
universes alongside our own. With the increasing performance and capabilities of
current mobile devices, AR is becoming what we dreamed it could be. Many obstacles
remain, but this future is already our reality, and with evolving technologies closing
the gap between the real and the virtual world, it will soon be possible to immerse
ourselves in other dimensions, or to fuse them with our own.
This thesis focuses on the development of a system that estimates the camera's pose
in the real world with respect to the virtual-world axes. The work was developed
as a sub-module integrated into the M5SAR project: Mobile Five Senses Augmented
Reality System for Museums, which aims at a more immersive experience through the
total or partial replacement of the environment's surroundings. It targets mainly
man-made indoor environments and their typical rectangular cuboid shape. Knowing the
direction of the user's camera, we can then superimpose dynamic AR content, inviting the user to explore the hidden worlds.
The M5SAR project introduced a new way to explore existing historical museums
by engaging the five human senses: hearing, smell, taste, touch, and vision. With
this innovative technology, users can enhance their visit and immerse themselves
in a virtual world blended with our reality. A mobile application was built around
an innovative framework, MIRAR (Mobile Image Recognition based Augmented Reality),
which provides object recognition, navigation, and the projection of additional
AR information to enrich the user's visit, supplying intuitive and compelling
information about the available artworks and exploring the senses of hearing and
vision. A specially designed device was built to explore the three remaining senses:
smell, taste, and touch. When attached to a mobile device, either a smartphone or a
tablet, it pairs with it and reacts automatically in sync with the narrative offered
for each artwork, immersing the user in a sensorial experience.
As mentioned above, the work presented in this thesis concerns a sub-module of
MIRAR responsible for environment detection and the superimposition of AR content.
With the main goal being the full replacement of the walls' contents, while optionally
keeping the artwork visible, an additional challenge arose from the limitation of
using only monocular cameras. Without depth information, a 2D image of an environment
does not, by itself, convey to a computer the three-dimensional layout of the
real-world scene. Nevertheless, man-made buildings tend to follow a rectangular
approach to room construction, which makes it possible to predict where the vanishing
points of an environment image lie, allowing the reconstruction of the environment's
layout from a single 2D image. Furthermore, combining this information with an
initial localization obtained through improved image recognition, so as to retrieve
the camera's spatial position with respect to the real-world and virtual-world
coordinates, i.e., its pose estimate, made it possible to superimpose localized AR
content onto the user's mobile device frame and immerse the museum visitor in
another era, correlated with the historical period of the artworks on display. The
work developed for this thesis also contributed an improved method for rectifying
and retrieving planar surfaces in space, a hybrid and scalable multiple-image
matching system, a more stable outlier filter applied to the camera axes, and a
continuous tracking system that works with uncalibrated cameras and maintains the
surface superimposition even at particularly obtuse viewing angles.
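The rectangular-room assumption boils down to intersecting the images of edges that are parallel in 3D. A minimal sketch in homogeneous coordinates (the pixel coordinates below are hypothetical, not taken from the thesis):

```python
import numpy as np

def vanishing_point(seg_a, seg_b):
    """Intersect two image segments assumed parallel in 3D (e.g. two
    wall-floor edges of a rectangular room). Segments are given as
    ((x1, y1), (x2, y2)); returns the vanishing point (x, y) in pixels."""
    def homog_line(seg):
        (x1, y1), (x2, y2) = seg
        return np.cross([x1, y1, 1.0], [x2, y2, 1.0])
    v = np.cross(homog_line(seg_a), homog_line(seg_b))
    return v[:2] / v[2]   # dehomogenize (assumes the image lines intersect)

# Two converging floor edges in a hypothetical 640x480 frame:
vp = vanishing_point(((0, 480), (200, 300)), ((640, 480), (440, 300)))
# vp is approximately (320.0, 192.0)
```

The cross product of two homogeneous points gives the line through them, and the cross product of two such lines gives their intersection, so no explicit slope-intercept handling is needed.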
Furthermore, a novel method using deep learning models for semantic segmentation
was introduced for indoor layout estimation from monocular images. Contrary to the
previously developed methods, no geometric calculations are needed to achieve
near state-of-the-art performance with a fraction of the parameters required by
similar methods. Unlike the earlier work presented in this thesis, this method
performs well even in unseen and cluttered rooms, provided they follow the
Manhattan-world assumption. An additional lightweight application to retrieve the camera pose
estimation is presented using the proposed method.
A Simple and Effective Method to Detect Orthogonal Vanishing Points in Uncalibrated Images of Man-Made Environments
This paper presents an effective and easy-to-implement algorithm to compute orthogonal vanishing points in uncalibrated images of man-made scenes. The main contribution is to estimate the zenith and the horizon line before detecting the vanishing points, using simple properties of the central projection and exploiting accumulations of oriented segments around the horizon. Our method is fast and yields accuracy comparable to, and in some cases better than, that of state-of-the-art algorithms.
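One simple way to estimate a zenith of the kind described above is to intersect the near-vertical segments in least squares. This sketch uses an SVD null-space solution rather than the paper's accumulation scheme, and the segment coordinates are hypothetical:

```python
import numpy as np

def estimate_zenith(vertical_segments):
    """Least-squares intersection of near-vertical segments: the zenith is
    the image point minimizing algebraic distance to all segment lines,
    found as the null vector of the stacked line equations via SVD."""
    rows = []
    for (x1, y1), (x2, y2) in vertical_segments:
        l = np.cross([x1, y1, 1.0], [x2, y2, 1.0])
        rows.append(l / np.linalg.norm(l[:2]))   # normalize the (a, b) part
    _, _, Vt = np.linalg.svd(np.array(rows))
    z = Vt[-1]
    return z[:2] / z[2]

# Three near-vertical building edges in a hypothetical 640x480 frame:
segs = [((100, 0), (110, 400)), ((320, 0), (320, 400)), ((540, 0), (530, 400))]
zenith = estimate_zenith(segs)
# zenith is approximately (320.0, 8800.0): far outside the frame, as expected
```

Normalizing each line by the norm of its (a, b) coefficients makes the algebraic residual equal to the point-to-line distance, so the SVD solution is geometrically meaningful.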
A-Contrario Horizon-First Vanishing Point Detection Using Second-Order Grouping Laws
We show that, in images of man-made environments, the horizon line can usually be hypothesized based on a-contrario detections of second-order grouping events. This allows constraining the extraction of the horizontal vanishing points to that line, thus reducing false detections. Experiments on three datasets show that our method not only achieves state-of-the-art performance w.r.t. horizon line detection on two datasets, but also yields far fewer spurious vanishing points than the previous top-ranked methods.
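Constraining horizontal vanishing points to a hypothesized horizon can be illustrated by orthogonally projecting a candidate onto the line; this is only a geometric sketch, not the paper's a-contrario grouping machinery, and the numbers are hypothetical:

```python
def snap_to_horizon(vp, horizon):
    """Orthogonally project a candidate vanishing point (x, y) onto the
    hypothesized horizon line a*x + b*y + c = 0."""
    a, b, c = horizon
    x, y = vp
    d = (a * x + b * y + c) / (a * a + b * b)
    return (x - a * d, y - b * d)

# Horizon hypothesized at y = 200; a candidate VP slightly off the line:
vp = snap_to_horizon((500.0, 212.0), (0.0, 1.0, -200.0))
# vp == (500.0, 200.0)
```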
Vision-Based Building Seismic Displacement Measurement by Stratification of Projective Rectification Using Lines
We propose a new flexible technique for accurate vision-based seismic displacement measurement of building structures via a single non-stationary camera with any perspective view. No a priori information about the camera's parameters, or only partial knowledge of the internal camera parameters, is required, and geometric constraints in the world coordinate system are employed for projective rectification. Whereas most projective rectifications are conducted by specifying the positions of four or more fixed reference points, our method adopts a stratified approach that partially determines the projective transformation from line-based geometric relationships on the world plane. Since line features are natural and plentiful in man-made architectural environments, robust estimation techniques for automatic projective/affine distortion removal can be applied in a more practical way. Both simulations and real recorded data were used to verify the effectiveness and robustness of the proposed method. We hope that the proposed method can advance consumer-grade camera systems for vision-based structural measurement one more step, from laboratory environments to real-world structural health monitoring systems.
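The first stage of such a stratified rectification can be sketched directly: two pairs of world-parallel lines give two vanishing points, whose join is the image of the line at infinity, and a homography built from that line removes the projective component of the distortion (a textbook-style sketch; the point coordinates below are hypothetical):

```python
import numpy as np

def affine_rectification(pair_1, pair_2):
    """Recover the image of the line at infinity from two pairs of
    world-parallel image lines (each line given by two points), and build
    the homography H that removes projective distortion by mapping that
    line back to infinity."""
    def line(p, q):
        return np.cross([*p, 1.0], [*q, 1.0])
    v1 = np.cross(line(*pair_1[0]), line(*pair_1[1]))  # first vanishing point
    v2 = np.cross(line(*pair_2[0]), line(*pair_2[1]))  # second vanishing point
    linf = np.cross(v1, v2)                            # image of line at infinity
    H = np.eye(3)
    H[2] = linf / linf[2]    # third row = linf sends points on it to infinity
    return H

# Two hypothetical pairs of imaged parallel lines (points in pixels):
pair_1 = (((0.0, 0.0), (4.0, 1.0)), ((0.0, 2.0), (4.0, 2.5)))
pair_2 = (((0.0, 0.0), (1.0, 3.0)), ((2.0, 0.0), (2.8, 3.0)))
H = affine_rectification(pair_1, pair_2)
```

After applying H, world-parallel lines become parallel in the image again; removing the remaining affine distortion would require further constraints, such as known angles or length ratios.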
Vanishing Point Estimation in Uncalibrated Images with Prior Gravity Direction
We tackle the problem of estimating a Manhattan frame, i.e. three orthogonal
vanishing points, and the unknown focal length of the camera, leveraging a
prior vertical direction. The direction can come from an Inertial Measurement
Unit that is a standard component of recent consumer devices, e.g.,
smartphones. We provide an exhaustive analysis of minimal line configurations
and derive two new 2-line solvers, one of which does not suffer from
singularities affecting existing solvers. Additionally, we design a new
non-minimal method, running on an arbitrary number of lines, to boost the
performance in local optimization. Combining all solvers in a hybrid robust
estimator, our method achieves increased accuracy even with a rough prior.
Experiments on synthetic and real-world datasets demonstrate the superior
accuracy of our method compared to the state of the art, while having
comparable runtimes. We further demonstrate the applicability of our solvers
for relative rotation estimation. The code is available at
https://github.com/cvg/VP-Estimation-with-Prior-Gravity

Comment: Accepted at ICCV 202
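The role of the gravity prior can be illustrated with the basic relation between a 3D direction and its vanishing point, v ~ K g: given the IMU's gravity direction in the camera frame and assumed pinhole intrinsics, the vertical vanishing point follows directly. The intrinsics and gravity reading below are hypothetical, and this is not one of the paper's solvers:

```python
import numpy as np

def vertical_vp_from_gravity(g, f, cx, cy):
    """Image of the vertical vanishing point implied by an IMU gravity
    direction g (in camera coordinates) and pinhole intrinsics: v ~ K g."""
    K = np.array([[f, 0.0, cx],
                  [0.0, f, cy],
                  [0.0, 0.0, 1.0]])
    v = K @ np.asarray(g, dtype=float)
    return v[:2] / v[2]

# Hypothetical values: camera tilted so gravity has a small z component,
# focal length 800 px, principal point (320, 240):
vp = vertical_vp_from_gravity((0.0, 0.98, 0.2), 800.0, 320.0, 240.0)
# vp is approximately (320.0, 4160.0)
```

With the vertical vanishing point fixed this way, only the horizontal vanishing points and the focal length remain to be estimated, which is why minimal solvers need as few as two lines.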
Advances in Monocular Exemplar-based Human Body Pose Analysis: Modeling, Detection and Tracking
This thesis contributes to the analysis of human body pose from image sequences acquired with a single camera. This topic has a wide range of potential applications in video surveillance, video games, and biomedical applications. Exemplar-based techniques have been successful; however, their accuracy depends on the similarity of the camera viewpoint and of the scene properties between the training and test images. Given a training dataset captured with a reduced number of fixed cameras parallel to the ground, three possible scenarios with increasing levels of difficulty have been identified and analyzed: 1) a static camera parallel to the ground, 2) a fixed surveillance camera with a considerably different viewing angle, and 3) a video sequence captured with a moving camera, or simply a single static image.
Segmentation of Floors in Corridor Images for Mobile Robot Navigation
This thesis presents a novel method of floor segmentation from a single image for mobile robot navigation. In contrast with previous approaches that rely upon homographies, our approach does not require multiple images (either stereo or optical flow). It also does not require the camera to be calibrated, even for lens distortion. The technique combines three visual cues for evaluating the likelihood of horizontal intensity edge line segments belonging to the wall-floor boundary. The combination of these cues yields a robust system that works even in the presence of severe specular reflections, which are common in indoor environments. The nearly real-time algorithm is tested on a large database of images collected in a wide variety of conditions, on which it achieves nearly 90% segmentation accuracy. Additionally, we apply the floor segmentation method to low-resolution images and propose a minimalistic corridor representation consisting of the orientation line (center) and the wall-floor boundaries (lateral limits). Our study investigates the impact of image resolution upon the accuracy of extracting such a geometry, showing that wall-floor boundaries can be detected even in texture-poor environments with images as small as 16x12. One of the advantages of working at such resolutions is that the algorithm operates at hundreds of frames per second, or, equivalently, requires only a small percentage of the CPU.
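The cue combination described above can be sketched as a weighted sum of per-segment likelihoods; the cue names and weights below are illustrative assumptions, not the values used in the thesis:

```python
def floor_boundary_score(cue_scores, weights=(0.4, 0.3, 0.3)):
    """Combine per-segment cue likelihoods into a single wall-floor
    boundary score via a weighted sum. The weights and the cue ordering
    are illustrative assumptions."""
    return sum(w * s for w, s in zip(weights, cue_scores))

# Hypothetical likelihoods for one horizontal edge segment:
# (structure cue, bottom cue, homogeneity cue)
score = floor_boundary_score([0.9, 0.7, 0.8])
# score is approximately 0.81
```

Segments whose combined score exceeds a threshold would then be kept as wall-floor boundary candidates.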