Online Mutual Foreground Segmentation for Multispectral Stereo Videos
The segmentation of video sequences into foreground and background regions is
a low-level process commonly used in video content analysis and smart
surveillance applications. Using a multispectral camera setup can improve this
process by providing more diverse data to help identify objects despite adverse
imaging conditions. The registration of several data sources is however not
trivial if the appearance of objects produced by each sensor differs
substantially. This problem is further complicated when parallax effects cannot
be ignored, as with close-range stereo pairs. In this work, we present a new
method to simultaneously tackle multispectral segmentation and stereo
registration. Using an iterative procedure, we estimate the labeling result for
one problem using the provisional result of the other. Our approach is based on
the alternating minimization of two energy functions that are linked through
the use of dynamic priors. We rely on the integration of shape and appearance
cues to find proper multispectral correspondences, and to properly segment
objects in low contrast regions. We also formulate our model as a frame
processing pipeline using higher order terms to improve the temporal coherence
of our results. Our method is evaluated under different configurations on
multiple multispectral datasets, and our implementation is available online.
Comment: Preprint accepted for publication in IJCV (December 2018).
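The alternating scheme described above can be sketched in a few lines; the two quadratic energies below are hypothetical stand-ins (the paper's actual energies couple segmentation and registration labelings through dynamic priors), chosen only to make the alternation and its convergence concrete:

```python
def alternating_minimization(x0, y0, steps=50):
    """Toy illustration of alternately minimizing two coupled
    energies, E1(x; y) and E2(y; x), where each problem's provisional
    result acts as a dynamic prior for the other. The energies here
    are hypothetical quadratics, not the paper's actual terms:
      E1(x; y) = (x - 2)^2 + (x - y)^2
      E2(y; x) = (y - 4)^2 + (y - x)^2
    """
    x, y = x0, y0
    for _ in range(steps):
        # Minimize E1 over x with y held fixed (closed form)
        x = (2.0 + y) / 2.0
        # Minimize E2 over y with x held fixed (closed form)
        y = (4.0 + x) / 2.0
    return x, y

# The alternation converges to the joint minimizer (8/3, 10/3)
x, y = alternating_minimization(0.0, 0.0)
```

Each half-step can only lower the total energy, which is why this kind of alternation converges for well-behaved energy pairs.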
On Degeneracy of Linear Reconstruction from Three Views: Linear Line Complex and Applications
This paper investigates the linear degeneracies of projective structure estimation from point and line features across three views. We show that the rank of the linear system of equations for recovering the trilinear tensor of three views reduces to 23 (instead of 26) when the scene is a Linear Line Complex (LLC; a set of lines in space all intersecting a common line), and to 21 when the scene is planar. The LLC situation is only linearly degenerate, and we show that a unique solution can be obtained once the admissibility constraints of the tensor are accounted for. The line configuration described by an LLC, rather than being some obscure case, is in fact quite typical. It includes, as a particular example, the case of a camera moving down a hallway in an office environment or down an urban street. Furthermore, an LLC situation may occur as an artifact, such as in direct estimation from spatio-temporal derivatives of image brightness. An investigation into these degeneracies and their remedy is therefore important in practice as well.
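The rank-deficiency analysis can be illustrated numerically; the matrix below is a random stand-in for the actual 26-equation trilinear design matrix, and the forced rank drop merely mimics the LLC degeneracy rather than reproducing it:

```python
import numpy as np

def numerical_rank(A, tol=1e-10):
    """Numerical rank via SVD: count singular values above a
    threshold relative to the largest one. This is the standard way
    to detect the kind of rank drop described in the paper; A here
    is generic test data, not the real trilinear-tensor system."""
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

# A random 30x27 system is generically full rank (27).
A = np.random.default_rng(0).standard_normal((30, 27))
rank_full = numerical_rank(A)

# Zero out the trailing singular values to mimic the degenerate
# LLC case, where the rank drops to 23:
U, s, Vt = np.linalg.svd(A, full_matrices=False)
s[23:] = 0.0
A_degenerate = U @ np.diag(s) @ Vt
rank_llc = numerical_rank(A_degenerate)
```

In the degenerate case the linear system alone no longer pins down a unique tensor; extra (admissibility) constraints must be brought in, as the paper does.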
Angular variation as a monocular cue for spatial perception
Monocular cues are spatial sensory inputs picked up exclusively from one eye. They are mostly static features that provide depth information and are widely used in graphic art to create realistic representations of a scene. Since the spatial information contained in these cues is picked up from the retinal image, a link between them and the theory of direct perception can reasonably be assumed. According to this theory, the spatial information of an environment is directly contained in the optic array. This assumption therefore makes it possible to model visual perception processes through computational approaches.
In this thesis, angular variation is considered as a monocular cue, and the concept of direct perception is adopted by a computer vision approach that treats it as a suitable principle from which innovative techniques for computing spatial information can be developed.
The spatial information expected from this monocular cue is the position and orientation of an object with respect to the observer, which in computer vision is a well-known field of research called 2D-3D pose estimation. In this thesis, the attempt to establish angular variation as a monocular cue, and thereby achieve a computational approach to direct perception, is carried out by developing a set of pose estimation methods. Starting from conventional strategies for solving the pose estimation problem, a first approach imposes constraint equations to relate object and image features. In this vein, two algorithms based on a simple line rotation motion analysis were developed. These algorithms successfully provide pose information; however, they depend strongly on scene data conditions. To overcome this limitation, a second approach, inspired by the biological processes performed by the human visual system, was developed. It is based on the image content itself and defines a computational approach to direct perception.
The set of developed algorithms analyzes the visual properties provided by angular variations. The aim is to gather valuable data from which spatial information can be obtained and used to emulate a visual perception process by establishing a 2D-3D metric relation. Since this relation is considered fundamental to visual-motor coordination, and consequently essential for interacting with the environment, a significant cognitive effect is produced when the developed computational approach is applied in technology-mediated environments. In this work, this cognitive effect is demonstrated by an experimental study in which participants were asked to complete an action-perception task. The main purpose of the study was to analyze visually guided behavior in teleoperation and the cognitive effect caused by the addition of 3D information. The results showed a significant influence of the
3D aid on skill improvement, which reflected an enhanced sense of presence.
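The basic angular-size relation underlying this monocular cue can be made concrete. This toy sketch recovers only distance from the angle subtended by an object of known frontal size, whereas the thesis methods also recover orientation; the function names are illustrative, not from the thesis:

```python
import math

def subtended_angle(size, distance):
    """Visual angle (radians) subtended by an object of physical
    extent `size` viewed frontally at `distance`."""
    return 2.0 * math.atan(size / (2.0 * distance))

def distance_from_angle(size, angle):
    """Invert the relation: recover distance from the measured
    angular extent, given a known object size. This is the basic
    monocular depth relation; full 2D-3D pose estimation adds the
    recovery of orientation."""
    return size / (2.0 * math.tan(angle / 2.0))

# A 1 m object at 5 m subtends ~11.4 degrees; inverting the
# measured angle recovers the 5 m distance.
theta = subtended_angle(1.0, 5.0)
d = distance_from_angle(1.0, theta)
```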
Recursive Motion Estimation on the Essential Manifold
Visual motion estimation can be regarded as estimation of the state of a system of difference equations with unknown inputs defined on a manifold. Such a system happens to be "linear", but it is defined on a space (the so called "Essential manifold") which is not a linear (vector) space.
In this paper we introduce a novel perspective on the motion estimation problem which results in three original schemes for solving it. The first consists in "flattening the space" and solving a nonlinear estimation problem on the flat (Euclidean) space.
The second approach consists in viewing the system as embedded in a larger Euclidean space (the smallest of the embedding spaces), and solving at each step a linear estimation problem on a linear space, followed by a "projection" onto the manifold (see fig. 5).
A third "algebraic" formulation of motion estimation is inspired by the structure of the problem in local coordinates (flattened space), and consists in a double iteration for solving an "adaptive fixed-point" problem (see fig. 6).
Each one of these three schemes outputs motion estimates together with the joint second order statistics of the estimation error, which can be used by any structure from motion module which incorporates motion error [20, 23] in order to estimate 3D scene structure.
The original contribution of this paper involves both the problem formulation, which gives new insight into the differential geometric structure of visual motion estimation, and the ideas generating the three schemes. These are viewed within a unified framework. All the schemes have a strong theoretical motivation and exhibit accuracy, speed of convergence, real time operation and flexibility which are superior to other existing schemes [1, 20, 23].
Simulations are presented for real and synthetic image sequences to compare the three schemes against each other and highlight the peculiarities of each one.
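The "project back onto the manifold" step of the second scheme can be sketched with the standard Frobenius-nearest projection onto the set of essential matrices (singular values of the form (σ, σ, 0)); the input matrix here is arbitrary test data, not an estimate produced by the paper's filter:

```python
import numpy as np

def project_to_essential(M):
    """Project an arbitrary 3x3 matrix onto the essential manifold
    (matrices whose singular values are (s, s, 0)) in the Frobenius
    sense: replace the two largest singular values by their mean
    and zero the smallest."""
    U, s, Vt = np.linalg.svd(M)
    sigma = (s[0] + s[1]) / 2.0
    return U @ np.diag([sigma, sigma, 0.0]) @ Vt

# Arbitrary 3x3 test matrix; the projection forces the essential
# structure (two equal singular values, one zero).
M = np.arange(9.0).reshape(3, 3) + np.eye(3)
E = project_to_essential(M)
s = np.linalg.svd(E, compute_uv=False)
```

This is the "linear estimate in the embedding space, then project" pattern: the update itself stays linear, and the nonlinearity is confined to the projection.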
Efficient solutions to the relative pose of three calibrated cameras from four points using virtual correspondences
We study the challenging problem of estimating the relative pose of three
calibrated cameras. We propose two novel solutions to the notoriously difficult
configuration of four points in three views, known as the 4p3v problem. Our
solutions are based on the simple idea of generating one additional virtual
point correspondence in two views by using the information from the locations
of the four input correspondences in the three views. For the first solver, we
train a network to predict this point correspondence. The second solver uses a
much simpler and more efficient strategy based on the mean points of three
corresponding input points. The new solvers are efficient and easy to implement
since they are based on the existing efficient minimal solvers, i.e., the
well-known 5-point relative pose and the P3P solvers. The solvers achieve
state-of-the-art results on real data. The idea of solving minimal problems
using virtual correspondences is general and can be applied to other problems,
e.g., the 5-point relative pose problem. In this way, minimal problems can be
solved using simpler non-minimal solvers or even using sub-minimal samples
inside RANSAC.
In addition, we compare different variants of 4p3v solvers with the baseline
solver for the minimal configuration consisting of three triplets of points and
two points visible in two views. We discuss which configuration of points is
potentially the most practical in real applications.
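The mean-point idea behind the second solver can be sketched as follows; the exact subset of input points the authors average is simplified here, and the helper name is hypothetical:

```python
import numpy as np

def virtual_correspondence(pts_view1, pts_view2):
    """Synthesize one additional 'virtual' point correspondence in
    two views as the mean of the input correspondences, in the
    spirit of the paper's mean-point solver (the authors' exact
    averaging scheme is simplified here). With four real points
    plus one virtual point, a standard 5-point relative pose solver
    becomes applicable."""
    pts_view1 = np.asarray(pts_view1, dtype=float)
    pts_view2 = np.asarray(pts_view2, dtype=float)
    return pts_view1.mean(axis=0), pts_view2.mean(axis=0)

pts1 = [(0, 0), (2, 0), (2, 2), (0, 2)]   # four points in view 1
pts2 = [(1, 1), (3, 1), (3, 3), (1, 3)]   # same points in view 2
v1, v2 = virtual_correspondence(pts1, pts2)
```

The appeal of the trick is that it reduces an awkward minimal problem (4p3v) to well-understood solvers (5-point and P3P) at negligible extra cost.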
PhotoMatch: An Open-Source Multi-View and Multi-Modal Feature Matching Tool for Photogrammetric Applications
Automatic feature matching is a crucial step in Structure-from-Motion (SfM) applications for 3D reconstruction purposes. From a historical perspective, we can now say that SIFT was the enabling technology that made SfM a successful and fully automated pipeline. SIFT was the ancestor of a wealth of detector/descriptor methods that are now available. Various research activities have tried to benchmark detector/descriptor operators, but a clear outcome is difficult to draw. This paper presents an ISPRS Scientific Initiative aimed at providing the community with an educational open-source tool (called PhotoMatch) for tie point extraction and image matching. Several enhancement and decolorization methods can be initially applied to an image dataset in order to improve the successive feature extraction steps. Then different detector/descriptor combinations are possible, coupled with different matching strategies and quality control metrics. Examples and results show the implemented functionality of PhotoMatch, which also includes a tutorial briefly explaining the implemented methods.
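One of the matching strategies such a tool combines with different detector/descriptor choices is brute-force nearest-neighbour matching with Lowe's ratio test. This pure-NumPy sketch (synthetic descriptors, hypothetical function name) illustrates the idea rather than PhotoMatch's actual implementation:

```python
import numpy as np

def ratio_test_match(desc1, desc2, ratio=0.8):
    """Brute-force nearest-neighbour descriptor matching with
    Lowe's ratio test. Descriptors are rows; a match (i, j) is kept
    only when the best distance is clearly smaller than the
    second-best, which filters out ambiguous matches."""
    # Pairwise squared Euclidean distances, shape (n1, n2)
    d = ((desc1[:, None, :] - desc2[None, :, :]) ** 2).sum(-1)
    matches = []
    for i, row in enumerate(d):
        j, k = np.argsort(row)[:2]       # best and second-best
        if row[j] < (ratio ** 2) * row[k]:
            matches.append((i, int(j)))
    return matches

# Synthetic data: three descriptors in image 1 are noisy copies of
# descriptors 0, 2, 4 in image 2, so those pairs should match.
rng = np.random.default_rng(1)
desc2 = rng.standard_normal((5, 8))
desc1 = desc2[[0, 2, 4]] + 0.01 * rng.standard_normal((3, 8))
m = ratio_test_match(desc1, desc2)
```

The ratio threshold trades recall against precision; benchmarking exactly such knobs across detector/descriptor combinations is what a tool like PhotoMatch is meant to make easy.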