Learning the Consensus of Multiple Correspondences between Data Structures
In this work, we present a framework to learn the consensus given multiple correspondences. It is assumed that the several parties involved have generated these correspondences separately, and our system acts as a mechanism that gauges several characteristics and considers different parameters to learn the best mappings and thus form a correspondence with the highest possible accuracy at a reasonable computational cost. The consensus framework is presented gradually, starting from the most basic approaches, which used exclusively well-known concepts or only two correspondences, and ending with the final model, which is able to consider multiple correspondences and to learn some weighting parameters automatically. Each step of the framework is evaluated using databases of varied nature to demonstrate that it is capable of addressing different matching scenarios.
In addition, two supplementary advances related to correspondences are presented in this work. Firstly, a new distance metric for correspondences has been developed, which led to a new strategy for the weighted mean correspondence search. Secondly, a framework specifically designed for correspondence generation in the image registration field has been established, where it is considered that one of the images is a full image and the other one is a small sample of it. The conclusion presents insights into how our consensus framework can be enhanced, and how these two parallel developments can converge with it.
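To illustrate the idea of a consensus between several correspondences, the following is a minimal sketch and not the thesis's actual method. It assumes each correspondence is a list where entry i is the target index assigned to source element i, that each party has a hypothetical scalar weight, and that the consensus is the one-to-one mapping with the largest total weighted support. The function names and the brute-force search (practical only for very small sets) are illustrative assumptions.

```python
from itertools import permutations


def vote_matrix(correspondences, weights, n_targets):
    """Accumulate weighted votes: votes[i][j] is the support for mapping i -> j."""
    n_sources = len(correspondences[0])
    votes = [[0.0] * n_targets for _ in range(n_sources)]
    for mapping, weight in zip(correspondences, weights):
        for i, j in enumerate(mapping):
            votes[i][j] += weight
    return votes


def consensus(correspondences, weights):
    """Return the one-to-one mapping with the highest total weighted support.

    Brute force over all permutations, so only suitable for small n.
    """
    n = len(correspondences[0])
    votes = vote_matrix(correspondences, weights, n)
    best = max(
        permutations(range(n)),
        key=lambda p: sum(votes[i][p[i]] for i in range(n)),
    )
    return list(best)


# Three parties propose slightly different mappings between four elements.
proposals = [[0, 1, 2, 3], [0, 2, 1, 3], [0, 1, 3, 2]]
equal = consensus(proposals, [1.0, 1.0, 1.0])
trusted_second = consensus(proposals, [1.0, 5.0, 1.0])
```

With equal weights the majority mapping wins; raising the weight of one party pulls the consensus toward that party's proposal, which is the role a learned weighting parameter would play in a sketch like this.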
Content-Aware Unsupervised Deep Homography Estimation
Homography estimation is a basic image alignment method in many applications.
It is usually conducted by extracting and matching sparse feature points, which
are error-prone in low-light and low-texture images. On the other hand,
previous deep homography approaches use either synthetic images for supervised
learning or aerial images for unsupervised learning, both ignoring the
importance of handling depth disparities and moving objects in real world
applications. To overcome these problems, in this work we propose an
unsupervised deep homography method with a new architecture design. In the
spirit of the RANSAC procedure in traditional methods, we specifically learn an
outlier mask to only select reliable regions for homography estimation. We
calculate loss with respect to our learned deep features instead of directly
comparing image content as was done previously. To achieve the unsupervised
training, we also formulate a novel triplet loss customized for our network. We
verify our method by conducting comprehensive comparisons on a new dataset that
covers a wide range of scenes with varying degrees of difficulties for the
task. Experimental results reveal that our method outperforms the
state-of-the-art, including deep solutions and feature-based solutions.
Comment: Accepted by ECCV 2020 (Oral, Top 2%, 3 over 3 Strong Accepts). Jirong
Zhang and Chuan Wang are joint first authors, and Shuaicheng Liu is the
corresponding author.
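The role of an outlier mask in a feature-space loss can be sketched as follows. This is only an illustration under stated assumptions, not the paper's exact formulation: feature maps and the mask are flattened into plain lists of floats, the mask holds per-position reliabilities in [0, 1], and the triplet-style objective is taken to be a masked positive distance minus an unmasked negative distance. The names `masked_l1` and `triplet_style_loss` are hypothetical.

```python
def masked_l1(feat_a, feat_b, mask):
    """Mask-weighted mean absolute difference between two flattened feature maps."""
    total = sum(m * abs(a - b) for a, b, m in zip(feat_a, feat_b, mask))
    weight = sum(mask)
    return total / weight if weight > 0 else 0.0


def triplet_style_loss(warped_a, target_b, unwarped_a, mask):
    """Pull warped features toward the target and push unwarped ones away.

    Assumed form: masked distance(warped, target) - plain distance(unwarped, target).
    """
    ones = [1.0] * len(mask)
    return masked_l1(warped_a, target_b, mask) - masked_l1(unwarped_a, target_b, ones)


# The third position is an outlier (e.g. a moving object); the mask suppresses it.
with_mask = masked_l1([1.0, 2.0, 10.0], [1.0, 3.0, 0.0], [1.0, 1.0, 0.0])
without_mask = masked_l1([1.0, 2.0, 10.0], [1.0, 3.0, 0.0], [1.0, 1.0, 1.0])
```

Normalizing by the mask sum keeps the loss from shrinking just because the mask selects fewer positions.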
A robust and efficient video representation for action recognition
This paper introduces a state-of-the-art video representation and applies it
to efficient action recognition and detection. We first propose to improve the
popular dense trajectory features by explicit camera motion estimation. More
specifically, we extract feature point matches between frames using SURF
descriptors and dense optical flow. The matches are used to estimate a
homography with RANSAC. To improve the robustness of homography estimation, a
human detector is employed to remove outlier matches from the human body as
human motion is not constrained by the camera. Trajectories consistent with the
homography are considered as due to camera motion, and thus removed. We also
use the homography to cancel out camera motion from the optical flow. This
results in significant improvement on motion-based HOF and MBH descriptors. We
further explore the recent Fisher vector as an alternative feature encoding
approach to the standard bag-of-words histogram, and consider different ways to
include spatial layout information in these encodings. We present a large and
varied set of evaluations, considering (i) classification of short basic
actions on six datasets, (ii) localization of such actions in feature-length
movies, and (iii) large-scale recognition of complex events. We find that our
improved trajectory features significantly outperform previous dense
trajectories, and that Fisher vectors are superior to bag-of-words encodings
for video recognition tasks. In all three tasks, we show substantial
improvements over the state-of-the-art results.
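The step of discarding trajectories that are consistent with the estimated homography can be sketched as below. This is a simplified illustration and not the authors' implementation: it assumes the 3x3 homography has already been estimated (the RANSAC step is omitted), reduces each trajectory to a start and end point between two frames, and uses a hypothetical pixel tolerance `tol`.

```python
import math


def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography given as nested lists."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)


def is_camera_motion(H, start, end, tol=1.0):
    """True if the displacement start -> end is explained by H within tol pixels."""
    px, py = apply_homography(H, start)
    return math.hypot(px - end[0], py - end[1]) <= tol


def keep_foreground(H, tracks, tol=1.0):
    """Drop tracks whose motion matches the camera-induced homography."""
    return [t for t in tracks if not is_camera_motion(H, t[0], t[1], tol)]


# Camera pans 5 pixels to the right; only the last track moves independently.
H_pan = [[1.0, 0.0, 5.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
tracks = [((0.0, 0.0), (5.0, 0.0)), ((10.0, 10.0), (15.2, 10.1)), ((3.0, 3.0), (3.0, 9.0))]
foreground = keep_foreground(H_pan, tracks)
```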
Automatic Alignment of 3D Multi-Sensor Point Clouds
Automatic 3D point cloud alignment is a major research topic in photogrammetry, computer vision and computer graphics. In this research, two keypoint feature matching approaches have been developed and proposed for the automatic alignment of 3D point clouds, which have been acquired from different sensor platforms and are in different 3D conformal coordinate systems.
The first proposed approach is based on 3D keypoint feature matching. First, surface curvature information is utilized for scale-invariant 3D keypoint extraction. Adaptive non-maxima suppression (ANMS) is then applied to retain the most distinct and well-distributed set of keypoints. Afterwards, every keypoint is characterized by a scale, rotation and translation invariant 3D surface descriptor, called the radial geodesic distance-slope histogram. Similar keypoint descriptors on the source and target datasets are then matched using bipartite graph matching, followed by a modified-RANSAC for outlier removal.
The second proposed method is based on 2D keypoint matching performed on height map images of the 3D point clouds. Height map images are generated by projecting the 3D point clouds onto a planimetric plane. Afterwards, a multi-scale wavelet 2D keypoint detector with ANMS is proposed to extract keypoints on the height maps. Then, a scale, rotation and translation-invariant 2D descriptor referred to as the Gabor, Log-Polar-Rapid Transform descriptor is computed for all keypoints. Finally, source and target height map keypoint correspondences are determined using a bi-directional nearest neighbour matching, together with the modified-RANSAC for outlier removal.
Each method is assessed on multi-sensor, urban and non-urban 3D point cloud datasets. Results show that unlike the 3D-based method, the height map-based approach is able to align source and target datasets with differences in point density, point distribution and missing point data. Findings also show that the 3D-based method obtained lower transformation errors and a greater number of correspondences when the source and target have similar point characteristics. The 3D-based approach attained absolute mean alignment differences in the range of 0.23 m to 2.81 m, whereas the height map approach had a range from 0.17 m to 1.21 m. These differences meet the proximity requirements of the data characteristics and the further application of fine co-registration approaches.
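The bi-directional nearest neighbour matching mentioned for the height map approach can be sketched as follows. This is a minimal illustration under assumptions: descriptors are plain tuples of floats compared with squared Euclidean distance, the search is exhaustive, and the function names are hypothetical. The descriptor itself and the modified-RANSAC step are not reproduced here.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(query, candidates):
    """Index of the candidate descriptor closest to the query."""
    return min(range(len(candidates)), key=lambda j: sq_dist(query, candidates[j]))


def mutual_matches(source, target):
    """Keep (i, j) only if j is i's nearest neighbour and i is j's nearest neighbour."""
    forward = [nearest(d, target) for d in source]
    backward = [nearest(d, source) for d in target]
    return [(i, j) for i, j in enumerate(forward) if backward[j] == i]


source_desc = [(0.0, 0.0), (10.0, 10.0), (5.0, 5.0)]
target_desc = [(0.1, 0.0), (9.0, 9.0), (100.0, 100.0)]
matches = mutual_matches(source_desc, target_desc)
```

The third source descriptor picks the second target as its nearest neighbour, but that target prefers another source descriptor, so the pair is rejected. Candidate matches that survive this check would then go to an outlier-removal stage.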
Real-time Monocular Object SLAM
We present a real-time object-based SLAM system that leverages the largest
object database to date. Our approach comprises two main components: 1) a
monocular SLAM algorithm that exploits object rigidity constraints to improve
the map and find its real scale, and 2) a novel object recognition algorithm
based on bags of binary words, which provides live detections with a database
of 500 3D objects. The two components work together and benefit each other: the
SLAM algorithm accumulates information from the observations of the objects,
anchors object features to special map landmarks and sets constraints on the
optimization. At the same time, objects partially or fully located within the
map are used as a prior to guide the recognition algorithm, achieving higher
recall. We evaluate our proposal on five real environments showing improvements
on the accuracy of the map and efficiency with respect to other
state-of-the-art techniques.
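The idea behind a bag of binary words can be sketched as below. This is a simplified illustration, not the system's recognition algorithm: it assumes binary descriptors are stored as Python integers, uses a small flat vocabulary rather than any hierarchical structure, and the function names are hypothetical.

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def nearest_word(descriptor, vocabulary):
    """Index of the visual word with the smallest Hamming distance to the descriptor."""
    return min(range(len(vocabulary)), key=lambda k: hamming(descriptor, vocabulary[k]))


def bow_histogram(descriptors, vocabulary):
    """Count how many descriptors fall on each visual word."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        hist[nearest_word(d, vocabulary)] += 1
    return hist


# Toy 8-bit vocabulary and descriptors.
vocabulary = [0b00000000, 0b11111111, 0b11110000]
descriptors = [0b00000001, 0b11111110, 0b11100000, 0b00000011]
histogram = bow_histogram(descriptors, vocabulary)
```

Histograms of this kind can be compared between a query image and stored object models; Hamming distance on binary descriptors is cheap, which suits live detection.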