ViPR: Visual-Odometry-aided Pose Regression for 6DoF Camera Localization
Visual Odometry (VO) accumulates a positional drift in long-term robot
navigation tasks. Although Convolutional Neural Networks (CNNs) improve VO in
various aspects, VO still suffers from moving obstacles, discontinuous
observation of features, and poor textures or visual information. While recent
approaches estimate a 6DoF pose either directly from (a series of) images or by
merging depth maps with optical flow (OF), research that combines absolute pose
regression with OF is limited. We propose ViPR, a novel modular architecture
for long-term 6DoF VO that leverages temporal information and synergies between
absolute pose estimates (from PoseNet-like modules) and relative pose estimates
(from FlowNet-based modules) by combining both through recurrent layers.
Experiments on known datasets and on our own Industry dataset show that our
modular design outperforms the state of the art in long-term navigation tasks.
Comment: Conf. on Computer Vision and Pattern Recognition (CVPR): Joint Workshop on Long-Term Visual Localization, Visual Odometry and Geometric and Learning-based SLAM 2020
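The abstract above hinges on the complementarity of the two estimators: absolute pose regression is drift-free but noisy, while relative pose estimates are smooth but accumulate drift. ViPR learns this fusion with recurrent layers; as a much simpler numpy illustration of why blending the two helps, the sketch below uses a complementary-filter-style update on a 1-D pose track. The function and its weighting scheme are illustrative assumptions, not the authors' architecture.

```python
import numpy as np

def fuse_poses(abs_poses, rel_steps, alpha=0.8):
    """Blend noisy absolute pose fixes with a dead-reckoned relative track.
    alpha weights the smooth-but-drifting odometry prediction; (1 - alpha)
    weights the noisy-but-drift-free absolute estimate. A hand-tuned
    stand-in for ViPR's learned recurrent fusion."""
    fused = [abs_poses[0]]
    for abs_p, step in zip(abs_poses[1:], rel_steps):
        predicted = fused[-1] + step                     # integrate relative step
        fused.append(alpha * predicted + (1 - alpha) * abs_p)
    return np.array(fused)

# Toy 1-D track: true position advances 1.0 per frame.
true = np.arange(10.0)
abs_noisy = true + np.random.default_rng(0).normal(0, 0.5, 10)  # APR-like: noisy
rel_drift = np.ones(9) * 1.02                                   # RPR-like: slight drift
fused = fuse_poses(abs_noisy, rel_drift)
```

With perfect inputs the filter reproduces the true track exactly; with the noisy/drifting inputs above, the absolute fixes continually pull the integrated track back toward the drift-free estimate.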
Fusing Structure from Motion and Simulation-Augmented Pose Regression from Optical Flow for Challenging Indoor Environments
The localization of objects is a crucial task in various applications such as
robotics, virtual and augmented reality, and the transportation of goods in
warehouses. Recent advances in deep learning have enabled the localization
using monocular visual cameras. While structure from motion (SfM) predicts the
absolute pose from a point cloud, absolute pose regression (APR) methods learn
a semantic understanding of the environment through neural networks. However,
both fields face challenges caused by the environment such as motion blur,
lighting changes, repetitive patterns, and feature-less structures. This study
aims to address these challenges by incorporating additional information and
regularizing the absolute pose using relative pose regression (RPR) methods.
The optical flow between consecutive images is computed using the Lucas-Kanade
algorithm, and the relative pose is predicted using an auxiliary small
recurrent convolutional network. The fusion of absolute and relative poses is a
complex task due to the mismatch between the global and local coordinate
systems. State-of-the-art methods fusing absolute and relative poses use pose
graph optimization (PGO) to regularize the absolute pose predictions using
relative poses. In this work, we propose recurrent fusion networks to optimally
align absolute and relative pose predictions to improve the absolute pose
prediction. We evaluate eight different recurrent units and construct a
simulation environment to pre-train the APR and RPR networks for better
generalized training. Additionally, we record a large database of different
scenarios in a challenging large-scale indoor environment that mimics a
warehouse with transportation robots. We conduct hyperparameter searches and
experiments to show the effectiveness of our recurrent fusion method compared
to PGO.
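The pipeline above computes optical flow between consecutive images with the Lucas-Kanade algorithm. As a minimal sketch of the core Lucas-Kanade step (not the authors' implementation, which presumably uses a standard library), the snippet below solves the single-window least-squares system relating spatial and temporal image gradients, assuming small motion and grayscale float images.

```python
import numpy as np

def lucas_kanade_window(I0, I1):
    """Estimate one (vx, vy) displacement for the whole patch by solving
    the Lucas-Kanade least-squares system A v = b, where the rows of A
    are the spatial gradients (Ix, Iy) and b is the negative temporal
    gradient. Assumes small motion between the two frames."""
    Iy, Ix = np.gradient(I0)          # spatial gradients (rows = y, cols = x)
    It = I1 - I0                      # temporal gradient
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v                          # (vx, vy)

# Toy check: a horizontal intensity ramp shifted right by one pixel.
# For a ramp with slope 1, I1(x) = I0(x - 1) = I0 - 1.
ramp = np.tile(np.arange(16.0), (16, 1))
v = lucas_kanade_window(ramp, ramp - 1.0)
```

For the ramp example the system is exactly determined in x and degenerate in y, so the solver returns a flow of one pixel to the right and zero vertically.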
The Need for Inherently Privacy-Preserving Vision in Trustworthy Autonomous Systems
Vision is a popular and effective sensor for robotics from which we can
derive rich information about the environment: the geometry and semantics of
the scene, as well as the age, gender, identity, activity and even emotional
state of humans within that scene. This raises important questions about the
reach, lifespan, and potential misuse of this information. This paper is a call
to action to consider privacy in the context of robotic vision. We propose a
specific form of privacy preservation in which no images are captured or could
be reconstructed by an attacker, even one with full remote access. We present a
set of
principles by which such systems can be designed, and through a case study in
localisation demonstrate in simulation a specific implementation that delivers
an important robotic capability in an inherently privacy-preserving manner.
This is a first step, and we hope to inspire future works that expand the range
of applications open to sighted robotic systems.
Comment: 7 pages, 6 figures
Benchmarking Visual-Inertial Deep Multimodal Fusion for Relative Pose Regression and Odometry-aided Absolute Pose Regression
Visual-inertial localization is a key problem in computer vision and robotics
applications such as virtual reality, self-driving cars, and aerial vehicles.
The goal is to estimate an accurate pose of an object when either the
environment or the dynamics are known. Recent methods directly regress the pose
using convolutional and spatio-temporal networks. Absolute pose regression
(APR) techniques predict the absolute camera pose from an image input in a
known scene. Odometry methods perform relative pose regression (RPR) that
predicts the relative pose from a known object dynamic (visual or inertial
inputs). The localization task can be improved by combining information from
both data sources in a cross-modal setup, which is challenging due to the
contradictory tasks. In this work, we conduct a benchmark to evaluate deep
multimodal fusion based on PGO and attention networks. Auxiliary and Bayesian
learning are integrated for the APR task. We show accuracy improvements for the
RPR-aided APR task and for the RPR-RPR task for aerial vehicles and hand-held
devices. We conduct experiments on the EuRoC MAV and PennCOSYVIO datasets, and
record a novel industry dataset.
Comment: Under review
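The benchmark above evaluates attention networks for fusing visual and inertial features. As a generic numpy sketch of the kind of cross-modal attention such a fusion typically uses (scaled dot-product attention, with visual features as queries attending to inertial features as keys and values), and not the paper's exact model:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(visual, inertial):
    """Scaled dot-product attention: each visual feature vector attends
    over all inertial feature vectors and returns a weighted mixture of
    them. Returns the fused features and the attention map."""
    d = visual.shape[-1]
    scores = visual @ inertial.T / np.sqrt(d)   # (n_vis, n_imu) similarities
    weights = softmax(scores, axis=-1)          # each row sums to 1
    return weights @ inertial, weights

rng = np.random.default_rng(0)
vis = rng.normal(size=(4, 8))    # 4 visual feature vectors, dim 8
imu = rng.normal(size=(6, 8))    # 6 inertial feature vectors, dim 8
fused, w = cross_modal_attention(vis, imu)
```

In a learned model the queries, keys, and values would pass through trained projections; the mechanics of weighting one modality by its relevance to the other are the same.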
Utilizing ethnic-specific differences in minor allele frequency to recategorize reported pathogenic deafness variants
Ethnic-specific differences in minor allele frequency impact variant categorization for genetic screening of nonsyndromic hearing loss (NSHL) and other genetic disorders. We sought to evaluate all previously reported pathogenic NSHL variants in the context of a large number of controls from ethnically distinct populations sequenced with orthogonal massively parallel sequencing methods. We used HGMD, ClinVar, and dbSNP to generate a comprehensive list of reported pathogenic NSHL variants and re-evaluated these variants in the context of 8,595 individuals from 12 populations and 6 ethnically distinct major human evolutionary phylogenetic groups from three sources (Exome Variant Server, 1000 Genomes project, and a control set of individuals created for this study, the OtoDB). Of the 2,197 reported pathogenic deafness variants, 325 (14.8%) were present in at least one of the 8,595 controls, indicating a minor allele frequency (MAF) >0.00006. MAFs ranged as high as 0.72, a level incompatible with pathogenicity for a fully penetrant disease like NSHL. Based on these data, we established MAF thresholds of 0.005 for autosomal-recessive variants (excluding specific variants in GJB2) and 0.0005 for autosomal-dominant variants. Using these thresholds, we recategorized 93 (4.2%) of reported pathogenic variants as benign. Our data show that evaluation of reported pathogenic deafness variants using variant MAFs from multiple distinct ethnicities and sequenced by orthogonal methods provides a powerful filter for determining pathogenicity. The proposed MAF thresholds will facilitate clinical interpretation of variants identified in genetic testing for NSHL. All data are publicly available to facilitate interpretation of genetic variants causing deafness.
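The recategorization rule stated in the abstract is a simple threshold filter: a reported pathogenic variant is reclassified as benign when its highest control-population MAF exceeds 0.005 (autosomal recessive) or 0.0005 (autosomal dominant). A short Python sketch of that logic; the record layout and field names are illustrative assumptions, not the study's data format.

```python
# MAF thresholds from the abstract: 0.005 for autosomal-recessive variants
# (the study excludes specific GJB2 variants from this rule), 0.0005 for
# autosomal-dominant variants.
MAF_THRESHOLDS = {"recessive": 0.005, "dominant": 0.0005}

def recategorize(variant):
    """Return 'benign' when the highest MAF observed in any control
    population exceeds the inheritance-specific threshold; otherwise
    keep the reported 'pathogenic' label."""
    max_maf = max(variant["population_mafs"].values())
    if max_maf > MAF_THRESHOLDS[variant["inheritance"]]:
        return "benign"
    return "pathogenic"

# Hypothetical recessive variant seen at 1.2% in one control population:
# far above the 0.005 threshold, so it is reclassified.
variant = {
    "inheritance": "recessive",
    "population_mafs": {"EUR": 0.00004, "EAS": 0.012},
}
label = recategorize(variant)
```

Taking the maximum across populations is the key point of the study's design: a variant common in any one ethnic group is implausible as a fully penetrant pathogenic allele, even if rare elsewhere.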
Imitrob: Imitation Learning Dataset for Training and Evaluating 6D Object Pose Estimators
This paper introduces a dataset for training and evaluating methods for 6D
pose estimation of hand-held tools in task demonstrations captured by a
standard RGB camera. Despite the significant progress of 6D pose estimation
methods, their performance is usually limited for heavily occluded objects,
which is a common case in imitation learning where the object is typically
partially occluded by the manipulating hand. Currently, there is a lack of
datasets that would enable the development of robust 6D pose estimation methods
for these conditions. To overcome this problem, we collect a new dataset
(Imitrob) aimed at 6D pose estimation in imitation learning and other
applications where a human holds a tool and performs a task. The dataset
contains image sequences of three different tools and six manipulation tasks
with two camera viewpoints, four human subjects, and left/right hand. Each
image is accompanied by an accurate ground truth measurement of the 6D object
pose, obtained by the HTC Vive motion tracking device. The use of the dataset
is demonstrated by training and evaluating a recent 6D object pose estimation
method (DOPE) in various setups. The dataset and code are publicly available at
http://imitrob.ciirc.cvut.cz/imitrobdataset.php
Characterization and recognition of objects using computer vision algorithms for a robot's interaction with its environment
In the field of robotics, various algorithms and methods have been developed
with the goal of improving robots' real-time interaction with people and with
their working environment, so that the system constantly reacts and adapts to
changes that may occur during operation. To achieve these goals, one of the
abilities conferred on the machine is the capacity to detect, register, and
recognize objects.
This thesis is a work of applied research whose objective is to develop a
procedure that allows a robotic system to recognize and detect objects in real
time within a controlled environment. To this end, we focus on two well-known
object recognition methods (SIFT and SURF), with which we categorize an object
from a predefined domain and compare the results obtained. SIFT and SURF were
chosen because of the similarity of the steps they follow to extract
information about an object; it is worth noting that SURF is an alternative
method to SIFT.
The final results showed better categorization accuracy with SIFT, but it
required more time to extract the feature points of the objects. SURF, on the
other hand, generated more feature points and did so in less time.
Feature-point extraction was analyzed in real time, while the categorization
stage did not consider this parameter but rather the number of feature points
needed to accurately predict an object's category.
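Both SIFT and SURF ultimately produce descriptor vectors per keypoint, and categorization pipelines like the one described typically match them by nearest-neighbour distance with Lowe's ratio test (accept a match only when the best distance is clearly smaller than the second best). The pure-numpy sketch below shows that matching step on toy descriptors; it is an illustration of the standard technique, not the thesis's code.

```python
import numpy as np

def ratio_test_match(desc_a, desc_b, ratio=0.75):
    """Nearest-neighbour descriptor matching with Lowe's ratio test, as
    commonly used with SIFT/SURF descriptors. Returns (i, j) index pairs
    of accepted matches from desc_a into desc_b."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        j, k = np.argsort(dists)[:2]          # best and second-best neighbours
        if dists[j] < ratio * dists[k]:       # unambiguous match only
            matches.append((i, int(j)))
    return matches

# Toy descriptors: the first query matches desc_b[1] unambiguously; the
# second query sits between two near-identical candidates and is rejected.
desc_a = np.array([[1.0, 0.0], [0.5, 0.5]])
desc_b = np.array([[0.0, 1.0], [1.0, 0.05], [0.5, 0.55], [0.5, 0.45]])
m = ratio_test_match(desc_a, desc_b)
```

The ratio test is what keeps repetitive texture from producing spurious matches, which matters for both methods since their descriptor spaces differ but the ambiguity problem is the same.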
Supplement to MTI Study on Selective Passenger Screening in the Mass Transit Rail Environment, MTI Report 09-05
This supplement updates and adds to MTI's 2007 report on Selective Screening of Rail Passengers (Jenkins and Butterworth, MTI 07-06: Selective Screening of Rail Passengers). The report reviews current screening programs implemented (or planned) by nine transit agencies, identifying best practices. The authors also discuss why three other transit agencies decided not to implement passenger screening at this time. The supplement reconfirms earlier conclusions that selective screening is a viable security option, but that effective screening must be based on clear policies and carefully managed to avoid perceptions of racial or ethnic profiling, and that screening must have public support. The supplement also addresses new developments, such as vapor-wake detection canines, continuing challenges, and areas of debate. Those interested should also read MTI S-09-01, Rail Passenger Selective Screening Summit.