RGBD Datasets: Past, Present and Future
Since the launch of the Microsoft Kinect, scores of RGBD datasets have been
released. These have propelled advances in areas from reconstruction to gesture
recognition. In this paper we explore the field, reviewing datasets across
eight categories: semantics, object pose estimation, camera tracking, scene
reconstruction, object tracking, human actions, faces and identification. By
extracting relevant information in each category we help researchers to find
appropriate data for their needs, and we consider which datasets have succeeded
in driving computer vision forward and why.
Finally, we examine the future of RGBD datasets. We identify key areas which
are currently underexplored, and suggest that future directions may include
synthetic data and dense reconstructions of static and dynamic scenes. (Comment: 8 pages excluding references, CVPR style.)
A Survey on Cooperative Longitudinal Motion Control of Multiple Connected and Automated Vehicles
Enhancing depth cues with AR visualization for forklift operation assistance in warehouse.
With warehouse operations contributing to a major part of logistics, architects tend to utilize every inch of the allocated space to maximize stacking space. Increasing the height of the aisles and narrowing the space between aisles are major design issues in doing so. Even though forklift manufacturing companies have introduced high-reach trucks and forklifts for narrow aisles, forklift operators face many issues while working with heavy pallets. This thesis focused on developing a system that uses Augmented Reality (AR) to aid forklift operators in performing their pallet racking and pick-up tasks. It used AR technology to superimpose virtual cues over the real world, specifying the pallets to be picked up and moved, and also to assist in operating the forklift using depth cues. This aims to increase the productivity of forklift operators in the warehouse. Depth cues are overlaid on a live video feed from a camera attached to the front of the forklift, which was displayed to the participants on a laptop.
To evaluate the usability of the designed system, an experiment was conducted and the performance results and feedback from the participants were evaluated. A remote-controlled toy forklift was used to conduct the experiment, and a motion tracking system was set up to track the cab and pallet. Simple pallet handling tasks were designed for the participants, and their performance and feedback were collected and analysed. This thesis shows how AR offers a simple and efficient solution for the problems faced by forklift operators while performing pallet handling tasks in a warehouse.
2007 Annual Report of the Graduate School of Engineering and Management, Air Force Institute of Technology
The Graduate School's Annual Report highlights research focus areas, new academic programs, faculty accomplishments and news, and provides top-level sponsor-funded research data and information.
Design Framework of UAV-Based Environment Sensing, Localization, and Imaging System
In this dissertation research, we develop a framework for designing an Unmanned Aerial Vehicle (UAV)-based environment sensing, localization, and imaging system for challenging environments with no GPS signals and low visibility. The UAV system relies on the various sensors it carries to conduct accurate sensing and localization of the objects in an environment, and further to reconstruct the 3D shapes of those objects. The system can be very useful when exploring an unknown or dangerous environment, e.g., a disaster site, which is inconvenient or inaccessible for humans. In addition, the system can be used for monitoring and object tracking in a large-scale environment, e.g., a smart manufacturing factory, for the purposes of workplace management and safety, and of maintaining optimal system performance and productivity.
In our framework, the UAV system comprises two subsystems: a sensing and localization subsystem, and a mmWave radar-based 3D object reconstruction subsystem.
The first subsystem is referred to as LIDAUS (Localization of IoT Device via Anchor UAV SLAM), which is an infrastructure-free, multi-stage SLAM (Simultaneous Localization and Mapping) system that utilizes a UAV to accurately localize and track IoT devices in a space with weak or no GPS signals. The rapidly increasing deployment of Internet of Things (IoT) around the world is changing many aspects of our society. IoT devices can be deployed in various places for different purposes, e.g., in a manufacturing site or a large warehouse, and they can be displaced over time due to human activities, or manufacturing processes. Usually in an indoor environment, the lack of GPS signals and infrastructure support makes most existing indoor localization systems not practical when localizing a large number of wireless IoT devices. In addition, safety concerns, access restriction, and simply the huge amount of IoT devices make it not practical for humans to manually localize and track IoT devices. Our LIDAUS is developed to address these problems. The UAV in our LIDAUS system conducts multi-stage 3D SLAM trips to localize devices based only on Received Signal Strength Indicator (RSSI), the most widely available measurement of the signals of almost all commodity IoT devices. Our simulations and experiments of Bluetooth IoT devices demonstrate that our system LIDAUS can achieve high localization accuracy based only on RSSIs of commodity IoT devices.
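The RSSI-only ranging idea underlying LIDAUS can be illustrated with a standard log-distance path-loss model and a linear least-squares position fix. The calibration constants and the solver below are illustrative assumptions for a minimal sketch, not the actual multi-stage 3D SLAM pipeline of LIDAUS:

```python
import numpy as np

# Log-distance path-loss model: RSSI = A - 10*n*log10(d).
# A (RSSI at 1 m) and N (path-loss exponent) are hypothetical
# calibration values, not taken from the LIDAUS system.
A, N = -45.0, 2.0

def rssi_to_distance(rssi):
    """Invert the path-loss model to estimate range in metres."""
    return 10 ** ((A - rssi) / (10 * N))

def localize(anchors, rssis):
    """Least-squares position estimate of an IoT device.

    anchors: (k, 3) array of UAV positions where RSSI was sampled.
    rssis:   length-k array of RSSI readings of the target device.
    """
    d = np.array([rssi_to_distance(r) for r in rssis])
    # Linearize ||x - a_i||^2 = d_i^2 by subtracting the equation
    # for the last anchor, which removes the quadratic term |x|^2.
    a0, d0 = anchors[-1], d[-1]
    A_mat = 2 * (anchors[:-1] - a0)
    b = (d0**2 - d[:-1]**2
         + np.sum(anchors[:-1]**2, axis=1) - np.sum(a0**2))
    x, *_ = np.linalg.lstsq(A_mat, b, rcond=None)
    return x
```

In practice RSSI is noisy, which is one motivation for the multi-stage trips and SLAM machinery described in the abstract; the sketch only shows the geometric core of RSSI ranging.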
Building on the first subsystem, we further develop the second subsystem for environment reconstruction and imaging via mmWave radar and deep learning. This subsystem is referred to as 3DRIMR/R2P (3D Reconstruction and Imaging via mmWave Radar/Radar to Point Cloud). It enables an exploring UAV to fly within an environment and collect mmWave radar data by scanning various objects in the environment. Taking advantage of the accurate locations given by the first subsystem, the UAV can scan an object from different viewpoints. Then, based on radar data only, the UAV can reconstruct the 3D shapes of the objects in the space. mmWave radar has been shown to be an effective sensing technique in low-visibility, smoky, dusty, and dense-fog environments. However, tapping the potential of radar sensing to reconstruct 3D object shapes remains a great challenge, due to the characteristics of radar data such as sparsity, low resolution, specularity, large noise, and multi-path induced shadow reflections and artifacts.
To address these challenges, our second subsystem utilizes deep learning models to extract features from sparse raw mmWave radar intensity data and reconstructs the 3D shapes of objects as dense and detailed point clouds. We first develop a deep learning model to reconstruct a single object’s 3D shape: the model first converts mmWave radar data to depth images, and then reconstructs the object’s 3D shape in point cloud format. Our experiments demonstrate the significant performance improvement of our system over popular existing methods such as PointNet, PointNet++ and PCN. We then further explore the feasibility of utilizing a mmWave radar sensor installed on a UAV to reconstruct the 3D shapes of multiple objects in a space. We evaluate two different models: Model 1 is the 3DRIMR/R2P model, and Model 2 is formed by adding a segmentation stage to the processing pipeline of Model 1. Our experiments demonstrate that both models are promising in solving the multiple-object reconstruction problem. We also show that Model 2, despite producing denser and smoother point clouds, can lead to higher reconstruction loss or even missing objects. In addition, we find that both models are robust to the highly noisy radar data obtained under unstable Synthetic Aperture Radar (SAR) operation caused by the instability or vibration of a small UAV hovering at its intended scanning point. Our research shows a promising direction for applying mmWave radar sensing to 3D object reconstruction.
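The depth-image-to-point-cloud step in the pipeline above can be sketched with the standard pinhole back-projection. This shows only the geometric conversion, not the learned 3DRIMR/R2P networks; the intrinsics are hypothetical values chosen for illustration:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (H, W), in metres, into an (N, 3)
    point cloud using pinhole intrinsics fx, fy, cx, cy.
    Pixels with zero (invalid) depth are dropped.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    # Invert the projection u = fx*x/z + cx, v = fy*y/z + cy.
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]
```

A learned model such as the one described would then refine or densify such a cloud; this sketch is only the deterministic geometry that links the depth-image representation to the point-cloud output format.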
Collaboratively Navigating Autonomous Systems
The objective of this project is to focus on technologies for enabling heterogeneous networks of autonomous vehicles to cooperate on a specific task. The prototyped test bed consists of a retrofitted electric golf cart and a quadrotor designed to perform distributed information gathering to guide decision making across the entire test bed. The system prototype demonstrates several aspects of this technology and lays the groundwork for future projects in this area.
Human factors in instructional augmented reality for intravehicular spaceflight activities and How gravity influences the setup of interfaces operated by direct object selection
In human spaceflight, advanced user interfaces are becoming an interesting means to facilitate human-machine interaction, enhancing and safeguarding the sequences of intravehicular space operations. Efforts to ease such operations have shown strong interest in novel human-computer interaction techniques such as Augmented Reality (AR). The work presented in this thesis is directed towards a user-driven design for AR-assisted space operations, iteratively solving issues arising from the problem space, which also includes consideration of the effect of altered gravity on handling such interfaces.
Inertial-aided Visual Perception of Geometry and Semantics
We describe components of a visual perception system to understand the geometry and semantics of a three-dimensional scene by utilizing monocular cameras and inertial measurement units (IMUs). The use of the two sensor modalities is motivated by the wide availability of camera-IMU sensor packages in mobile devices from phones to cars, and by their complementary sensing capabilities: IMUs can track the motion of the sensor platform accurately over a short period of time and provide a scaled and gravity-aligned global reference frame, while cameras can capture rich photometric signatures of the scene and provide relative motion constraints between images up to scale. We first show that visual 3D reconstruction can be improved by leveraging the global orientation frame, easily inferred from inertials. In the gravity-aligned global orientation frame, a shape prior can be imposed in depth prediction from a single image, where the normal vectors to surfaces of objects of certain classes tend to align with gravity or lie orthogonal to it. Adding such a prior to baseline methods for monocular depth prediction yields improvements beyond the state of the art and illustrates the power of utilizing inertials in 3D reconstruction. The global reference provided by inertials is not only gravity-aligned but also scaled, which is exploited in depth completion: we describe a method to infer dense metric depth from camera motion and sparse depth as estimated using a visual-inertial odometry system. Unlike other scenarios using point clouds from lidar or structured-light sensors, we have a few hundred to a few thousand points, insufficient to inform the topology of the scene. Our method first constructs a piecewise planar scaffolding of the scene, and then uses it to infer dense depth using the image along with the sparse points.
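The gravity-aligned shape prior can be sketched as a penalty that prefers predicted surface normals to be either parallel to gravity (floors, tabletops) or orthogonal to it (walls). This class-agnostic min-over-hypotheses form is a simplified illustration under assumed conventions, not the authors' exact class-conditional prior:

```python
import numpy as np

def gravity_prior_loss(normals, gravity):
    """Mean penalty over predicted unit surface normals.

    normals: (N, 3) unit normals predicted from a single image.
    gravity: (3,) unit gravity direction in the same
             gravity-aligned frame (provided by the IMU).
    For each normal, take the smaller of two residuals:
    distance from being parallel to gravity, or from being
    orthogonal to it.
    """
    cos = normals @ gravity          # cosine between normal and gravity
    parallel = 1.0 - np.abs(cos)     # zero when aligned with gravity
    orthogonal = np.abs(cos)         # zero when perpendicular to it
    return np.minimum(parallel, orthogonal).mean()
```

Adding such a term to a monocular depth objective pushes reconstructed surfaces toward the gravity-consistent orientations the abstract describes, without requiring any extra labels at training time.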
We use a predictive cross-modal criterion, akin to “self-supervision,” measuring photometric consistency across time, forward-backward pose consistency, and geometric compatibility with the sparse point cloud. We also launch the first visual-inertial + depth dataset (dubbed ``VOID''), which we hope will foster additional exploration into combining the complementary strengths of visual and inertial sensors. To compare our method to prior work, we adopt the unsupervised KITTI depth completion benchmark, and show state-of-the-art performance on it. In addition to dense geometry, the camera-IMU sensor package can also be used to recover the semantics of the scene. We present two methods to augment a point cloud map with class-labeled objects represented in the form of either scaled and oriented bounding boxes or CAD models. The tradeoff between the two shape representations lies in their generality and their capability to model detailed structures. While being more generic, 3D bounding boxes fail to model the details of objects, whereas CAD models preserve the finest shape details but require more computation and are limited to previously seen objects. Nevertheless, both methods populate an unknown environment with 3D objects placed in a Euclidean reference frame inferred causally and online using monocular video along with inertial sensors. Moreover, both methods include bottom-up and top-down components, whereby deep networks trained for detection provide likelihood scores for object hypotheses provided by a nonlinear filter, whose state serves as memory.
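The photometric-consistency part of such a criterion can be sketched as a warp-and-compare residual: back-project the target frame using its depth, transform the points by the relative pose, re-project into the source frame, and compare intensities. The nearest-neighbour sampling and all variable names below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def photometric_residual(img_t, img_s, depth, K, R, t):
    """Mean absolute intensity difference between a target frame and a
    source frame after warping by target depth and relative pose (R, t).

    img_t, img_s: (H, W) grayscale images; depth: (H, W) target depth;
    K: 3x3 pinhole intrinsics. Uses nearest-neighbour sampling for
    simplicity (real pipelines use differentiable bilinear sampling).
    """
    h, w = depth.shape
    Kinv = np.linalg.inv(K)
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], -1).reshape(-1, 3).T  # 3 x N
    # Back-project into 3D, move into the source frame, re-project.
    pts = Kinv @ pix * depth.reshape(-1)
    proj = K @ (R @ pts + t[:, None])
    us = np.round(proj[0] / proj[2]).astype(int)
    vs = np.round(proj[1] / proj[2]).astype(int)
    valid = (us >= 0) & (us < w) & (vs >= 0) & (vs < h)
    r = np.abs(img_t.reshape(-1)[valid] - img_s[vs[valid], us[valid]])
    return r.mean()
```

Minimizing this residual over predicted depth (and pose) is what makes the criterion “self-supervised”: no ground-truth depth is needed, only temporally adjacent frames and the visual-inertial pose estimates.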
We test our methods on the KITTI and SceneNN datasets, and also introduce the VISMA dataset, which contains ground-truth pose, a point-cloud map, and object models, along with time-stamped inertial measurements. To reduce the drift of the visual-inertial SLAM system, a building block of all the visual perception systems we have built, we introduce an efficient loop closure detection approach based on the idea of hierarchical pooling of image descriptors. We also open-sourced a full-fledged SLAM system equipped with mapping and loop closure capabilities. The code is publicly available at https://github.com/ucla-vision/xivo
Toolkits for the Development of Hybrid Games: from Tangible Tabletops to Interactive Spaces
In recent years, tabletop devices have been considered the ideal environment for hybrid games, which combine traditional play techniques, such as the use of physical objects to interact with the game in a natural way, with the new possibilities tabletops offer of augmenting the play space with digital images and audio. However, hybrid games are not restricted to tabletops; they can also be played in larger environments where other interaction paradigms converge. For this reason, the use of hybrid games in Interactive Spaces is gaining momentum, but the number and heterogeneity of devices and interaction styles found in these environments make game design and prototyping a difficult task. The great challenge, therefore, lies in offering designers and developers appropriate tools for creating these applications. In this line of work, the Affective Lab group launched the JUGUEMOS project (TIN2015-67149-C3-1R), a national project focused on the development of hybrid games in interactive environments. This doctoral thesis is framed within that project. The first step of this thesis was to establish its two main objectives (Chapter 1): 1) The first objective was to deepen the use of tangible tabletops in therapy with children with special needs. In recent years the Affective Lab group had seen the potential of tangible tabletops for working with young children, but more experiences and evaluations in the therapeutic field were still needed, as well as exploration of whether other user groups (adults with cognitive impairments) could benefit from the characteristics of tabletops. 2) The second objective was to design and implement a toolkit for the development of hybrid games for interactive spaces.
It was decided that the toolkit would be aimed at developers, to ease their work when creating this kind of application. Once the objectives were established, a state of the art was carried out, divided into two parts (Chapter 2): 1) A categorization of hybrid games was produced to understand and extract their main characteristics, as well as the main challenges that arise when developing this type of game; toolkits aimed at the development of hybrid games were also studied. 2) Hybrid games developed for children with special needs and for adults with cognitive impairments that made use of Tangible Interaction and tabletops were studied, as well as toolkits aimed at therapists or educators to help them create activities for their patients. To carry out the experiences and evaluations related to the first objective, the tangible tabletop NIKVision, previously developed by the Affective Lab group, was used, along with the KitVision toolkit, a tool aimed at professionals without programming knowledge for creating tangible activities, developed during the author's final degree project.
Chapter 3 of this thesis briefly presents the NIKVision tabletop and the KitVision architecture, describes the evaluations carried out with therapists in order to improve and test the usefulness of the toolkit, and reports a year-long experience during which an occupational therapist from ASAPME, an association that works with adults with cognitive impairments, used the tabletop and the toolkit without supervision. Chapter 4 describes different experiences carried out with KitVision:
- Thanks to a collaboration with the Residencia Romareda, NIKVision and KitVision were provisionally installed in the residence and, after an initial evaluation, three new activities were developed for its users.
- Thanks to the collaboration with ENMOvimienTO and with one of the Early Intervention centers of the Instituto Aragonés de Servicios Sociales (IASS), both focused on working with children with learning difficulties, evaluations were carried out that allowed us to improve KitVision and to create new activities specifically designed for these children.
- Finally, thanks to a collaboration with Atenciona, we were able to evaluate activities with children with Attention Deficit Hyperactivity Disorder (ADHD) and to extract a set of guidelines for designing activities for such children. We were also able to carry out a Participatory Design experience with them.
The complete development of the JUGUEMOS toolkit, for the creation of hybrid games in interactive spaces, is explained in Chapter 5. This chapter first describes the JUGUEMOS Interactive Space that served as the basis for developing the toolkit, and then explains in detail the design decisions that were taken, the abstraction model used to design the games, and the architecture of the toolkit.
It also details the different implementation phases that were carried out, based on the three challenges extracted from the state of the art: (1) integrating different devices, (2) managing diverse graphical outputs, and (3) facilitating game coding. Finally, it presents two game prototypes developed during the author's two research stays. Lastly, Chapter 6 describes the three use cases carried out to obtain a first assessment of the usability of the JUGUEMOS toolkit: (1) an evaluation with Master's students in which a fully functional game was implemented for the JUGUEMOS Interactive Space, (2) a game fully developed using the JUGUEMOS toolkit once its implementation was finished, and (3) an experience involving two multidisciplinary groups of designers and developers, who had to collaborate to design and implement two hybrid game prototypes for the interactive space.