Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age
Simultaneous Localization and Mapping (SLAM) consists of the concurrent
construction of a model of the environment (the map), and the estimation of the
state of the robot moving within it. The SLAM community has made astonishing
progress over the last 30 years, enabling large-scale real-world applications,
and witnessing a steady transition of this technology to industry. We survey
the current state of SLAM. We start by presenting what is now the de-facto
standard formulation for SLAM. We then review related work, covering a broad
set of topics including robustness and scalability in long-term mapping, metric
and semantic representations for mapping, theoretical performance guarantees,
active SLAM and exploration, and other new frontiers. This paper simultaneously
serves as a position paper and tutorial for those who use SLAM. By
looking at the published research with a critical eye, we delineate open
challenges and new research issues that still deserve careful scientific
investigation. The paper also contains the authors' take on two questions that
often animate discussions during robotics conferences: Do robots need SLAM? and
Is SLAM solved?
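The de-facto standard formulation referenced in this abstract is maximum a posteriori estimation over a factor graph, which under the usual Gaussian-noise assumption reduces to nonlinear least squares. A sketch (with X the robot and landmark variables, z_k the measurements, h_k the measurement models, and Omega_k the measurement information matrices):

```latex
\mathcal{X}^{\star}
  = \operatorname*{arg\,max}_{\mathcal{X}} \; p(\mathcal{X} \mid \mathcal{Z})
  = \operatorname*{arg\,min}_{\mathcal{X}}
    \sum_{k} \bigl\| h_k(\mathcal{X}_k) - z_k \bigr\|^{2}_{\Omega_k}
```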
Implementation and analysis of several keyframe-based browsing interfaces to digital video
In this paper we present a variety of browsing interfaces for digital video information. The six interfaces are implemented on top of Físchlár, an operational recording, indexing, browsing and playback system for broadcast TV programmes. In developing the six browsing interfaces, we have been informed by the various dimensions which can be used to distinguish one interface from another. For this we include layeredness (the number of "layers" of abstraction which can be used in browsing a programme), the provision or omission of temporal information (varying from full timestamp information to nothing at all on time), and the visualisation of spatial vs. temporal aspects of the video. After introducing and defining these dimensions we locate some common browsing interfaces from the literature in this 3-dimensional "space", and then we locate our own six interfaces in the same space. We then present an outline of the interfaces and include some user feedback.
Integration and coordination in a cognitive vision system
In this paper, we present a case study that exemplifies
general ideas of system integration and coordination.
The application field of assistant technology provides an
ideal test bed for complex computer vision systems including
real-time components, human-computer interaction, dynamic
3-d environments, and information retrieval aspects.
In our scenario the user is wearing an augmented reality device
that supports her/him in everyday tasks by presenting
information that is triggered by perceptual and contextual
cues. The system integrates a wide variety of visual functions
like localization, object tracking and recognition, action
recognition, interactive object learning, etc. We show
how different kinds of system behavior are realized using
the Active Memory Infrastructure that provides the technical
basis for distributed computation and a data- and event-driven
integration approach.
Cooperative Interactive Distributed Guidance on Mobile Devices
Mobile devices are quickly becoming an indispensable part of our society. Equipped with numerous communication capabilities, they are increasingly being examined as potential tools for civilian and military usage to aid in distributed remote collaboration for dynamic decision making and physical task completion. With an ever-growing mobile workforce, the need for remote assistance in aiding field workers who are confronted with situations outside their expertise certainly increases. Enhanced capabilities in using mobile devices could significantly improve numerous components of a task's completion (i.e. accuracy, timing, etc.). This dissertation considers the design of mobile implementations of technology and communication capabilities to support interactive collaboration between distributed team members. Specifically, this body of research seeks to explore and understand how various multimodal remote assistances affect both the human user's performance and the mobile device's effectiveness when used during cooperative tasks. Additionally, power effects are studied to assess the energy demands on a mobile device supporting multimodal communication. In a series of applied experiments and demonstrations, the effectiveness of a mobile device facilitating multimodal collaboration is analyzed through both empirical data collection and subjective exploration. The utility of the mobile interactive system and its configurations is examined to assess the impact on distributed task performance and collaborative dialogue between pairs. The dissertation formulates and defends the argument that multimodal communication capabilities should be incorporated into mobile communication channels to provide collaborating partners salient perspectives, with the goal of reaching a mutual understanding of task procedures.
The body of research discusses the findings of this investigation and highlights how these findings may influence future mobile research seeking to enhance interactive distributed guidance.
New Generation of Instrumented Ranges: Enabling Automated Performance Analysis
Military training conducted on physical ranges that match a unit's future operational environment provides
an invaluable experience. Today, to conduct a training exercise while ensuring a unit's performance is
closely observed, evaluated, and reported on in an After Action Review, the unit requires a number of
instructors to accompany the different elements. Training organized on ranges for urban warfighting brings
an additional level of complexity: the high level of occlusion typical for these environments multiplies the
number of evaluators needed. While the units have great need for such training opportunities, they may not
have the necessary human resources to conduct them successfully. In this paper we report on our US
Navy/ONR-sponsored project aimed at a new generation of instrumented ranges, and the early results we
have achieved. We suggest a radically different concept: instead of recording multiple video streams that
need to be reviewed and evaluated by a number of instructors, our system will focus on capturing dynamic
individual warfighter pose data and performing automated performance evaluation. We will use an in situ
network of automatically-controlled pan-tilt-zoom video cameras and personal position and orientation
sensing devices. Our system will record video, reconstruct dynamic 3D individual poses, analyze,
recognize events, evaluate performances, generate reports, provide real-time free exploration of recorded
data, and even allow the user to generate "what-if" scenarios that were never recorded. The most direct
benefit for an individual unit will be the ability to conduct training with fewer human resources, while
having a more quantitative account of their performance (dispersion across the terrain, "weapon flagging"
incidents, number of patrols conducted). The instructors will have immediate feedback on some elements
of the unit's performance. Having data sets for multiple units will enable historical trend analysis, thus
providing new insights and benefits for the entire service. (Office of Naval Research)
A cognitive ego-vision system for interactive assistance
With increasing computational power and decreasing size, computers nowadays are already wearable and mobile. They are becoming companions in people's everyday lives. Personal digital assistants and mobile phones equipped with adequate software gain a lot of public interest, although the functionality they provide in terms of assistance is little more than a mobile database for appointments, addresses, to-do lists and photos. Compared to the assistance a human can provide, such systems can hardly be called real assistants. The motivation to construct more human-like assistance systems that develop a certain level of cognitive capability leads to the exploration of two central paradigms in this work. The first paradigm is termed cognitive vision systems. Such systems take human cognition as a design principle of the underlying concepts and develop learning and adaptation capabilities to be more flexible in their application. They are embodied, active, and situated. Second, the ego-vision paradigm is introduced as a very tight interaction scheme between a user and a computer system that especially eases close collaboration and assistance between the two. Ego-vision systems (EVS) take the user's (visual) perspective and integrate the human in the system's processing loop by means of shared perception and augmented reality. EVSs adopt techniques of cognitive vision to identify objects, interpret actions, and understand the user's visual perception. And they articulate their knowledge and interpretation by means of augmentations of the user's own view. These two paradigms are studied as rather general concepts, but always with the goal in mind to realize more flexible assistance systems that closely collaborate with their users. This work provides three major contributions. First, a definition and explanation of ego-vision as a novel paradigm is given. Benefits and challenges of this paradigm are discussed as well.
Second, a configuration of different approaches that permit an ego-vision system to perceive its environment and its user is presented, in terms of object and action recognition, head gesture recognition, and mosaicing. These account for the specific challenges identified for ego-vision systems, whose perception capabilities are based on wearable sensors only. Finally, a visual active memory (VAM) is introduced as a flexible conceptual architecture for cognitive vision systems in general, and for assistance systems in particular. It adopts principles of human cognition to develop a representation for the information stored in this memory. So-called memory processes continuously analyze, modify, and extend the content of this VAM. The functionality of the integrated system emerges from the coordinated interplay of these memory processes. An integrated assistance system applying the approaches and concepts outlined before is implemented on the basis of the visual active memory. The system architecture is discussed, and some exemplary processing paths in the system are presented. It assists users in object manipulation tasks and has reached a maturity level that allows user studies to be conducted. Quantitative results of different integrated memory processes are presented, as well as an assessment of the interactive system by means of these user studies.
Towards Live 3D Reconstruction from Wearable Video: An Evaluation of V-SLAM, NeRF, and Videogrammetry Techniques
Mixed reality (MR) is a key technology which promises to change the future of
warfare. An MR hybrid of physical outdoor environments and virtual military
training will enable engagements with long distance enemies, both real and
simulated. To enable this technology, a large-scale 3D model of a physical
environment must be maintained based on live sensor observations. 3D
reconstruction algorithms should utilize the low cost and pervasiveness of
video camera sensors, from both overhead and soldier-level perspectives.
Mapping speed and 3D quality can be balanced to enable live MR training in
dynamic environments. Given these requirements, we survey several 3D
reconstruction algorithms for large-scale mapping for military applications
given only live video. We measure 3D reconstruction performance from common
structure from motion, visual-SLAM, and photogrammetry techniques. This
includes the open source algorithms COLMAP, ORB-SLAM3, and NeRF using
Instant-NGP. We utilize the autonomous driving academic benchmark KITTI, which
includes both dashboard camera video and lidar produced 3D ground truth. With
the KITTI data, our primary contribution is a quantitative evaluation of 3D
reconstruction computational speed when considering live video. (Comment: Accepted to the 2022 Interservice/Industry Training, Simulation, and Education Conference (I/ITSEC), 13 pages)
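As a minimal illustration of the kind of quantitative evaluation against KITTI-style ground truth described above (not the paper's actual protocol), trajectory accuracy is commonly summarised as an absolute trajectory error (ATE) RMSE over time-associated camera positions. The helper name `ate_rmse` and the toy trajectories are assumptions for the sketch:

```python
import numpy as np

def ate_rmse(gt, est):
    """Absolute trajectory error (RMSE) between two 3-D position sequences.

    gt, est: (N, 3) arrays of ground-truth and estimated camera positions,
    assumed already time-associated and expressed in the same frame.
    """
    gt = np.asarray(gt, dtype=float)
    est = np.asarray(est, dtype=float)
    errors = np.linalg.norm(gt - est, axis=1)  # per-frame position error
    return float(np.sqrt(np.mean(errors ** 2)))

# Toy example: estimate offset by 1 m along x at every frame -> RMSE = 1.0
gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
est = gt + np.array([1.0, 0.0, 0.0])
print(ate_rmse(gt, est))  # 1.0
```

In practice the estimated trajectory is first rigidly (or similarity-) aligned to the ground truth before computing this error, since monocular methods recover pose only up to scale.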
3D Modelling from Real Data
The genesis of a 3D model has basically two definitely different paths. Firstly we can consider the CAD-generated models, where the shape is defined according to a user drawing action, operating with different mathematical "bricks" like B-Splines, NURBS or subdivision surfaces (mathematical CAD modelling), or directly drawing small polygonal planar facets in space, approximating with them complex free-form shapes (polygonal CAD modelling). This approach can be used both for ideal elements (a project, a fantasy shape in the mind of a designer, a 3D cartoon, etc.) and for real objects. In the latter case the object has to be surveyed first in order to generate a drawing coherent with the real object. If the surveying process is not only a rough acquisition of simple distances with a substantial amount of manual drawing, a scene can be modelled in 3D by capturing with a digital instrument many points of its geometrical features and connecting them by polygons to produce a 3D result similar to a polygonal CAD model, with the difference that the shape generated is in this case an accurate 3D acquisition of a real object (reality-based polygonal modelling).
Considering only devices operating on the ground, 3D capturing techniques for the generation of reality-based 3D models may span passive sensors and image data (Remondino and El-Hakim, 2006), optical active sensors and range data (Blais, 2004; Shan & Toth, 2008; Vosselman and Maas, 2010), classical surveying (e.g. total stations or Global Navigation Satellite System - GNSS), 2D maps (Yin et al., 2009) or an integration of the aforementioned methods (Stumpfel et al., 2003; Guidi et al., 2003; Beraldin, 2004; Stamos et al., 2008; Guidi et al., 2009a; Remondino et al., 2009; Callieri et al., 2011). The choice depends on the required resolution and accuracy, object dimensions, location constraints, the instrument's portability and usability, surface characteristics, working team experience, the project's budget, the final goal, etc.
Although we are aware of the potential of the image-based approach and its recent developments in automated and dense image matching, for non-experts the easy usability and reliability of optical active sensors in acquiring 3D data is generally a good motivation to decline image-based approaches. Moreover, the great advantage of active sensors is the fact that they immediately deliver dense and detailed 3D point clouds whose coordinates are metrically defined. On the other hand, image data require some processing and a mathematical formulation to transform the two-dimensional image measurements into metric three-dimensional coordinates. Image-based modelling techniques (mainly photogrammetry and computer vision) are generally preferred in cases of monuments or architectures with regular geometric shapes, low-budget projects, good experience of the working team, or time or location constraints for the data acquisition and processing.
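The mathematical formulation that transforms two-dimensional image measurements into metric three-dimensional coordinates is, in its simplest form, linear triangulation from two calibrated pinhole views. A minimal sketch, where the function name and the toy camera matrices are illustrative assumptions rather than material from the chapter:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation: recover a 3-D point from two pinhole views.

    P1, P2: 3x4 camera projection matrices; x1, x2: (u, v) pixel observations
    of the same point. Builds the homogeneous system A X = 0, solves it via
    SVD, and de-homogenises the result.
    """
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                  # null-space vector of A (homogeneous point)
    return X[:3] / X[3]

# Toy example: two views of the point (0, 0, 5), second camera shifted along x
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.0, 0.0, 5.0])
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]
print(triangulate(P1, P2, x1, x2))  # ~ [0, 0, 5]
```

Real photogrammetric pipelines add lens-distortion models and refine such linear estimates by bundle adjustment, which is what makes the resulting coordinates metrically reliable.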
This chapter is intended as an updated review of reality-based 3D modelling in terrestrial applications, with the different categories of 3D sensing devices and the related data processing pipelines
Sensor fusion in smart camera networks for ambient intelligence
This short report introduces the topics of PhD research that was conducted in 2008-2013 and defended in July 2013. The PhD thesis covers sensor fusion theory, gathers it into a framework with design rules for fusion-friendly design of vision networks, and elaborates on the rules through fusion experiments performed with four distinct applications of Ambient Intelligence.