
    Memory-efficient belief propagation for high-definition real-time stereo matching systems

    Tele-presence systems aim to make participants feel as if they were physically together. To strengthen this feeling, such systems are starting to include depth estimation capabilities. Typical requirements for these systems include high definition, good-quality results and low latency. Benchmarks demonstrate that stereo-matching algorithms using Belief Propagation (BP) produce the best results. The execution time of the BP algorithm on a CPU cannot satisfy real-time requirements with high-definition images, and GPU-based implementations of BP only achieve real-time performance on small- to medium-sized images because memory traffic limits their applicability. The inherent parallelism of the BP algorithm makes FPGA-based solutions a good choice. However, even the high memory bandwidth of a commercial FPGA-based ASIC-prototyping board is not enough to meet the real-time, high-definition and immersive-feeling requirements. The work presented estimates depth maps in less than 40 milliseconds for high-definition images at 30 fps with 80 disparity levels. The proposed double BP topology and the new data-cost estimation improve performance over classical BP while reducing memory traffic by about 21%. Moreover, the adaptive message compression method and the message distribution in memory reduce the number of memory accesses by more than 70% with an almost negligible loss of performance. The total memory traffic reduction is about 90%, and the results are of sufficient quality to rank within the first 40 positions of the Middlebury benchmark. This work has been partially supported by the CDTI under project CENIT-VISION 2007-1007 and the CICYT under TEC2008-04107.
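
    As a point of reference for the message-passing cost discussed above, the following is a minimal NumPy sketch of the classical BP message update for stereo matching, assuming a truncated-linear smoothness term and the naive O(L^2) formulation; it illustrates only the standard algorithm, not the paper's double-BP topology or its adaptive message compression, and all function and parameter names are placeholders.

    ```python
    import numpy as np

    def bp_message(data_cost, incoming, smooth_lambda=1.0, trunc=2.0):
        """One BP message over disparity labels (naive O(L^2) form).

        data_cost : (L,) unary matching cost at pixel p
        incoming  : (k, L) messages arriving at p from its other neighbours
        Returns the (L,) message sent from p to the chosen neighbour q.
        """
        h = data_cost + incoming.sum(axis=0)           # aggregated belief at p
        labels = np.arange(h.shape[0])
        # truncated-linear pairwise term V(d', d) = lambda * min(|d' - d|, trunc)
        V = smooth_lambda * np.minimum(np.abs(labels[:, None] - labels[None, :]), trunc)
        msg = (h[:, None] + V).min(axis=0)             # minimise over the sender's label d'
        return msg - msg.min()                         # normalise to keep values bounded
    ```

    One such L-vector must be stored per pixel and per direction at every iteration, so message storage and transfer dominate memory traffic; this is why quantising or compressing messages, as the paper proposes, can cut the number of accesses so sharply.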

    Uncertainty Minimization in Robotic 3D Mapping Systems Operating in Dynamic Large-Scale Environments

    This dissertation research is motivated by the potential and promise of 3D sensing technologies in safety and security applications. With specific focus on unmanned robotic mapping to aid clean-up of hazardous environments, under-vehicle inspection, automatic runway/pavement inspection and modeling of urban environments, we develop modular, multi-sensor, multi-modality robotic 3D imaging prototypes using localization/navigation hardware, laser range scanners and video cameras. While deploying our multi-modality complementary approach to pose and structure recovery in dynamic real-world operating conditions, we observe several data fusion issues that state-of-the-art methodologies are not able to handle. Different bounds on the noise models of heterogeneous sensors, the dynamism of the operating conditions and the interaction of the sensing mechanisms with the environment introduce situations where sensors can intermittently degrade to accuracy levels below their design specification. This observation necessitates the derivation of methods to integrate multi-sensor data while accounting for sensor conflict, performance degradation and potential failure during operation. Our work in this dissertation contributes to the data fusion literature the derivation of a fault-diagnosis framework inspired by information-complexity theory. We implement the framework as opportunistic sensing intelligence that is able to evolve a belief policy on the sensors within the multi-agent 3D mapping systems, allowing them to survive and counter concerns of failure in challenging operating conditions. The implementation of the information-theoretic framework, in addition to eliminating failed/non-functional sensors and avoiding catastrophic fusion, is able to minimize uncertainty during autonomous operation by adaptively deciding to fuse or choose believable sensors. We demonstrate our framework through experiments in multi-sensor robot state localization in large-scale dynamic environments and in vision-based 3D inference. Our modular hardware and software design of robotic imaging prototypes, along with the opportunistic sensing intelligence, provides significant improvements towards autonomous, accurate, photo-realistic 3D mapping and remote visualization of scenes for the motivating applications.
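
    As a rough, generic illustration of the "fuse or choose believable sensors" idea, the sketch below gates out sensors whose residuals look inconsistent and fuses the survivors by inverse-variance weighting; it assumes scalar estimates with known noise variances and is not the dissertation's information-complexity criterion, so every name and threshold in it is hypothetical.

    ```python
    import numpy as np

    def fuse_believable(estimates, variances, residuals, gate=3.0):
        """Fuse scalar sensor estimates while skipping sensors that look faulty.

        estimates, variances, residuals : (n,) arrays, one entry per sensor.
        A sensor is kept only if its normalised residual lies inside the gate;
        the survivors are combined by inverse-variance weighting.
        """
        ok = np.abs(residuals) / np.sqrt(variances) < gate
        if not ok.any():
            # every sensor conflicts: fall back to the single most consistent one
            ok = np.abs(residuals) == np.abs(residuals).min()
        w = 1.0 / variances[ok]
        fused = np.sum(w * estimates[ok]) / np.sum(w)
        return fused, 1.0 / np.sum(w), ok              # estimate, its variance, chosen sensors
    ```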

    TractorEYE: Vision-based Real-time Detection for Autonomous Vehicles in Agriculture

    Agricultural vehicles such as tractors and harvesters have for decades been able to navigate automatically and more efficiently using commercially available products such as auto-steering and tractor-guidance systems. However, a human operator is still required inside the vehicle to ensure the safety of the vehicle and especially of its surroundings, such as humans and animals. To get fully autonomous vehicles certified for farming, computer vision algorithms and sensor technologies must detect obstacles with performance equivalent to or better than that of a human. Furthermore, detections must run in real time to allow vehicles to actuate and avoid collisions.

    This thesis proposes a detection system (TractorEYE), a dataset (FieldSAFE), and procedures to fuse information from multiple sensor technologies to improve detection of obstacles and to generate a map. TractorEYE is a multi-sensor detection system for autonomous vehicles in agriculture. The multi-sensor system consists of three hardware-synchronized and registered sensors (stereo camera, thermal camera and multi-beam lidar) mounted on/in a ruggedized and water-resistant casing. Algorithms have been developed to run a total of six detection algorithms (four for the RGB camera, one for the thermal camera and one for the multi-beam lidar) and to fuse detection information in a common format using either 3D positions or Inverse Sensor Models. A GPU-powered computational platform is able to run the detection algorithms online. For the RGB camera, a deep learning algorithm, DeepAnomaly, is proposed to perform real-time anomaly detection of distant, heavily occluded and unknown obstacles in agriculture. Compared to a state-of-the-art object detector, Faster R-CNN, DeepAnomaly detects humans better and at longer ranges (45-90 m) in an agricultural use case, with a smaller memory footprint and 7.3 times faster processing. The low memory footprint and fast processing make DeepAnomaly suitable for real-time applications running on an embedded GPU.

    FieldSAFE is a multi-modal dataset for detection of static and moving obstacles in agriculture. The dataset includes synchronized recordings from an RGB camera, stereo camera, thermal camera, 360-degree camera, lidar and radar. Precise localization and pose are provided using IMU and GPS. Ground truth for static and moving obstacles (humans, mannequin dolls, barrels, buildings, vehicles, and vegetation) is available as an annotated orthophoto and as GPS coordinates for the moving obstacles. Detection information from multiple detection algorithms and sensors is fused into a map using Inverse Sensor Models and occupancy grid maps.

    This thesis presents several scientific contributions to the state of the art in perception for autonomous tractors, including a dataset, a sensor platform, detection algorithms and procedures for multi-sensor fusion. Furthermore, important engineering contributions to autonomous farming vehicles are presented, such as easily applicable, open-source software packages and algorithms that have been demonstrated in an end-to-end real-time detection system. The contributions of this thesis demonstrate, address and solve critical issues in utilizing camera-based perception systems, which are essential to making autonomous vehicles in agriculture a reality.
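
    To make the final fusion step concrete, here is a minimal NumPy sketch of fusing per-sensor detections into an occupancy grid via log-odds updates with a very simple inverse sensor model; the parameter values and function names are assumptions for illustration, not the thesis's actual implementation.

    ```python
    import numpy as np

    def update_grid(log_odds, detections, l_occ=0.85, l_free=-0.4, l_min=-4.0, l_max=4.0):
        """Fuse detections from several sensors into one occupancy grid (log-odds form).

        log_odds   : (H, W) grid of accumulated log-odds
        detections : list of (occupied_mask, observed_mask) boolean pairs, one per
                     sensor, i.e. a very simple inverse sensor model per detector.
        """
        for occupied, observed in detections:
            log_odds = log_odds + np.where(occupied, l_occ, np.where(observed, l_free, 0.0))
        return np.clip(log_odds, l_min, l_max)          # clamp to avoid over-confidence

    def occupancy_prob(log_odds):
        """Convert accumulated log-odds back to occupancy probability."""
        return 1.0 / (1.0 + np.exp(-log_odds))
    ```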

    Internet of Underwater Things and Big Marine Data Analytics -- A Comprehensive Survey

    The Internet of Underwater Things (IoUT) is an emerging communication ecosystem developed for connecting underwater objects in maritime and underwater environments. The IoUT technology is intricately linked with intelligent boats and ships, smart shores and oceans, automatic marine transportation, positioning and navigation, underwater exploration, disaster prediction and prevention, as well as with intelligent monitoring and security. The IoUT exerts influence at various scales, ranging from a small scientific observatory, to a mid-sized harbor, to global oceanic trade. The network architecture of the IoUT is intrinsically heterogeneous and should be sufficiently resilient to operate in harsh environments. This creates major challenges in terms of underwater communications, whilst relying on limited energy resources. Additionally, the volume, velocity, and variety of data produced by sensors, hydrophones, and cameras in the IoUT is enormous, giving rise to the concept of Big Marine Data (BMD), which has its own processing challenges. Hence, conventional data processing techniques will falter, and bespoke Machine Learning (ML) solutions have to be employed for automatically learning the specific BMD behavior and features, facilitating knowledge extraction and decision support. The motivation of this paper is to comprehensively survey the IoUT, BMD, and their synthesis. It also aims to explore the nexus of BMD and ML. We set out from underwater data collection and then discuss the family of IoUT data communication techniques, with an emphasis on the state-of-the-art research challenges. We then review the suite of ML solutions suitable for BMD handling and analytics. We treat the subject deductively from an educational perspective, critically appraising the material surveyed. Comment: 54 pages, 11 figures, 19 tables; IEEE Communications Surveys & Tutorials, peer-reviewed academic journal.

    Improving 3D Reconstruction using Deep Learning Priors

    Modeling the 3D geometry of shapes and of the environment around us has many practical applications in mapping, navigation, virtual/augmented reality, and autonomous robots. In general, the acquisition of 3D models relies on passive images or on active depth sensors such as structured-light systems that use external infrared projectors. Although active methods provide very robust and reliable depth information, they have limited use cases and heavy power requirements, which makes passive techniques more suitable for day-to-day user applications. Image-based depth acquisition systems usually face challenges in representing thin, textureless, or specular surfaces and regions in shadow or low-light environments. While scene depth information can be extracted from a set of passive images, fusing depth information from several views into a consistent 3D representation remains a challenging task. The most common challenges in 3D environment capture include the use of an efficient scene representation that preserves details and thin structures and ensures overall completeness of the reconstruction. In this thesis, we illustrate the use of deep learning techniques to resolve some of the challenges of image-based depth acquisition and 3D scene representation. We use a deep learning framework to learn priors over scene geometry and global scene context to solve several ambiguous and ill-posed problems, such as estimating depth on textureless surfaces and producing complete 3D reconstructions of partially observed scenes. More specifically, we show that, using deep learning priors, a simple stereo camera system can reconstruct typical apartment-sized indoor environments with a fidelity that approaches the quality of a much more expensive state-of-the-art active depth-sensing system. Furthermore, we describe how deep learning priors on local shapes can represent 3D environments more efficiently than traditional systems while preserving details and completing surfaces.
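
    As a hedged illustration of what a learned depth prior can look like in practice, the following PyTorch sketch defines a tiny CNN that refines an initial stereo depth map guided by the RGB image; it is a generic stand-in under assumed input shapes, not the architecture used in the thesis, and the class and parameter names are invented for the example.

    ```python
    import torch
    import torch.nn as nn

    class DepthRefiner(nn.Module):
        """Tiny CNN that refines an incomplete stereo depth map guided by the RGB image.

        Illustrative stand-in for a learned scene prior: it predicts a residual
        correction to the initial depth, which helps in textureless regions
        where plain stereo matching fails.
        """
        def __init__(self, width=32):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(4, width, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(width, 1, 3, padding=1),
            )

        def forward(self, image, init_depth):
            x = torch.cat([image, init_depth], dim=1)   # (N, 4, H, W)
            return init_depth + self.net(x)             # residual refinement

    # usage: refined = DepthRefiner()(torch.rand(1, 3, 240, 320), torch.rand(1, 1, 240, 320))
    ```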

    Active recognition through next view planning: a survey


    Proceedings of the NASA Conference on Space Telerobotics, volume 3

    The theme of the Conference was man-machine collaboration in space. The Conference provided a forum for researchers and engineers to exchange ideas on the research and development required for application of telerobotics technology to the space systems planned for the 1990s and beyond. The Conference: (1) provided a view of current NASA telerobotic research and development; (2) stimulated technical exchange on man-machine systems, manipulator control, machine sensing, machine intelligence, concurrent computation, and system architectures; and (3) identified important unsolved problems of current interest which can be dealt with by future research.

    Proceedings of the NASA Conference on Space Telerobotics, volume 2

    These proceedings contain papers presented at the NASA Conference on Space Telerobotics held in Pasadena, January 31 to February 2, 1989. The theme of the Conference was man-machine collaboration in space. The Conference provided a forum for researchers and engineers to exchange ideas on the research and development required for application of telerobotics technology to the space systems planned for the 1990s and beyond. The Conference: (1) provided a view of current NASA telerobotic research and development; (2) stimulated technical exchange on man-machine systems, manipulator control, machine sensing, machine intelligence, concurrent computation, and system architectures; and (3) identified important unsolved problems of current interest which can be dealt with by future research.

    Human robot interaction in a crowded environment

    Human Robot Interaction (HRI) is the primary means of establishing natural and affective communication between humans and robots. HRI enables robots to act in a way similar to humans in order to assist in activities that are considered laborious, unsafe, or repetitive. Vision-based human robot interaction is a major component of HRI, in which visual information is used to interpret how human interaction takes place. Common tasks of HRI include finding pre-trained static or dynamic gestures in an image, which involves localising different key parts of the human body, such as the face and hands. This information is subsequently used to extract different gestures. After the initial detection process, the robot is required to comprehend the underlying meaning of these gestures [3]. Thus far, most gesture recognition systems can only detect gestures and identify a person in relatively static environments. This is not realistic for practical applications, as difficulties may arise from people's movements and changing illumination conditions. Another issue to consider is that of identifying the commanding person in a crowded scene, which is important for interpreting navigation commands. To this end, it is necessary to associate the gesture with the correct person, and automatic reasoning is required to extract the most probable location of the person who initiated the gesture.

    In this thesis, we propose a practical framework for addressing the above issues. It attempts to achieve a coarse-level understanding of a given environment before engaging in active communication. This includes recognizing human robot interaction, where a person has the intention to communicate with the robot. In this regard, it is necessary to differentiate whether the people present are engaged with each other or with their surrounding environment. The basic task is to detect and reason about the environmental context and the different interactions so as to respond accordingly. For example, if individuals are engaged in conversation, the robot should realize it is best not to disturb them; if an individual is receptive to the robot's interaction, it may approach the person. Finally, if the user is moving in the environment, the robot can analyse further to understand whether any help can be offered in assisting this user. The method proposed in this thesis combines multiple visual cues in a Bayesian framework to identify people in a scene and determine their potential intentions. To improve system performance, contextual feedback is used, which allows the Bayesian network to evolve and adjust itself according to the surrounding environment. The results achieved demonstrate the effectiveness of the technique in dealing with human-robot interaction in a relatively crowded environment [7].
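
    As an illustration of combining visual cues in a Bayesian fashion, the sketch below fuses independent cue likelihoods (for example face orientation, a raised-hand gesture, proximity) into a posterior over which person is addressing the robot; it is a naive-Bayes simplification with hypothetical names, not the thesis's full contextual-feedback network.

    ```python
    import numpy as np

    def commanding_person_posterior(cue_likelihoods, prior=None):
        """Rank who in the scene is addressing the robot by fusing independent cues.

        cue_likelihoods : (n_people, n_cues) array of P(cue | person is commanding),
                          e.g. columns for face orientation, raised hand, proximity.
        Returns a posterior over the people present (naive-Bayes style fusion).
        """
        n_people = cue_likelihoods.shape[0]
        prior = np.full(n_people, 1.0 / n_people) if prior is None else np.asarray(prior)
        joint = prior * np.prod(cue_likelihoods, axis=1)
        return joint / joint.sum()

    # usage: commanding_person_posterior(np.array([[0.9, 0.8, 0.7], [0.2, 0.1, 0.6]]))
    ```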