86 research outputs found


    Real time correlation-based stereo: algorithm, implementations and applications

    This paper describes some of the work on stereo that has been going on at INRIA over the last four years. The work has concentrated on obtaining dense, accurate and reliable range maps of the environment at rates compatible with the real-time constraints of such applications as the navigation of mobile vehicles in man-made or natural environments. Among several candidates, the class of correlation-based stereo algorithms has been selected because they are the only ones that can produce sufficiently dense range maps with an algorithmic structure which lends itself nicely to fast implementations, thanks to the simplicity of the underlying computation. We describe the various improvements that we have brought to the original idea, including validation and characterization of the quality of the matches, a recursive implementation of the score computation which makes the method independent of the size of the correlation window, and a calibration method which does not require the use of a calibration pattern. We then describe two implementations of this algorithm on two very different pieces of hardware. The first implementation is on a board with four digital signal processors designed jointly with Matra MSII. It can produce 64x64 range maps at rates varying between 200 and 400 ms, depending upon the range of disparities. The second implementation is on a board developed by DEC-PRL and can perform the cross-correlation of two 256x256 images in 140 ms. The first implementation has been integrated in the navigation system of the INRIA cart and used to correct for inertial and odometric errors in navigation experiments both indoors and outdoors on roads. This is the first application of our correlation-based algorithm described in the paper.
The second application has been carried out jointly with the French national space agency (CNES) to study the possibility of using stereo on a future planetary rover for the construction of digital elevation maps. We have shown that real-time stereo is possible today at low cost and can be applied in real applications. The algorithm that has been described is not the most sophisticated available, but we have made it robust and reliable thanks to a number of improvements. Even though none of these improvements is earth-shattering from the pure research point of view, together they have allowed us to cross a very important threshold. This threshold measures the difference between a program that runs in the laboratory on a few images and one that works continuously for hours on a sequence of stereo pairs and produces results at such rates and of such quality that they can be used to guide a real vehicle or to produce digital elevation maps. We believe that this threshold has only been reached in a very small number of cases.
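The window-size independence of the score computation can be illustrated with a small sketch. This is not the paper's recursive scheme; it is a hedged NumPy reconstruction (all function names are my own) using integral images, which achieve the same property: summing a correlation window costs O(1) per pixel regardless of the window radius.

```python
import numpy as np

def integral_image(img):
    # Cumulative 2D sum with a zero border row/column prepended.
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def window_sum(ii, r):
    # Sum over every (2r+1)x(2r+1) window, O(1) per pixel regardless of r.
    w = 2 * r + 1
    return ii[w:, w:] - ii[:-w, w:] - ii[w:, :-w] + ii[:-w, :-w]

def sad_stereo(left, right, max_disp, r=2):
    # Winner-take-all sum-of-absolute-differences stereo on a rectified pair.
    h, w = left.shape
    best = np.full((h - 2 * r, w - 2 * r), np.inf)
    disp = np.zeros(best.shape, dtype=int)
    for d in range(max_disp + 1):
        shifted = np.zeros_like(right)
        shifted[:, d:] = right[:, :w - d]       # right image shifted by d
        cost = window_sum(integral_image(np.abs(left - shifted)), r)
        better = cost < best
        disp[better] = d
        best[better] = cost[better]
    return disp
```

Because the per-disparity cost is a box filter, changing `r` changes the result but not the running time per pixel, which is the point of the recursive score computation described in the abstract.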

    Investigation of Computer Vision Concepts and Methods for Structural Health Monitoring and Identification Applications

    This study presents a comprehensive investigation of methods and technologies for developing a computer vision-based framework for Structural Health Monitoring (SHM) and Structural Identification (St-Id) for civil infrastructure systems, with particular emphasis on various types of bridges. SHM has been implemented on various structures over the last two decades; yet there are issues such as considerable cost, field implementation time and excessive labor needs for the instrumentation of sensors, cable wiring work and possible interruptions during implementation. These issues make it viable only when major investments for SHM are warranted for decision making. For other cases, a practical and effective solution is needed, and a computer vision-based framework can be a viable alternative. Computer vision-based SHM has been explored over the last decade. Unlike most vision-based structural identification studies and practices, which focus either on structural input (vehicle location) estimation or on structural output (structural displacement and strain responses) estimation, the proposed framework combines the vision-based structural input and the structural output from non-contact sensors to overcome the limitations given above. First, this study develops a series of computer vision-based displacement measurement methods for structural response (structural output) monitoring which can be applied to different infrastructures such as grandstands, stadiums, towers, footbridges, small/medium-span concrete bridges, railway bridges, and long-span bridges, and under different loading cases such as human crowds, pedestrians, wind, vehicles, etc. Structural behavior, modal properties, load carrying capacities, structural serviceability and performance are investigated using vision-based methods and validated by comparison with conventional SHM approaches.
In this study, some of the most famous landmark structures, such as long-span bridges, are utilized as case studies. This study also investigates the serviceability status of structures by using computer vision-based methods. Subsequently, issues and considerations for computer vision-based measurement in field applications are discussed and recommendations are provided for better results. This study also proposes a robust vision-based method for displacement measurement using spatio-temporal context learning and Taylor approximation to overcome the difficulties of vision-based monitoring under adverse environmental factors such as fog and illumination change. In addition, it is shown that the external load distribution on structures (structural input) can be estimated by using visual tracking, and afterwards the load rating of a bridge can be determined using the load distribution factors extracted from computer vision-based methods. By combining the structural input and output results, the unit influence line (UIL) of a structure is extracted during daily traffic using just cameras, and external loads can then be estimated from the cameras and the extracted UIL. Finally, condition assessment at the global structural level can be achieved using the structural input and output, both obtained from computer vision approaches, which gives a normalized response irrespective of the type and/or load configurations of the vehicles or human loads.
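Combining vision-based input (axle positions and weights) with measured output to extract a unit influence line can be posed as a linear least-squares problem. The sketch below is an illustrative reconstruction under assumed conventions, not the thesis' algorithm; the function names and the linear-interpolation basis are my own.

```python
import numpy as np

def axle_row(nodes, x, weight):
    # Linear-interpolation row mapping one axle at position x onto UIL nodes.
    row = np.zeros(len(nodes))
    if nodes[0] <= x <= nodes[-1]:
        j = np.searchsorted(nodes, x)
        if j == 0:
            row[0] = weight
        else:
            a = (x - nodes[j - 1]) / (nodes[j] - nodes[j - 1])
            row[j - 1] = weight * (1.0 - a)
            row[j] = weight * a
    return row

def extract_uil(positions, weights, response, nodes):
    """Least-squares UIL from response(t) = sum_k w_k * UIL(x_k(t)).
    positions: (T, A) axle locations from visual tracking, weights: (A,),
    response: (T,) measured structural output, nodes: UIL sample points."""
    A = np.array([sum(axle_row(nodes, x, w) for x, w in zip(xs, weights))
                  for xs in positions])
    uil, *_ = np.linalg.lstsq(A, response, rcond=None)
    return uil
```

Once the UIL is known, the same linear model can be inverted in the other direction to estimate unknown loads from the measured response, which is the use the abstract describes.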

    Advances in Stereo Vision

    Stereopsis is a vision process whose geometrical foundation has been known for a long time, ever since the experiments by Wheatstone in the 19th century. Nevertheless, its inner workings in biological organisms, as well as its emulation by computer systems, have proven elusive, and stereo vision remains a very active and challenging area of research today. In this volume we have attempted to present a limited but relevant sample of the work being carried out in stereo vision, covering significant aspects both from the applied and from the theoretical standpoints.

    On Motion Analysis in Computer Vision with Deep Learning: Selected Case Studies

    Motion analysis is one of the essential enabling technologies in computer vision. Despite recent significant advances, image-based motion analysis remains a very challenging problem. This challenge arises because the motion features are extracted directly from a sequence of images without any other metadata. Extracting motion information (features) is inherently more difficult than in other computer vision disciplines. In a traditional approach, motion analysis is often formulated as an optimisation problem, with the motion model being hand-crafted to reflect our understanding of the problem domain. The critical element of these traditional methods is a prior assumption about the model of motion believed to represent a specific problem. A recent trend in data analytics is to replace hand-crafted prior assumptions with a model learned directly from observational data with no, or very limited, prior assumptions about that model. Although known for a long time, these approaches, based on machine learning, have been shown to be competitive only very recently due to advances in the so-called deep learning methodologies. This work's key aim has been to investigate novel approaches, utilising deep learning methodologies, for motion analysis where the motion model is learned directly from observed data. These new approaches have focused on investigating the deep network architectures suitable for the effective extraction of spatiotemporal information. Due to the volume and structure of the estimated motion parameters, it is frequently difficult or even impossible to obtain relevant ground truth data. The missing ground truth motivates unsupervised learning methodologies, which are usually a challenging choice for the already challenging high-dimensional motion representation of an image sequence.
The main challenge with unsupervised learning is to evaluate whether the algorithm can learn the data model directly from the data alone, without any prior knowledge presented to the deep learning model during training. In this project, an emphasis has been put on unsupervised learning approaches. Owing to the broad spectrum of computer vision problems and applications related to motion analysis, the research reported in the thesis has focused on three specific motion analysis challenges and corresponding practical case studies. These include motion detection and recognition, as well as 2D and 3D motion field estimation. Eyeblink quantification has been used as a case study for the motion detection and recognition problem. The approach proposed for this problem consists of a novel network architecture processing weakly corresponded images in an action completion regime with learned spatiotemporal image features fused using cascaded recurrent networks. The stereo-vision disparity estimation task has been selected as a case study for the 2D motion field estimation problem. The proposed method directly estimates occlusion maps using a novel convolutional neural network architecture that is trained with a custom-designed loss function in an unsupervised manner. The volumetric data registration task has been chosen as a case study for the 3D motion field estimation problem. The proposed solution is based on a 3D CNN, with a novel architecture featuring a Generative Adversarial Network used during training to improve network performance on unseen data. All the proposed networks demonstrated state-of-the-art performance compared to other corresponding methods reported in the literature on a number of assessment metrics. In particular, the proposed architecture for 3D motion field estimation has been shown to outperform the previously reported manual expert-guided registration methodology.
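The unsupervised training signal for disparity estimation, learning without ground-truth disparity by checking photometric consistency between the stereo views, can be sketched as follows. This is a generic photometric reconstruction loss, not the custom-designed loss of the thesis; names and conventions are illustrative assumptions.

```python
import numpy as np

def warp_right_to_left(right, disparity):
    # Reconstruct the left view by sampling right[y, x - d(y, x)]
    # with linear interpolation along each rectified scanline.
    h, w = right.shape
    xs = np.arange(w)[None, :] - disparity
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 2)
    a = np.clip(xs - x0, 0.0, 1.0)
    rows = np.arange(h)[:, None]
    return (1.0 - a) * right[rows, x0] + a * right[rows, x0 + 1]

def photometric_loss(left, right, disparity):
    # A correct disparity map makes the warped right image match the left
    # image, so the loss needs no ground-truth disparity labels.
    return float(np.mean(np.abs(left - warp_right_to_left(right, disparity))))
```

In a network this loss would be minimised by gradient descent over the predicted disparity map; here the NumPy version only illustrates why a better disparity gives a lower loss.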

    Towards Intelligent Telerobotics: Visualization and Control of Remote Robot

    Human-machine cooperative robotics, or co-robotics, has been recognized as the next generation of robotics. In contrast to current systems that use limited-reasoning strategies or address problems in narrow contexts, new co-robot systems will be characterized by their flexibility, resourcefulness, varied modeling or reasoning approaches, and use of real-world data in real time, demonstrating a level of intelligence and adaptability seen in humans and animals. My research focuses on two sub-fields of co-robotics: teleoperation and telepresence. We first explore ways of teleoperation using mixed reality techniques. I propose a new type of display: the hybrid-reality display (HRD) system, which utilizes a commodity projection device to project captured video frames onto a 3D replica of the actual target surface. It provides a direct alignment between the frame of reference of the human subject and that of the displayed image. The advantage of this approach lies in the fact that no wearable device is needed, providing minimal intrusiveness and accommodating users' eyes during focusing. The field of view is also significantly increased. From a user-centered design standpoint, the HRD is motivated by teleoperation accidents, incidents, and user research in military reconnaissance, among others. Teleoperation in these environments is compromised by the keyhole effect, which results from the limited field of view. The technical contribution of the proposed HRD system is the multi-system calibration, which mainly involves a motion sensor, projector, cameras and a robotic arm. Given the purpose of the system, the calibration accuracy must be within millimeter level. The follow-up research on the HRD focused on high-accuracy 3D reconstruction of the replica via commodity devices for better alignment of video frames. Conventional 3D scanners either lack depth resolution or are very expensive.
We propose a structured-light-scanning-based 3D sensing system with accuracy within 1 millimeter that is robust to global illumination and surface reflection. Extensive user studies demonstrate the performance of our proposed algorithm. To compensate for the lack of synchronization between the local and remote stations, due to latency introduced during data sensing and communication, a one-step-ahead predictive control algorithm is presented. The latency between human control and robot movement can be formulated as a group of linear equations with a smoothing coefficient ranging from 0 to 1. This predictive control algorithm can be further formulated by optimizing a cost function. We then explore the aspect of telepresence. Many hardware designs have been developed to allow a camera to be placed optically directly behind the screen. The purpose of such setups is to enable two-way video teleconferencing that maintains eye contact. However, the image from the see-through camera usually exhibits a number of imaging artifacts such as low signal-to-noise ratio, incorrect color balance, and loss of detail. We therefore develop a novel image enhancement framework that utilizes an auxiliary color+depth camera mounted on the side of the screen. By fusing the information from both cameras, we are able to significantly improve the quality of the see-through image. Experimental results demonstrate that our fusion method compares favorably against traditional image enhancement/warping methods that use only a single image.
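A minimal sketch of the one-step-ahead idea with a smoothing coefficient in [0, 1] is given below, under the assumption of damped linear extrapolation of the operator's commands. The thesis derives the full version by optimizing a cost function; the functions here are my own illustrative names, not its API.

```python
def predict_one_step(x_prev, x_curr, alpha):
    # Damped linear extrapolation one step ahead:
    #   x_hat = x_curr + alpha * (x_curr - x_prev),  alpha in [0, 1].
    # alpha = 0 keeps the current command; alpha = 1 fully extrapolates,
    # pre-compensating one sample of communication latency.
    return x_curr + alpha * (x_curr - x_prev)

def compensate_latency(commands, alpha):
    # Apply the predictor along a stream of operator commands.
    out = [commands[0]]
    for prev, curr in zip(commands, commands[1:]):
        out.append(predict_one_step(prev, curr, alpha))
    return out
```

On a constant-velocity command stream, full extrapolation (alpha = 1) predicts exactly the next command, while smaller alpha trades prediction against smoothness when the operator's input is noisy.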

    Non-contact vision-based deformation monitoring on bridge structures

    Information on deformation is an important metric for bridge condition and performance assessment, e.g. identifying abnormal events, calibrating bridge models and estimating load carrying capacities. However, accurate measurement of bridge deformation, especially for long-span bridges, remains a challenging task. The major aim of this research is to develop practical and cost-effective techniques for accurate deformation monitoring of bridge structures. Vision-based systems are taken as the study focus for a few reasons: low cost, easy installation, desirable sample rates, and remote and distributed sensing. This research proposes a custom-developed vision-based system for bridge deformation monitoring. The system supports either consumer-grade or professional cameras and incorporates four advanced video tracking methods to adapt to different test situations. The sensing accuracy is first quantified in laboratory conditions. The working performance in field testing is evaluated on one short-span and one long-span bridge, considering several influential factors, i.e. long-range sensing, low-contrast target patterns, pattern changes and lighting changes. Through case studies, some suggestions about tracking method selection are summarised for field testing. Possible limitations of vision-based systems are illustrated as well. To overcome the observed limitations of vision-based systems, this research further proposes a mixed system combining cameras with accelerometers for accurate deformation measurement. To integrate displacement with acceleration data autonomously, a novel data fusion method based on Kalman filtering and maximum likelihood estimation is proposed. Through field test validation, the method is effective in improving displacement accuracy and widening the frequency bandwidth. The mixed system based on data fusion is implemented in field testing of a railway bridge considering undesirable test conditions (e.g.
low-contrast target patterns and camera shake). Analysis results indicate that the system offers higher accuracy than a camera alone and is viable for bridge influence line estimation. With considerable accuracy and resolution in the time and frequency domains, the potential of vision-based measurement for vibration monitoring is investigated. The proposed vision-based system is applied to a cable-stayed footbridge for deck deformation and cable vibration measurement under pedestrian loading. Analysis results indicate that the measured data enable accurate estimation of modal frequencies and could be used to investigate variations of modal frequencies under varying pedestrian loads. The vision-based system in this application is used for multi-point vibration measurement and provides results comparable to those obtained using an array of accelerometers.
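The displacement-acceleration fusion idea can be sketched with a textbook Kalman filter over a [position, velocity] state: the accelerometer drives the prediction, and the camera displacement corrects drift. This is a minimal sketch only; the thesis additionally tunes the noise parameters by maximum likelihood estimation, and all names and default values here are assumptions.

```python
import numpy as np

def fuse_displacement_acceleration(disp, accel, dt, q=0.01, r=0.0025):
    """Fuse camera displacement (measurement) with accelerometer data
    (process input). q: assumed acceleration-noise intensity;
    r: assumed displacement measurement-noise variance."""
    F = np.array([[1.0, dt], [0.0, 1.0]])       # constant-velocity transition
    B = np.array([0.5 * dt * dt, dt])           # acceleration input mapping
    H = np.array([1.0, 0.0])                    # camera observes position only
    Q = q * np.array([[dt**3 / 3, dt**2 / 2],
                      [dt**2 / 2, dt]])         # discretised process noise
    x = np.zeros(2)
    P = np.eye(2)
    out = np.empty(len(disp))
    for k, (z, a) in enumerate(zip(disp, accel)):
        x = F @ x + B * a                       # predict with accelerometer
        P = F @ P @ F.T + Q
        s = P[0, 0] + r                         # innovation variance
        K = P[:, 0] / s                         # gain (H selects column 0)
        x = x + K * (z - x[0])                  # correct with camera reading
        P = (np.eye(2) - np.outer(K, H)) @ P
        out[k] = x[0]
    return out
```

Because the accelerometer contributes high-frequency information and the camera anchors the low-frequency drift, the fused displacement is both more accurate and wider-band than either sensor alone, which matches the improvement the abstract reports.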

    Robust airborne 3D visual simultaneous localisation and mapping

    The aim of this thesis is to present robust solutions to technical problems of airborne three-dimensional (3D) Visual Simultaneous Localisation And Mapping (VSLAM). These solutions are developed based on a stereovision system available onboard Unmanned Aerial Vehicles (UAVs). The proposed airborne VSLAM enables unmanned aerial vehicles to construct a reliable map of an unknown environment and localise themselves within this map without any user intervention. Current research challenges related to airborne VSLAM include visual processing through invariant feature detectors/descriptors, efficient mapping of large environments, and cooperative navigation and mapping of complex environments. Most of these challenges require scalable representations, robust data association algorithms, consistent estimation techniques, and fusion of different sensor modalities. To deal with these challenges, seven chapters are presented in this thesis as follows: Chapter 1 introduces UAVs, definitions, current challenges and different applications. Next, in Chapter 2 we present the main sensors used by UAVs during navigation. Chapter 3 presents an important task for autonomous navigation: UAV localisation. In this chapter, some robust and optimal approaches for data fusion are proposed with performance analysis. After that, UAV map building is presented in Chapter 4. The latter is divided into three parts. In the first part, a new alternative imaging technique is proposed to extract and match a suitable number of invariant features. The second part presents an image mosaicing algorithm followed by a super-resolution approach. In the third part, we propose a new feature detector and descriptor that is fast and robust and detects a suitable number of features to solve the VSLAM problem. A complete airborne VSLAM solution based on a stereovision system is presented in Chapter 5.
Robust data association filters with consistency and observability analysis are presented in this chapter as well. The proposed algorithm is validated with loop closing detection and map management using experimental data. The airborne VSLAM is then extended to the multiple-UAV case in Chapter 6. This chapter presents two architectures of cooperation: a centralised and a decentralised one. The former provides optimal precision in terms of UAV positions and the constructed map, while the latter is more suitable for real-time and embedded system applications. Finally, conclusions and future work are presented in Chapter 7.
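The stereovision system at the heart of the airborne VSLAM recovers metric landmarks from image measurements. A hedged sketch of the standard rectified-stereo relations is given below; this is textbook geometry, not the thesis' implementation, and the parameter names are assumptions.

```python
import numpy as np

def triangulate_depth(disparity, focal_px, baseline_m):
    # Rectified stereo: depth Z = f * B / d; zero disparity means the
    # point is effectively at infinity and cannot be triangulated.
    d = np.asarray(disparity, dtype=float)
    depth = np.full(d.shape, np.inf)
    valid = d > 0
    depth[valid] = focal_px * baseline_m / d[valid]
    return depth

def landmark_from_pixel(u, v, depth, fx, fy, cx, cy):
    # Back-project an image feature into a 3D landmark in the camera
    # frame: the map primitive a stereo VSLAM front end feeds the filter.
    return np.array([(u - cx) * depth / fx, (v - cy) * depth / fy, depth])
```

Such landmarks, expressed in the body frame and transformed by the estimated UAV pose, are what the data association and map management steps described above operate on.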

    Photogrammetric suite to manage the survey workflow in challenging environments and conditions

    The present work is intended to provide new and innovative instruments to support the photogrammetric survey workflow during all its phases. A suite of tools has been conceived to manage the planning, acquisition, post-processing and restitution steps, with particular attention to the rigorousness of the approach and to the final precision. The main focus of the research has been the implementation of the tool MAGO, standing for Adaptive Mesh for Orthophoto Generation. Its novelty consists in the possibility of automatically reconstructing "unrolled" orthophotos of adjacent façades of a building using the point cloud, instead of the mesh, as the input source for the orthophoto reconstruction. The second tool has been conceived as a photogrammetric procedure based on Bundle Block Adjustment. The same issue is analysed from two mirrored perspectives: on the one hand, the use of moving cameras in a static scenario to manage real-time indoor navigation; on the other hand, the use of static cameras in a moving scenario to achieve the simultaneous reconstruction of the 3D model of the changing object. A third tool named U.Ph.O., standing for Unmanned Photogrammetric Office, has been integrated with a new module. The general aim is, on the one hand, to plan the photogrammetric survey considering the expected precision, computed on the basis of a network simulation, and on the other hand, to check whether the achieved survey has been collected compatibly with the planned conditions. The provided integration concerns the treatment of surfaces with a generic orientation, beyond those with a planimetric development. After a brief introduction, a general description of the photogrammetric principles is given in the first chapter of the dissertation; a chapter follows on the parallelism between photogrammetry and computer vision and the contribution of the latter to the development of the described tools.
The third chapter specifically concerns the implemented software and tools, while the fourth contains the training tests and the validation. Finally, conclusions and future perspectives are reported.
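The Bundle Block Adjustment underlying the second tool minimises reprojection residuals derived from the collinearity equations. A minimal sketch under a simple pinhole convention follows; the names and the camera parameterisation are my own illustrative assumptions, not the tool's actual interface.

```python
import numpy as np

def project(point_w, R, t, f, cx, cy):
    # Collinearity equations: world point -> image coordinates.
    # R rotates world axes into camera axes; t is the projection centre.
    pc = R @ (np.asarray(point_w, dtype=float) - t)
    return np.array([cx + f * pc[0] / pc[2], cy + f * pc[1] / pc[2]])

def reprojection_residuals(points, cameras, observations):
    # Stacked image residuals that a bundle block adjustment minimises
    # jointly over camera parameters and object points.
    res = []
    for cam_i, pt_j, uv in observations:
        R, t, f, cx, cy = cameras[cam_i]
        res.append(project(points[pt_j], R, t, f, cx, cy) - np.asarray(uv))
    return np.concatenate(res)
```

A nonlinear least-squares solver iterating on these residuals yields both the adjusted camera poses used for navigation and the refined object points used for reconstruction, which is why the same formulation serves both of the mirrored perspectives described above.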