668 research outputs found

    Markerless deformation capture of hoverfly wings using multiple calibrated cameras

    This thesis introduces an algorithm for the automated deformation capture of hoverfly wings from multiple camera image sequences. The algorithm is capable of extracting dense surface measurements, without the aid of fiducial markers, over an arbitrary number of wingbeats of hovering flight and requires limited manual initialisation. A novel motion prediction method, called the ‘normalised stroke model’, makes use of the similarity of adjacent wing strokes to predict wing keypoint locations, which are then iteratively refined in a stereo image registration procedure. Outlier removal, wing fitting and further refinement using independently reconstructed boundary points complete the algorithm. It was tested on two hovering data sets, as well as a challenging flight manoeuvre. By comparing the 3D positions of keypoints extracted from these surfaces with those resulting from manual identification, the accuracy of the algorithm is shown to approach that of a fully manual approach. In particular, half of the algorithm-extracted keypoints were within 0.17 mm of manually identified keypoints, approximately equal to the error of the manual identification process. This algorithm is unique among purely image-based flapping flight studies in the level of automation it achieves, and its generality would make it applicable to wing tracking of other insects.
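
    The accuracy figure quoted in this abstract ("half of the algorithm-extracted keypoints were within 0.17 mm") reads as a median of per-keypoint 3D distances. A minimal sketch of that kind of statistic follows; the function name and the sample coordinates are hypothetical and are not taken from the thesis.

```python
import math
from statistics import median


def median_keypoint_error(auto_points, manual_points):
    """Median Euclidean distance between paired 3D keypoints.

    Hypothetical helper: both arguments are equal-length sequences of
    (x, y, z) tuples in the same units (e.g. millimetres).
    """
    if len(auto_points) != len(manual_points):
        raise ValueError("keypoint lists must have the same length")
    distances = [math.dist(a, m) for a, m in zip(auto_points, manual_points)]
    return median(distances)


# Illustrative (made-up) coordinates in millimetres.
auto = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
manual = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.3), (0.0, 2.0, 0.2)]
error_mm = median_keypoint_error(auto, manual)
```

    By definition of the median, half of the keypoints lie within the returned distance of their manually identified counterparts.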

    Review of machine-vision based methodologies for displacement measurement in civil structures

    This is the author accepted manuscript. The final version is available from Springer Verlag via the DOI in this record. Vision-based systems are promising tools for displacement measurement in civil structures, with advantages over traditional displacement sensors in instrumentation cost, installation effort, and measurement capacity in terms of frequency range and spatial resolution. Approximately one hundred papers on this subject have appeared to date, investigating topics such as system development and improvement, viability in field applications, and the potential for structural condition assessment. The main contribution of this paper is a literature review of vision-based displacement measurement from the perspectives of methodologies and applications. Video processing procedures are summarised as a three-component framework: camera calibration, target tracking and structural displacement calculation. Methods for each component are presented in principle, with discussion of their relative advantages and limitations. Applications in the two most active fields, bridge deformation and cable vibration measurement, are examined, followed by a summary of field challenges observed in monitoring tests. Important gaps requiring further investigation are presented, e.g. robust tracking methods, non-contact sensing and measurement accuracy evaluation in field conditions.

    Non-contact vision-based deformation monitoring on bridge structures

    Information on deformation is an important metric for bridge condition and performance assessment, e.g. for identifying abnormal events, calibrating bridge models and estimating load-carrying capacities. However, accurate measurement of bridge deformation, especially for long-span bridges, remains a challenging task. The major aim of this research is to develop practical and cost-effective techniques for accurate deformation monitoring on bridge structures. Vision-based systems are taken as the study focus for several reasons, including low cost, easy installation, suitable sample rates, and remote and distributed sensing. This research proposes a custom-developed vision-based system for bridge deformation monitoring. The system supports either consumer-grade or professional cameras and incorporates four advanced video tracking methods to adapt to different test situations. The sensing accuracy is first quantified in laboratory conditions. The working performance in field testing is evaluated on one short-span and one long-span bridge, considering several influential factors, i.e. long-range sensing, low-contrast target patterns, pattern changes and lighting changes. Through case studies, some suggestions about tracking method selection are summarised for field testing. Possible limitations of vision-based systems are illustrated as well. To overcome the observed limitations of vision-based systems, this research further proposes a mixed system combining cameras with accelerometers for accurate deformation measurement. To integrate displacement with acceleration data autonomously, a novel data fusion method based on the Kalman filter and maximum likelihood estimation is proposed. Field test validation shows that the method is effective for improving displacement accuracy and widening frequency bandwidth. The mixed system based on data fusion is implemented in field testing of a railway bridge under adverse test conditions (e.g. low-contrast target patterns and camera shake). Analysis results indicate that the system offers higher accuracy than using a camera alone and is viable for bridge influence line estimation. With considerable accuracy and resolution in time and frequency domains, the potential of vision-based measurement for vibration monitoring is investigated. The proposed vision-based system is applied on a cable-stayed footbridge for deck deformation and cable vibration measurement under pedestrian loading. Analysis results indicate that the measured data enable accurate estimation of modal frequencies and could be used to investigate variations of modal frequencies under varying pedestrian loads. The vision-based system in this application is used for multi-point vibration measurement and provides results comparable to those obtained using an array of accelerometers.
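
    The abstract above mentions fusing camera displacement with accelerometer data through a Kalman filter. The following is only a generic textbook sketch of that idea, not the thesis's method: it assumes both signals share one sample rate, uses a two-state (displacement, velocity) model driven by the measured acceleration, and fixes the noise variances `q` and `r` by hand, whereas the thesis estimates parameters by maximum likelihood. All names are hypothetical.

```python
def fuse_displacement_acceleration(displacements, accelerations, dt, q=1e-2, r=1e-4):
    """Return a list of (displacement, velocity) estimates, one per sample.

    Hypothetical helper. q is an assumed acceleration-noise variance and r an
    assumed displacement-noise variance; both are hand-picked, not estimated.
    """
    d, v = 0.0, 0.0                            # state estimate
    p00, p01, p10, p11 = 1.0, 0.0, 0.0, 1.0    # 2x2 state covariance
    estimates = []
    for z, a in zip(displacements, accelerations):
        # Predict: integrate the measured acceleration over one time step.
        d, v = d + v * dt + 0.5 * a * dt * dt, v + a * dt
        p00, p01, p10, p11 = (
            p00 + dt * (p01 + p10) + dt * dt * p11 + q * dt ** 4 / 4.0,
            p01 + dt * p11 + q * dt ** 3 / 2.0,
            p10 + dt * p11 + q * dt ** 3 / 2.0,
            p11 + q * dt * dt,
        )
        # Update: correct the prediction with the camera displacement z.
        s = p00 + r                            # innovation variance
        k0, k1 = p00 / s, p10 / s              # Kalman gain
        innovation = z - d
        d, v = d + k0 * innovation, v + k1 * innovation
        p00, p01, p10, p11 = (
            (1.0 - k0) * p00,
            (1.0 - k0) * p01,
            p10 - k1 * p00,
            p11 - k1 * p01,
        )
        estimates.append((d, v))
    return estimates
```

    In this sketch the accelerometer drives the prediction step and the camera corrects the accumulated drift, which is one common way to combine the two sensor types.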

    Augmented reality over maps

    Integrated master's dissertation in Informatics Engineering. Maps and Geographic Information Systems (GIS) play a major role in modern society, particularly in tourism, navigation and personal guidance. However, providing geographical information of interest in response to individual queries remains a strenuous task. The main constraints are (1) the several information scales available, (2) the large amount of information available on each scale, and (3) the difficulty of directly inferring a meaningful geographical context from the text, pictures, or diagrams used by most user-aiding systems. To overcome these difficulties, we develop a solution which allows visual information to be overlaid on the maps being queried, a method commonly referred to as Augmented Reality (AR). The objective of this dissertation is therefore the research and implementation of a method for delivering visual cartographic information over physical (analogue) and digital two-dimensional (2D) maps using AR. We review existing state-of-the-art solutions and outline their limitations across different use cases. Afterwards, we provide a generic modular solution for a multitude of real-life applications, such as museums, fairs, expositions, and public street maps. During the development phase, we take into consideration the trade-off between speed and accuracy in order to develop an accurate, real-time solution. Finally, we demonstrate the feasibility of our methods with an application to a real use case based on a map of the city of Oporto, in Portugal.

    Real-Time Multi-Fisheye Camera Self-Localization and Egomotion Estimation in Complex Indoor Environments

    In this work a real-time capable multi-fisheye camera self-localization and egomotion estimation framework is developed. The thesis covers all aspects ranging from omnidirectional camera calibration to the development of a complete multi-fisheye camera SLAM system based on a generic multi-camera bundle adjustment method

    Development of an active vision system for robot inspection of complex objects

    Integrated master's dissertation in Mechanical Engineering (specialisation in Mechatronic Systems). The dissertation presented here is in the scope of the IntVis4Insp project between the University of Minho and the company Neadvance. It focuses on the development of a 3D hand tracking system that must be capable of extracting the hand position and orientation to prepare a manipulator for automatic inspection of leather pieces. This work starts with a literature review of the two main methods for collecting the data necessary to perform 3D hand tracking: glove-based methods and vision-based methods. The former work with some kind of support mounted on the hand that holds all the sensors needed to measure the desired parameters, while the latter rely on one or more cameras to capture the hands and track their position and configuration through computer vision algorithms. The method selected for this work was the vision-based method OpenPose. For each recorded image, this application can locate 21 keypoints on each hand that together form a skeleton of the hands. This application is used in the tracking system developed throughout this dissertation. Its information is used in a more complete pipeline where the location of those hand keypoints is crucial to track the hands in videos of the demonstrated movements. These videos were recorded with an RGB-D camera, the Microsoft Kinect, which provides a depth value for every RGB pixel recorded. With the depth information and the 2D location of the hand keypoints in the images, it was possible to obtain the 3D world coordinates of these points considering the pinhole camera model. To define the hand position, one point is selected among the 21 for each hand; for the hand orientation, however, it was necessary to develop an auxiliary method called “Iterative Pose Estimation Method” (ITP), which estimates the complete 3D pose of the hands. This method relies only on the 2D locations of every hand keypoint and the complete 3D world coordinates of the wrists to estimate the correct 3D world coordinates of all the remaining points on the hand. This solution solves the problems related to hand occlusions, which are prone to happen because only one camera is used to record the inspection videos. Once the world location of all the points in the hands is accurately estimated, their orientation can be defined by selecting three points forming a plane.
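
    Two steps named in this abstract, recovering a 3D point from a pixel and its depth under the pinhole camera model, and defining an orientation from three points that form a plane, can be sketched as below. This is an illustrative sketch only: the function names and the intrinsic parameters (`fx`, `fy`, `cx`, `cy`) are assumptions, the result is expressed in the camera frame (treated here as the world frame), and it is not the dissertation's ITP method.

```python
import math


def backproject(u, v, depth, fx, fy, cx, cy):
    """Map pixel (u, v) with depth Z to a camera-frame point (X, Y, Z).

    Pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    fx, fy (focal lengths in pixels) and cx, cy (principal point) are
    assumed intrinsics, not values from the dissertation.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def plane_normal(p0, p1, p2):
    """Unit normal of the plane through three non-collinear 3D points."""
    a = [p1[i] - p0[i] for i in range(3)]
    b = [p2[i] - p0[i] for i in range(3)]
    # Cross product a x b is perpendicular to the plane.
    n = (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )
    length = math.sqrt(n[0] ** 2 + n[1] ** 2 + n[2] ** 2)
    if length == 0.0:
        raise ValueError("points are collinear; no unique plane")
    return (n[0] / length, n[1] / length, n[2] / length)


# Illustrative (made-up) intrinsics and keypoint pixels with depth in metres.
fx = fy = 500.0
cx, cy = 320.0, 240.0
wrist = backproject(320.0, 240.0, 1.0, fx, fy, cx, cy)
index_base = backproject(420.0, 240.0, 1.0, fx, fy, cx, cy)
pinky_base = backproject(320.0, 340.0, 1.0, fx, fy, cx, cy)
palm_normal = plane_normal(wrist, index_base, pinky_base)
```

    The choice of which three keypoints define the palm plane is arbitrary in this sketch; the abstract only states that three points forming a plane are selected.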

    Real-Time Salient Closed Boundary Tracking via Line Segments Perceptual Grouping

    This paper presents a novel real-time method for tracking salient closed boundaries from video image sequences. This method operates on a set of straight line segments that are produced by line detection. The tracking scheme is coherently integrated into a perceptual grouping framework in which the visual tracking problem is tackled by identifying a subset of these line segments and connecting them sequentially to form a closed boundary with the largest saliency and a certain similarity to the previous one. Specifically, we define a new tracking criterion which combines a grouping cost and an area similarity constraint. The proposed criterion makes the resulting boundary tracking more robust to local minima. To achieve real-time tracking performance, we use Delaunay Triangulation to build a graph model with the detected line segments and then reduce the tracking problem to finding the optimal cycle in this graph. This is solved by our newly proposed closed boundary candidates searching algorithm called "Bidirectional Shortest Path (BDSP)". The efficiency and robustness of the proposed method are tested on real video sequences as well as during a robot arm pouring experiment. Comment: 7 pages, 8 figures, The 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2017) submission ID 103

    Proceedings of the 2020 Joint Workshop of Fraunhofer IOSB and Institute for Anthropomatics, Vision and Fusion Laboratory

    In 2020, the annual workshop of Fraunhofer IOSB and the Lehrstuhl für Interaktive Echtzeitsysteme (Chair for Interactive Real-Time Systems) took place. From 27 to 31 July, the doctoral students of the two institutes presented the state of their research on topics such as AI, machine learning, computer vision, usage control and metrology. The results of these talks are collected in this volume as technical reports.

    The application of range imaging for improved local feature representations

    Get PDF
    This thesis presents an investigation into the integration of information extracted from co-aligned range and intensity images to achieve pose invariant object recognition. Local feature matching is a fundamental technique in image analysis that underpins many computer vision-based applications; the approach comprises identifying a collection of interest points in an image, characterising the local image region surrounding each interest point by means of a descriptor, and matching these descriptors between example images. Such local feature descriptors are formed from a measure of the local image statistics in the region surrounding the interest point. The interest point locations and the means of measuring local image statistics should be chosen such that the resultant descriptor remains stable across a range of common image transformations. Recently the availability of low cost, high quality range imaging devices has motivated an interest in local feature extraction from range images. It has been widely assumed in the vision community that the range imaging domain has properties which remain quasi-invariant through a wide range of changes in illumination and pose. Accordingly, it has been suggested that local feature extraction in the range domain should allow the calculation of local feature descriptors that are potentially more robust than those calculated from the intensity imaging domain alone. However, range images represent differing characteristics from those represented within intensity images, which are frequently used, independently from range images, to create robust local features. Therefore, this work attempts to establish the best means of combining information from these two imaging modalities to further increase the reliability of matching local features. Local feature extraction comprises a series of processes applied to an image location such that a collection of repeatable descriptors can be established. 
By using co-aligned range and intensity images this work investigates the choice of modality and method for each step in the extraction process as an approach to optimising the resulting descriptor. Additionally, multimodal features are formed by combining information from both domains in a single stage in the extraction process. To further improve the quality of feature descriptors, a calculation of the surface normals and a use of the 3D structure from the range image are applied to correct the 3D appearance of a local sample patch, thereby increasing the similarity between observations. The matching performance of local features is evaluated using an experimental setup comprising a turntable and a stereo pair of cameras. This experimental setup is used to create a database of intensity and range images for 5 objects imaged at 72 calibrated viewpoints, creating a database of 360 object observations. The use of a calibrated turntable in combination with the 3D object surface coordinates supplied by the range image allows location correspondences between object observations to be established, and therefore descriptor matches to be labelled as either true positive or false positive. Applying this methodology to the formulated local features shows that two approaches demonstrate state-of-the-art performance, with a ~40% increase in area under the ROC curve at a False Positive Rate of 10% when compared with standard SIFT. These approaches are range affine corrected intensity SIFT and element corrected surface gradients SIFT. Furthermore, this work uses the 3D structure encoded in the range image to organise collections of interest points from a series of observations into a collection of canonical views in a new model local feature. The canonical views for an interest point are stored in a view-compartmentalised structure which allows the appearance of a local interest point to be characterised across the view sphere. 
Each canonical view is assigned a confidence measure based on the 3D pose of the interest point at observation; this confidence measure is then used to match similar canonical views of model and query interest points, thereby achieving a pose invariant interest point description. This approach does not produce a statistically significant performance increase. However, it does contribute a validated methodology for combining multiple descriptors with differing confidence weightings into a single keypoint.