Markerless deformation capture of hoverfly wings using multiple calibrated cameras
This thesis introduces an algorithm for the automated deformation capture of hoverfly
wings from multiple camera image sequences. The algorithm is capable of extracting
dense surface measurements, without the aid of fiducial markers, over an arbitrary number
of wingbeats of hovering flight and requires limited manual initialisation. A novel motion
prediction method, called the ‘normalised stroke model’, makes use of the similarity of adjacent
wing strokes to predict wing keypoint locations, which are then iteratively refined in
a stereo image registration procedure. Outlier removal, wing fitting and further refinement
using independently reconstructed boundary points complete the algorithm. It was tested
on two hovering data sets, as well as a challenging flight manoeuvre. By comparing the
3-d positions of keypoints extracted from these surfaces with those resulting from manual
identification, the accuracy of the algorithm is shown to approach that of a fully manual
approach. In particular, half of the algorithm-extracted keypoints were within 0.17mm of
manually identified keypoints, approximately equal to the error of the manual identification
process. This algorithm is unique among purely image based flapping flight studies in the
level of automation it achieves, and its generality would make it applicable to wing tracking
of other insects
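The 'normalised stroke model' exploits the similarity of adjacent wing strokes to predict keypoint locations. A minimal sketch of phase-normalised prediction is given below; the function name, the linear interpolation, and the known-period assumption are ours, not the thesis's implementation:

```python
import numpy as np

def predict_keypoints(prev_stroke, prev_times, t_query, period):
    """Predict keypoint positions at time t_query by looking up the
    corresponding phase of the previous wing stroke.

    prev_stroke : (N, K, 3) array of K 3-d keypoints over N frames
    prev_times  : (N,) frame times of the previous stroke
    t_query     : time in the current stroke
    period      : wingbeat period used to normalise time to [0, 1)
    """
    # Normalise all times to stroke phase in [0, 1)
    phase_prev = (prev_times % period) / period
    phase_q = (t_query % period) / period
    # Linearly interpolate each keypoint coordinate at the query phase
    order = np.argsort(phase_prev)
    pred = np.empty(prev_stroke.shape[1:])
    for k in range(prev_stroke.shape[1]):
        for d in range(3):
            pred[k, d] = np.interp(phase_q, phase_prev[order],
                                   prev_stroke[order, k, d])
    return pred
```

In the thesis the predicted locations are only a starting point; they are then iteratively refined by stereo image registration.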
Review of machine-vision based methodologies for displacement measurement in civil structures
This is the author accepted manuscript; the final version is available from Springer Verlag via the DOI in this record.
Vision-based systems are promising tools for displacement measurement in civil structures, possessing advantages over traditional displacement sensors in instrumentation cost, installation effort and measurement capacity in terms of frequency range and spatial resolution. Approximately one hundred papers to date have appeared on this subject, investigating topics such as system development and improvement, viability in field applications and the potential for structural condition assessment. The main contribution of this paper is a literature review of vision-based displacement measurement from the perspectives of methodologies and applications. Video processing procedures are summarised as a three-component framework: camera calibration, target tracking and structural displacement calculation. Methods for each component are presented in principle, with discussion of their relative advantages and limitations. Applications in the two most active fields, bridge deformation and cable vibration measurement, are examined, followed by a summary of challenges observed in field monitoring tests. Important gaps requiring further investigation are identified, e.g. robust tracking methods, non-contact sensing and measurement accuracy evaluation in field conditions.
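Of the three components, the displacement-calculation step is often reduced, when the camera axis is roughly perpendicular to the motion plane, to a single scale factor obtained from a target of known physical size. A minimal sketch follows; the function name and the planar-motion assumption are ours:

```python
import numpy as np

def pixel_to_displacement(pixel_track, known_size_mm, known_size_px):
    """Convert a tracked target's pixel trajectory to physical displacement.

    A common shortcut when the camera axis is roughly perpendicular to the
    motion plane: a scale factor (mm per pixel) derived from a target of
    known physical size replaces full camera calibration.
    """
    scale = known_size_mm / known_size_px          # mm per pixel
    track = np.asarray(pixel_track, dtype=float)
    return (track - track[0]) * scale              # displacement from frame 0
```

Full projective calibration is needed instead when the camera views the structure at an oblique angle or the motion is not planar.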
Non-contact vision-based deformation monitoring on bridge structures
Information on deformation is an important metric for bridge condition and performance assessment, e.g. identifying abnormal events, calibrating bridge models and estimating load carrying capacities. However, accurate measurement of bridge deformation, especially for long-span bridges, remains a challenging task. The major aim of this research is to develop practical and cost-effective techniques for accurate deformation monitoring on bridge structures. Vision-based systems are taken as the study focus for several reasons: low cost, easy installation, adequate sample rates, and remote and distributed sensing.
This research proposes a custom-developed vision-based system for bridge deformation monitoring. The system supports either consumer-grade or professional cameras and incorporates four advanced video tracking methods to adapt to different test situations. The sensing accuracy is first quantified in laboratory conditions. The working performance in field testing is evaluated on one short-span and one long-span bridge, considering several influential factors, i.e. long-range sensing, low-contrast target patterns, pattern changes and lighting changes. Through case studies, some suggestions about tracking method selection are summarised for field testing. Possible limitations of vision-based systems are illustrated as well.
To overcome the observed limitations of vision-based systems, this research further proposes a mixed system combining cameras with accelerometers for accurate deformation measurement. To integrate displacement with acceleration data autonomously, a novel data fusion method based on a Kalman filter and maximum likelihood estimation is proposed. Field test validation shows the method is effective for improving displacement accuracy and widening frequency bandwidth. The mixed system based on data fusion is implemented in field testing of a railway bridge under adverse test conditions (e.g. low-contrast target patterns and camera shake). Analysis results indicate that the system offers higher accuracy than using a camera alone and is viable for bridge influence line estimation.
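A common form of such displacement-acceleration fusion is a Kalman filter in which measured acceleration drives a position-velocity process model and camera displacement supplies the correction. The sketch below is a generic illustration under assumed noise levels, not the thesis's maximum-likelihood-tuned method:

```python
import numpy as np

def fuse(disp, acc, dt, q=1e-3, r=1e-2):
    """Fuse camera displacement with accelerometer data via a Kalman filter.

    State [position, velocity]; measured acceleration drives the process
    model, camera displacement corrects it. Noise levels q, r are assumed.
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])      # state transition
    B = np.array([0.5 * dt**2, dt])            # acceleration input
    H = np.array([[1.0, 0.0]])                 # displacement observation
    Q = q * np.eye(2)
    R = np.array([[r]])
    x = np.zeros(2)
    P = np.eye(2)
    out = []
    for z, a in zip(disp, acc):
        # predict using the measured acceleration as a control input
        x = F @ x + B * a
        P = F @ P @ F.T + Q
        # update with the camera displacement measurement
        y = z - H @ x
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + (K @ y).ravel()
        P = (np.eye(2) - K @ H) @ P
        out.append(x[0])
    return np.array(out)
```

The acceleration input extends the usable bandwidth above the camera's frame rate, while the displacement updates suppress the drift that double-integrating acceleration alone would accumulate.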
With considerable accuracy and resolution in the time and frequency domains, the potential of vision-based measurement for vibration monitoring is investigated. The proposed vision-based system is applied to a cable-stayed footbridge for deck deformation and cable vibration measurement under pedestrian loading. Analysis results indicate that the measured data enable accurate estimation of modal frequencies and could be used to investigate variations of modal frequencies under varying pedestrian loads. The vision-based system in this application is used for multi-point vibration measurement and provides results comparable to those obtained using an array of accelerometers.
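Modal frequency estimation from such displacement records can, in the simplest case, be reduced to peak-picking on the amplitude spectrum. A minimal sketch follows; practical modal identification would add windowing, averaging, or subspace methods:

```python
import numpy as np

def dominant_frequency(signal, fs):
    """Estimate the dominant modal frequency of a vibration record by
    picking the peak of its one-sided amplitude spectrum."""
    sig = np.asarray(signal, dtype=float)
    sig = sig - sig.mean()                    # remove the static offset
    spec = np.abs(np.fft.rfft(sig))
    freqs = np.fft.rfftfreq(sig.size, d=1.0 / fs)
    return freqs[np.argmax(spec)]
```

The frequency resolution is fs/N, so longer records sharpen the estimate; this is one reason the long, high-rate camera records are attractive for tracking frequency variations under changing pedestrian loads.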
Augmented reality over maps
Integrated master's dissertation in Informatics Engineering.
Maps and Geographic Information Systems (GIS) play a major role in modern society,
particularly in tourism, navigation and personal guidance. However, providing geographical
information of interest in response to individual queries remains a strenuous task. The main
constraints are (1) the several information scales available, (2) the large amount of information
available at each scale, and (3) the difficulty of directly inferring a meaningful geographical
context from the text, pictures, or diagrams used by most user-aiding systems. To overcome
these difficulties, we develop a solution which allows the overlay of visual information
on the maps being queried, a method commonly referred to as Augmented Reality (AR).
With that in mind, the object of this dissertation is the research and implementation of a
method for the delivery of visual cartographic information over physical (analogue) and
digital two-dimensional (2D) maps utilizing AR. We review existing state-of-the-art solutions and
outline their limitations across different use cases. Afterwards, we provide a generic modular
solution for a multitude of real-life applications, to name a few: museums, fairs, expositions,
and public street maps. During the development phase, we take into consideration the
trade-off between speed and accuracy in order to develop an accurate and real-time solution.
Finally, we demonstrate the feasibility of our methods with an application on a real use case
based on a map of the city of Oporto, in Portugal.
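Overlaying information on a 2D map, whether analogue or digital, typically hinges on estimating the plane-to-image homography from point correspondences. A minimal DLT sketch is given below; the four-point setup and function names are illustrative, not the dissertation's pipeline:

```python
import numpy as np

def homography(src, dst):
    """Direct Linear Transform: estimate the 3x3 homography mapping four
    (or more) map-plane points to their observed image locations."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the right null vector of the stacked constraints
    _, _, Vt = np.linalg.svd(np.array(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def project(H, pt):
    """Map a map-plane point into the camera image with H."""
    p = H @ np.array([pt[0], pt[1], 1.0])
    return p[:2] / p[2]
```

Once H is known, any AR annotation anchored at map coordinates can be warped into the live camera view; real systems re-estimate H per frame from tracked features, usually with a robust estimator such as RANSAC.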
Real-Time Multi-Fisheye Camera Self-Localization and Egomotion Estimation in Complex Indoor Environments
In this work, a real-time capable multi-fisheye camera self-localization and egomotion estimation framework is developed. The thesis covers all aspects ranging from omnidirectional camera calibration to the development of a complete multi-fisheye camera SLAM system based on a generic multi-camera bundle adjustment method.
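Omnidirectional calibration of fisheye lenses is commonly built on an angular projection model such as the equidistant model r = f·θ. A minimal sketch follows; the specific model choice is an assumption, not necessarily the one used in the thesis:

```python
import numpy as np

def project_equidistant(X, f, c):
    """Project a 3-d camera-frame point with an equidistant fisheye model:
    image radius r = f * theta, where theta is the angle from the optical
    axis. f is the focal length (pixels), c the principal point."""
    x, y, z = X
    theta = np.arctan2(np.hypot(x, y), z)   # angle from the optical axis
    phi = np.arctan2(y, x)                  # azimuth in the image plane
    r = f * theta
    return np.array([c[0] + r * np.cos(phi), c[1] + r * np.sin(phi)])
```

Unlike the pinhole model, this mapping stays finite as θ approaches 90°, which is what lets a fisheye lens image a hemispherical field of view; bundle adjustment then minimises reprojection error under this model across all cameras.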
Development of an active vision system for robot inspection of complex objects
Integrated master's dissertation in Mechanical Engineering (specialisation in Mechatronic Systems).
The dissertation presented here falls within the scope of the IntVis4Insp project between the University of Minho
and the company Neadvance. It focuses on the development of a 3D hand tracking system that must be
capable of extracting the hand position and orientation to prepare a manipulator for automatic inspection
of leather pieces.
This work starts with a literature review of the two main approaches for collecting the data needed to
perform 3D hand tracking: glove-based methods and vision-based methods. The former
rely on some kind of support mounted on the hand that holds the sensors needed to
measure the desired parameters, while the latter use one or more cameras to capture the
hands and track their position and configuration through computer vision algorithms. The
method selected for this work was the vision-based OpenPose. For each recorded image, this application
can locate 21 keypoints on each hand that together form a skeleton of the hands.
This application is used in the tracking system developed throughout this dissertation. Its information is
used in a more complete pipeline where the location of those hand keypoints is crucial to track the hands
in videos of the demonstrated movements. These videos were recorded with an RGB-D camera, the
Microsoft Kinect, which provides a depth value for every RGB pixel recorded. With the depth information
and the 2D location of the hand keypoints in the images, it was possible to obtain the 3D world coordinates
of these points considering the pinhole camera model.
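The depth-plus-pinhole back-projection described above can be sketched as follows; the parameter names are the usual camera intrinsics, and the exact calibration handling in the dissertation may differ:

```python
import numpy as np

def backproject(u, v, depth, fx, fy, cx, cy):
    """Back-project an image pixel (u, v) with its depth value to 3-d
    camera coordinates using the pinhole model. fx, fy are the focal
    lengths in pixels and (cx, cy) is the principal point."""
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])
```

Applied to each of the 21 OpenPose keypoints with the Kinect's per-pixel depth, this yields the 3D point cloud from which position and orientation are then derived.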
To define the hand position, a point is selected among the 21 for each hand; for the hand orientation,
it was necessary to develop an auxiliary method called the "Iterative Pose Estimation Method" (ITP), which
estimates the complete 3D pose of the hands. This method uses only the 2D locations of every hand
keypoint, plus the complete 3D world coordinates of the wrists, to estimate the correct 3D world coordinates
of all the remaining points on the hand. This solution solves the problems related to hand occlusions that
are prone to happen because only one camera is used to record the inspection videos. Once the world
location of all the points on the hands is accurately estimated, their orientation can be defined by selecting
three points forming a plane.
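The final orientation step, taking three estimated 3D keypoints and using the normal of the plane they span, can be sketched as follows; which three of the 21 keypoints to use is an assumption here:

```python
import numpy as np

def hand_orientation(p1, p2, p3):
    """Unit normal of the plane through three hand keypoints, usable as
    the hand's orientation vector for commanding a manipulator."""
    p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p1, p2, p3))
    # Cross product of two in-plane edge vectors gives the plane normal
    n = np.cross(p2 - p1, p3 - p1)
    return n / np.linalg.norm(n)
```

The normal's sign depends on the winding order of the three points, so a consistent keypoint ordering must be fixed to keep the orientation stable across frames.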
Real-Time Salient Closed Boundary Tracking via Line Segments Perceptual Grouping
This paper presents a novel real-time method for tracking salient closed
boundaries from video image sequences. This method operates on a set of
straight line segments that are produced by line detection. The tracking scheme
is coherently integrated into a perceptual grouping framework in which the
visual tracking problem is tackled by identifying a subset of these line
segments and connecting them sequentially to form a closed boundary with the
largest saliency and a certain similarity to the previous one. Specifically, we
define a new tracking criterion which combines a grouping cost and an area
similarity constraint. The proposed criterion makes the resulting boundary
tracking more robust to local minima. To achieve real-time tracking
performance, we use Delaunay Triangulation to build a graph model with the
detected line segments and then reduce the tracking problem to finding the
optimal cycle in this graph. This is solved by our newly proposed closed-boundary
candidate search algorithm, called "Bidirectional Shortest Path
(BDSP)". The efficiency and robustness of the proposed method are tested on
real video sequences as well as during a robot arm pouring experiment.
Comment: 7 pages, 8 figures; The 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2017), submission ID 103
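Reducing boundary tracking to a cheapest-cycle search can be illustrated generically: fix one graph edge, find the shortest path between its endpoints that avoids that edge, and close the cycle with the edge itself. The Dijkstra-based sketch below shows only this underlying reduction, not the authors' BDSP algorithm:

```python
import heapq

def shortest_path(adj, s, t, skip=None):
    """Dijkstra shortest path on an undirected weighted graph given as
    {node: [(neighbour, weight), ...]}; `skip` is an edge to ignore."""
    dist = {s: 0.0}
    prev = {}
    pq = [(0.0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == t:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in adj[u]:
            if skip and {u, v} == set(skip):
                continue  # the closing edge may not be reused
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(pq, (nd, v))
    path, node = [], t
    while node != s:
        path.append(node)
        node = prev[node]
    path.append(s)
    return path[::-1], dist[t]

def best_cycle_through(adj, u, v, w_uv):
    """Cheapest closed boundary containing edge (u, v): the shortest
    v->u path avoiding (u, v), closed by the edge itself."""
    path, d = shortest_path(adj, v, u, skip=(u, v))
    return [u] + path, d + w_uv
```

In the paper the graph nodes come from Delaunay triangulation of detected line segments and edge weights encode the grouping cost and area-similarity criterion, so the cheapest cycle corresponds to the most salient closed boundary.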
Proceedings of the 2020 Joint Workshop of Fraunhofer IOSB and Institute for Anthropomatics, Vision and Fusion Laboratory
In 2020, the annual joint workshop of Fraunhofer IOSB and the Chair for Interactive Real-Time Systems took place. From 27 to 31 July, the doctoral students of both institutes presented the state of their research on topics such as AI, machine learning, computer vision, usage control and metrology. The results of these presentations are collected in this volume as technical reports.
The application of range imaging for improved local feature representations
This thesis presents an investigation into the integration of information extracted from co-aligned range and intensity images to achieve pose-invariant object recognition. Local feature matching is a fundamental technique in image analysis that underpins many computer-vision-based applications; the approach comprises identifying a collection of interest points in an image, characterising the local image region surrounding each interest point by means of a descriptor, and matching these descriptors between example images. Such local feature descriptors are formed from a measure of the local image statistics in the region surrounding the interest point. The interest point locations and the means of measuring local image statistics should be chosen such that the resultant descriptor remains stable across a range of common image transformations. Recently, the availability of low-cost, high-quality range imaging devices has motivated an interest in local feature extraction from range images. It has been widely assumed in the vision community that the range imaging domain has properties which remain quasi-invariant through a wide range of changes in illumination and pose. Accordingly, it has been suggested that local feature extraction in the range domain should allow the calculation of local feature descriptors that are potentially more robust than those calculated from the intensity imaging domain alone. However, range images capture characteristics that differ from those of intensity images, which are frequently used, independently of range images, to create robust local features. Therefore, this work attempts to establish the best means of combining information from these two imaging modalities to further increase the reliability of matching local features.
Local feature extraction comprises a series of processes applied to an image location such that a collection of repeatable descriptors can be established. By using co-aligned range and intensity images this work investigates the choice of modality and method for each step in the extraction process as an approach to optimising the resulting descriptor. Additionally, multimodal features are formed by combining information from both domains in a single stage in the extraction process. To further improve the quality of feature descriptors, a calculation of the surface normals and a use of the 3D structure from the range image are applied to correct the 3D appearance of a local sample patch, thereby increasing the similarity between observations.
The matching performance of local features is evaluated using an experimental setup comprising a turntable and a stereo pair of cameras. This setup is used to create a database of intensity and range images for 5 objects imaged at 72 calibrated viewpoints, giving 360 object observations. The use of a calibrated turntable, in combination with the 3D object surface coordinates supplied by the range image, allows location correspondences between object observations to be established, and therefore descriptor matches to be labelled as either true positive or false positive. Applying this methodology to the formulated local features shows that two approaches demonstrate state-of-the-art performance, with a ~40% increase in area under the ROC curve at a false positive rate of 10% when compared with standard SIFT. These approaches are range affine corrected intensity SIFT and element corrected surface gradients SIFT.
Furthermore, this work uses the 3D structure encoded in the range image to organise collections of interest points from a series of observations into a collection of canonical views in a new local feature model. The canonical views for an interest point are stored in a view-compartmentalised structure which allows the appearance of a local interest point to be characterised across the view sphere. Each canonical view is assigned a confidence measure based on the 3D pose of the interest point at observation; this confidence measure is then used to match similar canonical views of model and query interest points, thereby achieving a pose-invariant interest point description. This approach does not produce a statistically significant performance increase; however, it does contribute a validated methodology for combining multiple descriptors with differing confidence weightings into a single keypoint.
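The true-positive/false-positive labelling of descriptor matches used in this kind of evaluation can be sketched as nearest-neighbour matching plus a distance threshold swept to trace the ROC curve; the function names and the squared-Euclidean metric here are our assumptions:

```python
import numpy as np

def match_descriptors(desc_a, desc_b):
    """Nearest-neighbour descriptor matching: for each descriptor in A,
    return the index of its closest descriptor in B and the distance."""
    d2 = ((desc_a[:, None, :] - desc_b[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)
    return idx, np.sqrt(d2[np.arange(len(desc_a)), idx])

def roc_point(dists, is_correct, threshold):
    """One ROC operating point: TPR/FPR of matches accepted below a
    distance threshold, given ground-truth correctness labels (here
    supplied by the calibrated turntable correspondences)."""
    accept = dists < threshold
    tpr = (accept & is_correct).sum() / max(is_correct.sum(), 1)
    fpr = (accept & ~is_correct).sum() / max((~is_correct).sum(), 1)
    return tpr, fpr
```

Sweeping the threshold over the range of observed distances traces the full ROC curve, from which the area-under-curve figures quoted above are computed.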