On Optimizing Locally Linear Nearest Neighbour Reconstructions Using Prototype Reduction Schemes
This paper concerns the use of Prototype Reduction Schemes (PRS) to optimize the computations involved in typical k-Nearest Neighbor (k-NN) rules. These rules have been used successfully for decades in statistical Pattern Recognition (PR) applications, largely because of their known error bounds. For a given data point of unknown identity, the k-NN rule combines the information about the a priori target classes (values) of the selected neighbors to, for example, predict the target class of the tested sample. Recently, an implementation of the k-NN, named the Locally Linear Reconstruction (LLR) [11], has been proposed. Its salient feature is that, by invoking a quadratic optimization process, it can systematically set the model parameters, such as the number of neighbors (specified by the parameter k) and their weights. However, the LLR takes more time than conventional methods when applied to classification tasks. To overcome this problem, we propose a strategy of using a PRS to solve the optimization problem efficiently. In this paper, we demonstrate, first of all, that by completely discarding the points not retained by the PRS, we obtain a reduced set of sample points with which the quadratic optimization problem can be solved far more expediently. The resulting accuracy indices are comparable to those obtained with the original training set (i.e., the one which considers all the data points), even though the computations required to obtain the prototypes and the corresponding classifications are noticeably less. The proposed method has been tested on artificial and real-life data sets; the results obtained are very promising and indicate the method's potential in PR applications.
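The quadratic step behind the LLR idea can be illustrated compactly. The sketch below is not the authors' code: the value of k, the regularization constant, and the data are illustrative. It solves the constrained least-squares reconstruction of a query point from its k nearest neighbours (as in locally linear embedding) and classifies by a weighted vote over the neighbours' labels.

```python
import numpy as np

def llr_weights(x, neighbors, reg=1e-3):
    # Solve min_w ||x - sum_i w_i n_i||^2 subject to sum(w) = 1 via the
    # regularized local Gram matrix (closed form, as in locally linear
    # embedding). `reg` is an illustrative stabilizer, not from the paper.
    D = neighbors - x                        # (k, d) offsets to the query
    G = D @ D.T                              # local Gram matrix
    G += reg * np.trace(G) * np.eye(len(G))  # regularize for stability
    w = np.linalg.solve(G, np.ones(len(G)))
    return w / w.sum()                       # enforce the sum-to-one constraint

def llr_classify(X, y, x, k=5):
    # Pick the k nearest training points, reconstruct x from them,
    # and vote with the reconstruction weights.
    idx = np.argsort(np.linalg.norm(X - x, axis=1))[:k]
    w = llr_weights(x, X[idx])
    classes = np.unique(y[idx])
    scores = [w[y[idx] == c].sum() for c in classes]
    return classes[int(np.argmax(scores))]
```

A PRS would shrink `X` and `y` to a small prototype set before this routine is ever called, which is exactly where the paper's speed-up comes from.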
Adaptive Template Enhancement for Improved Person Recognition using Small Datasets
A novel instance-based method for the classification of electroencephalography (EEG) signals is presented and evaluated in this paper. The non-stationary nature of EEG signals, coupled with the demanding task of pattern recognition with limited training data and potentially noisy signal acquisition conditions, has motivated the work reported in this study. The proposed adaptive template enhancement mechanism transforms the feature-level instances by treating each feature dimension separately, resulting in improved class separation and better query-class matching. The proposed instance-based learning algorithm is compared with several related algorithms in a number of scenarios. A clinical-grade 64-electrode EEG database, as well as a low-quality (high-noise) EEG database obtained with a low-cost system using a single dry sensor, have been used for evaluations in biometric person recognition. The proposed approach demonstrates significantly improved classification accuracy in both identification and verification scenarios. In particular, the new method provides good classification performance for noisy EEG data, indicating its potential suitability for a wide range of applications.
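The paper's exact enhancement rule is not reproduced in this summary. As a loose, hypothetical illustration of what "treating each feature dimension separately" can mean in template-based matching, the sketch below pulls each template dimension toward the dimension-wise median of the enrolled instances; the update rule, the `alpha` mixing factor, and the median target are assumptions for illustration only, not the authors' mechanism.

```python
import numpy as np

def enhance_template(template, instances, alpha=0.3):
    # Hypothetical per-dimension update: move each template dimension
    # toward the dimension-wise median of the enrolled instances.
    # The median is a robust per-dimension estimate under noisy EEG features.
    target = np.median(instances, axis=0)
    return (1.0 - alpha) * template + alpha * target

def match_score(template, query):
    # Simple query-template score: negative Euclidean distance,
    # so that larger means a better match.
    return -float(np.linalg.norm(template - query))
```

In a biometric setting one such template per enrolled subject would be maintained, and a query would be assigned to the subject whose template yields the highest score.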
Model-based viewpoint invariant human activity recognition from uncalibrated monocular video sequence
There is growing interest in human activity recognition systems, motivated by their numerous promising applications in many domains. Despite much progress, most researchers have narrowed the problem to a fixed camera viewpoint, owing to the inherent difficulty of training their systems across all possible viewpoints. Fixed-viewpoint systems are impractical in real scenarios. We therefore relax the fixed-viewpoint assumption and present a novel and simple framework to recognize and classify human activities from an uncalibrated monocular video source, regardless of viewpoint. The proposed framework comprises two stages: 3D human pose estimation and human activity recognition. In the pose estimation stage, we estimate the 3D human pose by a simple search-based and tracking-based technique. In the activity recognition stage, we use a Nearest Neighbor classifier, with Dynamic Time Warping as the distance measure, to classify the multivariate time series that emanate from streams of pose vectors over multiple video frames. We have performed experiments to evaluate the accuracy of the two stages separately. The encouraging experimental results demonstrate the effectiveness of our framework.
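The second stage, Nearest Neighbor classification under Dynamic Time Warping, can be sketched as follows. This is a minimal univariate version with made-up label names; the framework itself classifies multivariate pose-vector series.

```python
import numpy as np

def dtw(a, b):
    # Classic dynamic-programming DTW distance between two 1-D series:
    # D[i, j] is the cheapest alignment cost of a[:i] against b[:j].
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def nn_dtw_classify(train, labels, query):
    # 1-nearest-neighbour classification under the DTW distance.
    dists = [dtw(t, query) for t in train]
    return labels[int(np.argmin(dists))]
```

Because DTW aligns series non-linearly in time, the same activity performed faster or slower still maps to a small distance, which is why it pairs well with a Nearest Neighbor classifier here.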
Model driven segmentation and the detection of bone fractures
Bibliography: leaves 83-90.
The introduction of lower-dosage image acquisition devices and the increase in computational power mean that there is an increased focus on producing diagnostic aids for the medical trauma environment. The focus of this research is to explore whether geometric criteria can be used to detect bone fractures from Computed Tomography (CT) data. Conventional image processing of CT data is aimed at the production of simple iso-surfaces for surgical planning or diagnosis; such methods are not suitable for the automated detection of fractures. Our hypothesis is that through a model-based technique a triangulated surface representing the bone can be produced speedily and accurately, and that sufficient structural information is present that, by examining the geometric structure of this representation, we can accurately detect bone fractures. In this dissertation we describe the algorithms and framework that we built to facilitate the detection of bone fractures and evaluate the validity of our approach.
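The dissertation's own detection criteria are not given in this summary. One generic way to flag geometric discontinuities on a triangulated surface is to threshold the dihedral angle between adjacent faces; the sketch below does exactly that. The threshold value, mesh layout, and edge bookkeeping are illustrative assumptions, not the thesis algorithm.

```python
import numpy as np

def face_normal(v0, v1, v2):
    # Unit normal of a triangle given its three vertices.
    n = np.cross(v1 - v0, v2 - v0)
    return n / np.linalg.norm(n)

def sharp_edges(vertices, faces, angle_deg=60.0):
    # Flag edges whose two adjacent faces meet at a dihedral angle sharper
    # than the threshold, a crude proxy for a surface discontinuity.
    normals = [face_normal(*vertices[list(f)]) for f in faces]
    edge_faces = {}
    for fi, f in enumerate(faces):
        for e in [(f[0], f[1]), (f[1], f[2]), (f[2], f[0])]:
            edge_faces.setdefault(tuple(sorted(e)), []).append(fi)
    flagged = []
    for edge, fids in edge_faces.items():
        if len(fids) == 2:
            cosang = np.clip(np.dot(normals[fids[0]], normals[fids[1]]), -1.0, 1.0)
            if np.degrees(np.arccos(cosang)) > angle_deg:
                flagged.append(edge)
    return flagged
```

On a smooth bone surface most dihedral angles are near zero, so edges exceeding the threshold cluster along candidate fracture lines.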
Analysis of Dynamic Magnetic Resonance Breast Images
Dynamic Magnetic Resonance Imaging is a non-invasive technique that provides an image sequence based on dynamic information for locating lesions and investigating their structures.

In this thesis we develop new methodology for analysing dynamic Magnetic Resonance image sequences of the breast. This methodology comprises an image restoration step that reduces random distortions affecting the data and an image classification step that identifies normal, benign or malignant tumoral tissues.

In the first part of this thesis we present a non-parametric and a parametric approach for image restoration and classification. Both methods are developed within the Bayesian framework. A prior distribution modelling both spatial homogeneity and temporal continuity between neighbouring image pixels is employed. Statistical inference is performed by means of a Metropolis-Hastings algorithm with a specially chosen proposal distribution that outperforms other algorithms of the same family. We also provide novel procedures for estimating the hyper-parameters of the prior models and the normalizing constant, thus making the Bayesian methodology automatic.

In the second part of this thesis we present new methodology for image classification based on deformable templates of a prototype shape. Our approach uses higher-level knowledge about the tumour structure than the spatio-temporal prior distribution of our Bayesian methodology. The prototype shape is deformed to identify the structure of the malignant tumoral tissue by minimizing a novel objective function over the parameters of a set of non-affine transformations. Since these transformations can destroy the connectivity of the shape, we develop a new filter that restores connectivity without smoothing the shape.

The restoration and classification results obtained from a small sample of image sequences are very encouraging. In order to validate these results on a larger sample, in the last part of the thesis we present a user-friendly software package that implements our methodology.
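The specially chosen proposal distribution is not specified in this summary, but the accept/reject mechanics that every Metropolis-Hastings variant shares can be sketched with a plain random-walk proposal on a one-dimensional target. This is purely illustrative, not the thesis sampler.

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_steps, step=1.0, seed=0):
    # Random-walk Metropolis-Hastings: propose x' = x + N(0, step^2) and
    # accept with probability min(1, target(x') / target(x)), worked in
    # log space for numerical stability.
    rng = np.random.default_rng(seed)
    x = x0
    log_p = log_target(x)
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.normal(0.0, step)
        log_p_new = log_target(x_new)
        if np.log(rng.uniform()) < log_p_new - log_p:  # accept/reject
            x, log_p = x_new, log_p_new
        samples.append(x)
    return np.array(samples)
```

For a standard normal target, `log_target = lambda x: -0.5 * x * x`, the sample mean and standard deviation approach 0 and 1; the thesis replaces the random-walk proposal with a tailored one to speed up exactly this convergence.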
Holoscopic 3D image depth estimation and segmentation techniques
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.
Today's 3D imaging techniques offer significant benefits over conventional 2D imaging techniques. The presence of natural depth information in the scene affords the observer an overall improved sense of reality and naturalness. A variety of systems attempting to reach this goal have been designed by many independent research groups, such as stereoscopic and auto-stereoscopic systems. However, the images displayed by such systems tend to cause eye strain, fatigue and headaches after prolonged viewing, as users are required to focus on the screen plane (accommodation) while converging their eyes to a point in space in a different plane (convergence). Holoscopy is a 3D technology that aims to overcome these limitations of current 3D technology and was recently developed at Brunel University. This work is part W4.1 of the 3D VIVANT project, which is funded by the EU under the ICT programme and coordinated by Dr. Aman Aggoun at Brunel University, West London, UK. The objective of the work described in this thesis is to develop estimation and segmentation techniques that are capable of estimating precise 3D depth and are applicable to the holoscopic 3D imaging system. Particular emphasis is given to automatic techniques, i.e. the work favours algorithms with broad generalisation abilities, since no constraints are placed on the setting: algorithms that are invariant to most appearance-based variation of objects in the scene (e.g. viewpoint changes, deformable objects, presence of noise and changes in lighting), and that can estimate depth information from both types of holoscopic 3D images, i.e. unidirectional and omnidirectional, which give horizontal parallax and full parallax (vertical and horizontal), respectively. The main aim of this research is to develop 3D depth estimation and 3D image segmentation techniques with great precision.
In particular, emphasis is placed on the automation of thresholding techniques and the identification of cues for the development of robust algorithms. A method for depth-through-disparity feature analysis has been built on the existing correlation between pixels at one micro-lens pitch, which is exploited to extract the viewpoint images (VPIs). The corresponding displacement among the VPIs is exploited to estimate the depth map by setting and extracting reliable sets of local features. Feature-based-point and feature-based-edge are two novel automatic thresholding techniques for detecting and extracting features that have been used in this approach. These techniques offer a solution to the problem of setting and extracting reliable features automatically, improving the performance of the depth estimation in terms of generalisation, speed and quality. Due to the resolution limitation of the extracted VPIs, obtaining an accurate 3D depth map is challenging. Therefore, sub-pixel shift and integration, a novel interpolation technique, has been used in this approach to generate super-resolution VPIs. By shifting and integrating a set of up-sampled low-resolution VPIs, the new information contained in each viewpoint is exploited to obtain a super-resolution VPI. This produces a high-resolution perspective VPI with a wide Field Of View (FOV), meaning that the holoscopic 3D image can be converted into a multi-view 3D image pixel format. Both depth accuracy and a fast execution time have been achieved, improving the 3D depth map. For a 3D object to be recognized, the related foreground regions and depth map need to be identified. Two novel unsupervised segmentation methods that generate interactive depth maps from single-viewpoint segmentation were developed.
Both techniques improve on existing methods by being simple to use and fully automatic, producing the interactive 3D depth map without human interaction. The final contribution is a performance evaluation, providing an equitable measurement of the success of the proposed techniques for foreground object segmentation, interactive 3D depth map creation and the generation of 2D super-resolution viewpoints. No-reference image quality assessment metrics, and their correlation with human perception of quality, are evaluated subjectively with the help of human participants.
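The sub-pixel shift-and-integration idea can be illustrated in one dimension. Assuming known integer positions of each view on the up-sampled grid (a simplification of the actual VPI processing), each view is placed at its shift and overlapping samples are integrated by averaging; gaps no view covers are interpolated.

```python
import numpy as np

def shift_and_integrate(views, shifts, factor):
    # Place each low-resolution view on the high-resolution grid at its
    # known sub-pixel shift, then integrate (average) overlapping samples.
    n = len(views[0]) * factor
    acc = np.zeros(n)
    cnt = np.zeros(n)
    for v, s in zip(views, shifts):
        idx = np.arange(len(v)) * factor + s
        acc[idx] += v
        cnt[idx] += 1
    filled = cnt > 0
    out = np.zeros(n)
    out[filled] = acc[filled] / cnt[filled]
    # Fill any grid positions no view covered by linear interpolation.
    if not filled.all():
        out[~filled] = np.interp(np.where(~filled)[0], np.where(filled)[0], out[filled])
    return out
```

With two views sampled at half-pixel offsets, the routine recovers the full-resolution signal exactly, which is the intuition behind generating super-resolution VPIs from many shifted viewpoints.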
A window to the past through modern urban environments: Developing a photogrammetric workflow for the orientation parameter estimation of historical images
The ongoing process of digitization in archives is providing access to ever-increasing historical image collections. In many of these repositories, images can typically be viewed in a list or gallery view. Due to the growing number of digitized objects, this type of visualization is becoming increasingly complex. Among other things, it is difficult to determine how many photographs show a particular object, and spatial information can only be communicated via metadata.
Within the scope of this thesis, research is conducted on the automated determination and provision of this spatial data. Enhanced visualization options make this information more easily accessible to scientists as well as citizens. Different types of visualizations can be presented in three-dimensional (3D), Virtual Reality (VR) or Augmented Reality (AR) applications. However, applications of this type require the estimation of the photographer's point of view. In the photogrammetric context, this is referred to as estimating the interior and exterior orientation parameters of the camera. For determination of orientation parameters for single images, there are the established methods of Direct Linear Transformation (DLT) or photogrammetric space resection. Using these methods requires the assignment of measured object points to their homologue image points. This is feasible for single images, but quickly becomes impractical due to the large amount of images available in archives. Thus, for larger image collections, usually the Structure-from-Motion (SfM) method is chosen, which allows the simultaneous estimation of the interior as well as the exterior orientation of the cameras. While this method yields good results especially for sequential, contemporary image data, its application to unsorted historical photographs poses a major challenge.
In the context of this work, which is mainly limited to scenarios of urban terrestrial photographs, the reasons for failure of the SfM process are identified. In contrast to sequential image collections, pairs of images from different points in time or from varying viewpoints show huge differences in scene representation, such as deviations in the lighting situation, building state, or seasonal changes. Since homologue image points have to be found automatically in image pairs or image sequences in the feature matching procedure of SfM, these image differences pose the most complex problem.
In order to test different feature matching methods, it is necessary to use a pre-oriented historical dataset. Since such a benchmark dataset did not yet exist, eight historical image triples (corresponding to 24 image pairs) are oriented in this work by manual selection of homologue image points. This dataset allows the evaluation of newly published feature matching methods. The initial methods used, which are based on algorithmic procedures for feature matching (e.g., the Scale Invariant Feature Transform (SIFT)), provide satisfactory results for only a few of the image pairs in this dataset. By introducing methods that use neural networks for feature detection and feature description, homologue features can be reliably found for a large fraction of image pairs in the benchmark dataset.
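Algorithmic matching pipelines of the SIFT family typically accept a correspondence only when the nearest descriptor is clearly better than the second nearest (Lowe's ratio test). A minimal sketch on raw descriptor arrays follows; the 0.8 ratio and the brute-force distance computation are illustrative choices.

```python
import numpy as np

def ratio_test_matches(desc1, desc2, ratio=0.8):
    # Lowe-style ratio test: for each descriptor in desc1, accept a match
    # in desc2 only if its nearest neighbour is clearly closer than the
    # second nearest (requires desc2 to contain at least two descriptors).
    dists = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    matches = []
    for i, row in enumerate(dists):
        order = np.argsort(row)
        best, second = order[0], order[1]
        if row[best] < ratio * row[second]:
            matches.append((i, int(best)))
    return matches
```

On historical image pairs with strong appearance changes, this test discards most candidates, which is one symptom of why purely algorithmic matching fails on this benchmark while learned descriptors succeed.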
In addition to a successful feature matching strategy, determining the camera orientation requires an initial estimate of the principal distance. For historical images, however, the principal distance cannot be determined directly, as the camera information is usually lost during the process of digitizing the analog original. A possible solution to this problem is to use three vanishing points that are automatically detected in the historical image and from which the principal distance can then be determined. The combination of principal distance estimation and robust feature matching is integrated into the SfM process and allows the determination of the interior and exterior camera orientation parameters of historical images. Based on these results, a workflow is designed that allows archives to be directly connected to 3D applications.
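For three mutually orthogonal vanishing points under a pinhole model with square pixels, the principal point is the orthocenter of the vanishing-point triangle, and the principal distance f satisfies f^2 = -(v1 - p) . (v2 - p). The sketch below implements this classical construction, not the thesis implementation.

```python
import numpy as np

def orthocenter(a, b, c):
    # Orthocenter of triangle abc: intersection of two altitudes, found by
    # solving (p - a).(c - b) = 0 and (p - b).(c - a) = 0 for p.
    A = np.array([c - b, c - a])
    rhs = np.array([np.dot(a, c - b), np.dot(b, c - a)])
    return np.linalg.solve(A, rhs)

def principal_distance(v1, v2, v3):
    # For three orthogonal vanishing points, the principal point p is the
    # orthocenter of the triangle and f^2 = -(v1 - p).(v2 - p).
    p = orthocenter(v1, v2, v3)
    f2 = -np.dot(v1 - p, v2 - p)
    return p, float(np.sqrt(f2))
```

Given automatically detected vanishing points in a historical photograph, this yields the initial interior-orientation estimate that the adapted SfM workflow needs.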
A search query in archives is usually performed using keywords, which have to be assigned to the corresponding object as metadata. Therefore, a keyword search for a specific building also results in hits on drawings, paintings, events, interior or detailed views directly connected to this building. However, for the successful application of SfM in an urban context, primarily the photographic exterior view of the building is of interest. While the images for a single building can be sorted by hand, this process is too time-consuming for multiple buildings.
Therefore, in collaboration with the Competence Center for Scalable Data Services and Solutions (ScaDS), an approach is developed to filter historical photographs by image similarities. This method reliably enables the search for content-similar views via the selection of one or more query images. By linking this content-based image retrieval with the SfM approach, automatic determination of camera parameters for a large number of historical photographs is possible. The developed method represents a significant improvement over commercial and open-source SfM standard solutions.
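Content-based retrieval of this kind reduces, at its core, to ranking gallery images by descriptor similarity to one or more query images. A minimal sketch with cosine similarity follows; the embeddings would in practice come from a network such as the DELF or LEA approaches named in the table of contents, and the mean pooling over query embeddings is an assumption for illustration.

```python
import numpy as np

def retrieve(query_vecs, gallery_vecs, top_k=3):
    # Rank gallery images by cosine similarity to the mean of the
    # (pre-computed) query embeddings; returns gallery indices, best first.
    q = np.mean(query_vecs, axis=0)
    q = q / np.linalg.norm(q)
    g = gallery_vecs / np.linalg.norm(gallery_vecs, axis=1, keepdims=True)
    sims = g @ q
    return np.argsort(-sims)[:top_k]
```

Only the top-ranked photographs are then handed to the SfM step, which is what filters out drawings, paintings, and interior views before orientation.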
The result of this work is a complete workflow from archive to application that automatically filters images and calculates the camera parameters. The expected accuracy of a few meters for the camera position is sufficient for the applications presented in this work, but offers further potential for improvement. A connection to archives, which will automatically exchange photographs and positions via interfaces, is currently under development. This makes it possible to retrieve interior and exterior orientation parameters directly from a historical photograph as metadata, which opens up new fields of research.
1 Introduction
1.1 Thesis structure
1.2 Historical image data and archives
1.3 Structure-from-Motion for historical images
1.3.1 Terminology
1.3.2 Selection of images and preprocessing
1.3.3 Feature detection, feature description and feature matching
1.3.3.1 Feature detection
1.3.3.2 Feature description
1.3.3.3 Feature matching
1.3.3.4 Geometric verification and robust estimators
1.3.3.5 Joint methods
1.3.4 Initial parameterization
1.3.5 Bundle adjustment
1.3.6 Dense reconstruction
1.3.7 Georeferencing
1.4 Research objectives
2 Generation of a benchmark dataset using historical photographs for the evaluation of feature matching methods
2.1 Introduction
2.1.1 Image differences based on digitization and image medium
2.1.2 Image differences based on different cameras and acquisition technique
2.1.3 Object differences based on different dates of acquisition
2.2 Related work
2.3 The image dataset
2.4 Comparison of different feature detection and description methods
2.4.1 Oriented FAST and Rotated BRIEF (ORB)
2.4.2 Maximally Stable Extremal Region Detector (MSER)
2.4.3 Radiation-invariant Feature Transform (RIFT)
2.4.4 Feature matching and outlier removal
2.5 Results
2.6 Conclusions and future work
References
3 Photogrammetry as a link between image repository and 4D applications
3.1 Introduction
3.2 Multimodal access on repositories
3.2.1 Conventional access
3.2.2 Virtual access using online collections
3.2.3 Virtual museums
3.3 Workflow and access strategies
3.3.1 Overview
3.3.2 Filtering
3.3.3 Photogrammetry
3.3.4 Browser access
3.3.5 VR and AR access
3.4 Conclusions
References
4 An adapted Structure-from-Motion Workflow for the orientation of historical images
4.1 Introduction
4.2 Related Research
4.2.1 Historical images for 3D reconstruction
4.2.2 Algorithmic Feature Detection and Matching
4.2.3 Feature Detection and Matching using Convolutional Neural Networks
4.3 Feature Matching
4.4 Workflow
4.4.1 Step 1: Data preparation
4.4.2 Step 2.1: Feature Detection and Matching
4.4.3 Step 2.2: Vanishing Point Detection and Principal Distance Estimation
4.4.4 Step 3: Scene Reconstruction
4.4.5 Comparison with Three Other State-of-the-Art SfM Workflows
4.5 Datasets
4.6 Results
4.7 Conclusions and Future Work
4.8 Acknowledgements
4.A Appendix
References
5 Fully automated pose estimation of historical images
5.1 Introduction
5.2 Related Work
5.2.1 Image Retrieval
5.2.2 Feature Detection and Matching
5.3 Data Preparation: Image Retrieval
5.3.1 Experiment and Data
5.3.2 Methods
5.3.2.1 Layer Extraction Approach (LEA)
5.3.2.2 Attentive Deep Local Features (DELF) Approach
5.3.3 Results and Evaluation
5.4 Camera Pose Estimation of Historical Images Using Photogrammetric Methods
5.4.1 Data
5.4.1.1 Benchmark Datasets
5.4.1.2 Retrieval Datasets
5.4.2 Methods
5.4.2.1 Feature Detection and Matching
5.4.2.2 Geometric Verification and Camera Pose Estimation
5.4.3 Results and Evaluation
5.5 Conclusions and Future Work
5.A Appendix
References
6 Related publications
6.1 Photogrammetric analysis of historical image repositories for virtual reconstruction in the field of digital humanities
6.2 Feature matching of historical images based on geometry of quadrilaterals
6.3 Geo-information technologies for a multimodal access on historical photographs and maps for research and communication in urban history
6.4 An automated pipeline for a browser-based, city-scale mobile 4D VR application based on historical images
6.5 Software and content design of a browser-based mobile 4D VR application to explore historical city architecture
7 Synthesis
7.1 Summary of the developed workflows
7.1.1 Error assessment
7.1.2 Accuracy estimation
7.1.3 Transfer of the workflow
7.2 Developments and Outlook
8 Appendix
8.1 Setup for the feature matching evaluation
8.2 Transformation from COLMAP coordinate system to OpenGL
References
List of Figures
List of Tables
List of Abbreviations
- …