53 research outputs found

    SCNet: Learning Semantic Correspondence

    Get PDF
    This paper addresses the problem of establishing semantic correspondences between images depicting different instances of the same object or scene category. Previous approaches focus on either combining a spatial regularizer with hand-crafted features, or learning a correspondence model for appearance only. We propose instead a convolutional neural network architecture, called SCNet, for learning a geometrically plausible model for semantic correspondence. SCNet uses region proposals as matching primitives, and explicitly incorporates geometric consistency in its loss function. It is trained on image pairs obtained from the PASCAL VOC 2007 keypoint dataset, and a comparative evaluation on several standard benchmarks demonstrates that the proposed approach substantially outperforms both recent deep learning architectures and previous methods based on hand-crafted features.Comment: ICCV 201

    Image Retrieval based on Bag-of-Words model

    Full text link
    This article gives a survey for bag-of-words (BoW) or bag-of-features model in image retrieval system. In recent years, large-scale image retrieval shows significant potential in both industry applications and research problems. As local descriptors like SIFT demonstrate great discriminative power in solving vision problems like object recognition, image classification and annotation, more and more state-of-the-art large scale image retrieval systems are trying to rely on them. A common way to achieve this is first quantizing local descriptors into visual words, and then applying scalable textual indexing and retrieval schemes. We call this model as bag-of-words or bag-of-features model. The goal of this survey is to give an overview of this model and introduce different strategies when building the system based on this model

    A window to the past through modern urban environments: Developing a photogrammetric workflow for the orientation parameter estimation of historical images

    Get PDF
    The ongoing process of digitization in archives is providing access to ever-increasing historical image collections. In many of these repositories, images can typically be viewed in a list or gallery view. Due to the growing number of digitized objects, this type of visualization is becoming increasingly complex. Among other things, it is difficult to determine how many photographs show a particular object and spatial information can only be communicated via metadata. Within the scope of this thesis, research is conducted on the automated determination and provision of this spatial data. Enhanced visualization options make this information more eas- ily accessible to scientists as well as citizens. Different types of visualizations can be presented in three-dimensional (3D), Virtual Reality (VR) or Augmented Reality (AR) applications. However, applications of this type require the estimation of the photographer’s point of view. In the photogrammetric context, this is referred to as estimating the interior and exterior orientation parameters of the camera. For determination of orientation parameters for single images, there are the established methods of Direct Linear Transformation (DLT) or photogrammetric space resection. Using these methods requires the assignment of measured object points to their homologue image points. This is feasible for single images, but quickly becomes impractical due to the large amount of images available in archives. Thus, for larger image collections, usually the Structure-from-Motion (SfM) method is chosen, which allows the simultaneous estimation of the interior as well as the exterior orientation of the cameras. While this method yields good results especially for sequential, contemporary image data, its application to unsorted historical photographs poses a major challenge. In the context of this work, which is mainly limited to scenarios of urban terrestrial photographs, the reasons for failure of the SfM process are identified. In contrast to sequential image collections, pairs of images from different points in time or from varying viewpoints show huge differences in terms of scene representation such as deviations in the lighting situation, building state, or seasonal changes. Since homologue image points have to be found automatically in image pairs or image sequences in the feature matching procedure of SfM, these image differences pose the most complex problem. In order to test different feature matching methods, it is necessary to use a pre-oriented historical dataset. Since such a benchmark dataset did not exist yet, eight historical image triples (corresponding to 24 image pairs) are oriented in this work by manual selection of homologue image points. This dataset allows the evaluation of frequently new published methods in feature matching. The initial methods used, which are based on algorithmic procedures for feature matching (e.g., Scale Invariant Feature Transform (SIFT)), provide satisfactory results for only few of the image pairs in this dataset. By introducing methods that use neural networks for feature detection and feature description, homologue features can be reliably found for a large fraction of image pairs in the benchmark dataset. In addition to a successful feature matching strategy, determining camera orientation requires an initial estimate of the principal distance. Hence for historical images, the principal distance cannot be directly determined as the camera information is usually lost during the process of digitizing the analog original. A possible solution to this problem is to use three vanishing points that are automatically detected in the historical image and from which the principal distance can then be determined. The combination of principal distance estimation and robust feature matching is integrated into the SfM process and allows the determination of the interior and exterior camera orientation parameters of historical images. Based on these results, a workflow is designed that allows archives to be directly connected to 3D applications. A search query in archives is usually performed using keywords, which have to be assigned to the corresponding object as metadata. Therefore, a keyword search for a specific building also results in hits on drawings, paintings, events, interior or detailed views directly connected to this building. However, for the successful application of SfM in an urban context, primarily the photographic exterior view of the building is of interest. While the images for a single building can be sorted by hand, this process is too time-consuming for multiple buildings. Therefore, in collaboration with the Competence Center for Scalable Data Services and Solutions (ScaDS), an approach is developed to filter historical photographs by image similarities. This method reliably enables the search for content-similar views via the selection of one or more query images. By linking this content-based image retrieval with the SfM approach, automatic determination of camera parameters for a large number of historical photographs is possible. The developed method represents a significant improvement over commercial and open-source SfM standard solutions. The result of this work is a complete workflow from archive to application that automatically filters images and calculates the camera parameters. The expected accuracy of a few meters for the camera position is sufficient for the presented applications in this work, but offer further potential for improvement. A connection to archives, which will automatically exchange photographs and positions via interfaces, is currently under development. This makes it possible to retrieve interior and exterior orientation parameters directly from historical photography as metadata which opens up new fields of research.:1 Introduction 1 1.1 Thesis structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Historical image data and archives . . . . . . . . . . . . . . . . . . . . . . . . 2 1.3 Structure-from-Motion for historical images . . . . . . . . . . . . . . . . . . . 4 1.3.1 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.3.2 Selection of images and preprocessing . . . . . . . . . . . . . . . . . . 5 1.3.3 Feature detection, feature description and feature matching . . . . . . 6 1.3.3.1 Feature detection . . . . . . . . . . . . . . . . . . . . . . . . 7 1.3.3.2 Feature description . . . . . . . . . . . . . . . . . . . . . . . 9 1.3.3.3 Feature matching . . . . . . . . . . . . . . . . . . . . . . . . 10 1.3.3.4 Geometric verification and robust estimators . . . . . . . . . 13 1.3.3.5 Joint methods . . . . . . . . . . . . . . . . . . . . . . . . . . 16 1.3.4 Initial parameterization . . . . . . . . . . . . . . . . . . . . . . . . . . 19 1.3.5 Bundle adjustment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 1.3.6 Dense reconstruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 1.3.7 Georeferencing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 1.4 Research objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 2 Generation of a benchmark dataset using historical photographs for the evaluation of feature matching methods 29 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 2.1.1 Image differences based on digitization and image medium . . . . . . . 30 2.1.2 Image differences based on different cameras and acquisition technique 31 2.1.3 Object differences based on different dates of acquisition . . . . . . . . 31 2.2 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 2.3 The image dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 2.4 Comparison of different feature detection and description methods . . . . . . 35 2.4.1 Oriented FAST and Rotated BRIEF (ORB) . . . . . . . . . . . . . . . 36 2.4.2 Maximally Stable Extremal Region Detector (MSER) . . . . . . . . . 36 2.4.3 Radiation-invariant Feature Transform (RIFT) . . . . . . . . . . . . . 36 2.4.4 Feature matching and outlier removal . . . . . . . . . . . . . . . . . . 36 2.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 2.6 Conclusions and future work . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 3 Photogrammetry as a link between image repository and 4D applications 45 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 IX Contents 3.2 Multimodal access on repositories . . . . . . . . . . . . . . . . . . . . . . . . . 47 3.2.1 Conventional access . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 3.2.2 Virtual access using online collections . . . . . . . . . . . . . . . . . . 48 3.2.3 Virtual museums . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 3.3 Workflow and access strategies . . . . . . . . . . . . . . . . . . . . . . . . . . 52 3.3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 3.3.2 Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 3.3.3 Photogrammetry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 3.3.4 Browser access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 3.3.5 VR and AR access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 3.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 4 An adapted Structure-from-Motion Workflow for the orientation of historical images 69 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 4.2 Related Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 4.2.1 Historical images for 3D reconstruction . . . . . . . . . . . . . . . . . 72 4.2.2 Algorithmic Feature Detection and Matching . . . . . . . . . . . . . . 73 4.2.3 Feature Detection and Matching using Convolutional Neural Networks 74 4.3 Feature Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 4.4 Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 4.4.1 Step 1: Data preparation . . . . . . . . . . . . . . . . . . . . . . . . . 78 4.4.2 Step 2.1: Feature Detection and Matching . . . . . . . . . . . . . . . . 78 4.4.3 Step 2.2: Vanishing Point Detection and Principal Distance Estimation 80 4.4.4 Step 3: Scene Reconstruction . . . . . . . . . . . . . . . . . . . . . . . 80 4.4.5 Comparison with Three Other State-of-the-Art SfM Workflows . . . . 81 4.5 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 4.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 4.7 Conclusions and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 4.8 Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 4.A Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 5 Fully automated pose estimation of historical images 97 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 5.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 5.2.1 Image Retrieval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 5.2.2 Feature Detection and Matching . . . . . . . . . . . . . . . . . . . . . 101 5.3 Data Preparation: Image Retrieval . . . . . . . . . . . . . . . . . . . . . . . . 102 5.3.1 Experiment and Data . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 5.3.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 5.3.2.1 Layer Extraction Approach (LEA) . . . . . . . . . . . . . . . 104 5.3.2.2 Attentive Deep Local Features (DELF) Approach . . . . . . 105 5.3.3 Results and Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 106 5.4 Camera Pose Estimation of Historical Images Using Photogrammetric Methods 110 5.4.1 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110 5.4.1.1 Benchmark Datasets . . . . . . . . . . . . . . . . . . . . . . . 111 5.4.1.2 Retrieval Datasets . . . . . . . . . . . . . . . . . . . . . . . . 113 5.4.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 5.4.2.1 Feature Detection and Matching . . . . . . . . . . . . . . . . 115 5.4.2.2 Geometric Verification and Camera Pose Estimation . . . . . 116 5.4.3 Results and Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 117 5.5 Conclusions and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 5.A Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124 6 Related publications 129 6.1 Photogrammetric analysis of historical image repositores for virtual reconstruction in the field of digital humanities . . . . . . . . . . . . . . . . . . . . . . . 130 6.2 Feature matching of historical images based on geometry of quadrilaterals . . 131 6.3 Geo-information technologies for a multimodal access on historical photographs and maps for research and communication in urban history . . . . . . . . . . 132 6.4 An automated pipeline for a browser-based, city-scale mobile 4D VR application based on historical images . . . . . . . . . . . . . . . . . . . . . . . . . . 133 6.5 Software and content design of a browser-based mobile 4D VR application to explore historical city architecture . . . . . . . . . . . . . . . . . . . . . . . . 134 7 Synthesis 135 7.1 Summary of the developed workflows . . . . . . . . . . . . . . . . . . . . . . . 135 7.1.1 Error assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 7.1.2 Accuracy estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139 7.1.3 Transfer of the workflow . . . . . . . . . . . . . . . . . . . . . . . . . . 141 7.2 Developments and Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145 8 Appendix 149 8.1 Setup for the feature matching evaluation . . . . . . . . . . . . . . . . . . . . 149 8.2 Transformation from COLMAP coordinate system to OpenGL . . . . . . . . 150 References 151 List of Figures 165 List of Tables 167 List of Abbreviations 169Der andauernde Prozess der Digitalisierung in Archiven ermöglicht den Zugriff auf immer größer werdende historische Bildbestände. In vielen Repositorien können die Bilder typischerweise in einer Listen- oder Gallerieansicht betrachtet werden. Aufgrund der steigenden Zahl an digitalisierten Objekten wird diese Art der Visualisierung zunehmend unübersichtlicher. Es kann u.a. nur noch schwierig bestimmt werden, wie viele Fotografien ein bestimmtes Motiv zeigen. Des Weiteren können räumliche Informationen bisher nur über Metadaten vermittelt werden. Im Rahmen der Arbeit wird an der automatisierten Ermittlung und Bereitstellung dieser räumlichen Daten geforscht. Erweiterte Visualisierungsmöglichkeiten machen diese Informationen Wissenschaftlern sowie Bürgern einfacher zugänglich. Diese Visualisierungen können u.a. in drei-dimensionalen (3D), Virtual Reality (VR) oder Augmented Reality (AR) Anwendungen präsentiert werden. Allerdings erfordern Anwendungen dieser Art die Schätzung des Standpunktes des Fotografen. Im photogrammetrischen Kontext spricht man dabei von der Schätzung der inneren und äußeren Orientierungsparameter der Kamera. Zur Bestimmung der Orientierungsparameter für Einzelbilder existieren die etablierten Verfahren der direkten linearen Transformation oder des photogrammetrischen Rückwärtsschnittes. Dazu muss eine Zuordnung von gemessenen Objektpunkten zu ihren homologen Bildpunkten erfolgen. Das ist für einzelne Bilder realisierbar, wird aber aufgrund der großen Menge an Bildern in Archiven schnell nicht mehr praktikabel. Für größere Bildverbände wird im photogrammetrischen Kontext somit üblicherweise das Verfahren Structure-from-Motion (SfM) gewählt, das die simultane Schätzung der inneren sowie der äußeren Orientierung der Kameras ermöglicht. Während diese Methode vor allem für sequenzielle, gegenwärtige Bildverbände gute Ergebnisse liefert, stellt die Anwendung auf unsortierten historischen Fotografien eine große Herausforderung dar. Im Rahmen der Arbeit, die sich größtenteils auf Szenarien stadträumlicher terrestrischer Fotografien beschränkt, werden zuerst die Gründe für das Scheitern des SfM Prozesses identifiziert. Im Gegensatz zu sequenziellen Bildverbänden zeigen Bildpaare aus unterschiedlichen zeitlichen Epochen oder von unterschiedlichen Standpunkten enorme Differenzen hinsichtlich der Szenendarstellung. Dies können u.a. Unterschiede in der Beleuchtungssituation, des Aufnahmezeitpunktes oder Schäden am originalen analogen Medium sein. Da für die Merkmalszuordnung in SfM automatisiert homologe Bildpunkte in Bildpaaren bzw. Bildsequenzen gefunden werden müssen, stellen diese Bilddifferenzen die größte Schwierigkeit dar. Um verschiedene Verfahren der Merkmalszuordnung testen zu können, ist es notwendig einen vororientierten historischen Datensatz zu verwenden. Da solch ein Benchmark-Datensatz noch nicht existierte, werden im Rahmen der Arbeit durch manuelle Selektion homologer Bildpunkte acht historische Bildtripel (entspricht 24 Bildpaaren) orientiert, die anschließend genutzt werden, um neu publizierte Verfahren bei der Merkmalszuordnung zu evaluieren. Die ersten verwendeten Methoden, die algorithmische Verfahren zur Merkmalszuordnung nutzen (z.B. Scale Invariant Feature Transform (SIFT)), liefern nur für wenige Bildpaare des Datensatzes zufriedenstellende Ergebnisse. Erst durch die Verwendung von Verfahren, die neuronale Netze zur Merkmalsdetektion und Merkmalsbeschreibung einsetzen, können für einen großen Teil der historischen Bilder des Benchmark-Datensatzes zuverlässig homologe Bildpunkte gefunden werden. Die Bestimmung der Kameraorientierung erfordert zusätzlich zur Merkmalszuordnung eine initiale Schätzung der Kamerakonstante, die jedoch im Zuge der Digitalisierung des analogen Bildes nicht mehr direkt zu ermitteln ist. Eine mögliche Lösung dieses Problems ist die Verwendung von drei Fluchtpunkten, die automatisiert im historischen Bild detektiert werden und aus denen dann die Kamerakonstante bestimmt werden kann. Die Kombination aus Schätzung der Kamerakonstante und robuster Merkmalszuordnung wird in den SfM Prozess integriert und erlaubt die Bestimmung der Kameraorientierung historischer Bilder. Auf Grundlage dieser Ergebnisse wird ein Arbeitsablauf konzipiert, der es ermöglicht, Archive mittels dieses photogrammetrischen Verfahrens direkt an 3D-Anwendungen anzubinden. Eine Suchanfrage in Archiven erfolgt üblicherweise über Schlagworte, die dann als Metadaten dem entsprechenden Objekt zugeordnet sein müssen. Eine Suche nach einem bestimmten Gebäude generiert deshalb u.a. Treffer zu Zeichnungen, Gemälden, Veranstaltungen, Innen- oder Detailansichten. Für die erfolgreiche Anwendung von SfM im stadträumlichen Kontext interessiert jedoch v.a. die fotografische Außenansicht des Gebäudes. Während die Bilder für ein einzelnes Gebäude von Hand sortiert werden können, ist dieser Prozess für mehrere Gebäude zu zeitaufwendig. Daher wird in Zusammenarbeit mit dem Competence Center for Scalable Data Services and Solutions (ScaDS) ein Ansatz entwickelt, um historische Fotografien über Bildähnlichkeiten zu filtern. Dieser ermöglicht zuverlässig über die Auswahl eines oder mehrerer Suchbilder die Suche nach inhaltsähnlichen Ansichten. Durch die Verknüpfung der inhaltsbasierten Suche mit dem SfM Ansatz ist es möglich, automatisiert für eine große Anzahl historischer Fotografien die Kameraparameter zu bestimmen. Das entwickelte Verfahren stellt eine deutliche Verbesserung im Vergleich zu kommerziellen und open-source SfM Standardlösungen dar. Das Ergebnis dieser Arbeit ist ein kompletter Arbeitsablauf vom Archiv bis zur Applikation, der automatisch Bilder filtert und diese orientiert. Die zu erwartende Genauigkeit von wenigen Metern für die Kameraposition sind ausreichend für die dargestellten Anwendungen in dieser Arbeit, bieten aber weiteres Verbesserungspotential. Eine Anbindung an Archive, die über Schnittstellen automatisch Fotografien und Positionen austauschen soll, befindet sich bereits in der Entwicklung. Dadurch ist es möglich, innere und äußere Orientierungsparameter direkt von der historischen Fotografie als Metadaten abzurufen, was neue Forschungsfelder eröffnet.:1 Introduction 1 1.1 Thesis structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Historical image data and archives . . . . . . . . . . . . . . . . . . . . . . . . 2 1.3 Structure-from-Motion for historical images . . . . . . . . . . . . . . . . . . . 4 1.3.1 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.3.2 Selection of images and preprocessing . . . . . . . . . . . . . . . . . . 5 1.3.3 Feature detection, feature description and feature matching . . . . . . 6 1.3.3.1 Feature detection . . . . . . . . . . . . . . . . . . . . . . . . 7 1.3.3.2 Feature description . . . . . . . . . . . . . . . . . . . . . . . 9 1.3.3.3 Feature matching . . . . . . . . . . . . . . . . . . . . . . . . 10 1.3.3.4 Geometric verification and robust estimators . . . . . . . . . 13 1.3.3.5 Joint methods . . . . . . . . . . . . . . . .

    Visual Odometer on Videos of Endoscopic Capsules (VOVEC)

    Get PDF
    Desde a sua introdução em 2001, as cápsulas endoscópicas tornaram-se o principal método para obter imagens do intestino - uma região de difícil acesso através de métodos de endoscopia tradicionais - revolucionando a maneira como os diagnósticos no campo das doenças do intestino delgado são feitos. Estas cápsulas com dimensões comparáveis a um comprimido vitamínico tiram partido de uma câmera wireless para criar vídeos de 8 a 10 horas do trato digestivo dos pacientes. Devido à longa duração dos vídeos produzidos, o diagnóstico humano é moroso, entediante e propício a erros. Para além disto, depois de encontrada uma lesão, a informação da sua localização é escassa e dependente de hardware externo, levando a que uma solução baseada apensa em software com precisão melhorada seja bastante desejada. Este trabalho advém desta necessidade e, tendo-a em mente, propomos a implementação de dois métodos baseados em deep-learning, visando melhorar em relação às limitações dos sistemas atuais de localização de cápsulas endoscópicas. Para treinar e testar as nossas redes neuronais, um dataset que contém 111 vídeos da cápsula PillCam SB3 e 338 da cápsula PillCam SB2 foi utilizado, cortesia do Centro Hospitalar do Porto (CHP).O primeiro método consiste numa simples estimação do deslocamento da cápsula ao longo do intestino delgado utilizando uma HomographyNet, uma abordagem de deep-learning supervisionado usada para o cálculo de homografias entre imagens.Já no segundo método uma posição relativa 3D da cápsula é fornecida ao longo do intestino delgado, recorrendo a um método não-supervisionado de deep-learning denominado SfMLearner. Este método combina uma DepthNet e uma PoseNet para aprender a profundidade da imagem e a posição da cápsula em simultâneo.Since its introduction in 2001, capsule endoscopy has become the leading screening method for the small bowel - a region not easily accessible with traditional endoscopy techniques - revolutionizing the way diagnostics work in the field of small bowel diseases. These capsules are vitamin-sized and leverage from a small wireless camera to create 8 to 10 hour videos of the patients digestive tract. Due to the long duration of the videos produced, the human-based diagnosis is elongated, tedious and error-prone. Moreover, once a lesion is found, the localization information is scarce and hardware dependent, entailing desirability for a software-only endoscopic capsule localization system with added precision. This work stems from this need and, bearing this in mind, we propose the implementation of two deep-learning based methods to improve upon the limitations of the techniques used so far for the capsule position estimation. To train and test our networks, a dataset of 111 PillCam SB3 and 338 PillCam SB2 videos were used, courtesy of Centro Hospitalar do Porto (CHP).The first method consists in a simple capsule displacement estimation throughout the small bowel utilizing HomographyNet, a deep learning supervised approach that is used for homography computation between images. (DeTone et al. (2016))Differently, the second proposed method is intended to provide a 3D position along the small intestine, utilizing a deep learning unsupervised approach labeled SfMLearner, which takes advantage of a combination between a DepthNet and a PoseNet to learn depth and ego-motion from video simultaneously. (Zhou et al. (2017)

    Defect Detection and Classification in Sewer Pipeline Inspection Videos Using Deep Neural Networks

    Get PDF
    Sewer pipelines as a critical civil infrastructure become a concern for municipalities as they are getting near to the end of their service lives. Meanwhile, new environmental laws and regulations, city expansions, and budget constraints make it harder to maintain these networks. On the other hand, access and inspect sewer pipelines by human-entry based methods are problematic and risky. Current practice for sewer pipeline assessment uses various types of equipment to inspect the condition of pipelines. One of the most used technologies for sewer pipelines inspection is Closed Circuit Television (CCTV). However, application of CCTV method in extensive sewer networks involves certified operators to inspect hours of videos, which is time-consuming, labor-intensive, and error prone. The main objective of this research is to develop a framework for automated defect detection and classification in sewer CCTV inspection videos using computer vision techniques and deep neural networks. This study presents innovative algorithms to deal with the complexity of feature extraction and pattern recognition in sewer inspection videos due to lighting conditions, illumination variations, and unknown patterns of various sewer defects. Therefore, this research includes two main sub-models to first identify and localize anomalies in sewer inspection videos, and in the next phase, detect and classify the defects among the recognized anomalous frames. In the first phase, an innovative approach is proposed for identifying the frames with potential anomalies and localizing them in the pipe segment which is being inspected. The normal and anomalous frames are classified utilizing a one-class support vector machine (OC-SVM). The proposed approach employs 3D Scale Invariant Feature Transform (SIFT) to extract spatio-temporal features and capture scene dynamic statistics in sewer CCTV videos. The OC-SVM is trained by the frame-features which are considered normal, and the outliers to this model are considered abnormal frames. In the next step, the identified anomalous frames are located by recognizing the present text information in them using an end-to-end text recognition approach. The proposed localization approach is performed in two steps, first the text regions are detected using maximally stable extremal regions (MSER) algorithm, then the text characters are recognized using a convolutional neural network (CNN). The performance of the proposed model is tested using videos from real-world sewer inspection reports, where the accuracies of 95% and 86% were achieved for anomaly detection and frame localization, respectively. Identifying the anomalous frames and excluding the normal frames from further analysis could reduce the time and cost of detection. It also ensures the accuracy and quality of assessment by reducing the number of neglected anomalous frames caused by operator error. In the second phase, a defect detection framework is proposed to provide defect detection and classification among the identified anomalous frames. First, a deep Convolutional Neural Network (CNN) which is pre-trained using transfer learning, is used as a feature extractor. In the next step, the remaining convolutional layers of the constructed model are trained by the provided dataset from various types of sewer defects to detect and classify defects in the anomalous frames. The proposed methodology was validated by referencing the ground truth data of a dataset including four defects, and the mAP of 81.3% was achieved. It is expected that the developed model can help sewer inspectors in much faster and more accurate pipeline inspection. The whole framework would decrease the condition assessment time and increase the accuracy of sewer assessment reports
    corecore