27 research outputs found

    Visionary Ophthalmics: Confluence of Computer Vision and Deep Learning for Ophthalmology

    Get PDF
    Ophthalmology is a medical field ripe with opportunities for meaningful application of computer vision algorithms. The field utilizes data from multiple disparate imaging techniques, ranging from conventional cameras to tomography, comprising a diverse set of computer vision challenges. Computer vision has a rich history of techniques that can adequately meet many of these challenges. However, the field has undergone something of a revolution in recent times as deep learning techniques have sprung into the forefront following advances in GPU hardware. This development raises important questions regarding how to best leverage insights from both modern deep learning approaches and more classical computer vision approaches for a given problem. In this dissertation, we tackle challenging computer vision problems in ophthalmology using methods all across this spectrum. Perhaps our most significant work is a highly successful iris registration algorithm for use in laser eye surgery. This algorithm relies on matching features extracted from the structure tensor and a Gabor wavelet – a classically driven approach that does not utilize modern machine learning. However, drawing on insight from the deep learning revolution, we demonstrate successful application of backpropagation to optimize the registration significantly faster than the alternative of relying on finite differences. Towards the other end of the spectrum, we also present a novel framework for improving RANSAC segmentation algorithms by utilizing a convolutional neural network (CNN) trained on a RANSAC-based loss function. Finally, we apply state-of-the-art deep learning methods to solve the problem of pathological fluid detection in optical coherence tomography images of the human retina, using a novel retina-specific data augmentation technique to greatly expand the data set. Altogether, our work demonstrates benefits of applying a holistic view of computer vision, which leverages deep learning and associated insights without neglecting techniques and insights from the previous era

    Integrasjon av et minimalistisk sett av sensorer for kartlegging og lokalisering av landbruksroboter

    Get PDF
    Robots have recently become ubiquitous in many aspects of daily life. For in-house applications there is vacuuming, mopping and lawn-mowing robots. Swarms of robots have been used in Amazon warehouses for several years. Autonomous driving cars, despite being set back by several safety issues, are undeniably becoming the standard of the automobile industry. Not just being useful for commercial applications, robots can perform various tasks, such as inspecting hazardous sites, taking part in search-and-rescue missions. Regardless of end-user applications, autonomy plays a crucial role in modern robots. The essential capabilities required for autonomous operations are mapping, localization and navigation. The goal of this thesis is to develop a new approach to solve the problems of mapping, localization, and navigation for autonomous robots in agriculture. This type of environment poses some unique challenges such as repetitive patterns, large-scale sparse features environments, in comparison to other scenarios such as urban/cities, where the abundance of good features such as pavements, buildings, road lanes, traffic signs, etc., exists. In outdoor agricultural environments, a robot can rely on a Global Navigation Satellite System (GNSS) to determine its whereabouts. It is often limited to the robot's activities to accessible GNSS signal areas. It would fail for indoor environments. In this case, different types of exteroceptive sensors such as (RGB, Depth, Thermal) cameras, laser scanner, Light Detection and Ranging (LiDAR) and proprioceptive sensors such as Inertial Measurement Unit (IMU), wheel-encoders can be fused to better estimate the robot's states. Generic approaches of combining several different sensors often yield superior estimation results but they are not always optimal in terms of cost-effectiveness, high modularity, reusability, and interchangeability. For agricultural robots, it is equally important for being robust for long term operations as well as being cost-effective for mass production. We tackle this challenge by exploring and selectively using a handful of sensors such as RGB-D cameras, LiDAR and IMU for representative agricultural environments. The sensor fusion algorithms provide high precision and robustness for mapping and localization while at the same time assuring cost-effectiveness by employing only the necessary sensors for a task at hand. In this thesis, we extend the LiDAR mapping and localization methods for normal urban/city scenarios to cope with the agricultural environments where the presence of slopes, vegetation, trees render the traditional approaches to fail. Our mapping method substantially reduces the memory footprint for map storing, which is important for large-scale farms. We show how to handle the localization problem in dynamic growing strawberry polytunnels by using only a stereo visual-inertial (VI) and depth sensor to extract and track only invariant features. This eliminates the need for remapping to deal with dynamic scenes. Also, for a demonstration of the minimalistic requirement for autonomous agricultural robots, we show the ability to autonomously traverse between rows in a difficult environment of zigzag-liked polytunnel using only a laser scanner. Furthermore, we present an autonomous navigation capability by using only a camera without explicitly performing mapping or localization. Finally, our mapping and localization methods are generic and platform-agnostic, which can be applied to different types of agricultural robots. All contributions presented in this thesis have been tested and validated on real robots in real agricultural environments. All approaches have been published or submitted in peer-reviewed conference papers and journal articles.Roboter har nylig blitt standard i mange deler av hverdagen. I hjemmet har vi støvsuger-, vaske- og gressklippende roboter. Svermer med roboter har blitt brukt av Amazons varehus i mange år. Autonome selvkjørende biler, til tross for å ha vært satt tilbake av sikkerhetshensyn, er udiskutabelt på vei til å bli standarden innen bilbransjen. Roboter har mer nytte enn rent kommersielt bruk. Roboter kan utføre forskjellige oppgaver, som å inspisere farlige områder og delta i leteoppdrag. Uansett hva sluttbrukeren velger å gjøre, spiller autonomi en viktig rolle i moderne roboter. De essensielle egenskapene for autonome operasjoner i landbruket er kartlegging, lokalisering og navigering. Denne type miljø gir spesielle utfordringer som repetitive mønstre og storskala miljø med få landskapsdetaljer, sammenlignet med andre steder, som urbane-/bymiljø, hvor det finnes mange landskapsdetaljer som fortau, bygninger, trafikkfelt, trafikkskilt, etc. I utendørs jordbruksmiljø kan en robot bruke Global Navigation Satellite System (GNSS) til å navigere sine omgivelser. Dette begrenser robotens aktiviteter til områder med tilgjengelig GNSS signaler. Dette vil ikke fungere i miljøer innendørs. I ett slikt tilfelle vil reseptorer mot det eksterne miljø som (RGB-, dybde-, temperatur-) kameraer, laserskannere, «Light detection and Ranging» (LiDAR) og propriopsjonære detektorer som treghetssensorer (IMU) og hjulenkodere kunne brukes sammen for å bedre kunne estimere robotens tilstand. Generisk kombinering av forskjellige sensorer fører til overlegne estimeringsresultater, men er ofte suboptimale med hensyn på kostnadseffektivitet, moduleringingsgrad og utbyttbarhet. For landbruksroboter så er det like viktig med robusthet for lang tids bruk som kostnadseffektivitet for masseproduksjon. Vi taklet denne utfordringen med å utforske og selektivt velge en håndfull sensorer som RGB-D kameraer, LiDAR og IMU for representative landbruksmiljø. Algoritmen som kombinerer sensorsignalene gir en høy presisjonsgrad og robusthet for kartlegging og lokalisering, og gir samtidig kostnadseffektivitet med å bare bruke de nødvendige sensorene for oppgaven som skal utføres. I denne avhandlingen utvider vi en LiDAR kartlegging og lokaliseringsmetode normalt brukt i urbane/bymiljø til å takle landbruksmiljø, hvor hellinger, vegetasjon og trær gjør at tradisjonelle metoder mislykkes. Vår metode reduserer signifikant lagringsbehovet for kartlagring, noe som er viktig for storskala gårder. Vi viser hvordan lokaliseringsproblemet i dynamisk voksende jordbær-polytuneller kan løses ved å bruke en stereo visuel inertiel (VI) og en dybdesensor for å ekstrahere statiske objekter. Dette eliminerer behovet å kartlegge på nytt for å klare dynamiske scener. I tillegg demonstrerer vi de minimalistiske kravene for autonome jordbruksroboter. Vi viser robotens evne til å bevege seg autonomt mellom rader i ett vanskelig miljø med polytuneller i sikksakk-mønstre ved bruk av kun en laserskanner. Videre presenterer vi en autonom navigeringsevne ved bruk av kun ett kamera uten å eksplisitt kartlegge eller lokalisere. Til slutt viser vi at kartleggings- og lokaliseringsmetodene er generiske og platform-agnostiske, noe som kan brukes med flere typer jordbruksroboter. Alle bidrag presentert i denne avhandlingen har blitt testet og validert med ekte roboter i ekte landbruksmiljø. Alle forsøk har blitt publisert eller sendt til fagfellevurderte konferansepapirer og journalartikler

    Computer Vision Applications for Autonomous Aerial Vehicles

    Get PDF
    Undoubtedly, unmanned aerial vehicles (UAVs) have experienced a great leap forward over the last decade. It is not surprising anymore to see a UAV being used to accomplish a certain task, which was previously carried out by humans or a former technology. The proliferation of special vision sensors, such as depth cameras, lidar sensors and thermal cameras, and major breakthroughs in computer vision and machine learning fields accelerated the advance of UAV research and technology. However, due to certain unique challenges imposed by UAVs, such as limited payload capacity, unreliable communication link with the ground stations and data safety, UAVs are compelled to perform many tasks on their onboard embedded processing units, which makes it difficult to readily implement the most advanced algorithms on UAVs. This thesis focuses on computer vision and machine learning applications for UAVs equipped with onboard embedded platforms, and presents algorithms that utilize data from multiple modalities. The presented work covers a broad spectrum of algorithms and applications for UAVs, such as indoor UAV perception, 3D understanding with deep learning, UAV localization, and structural inspection with UAVs. Visual guidance and scene understanding without relying on pre-installed tags or markers is the desired approach for fully autonomous navigation of UAVs in conjunction with the global positioning systems (GPS), or especially when GPS information is either unavailable or unreliable. Thus, semantic and geometric understanding of the surroundings become vital to utilize vision as guidance in the autonomous navigation pipelines. In this context, first, robust altitude measurement, safe landing zone detection and doorway detection methods are presented for autonomous UAVs operating indoors. These approaches are implemented on Google Project Tango platform, which is an embedded platform equipped with various sensors including a depth camera. Next, a modified capsule network for 3D object classification is presented with weight optimization so that the network can be fit and run on memory-constrained platforms. Then, a semantic segmentation method for 3D point clouds is developed for a more general visual perception on a UAV equipped with a 3D vision sensor. Next, this thesis presents algorithms for structural health monitoring applications involving UAVs. First, a 3D point cloud-based, drift-free and lightweight localization method is presented for depth camera-equipped UAVs that perform bridge inspection, where GPS signal is unreliable. Next, a thermal leakage detection algorithm is presented for detecting thermal anomalies on building envelopes using aerial thermography from UAVs. Then, building on our thermal anomaly identification expertise gained on the previous task, a novel performance anomaly identification metric (AIM) is presented for more reliable performance evaluation of thermal anomaly identification methods

    Perception of Unstructured Environments for Autonomous Off-Road Vehicles

    Get PDF
    Autonome Fahrzeuge benötigen die Fähigkeit zur Perzeption als eine notwendige Voraussetzung für eine kontrollierbare und sichere Interaktion, um ihre Umgebung wahrzunehmen und zu verstehen. Perzeption für strukturierte Innen- und Außenumgebungen deckt wirtschaftlich lukrative Bereiche, wie den autonomen Personentransport oder die Industrierobotik ab, während die Perzeption unstrukturierter Umgebungen im Forschungsfeld der Umgebungswahrnehmung stark unterrepräsentiert ist. Die analysierten unstrukturierten Umgebungen stellen eine besondere Herausforderung dar, da die vorhandenen, natürlichen und gewachsenen Geometrien meist keine homogene Struktur aufweisen und ähnliche Texturen sowie schwer zu trennende Objekte dominieren. Dies erschwert die Erfassung dieser Umgebungen und deren Interpretation, sodass Perzeptionsmethoden speziell für diesen Anwendungsbereich konzipiert und optimiert werden müssen. In dieser Dissertation werden neuartige und optimierte Perzeptionsmethoden für unstrukturierte Umgebungen vorgeschlagen und in einer ganzheitlichen, dreistufigen Pipeline für autonome Geländefahrzeuge kombiniert: Low-Level-, Mid-Level- und High-Level-Perzeption. Die vorgeschlagenen klassischen Methoden und maschinellen Lernmethoden (ML) zur Perzeption bzw.~Wahrnehmung ergänzen sich gegenseitig. Darüber hinaus ermöglicht die Kombination von Perzeptions- und Validierungsmethoden für jede Ebene eine zuverlässige Wahrnehmung der möglicherweise unbekannten Umgebung, wobei lose und eng gekoppelte Validierungsmethoden kombiniert werden, um eine ausreichende, aber flexible Bewertung der vorgeschlagenen Perzeptionsmethoden zu gewährleisten. Alle Methoden wurden als einzelne Module innerhalb der in dieser Arbeit vorgeschlagenen Perzeptions- und Validierungspipeline entwickelt, und ihre flexible Kombination ermöglicht verschiedene Pipelinedesigns für eine Vielzahl von Geländefahrzeugen und Anwendungsfällen je nach Bedarf. Low-Level-Perzeption gewährleistet eine eng gekoppelte Konfidenzbewertung für rohe 2D- und 3D-Sensordaten, um Sensorausfälle zu erkennen und eine ausreichende Genauigkeit der Sensordaten zu gewährleisten. Darüber hinaus werden neuartige Kalibrierungs- und Registrierungsansätze für Multisensorsysteme in der Perzeption vorgestellt, welche lediglich die Struktur der Umgebung nutzen, um die erfassten Sensordaten zu registrieren: ein halbautomatischer Registrierungsansatz zur Registrierung mehrerer 3D~Light Detection and Ranging (LiDAR) Sensoren und ein vertrauensbasiertes Framework, welches verschiedene Registrierungsmethoden kombiniert und die Registrierung verschiedener Sensoren mit unterschiedlichen Messprinzipien ermöglicht. Dabei validiert die Kombination mehrerer Registrierungsmethoden die Registrierungsergebnisse in einer eng gekoppelten Weise. Mid-Level-Perzeption ermöglicht die 3D-Rekonstruktion unstrukturierter Umgebungen mit zwei Verfahren zur Schätzung der Disparität von Stereobildern: ein klassisches, korrelationsbasiertes Verfahren für Hyperspektralbilder, welches eine begrenzte Menge an Test- und Validierungsdaten erfordert, und ein zweites Verfahren, welches die Disparität aus Graustufenbildern mit neuronalen Faltungsnetzen (CNNs) schätzt. Neuartige Disparitätsfehlermetriken und eine Evaluierungs-Toolbox für die 3D-Rekonstruktion von Stereobildern ergänzen die vorgeschlagenen Methoden zur Disparitätsschätzung aus Stereobildern und ermöglichen deren lose gekoppelte Validierung. High-Level-Perzeption konzentriert sich auf die Interpretation von einzelnen 3D-Punktwolken zur Befahrbarkeitsanalyse, Objekterkennung und Hindernisvermeidung. Eine Domänentransferanalyse für State-of-the-art-Methoden zur semantischen 3D-Segmentierung liefert Empfehlungen für eine möglichst exakte Segmentierung in neuen Zieldomänen ohne eine Generierung neuer Trainingsdaten. Der vorgestellte Trainingsansatz für 3D-Segmentierungsverfahren mit CNNs kann die benötigte Menge an Trainingsdaten weiter reduzieren. Methoden zur Erklärbarkeit künstlicher Intelligenz vor und nach der Modellierung ermöglichen eine lose gekoppelte Validierung der vorgeschlagenen High-Level-Methoden mit Datensatzbewertung und modellunabhängigen Erklärungen für CNN-Vorhersagen. Altlastensanierung und Militärlogistik sind die beiden Hauptanwendungsfälle in unstrukturierten Umgebungen, welche in dieser Arbeit behandelt werden. Diese Anwendungsszenarien zeigen auch, wie die Lücke zwischen der Entwicklung einzelner Methoden und ihrer Integration in die Verarbeitungskette für autonome Geländefahrzeuge mit Lokalisierung, Kartierung, Planung und Steuerung geschlossen werden kann. Zusammenfassend lässt sich sagen, dass die vorgeschlagene Pipeline flexible Perzeptionslösungen für autonome Geländefahrzeuge bietet und die begleitende Validierung eine exakte und vertrauenswürdige Perzeption unstrukturierter Umgebungen gewährleistet
    corecore