
    A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community

    In recent years, deep learning (DL), a re-branding of neural networks (NNs), has risen to the top in numerous areas, such as computer vision (CV), speech recognition, and natural language processing. While remote sensing (RS) poses a number of unique challenges, primarily related to sensors and applications, RS inevitably draws on many of the same theories as CV, e.g., statistics, fusion, and machine learning, to name a few. This means that the RS community should be aware of, if not at the leading edge of, advancements like DL. Herein, we provide the most comprehensive survey of state-of-the-art RS DL research. We also review recent developments in the DL field that can be used in DL for RS. Namely, we focus on theories, tools and challenges for the RS community. Specifically, we focus on unsolved challenges and opportunities as they relate to (i) inadequate data sets, (ii) human-understandable solutions for modelling physical phenomena, (iii) Big Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and learning algorithms for spectral, spatial and temporal data, (vi) transfer learning, (vii) an improved theoretical understanding of DL systems, (viii) high barriers to entry, and (ix) training and optimizing DL systems. Comment: 64 pages, 411 references. To appear in Journal of Applied Remote Sensing.
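    As a concrete illustration of challenge (vi), the following is a minimal sketch of transfer learning for RS imagery: an ImageNet-pretrained CNN is adapted to 4-band multispectral input and fine-tuned. The model choice, band count, and class count are illustrative assumptions, not details taken from the survey.

```python
# Minimal transfer-learning sketch for remote sensing (illustrative only).
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 10  # e.g., land-cover classes (assumption)
NUM_BANDS = 4     # e.g., RGB + near-infrared (assumption)

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Widen the first convolution to accept 4 spectral bands: keep the
# pretrained RGB filters and initialise the extra band from their mean.
old_conv = model.conv1
new_conv = nn.Conv2d(NUM_BANDS, 64, kernel_size=7, stride=2, padding=3, bias=False)
with torch.no_grad():
    new_conv.weight[:, :3] = old_conv.weight
    new_conv.weight[:, 3:] = old_conv.weight.mean(dim=1, keepdim=True)
model.conv1 = new_conv

# Replace the classifier head and fine-tune only the head, the last residual
# block, and the widened stem, a common recipe when labelled RS data are
# scarce (challenge (i)).
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)
for name, p in model.named_parameters():
    p.requires_grad = name.startswith(("conv1", "layer4", "fc"))
optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-4)
```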

    Computer Science 2019 APR Self-Study & Documents

    UNM Computer Science APR self-study report and review team report for Spring 2019, fulfilling the requirements of the Higher Learning Commission.

    From Restoring Human Vision to Enhancing Computer Vision

    The central theme of this work is enabling vision, which includes two subtopics: restoring vision for blind humans, and enhancing computer vision models in visual recognition. Chapter 1 provides a gentle introduction to the relevant high-level principles of human visual computation and summarizes two fundamental questions that vision answers: "what" and "where." Chapters 2, 3, and 4 contain three published projects that are anchored by those two fundamental questions. Chapter 2 introduces a cognitive assistant that restores visual function for blind humans through an interface powered by audio augmented reality. The assistant communicates the "what" and "where" aspects of visual scenes through a combination of natural language and spatialized sound. We experimentally demonstrated that the assistant enables many aspects of visual function for naive blind users. Chapters 3 and 4 develop data augmentation methods to address the data-inefficiency problem in neural-network-based visual recognition models. In Chapter 3, a 3D-simulation-based data augmentation method is developed for improving the generalization of visual classification models on rare classes. In Chapter 4, a fast and efficient data augmentation method is developed for the newly formulated panoptic segmentation task. The method improves the performance of state-of-the-art panoptic segmentation models and generalizes across dataset domains, sizes, model architectures, and backbones.
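    As a flavour of the augmentation methods in Chapters 3 and 4, the sketch below implements a simple instance copy-paste augmentation for segmentation labels. It is an illustrative stand-in under assumed array conventions (equal-sized images, integer instance-id masks), not the thesis's actual method.

```python
# Copy-paste augmentation sketch for segmentation (illustrative only).
import numpy as np

def copy_paste(src_img, src_mask, dst_img, dst_mask, instance_id, new_id, rng):
    """Paste one labelled instance from a source image into a target image.

    Images are H x W x 3 uint8 arrays of equal size; masks hold one integer
    instance id per pixel (both conventions are assumptions for brevity).
    """
    instance = src_mask == instance_id
    if not instance.any():
        return dst_img, dst_mask
    dy, dx = rng.integers(-40, 41, size=2)        # random translation
    shifted = np.roll(instance, (dy, dx), axis=(0, 1))
    out_img, out_mask = dst_img.copy(), dst_mask.copy()
    out_img[shifted] = np.roll(src_img, (dy, dx), axis=(0, 1))[shifted]
    out_mask[shifted] = new_id                    # pasted object occludes target
    return out_img, out_mask

# Usage on toy data: paste instance 7 back into the same image as id 8.
rng = np.random.default_rng(0)
img = np.zeros((100, 100, 3), np.uint8)
mask = np.zeros((100, 100), np.int32)
mask[40:60, 40:60] = 7
aug_img, aug_mask = copy_paste(img, mask, img, mask, 7, 8, rng)
```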

    On the use of smartphones as novel photogrammetric water gauging instruments: Developing tools for crowdsourcing water levels

    The term global climate change has been omnipresent since the beginning of the last decade. Changes in the global climate are associated with an increase in heavy rainfall events that can cause nearly unpredictable flash floods. Consequently, spatio-temporally high-resolution monitoring of rivers is becoming increasingly important. Water gauging stations measure water levels continuously and precisely. However, they are expensive to purchase and maintain and are preferably installed at water bodies relevant for water management. Small-scale catchments therefore often remain ungauged. In order to increase the data density of hydrometric monitoring networks and thus improve the prediction quality of flood events, new, flexible and cost-effective water level measurement technologies are required. They should be oriented towards the accuracy requirements of conventional measurement systems and facilitate the observation of water levels at virtually any time, even at the smallest rivers. A possible solution is the development of a photogrammetric smartphone application (app) for crowdsourcing water levels, which merely requires voluntary users to take pictures of a river section from which the water level is determined. Today's smartphones integrate high-resolution cameras, a variety of sensors, powerful processors, and mass storage. However, they are designed for the mass market and use low-cost hardware that cannot match the quality of geodetic measurement technology. In order to investigate the potential for mobile measurement applications, research was conducted on the smartphone as a photogrammetric measurement instrument as part of the doctoral project. The studies deal with the geometric stability of smartphone cameras with respect to device-internal temperature changes and with the accuracy potential of rotation parameters measured with smartphone sensors. The results show a high, temperature-related variability of the interior orientation parameters, which is why the camera should be calibrated at the immediate time of measurement. The results of the sensor investigations show considerable inaccuracies when measuring rotation parameters, especially the compass angle (errors of up to 90° were observed). The same applies to position parameters measured by global navigation satellite system (GNSS) receivers built into smartphones. According to the literature, positional accuracies of about 5 m are possible under the best conditions; otherwise, errors of several tens of metres are to be expected. As a result, direct georeferencing of image measurements using current smartphone technology should be discouraged. In light of these results, the water gauging app Open Water Levels (OWL) was developed, whose methodological development and implementation constituted the core of the thesis project. OWL enables the flexible measurement of water levels via crowdsourcing without requiring additional equipment or being limited to specific river sections. Data acquisition and processing take place directly in the field, so that the water level information is immediately available. In practice, the user captures a short time-lapse sequence of a river bank with OWL, from which a spatio-temporal texture is calculated that enables the detection of the water line. In order to translate the image measurement into 3D object space, a synthetic, photo-realistic image of the situation is created from existing 3D data of the river section under investigation.
Necessary approximations of the image orientation parameters are measured by smartphone sensors and GNSS. Matching the camera image and the synthetic image allows the interior and exterior orientation parameters to be determined by means of space resection and, finally, the image-measured 2D water line to be transferred into 3D object space to derive the prevailing water level in the reference system of the 3D data. In comparison with conventionally measured water levels, OWL reveals an accuracy potential of 2 cm on average, provided that the synthetic image and the camera image exhibit consistent image content and that the water line can be reliably detected. In the present dissertation, the related geometric and radiometric problems are comprehensively discussed. Furthermore, possible solutions, based on advancing developments in smartphone technology and image processing as well as the increasing availability of 3D reference data, are presented in the synthesis of the work. The app Open Water Levels, which is currently available as a beta version and has been tested on selected devices, provides a basis that, with continuous further development, aims to achieve a final release for crowdsourcing water levels towards the establishment of new and the expansion of existing monitoring networks.
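The geometric core described above can be sketched with OpenCV as follows: space resection from 2D-3D correspondences, then intersection of a water-line viewing ray with a bank plane. The intrinsics, the correspondences, and the planar bank patch are illustrative assumptions standing in for OWL's on-the-fly calibration, image matching, and 3D reference data.

```python
# Space resection and 2D-to-3D water-line transfer (illustrative sketch).
import numpy as np
import cv2

# Assumed pinhole intrinsics (OWL calibrates on the fly instead).
K = np.array([[1500.0, 0.0, 960.0],
              [0.0, 1500.0, 540.0],
              [0.0, 0.0, 1.0]])

# Synthetic object points on the bank (metres) and their image observations,
# standing in for matches between the camera image and the synthetic image.
pts_3d = np.array([[0.0, 5.0, 1.0], [2.0, 6.0, 1.5], [-1.5, 7.0, 0.5],
                   [1.0, 8.0, 2.0], [-2.0, 5.5, 1.8], [0.5, 6.5, 0.2]])
rvec_true = np.array([0.10, -0.05, 0.02])
tvec_true = np.array([0.2, -0.3, 6.0])
pts_2d, _ = cv2.projectPoints(pts_3d, rvec_true, tvec_true, K, None)

# Space resection: recover the exterior orientation from correspondences.
ok, rvec, tvec = cv2.solvePnP(pts_3d, pts_2d, K, None)
R, _ = cv2.Rodrigues(rvec)
cam_center = -R.T @ tvec.ravel()              # camera position in object space

# Back-project a detected water-line pixel (u, v) onto a planar bank patch
# (point p0, normal n) taken from the 3D data; its z is the water level.
u, v = 1000.0, 700.0                          # assumed water-line pixel
ray_obj = R.T @ (np.linalg.inv(K) @ np.array([u, v, 1.0]))
p0 = np.array([0.0, 6.0, 0.0])
n = np.array([0.0, -0.4, 1.0]); n /= np.linalg.norm(n)
s = n @ (p0 - cam_center) / (n @ ray_obj)
water_point = cam_center + s * ray_obj
print(f"estimated water level: {water_point[2]:.3f} m")
```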

    Proceedings of the OSM Science 2023

    Proceedings of OSM Science at State of the Map Europe 2023.

    Efficient Dense Registration, Segmentation, and Modeling Methods for RGB-D Environment Perception

    One perspective for artificial intelligence research is to build machines that perform tasks autonomously in our complex everyday environments. This setting poses challenges for the development of perception skills: a robot should be able to perceive its location and the objects in its surroundings, while the objects, and the robot itself, may be moving. Objects may not only be composed of rigid parts, but could be non-rigidly deformable or appear in a variety of similar shapes. Furthermore, observing object semantics may be relevant to the task. For a robot to act fluently and immediately, these perception challenges demand efficient methods. This thesis presents novel approaches to robot perception with RGB-D sensors, developing efficient registration, segmentation, and modeling methods for scene and object perception. We propose multi-resolution surfel maps as a concise representation for RGB-D measurements. We develop probabilistic registration methods that handle rigid scenes, scenes with multiple rigid parts that move differently, and scenes that undergo non-rigid deformations. We use these methods to learn and perceive 3D models of scenes and objects in both static and dynamic environments. For learning models of static scenes, we propose a real-time capable simultaneous localization and mapping approach. It aligns key views in RGB-D video using our rigid registration method and optimizes the pose graph of the key views. The acquired models are then perceived in live images through detection and tracking within a Bayesian filtering framework. An assumption frequently made for environment mapping is that the observed scene remains static during the mapping process. Through rigid multi-body registration, we take advantage of relaxing this assumption: our registration method segments views into parts that move independently between the views and simultaneously estimates their motion. Within simultaneous motion segmentation, localization, and mapping, we separate scenes into objects by their motion. Our approach acquires 3D models of objects and concurrently infers hierarchical part relations between them using probabilistic reasoning. It can be applied for interactive learning of objects and their part decomposition. Endowing robots with manipulation skills for a large variety of objects is a tedious endeavor if the skill must be programmed for every instance of an object class. Furthermore, slight deformations of an instance could not be handled by an inflexible program. Deformable registration is useful for perceiving such shape variations, e.g., between specific instances of a tool. We develop an efficient deformable registration method and apply it to the transfer of robot manipulation skills between varying object instances. On the object-class level, we segment images using random decision forest classifiers in real time. The probabilistic labelings of individual images are fused in 3D semantic maps within a Bayesian framework. We combine our object-class segmentation method with simultaneous localization and mapping to achieve online semantic mapping in real time. The methods developed in this thesis are evaluated in experiments on publicly available benchmark datasets and on our own novel datasets.
We publicly demonstrate several of our perception approaches within integrated robot systems in the mobile manipulation context. These approaches were an important component in winning the RoboCup@Home league robot competitions in 2011, 2012, and 2013.
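The rigid alignment at the heart of the registration methods above can be illustrated by the closed-form Kabsch/Umeyama step for known 3D correspondences, shown below. This is only the inner building block; the surfel-map registration in the thesis adds multi-resolution association and probabilistic weighting on top.

```python
# Closed-form rigid alignment (Kabsch/Umeyama), the inner step of ICP-style
# registration; illustrative, not the thesis's surfel-based method.
import numpy as np

def rigid_align(src, dst):
    """Least-squares R, t such that dst ~ R @ src + t; src, dst are N x 3."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)                   # 3 x 3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflection
    R = Vt.T @ D @ U.T
    return R, mu_d - R @ mu_s

# Usage: recover a known rotation/translation from a synthetic point cloud.
rng = np.random.default_rng(0)
src = rng.normal(size=(100, 3))
a = np.pi / 8
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a), np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -0.2, 1.0])
dst = src @ R_true.T + t_true
R, t = rigid_align(src, dst)
assert np.allclose(R, R_true, atol=1e-6) and np.allclose(t, t_true, atol=1e-6)
```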

    Multimodal Data Analysis of Dyadic Interactions for an Automated Feedback System Supporting Parent Implementation of Pivotal Response Treatment

    Parents fulfill a pivotal role in the early childhood development of social and communication skills. In children with autism, the development of these skills can be delayed. Applied behavior analysis (ABA) techniques have been created to aid in skill acquisition. Among these, pivotal response treatment (PRT) has been empirically shown to foster improvements. Research into PRT implementation has also shown that parents can be trained to be effective interventionists for their children. The current difficulty in PRT training is how to disseminate training to the parents who need it, and how to support and motivate practitioners after training. Evaluation of the parents' fidelity of implementation is often undertaken using video probes that depict the dyadic interaction occurring between the parent and the child during PRT sessions. These videos are time-consuming for clinicians to process and often result in only minimal feedback for the parents. Current trends in technology could be utilized to alleviate the manual cost of extracting data from the videos, affording greater opportunities for clinician-created feedback as well as automated assessments. The naturalistic context of the video probes, along with the dependence on ubiquitous recording devices, creates a difficult scenario for classification tasks. The domain of the PRT video probes can be expected to have high levels of both aleatory and epistemic uncertainty. Addressing these challenges requires examination of the multimodal data along with the implementation and evaluation of classification algorithms, which is explored here through the use of a new dataset of PRT videos. The relationship between the parent and the clinician is important: the clinician can provide support and help build self-efficacy in addition to providing knowledge and modeling of treatment procedures. Facilitating this relationship alongside automated feedback not only provides the opportunity to present expert feedback to the parent, but also allows the clinician to aid in personalizing the classification models. By utilizing a human-in-the-loop framework, clinicians can help address the uncertainty in the classification models by providing additional labeled samples. This allows the system to improve classification and provides a person-centered approach to extracting multimodal data from PRT video probes.
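    A minimal sketch of the human-in-the-loop framework's querying step follows: unlabelled video-probe feature vectors are ranked by predictive entropy, and the most uncertain segments are routed to the clinician for labelling. The features, classifier, and batch size are illustrative assumptions, not the dissertation's pipeline.

```python
# Uncertainty sampling sketch for clinician-in-the-loop labelling.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_labeled = rng.normal(size=(200, 16))    # e.g., fused audio/video features (assumed)
y_labeled = rng.integers(0, 2, size=200)  # e.g., fidelity yes/no labels (assumed)
X_pool = rng.normal(size=(1000, 16))      # unlabelled probe segments

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_labeled, y_labeled)

# Predictive entropy as a simple proxy for the model's uncertainty.
proba = clf.predict_proba(X_pool)
entropy = -np.sum(proba * np.log(proba + 1e-12), axis=1)

# Route the k most uncertain segments to the clinician, then retrain on the
# enlarged labelled set to personalise the model.
k = 10
query_idx = np.argsort(entropy)[-k:]
print("segments to send for clinician labelling:", query_idx)
```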

    17th SC@RUG 2020 proceedings 2019-2020


    Fuzzy logic based approach for object feature tracking

    This thesis introduces a novel technique for tracking features in sequences of greyscale images based on fuzzy logic. A versatile and modular methodology for feature tracking using fuzzy sets and inference engines is presented, along with an extension of this methodology for the correct tracking of multiple features. To perform feature tracking, three membership functions are initially defined: one related to the distinctive property of the feature to be tracked, one reflecting the assumption that the feature moves smoothly between consecutive images of the sequence, and one concerning its expected future location. Applying these functions to the image pixels yields the corresponding fuzzy sets, which are then mathematically manipulated to serve as input to an inference engine. Situations such as occlusion or failure to detect features are overcome using estimated positions calculated from a motion model and a state vector of the feature. This methodology was first applied to track a single feature identified by the user. Several performance tests were conducted on sequences of both synthetic and real images, and the experimental results are presented, analysed and discussed. Although this methodology could be applied directly to multiple-feature tracking, an extension was developed for that purpose. In this new method, the processing sequence of the features is dynamic and hierarchical: dynamic because the sequence can change over time, and hierarchical because features with higher priority are processed first. The process thus gives preference to features whose locations are easier to predict over features whose behaviour is less predictable. When a feature's priority value becomes too low, it is no longer tracked by the algorithm. To assess the performance of this new approach, sequences of images in which several user-specified features are to be tracked were used. In the final part of this work, conclusions drawn from the work as well as some guidelines for future research are presented.
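The three-membership-function scheme described above can be sketched as follows, with Gaussian memberships and a minimum t-norm standing in for the thesis's fuzzy sets and inference engine; both are illustrative choices rather than the method's actual definitions.

```python
# Fuzzy feature-tracking sketch: three membership maps fused by a fuzzy AND.
import numpy as np

def gaussian_membership(dist, sigma):
    """Membership in [0, 1] that decays smoothly with distance."""
    return np.exp(-0.5 * (dist / sigma) ** 2)

def track_feature(frame, template_gray, prev_pos, predicted_pos):
    """frame: H x W greyscale float array; returns a (row, col) estimate."""
    h, w = frame.shape
    rows, cols = np.mgrid[0:h, 0:w]
    # 1) Distinctive property: similarity of grey value to the feature's.
    mu_app = gaussian_membership(np.abs(frame - template_gray), sigma=20.0)
    # 2) Smooth movement: prefer pixels near the previous position.
    mu_smooth = gaussian_membership(
        np.hypot(rows - prev_pos[0], cols - prev_pos[1]), sigma=15.0)
    # 3) Expected future location from the motion model / state vector.
    mu_pred = gaussian_membership(
        np.hypot(rows - predicted_pos[0], cols - predicted_pos[1]), sigma=10.0)
    # Fuzzy AND (minimum t-norm), then defuzzify by taking the maximum.
    mu = np.minimum(np.minimum(mu_app, mu_smooth), mu_pred)
    return np.unravel_index(np.argmax(mu), mu.shape)

# Usage on a synthetic frame with one bright feature near the prediction.
frame = np.zeros((120, 160)); frame[60, 80] = 200.0
print(track_feature(frame, 200.0, prev_pos=(58, 78), predicted_pos=(61, 79)))
```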