45 research outputs found
A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community
In recent years, deep learning (DL), a re-branding of neural networks (NNs),
has risen to the top in numerous areas, including computer vision (CV), speech
recognition, and natural language processing. Whereas remote sensing (RS)
possesses a number of unique challenges, primarily related to sensors and
applications, inevitably RS draws from many of the same theories as CV; e.g.,
statistics, fusion, and machine learning, to name a few. This means that the RS
community should be aware of, if not at the leading edge of, advancements
like DL. Herein, we provide the most comprehensive survey of state-of-the-art
RS DL research. We also review recent new developments in the DL field that can
be used in DL for RS. Namely, we focus on theories, tools and challenges for
the RS community. Specifically, we focus on unsolved challenges and
opportunities as they relate to (i) inadequate data sets, (ii)
human-understandable solutions for modelling physical phenomena, (iii) Big
Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and
learning algorithms for spectral, spatial and temporal data, (vi) transfer
learning, (vii) an improved theoretical understanding of DL systems, (viii)
high barriers to entry, and (ix) training and optimizing DL models.
Comment: 64 pages, 411 references. To appear in the Journal of Applied Remote Sensing.
Computer Science 2019 APR Self-Study & Documents
UNM Computer Science APR self-study report and review team report for Spring 2019, fulfilling requirements of the Higher Learning Commission
From Restoring Human Vision to Enhancing Computer Vision
The central theme of this work is enabling vision, which includes two subtopics: restoring vision for blind humans, and enhancing computer vision models in visual recognition. Chapter 1 first provides a gentle introduction to relevant high-level principles of human visual computations and summarizes two fundamental questions that vision answers: "what" and "where." Chapters 2, 3, and 4 contain three published projects that are anchored by those two fundamental questions.
Chapter 2 introduces a cognitive assistant to restore visual function for blind humans by focusing on an interface powered by audio augmented reality. The assistant communicates the "what" and "where" aspects of visual scenes by a combination of natural language and spatialized sound. We experimentally demonstrated that the assistant enables many aspects of visual functions for naive blind users.
Chapters 3 and 4 develop data augmentation methods to address the data inefficiency problem in neural-network-based computer visual recognition models. In Chapter 3, a 3D-simulation-based data augmentation method is developed for improving the generalization of visual classification models for rare classes. In Chapter 4, a fast and efficient data augmentation method is developed for the newly formulated panoptic segmentation task. The method improves performance of state-of-the-art panoptic segmentation models and generalizes across dataset domains, sizes, model architectures, and backbones.
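As a loose illustration of the copy-paste flavor of augmentation used for segmentation tasks (a generic sketch, not the dissertation's actual method; the function name and class id are invented), an object instance can be pasted into a training image while its segmentation mask is updated in step:

```python
import numpy as np

def paste_instance(image, mask, instance, instance_mask, top, left):
    """Paste a cut-out object instance into an image and update the
    segmentation mask. image is HxWx3, mask is HxW, instance is hxwx3,
    instance_mask is an hxw boolean array marking the object's pixels."""
    h, w = instance_mask.shape
    region = image[top:top + h, left:left + w]          # views into the arrays,
    region_mask = mask[top:top + h, left:left + w]      # so writes modify them
    # Overwrite only the pixels actually covered by the pasted instance.
    region[instance_mask] = instance[instance_mask]
    region_mask[instance_mask] = 2   # hypothetical class id of the instance
    return image, mask

# Toy example: a 6x6 background of class 0, pasting a 2x2 bright object.
img = np.zeros((6, 6, 3), dtype=np.uint8)
seg = np.zeros((6, 6), dtype=np.int64)
obj = np.full((2, 2, 3), 255, dtype=np.uint8)
obj_mask = np.ones((2, 2), dtype=bool)
img, seg = paste_instance(img, seg, obj, obj_mask, top=1, left=2)
```

Real copy-paste augmentation adds blending, scaling, and occlusion handling on top of this core step.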
On the use of smartphones as novel photogrammetric water gauging instruments: Developing tools for crowdsourcing water levels
The term global climate change has been omnipresent since the beginning of the last decade. Changes in the global climate are associated with an increase in heavy rainfall events that can cause nearly unpredictable flash floods. Consequently, spatio-temporally high-resolution monitoring of rivers becomes increasingly important.
Water gauging stations continuously and precisely measure water levels. However, they are rather expensive to purchase and maintain and are preferably installed at water bodies relevant for water management. Small-scale catchments often remain ungauged. In order to increase the data density of hydrometric monitoring networks and thus to improve the prediction quality of flood events, new, flexible and cost-effective water level measurement technologies are required. They should be oriented towards the accuracy requirements of conventional measurement systems and facilitate the observation of water levels at virtually any time, even at the smallest rivers.
A possible solution is the development of a photogrammetric smartphone application (app) for crowdsourcing water levels, which merely requires voluntary users to take pictures of a river section to determine the water level. Today's smartphones integrate high-resolution cameras, a variety of sensors, powerful processors, and mass storage. However, they are designed for the mass market and use low-cost hardware that cannot match the quality of geodetic measurement technology.
In order to investigate the potential for mobile measurement applications, research was conducted on the smartphone as a photogrammetric measurement instrument as part of the doctoral project. The studies deal with the geometric stability of smartphone cameras regarding device-internal temperature changes and with the accuracy potential of rotation parameters measured with smartphone sensors.
The results show a high, temperature-related variability of the interior orientation parameters, which is why the camera should be calibrated at the time of measurement. The results of the sensor investigations show considerable inaccuracies when measuring rotation parameters, especially the compass angle (errors of up to 90° were observed). The same applies to position parameters measured by global navigation satellite system (GNSS) receivers built into smartphones. According to the literature, positional accuracies of about 5 m are possible under ideal conditions. Otherwise, errors of several tens of metres are to be expected. As a result, direct georeferencing of image measurements using current smartphone technology should be discouraged.
In consideration of the results, the water gauging app Open Water Levels (OWL) was developed, whose methodological development and implementation constituted the core of the thesis project. OWL enables the flexible measurement of water levels via crowdsourcing without requiring additional equipment or being limited to specific river sections. Data acquisition and processing take place directly in the field, so that the water level information is immediately available.
In practice, the user captures a short time-lapse sequence of a river bank with OWL, which is used to calculate a spatio-temporal texture that enables the detection of the water line. In order to translate the image measurement into 3D object space, a synthetic, photo-realistic image of the situation is created from existing 3D data of the river section to be investigated. Necessary approximations of the image orientation parameters are measured by smartphone sensors and GNSS. Matching the camera image with the synthetic image allows the interior and exterior orientation parameters to be determined by means of space resection and, finally, the image-measured 2D water line to be transferred into 3D object space to derive the prevailing water level in the reference system of the 3D data.
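The final transfer step, lifting the image-measured water line into object space, can be sketched geometrically. Assuming a pinhole camera with known interior orientation K and exterior orientation (R, C) (the values below are invented for illustration, not OWL's actual parameters), each water-line pixel back-projects to a ray that is intersected with a reference plane from the 3D data:

```python
import numpy as np

def pixel_to_ray(u, v, K, R):
    """Back-project pixel (u, v) into a unit ray direction in world space.
    K is the camera matrix; R is the world-to-camera rotation."""
    p = np.linalg.inv(K) @ np.array([u, v, 1.0])   # direction in camera frame
    d = R.T @ p                                    # rotate into the world frame
    return d / np.linalg.norm(d)

def intersect_plane(origin, direction, plane_point, plane_normal):
    """Return the 3D point where the ray origin + t*direction meets the plane."""
    t = np.dot(plane_point - origin, plane_normal) / np.dot(direction, plane_normal)
    return origin + t * direction

# Invented example: camera 10 m above a horizontal reference plane (z = 0),
# looking straight down; principal point at (320, 240), focal length 1000 px.
K = np.array([[1000.0,    0.0, 320.0],
              [   0.0, 1000.0, 240.0],
              [   0.0,    0.0,   1.0]])
R = np.diag([1.0, -1.0, -1.0])    # 180° about x: camera z-axis along world -z
C = np.array([0.0, 0.0, 10.0])    # projection centre

# A "water line" pixel 100 px right of the principal point maps to a point
# 1 m away on the reference plane; its z-coordinate gives the water level.
ray = pixel_to_ray(420.0, 240.0, K, R)
point = intersect_plane(C, ray, np.array([0.0, 0.0, 0.0]),
                        np.array([0.0, 0.0, 1.0]))
```

In OWL the intersection target is the 3D river-bank geometry rather than a single plane, and the orientation parameters come from the space resection described above, but the back-projection geometry is the same.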
In comparison with conventionally measured water levels, OWL reveals an accuracy potential of 2 cm on average, provided that synthetic image and camera image exhibit consistent image contents and that the water line can be reliably detected. In the present dissertation, related geometric and radiometric problems are comprehensively discussed. Furthermore, possible solutions, based on advancing developments in smartphone technology and image processing as well as the increasing availability of 3D reference data, are presented in the synthesis of the work.
The app Open Water Levels, which is currently available as a beta version and has been tested on selected devices, provides a basis which, with continuous further development, aims at a final release for crowdsourcing water levels and thus at establishing new and expanding existing monitoring networks.
Proceedings of the OSM Science 2023
Proceedings of the OSM Science at State of the Map Europe 2023
Efficient Dense Registration, Segmentation, and Modeling Methods for RGB-D Environment Perception
One perspective for artificial intelligence research is to build machines that perform tasks autonomously in our complex everyday environments. This setting poses challenges to the development of perception skills: A robot should be able to perceive its location and objects in its surroundings, while the objects and the robot itself could also be moving. Objects may not only be composed of rigid parts, but could be non-rigidly deformable or appear in a variety of similar shapes. Furthermore, it could be relevant to the task to observe object semantics. For a robot acting fluently and immediately, these perception challenges demand efficient methods. This thesis presents novel approaches to robot perception with RGB-D sensors. It develops efficient registration, segmentation, and modeling methods for scene and object perception. We propose multi-resolution surfel maps as a concise representation for RGB-D measurements. We develop probabilistic registration methods that handle rigid scenes, scenes with multiple rigid parts that move differently, and scenes that undergo non-rigid deformations. We use these methods to learn and perceive 3D models of scenes and objects in both static and dynamic environments. For learning models of static scenes, we propose a real-time capable simultaneous localization and mapping approach. It aligns key views in RGB-D video using our rigid registration method and optimizes the pose graph of the key views. The acquired models are then perceived in live images through detection and tracking within a Bayesian filtering framework. An assumption frequently made for environment mapping is that the observed scene remains static during the mapping process. Through rigid multi-body registration, we take advantage of relaxing this assumption: Our registration method segments views into parts that move independently between the views and simultaneously estimates their motion.
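The key-view pose-graph idea can be illustrated in a deliberately reduced form. In this sketch (my construction, not the thesis's actual SLAM back end), poses are one-dimensional, pairwise registration results act as relative constraints, a loop closure disagrees slightly with the chain, and a linear least-squares solve distributes the error:

```python
import numpy as np

# Toy pose graph: 4 key views along a line, pose 0 anchored at the origin.
# Each constraint (i, j, z) says "registration measured pose_j - pose_i = z".
constraints = [
    (0, 1, 1.0),
    (1, 2, 1.1),
    (2, 3, 0.9),
    (0, 3, 3.2),   # loop closure, disagreeing slightly with the chain (3.0)
]

def optimize(constraints, n_poses):
    """Solve for poses minimizing the sum of squared constraint residuals,
    with pose 0 softly anchored at 0 to remove the gauge freedom."""
    A = np.zeros((len(constraints) + 1, n_poses))
    b = np.zeros(len(constraints) + 1)
    for k, (i, j, z) in enumerate(constraints):
        A[k, j] = 1.0    # each row encodes pose_j - pose_i = z
        A[k, i] = -1.0
        b[k] = z
    A[-1, 0] = 1.0       # anchor row: pose_0 = 0
    poses, *_ = np.linalg.lstsq(A, b, rcond=None)
    return poses

poses = optimize(constraints, 4)   # the 0.2 loop-closure error is spread out
```

In the real system the poses are 6-DoF rigid-body transforms and the optimization is nonlinear, but the structure, nodes for key views and edges for registration results, is the same.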
Within simultaneous motion segmentation, localization, and mapping, we separate scenes into objects by their motion. Our approach acquires 3D models of objects and concurrently infers hierarchical part relations between them using probabilistic reasoning. It can be applied for interactive learning of objects and their part decomposition. Endowing robots with manipulation skills for a large variety of objects is a tedious endeavor if the skill is programmed for every instance of an object class. Furthermore, slight deformations of an instance could not be handled by an inflexible program. Deformable registration is useful to perceive such shape variations, e.g., between specific instances of a tool. We develop an efficient deformable registration method and apply it for the transfer of robot manipulation skills between varying object instances. On the object-class level, we segment images using random decision forest classifiers in real-time. The probabilistic labelings of individual images are fused in 3D semantic maps within a Bayesian framework. We combine our object-class segmentation method with simultaneous localization and mapping to achieve online semantic mapping in real-time. The methods developed in this thesis are evaluated in experiments on publicly available benchmark datasets and novel datasets of our own. We publicly demonstrate several of our perception approaches within integrated robot systems in the mobile manipulation context; these approaches were an important component in winning the RoboCup@Home league robot competitions in 2011, 2012, and 2013.
Multimodal Data Analysis of Dyadic Interactions for an Automated Feedback System Supporting Parent Implementation of Pivotal Response Treatment
Parents fulfill a pivotal role in early childhood development of social and communication
skills. In children with autism, the development of these skills can be delayed. Applied
behavioral analysis (ABA) techniques have been created to aid in skill acquisition.
Among these, pivotal response treatment (PRT) has been empirically shown to foster
improvements. Research into PRT implementation has also shown that parents can be
trained to be effective interventionists for their children. The current difficulty in PRT
training is how to disseminate training to parents who need it, and how to support and
motivate practitioners after training.
Evaluation of the parents' fidelity of implementation is often undertaken using video
probes that depict the dyadic interaction occurring between the parent and the child during
PRT sessions. These videos are time-consuming for clinicians to process, and often result
in only minimal feedback for the parents. Current trends in technology could be utilized to
alleviate the manual cost of extracting data from the videos, affording greater
opportunities for providing clinician created feedback as well as automated assessments.
The naturalistic context of the video probes along with the dependence on ubiquitous
recording devices creates a difficult scenario for classification tasks. The domain of the
PRT video probes can be expected to have high levels of both aleatory and epistemic
uncertainty. Addressing these challenges requires examination of the multimodal data
along with implementation and evaluation of classification algorithms. This is explored
through the use of a new dataset of PRT videos.
The relationship between the parent and the clinician is important. The clinician can
provide support and help build self-efficacy in addition to providing knowledge and
modeling of treatment procedures. Facilitating this relationship along with automated
feedback not only provides the opportunity to present expert feedback to the parent, but
also allows the clinician to aid in personalizing the classification models. By utilizing a
human-in-the-loop framework, clinicians can aid in addressing the uncertainty in the
classification models by providing additional labeled samples. This will allow the system
to improve classification and provides a person-centered approach to extracting
multimodal data from PRT video probes. Doctoral dissertation, Computer Science.
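The clinician-in-the-loop labeling step the abstract describes is, at its core, uncertainty sampling from active learning: the model hands the clinician exactly the samples it is least sure about. A minimal sketch follows, with a toy nearest-centroid classifier standing in for the actual multimodal models; all names and data here are invented:

```python
import numpy as np

def centroid_classifier(X_train, y_train):
    """Fit one centroid per class; return the classes and a function that
    computes each sample's distance to every centroid."""
    classes = np.unique(y_train)
    centroids = {c: X_train[y_train == c].mean(axis=0) for c in classes}
    def distances(X):
        return np.stack([np.linalg.norm(X - centroids[c], axis=1)
                         for c in classes], axis=1)
    return classes, distances

def select_for_clinician(X_pool, distances, k=1):
    """Pick the k unlabeled samples with the smallest margin between the
    two closest class centroids, i.e. the most ambiguous ones."""
    d = np.sort(distances(X_pool), axis=1)
    margin = d[:, 1] - d[:, 0]
    return np.argsort(margin)[:k]

# Toy data: two well-separated classes along the x-axis.
X = np.array([[0.0, 0.0], [0.1, 0.0], [4.0, 0.0], [4.1, 0.0]])
y = np.array([0, 0, 1, 1])
classes, dist = centroid_classifier(X, y)

# Pool of unlabeled video segments; the first one sits between the classes.
pool = np.array([[2.0, 0.0], [0.2, 0.0], [3.9, 0.0]])
query = select_for_clinician(pool, dist, k=1)   # ask about the ambiguous one
```

The clinician's label for the queried sample is then added to the training set and the model is refit, which is the person-centered loop the dissertation proposes.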
Fuzzy logic based approach for object feature tracking
This thesis introduces a novel technique for feature tracking in sequences of
greyscale images based on fuzzy logic. A versatile and modular methodology
for feature tracking using fuzzy sets and inference engines is presented.
Moreover, an extension of this methodology to perform the correct tracking
of multiple features is also presented.
To perform feature tracking, three membership functions are initially
defined: one related to the distinctive property of the feature to be
tracked, one expressing the assumption that the feature moves smoothly
between consecutive images of the sequence, and one concerning its
expected future location. Applying these functions to the image pixels,
the corresponding fuzzy sets are obtained and then mathematically
manipulated to serve as input to an inference engine.
Situations such as occlusion or detection failure of features are
overcome using positions estimated from a motion model and a state
vector of the feature.
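The combination of the three membership functions can be sketched as follows. The Gaussian membership shapes, their parameters, and the choice of the minimum as the fuzzy AND are illustrative assumptions, not necessarily the thesis's actual design:

```python
import math

def gaussian(x, mu, sigma):
    """Membership degree in [0, 1], peaking at x == mu."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def track_score(pixel, prev_pos, predicted_pos, target_intensity):
    """Fuzzy AND (minimum) of the three membership degrees for one pixel:
    appearance similarity, movement smoothness, and predicted location."""
    (x, y), intensity = pixel
    appearance = gaussian(intensity, target_intensity, 20.0)
    dist_prev = math.hypot(x - prev_pos[0], y - prev_pos[1])
    smoothness = gaussian(dist_prev, 0.0, 10.0)     # small motion preferred
    dist_pred = math.hypot(x - predicted_pos[0], y - predicted_pos[1])
    location = gaussian(dist_pred, 0.0, 5.0)        # near predicted position
    return min(appearance, smoothness, location)

# Candidate pixels as (position, grey value); the tracked feature is bright.
candidates = [((10, 10), 200), ((12, 11), 195), ((40, 40), 200)]
prev, predicted = (10, 10), (12, 11)
best = max(candidates, key=lambda p: track_score(p, prev, predicted, 200))
```

The third candidate has a perfect appearance match but lies far from the predicted location, so the minimum operator rejects it; the winner balances all three criteria.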
This methodology was initially applied to track a single feature
identified by the user. Several performance tests were conducted on
sequences of both synthetic and real images. Experimental results are
presented, analysed and discussed. Although this methodology could be
applied directly to multiple feature tracking, an extension has been
developed for that purpose. In this new method, the processing sequence
of the features is dynamic and hierarchical: dynamic because the
sequence can change over time, and hierarchical because features with
higher priority are processed first. Thus, the process gives preference
to features whose location is easier to predict over features whose
behaviour is less predictable. When the priority value of a feature
becomes too low, it is no longer tracked by the algorithm. To assess
the performance of this new approach, sequences of images in which
several user-specified features are to be tracked were used.
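The dynamic, hierarchical processing order can be sketched with a priority queue: the order is rebuilt as priorities change over time (dynamic), more predictable features are processed first (hierarchical), and features whose priority falls below a threshold are dropped. The priority measure and the threshold below are invented for illustration:

```python
import heapq

class Feature:
    def __init__(self, name, predictability):
        self.name = name
        self.predictability = predictability  # illustrative priority in [0, 1]

def processing_order(features, drop_threshold=0.2):
    """Return the features ordered by descending priority; features below
    the threshold are no longer tracked and are left out entirely."""
    heap = [(-f.predictability, f.name, f) for f in features
            if f.predictability >= drop_threshold]
    heapq.heapify(heap)               # max-priority first via negated key
    order = []
    while heap:
        _, _, f = heapq.heappop(heap)
        order.append(f)
    return order

features = [Feature("corner", 0.9), Feature("edge", 0.5), Feature("blob", 0.1)]
order = processing_order(features)    # "blob" is dropped, "corner" goes first
```

Recomputing the queue every frame from the current predictability values gives exactly the time-varying, priority-driven sequence the extension describes.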
In the final part of this work, the conclusions drawn, as well as some
guidelines for future research, are presented.