5 research outputs found

    When I Look into Your Eyes: A Survey on Computer Vision Contributions for Human Gaze Estimation and Tracking

    The automatic detection of eye positions, their temporal consistency, and their mapping into a line of sight in the real world (to determine where a person is looking) is reported in the scientific literature as gaze tracking. This has become a very hot topic in computer vision over the last decades, with a continuously growing number of application fields. A long journey has been made since the first pioneering works, and the continuous search for more accurate solutions has been further boosted in the last decade, when deep neural networks revolutionized the whole machine learning area, and gaze tracking with it. In this arena, it is increasingly useful to find guidance in survey/review articles that collect the most relevant works, lay out the pros and cons of existing techniques, and introduce a precise taxonomy. Such manuscripts allow researchers and technicians to choose the best way to move towards their application or scientific goals. The literature contains holistic and specifically technological survey documents (even if not up to date), but, unfortunately, there is no overview discussing how the great advancements in computer vision have impacted gaze tracking. This work represents an attempt to fill that gap, also introducing a wider point of view that leads to a new taxonomy (extending the consolidated ones) by considering gaze tracking as a more exhaustive task that aims at estimating the gaze target from different perspectives: from the eye of the beholder (first-person view), from an external camera framing the beholder's face, from a third-person view looking at the scene where the beholder is placed, and from an external view independent of the beholder.
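As a concrete reading of the four perspectives named above, the taxonomy can be encoded as a small data model. A minimal sketch in Python; the enum and its member names are illustrative choices, not identifiers taken from the survey.

```python
from enum import Enum, auto

class GazePerspective(Enum):
    """Hypothetical encoding of the four gaze-target estimation perspectives."""
    FIRST_PERSON = auto()          # the eye of the beholder (e.g., head-mounted camera)
    EXTERNAL_ON_FACE = auto()      # external camera framing the beholder's face
    THIRD_PERSON = auto()          # camera viewing the scene the beholder is placed in
    BEHOLDER_INDEPENDENT = auto()  # external view independent of the beholder

# Example: a dataset entry could be tagged with the perspective it was captured from.
sample = {"image": "frame_0001.png", "perspective": GazePerspective.FIRST_PERSON}
```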

    Review of remote sensing for land administration: Origins, debates, and selected cases

    Conventionally, land administration—incorporating cadastres and land registration—uses ground-based survey methods. This approach can be traced over millennia. The application of photogrammetry and remote sensing is understood to be far more contemporary, only commencing deeper into the 20th century. This paper seeks to counter this view, contending that these methods are far from recent additions to land administration: successful application dates back much earlier, often complementing ground-based methods. Using now more accessible historical works, made available through archive digitisation, this paper presents an enriched and more complete synthesis of the developments of photogrammetric methods and remote sensing applied to the domain of land administration. Developments from early phototopography and aerial surveys, through to analytical photogrammetric methods, the emergence of satellite remote sensing, digital cameras, and latterly lidar surveys, UAVs, and feature extraction are covered. The synthesis illustrates how debates over the benefits of the technique are hardly new. Neither are well-meaning, although oft-flawed, comparative analyses on criteria relating to time, cost, coverage, and quality. Apart from providing this more holistic view and a timely reminder of previous work, this paper brings contemporary practical value in further demonstrating to land administration practitioners that remote sensing for data capture, and subsequent map production, are an entirely legitimate, if not essential, part of the domain. Contemporary arguments that the tools and approaches do not bring adequate accuracy for land administration purposes are easily countered by the weight of evidence. Indeed, these arguments may be considered to undermine the pragmatism inherent to the surveying discipline, traditionally an essential characteristic of the profession. That said, it is left to land administration practitioners to determine the relevance of these methods for any specific country context.

    Realistic reconstruction and rendering of detailed 3D scenarios from multiple data sources

    In recent years, we have witnessed significant improvements in digital terrain modeling, mainly through photogrammetric techniques based on satellite and aerial photography, as well as laser scanning. These techniques allow the creation of Digital Elevation Models (DEM) and Digital Surface Models (DSM) that can be streamed over the network and explored through virtual globe applications like Google Earth or NASA WorldWind. The resolution of these 3D scenes has improved noticeably, reaching in some urban areas resolutions of 1 m or less for the DEM and buildings, and less than 10 cm per pixel in the associated aerial imagery. However, in rural, forest, or mountainous areas, the typical resolution of elevation datasets ranges between 5 and 30 meters, and the typical resolution of the corresponding aerial photographs ranges from 25 cm to 1 m. This level of detail is only sufficient for aerial points of view; as the viewpoint approaches the surface, the terrain loses its realistic appearance. One approach to augmenting the detail of currently available datasets is to add synthetic details in a plausible manner, i.e. to include elements that match the features perceived in the aerial view. By combining the real dataset with the instancing of models on the terrain and other procedural detail techniques, the effective resolution can potentially become arbitrary. Several applications do not need an exact reproduction of the real elements but would greatly benefit from plausibly enhanced terrain models: videogames and entertainment applications, visual impact assessment (e.g. how a new ski resort would look), virtual tourism, simulations, etc. In this thesis we propose new methods and tools to help the reconstruction and synthesis of high-resolution terrain scenes from currently available data sources, in order to achieve realistic-looking ground-level views. In particular, we focus on rural scenarios, mountains, and forest areas. Our main goal is the combination of plausible synthetic elements and procedural detail with publicly available real data to create detailed 3D scenes of existing locations. Our research has focused on the following contributions:
    - An efficient pipeline for aerial imagery segmentation
    - Plausible terrain enhancement from high-resolution examples
    - Super-resolution of DEMs by transferring details from the aerial photograph (see the sketch after this list)
    - Synthesis of arbitrary tree picture variations from a reduced set of photographs
    - Reconstruction of 3D tree models from a single image
    - A compact and efficient tree representation for real-time rendering of forest landscapes
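To illustrate the DEM super-resolution contribution referenced above, the following is a minimal sketch of the general idea of transferring high-frequency detail from a co-registered aerial photograph onto an upsampled elevation grid. The function, its parameters (detail_gain, sigma), and the simple high-pass transfer rule are assumptions for illustration, not the method developed in the thesis.

```python
import numpy as np
from scipy.ndimage import zoom, gaussian_filter

def dem_superresolution(dem_lowres, aerial_gray, detail_gain=0.5, sigma=2.0):
    """Naive DEM super-resolution sketch: upsample the coarse DEM and inject
    high-frequency shading detail from the co-registered aerial image.
    detail_gain and sigma are illustrative parameters, not thesis values."""
    scale = aerial_gray.shape[0] / dem_lowres.shape[0]
    # Smooth cubic upsampling of the elevation grid to the image resolution.
    dem_up = zoom(dem_lowres, scale, order=3)
    dem_up = dem_up[:aerial_gray.shape[0], :aerial_gray.shape[1]]
    # The high-pass residual of the aerial image approximates fine relief shading.
    detail = aerial_gray - gaussian_filter(aerial_gray, sigma)
    return dem_up + detail_gain * detail
```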

    Binokulare Eigenbewegungsschätzung für Fahrerassistenzanwendungen

    Driving can be dangerous. Humans become inattentive when performing a monotonous task like driving, and multi-tasking, such as using a cellular phone at the wheel, can break the driver's concentration and increase the risk of accidents. Other factors, like exhaustion, nervousness, and excitement, affect the driver's performance and response time. Consequently, car manufacturers have developed systems over the last decades which assist the driver under various circumstances. These systems are called driver assistance systems. Driver assistance systems are meant to support the task of driving, and their field of action ranges from alerting the driver, with acoustic or optical warnings, to taking control of the car, such as keeping the vehicle in the traffic lane until the driver resumes control. For such a purpose, the vehicle is equipped with on-board sensors which allow the perception of the environment and/or the state of the vehicle. Cameras are sensors which extract useful information about the visual appearance of the environment, and a binocular system additionally allows the extraction of 3D information. One of the main requirements for most camera-based driver assistance systems is accurate knowledge of the motion of the vehicle. Some sources of information, like speed sensors and GPS, are in common use in vehicles today. Nevertheless, the resolution and accuracy usually achieved with these systems are not enough for many real-time applications. The computation of ego-motion from sequences of stereo images for the implementation of intelligent driving systems, like autonomous navigation or collision avoidance, constitutes the core of this thesis. This dissertation proposes a framework for the simultaneous computation of the 6 degrees of freedom of ego-motion (rotation and translation in 3D Euclidean space), the estimation of the scene structure, and the detection and estimation of independently moving objects. The input is provided exclusively by a binocular system, and the framework does not call for any special data acquisition strategy, i.e. the stereo images are processed just as they are provided. Stereo allows one to establish correspondences between left and right images, estimating 3D points of the environment via triangulation. Likewise, feature tracking establishes correspondences between the images acquired at different time instances. When both are used together for a large number of points, the result is a set of clouds of 3D points with point-to-point correspondences between clouds. The apparent motion of the 3D points between consecutive frames has a variety of causes. The dominant motion for most points in the clouds is caused by the ego-motion of the vehicle: as the vehicle moves and images are acquired, the relative position of the world points with respect to the vehicle changes. Motion is also caused by objects moving in the environment; since they move independently of the vehicle, the observed motion for these points is the sum of the ego-vehicle motion and the independent motion of the object. A third cause, of paramount importance in vision applications, is correspondence errors, i.e. the incorrect spatial or temporal assignment of point-to-point correspondences. Furthermore, all the points in the clouds are actually noisy measurements of the real, unknown 3D points of the environment.
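The triangulation step described above can be illustrated with the textbook rectified-stereo case, in which depth follows directly from disparity. A minimal sketch assuming an ideal pinhole stereo rig; it uses the standard projection model, not the bias-corrected reformulation proposed in the thesis.

```python
import numpy as np

def triangulate_rectified(u_left, u_right, v, f, baseline, cu, cv):
    """Standard rectified-stereo triangulation (assumed pinhole model).
    f: focal length [px], baseline: camera separation [m],
    (cu, cv): principal point [px]."""
    d = u_left - u_right      # disparity between corresponding pixels
    Z = f * baseline / d      # depth grows as disparity shrinks, so depth
                              # noise grows roughly quadratically with Z
    X = (u_left - cu) * Z / f
    Y = (v - cv) * Z / f
    return np.stack([X, Y, Z], axis=-1)
```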
Solving ego-motion and scene structure from the clouds of points requires some prior analysis of the noise involved in the imaging process and of how it propagates as the data is processed. Therefore, this dissertation analyzes the noise properties of the 3D points obtained through stereo triangulation. This leads to the detection of a bias in the estimation of 3D position, which is corrected with a reformulation of the projection equation. Ego-motion is obtained by finding the rotation and translation between the two clouds of points. This problem is known as absolute orientation, and many solutions based on least squares have been proposed in the literature; this thesis reviews the available closed-form solutions to the problem. The proposed framework is divided into three main blocks: 1) stereo and feature tracking computation, 2) ego-motion estimation, and 3) estimation of 3D point position and 3D velocity. The first block solves the correspondence problem, providing the clouds of points as output; no special implementation of this block is required in this thesis. The ego-motion block computes the motion of the cameras by finding the absolute orientation between the clouds of static points in the environment. Since the cloud of points might contain independently moving objects and outliers generated by false correspondences, a direct least-squares computation might lead to an erroneous solution. The first contribution of this thesis is an effective rejection rule that detects outliers based on the distance between predicted and measured quantities, and reduces the effects of noisy measurements by assigning appropriate weights to the data. This method is called the Smoothness Motion Constraint (SMC). The ego-motion of the camera between two frames is obtained by finding the absolute orientation between consecutive clouds of weighted 3D points, and the complete ego-motion since initialization is obtained by concatenating the individual motion estimates. This leads to a super-linear propagation of the error, since noise is integrated. A second contribution of this dissertation is a predictor/corrector iterative method which integrates the clouds of 3D points of multiple time instances into the computation of ego-motion; it considerably reduces the accumulation of errors in the estimated ego-position of the camera. Another contribution is a method which recursively estimates the 3D world position of a point and its velocity by fusing stereo, feature tracking, and the estimated ego-motion in a Kalman filter system. An improved estimate of point position is obtained this way, which is used in the subsequent system cycle, resulting in an improved computation of ego-motion. The general contribution of this dissertation is a single framework for the real-time computation of scene structure, independently moving objects, and ego-motion for automotive applications.
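The absolute orientation step at the heart of the ego-motion block has well-known closed-form solutions based on least squares. The following is a minimal weighted SVD-based sketch in that family (in the spirit of Horn's and Arun's methods); the generic per-point weights w stand in for a weighting scheme such as the SMC, which is not reproduced here.

```python
import numpy as np

def absolute_orientation(P, Q, w=None):
    """Closed-form weighted absolute orientation: finds R, t minimizing
    sum_i w_i * ||R @ P[i] + t - Q[i]||^2 between two Nx3 point clouds."""
    if w is None:
        w = np.ones(len(P))
    w = w / w.sum()
    mu_p = w @ P                                  # weighted centroids
    mu_q = w @ Q
    H = (P - mu_p).T @ np.diag(w) @ (Q - mu_q)    # weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Sign correction guards against a reflection instead of a rotation.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = mu_q - R @ mu_p
    return R, t
```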
Driving can be dangerous. Driving performance is affected by the physical and psychological limits of the driver and by external factors such as the weather. Driver assistance systems increase driving comfort and support the driver in order to reduce the number of accidents. They support the driver with optical or acoustic warnings, up to the system taking over control of the car. One of the main prerequisites for most driver assistance systems is accurate knowledge of the motion of the ego-vehicle. Various sensors are available today to measure the motion of the vehicle, for example GPS and speed sensors, but the resolution and accuracy of these systems are not sufficient for many real-time applications. The computation of ego-motion from stereo image sequences for driver assistance systems, e.g. for autonomous navigation or collision avoidance, forms the core of this work. This dissertation presents a system for the real-time evaluation of a scene, including the detection and evaluation of independently moving objects as well as the accurate estimation of the six degrees of freedom of ego-motion. These fundamental components are required for developing many intelligent automotive applications that support the driver in different traffic situations. The system works exclusively with a stereo camera platform as its sensor. Computing the ego-motion and scene structure requires an analysis of the noise and of the error propagation in the imaging process. Therefore, this dissertation analyzes the noise properties of the 3D points obtained through stereo triangulation. This leads to the discovery of a systematic error in the estimation of 3D position, which can be corrected with a reformulation of the projection equation. Simulation results show that a significant reduction of the error in the estimated 3D point position is possible. The ego-motion estimate is obtained by estimating the rotation and translation between point clouds. This problem is known as absolute orientation, and many solutions based on least squares have been proposed in the literature; this work reviews the available closed-form solutions to the problem. The presented system is structured into three main building blocks: 1. registration of image features, 2. ego-motion estimation, and 3. iterative estimation of the 3D position and 3D velocity of world points. The first block receives a sequence of rectified images as input and delivers a list of tracked image features with their corresponding 3D positions. The ego-motion block consists of four main steps in a loop: 1. motion prediction, 2. application of the Smoothness Motion Constraint (SMC), 3. computation of the absolute orientation, and 4. motion integration. The SMC proposed in this dissertation is a powerful constraint for rejecting outliers and for assigning weights to the measured 3D points. Simulations are carried out with Gaussian and slash noise; the results show the superiority of the SMC over standard weighting methods. The robustness of the results with respect to outliers was analyzed, with the result that the breakdown point is greater than 50%. When the four steps are executed iteratively, a predictor/corrector method is obtained. We call this the multi-frame estimate, in contrast to the two-frame estimate, which considers only the current and previous image pairs for the computation of ego-motion. The first iteration is performed between the current and previous clouds of points; each further iteration integrates an additional point cloud from an earlier time instance. This method reduces the accumulation of errors when integrating multiple estimates into a single global estimate.
Simulation results show that, although the error still grows super-linearly over time, its magnitude is reduced by several orders of magnitude. The third block consists of the iterative estimation of the 3D position and 3D velocity of world points. Here a method based on a Kalman filter is used, which fuses stereo, feature tracking, and ego-motion data. Measurements of the position of a world point are obtained with the stereo camera system, and differentiating the estimated point position additionally allows the estimation of its velocity. The measurements are obtained through a measurement model that fuses stereo and motion data, and simulation results validate this model. The reduction of the position uncertainty over time is shown with a Monte Carlo simulation. Experimental results are obtained with long image sequences. Additional tests were carried out, including a 3D reconstruction of a forest scene and the computation of free camera motion in an indoor scenario. The method shows good results in all cases, and the algorithm also delivers acceptable results when estimating the pose of small objects, such as the heads and legs of real crash-test dummies.
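The recursive position/velocity estimation of the third block can be sketched as a standard constant-velocity Kalman filter applied per tracked point. This is a generic textbook filter under stated assumptions, not the thesis' measurement model: the state layout, the noise magnitudes q and r, and the assumption that the measurement z is an ego-motion-compensated 3D position are all illustrative.

```python
import numpy as np

def kalman_point_step(x, P, z, dt, q=1e-2, r=1e-1):
    """One predict/update cycle of a constant-velocity Kalman filter for a
    single world point. State x = [X, Y, Z, Vx, Vy, Vz]; measurement z is a
    triangulated 3D position, assumed already compensated by the ego-motion."""
    F = np.eye(6)
    F[:3, 3:] = dt * np.eye(3)                    # position integrates velocity
    H = np.hstack([np.eye(3), np.zeros((3, 3))])  # only position is measured
    Q = q * np.eye(6)                             # process noise (placeholder)
    R = r * np.eye(3)                             # stereo measurement noise
    # Predict.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with the new stereo measurement.
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(6) - K @ H) @ P
    return x, P
```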

    Remote Sensing for Land Administration 2.0

    The reprint “Land Administration 2.0” is an extension of “Remote Sensing for Land Administration”, an earlier Special Issue of Remote Sensing. This reprint unpacks the responsible use and integration of emerging remote sensing techniques into the domain of land administration, including land registration, cadastre, land use planning, land valuation, land taxation, and land development. The title “Land Administration 2.0” was chosen in reference both to this being the second Special Issue on the topic of land administration and to the next-generation requirements of land administration, including demands for 3D, indoor, underground, real-time, high-accuracy, lower-cost, and interoperable land data and information.