170 research outputs found

    Smart environment monitoring through micro unmanned aerial vehicles

    In recent years, improvements in small-scale Unmanned Aerial Vehicles (UAVs) in terms of flight time, automatic control, and remote transmission have promoted the development of a wide range of practical applications. In aerial video surveillance, monitoring broad areas remains challenging because several tasks, including mosaicking, change detection, and object detection, must be performed in real time. In this thesis, a vision system based on a small-scale UAV is proposed to maintain regular surveillance over target areas. The system works in two modes. The first mode monitors an area of interest over several flights. During the first flight, it creates an incremental geo-referenced mosaic of the area and classifies all known elements (e.g., persons) found on the ground with a previously trained, improved Faster R-CNN architecture. In subsequent reconnaissance flights, the system searches the mosaic for changes (e.g., the disappearance of persons) using an algorithm based on histogram equalization and RGB Local Binary Patterns (RGB-LBP); when changes are found, the mosaic is updated. The second mode performs real-time classification with the same improved Faster R-CNN model, which is useful for time-critical operations. Thanks to several design features, the system runs in real time and performs mosaicking and change detection at low altitude, allowing even small objects to be classified. The proposed system was tested on the whole set of challenging video sequences in the UAV Mosaicking and Change Detection (UMCD) dataset and on other public datasets. Evaluation with well-known performance metrics has shown remarkable results in mosaic creation and updating, as well as in change detection and object detection.
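    The change-detection step lends itself to a compact illustration. The following Python fragment is a minimal sketch, not the thesis's implementation: it equalizes each color channel, computes a uniform LBP per channel, and compares concatenated LBP histograms with a chi-square distance. The LBP parameters, the distance measure, the threshold, and helper names such as rgb_lbp_histogram are assumptions made for the example.

```python
# Illustrative sketch (not the thesis's code): histogram equalization +
# RGB-LBP comparison between a mosaic patch and a new-flight patch.
# Requires opencv-python and scikit-image.
import numpy as np
import cv2
from skimage.feature import local_binary_pattern

def rgb_lbp_histogram(patch_bgr, points=8, radius=1):
    """Equalize each channel, compute its uniform LBP, concatenate histograms."""
    n_bins = points + 2  # number of distinct 'uniform' LBP codes
    hists = []
    for c in range(3):
        channel = cv2.equalizeHist(patch_bgr[:, :, c])  # damp illumination changes
        codes = local_binary_pattern(channel, points, radius, method="uniform")
        hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins), density=True)
        hists.append(hist)
    return np.concatenate(hists)

def has_changed(mosaic_patch, new_patch, threshold=0.25):
    """Flag a change when the chi-square distance between the two patches'
    LBP histograms is large. The 0.25 threshold is an illustrative assumption."""
    h1, h2 = rgb_lbp_histogram(mosaic_patch), rgb_lbp_histogram(new_patch)
    chi2 = 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + 1e-10))
    return chi2 > threshold
```

    A full pipeline along these lines would run such a test per geo-referenced tile and update only the tiles flagged as changed.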

    Analysis of affine motion-compensated prediction and its application in aerial video coding

    Motion-compensated prediction is used in video coding standards like High Efficiency Video Coding (HEVC) as one key element of data compression. Commonly, a purely translational motion model is employed. In order to also cover non-translational motion types like rotation or scaling (zoom) contained in aerial video sequences such as those captured by unmanned aerial vehicles, an affine motion model can be applied. In this work, a model for affine motion-compensated prediction in video coding is derived by extending a model of purely translational motion-compensated prediction. Using rate-distortion theory and the displacement estimation error caused by inaccurate affine motion parameter estimation, the minimum required bit rate for encoding the prediction error is determined. In this model, the affine transformation parameters are assumed to be affected by statistically independent estimation errors, which all follow a zero-mean Gaussian probability density function (pdf). The joint pdf of the estimation errors is derived and transformed into the pdf of the location-dependent displacement estimation error in the image. The latter is related to the minimum required bit rate for encoding the prediction error. Analogous to the derivation for the fully affine motion model, a four-parameter simplified affine model is investigated. It is of particular interest since such a model is considered for the upcoming video coding standard Versatile Video Coding (VVC), the successor to HEVC. As the simplified affine motion model is able to describe most motions contained in aerial surveillance videos, its application in video coding is justified. Both models provide valuable information about the minimum bit rate for encoding the prediction error as a function of affine estimation accuracies. Although the bit rate in motion-compensated prediction can be considerably reduced by using a motion model that is able to describe the motion types occurring in the scene, the total video bit rate may remain quite high, depending on the motion estimation accuracy. Thus, using aerial surveillance sequences as an example, a codec-independent region-of-interest (ROI) based aerial video coding system is proposed that exploits the characteristics of such sequences. Assuming the captured scene to be planar, one frame can be projected into another using global motion compensation. Consequently, only newly emerging areas have to be encoded. At the decoder, all new areas are registered into a so-called mosaic. From this, reconstructed frames are extracted and concatenated as a video sequence. To also preserve moving objects in the reconstructed video, local motion is detected and encoded in addition to the new areas. The proposed general ROI coding system was evaluated for very low and low bit rates between 100 and 5000 kbit/s for aerial sequences of HD resolution. It is able to reduce the bit rate by 90% compared to common HEVC coding of similar quality. Subjective tests confirm that the overall image quality of the ROI coding system exceeds that of a common HEVC encoder, especially at very low bit rates below 1 Mbit/s. To prevent discontinuities introduced by inaccurate global motion estimation, as may be caused by radial lens distortion, a fully automatic in-loop radial distortion compensation is proposed. For this purpose, an unknown radial distortion compensation parameter that is constant for a group of frames is jointly estimated with the global motion.
This parameter is optimized to minimize the distortions of the projections of frames into the mosaic. With this approach, global motion compensation is improved by 0.27 dB, and discontinuities in the frames extracted from the mosaic are diminished. As an additional benefit, the generation of long-term mosaics becomes possible, constructed from more than 1500 aerial frames with unknown radial lens distortion and without any calibration or manual lens distortion compensation.
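    For concreteness, the displacement field induced by the four-parameter simplified affine model discussed above can be written down in a few lines. The sketch below is a generic illustration of that model as commonly formulated for VVC-style affine prediction, not code from this work; the parameter values are arbitrary.

```python
# Illustrative sketch: displacement field of a four-parameter ("simplified")
# affine motion model, covering translation (c, d), rotation (b), and zoom (a).
import numpy as np

def simplified_affine_displacement(x, y, a, b, c, d):
    """Per-pixel displacement (vx, vy) under the 4-parameter model:
    vx = a*x - b*y + c,  vy = b*x + a*y + d."""
    return a * x - b * y + c, b * x + a * y + d

# Example: displacement field over a 4x4 block for a slight zoom and rotation
xs, ys = np.meshgrid(np.arange(4.0), np.arange(4.0))
vx, vy = simplified_affine_displacement(xs, ys, a=0.01, b=0.005, c=1.0, d=-0.5)
```

    Estimation errors in (a, b, c, d) induce a location-dependent displacement error in the image, which is exactly the quantity the derived model ties to the minimum bit rate for encoding the prediction error.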

    MusA: Using Indoor Positioning and Navigation to Enhance Cultural Experiences in a Museum

    In recent years there has been growing interest in the use of multimedia mobile guides in museum environments. Mobile devices can detect the user's context and provide information that helps visitors discover and follow the logical and emotional connections that develop during a visit. In this scenario, location-based services (LBS) represent a key asset, and the choice of the technology used to determine users' positions, combined with the definition of methods that can effectively convey information, becomes a central issue in the design process. In this work, we present MusA (Museum Assistant), a general framework for the development of multimedia interactive guides for mobile devices. Its main feature is a vision-based indoor positioning system that enables the provision of several LBS, from way-finding to the contextualized communication of cultural contents, aimed at providing a meaningful exploration of exhibits according to visitors' personal interests and curiosity. Starting from a thorough description of the system architecture, the article presents the implementation of two mobile guides, developed to address adults and children respectively, and discusses the evaluation of the user experience and visitors' appreciation of these applications.
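    MusA's positioning internals are not reproduced here, but the general pattern of vision-based indoor positioning is easy to illustrate: recognize a visual marker in the camera frame and map its identity to a known location. The sketch below uses OpenCV's ArUco markers and a hypothetical MARKER_TO_ROOM table purely as stand-ins; MusA's own marker design and matching may differ.

```python
# Illustrative sketch (assumption: ArUco-style fiducials stand in for MusA's
# markers; MARKER_TO_ROOM is hypothetical). Requires opencv-contrib-python.
import cv2

MARKER_TO_ROOM = {7: "Renaissance Hall", 12: "Modern Art Wing"}

def locate_visitor(frame_bgr):
    """Return the room associated with the first marker seen, or None."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    detector = cv2.aruco.ArucoDetector(
        cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50))
    corners, ids, _rejected = detector.detectMarkers(gray)
    if ids is None:
        return None  # no marker in view; keep the last known position
    return MARKER_TO_ROOM.get(int(ids.flatten()[0]))
```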

    A Combined EM and Visual Tracking Probabilistic Model for Robust Mosaicking: Application to Fetoscopy

    Twin-to-Twin Transfusion Syndrome (TTTS) is a progressive pregnancy complication in which inter-twin vascular connections in the shared placenta result in a blood flow imbalance between the twins. The most effective therapy is to sever these connections by laser photo-coagulation. However, the limited field of view of the fetoscope hinders their identification. A potential solution is to augment the surgeon's view by creating a mosaic image of the placenta. State-of-the-art mosaicking methods use feature-based approaches, which have three main limitations: (i) they are not robust against corrupt data, e.g., blurred frames; (ii) temporal information is not used; (iii) the resulting mosaic suffers from drift. We introduce a probabilistic temporal model that incorporates electromagnetic and visual tracking data to achieve a robust mosaic with reduced drift. By assuming planarity of the imaged object, the nRT decomposition can be used to parametrize the state vector. Finally, we tackle the non-linear nature of the problem in a numerically stable manner by using the Square Root Unscented Kalman Filter. We show an improvement in performance in terms of robustness, as well as a reduction of drift in comparison to state-of-the-art methods, on synthetic, phantom, and ex vivo datasets.
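    The fusion idea can be made concrete with a toy filter. The paper uses a Square Root Unscented Kalman Filter over an nRT-parametrized state; the sketch below substitutes filterpy's standard UKF and a simple 2D position-velocity state purely for illustration, with the EM and visual measurements reduced to noisy position readings and all noise levels chosen arbitrarily.

```python
# Illustrative sketch: a standard UKF stands in for the paper's Square Root UKF.
import numpy as np
from filterpy.kalman import UnscentedKalmanFilter, MerweScaledSigmaPoints

def fx(x, dt):
    """Constant-velocity motion model over state [px, py, vx, vy]."""
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    return F @ x

def hx(x):
    """EM and visual tracking both reduced here to a position measurement."""
    return x[:2]

points = MerweScaledSigmaPoints(n=4, alpha=1e-3, beta=2.0, kappa=0.0)
ukf = UnscentedKalmanFilter(dim_x=4, dim_z=2, dt=1 / 25, fx=fx, hx=hx,
                            points=points)
ukf.R = np.diag([0.5, 0.5])   # measurement noise (arbitrary)
ukf.Q = np.eye(4) * 1e-3      # process noise (arbitrary)

for z in (np.array([0.10, 0.00]), np.array([0.20, 0.05])):  # mock readings
    ukf.predict()
    ukf.update(z)
print(ukf.x[:2])  # fused position estimate
```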

    Using Linear Features for Aerial Image Sequence Mosaiking

    With recent advances in sensor technology and digital image processing techniques, automatic image mosaicking has received increased attention in a variety of geospatial applications, ranging from panorama generation and video surveillance to image-based rendering. The geometric transformation used to link images in a mosaic is the subject of image orientation, a fundamental photogrammetric task that represents a major research area in digital image analysis. It involves the determination of the parameters that express the location and pose of a camera at the time it captured an image. In aerial applications the typical parameters comprise two translations (along the x and y coordinates) and one rotation (about the z axis). Orientation typically proceeds by extracting control points from an image, i.e., points with known coordinates. Salient points such as road intersections and building corners are commonly used for this task. However, such points may convey little information beyond their radiometric uniqueness and, more importantly, in some areas they may be impossible to obtain (e.g., in rural and arid areas). To overcome this problem we introduce an alternative approach that uses linear features such as roads and rivers for image mosaicking. Such features are identified and matched to their counterparts in overlapping imagery. Our matching approach uses critical points (e.g., breakpoints) of linear features and the information conveyed by them (e.g., local curvature values and distance metrics) to match two such features and orient the images in which they are depicted. In this manner, we orient overlapping images by comparing breakpoint representations of complete or partial linear features depicted in them. By considering broader feature metrics (instead of single points) in our matching scheme, we aim to eliminate the effect of erroneous point matches in image mosaicking. Our approach does not require prior approximate parameters, which are typically essential for the successful convergence of point-matching schemes. Furthermore, we show that large rotation variations about the z-axis can be recovered. With the acquired orientation parameters, image sequences are mosaicked. Experiments with synthetic aerial image sequences are included in this thesis to demonstrate the performance of our approach.
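    The breakpoint idea is straightforward to sketch. The fragment below extracts candidate breakpoints from a digitized linear feature by thresholding the turning angle at each vertex; the estimator and the 20-degree threshold are illustrative choices, not the thesis's parameters.

```python
# Illustrative sketch: candidate breakpoints of a linear feature (e.g., a
# road centerline given as an (N, 2) array of vertices).
import numpy as np

def turning_angles(polyline):
    """Absolute heading change at each interior vertex."""
    d = np.diff(polyline, axis=0)
    headings = np.unwrap(np.arctan2(d[:, 1], d[:, 0]))
    return np.abs(np.diff(headings))

def breakpoints(polyline, angle_threshold=np.deg2rad(20)):
    """Indices of vertices where the feature bends sharply."""
    return np.where(turning_angles(polyline) > angle_threshold)[0] + 1

road = np.array([[0, 0], [1, 0], [2, 0.05], [3, 1.0], [4, 2.0]])
print(breakpoints(road))  # -> [2]: the sharp bend at vertex 2
```

    Matching would then proceed by comparing descriptors attached to such breakpoints (local curvature values, inter-breakpoint distances) between overlapping images, as described above.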

    The Exploitation of Data from Remote and Human Sensors for Environment Monitoring in the SMAT Project

    In this paper, we outline the functionalities of a system that integrates and controls a fleet of Unmanned Aerial Vehicles (UAVs). The UAVs carry a set of payload sensors employed for territorial surveillance, whose outputs are stored in the system and analysed by data exploitation functions at different levels. In particular, we detail the second-level data exploitation function, whose aim is to improve the interpretation of sensor data in post-mission activities. It is concerned with the mosaicking of the aerial images and the enrichment of cartography by human sensors—the social media users. We also describe the software architecture for the development of a mash-up (the integration of information and functionalities coming from the Web) and the possibility of using human sensors in monitoring the territory, a field in which, traditionally, only hardware sensors were involved.
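    As a rough illustration of the mash-up concept, hardware- and human-sensor observations can be joined by spatial proximity. The data structures and the proximity radius below are assumptions made for the example, not the SMAT system's actual design.

```python
# Illustrative sketch: attach geotagged social-media posts ("human sensors")
# to the aerial mosaic tiles ("hardware sensors") they fall near.
from dataclasses import dataclass
from math import hypot

@dataclass
class Observation:
    lat: float
    lon: float
    payload: str  # tile identifier or post text

def nearby(a: Observation, b: Observation, radius_deg: float = 0.001) -> bool:
    """Crude flat-earth proximity test; adequate at this illustrative scale."""
    return hypot(a.lat - b.lat, a.lon - b.lon) < radius_deg

def enrich(tiles, posts):
    """For each mosaic tile, collect the human-sensor reports near it."""
    return [(tile, [p for p in posts if nearby(tile, p)]) for tile in tiles]
```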

    Multiperspective mosaics and layered representation for scene visualization

    This thesis documents the efforts made to implement multiperspective mosaicking for undervehicle and roadside video sequences. For the undervehicle sequences, the goal is to create a large, high-resolution mosaic that may be used to quickly inspect the entire scene shot by a camera making a single pass underneath a vehicle. Several constraints are placed on the video data in order to justify the assumption that the entire scene in the sequence lies on a single plane. A single mosaic therefore represents a single video sequence, and phase correlation is used to perform motion analysis. For the roadside video sequences, the scene is assumed to be composed of several planar layers rather than a single plane, and layer extraction techniques are implemented to perform this decomposition. Instead of phase correlation, the Lucas-Kanade motion tracking algorithm is used to create dense motion maps. Using these motion maps, spatial support for each layer is determined based on a pre-initialized layer model. By separating the pixels in the scene into motion-specific layers, each element in the scene can be sampled correctly while performing multiperspective mosaicking. Many gaps in the mosaics caused by occlusions can also be filled in, creating more complete representations of the objects of interest. The results are several mosaics, each representing a single planar layer of the scene.
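    Phase correlation, the motion-analysis method used for the undervehicle sequences, is compact enough to show directly. The sketch below is a generic textbook implementation for single-channel images, not code from the thesis; windowing and subpixel refinement are omitted.

```python
# Illustrative sketch: integer-pixel phase correlation between two frames.
import numpy as np

def phase_correlation(frame_a, frame_b):
    """Estimate the (dy, dx) shift relating two equally sized float images."""
    cross_power = np.fft.fft2(frame_a) * np.conj(np.fft.fft2(frame_b))
    cross_power /= np.abs(cross_power) + 1e-12  # keep phase only
    correlation = np.fft.ifft2(cross_power).real
    dy, dx = np.unravel_index(np.argmax(correlation), correlation.shape)
    h, w = frame_a.shape
    if dy > h // 2:
        dy -= h  # wrap large positive shifts to negative displacements
    if dx > w // 2:
        dx -= w
    return dy, dx
```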

    An active vision system for tracking and mosaicking on UAV.

    Lin, Kai Wun. Thesis (M.Phil.)--Chinese University of Hong Kong, 2011. Includes bibliographical references (leaves 120-127). Abstracts in English and Chinese.

    Table of contents:
    Abstract --- p.i
    Acknowledgement --- p.iii
    Chapter 1 --- Introduction --- p.1
      1.1 Overview of the UAV Project --- p.1
      1.2 Challenges on Vision System for UAV --- p.2
      1.3 Contributions of this Work --- p.4
      1.4 Organization of Thesis --- p.6
    Chapter 2 --- Image Sensor Selection and Evaluation --- p.8
      2.1 Image Sensor Overview --- p.8
        2.1.1 Comparing Sensor Features and Performance --- p.9
        2.1.2 Rolling Shutter vs. Global Shutter --- p.10
      2.2 Sensor Evaluation through USB Peripheral --- p.11
        2.2.1 Interfacing Image Sensor and USB Controller --- p.12
        2.2.2 Image Sensor Configuration --- p.14
      2.3 Image Data Transmitting and Processing --- p.17
        2.3.1 Data Transfer Mode and Buffering on USB Controller --- p.18
        2.3.2 Demosaicking of Bayer Image Data --- p.20
      2.4 Splitting Images and Exposure Problem --- p.22
        2.4.1 Buffer Overflow on USB Controller --- p.22
        2.4.2 Image Luminance and Exposure Adjustment --- p.24
    Chapter 3 --- Embedded System for Vision Processing --- p.26
      3.1 Overview of the Embedded System --- p.26
        3.1.1 TI OMAP3530 Processor --- p.27
        3.1.2 Gumstix Overo Fire Computer-on-Module --- p.27
      3.2 Interfacing Camera Module to the Embedded System --- p.28
        3.2.1 Image Signal Processing Subsystem --- p.29
        3.2.2 Camera Module Adapting Board --- p.30
        3.2.3 Image Sensor Driver and Program Development --- p.31
      3.3 View-stabilizing Biaxial Camera Platform --- p.34
        3.3.1 The New Camera System --- p.35
        3.3.2 View-stabilizing Pan-tilt Platform --- p.41
      3.4 Overall System Architecture and UAV Integration --- p.46
    Chapter 4 --- Target Tracking and Geo-locating --- p.50
      4.1 Camera Calibration --- p.51
        4.1.1 The Perspective Camera Model --- p.51
        4.1.2 Camera Lens Distortions --- p.53
        4.1.3 Calibration Toolbox and Results --- p.54
      4.2 Selection of Object Features and Trackers --- p.56
        4.2.1 Harris Corner Detection --- p.58
        4.2.2 Color Histogram --- p.59
        4.2.3 KLT and Mean-shift Tracker --- p.59
      4.3 Target Auto-centering --- p.64
        4.3.1 Formulation of the PID Controller --- p.65
        4.3.2 Control Gain Settings and Tuning --- p.69
      4.4 Geo-locating of Tracked Target --- p.69
        4.4.1 Coordinate Frame Transformation --- p.70
        4.4.2 Depth Estimation and Target Locating --- p.74
      4.5 Results and Discussion --- p.77
    Chapter 5 --- Real-time Aerial Mosaic Building --- p.89
      5.1 Motion Model Selection --- p.90
        5.1.1 Planar Perspective Motion Model --- p.90
      5.2 Feature-based Image Alignment --- p.91
        5.2.1 Image Preprocessing --- p.91
        5.2.2 Feature Extraction and Matching --- p.92
        5.2.3 Image Alignment using RANSAC Algorithm --- p.94
      5.3 Image Composition --- p.95
        5.3.1 Image Blending with Distance Map --- p.96
        5.3.2 Overall Stitching Process --- p.98
      5.4 Mosaic Simulation using Google Earth --- p.99
      5.5 Results and Discussion --- p.100
    Chapter 6 --- Conclusion and Further Work --- p.108
    Chapter A --- System Schematics --- p.111
    Chapter B --- Image Sensor Sensitivity --- p.118
    Bibliography --- p.120
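    Chapter 5's alignment pipeline (feature extraction and matching followed by RANSAC-based fitting of a planar perspective model) follows a pattern that can be sketched with standard tools. The fragment below uses ORB features as an illustrative stand-in; the thesis's exact detector, matcher, and thresholds may differ.

```python
# Illustrative sketch: feature-based alignment of a new frame to the mosaic
# under a planar perspective (homography) model, with RANSAC outlier rejection.
import cv2
import numpy as np

def align_to_mosaic(frame_gray, mosaic_gray):
    """Return the 3x3 homography mapping frame pixels into mosaic coordinates."""
    orb = cv2.ORB_create(nfeatures=2000)
    kf, df = orb.detectAndCompute(frame_gray, None)
    km, dm = orb.detectAndCompute(mosaic_gray, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(df, dm), key=lambda m: m.distance)[:200]
    src = np.float32([kf[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([km[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _inliers = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return H
```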
