Foreground detection of video through the integration of novel multiple detection algorithms
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. The main outcome of this research is the design of a foreground detection algorithm that is more accurate and less time-consuming than existing algorithms. By accuracy we mean an exact mask of the foreground object(s), one that matches the respective ground truth. Motion detection, the first stage of the foreground detection process, can be achieved via pixel-based or block-based methods, each with its own merits and drawbacks. Pixel-based methods are accurate but time-consuming, so they cannot be recommended for real-time applications. Block-based motion estimation, on the other hand, is less accurate but faster and thus better suited to real-time use. The first proposed algorithm therefore adopts block-based motion estimation for timely execution. To recover the lost accuracy, a morphological technique called opening-and-closing by reconstruction is applied; being a pixel-based operation, it yields higher accuracy while remaining fast to execute. Opening-and-closing by reconstruction locates the maxima and minima inside the foreground object(s), so running it alongside block-based motion estimation compensates for the latter's lower accuracy. To verify the efficiency of this algorithm, a complex video containing multiple colours and both fast and slow motion in various places was selected. Across 11 different performance measures, the proposed algorithm achieved, on average, more than 24.73% higher accuracy than four well-established algorithms. Background subtraction, the most cited algorithm for foreground detection, faces the major problem of choosing a proper threshold value at run time.
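The opening-by-reconstruction idea can be illustrated with a minimal binary sketch: erosion removes small specks, then geodesic dilation regrows the surviving components to their exact original shape. This is illustrative only; the thesis applies the grayscale variant, closing by reconstruction is the dual operation, and the 3x3 structuring element and helper names below are assumptions.

```python
# Minimal binary sketch of opening by reconstruction (illustrative;
# the thesis uses the grayscale variant).

def _neighbourhood(img, y, x):
    """3x3 neighbourhood values, clipped at the image border."""
    h, w = len(img), len(img[0])
    return [img[ny][nx]
            for ny in range(max(0, y - 1), min(h, y + 2))
            for nx in range(max(0, x - 1), min(w, x + 2))]

def dilate(img):
    """3x3 binary dilation."""
    return [[1 if any(_neighbourhood(img, y, x)) else 0
             for x in range(len(img[0]))] for y in range(len(img))]

def erode(img):
    """3x3 binary erosion (border neighbourhoods are clipped)."""
    return [[1 if all(_neighbourhood(img, y, x)) else 0
             for x in range(len(img[0]))] for y in range(len(img))]

def reconstruct(marker, mask):
    """Geodesic reconstruction by dilation: grow marker inside mask."""
    cur = marker
    while True:
        grown = dilate(cur)
        nxt = [[grown[y][x] & mask[y][x] for x in range(len(mask[0]))]
               for y in range(len(mask))]
        if nxt == cur:
            return cur
        cur = nxt

def opening_by_reconstruction(img):
    """Erode away small specks, then restore survivors exactly."""
    return reconstruct(erode(img), img)
```

The key property, in contrast to a plain opening, is that surviving components keep their exact original outline, which is why the operation preserves an accurate foreground mask.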
The next proposed algorithm derives an effective run-time threshold for background subtraction from motion, the primary component of the foreground detection process. The smoothed histogram peaks and valleys of the motion were analysed; these reflect the fast- and slow-motion areas of the moving object(s) in a given frame, and the threshold value is generated at run time by exploiting the peak and valley values. This algorithm was tested on four recommended video sequences, including indoor and outdoor shoots, and compared with five highly ranked algorithms. Based on the values of standard performance measures, it achieved, on average, more than 12.30% higher accuracy.
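The peak-and-valley idea can be sketched as follows: build a smoothed histogram of the absolute frame difference, locate the two dominant (well-separated) peaks, and place the threshold at the valley between them. The smoothing radius, peak-separation gate, and fallback value below are illustrative assumptions, not values from the thesis.

```python
# Hedged sketch of run-time threshold selection from smoothed
# histogram peaks and valleys (parameter values are illustrative).

def smooth_histogram(hist, radius=2):
    """Moving-average smoothing of a 256-bin histogram."""
    out = []
    for i in range(len(hist)):
        lo, hi = max(0, i - radius), min(len(hist), i + radius + 1)
        out.append(sum(hist[lo:hi]) / (hi - lo))
    return out

def valley_threshold(diff_pixels, min_separation=10):
    """Threshold = deepest valley between the two dominant peaks."""
    hist = [0] * 256
    for v in diff_pixels:
        hist[min(255, v)] += 1
    s = smooth_histogram(hist)
    # local maxima of the smoothed histogram
    peaks = [i for i in range(1, 255)
             if s[i] >= s[i - 1] and s[i] >= s[i + 1] and s[i] > 0]
    peaks.sort(key=lambda i: s[i], reverse=True)
    if not peaks:
        return 128  # fallback: no structure in the histogram
    p1 = peaks[0]
    p2 = next((p for p in peaks[1:] if abs(p - p1) > min_separation), None)
    if p2 is None:
        return 128  # fallback: only one motion mode present
    a, b = sorted((p1, p2))
    return min(range(a, b + 1), key=lambda i: s[i])

def foreground_mask(prev, curr):
    """1 where the absolute frame difference exceeds the threshold."""
    diff = [abs(c - p) for c, p in zip(curr, prev)]
    t = valley_threshold(diff)
    return [1 if d > t else 0 for d in diff]
```

On flattened pixel lists with a low-difference background mode and a high-difference motion mode, the valley lands between the two modes, so the mask isolates the moving pixels without a hand-tuned constant.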
Robust Modular Feature-Based Terrain-Aided Visual Navigation and Mapping
The visual feature-based Terrain-Aided Navigation (TAN) system presented in this thesis addresses the problem of constraining the inertial drift introduced into the location estimate of Unmanned Aerial Vehicles (UAVs) in GPS-denied environments. The system uses salient visual features representing semantic, human-interpretable objects (roads, forest and water boundaries) detected in onboard aerial imagery and associates them with a database of reference features created a priori by applying the same feature detection algorithms to satellite imagery. Correlating the detected features with the reference features through a series of robust data association steps yields a localisation solution whose absolute error is bounded by the certainty of the reference dataset. The feature-based Visual Navigation System (VNS) presented in this thesis was originally developed for a navigation application using simulated multi-year satellite image datasets; its extension into the mapping domain, in turn, is based on real (not simulated) flight data and imagery. The mapping study demonstrates the full potential of the system as a versatile tool for enhancing the accuracy of information derived from aerial imagery. Visual features such as road networks, shorelines and water bodies are used not only to obtain a position "fix" but also, in reverse, for accurate mapping of vehicles detected on the roads into inertial space with improved precision. Combined correction of geo-coding errors and improved aircraft localisation forms a robust solution for the defence mapping application. A system of the proposed design would provide a complete independent navigation solution to an autonomous UAV while additionally giving it object tracking capability.
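The data-association step can be sketched as a gated nearest-neighbour match between detected and reference features, with the mean residual giving a position correction. This is a toy 2-D illustration; the gate value, greedy matching, and simple averaging are simplifying assumptions, not the thesis's actual robust association pipeline.

```python
# Toy sketch: gated nearest-neighbour feature association and the
# resulting position correction (values and names are illustrative).

def associate_features(detections, reference, gate):
    """Greedily match each detected feature to its nearest unused
    reference feature, rejecting matches beyond the gating distance."""
    matches, used = [], set()
    for i, (dx, dy) in enumerate(detections):
        best, best_d = None, gate
        for j, (rx, ry) in enumerate(reference):
            if j in used:
                continue
            d = ((dx - rx) ** 2 + (dy - ry) ** 2) ** 0.5
            if d < best_d:
                best, best_d = j, d
        if best is not None:
            matches.append((i, best))
            used.add(best)
    return matches

def position_fix(detections, reference, matches):
    """Mean offset over matched pairs = estimated position correction."""
    if not matches:
        return (0.0, 0.0)
    ex = sum(reference[j][0] - detections[i][0] for i, j in matches) / len(matches)
    ey = sum(reference[j][1] - detections[i][1] for i, j in matches) / len(matches)
    return (ex, ey)
```

The gate is what keeps a spurious detection (e.g. a misdetected road segment) from corrupting the fix: features with no reference neighbour inside the gate simply contribute nothing.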
Extraction of Unfoliaged Trees from Terrestrial Image Sequences
This thesis presents a generative statistical approach for the fully automatic three-dimensional (3D) extraction and reconstruction of unfoliaged deciduous trees from wide-baseline image sequences. Tree models improve the realism of 3D geoinformation systems (GIS) by adding a natural touch. Unfoliaged trees are, however, difficult to reconstruct from images due to partially weak contrast, background clutter, occlusions, and particularly the possibly varying order of branches in images from different viewpoints. The proposed approach combines generative modeling by L-systems with statistical maximum a posteriori (MAP) estimation to extract the 3D branching structure of trees. Background estimation is conducted by means of mathematical (gray-scale) morphology as a basis for the generative modeling. A Gaussian likelihood function based on intensity differences is employed to evaluate the hypotheses. A mechanism has been devised to control the sampling sequence of multiple parameters in the Markov chain, considering their characteristics and their performance in the previous step. After extraction of the first level of branches, a tree is classified into one of three typical branching types, and more specific production rules of L-systems are used accordingly. Generic prior distributions for the parameters are refined based on already extracted branches in a Bayesian framework and integrated into the MAP estimation. By these means most of the branching structure, apart from tiny twigs, can be reconstructed. Results are presented in the form of VRML (Virtual Reality Modeling Language) models, demonstrating the potential of the approach as well as its current shortcomings.
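The generative side can be illustrated with a toy deterministic L-system: a production rule rewrites each trunk segment into a segment plus two side branches, and repeated rewriting yields a branching skeleton. The rule below is illustrative only; it is not one of the thesis's actual productions, which are chosen per branching type and sampled statistically within the MAP framework.

```python
# Toy deterministic L-system rewriting (the production rule is an
# illustrative example, not a rule from the thesis).
# Symbols: F = branch segment, [ ] = push/pop a branch, + - = turns.

def rewrite(axiom, rules, depth):
    """Apply the production rules `depth` times to the axiom string."""
    s = axiom
    for _ in range(depth):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s
```

Each rewriting pass triples the number of segments here, mirroring how a handful of compact productions can generate the combinatorially rich branching structures that the statistical estimation then fits to the images.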
Extracting semantic video objects
Dagan Feng, 2000-2001. Academic research: refereed journal publication. Version of Record published.
Video object segmentation and tracking.
Thesis (M.Sc.Eng.), University of KwaZulu-Natal, 2005.

One of the more complex video processing problems currently vexing researchers is that of object segmentation. This involves identifying semantically meaningful objects in a scene and separating them from the background. While the human visual system performs this task with minimal effort, research in machine vision has yet to yield techniques that perform it as effectively and efficiently. The problem is difficult not only because of the complexity of the mechanisms involved but also because it is ill-posed: no unique segmentation of a scene exists, since what counts as a segmented object of interest depends very much on the application and the scene content. In most situations a priori knowledge of the nature of the problem is required, often specific to the application in which the segmentation tool is to be used.

This research presents an automatic method of segmenting objects from a video sequence. The intent is to extract and maintain both the shape and the contour information as the object changes dynamically over time in the sequence. A priori information is incorporated by asking the user to tune a set of input parameters prior to execution of the algorithm.

Motion is used as the semantic cue for video object extraction, subject to the assumptions that there is only one moving object in the scene, that the only motion in the video sequence is that of the object of interest, that illumination is constant, and that the object is never occluded.

A change detection mask is used to detect the moving object, followed by morphological operators to refine the result. The change detection mask yields a model of the moving components; this is compared with a contour map of the frame to extract a more accurate contour of the moving object, which is then used to extract the object of interest itself. Since the video object moves as the sequence progresses, the object must be updated over time. To accomplish this, an object tracker has been implemented based on the Hausdorff object-matching algorithm.

The dissertation begins with an overview of segmentation techniques and a discussion of the approach used in this research. This is followed by a detailed description of the algorithm, covering initial segmentation, object tracking across frames, and video object extraction. Finally, the semantic object extraction results for a variety of video sequences are presented and evaluated.
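The Hausdorff matching step can be sketched as follows: the symmetric Hausdorff distance compares the object model with candidate contours in the next frame, and the tracker keeps the closest candidate within a tolerance. This is a minimal point-set illustration; the `max_dist` gate and the candidate-selection loop are assumptions, not the dissertation's exact tracker.

```python
# Minimal sketch of Hausdorff-distance object matching on 2-D point
# sets (the tolerance gate is an illustrative assumption).

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two 2-D point sets."""
    def directed(P, Q):
        return max(min(((px - qx) ** 2 + (py - qy) ** 2) ** 0.5
                       for qx, qy in Q)
                   for px, py in P)
    return max(directed(A, B), directed(B, A))

def track(model, candidates, max_dist):
    """Return the candidate contour closest to the model, or None if
    even the best candidate lies beyond the tolerance."""
    best = min(candidates, key=lambda c: hausdorff(model, c))
    return best if hausdorff(model, best) <= max_dist else None
```

Because the Hausdorff distance measures the worst-case mismatch between the two point sets, a small value guarantees that every model point lies near the candidate contour and vice versa, which makes it a natural fit for updating a moving object's model from frame to frame.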
Self-Supervised Object-in-Gripper Segmentation from Robotic Motions
Accurate object segmentation is a crucial task in the context of robotic manipulation. However, creating sufficient annotated training data for neural networks is particularly time-consuming and often requires manual labeling. To this end, we propose a simple yet robust solution for learning to segment unknown objects grasped by a robot. Specifically, we exploit motion and temporal cues in RGB video sequences. Using optical flow estimation, we first learn to predict segmentation masks of our given manipulator. These annotations are then used in combination with motion cues to automatically distinguish between the background, the manipulator, and the unknown grasped object. In contrast to existing systems, our approach is fully self-supervised and independent of precise camera calibration, 3D models, or potentially imperfect depth data. We perform a thorough comparison with alternative baselines and approaches from the literature. The object masks and views are shown to be suitable training data for segmentation networks that generalize to novel environments and also allow for watertight 3D reconstruction.

Comment: 15 pages, 11 figures. Video: https://www.youtube.com/watch?v=srEwuuIIgz
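The motion-cue step can be sketched as follows: threshold the optical-flow magnitude to obtain a motion mask, then label moving pixels not explained by the predicted manipulator mask as the grasped object. This is a toy illustration; the fixed threshold and the three-way labeling scheme are simplified stand-ins for the paper's learned, self-supervised pipeline.

```python
# Toy sketch: motion mask from flow magnitude, then a three-way
# background / manipulator / grasped-object labeling (the threshold
# and label codes are illustrative assumptions).

def motion_mask(flow, thresh):
    """1 where the optical-flow magnitude exceeds thresh.
    `flow` is a grid of (u, v) displacement vectors per pixel."""
    return [[1 if (u * u + v * v) ** 0.5 > thresh else 0
             for (u, v) in row] for row in flow]

def split_regions(motion, manipulator):
    """Label pixels 0 = background, 1 = manipulator, 2 = grasped
    object (moving pixels not explained by the manipulator mask)."""
    h, w = len(motion), len(motion[0])
    labels = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if manipulator[y][x]:
                labels[y][x] = 1
            elif motion[y][x]:
                labels[y][x] = 2
    return labels
```

The key observation the paper exploits is that, once the manipulator's own mask is known, any remaining coherent motion in the gripper region must belong to the grasped object, so no manual annotation of the object is ever needed.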
- …