122 research outputs found

    Data-driven 3D Reconstruction and View Synthesis of Dynamic Scene Elements

    Our world is filled with living beings and other dynamic elements. It is important to record dynamic things and events for the sake of education, archeology, and cultural heritage. From ancient to modern times, people have recorded dynamic scene elements in different ways, from sequences of cave paintings to frames of motion pictures. This thesis focuses on two key computer vision techniques by which dynamic element representation moves beyond video capture: 3D reconstruction and view synthesis. Although previous methods for these two tasks have been adopted to model and represent static scene elements, dynamic scene elements present unique and difficult challenges. This thesis considers three types of dynamic scene elements, namely 1) dynamic texture with static shape, 2) dynamic shapes with static texture, and 3) dynamic illumination of static scenes. Two research aspects are explored to represent and visualize them: dynamic 3D reconstruction and dynamic view synthesis. Dynamic 3D reconstruction aims to recover the 3D geometry of dynamic objects and, by modeling the objects’ movements, bring 3D reconstructions to life. Dynamic view synthesis, on the other hand, summarizes or predicts the appearance change of dynamic objects, for example the daytime-to-nighttime illumination of a building or the future movements of a rigid body. We first target the problem of reconstructing dynamic textures of objects that have (approximately) fixed 3D shape but time-varying appearance. Examples of such objects include waterfalls, fountains, and electronic billboards. Since the appearance of dynamic-textured objects can be random and complicated, estimating the 3D geometry of these objects from 2D images/video requires novel tools beyond the appearance-based point correspondence methods of traditional 3D computer vision.
To perform this 3D reconstruction, we introduce a method that simultaneously 1) segments dynamically textured scene objects in the input images and 2) reconstructs the 3D geometry of the entire scene, assuming a static 3D shape for the dynamically textured objects. Compared to dynamic textures, the appearance change of dynamic shapes is due to physically defined motions like rigid body movements. In these cases, assumptions can be made about the object’s motion constraints in order to identify corresponding points on the object at different timepoints. For example, two points on a rigid object have a constant distance between them in 3D space, no matter how the object moves. Based on this assumption of local rigidity, we propose a robust method to correctly identify point correspondences between two images viewing the same moving object from different viewpoints and at different times. Dense 3D geometry can be obtained from the computed point correspondences. We apply this method to unsynchronized video streams, and observe that the number of inlier correspondences found by this method can be used as an indicator for frame alignment among the different streams. To model dynamic scene appearance caused by illumination changes, we propose a framework to find a sequence of images that have a similar geometric composition to a single reference image and also show a smooth transition in illumination throughout the day. These images can be registered to visualize patterns of illumination change from a single viewpoint. The final topic of this thesis involves predicting the movements of dynamic shapes in the image domain. Towards this end, we propose deep neural network architectures to predict future views of dynamic motions, such as rigid body movements and flowers blooming. Instead of predicting image pixels from the network, our methods predict pixel offsets and iteratively synthesize future views. Doctor of Philosoph
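The rigidity constraint mentioned in this abstract (constant distance between two points on a rigid object) can be illustrated with a minimal sketch. This is not the thesis's method, which works on image correspondences across viewpoints. It assumes, purely for illustration, that matched 3D points are already available at two timepoints. The function names, the tolerance `tol`, and the threshold `min_support` are all hypothetical.

```python
import numpy as np

def pairwise_distances(points):
    """Return the (N, N) matrix of Euclidean distances between 3D points."""
    diff = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def rigid_inliers(points_a, points_b, tol=1e-2, min_support=0.5):
    """Flag putative correspondences that agree with a rigid motion.

    points_a[i] and points_b[i] are assumed to be the same physical point
    observed at two different times (hypothetical input format).
    A correspondence is kept when its distances to at least `min_support`
    of the other points are preserved within `tol`.
    """
    d_a = pairwise_distances(points_a)
    d_b = pairwise_distances(points_b)
    consistent = np.abs(d_a - d_b) < tol
    n = len(points_a)
    # Subtract 1 to ignore the trivially consistent diagonal entry.
    support = (consistent.sum(axis=1) - 1) / (n - 1)
    return support >= min_support
```

Under these assumptions, the number of flagged inliers (`rigid_inliers(a, b).sum()`) could serve as a simple score when comparing candidate frame pairings, in the spirit of the frame-alignment indicator the abstract describes.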

    Computer Vision Based Structural Identification Framework for Bridge Health Monitoring

    The objective of this dissertation is to develop a comprehensive Structural Identification (St-Id) framework with damage detection for bridge-type structures by using cameras and computer vision technologies. Traditional St-Id frameworks rely on conventional sensors. In this study, the input and output data employed in the St-Id system are acquired by a series of vision-based measurements. The following novelties are proposed, developed and demonstrated in this project: a) vehicle load (input) modeling using computer vision, b) bridge response (output) measurement using a fully non-contact approach based on video/image processing, c) image-based structural identification using input-output measurements and new damage indicators. The input (loading) data due to vehicles, such as vehicle weights and vehicle locations on the bridge, are estimated by employing computer vision algorithms (detection, classification, and localization of objects) on the video images of vehicles. Meanwhile, the output data, in the form of structural displacements, are obtained by defining and tracking image key-points at measurement locations. Subsequently, the input and output data sets are analyzed to construct a novel type of damage indicator, named Unit Influence Surface (UIS). Finally, a new damage detection and localization framework is introduced that does not require a network of sensors, but far fewer sensors. The main research significance is the first-time development of algorithms that transform the measured video images into a form that is highly damage-sensitive/change-sensitive for bridge assessment within the context of Structural Identification with input and output characterization. The study exploits the unique attributes of computer vision systems, where the signal is continuous in space. This requires new adaptations and transformations that can handle computer vision data/signals for structural engineering applications.
This research will significantly advance current sensor-based structural health monitoring with computer-vision techniques, leading to practical applications for damage detection of complex structures with a novel approach. By using computer vision algorithms and cameras as special sensors for structural health monitoring, this study proposes an advanced approach to bridge monitoring through which certain types of data that could not be collected by conventional sensors, such as vehicle loads and locations, can be obtained practically and accurately.

    Long Range Motion Estimation and Applications

    Finding correspondences between images underlies many computer vision problems, such as optical flow, tracking, stereovision and alignment. Finding these correspondences involves formulating a matching function and optimizing it. This optimization process is often gradient descent, which avoids exhaustive search, but relies on the assumption of being in the basin of attraction of the right local minimum. This is often the case when the displacement is small, and current methods obtain very accurate results for small motions. However, when the motion is large and the matching function is abrupt, this assumption is less likely to be true. One traditional way of avoiding this abruptness is to smooth the matching function spatially by blurring the images. As the displacement becomes larger, the amount of blur required to smooth the matching function also becomes larger. This averaging of pixels leads to a loss of detail in the image. Therefore, there is a trade-off between the size of the objects that can be tracked and the displacement that can be captured. In this thesis we address the basic problem of increasing the size of the basin of attraction in a matching function. We use an image descriptor called distribution fields (DFs). By blurring the images in DF space instead of in pixel space, we increase the size of the basin of attraction with respect to traditional methods. We show competitive results using DFs both in object tracking and optical flow. Finally, we demonstrate an application of capturing large motions for temporal video stitching.
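The idea of blurring in DF space rather than pixel space can be sketched as follows. This is a simplified illustration and not the thesis's implementation. It assumes a grayscale image with values in 0..255, splits it into one layer per intensity bin, and blurs each layer spatially with a truncated Gaussian. The function names, the bin count, and the L1 comparison are assumptions made for the example.

```python
import numpy as np

def explode(image, num_bins=8):
    """Split a grayscale image (values 0..255) into one-hot intensity layers.

    Returns an array of shape (num_bins, H, W) whose layers sum to 1 per pixel.
    """
    scaled = image.astype(np.float64) / 256.0 * num_bins
    bins = np.minimum(scaled.astype(int), num_bins - 1)
    return (np.arange(num_bins)[:, None, None] == bins[None]).astype(np.float64)

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()

def smooth_spatially(field, sigma):
    """Blur every layer along rows and columns (zero padding at the borders).

    Assumes both image sides are longer than the kernel.
    """
    kernel = gaussian_kernel(sigma)
    for axis in (1, 2):
        field = np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="same"), axis, field)
    return field

def distribution_field(image, num_bins=8, sigma=2.0):
    """Build a spatially smoothed distribution field for one image."""
    return smooth_spatially(explode(image, num_bins), sigma)

def df_distance(df_a, df_b):
    """L1 distance between two fields (one possible matching function)."""
    return float(np.abs(df_a - df_b).sum())
```

Because each pixel keeps a separate layer per intensity bin, spatial blurring spreads positional uncertainty without averaging bright and dark values together. In this sketch, the distance between a patch and a shifted copy keeps changing with the shift even after the unblurred patches no longer overlap, which is the widened basin of attraction that the abstract refers to.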

    Minimising Human Annotation for Scalable Person Re-Identification

    PhD. Among the diverse tasks performed by an intelligent distributed multi-camera surveillance system, person re-identification (re-id) is one of the most essential. Re-id refers to associating an individual or a group of people across non-overlapping cameras at different times and locations, and forms the foundation of a variety of applications ranging from security and forensic search to quotidian retail and health care. Although it has attracted rapidly increasing academic interest over the past decade, launching a practical re-id system in real-world environments remains a non-trivial and unsolved problem, due to the ambiguous and noisy nature of surveillance data and the potentially dramatic visual appearance changes caused by uncontrolled variations in human poses and divergent viewing conditions across distributed camera views. To mitigate such visual ambiguity and appearance variations, most existing re-id approaches rely on constructing fully supervised machine learning models with extensively labelled training datasets, which is unscalable for practical applications in the real world. In particular, human annotators must exhaustively search over a vast quantity of offline collected data and manually label cross-view matched images of a large population between every possible camera pair. Moreover, even after this prohibitively expensive human effort has been spent, a trained re-id model is often not easily generalisable and transferable, due to the elastic and dynamic operating conditions of a surveillance system. With such motivations, this thesis proposes several scalable re-id approaches with significantly reduced human supervision, readily applied to practical applications. More specifically, this thesis has developed and investigated four new approaches for reducing human labelling effort in real-world re-id as follows:
Chapter 3: The first approach is affinity mining from unlabelled data. Different from most existing supervised approaches, this work aims to model the discriminative information for re-id without exploiting human annotations, but from the vast amount of unlabelled person image data, and is thus applicable to both semi-supervised and unsupervised re-id. It is non-trivial since the human annotated identity matching correspondence is often the key to discriminative re-id modelling. In this chapter, an alternative strategy is explored by specifically mining two types of affinity relationships among unlabelled data: (1) inter-view data affinity and (2) intra-view data affinity. In particular, with such affinity information encoded as constraints, a Regularised Kernel Subspace Learning model is developed to explicitly reduce inter-view appearance variations and meanwhile enhance intra-view appearance disparity for more discriminative re-id matching. Consequently, annotation costs can be greatly reduced, and a scalable re-id model can readily leverage plenty of unlabelled data, which is inexpensive to collect.
Chapter 4: The second approach is saliency discovery from unlabelled data. This chapter continues to investigate the problem of what can be learned from unlabelled images without identity labels annotated by humans. Other than affinity mining as proposed in Chapter 3, a different solution is proposed: to discover localised visual appearance saliency of person appearances. Intuitively, salient and atypical appearances of humans are able to uniquely and representatively describe and identify an individual, whilst also often being robust to view changes and detection variances. Motivated by this, an unsupervised Generative Topic Saliency model is proposed to jointly perform foreground extraction, saliency detection, as well as discriminative re-id matching. This approach completely avoids the exhaustive annotation effort for model training, and thus better scales to real-world applications. Moreover, its automatically discovered re-id saliency representations are shown to be semantically interpretable, suitable for generating useful visual analysis for deployable user-oriented software tools.
Chapter 5: The third approach is incremental learning from actively labelled data. Since learning from unlabelled data alone yields less discriminative matching results, and in some cases there will be limited human labelling resources available for re-id modelling, this chapter investigates the problem of how to maximise a model’s discriminative capability with minimised labelling effort. The challenges are to (1) automatically select the most representative data from a vast number of noisy/ambiguous unlabelled data in order to maximise model discrimination capacity; and (2) incrementally update the model parameters to accelerate machine responses and reduce human waiting time. To that end, this thesis proposes a regression based re-id model, characterised by its very fast and efficient incremental model updates. Furthermore, an effective active data sampling algorithm with three novel joint exploration-exploitation criteria is designed, to make automatic data selection feasible with notably reduced human labelling costs. Such an approach ensures that annotation effort is spent only on the very few data samples which are most critical to the model’s generalisation capability, instead of being exhausted by blindly labelling many noisy and redundant training samples.
Chapter 6: The last technical area of this thesis is human-in-the-loop learning from relevance feedback. Whilst the former chapters mainly investigate techniques to reduce human supervision for model training, this chapter motivates a novel research area to further minimise human effort spent in the re-id deployment stage. In real-world applications where the camera network and potential gallery size increase dramatically, even state-of-the-art re-id models generate much inferior re-id performance, and human involvement at the deployment stage is inevitable. To minimise such human effort and maximise re-id performance, this thesis explores an alternative approach to re-id by formulating a hybrid human-computer learning paradigm with humans in the model matching loop. Specifically, a Human Verification Incremental Learning model is formulated which does not require any pre-labelled training data and is therefore scalable to new camera pairs. Moreover, the proposed model learns cumulatively from human feedback to provide an instant improvement to the re-id ranking of each probe on-the-fly, and is thus scalable to large gallery sizes. It has been demonstrated that the proposed re-id model achieves significantly superior re-id results whilst consuming much less human supervision effort. For facilitating a holistic understanding about this thesis, the main studies are summarised and framed into a graphical abstract as shown in Figur

    DragonflEYE: a passive approach to aerial collision sensing

    This dissertation describes the design, development and test of a passive wide-field optical aircraft collision sensing instrument titled 'DragonflEYE'. Such a "sense-and-avoid" instrument is desired for autonomous unmanned aerial systems operating in civilian airspace. The instrument was configured as a network of smart camera nodes and implemented using commercial, off-the-shelf components. An end-to-end imaging train model was developed and important figures of merit were derived. Transfer functions arising from intermediate media were discussed and their impact assessed. Multiple prototypes were developed. The expected performance of the instrument was iteratively evaluated on the prototypes, beginning with modeling activities followed by laboratory tests, ground tests and flight tests. A prototype was mounted on a Bell 205 helicopter for flight tests, with a Bell 206 helicopter acting as the target. Raw imagery was recorded alongside ancillary aircraft data, and stored for the offline assessment of performance. The "range at first detection" (R0) is presented as a robust measure of sensor performance, based on a suitably defined signal-to-noise ratio. The analysis treats target radiance fluctuations, ground clutter, atmospheric effects, platform motion and random noise elements. Under the measurement conditions, R0 exceeded flight crew acquisition ranges. Secondary figures of merit are also discussed, including time to impact, target size and growth, and the impact of resolution on detection range. The hardware was structured to facilitate a real-time hierarchical image-processing pipeline, with selected image processing techniques introduced. In particular, the height of an observed event above the horizon compensates for angular motion of the helicopter platform.