188 research outputs found

    Adaptive object segmentation and tracking

    Efficient tracking of deformable objects moving with variable velocities is an important current research problem. In this thesis a robust tracking model is proposed for the automatic detection, recognition and tracking of target objects which are subject to variable orientations and velocities and are viewed under variable ambient lighting conditions. The tracking model can be applied to efficiently track fast moving vehicles and other objects in various complex scenarios. The tracking model is evaluated on both colour visible band and infra-red band video sequences acquired from the air by the Sussex police helicopter and other collaborators. The observations made validate the improved performance of the model over existing methods. The thesis is divided into three major sections. The first section details the development of an enhanced active contour for object segmentation. The second section describes an implementation of a global active contour orientation model. The third section describes the tracking model and assesses its performance on the aerial video sequences. In the first part of the thesis an enhanced active contour snake model using the difference of Gaussian (DoG) filter is reported and discussed in detail. An acquisition method based on the enhanced active contour, developed to assist the proposed tracking system, is tested. The active contour model is further enhanced by a disambiguation framework designed to assist multiple object segmentation, which is used to demonstrate that the enhanced model can be used for robust multiple object segmentation and tracking. The active contour model developed not only facilitates the efficient update of the tracking filter but also decreases the latency involved in tracking targets in real-time. In terms of computational effort, the active contour model presented reduces the computational cost by 85% compared to existing active contour models. 
The second part of the thesis introduces the global active contour orientation (GACO) technique for statistical measurement of contoured object orientation. It is an overall object orientation measurement method which uses the proposed active contour model along with statistical measurement techniques. The use of the GACO technique, incorporating the active contour model, to measure object orientation angle is discussed in detail. A real-time door surveillance application based on the GACO technique is developed and evaluated on the i-LIDS door surveillance dataset provided by the UK Home Office. The results show that GACO achieves a success rate of 92% on the door surveillance dataset. Finally, a combined approach involving the proposed active contour model and an optimal trade-off maximum average correlation height (OT-MACH) filter for tracking is presented. The implementation of methods for controlling the area of support of the OT-MACH filter is discussed in detail. Using the proposed active contour as the area of support is shown to significantly improve the OT-MACH filter's ability to track vehicles moving within highly cluttered visible and infra-red band video sequences.

    Multiple cue integration for robust tracking in dynamic environments: application to video relighting

    Motion analysis and object tracking have been among the principal focuses of attention within the computer vision community over the past two decades. The interest of this research area lies in its wide range of applicability, extending from autonomous vehicle and robot navigation tasks to entertainment and virtual reality applications. Even though impressive results have been obtained in specific problems, object tracking is still an open problem, since available methods tend to be sensitive to several artifacts and non-stationary environment conditions, such as unpredictable target movements, gradual or abrupt changes of illumination, proximity of similar objects, or cluttered backgrounds. Multiple cue integration has been shown to enhance the robustness of tracking algorithms against such disturbances. In recent years, due to the increasing power of computers, there has been significant interest in building complex tracking systems which simultaneously consider multiple cues. However, most of these algorithms are based on heuristics and ad-hoc rules formulated for specific applications, making it impossible to extrapolate them to new environment conditions. In this dissertation we propose a general probabilistic framework to integrate as many object features as necessary, permitting them to interact mutually in order to obtain a precise estimate of the object's state and thus of the target position. This framework is used to design a tracking algorithm, which is validated on several video sequences involving abrupt position and illumination changes, target camouflaging and non-rigid deformations. 
Among the features used to represent the target, it is worth highlighting the robust parameterization of the target color in an object-dependent colorspace, which distinguishes the object from the background more clearly than other colorspaces commonly used in the literature. In the last part of the dissertation, we design an approach for relighting static and moving scenes with unknown geometry. The relighting is performed through an 'image-based' methodology, where the rendering under new lighting conditions is achieved by linear combinations of a set of pre-acquired reference images of the scene illuminated by known light patterns. Since the placement and brightness of the light sources composing such light patterns can be controlled, it is natural to ask: what is the optimal way to illuminate the scene to reduce the number of reference images that are needed? We show that the best way to light the scene (i.e., the way that minimizes the number of reference images) is not to use a sequence of single, compact light sources, as is most commonly done, but rather to use a sequence of lighting patterns given by an object-dependent lighting basis. It is important to note that when relighting video sequences, consecutive images need to be aligned with respect to a common coordinate frame. However, since each frame is generated by a different light pattern illuminating the scene, abrupt illumination changes occur between consecutive reference images. Under these circumstances, the tracking framework designed in this dissertation plays a central role. Finally, we present several relighting results on real video sequences of moving objects, moving faces, and scenes containing both. In each case, although a single video clip was captured, we are able to relight again and again, controlling the lighting direction, extent, and color. Postprint (published version)
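
The image-based relighting idea in this abstract, rendering a new lighting condition as a linear combination of pre-acquired reference images, can be illustrated with a minimal sketch. The function name `relight`, the nested-list image representation, and the weights are assumptions made for illustration only; the dissertation's object-dependent lighting basis is not reproduced here.

```python
def relight(reference_images, weights):
    """Blend reference images pixel-wise: out = sum_k weights[k] * reference_images[k].

    reference_images: list of equally sized grayscale images (lists of rows).
    weights: one coefficient per reference image (hypothetical lighting mix).
    """
    if len(reference_images) != len(weights):
        raise ValueError("need one weight per reference image")
    height = len(reference_images[0])
    width = len(reference_images[0][0])
    return [
        [
            sum(w * img[y][x] for w, img in zip(weights, reference_images))
            for x in range(width)
        ]
        for y in range(height)
    ]


# Two tiny 2x2 "reference images", each lit by a different known pattern.
lit_by_pattern_a = [[1.0, 2.0], [3.0, 4.0]]
lit_by_pattern_b = [[0.0, 1.0], [1.0, 0.0]]
novel_lighting = relight([lit_by_pattern_a, lit_by_pattern_b], [0.5, 2.0])
```

Because light transport is linear, any lighting condition that lies in the span of the reference patterns can be synthesized this way; fewer, better-chosen patterns mean fewer reference images to capture.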

    Robust tracking with spatio-velocity snakes: Kalman filtering approach

    Dependent multiple cue integration for robust tracking

    We propose a new technique for fusing multiple cues to robustly segment an object from its background in video sequences that suffer from abrupt changes of both illumination and position of the target. Robustness is achieved by the integration of appearance and geometric object features and by their estimation using Bayesian filters, such as Kalman or particle filters. In particular, each filter estimates the state of a specific object feature, conditionally dependent on another feature estimated by a distinct filter. This dependence provides improved target representations, permitting us to segment the target from the background even in nonstationary sequences. Considering that the procedure of the Bayesian filters may be described by a "hypotheses generation-hypotheses correction" strategy, the major novelty of our methodology compared to previous approaches is that the mutual dependence between filters is considered during the feature observation, that is, in the "hypotheses-correction" stage, instead of when generating the hypotheses. This proves to be much more effective in terms of accuracy and reliability. The proposed method is analytically justified and applied to develop a robust tracking system that simultaneously adapts online the color space in which the image points are represented, the color distributions, the contour of the object, and its bounding box. Results with synthetic data and real video sequences demonstrate the robustness and versatility of our method. Peer Reviewed
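
The "hypotheses generation-hypotheses correction" cycle mentioned in this abstract can be sketched with a textbook scalar Kalman filter. This is only a generic single-filter illustration under an assumed random-walk motion model; the function names and noise values are hypothetical, and the paper's contribution (coupling several filters during the correction stage) is not implemented here.

```python
def kalman_step(mean, var, measurement, process_var, measurement_var):
    """One generate/correct cycle of a scalar Kalman filter (random-walk model)."""
    # Hypothesis generation: propagate the previous estimate; uncertainty grows.
    pred_mean = mean
    pred_var = var + process_var
    # Hypothesis correction: weigh the prediction against the new observation.
    gain = pred_var / (pred_var + measurement_var)
    new_mean = pred_mean + gain * (measurement - pred_mean)
    new_var = (1.0 - gain) * pred_var
    return new_mean, new_var


def track(measurements, mean=0.0, var=1.0, process_var=0.01, measurement_var=0.25):
    """Run the filter over a measurement sequence and return the state estimates."""
    estimates = []
    for z in measurements:
        mean, var = kalman_step(mean, var, z, process_var, measurement_var)
        estimates.append(mean)
    return estimates
```

In a dependent multi-cue scheme as described above, the correction step of one filter would additionally condition on the estimate produced by another filter; that coupling would enter through the measurement model rather than through the prediction.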

    Modelling and tracking objects with a topology preserving self-organising neural network

    Human gestures form an integral part of our everyday communication. We use gestures not only to reinforce meaning, but also to describe the shape of objects, to play games, and to communicate in noisy environments. Vision systems that exploit gestures are often limited by inaccuracies inherent in handcrafted models. These models are generated from a collection of training examples, which requires segmentation and alignment. Segmentation in gesture recognition typically involves manual intervention, a time-consuming process that is feasible only for a limited set of gestures. Ideally, gesture models should be automatically acquired via a learning scheme that enables the acquisition of detailed behavioural knowledge only from topological and temporal observation. The research described in this thesis is motivated by a desire to provide a framework for the unsupervised acquisition and tracking of gesture models. In any learning framework, the initialisation of the shapes is crucial. Hence, it would be beneficial to have a robust model, not prone to noise, that can automatically establish correspondences across the set of shapes. In the first part of this thesis, we develop a framework for building statistical 2D shape models by extracting, labelling and corresponding landmark points using only topological relations derived from competitive Hebbian learning. The method is based on the assumption that correspondences can be addressed as an unsupervised classification problem where landmark points are the cluster centres (nodes) in a high-dimensional vector space. The approach is novel in that the network can be used in cases where the topological structure of the input pattern is not known a priori, so no topology of fixed dimensionality is imposed onto the network. 
In the second part, we propose an approach to minimise user intervention in the adaptation process, which otherwise requires specifying a priori the number of nodes needed to represent an object, by utilising an automatic criterion for maximum node growth. Furthermore, this model is used to represent motion in image sequences by initialising a suitable segmentation that separates the object of interest from the background. The segmentation system assumes some tolerance to illumination changes, input images from ordinary cameras and webcams, backgrounds with low to medium clutter (extremely cluttered backgrounds are avoided), and objects at close range from the camera. In the final part, we extend the framework for the automatic modelling and unsupervised tracking of 2D hand gestures in a sequence of k frames. The aim is to use the tracked frames as training examples in order to build the model and maintain correspondences. To do that we add an active step to the Growing Neural Gas (GNG) network, which we call Active Growing Neural Gas (A-GNG), that takes into consideration not only the geometrical position of the nodes, but also the underlying local feature structure of the image and the distance vector between successive images. The quality of our model is measured through the calculation of the topographic product, our topology-preserving measure, which quantifies neighbourhood preservation. In our system we have applied specific restrictions on the velocity and the appearance of the gestures to simplify the motion analysis in the gesture representation. The proposed framework has been validated on applications related to sign language. The work has great potential in Virtual Reality (VR) applications where the learning and the representation of gestures becomes natural without the need of expensive wear cable sensors

    Vision-based techniques for gait recognition

    Global security concerns have led to a proliferation of video surveillance devices. Intelligent surveillance systems seek to discover possible threats automatically and raise alerts. Being able to identify the surveyed object can help determine its threat level. The current generation of devices provides digital video data to be analysed for time-varying features to assist in the identification process. Commonly, people queue up to access a facility and approach a video camera in full frontal view. In this environment, a variety of biometrics are available, for example gait, which includes temporal features like stride period. Gait can be measured unobtrusively at a distance. The video data will also include face features, which are short-range biometrics. In this way, one can combine biometrics naturally using one set of data. In this paper we survey current techniques of gait recognition and modelling, together with the environments in which the research was conducted. We also discuss in detail the issues arising from deriving gait data, such as perspective and occlusion effects, together with the associated computer vision challenges of reliable tracking of human movement. Then, after highlighting these issues and challenges related to gait processing, we proceed to discuss the frameworks combining gait with other biometrics. We then provide motivations for a novel paradigm in biometrics-based human recognition, i.e. the use of the fronto-normal view of gait as a far-range biometric combined with biometrics operating at a near distance.

    A collaborative approach to image segmentation and behavior recognition from image sequences

    Visual behavior recognition is currently a highly active research area. This is due both to the scientific challenge posed by the complexity of the task, and to the growing interest in its applications, such as automated visual surveillance, human-computer interaction, medical diagnosis or video indexing/retrieval. A large number of different approaches have been developed, whose complexity and underlying models depend on the goals of the particular application which is targeted. The general trend followed by these approaches is the separation of the behavior recognition task into two sequential processes. The first one is a feature extraction process, where features which are considered relevant for the recognition task are extracted from the input image sequence. The second one is the actual recognition process, where the extracted features are classified in terms of the pre-defined behavior classes. One problematic issue of such a two-pass procedure is that the recognition process is highly dependent on the feature extraction process, and does not have the possibility to influence it. Consequently, a failure of the feature extraction process may impair correct recognition. The focus of our thesis is on the recognition of single object behavior from monocular image sequences. We propose a general framework where feature extraction and behavior recognition are performed jointly, thereby allowing the two tasks to mutually improve their results through collaboration and sharing of existing knowledge. The intended collaboration is achieved by introducing a probabilistic temporal model based on a Hidden Markov Model (HMM). In our formulation, behavior is decomposed into a sequence of simple actions and each action is associated with a different probability of observing a particular set of object attributes within the image at a given time. Moreover, our model includes a probabilistic formulation of attribute (feature) extraction in terms of image segmentation. 
Contrary to existing approaches, segmentation is achieved by taking into account the relative probabilities of each action, which are provided by the underlying HMM. In this context, we solve the joint problem of attribute extraction and behavior recognition by developing a variation of the Viterbi decoding algorithm, adapted to our model. Within the algorithm derivation, we translate the probabilistic attribute extraction formulation into a variational segmentation model. The proposed model is defined as a combination of typical image- and contour-dependent energy terms with a term which encapsulates prior information, offered by the collaborating recognition process. This prior information is introduced by means of a competition between multiple prior terms, corresponding to the different action classes which may have generated the current image. As a result of our algorithm, the recognized behavior is represented as a succession of action classes corresponding to the images in the given sequence. Furthermore, we develop an extension of our general framework, that allows us to deal with a common situation encountered in applications. Namely, we treat the case where behavior is specified in terms of a discrete set of behavior types, made up of different successions of actions, which belong to a shared set of action classes. Therefore, the recognition of behavior requires the estimation of the most probable behavior type and of the corresponding most probable succession of action classes which explains the observed image sequence. To this end, we modify our initial model and develop a corresponding Viterbi decoding algorithm. Both our initial framework and its extension are defined in general terms, involving several free parameters which can be chosen so as to obtain suitable implementations for the targeted applications. In this thesis, we demonstrate the viability of the proposed framework by developing particular implementations for two applications. 
Both applications belong to the field of gesture recognition and concern finger-counting and finger-spelling. For the finger-counting application, we use our original framework, whereas for the finger-spelling application, we use its proposed extension. For both applications, we instantiate the free parameters of the respective frameworks with particular models and quantities. Then, we explain the training of the obtained models from specific training data. Finally, we present the results obtained by testing our trained models on new image sequences. The test results show the robustness of our models in difficult cases, including noisy images, occlusions of the gesturing hand and cluttered backgrounds. For the finger-spelling application, a comparison with the traditional sequential approach to image segmentation and behavior recognition illustrates the superiority of our collaborative model.
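
The thesis above builds on Viterbi decoding over a Hidden Markov Model, where behavior is a sequence of actions. As a point of reference, the following is a minimal sketch of the standard textbook Viterbi algorithm for a discrete HMM, not the thesis's adapted variation (which also performs segmentation). The toy action labels, observation symbols, and probabilities are invented for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for a discrete HMM (log domain)."""

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at time t.
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + log(trans_p[r][s])) for r in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical two-action model observed through a coarse silhouette-size feature.
states = ["fist", "open"]
start_p = {"fist": 0.6, "open": 0.4}
trans_p = {"fist": {"fist": 0.7, "open": 0.3}, "open": {"fist": 0.4, "open": 0.6}}
emit_p = {
    "fist": {"small": 0.5, "medium": 0.4, "large": 0.1},
    "open": {"small": 0.1, "medium": 0.3, "large": 0.6},
}
decoded = viterbi(["small", "medium", "large"], states, start_p, trans_p, emit_p)
```

For this toy model the decoded sequence is `fist, fist, open`. In the collaborative framework described above, the per-action probabilities would additionally feed back into the segmentation of each image, which this sketch does not attempt.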

    Research on robust salient object extraction in image

    System: new; Ministry of Education report number: Kō No. 2641; Degree type: Doctor of Engineering; Date conferred: 2008/3/15; Waseda University diploma number: Shin 480

    Real-time visual tracking using image processing and filtering methods

    The main goal of this thesis is to develop real-time computer vision algorithms in order to detect and to track targets in uncertain complex environments purely based on a visual sensor. Two major subjects addressed by this work are: 1. The development of fast and robust image segmentation algorithms that are able to search and automatically detect targets in a given image. 2. The development of sound filtering algorithms to reduce the effects of noise in signals from the image processing. The main constraint of this research is that the algorithms should work in real-time with limited computing power on an onboard computer in an aircraft. In particular, we focus on contour tracking which tracks the outline of the target represented by contours in the image plane. This thesis is concerned with three specific categories, namely image segmentation, shape modeling, and signal filtering. We have designed image segmentation algorithms based on geometric active contours implemented via level set methods. Geometric active contours are deformable contours that automatically track the outlines of objects in images. In this approach, the contour in the image plane is represented as the zero-level set of a higher dimensional function. (One example of the higher dimensional function is a three-dimensional surface for a two-dimensional contour.) This approach handles the topological changes (e.g., merging, splitting) of the contour naturally. Although geometric active contours prevail in many fields of computer vision, they suffer from the high computational costs associated with level set methods. Therefore, simplified versions of level set methods such as fast marching methods are often used in problems of real-time visual tracking. This thesis presents the development of a fast and robust segmentation algorithm based on up-to-date extensions of level set methods and geometric active contours, namely a fast implementation of Chan-Vese's (active contour) model (FICVM). 
The shape prior is a useful cue in the recognition of the true target. For the contour tracker, the outline of the target can be easily disrupted by noise. In geometric active contours, to cope with deviations from the true outline of the target, a higher dimensional function is constructed based on the shape prior, and the contour tracks the outline of an object by considering the difference between the higher dimensional functions obtained from the shape prior and from a measurement in a given image. The higher dimensional function is often a distance map which requires high computational costs for construction. This thesis focuses on the extraction of shape information from only the zero-level set of the higher dimensional function. This strategy compensates for inaccuracies in the calculation of the shape difference that occur when a simplified higher dimensional function is used. This is named as contour-based shape modeling. Filtering is an essential element in tracking problems because of the presence of noise in system models and measurements. The well-known Kalman filter provides an exact solution only for problems which have linear models and Gaussian distributions (linear/Gaussian problems). For nonlinear/non-Gaussian problems, particle filters have received much attention in recent years. Particle filtering is useful in the approximation of complicated posterior probability distribution functions. However, the computational burden of particle filtering prevents it from performing at full capacity in real-time applications. This thesis concentrates on improving the processing time of particle filtering for real-time applications. In principle, we follow the particle filter in the geometric active contour framework. This thesis proposes an advanced blob tracking scheme in which a blob contains shape prior information of the target. 
This scheme simplifies the sampling process and quickly suggests the samples which have a high probability of being the target. Only for these samples is the contour tracking algorithm applied to obtain a more detailed state estimate. Curve evolution in the contour tracking is realized by the FICVM. The dissimilarity measure is calculated by the contour-based shape modeling method and the shape prior is updated when it satisfies certain conditions. The new particle filter is applied to the problems of low contrast and severe daylight conditions, to cluttered environments, and to the appearing/disappearing target tracking. We have also demonstrated the utility of the filtering algorithm for multiple target tracking in the presence of occlusions. This thesis presents several test results from simulations and flight tests. In these tests, the proposed algorithms demonstrated promising results in varied situations of tracking. Ph.D. Committee Chair: Eric N. Johnson; Committee Co-Chair: Allen R. Tannenbaum; Committee Member: Anthony J. Calise; Committee Member: Eric Feron; Committee Member: Patricio A. Vel
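
The particle filtering discussed in this abstract approximates a posterior distribution with weighted samples. The sketch below is a generic one-dimensional bootstrap particle filter (predict, weight, resample), assuming a random-walk motion model and Gaussian measurement noise; the function name and all parameter values are hypothetical. It does not include the thesis's blob-based sample proposal, contour evolution, or shape prior.

```python
import math
import random


def particle_filter(measurements, n_particles=500, process_std=0.5,
                    measurement_std=1.0, seed=0):
    """Estimate a 1-D state from noisy measurements with a bootstrap particle filter."""
    rng = random.Random(seed)
    # Start with particles spread over an assumed search range.
    particles = [rng.uniform(-10.0, 10.0) for _ in range(n_particles)]
    estimates = []
    for z in measurements:
        # Predict: diffuse each particle with random-walk process noise.
        particles = [p + rng.gauss(0.0, process_std) for p in particles]
        # Weight: Gaussian likelihood of the measurement given each particle.
        weights = [
            math.exp(-0.5 * ((z - p) / measurement_std) ** 2) for p in particles
        ]
        total = sum(weights)
        if total == 0.0:
            # All likelihoods underflowed; fall back to uniform weights.
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        # Estimate: weighted mean of the particle set.
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # Resample: draw particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates
```

The cost grows linearly with the number of particles, which is why the scheme described above evaluates the expensive contour tracker only on the samples that a cheaper stage has already marked as likely targets.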