230 research outputs found

    Modelling and extracting periodically deforming objects by continuous, spatio-temporal shape description

    This thesis proposes a new model for describing spatio-temporally deforming objects. Through a novel use of Fourier descriptors, it is shown how arbitrary shape description can be extended to include spatio-temporal shape deformation. It is further demonstrated that these new spatio-temporal Fourier descriptors can be used as the basis for both the recognition and the extraction of deforming objects. Application of this new recognition technique to human gait sequences demonstrates recognition rates of over 86% for individual human subjects, signifying that these descriptors possess unique descriptive properties. Based upon the new spatio-temporal Fourier descriptor model, a new technique for the detection and extraction of deforming shapes from an image sequence is presented through a new variant of the Hough transform (the Continuous Deformable Hough Transform) that utilises spatio-temporal shape correlation within an evidence-gathering context. This new technique demonstrates excellent success rates and tolerance to noise, correctly extracting human subjects in image sequences corrupted with noise levels of up to 80%. The technique is also tested extensively using real-world data, thus demonstrating its worth in a modern-day computer vision system. The spatio-temporal Fourier descriptor model, the Continuous Deformable Hough Transform, and aspects of their application are fully discussed throughout the thesis, along with ideas and suggestions for future research and development. (EThOS - Electronic Theses Online Service, GB)
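For readers unfamiliar with Fourier descriptors, here is a minimal sketch of the classical, purely spatial descriptor of a closed 2D contour. It is not the thesis's spatio-temporal formulation, and the names `fourier_descriptors` and `transform` are hypothetical helpers introduced only for illustration. The sketch assumes the contour is sampled at evenly indexed points and that its first harmonic is non-zero.

```python
import cmath
import math


def fourier_descriptors(contour, n_keep):
    """Classical Fourier descriptors of a closed contour given as (x, y) points.

    Returns |c_k| / |c_1| for k = 1..n_keep (n_keep must be < len(contour)).
    Dropping c_0 removes translation, taking magnitudes removes rotation and
    start-point shifts, and dividing by |c_1| removes scale.
    """
    n = len(contour)
    z = [complex(x, y) for x, y in contour]
    coeffs = []
    for k in range(n):
        # Plain O(n^2) discrete Fourier transform of the complex boundary signal.
        s = sum(z[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        coeffs.append(s / n)
    scale = abs(coeffs[1])  # assumed non-zero (illustrative assumption)
    return [abs(coeffs[k]) / scale for k in range(1, n_keep + 1)]


def transform(contour, scale, angle, dx, dy):
    """Apply a similarity transform (scale, rotate, translate) to a contour."""
    rot = cmath.exp(1j * angle)
    moved = []
    for x, y in contour:
        w = scale * rot * complex(x, y) + complex(dx, dy)
        moved.append((w.real, w.imag))
    return moved


# Demo: an ellipse and a scaled, rotated, shifted copy give the same descriptors.
ellipse = [(3 * math.cos(2 * math.pi * t / 16), math.sin(2 * math.pi * t / 16))
           for t in range(16)]
original = fourier_descriptors(ellipse, 15)
moved = fourier_descriptors(transform(ellipse, 2.5, 0.7, 10.0, -4.0), 15)
print(max(abs(a - b) for a, b in zip(original, moved)))
```

The printed value is the largest difference between the two descriptor vectors, which should be at the level of floating-point rounding.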

    Monocular SLAM for deformable scenarios

    The problem of localizing a sensor in an uncertain map that is being estimated at the same time is known as Simultaneous Localization and Mapping (SLAM). It is a challenging chicken-and-egg problem: locating the sensor requires the map, but building the map requires the position of the sensor. When the sensor is visual, for example a camera, the problem is called Visual SLAM or VSLAM. Visual sensors for SLAM divide into those that provide depth information (for example, RGB-D cameras or stereo rigs) and those that do not (for example, monocular cameras or event cameras). In this thesis we focus our research on SLAM with monocular cameras. Because it lacks depth perception, monocular SLAM is intrinsically harder than SLAM with depth sensors. State-of-the-art work in monocular VSLAM has normally assumed that the scene remains rigid throughout the sequence, which is a reasonable assumption for industrial and urban environments. The rigidity assumption provides enough constraints on the problem and allows a reliable map to be reconstructed after processing several images. In recent years, interest in SLAM has reached medical areas, where SLAM algorithms could help guide the surgeon or localize a robot. However, unlike in industrial or urban scenarios, in sequences inside the body everything may eventually deform, so the rigidity assumption becomes invalid in practice and, by extension, so do monocular SLAM algorithms. Our goal is therefore to push the limits of SLAM algorithms and to conceive the first monocular SLAM system able to cope with scene deformation. Current SLAM systems compute the camera position and the map structure in two concurrent threads: localization and mapping.
Localization processes each image to locate the sensor continuously, whereas mapping builds the map of the scene. We adopt this structure and conceive both deformable localization and deformable mapping, which can recover the scene even under deformation. Our first contribution is deformable localization. Deformable localization uses the map structure to recover the camera pose from a single image. At the same time, because the map deforms during the sequence, it also recovers the map deformation for each frame. We propose two families of deformable localization. In the first deformable localization algorithm, we assume that all points are embedded in a surface called the template. We can recover the deformation of the surface thanks to a global deformation model that allows the most likely deformation of the object to be estimated. With our second deformable localization algorithm, we show that the map deformation can be recovered without a global deformation model by representing the map as individual surfels. Our experimental results showed that, by recovering the map deformation, both methods outperform rigid methods in both robustness and accuracy. Our second contribution is the conception of deformable mapping. It is the back-end of the SLAM algorithm: it processes a batch of images to recover the map structure for all of them and grows the map by assembling partial observations of it. Deformable localization and deformable mapping run in parallel and together make up the first deformable monocular SLAM system: DefSLAM.
An extended evaluation of our method showed, on both controlled laboratory sequences and medical sequences, that it successfully processes sequences in which current monocular SLAM systems fail. Our third contribution is two methods for exploiting photometric information in deformable monocular SLAM. On the one hand, SD-DefSLAM uses semi-direct matching to obtain much more reliable matches of the map points in new images; as a consequence, it proved to be more robust and stable on medical sequences. On the other hand, we propose a Direct and Sparse Deformable Localization method in which we use a direct photometric error to track the deformation of a map modelled as a set of disconnected 3D surfels. We can recover the deformation of multiple disconnected surfaces, non-isometric deformations, and surfaces with changing topology.

    Particle filtering on large dimensional state spaces and applications in computer vision

    Tracking of spatio-temporal events is a fundamental problem in computer vision and in signal processing in general. Examples include keeping track of motion activities in video sequences for abnormality detection, or spotting neuronal activity patterns inside the brain from fMRI data. To that end, our research has two main aspects with equal emphasis: first, the development of efficient Bayesian filtering frameworks for solving real-world tracking problems and, second, understanding the temporal evolution dynamics of physical systems and phenomena and building statistical models for them. These models provide prior information to the trackers and lead to intelligent signal processing for computer vision and image understanding. The first part of the dissertation deals with the key signal processing aspects of tracking and the challenges involved. In simple terms, tracking is the problem of estimating the hidden state of a system from noisy observed data (from sensors). Because the state spaces encountered in real life are frequently non-linear and non-Gaussian, Particle Filters (PF) are used to give an approximate Bayesian inference in such a problem setup. However, quite often we are faced with large dimensional state spaces together with multimodal observation likelihoods due to occlusion and clutter. This makes the existing particle filters very inefficient for practical purposes. In order to tackle these issues, we have developed and implemented efficient particle filters on large dimensional state spaces with applications to various visual tracking problems in computer vision. In the second part of the dissertation, we develop dynamical models for motion activities inspired by the human visual cognitive ability to characterize the temporal evolution pattern of shapes. We take a landmark shape based approach for the representation and tracking of motion activities.
We have developed statistical models for the shape change of a configuration of "landmark" points (key points of interest) over time, and we use these models for automatic landmark extraction and tracking, filtering and change detection from video sequences. In this regard, we demonstrate superior performance of our Non-Stationary Shape Activity (NSSA) model in comparison to other existing works. Also, owing to the large dimensional state space of this problem, we have utilized efficient particle filters (PF) for motion activity tracking. In the third part of the dissertation, we develop a visual tracking algorithm that is able to track in the presence of illumination variations in the scene. To do so, we build and learn a dynamical model for 2D illumination patterns based on Legendre basis functions. Under our problem formulation, we pose the visual tracking task as a large dimensional tracking problem in a joint motion-illumination space and thus use an efficient PF algorithm called PF-MT (PF with Mode Tracker) for tracking. In addition, we demonstrate the use of a change/abnormality detection framework for tracking across drastic illumination changes. Experiments with real-life video sequences demonstrate the usefulness of the algorithm where many other existing approaches fail. The last part of the dissertation explores the emerging field of compressive sensing and looks into the possibility of leveraging particle filtering ideas to do better sequential reconstruction (i.e. tracking) of sparse signals from a small number of random linear measurements. Our preliminary results show several promising aspects of such an approach, and it is an interesting direction for future research with many potential computer vision applications.
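To make the idea of estimating a hidden state from noisy observations concrete, here is a minimal sketch of a textbook bootstrap (sampling-importance-resampling) particle filter on a one-dimensional random walk. It is not the dissertation's PF-MT or any of its efficient large-dimensional filters. The function name, the Gaussian motion and observation models, and all parameter values are assumptions chosen only for illustration.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles=500,
                              process_std=1.0, obs_std=0.5, seed=0):
    """Textbook bootstrap particle filter for a 1D random-walk state.

    Assumed model (illustrative only):
        x_t = x_{t-1} + N(0, process_std^2)
        y_t = x_t     + N(0, obs_std^2)
    Returns the weighted-mean state estimate after each observation.
    """
    rng = random.Random(seed)
    # Broad initial belief about the hidden state.
    particles = [rng.gauss(0.0, 5.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # Predict: propagate every particle through the motion model.
        particles = [x + rng.gauss(0.0, process_std) for x in particles]
        # Update: weight each particle by the observation likelihood.
        weights = [math.exp(-0.5 * ((y - x) / obs_std) ** 2) for x in particles]
        total = sum(weights)
        if total == 0.0:
            # All likelihoods underflowed; fall back to uniform weights.
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        # Resample (multinomial) so particles concentrate on likely states.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


estimates = bootstrap_particle_filter([5.0] * 20)
print(estimates[-1])
```

With a state of only one dimension a few hundred particles are plenty; the number needed grows quickly with the state dimension, which is the inefficiency the abstract refers to.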

    A collaborative approach to image segmentation and behavior recognition from image sequences

    Visual behavior recognition is currently a highly active research area. This is due both to the scientific challenge posed by the complexity of the task, and to the growing interest in its applications, such as automated visual surveillance, human-computer interaction, medical diagnosis or video indexing/retrieval. A large number of different approaches have been developed, whose complexity and underlying models depend on the goals of the particular application which is targeted. The general trend followed by these approaches is the separation of the behavior recognition task into two sequential processes. The first one is a feature extraction process, where features which are considered relevant for the recognition task are extracted from the input image sequence. The second one is the actual recognition process, where the extracted features are classified in terms of the pre-defined behavior classes. One problematic issue of such a two-pass procedure is that the recognition process is highly dependent on the feature extraction process, and does not have the possibility to influence it. Consequently, a failure of the feature extraction process may impair correct recognition. The focus of our thesis is on the recognition of single object behavior from monocular image sequences. We propose a general framework where feature extraction and behavior recognition are performed jointly, thereby allowing the two tasks to mutually improve their results through collaboration and sharing of existing knowledge. The intended collaboration is achieved by introducing a probabilistic temporal model based on a Hidden Markov Model (HMM). In our formulation, behavior is decomposed into a sequence of simple actions and each action is associated with a different probability of observing a particular set of object attributes within the image at a given time. Moreover, our model includes a probabilistic formulation of attribute (feature) extraction in terms of image segmentation. 
Contrary to existing approaches, segmentation is achieved by taking into account the relative probabilities of each action, which are provided by the underlying HMM. In this context, we solve the joint problem of attribute extraction and behavior recognition by developing a variation of the Viterbi decoding algorithm, adapted to our model. Within the algorithm derivation, we translate the probabilistic attribute extraction formulation into a variational segmentation model. The proposed model is defined as a combination of typical image- and contour-dependent energy terms with a term which encapsulates prior information, offered by the collaborating recognition process. This prior information is introduced by means of a competition between multiple prior terms, corresponding to the different action classes which may have generated the current image. As a result of our algorithm, the recognized behavior is represented as a succession of action classes corresponding to the images in the given sequence. Furthermore, we develop an extension of our general framework, that allows us to deal with a common situation encountered in applications. Namely, we treat the case where behavior is specified in terms of a discrete set of behavior types, made up of different successions of actions, which belong to a shared set of action classes. Therefore, the recognition of behavior requires the estimation of the most probable behavior type and of the corresponding most probable succession of action classes which explains the observed image sequence. To this end, we modify our initial model and develop a corresponding Viterbi decoding algorithm. Both our initial framework and its extension are defined in general terms, involving several free parameters which can be chosen so as to obtain suitable implementations for the targeted applications. In this thesis, we demonstrate the viability of the proposed framework by developing particular implementations for two applications. 
Both applications belong to the field of gesture recognition and concern finger-counting and finger-spelling. For the finger-counting application, we use our original framework, whereas for the finger-spelling application, we use its proposed extension. For both applications, we instantiate the free parameters of the respective frameworks with particular models and quantities. Then, we explain the training of the obtained models from specific training data. Finally, we present the results obtained by testing our trained models on new image sequences. The test results show the robustness of our models in difficult cases, including noisy images, occlusions of the gesturing hand and cluttered background. For the finger-spelling application, a comparison with the traditional sequential approach to image segmentation and behavior recognition illustrates the superiority of our collaborative model
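The abstract above builds on Viterbi decoding of a Hidden Markov Model. As a point of reference, here is a minimal sketch of the standard textbook Viterbi algorithm, not the thesis's adapted variant that couples decoding with segmentation. The state names, observation symbols and probabilities are hypothetical values invented for the example.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for a discrete HMM (log domain)."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # score[t][s]: best log-probability of any path ending in state s at time t.
    score = [{s: log(start_p[s]) + log(emit_p[s][observations[0]])
              for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        score.append({})
        back.append({})
        for s in states:
            # Best predecessor of state s at time t.
            prev = max(states,
                       key=lambda r: score[t - 1][r] + log(trans_p[r][s]))
            score[t][s] = (score[t - 1][prev] + log(trans_p[prev][s])
                           + log(emit_p[s][observations[t]]))
            back[t][s] = prev
    # Backtrack from the best final state.
    path = [max(states, key=lambda s: score[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical two-action model with three quantized image attributes.
states = ["idle", "gesture"]
start_p = {"idle": 0.6, "gesture": 0.4}
trans_p = {"idle": {"idle": 0.7, "gesture": 0.3},
           "gesture": {"idle": 0.4, "gesture": 0.6}}
emit_p = {"idle": {"low": 0.5, "mid": 0.4, "high": 0.1},
          "gesture": {"low": 0.1, "mid": 0.3, "high": 0.6}}
print(viterbi(["low", "mid", "high"], states, start_p, trans_p, emit_p))
# -> ['idle', 'idle', 'gesture']
```

Working in the log domain avoids numerical underflow on long image sequences, where products of many small probabilities would otherwise round to zero.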

    Change blindness: eradication of gestalt strategies

    Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task in which there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval, and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation [Landman et al, 2003 Vision Research 43 149–164]. Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this, we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) objects may be stored in, and retrieved from, a pre-attentional store during this task.

    Recognizing complex faces and gaits via novel probabilistic models

    In the field of computer vision, developing automated systems to recognize people under unconstrained scenarios is a partially solved problem. In unconstrained scenarios, a number of common variations and complexities such as occlusion, illumination and cluttered background impose vast uncertainty on the recognition process. Among the various biometrics that have been emerging recently, this dissertation focuses on two of them, namely face and gait recognition. Firstly, we address the problem of recognizing faces with major occlusions amidst other variations such as pose, scale, expression and illumination using a novel PRObabilistic Component based Interpretation Model (PROCIM) inspired by key psychophysical principles that are closely related to reasoning under uncertainty. The model employs Bayesian Networks to establish, learn, interpret and exploit intrinsic similarity mappings from the face domain. Then, by incorporating efficient inference strategies, robust decisions are made for successfully recognizing faces under uncertainty. PROCIM reports improved recognition rates over recent approaches. Secondly, we address the emerging gait recognition problem and show that PROCIM can be easily adapted to the gait domain as well. We scientifically define and formulate sub-gaits and propose a novel modular training scheme to efficiently learn subtle sub-gait characteristics from the gait domain. Our results show that the proposed model is robust to several uncertainties and yields significant recognition performance. Apart from PROCIM, we finally show how simple component based gait reasoning can be coherently modeled using the recently prominent Markov Logic Networks (MLNs) by intuitively fusing imaging, logic and graphs. We have discovered that the face and gait domains exhibit interesting similarity mappings between object entities and their components.
We have proposed intuitive probabilistic methods to model these mappings in order to perform recognition under various elements of uncertainty. Extensive experimental validation justifies the robustness of the proposed methods over state-of-the-art techniques.

    Respiratory organ motion in interventional MRI : tracking, guiding and modeling

    Respiratory organ motion is one of the major challenges in interventional MRI, particularly in interventions with therapeutic ultrasound in the abdominal region. High-intensity focused ultrasound has found an application in interventional MRI for noninvasive treatment of different abnormalities. In order to guide surgical and treatment interventions, organ motion imaging and modeling is commonly required before treatment starts. Accurate tracking of organ motion during various interventional MRI procedures is a prerequisite for a successful outcome and safe therapy. In this thesis, an attempt has been made to develop approaches using focused ultrasound that could be used clinically in future for the treatment of abdominal organs, such as the liver and the kidney. Two distinct methods are presented with their ex vivo and in vivo treatment results. In the first method, an MR-based pencil-beam navigator is used to track organ motion and provide the motion information for acoustic focal point steering, while in the second approach hybrid imaging that combines ultrasound and magnetic resonance imaging is used for advanced guiding capabilities. Organ motion modeling and four-dimensional imaging of organ motion are increasingly required before surgical interventions. However, due to current safety limitations and hardware restrictions, the MR acquisition of a time-resolved sequence of volumetric images is not possible with high temporal and spatial resolution. A novel multislice acquisition scheme that is based on a two-dimensional navigator, instead of the commonly used pencil-beam navigator, was devised to acquire the data slices and the corresponding navigator simultaneously using a CAIPIRINHA parallel imaging method. The acquisition duration for four-dimensional dataset sampling is reduced compared to existing approaches, while the image contrast and quality are improved as well.
Tracking respiratory organ motion is required in interventional procedures and during MR imaging of moving organs. An MR-based navigator is commonly used; however, it is usually associated with image artifacts, such as signal voids. Spectrally selective navigators are useful in cases where the imaged organ is surrounded by adipose tissue, because they can provide an indirect measure of organ motion. A novel spectrally selective navigator based on a crossed-pair navigator has been developed. Experiments show the advantages of applying this novel navigator to volumetric imaging of the liver in vivo, where it was used to gate the gradient-recalled echo sequence.

    From light rays to 3D models
