122 research outputs found

    Online Mutual Foreground Segmentation for Multispectral Stereo Videos

    Full text link
    The segmentation of video sequences into foreground and background regions is a low-level process commonly used in video content analysis and smart surveillance applications. Using a multispectral camera setup can improve this process by providing more diverse data to help identify objects despite adverse imaging conditions. The registration of several data sources is however not trivial if the appearance of objects produced by each sensor differs substantially. This problem is further complicated when parallax effects cannot be ignored when using close-range stereo pairs. In this work, we present a new method to simultaneously tackle multispectral segmentation and stereo registration. Using an iterative procedure, we estimate the labeling result for one problem using the provisional result of the other. Our approach is based on the alternating minimization of two energy functions that are linked through the use of dynamic priors. We rely on the integration of shape and appearance cues to find proper multispectral correspondences, and to properly segment objects in low contrast regions. We also formulate our model as a frame processing pipeline using higher order terms to improve the temporal coherence of our results. Our method is evaluated under different configurations on multiple multispectral datasets, and our implementation is available online.Comment: Preprint accepted for publication in IJCV (December 2018

    Video Registration for Multimodal Surveillance Systems

    Get PDF
    RÉSUMÉ Au cours de la dernière décennie, la conception et le déploiement de systèmes de surveillance par caméras thermiques et visibles pour l'analyse des activités humaines a retenu l'attention de la communauté de la vision par ordinateur. Les applications de l'imagerie thermique-visible pour l'analyse des activités humaines couvrent différents domaines, notamment la médecine, la sécurité à bord d'un véhicule et la sécurité des personnes. La motivation derrière un tel système est l'amélioration de la qualité des données dans le but ultime d'améliorer la performance du système de surveillance. Une difficulté fondamentale associée à un système d'imagerie thermique-visible est la mise en registre précise de caractéristiques et d'informations correspondantes à partir d'images avec des différences significatives dans les propriétés des signaux. Dans un cas, on capte des informations de couleur (lumière réfléchie) et dans l'autre cas, on capte la signature thermique (énergie émise). Ce problème est appelé mise en registre d'images et de séquences vidéo. La vidéosurveillance est l'un des domaines d'application le plus étendu de l'imagerie multi-spectrale. La vidéosurveillance automatique dans un environnement réel, que ce soit à l'intérieur ou à l'extérieur, est difficile en raison d'un nombre élevé de facteurs environnementaux tels que les variations d'éclairage, le vent, le brouillard, et les ombres. L'utilisation conjointe de différentes modalités permet d'augmenter la fiabilité des données d'entrée, et de révéler certaines informations sur la scène qui ne sont pas perceptibles par un système d'imagerie unimodal. Les premiers systèmes multimodaux de vidéosurveillance ont été conçus principalement pour des applications militaires. Mais de nos jours, en raison de la réduction du prix des caméras thermiques, ce sujet de recherche s'étend à des applications civiles ayant une variété d'objectifs. Les approches pour la mise en registre d'images pour un système multimodal de vidéosurveillance automatique sont divisées en deux catégories fondées sur la dimension de la scène: les approches qui sont appropriées pour des grandes scènes où les objets sont lointains, et les approches qui conviennent à de petites scènes où les objets sont près des caméras. Dans la littérature, ce sujet de recherche n'est pas bien documenté, en particulier pour le cas de petites scènes avec objets proches. Notre recherche est axée sur la conception de nouvelles solutions de mise en registre pour les deux catégories de scènes dans lesquels il y a plusieurs humains. Les solutions proposées sont incluses dans les quatre articles qui composent cette thèse. Nos méthodes de mise en registre sont des prétraitements pour d'autres tâches d'analyse vidéo telles que le suivi, la localisation de l'humain, l'analyse de comportements, et la catégorisation d'objets. Pour les scènes avec des objets lointains, nous proposons un système itératif qui fait de façon simultanée la mise en registre thermique-visible, la fusion des données et le suivi des personnes. Notre méthode de mise en registre est basée sur une mise en correspondance de trajectoires (en utilisant RANSAC) à partir desquelles on estime une matrice de transformation affine pour transformer globalement des objets d'avant-plan d'une image sur l'autre image. Notre système proposé de vidéosurveillance multimodale est basé sur un nouveau mécanisme de rétroaction entre la mise en registre et le module de suivi, ce qui augmente les performances des deux modules de manière itérative au fil du temps. Nos méthodes sont conçues pour des applications en ligne et aucune calibration des caméras ou de configurations particulières ne sont requises. Pour les petites scènes avec des objets proches, nous introduisons le descripteur Local Self-Similarity (LSS), comme une mesure de similarité viable pour mettre en correspondance les régions du corps humain dans des images thermiques et visibles. Nous avons également démontré théoriquement et quantitativement que LSS, comme mesure de similarité thermique-visible, est plus robuste aux différences entre les textures des régions correspondantes que l'information mutuelle (IM), qui est la mesure de similarité classique pour les applications multimodales. D'autres descripteurs viables, y compris Histogram Of Gradient (HOG), Scale Invariant Feature Transform (SIFT), et Binary Robust Independent Elementary Feature (BRIEF) sont également surclassés par LSS. En outre, nous proposons une approche de mise en registre utilisant LSS et un mécanisme de votes pour obtenir une carte de disparité stéréo dense pour chaque région d'avant-plan dans l'image. La carte de disparité qui en résulte peut alors être utilisée pour aligner l'image de référence sur la seconde image. Nous démontrons que notre méthode surpasse les méthodes dans l'état de l'art, notamment les méthodes basées sur l'information mutuelle. Nos expériences ont été réalisées en utilisant des scénarios réalistes de surveillance d'humains dans une scène de petite taille. En raison des lacunes des approches locales de correspondance stéréo pour l'estimation de disparités précises dans des régions de discontinuité de profondeur, nous proposons une méthode de correspondance stéréo basée sur une approche d'optimisation globale. Nous introduisons un modèle stéréo approprié pour la mise en registre d'images thermique-visible en utilisant une méthode de minimisation de l'énergie en conjonction avec la méthode Belief Propagation (BP) comme méthode pour optimiser l'affectation des disparités par une fonction d'énergie. Dans cette méthode, nous avons intégré les informations de couleur et de mouvement comme contraintes douces pour améliorer la précision d'affectation des disparités dans les cas de discontinuités de profondeur. Bien que les approches de correspondance globale soient plus gourmandes au niveau des ressources de calculs par rapport aux approches de correspondance locale basée sur la stratégie Winner Take All (WTA), l'algorithme efficace BP et la programmation parallèle (OpenMP) en C++ que nous avons utilisés dans notre implémentation, permettent d'accélérer le temps de traitement de manière significative et de rendre nos méthodes viables pour les applications de vidéosurveillance. Nos méthodes sont programmées en C++ et utilisent la bibliothèque OpenCV. Nos méthodes sont conçues pour être facilement intégrées comme prétraitement pour toute application d'analyse vidéo. En d'autres termes, les données d'entrée de nos méthodes pourraient être un flux vidéo en ligne, et pour une analyse plus approfondie, un nouveau module pourrait être ajouté en aval à notre schéma algorithmique. Cette analyse plus approfondie pourrait être le suivi d'objets, la localisation d'êtres humains, et l'analyse de trajectoires pour les applications de surveillance multimodales de grandes scène. Aussi, Il pourrait être l'analyse de comportements, la catégorisation d'objets, et le suivi pour les applications sur des scènes de tailles réduites.---------ABSTRACT Recently, the design and deployment of thermal-visible surveillance systems for human analysis attracted a lot of attention in the computer vision community. Thermal-visible imagery applications for human analysis span different domains including medical, in-vehicle safety system, and surveillance. The motivation of applying such a system is improving the quality of data with the ultimate goal of improving the performance of targeted surveillance system. A fundamental issue associated with a thermal-visible imaging system is the accurate registration of corresponding features and information from images with high differences in imaging characteristics, where one reflects the color information (reflected energy) and another one reflects thermal signature (emitted energy). This problem is named Image/video registration. Video surveillance is one of the most extensive application domains of multispectral imaging. Automatic video surveillance in a realistic environment, either indoor or outdoor, is difficult due to the unlimited number of environmental factors such as illumination variations, wind, fog, and shadows. In a multimodal surveillance system, the joint use of different modalities increases the reliability of input data and reveals some information of the scene that might be missed using a unimodal imaging system. The early multimodal video surveillance systems were designed mainly for military applications. But nowadays, because of the reduction in the price of thermal cameras, this subject of research is extending to civilian applications and has attracted more interests for a variety of the human monitoring objectives. Image registration approaches for an automatic multimodal video surveillance system are divided into two general approaches based on the range of captured scene: the approaches that are appropriate for long-range scenes, and the approaches that are suitable for close-range scenes. In the literature, this subject of research is not well documented, especially for close-range surveillance application domains. Our research is focused on novel image registration solutions for both close-range and long-range scenes featuring multiple humans. The proposed solutions are presented in the four articles included in this thesis. Our registration methods are applicable for further video analysis such as tracking, human localization, behavioral pattern analysis, and object categorization. For far-range video surveillance, we propose an iterative system that consists of simultaneous thermal-visible video registration, sensor fusion, and people tracking. Our video registration is based on a RANSAC object trajectory matching, which estimates an affine transformation matrix to globally transform foreground objects of one image on another one. Our proposed multimodal surveillance system is based on a novel feedback scheme between registration and tracking modules that augments the performance of both modules iteratively over time. Our methods are designed for online applications and no camera calibration or special setup is required. For close-range video surveillance applications, we introduce Local Self-Similarity (LSS) as a viable similarity measure for matching corresponding human body regions of thermal and visible images. We also demonstrate theoretically and quantitatively that LSS, as a thermal-visible similarity measure, is more robust to differences between corresponding regions' textures than the Mutual Information (MI), which is the classic multimodal similarity measure. Other viable local image descriptors including Histogram Of Gradient (HOG), Scale Invariant Feature Transform (SIFT), and Binary Robust Independent Elementary Feature (BRIEF) are also outperformed by LSS. Moreover, we propose a LSS-based dense local stereo correspondence algorithm based on a voting approach, which estimates a dense disparity map for each foreground region in the image. The resulting disparity map can then be used to align the reference image on the second image. We demonstrate that our proposed LSS-based local registration method outperforms similar state-of-the-art MI-based local registration methods in the literature. Our experiments were carried out using realistic human monitoring scenarios in a close-range scene. Due to the shortcomings of local stereo correspondence approaches for estimating accurate disparities in depth discontinuity regions, we propose a novel stereo correspondence method based on a global optimization approach. We introduce a stereo model appropriate for thermal-visible image registration using an energy minimization framework and Belief Propagation (BP) as a method to optimize the disparity assignment via an energy function. In this method, we integrated color and motion visual cues as a soft constraint into an energy function to improve disparity assignment accuracy in depth discontinuities. Although global correspondence approaches are computationally more expensive compared to Winner Take All (WTA) local correspondence approaches, the efficient BP algorithm and parallel processing programming (openMP) in C++ that we used in our implementation, speed up the processing time significantly and make our methods viable for video surveillance applications. Our methods are implemented in C++ using OpenCV library and object-oriented programming. Our methods are designed to be integrated easily for further video analysis. In other words, the input data of our methods could come from two synchronized online video streams. For further analysis a new module could be added in our frame-by-frame algorithmic diagram. Further analysis might be object tracking, human localization, and trajectory pattern analysis for multimodal long-range monitoring applications, and behavior pattern analysis, object categorization, and tracking for close-range applications

    Selective Subtraction: An Extension of Background Subtraction

    Get PDF
    Background subtraction or scene modeling techniques model the background of the scene using the stationarity property and classify the scene into two classes of foreground and background. In doing so, most moving objects become foreground indiscriminately, except for perhaps some waving tree leaves, water ripples, or a water fountain, which are typically learned as part of the background using a large training set of video data. Traditional techniques exhibit a number of limitations including inability to model partial background or subtract partial foreground, inflexibility of the model being used, need for large training data and computational inefficiency. In this thesis, we present our work to address each of these limitations and propose algorithms in two major areas of research within background subtraction namely single-view and multi-view based techniques. We first propose the use of both spatial and temporal properties to model a dynamic scene and show how Mapping Convergence framework within Support Vector Mapping Convergence (SVMC) can be used to minimize training data. We also introduce a novel concept of background as the objects other than the foreground, which may include moving objects in the scene that cannot be learned from a training set because they occur only irregularly and sporadically, e.g. a walking person. We propose a selective subtraction method as an alternative to standard background subtraction, and show that a reference plane in a scene viewed by two cameras can be used as the decision boundary between foreground and background. In our definition, the foreground may actually occur behind a moving object. Our novel use of projective depth as a decision boundary allows us to extend the traditional definition of background subtraction and propose a much more powerful framework. Furthermore, we show that the reference plane can be selected in a very flexible manner, using for example the actual moving objects in the scene, if needed. We present diverse set of examples to show that: (i) the technique performs better than standard background subtraction techniques without the need for training, camera calibration, disparity map estimation, or special camera configurations; (ii) it is potentially more powerful than standard methods because of its flexibility of making it possible to select in real-time what to filter out as background, regardless of whether the object is moving or not, or whether it is a rare event or a frequent one; (iii) the technique can be used for a variety of situations including when images are captured using stationary cameras or hand-held cameras and for both indoor and outdoor scenes. We provide extensive results to show the effectiveness of the proposed framework in a variety of very challenging environments

    Independent component analysis (ICA) applied to ultrasound image processing and tissue characterization

    Get PDF
    As a complicated ubiquitous phenomenon encountered in ultrasound imaging, speckle can be treated as either annoying noise that needs to be reduced or the source from which diagnostic information can be extracted to reveal the underlying properties of tissue. In this study, the application of Independent Component Analysis (ICA), a relatively new statistical signal processing tool appeared in recent years, to both the speckle texture analysis and despeckling problems of B-mode ultrasound images was investigated. It is believed that higher order statistics may provide extra information about the speckle texture beyond the information provided by first and second order statistics only. However, the higher order statistics of speckle texture is still not clearly understood and very difficult to model analytically. Any direct dealing with high order statistics is computationally forbidding. On the one hand, many conventional ultrasound speckle texture analysis algorithms use only first or second order statistics. On the other hand, many multichannel filtering approaches use pre-defined analytical filters which are not adaptive to the data. In this study, an ICA-based multichannel filtering texture analysis algorithm, which considers both higher order statistics and data adaptation, was proposed and tested on the numerically simulated homogeneous speckle textures. The ICA filters were learned directly from the training images. Histogram regularization was conducted to make the speckle images quasi-stationary in the wide sense so as to be adaptive to an ICA algorithm. Both Principal Component Analysis (PCA) and a greedy algorithm were used to reduce the dimension of feature space. Finally, Support Vector Machines (SVM) with Radial Basis Function (RBF) kernel were chosen as the classifier for achieving best classification accuracy. Several representative conventional methods, including both low and high order statistics based methods, and both filtering and non-filtering methods, have been chosen for comparison study. The numerical experiments have shown that the proposed ICA-based algorithm in many cases outperforms other algorithms for comparison. Two-component texture segmentation experiments were conducted and the proposed algorithm showed strong capability of segmenting two visually very similar yet different texture regions with rather fuzzy boundaries and almost the same mean and variance. Through simulating speckle with first order statistics approaching gradually to the Rayleigh model from different non-Rayleigh models, the experiments to some extent reveal how the behavior of higher order statistics changes with the underlying property of tissues. It has been demonstrated that when the speckle approaches the Rayleigh model, both the second and higher order statistics lose the texture differentiation capability. However, when the speckles tend to some non-Rayleigh models, methods based on higher order statistics show strong advantage over those solely based on first or second order statistics. The proposed algorithm may potentially find clinical application in the early detection of soft tissue disease, and also be helpful for better understanding ultrasound speckle phenomenon in the perspective of higher order statistics. For the despeckling problem, an algorithm was proposed which adapted the ICA Sparse Code Shrinkage (ICA-SCS) method for the ultrasound B-mode image despeckling problem by applying an appropriate preprocessing step proposed by other researchers. The preprocessing step makes the speckle noise much closer to the real white Gaussian noise (WGN) hence more amenable to a denoising algorithm such as ICS-SCS that has been strictly designed for additive WGN. A discussion is given on how to obtain the noise-free training image samples in various ways. The experimental results have shown that the proposed method outperforms several classical methods chosen for comparison, including first or second order statistics based methods (such as Wiener filter) and multichannel filtering methods (such as wavelet shrinkage), in the capability of both speckle reduction and edge preservation

    Generalised Pose Estimation Using Depth

    Get PDF
    Estimating the pose of an object, be it articulated, deformable or rigid, is an important task, with applications ranging from Human-Computer Interaction to environmental understanding. The idea of a general pose estimation framework, capable of being rapidly retrained to suit a variety of tasks, is appealing. In this paper a solution isproposed requiring only a set of labelled training images in order to be applied to many pose estimation tasks. This is achieved bytreating pose estimation as a classification problem, with particle filtering used to provide non-discretised estimates. Depth information extracted from a calibrated stereo sequence, is used for background suppression and object scale estimation. The appearance and shape channels are then transformed to Local Binary Pattern histograms, and pose classification is performed via a randomised decision forest. To demonstrate flexibility, the approach is applied to two different situations, articulated hand pose and rigid head orientation, achieving 97% and 84% accurate estimation rates, respectively

    Generative Interpretation of Medical Images

    Get PDF

    Pattern Recognition

    Get PDF
    Pattern recognition is a very wide research field. It involves factors as diverse as sensors, feature extraction, pattern classification, decision fusion, applications and others. The signals processed are commonly one, two or three dimensional, the processing is done in real- time or takes hours and days, some systems look for one narrow object class, others search huge databases for entries with at least a small amount of similarity. No single person can claim expertise across the whole field, which develops rapidly, updates its paradigms and comprehends several philosophical approaches. This book reflects this diversity by presenting a selection of recent developments within the area of pattern recognition and related fields. It covers theoretical advances in classification and feature extraction as well as application-oriented works. Authors of these 25 works present and advocate recent achievements of their research related to the field of pattern recognition

    Segmentation of diffusion weighted MRI using the level set framework

    Get PDF
    Medical imaging is a rapidly growing field in which diffusion imaging is a recently developed modality. This novel imaging contrast permits in-vivo measurement of the diffusion of water molecules. This is particularly interesting in brain imaging where the diffusion reveals an amazing insight into the neuronal organization and cerebral cytoarchitecture. Diffusion images contain from six up to hundreds of values in each voxel and are represented as tensor fields (Diffusion Tensor Imaging (DTI)) or as fields of functions (High Angular Resolution Diffusion (HARD) imaging). To fully extract the large amount of data contained within these images new processing and analysis tools are needed. The aim of this thesis is the development of such tools. The method we have been mainly focusing on for this purpose is the level set method. The level set method is a numerical and theoretical tool for propagating interfaces. In image processing it is used for propagating curves in 2D or surfaces in 3D for delineation of objects or for regularization purposes. In this thesis we have explored some of the numerous aspects of the level set frame work to see how the diffusion properties can be used for segmentation. For segmentation of tensor fields we have considered similarity measures for comparison of tensors. From these similarity measures several applications of the level set method have been developed for the segmentation of different structures. Different measures of similarity have been used dependent on the application. When segmenting white matter regions in DTI, the similarity measure emphasizes anisotropic regions. The segmentation algorithm itself has a very local dependence since white matter, in general fiber tracts, experiences different diffusion in different parts of the structure. The presented results show segmentations of the major fiber tracts in the brain. Other structures, such as the deep cerebral nuclei, that are mainly composed of gray matter, have more homogenous diffusion properties than white matter structures. Therefore, in these structures we maximize the internal coherence within the entire structure by using a region based approach to the segmentation problem. Segmentations of the thalamus and its nuclei as well as on tensor fields from fluid mechanics are presented. For High Angular Resolution Diffusion (HARD) images, two methods for fiber tract segmentation are presented based on different types of coherence. The coherence is either measured as the similarity between fibers obtained from a tractography algorithm, or the similarity of scalar values in a five-dimensional non-Euclidean space. The similarity between two fibers is determined by a counting strategy and is equal to the number of voxels they have in common. A spectral clustering algorithm is then used for grouping fibers with a high inter-resemblance. When segmenting white matter with the level set method, we propose to expand the space we are working in from a three-dimensional space of Orientation Distribution Functions (ODF) to a five-dimensional space of position and orientation. By a careful definition of this space and an adaptation of the level set to five dimensions the fibers tracts can be segmented as separated structures. We show some preliminary results from segmentations in this 5D space. The approaches proposed in this thesis permit a consideration of the fiber tracts and gray matter structures as an entity, allowing quantitative measures of the diffusion without losing information by simplifying the images to scalar representations
    • …
    corecore