
    Robust density modelling using the Student's t-distribution for human action recognition

    The extraction of human features from videos is often inaccurate and prone to outliers. Such outliers can severely affect density modelling when the Gaussian distribution is used as the model, since it is highly sensitive to outliers. The Gaussian distribution is also often used as the base component of graphical models for recognising human actions in videos (hidden Markov models and others), and the presence of outliers can significantly reduce the recognition accuracy. In contrast, the Student's t-distribution is more robust to outliers and can be exploited to improve the recognition rate in the presence of abnormal data. In this paper, we present an HMM that uses mixtures of t-distributions as observation probabilities and show, through experiments on two well-known datasets (Weizmann, MuHAVi), that it yields a remarkable improvement in classification accuracy. © 2011 IEEE
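As a rough illustration of the kind of observation model involved (not the authors' implementation), the sketch below evaluates a two-component mixture of multivariate Student's t densities such as could serve as a per-state HMM emission probability; the component parameters, dimensionality and SciPy-based implementation are assumptions.

```python
# Minimal sketch: a two-component mixture of multivariate Student's t densities,
# of the kind an HMM could use as a per-state observation (emission) model.
# Component parameters are illustrative assumptions, not values from the paper.
# Requires SciPy >= 1.6 for scipy.stats.multivariate_t.
import numpy as np
from scipy.stats import multivariate_t

def t_mixture_pdf(x, weights, locs, shapes, dfs):
    """Density of a mixture of multivariate t components at point(s) x."""
    return sum(w * multivariate_t(loc=m, shape=S, df=v).pdf(x)
               for w, m, S, v in zip(weights, locs, shapes, dfs))

# Example: 2-D feature vectors (e.g. per-frame pose features), two components.
weights = [0.6, 0.4]
locs    = [np.zeros(2), np.ones(2)]
shapes  = [np.eye(2), 0.5 * np.eye(2)]
dfs     = [3.0, 5.0]                       # low df -> heavier tails, more outlier-tolerant

x = np.array([4.0, -3.0])                  # an "outlier-like" observation
print(t_mixture_pdf(x, weights, locs, shapes, dfs))
```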

    Machine learning for efficient recognition of anatomical structures and abnormalities in biomedical images

    Three studies have been carried out to investigate new approaches to efficient image segmentation and anomaly detection. The first study investigates the use of deep learning in patch-based segmentation. Current approaches to patch-based segmentation use low-level features such as the sum of squared differences between patches. We argue that better segmentation can be achieved by harnessing the power of deep neural networks. Currently these networks make extensive use of convolutional layers. However, we argue that in the context of patch-based segmentation, convolutional layers have little advantage over the canonical artificial neural network architecture. This is because a patch is small and does not need decomposition, and thus will not benefit from convolution. Instead, we make use of the canonical architecture, in which neurons only compute dot products, but also incorporate modern techniques of deep learning. The resulting classifier is much faster and less memory-hungry than convolution-based networks. In a test application to the segmentation of the hippocampus in human brain MR images, we significantly outperformed prior art with a median Dice score of up to 90.98% at near real-time speed (<1 s). The second study is an investigation into mouse phenotyping, and develops a high-throughput framework to detect morphological abnormality in mouse embryo micro-CT images. Existing work in this line is centred either on the detection of phenotype-specific features or on comparative analytics. The former approach lacks generality and the latter can often fail, for example, when the abnormality is not associated with severe volume variation. Both approaches often require image segmentation as a prerequisite, which is very challenging when applied to embryo phenotyping. A new approach to this problem, in which non-rigid registration is combined with robust principal component analysis (RPCA), is proposed. The new framework is able to efficiently perform abnormality detection in a batch of images. It is sensitive to both volumetric and non-volumetric variations, and does not require image segmentation. In a validation study, it successfully distinguished the abnormal VSD and polydactyly phenotypes from the normal at 85.19% and 88.89% specificity, respectively, with 100% sensitivity in both cases. The third study investigates the RPCA technique in more depth. RPCA is an extension of PCA that tolerates certain levels of data distortion during feature extraction, and is able to decompose images into regular and singular components. It has previously been applied to many computer vision problems (e.g. video surveillance), attaining excellent performance. However, these applications commonly rest on a critical condition: in the majority of images being processed, there is a background with very little variation. By contrast, in biomedical imaging there is significant natural variation across different images, resulting from inter-subject variability and physiological movements. Non-rigid registration can go some way towards reducing this variance, but cannot eliminate it entirely. To address this problem we propose a modified framework (RPCA-P) that is able to incorporate natural variation priors and adjust outlier tolerance locally, so that voxels associated with structures of higher variability are compensated with a higher tolerance in regularity estimation.
In an experimental study on the same mouse embryo micro-CT data, this framework notably improved the detection specificity to 94.12% for the VSD and 90.97% for the polydactyly, while maintaining the sensitivity at 100%.
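For reference, the sketch below implements plain RPCA via principal component pursuit with a basic augmented-Lagrangian loop, illustrating the regular/singular (low-rank/sparse) decomposition mentioned above; it is not the proposed RPCA-P variant with natural variation priors, and the parameter defaults (lam, mu, tol) are assumed values.

```python
# Minimal sketch of plain Robust PCA via principal component pursuit (PCP):
#   minimise ||L||_* + lam * ||S||_1   subject to   L + S = M,
# solved with a basic ADMM / augmented-Lagrangian loop. This illustrates the
# regular (L) vs. singular (S) decomposition described above; it is NOT the
# RPCA-P variant with local variation priors, and lam, mu, tol are assumed defaults.
import numpy as np

def shrink(X, tau):
    """Soft-thresholding (proximal operator of the l1 norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding (proximal operator of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(shrink(s, tau)) @ Vt

def rpca_pcp(M, lam=None, mu=None, tol=1e-7, max_iter=500):
    m, n = M.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = mu if mu is not None else 0.25 * m * n / (np.abs(M).sum() + 1e-12)
    L = np.zeros_like(M); S = np.zeros_like(M); Y = np.zeros_like(M)
    for _ in range(max_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)          # low-rank ("regular") part
        S = shrink(M - L + Y / mu, lam / mu)       # sparse ("singular") part
        residual = M - L - S
        Y += mu * residual
        if np.linalg.norm(residual) <= tol * np.linalg.norm(M):
            break
    return L, S

# Usage sketch: stack registered, vectorised images as columns of M;
# large entries in S flag candidate abnormalities.
```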

    Models and Methods for Automated Background Density Estimation in Hyperspectral Anomaly Detection

    Detecting targets with unknown spectral signatures in hyperspectral imagery has proven to be a topic of great interest in several applications. Because no knowledge about the targets of interest is assumed, this task is performed by searching the image for anomalous pixels, i.e. those pixels deviating from a statistical model of the background. According to the hyperspectral literature, there are two main approaches to Anomaly Detection (AD), leading to different ways of modelling the background: global and local. Global AD algorithms are designed to locate small rare objects that are anomalous with respect to the global background, identified by a large portion of the image. On the other hand, in local AD strategies, pixels with significantly different spectral features from a local neighborhood just surrounding the observed pixel are detected as anomalies. In this thesis work, a new scheme is proposed for detecting both global and local anomalies. Specifically, a simplified Likelihood Ratio Test (LRT) decision strategy is derived that involves thresholding the background log-likelihood and, thus, only needs the specification of the background Probability Density Function (PDF). Within this framework, the use of parametric, semi-parametric (in particular finite mixtures), and non-parametric models is investigated for the background PDF estimation. Although such approaches are well known and have been widely employed in multivariate data analysis, they have seldom been applied to estimate the hyperspectral background PDF, mostly due to the difficulty of reliably learning the model parameters without operator intervention, which is highly desirable in practical AD tasks. In fact, this work represents the first attempt to jointly examine such methods in order to assess and discuss the most critical issues related to their employment for PDF estimation of the hyperspectral background, with specific reference to the detection of anomalous objects in a scene. Specifically, semi- and non-parametric estimators have been successfully employed to estimate the image background PDF with the aim of detecting global anomalies in a scene by means of ad hoc learning procedures. In particular, strategies developed within a Bayesian framework have been considered for automatically estimating the parameters of mixture models, together with one of the most well-known non-parametric techniques, i.e. the fixed kernel density estimator (FKDE). In the latter, the performance and the modelling ability depend on scale parameters called bandwidths. It has been shown that the use of bandwidths that are fixed across the entire feature space, as done in the FKDE, is not effective when the sample data exhibit different local peculiarities across the data domain, which generally occurs in practical applications. Therefore, some possibilities are investigated to improve the image background PDF estimation of the FKDE by allowing the bandwidths to vary over the estimation domain, thus adapting the amount of smoothing to the local density of the data so as to more reliably and accurately follow the background data structure of hyperspectral images of a scene. The use of such variable bandwidth kernel density estimators (VKDE) is also proposed for estimating the background PDF within the considered AD scheme for detecting local anomalies.
This choice aims to cope with the problem of a non-Gaussian background and thereby improve classical local AD algorithms based on parametric and non-parametric background models. The locally data-adaptive non-parametric model has been chosen because it combines the typical strength of non-parametric PDF estimators, namely modelling data without specific distributional assumptions, with the benefits of bandwidths that vary across the data domain. The ability of the proposed AD scheme resulting from the application of different background PDF models and learning methods is experimentally evaluated on real hyperspectral images containing objects that are anomalous with respect to the background.
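As a minimal illustration of the log-likelihood-thresholding scheme described above, the sketch below fits a background PDF with a fixed-bandwidth Gaussian KDE (standing in for the FKDE/mixture estimators and their ad hoc learning procedures) and flags low-likelihood pixels; the bandwidth and threshold quantile are illustrative assumptions, not values from the thesis.

```python
# Minimal sketch of the log-likelihood-thresholding anomaly detector: estimate the
# background PDF (here with a fixed-bandwidth Gaussian KDE) and flag pixels whose
# background log-likelihood falls below a threshold. Bandwidth and threshold
# quantile are illustrative assumptions.
import numpy as np
from sklearn.neighbors import KernelDensity

def detect_anomalies(pixels, bandwidth=0.5, quantile=0.001):
    """pixels: (n_pixels, n_bands) array of hyperspectral spectra."""
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(pixels)
    log_p = kde.score_samples(pixels)              # background log-likelihood per pixel
    threshold = np.quantile(log_p, quantile)       # LRT reduces to thresholding log p(x)
    return log_p < threshold                       # True where a pixel is anomalous

# Usage sketch: reshape an (H, W, B) hyperspectral cube to (H*W, B) before calling,
# then reshape the boolean mask back to (H, W) to localise detections.
```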

    Enhanced independent vector analysis for audio separation in a room environment

    Independent vector analysis (IVA) is studied as a frequency domain blind source separation method, which can theoretically avoid the permutation problem by retaining the dependency between different frequency bins of the same source vector while removing the dependency between different source vectors. This thesis focuses upon improving the performance of independent vector analysis when it is used to solve the audio separation problem in a room environment. A specific stability problem of IVA, i.e. the block permutation problem, is identified and analyzed. Then a robust IVA method is proposed to solve this problem by exploiting the phase continuity of the unmixing matrix. Moreover, an auxiliary-function-based IVA algorithm with an overlapped chain-type source prior is also proposed to mitigate this problem. Then an informed IVA scheme is proposed which exploits geometric information about the sources obtained from video, solving the problem by providing an intelligent initialization for optimal convergence. The proposed informed IVA algorithm also achieves faster convergence in terms of the number of iterations and better separation performance. A pitch-based evaluation method is defined to judge the separation performance objectively when the information describing the mixing matrix and sources is missing. In order to improve the separation performance of IVA, an appropriate multivariate source prior is needed to better preserve the dependency structure within the source vectors. A particular multivariate generalized Gaussian distribution is adopted as the source prior. The nonlinear score function derived from this proposed source prior contains fourth-order relationships between different frequency bins, which provides a more informative and stronger dependency structure compared with the original IVA algorithm and thereby improves the separation performance. Copula theory is a central tool to model the nonlinear dependency structure. The t copula is proposed to describe the dependency structure within the frequency domain speech signals due to its tail dependency property, which means that if one variable takes an extreme value, the other variables are expected to take extreme values as well. A multivariate Student's t distribution, constructed using a t copula with univariate Student's t marginal distributions, is proposed as the source prior. Then the IVA algorithm with the proposed source prior is derived. The proposed algorithms are tested with real speech signals in different reverberant room environments, using both modelled room impulse responses and real room recordings. State-of-the-art criteria are used to evaluate the separation performance, and the experimental results confirm the advantage of the proposed algorithms.
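For reference, the standard copula factorisation underlying the construction described above is sketched below (Sklar's theorem in density form, together with the usual expression for the t copula density); this is textbook material rather than a formula quoted from the thesis.

```latex
% Joint source prior built from a copula density c and univariate marginals f_i
% (Sklar's theorem in density form). With c chosen as the t copula and the f_i as
% univariate Student's t densities, one obtains a multivariate Student's t-type prior.
f(x_1,\dots,x_d) = c\bigl(F_1(x_1),\dots,F_d(x_d)\bigr)\,\prod_{i=1}^{d} f_i(x_i),
\qquad
c(u_1,\dots,u_d) = \frac{f_{\nu,\Sigma}\bigl(t_\nu^{-1}(u_1),\dots,t_\nu^{-1}(u_d)\bigr)}
{\prod_{i=1}^{d} f_\nu\bigl(t_\nu^{-1}(u_i)\bigr)}
% f_{\nu,\Sigma}: multivariate t density (df \nu, correlation \Sigma);
% f_\nu, t_\nu^{-1}: univariate t density and quantile function.
```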

    Unsupervised classification of 3D images and extension to segmentation exploiting color and depth information

    Access to 3D images at a reasonable frame rate is now widespread, thanks to recent advances in low-cost depth sensors as well as efficient methods to compute 3D from 2D images. As a consequence, there is strong demand for enhancing the capability of existing computer vision applications by incorporating 3D information. Indeed, numerous studies have demonstrated that the accuracy of different tasks increases when 3D information is included as an additional feature. However, for the task of indoor scene analysis and segmentation, several important issues remain, such as: (a) how can the 3D information itself be exploited? and (b) what is the best way to fuse color and 3D in an unsupervised manner? In this thesis, we address these issues and propose novel unsupervised methods for 3D image clustering and joint color and depth image segmentation. To this aim, we consider image normals as the prominent feature from the 3D image and cluster them with methods based on finite statistical mixture models. We adopt the Bregman Soft Clustering method to ensure computationally efficient clustering. Moreover, we exploit several probability distributions from directional statistics, such as the von Mises-Fisher distribution and the Watson distribution. By combining these, we propose novel Model Based Clustering methods. We empirically validate these methods using synthetic data and then demonstrate their application to 3D/depth image analysis. Afterward, we extend these methods to segment synchronized 3D and color images, also called RGB-D images. To this aim, we first propose a statistical image generation model for RGB-D images. Then, we propose a novel RGB-D segmentation method using joint color-spatial-axial clustering and a statistical planar region merging method. Results show that the proposed method is comparable with state-of-the-art methods and requires less computation time. Moreover, it opens interesting perspectives for fusing color and geometry in an unsupervised manner. We believe that the methods proposed in this thesis are equally applicable and extendable to clustering different types of data, such as speech, gene expressions, etc. Moreover, they can be used for complex tasks, such as joint image-speech data analysis.
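As a rough sketch of model-based clustering of surface normals with the von Mises-Fisher distribution, the code below runs a plain EM loop on 3-D unit vectors; it is not the thesis's Bregman Soft Clustering implementation, and the number of clusters, initialisation and concentration update (the usual Banerjee et al. approximation) are assumptions.

```python
# Minimal sketch of soft (EM-style) clustering of 3-D unit surface normals with a
# mixture of von Mises-Fisher distributions, in the spirit of the model-based
# clustering described above. K and the iteration count are assumptions.
import numpy as np

def vmf_logpdf(X, mu, kappa):
    """Log-density of the 3-D von Mises-Fisher distribution for unit vectors X."""
    # log normaliser for d = 3: log(kappa) - log(2*pi) - (kappa + log(1 - exp(-2*kappa)))
    log_c = np.log(kappa) - np.log(2 * np.pi) - kappa - np.log1p(-np.exp(-2 * kappa))
    return log_c + kappa * (X @ mu)

def vmf_mixture_em(X, K=3, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape                                   # d == 3 for surface normals
    mus = X[rng.choice(n, K, replace=False)]         # initialise mean directions from data
    kappas = np.full(K, 10.0)
    weights = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities (softmax of per-component log-likelihoods)
        logp = np.stack([np.log(weights[k]) + vmf_logpdf(X, mus[k], kappas[k])
                         for k in range(K)], axis=1)
        logp -= logp.max(axis=1, keepdims=True)
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: weights, mean directions, concentrations (Banerjee et al. approximation)
        nk = resp.sum(axis=0)
        weights = nk / n
        for k in range(K):
            r = resp[:, k] @ X
            r_norm = np.linalg.norm(r)
            mus[k] = r / r_norm
            r_bar = min(r_norm / nk[k], 0.9999)
            kappas[k] = (r_bar * d - r_bar ** 3) / (1.0 - r_bar ** 2)
    return weights, mus, kappas, resp

# Usage sketch: X is an (n, 3) array of unit normals estimated from a depth image;
# the responsibilities softly assign each normal to a dominant surface orientation.
```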

    New Techniques and Optimizations of Short Echo-time 1H MRI with Applications in Murine Lung

    Although x-ray computed tomography (CT) is a gold standard for pulmonary imaging, it involves a high dose of ionizing radiation, which puts patients at greater risk of cancer, particularly in longitudinal studies with cumulative doses. Magnetic resonance imaging (MRI) does not involve exposure to ionizing radiation and is especially useful for visualizing soft tissues and organs such as ligaments, cartilage, brain, and heart. Many efforts have been made to apply MRI to study the lung function and structure of both humans and animals. However, the lung is a unique organ and is very different from other solid organs like the heart and brain due to its complex air-tissue interleaved structure. The magnetic susceptibility differences at the air-tissue interfaces result in very short T2* (~ 1 ms) of lung parenchyma, which is even shorter in small-animal MRI (often at higher field) than in human MRI. Both the low proton density and the short T2* of lung parenchyma are challenges for pulmonary imaging via MRI because they lead to a low signal-to-noise ratio (SNR) in images acquired with traditional Cartesian methods, which necessitate longer echo times (≥ 1 ms). This dissertation reports the work of optimizing pulmonary MRI techniques by minimizing the negative effects of the low proton density and short T2* of murine lung parenchyma, and the application of these techniques to imaging murine lung. Specifically, echo time (TE) in the Cartesian sequence is minimized by simultaneous slice-select rephasing, phase-encoding and read-dephasing gradients, in addition to partial Fourier imaging, to reduce signal loss due to T2* relaxation. Radial imaging techniques, often called ultra-short echo-time MRI or UTE MRI, with a much shorter time between excitation and data acquisition, were also developed and optimized for pulmonary imaging. Offline reconstruction for UTE data was developed on a Linux system to regrid the non-Cartesian (radial in this dissertation) k-space data for the fast Fourier transform. Slab-selected UTE was created to fit the field-of-view (FOV) to the imaged lung without fold-in aliasing, which increases TE slightly compared to non-slab-selected UTE. To further reduce TE as well as fit the FOV to the lung without aliasing, UTE with ellipsoidal k-space coverage was developed, which increases resolution and decreases acquisition time. Taking into account T2* effects, point spread function (PSF) analysis was performed to determine the optimal acquisition time for maximal single-voxel SNR. Retrospective self-gating UTE was developed to avoid the use of a ventilator (which may cause lung injury) and to avoid possible prospective gating errors caused by abrupt body motion. Cartesian gradient-recalled-echo imaging (GRE) was first applied to monitor acute cellular rejection in lung transplantation. By repeated imaging in the same animals, both parenchymal signal and lung compliance were measured and were able to detect rejection in the allograft lung. GRE was also used to monitor chronic cellular rejection in a transgenic mouse model after lung transplantation. In addition to parenchymal signal and lung compliance, the percentage of high-density lung parenchyma was defined and measured to detect chronic rejection. This represents one of the first times quantitative pulmonary MRI has been performed. For 3D radial UTE MRI, 2D golden means (1) were used to determine the directions of radial spokes in k-space, resulting in pseudo-random angular sampling of spherical k-space coverage.
Ellipsoidal k-space coverage was generated by expanding spherical coverage to create an ellipsoid in k-space. UTE MRI with ellipsoidal k-space coverage was performed to image healthy mice and phantoms, showing reduced FOV and enhanced in-plane resolution compared to regular UTE. With this modified UTE, T2* of lung parenchyma was measured by an interleaved multi-TE strategy, and T1 of lung parenchyma was measured by a limited flip angle method (2). Retrospective self-gating UTE with ellipsoidal k-space coverage was utilized to monitor the progression of pulmonary fibrosis in a transforming growth factor (TGF)-α transgenic mouse model and compared with histology and pulmonary mechanics. Lung fibrosis progression was not only visualized in MR images, but also quantified and tracked by MRI-derived lung function parameters such as mean lung parenchyma signal, high-density lung volume percentage, and tidal volume. MRI-derived lung function parameters were strongly correlated with the findings of pulmonary mechanics and histology in measuring fibrotic burden. This dissertation demonstrates new techniques and optimizations in GRE and UTE MRI that are employed to minimize TE and image murine lungs to assess lung function and structure and monitor the time course of lung diseases. Importantly, the ability to longitudinally image individual animals by these MRI techniques minimizes the number of animals required in preclinical studies and increases the statistical power of future experiments, as each animal can serve as its own control.
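For illustration, a common construction of pseudo-random radial spoke directions from the 2D golden means is sketched below; the half-sphere mapping and golden-mean values follow the usual literature recipe and should be treated as assumptions rather than the dissertation's exact implementation.

```python
# Minimal sketch of pseudo-random radial spoke directions from the 2-D golden means,
# as commonly used for 3-D radial (UTE) k-space sampling. The exact mapping used in
# the dissertation is not reproduced here; this half-sphere parameterisation and the
# golden-mean values are standard choices and are assumptions.
import numpy as np

PHI1, PHI2 = 0.4656, 0.6823  # commonly cited 2-D golden means

def golden_means_spokes(n_spokes):
    """Return (n_spokes, 3) unit vectors giving readout directions on a half-sphere."""
    m = np.arange(n_spokes)
    z = np.mod(m * PHI1, 1.0)                  # polar coordinate, pseudo-uniform in [0, 1)
    phi = 2.0 * np.pi * np.mod(m * PHI2, 1.0)  # azimuthal angle
    r = np.sqrt(1.0 - z ** 2)
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

# Usage sketch: scale each unit direction by the sampled k-space radii to obtain the
# radial trajectory, e.g. anisotropically for ellipsoidal k-space coverage.
```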

    Enhancing the Potential of the Conventional Gaussian Mixture Model for Segmentation: from Images to Videos

    Segmentation in images and videos has continuously played an important role in image processing, pattern recognition and machine vision. Despite having been studied for over three decades, the problem of segmentation remains challenging yet appealing due to its ill-posed nature. Maintaining spatial coherence, particularly at object boundaries, remains difficult for image segmentation. Extending to videos, maintaining spatial and temporal coherence, even partially, proves computationally burdensome for recent methods. Finally, connecting these two, foreground segmentation, also known as background suppression, suffers from noisy or dynamic backgrounds, slow foregrounds and illumination variations, to name a few challenges. This dissertation focuses on probabilistic model-based segmentation, primarily due to its applicability to images as well as videos, its past success, and because it can be enhanced by incorporating spatial and temporal cues. The first part of the dissertation focuses on enhancing the conventional Gaussian mixture model (GMM) for image segmentation using the bilateral filter, owing to its ability to smooth spatially while preserving object boundaries. Quantitative and qualitative evaluations are done to show the improvements over a number of recent approaches. The second part of the dissertation concentrates on enhancing the GMM towards foreground segmentation as a connection between image and video segmentation. First, we propose an efficient way to include multiresolution features in the GMM. This novel procedure implicitly incorporates spatial information to improve foreground segmentation by suppressing noisy backgrounds. The procedure is demonstrated with wavelets, and gradually extended into a generic framework that accommodates other multiresolution decompositions. Second, we propose a more accurate foreground segmentation method by enhancing the GMM with the use of Adaptive Support Weights and Histogram of Gradients. Extensive analyses and quantitative and qualitative experiments are presented to demonstrate performance comparable to other state-of-the-art methods. The final part of the dissertation proposes a novel application of the GMM to spatio-temporal video segmentation, connecting spatial segmentation of images and temporal segmentation for foreground extraction. The proposed approach has a simple architecture and requires little memory for processing. The analysis section demonstrates the architectural efficiency over other methods, while quantitative and qualitative experiments are carried out to show the competitive performance of the proposed method.
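As a rough sketch of one plausible reading of the GMM-plus-bilateral-filter idea, the code below fits a GMM on grey-level intensities, bilateral-filters the per-pixel class posteriors so that smoothing respects boundaries, and labels each pixel by the largest smoothed posterior; the number of components and filter parameters are assumptions, and this is not claimed to be the dissertation's exact pipeline.

```python
# Minimal sketch: conventional GMM segmentation of a grey-level image, with
# edge-preserving bilateral filtering applied to the per-pixel class posteriors
# before the final labelling. K, d, sigma_color and sigma_space are assumed values.
# Requires scikit-learn and OpenCV.
import numpy as np
import cv2
from sklearn.mixture import GaussianMixture

def gmm_bilateral_segment(gray, K=3, d=9, sigma_color=0.1, sigma_space=15):
    """gray: 2-D float array in [0, 1]. Returns an integer label map of shape gray.shape."""
    h, w = gray.shape
    gmm = GaussianMixture(n_components=K, covariance_type="full", random_state=0)
    gmm.fit(gray.reshape(-1, 1))
    post = gmm.predict_proba(gray.reshape(-1, 1)).reshape(h, w, K)
    # Edge-preserving smoothing of each posterior map.
    smoothed = np.stack(
        [cv2.bilateralFilter(post[..., k].astype(np.float32), d, sigma_color, sigma_space)
         for k in range(K)], axis=-1)
    return smoothed.argmax(axis=-1)

# Usage sketch:
# labels = gmm_bilateral_segment(cv2.imread("image.png", cv2.IMREAD_GRAYSCALE) / 255.0)
```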