115 research outputs found

    State of the Art in Face Recognition

    Notwithstanding the tremendous effort devoted to the face recognition problem, it is not yet possible to design a face recognition system that approaches human performance. New computer vision and pattern recognition approaches need to be investigated, and knowledge and perspectives from fields such as psychology and neuroscience must be incorporated into face recognition research to produce robust systems. Indeed, much more effort is required to arrive at a human-like face recognition system. This book is an effort to reduce the gap between the current state of face recognition research and where the field needs to go.

    Distribution-Dissimilarities in Machine Learning

    Any binary classifier (or score function) can be used to define a dissimilarity between two distributions. Many well-known distribution dissimilarities are in fact classifier-based: total variation, the KL and JS divergences, the Hellinger distance, etc. Many recent, popular generative modeling algorithms compute or approximate these distribution dissimilarities by explicitly training a classifier, e.g. generative adversarial networks (GANs) and their variants. This thesis introduces and studies such classifier-based distribution dissimilarities. After a general introduction, the first part analyzes the influence of the classifier's capacity on the dissimilarity's strength for the special case of maximum mean discrepancies (MMD) and provides applications. The second part studies applications of classifier-based distribution dissimilarities in the context of generative modeling and presents two new algorithms: Wasserstein Auto-Encoders (WAE) and AdaGAN. The third and final part focuses on adversarial examples, i.e. targeted but imperceptible input perturbations that lead to drastically different predictions of an artificial classifier. It shows that the adversarial vulnerability of neural-network-based classifiers typically increases with the input dimension, independently of the network topology.
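    The MMD mentioned above admits a simple kernel-based estimator. A minimal numpy sketch of the biased squared-MMD estimate with an RBF kernel (function names and the gamma value are ours, not the thesis's):

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF kernel values k(x, y) = exp(-gamma * ||x - y||^2).
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * sq)

def mmd2(X, Y, gamma=1.0):
    # Biased (V-statistic) estimate of the squared maximum mean discrepancy:
    # MMD^2 = E[k(x, x')] - 2 E[k(x, y)] + E[k(y, y')].
    return (rbf_kernel(X, X, gamma).mean()
            - 2 * rbf_kernel(X, Y, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean())

rng = np.random.default_rng(0)
same = mmd2(rng.normal(0, 1, (500, 2)), rng.normal(0, 1, (500, 2)))
diff = mmd2(rng.normal(0, 1, (500, 2)), rng.normal(3, 1, (500, 2)))
# samples from a different distribution yield a larger dissimilarity
```

    With a fixed kernel the estimator is cheap to compute; the thesis's point about classifier capacity corresponds to the choice of kernel (or, more generally, of the function class the "classifier" ranges over).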

    Similarity search applications in medical images


    Computer-Aided Assessment of Tuberculosis with Radiological Imaging: From rule-based methods to Deep Learning

    International Mention in the doctoral degree.
    Tuberculosis (TB) is an infectious disease caused by Mycobacterium tuberculosis (Mtb.) that produces pulmonary damage due to its airborne nature. This facilitates the rapid spread of the disease, which, according to the World Health Organization (WHO), caused 1.2 million deaths and 9.9 million new cases in 2021. Traditionally, TB has been considered a binary disease (latent/active) due to the limited specificity of the traditional diagnostic tests. Such a simple model causes difficulties in the longitudinal assessment of pulmonary affectation needed for the development of novel drugs and for controlling the spread of the disease. Fortunately, X-ray Computed Tomography (CT) images capture specific manifestations of TB that are undetectable with the regular diagnostic tests, whose specificity is limited. In conventional workflows, expert radiologists inspect the CT images. However, this procedure is infeasible for processing the thousands of volumetric images, from the different TB animal models and from humans, required for a suitable (pre-)clinical trial. To achieve suitable results, automation of the different image analysis processes is a must for quantifying TB. It is also advisable to measure the uncertainty associated with this process and to model causal relationships between the specific mechanisms that characterize each animal model and its level of damage. Thus, in this thesis, we introduce a set of novel methods based on state-of-the-art Artificial Intelligence (AI) and Computer Vision (CV). Initially, we present an algorithm for Pathological Lung Segmentation (PLS) employing an unsupervised rule-based model, a step traditionally considered necessary before biomarker extraction. This procedure allows robust segmentation in an Mtb. infection model (Dice Similarity Coefficient, DSC, 94%±4%; Hausdorff Distance, HD, 8.64mm±7.36mm) of damaged lungs with lesions attached to the parenchyma and affected by respiratory-movement artefacts. Next, a Gaussian Mixture Model fitted with an Expectation-Maximization (EM) algorithm is employed to automatically quantify the burden of Mtb. using biomarkers extracted from the segmented CT images. This approach achieves a strong correlation (R2 ≈ 0.8) between our automatic method and manual extraction. Consequently, Chapter 3 introduces a model to automate the identification of TB lesions and the characterization of disease progression. To this aim, the method employs the Statistical Region Merging algorithm to detect lesions, which are subsequently characterized by texture features that feed a Random Forest (RF) estimator. The proposed procedure enables the selection of a simple but powerful model able to classify abnormal tissue. The latest works base their methodology on Deep Learning (DL). Chapter 4 extends the classification of TB lesions. Namely, we introduce a computational model to infer the TB manifestations present in each lung lobe of CT scans by employing the associated radiologist reports as ground truth, instead of the classical manually delineated segmentation masks. The model adapts a three-dimensional architecture, V-Net, to a multitask classification context in which the loss function is weighted by homoscedastic uncertainty. Besides, the method employs Self-Normalizing Neural Networks (SNNs) for regularization. Our results are promising, with a Root Mean Square Error of 1.14 in the number of nodules and F1-scores above 0.85 for the most prevalent TB lesions (i.e., conglomerations, cavitations, consolidations, trees in bud) when considering the whole lung. In Chapter 5, we present a DL model capable of extracting disentangled information from images of different animal models, as well as information on the mechanisms that generate the CT volumes. The method provides the segmentation mask of axial slices from three animal models of different species employing a single trained architecture. It also infers the level of TB damage and generates counterfactual images, offering an alternative that promotes generalization and explainable AI models. To sum up, the thesis presents a collection of valuable tools to automate the quantification of pathological lungs and, moreover, extends the methodology to provide more explainable results, which are vital for drug development purposes. Chapter 6 elaborates on these conclusions.
    Programa de Doctorado en Multimedia y Comunicaciones por la Universidad Carlos III de Madrid y la Universidad Rey Juan Carlos. Chair: María Jesús Ledesma Carbayo. Secretary: David Expósito Singh. Panel member: Clarisa Sánchez Gutiérre
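    The two segmentation metrics reported above (DSC and HD) are standard. For reference, minimal numpy implementations (the thesis may use a surface-based Hausdorff variant; these names are ours):

```python
import numpy as np

def dice(a, b):
    # Dice similarity coefficient between two binary masks:
    # DSC = 2|A ∩ B| / (|A| + |B|).
    a, b = a.astype(bool), b.astype(bool)
    return 2 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def hausdorff(p, q):
    # Symmetric Hausdorff distance between point sets p (n, d) and q (m, d):
    # the larger of the two directed distances max_p min_q ||p - q||.
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)
    return max(d.min(axis=1).max(), d.min(axis=0).max())
```

    DSC rewards volume overlap, while HD penalizes the single worst boundary disagreement, which is why the two are usually reported together, as in the figures quoted above.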

    Image Based Biomarkers from Magnetic Resonance Modalities: Blending Multiple Modalities, Dimensions and Scales.

    The successful analysis and processing of medical imaging data is multidisciplinary work that requires the application and combination of knowledge from diverse fields, such as medical engineering, medicine, computer science, and pattern classification. Imaging biomarkers are biological features detectable by imaging modalities, and their use offers the prospect of more efficient clinical studies and of improvements in both diagnosis and therapy assessment. The use of Dynamic Contrast Enhanced Magnetic Resonance Imaging (DCE-MRI) and its application to diagnosis and therapy have been extensively validated; nevertheless, the issue of an appropriate or optimal processing of the data that helps extract relevant biomarkers to highlight the differences between heterogeneous tissues still remains. Together with DCE-MRI, the data extracted from Diffusion MRI (DWI-MR and DTI-MR) represent a promising and complementary tool. This project initially proposes the exploration of diverse techniques and methodologies for the characterization of tissue, following an analysis and classification of voxel-level time-intensity curves from DCE-MRI data, mainly through the exploration of dissimilarity-based representations and models. We explore metrics and representations to correlate the multidimensional data acquired through diverse imaging modalities, a line of work that starts with an appropriate elastic registration methodology between DCE-MRI and DWI-MR on the breast and its corresponding validation. It has been shown that the combination of multi-modal MRI images improves the discrimination of diseased tissue. However, the fusion of dissimilar imaging data for classification and segmentation purposes is not a trivial task: there are inherent differences in information domains, dimensionality, and scales.
    This work also proposes a multi-view consensus clustering methodology for the integration of multi-modal MR images into a unified segmentation of tumoral lesions for heterogeneity assessment. Using a variety of metrics and distance functions, this multi-view imaging approach calculates multiple vectorial dissimilarity spaces, one for each of the MRI modalities, and uses the concepts behind cluster ensembles to combine a set of base unsupervised segmentations into a unified partition of the voxel-based data. The methodology is specifically designed for combining DCE-MRI and DTI-MR, for which a manifold learning step is implemented in order to account for the geometric constraints of the high-dimensional diffusion information.
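    One common way to realize the cluster-ensemble step described above is evidence accumulation via a co-association matrix. A minimal sketch, assuming a simple majority-vote linking rule (the thesis's actual combination scheme may differ):

```python
import numpy as np

def coassociation(labelings):
    # Co-association matrix: entry (i, j) is the fraction of base
    # clusterings in which items i and j share a cluster.
    labelings = [np.asarray(lab) for lab in labelings]
    n = len(labelings[0])
    C = np.zeros((n, n))
    for lab in labelings:
        C += lab[:, None] == lab[None, :]
    return C / len(labelings)

def consensus(labelings, tau=0.5):
    # Consensus partition: link items co-clustered in more than a fraction
    # tau of the base partitions, then take connected components.
    C = coassociation(labelings)
    n = C.shape[0]
    labels = [-1] * n
    current = 0
    for i in range(n):
        if labels[i] == -1:
            stack = [i]
            labels[i] = current
            while stack:
                u = stack.pop()
                for v in range(n):
                    if labels[v] == -1 and C[u, v] > tau:
                        labels[v] = current
                        stack.append(v)
            current += 1
    return labels
```

    The appeal of this construction for multi-modal data is that each base clustering can be computed in its own dissimilarity space, with its own metric, and only the binary co-membership evidence is fused.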

    Similarity search and data mining techniques for advanced database systems.

    Modern automated methods for the measurement, collection, and analysis of data in industry and science are providing more and more data with drastically increasing structural complexity. On the one hand, this growing complexity is justified by the need for a richer and more precise description of real-world objects; on the other hand, it is justified by the rapid progress in measurement and analysis techniques that allow the user a versatile exploration of objects. In order to manage the huge volume of such complex data, advanced database systems are employed. In contrast to conventional database systems that support exact-match queries, users of these advanced database systems focus on applying similarity search and data mining techniques. Based on an analysis of typical advanced database systems — such as biometric, biological, multimedia, moving-object, and CAD-object database systems — the following three challenging characteristics of complexity are identified: uncertainty (probabilistic feature vectors), multiple instances (a set of homogeneous feature vectors), and multiple representations (a set of heterogeneous feature vectors). Therefore, the goal of this thesis is to develop similarity search and data mining techniques that are capable of handling uncertain, multi-instance, and multi-represented objects. The first part of this thesis deals with similarity search techniques. Object identification is a similarity search technique that is typically used for the recognition of objects from image, video, or audio data. Thus, we develop a novel probabilistic model for object identification. Based on it, two novel types of identification queries are defined. In order to process the novel query types efficiently, we introduce an index structure called the Gauss-tree. In addition, we specify further probabilistic models and query types for uncertain multi-instance objects and uncertain spatial objects.
Based on the index structure, we develop algorithms for an efficient processing of these query types. Practical benefits of using probabilistic feature vectors are demonstrated on a real-world application for video similarity search. Furthermore, a similarity search technique is presented that is based on aggregated multi-instance objects, and that is suitable for video similarity search. This technique takes multiple representations into account in order to achieve better effectiveness. The second part of this thesis deals with two major data mining techniques: clustering and classification. Since privacy preservation is a very important demand of distributed advanced applications, we propose using uncertainty for data obfuscation in order to provide privacy preservation during clustering. Furthermore, a model-based and a density-based clustering method for multi-instance objects are developed. Afterwards, original extensions and enhancements of the density-based clustering algorithms DBSCAN and OPTICS for handling multi-represented objects are introduced. Since several advanced database systems like biological or multimedia database systems handle predefined, very large class systems, two novel classification techniques for large class sets that benefit from using multiple representations are defined. The first classification method is based on the idea of a k-nearest-neighbor classifier. It employs a novel density-based technique to reduce training instances and exploits the entropy impurity of the local neighborhood in order to weight a given representation. The second technique addresses hierarchically-organized class systems. It uses a novel hierarchical, supervised method for the reduction of large multi-instance objects, e.g. audio or video, and applies support vector machines for efficient hierarchical classification of multi-represented objects. 
    User benefits of this technique are demonstrated by a prototype that performs a classification of large music collections. The effectiveness and efficiency of all proposed techniques are discussed and verified by comparison with conventional approaches in versatile experimental evaluations on real-world datasets.
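    The entropy-impurity weighting of representations described above can be read as: trust a representation more when the labels of the query's nearest neighbors under that representation are pure. A minimal sketch under that reading (names are ours; the thesis's exact formula may differ):

```python
import numpy as np
from collections import Counter

def entropy_weight(neighbor_labels):
    # Weight for one representation, from the class labels of the query's
    # k nearest neighbors in that representation's feature space:
    # low label entropy (a pure neighborhood) -> weight near 1,
    # maximal entropy (a fully mixed neighborhood) -> weight 0.
    counts = np.array(list(Counter(neighbor_labels).values()), dtype=float)
    p = counts / counts.sum()
    h = -(p * np.log2(p)).sum()
    hmax = np.log2(len(neighbor_labels))
    return 1.0 - (h / hmax if hmax > 0 else 0.0)
```

    A multi-represented k-NN classifier would then combine the per-representation votes scaled by these weights, so a noisy modality contributes little for queries where its neighborhood is ambiguous.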


    Deep Recurrent Networks for Gesture Recognition and Synthesis

    It is hard to overstate the importance of gesture-based interfaces in many applications nowadays. The adoption of such interfaces stems from the opportunities they create for incorporating natural and fluid user interactions. This highlights the importance of having gesture recognizers that are not only accurate but also easy to adopt. The ever-growing popularity of machine learning has prompted many application developers to integrate automatic methods of recognition into their products. On the one hand, deep learning often tops the list of the most powerful and robust recognizers. These methods have been consistently shown to outperform all other machine learning methods in a variety of tasks. On the other hand, deep networks can be overwhelming to use for a majority of developers, requiring a lot of tuning and tweaking to work as expected. Additionally, these networks are infamous for their requirement for large amounts of training data, further hampering their adoption in scenarios where labeled data is limited. In this dissertation, we aim to bridge the gap between the power of deep learning methods and their adoption into gesture recognition workflows. To this end, we introduce two deep network models for recognition. These models are similar in spirit, but target different application domains: one is designed for segmented gesture recognition, while the other is suitable for continuous data, tackling segmentation and recognition problems simultaneously. The distinguishing characteristic of these networks is their simplicity, small number of free parameters, and their use of common building blocks that come standard with any modern deep learning framework, making them easy to implement, train and adopt. Through evaluations, we show that our proposed models achieve state-of-the-art results in various recognition tasks and application domains spanning different input devices and interaction modalities. 
    We demonstrate that the infamy of deep networks, due to their demand for powerful hardware as well as large amounts of data, is an unfair assessment. On the contrary, we show that in the absence of such data our proposed models can be trained quickly while achieving competitive recognition accuracy. Next, we explore the problem of synthetic gesture generation: a measure often taken to address the shortage of labeled data. We extend our proposed recognition models and demonstrate that the same models can be used in a Generative Adversarial Network (GAN) architecture for synthetic gesture generation. Specifically, we show that our original recognizer can be used as the discriminator in such frameworks, while a slightly modified version of it can act as the gesture generator. We then formulate a novel loss function for our gesture generator, which entirely replaces the need for a discriminator network in our generative model, thereby significantly reducing the complexity of our framework. Through evaluations, we show that our model is able to improve the recognition accuracy of multiple recognizers across a variety of datasets. Through user studies, we additionally show that human evaluators frequently mistake our synthetic samples for real ones, indicating that our synthetic samples are visually realistic. Additional resources for this dissertation (such as demo videos and public source code) are available at https://www.maghoumi.com/dissertatio
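    The dissertation's recurrent models are not reproduced in this listing; as a rough illustration of the building block such recognizers share, here is a minimal GRU cell in plain numpy (standard Cho et al. formulation; all names, sizes, and the initialization are our own, not the dissertation's):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

class GRUCell:
    # Minimal GRU cell for illustration only (no training code).
    def __init__(self, d_in, d_h, seed=0):
        rng = np.random.default_rng(seed)
        w = lambda *shape: rng.normal(0, 0.1, shape)
        self.Wz, self.Uz = w(d_h, d_in), w(d_h, d_h)  # update gate
        self.Wr, self.Ur = w(d_h, d_in), w(d_h, d_h)  # reset gate
        self.Wh, self.Uh = w(d_h, d_in), w(d_h, d_h)  # candidate state

    def step(self, x, h):
        z = sigmoid(self.Wz @ x + self.Uz @ h)            # how much to update
        r = sigmoid(self.Wr @ x + self.Ur @ h)            # how much history to use
        h_tilde = np.tanh(self.Wh @ x + self.Uh @ (r * h))
        return (1 - z) * h + z * h_tilde

def encode(cell, sequence, d_h):
    # Run the cell over a gesture's frame sequence; the final hidden
    # state would feed a classification (or, in a GAN, scoring) head.
    h = np.zeros(d_h)
    for x in sequence:
        h = cell.step(x, h)
    return h
```

    In a segmented-recognition setting the final state feeds a softmax over gesture classes; for continuous streams, a per-frame output can flag segment boundaries and labels simultaneously, in the spirit of the models described above.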

    K-means based clustering and context quantization

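    This entry carries only a title. For reference, a minimal sketch of standard k-means (Lloyd iterations; the deterministic farthest-point initialization is our choice for illustration, not necessarily the paper's):

```python
import numpy as np

def kmeans(X, k, iters=100):
    # Farthest-point initialization: start from X[0], then repeatedly
    # pick the point farthest from all chosen centers.
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    # Lloyd iterations: assign each point to its nearest center,
    # then move each center to the mean of its assigned points.
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=-1).argmin(1)
        new = np.array([X[labels == j].mean(0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers
```

    In context quantization the same loop runs over context vectors rather than pixels, so each learned center defines one quantized context class.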