State of the Art in Face Recognition
Notwithstanding the tremendous effort devoted to the face recognition problem, it is not yet possible to design a face recognition system that approaches human performance. New computer vision and pattern recognition approaches need to be investigated, and knowledge and perspectives from other fields, such as psychology and neuroscience, must be incorporated into current face recognition research to design a robust system. Indeed, much more work is required to arrive at human-like face recognition. This book attempts to reduce the gap between the current state of face recognition research and its future state.
Distribution-Dissimilarities in Machine Learning
Any binary classifier (or score-function) can be used to define a dissimilarity
between two distributions. Many well-known distribution-dissimilarities are
actually classifier-based: total variation, KL- or JS-divergence, Hellinger
distance, etc. And many recent popular generative modeling algorithms compute
or approximate these distribution-dissimilarities by explicitly training a
classifier: e.g. generative adversarial networks (GAN) and their variants.
This thesis introduces and studies such classifier-based
distribution-dissimilarities. After a general introduction, the first part
analyzes the influence of the classifiers' capacity on the dissimilarity's
strength for the special case of maximum mean discrepancies (MMD) and provides
applications. The second part studies applications of classifier-based
distribution-dissimilarities in the context of generative modeling and presents
two new algorithms: Wasserstein Auto-Encoders (WAE) and AdaGAN. The third and
final part focuses on adversarial examples, i.e. targeted but imperceptible
input perturbations that lead to drastically different predictions of an
artificial classifier. It shows that the adversarial vulnerability of
neural-network-based classifiers typically increases with the input dimension,
independently of the network topology.
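As a toy illustration of the classifier-based view of distribution-dissimilarities, note that the total variation distance equals 2·(best achievable balanced accuracy) − 1, so it can be estimated by sweeping a family of simple threshold classifiers. The sketch below is a minimal pure-Python illustration on hypothetical 1-D Gaussian samples, not an implementation from the thesis:

```python
import random

random.seed(0)

# Samples from two hypothetical 1-D distributions P = N(0, 1) and Q = N(1, 1).
p = [random.gauss(0.0, 1.0) for _ in range(5000)]
q = [random.gauss(1.0, 1.0) for _ in range(5000)]

def tv_via_classifier(p, q, thresholds):
    """Estimate total variation from a family of threshold classifiers:
    TV(P, Q) = 2 * (best achievable balanced accuracy) - 1."""
    best = 0.5
    for t in thresholds:
        # Classifier h_t: predict "sample came from Q" when x > t.
        acc = 0.5 * (sum(x <= t for x in p) / len(p)
                     + sum(x > t for x in q) / len(q))
        best = max(best, acc, 1.0 - acc)  # also allow the flipped classifier
    return 2.0 * best - 1.0

thresholds = [i / 10.0 for i in range(-30, 40)]
tv_hat = tv_via_classifier(p, q, thresholds)
# True TV between N(0,1) and N(1,1) is 2*Phi(0.5) - 1, roughly 0.38.
```

A richer classifier family yields a stronger dissimilarity; restricting the family (as with the kernel class behind MMD) weakens it, which is exactly the capacity question the first part analyzes.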
Computer-Aided Assessment of Tuberculosis with Radiological Imaging: From rule-based methods to Deep Learning
International Mention in the doctoral degree (Mención Internacional en el título de doctor).
Tuberculosis (TB) is an infectious disease caused by Mycobacterium tuberculosis (Mtb.)
that produces pulmonary damage. Its airborne nature facilitates the fast
spread of the disease, which, according to the World Health Organization (WHO),
caused 1.2 million deaths and 9.9 million new cases in 2021.
Traditionally, TB has been considered a binary disease (latent/active) due to the limited
specificity of the traditional diagnostic tests. Such a simple model causes difficulties in the
longitudinal assessment of the pulmonary involvement needed for the development of novel drugs
and for controlling the spread of the disease.
Fortunately, X-Ray Computed Tomography (CT) images enable capturing specific manifestations
of TB that are undetectable using regular diagnostic tests, which suffer from
limited specificity. In conventional workflows, expert radiologists inspect the CT images.
However, this procedure is unfeasible for processing the thousands of volumetric images from
the different TB animal models and humans required for a suitable (pre-)clinical trial.
Automating the different image analysis processes is therefore a must for quantifying TB.
It is also advisable to measure the uncertainty associated with this process and to model
causal relationships between the specific mechanisms that characterize each animal model
and its level of damage. Thus, in this thesis, we introduce a set of novel methods based on
state-of-the-art Artificial Intelligence (AI) and Computer Vision (CV).
Initially, we present an algorithm for Pathological Lung Segmentation (PLS), traditionally
considered a necessary step before biomarker extraction, employing an unsupervised
rule-based model. This procedure allows robust segmentation in an Mtb. infection model
(Dice Similarity Coefficient, DSC, 94% ± 4%; Hausdorff Distance, HD, 8.64 mm ± 7.36 mm)
of damaged lungs with lesions attached to the parenchyma and affected by respiratory
movement artefacts.
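The DSC reported above measures the overlap between a predicted and a reference segmentation. A minimal sketch on toy flattened binary masks (hypothetical data, not the thesis pipeline):

```python
def dice(mask_a, mask_b):
    """Dice Similarity Coefficient between two binary masks:
    DSC = 2 * |A intersect B| / (|A| + |B|)."""
    inter = sum(a and b for a, b in zip(mask_a, mask_b))
    size = sum(mask_a) + sum(mask_b)
    return 2 * inter / size if size else 1.0

# Toy 1-D "masks" standing in for flattened lung segmentations.
ref  = [1, 1, 1, 0, 0, 1, 1, 0]  # reference (e.g. manual) mask
pred = [1, 1, 0, 0, 0, 1, 1, 1]  # predicted mask
print(dice(ref, pred))  # 2*4 / (5+5) = 0.8
```

DSC rewards overlapping volume, while the Hausdorff Distance complements it by penalizing the worst boundary disagreement.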
Next, a Gaussian Mixture Model fitted with an Expectation-Maximization (EM) algorithm
is employed to automatically quantify the burden of Mtb. using biomarkers extracted from the
segmented CT images. This approach achieves a strong correlation (R² ≈ 0.8) between our
automatic method and manual extraction.
Consequently, Chapter 3 introduces a model to automate the identification of TB lesions
and the characterization of disease progression. To this aim, the method employs the
Statistical Region Merging algorithm to detect lesions subsequently characterized by texture
features that feed a Random Forest (RF) estimator. The proposed procedure enables the
selection of a simple but powerful model able to classify abnormal tissue.
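The ensemble idea behind a Random Forest over texture features can be sketched in miniature: fit many weak trees on bootstrap samples and take a majority vote. The sketch below is a deliberate simplification using depth-1 trees (decision stumps) on hypothetical region features; a real pipeline would use full trees and richer texture descriptors:

```python
import random

random.seed(1)

# Hypothetical per-region texture features (mean intensity, contrast)
# and labels: 1 = TB lesion, 0 = healthy parenchyma.
X = [(0.8, 0.6), (0.7, 0.7), (0.9, 0.5), (0.2, 0.1), (0.3, 0.2), (0.1, 0.3)]
y = [1, 1, 1, 0, 0, 0]

def train_stump(sample):
    """Fit the best decision stump (a depth-1 'tree') on a bootstrap sample."""
    best = None
    for f in range(2):                               # feature index
        for t in sorted({x[f] for x, _ in sample}):  # candidate thresholds
            for sign in (True, False):               # stump orientation
                acc = sum((((x[f] >= t) == sign) == bool(lab))
                          for x, lab in sample) / len(sample)
                if best is None or acc > best[0]:
                    best = (acc, f, t, sign)
    return best[1:]

def forest_predict(stumps, x):
    """Majority vote over the stump ensemble."""
    votes = sum((x[f] >= t) == sign for f, t, sign in stumps)
    return int(2 * votes >= len(stumps))

data = list(zip(X, y))
stumps = [train_stump([random.choice(data) for _ in data]) for _ in range(15)]
```

Bootstrap resampling plus voting is what gives the forest its robustness over any single tree.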
The latest works base their methodology on Deep Learning (DL). Chapter 4 extends
the classification of TB lesions. Namely, we introduce a computational model to infer
TB manifestations present in each lung lobe of CT scans by employing the associated
radiologist reports as ground truth. We do so instead of using the classical manually delimited
segmentation masks. The model adapts the three-dimensional V-Net architecture to a multitask
classification context in which the loss function is weighted by homoscedastic uncertainty.
Besides, the method employs Self-Normalizing Neural Networks (SNNs) for regularization.
Our results are promising, with a Root Mean Square Error of 1.14 in the number of nodules
and F1-scores above 0.85 for the most prevalent TB lesions (i.e., conglomerations, cavitations,
consolidations, trees in bud) when considering the whole lung.
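One common formulation of homoscedastic-uncertainty weighting for multitask losses is L = Σᵢ exp(−sᵢ)·Lᵢ + sᵢ, where sᵢ = log σᵢ² is a learned per-task parameter; tasks with high estimated noise are automatically down-weighted, and the +sᵢ term prevents the trivial solution of inflating all variances. A minimal numeric sketch with hypothetical per-task losses (not the thesis's exact loss):

```python
import math

def weighted_multitask_loss(task_losses, log_vars):
    """Multitask loss weighted by homoscedastic uncertainty:
    sum_i exp(-s_i) * L_i + s_i, with s_i = log(sigma_i^2) learned per task."""
    return sum(math.exp(-s) * L + s for L, s in zip(task_losses, log_vars))

# Hypothetical per-task losses and learned log-variances
# (e.g. one task per lesion type in a lung lobe).
losses = [0.9, 0.4, 1.5]
log_vars = [0.0, -0.5, 1.0]
total = weighted_multitask_loss(losses, log_vars)
```

In training, the log-variances would be optimized jointly with the network weights.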
In Chapter 5, we present a DL model capable of extracting disentangled information from
images of different animal models, as well as information about the mechanisms that generate
the CT volumes. The method provides the segmentation mask of axial slices from three
animal models of different species employing a single trained architecture. It also infers the
level of TB damage and generates counterfactual images. With this methodology, we thus
offer an alternative that promotes generalization and explainable AI models.
To sum up, this thesis presents a collection of valuable tools to automate the quantification
of pathological lungs and, moreover, extends the methodology to provide more explainable
results, which are vital for drug-development purposes. Chapter 6 elaborates on these
conclusions.
Programa de Doctorado en Multimedia y Comunicaciones [Doctoral Programme in Multimedia and Communications], Universidad Carlos III de Madrid and Universidad Rey Juan Carlos. President: María Jesús Ledesma Carbayo. Secretary: David Expósito Singh. Committee member: Clarisa Sánchez Gutiérre
Image Based Biomarkers from Magnetic Resonance Modalities: Blending Multiple Modalities, Dimensions and Scales.
The successful analysis and processing of medical
imaging data is a multidisciplinary work that requires the
application and combination of knowledge from diverse fields,
such as medical engineering, medicine, computer science and
pattern classification. Imaging biomarkers are biologic features
detectable by imaging modalities, and their use offers the prospect
of more efficient clinical studies and improvements in both
diagnosis and therapy assessment. The use of Dynamic Contrast
Enhanced Magnetic Resonance Imaging (DCE-MRI) and its
application to diagnosis and therapy has been extensively
validated; nevertheless, the issue of an appropriate or optimal
processing of the data that helps extract relevant biomarkers
to highlight differences between heterogeneous tissues still
remains. Together with DCE-MRI, the data extracted from
Diffusion MRI (DWI-MR and DTI-MR) represents a promising
and complementary tool. This project initially proposes the
exploration of diverse techniques and methodologies for the
characterization of tissue, following an analysis and classification
of voxel-level time-intensity curves from DCE-MRI data, mainly
through the exploration of dissimilarity-based representations
and models. We will explore metrics and representations to
correlate the multidimensional data acquired through diverse
imaging modalities, a work which starts with the appropriate
elastic registration methodology between DCE-MRI and DWI-
MR on the breast and its corresponding validation.
It has been shown that the combination of multi-modal MRI
images improves the discrimination of diseased tissue. However, the fusion
of dissimilar imaging data for classification and segmentation purposes is
not a trivial task: there are inherent differences in information domains,
dimensionality and scales. This work also proposes a multi-view consensus
clustering methodology for the integration of multi-modal MR images
into a unified segmentation of tumoral lesions for heterogeneity assessment. Using a variety
of metrics and distance functions, this multi-view imaging approach calculates multiple
vectorial dissimilarity spaces for each of the MRI modalities and makes use of the concepts
behind cluster ensembles to combine a set of base unsupervised segmentations into a unified
partition of the voxel-based data. The methodology is specifically designed for combining
DCE-MRI and DTI-MR, for which a
manifold learning step is implemented in order to account for the geometric constraints of the high-dimensional diffusion information.
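The cluster-ensemble combination can be sketched via a co-association matrix: count how often two voxels share a cluster across the base segmentations from each modality, then group voxels that co-occur in a majority of them. The toy labels and "modalities" below are hypothetical stand-ins for the actual dissimilarity-space clusterings:

```python
# Base unsupervised segmentations of six voxels from three hypothetical
# "modalities" (one cluster label per voxel).
base = [
    [0, 0, 1, 1, 2, 2],  # e.g. a DCE-MRI dissimilarity space
    [0, 0, 0, 1, 1, 1],  # e.g. DTI-MR after manifold learning
    [1, 1, 0, 0, 2, 2],
]
n = len(base[0])

# Co-association matrix: fraction of base partitions putting i and j together.
co = [[sum(p[i] == p[j] for p in base) / len(base) for j in range(n)]
      for i in range(n)]

def consensus(co, thresh=0.5):
    """Consensus partition: connected components over voxel pairs that
    share a cluster in a majority of the base segmentations."""
    labels = [-1] * n
    cur = 0
    for i in range(n):
        if labels[i] == -1:
            labels[i] = cur
            stack = [i]
            while stack:
                u = stack.pop()
                for v in range(n):
                    if labels[v] == -1 and co[u][v] > thresh:
                        labels[v] = cur
                        stack.append(v)
            cur += 1
    return labels

print(consensus(co))  # [0, 0, 1, 1, 2, 2]
```

The co-association evidence lets disagreeing modalities vote, so no single base segmentation dominates the unified partition.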
Similarity search and data mining techniques for advanced database systems.
Modern automated methods for measurement, collection, and analysis of data in industry and science are providing more and more data with drastically increasing structure complexity. On the one hand, this growing complexity is justified by the need for a richer and more precise description of real-world objects, on the other hand it is justified by the rapid progress in measurement and analysis techniques that allow the user a versatile exploration of objects. In order to manage the huge volume of such complex data, advanced database systems are employed. In contrast to conventional database systems that support exact match queries, the user of these advanced database systems focuses on applying similarity search and data mining techniques.
Based on an analysis of typical advanced database systems — such as biometrical, biological, multimedia, moving, and CAD-object database systems — the following three challenging characteristics of complexity are detected: uncertainty (probabilistic feature vectors), multiple instances (a set of homogeneous feature vectors), and multiple representations (a set of heterogeneous feature vectors). Therefore, the goal of this thesis is to develop similarity search and data mining techniques that are capable of handling uncertain, multi-instance, and multi-represented objects.
The first part of this thesis deals with similarity search techniques. Object identification is a similarity search technique that is typically used for the recognition of objects from image, video, or audio data. Thus, we develop a novel probabilistic model for object identification. Based on it, two novel types of identification queries are defined. In order to process the novel query types efficiently, we introduce an index structure called Gauss-tree. In addition, we specify further probabilistic models and query types for uncertain multi-instance objects and uncertain spatial objects. Based on the index structure, we develop algorithms for an efficient processing of these query types. Practical benefits of using probabilistic feature vectors are demonstrated on a real-world application for video similarity search. Furthermore, a similarity search technique is presented that is based on aggregated multi-instance objects, and that is suitable for video similarity search. This technique takes multiple representations into account in order to achieve better effectiveness.
The second part of this thesis deals with two major data mining techniques: clustering and classification. Since privacy preservation is a very important demand of distributed advanced applications, we propose using uncertainty for data obfuscation in order to provide privacy preservation during clustering. Furthermore, a model-based and a density-based clustering method for multi-instance objects are developed. Afterwards, original extensions and enhancements of the density-based clustering algorithms DBSCAN and OPTICS for handling multi-represented objects are introduced. Since several advanced database systems like biological or multimedia database systems handle predefined, very large class systems, two novel classification techniques for large class sets that benefit from using multiple representations are defined. The first classification method is based on the idea of a k-nearest-neighbor classifier. It employs a novel density-based technique to reduce training instances and exploits the entropy impurity of the local neighborhood in order to weight a given representation. The second technique addresses hierarchically-organized class systems. It uses a novel hierarchical, supervised method for the reduction of large multi-instance objects, e.g. audio or video, and applies support vector machines for efficient hierarchical classification of multi-represented objects. User benefits of this technique are demonstrated by a prototype that performs a classification of large music collections.
The effectiveness and efficiency of all proposed techniques are discussed and verified by comparison with conventional approaches in versatile experimental evaluations on real-world datasets.
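The entropy-weighted combination of representations in the first classification method can be sketched as follows: each representation casts k-nearest-neighbor votes, weighted by how pure (low-entropy) its local neighborhood is. Features, labels, and the weighting 1 − entropy (valid for two classes) are hypothetical illustrations, not the thesis's exact scheme:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a label multiset: neighborhood impurity."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def knn_labels(points, labels, q, k):
    """Labels of the k nearest 1-D training points to query q."""
    order = sorted(range(len(points)), key=lambda i: abs(points[i] - q))
    return [labels[order[i]] for i in range(k)]

# Two hypothetical representations of the same objects (e.g. two feature
# extractions from audio or video), sharing one set of class labels.
rep_a = [0.1, 0.2, 0.9, 1.0, 0.15, 0.95]
rep_b = [0.5, 0.4, 0.6, 0.5, 0.45, 0.55]   # a less discriminative view
labels = ["x", "x", "y", "y", "x", "y"]

def classify(qa, qb, k=3):
    votes = Counter()
    for rep, q in ((rep_a, qa), (rep_b, qb)):
        neigh = knn_labels(rep, labels, q, k)
        # Pure neighborhood (entropy 0) -> weight 1; mixed -> weight near 0.
        w = 1.0 - entropy(neigh)
        for lab in neigh:
            votes[lab] += w
    return votes.most_common(1)[0][0]
```

The noisy representation rep_b contributes little weight, so the discriminative representation rep_a dominates the vote.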
Deep Recurrent Networks for Gesture Recognition and Synthesis
It is hard to overstate the importance of gesture-based interfaces in many applications nowadays. The adoption of such interfaces stems from the opportunities they create for incorporating natural and fluid user interactions. This highlights the importance of having gesture recognizers that are not only accurate but also easy to adopt. The ever-growing popularity of machine learning has prompted many application developers to integrate automatic methods of recognition into their products. On the one hand, deep learning often tops the list of the most powerful and robust recognizers. These methods have been consistently shown to outperform all other machine learning methods in a variety of tasks. On the other hand, deep networks can be overwhelming to use for a majority of developers, requiring a lot of tuning and tweaking to work as expected. Additionally, these networks are infamous for their requirement for large amounts of training data, further hampering their adoption in scenarios where labeled data is limited. In this dissertation, we aim to bridge the gap between the power of deep learning methods and their adoption into gesture recognition workflows. To this end, we introduce two deep network models for recognition. These models are similar in spirit, but target different application domains: one is designed for segmented gesture recognition, while the other is suitable for continuous data, tackling segmentation and recognition problems simultaneously. The distinguishing characteristic of these networks is their simplicity, small number of free parameters, and their use of common building blocks that come standard with any modern deep learning framework, making them easy to implement, train and adopt. Through evaluations, we show that our proposed models achieve state-of-the-art results in various recognition tasks and application domains spanning different input devices and interaction modalities. 
We demonstrate that the infamy of deep networks, owing to their demand for powerful hardware as well as large amounts of data, is an unfair assessment. On the contrary, we show that in the absence of such data our proposed models can be quickly trained while achieving competitive recognition accuracy. Next, we explore the problem of synthetic gesture generation: a measure often taken to address the shortage of labeled data. We extend our proposed recognition models and demonstrate that the same models can be used in a Generative Adversarial Network (GAN) architecture for synthetic gesture generation. Specifically, we show that our original recognizer can be used as the discriminator in such frameworks, while a slightly modified version can act as the gesture generator. We then formulate a novel loss function for our gesture generator, which entirely replaces the need for a discriminator network in our generative model, thereby significantly reducing the complexity of our framework. Through evaluations, we show that our model is able to improve the recognition accuracy of multiple recognizers across a variety of datasets. Through user studies, we additionally show that human evaluators frequently mistake our synthetic samples for real ones, indicating that our synthetic samples are visually realistic. Additional resources for this dissertation (such as demo videos and public source code) are available at https://www.maghoumi.com/dissertatio