104 research outputs found
Learning Multimodal Structures in Computer Vision
A phenomenon or event can be received from various kinds of detectors or under different conditions. Each such acquisition framework is a modality of the phenomenon. Due to the relation between the modalities of multimodal phenomena, a single modality cannot fully describe the event of interest. Since several modalities report on the same event introduces new challenges comparing to the case of exploiting each modality separately.
We are interested in designing new algorithmic tools to apply sensor fusion techniques in the particular signal representation of sparse coding which is a favorite methodology in signal processing, machine learning and statistics to represent data. This coding scheme is based on a machine learning technique and has been demonstrated to be capable of representing many modalities like natural images. We will consider situations where we are not only interested in support of the model to be sparse, but also to reflect a-priorily known knowledge about the application in hand.
Our goal is to extract a discriminative representation of the multimodal data that leads to easily finding its essential characteristics in the subsequent analysis step, e.g., regression and classification. To be more precise, sparse coding is about representing signals as linear combinations of a small number of bases from a dictionary. The idea is to learn a dictionary that encodes intrinsic properties of the multimodal data in a decomposition coefficient vector that is favorable towards the maximal discriminatory power.
We carefully design a multimodal representation framework to learn discriminative feature representations by fully exploiting, the modality-shared which is the information shared by various modalities, and modality-specific which is the information content of each modality individually. Plus, it automatically learns the weights for various feature components in a data-driven scheme. In other words, the physical interpretation of our learning framework is to fully exploit the correlated characteristics of the available modalities, while at the same time leverage the modality-specific character of each modality and change their corresponding weights for different parts of the feature in recognition
CELL PATTERN CLASSIFICATION OF INDIRECT IMMUNOFLUORESCENCE IMAGES
Ph.DDOCTOR OF PHILOSOPH
A Survey on Deep Learning in Medical Image Analysis
Deep learning algorithms, in particular convolutional networks, have rapidly
become a methodology of choice for analyzing medical images. This paper reviews
the major deep learning concepts pertinent to medical image analysis and
summarizes over 300 contributions to the field, most of which appeared in the
last year. We survey the use of deep learning for image classification, object
detection, segmentation, registration, and other tasks and provide concise
overviews of studies per application area. Open challenges and directions for
future research are discussed.Comment: Revised survey includes expanded discussion section and reworked
introductory section on common deep architectures. Added missed papers from
before Feb 1st 201
Exploiting Cross Domain Relationships for Target Recognition
Cross domain recognition extracts knowledge from one domain to recognize samples from another domain of interest. The key to solving problems under this umbrella is to find out the latent connections between different domains. In this dissertation, three different cross domain recognition problems are studied by exploiting the relationships between different domains explicitly according to the specific real problems.
First, the problem of cross view action recognition is studied. The same action might seem quite different when observed from different viewpoints. Thus, how to use the training samples from a given camera view and perform recognition in another new view is the key point. In this work, reconstructable paths between different views are built to mirror labeled actions from one source view into one another target view for learning an adaptable classifier. The path learning takes advantage of the joint dictionary learning techniques with exploiting hidden information in the seemingly useless samples, making the recognition performance robust and effective.
Second, the problem of person re-identification is studied, which tries to match pedestrian images in non-overlapping camera views based on appearance features. In this work, we propose to learn a random kernel forest to discriminatively assign a specific distance metric to each pair of local patches from the two images in matching. The forest is composed by multiple decision trees, which are designed to partition the overall space of local patch-pairs into substantial subspaces, where a simple but effective local metric kernel can be defined to minimize the distance of true matches.
Third, the problem of multi-event detection and recognition in smart grid is studied. The signal of multi-event might not be a straightforward combination of some single-event signals because of the correlation among devices. In this work, a concept of ``root-pattern\u27\u27 is proposed that can be extracted from a collection of single-event signals, but also transferable to analyse the constituent components of multi-cascading-event signals based on an over-complete dictionary, which is designed according to the ``root-patterns\u27\u27 with temporal information subtly embedded.
The correctness and effectiveness of the proposed approaches have been evaluated by extensive experiments
Bayesian nonparametric models for data exploration
Mención Internacional en el título de doctorMaking sense out of data is one of the biggest challenges of our time. With the emergence of
technologies such as the Internet, sensor networks or deep genome sequencing, a true data explosion
has been unleashed that affects all fields of science and our everyday life. Recent breakthroughs, such
as self-driven cars or champion-level Go player programs, have demonstrated the potential benefits
from exploiting data, mostly in well-defined supervised tasks. However, we have barely started to
actually explore and truly understand data.
In fact, data holds valuable information for answering most important questions for humanity:
How does aging impact our physical capabilities? What are the underlying mechanisms of cancer?
Which factors make countries wealthier than others? Most of these questions cannot be stated as
well-defined supervised problems, and might benefit enormously from multidisciplinary research
efforts involving easy-to-interpret models and rigorous data exploratory analyses. Efficient data exploration
might lead to life-changing scientific discoveries, which can later be turned into a more impactful
exploitation phase, to put forward more informed policy recommendations, decision-making
systems, medical protocols or improved models for highly accurate predictions.
This thesis proposes tailored Bayesian nonparametric (BNP) models to solve specific data exploratory
tasks across different scientific areas including sport sciences, cancer research, and economics.
We resort to BNP approaches to facilitate the discovery of unexpected hidden patterns
within data. BNP models place a prior distribution over an infinite-dimensional parameter space,
which makes them particularly useful in probabilistic models where the number of hidden parameters
is unknown a priori. Under this prior distribution, the posterior distribution of the hidden parameters
given the data will assign high probability mass to those configurations that best explain the
observations. Hence, inference over the hidden variables can be performed using standard Bayesian
inference techniques, therefore avoiding expensive model selection steps.
This thesis is application-focused and highly multidisciplinary. More precisely, we propose an
automatic grading system for sportive competitions to compare athletic performance regardless of
age, gender and environmental aspects; we develop BNP models to perform genetic association
and biomarker discovery in cancer research, either using genetic information and Electronic Health
Records or clinical trial data; finally, we present a flexible infinite latent factor model of international
trade data to understand the underlying economic structure of countries and their evolution over time.Uno de los principales desafíos de nuestro tiempo es encontrar sentido dentro de los datos. Con la aparición de tecnologías como Internet, redes de sensores, o métodos de secuenciación profunda
del genoma, una verdadera explosión digital se ha visto desencadenada, afectando todos los campos científicos, así como nuestra vida diaria. Logros recientes como pueden ser los coches auto-dirigidos o programas que ganan a los seres humanos al milenario juego del Go, han demostrado con creces los posibles beneficios que podemos obtener de la explotación de datos, mayoritariamente en tareas
supervisadas bien definidas. No obstante, apenas hemos empezado con la exploración de datos y su verdadero entendimiento.
En verdad, los datos encierran información muy valiosa para responder a muchas de las preguntas
más importantes para la humanidad: ¿Cómo afecta el envejecimiento a nuestras aptitudes físicas?
¿Cuáles son los mecanismos subyacentes del cáncer? ¿Qué factores explican la riqueza de ciertos
países frente a otros? Si bien la mayoría de estas preguntas no pueden formularse como problemas
supervisados bien definidos, éstas pueden ser abordadas mediante esfuerzos de investigación
multidisciplinar que involucren modelos fáciles de interpretar y análisis exploratorios rigurosos. Explorar los datos de manera eficiente abre potencialmente la puerta a un sinnúmero de descubrimientos
científicos en diversas áreas con impacto real en nuestras vidas, descubrimientos que a su vez pueden llevarnos a una mejor explotación de los datos, resultando en recomendaciones políticas adecuadas, sistemas precisos de toma de decisión, protocolos médicos optimizados o modelos con mejores capacidades
predictivas. Esta tesis propone modelos Bayesianos no-paramétricos (BNP) adecuados para la resolución específica de tareas explorativas de los datos en diversos ámbitos científicos incluyendo ciencias del
deporte, investigación contra el cáncer, o economía. Recurrimos a un planteamiento BNP para facilitar el descubrimiento de patrones ocultos inesperados subyacentes en los datos. Los modelos
BNP definen una distribución a priori sobre un espacio de parámetros de dimensión infinita, lo cual los hace especialmente atractivos para enfoques probabilísticos donde el número de parámetros latentes
es en principio desconocido. Bajo dicha distribución a priori, la distribución a posteriori de los parámetros ocultos dados los datos asignará mayor probabilidad a aquellas configuraciones que
mejor explican las observaciones. De esta manera, la inferencia sobre el espacio de variables ocultas puede realizarse mediante técnicas estándar de inferencia Bayesiana, evitando el proceso de selección
de modelos. Esta tesis se centra en el ámbito de las aplicaciones, y es de naturaleza multidisciplinar. En
concreto, proponemos un sistema de gradación automática para comparar el rendimiento deportivo
de atletas independientemente de su edad o género, así como de otros factores del entorno. Desarrollamos
modelos BNP para descubrir asociaciones genéticas y biomarcadores dentro de la investigación contra el cáncer, ya sea contrastando información genética con la historia clínica electrónica
de los pacientes, o utilizando datos de ensayos clínicos; finalmente, presentamos un modelo flexible
de factores latentes infinito para datos de comercio internacional, con el objetivo de entender la
estructura económica de los distintos países y su correspondiente evolución a lo largo del tiempo.Programa Oficial de Doctorado en Multimedia y ComunicacionesPresidente: Joaquín Míguez Arenas.- Secretario: Daniel Hernández Lobato.- Vocal: Cédric Archambea
Medical Image Analysis using Deep Relational Learning
In the past ten years, with the help of deep learning, especially the rapid
development of deep neural networks, medical image analysis has made remarkable
progress. However, how to effectively use the relational information between
various tissues or organs in medical images is still a very challenging
problem, and it has not been fully studied. In this thesis, we propose two
novel solutions to this problem based on deep relational learning. First, we
propose a context-aware fully convolutional network that effectively models
implicit relation information between features to perform medical image
segmentation. The network achieves the state-of-the-art segmentation results on
the Multi Modal Brain Tumor Segmentation 2017 (BraTS2017) and Multi Modal Brain
Tumor Segmentation 2018 (BraTS2018) data sets. Subsequently, we propose a new
hierarchical homography estimation network to achieve accurate medical image
mosaicing by learning the explicit spatial relationship between adjacent
frames. We use the UCL Fetoscopy Placenta dataset to conduct experiments and
our hierarchical homography estimation network outperforms the other
state-of-the-art mosaicing methods while generating robust and meaningful
mosaicing result on unseen frames.Comment: arXiv admin note: substantial text overlap with arXiv:2007.0778
On Improving Generalization of CNN-Based Image Classification with Delineation Maps Using the CORF Push-Pull Inhibition Operator
Deployed image classification pipelines are typically dependent on the images captured in real-world environments. This means that images might be affected by different sources of perturbations (e.g. sensor noise in low-light environments). The main challenge arises by the fact that image quality directly impacts the reliability and consistency of classification tasks. This challenge has, hence, attracted wide interest within the computer vision communities. We propose a transformation step that attempts to enhance the generalization ability of CNN models in the presence of unseen noise in the test set. Concretely, the delineation maps of given images are determined using the CORF push-pull inhibition operator. Such an operation transforms an input image into a space that is more robust to noise before being processed by a CNN. We evaluated our approach on the Fashion MNIST data set with an AlexNet model. It turned out that the proposed CORF-augmented pipeline achieved comparable results on noise-free images to those of a conventional AlexNet classification model without CORF delineation maps, but it consistently achieved significantly superior performance on test images perturbed with different levels of Gaussian and uniform noise
- …