Finite Fields: Theory and Applications
Finite fields are the focal point of many interesting geometric, algorithmic and combinatorial problems. The workshop was devoted to progress on these questions, with an eye also on the important applications of finite field techniques in cryptography, error-correcting codes, and random number generation.
Probabilistic Models and Natural Language Processing in Health
The treatment of mental disorders still entails a wide variety of unsolved problems, such as misdiagnosis or delayed diagnosis. In this doctoral thesis we study and develop models that can serve as potential tools for clinical practice. Our proposals follow two main lines of research: Natural Language Processing and probabilistic methods.
In Chapter 2 we begin with a regularization mechanism for language models, especially effective in Transformer-based architectures, which we call NoRBERT (Noisy Regularized Bidirectional Representations from Transformers) [9], [15]. According to the literature, regularization in NLP is an underexplored field, limited to general mechanisms such as dropout [57] or early stopping [58]. In this landscape, we propose a novel approach that combines any language model with Variational Auto-Encoders [23]. VAEs are deep generative models that construct a regular latent space, permitting the reconstruction of input samples through encoder and decoder networks. Our VAE is based on a Gaussian-mixture prior (GMVAE), which gives the model the chance to capture multimodal information. Combining Transformers and GMVAEs, we build an architecture capable of imputing missing words from a text corpus over a diverse topic space, as well as improving the BLEU score in the reconstruction of the database. Both results depend on the depth of the regularized layer in the Transformer encoder. The regularization, in essence, consists of the GMVAE reconstruction of the Transformer embeddings at some point in the architecture, adding structured noise that helps the model generalize better. We show improvements over BERT [15], RoBERTa [16] and XLM-R [17], verified on different datasets, and we also provide explicit examples of sentences reconstructed by Top NoRBERT. In addition, we validate the abilities of our model in data augmentation, improving classification accuracy and F1 score in various datasets and scenarios thanks to samples generated by NoRBERT. We study several variants of the model, Top, Deep and Contextual NoRBERT, the latter based on the use of contextual words to reconstruct the embeddings in the corresponding Transformer layer.
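The GMVAE reconstruction step can be caricatured as follows. This is a minimal numpy sketch, not the thesis architecture: the sizes, the tanh encoder, and the nearest-component latent assignment are illustrative assumptions standing in for a trained GMVAE.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: d = Transformer hidden size, k = latent size, m = mixture components
d, k, m = 8, 3, 4
W_enc = rng.standard_normal((d, k))
W_dec = rng.standard_normal((k, d))
means = rng.standard_normal((m, k))   # Gaussian-mixture component means

def gmvae_regularize(h, noise_scale=0.1):
    """Replace layer embeddings h (n x d) by a GMVAE-style reconstruction:
    encode, snap each latent to its nearest mixture component, perturb,
    and decode. The reconstruction acts as structured noise injected
    back into the Transformer at the regularized layer."""
    z = np.tanh(h @ W_enc)                                # encoder
    dist = ((z[:, None, :] - means[None]) ** 2).sum(-1)   # n x m distances
    z_mix = means[dist.argmin(1)]                         # nearest component mean
    z_noisy = z_mix + noise_scale * rng.standard_normal(z_mix.shape)
    return z_noisy @ W_dec                                # decoder

h = rng.standard_normal((5, d))       # embeddings of 5 tokens
h_reg = gmvae_regularize(h)
```

In the real model the reconstructed embeddings replace the originals at a chosen encoder depth, which is why the results depend on which layer is regularized.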
We continue the Transformer line of research in Chapter 3 by proposing PsyBERT. PsyBERT, as its name suggests, is a BERT-based [15] architecture suitably modified to work on Electronic Health Records (EHRs) from psychiatric patients. It is inspired by BEHRT [19], also devoted to EHRs in general health. Our model differs in its training methodology and its embedding layer. As with NoRBERT, we find it useful to apply a Masked Language Modeling (MLM) policy without any fine-tuning or task-specific layer at all. Whereas in NoRBERT we used MLM to impute missing words, with the model ultimately generating new sentences from inputs with missing information, with PsyBERT we first propose MLM as a tool to fill in missing diagnoses in the EHR and to correct misdiagnosed cases. We then also apply PsyBERT to delusional disorder detection. In this scenario, by contrast, we add a multi-label classification layer that computes the probability of each diagnosis at the patient's last hospital visit. From these probabilities, we analyse delusional cases and propose a tool to detect potential candidates for this mental disorder. In both tasks we make use of several fields obtained from the patient's EHR, such as age, sex, diagnoses and treatments from the psychiatric history, and propose a method capable of combining heterogeneous data to support diagnosis in mental health. Throughout these works we point out the problems with the quality of EHR data [104], [105] and the great advantage that medical assistance tools like our model can provide. We not only solve a classification problem with more than 700 different illnesses, but also bring a model that helps doctors diagnose very complex scenarios, with comorbidity, long periods of patient exploration under traditional methodology, or low-prevalence cases. We present a powerful method for a problem of great necessity.
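The multi-label classification layer described above can be sketched like this; the hidden size, weights, and threshold are hypothetical, and only the independent-sigmoid-per-code structure reflects what the abstract states.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical sizes: hidden size d, one output per diagnosis code
d, n_codes = 16, 700
W = 0.01 * rng.standard_normal((d, n_codes))
b = np.zeros(n_codes)

def diagnosis_probabilities(h_visit):
    """Multi-label head: one independent sigmoid per diagnosis code,
    so several diagnoses (comorbidity) can be active at the last visit,
    unlike a softmax that would force a single label."""
    return sigmoid(h_visit @ W + b)

def flag_candidates(probs, code_index, threshold=0.5):
    """Flag patients whose probability for a given code (e.g. delusional
    disorder) exceeds a threshold -- the candidate-detection step."""
    return np.flatnonzero(probs[:, code_index] >= threshold)

H = rng.standard_normal((3, d))        # pooled representations of 3 patients
P = diagnosis_probabilities(H)
```

The sigmoid (rather than softmax) output is what makes this a multi-label problem over the 700+ diagnosis codes.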
Following the health line of research and its psychiatric application, in Chapter 4 we analyse a probabilistic method to search for behavioral patterns in patients with mental disorders. In this case the contribution of the work is not the method itself but its application and results, in collaboration with clinical interpretation. The model, SPFM (Sparse Poisson Factorization Model) [22], is a non-parametric probabilistic model based on the Indian Buffet Process (IBP) [20], [21]. It is an exploratory method capable of decomposing the input data into sparse matrices. To do so, it imposes a Poisson distribution on the product of two matrices, Z and B, obtained respectively from the IBP and a Gamma distribution. Z is a binary matrix representing the latent features active in a patient's data, and B weights the contribution of the data characteristics to the latent features. The data used in the three works described in this chapter come from e-health questionnaires: the data characteristics are the answers or scores on each question, and the latent features correspond to different behavioral patterns, depending on which features are active in a patient's questionnaires. For example, patient X may present features 1 and 2 while patient Y presents features 1 and 3, resulting in two different behavioral profiles. With this procedure we study three scenarios. In the first, we relate the profiles to the diagnoses, finding common patterns among patients and connections between diseases. We also analyse the degree of critical state and contrast it with the clinician's judgment via the Clinical Global Impression (CGI). In the second scenario, we pursue a similar study and find connections between disturbed sleeping patterns and clinical markers of a wish to die. We focus this analysis on patients with suicidal thoughts, given the major public health issue these cases represent [175]. Here we vary the questionnaire and the data sample, obtaining different profiles that are likewise valuable for the psychiatrist to interpret. The main contribution of this work is to provide a mechanism capable of helping with the detection and prevention of suicide. Finally, the third work comprises a behavioral pattern study of mental health patients before and during the COVID-19 lockdown. We did not want to miss the chance to contribute during the coronavirus outbreak, so we present a study of the changes in psychiatric patients during the state of alarm. We analyse the profiles again with the previous e-health questionnaire and discover that self-reported suicide risk decreased during the lockdown. These results contrast with other studies [237] and may signal an increase in suicidal ideation once the crisis ceases.
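The generative structure described above (Poisson observations on the product of a binary IBP matrix Z and a Gamma-distributed B) can be sketched as follows. Assumptions are flagged in the comments: the fixed feature count and Bernoulli allocations are a toy stand-in for a proper IBP sampler.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_spfm(n_patients, n_questions, alpha=2.0, p_active=0.3):
    """Generative sketch of the Sparse Poisson Factorization Model:
    a binary matrix Z of active latent behavioral features times a
    Gamma-distributed loading matrix B gives the Poisson rate of each
    questionnaire answer. (The Poisson feature count and Bernoulli
    allocations below are a simplification of the IBP prior.)"""
    K = max(1, rng.poisson(alpha))                           # number of latent features
    Z = (rng.random((n_patients, K)) < p_active).astype(float)  # patient profiles
    B = rng.gamma(1.0, 1.0, size=(K, n_questions))           # feature-to-question weights
    X = rng.poisson(Z @ B)                                   # observed answers/scores
    return Z.astype(int), B, X
```

Reading off a patient's row of Z gives their behavioral profile: two patients with different active-feature sets (e.g. {1, 2} vs {1, 3}) yield the two distinct profiles mentioned above.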
Finally, Chapter 5 proposes a regularization mechanism, based on a theoretical idea from [245], to obtain a variance reduction in the true risk. We interpret the robust regularized risk that those authors propose as a two-step mechanism formed by the minimization of a weighted risk and the maximization of a robust objective, and we suggest applying this methodology to the selection of samples for the mini-batch in a deep learning setup. We study different variations of repeating the worst-performing samples from the previous mini-batch during training, and show evidence of improved accuracy and faster convergence rates on an image classification problem with different architectures and datasets.
Programa de Doctorado en Multimedia y Comunicaciones por la Universidad Carlos III de Madrid y la Universidad Rey Juan Carlos. Presidente: Joaquín Míguez Arenas. Secretario: Francisco Jesús Rodríguez Ruiz. Vocal: Santiago Ovejero Garcí
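The worst-sample repetition scheme of Chapter 5 can be sketched as a simple selection rule; the carry-over count k and the fill-with-fresh policy are illustrative choices, not necessarily the variant studied in the thesis.

```python
import numpy as np

def next_batch_indices(prev_indices, prev_losses, fresh_indices, k):
    """Mini-batch selection heuristic: carry over the k worst-loss
    samples from the previous mini-batch and fill the remaining slots
    with fresh samples, so hard examples are re-presented immediately."""
    order = np.argsort(prev_losses)[::-1]            # indices by descending loss
    worst = [prev_indices[i] for i in order[:k]]
    return worst + list(fresh_indices[: len(prev_indices) - k])
```

This realizes the two-step view above in miniature: the "maximization" step picks the worst performers, and the usual SGD step then minimizes the (implicitly re-weighted) risk over the resulting batch.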
Learned simulation as the engine of physical scene understanding
Cognition evokes the human abilities of reasoning, communication, and interaction. This includes the interpretation of real-world physics so as to understand its underlying laws. Some theories postulate the similarity of human reasoning about these phenomena to simulation for physical scene understanding, which combines perception, for comprehension of the current dynamical state, and reasoning, for predicting the time evolution of a given system. In this context, we propose the development of a system for learned simulation. Given a design objective, an algorithm is trained to learn an approximation of the real dynamics, so as to build a digital twin of the environment. The underlying physics is then emulated with information coming from observations of the scene. For this purpose, we use a commodity camera to acquire data exclusively from video recordings. We focus on the sloshing problem as a benchmark. Fluids are present in many daily actions and pose a physically rich challenge for the proposed system. They are highly deformable, nonlinear, and exhibit a dominant dissipative behavior, making them a complex entity to emulate. In addition, we only have access to partial measurements of their dynamical state, since a commodity camera provides information only about the free surface. The result is a system capable of perceiving and reasoning about the dynamics of a fluid. This cognitive digital twin provides an interpretation of the state of the fluid to integrate its dynamical evolution in real time, updated with information observed from the physical twin. The system, originally trained for one liquid, adapts itself to any other fluid through reinforcement learning, producing accurate results for previously unseen liquids. Augmented reality is used to offer the user a visual interpretation of the solutions, including information about the dynamics that is not accessible to the human eye. This objective is achieved through manifold learning and machine learning techniques, such as neural networks, enriched with physical information. We use inductive biases based on the knowledge of thermodynamics to develop machine intelligence systems that fulfill these principles and provide meaningful solutions for the dynamics. This problem is one of the main targets in fluid manipulation for the development of robotic systems: in actions such as pouring or moving, sloshing dynamics play a capital role in the correct performance of aiding systems for the elderly and of industrial applications that involve liquids.
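One widely used thermodynamic inductive bias for learned simulators of dissipative systems (stated here as background; the abstract does not spell out the exact formulation used) is the GENERIC structure, which splits the dynamics into a conservative and a dissipative part:

```latex
\dot{z} \;=\; L(z)\,\nabla E(z) \;+\; M(z)\,\nabla S(z),
\qquad L = -L^{\top}, \quad M \succeq 0,
\qquad L\,\nabla S = 0, \quad M\,\nabla E = 0 .
```

Here $E$ is the energy and $S$ the entropy of the state $z$; the antisymmetry of $L$, the positive semidefiniteness of $M$, and the two degeneracy conditions guarantee energy conservation and non-negative entropy production, which is what makes the learned predictions thermodynamically meaningful.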
Training deep retrieval models with noisy datasets
In this thesis we study loss functions that allow training Convolutional Neural Networks (CNNs) on noisy datasets for the particular task of Content-Based Image Retrieval (CBIR). In particular, we propose two novel losses to fit models that generate global image representations. First, a Soft-Matching (SM) loss, exploiting both image content and metadata, is used to specialize general CNNs to particular cities or regions using weakly annotated datasets. Second, a Bag Exponential (BE) loss, inspired by the Multiple Instance Learning (MIL) framework, is employed to train CNNs for CBIR on noisy datasets.
The first part of the thesis introduces a novel training framework that, relying on image content and metadata, learns location-adapted deep models providing fine-tuned image descriptors for specific visual contents. Our networks, which start from a baseline model originally learned for a different task, are specialized using a custom pairwise loss function, our proposed SM loss, that uses weak labels based on image content and metadata.
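A generic pairwise loss of this kind can be sketched as follows; this is a standard contrastive-style stand-in, since the exact form of the SM loss is not given in this abstract.

```python
import numpy as np

def pairwise_loss(d, same, margin=1.0):
    """Generic contrastive-style pairwise loss over descriptor distances d:
    pairs weakly labeled as matching (same == 1) are pulled together,
    the rest are pushed beyond a margin. The weak labels would come
    from image content and metadata, as described above."""
    d = np.asarray(d, dtype=float)
    same = np.asarray(same, dtype=float)
    return same * d ** 2 + (1.0 - same) * np.maximum(0.0, margin - d) ** 2
```

Fine-tuning a baseline network with such a loss is what adapts its descriptors to the visual peculiarities of one city or region.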
The experimental results show that the proposed location-adapted CNNs achieve an improvement of up to 55% over the baseline networks on a landmark discovery task. This implies that the models successfully learn the visual clues and peculiarities of the region for which they are trained, and generate image descriptors that are better location-adapted. In addition, for landmarks that are not present in the training set, or even in other cities, our proposed models perform at least as well as the baseline network, which indicates good resilience against overfitting.
The second part of the thesis introduces the BE loss function to train CNNs for image retrieval, borrowing inspiration from the MIL framework. The loss combines an exponential function acting as a soft margin with a MIL-based mechanism working with bags of positive and negative pairs of images. The method allows deep retrieval networks to be trained on noisy datasets by weighing the influence of the different samples at the loss level, which increases the performance of the generated global descriptors. The rationale behind the improvement is that we handle noise in an end-to-end manner, thereby avoiding its negative influence as well as the unintentional biases due to fixed pre-processing cleaning procedures. In addition, our method is general enough to suit other scenarios requiring different weights for the training instances (e.g. boosting the influence of hard positives during training). The proposed bag exponential function can be seen as a back door to guide the learning process according to a certain objective in an end-to-end manner, allowing the model to approach such an objective smoothly and progressively.
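One way to realize this kind of loss-level sample weighting is sketched below. Both the exponential weighting scheme and the aggregation are illustrative assumptions: the published BE loss may take a different functional form, and only the idea of smoothly down-weighting extreme (presumably noisy) pairs within a bag is drawn from the text.

```python
import numpy as np

def bag_exponential_weights(losses, beta=2.0):
    """Exponential down-weighting within a bag of pair losses: the
    larger a pair's loss (a likely label-noise symptom), the smaller
    its normalized weight -- a smooth alternative to hard filtering."""
    w = np.exp(-beta * np.asarray(losses, dtype=float))
    return w / w.sum()

def bag_loss(pair_losses, beta=2.0):
    """Weighted aggregate over one bag of positive/negative pairs."""
    w = bag_exponential_weights(pair_losses, beta)
    return float(np.sum(w * pair_losses))
```

Flipping the sign of `beta` would instead boost hard positives, matching the other scenario mentioned above where training instances need different weights.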
Our results show that our loss allows CNN-based retrieval systems to be trained with noisy training sets and achieve state-of-the-art performance. Furthermore, we have found that it is better to use training sets that are highly correlated with the final task, even if they are noisy, than to train with a clean set that is only weakly related to the topic at hand. From our point of view, this result represents a big leap in the applicability of retrieval systems and helps to reduce the effort needed to set up new CBIR applications, e.g. by allowing fast automatic generation of noisy training datasets and then using our bag exponential loss to deal with the noise. Moreover, we also consider that this result opens a new line of research for CNN-based image retrieval: let the models decide not only on the best features to solve the task but also on the most relevant samples to do it.
Programa de Doctorado en Multimedia y Comunicaciones por la Universidad Carlos III de Madrid y la Universidad Rey Juan Carlos. Presidente: Luis Salgado Álvarez de Sotomayor. Secretario: Pablos Martínez Olmos. Vocal: Ernest Valveny Llobe
Classification and regression with functional data: a mathematical optimization approach.
The goal of this PhD dissertation is to develop new approaches for supervised classification and regression in Functional Data Analysis. In particular, the mathematical optimization tools analyzed in this thesis exploit the functional nature of the data, leading to novel strategies which may outperform the standard methodologies and link mathematics with real-life applications. Chapter 1 presents the main ideas, challenges and the notation used in this thesis. Chapter 2 addresses the problem of selecting a finite set of time instants which best classify multivariate functional data into two predefined classes. Using not only the information provided by the function itself but also its high-order derivatives proves crucial to improve accuracy. To do this, a continuous bilevel optimization problem is solved, combining the resolution of the well-known SVM (Support Vector Machine) technique with the maximization of the correlation between the class label and the associated score function. Chapter 3 also focuses on binary classification using SVM. However, instead of finding the most important time instants, here we define a functional bandwidth in the so-called kernel function. In this way, accuracy may be improved and the most relevant intervals of the domain of the function, according to their classification ability, are identified, enhancing interpretability. A bilevel optimization problem is formulated and solved by means of an alternating procedure. Chapter 4 focuses on classifying so-called hybrid functional data, i.e., data formed by functional and static (constant over time) covariates. The goal is to select the features, functional or static, which best classify. An anisotropic kernel which associates a scalar bandwidth to each feature is defined. As in previous chapters, an alternating approach is proposed to solve the resulting bilevel optimization problem. Chapter 5 generalizes the variable selection problem presented in Chapter 2 to regression. The solution approach combines the SVR (Support Vector Regression) problem with the minimization of the sum of squared residuals between the actual and predicted responses. An alternating heuristic is developed to handle the model. All the methodologies presented in this dissertation are tested on synthetic and real data sets, showing their applicability.
Premio Extraordinario de Doctorado U
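The anisotropic kernel of Chapter 4 has a standard form that can be sketched directly; the Gaussian shape below is illustrative, while the one-bandwidth-per-feature idea is exactly what the abstract describes.

```python
import numpy as np

def anisotropic_rbf(x, y, bandwidths):
    """Anisotropic Gaussian kernel: one non-negative bandwidth per
    covariate, so driving a bandwidth to zero switches its feature off
    entirely -- this is what makes the kernel a feature selector.
    (In the thesis the bandwidths are tuned by bilevel optimization.)"""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, bandwidths))
    return float(np.exp(-np.sum(w * (x - y) ** 2)))
```

For example, with bandwidths `[0.0, 1.0]` the kernel ignores the first covariate completely, however different the two inputs are in that coordinate.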
On the controllability of Partial Differential Equations involving non-local terms and singular potentials
In this thesis, we investigate controllability and observability properties of Partial Differential Equations describing various phenomena appearing in several fields of the applied sciences such as elasticity theory, ecology, anomalous transport and diffusion, material science, porous media flow and quantum mechanics. In particular, we focus on evolution Partial Differential Equations with non-local and singular terms.
Concerning non-local problems, we analyse the interior controllability of a Schrödinger and a wave-type equation in which the Laplace operator is replaced by the fractional Laplacian. Under appropriate assumptions on the order of the fractional Laplace operator involved, we prove the exact null controllability of both equations, employing a control supported in a neighbourhood of the boundary of a bounded domain. More precisely, we show that both the Schrödinger and the wave equation are null-controllable, for and for respectively. Furthermore, these exponents are sharp and controllability fails for (resp. ) for the Schrödinger (resp. wave) equation. Our proof is based on multiplier techniques and the very classical Hilbert Uniqueness Method.
For models involving singular terms, we first address the boundary controllability problem for a one-dimensional heat equation with a singular inverse-square potential, whose singularity is localised at one extreme of the space interval in which the PDE is defined. For all , we obtain the null controllability of the equation, acting with a control located at , which is both a boundary point and the pole of the potential. This result follows from analogous ones presented in [gueye2014exact] for parabolic equations with variable degenerate coefficients.
Finally, we study the interior controllability of a heat equation with a singular inverse-square potential involving the distance to the boundary of a bounded domain , . For all (the critical Hardy constant associated to the potential), we obtain null controllability employing a control supported in an open subset . Moreover, we show that this upper bound is sharp. Our proof relies on a new Carleman estimate, obtained employing a weight properly designed to compensate the singularities of the potential.
Topological Data Analysis of High-dimensional Correlation Structures with Applications in Epigenetics
This thesis comprises a comprehensive study of the correlation of high-dimensional datasets from a topological perspective. Motivated by the lack of efficient algorithms for big data analysis and by the importance of finding a structure of correlations in genomics, we have developed two analytical tools, inspired by the topological data analysis approach, that describe and predict the behavior of the correlated design. These models allowed us to study epigenetic interactions from a local and a global perspective, taking into account the different levels of complexity. We applied graph-theoretic and algebraic topology principles to quantify structural patterns in local correlation networks and, based on them, we proposed a network model able to predict the locally high correlations of DNA methylation data. This model provides an efficient tool to measure the evolution of the correlation with the aging process. Furthermore, we developed a powerful computational algorithm to analyze the correlation structure globally, able to detect differentiated methylation patterns over sample groups. This methodology aims to serve as a diagnostic tool, as it provides selected epigenetic biomarkers associated with a specific phenotype of interest. Overall, this work establishes a novel perspective for the analysis and modulation of hidden correlation structures, specifically those of great dimension and complexity, contributing to the understanding of epigenetic processes, and is designed to be useful for non-biological fields too.
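The starting point for such local correlation networks can be sketched as follows; the thresholding rule is the usual generic construction, not necessarily the exact network model of the thesis.

```python
import numpy as np

def correlation_graph(X, threshold=0.8):
    """Adjacency matrix of a local correlation network: connect two
    features (e.g. DNA methylation sites, the columns of X) whenever
    the absolute Pearson correlation of their profiles across samples
    exceeds the threshold. Graph-theoretic and topological descriptors
    are then computed on this network."""
    C = np.corrcoef(X, rowvar=False)       # feature-by-feature correlations
    A = np.abs(C) > threshold
    np.fill_diagonal(A, False)             # no self-loops
    return A
```

Tracking how the resulting graph's structure changes across age groups is one way to measure the evolution of correlation with aging, as described above.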
Bioinformatic analysis and deep learning on large-scale human transcriptomic data: studies on aging, Alzheimer’s neurodegeneration and cancer
The general objective of the project has been the integrative bioinformatic analysis of multiple proteomics and genomics data, combined with associated clinical data, in search of biomarkers and causal polygenic modules applied to complex diseases; mainly cancer of unknown primary origin, in its different types and subtypes, and neurodegenerative (ND) diseases, mostly Alzheimer's, as well as age-related neurodegeneration. In addition, intensive use has been made of artificial intelligence techniques, more specifically deep learning neural network techniques, for the analysis and prognosis of these diseases.