
    Probabilistic Models and Natural Language Processing in Health

    The treatment of mental disorders nowadays entails a wide variety of unsolved problems, such as misdiagnosis or delayed diagnosis. In this doctoral thesis we study and develop different models that can serve as potential tools for clinical practice. Among our proposals, we outline two main lines of research: Natural Language Processing and probabilistic methods.

In Chapter 2 we start with a regularization mechanism for language models that is especially effective in Transformer-based architectures, which we call NoRBERT, for Noisy Regularized Bidirectional Representations from Transformers [9], [15]. According to the literature, regularization in NLP is an underexplored field, limited to general mechanisms such as dropout [57] or early stopping [58]. In this landscape, we propose a novel approach that combines any language model with Variational Auto-Encoders (VAEs) [23]. VAEs are deep generative models that build a regular latent space permitting the reconstruction of the input samples through encoder and decoder networks. Our VAE is based on a Gaussian-mixture prior (GMVAE), which gives the model the chance to capture multimodal information. Combining Transformers and GMVAEs, we build an architecture capable of imputing missing words in text corpora across a diverse topic space, as well as of improving the BLEU score in the reconstruction of the dataset. Both results depend on the depth of the regularized layer of the Transformer encoder. In essence, the regularization consists of the GMVAE reconstruction of the Transformer embeddings at some point in the architecture, adding structured noise that helps the model generalize better. We show improvements in BERT [15], RoBERTa [16] and XLM-R [17] models, verified on different datasets, and we also provide explicit examples of sentences reconstructed by Top NoRBERT. In addition, we validate the abilities of our model in data augmentation, improving classification accuracy and F1 score on various datasets and scenarios thanks to the augmented samples generated by NoRBERT. We study several variants of the model, Top, Deep and Contextual NoRBERT, the latter based on the use of contextual words to reconstruct the embeddings in the corresponding Transformer layer.

We continue with the Transformer line of research in Chapter 3, proposing PsyBERT. PsyBERT, as its name suggests, is a BERT-based [15] architecture suitably modified to work on Electronic Health Records (EHRs) from psychiatric patients. It is inspired by BEHRT [19], also devoted to EHRs in general health; our model differs in the training methodology and the embedding layer. As with NoRBERT, we find it useful to apply a Masked Language Modeling (MLM) policy without any fine-tuning or task-specific layer at all. On the one hand, we used MLM in NoRBERT to impute missing words, with the ultimate aim of generating new sentences from inputs with missing information. On the other hand, we first propose the use of PsyBERT as a tool to fill in missing diagnoses in the EHR, as well as to correct misdiagnosed cases. We then also apply PsyBERT to delusional disorder detection. In this second scenario, by contrast, we add a multi-label classification layer that computes the probability of each diagnosis at the patient's last hospital visit (a minimal sketch of such a head is given below).
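As an illustration, the following is a minimal sketch of a multi-label diagnosis head on top of a BERT-style encoder. It is a hypothetical reconstruction, not the actual PsyBERT implementation: the encoder interface, hidden size and 700-label space are assumptions based on the description above.

```python
import torch
import torch.nn as nn

class MultiLabelDiagnosisHead(nn.Module):
    # Hypothetical sketch: the encoder, hidden size and label count are
    # placeholders, not the actual PsyBERT configuration.
    def __init__(self, encoder, hidden_size=768, num_diagnoses=700):
        super().__init__()
        self.encoder = encoder                       # a BERT-style model
        self.classifier = nn.Linear(hidden_size, num_diagnoses)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]            # [CLS] token state
        return self.classifier(cls)                  # one logit per diagnosis

# Train with nn.BCEWithLogitsLoss against multi-hot diagnosis labels;
# at inference, torch.sigmoid(logits) yields per-diagnosis probabilities
# for the patient's last visit.
```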
From these probabilities, we analyse delusional cases and propose a tool to detect potential candidates for this mental disorder. In both tasks, we make use of several fields obtained from the patient's EHR, such as age, sex, diagnoses and psychiatric treatment history, and propose a method capable of combining heterogeneous data to support diagnosis in mental health. Throughout these works, we point out the problems with the quality of EHR data [104], [105] and the great advantage that medical assistance tools like our model can provide. We not only solve a classification problem with more than 700 different illnesses, but also bring a model that helps doctors diagnose very complex scenarios, with comorbidities, long periods of patient examination under traditional methodology, or low-prevalence cases. We present a powerful method for a problem of pressing clinical need.

Following the health line of research and its psychiatric application, in Chapter 4 we analyse a probabilistic method to search for behavioral patterns in patients with mental disorders. In this case the contribution of the work is not the method itself but its application and results, in collaboration with clinical interpretation. The model, called the Sparse Poisson Factorization Model (SPFM) [22], is a non-parametric probabilistic model based on the Indian Buffet Process (IBP) [20], [21]. It is an exploratory method capable of decomposing the input data into sparse matrices. To do so, it places a Poisson distribution on the product of two matrices, Z and B, drawn respectively from the IBP and a Gamma distribution. Hence Z is a binary matrix representing the latent features active in a patient's data, and B weights the contribution of the data characteristics to the latent features (a generative sketch is given below). The data we use in the three works described in this chapter come from different questions of e-health questionnaires. The data characteristics thus correspond to the answer or score on each question, and the latent features to different behavioral patterns in a patient, according to the features active in their questionnaires. For example, patient X may present features 1 and 2 while patient Y presents features 1 and 3, resulting in two different behavioral profiles. With this procedure we study three scenarios. In the first, we relate the profiles to the diagnoses, finding common patterns among patients and connections between diseases. We also analyse the degree of clinical severity and contrast it with the clinician's judgment via the Clinical Global Impression (CGI). In the second scenario, we pursue a similar study and find connections between disturbed sleeping patterns and clinical markers of the wish to die. We focus this analysis on patients with suicidal thoughts, given the major public health issue that these cases represent [175]. In this case we vary the questionnaire and the data sample, obtaining different profiles that also carry important information for the psychiatrist to interpret. The main contribution of this work is to provide a mechanism capable of helping with the detection and prevention of suicide. Finally, the third work comprises a behavioral pattern study in mental health patients before and during the COVID-19 lockdown. We did not want to lose the chance to contribute during the coronavirus disease outbreak, and presented a study of the changes in psychiatric patients during the state of alarm.
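As referenced above, the following is a minimal generative sketch of the SPFM decomposition, assuming a standard IBP prior for Z and unit-shape Gamma weights for B; all sizes and hyperparameters are illustrative, not those used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def ibp_sample(n_patients, alpha=2.0):
    """Draw a binary feature-assignment matrix Z from the Indian Buffet Process."""
    Z = np.zeros((n_patients, 0), dtype=int)
    for i in range(n_patients):
        if Z.shape[1] > 0:
            m = Z[:i].sum(axis=0)                  # popularity of each feature
            Z[i] = rng.random(Z.shape[1]) < m / (i + 1)
        k_new = rng.poisson(alpha / (i + 1))       # brand-new features
        if k_new > 0:
            Z = np.hstack([Z, np.zeros((n_patients, k_new), dtype=int)])
            Z[i, -k_new:] = 1
    return Z

# Generative model: X ~ Poisson(Z @ B), with Z ~ IBP(alpha), B ~ Gamma.
n_patients, n_questions = 50, 20
Z = ibp_sample(n_patients)                          # patients x latent profiles
B = rng.gamma(shape=1.0, scale=1.0, size=(Z.shape[1], n_questions))
X = rng.poisson(Z @ B)                              # questionnaire scores
```

Inference in the thesis runs in the opposite direction, recovering sparse Z and B from the observed questionnaire matrix X.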
We analyse again the profiles obtained with the previous e-health questionnaire and discover that self-reported suicide risk decreased during the lockdown. These results contrast with other studies [237] and may be a warning sign of an increase in suicidal ideation once the crisis ceases.

Finally, Chapter 5 proposes a regularization mechanism based on a theoretical idea from [245] to obtain a variance reduction in the true risk. We interpret the robust regularized risk proposed by those authors as a two-step mechanism, formed by the minimization of a weighted risk and the maximization of a robust objective, and suggest applying this methodology to the selection of mini-batch samples in a deep learning setup. We study different variants of repeating the worst-performing samples from the previous mini-batch during training, and show evidence of improved accuracy and faster convergence on an image classification problem with different architectures and datasets (a sketch of this sampling scheme is given below).

Programa de Doctorado en Multimedia y Comunicaciones por la Universidad Carlos III de Madrid y la Universidad Rey Juan Carlos. Presidente: Joaquín Míguez Arenas.- Secretario: Francisco Jesús Rodríguez Ruiz.- Vocal: Santiago Ovejero Garcí
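A minimal sketch of the simplest such variant, which carries the highest-loss samples of each mini-batch over into the next one; the carry-over fraction, loss and model interface are assumptions, not the exact scheme studied in the thesis.

```python
import torch
import torch.nn.functional as F

def train_epoch_with_worst_sample_replay(model, loader, optimizer,
                                         carry_frac=0.25, device="cpu"):
    # Hypothetical sketch: re-insert the worst-performing samples of each
    # mini-batch into the following one.
    carried_x, carried_y = None, None
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        if carried_x is not None:                  # prepend last batch's worst
            x = torch.cat([carried_x, x])
            y = torch.cat([carried_y, y])
        optimizer.zero_grad()
        losses = F.cross_entropy(model(x), y, reduction="none")
        losses.mean().backward()
        optimizer.step()
        k = max(1, int(carry_frac * len(losses)))  # how many samples to repeat
        worst = torch.topk(losses.detach(), k).indices
        carried_x, carried_y = x[worst], y[worst]
```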

    Learned simulation as the engine of physical scene understanding

    Cognition evokes human abilities for reasoning, communication, and interaction. This includes the interpretation of real-world physics so as to understand its underlying laws. Theories postulate the similarity of human reasoning about these phenomena with simulations for physical scene understanding, which gathers perception, for comprehension of the current dynamical state, and reasoning, for time evolution prediction of a given system. In this context, we propose the development of a system for learned simulation. Given a design objective, an algorithm is trained to learn an approximation to the real dynamics to build a digital twin of the environment.
Then, the underlying physics will be emulated with information coming from observations of the scene. For this purpose, we use a commodity camera to acquire data exclusively from video recordings. We focus on the sloshing problem as a benchmark. Fluids are widely present in several daily actions and portray a physically rich challenge for the proposed system. They are highly deformable, nonlinear, and present a dominant dissipative behavior, making them a complex entity to be emulated. In addition, we only have access to partial measurements of their dynamical state, since a commodity camera only provides information about the free surface. The result is a system capable of perceiving and reasoning about the dynamics of the fluid. This cognitive digital twin provides an interpretation of the state of the fluid to integrate its dynamical evolution in real time, updated with information observed from the real twin. The system, originally trained for one liquid, is able to adapt itself to any other fluid through reinforcement learning, producing accurate results for previously unseen liquids. Augmented reality is used in the design of this application to offer a visual interpretation of the solutions to the user and to include information about the dynamics that is not accessible to the human eye. This objective is achieved through the use of manifold learning and machine learning techniques, such as neural networks, enriched with physics information. We use inductive biases based on the knowledge of thermodynamics to develop machine intelligence systems that fulfill these principles and provide meaningful solutions for the dynamics (a schematic sketch of such a learned simulator is given below). This problem is considered one of the main targets in fluid manipulation for the development of robotic systems. In actions such as pouring or moving, sloshing dynamics play a central role in the correct performance of aiding systems for the elderly or industrial applications that involve liquids.
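One simple way to picture such a learned simulator is a residual network that maps the current (partially observed) state to its increment over a time step. This is only an illustrative sketch: the state layout, step size and plain-MLP dynamics below are assumptions, not the thermodynamics-informed architecture developed in the thesis.

```python
import torch
import torch.nn as nn

class LearnedIntegrator(nn.Module):
    # Illustrative residual step model: s_{t+1} = s_t + dt * f(s_t).
    def __init__(self, state_dim=64, hidden=128, dt=0.01):
        super().__init__()
        self.dt = dt
        self.f = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, s):
        return s + self.dt * self.f(s)      # one integration step

def rollout(model, s0, n_steps):
    """Integrate the learned dynamics forward from an observed state s0."""
    states = [s0]
    for _ in range(n_steps):
        states.append(model(states[-1]))
    return torch.stack(states)
```

In the thesis, the dynamics network is further constrained by the thermodynamics-based inductive biases mentioned above, so that its predictions fulfill those principles.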

    Training deep retrieval models with noisy datasets

    In this thesis we study loss functions that allow training Convolutional Neural Networks (CNNs) on noisy datasets for the particular task of Content-Based Image Retrieval (CBIR). In particular, we propose two novel losses to fit models that generate global image representations. First, a Soft-Matching (SM) loss, exploiting both image content and metadata, is used to specialize general CNNs to particular cities or regions using weakly annotated datasets. Second, a Bag Exponential (BE) loss, inspired by the Multiple Instance Learning (MIL) framework, is employed to train CNNs for CBIR on noisy datasets.

The first part of the thesis introduces a novel training framework that, relying on image content and metadata, learns location-adapted deep models that provide fine-tuned image descriptors for specific visual contents. Our networks, which start from a baseline model originally learned for a different task, are specialized using a custom pairwise loss function, our proposed SM loss, that uses weak labels based on image content and metadata. The experimental results show that the proposed location-adapted CNNs achieve an improvement of up to 55% over the baseline networks on a landmark discovery task. This implies that the models successfully learn the visual cues and peculiarities of the region for which they are trained, and generate image descriptors that are better location-adapted. In addition, for landmarks that are not present in the training set, or even in other cities, our proposed models perform at least as well as the baseline network, which indicates a good resilience against overfitting.

The second part of the thesis introduces the BE loss function to train CNNs for image retrieval, borrowing inspiration from the MIL framework. The loss combines an exponential function acting as a soft margin with a MIL-based mechanism working on bags of positive and negative pairs of images (one possible instantiation is sketched below). The method allows training deep retrieval networks on noisy datasets by weighing the influence of the different samples at the loss level, which increases the performance of the generated global descriptors. The rationale behind the improvement is that we handle noise in an end-to-end manner and therefore avoid its negative influence, as well as the unintentional biases due to fixed pre-processing cleaning procedures. In addition, our method is general enough to suit other scenarios requiring different weights for the training instances (e.g. boosting the influence of hard positives during training). The proposed bag exponential function can be seen as a back door to guide the learning process according to a certain objective in an end-to-end manner, allowing the model to approach such an objective smoothly and progressively. Our results show that our loss allows CNN-based retrieval systems to be trained with noisy training sets and achieve state-of-the-art performance. Furthermore, we have found that it is better to use training sets that are highly correlated with the final task, even if they are noisy, than to train with a clean set that is only weakly related to the topic at hand. From our point of view, this result represents a big leap in the applicability of retrieval systems and helps to reduce the effort needed to set up new CBIR applications: e.g. by allowing a fast automatic generation of noisy training datasets and then using our bag exponential loss to deal with the noise.
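One possible instantiation of such a MIL-style exponential pair loss, written from the description above. This is a speculative sketch: the exact form of the Bag Exponential loss, as well as the margin and temperature values, are assumptions, not the thesis' definition.

```python
import torch

def bag_exponential_pair_loss(dists, is_positive_bag, margin=0.5, temp=0.1):
    # `dists` holds the embedding distances of all image pairs in one bag.
    # Soft weights decide which pairs in the bag dominate the loss.
    w = torch.softmax(-dists / temp, dim=0)
    if is_positive_bag:
        # Positive pairs should be close; trusting the closest pairs makes
        # the bag robust to label noise, and the exponential acts as a
        # soft margin growing with the violation.
        return (w * torch.exp(dists - margin)).sum()
    # Negative pairs should be far; the same weighting focuses on the
    # hardest (closest) negatives inside the margin.
    return (w * torch.exp(margin - dists)).sum()

# Usage: embed images with the CNN, group pairwise distances into bags of
# positive and negative pairs built from the noisy labels, and sum this
# loss over bags before backpropagating.
```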
Moreover, we also consider that this result opens a new line of research for CNN-based image retrieval: letting the models decide not only on the best features to solve the task, but also on the most relevant samples to do it.

Programa de Doctorado en Multimedia y Comunicaciones por la Universidad Carlos III de Madrid y la Universidad Rey Juan Carlos. Presidente: Luis Salgado Álvarez de Sotomayor.- Secretario: Pablos Martínez Olmos.- Vocal: Ernest Valveny Llobe

    Classification and regression with functional data: a mathematical optimization approach.

    The goal of this PhD dissertation is to develop new approaches for supervised classification and regression in Functional Data Analysis. Particularly, the mathematical optimization tools analyzed in this thesis exploit the functional nature of the data, leading to novel strategies which may outperform the standard methodologies and link mathematics with real-life applications. Chapter 1 presents the main ideas, challenges and the notation used in this thesis. Chapter 2 addresses the problem of selecting a finite set of time instants which best classify multivariate functional data into two predefined classes.
Using not only the information provided by the function itself but also its higher-order derivatives is crucial to improve the accuracy. To do this, a continuous bilevel optimization problem is solved. This problem combines the resolution of the well-known SVM (Support Vector Machine) technique with the maximization of the correlation between the class label and the so-called score function. Chapter 3 also focuses on the binary classification problem using SVM. However, instead of finding the most important time instants, here we define a functional bandwidth in the so-called kernel function. In this way, accuracy may be improved and the most relevant intervals of the domain of the function, according to their classification ability, are identified, enhancing interpretability. A bilevel optimization problem is formulated and solved by means of an alternating procedure (sketched below). Chapter 4 focuses on classifying so-called hybrid functional data, i.e., data formed by functional and static (constant over time) covariates. The goal is to select the features, functional or static, which best classify. An anisotropic kernel which associates a scalar bandwidth to each feature is defined. As in previous chapters, an alternating approach is proposed to solve a bilevel optimization problem. Chapter 5 generalizes the variable selection problem presented in Chapter 2 to regression. The solution approach combines the SVR (Support Vector Regression) problem with the minimization of the sum of squared residuals between the actual and predicted responses. An alternating heuristic is developed to handle this model. All the methodologies presented in this dissertation are tested on synthetic and real data sets, showing their applicability.

Premio Extraordinario de Doctorado U
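The alternating idea can be illustrated with a simple coordinate search over per-feature bandwidths of an anisotropic RBF kernel, refitting the SVM at each step. This is only a sketch of the alternating scheme: the thesis solves a continuous bilevel problem, and the grid, kernel form and cross-validation objective below are assumptions.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def alternating_bandwidth_svm(X, y, n_rounds=3, grid=(0.1, 0.5, 1.0, 5.0)):
    """Alternate between fitting an SVM and re-tuning one bandwidth at a time."""
    bw = np.ones(X.shape[1])                      # one bandwidth per feature

    def kernel(A, B):
        # Anisotropic RBF: exp(-sum_j bw_j * (a_j - b_j)^2).
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2 * bw).sum(-1)
        return np.exp(-d2)

    for _ in range(n_rounds):
        for j in range(X.shape[1]):               # coordinate search on CV score
            scores = []
            for g in grid:
                bw[j] = g
                scores.append(cross_val_score(SVC(kernel=kernel), X, y, cv=3).mean())
            bw[j] = grid[int(np.argmax(scores))]
    return bw, SVC(kernel=kernel).fit(X, y)
```

Features whose tuned bandwidth shrinks toward zero barely contribute to the kernel, which is what makes the bandwidths interpretable as a feature-relevance measure.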

    On the controllability of Partial Differential Equations involving non-local terms and singular potentials

    In this thesis, we investigate controllability and observability properties of Partial Differential Equations describing various phenomena appearing in several fields of the applied sciences, such as elasticity theory, ecology, anomalous transport and diffusion, material science, porous media flow and quantum mechanics. In particular, we focus on evolution Partial Differential Equations with non-local and singular terms. Concerning non-local problems, we analyse the interior controllability of a Schrödinger and a wave-type equation in which the Laplace operator is replaced by the fractional Laplacian $(-\Delta)^s$. Under appropriate assumptions on the order $s$ of the fractional Laplace operator involved, we prove the exact null controllability of both equations, employing an $L^2$ control supported in a neighbourhood $\omega$ of the boundary of a bounded $C^{1,1}$ domain $\Omega\subset\mathbb{R}^N$. More precisely, we show that both the Schrödinger and the wave equation are null-controllable for $s\geq 1/2$ and $s\geq 1$, respectively. Furthermore, these exponents are sharp and controllability fails for $s<1/2$ (resp. $s<1$) for the Schrödinger (resp. wave) equation. Our proof is based on multiplier techniques and the very classical Hilbert Uniqueness Method. For models involving singular terms, we first address the boundary controllability problem for a one-dimensional heat equation with the singular inverse-square potential $V(x):=\mu/x^2$, whose singularity is localised at one extreme of the space interval $(0,1)$ in which the PDE is defined. For all $0<\mu<1/4$, we obtain the null controllability of the equation, acting with an $L^2$ control located at $x=0$, which is both a boundary point and the pole of the potential. This result follows from analogous ones presented in \cite{gueye2014exact} for parabolic equations with variable degenerate coefficients. Finally, we study the interior controllability of a heat equation with the singular inverse-square potential $\Lambda(x):=\mu/\delta^2$, involving the distance $\delta$ to the boundary of a bounded $C^2$ domain $\Omega\subset\mathbb{R}^N$, $N\geq 3$. For all $\mu\leq 1/4$ (the critical Hardy constant associated with the potential $\Lambda$), we obtain null controllability employing an $L^2$ control supported in an open subset $\omega\subset\Omega$. Moreover, we show that the upper bound $\mu=1/4$ is sharp. Our proof relies on a new Carleman estimate, obtained employing a weight properly designed to compensate the singularities of the potential.
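The critical value $\mu = 1/4$ in both results mirrors the sharp constant of the classical Hardy inequalities associated with these potentials; for reference (standard results, stated here as context rather than reproduced from the thesis):

```latex
% One-dimensional Hardy inequality, with sharp constant 1/4:
\int_0^1 |u'(x)|^2\,dx \;\geq\; \frac{1}{4}\int_0^1 \frac{u(x)^2}{x^2}\,dx,
\qquad u \in H_0^1(0,1),

% and its boundary-distance analogue (valid, e.g., on convex domains):
\int_\Omega |\nabla u|^2\,dx \;\geq\; \frac{1}{4}\int_\Omega \frac{u^2}{\delta^2}\,dx,
\qquad u \in H_0^1(\Omega), \quad \delta(x) = \operatorname{dist}(x,\partial\Omega).
```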

    Topological Data Analysis of High-dimensional Correlation Structures with Applications in Epigenetics

    This thesis comprises a comprehensive study of the correlation of high-dimensional datasets from a topological perspective. Motivated by the lack of efficient algorithms for big data analysis and by the importance of finding a structure of correlations in genomics, we have developed two analytical tools, inspired by the topological data analysis approach, that describe and predict the behavior of the correlated design. These models allowed us to study epigenetic interactions from a local and a global perspective, taking into account the different levels of complexity. We applied graph-theoretic and algebraic topology principles to quantify structural patterns on local correlation networks (a minimal sketch of such a network is given below) and, based on them, we proposed a network model that was able to predict the locally high correlations of DNA methylation data. This model provided an efficient tool to measure the evolution of the correlation with the aging process. Furthermore, we developed a powerful computational algorithm to analyze the correlation structure globally, able to detect differentiated methylation patterns across sample groups. This methodology aims to serve as a diagnostic tool, as it provides selected epigenetic biomarkers associated with a specific phenotype of interest. Overall, this work establishes a novel perspective for the analysis and modulation of hidden correlation structures, specifically those of great dimension and complexity, contributing to the understanding of epigenetic processes, and it is designed to be useful for non-biological fields as well.
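A minimal sketch of the kind of local correlation network on which such structural descriptors can be computed; the threshold, simulated data and chosen graph statistics are illustrative assumptions, not the thesis' pipeline.

```python
import numpy as np
import networkx as nx

def correlation_network(X, threshold=0.4):
    """Link variables whose absolute pairwise correlation exceeds a threshold."""
    C = np.corrcoef(X, rowvar=False)          # variables are columns
    A = (np.abs(C) >= threshold).astype(int)
    np.fill_diagonal(A, 0)                    # no self-loops
    return nx.from_numpy_array(A)

# Structural descriptors on simulated methylation-like data:
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 40))                # samples x CpG sites
G = correlation_network(X)
print(nx.density(G), nx.average_clustering(G))
```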

    Bioinformatic analysis and deep learning on large-scale human transcriptomic data: studies on aging, Alzheimer’s neurodegeneration and cancer

    The general objective of the project has been the integrative bioinformatic analysis of multiple proteomic and genomic datasets, combined with their associated clinical data, to search for biomarkers and causal polygenic modules in complex diseases; mainly cancer of unknown primary origin, in its different types and subtypes, and neurodegenerative (ND) diseases, chiefly Alzheimer's, as well as age-related neurodegeneration. In addition, intensive use has been made of artificial intelligence techniques, more specifically deep learning neural networks, for the analysis and prognosis of these diseases.