
    Learning representations for supervised information fusion using tensor decompositions and deep learning methods

    Machine learning aims at the automatic extraction of semantic-level information from potentially raw and unstructured data. A key challenge in building intelligent systems lies in the ability to extract and fuse information from multiple sources. In the present thesis, this challenge is addressed using representation learning, which has been one of the most important innovations in machine learning in the last decade. Representation learning is the basis for modern approaches to natural language processing and artificial neural networks, in particular deep learning, which includes popular models such as convolutional neural networks (CNN) and recurrent neural networks (RNN). Many approaches to tensor decomposition and multi-way modeling have also been shown to be related to representation learning. Tensor decompositions have been applied to a variety of tasks, e.g., knowledge graph modeling and electroencephalography (EEG) data analysis. In this thesis, we focus on machine learning models based on recent representation learning techniques that combine information from multiple channels by exploiting the inherent multi-channel structure of the data. The thesis is divided into three main parts. In the first part, we describe a neural network architecture for fusing multi-channel representations. Additionally, we propose a self-attention mechanism that dynamically weights the learned representations from the various channels based on the system context. We apply this method to the modeling of distributed sensor networks and demonstrate the effectiveness of our model on three real-world sensor network datasets. In the second part, we examine how tensor factorization models can be applied to modeling relationships between multiple input channels. We apply tensor decomposition models, such as CANDECOMP/PARAFAC (CP) and the tensor train decomposition, in a novel way to high-dimensional and sparse data tensors, and show how they can be used for machine learning tasks such as regression and classification. Furthermore, we illustrate how these tensor models can be extended to continuous inputs by learning a mapping from the continuous inputs to the latent representations. We apply our approach to the modeling of inverse dynamics, which is crucial for accurate feedforward robot control. Our experimental results show competitive performance of the proposed functional tensor model, with significantly decreased training and inference time compared to state-of-the-art methods. In the third part, we show how multi-modal information from a statistical semantic model and a visual model can be fused to improve visual relationship detection. To this end, we combine standard visual models for object detection, based on convolutional neural networks, with latent variable models based on tensor factorization for link prediction. Specifically, we propose two approaches for the fusion of semantic and sensory information: the first uses a probabilistic framework, whereas the second employs a multi-way neural network architecture. Our experimental results on the recently published Stanford Visual Relationship dataset, a challenging real-world dataset, show that integrating a statistical semantic model using link prediction methods can significantly improve visual relationship detection.
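    The first part's context-dependent channel weighting can be illustrated with a short sketch. This is not the thesis's actual architecture: the bilinear scoring matrix, the dimensions, and the random inputs below are illustrative assumptions, standing in for quantities a trained system would learn.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_fusion(channel_reprs, context, W):
    """Fuse per-channel representations with context-dependent weights.

    channel_reprs: (n_channels, d) learned representation per channel
    context:       (d,) vector summarizing the current system context
    W:             (d, d) scoring matrix (learned in practice; random here)
    """
    scores = channel_reprs @ W @ context   # one relevance score per channel
    weights = softmax(scores)              # attention weights, sum to 1
    return weights @ channel_reprs         # (d,) fused representation

rng = np.random.default_rng(0)
h = rng.normal(size=(3, 8))   # 3 sensor channels, 8-dim representations
c = rng.normal(size=8)        # system context
print(attention_fusion(h, c, rng.normal(size=(8, 8))).shape)  # (8,)
```

    Because the weights are recomputed from the context, channels can be emphasized or suppressed dynamically, which is the point of the mechanism.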
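    The second part's extension to continuous inputs can be read, in sketch form, as replacing the discrete embedding lookup of a CP model with learned maps from the continuous inputs to the latent factors. The two-channel setup and the plain linear maps below are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(1)
rank, d1, d2 = 4, 6, 6   # CP rank and input dimensions of two channels

# Maps from continuous inputs to rank-dimensional latent factors.
# Random here; in practice they are fitted, e.g., by gradient descent,
# and could be nonlinear (an MLP) instead of linear.
A = rng.normal(size=(rank, d1))
B = rng.normal(size=(rank, d2))
w = rng.normal(size=rank)  # per-component output weights

def cp_predict(x1, x2):
    """CP-style prediction for continuous inputs:
    y = sum_r w_r * <a_r, x1> * <b_r, x2>,
    an elementwise product of the channels' latent factors."""
    return w @ ((A @ x1) * (B @ x2))

x1 = rng.normal(size=d1)   # e.g., joint positions
x2 = rng.normal(size=d2)   # e.g., joint velocities
print(cp_predict(x1, x2))  # scalar prediction, e.g., one joint torque
```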

    BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection

    Multimodal representation learning is gaining more and more interest within the deep learning community. While bilinear models provide an interesting framework for finding subtle combinations of modalities, their number of parameters grows quadratically with the input dimensions, making their practical implementation within classical deep learning pipelines challenging. In this paper, we introduce BLOCK, a new multimodal fusion model based on the block-superdiagonal tensor decomposition. It leverages the notion of block-term ranks, which generalizes both the rank and the mode ranks of a tensor, both of which have already been used for multimodal fusion. This makes it possible to define new ways of optimizing the tradeoff between the expressiveness and the complexity of the fusion model, and to represent very fine interactions between modalities while maintaining powerful mono-modal representations. We demonstrate the practical value of our fusion model by using BLOCK for two challenging tasks: Visual Question Answering (VQA) and Visual Relationship Detection (VRD), for which we design end-to-end learnable architectures that represent the relevant interactions between modalities. Through extensive experiments, we show that BLOCK compares favorably with state-of-the-art multimodal fusion models on both VQA and VRD tasks. Our code is available at https://github.com/Cadene/block.bootstrap.pytorch
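    A minimal sketch of the block-superdiagonal idea, under stated assumptions: each modality has already been projected to a common dimension, the projected vectors are split into chunks, and each pair of corresponding chunks interacts through its own small bilinear core while off-block interactions are zero, which is what bounds the parameter count. The actual BLOCK model further decomposes each core (the block-term structure); that refinement is omitted here.

```python
import numpy as np

rng = np.random.default_rng(2)
n_blocks, chunk, out_chunk = 3, 4, 5   # illustrative sizes

# One small bilinear core per diagonal block; all off-block interactions
# are zero, which keeps the parameter count far below a full bilinear map.
cores = rng.normal(size=(n_blocks, chunk, chunk, out_chunk))

def block_fusion(x, y):
    """Block-superdiagonal bilinear fusion (sketch).
    x, y: (n_blocks * chunk,) projected mono-modal inputs."""
    xs = x.reshape(n_blocks, chunk)
    ys = y.reshape(n_blocks, chunk)
    outs = [np.einsum('i,j,ijo->o', xs[b], ys[b], cores[b])
            for b in range(n_blocks)]
    return np.concatenate(outs)        # (n_blocks * out_chunk,)

x = rng.normal(size=n_blocks * chunk)  # e.g., question embedding
y = rng.normal(size=n_blocks * chunk)  # e.g., image-region embedding
print(block_fusion(x, y).shape)        # (15,)
```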

    Deep learning for health outcome prediction

    Modern medical data contains rich information that allows us to make new types of inferences to predict health outcomes. However, the complexity of modern medical data has rendered many classical analysis approaches insufficient. Machine learning with deep neural networks enables computational models to process raw data and learn useful representations with multiple levels of abstraction. In this thesis, I present novel deep learning methods for health outcome prediction from brain MRI and genomic data. I show that a deep neural network can learn a biomarker from structural brain MRI and that this biomarker provides a useful measure for investigating brain and systemic health, can augment neuroradiological research, and could potentially serve as a decision-support tool in clinical environments. I also develop two tensor methods for deep neural networks: the first, tensor dropout, for improving the robustness of deep neural networks, and the second, Kronecker machines, for combining multiple sources of data to improve prediction accuracy. Finally, I present a novel deep learning method for predicting polygenic risk scores from genome sequences by leveraging both local and global interactions between genetic variants. These contributions demonstrate the benefits of using deep learning for health outcome prediction in both research and clinical settings.
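    The abstract does not spell out the Kronecker machines, but the underlying idea of Kronecker-structured interactions between two data sources can be sketched as follows; the feature dimensions and the plain linear readout are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
x_mri = rng.normal(size=5)   # stand-in features from brain MRI
x_gen = rng.normal(size=4)   # stand-in features from genomic data

# Kronecker product: every pairwise interaction between the two sources.
z = np.kron(x_mri, x_gen)    # shape (20,)

# A linear readout on z is bilinear in the original inputs; low-rank
# parameterizations keep this tractable at realistic dimensions.
w = rng.normal(size=z.size)
print(w @ z)                 # scalar outcome score (illustrative)
```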

    Tensor Decomposition of Large-scale Clinical EEGs Reveals Interpretable Patterns of Brain Physiology

    Identifying abnormal patterns in electroencephalography (EEG) remains the cornerstone of diagnosing several neurological diseases. The current clinical EEG review process relies heavily on expert visual review, which is unscalable and error-prone. In an effort to augment the expert review process, there is significant interest in mining population-level EEG patterns using unsupervised approaches. Current approaches rely either on two-dimensional decompositions (e.g., principal and independent component analyses) or on deep representation learning (e.g., auto-encoders, self-supervision). However, most approaches do not leverage the natural multi-dimensional structure of EEGs and lack interpretability. In this study, we propose a tensor decomposition approach using the canonical polyadic decomposition to discover a parsimonious set of population-level EEG patterns, retaining the natural multi-dimensional structure of EEGs (time × space × frequency). We then validate their clinical value using a cohort of patients spanning varying stages of cognitive impairment. Our results show that the discovered patterns reflect physiologically meaningful features and accurately classify the stages of cognitive impairment (healthy vs. mild cognitive impairment vs. Alzheimer's dementia) with substantially fewer features than classical and deep learning-based baselines. We conclude that the decomposition of population-level EEG tensors recovers expert-interpretable EEG patterns that can aid in the study of smaller specialized clinical cohorts.
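    A sketch of this kind of analysis using the tensorly library, with a random stand-in for the population EEG tensor (the clinical cohort itself is not public): the canonical polyadic decomposition factors the time × space × frequency tensor into rank-one components whose factor columns can be read as temporal courses, electrode topographies, and spectral profiles.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

# Random stand-in for a population EEG tensor: time x space x frequency.
rng = np.random.default_rng(4)
eeg = tl.tensor(rng.normal(size=(200, 19, 30)))  # samples x electrodes x bands

# Canonical polyadic decomposition into 5 rank-one components.
weights, factors = parafac(eeg, rank=5)

time_f, space_f, freq_f = factors
print(time_f.shape, space_f.shape, freq_f.shape)  # (200, 5) (19, 5) (30, 5)
# Column r of space_f is an electrode topography and column r of freq_f a
# spectral profile -- together, one interpretable population-level pattern.
```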

    Machine Learning for Fluid Mechanics

    The field of fluid mechanics is rapidly advancing, driven by unprecedented volumes of data from field measurements, experiments, and large-scale simulations at multiple spatiotemporal scales. Machine learning offers a wealth of techniques to extract information from data that can be translated into knowledge about the underlying fluid mechanics. Moreover, machine learning algorithms can augment domain knowledge and automate tasks related to flow control and optimization. This article presents an overview of the history, current developments, and emerging opportunities of machine learning for fluid mechanics. It outlines fundamental machine learning methodologies and discusses their uses for understanding, modeling, optimizing, and controlling fluid flows. The strengths and limitations of these methods are addressed from the perspective of scientific inquiry that considers data as an inherent part of modeling, experimentation, and simulation. Machine learning provides a powerful information processing framework that can enrich, and possibly even transform, current lines of fluid mechanics research and industrial applications.