
    Explainable Physics-informed Deep Learning for Rainfall-runoff Modeling and Uncertainty Assessment across the Continental United States

    Hydrologic models provide a comprehensive tool for simulating streamflow response to environmental variables. Various hydrologic modeling approaches, ranging from physically based to conceptual to entirely data-driven models, have been widely used for hydrologic simulation. In recent years, however, Deep Learning (DL), a new generation of Machine Learning (ML), has taken hydrologic simulation research in a new direction. DL methods have recently been proposed for rainfall-runoff modeling that complement both distributed and conceptual hydrologic models, particularly in catchments where the data needed to support a process-based model are scarce and limited. This dissertation investigated the applicability of two advanced probabilistic physics-informed DL algorithms, the deep autoregressive network (DeepAR) and the temporal fusion transformer (TFT), for daily rainfall-runoff modeling across the continental United States (CONUS). We benchmarked our proposed models against several physics-based hydrologic approaches, including the Sacramento Soil Moisture Accounting Model (SAC-SMA), Variable Infiltration Capacity (VIC), Framework for Understanding Structural Errors (FUSE), Hydrologiska Byråns Vattenbalansavdelning (HBV), and the mesoscale hydrologic model (mHM). These benchmark models fall into two groups: the first comprises models calibrated for each basin individually (SAC-SMA, VIC, FUSE, mHM, and HBV), while the second, which includes our physics-informed approaches, comprises regionally calibrated models that share one parameter set for all basins in the dataset. All approaches were implemented and tested using the Maurer forcing data of the Catchment Attributes and Meteorology for Large-sample Studies (CAMELS) dataset.

    We developed TFT and DeepAR in two configurations, with static attributes (the physics-informed model) and without them (the original model). Various static and dynamic catchment attributes with differing spatiotemporal variability were incorporated into the pipeline to simulate how a drainage system responds to rainfall-runoff processes. To demonstrate how the model learned to differentiate between rainfall-runoff behaviors across catchments and to identify the dominant processes, sensitivity and explainability analyses of the modeling outcomes were also performed. Despite recent advancements, deep networks are perceived as challenging to parameterize, and their simulations may therefore propagate error and uncertainty. To address uncertainty, a quantile likelihood function was incorporated as the TFT loss function.

    The results suggest that the physics-informed TFT model was superior in predicting high- and low-flow fluctuations compared to the original TFT and DeepAR models (without static attributes) and even the physics-informed DeepAR. The physics-informed TFT model recognized well which static attributes contribute most to streamflow generation in each catchment, given its climate, topography, land cover, soil, and geological conditions. The interpretability of the physics-informed TFT model and its ability to assimilate multiple sources of information and parameters make it a strong candidate for regional as well as continental-scale hydrologic simulations. Both physics-informed TFT and DeepAR were more successful in learning the intermediate- and high-flow regimes than the low-flow regime. The advantage in the high-flow regime can be attributed to learning a more generalizable mapping between static and dynamic attributes and runoff parameters. It appears that both TFT and DeepAR may have learned some true processes that are missing from both conceptual and physics-based models, possibly related to deep soil water storage (the layer where soil water is not sensitive to daily evapotranspiration), saturated hydraulic conductivity, and vegetation dynamics.
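    The uncertainty treatment described above rests on a quantile (pinball) loss serving as the TFT objective. Below is a minimal sketch of such a loss in PyTorch; the tensor shapes and quantile levels are illustrative assumptions, not the dissertation's exact configuration.

        import torch

        def quantile_loss(y_pred, y_true, quantiles=(0.1, 0.5, 0.9)):
            # y_pred: (batch, time, n_quantiles) forecasts, one column per quantile level
            # y_true: (batch, time) observed streamflow
            losses = []
            for i, q in enumerate(quantiles):
                err = y_true - y_pred[..., i]                     # positive when the model under-predicts
                losses.append(torch.max(q * err, (q - 1) * err))  # asymmetric pinball penalty
            return torch.mean(torch.stack(losses))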

    MultiModN: Multimodal, Multi-Task, Interpretable Modular Networks

    Predicting multiple real-world tasks in a single model often requires a particularly diverse feature space. Multimodal (MM) models aim to extract the synergistic predictive potential of multiple data types to create a shared feature space with aligned semantic meaning across inputs of drastically varying sizes (e.g., images, text, sound). Most current MM architectures fuse these representations in parallel, which not only limits their interpretability but also creates a dependency on modality availability. We present MultiModN, a multimodal, modular network that fuses latent representations in a sequence of any number, combination, or type of modality while providing granular real-time predictive feedback on any number or combination of predictive tasks. MultiModN's composable pipeline is interpretable-by-design, as well as innately multi-task and robust to the fundamental issue of biased missingness. We perform four experiments on several benchmark MM datasets across 10 real-world tasks (predicting medical diagnoses, academic performance, and weather), and show that MultiModN's sequential MM fusion does not compromise performance compared with a baseline of parallel fusion. By simulating the challenging bias of data missing not-at-random (MNAR), this work shows that, unlike MultiModN, parallel fusion baselines erroneously learn the MNAR pattern and suffer catastrophic failure when faced with different patterns of MNAR at inference. To the best of our knowledge, this is the first inherently MNAR-resistant approach to MM modeling. In conclusion, MultiModN provides granular insights, robustness, and flexibility without compromising performance. Accepted as a full paper at NeurIPS 2023 in New Orleans, US.
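    To illustrate the sequential fusion idea described above (a sketch under assumed module shapes, not the released MultiModN code), the model below passes a shared state vector through one encoder per available modality and exposes per-task heads that can be queried after any prefix of modalities, so missing modalities are simply skipped rather than imputed.

        import torch
        import torch.nn as nn

        class SequentialFusion(nn.Module):
            # Illustrative sequential multimodal fusion: each modality encoder
            # updates a shared state; task heads read the state at any step.
            def __init__(self, modality_dims, state_dim, n_tasks):
                super().__init__()
                self.state0 = nn.Parameter(torch.zeros(state_dim))
                self.encoders = nn.ModuleDict({
                    name: nn.Linear(state_dim + dim, state_dim)
                    for name, dim in modality_dims.items()
                })
                self.heads = nn.ModuleList([nn.Linear(state_dim, 1) for _ in range(n_tasks)])

            def forward(self, inputs):
                # inputs: dict containing only the modalities that are present
                state = self.state0
                for name, x in inputs.items():
                    state = torch.tanh(self.encoders[name](torch.cat([state, x], dim=-1)))
                return [head(state) for head in self.heads]

        model = SequentialFusion({"image": 512, "text": 768, "tabular": 32}, state_dim=64, n_tasks=3)
        preds = model({"text": torch.randn(768), "tabular": torch.randn(32)})  # image absent, still predicts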

    Sequence modelling for e-commerce


    Automatic generation of natural language descriptions of visual data: describing images and videos using recurrent and self-attentive models

    Humans are faced with a constant flow of visual stimuli, e.g., from the environment or when looking at social media. In contrast, visually impaired people are often unable to perceive and process this beneficial information, which could help them navigate everyday situations and activities. Audible feedback such as natural language, however, can make them more aware of their surroundings and enable them to master everyday challenges autonomously. One way to create audible feedback is to produce natural language descriptions for visual data such as still images and then read this text to the person. Moreover, textual descriptions of images can be further utilized for text analysis (e.g., sentiment analysis) and information aggregation. In this work, we investigate different approaches and techniques for the automatic generation of natural language descriptions of visual data such as still images and video clips.

    In particular, we first look at language models that generate textual descriptions with recurrent neural networks. We present a model that generates image captions for scenes depicting interactions between humans and branded products, focusing on the correct identification of the brand name in a multi-task training setting, and we introduce two new metrics to evaluate this requirement. Second, we explore the automatic answering of questions posed about an image. We propose a model that generates answers from scratch instead of predicting an answer from a limited set of possible answers; in comparison to related work, we are therefore able to generate rare answers that are not contained in the pool of frequent answers. Third, we address the automatic generation of doctors' reports for chest X-ray images. We introduce a hierarchical recurrent model that can cope with the bias of medical datasets (abnormal cases are very rare), and we investigate the correlation between the distinctiveness of a report and its score under traditional metrics, finding a discrepancy between good scores and accurate reports. We then examine self-attentive language models, based on the Transformer architecture, that improve computational efficiency and performance over the recurrent models: we expand the automatic description generation to the domain of videos, presenting a video-to-text (VTT) model that can easily synchronize audio-visual features, verify the effectiveness of our video-to-text translation pipeline with an extensive experimental exploration, and finally revisit our recurrent models with this self-attentive approach.
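    To make the recurrent captioning setup concrete, here is a minimal sketch of an image-conditioned LSTM language model with greedy decoding; the feature dimension, special token ids, and vocabulary are placeholders rather than the models developed in the thesis.

        import torch
        import torch.nn as nn

        class CaptionDecoder(nn.Module):
            # Recurrent language model conditioned on a CNN image feature vector.
            def __init__(self, vocab_size, feat_dim=2048, embed_dim=256, hidden_dim=512):
                super().__init__()
                self.init_h = nn.Linear(feat_dim, hidden_dim)       # image feature -> initial hidden state
                self.embed = nn.Embedding(vocab_size, embed_dim)
                self.lstm = nn.LSTMCell(embed_dim, hidden_dim)
                self.out = nn.Linear(hidden_dim, vocab_size)

            def generate(self, image_feat, bos_id=1, eos_id=2, max_len=20):
                # image_feat: (1, feat_dim) feature from a pretrained CNN encoder
                h = torch.tanh(self.init_h(image_feat))
                c = torch.zeros_like(h)
                token = torch.tensor([bos_id])
                caption = []
                for _ in range(max_len):                            # greedy decoding, one token per step
                    h, c = self.lstm(self.embed(token), (h, c))
                    token = self.out(h).argmax(dim=-1)
                    if token.item() == eos_id:
                        break
                    caption.append(token.item())
                return caption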

    Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries

    This two-volume set, LNCS 12962 and 12963, constitutes the thoroughly refereed proceedings of the 7th International MICCAI Brainlesion Workshop, BrainLes 2021, as well as the RSNA-ASNR-MICCAI Brain Tumor Segmentation (BraTS) Challenge, the Federated Tumor Segmentation (FeTS) Challenge, the Cross-Modality Domain Adaptation (CrossMoDA) Challenge, and the challenge on Quantification of Uncertainties in Biomedical Image Quantification (QUBIQ). These events were held jointly with the 24th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2021, in September 2021. The 91 revised papers presented in these volumes were selected from 151 submissions. Due to the COVID-19 pandemic, the conference was held virtually. This is an open access book.

    Interactive, multi-purpose traffic prediction platform using connected vehicles dataset

    Traffic congestion is a perennial issue: traffic demand keeps increasing, yet the budget for maintaining, let alone expanding, current transportation infrastructure is limited. Many congestion management techniques require timely and accurate traffic estimation and prediction; examples include incident management, real-time routing, and providing accurate trip information based on historical data. In this dissertation, a speech-powered traffic prediction platform is proposed, which deploys a new deep learning algorithm for traffic prediction using Connected Vehicles (CV) data. To speed up traffic forecasting, a Graph Convolution-Gated Recurrent Unit (GC-GRU) architecture is proposed, and its performance on tabular data is compared to state-of-the-art models. GC-GRU's Mean Absolute Percentage Error (MAPE) was very close to the Transformer's (3.16 vs. 3.12) while achieving the fastest inference time and a six-fold faster training time than the Transformer, although Long Short-Term Memory (LSTM) was fastest in training. This improved predictive performance with shorter inference time and competitive training time allows the proposed architecture to better cater to real-time applications.

    This is also the first study to demonstrate the advantage of a multiscale approach that combines CV data with conventional sources such as Waze and probe data. CV data was better at detecting short-duration, jam, and stand-still incidents and detected them earlier than probe data. CV data also excelled at detecting minor incidents, with a 90 percent detection rate versus 20 percent for probe data, and detected them 3 minutes faster. To process the large volume of CV data faster, a new algorithm is proposed to extract the spatial and temporal features from the CSV files into a Multiscale Data Analysis (MDA). The algorithm leverages the Graphics Processing Unit (GPU) through the Nvidia RAPIDS framework and a Dask parallel cluster in Python. The results show a seventy-fold speedup in the Extract, Transform, Load (ETL) of an entire day of CV data for the State of Missouri across all unique CV journeys, reducing the processing time from about 48 hours to 25 minutes.

    The processed data is then fed into a customized UNet model that learns high-level traffic features from network-level images to predict large-scale, multi-route CV speed and volume. The accuracy and robustness of the proposed model are evaluated across different road types, times of day, and image snippets, and compared against comparable benchmarks. To visually analyze the historical traffic data and the results of the prediction model, an interactive web application powered by speech queries is built to offer fast and accurate insights into traffic performance and thus allow better positioning of traffic control strategies. The product of this dissertation can be seamlessly deployed by transportation authorities to understand and manage congestion in a timely manner.
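    The GC-GRU architecture mentioned above can be sketched as a GRU cell whose dense transforms are replaced by one-hop graph convolutions over a normalized road-network adjacency matrix. The following is an illustrative reconstruction under that assumption, not the dissertation's exact implementation.

        import torch
        import torch.nn as nn

        class GCGRUCell(nn.Module):
            # Gated recurrent unit whose gates aggregate neighbour information
            # via a normalized adjacency matrix a_hat before the linear projection.
            def __init__(self, in_dim, hidden_dim):
                super().__init__()
                self.w_z = nn.Linear(in_dim + hidden_dim, hidden_dim)
                self.w_r = nn.Linear(in_dim + hidden_dim, hidden_dim)
                self.w_h = nn.Linear(in_dim + hidden_dim, hidden_dim)

            def graph_conv(self, a_hat, x, linear):
                return linear(a_hat @ x)                                  # aggregate neighbours, then project

            def forward(self, a_hat, x, h):
                # a_hat: (n_nodes, n_nodes), x: (n_nodes, in_dim), h: (n_nodes, hidden_dim)
                xh = torch.cat([x, h], dim=-1)
                z = torch.sigmoid(self.graph_conv(a_hat, xh, self.w_z))   # update gate
                r = torch.sigmoid(self.graph_conv(a_hat, xh, self.w_r))   # reset gate
                xh_r = torch.cat([x, r * h], dim=-1)
                h_new = torch.tanh(self.graph_conv(a_hat, xh_r, self.w_h))
                return (1 - z) * h + z * h_new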

    Learning representations for supervised information fusion using tensor decompositions and deep learning methods

    Machine learning is aimed at the automatic extraction of semantic-level information from potentially raw and unstructured data. A key challenge in building intelligent systems lies in the ability to extract and fuse information from multiple sources. In the present thesis, this challenge is addressed using representation learning, which has been one of the most important innovations in machine learning in the last decade. Representation learning is the basis for modern approaches to natural language processing and artificial neural networks, in particular deep learning, which includes popular models such as convolutional neural networks (CNN) and recurrent neural networks (RNN). It has also been shown that many approaches to tensor decomposition and multi-way models can be related to representation learning. Tensor decompositions have been applied to a variety of tasks, e.g., knowledge graph modeling and electroencephalography (EEG) data analysis. In this thesis, we focus on machine learning models based on recent representation learning techniques that can combine information from multiple channels by exploiting their inherent multi-channel data structure. The thesis is divided into three main sections.

    In the first section, we describe a neural network architecture for fusing multi-channel representations. Additionally, we propose a self-attention mechanism that dynamically weights learned representations from various channels based on the system context. We apply this method to the modeling of distributed sensor networks and demonstrate the effectiveness of our model on three real-world sensor network datasets.

    In the second section, we examine how tensor factorization models can be applied to modeling relationships between multiple input channels. We apply tensor decomposition models, such as CANDECOMP/PARAFAC (CP) and the tensor train decomposition, in a novel way to high-dimensional and sparse data tensors, and show how they can be used for machine learning tasks such as regression and classification. Furthermore, we illustrate how the tensor models can be extended to continuous inputs by learning a mapping from the continuous inputs to the latent representations. We apply our approach to the modeling of inverse dynamics, which is crucial for accurate feedforward robot control. Our experimental results show competitive performance of the proposed functional tensor model, with significantly decreased training and inference time compared to state-of-the-art methods.

    In the third part, we show how multi-modal information from a statistical semantic model and a visual model can be fused to improve visual relationship detection. We combine standard visual models for object detection, based on convolutional neural networks, with latent variable models based on tensor factorization for link prediction. Specifically, we propose two approaches for the fusion of semantic and sensory information: the first uses a probabilistic framework, whereas the second makes use of a multi-way neural network architecture. Our experimental results on the recently published Stanford Visual Relationship dataset, a challenging real-world dataset, show that integrating a statistical semantic model using link prediction methods can significantly improve visual relationship detection.
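    To illustrate how a CP decomposition can parameterize a regression over multi-way inputs (a sketch with placeholder dimensions, not the thesis implementation), the weight tensor below is kept in its factored form and contracted against the input one mode at a time, so the full tensor is never materialized.

        import numpy as np

        rng = np.random.default_rng(0)
        I, J, K, R = 8, 6, 4, 3             # tensor mode sizes and CP rank
        A = rng.normal(size=(I, R))         # one factor matrix per mode;
        B = rng.normal(size=(J, R))         # the implicit weight tensor is
        C = rng.normal(size=(K, R))         # W = sum_r A[:, r] o B[:, r] o C[:, r]

        def predict(X):
            # y = <W, X>, computed from the factors without forming W explicitly
            t = np.einsum('ijk,ir->rjk', X, A)        # contract mode 1
            t = np.einsum('rjk,jr->rk', t, B)         # contract mode 2
            return float(np.einsum('rk,kr->', t, C))  # contract mode 3 and sum over rank

        X = rng.normal(size=(I, J, K))      # one multi-way input sample
        print(predict(X))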