17 research outputs found

    Interpretable Multivariate Time Series Forecasting with Temporal Attention Convolutional Neural Networks

    Data in time series format, such as biological signals from medical sensors or machine signals from sensors in industrial environments, are rich sources of information that can give crucial insights into the present and future condition of a person or machine. The task of predicting future values of time series was initially approached with simple machine learning methods, and lately with deep learning. Two models that have shown good performance in this task are the temporal convolutional network and the attention module. However, despite the promising results of deep learning methods, their black-box nature makes them unsuitable for real-world applications where the predictions need to be explainable in order to be trusted. In this paper we propose an architecture comprising a temporal convolutional network with an attention mechanism that makes predictions while presenting the timesteps of the input that were most influential for future outputs. We apply it to two datasets and show that we gain interpretability without degrading the accuracy compared to the original temporal convolutional models. We then go one step further and combine our configuration with various machine learning methods on top, creating a pipeline that achieves interpretability both across timesteps and input features. We use it to forecast a different variable from one of the above datasets and study how the accuracy is affected compared to the original black-box approach.

    When we can trust computers (and when we can't)

    With the relentless rise of computer power, there is a widespread expectation that computers can solve the most pressing problems of science, and even more besides. We explore the limits of computational modelling and conclude that, in the domains of science and engineering which are relatively simple and firmly grounded in theory, these methods are indeed powerful. Even so, the availability of code, data and documentation, along with a range of techniques for validation, verification and uncertainty quantification, is essential for building trust in computer-generated findings. When it comes to complex systems in domains of science that are less firmly grounded in theory, notably biology and medicine, to say nothing of the social sciences and humanities, computers can create the illusion of objectivity, not least because the rise of big data and machine learning poses new challenges to reproducibility, while lacking true explanatory power. We also discuss important aspects of the natural world which cannot be solved by digital means. In the long term, renewed emphasis on analogue methods will be necessary to temper the excessive faith currently placed in digital computation. This article is part of the theme issue 'Reliability and reproducibility in computational science: implementing verification, validation and uncertainty quantification in silico'.

    Reproducibility in Multiple Instance Learning: A Case For Algorithmic Unit Tests

    Multiple Instance Learning (MIL) is a sub-domain of classification problems with positive and negative labels and a "bag" of inputs, where the label is positive if and only if a positive element is contained within the bag, and otherwise is negative. Training in this context requires associating the bag-wide label to instance-level information, and implicitly contains a causal assumption and asymmetry to the task (i.e., you can't swap the labels without changing the semantics). MIL problems occur in healthcare (one malignant cell indicates cancer), cyber security (one malicious executable makes an infected computer), and many other tasks. In this work, we examine five of the most prominent deep-MIL models and find that none of them respects the standard MIL assumption. They are able to learn anti-correlated instances, i.e., defaulting to "positive" labels until seeing a negative counter-example, which should not be possible for a correct MIL model. We suspect that enhancements and other works derived from these models will share the same issue. In any context in which these models are being used, this creates the potential for learning incorrect models, which creates risk of operational failure. We identify and demonstrate this problem via a proposed "algorithmic unit test", where we create synthetic datasets that can be solved by a MIL-respecting model, and which clearly reveal learning that violates MIL assumptions. The five evaluated methods each fail one or more of these tests. This provides a model-agnostic way to identify violations of modeling assumptions, which we hope will be useful for future development and evaluation of MIL models. Comment: To appear in the 37th Conference on Neural Information Processing Systems (NeurIPS 2023)
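The standard MIL assumption described above has a simple consequence that can be checked in code: once a bag is predicted positive, adding more instances should never flip it to negative. The following is a minimal sketch of such a check, not the paper's own algorithmic unit tests. The integer token encoding, the token values, and all function names here are hypothetical and chosen only for illustration.

```python
from typing import Callable, Iterable, List, Sequence

# Hypothetical encoding: every instance is an integer token.
POSITIVE_TOKEN = 1  # assumed: the instance type that makes a bag positive
COUNTER_TOKEN = 2   # assumed: a token that a faulty model treats as "negative evidence"

Bag = Sequence[int]
Predictor = Callable[[Bag], int]


def mil_label(bag: Bag) -> int:
    """Standard MIL assumption: positive iff at least one positive instance."""
    return int(any(x == POSITIVE_TOKEN for x in bag))


def anti_correlated_model(bag: Bag) -> int:
    """Deliberately wrong model: positive by default until a counter-example appears."""
    return int(all(x != COUNTER_TOKEN for x in bag))


def violates_monotonicity(predict: Predictor, bags: Iterable[Bag],
                          extra_instances: Iterable[int]) -> bool:
    """Return True if adding an instance ever flips a positive bag to negative."""
    extras: List[int] = list(extra_instances)
    for bag in bags:
        if predict(bag) != 1:
            continue  # only positive predictions are constrained by this check
        for inst in extras:
            if predict(list(bag) + [inst]) == 0:
                return True
    return False


test_bags = [[0, 0], [0, 1], [1, 1, 0]]
extras = [0, 1, 2]
print(violates_monotonicity(mil_label, test_bags, extras))
print(violates_monotonicity(anti_correlated_model, test_bags, extras))
```

The check treats the predictor as a black box, so in principle any trained bag classifier could be wrapped as `predict`. Passing it is necessary but not sufficient for respecting the MIL assumption.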

    Benchmarking Encoder-Decoder Architectures for Biplanar X-ray to 3D Shape Reconstruction

    Various deep learning models have been proposed for 3D bone shape reconstruction from two orthogonal (biplanar) X-ray images. However, it is unclear how these models compare against each other since they are evaluated on different anatomies, cohorts and (often privately held) datasets. Moreover, the impact of the commonly optimized image-based segmentation metrics such as Dice score on the estimation of clinical parameters relevant in 2D-3D bone shape reconstruction is not well known. To move closer toward clinical translation, we propose a benchmarking framework that evaluates tasks relevant to real-world clinical scenarios, including reconstruction of fractured bones, bones with implants, robustness to population shift, and error in estimating clinical parameters. Our open-source platform provides reference implementations of 8 models (many of whose implementations were not publicly available), APIs to easily collect and preprocess 6 public datasets, and the implementation of automatic clinical parameter and landmark extraction methods. We present an extensive evaluation of 8 2D-3D models on equal footing using 6 public datasets comprising images for four different anatomies. Our results show that attention-based methods that capture global spatial relationships tend to perform better across all anatomies and datasets; performance on clinically relevant subgroups may be overestimated without disaggregated reporting; ribs are substantially more difficult to reconstruct compared to femur, hip and spine; and the Dice score improvement does not always bring a corresponding improvement in the automatic estimation of clinically relevant parameters. Comment: accepted to NeurIPS 202

    On Sensitivity of Learning with Limited Labelled Data to the Effects of Randomness: Impact of Interactions and Systematic Choices

    While learning with limited labelled data can improve performance when the labels are lacking, it is also sensitive to the effects of uncontrolled randomness introduced by so-called randomness factors (e.g., varying order of data). We propose a method to systematically investigate the effects of randomness factors while taking the interactions between them into consideration. To measure the true effects of an individual randomness factor, our method mitigates the effects of other factors and observes how the performance varies across multiple runs. Applying our method to multiple randomness factors across in-context learning and fine-tuning approaches on 7 representative text classification tasks and meta-learning on 3 tasks, we show that: 1) disregarding interactions between randomness factors in existing works caused inconsistent findings due to incorrect attribution of the effects of randomness factors, such as disproving the consistent sensitivity of in-context learning to sample order even with random sample selection; and 2) besides mutual interactions, the effects of randomness factors, especially sample order, are also dependent on more systematic choices unexplored in existing works, such as number of classes, samples per class or choice of prompt format.
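One way to read the idea of isolating a single randomness factor is sketched below: hold every other factor fixed, vary only the investigated factor across runs, record the spread in performance, and then average that spread over several fixed settings of the other factors. This is an illustrative interpretation under stated assumptions, not the paper's exact procedure. The function `toy_score` is a hypothetical stand-in for training and evaluating a model, and the two factors (sample order and initialisation) are example choices.

```python
import random
import statistics
from typing import Callable, Iterable


def toy_score(order_seed: int, init_seed: int) -> float:
    """Hypothetical stand-in for one training and evaluation run."""
    order_effect = random.Random(1000 + order_seed).uniform(-0.02, 0.02)
    init_effect = random.Random(2000 + init_seed).uniform(-0.05, 0.05)
    return 0.80 + order_effect + init_effect


def factor_effect(score_fn: Callable[[int, int], float],
                  investigated_seeds: Iterable[int],
                  other_seeds: Iterable[int]) -> float:
    """Mean, over fixed settings of the other factor, of the standard
    deviation observed when only the investigated factor varies."""
    investigated = list(investigated_seeds)
    per_setting_std = []
    for other in other_seeds:
        # The other factor stays fixed inside this loop body.
        scores = [score_fn(seed, other) for seed in investigated]
        per_setting_std.append(statistics.pstdev(scores))
    return statistics.mean(per_setting_std)


# Spread attributable to sample order, with initialisation held fixed per setting.
order_effect = factor_effect(toy_score, range(10), range(5))
print(f"estimated effect of sample order: {order_effect:.4f}")
```

Varying both seeds at once would mix the two sources of variance, which is the kind of misattribution the abstract warns about.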