18 research outputs found

    Deep learning for robust and explainable models in computer vision

    Get PDF
    Recent breakthroughs in machine and deep learning (ML and DL) research have provided excellent tools for leveraging enormous amounts of data and optimizing huge models with millions of parameters to obtain accurate networks for image processing. These developments open up tremendous opportunities for using artificial intelligence (AI) in the automation and human assisted AI industry. However, as more and more models are deployed and used in practice, many challenges have emerged. This thesis presents various approaches that address robustness and explainability challenges for using ML and DL in practice. Robustness and reliability are the critical components of any model before certification and deployment in practice. Deep convolutional neural networks (CNNs) exhibit vulnerability to transformations of their inputs, such as rotation and scaling, or intentional manipulations as described in the adversarial attack literature. In addition, building trust in AI-based models requires a better understanding of current models and developing methods that are more explainable and interpretable a priori. This thesis presents developments in computer vision models' robustness and explainability. Furthermore, this thesis offers an example of using vision models' feature response visualization (models' interpretations) to improve robustness despite interpretability and robustness being seemingly unrelated in the related research. Besides methodological developments for robust and explainable vision models, a key message of this thesis is introducing model interpretation techniques as a tool for understanding vision models and improving their design and robustness. In addition to the theoretical developments, this thesis demonstrates several applications of ML and DL in different contexts, such as medical imaging and affective computing

    Efficient deep CNNs for cross-modal automated computer vision under time and space constraints

    Get PDF
    We present an automated computer vision architecture to handle video and image data using the same backbone networks. We show empirical results that lead us to adopt MOBILENETV2 as this backbone architecture. The paper demonstrates that neural architectures are transferable from images to videos through suitable preprocessing and temporal information fusion

    Radial basis function networks for convolutional neural networks to learn similarity distance metric and improve interpretability

    Get PDF
    Radial basis function neural networks (RBFs) are prime candidates for pattern classification and regression and have been used extensively in classical machine learning applications. However, RBFs have not been integrated into contemporary deep learning research and computer vision using conventional convolutional neural networks (CNNs) due to their lack of adaptability with modern architectures. In this paper, we adapt RBF networks as a classifier on top of CNNs by modifying the training process and introducing a new activation function to train modern vision architectures end-to-end for image classification. The specific architecture of RBFs enables the learning of a similarity distance metric to compare and find similar and dissimilar images. Furthermore, we demonstrate that using an RBF classifier on top of any CNN architecture provides new human-interpretable insights about the decision-making process of the models. Finally, we successfully apply RBFs to a range of CNN architectures and evaluate the results on benchmark computer vision datasets

    Automated machine learning in practice : state of the art and recent results

    Get PDF
    © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.A main driver behind the digitization of industry and society is the belief that data-driven model building and decision making can contribute to higher degrees of automation and more informed decisions. Building such models from data often involves the application of some form of machine learning. Thus, there is an ever growing demand in work force with the necessary skill set to do so. This demand has given rise to a new research topic concerned with fitting machine learning models fully automatically – AutoML. This paper gives an overview of the state of the art in AutoML with a focus on practical applicability in a business context, and provides recent benchmark results of the most important AutoML algorithms

    Bias, awareness, and ignorance in deep-learning-based face recognition

    Get PDF
    Face Recognition (FR) is increasingly influencing our lives: we use it to unlock our phones; police uses it to identify suspects. Two main concerns are associated with this increase in facial recognition: (1) the fact that these systems are typically less accurate for marginalized groups, which can be described as “bias”, and (2) the increased surveillance through these systems. Our paper is concerned with the first issue. Specifically, we explore an intuitive technique for reducing this bias, namely “blinding” models to sensitive features, such as gender or race, and show why this cannot be equated with reducing bias. Even when not designed for this task, facial recognition models can deduce sensitive features, such as gender or race, from pictures of faces—simply because they are trained to determine the “similarity” of pictures. This means that people with similar skin tones, similar hair length, etc. will be seen as similar by facial recognition models. When confronted with biased decision-making by humans, one approach taken in job application screening is to “blind” the human decision-makers to sensitive attributes such as gender and race by not showing pictures of the applicants. Based on a similar idea, one might think that if facial recognition models were less aware of these sensitive features, the difference in accuracy between groups would decrease. We evaluate this assumption—which has already penetrated into the scientific literature as a valid de-biasing method—by measuring how “aware” models are of sensitive features and correlating this with differences in accuracy. In particular, we blind pre-trained models to make them less aware of sensitive attributes. We find that awareness and accuracy do not positively correlate, i.e., that bias ≠ awareness. In fact, blinding barely affects accuracy in our experiments. The seemingly simple solution of decreasing bias in facial recognition rates by reducing awareness of sensitive features does thus not work in practice: trying to ignore sensitive attributes is not a viable concept for less biased FR

    Deep learning in the wild

    Get PDF
    Invited paperDeep learning with neural networks is applied by an increasing number of people outside of classic research environments, due to the vast success of the methodology on a wide range of machine perception tasks. While this interest is fueled by beautiful success stories, practical work in deep learning on novel tasks without existing baselines remains challenging. This paper explores the specific challenges arising in the realm of real world tasks, based on case studies from research & development in conjunction with industry, and extracts lessons learned from them. It thus fills a gap between the publication of latest algorithmic and methodical developments, and the usually omitted nitty-gritty of how to make them work. Specifically, we give insight into deep learning projects on face matching, print media monitoring, industrial quality control, music scanning, strategy game playing, and automated machine learning, thereby providing best practices for deep learning in practice

    Two to trust : AutoML for safe modelling and interpretable deep learning for robustness

    Get PDF
    With great power comes great responsibility. The success of machine learning, especially deep learning, in research and practice has attracted a great deal of interest, which in turn necessitates increased trust. Sources of mistrust include matters of model genesis ("Is this really the appropriate model?") and interpretability ("Why did the model come to this conclusion?", "Is the model safe from being easily fooled by adversaries?"). In this paper, two partners for the trustworthiness tango are presented: recent advances and ideas, as well as practical applications in industry in (a) Automated machine learning (AutoML), a powerful tool to optimize deep neural network architectures and netune hyperparameters, which promises to build models in a safer and more comprehensive way; (b) Interpretability of neural network outputs, which addresses the vital question regarding the reasoning behind model predictions and provides insights to improve robustness against adversarial attacks

    PrepNet : a convolutional auto-encoder to homogenize CT scans for cross-dataset medical image analysis

    Get PDF
    With the spread of COVID-19 over the world, the need arose for fast and precise automatic triage mechanisms to decelerate the spread of the disease by reducing human efforts e.g. for image-based diagnosis. Although the literature has shown promising efforts in this direction, reported results do not consider the variability of CT scans acquired under varying circumstances, thus rendering resulting models unfit for use on data acquired using e.g. different scanner technologies. While COVID-19 diagnosis can now be done efficiently using PCR tests, this use case exemplifies the need for a methodology to overcome data variability issues in order to make medical image analysis models more widely applicable. In this paper, we explicitly address the variability issue using the example of COVID-19 diagnosis and propose a novel generative approach that aims at erasing the differences induced by e.g. the imaging technology while simultaneously introducing minimal changes to the CT scans through leveraging the idea of deep autoencoders. The proposed prepossessing architecture (PrepNet) (i) is jointly trained on multiple CT scan datasets and (ii) is capable of extracting improved discriminative features for improved diagnosis. Experimental results on three public datasets (SARS-COVID-2, UCSD COVID-CT, MosMed) show that our model improves cross-dataset generalization by up to 11:84 percentage points despite a minor drop in within dataset performance

    Design patterns for resource-constrained automated deep-learning methods

    Get PDF
    We present an extensive evaluation of a wide variety of promising design patterns for automated deep-learning (AutoDL) methods, organized according to the problem categories of the 2019 AutoDL challenges, which set the task of optimizing both model accuracy and search efficiency under tight time and computing constraints. We propose structured empirical evaluations as the most promising avenue to obtain design principles for deep-learning systems due to the absence of strong theoretical support. From these evaluations, we distill relevant patterns which give rise to neural network design recommendations. In particular, we establish (a) that very wide fully connected layers learn meaningful features faster; we illustrate (b) how the lack of pretraining in audio processing can be compensated by architecture search; we show (c) that in text processing deep-learning-based methods only pull ahead of traditional methods for short text lengths with less than a thousand characters under tight resource limitations; and lastly we present (d) evidence that in very data- and computing-constrained settings, hyperparameter tuning of more traditional machine-learning methods outperforms deep-learning systems

    Deep learning-based simultaneous multi-phase deformable image registration of sparse 4D-CBCT

    Get PDF
    Purpose: Respiratory gated 4D-CBCT suffers from sparseness artefacts caused by the limited number of projections available for each respiratory phase/amplitude. These artefacts severely impact deformable image registration methods used to extract motion information. We use deep learning-based methods to predict displacement vector-fields (DVF) from sparse 4D-CBCT images to alleviate the impacts of sparseness artefacts. Methods: We trained U-Net-type convolutional neural network models to predict multiple (10) DVFs in a single forward pass given multiple sparse, gated CBCT and an optional artefact-free reference image as inputs. The predicted DVFs are used to warp the reference image to the different motion states, resulting in an artefact-free image for each state. The supervised training uses data generated by a motion simulation framework. The training dataset consists of 560 simulated 4D-CBCT images of 56 different patients; the generated data include fully sampled ground-truth images that are used to train the network. We compare the results of our method to pairwise image registration (reference image to single sparse image) using a) the deeds algorithm and b) VoxelMorph with image pair inputs. Results: We show that our method clearly outperforms pairwise registration using the deeds algorithm alone. PSNR improved from 25.8 to 46.4, SSIM from 0.9296 to 0.9999. In addition, the runtime of our learning-based method is orders of magnitude shorter (2 seconds instead of 10 minutes). Our results also indicate slightly improved performance compared to pairwise registration (delta-PSNR=1.2). We also trained a model that does not require the artefact-free reference image (which is usually not available) during inference demonstrating only marginally compromised results (delta-PSNR=-0.8). Conclusion: To the best of our knowledge, this is the first time CNNs are used to predict multi-phase DVFs in a single forward pass. This enables novel applications such as 4D-auto-segmentation, motion compensated image reconstruction, motion analyses, and patient motion modeling
    corecore