
    Simulation of hyperelastic materials in real-time using Deep Learning

    The finite element method (FEM) is among the most commonly used numerical methods for solving engineering problems. Due to its computational cost, various ideas have been introduced to reduce computation times, such as domain decomposition, parallel computing, adaptive meshing, and model order reduction. In this paper we present U-Mesh: a data-driven method based on a U-Net architecture that approximates the non-linear relation between a contact force and the displacement field computed by a FEM algorithm. We show that deep learning, one of the latest machine learning methods based on artificial neural networks, can enhance computational mechanics through its ability to encode highly non-linear models in a compact form. Our method is applied to two benchmark examples: a cantilever beam and an L-shape subject to moving punctual loads. A comparison between our method and proper orthogonal decomposition (POD) is carried out throughout the paper. The results show that U-Mesh can perform very fast simulations on various geometries, mesh resolutions, and numbers of input forces with very small errors.
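    The entry gives no code; purely as an illustration of the force-to-displacement regression described above, here is a minimal PyTorch sketch of a small U-Net-style network mapping a per-node force field on a regular grid to a displacement field. The class name, channel counts, and grid size are assumptions for the example, not the authors' U-Mesh implementation.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Minimal U-Net-style regressor: per-node forces -> per-node displacements.

    Channels, depth, and grid size are illustrative, not the U-Mesh configuration.
    """
    def __init__(self, in_ch=3, out_ch=3, base=16):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, base, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(base, base, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.enc2 = nn.Sequential(nn.Conv2d(base, base * 2, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = nn.Sequential(nn.Conv2d(base * 2, base, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(base, out_ch, 1)   # 3 displacement components per node

    def forward(self, f):
        e1 = self.enc1(f)                                     # full-resolution features
        e2 = self.enc2(self.down(e1))                         # coarse features
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))   # skip connection
        return self.head(d1)

# Toy usage: a 3-component force field on a 32x32 nodal grid with one punctual load.
force = torch.zeros(1, 3, 32, 32)
force[0, :, 16, 16] = torch.tensor([0.0, -1.0, 0.0])
disp = TinyUNet()(force)          # predicted displacement field
print(disp.shape)                 # torch.Size([1, 3, 32, 32])
```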

    Learning non-linear invariants for unsupervised out-of-distribution detection

    An important hurdle to overcome before machine learning models can be reliably deployed in practice is identifying when samples are different from those seen during training, as the outputs for unexpected samples are often confidently incorrect, while not being identifiable as such. This problem is known as out-of-distribution (OOD) detection. A popular approach for the unsupervised OOD case is to reject samples with a high Mahalanobis distance with respect to the mean features of the training data. Recent work showed that the Mahalanobis distance can be thought of as finding the training data invariants and rejecting OOD samples that violate them. A key limitation of this approach is that it captures linear relations only. Here, we present a novel method capable of identifying non-linear invariants in the data. These are learned using a reversible neural network consisting of alternating rotation and coupling layers. Results on a varied set of tasks show it to be the best method overall, achieving state-of-the-art results on some of the experiments.
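    For context, the linear baseline that the abstract contrasts against (rejecting samples with a high Mahalanobis distance to the mean of the training features) can be sketched in a few lines of NumPy. The function name and the toy data are assumptions, and the proposed non-linear method (rotation and coupling layers) is not reproduced here.

```python
import numpy as np

def mahalanobis_ood_score(train_feats, test_feats, eps=1e-6):
    """Baseline OOD score: Mahalanobis distance of test features to the
    training-feature distribution (larger score = more likely OOD).

    Feature extraction is assumed to happen upstream (e.g. a network's
    penultimate layer); both inputs have shape (n_samples, n_features).
    """
    mu = train_feats.mean(axis=0)
    cov = np.cov(train_feats, rowvar=False) + eps * np.eye(train_feats.shape[1])
    cov_inv = np.linalg.inv(cov)
    diff = test_feats - mu
    # d^2 = (x - mu)^T Sigma^{-1} (x - mu), computed row-wise
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 8))
in_dist = rng.normal(size=(5, 8))
ood = rng.normal(loc=4.0, size=(5, 8))
print(mahalanobis_ood_score(train, in_dist).mean(),
      mahalanobis_ood_score(train, ood).mean())   # OOD scores should be larger
```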

    Logical Implications for Visual Question Answering Consistency

    Despite considerable recent progress in Visual Question Answering (VQA) models, inconsistent or contradictory answers continue to cast doubt on their true reasoning capabilities. Most proposed methods use indirect strategies or strong assumptions on pairs of questions and answers to enforce model consistency. Instead, we propose a novel strategy intended to improve model performance by directly reducing logical inconsistencies. To do this, we introduce a new consistency loss term that can be used by a wide range of VQA models and that relies on knowing the logical relation between pairs of questions and answers. While such information is typically not available in VQA datasets, we propose to infer these logical relations using a dedicated language model and use them in our proposed consistency loss function. We conduct extensive experiments on the VQA Introspect and DME datasets and show that our method brings improvements to state-of-the-art VQA models while being robust across different architectures and settings.
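    The consistency loss itself is not given in the abstract; as a hedged illustration of the general idea (penalizing answer pairs that violate a known implication between two questions), here is a hypothetical PyTorch loss term. The functional form, argument names, and the example question pair are invented for this sketch and are not the loss proposed in the paper.

```python
import torch

def implication_consistency_loss(p_antecedent_yes, p_consequent_yes, eps=1e-8):
    """Hypothetical penalty for question pairs linked by logical implication
    (a "yes" to the antecedent entails a "yes" to the consequent).

    p_* are predicted probabilities of answering "yes", shape (batch,). The
    penalty grows when the model is confident in the antecedent but denies
    the consequent. This functional form is only illustrative.
    """
    violation = p_antecedent_yes * (1.0 - p_consequent_yes)
    return -torch.log(1.0 - violation + eps).mean()

p_ante = torch.tensor([0.9, 0.2])   # e.g. "Is there severe edema?"
p_cons = torch.tensor([0.1, 0.8])   # e.g. "Is there edema?" (entailed by the first)
print(implication_consistency_loss(p_ante, p_cons))  # dominated by the inconsistent first pair
```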

    Self-Binarizing Networks

    We present a method to train self-binarizing neural networks, that is, networks that evolve their weights and activations during training to become binary. To obtain similar binary networks, existing methods rely on the sign activation function. This function, however, has no gradients for non-zero values, which makes standard backpropagation impossible. To circumvent the difficulty of training a network relying on the sign activation function, these methods alternate between floating-point and binary representations of the network during training, which is sub-optimal and inefficient. We approach the binarization task by training on a unique representation involving a smooth activation function, which is iteratively sharpened during training until it becomes a binary representation equivalent to the sign activation function. Additionally, we introduce a new technique to perform binary batch normalization that simplifies the conventional batch normalization by transforming it into a simple comparison operation. This is unlike existing methods, which are forced to retain the conventional floating-point-based batch normalization. Our binary networks, apart from the advantages of lower memory and computation compared to conventional floating-point and binary networks, also show higher classification accuracy than existing state-of-the-art methods on multiple benchmark datasets.
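    As a rough illustration of the "smooth activation that is iteratively sharpened" idea, the sketch below uses a scaled tanh whose sharpness is increased over training. The parameterization, the schedule, and the class name are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class SoftSign(nn.Module):
    """Smooth surrogate for the sign activation: tanh(beta * x).

    Increasing beta during training sharpens the activation toward a binary
    +/-1 output while keeping useful gradients. The schedule used below is an
    assumption for illustration only.
    """
    def __init__(self, beta=1.0):
        super().__init__()
        self.beta = beta

    def forward(self, x):
        return torch.tanh(self.beta * x)

act = SoftSign()
x = torch.linspace(-1.0, 1.0, 5)
for beta in (1.0, 5.0, 50.0):          # sharpen over the course of training
    act.beta = beta
    print(beta, act(x))
# As beta grows, the outputs approach sign(x) in {-1, +1}. Batch normalization
# followed by sign can likewise be folded into a single per-channel threshold
# comparison (for positive scale gamma), which is the kind of simplification
# the binary batch normalization above refers to.
```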

    Mask then classify: multi-instance segmentation for surgical instruments.

    PURPOSE: The detection and segmentation of surgical instruments has been a vital step for many applications in minimally invasive surgical robotics. Previously, the problem was tackled from a semantic segmentation perspective, yet these methods fail to provide good segmentation maps of instrument types and do not contain any information on the instance affiliation of each pixel. We propose to overcome this limitation with a novel instance segmentation method that first masks instruments and then classifies them into their respective types. METHODS: We introduce a novel method for instance segmentation in which a pixel-wise mask of each instance is found prior to classification. An encoder-decoder network is used to extract instrument instances, which are then separately classified using the features of the previous stages. Furthermore, we present a method to incorporate instrument priors from surgical robots. RESULTS: Experiments are performed on the robotic instrument segmentation dataset of the 2017 endoscopic vision challenge. We perform a fourfold cross-validation and show an improvement of more than 18% over the previous state-of-the-art. Furthermore, we perform an ablation study which highlights the importance of certain design choices, and we observe an increase of 10% over semantic segmentation methods. CONCLUSIONS: We have presented a novel instance segmentation method for surgical instruments which outperforms previous semantic segmentation-based methods. Our method further provides more informative instance-level output while retaining a precise segmentation mask. Finally, we have shown that robotic instrument priors can be used to further increase performance.
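    To make the two-stage "mask first, then classify each instance" idea concrete, here is a schematic PyTorch sketch. The backbone, the way instance regions are obtained (plain crops here rather than the paper's mask-extraction step), the number of instrument types, and all layer sizes are placeholders, not the published architecture.

```python
import torch
import torch.nn as nn

class MaskThenClassify(nn.Module):
    """Illustrative two-stage pipeline: predict class-agnostic instance masks,
    then classify each instance from pooled features. Not the paper's model.
    """
    def __init__(self, num_types=7, feat_ch=8):
        super().__init__()
        self.backbone = nn.Conv2d(3, feat_ch, 3, padding=1)   # stand-in encoder
        self.mask_head = nn.Conv2d(feat_ch, 1, 1)             # instance-mask logits
        self.classifier = nn.Linear(feat_ch, num_types)       # per-instance type

    def forward(self, image, instance_regions):
        feats = self.backbone(image)                           # (1, C, H, W)
        masks = torch.sigmoid(self.mask_head(feats))           # (1, 1, H, W)
        labels = []
        for (y0, y1, x0, x1) in instance_regions:              # one region per instance
            pooled = feats[:, :, y0:y1, x0:x1].mean(dim=(2, 3))  # (1, C) descriptor
            labels.append(self.classifier(pooled).argmax(dim=1))
        return masks, labels

model = MaskThenClassify()
img = torch.rand(1, 3, 64, 64)
masks, labels = model(img, [(0, 32, 0, 32), (32, 64, 32, 64)])
print(masks.shape, labels)
```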

    Stochastic Segmentation with Conditional Categorical Diffusion Models

    Semantic segmentation has made significant progress in recent years thanks to deep neural networks, but the common objective of generating a single segmentation output that accurately matches the image's content may not be suitable for safety-critical domains such as medical diagnostics and autonomous driving. Instead, multiple possible correct segmentation maps may be required to reflect the true distribution of annotation maps. In this context, stochastic semantic segmentation methods must learn to predict conditional distributions of labels given the image, but this is challenging due to the typically multimodal distributions, high-dimensional output spaces, and limited annotation data. To address these challenges, we propose a conditional categorical diffusion model (CCDM) for semantic segmentation based on Denoising Diffusion Probabilistic Models. Our model is conditioned on the input image, enabling it to generate multiple segmentation label maps that account for the aleatoric uncertainty arising from divergent ground truth annotations. Our experimental results show that CCDM achieves state-of-the-art performance on LIDC, a stochastic semantic segmentation dataset, and outperforms established baselines on the classical segmentation dataset Cityscapes. Code is available at https://github.com/LarsDoorenbos/ccdm-stochastic-segmentatio
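    The actual sampler lives in the linked repository; the loop below is only a generic sketch of what drawing several label maps for one image from a conditional categorical model can look like. The function signature, number of steps, and toy conditional model are assumptions and do not follow the CCDM implementation.

```python
import torch

def sample_label_maps(cond_model, image, num_classes, steps=10, num_samples=4):
    """Generic sketch: start from uniform categorical probabilities, let a
    conditional model iteratively refine the per-pixel class probabilities,
    and draw several label maps for one image. Schematic only, not CCDM.
    """
    b, _, h, w = image.shape
    samples = []
    for _ in range(num_samples):
        probs = torch.full((b, num_classes, h, w), 1.0 / num_classes)
        for t in reversed(range(steps)):
            logits = cond_model(probs, image, t)      # conditioned on the image
            probs = torch.softmax(logits, dim=1)
        # final draw: one categorical label per pixel
        labels = torch.distributions.Categorical(probs.permute(0, 2, 3, 1)).sample()
        samples.append(labels)
    return torch.stack(samples)                        # (num_samples, B, H, W)

# Toy stand-in "denoiser": ignores the timestep and shifts all class
# log-probabilities by the per-pixel image mean.
toy_model = lambda probs, img, t: probs.log() + img.mean(dim=1, keepdim=True)
maps = sample_label_maps(toy_model, torch.rand(1, 3, 16, 16), num_classes=5)
print(maps.shape)   # torch.Size([4, 1, 16, 16])
```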

    Full or Weak annotations? An adaptive strategy for budget-constrained annotation campaigns

    Annotating new datasets for machine learning tasks is tedious, time-consuming, and costly. For segmentation applications, the burden is particularly high, as manual delineations of relevant image content are often extremely expensive or can only be done by experts with domain-specific knowledge. Thanks to developments in transfer learning and training with weak supervision, segmentation models can now also greatly benefit from annotations of different kinds. However, for any new domain looking to use weak supervision, the dataset builder still needs to define a strategy for distributing full segmentation and other weak annotations. Doing so is challenging, as it is a priori unknown how to distribute an annotation budget for a given new dataset. To this end, we propose a novel approach for determining annotation strategies for segmentation datasets by estimating what proportion of segmentation and classification annotations should be collected given a fixed budget. Our method sequentially determines the proportions of segmentation and classification annotations to collect for each budget fraction by modeling the expected improvement of the final segmentation model. We show in our experiments that our approach yields annotations that perform very close to optimal for a number of different annotation budgets and datasets.
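    The abstract describes sequentially choosing, for each budget fraction, the split between segmentation and classification annotations that is expected to help the final model most. The sketch below mimics that loop with a user-supplied performance estimate and a simple grid search; the cost values, the estimator, and the function name are assumptions for illustration, not the proposed method.

```python
import numpy as np

def plan_annotation_budget(total_budget, seg_cost, cls_cost, predict_dice,
                           num_fractions=4, grid=11):
    """Spend the budget in fractions; at each step pick the segmentation/
    classification split that a performance model predict_dice(n_seg, n_cls)
    expects to help most. A stand-in for the expected-improvement modeling
    described above, not a reimplementation of it.
    """
    n_seg, n_cls = 0, 0
    per_step = total_budget / num_fractions
    for _ in range(num_fractions):
        best = None
        for p in np.linspace(0.0, 1.0, grid):          # fraction spent on segmentation
            add_seg = int(p * per_step / seg_cost)
            add_cls = int((1 - p) * per_step / cls_cost)
            score = predict_dice(n_seg + add_seg, n_cls + add_cls)
            if best is None or score > best[0]:
                best = (score, add_seg, add_cls)
        n_seg += best[1]
        n_cls += best[2]
    return n_seg, n_cls

# Toy performance model with diminishing returns (purely illustrative).
toy_dice = lambda ns, nc: 1 - np.exp(-0.01 * ns - 0.001 * nc)
print(plan_annotation_budget(1000, seg_cost=10, cls_cost=1, predict_dice=toy_dice))
```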

    CataNet: Predicting remaining cataract surgery duration

    Cataract surgery is a sight-saving surgery that is performed over 10 million times each year around the world. With such a large demand, the ability to organize surgical wards and operating rooms efficiently is critical to delivering this therapy in routine clinical care. In this context, estimating the remaining surgical duration (RSD) during procedures is one way to help streamline patient throughput and workflows. To this end, we propose CataNet, a method for cataract surgeries that predicts in real time the RSD jointly with two influential elements: the surgeon's experience and the current phase of the surgery. We compare CataNet to state-of-the-art RSD estimation methods, showing that it outperforms them even when phase and experience are not considered. We investigate this improvement and show that a significant contributor is the way we integrate the elapsed time into CataNet's feature extractor.
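    The abstract highlights that feeding the elapsed time into the feature extractor matters for RSD estimation. The PyTorch sketch below shows one generic way to wire that up (per-frame CNN features concatenated with elapsed time, then a recurrent unit with RSD, phase, and experience heads); all layer sizes, the GRU choice, and the exact point where elapsed time enters are assumptions rather than the CataNet architecture.

```python
import torch
import torch.nn as nn

class RSDNet(nn.Module):
    """Illustrative RSD regressor: per-frame CNN features concatenated with the
    elapsed time feed a GRU that jointly predicts remaining surgical duration
    (RSD), surgical phase, and surgeon experience. Not CataNet itself.
    """
    def __init__(self, feat_dim=32, hidden=64, num_phases=10):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(8, feat_dim), nn.ReLU())
        self.rnn = nn.GRU(feat_dim + 1, hidden, batch_first=True)  # +1: elapsed minutes
        self.rsd_head = nn.Linear(hidden, 1)
        self.phase_head = nn.Linear(hidden, num_phases)
        self.exp_head = nn.Linear(hidden, 2)           # e.g. senior vs. trainee

    def forward(self, frames, elapsed_min):
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        feats = torch.cat([feats, elapsed_min.unsqueeze(-1)], dim=-1)
        h, _ = self.rnn(feats)
        return self.rsd_head(h), self.phase_head(h), self.exp_head(h)

model = RSDNet()
frames = torch.rand(1, 4, 3, 64, 64)              # 4 video frames
elapsed = torch.tensor([[0.5, 1.0, 1.5, 2.0]])    # minutes since surgery start
rsd, phase, exp = model(frames, elapsed)
print(rsd.shape, phase.shape, exp.shape)
```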