
    Learning Deep Similarity Metric for 3D MR-TRUS Registration

    Purpose: The fusion of transrectal ultrasound (TRUS) and magnetic resonance (MR) images for guiding targeted prostate biopsy has significantly improved the biopsy yield of aggressive cancers. A key component of MR-TRUS fusion is image registration. However, robust automatic MR-TRUS registration is very challenging because of the large difference in appearance between the two imaging modalities. The work presented in this paper aims to tackle this problem by addressing two challenges: (i) the definition of a suitable similarity metric and (ii) the determination of a suitable optimization strategy. Methods: This work proposes using a deep convolutional neural network to learn a similarity metric for MR-TRUS registration. We also use a composite optimization strategy that explores the solution space to find a suitable initialization for the second-order optimization of the learned metric. Further, a multi-pass approach is used to smooth the metric for optimization. Results: The learned similarity metric outperforms both classical mutual information and state-of-the-art methods based on MIND features. The results indicate that the overall registration framework has a large capture range. The proposed deep similarity metric based approach obtained a mean target registration error (TRE) of 3.86 mm (with an initial TRE of 16 mm) on this challenging problem. Conclusion: A similarity metric learned with a deep neural network can be used to assess the quality of any given image registration and, in conjunction with the optimization framework described above, to perform automatic registration that is robust to poor initialization.
    Comment: To appear in IJCAR
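    Two quantities this abstract relies on can be made concrete with a small sketch. Mutual information (the classical baseline the learned metric is compared against) is estimated here from a joint histogram of intensities that are assumed to be already quantized into bins, and TRE is taken as the mean Euclidean distance between landmarks mapped by the estimated transform and their corresponding fixed landmarks. This is not the paper's learned CNN metric or its evaluation code; the function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Mutual information (in nats) between two equal-length sequences of
    intensity bin labels, estimated from their joint histogram."""
    if not a or len(a) != len(b):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    joint = Counter(zip(a, b))  # joint histogram of (image A bin, image B bin)
    marg_a = Counter(a)         # marginal histogram of image A
    marg_b = Counter(b)         # marginal histogram of image B
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((marg_a[x] / n) * (marg_b[y] / n)))
    return mi


def target_registration_error(mapped_landmarks, fixed_landmarks):
    """Mean Euclidean distance between landmarks mapped by the estimated
    transform and their fixed counterparts (same units as input, e.g. mm)."""
    if not mapped_landmarks or len(mapped_landmarks) != len(fixed_landmarks):
        raise ValueError("landmark lists must be non-empty and of equal length")
    total = sum(math.dist(p, q) for p, q in zip(mapped_landmarks, fixed_landmarks))
    return total / len(mapped_landmarks)


# Identical bin sequences share log(2) nats; a constant image shares none.
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))        # log(2), about 0.693
print(mutual_information([0, 0, 1, 1], [5, 5, 5, 5]))        # 0.0
print(target_registration_error([(0, 0, 0)], [(3, 4, 0)]))   # 5.0
```

    Histogram-based MI needs no assumption about how MR and TRUS intensities relate, which is why it is the usual multimodal baseline; its weakness on MR-TRUS is exactly the appearance gap the learned metric targets.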

    On Recognizing Transparent Objects in Domestic Environments Using Fusion of Multiple Sensor Modalities

    Current object recognition methods fail on object sets that include diffuse, reflective, and transparent materials, although such materials are very common in domestic scenarios. We show that combining cues from multiple sensor modalities, including specular reflectance and unavailable depth information, allows us to capture a larger subset of household objects by extending a state-of-the-art object recognition method. This leads to a significant increase in recognition robustness over a larger set of commonly used objects.
    Comment: 12 page
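    One way to read "unavailable depth information" as a cue is that depth sensors often return no reading on transparent or highly reflective surfaces, so missing depth inside a candidate region is itself evidence. The sketch below assumes a depth patch given as nested lists (missing readings as None, NaN, or 0) and a precomputed fraction of specular-highlight pixels; the linear fusion, its weight, and both function names are illustrative assumptions, not the recognition method the paper extends.

```python
import math


def invalid_depth_ratio(depth_patch, min_valid=1e-6):
    """Fraction of pixels in a depth patch with no usable reading.

    Missing readings are None, NaN, or values below min_valid (many depth
    sensors report 0 where they get no return)."""
    values = [d for row in depth_patch for d in row]
    if not values:
        raise ValueError("depth patch is empty")
    invalid = sum(1 for d in values if d is None or math.isnan(d) or d < min_valid)
    return invalid / len(values)


def transparency_score(depth_patch, specular_ratio, depth_weight=0.5):
    """Hypothetical linear fusion of two cues in [0, 1]: the missing-depth
    ratio and the fraction of pixels flagged as specular highlights."""
    return (depth_weight * invalid_depth_ratio(depth_patch)
            + (1.0 - depth_weight) * specular_ratio)


patch = [[0.0, 1.2], [None, float("nan")]]
print(invalid_depth_ratio(patch))       # 0.75
print(transparency_score(patch, 0.25))  # 0.5
```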

    A comparative evaluation of 3 different free-form deformable image registration and contour propagation methods for head and neck MRI: the case of parotid changes during radiotherapy

    Purpose: To validate and compare the deformable image registration and parotid contour propagation process for head and neck magnetic resonance imaging in patients treated with radiotherapy, using 3 different approaches: the commercial MIM software, the open-source Elastix software, and an optimized version of Elastix. Materials and Methods: Twelve patients with head and neck cancer previously treated with radiotherapy were considered. Deformable image registration and parotid contour propagation were evaluated on the magnetic resonance images acquired before and after the end of treatment. The deformable image registration (based on the free-form deformation method) and contour propagation available in MIM were compared with Elastix. Two different contour propagation approaches were implemented for Elastix: a conventional one (DIR_Trx) and an optimized in-house version based on mesh deformation (DIR_Mesh). The accuracy of these 3 approaches was estimated by comparing propagated contours with manual contours in terms of average symmetric distance, maximum symmetric distance, Dice similarity coefficient, sensitivity, and inclusiveness. Results: Good agreement was generally found between the manual and propagated contours, with no differences among the 3 methods. In a few critical cases with complex deformations, DIR_Mesh proved more accurate, with the lowest average and maximum symmetric distances and the highest Dice similarity coefficient, although the differences were not significant. The average propagation errors with respect to the reference contours were lower than the voxel diagonal (2 mm), and the Dice similarity coefficient was around 0.8 for all 3 methods. Conclusion: The 3 free-form deformation approaches were not significantly different in terms of deformable image registration accuracy and can be safely adopted for registration and parotid contour propagation during radiotherapy on magnetic resonance imaging. More optimized approaches (such as DIR_Mesh) could be preferable for critical deformations.
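    The overlap and distance measures used in this evaluation have common textbook forms, sketched below. This assumes contours are given as sets of voxel indices for the overlap measures and as point sets for the distance measures, computes nearest neighbours by brute force, and omits inclusiveness because the abstract does not define it. The paper's own implementation may differ (surface extraction, voxel spacing in mm); the function names are hypothetical.

```python
import math


def dice(a, b):
    """Dice similarity coefficient between two voxel sets: 2|A∩B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def sensitivity(propagated, reference):
    """Fraction of the reference (manual) contour covered by the propagated one."""
    return len(propagated & reference) / len(reference)


def _nearest(p, points):
    """Distance from point p to its nearest neighbour in points (brute force)."""
    return min(math.dist(p, q) for q in points)


def symmetric_distances(surface_a, surface_b):
    """Return (average, maximum) symmetric distance between two point sets,
    pooling nearest-neighbour distances computed in both directions."""
    d_ab = [_nearest(p, surface_b) for p in surface_a]
    d_ba = [_nearest(q, surface_a) for q in surface_b]
    all_d = d_ab + d_ba
    return sum(all_d) / len(all_d), max(all_d)


manual = {(0, 0, 0), (1, 0, 0)}
propagated = {(1, 0, 0), (2, 0, 0)}
print(dice(propagated, manual))                 # 0.5
print(sensitivity(propagated, manual))          # 0.5
print(symmetric_distances(propagated, manual))  # (0.5, 1.0)
```

    The maximum symmetric distance is the Hausdorff distance between the two sets, so it reacts to a single outlying voxel, whereas the average and Dice summarize overall agreement.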

    Integration of multimodal data based on surface registration

    The paper proposes and evaluates a strategy for aligning anatomical and functional brain data. The method takes as input two different sets of images of the same patient: MR and SPECT data. It proceeds in four steps: first, it constructs two voxel models from the two image sets; next, it extracts the surfaces of regions of interest from the two voxel models; third, the surfaces are interactively aligned in corresponding pairs; finally, a single volume model is constructed by selectively applying the geometric transformations associated with each region and weighting their contributions. The main advantages of this strategy are (i) that it can be applied retrospectively, (ii) that it is three-dimensional, and (iii) that it is local. Its main disadvantage with respect to previously published methods is that it requires the extraction of surfaces. However, this step is often required for other stages of the multimodal analysis, such as visualization, so its cost can be accounted for in the overall cost of the process.
    Postprint (published version)
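    The final step, building one volume by selectively applying each region's transformation and weighting the contributions, can be sketched as follows. The abstract specifies neither the transform type nor the weighting, so this sketch assumes one affine transform per aligned region and inverse-distance weights to region centroids; both choices, and the names `apply_affine` and `blend_region_transforms`, are illustrative assumptions.

```python
import math


def apply_affine(matrix, translation, point):
    """Apply x' = M x + t to a 3D point."""
    return tuple(
        sum(matrix[r][c] * point[c] for c in range(3)) + translation[r]
        for r in range(3)
    )


def blend_region_transforms(point, regions, power=2.0):
    """Map a point by an inverse-distance-weighted average of per-region
    affine transforms (a hypothetical weighting scheme).

    regions: list of (centroid, matrix, translation), one per region of
    interest whose surfaces were aligned in the interactive step."""
    if not regions:
        raise ValueError("at least one region is required")
    mapped, weights = [], []
    for centroid, matrix, translation in regions:
        y = apply_affine(matrix, translation, point)
        d = math.dist(point, centroid)
        if d == 0.0:
            return y  # point lies on a region centroid: use that region's transform
        mapped.append(y)
        weights.append(1.0 / d ** power)
    total = sum(weights)
    return tuple(
        sum(w * y[k] for w, y in zip(weights, mapped)) / total for k in range(3)
    )


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
regions = [((0, 0, 0), identity, (1, 0, 0)), ((10, 0, 0), identity, (-1, 0, 0))]
print(blend_region_transforms((5, 0, 0), regions))  # midway point: the two shifts cancel
print(blend_region_transforms((0, 0, 0), regions))  # (1, 0, 0)
```

    Weighting by distance keeps each region's transform dominant near that region, which is one way to obtain the "local" behaviour the abstract lists as an advantage.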

    Adversarial Deformation Regularization for Training Image Registration Neural Networks

    We describe an adversarial learning approach to constrain convolutional neural network training for image registration, replacing the heuristic smoothness measures of displacement fields often used in these tasks. Using minimally invasive prostate cancer intervention as an example application, we demonstrate the feasibility of utilizing biomechanical simulations to regularize a weakly supervised, anatomical-label-driven registration network for aligning pre-procedural magnetic resonance (MR) and 3D intra-procedural transrectal ultrasound (TRUS) images. A discriminator network is optimized to distinguish the registration-predicted displacement fields from the motion data simulated by finite element analysis. During training, the registration network simultaneously aims to maximize the similarity between anatomical labels, which drives image alignment, and to minimize an adversarial generator loss that measures the divergence between the predicted and simulated deformations. The end-to-end trained network enables efficient and fully automated registration that requires only an MR and TRUS image pair as input, without anatomical labels or simulated data during inference. In total, 108 pairs of labelled MR and TRUS images from 76 prostate cancer patients and 71,500 nonlinear finite-element simulations from 143 different patients were used for this study. We show that, with only gland segmentation as training labels, the proposed method can help predict physically plausible deformation without any other smoothness penalty. Based on cross-validation experiments using 834 pairs of independent validation landmarks, the proposed adversarial-regularized registration achieved a target registration error of 6.3 mm, significantly lower than those from several other regularization methods.
    Comment: Accepted to MICCAI 201
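    The two competing training objectives described in this abstract can be written as scalar losses. This sketch assumes the discriminator already outputs a probability for each displacement field, uses soft Dice between warped and fixed labels as the label-driven similarity, and uses the standard non-saturating GAN generator loss; the weight `adv_weight` and all function names are hypothetical, the paper's exact formulation may differ, and the networks and gradient updates are omitted.

```python
import math

EPS = 1e-7  # keep probabilities away from 0 and 1 before taking logs


def _clamp(p):
    return min(max(p, EPS), 1.0 - EPS)


def soft_dice(warped_label, fixed_label):
    """Soft Dice overlap between two label maps given as flat lists in [0, 1]."""
    inter = sum(p * q for p, q in zip(warped_label, fixed_label))
    return 2.0 * inter / (sum(warped_label) + sum(fixed_label) + 1e-6)


def discriminator_loss(d_simulated, d_predicted):
    """Binary cross-entropy: simulated (finite-element) fields are labelled
    real (1), registration-predicted fields are labelled fake (0)."""
    real_term = sum(-math.log(_clamp(r)) for r in d_simulated) / len(d_simulated)
    fake_term = sum(-math.log(1.0 - _clamp(f)) for f in d_predicted) / len(d_predicted)
    return real_term + fake_term


def generator_loss(d_predicted):
    """Non-saturating generator loss: low when D scores predicted fields as real."""
    return sum(-math.log(_clamp(f)) for f in d_predicted) / len(d_predicted)


def registration_loss(warped_label, fixed_label, d_predicted, adv_weight=0.1):
    """Label dissimilarity (1 - soft Dice) plus a weighted adversarial term."""
    label_term = 1.0 - soft_dice(warped_label, fixed_label)
    return label_term + adv_weight * generator_loss(d_predicted)


print(discriminator_loss([0.5], [0.5]))  # 2 * log(2): D cannot tell fields apart
```

    In training, the two losses would be minimized alternately: the discriminator on `discriminator_loss`, the registration network on `registration_loss`, so that predicted fields drift toward the distribution of simulated deformations.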

    Learning semantic sentence representations from visually grounded language without lexical knowledge

    Current approaches to learning semantic representations of sentences often use prior word-level knowledge. The current study aims to leverage visual information to capture sentence-level semantics without the need for word embeddings. We use a multimodal sentence encoder trained on a corpus of images with matching text captions to produce visually grounded sentence embeddings. Deep neural networks are trained to map the two modalities to a common embedding space such that, for an image, the corresponding caption can be retrieved, and vice versa. We show that our model achieves results comparable to the current state of the art on two popular image-caption retrieval benchmark data sets: MSCOCO and Flickr8k. We evaluate the semantic content of the resulting sentence embeddings using data from the Semantic Textual Similarity benchmark task and show that the multimodal embeddings correlate well with human semantic similarity judgements. The system achieves state-of-the-art results on several of these benchmarks, which shows that a system trained solely on multimodal data, without assuming any word representations, is able to capture sentence-level semantics. Importantly, this result shows that we do not need prior knowledge of lexical-level semantics in order to model sentence-level semantics. These findings demonstrate the importance of visual information in semantics.
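    Once both modalities live in a common embedding space, caption-to-image retrieval reduces to ranking by similarity. The sketch below assumes embeddings have already been produced by the two encoders, pairs image i with caption i, and scores retrieval with cosine similarity and recall@k, a metric commonly reported on MSCOCO and Flickr8k; the encoders and training objective are not shown, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def caption_to_image_recall_at_k(image_embs, caption_embs, k):
    """Fraction of captions whose paired image (same index) ranks among the
    top k images by cosine similarity in the shared embedding space."""
    hits = 0
    for i, cap in enumerate(caption_embs):
        ranked = sorted(
            range(len(image_embs)),
            key=lambda j: cosine(cap, image_embs[j]),
            reverse=True,
        )
        if i in ranked[:k]:
            hits += 1
    return hits / len(caption_embs)


images = [(1.0, 0.0), (0.0, 1.0)]
captions = [(0.9, 0.1), (0.1, 0.9)]
print(caption_to_image_recall_at_k(images, captions, k=1))  # 1.0
```

    Image-to-caption retrieval is the same computation with the roles of the two lists swapped.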

    How can a multimodal approach to primate communication help us understand the evolution of communication?

    Scientists studying the communication of non-human animals often aim to better understand the evolution of human communication, including human language. Some scientists take a phylogenetic perspective, where the goal is to trace the evolutionary history of communicative traits, while others take a functional perspective, where the goal is to understand the selection pressures underpinning specific traits. Both perspectives are necessary to fully understand the evolution of communication, but it is important to understand how the two perspectives differ and what they can and cannot tell us. Here, we suggest that integrating phylogenetic and functional questions can be fruitful in better understanding the evolution of communication. We also suggest that adopting a multimodal approach to communication might help to integrate phylogenetic and functional questions and provide an interesting avenue for research into language evolution.