
    Global rigid registration of CT to video in laparoscopic liver surgery

    PURPOSE: Image-guidance systems have the potential to aid laparoscopic interventions by providing sub-surface structure information and tumour localisation. The registration of a preoperative 3D image with the intraoperative laparoscopic video feed is an important component of image guidance, and it should be fast, robust and cause minimal disruption to the surgical procedure. Most methods for rigid and non-rigid registration require a good initial alignment. However, in most research systems for abdominal surgery, the user has to manually rotate and translate the models, which is usually difficult to perform quickly and intuitively. METHODS: We propose a fast, global method for the initial rigid alignment between a 3D mesh derived from a preoperative CT of the liver and a surface reconstruction of the intraoperative scene. We formulate the shape matching problem as a quadratic assignment problem which minimises the dissimilarity between feature descriptors while enforcing geometrical consistency between all the feature points. We incorporate a novel constraint based on the liver contours which deals specifically with the challenges introduced by laparoscopic data. RESULTS: We validate our proposed method on synthetic data, on a liver phantom and on retrospective clinical data acquired during a laparoscopic liver resection. We show robustness over reduced partial size and increasing levels of deformation. Our results on the phantom and on the real data show good initial alignment, which can successfully converge to the correct position using fine alignment techniques. Furthermore, since we can pre-process the CT scan before surgery, the proposed method runs faster than current algorithms. CONCLUSION: The proposed shape matching method can provide a fast, global initial registration, which can be further refined by fine alignment methods. This approach will lead to a more usable and intuitive image-guidance system for laparoscopic liver surgery.
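
The quadratic assignment formulation in the abstract above can be illustrated with a toy example. The sketch below is not the authors' method: it omits the liver-contour constraint, uses brute-force search that is only feasible for a handful of points, and all names (`assignment_cost`, `best_assignment`, `weight`) are hypothetical. It shows only the general idea of combining a descriptor dissimilarity term with a pairwise term that penalises matches which do not preserve inter-point distances, as a rigid motion would.

```python
from itertools import permutations
from math import dist


def assignment_cost(perm, desc_a, desc_b, pts_a, pts_b, weight=1.0):
    """Cost of matching point i of set A to point perm[i] of set B."""
    n = len(perm)
    # Unary term: dissimilarity between the feature descriptors of matched points.
    unary = sum(dist(desc_a[i], desc_b[perm[i]]) for i in range(n))
    # Pairwise term: a rigid motion preserves distances, so matched pairs
    # should be equally far apart in both point sets.
    pairwise = sum(
        abs(dist(pts_a[i], pts_a[j]) - dist(pts_b[perm[i]], pts_b[perm[j]]))
        for i in range(n)
        for j in range(i + 1, n)
    )
    return unary + weight * pairwise


def best_assignment(desc_a, desc_b, pts_a, pts_b, weight=1.0):
    """Exhaustive search over all one-to-one matches (toy sizes only)."""
    n = len(pts_a)
    return min(
        permutations(range(n)),
        key=lambda p: assignment_cost(p, desc_a, desc_b, pts_a, pts_b, weight),
    )


# Illustrative data (assumed): set B is set A reordered and translated.
pts_a = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3)]
desc_a = [(0.0,), (1.0,), (2.0,), (3.0,)]
order = [2, 0, 3, 1]  # pts_b[k] is pts_a[order[k]] shifted by (5, -2, 1)
pts_b = [(pts_a[k][0] + 5, pts_a[k][1] - 2, pts_a[k][2] + 1) for k in order]
desc_b = [desc_a[k] for k in order]

match = best_assignment(desc_a, desc_b, pts_a, pts_b)
print(match)  # match[i] is the index in B assigned to point i of A
```

Once correspondences like these are available, a rigid transform can be estimated from them and then refined by a fine alignment method, which is the role the abstract assigns to the initial registration.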

    SMAN: Stacked Multi-Modal Attention Network for cross-modal image-text retrieval

    This article tackles cross-modal image-text retrieval, an interdisciplinary topic in both the computer vision and natural language processing communities. Existing methods based on global representation alignment fail to pinpoint the semantically meaningful portions of images and texts, while local representation alignment schemes suffer from a huge computational burden because they exhaustively aggregate the similarity of visual fragments and textual words. In this article, we propose a stacked multimodal attention network (SMAN) that uses a stacked multimodal attention mechanism to exploit the fine-grained interdependencies between image and text, thereby mapping the aggregation of attentive fragments into a common space for measuring cross-modal similarity. Specifically, we sequentially employ intramodal information and multimodal information as guidance to perform multiple-step attention reasoning so that the fine-grained correlation between image and text can be modeled. As a consequence, we can discover the semantically meaningful visual regions or words in a sentence that contribute to measuring the cross-modal similarity more precisely. Moreover, we present a novel bidirectional ranking loss that pulls paired multimodal instances closer together. Doing so allows us to make full use of pairwise supervised information to preserve the manifold structure of heterogeneous pairwise data. Extensive experiments on two benchmark datasets demonstrate that our SMAN consistently yields competitive performance compared to state-of-the-art methods.
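
For readers unfamiliar with bidirectional ranking losses, the sketch below shows a common hinge-based variant used in image-text retrieval. It is an assumption for illustration and not the novel loss proposed in the abstract above, whose exact form is not given there; the function name, the `margin` value and the example similarity matrix are all hypothetical.

```python
def bidirectional_ranking_loss(sim, margin=0.2):
    """Hinge-based ranking loss over a square similarity matrix.

    sim[i][j] is the similarity between image i and text j; matched
    image-text pairs are assumed to lie on the diagonal.
    """
    n = len(sim)
    loss = 0.0
    for i in range(n):
        positive = sim[i][i]
        for j in range(n):
            if j == i:
                continue
            # Image-to-text direction: image i should prefer its own text over text j.
            loss += max(0.0, margin - positive + sim[i][j])
            # Text-to-image direction: text i should prefer its own image over image j.
            loss += max(0.0, margin - positive + sim[j][i])
    return loss


# Illustrative similarities (assumed): image 0 scores higher with the wrong text.
example_sim = [
    [0.5, 0.6],
    [0.1, 0.5],
]
example_loss = bidirectional_ranking_loss(example_sim)
print(example_loss)
```

The loss is zero when every matched pair beats all mismatched pairs in both retrieval directions by at least the margin.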

    Personalised correction, feedback, and guidance in an automated tutoring system for skills training

    In many domains, skills are as important as knowledge. Active learning and training are effective forms of education. We present an automated skills training system for a database programming environment that promotes procedural knowledge acquisition and skills training. The system provides support features such as correction of solutions, feedback and personalised guidance, similar to interactions with a human tutor. Specifically, we address synchronous feedback and guidance based on personalised assessment. Each of these features is automated and includes a level of personalisation and adaptation. At the core of the system is a pattern-based error classification and correction component that analyses student input

    Assessment of a photogrammetric approach for urban DSM extraction from tri-stereoscopic satellite imagery

    Built-up environments are extremely complex for 3D surface modelling purposes. The main distortions that hamper 3D reconstruction from 2D imagery are image dissimilarities, concealed areas, shadows, height discontinuities and discrepancies between smooth terrain and man-made features. A methodology is proposed to improve automatic photogrammetric extraction of an urban surface model from high-resolution satellite imagery, with the emphasis on strategies to reduce the effects of the cited distortions and to make image matching more robust. Instead of a standard stereoscopic approach, a digital surface model is derived from tri-stereoscopic satellite imagery. This is based on an extensive multi-image matching strategy that fully benefits from the geometric and radiometric information contained in the three images. The bundled triplet consists of an IKONOS along-track pair and an additional near-nadir IKONOS image. For the tri-stereoscopic study, a densely built-up area extending from the centre of Istanbul to the urban fringe is selected. The accuracy of the model extracted from the IKONOS triplet and that of the model extracted from only the along-track stereopair are assessed by comparison with 3D check points and 3D building vector data.
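
The abstract above does not state which accuracy measure is used in the comparison with 3D check points. One common choice for such an assessment is the vertical root-mean-square error, sketched below under that assumption; the function name and the height values are hypothetical and are not data from the study.

```python
from math import sqrt


def vertical_rmse(dsm_heights, check_heights):
    """Root-mean-square height error between DSM samples and check points.

    Both sequences hold heights in the same unit, and entry k of each
    refers to the same planimetric location.
    """
    if len(dsm_heights) != len(check_heights):
        raise ValueError("height lists must have the same length")
    if not dsm_heights:
        raise ValueError("at least one check point is required")
    errors = [d - c for d, c in zip(dsm_heights, check_heights)]
    return sqrt(sum(e * e for e in errors) / len(errors))


# Illustrative heights in metres (assumed values).
rmse = vertical_rmse([10.0, 12.0, 14.0], [11.0, 12.0, 13.0])
print(round(rmse, 3))
```

Computing the same statistic for the triplet-derived model and for the stereopair-derived model would give a like-for-like comparison of the two.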