166 research outputs found

    Wide baseline pose estimation from video with a density-based uncertainty model

    Robust wide baseline pose estimation is an essential step in the deployment of smart camera networks. In this work, we highlight some current limitations of conventional strategies for relative pose estimation in difficult urban scenes. We then propose a solution that relies on an adaptive search for corresponding interest points in synchronized video streams, which allows the estimate to converge robustly toward a high-quality solution. The core idea of our algorithm is to build, across the image space, a nonstationary map of the local pose estimation uncertainty based on the spatial distribution of interest points. This map then guides the selection of new observations from the video stream so that areas of high uncertainty are covered first. With an additional step in the initial stage, the proposed algorithm can also refine an existing pose estimate from the video data; this mode enables data-driven self-calibration of stereo rigs for which accuracy is critical, such as onboard medical or vehicular systems. We validate our method on three datasets that cover typical pose estimation scenarios. The results show fast and robust convergence and, compared with single-image-based alternatives, a significant improvement in both the RMSE of ground-truth matches and the maximum absolute error.
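The abstract does not give the exact uncertainty model, so the following is only a minimal Python sketch of the general idea under stated assumptions: interest points are 2D pixel coordinates, local point density is a Gaussian kernel density evaluated at the centres of a coarse grid, and uncertainty is the hypothetical proxy 1 / (1 + density). The names `uncertainty_map`, `select_candidates`, and `adaptive_selection` and the `cell` and `sigma` parameters are illustrative, not the authors' method.

```python
import math

def uncertainty_map(points, width, height, cell=32, sigma=40.0):
    """Hypothetical per-cell uncertainty proxy: 1 / (1 + Gaussian kernel density).

    Cells far from existing interest points get values close to 1.
    """
    cols, rows = math.ceil(width / cell), math.ceil(height / cell)
    umap = []
    for r in range(rows):
        row = []
        for c in range(cols):
            cx, cy = (c + 0.5) * cell, (r + 0.5) * cell  # cell centre
            density = sum(
                math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
                for x, y in points
            )
            row.append(1.0 / (1.0 + density))
        umap.append(row)
    return umap

def select_candidates(candidates, umap, cell=32, k=1):
    """Return the k candidate points that fall in the most uncertain cells."""
    def score(p):
        x, y = p
        return umap[int(y // cell)][int(x // cell)]
    return sorted(candidates, key=score, reverse=True)[:k]

def adaptive_selection(points, frames, width, height, per_frame=1):
    """Recompute the map after each frame, then add that frame's best candidates."""
    selected = list(points)
    for candidates in frames:
        umap = uncertainty_map(selected, width, height)
        selected.extend(select_candidates(candidates, umap, k=per_frame))
    return selected
```

Recomputing the map after each frame is what makes the selection adaptive in this sketch: once a sparse region receives a point, its uncertainty drops and later frames favour other regions.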

    Medical Ultrasound Imaging and Interventional Component (MUSiiC) Framework for Advanced Ultrasound Image-guided Therapy

    Medical ultrasound (US) imaging is a popular and convenient medical imaging modality thanks to its mobility, lack of ionizing radiation, ease of use, and real-time data acquisition. Conventional US brightness mode (B-Mode) imaging is a diagnostic modality that represents tissue morphology by collecting and displaying the intensity of reflected acoustic waves. US B-Mode imaging is also frequently integrated with tracking and robotic systems in image-guided therapy (IGT) systems. Recently, these systems have also begun to incorporate advanced US imaging such as US elasticity imaging, photoacoustic imaging, and thermal imaging. Several software frameworks and toolkits have been developed for US imaging research and for integrating US data acquisition, processing, and display with existing IGT systems. However, no software framework or toolkit supports advanced US imaging research and advanced US IGT systems by providing the low-level US data (channel data or radio-frequency (RF) data) that advanced US imaging requires. In this dissertation, we propose a new medical US imaging and interventional component framework for advanced US image-guided therapy based on network-distributed modularity, real-time computation and communication, and open-interface design specifications. The framework provides a modular research environment by supporting communication interfaces between heterogeneous systems, which allows flexible interventional US imaging research and easy reconfiguration of an entire interventional US imaging system by adding or removing the devices or equipment specific to each therapy. In addition, the framework offers real-time synchronization between data from multiple acquisition devices, both for advanced interventional US imaging research and for integrating the US imaging system with other IGT systems.
Moreover, because the software framework is designed and optimized for advanced ultrasound research, new advanced ultrasound imaging techniques can easily be implemented and tested inside it in real time. The system's flexibility, real-time performance, and open interface are demonstrated and evaluated through experimental tests for several applications.
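One common way to synchronize streams from several acquisition devices is to pair samples by nearest timestamp within a tolerance. The sketch below assumes both devices share a clock and report sorted integer timestamps (here in milliseconds); `pair_by_timestamp` is a hypothetical helper for illustration, not part of the MUSiiC framework's API.

```python
from bisect import bisect_left

def pair_by_timestamp(ts_a, ts_b, tolerance):
    """For each timestamp in ts_a, return the index of the closest timestamp in
    the sorted list ts_b, or None if no timestamp lies within tolerance."""
    pairs = []
    for t in ts_a:
        i = bisect_left(ts_b, t)
        best = None
        # Only the neighbours on either side of the insertion point can be closest.
        for j in (i - 1, i):
            if 0 <= j < len(ts_b) and abs(ts_b[j] - t) <= tolerance:
                if best is None or abs(ts_b[j] - t) < abs(ts_b[best] - t):
                    best = j
        pairs.append(best)
    return pairs

# Example: B-Mode frames (device A) paired with tracker samples (device B).
print(pair_by_timestamp([0, 33, 66, 500], [0, 10, 40, 70], tolerance=10))
```

Returning `None` rather than the nearest sample regardless of distance keeps dropped or delayed frames from being silently paired with stale data.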

    Image-set, Temporal and Spatiotemporal Representations of Videos for Recognizing, Localizing and Quantifying Actions

    This dissertation addresses the problem of learning video representations, defined here as transforming a video so that its essential structure becomes more visible or accessible for action recognition and quantification. In the literature, a video can be represented as a set of images, by modeling its motion or temporal dynamics, or as a 3D graph with pixels as nodes. This dissertation proposes a set of models to localize, track, segment, recognize, and assess actions: (1) image-set models that aggregate subset features given by regularized, normalized CNNs; (2) image-set models based on inter-frame principal recovery and sparse coding of residual actions; (3) temporally local models in which spatially global motion is estimated by robust feature matching and local motion is estimated by action detection with an added motion model; (4) spatiotemporal models, namely 3D graphs and 3D CNNs, that treat time as a spatial dimension; and (5) supervised hashing that jointly learns embedding and quantization. State-of-the-art performance is achieved on tasks such as quantifying facial pain and assessing human diving.
The primary conclusions of this dissertation are as follows: (i) image sets can capture facial actions, which call for a collective representation; (ii) sparse and low-rank representations can disentangle expression, identity, and pose cues, and can be learned via an image-set model as well as a linear model; (iii) norm is related to recognizability, and similarity metrics and loss functions matter; (iv) combining the MIL-based boosting tracker with the particle filter motion model yields a good trade-off between appearance similarity and motion consistency; (v) segmenting objects locally makes it possible to assign shape priors, and such priors can feasibly be learned online from Web data with weak supervision; (vi) representing videos as 3D graphs works locally in both space and time, and 3D CNNs work effectively when given temporally meaningful clips; (vii) richly labeled images or videos help learn better hash functions, after binary embedded codes are learned, than random projections do. In addition, the models proposed for videos can be adapted to other sequential images, such as volumetric medical images, which are not covered in this dissertation.
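As a minimal illustration of the retrieval side of hashing only (not the jointly learned embedding and quantization described above), the sketch below assumes real-valued embeddings are already available, quantizes each dimension to one bit by its sign, and ranks database items by Hamming distance. The functions `binarize`, `hamming`, and `retrieve` are hypothetical names for illustration.

```python
def binarize(embedding):
    """Sign quantization: map each real-valued dimension to one bit."""
    return [1 if v >= 0 else 0 for v in embedding]

def hamming(a, b):
    """Number of bit positions in which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))

def retrieve(query_emb, database_embs, k=2):
    """Return indices of the k database items closest to the query in Hamming space.

    Ties are broken by database index so the ranking is deterministic.
    """
    q = binarize(query_emb)
    codes = [binarize(e) for e in database_embs]
    order = sorted(range(len(codes)), key=lambda i: (hamming(q, codes[i]), i))
    return order[:k]
```

In a learned scheme the embedding would be trained so that sign quantization loses little information; with random projections, the embedding step would instead be a fixed random matrix.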

    Context Awareness for the Operating Room of the Future (Kontextsensitivität für den Operationssaal der Zukunft)

    The operating room of the future is a topic of high interest. This thesis, one of the first in the recently defined field of Surgical Data Science, examines three major topics for automated context awareness in the OR of the future: improved surgical workflow analysis, the newly developed event impact factors, and, as an application combining these and other concepts, the unified surgical display.

    REAL-TIME 4D ULTRASOUND RECONSTRUCTION FOR IMAGE-GUIDED INTRACARDIAC INTERVENTIONS

    Image-guided therapy addresses the lack of direct vision in minimally invasive interventions performed on the beating heart, but it requires effective intraoperative imaging. Gated 4D ultrasound reconstruction using a tracked 2D probe generates a time series of 3D images representing the beating heart over the cardiac cycle. These images have relatively high spatial resolution and a wide field of view, and ultrasound is easily integrated into the intraoperative environment. This thesis presents a real-time 4D ultrasound reconstruction system, incorporated within an augmented reality environment for surgical guidance, whose incremental visualization reduces common acquisition errors. The resulting 4D ultrasound datasets are intended for visualization or for registration to preoperative images. A human factors experiment demonstrates the advantages of real-time ultrasound reconstruction, and accuracy assessments performed with a dynamic phantom and intraoperatively reveal RMS localization errors of 2.5–2.7 mm and 0.8 mm, respectively. Finally, clinical applicability is demonstrated through both porcine and patient imaging.
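Gating generally assigns each tracked 2D frame to a cardiac phase bin using ECG R-peak times, and each bin is then compounded into one 3D volume of the time series. The sketch below shows only the binning step, assuming R-peak and frame timestamps on a common clock in milliseconds and evenly spaced phase bins; `cardiac_phase` and `gate_frames` are hypothetical helpers, not the thesis's implementation, and volume compounding is omitted.

```python
from bisect import bisect_right

def cardiac_phase(t, r_peaks):
    """Fraction of the enclosing R-R interval elapsed at time t, in [0, 1).

    Returns None if t is not inside a complete R-R interval.
    """
    i = bisect_right(r_peaks, t) - 1
    if i < 0 or i + 1 >= len(r_peaks):
        return None
    start, end = r_peaks[i], r_peaks[i + 1]
    return (t - start) / (end - start)

def gate_frames(frame_times, r_peaks, n_bins):
    """Group frame indices into n_bins phase bins; each bin would later be
    compounded into one 3D volume."""
    bins = [[] for _ in range(n_bins)]
    for idx, t in enumerate(frame_times):
        phase = cardiac_phase(t, r_peaks)
        if phase is not None:
            bins[min(int(phase * n_bins), n_bins - 1)].append(idx)
    return bins
```

Normalizing by each R-R interval, rather than using a fixed time offset, keeps frames from beats of slightly different lengths aligned to the same phase.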

    Perception and Orientation in Minimally Invasive Surgery

    During the last two decades, we have seen a revolution in the way that we perform abdominal surgery, with increased reliance on minimally invasive techniques. This paradigm shift has come at a rapid pace, with laparoscopic surgery now representing the gold standard for many surgical procedures and further minimisation of invasiveness being seen with the recent clinical introduction of novel techniques such as single-incision laparoscopic surgery and natural orifice translumenal endoscopic surgery. Despite the obvious benefits to the patient in terms of morbidity, length of hospital stay, and post-operative pain, this paradigm shift comes at the cost of significantly higher demands on the surgeon, in terms of both perception and manual dexterity. The issues involved include degradation of sensory input to the operator compared with conventional open surgery, owing to the loss of three-dimensional vision through the two-dimensional operative interface, and decreased haptic feedback from the instruments. These changes have led to a much higher cognitive load on the surgeon and a greater risk of operator disorientation, potentially leading to surgical errors. This thesis presents a detailed investigation of disorientation in minimally invasive surgery. Eye tracking methodology is identified as the method of choice for evaluating behavioural patterns during orientation. An analysis framework is proposed to profile orientation behaviour using eye tracking data, and it is validated in a laboratory model. This framework is used to characterise and quantify successful orientation strategies at critical stages of laparoscopic cholecystectomy, and these strategies are then used to demonstrate that focused teaching of this behaviour to novices can significantly improve their performance in this task.
Orientation strategies are then characterised for common clinical scenarios in natural orifice translumenal endoscopic surgery, and the concept of image saliency is introduced to further investigate the importance of specific visual cues associated with effective orientation. Behavioural patterns are profiled and related to orientation performance, and implications are drawn for education and for the construction of smart surgical robots. Finally, endoscopic horizon stabilization is investigated as a potential means of decreasing operator disorientation, in a simulated operative model for transgastric surgery. The major original contributions of this thesis include: (1) validation of a profiling methodology/framework to characterise orientation behaviour; (2) identification of high-performance orientation strategies in specific clinical scenarios, including laparoscopic cholecystectomy and natural orifice translumenal endoscopic surgery; (3) evaluation of the efficacy of teaching orientation strategies; and (4) evaluation of automatic endoscopic horizon stabilization in natural orifice translumenal endoscopic surgery. The impact of these results, and the potential for further high-impact research, is discussed in the context of both eye tracking as an evaluation tool in minimally invasive surgery and the implementation of means to combat operator disorientation in a surgical platform. The work also provides further insight into the practical implementation of computer assistance and technological innovation in future flexible access surgical platforms.
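The abstract does not say which fixation detector underlies the eye tracking analysis. As an illustration only, the sketch below implements the widely used dispersion-threshold (I-DT) algorithm, assuming gaze samples are (x, y) screen coordinates recorded at a fixed sampling rate; the `max_dispersion` and `min_samples` values in any call are arbitrary examples, and `detect_fixations` is a hypothetical name.

```python
def detect_fixations(samples, max_dispersion, min_samples):
    """Dispersion-threshold (I-DT) fixation detection.

    samples: list of (x, y) gaze points at a fixed sampling rate.
    Returns (start_index, end_index_exclusive, centroid) for each fixation.
    """
    def dispersion(window):
        xs, ys = zip(*window)
        return (max(xs) - min(xs)) + (max(ys) - min(ys))

    fixations, i, n = [], 0, len(samples)
    while i + min_samples <= n:
        j = i + min_samples
        if dispersion(samples[i:j]) <= max_dispersion:
            # Grow the window while the points stay within the threshold.
            while j < n and dispersion(samples[i:j + 1]) <= max_dispersion:
                j += 1
            xs, ys = zip(*samples[i:j])
            fixations.append((i, j, (sum(xs) / len(xs), sum(ys) / len(ys))))
            i = j
        else:
            i += 1  # slide the window past a saccade sample
    return fixations
```

Fixation sequences of this kind (where, how long, in what order) are the typical raw material from which orientation behaviour profiles could be built.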

    Towards Quantitative Endoscopy with Vision Intelligence

    In this thesis, we address topics related to quantitative endoscopy with vision-based intelligence. Specifically, our work revolves around video reconstruction in endoscopy, where challenges such as texture scarcity, illumination variation, and multimodality prevent prior methods from working effectively and robustly. To this end, we propose to combine the expressivity of deep learning approaches with the rigor and accuracy of non-linear optimization algorithms, developing a series of methods that confront these challenges on the way toward quantitative endoscopy. We first propose a retrospective sparse reconstruction method that estimates a high-accuracy, high-density point cloud and a high-completeness camera trajectory from a monocular endoscopic video, with state-of-the-art performance. To enable this, a deep image feature descriptor is developed to replace the hand-crafted local descriptor and boost feature matching performance within a typical sparse reconstruction algorithm. A retrospective surface reconstruction pipeline is then proposed to estimate a textured surface model from a monocular endoscopic video, using self-supervised depth and descriptor learning together with a surface fusion technique. We show that the proposed method outperforms a popular dense reconstruction method and that the estimated reconstructions are in good agreement with surface models obtained from CT scans. To align video-reconstructed surface models with pre-operative imaging such as CT, we introduce a global point cloud registration algorithm that is robust to the resolution mismatch that often occurs in such multimodal scenarios. Specifically, we develop a geometric feature descriptor that uses a novel network normalization technique to help a 3D network produce more consistent and distinctive geometric features for samples with different resolutions.
Based on our evaluation, the proposed geometric descriptor achieves state-of-the-art performance. Finally, we develop a real-time SLAM system that estimates surface geometry and camera trajectory from a monocular endoscopic video, using deep representations of geometry and appearance together with non-linear factor graph optimization. We show that the proposed SLAM system performs favorably compared with a state-of-the-art feature-based SLAM system.
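Descriptor matching, the step that the deep feature descriptor is meant to improve in sparse reconstruction, is often done with mutual nearest neighbours plus a ratio test. This sketch assumes descriptors are plain float vectors compared by Euclidean distance; it is a generic matcher for illustration, not the thesis's pipeline, and the default `ratio` of 0.8 is only a typical choice.

```python
import math

def _nearest_two(query, pool):
    """Return (best_distance, best_index, second_best_distance) of query in pool."""
    ranked = sorted((math.dist(query, p), j) for j, p in enumerate(pool))
    second = ranked[1][0] if len(ranked) > 1 else math.inf
    return ranked[0][0], ranked[0][1], second

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Mutual nearest-neighbour matching filtered by a ratio test.

    Returns a list of (index_in_a, index_in_b) pairs.
    """
    if not desc_a or not desc_b:
        return []
    matches = []
    for i, d in enumerate(desc_a):
        best, j, second = _nearest_two(d, desc_b)
        if best >= ratio * second:
            continue  # ambiguous: the second-best match is nearly as close
        _, back, _ = _nearest_two(desc_b[j], desc_a)
        if back == i:  # mutual consistency check
            matches.append((i, j))
    return matches
```

In texture-poor endoscopic frames many descriptors look alike, which is exactly the case the ratio test rejects; a more distinctive descriptor lets more correct matches survive it.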