18 research outputs found

    Comparative validation of machine learning algorithms for surgical workflow and skill analysis with the HeiChole benchmark

    Get PDF
    Purpose: Surgical workflow and skill analysis are key technologies for the next generation of cognitive surgical assistance systems. These systems could increase the safety of the operation through context-sensitive warnings and semi-autonomous robotic assistance, or improve training of surgeons via data-driven feedback. In surgical workflow analysis, up to 91% average precision has been reported for phase recognition on an open single-center video dataset. In this work we investigated the generalizability of phase recognition algorithms in a multicenter setting, including more difficult recognition tasks such as surgical action and surgical skill. Methods: To achieve this goal, a dataset with 33 laparoscopic cholecystectomy videos from three surgical centers with a total operation time of 22 h was created. Labels included framewise annotation of seven surgical phases with 250 phase transitions, 5514 occurrences of four surgical actions, 6980 occurrences of 21 surgical instruments from seven instrument categories, and 495 skill classifications in five skill dimensions. The dataset was used in the 2019 international Endoscopic Vision challenge as a sub-challenge for surgical workflow and skill analysis, in which 12 research teams trained and submitted their machine learning algorithms for recognition of phase, action, instrument and/or skill assessment. Results: F1-scores were achieved for phase recognition between 23.9% and 67.7% (n = 9 teams) and for instrument presence detection between 38.5% and 63.8% (n = 8 teams), but for action recognition only between 21.8% and 23.3% (n = 5 teams). The average absolute error for skill assessment was 0.78 (n = 1 team). Conclusion: Surgical workflow and skill analysis are promising technologies to support the surgical team, but as our comparison of machine learning algorithms shows, there is still room for improvement. The novel HeiChole benchmark can be used for comparable evaluation and validation of future work. In future studies, it is of utmost importance to create more open, high-quality datasets in order to allow the development of artificial intelligence and cognitive robotics in surgery.
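
    For intuition, frame-wise phase recognition is scored by comparing one predicted phase label per frame against the annotation. A minimal sketch of a macro-averaged F1 computation follows; the toy labels are illustrative and this is not the challenge's exact averaging protocol:

        from sklearn.metrics import f1_score

        # one of seven phase labels (0-6) per video frame
        y_true = [0, 0, 1, 1, 1, 2, 2, 3, 3, 3]   # annotation
        y_pred = [0, 1, 1, 1, 2, 2, 2, 3, 3, 0]   # model output

        # unweighted mean of per-phase F1-scores
        print(f"F1 = {f1_score(y_true, y_pred, average='macro'):.3f}")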

    Deep learning methods for the detection and recognition of surgical tools and activities in laparoscopic videos

    No full text
    In this thesis, we address the two problems of tool detection and fine-grained activity recognition in the operating room (OR), which are key ingredients in the development of surgical assistance applications. Leveraging weak supervision for temporal modeling and spatial localization, we propose a joint detection and tracking model for surgical instruments, circumventing the lack of spatially annotated datasets for this task. For more helpful AI assistance in the OR, we formalize surgical activities as triplets of 〈instrument, verb, target〉 and propose several deep learning methods that leverage instrument activation, spatial attention, and semantic attention mechanisms to recognize these triplets directly from surgical videos. Evaluation is performed on large-scale datasets, which we introduce in this thesis, obtaining state-of-the-art results for these tasks.
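
    As a side note, the 〈instrument, verb, target〉 formalization maps naturally onto a small record type. A minimal sketch in Python, where the example values are illustrative labels from laparoscopic cholecystectomy:

        from typing import NamedTuple

        class Triplet(NamedTuple):
            instrument: str
            verb: str
            target: str

        # one fine-grained surgical activity, e.g. retracting the gallbladder
        t = Triplet(instrument="grasper", verb="retract", target="gallbladder")
        print(t)  # Triplet(instrument='grasper', verb='retract', target='gallbladder')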

    Data splits and metrics for method benchmarking on surgical action triplet datasets

    No full text
    In addition to generating data and annotations, devising sensible data-splitting strategies and evaluation metrics is essential for the creation of a benchmark dataset. This practice ensures consensus on the usage of the data, homogeneous assessment, and uniform comparison of research methods on the dataset. This study focuses on CholecT50, a surgical dataset of 50 videos that formalizes surgical activities as triplets of 〈instrument, verb, target〉. In this paper, we introduce the standard splits for the CholecT50 and CholecT45 datasets and show how they compare with existing uses of the dataset. CholecT45 is the first public release, comprising 45 videos of the CholecT50 dataset. We also develop a metrics library, ivtmetrics, for model evaluation on surgical triplets. Furthermore, we conduct a benchmark study by reproducing baseline methods in the most widely used deep learning frameworks (PyTorch and TensorFlow), evaluating them with the proposed data splits and metrics, and releasing them publicly to support future research. The proposed data splits and evaluation metrics will enable global tracking of research progress on the dataset and facilitate optimal model selection for further deployment.
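
    For orientation, the core quantity such a triplet metric standardizes is average precision per triplet class, averaged into a mean AP. The sketch below reimplements that idea with scikit-learn rather than calling the ivtmetrics API; the array shapes and toy data are assumptions for illustration only:

        import numpy as np
        from sklearn.metrics import average_precision_score

        def triplet_map(y_true: np.ndarray, y_score: np.ndarray) -> float:
            """Mean average precision over triplet classes.

            y_true:  (n_frames, n_classes) binary labels, one column per
                     <instrument, verb, target> triplet class.
            y_score: (n_frames, n_classes) predicted confidences.
            """
            aps = [average_precision_score(y_true[:, c], y_score[:, c])
                   for c in range(y_true.shape[1])
                   if y_true[:, c].any()]      # AP is undefined without positives
            return float(np.mean(aps))

        # toy usage: 8 frames, 4 triplet classes
        rng = np.random.default_rng(0)
        y_true = rng.integers(0, 2, size=(8, 4))
        y_score = rng.random((8, 4))
        print(f"mAP = {triplet_map(y_true, y_score):.3f}")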

    Motivation for the Design and Implementation of a SMS Slang Converter for Android Devices

    No full text
    The mobile phone has no doubt become everyone's personal companion, and it finds especially close companionship among youths. Mobile-phone SMS slang has adversely affected students' ability to write proper English, as they find it a very convenient and economical way to communicate. This work presents a needs-assessment survey in which over 90% of respondents expressed a desire for an application that converts SMS slang. It also presents the design and implementation of an SMS-slang-to-English converter for Android devices. Results show a compression rate of over forty percent, thereby reducing SMS cost, time, and bandwidth.
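
    For a sense of the mechanism, the simplest such converter is a dictionary lookup over tokens. The mapping below is a hypothetical toy, and the compression figure it prints is illustrative rather than the paper's measurement:

        # Hypothetical toy mapping; a real converter ships a much larger,
        # curated slang dictionary.
        SLANG_TO_ENGLISH = {
            "gr8": "great", "b4": "before", "cu": "see you",
            "thx": "thanks", "pls": "please",
        }
        ENGLISH_TO_SLANG = {v: k for k, v in SLANG_TO_ENGLISH.items()}

        def expand(sms: str) -> str:
            """SMS slang -> plain English, token by token."""
            return " ".join(SLANG_TO_ENGLISH.get(w.lower(), w) for w in sms.split())

        def compression_rate(plain: str) -> float:
            """Fraction of characters saved by rewriting plain English as slang."""
            slang = plain.lower()
            for phrase, short in sorted(ENGLISH_TO_SLANG.items(),
                                        key=lambda kv: -len(kv[0])):
                slang = slang.replace(phrase, short)
            return 1 - len(slang) / len(plain)

        print(expand("thx cu b4 8"))  # -> thanks see you before 8
        print(f"{compression_rate('thanks, see you before the great show'):.0%}")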

    Weakly Supervised Convolutional LSTM Approach for Tool Tracking in Laparoscopic Video

    No full text
    Purpose: Real-time surgical tool tracking is a core component of the future intelligent operating room (OR), because it is highly instrumental in analyzing and understanding surgical activities. Current methods for surgical tool tracking in videos need to be trained on data in which the spatial positions of the tools are manually annotated. Generating such training data is difficult and time-consuming. Instead, we propose to use solely binary presence annotations to train a tool tracker for laparoscopic videos. Methods: The proposed approach is composed of a CNN + Convolutional LSTM (ConvLSTM) neural network trained end to end, but weakly supervised on tool binary presence labels only. We use the ConvLSTM to model the temporal dependencies in the motion of the surgical tools and leverage its spatiotemporal ability to smooth the class peak activations in the localization heat maps (Lh-maps). Results: We build a baseline tracker on top of the CNN model and demonstrate that our approach based on the ConvLSTM outperforms the baseline in tool presence detection, spatial localization, and motion tracking by over [Formula: see text], [Formula: see text], and [Formula: see text], respectively. Conclusions: In this paper, we demonstrate that binary presence labels are sufficient for training a deep learning tracking model using our proposed method. We also show that the ConvLSTM can leverage the spatiotemporal coherence of consecutive image frames across a surgical video to improve tool presence detection, spatial localization, and motion tracking.
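
    A minimal sketch of the weak-supervision recipe the abstract describes: a CNN produces per-class localization heat maps, a ConvLSTM carries temporal context, and spatial max-pooling turns each heat map into a presence logit so that only binary labels are needed for training. Written in PyTorch; the backbone, layer sizes, and seven-tool count are placeholders, not the paper's exact architecture:

        import torch
        import torch.nn as nn

        class ConvLSTMCell(nn.Module):
            """Minimal ConvLSTM cell: all four gates from one convolution."""
            def __init__(self, in_ch: int, hid_ch: int, k: int = 3):
                super().__init__()
                self.hid_ch = hid_ch
                self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

            def forward(self, x, state):
                h, c = state
                i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
                c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
                h = torch.sigmoid(o) * torch.tanh(c)
                return h, c

        class WeaklySupervisedTracker(nn.Module):
            """CNN backbone -> ConvLSTM -> per-tool localization heat maps.

            Spatial max-pooling over each heat map gives a presence logit, so
            training needs only binary presence labels; at test time the
            heat-map peak provides a (weak) tool position for tracking.
            """
            def __init__(self, num_tools: int = 7, hid_ch: int = 64):
                super().__init__()
                self.backbone = nn.Sequential(          # stand-in for the paper's CNN
                    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                )
                self.convlstm = ConvLSTMCell(64, hid_ch)
                self.head = nn.Conv2d(hid_ch, num_tools, 1)   # 1x1 conv -> heat maps

            def forward(self, clip):                    # clip: (B, T, 3, H, W)
                B, T = clip.shape[:2]
                h = c = None
                logits, heatmaps = [], []
                for t in range(T):
                    feat = self.backbone(clip[:, t])
                    if h is None:                       # zero-init recurrent state
                        h = feat.new_zeros(B, self.convlstm.hid_ch, *feat.shape[2:])
                        c = h.clone()
                    h, c = self.convlstm(feat, (h, c))
                    hm = self.head(h)                   # (B, num_tools, h', w')
                    logits.append(hm.amax(dim=(2, 3)))  # spatial max pooling
                    heatmaps.append(hm)
                return torch.stack(logits, 1), torch.stack(heatmaps, 1)

        # training step using only binary presence labels per frame
        model = WeaklySupervisedTracker()
        clip = torch.randn(2, 4, 3, 64, 64)             # 2 clips of 4 frames each
        presence = torch.randint(0, 2, (2, 4, 7)).float()
        logits, heatmaps = model(clip)
        loss = nn.functional.binary_cross_entropy_with_logits(logits, presence)
        loss.backward()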