Temporal Action Segmentation: An Analysis of Modern Techniques
Temporal action segmentation (TAS) in videos aims to densely label the
frames of minutes-long videos with multiple action classes. As a
long-range video understanding task, researchers have developed an extended
collection of methods and examined their performance using various benchmarks.
Despite the rapid growth of TAS techniques in recent years, no systematic
survey has been conducted in this area. This survey analyzes and summarizes
the most significant contributions and trends. In particular, we first examine
the task definition, common benchmarks, types of supervision, and prevalent
evaluation measures. In addition, we systematically investigate two essential
techniques of this topic, i.e., frame representation and temporal modeling,
which have been studied extensively in the literature. We then conduct a
thorough review of existing TAS works categorized by their levels of
supervision and conclude our survey by identifying and emphasizing several
research gaps. In addition, we have curated a list of TAS resources, which is
available at https://github.com/nus-cvml/awesome-temporal-action-segmentation.
Comment: 19 pages, 9 figures, 8 tables
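To make the task definition concrete, the sketch below collapses a dense per-frame label sequence into segments and computes frame-wise accuracy, one commonly used TAS evaluation measure. The function names and the toy labels are illustrative assumptions and are not taken from the survey.

```python
from itertools import groupby


def frames_to_segments(labels):
    """Collapse per-frame labels into (label, start, end) runs; end is exclusive."""
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((label, start, start + length))
        start += length
    return segments


def framewise_accuracy(predicted, ground_truth):
    """Fraction of frames whose predicted label matches the ground truth."""
    if len(predicted) != len(ground_truth):
        raise ValueError("sequences must have the same number of frames")
    if not ground_truth:
        raise ValueError("sequences must not be empty")
    correct = sum(p == g for p, g in zip(predicted, ground_truth))
    return correct / len(ground_truth)


# Hypothetical toy video of 8 frames (labels are made up for illustration).
ground_truth = ["pour", "pour", "pour", "stir", "stir", "stir", "stir", "idle"]
predicted = ["pour", "pour", "stir", "stir", "stir", "stir", "idle", "idle"]

print(frames_to_segments(ground_truth))
print(framewise_accuracy(predicted, ground_truth))
```

Frame-wise accuracy alone does not penalise over-segmentation, which is why segment-level measures are usually reported alongside it.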
CataNet: Predicting remaining cataract surgery duration
Cataract surgery is a sight-saving surgery that is performed over 10 million
times each year around the world. With such a large demand, the ability to
organize surgical wards and operating rooms efficiently is critical to delivering
this therapy in routine clinical care. In this context, estimating the
remaining surgical duration (RSD) during procedures is one way to help
streamline patient throughput and workflows. To this end, we propose CataNet, a
method for cataract surgeries that predicts in real time the RSD jointly with
two influential elements: the surgeon's experience, and the current phase of
the surgery. We compare CataNet to state-of-the-art RSD estimation methods,
showing that it outperforms them even when phase and experience are not
considered. We investigate this improvement and show that a significant
contributor is the way we integrate the elapsed time into CataNet's feature
extractor.
Comment: Accepted at MICCAI 202
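The RSD target itself is simple to state: at elapsed time t in a surgery of total duration T, the remaining duration is T - t. The sketch below builds per-frame targets under that definition. The sampling rate, the function name, and the use of minutes as the unit are assumptions for illustration; the code does not reproduce CataNet's architecture or its way of feeding elapsed time to the feature extractor.

```python
def rsd_targets(num_frames, fps):
    """Return (elapsed_minutes, remaining_minutes) for every frame of one video."""
    if num_frames <= 0 or fps <= 0:
        raise ValueError("num_frames and fps must be positive")
    total_minutes = num_frames / fps / 60.0
    targets = []
    for frame_index in range(num_frames):
        elapsed = frame_index / fps / 60.0
        targets.append((elapsed, total_minutes - elapsed))
    return targets


# Hypothetical 10-minute video sampled at 1 frame per second.
targets = rsd_targets(num_frames=600, fps=1.0)
print(targets[0], targets[300])
```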
Automatic Detection of Out-of-body Frames in Surgical Videos for Privacy Protection Using Self-supervised Learning and Minimal Labels
Endoscopic video recordings are widely used in minimally invasive
robot-assisted surgery, but when the endoscope is outside the patient's body,
it can capture irrelevant segments that may contain sensitive information. To
address this, we propose a framework that accurately detects out-of-body frames
in surgical videos by leveraging self-supervision with minimal data labels. We
use a massive amount of unlabeled endoscopic images to learn meaningful
representations in a self-supervised manner. Our approach, which involves
pre-training on an auxiliary task and fine-tuning with limited supervision,
outperforms previous methods for detecting out-of-body frames in surgical
videos captured from da Vinci X and Xi surgical systems. The average F1 scores
range from 96.00 to 98.02. Remarkably, using only 5% of the training labels,
our approach still maintains an average F1 score performance above 97,
outperforming fully-supervised methods with 95% fewer labels. These results
demonstrate the potential of our framework to facilitate the safe handling of
surgical video recordings and enhance data privacy protection in minimally
invasive surgery.
Comment: A 15-page journal article submitted to the Journal of Medical Robotics
Research (JMRR)
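For readers unfamiliar with the reported metric, the following sketch computes the F1 score for a binary out-of-body frame classifier. The label encoding (1 for out-of-body, 0 for inside the body) and the toy predictions are assumptions for illustration. The function returns a value in [0, 1], whereas the abstract reports scores on a 0 to 100 scale.

```python
def f1_score(predicted, actual, positive=1):
    """F1 score of the positive class from two equal-length label sequences."""
    if len(predicted) != len(actual):
        raise ValueError("sequences must have the same length")
    pairs = list(zip(predicted, actual))
    true_pos = sum(p == positive and a == positive for p, a in pairs)
    false_pos = sum(p == positive and a != positive for p, a in pairs)
    false_neg = sum(p != positive and a == positive for p, a in pairs)
    if true_pos == 0:
        return 0.0  # no correct positive predictions
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical frame labels: 1 = out-of-body, 0 = inside the body.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 0, 1, 1]
print(f1_score(predicted, actual))
```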
SAR-RARP50: Segmentation of surgical instrumentation and Action Recognition on Robot-Assisted Radical Prostatectomy Challenge
Surgical tool segmentation and action recognition are fundamental building
blocks in many computer-assisted intervention applications, ranging from
surgical skills assessment to decision support systems. Nowadays,
learning-based action recognition and segmentation approaches outperform
classical methods, relying, however, on large, annotated datasets. Furthermore,
action recognition and tool segmentation algorithms are often trained and make
predictions in isolation from each other, without exploiting potential
cross-task relationships. With the EndoVis 2022 SAR-RARP50 challenge, we
release the first multimodal, publicly available, in-vivo dataset for surgical
action recognition and semantic instrumentation segmentation, containing 50
suturing video segments of Robot-Assisted Radical Prostatectomy (RARP). The
aim of the challenge is twofold. First, to enable researchers to leverage the
scale of the provided dataset and develop robust and highly accurate
single-task action recognition and tool segmentation approaches in the surgical
domain. Second, to further explore the potential of multitask-based learning
approaches and determine their comparative advantage against their single-task
counterparts. A total of 12 teams participated in the challenge, contributing 7
action recognition methods, 9 instrument segmentation techniques, and 4
multitask approaches that integrated both action recognition and instrument
segmentation. The complete SAR-RARP50 dataset is available at:
https://rdr.ucl.ac.uk/projects/SARRARP50_Segmentation_of_surgical_instrumentation_and_Action_Recognition_on_Robot-Assisted_Radical_Prostatectomy_Challenge/19109
Unveiling healthcare data archiving: Exploring the role of artificial intelligence in medical image analysis
Digital healthcare archives can be regarded as modern databases designed to store and manage large volumes of medical information, from patient records and clinical studies to medical images and genomic data. The structured and unstructured data that make up healthcare archives undergo meticulous and rigorous validation procedures to ensure accuracy, reliability, and standardization for clinical and research purposes.
In a healthcare sector undergoing continuous and rapid change, artificial intelligence (AI) emerges as a transformative force, capable of reshaping digital healthcare archives by improving the management, analysis, and retrieval of vast clinical datasets, with the aim of achieving more informed and repeatable clinical decisions, timely interventions, and improved patient outcomes.
Among the different kinds of archived data, managing and analyzing medical images in digital archives poses numerous challenges owing to data heterogeneity, variability in image quality, and a lack of annotations. AI-based solutions can help address these problems effectively by improving the accuracy of image analysis, standardizing data quality, and facilitating the generation of detailed annotations.
This thesis aims to use AI algorithms to analyze medical images stored in digital healthcare archives. The work investigates several medical imaging techniques, each characterized by a specific application domain and therefore presenting a unique set of challenges, requirements, and potential outcomes. In particular, the thesis examines the diagnostic assistance provided by AI algorithms for three different imaging techniques in specific clinical scenarios:
i) Endoscopic images obtained during laryngoscopy examinations; this includes an in-depth exploration of techniques such as keypoint detection for estimating vocal fold motility and the segmentation of tumors of the upper aerodigestive tract;
ii) Magnetic resonance images for the segmentation of intervertebral discs, for the diagnosis and treatment of spinal diseases as well as for performing image-guided surgery;
iii) Ultrasound images in rheumatology, for the assessment of carpal tunnel syndrome through segmentation of the median nerve.
The methodologies presented in this work highlight the effectiveness of AI algorithms in analyzing archived medical images. The methodological advances achieved underscore the considerable potential of AI to reveal information implicitly present in digital healthcare archives.
Deep Retinal Optical Flow: From Synthetic Dataset Generation to Framework Creation and Evaluation
Sustained delivery of regenerative retinal therapies by robotic systems requires intra-operative tracking of the retinal fundus. This thesis presents a supervised convolutional neural network to densely predict optical flow of the retinal fundus, using semantic segmentation as an auxiliary task. Retinal flow information missing due to occlusion by surgical tools or other effects is implicitly inpainted, allowing for the robust tracking of surgical targets.
As manual annotation of optical flow is infeasible, a flexible algorithm for the generation of large synthetic training datasets on the basis of given intra-operative retinal images and tool templates is developed. The compositing of synthetic images is approached as a layer-wise operation implementing a number of transforms at every level which can be extended as required, mimicking the various phenomena visible in real data. Optical flow ground truth is calculated from motion transforms with the help of oflib, an open-source optical flow library available from the Python Package Index. It enables the user to manipulate, evaluate, and combine flow fields. The PyTorch version of oflib is fully differentiable and therefore suitable for use in deep learning methods requiring back-propagation.
The optical flow estimation from the network trained on synthetic data is evaluated using three performance metrics obtained from tracking a grid and sparsely annotated ground truth points. The evaluation benchmark consists of a series of challenging real intra-operative clips obtained from an extensive internally acquired dataset encompassing representative surgical cases. The deep learning approach clearly outperforms variational baseline methods and is shown to generalise well to real data showing scenarios routinely observed during vitreoretinal procedures. This indicates complex synthetic training datasets can be used to specifically guide optical flow estimation, laying the foundation for a robust system which can assist with intra-operative tracking of moving surgical targets even when occluded.
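The idea of deriving optical flow ground truth from a known motion transform can be sketched in a few lines: if a transform maps each pixel (x, y) to a new position, the flow at that pixel is the new position minus the old one. The example below uses plain Python and a 2x3 affine matrix. It is an illustrative assumption and does not use or reproduce the oflib API.

```python
def flow_from_affine(matrix, height, width):
    """Dense flow field induced by a 2x3 affine transform.

    matrix maps a pixel (x, y) to (a*x + b*y + tx, c*x + d*y + ty).
    Returns a height x width grid of (dx, dy) displacements.
    """
    (a, b, tx), (c, d, ty) = matrix
    flow = []
    for y in range(height):
        row = []
        for x in range(width):
            new_x = a * x + b * y + tx
            new_y = c * x + d * y + ty
            row.append((new_x - x, new_y - y))
        flow.append(row)
    return flow


# Hypothetical motion: shift every pixel 2 px right and 1 px up.
translation = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
flow = flow_from_affine(translation, height=3, width=4)
print(flow[0][0], flow[2][3])
```

A pure translation gives the same displacement everywhere, while scaling or rotation gives a displacement that depends on pixel position.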
Artificial Intelligence for Emerging Technology in Surgery: Systematic Review and Validation
Surgery is a high-risk form of therapy and is associated with post-trauma complications such as longer hospital stays, blood loss, and long surgery durations. Reports suggest that over 2.5% of patients die during or after an operation. This paper presents a systematic review of previous research on artificial intelligence (AI) in surgery, re-analyzing the reported results with suitable software to check whether the same or contrary results are obtained. Six published research articles from three continents were reviewed. These articles were re-validated using software including SPSS and MedCalc to obtain statistical features such as the mean, standard deviation, significance level, and standard error. From the significance values, the experiments are then classified according to the null (p > 0.05) and alternative (p < 0.05) hypotheses. The results of the analysis suggest a significant difference in operating time, docking time, staging time, and estimated blood loss, but no significant difference in length of hospital stay, recovery time, and lymph nodes harvested, between robotic-assisted surgery using AI and conventional surgery. From these evaluations, this research suggests that AI-assisted surgery improves on conventional surgery as a safer and more efficient system of surgery with minimal or no complications.
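As a rough illustration of classifying a comparison at the 0.05 level from summary statistics (mean, standard deviation, sample size), the sketch below runs a two-sample z-test using only the Python standard library. The choice of a large-sample normal approximation is an assumption made here for simplicity; the abstract does not state which tests were run in SPSS or MedCalc, and all numbers in the example are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sided z-test for a difference in means (normal approximation)."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def classify(p_value, alpha=0.05):
    """Label a result by comparing its p-value with the significance level."""
    if p_value < alpha:
        return "significant difference"
    return "no significant difference"


# Hypothetical summary statistics, not taken from the reviewed articles.
_, p_operating_time = two_sample_z_test(180.0, 30.0, 50, 200.0, 35.0, 50)
_, p_hospital_stay = two_sample_z_test(5.0, 2.0, 50, 5.2, 2.1, 50)
print(classify(p_operating_time), classify(p_hospital_stay))
```

For small samples a t-test would be the more usual choice; the z-test is used here only to keep the example dependency-free.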