
    Who's Better? Who's Best? Pairwise Deep Ranking for Skill Determination

    We present a method for assessing skill from video, applicable to a variety of tasks ranging from surgery to drawing and rolling pizza dough. We formulate the problem as pairwise (who's better?) and overall (who's best?) ranking of video collections, using supervised deep ranking. We propose a novel loss function that learns discriminative features when a pair of videos exhibits variance in skill, and learns shared features when a pair of videos exhibits comparable skill levels. Results demonstrate that our method is applicable across tasks, with the percentage of correctly ordered pairs of videos ranging from 70% to 83% for four datasets. We demonstrate the robustness of our approach via a sensitivity analysis of its parameters. We see this work as an effort toward the automated organization of how-to video collections and, overall, generic skill determination in video. Comment: CVPR 201
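    A note on the loss: the pairwise objective described above can be sketched as a margin ranking loss over per-video skill scores, with a similarity term for comparable-skill pairs. The PyTorch snippet below is a minimal, hedged illustration of that idea, not the paper's exact loss function; the class name, margin value, and the L2 tie penalty are assumptions for the example.

```python
# Minimal sketch of a pairwise ranking objective for skill assessment.
# NOT the paper's exact loss: the "shared features for comparable skill"
# idea is approximated here by penalising the score gap on tied pairs.
import torch
import torch.nn as nn

class SkillRankingLoss(nn.Module):
    def __init__(self, margin: float = 1.0):
        super().__init__()
        self.margin = margin

    def forward(self, score_better, score_worse, tie_mask):
        # score_better / score_worse: (B,) video-level skill scores
        # tie_mask: (B,) bool, True where the pair shows comparable skill
        gap = score_better - score_worse
        # Hinge term: the "better" video should outscore the "worse" one.
        rank_loss = torch.clamp(self.margin - gap, min=0.0)
        # Tie term: comparable-skill pairs should receive similar scores.
        tie_loss = gap.pow(2)
        return torch.where(tie_mask, tie_loss, rank_loss).mean()

# Usage: the scores would come from any video feature extractor + regressor.
if __name__ == "__main__":
    loss_fn = SkillRankingLoss(margin=1.0)
    s_a = torch.randn(8, requires_grad=True)   # scores of the "better" videos
    s_b = torch.randn(8, requires_grad=True)   # scores of the "worse" videos
    ties = torch.zeros(8, dtype=torch.bool)    # no comparable-skill pairs here
    loss = loss_fn(s_a, s_b, ties)
    loss.backward()
```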

    Temporal Segmentation of Surgical Sub-tasks through Deep Learning with Multiple Data Sources

    Many tasks in robot-assisted surgeries (RAS) can be represented by finite-state machines (FSMs), where each state represents either an action (such as picking up a needle) or an observation (such as bleeding). A crucial step towards the automation of such surgical tasks is the temporal perception of the current surgical scene, which requires real-time estimation of the states in the FSMs. The objective of this work is to estimate the current state of the surgical task based on the actions performed or events that have occurred as the task progresses. We propose Fusion-KVE, a unified surgical state estimation model that incorporates multiple data sources, including Kinematics, Vision, and system Events. Additionally, we examine the strengths and weaknesses of different state estimation models in segmenting states with different representative features or levels of granularity. We evaluate our model on the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), as well as a more complex dataset involving robotic intra-operative ultrasound (RIOUS) imaging, created using the da Vinci® Xi surgical system. Our model achieves a superior frame-wise state estimation accuracy of up to 89.4%, improving on state-of-the-art surgical state estimation models on both the JIGSAWS suturing dataset and our RIOUS dataset.
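    For intuition, the multi-source idea can be sketched as encoding per-frame kinematics, vision, and event features, concatenating them, and classifying the current state with a temporal model. The PyTorch sketch below illustrates this generic late-fusion pattern only; the dimensions, names, and architecture are assumptions and do not reproduce the Fusion-KVE model itself.

```python
# Minimal sketch of per-frame surgical state estimation by fusing
# kinematics, vision, and system-event features. Illustration of the
# general fusion idea, not the Fusion-KVE architecture; all dimensions
# and names are assumed for the example.
import torch
import torch.nn as nn

class MultiSourceStateEstimator(nn.Module):
    def __init__(self, kin_dim=38, vis_dim=512, evt_dim=4,
                 hidden=128, n_states=8):
        super().__init__()
        self.kin_enc = nn.Sequential(nn.Linear(kin_dim, hidden), nn.ReLU())
        self.vis_enc = nn.Sequential(nn.Linear(vis_dim, hidden), nn.ReLU())
        self.evt_enc = nn.Sequential(nn.Linear(evt_dim, hidden), nn.ReLU())
        # Temporal model over the fused per-frame features.
        self.temporal = nn.GRU(3 * hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_states)

    def forward(self, kin, vis, evt):
        # kin: (B, T, kin_dim), vis: (B, T, vis_dim), evt: (B, T, evt_dim)
        fused = torch.cat([self.kin_enc(kin),
                           self.vis_enc(vis),
                           self.evt_enc(evt)], dim=-1)
        out, _ = self.temporal(fused)
        return self.head(out)          # (B, T, n_states) per-frame logits

if __name__ == "__main__":
    model = MultiSourceStateEstimator()
    logits = model(torch.randn(2, 100, 38),
                   torch.randn(2, 100, 512),
                   torch.randn(2, 100, 4))
    states = logits.argmax(dim=-1)     # predicted surgical state per frame
```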

    ์ž„์ƒ์ˆ ๊ธฐ ํ–ฅ์ƒ์„ ์œ„ํ•œ ๋”ฅ๋Ÿฌ๋‹ ๊ธฐ๋ฒ• ์—ฐ๊ตฌ: ๋Œ€์žฅ๋‚ด์‹œ๊ฒฝ ์ง„๋‹จ ๋ฐ ๋กœ๋ด‡์ˆ˜์ˆ  ์ˆ ๊ธฐ ํ‰๊ฐ€์— ์ ์šฉ

    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ํ˜‘๋™๊ณผ์ • ์˜์šฉ์ƒ์ฒด๊ณตํ•™์ „๊ณต, 2020. 8. ๊น€ํฌ์ฐฌ.This paper presents deep learning-based methods for improving performance of clinicians. Novel methods were applied to the following two clinical cases and the results were evaluated. In the first study, a deep learning-based polyp classification algorithm for improving clinical performance of endoscopist during colonoscopy diagnosis was developed. Colonoscopy is the main method for diagnosing adenomatous polyp, which can multiply into a colorectal cancer and hyperplastic polyps. The classification algorithm was developed using convolutional neural network (CNN), trained with colorectal polyp images taken by a narrow-band imaging colonoscopy. The proposed method is built around an automatic machine learning (AutoML) which searches for the optimal architecture of CNN for colorectal polyp image classification and trains the weights of the architecture. In addition, gradient-weighted class activation mapping technique was used to overlay the probabilistic basis of the prediction result on the polyp location to aid the endoscopists visually. To verify the improvement in diagnostic performance, the efficacy of endoscopists with varying proficiency levels were compared with or without the aid of the proposed polyp classification algorithm. The results confirmed that, on average, diagnostic accuracy was improved and diagnosis time was shortened in all proficiency groups significantly. In the second study, a surgical instruments tracking algorithm for robotic surgery video was developed, and a model for quantitatively evaluating the surgeons surgical skill based on the acquired motion information of the surgical instruments was proposed. The movement of surgical instruments is the main component of evaluation for surgical skill. Therefore, the focus of this study was develop an automatic surgical instruments tracking algorithm, and to overcome the limitations presented by previous methods. The instance segmentation framework was developed to solve the instrument occlusion issue, and a tracking framework composed of a tracker and a re-identification algorithm was developed to maintain the type of surgical instruments being tracked in the video. In addition, algorithms for detecting the tip position of instruments and arm-indicator were developed to acquire the movement of devices specialized for the robotic surgery video. The performance of the proposed method was evaluated by measuring the difference between the predicted tip position and the ground truth position of the instruments using root mean square error, area under the curve, and Pearsons correlation analysis. Furthermore, motion metrics were calculated from the movement of surgical instruments, and a machine learning-based robotic surgical skill evaluation model was developed based on these metrics. These models were used to evaluate clinicians, and results were similar in the developed evaluation models, the Objective Structured Assessment of Technical Skill (OSATS), and the Global Evaluative Assessment of Robotic Surgery (GEARS) evaluation methods. In this study, deep learning technology was applied to colorectal polyp images for a polyp classification, and to robotic surgery videos for surgical instruments tracking. 
The improvement in clinical performance with the aid of these methods was evaluated and verified, and the proposed methods are expected to serve as alternatives to the diagnostic and assessment methods currently used in clinical practice.
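    The visual-explanation step mentioned above follows the standard gradient-weighted class activation mapping (Grad-CAM) recipe: weight the last convolutional feature maps by their pooled gradients and overlay the result on the input image. The sketch below is a generic, hedged illustration with an assumed backbone and target layer, not the thesis implementation.

```python
# Minimal Grad-CAM sketch for visualising the image evidence behind a CNN's
# polyp-class prediction. Generic illustration only: the backbone, target
# layer, and random input are assumptions, not the thesis code.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=None)   # stand-in; a trained classifier in practice
model.eval()

feats = {}

def _save_activation(module, inputs, output):
    # Keep the feature maps and capture their gradient during backward.
    feats["act"] = output
    output.register_hook(lambda grad: feats.update(grad=grad))

model.layer4.register_forward_hook(_save_activation)  # last conv block

def grad_cam(image, class_idx=None):
    """image: (1, 3, H, W) preprocessed tensor -> (H, W) heat map in [0, 1]."""
    logits = model(image)
    if class_idx is None:
        class_idx = int(logits.argmax(dim=1))
    model.zero_grad()
    logits[0, class_idx].backward()
    weights = feats["grad"].mean(dim=(2, 3), keepdim=True)   # pooled gradients
    cam = F.relu((weights * feats["act"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear",
                        align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam[0, 0].detach()            # overlay this heat map on the polyp image

heatmap = grad_cam(torch.randn(1, 3, 224, 224))
```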
๋ณธ ๋…ผ๋ฌธ์€ ์˜๋ฃŒ์ง„์˜ ์ž„์ƒ์ˆ ๊ธฐ ๋Šฅ๋ ฅ์„ ํ–ฅ์ƒ์‹œํ‚ค๊ธฐ ์œ„ํ•˜์—ฌ ๋Œ€์žฅ ์šฉ์ข… ์˜์ƒ๊ณผ ๋กœ๋ด‡์ˆ˜์ˆ  ๋™์˜์ƒ์— ๋”ฅ๋Ÿฌ๋‹ ๊ธฐ์ˆ ์„ ์ ์šฉํ•˜๊ณ  ๊ทธ ์œ ํšจ์„ฑ์„ ํ™•์ธํ•˜์˜€์œผ๋ฉฐ, ํ–ฅํ›„์— ์ œ์•ˆํ•˜๋Š” ๋ฐฉ๋ฒ•์ด ์ž„์ƒ์—์„œ ์‚ฌ์šฉ๋˜๊ณ  ์žˆ๋Š” ์ง„๋‹จ ๋ฐ ํ‰๊ฐ€ ๋ฐฉ๋ฒ•์˜ ๋Œ€์•ˆ์ด ๋  ๊ฒƒ์œผ๋กœ ๊ธฐ๋Œ€ํ•œ๋‹ค.Chapter 1 General Introduction 1 1.1 Deep Learning for Medical Image Analysis 1 1.2 Deep Learning for Colonoscipic Diagnosis 2 1.3 Deep Learning for Robotic Surgical Skill Assessment 3 1.4 Thesis Objectives 5 Chapter 2 Optical Diagnosis of Colorectal Polyps using Deep Learning with Visual Explanations 7 2.1 Introduction 7 2.1.1 Background 7 2.1.2 Needs 8 2.1.3 Related Work 9 2.2 Methods 11 2.2.1 Study Design 11 2.2.2 Dataset 14 2.2.3 Preprocessing 17 2.2.4 Convolutional Neural Networks (CNN) 21 2.2.4.1 Standard CNN 21 2.2.4.2 Search for CNN Architecture 22 2.2.4.3 Searched CNN Training 23 2.2.4.4 Visual Explanation 24 2.2.5 Evaluation of CNN and Endoscopist Performances 25 2.3 Experiments and Results 27 2.3.1 CNN Performance 27 2.3.2 Results of Visual Explanation 31 2.3.3 Endoscopist with CNN Performance 33 2.4 Discussion 45 2.4.1 Research Significance 45 2.4.2 Limitations 47 2.5 Conclusion 49 Chapter 3 Surgical Skill Assessment during Robotic Surgery by Deep Learning-based Surgical Instrument Tracking 50 3.1 Introduction 50 3.1.1 Background 50 3.1.2 Needs 51 3.1.3 Related Work 52 3.2 Methods 56 3.2.1 Study Design 56 3.2.2 Dataset 59 3.2.3 Instance Segmentation Framework 63 3.2.4 Tracking Framework 66 3.2.4.1 Tracker 66 3.2.4.2 Re-identification 68 3.2.5 Surgical Instrument Tip Detection 69 3.2.6 Arm-Indicator Recognition 71 3.2.7 Surgical Skill Prediction Model 71 3.3 Experiments and Results 78 3.3.1 Performance of Instance Segmentation Framework 78 3.3.2 Performance of Tracking Framework 82 3.3.3 Evaluation of Surgical Instruments Trajectory 83 3.3.4 Evaluation of Surgical Skill Prediction Model 86 3.4 Discussion 90 3.4.1 Research Significance 90 3.4.2 Limitations 92 3.5 Conclusion 96 Chapter 4 Summary and Future Works 97 4.1 Thesis Summary 97 4.2 Limitations and Future Works 98 Bibliography 100 Abstract in Korean 116 Acknowledgement 119Docto

    Comparative validation of machine learning algorithms for surgical workflow and skill analysis with the HeiChole benchmark

    Purpose: Surgical workflow and skill analysis are key technologies for the next generation of cognitive surgical assistance systems. These systems could increase the safety of the operation through context-sensitive warnings and semi-autonomous robotic assistance, or improve the training of surgeons via data-driven feedback. In surgical workflow analysis, up to 91% average precision has been reported for phase recognition on an open-data, single-center video dataset. In this work we investigated the generalizability of phase recognition algorithms in a multicenter setting, including more difficult recognition tasks such as surgical action and surgical skill. Methods: To achieve this goal, a dataset with 33 laparoscopic cholecystectomy videos from three surgical centers, with a total operation time of 22 h, was created. Labels included framewise annotation of seven surgical phases with 250 phase transitions, 5514 occurrences of four surgical actions, 6980 occurrences of 21 surgical instruments from seven instrument categories, and 495 skill classifications in five skill dimensions. The dataset was used in the 2019 international Endoscopic Vision challenge, sub-challenge for surgical workflow and skill analysis. Here, 12 research teams trained and submitted their machine learning algorithms for recognition of phase, action, instrument and/or skill assessment. Results: F1-scores for phase recognition ranged from 23.9% to 67.7% (n = 9 teams), for instrument presence detection from 38.5% to 63.8% (n = 8 teams), but for action recognition only from 21.8% to 23.3% (n = 5 teams). The average absolute error for skill assessment was 0.78 (n = 1 team). Conclusion: Surgical workflow and skill analysis are promising technologies to support the surgical team, but there is still room for improvement, as shown by our comparison of machine learning algorithms. This novel HeiChole benchmark can be used for comparable evaluation and validation of future work. In future studies, it is of utmost importance to create more open, high-quality datasets in order to allow the development of artificial intelligence and cognitive robotics in surgery.
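    For reference, frame-wise phase-recognition scores like those reported above are typically computed as an F1 over per-frame phase labels. The snippet below shows a macro-averaged frame-wise F1 with placeholder labels; it is a generic illustration of the metric, not the official HeiChole/EndoVis evaluation script.

```python
# Generic frame-wise evaluation sketch for surgical phase recognition:
# macro-averaged F1 over per-frame phase labels. Not the official
# HeiChole / EndoVis evaluation; labels below are random placeholders.
import numpy as np
from sklearn.metrics import f1_score

N_PHASES = 7                                            # e.g. seven surgical phases
rng = np.random.default_rng(42)

ground_truth = rng.integers(0, N_PHASES, size=50_000)   # one label per video frame
predictions = np.where(rng.random(50_000) < 0.7,        # toy predictor: ~70% correct
                       ground_truth,
                       rng.integers(0, N_PHASES, size=50_000))

frame_f1 = f1_score(ground_truth, predictions, average="macro")
print(f"frame-wise macro F1: {frame_f1:.3f}")
```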