414 research outputs found

    Learning Multimodal Structures in Computer Vision

    A phenomenon or event can be observed by various kinds of detectors or under different conditions; each such acquisition framework is a modality of the phenomenon. Because the modalities of a multimodal phenomenon are related, a single modality cannot fully describe the event of interest, and the fact that several modalities report on the same event introduces new challenges compared to exploiting each modality separately. We are interested in designing new algorithmic tools for sensor fusion within the particular signal representation of sparse coding, a popular methodology in signal processing, machine learning, and statistics. This coding scheme is based on a machine learning technique and has been demonstrated to be capable of representing many modalities, such as natural images. We consider situations where we want the support of the model not only to be sparse but also to reflect a priori knowledge about the application at hand. Our goal is to extract a discriminative representation of the multimodal data that makes it easy to find its essential characteristics in a subsequent analysis step such as regression or classification. More precisely, sparse coding represents signals as linear combinations of a small number of atoms from a dictionary. The idea is to learn a dictionary that encodes the intrinsic properties of the multimodal data in a decomposition coefficient vector with maximal discriminatory power. We carefully design a multimodal representation framework that learns discriminative feature representations by fully exploiting both the modality-shared information (what the various modalities have in common) and the modality-specific information (the content of each modality individually). In addition, the framework automatically learns the weights of the various feature components in a data-driven scheme. In other words, the physical interpretation of our learning framework is to fully exploit the correlated characteristics of the available modalities while leveraging the modality-specific character of each modality, adapting their corresponding weights across feature parts during recognition.
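    As a concrete illustration of the sparse coding step described above, here is a minimal sketch assuming a plain ISTA solver and a toy random dictionary; the split into shared and modality-specific atoms is only hinted at in a comment, and none of this is the authors' actual learning framework:

```python
import numpy as np

def ista_sparse_code(x, D, lam=0.05, n_iter=200):
    """Solve min_a 0.5*||x - D a||^2 + lam*||a||_1 by iterative soft thresholding."""
    L = np.linalg.norm(D, 2) ** 2              # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)               # gradient of the quadratic data term
        z = a - grad / L                       # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a

rng = np.random.default_rng(0)
# Toy dictionary with unit-norm atoms; in the multimodal setting one block of
# atoms would be shared across modalities and another block modality-specific.
D = rng.standard_normal((64, 30))
D /= np.linalg.norm(D, axis=0)
a_true = np.zeros(30)
a_true[[2, 11, 25]] = [1.5, -2.0, 1.0]         # sparse ground-truth code
x = D @ a_true + 0.01 * rng.standard_normal(64)
a_hat = ista_sparse_code(x, D)
print("recovered support:", np.nonzero(np.abs(a_hat) > 0.1)[0])
```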

    Detail Enhancing Denoising of Digitized 3D Models from a Mobile Scanning System

    The acquisition process of digitizing a large-scale environment produces an enormous amount of raw geometry data. This data is corrupted by system noise, which leads to 3D surfaces that are not smooth and details that are distorted. Any scanning system has noise associated with the scanning hardware, both digital quantization errors and measurement inaccuracies, but a mobile scanning system has additional system noise introduced by the pose estimation of the hardware during data acquisition. The combined system noise generates data that is not handled well by existing noise reduction and smoothing techniques. This research is focused on enhancing the 3D models acquired by mobile scanning systems used to digitize large-scale environments. These digitization systems combine a variety of sensors (laser range scanners, video cameras, and pose estimation hardware) on a mobile platform for the quick acquisition of 3D models of real-world environments. The data acquired by such systems are extremely noisy, often with significant details on the same order of magnitude as the system noise. By utilizing a unique 3D signal analysis tool, a denoising algorithm was developed that identifies regions of detail and enhances their geometry while removing the effects of noise on the overall model. The developed algorithm can be useful for a variety of digitized 3D models, not just those produced by mobile scanning systems. The challenges faced in this study were the automatic processing needs of the enhancement algorithm and the need to fill a gap in the field of 3D model analysis in order to reduce the effect of system noise on the 3D models. In this context, our main contributions are the automation and integration of a data enhancement method not well known to the computer vision community, and the development of a novel 3D signal decomposition and analysis tool. The new technologies featured in this document are intuitive extensions of existing methods to new dimensionality and applications. The research has been applied to detail-enhancing denoising of scanned data from a mobile range scanning system, and results from both synthetic and real models are presented.
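    To illustrate the core idea of smoothing noise while preserving details of comparable magnitude, here is a minimal bilateral-filter sketch on a 1-D height profile; this is only an analogy under simplified assumptions, not the thesis's 3D signal decomposition tool:

```python
import numpy as np

def bilateral_filter(signal, radius=5, sigma_s=2.0, sigma_r=0.2):
    """Detail-preserving smoothing: weights decay with both spatial distance
    and difference in value, so sharp features are not averaged away."""
    out = np.empty_like(signal)
    for i in range(len(signal)):
        lo, hi = max(0, i - radius), min(len(signal), i + radius + 1)
        window = signal[lo:hi]
        d = np.arange(lo, hi) - i
        w = (np.exp(-d ** 2 / (2 * sigma_s ** 2))
             * np.exp(-(window - signal[i]) ** 2 / (2 * sigma_r ** 2)))
        out[i] = np.sum(w * window) / np.sum(w)
    return out

rng = np.random.default_rng(1)
step = np.where(np.arange(200) < 100, 0.0, 1.0)    # a sharp "detail" (edge)
noisy = step + 0.05 * rng.standard_normal(200)     # simulated system noise
denoised = bilateral_filter(noisy)
print("edge preserved:", abs(denoised[120] - denoised[80]) > 0.8)
```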

    μž„μƒμˆ κΈ° ν–₯상을 μœ„ν•œ λ”₯λŸ¬λ‹ 기법 연ꡬ: λŒ€μž₯λ‚΄μ‹œκ²½ 진단 및 λ‘œλ΄‡μˆ˜μˆ  술기 평가에 적용

    Doctoral dissertation (Ph.D.), Seoul National University Graduate School, Interdisciplinary Program in Bioengineering, College of Engineering, August 2020. Advisor: 김희찬. This thesis presents deep learning-based methods for improving the performance of clinicians. Novel methods were applied to the following two clinical cases and the results were evaluated. In the first study, a deep learning-based polyp classification algorithm was developed to improve the clinical performance of endoscopists during colonoscopic diagnosis. Colonoscopy is the main method for diagnosing adenomatous polyps, which can develop into colorectal cancer, and distinguishing them from hyperplastic polyps. The classification algorithm was developed using a convolutional neural network (CNN) trained with colorectal polyp images taken by narrow-band imaging colonoscopy. The proposed method is built around automatic machine learning (AutoML), which searches for the optimal CNN architecture for colorectal polyp image classification and trains the weights of that architecture. In addition, the gradient-weighted class activation mapping technique was used to overlay the probabilistic basis of the prediction result on the polyp location to aid endoscopists visually. To verify the improvement in diagnostic performance, the efficacy of endoscopists with varying proficiency levels was compared with and without the aid of the proposed polyp classification algorithm. The results confirmed that, on average, diagnostic accuracy improved and diagnosis time shortened significantly in all proficiency groups. In the second study, a surgical instrument tracking algorithm for robotic surgery video was developed, and a model for quantitatively evaluating a surgeon's skill based on the acquired motion information of the surgical instruments was proposed. The movement of surgical instruments is the main component in evaluating surgical skill. Therefore, the focus of this study was to develop an automatic surgical instrument tracking algorithm that overcomes the limitations of previous methods. An instance segmentation framework was developed to solve the instrument occlusion issue, and a tracking framework composed of a tracker and a re-identification algorithm was developed to maintain the identity of the surgical instruments being tracked throughout the video. In addition, algorithms for detecting the instrument tip position and the arm-indicator were developed to capture the movement of devices specific to robotic surgery video. The performance of the proposed method was evaluated by measuring the difference between the predicted tip position and the ground-truth position of the instruments using root mean square error, area under the curve, and Pearson's correlation analysis. Furthermore, motion metrics were calculated from the movement of the surgical instruments, and a machine learning-based robotic surgical skill evaluation model was developed from these metrics. These models were used to evaluate clinicians, and their results were similar to those of the Objective Structured Assessment of Technical Skill (OSATS) and the Global Evaluative Assessment of Robotic Surgery (GEARS) evaluation methods. In this study, deep learning technology was applied to colorectal polyp images for polyp classification and to robotic surgery videos for surgical instrument tracking.
The improvement in clinical performance with the aid of these methods was evaluated and verified, and the proposed methods are expected to serve as alternatives to the diagnostic and assessment methods currently used in clinical practice.
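    As a rough illustration of the visual-explanation step described above (gradient-weighted class activation mapping), here is a minimal PyTorch sketch; the backbone, layer choice, and random input are placeholders, not the searched architecture or data from the thesis:

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=None).eval()   # untrained stand-in backbone
feats, grads = {}, {}

# Capture activations and gradients of the last convolutional block.
layer = model.layer4
layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))

x = torch.randn(1, 3, 224, 224)                # placeholder for a polyp image
score = model(x)[0].max()                      # top-class logit
score.backward()

w = grads["g"].mean(dim=(2, 3), keepdim=True)  # channel weights: pooled gradients
cam = F.relu((w * feats["a"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)   # normalize to [0, 1]
# 'cam' can now be overlaid on the input image to highlight the predicted region.
```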
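    The tracking framework pairs a frame-to-frame tracker with re-identification so an instrument keeps its identity through occlusions. A minimal sketch of that association logic (greedy IoU matching with a re-ID fallback; the names, thresholds, and the reid_fn hook are hypothetical, not the thesis's implementation):

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-8)

def step_tracks(tracks, detections, reid_fn, iou_thr=0.3):
    """Advance tracks by one frame.
    tracks: {track_id: box}; detections: list of boxes in the current frame;
    reid_fn(track_id, detections) -> detection index or None (appearance re-ID)."""
    assigned = {}
    for tid, tbox in tracks.items():
        # Greedy IoU association with the best still-unmatched detection.
        candidates = [(iou(tbox, box), i) for i, box in enumerate(detections)
                      if i not in assigned.values()]
        score, idx = max(candidates, default=(0.0, None))
        if idx is not None and score >= iou_thr:
            assigned[tid] = idx
        else:
            # Track lost (e.g., instrument occluded): try re-identification.
            idx = reid_fn(tid, detections)
            if idx is not None and idx not in assigned.values():
                assigned[tid] = idx
    return {tid: detections[i] for tid, i in assigned.items()}

# Example: two tracked instruments, one frame of detections, no re-ID needed.
tracks = {0: [10, 10, 50, 50], 1: [100, 100, 140, 140]}
dets = [[12, 11, 52, 49], [101, 103, 141, 142]]
print(step_tracks(tracks, dets, reid_fn=lambda tid, d: None))
```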
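    Downstream, the skill model consumes motion metrics computed from the tracked tip trajectories. A minimal sketch of the kind of metrics commonly derived from such trajectories (the exact metric set in the thesis may differ; these are standard choices, with a negative log of integrated squared jerk as a smoothness proxy):

```python
import numpy as np

def motion_metrics(tip_xy, fps=30.0):
    """tip_xy: (N, 2) array of instrument tip positions, one row per frame."""
    dt = 1.0 / fps
    vel = np.diff(tip_xy, axis=0) / dt          # finite-difference velocity
    acc = np.diff(vel, axis=0) / dt             # acceleration
    jerk = np.diff(acc, axis=0) / dt            # jerk (rate of change of accel.)
    speed = np.linalg.norm(vel, axis=1)
    return {
        "path_length": float(np.sum(speed) * dt),   # total distance travelled
        "mean_speed": float(speed.mean()),
        "smoothness": float(-np.log(                # higher = smoother motion
            np.sum(np.linalg.norm(jerk, axis=1) ** 2) * dt + 1e-12)),
    }

# Example on a synthetic jittery trajectory; real input would come from the tracker.
traj = np.cumsum(np.random.default_rng(2).standard_normal((300, 2)), axis=0)
print(motion_metrics(traj))

# Such per-task metric vectors can then train a regressor/classifier against
# expert ratings (e.g., OSATS or GEARS scores).
```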
λ³Έ 논문은 μ˜λ£Œμ§„μ˜ μž„μƒμˆ κΈ° λŠ₯λ ₯을 ν–₯μƒμ‹œν‚€κΈ° μœ„ν•˜μ—¬ λŒ€μž₯ μš©μ’… μ˜μƒκ³Ό λ‘œλ΄‡μˆ˜μˆ  λ™μ˜μƒμ— λ”₯λŸ¬λ‹ κΈ°μˆ μ„ μ μš©ν•˜κ³  κ·Έ μœ νš¨μ„±μ„ ν™•μΈν•˜μ˜€μœΌλ©°, ν–₯후에 μ œμ•ˆν•˜λŠ” 방법이 μž„μƒμ—μ„œ μ‚¬μš©λ˜κ³  μžˆλŠ” 진단 및 평가 λ°©λ²•μ˜ λŒ€μ•ˆμ΄ 될 κ²ƒμœΌλ‘œ κΈ°λŒ€ν•œλ‹€.Chapter 1 General Introduction 1 1.1 Deep Learning for Medical Image Analysis 1 1.2 Deep Learning for Colonoscipic Diagnosis 2 1.3 Deep Learning for Robotic Surgical Skill Assessment 3 1.4 Thesis Objectives 5 Chapter 2 Optical Diagnosis of Colorectal Polyps using Deep Learning with Visual Explanations 7 2.1 Introduction 7 2.1.1 Background 7 2.1.2 Needs 8 2.1.3 Related Work 9 2.2 Methods 11 2.2.1 Study Design 11 2.2.2 Dataset 14 2.2.3 Preprocessing 17 2.2.4 Convolutional Neural Networks (CNN) 21 2.2.4.1 Standard CNN 21 2.2.4.2 Search for CNN Architecture 22 2.2.4.3 Searched CNN Training 23 2.2.4.4 Visual Explanation 24 2.2.5 Evaluation of CNN and Endoscopist Performances 25 2.3 Experiments and Results 27 2.3.1 CNN Performance 27 2.3.2 Results of Visual Explanation 31 2.3.3 Endoscopist with CNN Performance 33 2.4 Discussion 45 2.4.1 Research Significance 45 2.4.2 Limitations 47 2.5 Conclusion 49 Chapter 3 Surgical Skill Assessment during Robotic Surgery by Deep Learning-based Surgical Instrument Tracking 50 3.1 Introduction 50 3.1.1 Background 50 3.1.2 Needs 51 3.1.3 Related Work 52 3.2 Methods 56 3.2.1 Study Design 56 3.2.2 Dataset 59 3.2.3 Instance Segmentation Framework 63 3.2.4 Tracking Framework 66 3.2.4.1 Tracker 66 3.2.4.2 Re-identification 68 3.2.5 Surgical Instrument Tip Detection 69 3.2.6 Arm-Indicator Recognition 71 3.2.7 Surgical Skill Prediction Model 71 3.3 Experiments and Results 78 3.3.1 Performance of Instance Segmentation Framework 78 3.3.2 Performance of Tracking Framework 82 3.3.3 Evaluation of Surgical Instruments Trajectory 83 3.3.4 Evaluation of Surgical Skill Prediction Model 86 3.4 Discussion 90 3.4.1 Research Significance 90 3.4.2 Limitations 92 3.5 Conclusion 96 Chapter 4 Summary and Future Works 97 4.1 Thesis Summary 97 4.2 Limitations and Future Works 98 Bibliography 100 Abstract in Korean 116 Acknowledgement 119Docto