Physics-Informed Computer Vision: A Review and Perspectives
The incorporation of physical information into machine learning frameworks is
opening up and transforming many application domains. Here, the learning
process is augmented with fundamental knowledge and governing physical laws.
In this work, we explore the utility of this paradigm for computer vision
tasks, i.e., interpreting and understanding visual data. We present a
systematic literature review of the formulation of, and approaches to,
computer vision tasks guided by physical laws. We begin by decomposing the
popular computer vision pipeline into a taxonomy of stages and investigate
approaches to incorporating governing physical equations at each stage.
Existing approaches in each task are analyzed with regard to which governing
physical processes are modeled, how they are formulated, and how they are
incorporated: by modifying the data (observation bias), the network
(inductive bias), or the loss (learning bias). The taxonomy offers a unified
view of the application of physics-informed capabilities, highlighting where
physics-informed learning has been conducted and where gaps and opportunities
remain. Finally, we highlight open problems and challenges to inform future
research. While still in its early days, the study of physics-informed
computer vision promises computer vision models with improved physical
plausibility, accuracy, data efficiency, and generalization in increasingly
realistic applications.
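The three bias categories above can be made concrete with a toy example. The sketch below (plain Python; the setup, names, and free-fall task are hypothetical, not from the paper) illustrates a learning-bias style loss: a data term plus a physics residual that penalizes predicted trajectories whose finite-difference acceleration deviates from gravity.

```python
# Toy illustration of a "learning bias": augment a data loss with a
# physics residual. Names and setup are hypothetical, not from the paper.

G = 9.81  # gravitational acceleration (m/s^2)

def data_loss(pred, target):
    """Mean squared error between predicted and observed positions."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def physics_residual(pred, dt):
    """Penalize deviation of the finite-difference acceleration from -G."""
    res = 0.0
    for i in range(1, len(pred) - 1):
        accel = (pred[i + 1] - 2 * pred[i] + pred[i - 1]) / dt ** 2
        res += (accel + G) ** 2
    return res / max(len(pred) - 2, 1)

def physics_informed_loss(pred, target, dt, lam=0.1):
    """Total loss = observation term + weighted physics term."""
    return data_loss(pred, target) + lam * physics_residual(pred, dt)

# A trajectory that exactly obeys free fall has (near-)zero physics residual.
dt = 0.1
free_fall = [10.0 - 0.5 * G * (i * dt) ** 2 for i in range(10)]
print(round(physics_residual(free_fall, dt), 6))  # 0.0
```

A physically implausible prediction (e.g., a constant trajectory) receives a large physics penalty even if it happens to fit noisy observations, which is the mechanism by which the learning bias steers training.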
Liver and Vessel Segmentation in Abdominal CT Images
Thesis (Ph.D.), Seoul National University, Department of Computer Science and Engineering, 2020.
Accurate liver and vessel segmentation on abdominal computed tomography (CT) images is one of the most important prerequisites for computer-aided diagnosis (CAD) systems such as volumetric measurement, treatment planning, and augmented-reality-based surgical guidance. In recent years, the application of deep learning in the form of convolutional neural networks (CNNs) has improved the performance of medical image segmentation, but it remains difficult to provide the high generalization performance required in actual clinical practice. Furthermore, although contour features are an important factor in image segmentation, they are difficult to employ in a CNN because of the many unclear boundaries in the images. For liver vessel segmentation, a deep learning approach is impractical because it is difficult to obtain training data from complex vessel images. In addition, thin vessels are hard to identify in the original image owing to weak intensity contrast and noise. In this dissertation, a CNN with high generalization performance and a contour learning scheme is first proposed for liver segmentation. Secondly, a liver vessel segmentation algorithm is presented that accurately segments even thin vessels.
To build a CNN with high generalization performance, the auto-context algorithm is employed. The auto-context algorithm goes through two pipelines: the first predicts the overall area of the liver, and the second predicts the final liver using the first prediction as a prior. This process improves generalization performance because the network internally estimates a shape prior. In addition to auto-context, a contour learning method is proposed that uses only sparse contours rather than the entire contour. Sparse contours are obtained from the mispredicted parts of the network's final prediction and used for training. Experimental studies show that the proposed network is superior in accuracy to other modern networks. Multiple N-fold tests are also performed to verify the generalization performance.
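The two-pass auto-context idea can be sketched in a few lines. In the toy below (plain Python; the thresholding "networks" are hypothetical stand-ins for the CNNs in the dissertation), a second predictor consumes the first predictor's output as a shape prior and suppresses predictions the prior does not support.

```python
# Minimal sketch of auto-context: a second predictor consumes the first
# predictor's output as a shape prior. The thresholding "networks" here
# are hypothetical stand-ins for the CNNs in the dissertation.

def stage1(image, thresh=0.5):
    """Coarse prediction: per-pixel thresholding of the raw image."""
    return [1 if v > thresh else 0 for v in image]

def stage2(image, prior, thresh=0.5):
    """Refined prediction: a pixel is kept only if the image supports it
    AND the prior labels at least half of its 3-pixel neighborhood."""
    out = []
    for i, v in enumerate(image):
        lo, hi = max(0, i - 1), min(len(prior), i + 2)
        support = sum(prior[lo:hi]) / (hi - lo)
        out.append(1 if v > thresh and support >= 0.5 else 0)
    return out

def auto_context(image):
    return stage2(image, stage1(image))

# The isolated bright pixel (index 5) is suppressed by the prior,
# while the contiguous bright region (indices 1-3) survives.
image = [0.1, 0.9, 0.8, 0.9, 0.1, 0.9, 0.1, 0.2]
print(auto_context(image))  # [0, 1, 1, 1, 0, 0, 0, 0]
```

The same structure carries over to the CNN setting: the second network receives the image concatenated with the first network's probability map and learns to correct it.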
An algorithm for accurate liver vessel segmentation is also proposed, built on vessel candidate points. To obtain confident vessel candidates, the 3D image is first reduced to 2D through maximum intensity projection. Subsequently, vessel segmentation is performed on the 2D images, and the segmented pixels are back-projected into the original 3D space. Finally, a new level-set function is proposed that utilizes both the original image and the vessel candidate points. The proposed algorithm can segment thin vessels with high accuracy mainly by using vessel candidate points, whose reliability is increased by robust segmentation in the projected 2D images, where complex structures are simplified and thin vessels are more visible. Experimental results show that the proposed algorithm is superior to other active contour models.
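The projection and back-projection steps can be sketched as follows (plain Python on a toy volume; simple thresholding stands in for the 2D vessel segmentation, and the data is made up): the maximum intensity projection records, per pixel, which depth produced the maximum, so segmented 2D pixels can be lifted back to 3D candidate points.

```python
# Sketch of maximum intensity projection (MIP) followed by back-projection,
# as in the vessel-candidate step. Thresholding stands in for the 2D vessel
# segmentation; the volume and its values are toy data.

def mip_with_depth(volume):
    """Project volume[z][y][x] along z; keep the max value and its z index."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    proj = [[0.0] * nx for _ in range(ny)]
    depth = [[0] * nx for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            vals = [volume[z][y][x] for z in range(nz)]
            m = max(vals)
            proj[y][x], depth[y][x] = m, vals.index(m)
    return proj, depth

def back_project(proj, depth, thresh):
    """Return 3D candidate points (x, y, z) for 2D pixels above threshold."""
    return [(x, y, depth[y][x])
            for y in range(len(proj))
            for x in range(len(proj[0]))
            if proj[y][x] > thresh]

# 2x2x3 toy volume with one bright voxel at (x=2, y=1, z=1).
volume = [
    [[0.1, 0.1, 0.1], [0.1, 0.1, 0.1]],
    [[0.1, 0.2, 0.1], [0.1, 0.1, 0.9]],
]
proj, depth = mip_with_depth(volume)
print(back_project(proj, depth, thresh=0.5))  # [(2, 1, 1)]
```

In the dissertation the projection is applied over slabs rather than the full volume, but the principle is the same: decide in 2D, where thin vessels are more visible, then recover 3D coordinates from the recorded depths.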
The proposed algorithms present a new method of segmenting the liver and its vessels. The auto-context algorithm shows that a human-designed curriculum (i.e., shape-prior learning) can improve generalization performance. The proposed contour learning technique can increase the accuracy of a CNN for image segmentation by focusing on its failures, represented by sparse contours. The vessel segmentation shows that minor vessel branches can be successfully segmented through vessel candidate points obtained by reducing the image dimension. The algorithms presented in this dissertation can be employed for later analysis of liver anatomy that requires accurate segmentation techniques.

Chapter 1 Introduction 1
1.1 Background and motivation 1
1.2 Problem statement 3
1.3 Main contributions 6
1.4 Contents and organization 9
Chapter 2 Related Works 10
2.1 Overview 10
2.2 Convolutional neural networks 11
2.2.1 Architectures of convolutional neural networks 11
2.2.2 Convolutional neural networks in medical image segmentation 21
2.3 Liver and vessel segmentation 37
2.3.1 Classical methods for liver segmentation 37
2.3.2 Vascular image segmentation 40
2.3.3 Active contour models 46
2.3.4 Vessel topology-based active contour model 54
2.4 Motivation 60
Chapter 3 Liver Segmentation via Auto-Context Neural Network with Self-Supervised Contour Attention 62
3.1 Overview 62
3.2 Single-pass auto-context neural network 65
3.2.1 Skip-attention module 66
3.2.2 V-transition module 69
3.2.3 Liver-prior inference and auto-context 70
3.2.4 Understanding the network 74
3.3 Self-supervising contour attention 75
3.4 Learning the network 81
3.4.1 Overall loss function 81
3.4.2 Data augmentation 81
3.5 Experimental Results 83
3.5.1 Overview 83
3.5.2 Data configurations and target of comparison 84
3.5.3 Evaluation metric 85
3.5.4 Accuracy evaluation 87
3.5.5 Ablation study 93
3.5.6 Performance of generalization 110
3.5.7 Results from ground-truth variations 114
3.6 Discussion 116
Chapter 4 Liver Vessel Segmentation via Active Contour Model with Dense Vessel Candidates 119
4.1 Overview 119
4.2 Dense vessel candidates 124
4.2.1 Maximum intensity slab images 125
4.2.2 Segmentation of 2D vessel candidates and back-projection 130
4.3 Clustering of dense vessel candidates 135
4.3.1 Virtual gradient-assisted regional ACM 136
4.3.2 Localized regional ACM 142
4.4 Experimental results 145
4.4.1 Overview 145
4.4.2 Data configurations and environment 146
4.4.3 2D segmentation 146
4.4.4 ACM comparisons 149
4.4.5 Evaluation of bifurcation points 154
4.4.6 Computational performance 159
4.4.7 Ablation study 160
4.4.8 Parameter study 162
4.5 Application to portal vein analysis 164
4.6 Discussion 168
Chapter 5 Conclusion and Future Works 170
Bibliography 172
Abstract (in Korean) 197
Camera Re-Localization with Data Augmentation by Image Rendering and Image-to-Image Translation
The self-localization of automobiles, robots, or unmanned aerial vehicles, as well as the self-localization of pedestrians, is and will remain of high interest for a wide range of applications.
A main task is the autonomous navigation of such vehicles, in which localization within the surrounding scene is a key component.
Since cameras are established, permanently installed sensors in automobiles, robots, and unmanned aerial vehicles, the additional effort of also using them for localization tasks is small to nonexistent.
The same holds for the self-localization of pedestrians, where smartphones serve as mobile camera platforms.
Camera re-localization, in which the pose of a camera is determined with respect to a fixed environment, is a valuable process for providing or supporting localization for vehicles or pedestrians.
Cameras are, moreover, inexpensive sensors that are well established in the everyday life of humans and machines.
The support provided by camera re-localization is not limited to navigation applications; it can also aid image analysis and image processing tasks in general, such as scene reconstruction, detection, classification, and similar applications.
To this end, this thesis is concerned with improving the process of camera re-localization.
Since convolutional neural networks (CNNs) and hybrid solutions for estimating camera poses have in recent years become competitive with established hand-crafted methods, the focus of this thesis is placed on the former.
The main contributions of this work include the design of a CNN for camera pose estimation, with an emphasis on a shallow architecture that meets the requirements of mobile platforms.
This network achieves accuracies on par with deeper CNNs of substantially larger model size.
Furthermore, the performance of CNNs depends heavily on the quantity and quality of the training data used for optimization.
The further contributions of this thesis therefore concern image rendering and image-to-image translation for enlarging such training data; the general enlargement of training data is called data augmentation (DA).
For rendering images that usefully augment the training data, 3D models are used.
Generative adversarial networks (GANs) serve for image-to-image translation. While image rendering increases the quantity of data in an image dataset, image-to-image translation improves the quality of the rendered data.
Experiments are carried out both with datasets augmented by rendered images and with translated images.
Both DA approaches contribute to improving localization accuracy.
Thus, this work improves state-of-the-art camera re-localization methods through DA.
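Localization accuracy in this setting is commonly reported as a translation error and a rotation error between estimated and ground-truth camera poses. A minimal sketch of these two standard metrics (assuming orientations are given as unit quaternions; this is a generic illustration, not code from the thesis):

```python
import math

# Common pose-error metrics for camera re-localization: Euclidean
# translation error and the angular distance between unit quaternions.
# Generic sketch, not code from the thesis.

def translation_error(t_est, t_gt):
    """Euclidean distance between estimated and ground-truth positions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t_est, t_gt)))

def rotation_error_deg(q_est, q_gt):
    """Angle in degrees between two unit quaternions (w, x, y, z)."""
    dot = abs(sum(a * b for a, b in zip(q_est, q_gt)))
    dot = min(1.0, dot)  # guard against rounding slightly above 1
    return math.degrees(2.0 * math.acos(dot))

# Identity orientation vs. a 90-degree rotation about the z-axis.
q_id = (1.0, 0.0, 0.0, 0.0)
q_90z = (math.sqrt(0.5), 0.0, 0.0, math.sqrt(0.5))
print(round(translation_error((1.0, 2.0, 3.0), (1.0, 2.0, 7.0)), 3))  # 4.0
print(round(rotation_error_deg(q_id, q_90z), 3))  # 90.0
```

Taking the absolute value of the quaternion dot product accounts for the double cover (q and -q encode the same rotation), so the reported angle is always the smaller of the two equivalent distances.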
Advanced Biometrics with Deep Learning
Biometrics, such as fingerprint, iris, face, hand-print, hand-vein, speech, and gait recognition, have become commonplace as a means of identity management for various applications. Biometric systems follow a typical pipeline composed of separate preprocessing, feature extraction, and classification stages. Deep learning, as a data-driven representation learning approach, has been shown to be a promising alternative to conventional data-agnostic, handcrafted preprocessing and feature extraction for biometric systems. Furthermore, deep learning offers an end-to-end learning paradigm that unifies preprocessing, feature extraction, and recognition, based solely on biometric data. This Special Issue has collected 12 high-quality, state-of-the-art research papers that deal with challenging issues in advanced biometric systems based on deep learning. The 12 papers can be divided into 4 categories according to biometric modality: namely, face biometrics, medical electronic signals (EEG and ECG), voice print, and others.
System-Characterized Artificial Intelligence Approaches for Cardiac Cellular Systems and Molecular Signature Analysis
The dissertation presents a significant advancement in the field of cardiac cellular systems and molecular signature systems by employing machine learning and generative artificial intelligence techniques. These methodologies are systematically characterized and applied to address critical challenges in these domains. A novel computational model is developed, which combines machine learning tools and multi-physics models. The main objective of this model is to accurately predict complex cellular dynamics, taking into account the intricate interactions within the cardiac cellular system. Furthermore, a comprehensive framework based on generative adversarial networks (GANs) is proposed. This framework is designed to generate synthetic data that faithfully represents an in-vitro cardiac cellular system. The generated data can be used to enhance the understanding and analysis of the system's behavior. Additionally, a novel AI approach is formulated, which integrates deep learning and GAN techniques for Raman characterization. This approach enables efficient detection of multi-analyte mixtures by leveraging the power of deep learning algorithms and the generation of synthetic data through GANs. Overall, the integration of machine learning, generative artificial intelligence, and multi-physics modeling provides valuable insights and tools for precise prediction and efficient detection in cardiac cellular systems and molecular signature systems.
Light Field Diffusion for Single-View Novel View Synthesis
Single-view novel view synthesis, the task of generating images from new
viewpoints based on a single reference image, is an important but challenging
problem in computer vision. Recently, the Denoising Diffusion Probabilistic Model
(DDPM) has become popular in this area due to its strong ability to generate
high-fidelity images. However, current diffusion-based methods directly rely on
camera pose matrices as viewing conditions, globally and implicitly introducing
3D constraints. These methods may suffer from inconsistency among generated
images from different perspectives, especially in regions with intricate
textures and structures. In this work, we present Light Field Diffusion (LFD),
a conditional diffusion-based model for single-view novel view synthesis.
Unlike previous methods that employ camera pose matrices, LFD transforms the
camera view information into light field encoding and combines it with the
reference image. This design introduces local pixel-wise constraints within the
diffusion models, thereby encouraging better multi-view consistency.
Experiments on several datasets show that our LFD can efficiently generate
high-fidelity images and maintain better 3D consistency even in intricate
regions. Our method can generate images with higher quality than NeRF-based
models, and we obtain sample quality similar to other diffusion-based models
but with only one-third of the model size
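The pixel-wise conditioning described above can be pictured as assigning each pixel the ray it observes, for instance as a (direction, moment) pair, instead of one global pose matrix. A hypothetical sketch for a simple pinhole camera (illustrative only; intrinsics, names, and the exact encoding are assumptions, not the paper's formulation):

```python
import math

# Hypothetical sketch of a per-pixel light-field (ray) encoding for a
# pinhole camera: each pixel gets the unit direction and Plucker moment
# of the ray it observes. Illustrative, not the paper's exact encoding.

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def ray_encoding(u, v, fx, fy, cx, cy, origin):
    """Ray for pixel (u, v): unit direction from the intrinsics plus the
    Plucker moment origin x direction (camera axes assumed world-aligned)."""
    d = normalize(((u - cx) / fx, (v - cy) / fy, 1.0))
    return d, cross(origin, d)

# The principal-point pixel of a camera at the world origin looks straight
# down the optical axis and has zero moment.
d, m = ray_encoding(u=32, v=32, fx=64, fy=64, cx=32, cy=32,
                    origin=(0.0, 0.0, 0.0))
print(d, m)  # (0.0, 0.0, 1.0) (0.0, 0.0, 0.0)
```

Stacking such per-pixel encodings into extra image channels is what gives the diffusion model a local, rather than global, view of the camera geometry.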