Physics-Informed Computer Vision: A Review and Perspectives
The incorporation of physical information into machine learning frameworks is
opening up and transforming many application domains. Here, the learning
process is augmented with fundamental knowledge and governing physical laws.
In this work, we explore the utility of this paradigm for computer vision
tasks, i.e., interpreting and understanding visual data. We present a
systematic literature review of the formulation of, and approaches to,
computer vision tasks guided by physical laws. We begin by decomposing the
popular computer vision pipeline into a taxonomy of stages and investigate
approaches to incorporating governing physical equations at each stage.
Existing approaches in each task are analyzed with regard to which governing
physical processes are modeled, how they are formulated, and how they are
incorporated: by modifying the data (observation bias), the network
(inductive bias), or the loss (learning bias). The taxonomy offers a unified
view of the application of physics-informed capabilities, highlighting where
physics-informed learning has been conducted and where gaps and opportunities
remain. Finally, we highlight open problems and challenges to inform future
research. While still in its early days, the study of physics-informed
computer vision promises computer vision models with improved physical
plausibility, accuracy, data efficiency, and generalization in increasingly
realistic applications.
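The three bias categories above can be made concrete with a toy example. The sketch below (plain Python; the setup, names, and free-fall task are hypothetical, not from the paper) illustrates a learning-bias style loss: a data term plus a physics residual that penalizes predicted trajectories whose finite-difference acceleration deviates from gravity.

```python
# Toy illustration of a "learning bias": augment a data loss with a
# physics residual. Names and setup are hypothetical, not from the paper.

G = 9.81  # gravitational acceleration (m/s^2)

def data_loss(pred, target):
    """Mean squared error between predicted and observed positions."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def physics_residual(pred, dt):
    """Penalize deviation of the finite-difference acceleration from -G."""
    res = 0.0
    for i in range(1, len(pred) - 1):
        accel = (pred[i + 1] - 2 * pred[i] + pred[i - 1]) / dt ** 2
        res += (accel + G) ** 2
    return res / max(len(pred) - 2, 1)

def physics_informed_loss(pred, target, dt, lam=0.1):
    """Total loss = observation term + weighted physics term."""
    return data_loss(pred, target) + lam * physics_residual(pred, dt)

# A trajectory that exactly obeys free fall has (near-)zero physics residual.
dt = 0.1
free_fall = [10.0 - 0.5 * G * (i * dt) ** 2 for i in range(10)]
print(round(physics_residual(free_fall, dt), 6))  # 0.0
```

A physically implausible prediction (e.g., a constant trajectory) receives a large physics penalty even if it happens to fit noisy observations, which is the mechanism by which the learning bias steers training.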
Liver and Vessel Segmentation in Abdominal CT Images
Thesis (Ph.D.), Seoul National University, Department of Computer Science and Engineering, 2020.
Accurate liver and vessel segmentation on abdominal computed tomography (CT) images is one of the most important prerequisites for computer-aided diagnosis (CAD) systems such as volumetric measurement, treatment planning, and augmented-reality-based surgical guidance. In recent years, the application of deep learning in the form of convolutional neural networks (CNNs) has improved the performance of medical image segmentation, but it remains difficult to provide the high generalization performance required in actual clinical practice. Furthermore, although contour features are an important factor in image segmentation, they are difficult to employ in a CNN because of the many unclear boundaries in the images. For liver vessel segmentation, a deep learning approach is impractical because it is difficult to obtain training data from complex vessel images. In addition, thin vessels are hard to identify in the original image owing to weak intensity contrast and noise. In this dissertation, a CNN with high generalization performance and a contour learning scheme is first proposed for liver segmentation. Secondly, a liver vessel segmentation algorithm is presented that accurately segments even thin vessels.
To build a CNN with high generalization performance, the auto-context algorithm is employed. The auto-context algorithm goes through two pipelines: the first predicts the overall area of the liver, and the second predicts the final liver using the first prediction as a prior. This process improves generalization performance because the network internally estimates a shape prior. In addition to auto-context, a contour learning method is proposed that uses only sparse contours rather than the entire contour. Sparse contours are obtained from the mispredicted parts of the network's final prediction and used for training. Experimental studies show that the proposed network is superior in accuracy to other modern networks. Multiple N-fold tests are also performed to verify the generalization performance.
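The two-pass auto-context idea can be sketched in a few lines. In the toy below (plain Python; the thresholding "networks" are hypothetical stand-ins for the CNNs in the dissertation), a second predictor consumes the first predictor's output as a shape prior and suppresses predictions the prior does not support.

```python
# Minimal sketch of auto-context: a second predictor consumes the first
# predictor's output as a shape prior. The thresholding "networks" here
# are hypothetical stand-ins for the CNNs in the dissertation.

def stage1(image, thresh=0.5):
    """Coarse prediction: per-pixel thresholding of the raw image."""
    return [1 if v > thresh else 0 for v in image]

def stage2(image, prior, thresh=0.5):
    """Refined prediction: a pixel is kept only if the image supports it
    AND the prior labels at least half of its 3-pixel neighborhood."""
    out = []
    for i, v in enumerate(image):
        lo, hi = max(0, i - 1), min(len(prior), i + 2)
        support = sum(prior[lo:hi]) / (hi - lo)
        out.append(1 if v > thresh and support >= 0.5 else 0)
    return out

def auto_context(image):
    return stage2(image, stage1(image))

# The isolated bright pixel (index 5) is suppressed by the prior,
# while the contiguous bright region (indices 1-3) survives.
image = [0.1, 0.9, 0.8, 0.9, 0.1, 0.9, 0.1, 0.2]
print(auto_context(image))  # [0, 1, 1, 1, 0, 0, 0, 0]
```

The same structure carries over to the CNN setting: the second network receives the image concatenated with the first network's probability map and learns to correct it.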
An algorithm for accurate liver vessel segmentation is also proposed, built on vessel candidate points. To obtain confident vessel candidates, the 3D image is first reduced to 2D through maximum intensity projection. Subsequently, vessel segmentation is performed on the 2D images, and the segmented pixels are back-projected into the original 3D space. Finally, a new level-set function is proposed that utilizes both the original image and the vessel candidate points. The proposed algorithm can segment thin vessels with high accuracy mainly by using vessel candidate points, whose reliability is increased by robust segmentation in the projected 2D images, where complex structures are simplified and thin vessels are more visible. Experimental results show that the proposed algorithm is superior to other active contour models.
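The projection and back-projection steps can be sketched as follows (plain Python on a toy volume; simple thresholding stands in for the 2D vessel segmentation, and the data is made up): the maximum intensity projection records, per pixel, which depth produced the maximum, so segmented 2D pixels can be lifted back to 3D candidate points.

```python
# Sketch of maximum intensity projection (MIP) followed by back-projection,
# as in the vessel-candidate step. Thresholding stands in for the 2D vessel
# segmentation; the volume and its values are toy data.

def mip_with_depth(volume):
    """Project volume[z][y][x] along z; keep the max value and its z index."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    proj = [[0.0] * nx for _ in range(ny)]
    depth = [[0] * nx for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            vals = [volume[z][y][x] for z in range(nz)]
            m = max(vals)
            proj[y][x], depth[y][x] = m, vals.index(m)
    return proj, depth

def back_project(proj, depth, thresh):
    """Return 3D candidate points (x, y, z) for 2D pixels above threshold."""
    return [(x, y, depth[y][x])
            for y in range(len(proj))
            for x in range(len(proj[0]))
            if proj[y][x] > thresh]

# 2x2x3 toy volume with one bright voxel at (x=2, y=1, z=1).
volume = [
    [[0.1, 0.1, 0.1], [0.1, 0.1, 0.1]],
    [[0.1, 0.2, 0.1], [0.1, 0.1, 0.9]],
]
proj, depth = mip_with_depth(volume)
print(back_project(proj, depth, thresh=0.5))  # [(2, 1, 1)]
```

In the dissertation the projection is applied over slabs rather than the full volume, but the principle is the same: decide in 2D, where thin vessels are more visible, then recover 3D coordinates from the recorded depths.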
The proposed algorithms present a new method of segmenting the liver and its vessels. The auto-context algorithm shows that a human-designed curriculum (i.e., shape-prior learning) can improve generalization performance. The proposed contour learning technique can increase the accuracy of a CNN for image segmentation by focusing on its failures, represented by sparse contours. The vessel segmentation shows that minor vessel branches can be successfully segmented through vessel candidate points obtained by reducing the image dimension. The algorithms presented in this dissertation can be employed for later analysis of liver anatomy that requires accurate segmentation techniques.

Chapter 1 Introduction 1
1.1 Background and motivation 1
1.2 Problem statement 3
1.3 Main contributions 6
1.4 Contents and organization 9
Chapter 2 Related Works 10
2.1 Overview 10
2.2 Convolutional neural networks 11
2.2.1 Architectures of convolutional neural networks 11
2.2.2 Convolutional neural networks in medical image segmentation 21
2.3 Liver and vessel segmentation 37
2.3.1 Classical methods for liver segmentation 37
2.3.2 Vascular image segmentation 40
2.3.3 Active contour models 46
2.3.4 Vessel topology-based active contour model 54
2.4 Motivation 60
Chapter 3 Liver Segmentation via Auto-Context Neural Network with Self-Supervised Contour Attention 62
3.1 Overview 62
3.2 Single-pass auto-context neural network 65
3.2.1 Skip-attention module 66
3.2.2 V-transition module 69
3.2.3 Liver-prior inference and auto-context 70
3.2.4 Understanding the network 74
3.3 Self-supervising contour attention 75
3.4 Learning the network 81
3.4.1 Overall loss function 81
3.4.2 Data augmentation 81
3.5 Experimental Results 83
3.5.1 Overview 83
3.5.2 Data configurations and target of comparison 84
3.5.3 Evaluation metric 85
3.5.4 Accuracy evaluation 87
3.5.5 Ablation study 93
3.5.6 Performance of generalization 110
3.5.7 Results from ground-truth variations 114
3.6 Discussion 116
Chapter 4 Liver Vessel Segmentation via Active Contour Model with Dense Vessel Candidates 119
4.1 Overview 119
4.2 Dense vessel candidates 124
4.2.1 Maximum intensity slab images 125
4.2.2 Segmentation of 2D vessel candidates and back-projection 130
4.3 Clustering of dense vessel candidates 135
4.3.1 Virtual gradient-assisted regional ACM 136
4.3.2 Localized regional ACM 142
4.4 Experimental results 145
4.4.1 Overview 145
4.4.2 Data configurations and environment 146
4.4.3 2D segmentation 146
4.4.4 ACM comparisons 149
4.4.5 Evaluation of bifurcation points 154
4.4.6 Computational performance 159
4.4.7 Ablation study 160
4.4.8 Parameter study 162
4.5 Application to portal vein analysis 164
4.6 Discussion 168
Chapter 5 Conclusion and Future Works 170
Bibliography 172
Abstract (in Korean) 197
Camera Re-Localization with Data Augmentation by Image Rendering and Image-to-Image Translation
The self-localization of automobiles, robots, or unmanned aerial vehicles, as well as the self-localization of pedestrians, is and will remain of high interest for a wide range of applications.
A main task is the autonomous navigation of such vehicles, in which localization within the surrounding scene is a key component.
Since cameras are established, permanently installed sensors in automobiles, robots, and unmanned aerial vehicles, the additional effort of also using them for localization tasks is small to nonexistent.
The same holds for the self-localization of pedestrians, where smartphones serve as mobile camera platforms.
Camera re-localization, in which the pose of a camera is determined with respect to a fixed environment, is a valuable process for providing or supporting localization for vehicles or pedestrians.
Cameras are, moreover, inexpensive sensors that are well established in the everyday life of humans and machines.
The support provided by camera re-localization is not limited to navigation applications; it can also aid image analysis and image processing tasks in general, such as scene reconstruction, detection, classification, and similar applications.
To this end, this thesis is concerned with improving the process of camera re-localization.
Since convolutional neural networks (CNNs) and hybrid solutions for estimating camera poses have in recent years become competitive with established hand-crafted methods, the focus of this thesis is placed on the former.
The main contributions of this work include the design of a CNN for camera pose estimation, with an emphasis on a shallow architecture that meets the requirements of mobile platforms.
This network achieves accuracies on par with deeper CNNs of substantially larger model size.
Furthermore, the performance of CNNs depends heavily on the quantity and quality of the training data used for optimization.
The further contributions of this thesis therefore concern image rendering and image-to-image translation for enlarging such training data; the general enlargement of training data is called data augmentation (DA).
For rendering images that usefully augment the training data, 3D models are used.
Generative adversarial networks (GANs) serve for image-to-image translation. While image rendering increases the quantity of data in an image dataset, image-to-image translation improves the quality of the rendered data.
Experiments are carried out both with datasets augmented by rendered images and with translated images.
Both DA approaches contribute to improving localization accuracy.
Thus, this work improves state-of-the-art camera re-localization methods through DA.
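Localization accuracy in this setting is commonly reported as a translation error and a rotation error between estimated and ground-truth camera poses. A minimal sketch of these two standard metrics (assuming orientations are given as unit quaternions; this is a generic illustration, not code from the thesis):

```python
import math

# Common pose-error metrics for camera re-localization: Euclidean
# translation error and the angular distance between unit quaternions.
# Generic sketch, not code from the thesis.

def translation_error(t_est, t_gt):
    """Euclidean distance between estimated and ground-truth positions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t_est, t_gt)))

def rotation_error_deg(q_est, q_gt):
    """Angle in degrees between two unit quaternions (w, x, y, z)."""
    dot = abs(sum(a * b for a, b in zip(q_est, q_gt)))
    dot = min(1.0, dot)  # guard against rounding slightly above 1
    return math.degrees(2.0 * math.acos(dot))

# Identity orientation vs. a 90-degree rotation about the z-axis.
q_id = (1.0, 0.0, 0.0, 0.0)
q_90z = (math.sqrt(0.5), 0.0, 0.0, math.sqrt(0.5))
print(round(translation_error((1.0, 2.0, 3.0), (1.0, 2.0, 7.0)), 3))  # 4.0
print(round(rotation_error_deg(q_id, q_90z), 3))  # 90.0
```

Taking the absolute value of the quaternion dot product accounts for the double cover (q and -q encode the same rotation), so the reported angle is always the smaller of the two equivalent distances.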
Advanced Biometrics with Deep Learning
Biometrics, such as fingerprint, iris, face, hand-print, hand-vein, speech, and gait recognition, have become commonplace as a means of identity management for various applications. Biometric systems follow a typical pipeline composed of separate preprocessing, feature extraction, and classification stages. Deep learning, as a data-driven representation learning approach, has been shown to be a promising alternative to conventional data-agnostic, handcrafted preprocessing and feature extraction for biometric systems. Furthermore, deep learning offers an end-to-end learning paradigm that unifies preprocessing, feature extraction, and recognition, based solely on biometric data. This Special Issue has collected 12 high-quality, state-of-the-art research papers that deal with challenging issues in advanced biometric systems based on deep learning. The 12 papers can be divided into 4 categories according to biometric modality: namely, face biometrics, medical electronic signals (EEG and ECG), voice print, and others.
System-Characterized Artificial Intelligence Approaches for Cardiac Cellular Systems and Molecular Signature Analysis
The dissertation presents a significant advancement in the field of cardiac cellular systems and molecular signature systems by employing machine learning and generative artificial intelligence techniques. These methodologies are systematically characterized and applied to address critical challenges in these domains. A novel computational model is developed, which combines machine learning tools and multi-physics models. The main objective of this model is to accurately predict complex cellular dynamics, taking into account the intricate interactions within the cardiac cellular system. Furthermore, a comprehensive framework based on generative adversarial networks (GANs) is proposed. This framework is designed to generate synthetic data that faithfully represents an in-vitro cardiac cellular system. The generated data can be used to enhance the understanding and analysis of the system's behavior. Additionally, a novel AI approach is formulated, which integrates deep learning and GAN techniques for Raman characterization. This approach enables efficient detection of multi-analyte mixtures by leveraging the power of deep learning algorithms and the generation of synthetic data through GANs. Overall, the integration of machine learning, generative artificial intelligence, and multi-physics modeling provides valuable insights and tools for precise prediction and efficient detection in cardiac cellular systems and molecular signature systems.
Light Field Diffusion for Single-View Novel View Synthesis
Single-view novel view synthesis, the task of generating images from new
viewpoints based on a single reference image, is an important but challenging
problem in computer vision. Recently, the Denoising Diffusion Probabilistic Model
(DDPM) has become popular in this area due to its strong ability to generate
high-fidelity images. However, current diffusion-based methods directly rely on
camera pose matrices as viewing conditions, globally and implicitly introducing
3D constraints. These methods may suffer from inconsistency among generated
images from different perspectives, especially in regions with intricate
textures and structures. In this work, we present Light Field Diffusion (LFD),
a conditional diffusion-based model for single-view novel view synthesis.
Unlike previous methods that employ camera pose matrices, LFD transforms the
camera view information into light field encoding and combines it with the
reference image. This design introduces local pixel-wise constraints within the
diffusion models, thereby encouraging better multi-view consistency.
Experiments on several datasets show that our LFD can efficiently generate
high-fidelity images and maintain better 3D consistency even in intricate
regions. Our method can generate images with higher quality than NeRF-based
models, and we obtain sample quality similar to other diffusion-based models
but with only one-third of the model size
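The pixel-wise conditioning described above can be pictured as assigning each pixel the ray it observes, for instance as a (direction, moment) pair, instead of one global pose matrix. A hypothetical sketch for a simple pinhole camera (illustrative only; intrinsics, names, and the exact encoding are assumptions, not the paper's formulation):

```python
import math

# Hypothetical sketch of a per-pixel light-field (ray) encoding for a
# pinhole camera: each pixel gets the unit direction and Plucker moment
# of the ray it observes. Illustrative, not the paper's exact encoding.

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def ray_encoding(u, v, fx, fy, cx, cy, origin):
    """Ray for pixel (u, v): unit direction from the intrinsics plus the
    Plucker moment origin x direction (camera axes assumed world-aligned)."""
    d = normalize(((u - cx) / fx, (v - cy) / fy, 1.0))
    return d, cross(origin, d)

# The principal-point pixel of a camera at the world origin looks straight
# down the optical axis and has zero moment.
d, m = ray_encoding(u=32, v=32, fx=64, fy=64, cx=32, cy=32,
                    origin=(0.0, 0.0, 0.0))
print(d, m)  # (0.0, 0.0, 1.0) (0.0, 0.0, 0.0)
```

Stacking such per-pixel encodings into extra image channels is what gives the diffusion model a local, rather than global, view of the camera geometry.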