
    3D Shape Descriptor-Based Facial Landmark Detection: A Machine Learning Approach

    Facial landmark detection on 3D human faces has numerous applications in the literature, such as establishing point-to-point correspondence between 3D face models, which is itself a key step for a wide range of applications such as 3D face detection and authentication, matching, reconstruction, and retrieval. Two groups of approaches, knowledge-driven and data-driven, have been employed for facial landmarking in the literature. Knowledge-driven techniques are the traditional approaches that have been widely used to locate landmarks on human faces. In these approaches, a user with sufficient knowledge and experience usually defines the features to be extracted as landmarks. Data-driven techniques, on the other hand, use machine learning algorithms to detect prominent features on 3D face models. Despite their key advantages, each category has limitations that prevent it from producing the most reliable results. In this work we propose to combine the strengths of the two approaches to detect facial landmarks more efficiently and precisely. The suggested approach consists of two phases. First, some salient features of the faces are extracted using expert systems. These points are then used as the initial control points in the well-known Thin Plate Spline (TPS) technique to deform the input face towards a reference face model. Second, by exploring and utilizing multiple machine learning algorithms, another group of landmarks is extracted. The data-driven landmark detection step is performed in a supervised manner on an information-rich training set, in which a set of local descriptors is computed and used to train the algorithm. We then use the detected landmarks to establish point-to-point correspondence between 3D human faces, mainly using an improved version of the Iterative Closest Point (ICP) algorithm.
Furthermore, we propose to use the detected landmarks for 3D face matching applications.
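The TPS deformation step can be illustrated with a small sketch. The pure-Python code below fits a 3D thin plate spline that maps control points on an input face exactly onto corresponding points on a reference face, using the radial kernel U(r) = r commonly used for 3D thin plate splines, plus an affine part. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `fit_tps` and `apply_tps` are hypothetical, there is no smoothing term, and the control points must include at least four non-coplanar points so the linear system is solvable.

```python
import math


def _solve(A, B):
    """Solve A X = B with Gauss-Jordan elimination and partial pivoting."""
    n, m = len(A), len(B[0])
    M = [list(A[i]) + list(B[i]) for i in range(n)]  # augmented matrix [A | B]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("singular system: control points may be coplanar")
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col] / M[col][col]
                for c in range(col, n + m):
                    M[r][c] -= f * M[col][c]
    return [[M[i][n + j] / M[i][i] for j in range(m)] for i in range(n)]


def fit_tps(src, dst):
    """Fit a 3D thin plate spline f with f(src[i]) == dst[i] for every control point."""
    n = len(src)
    size = n + 4
    A = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            A[i][j] = math.dist(src[i], src[j])  # radial kernel U(r) = r
        affine_row = [1.0, *src[i]]
        for k in range(4):
            A[i][n + k] = affine_row[k]  # block P
            A[n + k][i] = affine_row[k]  # block P^T (side conditions on the weights)
    B = [list(p) for p in dst] + [[0.0, 0.0, 0.0] for _ in range(4)]
    coef = _solve(A, B)
    return {"src": [tuple(p) for p in src], "w": coef[:n], "a": coef[n:]}


def apply_tps(model, x):
    """Map a 3D point x through the fitted spline (affine part plus radial part)."""
    a, w, src = model["a"], model["w"], model["src"]
    out = []
    for d in range(3):
        v = a[0][d] + a[1][d] * x[0] + a[2][d] * x[1] + a[3][d] * x[2]
        v += sum(w[i][d] * math.dist(x, src[i]) for i in range(len(src)))
        out.append(v)
    return tuple(out)


# Usage: five hypothetical control points, with one of them displaced in the target.
control = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
target = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1.2, 1.1, 0.9)]
tps = fit_tps(control, target)
warped = [apply_tps(tps, p) for p in control]
```

Because the side conditions force the radial weights to vanish when the targets are an affine image of the sources, the spline reduces to that affine map in that case; non-affine displacements of the landmarks are absorbed by the radial terms.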

    Computational Modeling of Facial Response for Detecting Differential Traits in Autism Spectrum Disorders

    This dissertation proposes novel computational modeling and computer vision methods for the analysis and discovery of differential traits in subjects with Autism Spectrum Disorders (ASD) using video and three-dimensional (3D) images of the face and facial expressions. ASD is a neurodevelopmental disorder that impairs an individual's nonverbal communication skills. This work studies ASD from the perspective of the pathophysiology of facial expressions, which may manifest as atypical facial responses. State-of-the-art psychophysical studies mostly employ naïve human raters to visually score atypical facial responses of individuals with ASD, which may be subjective, tedious, and error-prone. A few quantitative studies place intrusive sensors on the faces of subjects with ASD, which, in turn, may inhibit or bias their natural facial responses. This dissertation proposes non-intrusive computer vision methods to alleviate these limitations in the investigation of differential traits from the spontaneous facial responses of individuals with ASD. Two IRB-approved psychophysical studies are performed involving two groups of age-matched subjects: one of subjects diagnosed with ASD and the other of typically developing (TD) subjects. The facial responses of the subjects are computed from their facial images using the proposed computational models and then statistically analyzed to infer differential traits for the group with ASD. A novel computational model is proposed to represent the large volume of 3D facial data in a small, pose-invariant, Frenet frame-based feature space. The inherent pose invariance of the proposed features removes the need for expensive 3D face registration in the pre-processing step. The proposed modeling framework is not only computationally efficient but also offers competitive performance in 3D face and facial expression recognition tasks when compared with state-of-the-art methods.
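The pose invariance of Frenet frame-based features rests on the fact that the curvature and torsion of a space curve do not change under rotations and translations. The sketch below estimates the Frenet frame (T, N, B), curvature κ = |r' × r''| / |r'|³, and torsion τ = ((r' × r'') · r''') / |r' × r''|² along a sampled 3D curve with central finite differences. It is an illustration under assumptions, not the dissertation's model: curves are assumed to be densely and uniformly sampled (for example, a hypothetical profile traced across a 3D face scan), and the function name `frenet_features` is hypothetical.

```python
import math


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _norm(u):
    return math.sqrt(_dot(u, u))


def frenet_features(points):
    """Return (T, N, B, kappa, tau) for each interior sample of a uniformly sampled 3D curve.

    Derivatives are central differences in sample-index units; the step size cancels
    out of both kappa and tau, so it does not need to be known.
    """
    feats = []
    for i in range(2, len(points) - 2):
        pm2, pm1, p0, pp1, pp2 = points[i - 2:i + 3]
        d1 = tuple((b - a) / 2 for a, b in zip(pm1, pp1))            # ~ r'
        d2 = tuple(b - 2 * c + a for a, b, c in zip(pm1, pp1, p0))   # ~ r''
        d3 = tuple((e - 2 * d + 2 * b - a) / 2                       # ~ r'''
                   for a, b, d, e in zip(pm2, pm1, pp1, pp2))
        c = _cross(d1, d2)
        nc = _norm(c)
        if nc == 0.0:
            continue  # straight segment: N and B are undefined
        T = tuple(x / _norm(d1) for x in d1)
        B = tuple(x / nc for x in c)
        N = _cross(B, T)
        kappa = nc / _norm(d1) ** 3
        tau = _dot(c, d3) / nc ** 2
        feats.append((T, N, B, kappa, tau))
    return feats


# Usage: the helix (cos t, sin t, 0.5 t) has kappa = 1/1.25 = 0.8 and tau = 0.5/1.25 = 0.4.
helix = [(math.cos(0.05 * k), math.sin(0.05 * k), 0.025 * k) for k in range(60)]
feats = frenet_features(helix)


def _rotate(p, a=0.7, b=-1.1):
    """Rotate about the z axis by a, then about the x axis by b."""
    x, y, z = p
    x, y = x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a)
    y, z = y * math.cos(b) - z * math.sin(b), y * math.sin(b) + z * math.cos(b)
    return (x, y, z)


# Rotating the curve leaves the estimated kappa and tau unchanged (up to rounding).
rotated_feats = frenet_features([_rotate(p) for p in helix])
```

Only κ and τ are invariant; T, N, and B rotate with the curve, so a pose-invariant feature vector would be built from the scalar values.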
This computational model is applied in the first experiment to quantify subtle facial muscle responses from the geometry of 3D facial data. Results show a statistically significant asymmetry in the activation of a specific pair of facial muscles (p < 0.05) for the group with ASD, which suggests the presence of a psychophysical trait (also known as an 'oddity') in the facial expressions. For the first time in the ASD literature, the Facial Action Coding System (FACS) is employed to classify spontaneous facial responses based on facial action units (FAUs). Statistical analyses reveal a significantly (p < 0.01) higher prevalence of the smile expression (FAU 12) in the ASD group than in the TD group. The high prevalence of smiles co-occurred with significantly averted gaze (p < 0.05) in the group with ASD, which is indicative of impaired reciprocal communication. The metric associated with incongruent facial and visual responses suggests a behavioral biomarker for ASD. The second experiment shows a higher prevalence of mouth frown (FAU 15) and significantly lower correlations between the activations of several FAU pairs (p < 0.05) in the group with ASD than in the TD group. The computational modeling proposed in this dissertation offers promising biomarkers, which may aid in the early detection of subtle ASD-related traits and thus enable effective intervention strategies in the future.
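The abstract does not say which statistical tests produced the reported p-values. As one hedged illustration of how a left/right asymmetry in paired per-subject measurements could be tested without distributional assumptions, the sketch below runs an exact two-sided sign-flip permutation test on the paired differences. The function name and the activation values are hypothetical, and exact enumeration of all 2^n sign patterns is practical only for small groups.

```python
from itertools import product


def sign_flip_test(left, right):
    """Exact two-sided sign-flip permutation test for paired samples.

    Under the null hypothesis of no asymmetry, each paired difference is equally
    likely to be positive or negative, so every pattern of sign flips is equally likely.
    Returns the fraction of patterns whose |sum of differences| is at least the observed one.
    """
    if len(left) != len(right) or not left:
        raise ValueError("need two non-empty samples of equal length")
    if len(left) > 20:
        raise ValueError("exact enumeration is only practical for about 20 pairs or fewer")
    diffs = [a - b for a, b in zip(left, right)]
    observed = abs(sum(diffs))
    hits = 0
    for signs in product((1, -1), repeat=len(diffs)):
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            hits += 1
    return hits / 2 ** len(diffs)


# Usage with hypothetical left/right muscle activation scores for 10 subjects.
left = [1.2, 1.5, 1.1, 1.4, 1.3, 1.6, 1.2, 1.5, 1.4, 1.3]
right = [1.0, 1.1, 0.9, 1.0, 1.0, 1.2, 1.0, 1.1, 1.0, 1.1]
p_asym = sign_flip_test(left, right)                # every difference is positive
p_sym = sign_flip_test([1, 2, 3, 4], [2, 1, 4, 3])  # differences cancel out
```

When every difference has the same sign, only the identity pattern and the all-flipped pattern reach the observed magnitude, so p_asym is 2/1024; when the differences cancel, every pattern qualifies and p_sym is 1.0.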

    Fast Landmark Localization with 3D Component Reconstruction and CNN for Cross-Pose Recognition

    Two approaches are proposed for cross-pose face recognition: one based on the 3D reconstruction of facial components and the other based on a deep Convolutional Neural Network (CNN). Unlike most 3D approaches, which consider holistic faces, the proposed approach considers 3D facial components. It segments a 2D gallery face into components, reconstructs the 3D surface of each component, and recognizes a probe face by component features. The segmentation is based on landmarks located by a hierarchical algorithm that combines Faster R-CNN for face detection with the Reduced Tree Structured Model for landmark localization. The core of the CNN-based approach is a revised VGG network. We study performance under different training-set settings: synthesized data from 3D reconstruction, real-life data from an in-the-wild database, and both types of data combined. We investigate the performance of the network when it is employed as a classifier and when it is designed as a feature extractor. The two recognition approaches and the fast landmark localization are evaluated in extensive experiments and compared with state-of-the-art methods to demonstrate their efficacy. Comment: 14 pages, 12 figures, 4 tables
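Tree-structured landmark models typically score a configuration as the sum of per-landmark appearance costs plus pairwise deformation costs along the edges of a tree, which lets the best configuration be found exactly by dynamic programming from the leaves to the root. The sketch below shows that min-sum recursion over a discrete set of candidate locations per landmark. It is a generic illustration that assumes the Reduced Tree Structured Model relies on this kind of tree inference; the cost values, the quadratic deformation term, and the name `tree_min_sum` are hypothetical.

```python
def tree_min_sum(parent, unary, pairwise):
    """Exact minimum-cost placement of parts on a tree by min-sum dynamic programming.

    parent[i]   index of the parent of part i; part 0 is the root (parent[0] == -1)
                and every other part must satisfy parent[i] < i.
    unary[i][k] appearance cost of placing part i at its k-th candidate location.
    pairwise(i, k, kp) deformation cost of part i at candidate k when its parent is at kp.
    Returns (total_cost, placement) where placement[i] is the chosen candidate index.
    """
    n = len(parent)
    cost = [list(u) for u in unary]  # cost[i][k]: best cost of the subtree of i with i at k
    argbest = [None] * n             # argbest[i][kp]: best k for part i given its parent at kp
    for i in range(n - 1, 0, -1):    # children have larger indices, so leaves come first
        p = parent[i]
        argbest[i] = []
        for kp in range(len(cost[p])):
            k_best = min(range(len(cost[i])), key=lambda k: cost[i][k] + pairwise(i, k, kp))
            argbest[i].append(k_best)
            cost[p][kp] += cost[i][k_best] + pairwise(i, k_best, kp)
    root_k = min(range(len(cost[0])), key=lambda k: cost[0][k])
    placement = [root_k] + [None] * (n - 1)
    for i in range(1, n):            # parents are decided before their children
        placement[i] = argbest[i][placement[parent[i]]]
    return cost[0][root_k], placement


# Usage: a nose (root, part 0) with two eyes; candidates are hypothetical 1D x-positions.
positions = [[0, 1], [-1, 0], [1, 3]]
rest_offset = [0, -1, 1]             # preferred offset of each part from the nose
unary = [[0.0, 1.0], [0.5, 0.0], [0.0, 2.0]]


def deformation(i, k, kp):
    # In this star-shaped example every part's parent is the nose (part 0).
    return (positions[i][k] - positions[0][kp] - rest_offset[i]) ** 2


best_cost, best_placement = tree_min_sum([-1, 0, 0], unary, deformation)
```

The cost is linear in the number of parts and quadratic in the number of candidates per part, instead of exponential in the number of parts as for brute-force search over all configurations.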

    Using AI and Robotics for EV battery cable detection: Development and implementation of end-to-end model-free 3D instance segmentation for industrial purposes

    Master's thesis in Information and Communication Technology (IKT590). This thesis describes a novel method for capturing point clouds and segmenting instances of cabling found on electric vehicle battery packs. It investigates the use of cutting-edge perception architectures, such as graph-based and voxel-based convolution, in industrial autonomous disassembly of lithium-ion battery packs. The thesis focuses on the challenge of obtaining a desirable representation of any battery pack using an ABB robot in conjunction with a high-end structured light camera, with "end-to-end" and "model-free" as design constraints. The thesis employs self-captured datasets comprising several battery packs that have been captured and labeled. These datasets are then used to create a perception system. The thesis recommends using HDR functionality in an industrial application to capture the full dynamic range of the battery packs. To adequately depict 3D features, a capture sequence with three points of view is deemed necessary. A general capture process for an entire battery pack is also presented, but a next-best-scan algorithm is likely required to ensure a "close to complete" representation. Graph-based deep-learning algorithms are shown to scale up to 50,000 inputs while still exhibiting strong performance in terms of accuracy and processing time. The results show that an instance segmentation system can be implemented that runs in less than two seconds. Using off-the-shelf hardware, the results demonstrate that a 3D perception system is industrially viable and competitive with a 2D perception system.
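Voxel-based processing and fixed input budgets (such as the 50,000 inputs mentioned above) usually start by reducing a dense structured-light capture to a manageable number of points. A common way to do this is voxel-grid downsampling, sketched below: every point is assigned to a cubic voxel, and each occupied voxel is replaced by the centroid of its points. This is a generic sketch, not the thesis's pipeline; the helper names, the starting voxel size, and the growth factor are hypothetical choices.

```python
import math
from collections import defaultdict


def voxel_downsample(points, voxel_size):
    """Replace all points that fall into the same cubic voxel by their centroid."""
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    buckets = defaultdict(list)
    for p in points:
        key = tuple(math.floor(c / voxel_size) for c in p)  # integer voxel index
        buckets[key].append(p)
    centroids = []
    for key in sorted(buckets):  # sorted keys give a deterministic output order
        pts = buckets[key]
        centroids.append(tuple(sum(axis) / len(pts) for axis in zip(*pts)))
    return centroids


def downsample_to_budget(points, max_points, voxel_size=0.001, growth=1.5):
    """Enlarge the voxel size until at most max_points centroids remain."""
    if max_points < 1 or growth <= 1:
        raise ValueError("need max_points >= 1 and growth > 1")
    reduced = voxel_downsample(points, voxel_size)
    while len(reduced) > max_points:
        voxel_size *= growth
        reduced = voxel_downsample(points, voxel_size)
    return reduced, voxel_size


# Usage: two nearby points share a voxel of size 1.0; the third lies in its own voxel.
merged = voxel_downsample([(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (1.5, 0.0, 0.0)], 1.0)

# Hypothetical dense cloud on a 10 x 10 x 10 grid, reduced to at most 100 points.
grid = [(x / 10, y / 10, z / 10) for x in range(10) for y in range(10) for z in range(10)]
budgeted, used_size = downsample_to_budget(grid, 100)
```

Averaging within voxels keeps the output roughly uniform in density, which tends to suit graph-based networks that build neighborhoods from point distances.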

    Liver and Vessel Segmentation Techniques in Abdominal CT

    Doctoral dissertation, Seoul National University Graduate School, College of Engineering, Department of Computer Science and Engineering, February 2020. Advisor: Yeong-Gil Shin. Accurate segmentation of the liver and its vessels in abdominal computed tomography (CT) images is one of the most important prerequisites for computer-aided diagnosis (CAD) systems that support volumetric measurement, treatment planning, and augmented reality-based surgical guidance.
In recent years, the application of deep learning in the form of convolutional neural networks (CNNs) has improved the performance of medical image segmentation, but providing generalization performance high enough for actual clinical practice remains difficult. Furthermore, although contour features are an important factor in image segmentation, they are hard to employ in CNNs because many boundaries in the image are unclear. In liver vessel segmentation, a deep learning approach is impractical because it is difficult to obtain training data from complex vessel images. Moreover, thin vessels are hard to identify in the original image due to weak intensity contrast and noise. In this dissertation, a CNN with high generalization performance and a contour learning scheme is first proposed for liver segmentation. Second, a liver vessel segmentation algorithm is presented that accurately segments even thin vessels. To build a CNN with high generalization performance, the auto-context algorithm is employed. The auto-context algorithm runs through two pipelines: the first predicts the overall area of the liver, and the second predicts the final liver using the first prediction as a prior. This process improves generalization performance because the network internally estimates a shape prior. In addition to auto-context, a contour learning method is proposed that uses only sparse contours rather than the entire contour. Sparse contours are obtained from, and trained on, only the mispredicted parts of the network's final prediction. Experimental studies show that the proposed network is more accurate than other modern networks. Multiple N-fold tests are also performed to verify the generalization performance. An algorithm for accurate liver vessel segmentation is also proposed that introduces vessel candidate points. To obtain confident vessel candidates, the 3D image is first reduced to 2D through maximum intensity projection.
Subsequently, vessel segmentation is performed on the 2D images, and the segmented pixels are back-projected into the original 3D space. Finally, a new level set function is proposed that utilizes both the original image and the vessel candidate points. The proposed algorithm can segment thin vessels with high accuracy by relying mainly on the vessel candidate points. These points are reliable because they come from robust segmentation in the projected 2D images, where complex structures are simplified and thin vessels are more visible. Experimental results show that the proposed algorithm outperforms other active contour models without extracting false regions. The proposed algorithms present a new method of segmenting the liver and its vessels. The auto-context algorithm shows that a human-designed curriculum (i.e., shape-prior learning) can improve generalization performance. The proposed contour learning technique can increase the accuracy of a CNN for image segmentation by focusing on its failures, represented by sparse contours. The vessel segmentation shows that minor vessel branches can be successfully segmented through vessel candidate points obtained by reducing the image dimension.
The algorithms presented in this dissertation can be employed for later analysis of liver anatomy that requires accurate segmentation techniques.

    Contents
    Chapter 1 Introduction
      1.1 Background and motivation
      1.2 Problem statement
      1.3 Main contributions
      1.4 Contents and organization
    Chapter 2 Related Works
      2.1 Overview
      2.2 Convolutional neural networks
        2.2.1 Architectures of convolutional neural networks
        2.2.2 Convolutional neural networks in medical image segmentation
      2.3 Liver and vessel segmentation
        2.3.1 Classical methods for liver segmentation
        2.3.2 Vascular image segmentation
        2.3.3 Active contour models
        2.3.4 Vessel topology-based active contour model
      2.4 Motivation
    Chapter 3 Liver Segmentation via Auto-Context Neural Network with Self-Supervised Contour Attention
      3.1 Overview
      3.2 Single-pass auto-context neural network
        3.2.1 Skip-attention module
        3.2.2 V-transition module
        3.2.3 Liver-prior inference and auto-context
        3.2.4 Understanding the network
      3.3 Self-supervising contour attention
      3.4 Learning the network
        3.4.1 Overall loss function
        3.4.2 Data augmentation
      3.5 Experimental Results
        3.5.1 Overview
        3.5.2 Data configurations and target of comparison
        3.5.3 Evaluation metric
        3.5.4 Accuracy evaluation
        3.5.5 Ablation study
        3.5.6 Performance of generalization
        3.5.7 Results from ground-truth variations
      3.6 Discussion
    Chapter 4 Liver Vessel Segmentation via Active Contour Model with Dense Vessel Candidates
      4.1 Overview
      4.2 Dense vessel candidates
        4.2.1 Maximum intensity slab images
        4.2.2 Segmentation of 2D vessel candidates and back-projection
      4.3 Clustering of dense vessel candidates
        4.3.1 Virtual gradient-assisted regional ACM
        4.3.2 Localized regional ACM
      4.4 Experimental results
        4.4.1 Overview
        4.4.2 Data configurations and environment
        4.4.3 2D segmentation
        4.4.4 ACM comparisons
        4.4.5 Evaluation of bifurcation points
        4.4.6 Computational performance
        4.4.7 Ablation study
        4.4.8 Parameter study
      4.5 Application to portal vein analysis
      4.6 Discussion
    Chapter 5 Conclusion and Future Works
    Bibliography
    Abstract (in Korean)
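The dimension-reduction idea behind the vessel candidate points in this dissertation (3D to 2D by maximum intensity projection, 2D segmentation, back-projection into 3D) can be sketched as follows. The code computes a maximum intensity projection over slabs of consecutive slices, remembers for every pixel the slice that supplied the maximum, and maps segmented 2D pixels back to those voxels. This is a sketch under assumptions: the slab thickness is a free parameter, the volume is indexed as volume[z][y][x], and a plain intensity threshold stands in for the dissertation's 2D vessel segmentation, which is considerably more elaborate.

```python
def slab_mip(volume, slab):
    """Maximum intensity projection over slabs of `slab` consecutive z-slices.

    Returns a list of (mip, depth) pairs, one per slab, where mip[y][x] is the maximum
    intensity in the slab and depth[y][x] is the z index (in the full volume) where it occurs.
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    results = []
    for z0 in range(0, nz, slab):
        zs = range(z0, min(z0 + slab, nz))
        depth = [[max(zs, key=lambda z: volume[z][y][x]) for x in range(nx)] for y in range(ny)]
        mip = [[volume[depth[y][x]][y][x] for x in range(nx)] for y in range(ny)]
        results.append((mip, depth))
    return results


def back_project(mask, depth):
    """Map every True pixel of a 2D mask to its (z, y, x) voxel via the stored depth."""
    return {(depth[y][x], y, x) for y, row in enumerate(mask) for x, on in enumerate(row) if on}


def vessel_candidates(volume, slab, threshold):
    """Collect 3D candidate voxels; thresholding is a placeholder for 2D segmentation."""
    candidates = set()
    for mip, depth in slab_mip(volume, slab):
        mask = [[value > threshold for value in row] for row in mip]
        candidates |= back_project(mask, depth)
    return candidates


# Usage: a 4 x 2 x 2 volume with two bright voxels standing in for thin vessels.
vol = [[[0, 0], [0, 0]] for _ in range(4)]
vol[1][0][0] = 10
vol[3][1][1] = 8
found = vessel_candidates(vol, slab=2, threshold=5)
```

Projecting over slabs rather than the whole volume limits how many overlapping structures compete for the same 2D pixel, at the cost of processing several projections.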

    Graph-based Data Modeling and Analysis for Data Fusion in Remote Sensing

    Hyperspectral imaging provides increased sensitivity and discrimination over traditional imaging methods by combining standard digital imaging with spectroscopic methods. For each pixel in a hyperspectral image (HSI), a continuous spectrum is sampled as the spectral reflectance/radiance signature to facilitate identification of ground cover and surface materials. This rich spectral information allows all of the information available in the data to be mined. These qualities enable a wide range of applications such as mineral exploration, agricultural monitoring, and ecological surveillance. Processing massive, high-dimensional HSI datasets is challenging, since many data processing techniques have a computational complexity that grows exponentially with the dimension. In addition, an HSI dataset may contain a limited number of degrees of freedom due to the high correlations between data points and among the spectral bands. On the other hand, relying only on the sampled spectrum of individual HSI data points may produce inaccurate results due to the mixed nature of raw HSI data, such as mixed pixels and optical interference. Fusion strategies are widely adopted in data processing to achieve better performance, especially in classification and clustering. There are three main types of fusion strategies: low-level data fusion, intermediate-level feature fusion, and high-level decision fusion. Low-level data fusion combines multi-source data that is expected to be complementary or cooperative. Intermediate-level feature fusion aims at selecting and combining features to remove redundant information. Decision-level fusion exploits a set of classifiers to provide more accurate results. Fusion strategies have wide applications, including HSI data processing.
With the rapid development of multiple remote sensing modalities, e.g., Very High Resolution (VHR) optical sensors and LiDAR, fusion of multi-source data can in principle produce more detailed information than any single source. Moreover, besides the abundant spectral information contained in HSI data, features such as texture and shape may be employed to represent data points from a spatial perspective. Feature fusion also includes the strategy of removing redundant and noisy features from the dataset. One of the major problems in machine learning and pattern recognition is developing appropriate representations for complex nonlinear data. In HSI processing, a data point is usually described as a vector whose coordinates correspond to the intensities measured in the spectral bands. This vector representation permits the application of linear and nonlinear transformations with linear algebra to find an alternative representation of the data. More generally, HSI is multi-dimensional in nature, and the vector representation may lose contextual correlations. Tensor representation provides a more sophisticated modeling technique and a higher-order generalization of linear subspace analysis. In graph theory, data points can be generalized as nodes whose connectivities are measured from the proximity of a local neighborhood. The graph-based framework efficiently characterizes the relationships among the data and allows convenient mathematical manipulation in many applications, such as data clustering, feature extraction, feature selection, and data alignment. In this thesis, graph-based approaches to multi-source feature and data fusion in remote sensing are explored. We mainly investigate the fusion of spatial, spectral, and LiDAR information with linear and multilinear algebra under a graph-based framework for data clustering and classification problems.
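The graph-based view described in this abstract can be made concrete with a small sketch. Features from two sources (for example, HSI spectra and LiDAR heights) are standardized and concatenated, a k-nearest-neighbor graph with heat-kernel weights is built on the fused vectors, and the unnormalized graph Laplacian L = D - W is formed, which is the matrix that spectral clustering and Laplacian-eigenmap-style feature extraction operate on. Simple concatenation is only one basic fusion choice and is not claimed to be the thesis's method; `k`, `sigma`, and the function names are hypothetical.

```python
import math


def standardize(X):
    """Z-score every column so that sources with different units become comparable."""
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) for j in range(m)]
    return [[(row[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0 for j in range(m)]
            for row in X]


def stack_features(*sources):
    """Feature-level fusion: concatenate the standardized features of each sample."""
    blocks = [standardize(src) for src in sources]
    return [sum((block[i] for block in blocks), []) for i in range(len(sources[0]))]


def knn_heat_graph(X, k, sigma):
    """Symmetric kNN graph with weights exp(-||xi - xj||^2 / (2 sigma^2))."""
    n = len(X)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        dists = sorted((sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
                       for j in range(n) if j != i)
        for d2, j in dists[:k]:
            W[i][j] = W[j][i] = math.exp(-d2 / (2 * sigma ** 2))  # keep the graph undirected
    return W


def graph_laplacian(W):
    """Unnormalized Laplacian L = D - W, where D holds the node degrees."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]


# Usage with hypothetical 2-band spectra and 1-value LiDAR heights for four pixels.
spectra = [[0.10, 0.20], [0.11, 0.21], [0.90, 0.80], [0.91, 0.79]]
lidar = [[1.0], [1.1], [5.0], [5.2]]
W = knn_heat_graph(stack_features(spectra, lidar), k=1, sigma=1.0)
L = graph_laplacian(W)
```

Each row of L sums to zero, so the constant vector is always in its null space; the number of connected components (here two clusters of two pixels) equals the dimension of that null space.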

    Geometric data understanding: deriving case-specific features

    One tradition uses precise geometric modeling, in which uncertainties in the data can be considered noise. Another tradition relies on the statistical nature of vast quantities of data, in which geometric regularity is intrinsic to the data and statistical models usually capture it only indirectly. This work focuses on two real-world examples, point cloud data of natural resources and silhouette recognition from video input, as problems whose geometric content is intangible at the level of the raw data. This content could be discovered and modeled to some degree by machine learning (ML) approaches such as deep learning, but this requires either direct coverage of the geometry in the samples or an additional geometry-invariant layer. Geometric content is central when direct observations of spatial variables are needed, or when one needs a mapping to a geometrically consistent data representation in which, e.g., outliers or noise can be easily discerned. In this thesis we consider the transformation of original input data into a geometric feature space in two example problems. The first example is the curvature of surfaces, which has met renewed interest since the introduction of ubiquitous point cloud data and the maturation of discrete differential geometry. Curvature spectra can characterize a spatial sample rather well and provide useful features for ML purposes. The second example involves projective methods applied to stereo video signal analysis in swimming analytics. The aim is to find meaningful local geometric representations for feature generation, which also facilitate additional analysis based on a geometric understanding of the model. The features are associated directly with some geometric quantity, which makes it easier to express geometric constraints in a natural way, as shown in the thesis. Also, visualization and further feature generation are much easier.
Third, the approach provides sound baselines for more traditional ML approaches, e.g., neural network methods. Fourth, most ML methods can utilize the geometric features presented in this work as additional features.
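One widely used, easily computed curvature proxy for point clouds is the surface variation of a local neighborhood, σ = λ0 / (λ0 + λ1 + λ2), where λ0 ≤ λ1 ≤ λ2 are the eigenvalues of the neighborhood's covariance matrix: σ is 0 on a plane and reaches 1/3 for an isotropic neighborhood. The sketch below computes σ for every point from its k nearest neighbors and bins the values into a histogram, one simple way to turn curvature into a "spectrum" feature vector. It is an illustration under assumptions rather than the thesis's curvature estimator; `k`, the bin count, and the function names are hypothetical, and the brute-force neighbor search only suits small clouds.

```python
import math


def covariance3(points):
    """Population covariance matrix (3 x 3) of a list of 3D points."""
    n = len(points)
    mean = [sum(p[k] for p in points) / n for k in range(3)]
    C = [[0.0] * 3 for _ in range(3)]
    for p in points:
        d = [p[k] - mean[k] for k in range(3)]
        for i in range(3):
            for j in range(3):
                C[i][j] += d[i] * d[j] / n
    return C


def sym_eigvals3(A):
    """Eigenvalues of a symmetric 3 x 3 matrix in ascending order (closed-form, trigonometric)."""
    p1 = A[0][1] ** 2 + A[0][2] ** 2 + A[1][2] ** 2
    if p1 == 0.0:
        return sorted([A[0][0], A[1][1], A[2][2]])  # already diagonal
    q = (A[0][0] + A[1][1] + A[2][2]) / 3
    p = math.sqrt(((A[0][0] - q) ** 2 + (A[1][1] - q) ** 2 + (A[2][2] - q) ** 2 + 2 * p1) / 6)
    B = [[(A[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (B[0][0] * (B[1][1] * B[2][2] - B[1][2] * B[2][1])
             - B[0][1] * (B[1][0] * B[2][2] - B[1][2] * B[2][0])
             + B[0][2] * (B[1][0] * B[2][1] - B[1][1] * B[2][0]))
    phi = math.acos(max(-1.0, min(1.0, det_b / 2))) / 3
    e_max = q + 2 * p * math.cos(phi)
    e_min = q + 2 * p * math.cos(phi + 2 * math.pi / 3)
    return sorted([e_min, 3 * q - e_max - e_min, e_max])


def surface_variation(neighborhood):
    """sigma = lambda0 / (lambda0 + lambda1 + lambda2): 0 for a plane, 1/3 when isotropic."""
    lam = sym_eigvals3(covariance3(neighborhood))
    total = sum(lam)
    return max(0.0, lam[0]) / total if total > 0 else 0.0


def curvature_spectrum(cloud, k=8, bins=6):
    """Histogram of per-point surface variation over k-nearest-neighbor neighborhoods."""
    hist = [0] * bins
    for p in cloud:
        nearest = sorted(cloud, key=lambda q: math.dist(p, q))[:k + 1]  # includes p itself
        sigma = surface_variation(nearest)
        hist[min(int(sigma * 3 * bins), bins - 1)] += 1  # sigma lies in [0, 1/3]
    return hist


# Usage: a tilted planar patch gives sigma close to 0 at every point.
plane = [(x, y, 0.5 * x + 0.2 * y) for x in range(5) for y in range(5)]
plane_hist = curvature_spectrum(plane)
axes = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
iso_sigma = surface_variation(axes)
```

Like the features described in this thesis, σ is tied to a geometric quantity: it is invariant to rotation and translation of the neighborhood, and a histogram of it can be appended to other features for a downstream classifier.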