
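Structured Understanding of Visual Data Using Partial Information: Sparsity, Randomness, Relevance, and Deep Networks (부분 정보를 이용한 시각 데이터의 구조화 된 이해: 희소성, 무작위성, 연관성, 그리고 딥 네트워크)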

Author: 이동훈
Publication venue: Seoul National University Graduate School (서울대학교 대학원)
Publication date: 01/02/2019

Doctoral dissertation (Ph.D.), Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, February 2019. Oh, Songhwai.
For a deeper understanding of visual data, the relationship between local parts and a global scene has to be carefully examined. Examples of such relationships in vision problems include, but are not limited to, detecting a region of interest in a scene, classifying an image based on limited visual cues, and synthesizing new images conditioned on local or global inputs. In this thesis, we aim to learn this relationship and demonstrate its importance by showing that it is one of the critical keys to addressing four challenging vision problems mentioned above. For each problem, we construct a deep neural network suited to the task.
The first problem considered in the thesis is object detection. It requires not only finding local patches that look like target objects, conditioned on the context of the input scene, but also comparing the local patches with one another to assign a single detection to each object. To this end, we introduce the individualness of detection candidates as a complement to objectness for object detection. The individualness assigns a single detection to each object out of the raw detection candidates given by either object proposals or sliding windows. We show that conventional approaches, such as non-maximum suppression, are sub-optimal since they suppress nearby detections using only detection scores. We use a determinantal point process combined with the individualness to optimally select the final detections. It models each detection using its quality and its similarity to other detections based on the individualness. Then, detections with high detection scores and low correlations are selected by measuring their probability using the determinant of a matrix composed of quality terms on the diagonal entries and similarities on the off-diagonal entries.
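The determinant-based selection described above can be illustrated with a minimal sketch. It assumes a kernel of the form L[i][j] = q_i * S[i][j] * q_j, uses plain box overlap (IoU) as a stand-in for the individualness-based similarity, and stops a greedy search with a hypothetical `min_gain` threshold. None of these choices is taken from the thesis; the function names are illustrative only.

```python
def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d  # a row swap flips the sign
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_dpp_select(boxes, scores, min_gain=0.25):
    """Greedily grow the selected set while the determinant ratio stays above min_gain.

    Assumption: quality q_i is the raw detection score and similarity is IoU.
    """
    n = len(boxes)
    # Kernel: squared quality on the diagonal, quality-weighted similarity elsewhere.
    L = [[scores[i] * scores[j] * (1.0 if i == j else iou(boxes[i], boxes[j]))
          for j in range(n)] for i in range(n)]
    selected, best = [], 1.0  # determinant of the empty submatrix is 1
    while len(selected) < n:
        candidates = []
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            sub = [[L[r][c] for c in idx] for r in idx]
            candidates.append((det(sub), i))
        value, i = max(candidates)
        if value / best <= min_gain:
            break  # every remaining candidate is too weak or too redundant
        selected.append(i)
        best = value
    return selected


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 0, 30, 10)]
    scores = [0.9, 0.8, 0.7]
    print(greedy_dpp_select(boxes, scores))  # -> [0, 2]
```

In this toy input the second box overlaps the first heavily, so adding it barely increases the determinant and it is skipped, while the lower-scoring but non-overlapping third box is kept. Score-only suppression would need a hand-tuned overlap threshold to reach the same decision.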
For concreteness, we focus on the pedestrian detection problem, as it is one of the most challenging detection problems due to frequent occlusions and unpredictable human motions. Experimental results demonstrate that the proposed algorithm performs favorably against existing methods, including non-maximum suppression and a quadratic unconstrained binary optimization based method.
For the second problem, we classify images based on observations of local patches. More specifically, we consider the problem of estimating the head pose and body orientation of a person from a low-resolution image. In this setting, it is difficult to reliably extract facial features or detect body parts. We propose a convolutional random projection forest (CRPforest) algorithm for these tasks. A convolutional random projection network (CRPnet) is used at each node of the forest. It maps an input image to a high-dimensional feature space using a rich filter bank. The filter bank is designed to generate sparse responses so that they can be efficiently computed by compressive sensing. A sparse random projection matrix can capture the most essential information contained in the filter bank without using all of its filters. The CRPnet is therefore fast, e.g., it requires 0.04 ms to process an image of 50×50 pixels, thanks to the small number of convolutions (e.g., 0.01% of a layer of a neural network), at the expense of less than 2% accuracy. The overall forest estimates head and body pose well on benchmark datasets, e.g., over 98% on the HIIT dataset, while requiring 3.8 ms without a GPU. Extensive experiments on challenging datasets show that the proposed algorithm performs favorably against state-of-the-art methods on low-resolution images with noise, occlusion, and motion blur.
Then, we shift our attention to image synthesis based on the local-global relationship.
Learning how to synthesize and place object instances into an image (semantic map) based on the scene context is a challenging and interesting problem in vision and learning. On the one hand, solving this problem requires a joint decision on (a) generating an object mask of a certain class at a plausible scale, location, and shape, and (b) inserting the object instance mask into an existing scene so that the synthesized content is semantically realistic. On the other hand, such a model can synthesize realistic outputs that could facilitate numerous image editing and scene parsing tasks. In this work, we propose an end-to-end trainable neural network that can synthesize and insert object instances into an image via a semantic map. The proposed network contains two generative modules that determine where the inserted object should be (i.e., location and scale) and what the object shape (and pose) should look like. The two modules are connected by a spatial transformation network and are jointly trained and optimized in a purely data-driven way. Specifically, we propose a novel network architecture with parallel supervised and unsupervised paths to guarantee diverse results. We show that the proposed network architecture learns the context-aware distribution of the location and shape of object instances to be inserted, and that it can generate realistic and statistically meaningful object instances that simultaneously address the where and what sub-problems.
As the final topic of the thesis, we introduce a new vision problem: generating an image from a small number of key local patches without any geometric prior. In this work, key local patches are defined as informative regions of the target object or scene. This is a challenging problem since it requires generating realistic images and predicting the locations of parts at the same time. We construct adversarial networks to tackle this problem.
A generator network produces a fake image as well as a mask using an encoder-decoder framework, while a discriminator network aims to detect fake images. The network is trained with three losses that account for spatial, appearance, and adversarial information. The spatial loss determines whether the locations of the predicted parts are correct. The appearance loss ensures that input patches are restored in the output image without much modification. The adversarial loss ensures that output images are realistic. The proposed network is trained without supervisory signals since no labels of key parts are required. Experimental results on seven datasets demonstrate that the proposed algorithm performs favorably on challenging objects and scenes.
To understand visual data in depth, it is necessary to carefully analyze the relationship, or interaction, between the whole region and its local regions. Related computer vision problems include detecting a desired region in an image, classifying a whole image from limited partial information, and generating a desired image from given information. In this thesis, we aim to show that learning this relationship is an important key to solving the problems mentioned above. In addition, we discuss the design of a deep network suited to each problem.
As the first topic, we analyze object detection. This problem requires not only finding regions that look like the target object but also assigning exactly one detection to each object by analyzing the relationships among the regions found. To this end, we propose the concept of individualness as a complement to objectness. It is used to assign one of the candidate object regions, obtained by any method, to each object; we introduce it to improve on conventional approaches such as non-maximum suppression, which post-process using only detection scores and therefore inevitably yield sub-optimal results. To select the optimal regions from the candidates, we use a determinantal point process, a type of random process. It first models each detection using its quality (detection score) and its similarity (correlation) to other detections, computed from the individualness. The probability of each detection being selected is then expressed as the determinant of a kernel based on quality and similarity. The diagonal entries of the kernel hold the quality terms, and the off-diagonal entries hold the similarities. Therefore, for a detection candidate to have a high probability of being selected as a final detection, it must have high quality and, at the same time, low similarity to other detections. In this thesis we focus on pedestrian detection, because it is an important problem and pedestrians are hard to detect: compared with other objects, they are frequently occluded and exhibit diverse motions.
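Experimental results show that the proposed method outperforms non-maximum suppression and quadratic unconstrained binary optimization based methods.
As the next problem, we consider classifying a whole image using partial information. Among the many classification problems, this thesis focuses on estimating the direction in which a person's head and body are facing from a low-resolution image. In this setting, it is difficult to locate the eyes, nose, and mouth or to accurately identify body parts. For this, we propose a method called the convolutional random projection forest (CRPforest). Each node of the forest contains a convolutional random projection network (CRPnet), which maps the input image to a high-dimensional space using a variety of filters. To handle this efficiently, we use filters that produce sparse responses, which makes it possible to apply the idea of compressive sensing. That is, the aim is to capture all the important information in the whole image while actually using only a small number of filters. As a result, CRPnet is very fast, running in 0.04 ms on a 50×50-pixel image, while the drop in performance is a marginal 2% or so. The overall forest built on it runs within 3.8 ms without a GPU and achieves the best performance on various datasets for head and body orientation estimation. It also performs well under low resolution, noise, occlusion, and blur.
Next, we explore image generation through the part-whole relationship. Inferring which object to place in an input image, and how, is a very interesting problem from the standpoint of computer vision and machine learning. It requires generating an object mask with an appropriate size, location, and shape while ensuring that the object still looks plausible once placed in the input image. If this can be done, it can be applied to a variety of problems such as image editing and scene parsing. In this thesis, we construct an end-to-end trainable deep network for the problem of placing a new object at a suitable location given an input semantic map. To this end, we build a network based on a where module and a what module, and connect the two through a spatial transformer network so that they can be trained jointly. In addition, each module has supervised and unsupervised paths arranged in parallel so that diverse results can be obtained from the same input. Experiments show that the proposed method can jointly learn the distributions of the location and shape of the object to be inserted and, from those distributions, place realistic objects at suitable locations.
The last problem we consider, a new problem in computer vision, is reconstructing a whole image from a small number of local patches whose location information has been lost. This is difficult because the location of each patch must be inferred at the same time as the image is generated. We tackle this problem with adversarial networks. That is, the generator network uses an encoder-decoder scheme to produce an image and a location mask, while the discriminator network tries to detect the generated fake images. The whole network is trained with three objective functions: spatial, appearance, and adversarial. The spatial objective is used to predict the correct locations, the appearance objective to keep the input patches in the output image with little change, and the adversarial objective to make the generated images resemble real images.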
A network built this way has the advantage that it can be trained on existing datasets without additional annotation. Experiments also show that the proposed method works well on a variety of datasets.
1 Introduction
  1.1 Organization of the Dissertation
2 Related Work
  2.1 Detection methods
  2.2 Orientation estimation methods
  2.3 Instance synthesis methods
  2.4 Image generation methods
3 Pedestrian detection
  3.1 Introduction
  3.2 Proposed Algorithm
    3.2.1 Determinantal Point Process Formulation
    3.2.2 Quality Term
    3.2.3 Individualness and Diversity Feature
    3.2.4 Mode Finding
    3.2.5 Relationship to Quadratic Unconstrained Binary Optimization
  3.3 Experiments
    3.3.1 Experimental Settings
    3.3.2 Evaluation Results
    3.3.3 DET curves
    3.3.4 Sensitivity analysis
    3.3.5 Effectiveness of the quality and similarity term design
  3.4 Summary
4 Head and body orientation estimation
  4.1 Introduction
  4.2 Algorithmic Overview
  4.3 Rich Filter Bank
    4.3.1 Compressed Filter Bank
    4.3.2 Box Filter Bank
  4.4 Convolutional Random Projection Net
    4.4.1 Input Layer
    4.4.2 Convolutional and ReLU Layers
    4.4.3 Random Projection Layer
    4.4.4 Fully-Connected and Output Layers
  4.5 Convolutional Random Projection Forest
  4.6 Experimental Results
    4.6.1 Evaluation Datasets
    4.6.2 CRPnet Characteristics
    4.6.3 Head and Body Orientation Estimation
    4.6.4 Analysis of the Proposed Algorithm
    4.6.5 Classification Examples
    4.6.6 Regression Examples
    4.6.7 Experiments on the Original Datasets
    4.6.8 Dataset Corrections
  4.7 Summary
5 Instance synthesis and placement
  5.1 Introduction
  5.2 Approach
    5.2.1 The where module: learning a spatial distribution of object instances
    5.2.2 The what module: learning a shape distribution of object instances
    5.2.3 The complete pipeline
  5.3 Experimental Results
  5.4 Summary
6 Image generation
  6.1 Introduction
  6.2 Proposed Algorithm
    6.2.1 Key Part Detection
    6.2.2 Part Encoding Network
    6.2.3 Mask Prediction Network
    6.2.4 Image Generation Network
    6.2.5 Real-Fake Discriminator Network
  6.3 Experiments
    6.3.1 Datasets
    6.3.2 Image Generation Results
    6.3.3 Experimental Details
    6.3.4 Image Generation from Local Patches
    6.3.5 Part Combination
    6.3.6 Unsupervised Feature Learning
    6.3.7 An Alternative Objective Function
    6.3.8 An Alternative Network Structure
    6.3.9 Different Number of Input Patches
    6.3.10 Smaller Size of Input Patches
    6.3.11 Degraded Input Patches
    6.3.12 User Study
    6.3.13 Failure cases
  6.4 Summary
7 Conclusion and Future Work
Docto

Advancing Statistical Inference For Population Studies In Neuroimaging Using Machine Learning

Author: Varol, Erdem
Publication venue: ScholarlyCommons
Publication date: 01/01/2018

Modern neuroimaging techniques allow us to investigate the brain in vivo and in high resolution, providing high-dimensional information about the structure and function of the brain in health and disease. Statistical analysis techniques transform this rich imaging information into accessible and interpretable knowledge that can be used for investigative as well as diagnostic and prognostic purposes. A prevalent area of research in neuroimaging is group comparison, i.e., the comparison of the imaging data of two groups (e.g., patients vs. healthy controls, or people who respond to treatment vs. people who don't) to identify discriminative imaging patterns that characterize different conditions. In recent years, the neuroimaging community has adopted techniques from mathematics, statistics, and machine learning to introduce novel methodologies aimed at improving our understanding of various neuropsychiatric and neurodegenerative disorders. However, existing statistical methods are limited by their reliance on ad hoc assumptions regarding the homogeneity of the disease effect, the spatial properties of the underlying signal, and the covariate structure of the data, which imposes certain constraints on the sampling of datasets.
1. First, the overarching assumption behind most analytical tools commonly used in neuroimaging studies is that there is a single disease effect that differentiates patients from controls. In reality, however, the disease effect may be heterogeneously expressed across the patient population. As a consequence, when searching for a single imaging pattern that characterizes the difference between healthy controls and patients, we may only get a partial or incomplete picture of the disease effect.
2. Second, and importantly, most analyses assume a uniform shape and size of the disease effect. As a consequence, a common step in most neuroimaging analyses is to apply uniform smoothing to the data, aggregating regional information at each voxel to improve the signal-to-noise ratio. However, the shape and size of the disease patterns may not be uniformly represented across the brain.
3. Lastly, in practical scenarios, imaging datasets commonly include variations due to multiple covariates, which often have effects that overlap with the disease effects being sought. To minimize the covariate effects, studies are carefully designed by appropriately matching the populations under observation. The difficulty of this task is further exacerbated by the advent of big data analyses, which often entail aggregating large datasets collected across many clinical sites.
The goal of this thesis is to address each of the aforementioned assumptions and limitations by introducing robust mathematical formulations founded on multivariate machine learning techniques that integrate discriminative and generative approaches. Specifically,
1. First, we introduce an algorithm termed HYDRA, which stands for heterogeneity through discriminative analysis. This method parses the heterogeneity in neuroimaging studies by simultaneously performing clustering and classification using piecewise linear decision boundaries.
2. Second, we propose regionally linear multivariate discriminative statistical mapping (MIDAS) to find the optimal level of variable smoothing across the brain anatomy and tease out group differences in neuroimaging datasets. This method uses overlapping regional discriminative filters to approximate a matched filter that best delineates the underlying disease effect.
3. Lastly, we develop a method termed generative discriminative machines (GDM) to reduce the effect of confounds in biased samples. The proposed method solves for a discriminative model that can also optimally generate the data when the covariate structure is taken into account.
We extensively validated the performance of the developed frameworks on diverse types of simulated scenarios. Furthermore, we applied our methods to a large number of clinical datasets that included structural and functional neuroimaging data as well as genetic data. Specifically, HYDRA was used to identify distinct subtypes of Alzheimer's Disease. MIDAS was applied to identify the optimally discriminative patterns that differentiate between truth-telling and lying functional tasks. GDM was applied to a multi-site prediction setting with severely confounded samples. Our promising results demonstrate the potential of our methods to advance neuroimaging analysis beyond the set of assumptions that limit its capacity and to improve statistical power.

Data-driven methods for statistical verification of uncertain nonlinear systems

Author: Quindlen, John Francis
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2018

Thesis: Ph. D., Massachusetts Institute of Technology, Department of Aeronautics and Astronautics, 2018. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 277-290).
Due to the increasing complexity of autonomous, adaptive, and nonlinear systems, engineers commonly rely upon statistical techniques to verify that the closed-loop system satisfies specified performance requirements at all possible operating conditions. However, these techniques require a large number of simulations or experiments to exhaustively search the set of possible parametric uncertainties for conditions that lead to failure. This work focuses on resource-constrained applications, such as preliminary control system design or experimental testing, which cannot rely upon exhaustive search to analyze the robustness of the closed-loop system to those requirements. This thesis develops novel statistical verification frameworks that combine data-driven statistical learning techniques and control system verification. First, two frameworks are introduced for verification of deterministic systems with binary and non-binary evaluations of each trajectory's robustness. These frameworks implement machine learning models to learn and predict the satisfaction of the requirements over the entire set of possible parameters from a small set of simulations or experiments. In order to maximize prediction accuracy, closed-loop verification techniques are developed to iteratively select parameter settings for subsequent tests according to their expected improvement of the predictions. Second, extensions of the deterministic verification frameworks redevelop these procedures for stochastic systems, and these new stochastic frameworks achieve similar improvements.
Lastly, the thesis details a method for transferring information between simulators or from simulators to experiments. Moreover, this method is introduced as part of a new failure-adverse closed-loop verification framework, which is shown to successfully minimize the number of failures during experimental verification without undue conservativeness. Ultimately, these data-driven verification frameworks provide principled approaches for efficient verification of nonlinear systems at all stages in the control system development cycle.
by John Francis Quindlen. Ph. D