59,688 research outputs found

    Transfer Learning Improving Predictive Mortality Models for Patients in End-Stage Renal Disease

    Deep learning is becoming a fundamental piece in the paradigm shift from evidence-based to data-based medicine. However, its learning capacity is rarely exploited when working with small data sets. Through transfer learning (TL), information from a source domain is transferred to a target domain to enhance a learning task in that domain. The proposed TL mechanisms are based on sample and feature space augmentation. Deep autoencoders extract complex representations of the data, and their latent representations, the so-called codes, are handled to transfer information between domains. The transfer of samples is carried out by computing a latent-space mapping matrix that links codes from both domains for later reconstruction. The feature space augmentation is based on computing the average of the most similar codes from one domain; this average augments the features in the target domain. The proposed framework is evaluated on the prediction of mortality in patients with end-stage renal disease, transferring information related to the mortality of patients with acute kidney injury from the massive MIMIC-III database. Compared to other TL mechanisms, the proposed approach improves on previous mortality predictive models by 6-11%. Integrating TL approaches into learning tasks for pathologies with data volume issues could encourage the use of data-based medicine in clinical settings.
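
    A minimal numpy sketch of the two mechanisms as described above; the autoencoder training, the pairing of codes across domains, and the choice of k are illustrative assumptions, not details taken from the paper.

        import numpy as np

        def latent_mapping(target_codes, source_codes):
            # Least-squares matrix M such that target_codes @ M ~ source_codes,
            # assuming rows are paired across domains (an assumption here).
            M, *_ = np.linalg.lstsq(target_codes, source_codes, rcond=None)
            return M

        def augment_features(target_codes, source_codes, k=5):
            # Append to each target code the mean of its k most similar source
            # codes (cosine similarity): the feature space augmentation step.
            t = target_codes / np.linalg.norm(target_codes, axis=1, keepdims=True)
            s = source_codes / np.linalg.norm(source_codes, axis=1, keepdims=True)
            top_k = np.argsort(-(t @ s.T), axis=1)[:, :k]
            return np.concatenate([target_codes, source_codes[top_k].mean(axis=1)], axis=1)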

    The Geometry of Self-supervised Learning Models and its Impact on Transfer Learning

    Self-supervised learning (SSL) has emerged as a desirable paradigm in computer vision due to the inability of supervised models to learn representations that generalize in domains with limited labels. The recent popularity of SSL has led to the development of several models that make use of diverse training strategies, architectures, and data augmentation policies, with no existing unified framework to study or assess their effectiveness in transfer learning. We propose a data-driven geometric strategy to analyze different SSL models using local neighborhoods in the feature space induced by each. Unlike existing approaches that consider mathematical approximations of the parameters, individual components, or the optimization landscape, our work aims to explore the geometric properties of the representation manifolds learned by SSL models. Our proposed manifold graph metrics (MGMs) provide insights into the geometric similarities and differences between available SSL models, their invariances with respect to specific augmentations, and their performance on transfer learning tasks. Our key findings are twofold: (i) contrary to popular belief, the geometry of SSL models is not tied to their training paradigm (contrastive, non-contrastive, or cluster-based); (ii) we can predict the transfer learning capability of a specific model based on the geometric properties of its semantic and augmentation manifolds. Comment: 22 pages
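
    As a toy illustration of analyzing representations through local neighborhoods, the statistic below measures how much the k-nearest-neighbor sets of the same samples overlap under two embeddings (for example, with and without an augmentation). It is a stand-in of the same flavor as the paper's MGMs, not one of them.

        import numpy as np
        from sklearn.neighbors import NearestNeighbors

        def neighborhood_overlap(feats_a, feats_b, k=10):
            # Mean Jaccard overlap between the k-NN sets of the same samples
            # under two embeddings; higher overlap = more similar local geometry.
            def knn_sets(x):
                nn = NearestNeighbors(n_neighbors=k + 1).fit(x)
                idx = nn.kneighbors(x, return_distance=False)[:, 1:]  # drop self
                return [set(row) for row in idx]
            a_sets, b_sets = knn_sets(feats_a), knn_sets(feats_b)
            return float(np.mean([len(a & b) / len(a | b)
                                  for a, b in zip(a_sets, b_sets)]))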

    A Study on Learning Methods for Deep-Learning-Based Rotating Machinery Diagnosis with Insufficient Fault Data

    Ph.D. thesis, Seoul National University, Department of Mechanical and Aerospace Engineering, February 2020. Advisor: Byeng D. Youn.
    Deep learning is a promising approach for fault diagnosis in mechanical applications. Deep learning techniques can process large amounts of data at once and model them into the desired diagnostic model. In industrial fields, however, although a great deal of data can be acquired, little of it is useful fault or failure data, because failure in industrial settings is usually unacceptable. To cope with this insufficient-fault-data problem when training a diagnostic model for rotating machinery, this thesis proposes three research thrusts: 1) filter-envelope blocks in convolutional neural networks (CNNs) that incorporate the preprocessing steps for vibration signals, namely frequency filtering and envelope extraction, for a more optimal solution and reduced effort in building the diagnostic model; 2) cepstrum-editing-based data augmentation (CEDA) for diagnostic datasets consisting of vibration signals from rotating machinery; and 3) selective parameter freezing (SPF) for efficient parameter transfer in transfer learning. The first research thrust proposes novel types of functional blocks that let neural networks learn features robust to vibration data. Conventional neural networks, including CNNs, tend to learn biased features when the training data are acquired from a small set of conditions, which can lead to unfavorable performance under different conditions or on other, similar equipment. This research therefore proposes two blocks that can be incorporated into conventional neural networks and minimize the preprocessing steps: a filter block and an envelope block. Each block is designed to learn a frequency filter and an envelope extraction function, respectively, in order to induce the neural network to learn more robust and generalized features from limited vibration samples. The second thrust presents a new data augmentation technique specialized for diagnostic data of vibration signals. Many data augmentation techniques exist for image data, with no consideration for the properties of vibration data; conventional augmentations such as flipping, rotating, or shearing are not appropriate for 1-D vibration data and can harm the natural properties of the vibration signal. To augment vibration data without losing its physical properties, the proposed method generates new samples by editing the cepstrum, adjusting the cepstrum component of interest; applying the inverse transform to the edited cepstrum yields new samples, and the resulting augmented dataset leads to higher accuracy for the diagnostic model. The third research thrust suggests a new parameter repurposing method for parameter transfer in transfer learning. The proposed SPF selectively freezes parameters transferred from the source network and retrains only the parameters unnecessary for the target domain, reducing overfitting and preserving useful source features when the target data are too limited to train a diagnostic model.
์ œ์•ˆ๋œ ์„ธ ๋ฐฉ๋ฒ•์€ ๋…๋ฆฝ์ ์œผ๋กœ ๋˜๋Š” ๋™์‹œ์— ์ง„๋‹จ๋ชจ๋ธ์— ์‚ฌ์šฉ๋˜์–ด ๋ถ€์กฑํ•œ ๊ณ ์žฅ๋ฐ์ดํ„ฐ๋กœ ์ธํ•œ ์ง„๋‹จ์„ฑ๋Šฅ์˜ ๊ฐ์†Œ๋ฅผ ๊ฒฝ๊ฐํ•˜๊ฑฐ๋‚˜ ๋” ๋†’์€ ์„ฑ๋Šฅ์„ ์ด๋Œ์–ด๋‚ผ ์ˆ˜ ์žˆ๋‹ค.Chapter 1 Introduction 13 1.1 Motivation 13 1.2 Research Scope and Overview 15 1.3 Structure of the Thesis 19 Chapter 2 Literature Review 20 2.1 Deep Neural Networks 20 2.2 Transfer Learning and Parameter Transfer 23 Chapter 3 Description of Testbed Data 26 3.1 Bearing Data I: Case Western Reserve University Data 26 3.2 Bearing Data II: Accelerated Life Test Test-bed 27 Chapter 4 Filter-Envelope Blocks in Neural Network for Robust Feature Learning 32 4.1 Preliminary Study of Problems In Use of CNN for Vibration Signals 34 4.1.1 Class Confusion Problem of CNN Model to Different Conditions 34 4.1.2 Benefits of Frequency Filtering and Envelope Extraction for Fault Diagnosis in Vibration Signals 37 4.2 Proposed Network Block 1: Filter Block 41 4.2.1 Spectral Feature Learning in Neural Network 42 4.2.2 FIR Band-pass Filter in Neural Network 45 4.2.3 Result and Discussion 48 4.3 Proposed Neural Block 2: Envelope Block 48 4.3.1 Max-Average Pooling Block for Envelope Extraction 51 4.3.2 Adaptive Average Pooling for Learnable Envelope Extractor 52 4.3.3 Result and Discussion 54 4.4 Filter-Envelope Network for Fault Diagnosis 56 4.4.1 Combinations of Filter-Envelope Blocks for the use of Rolling Element Bearing Fault Diagnosis 56 4.4.2 Summary and Discussion 58 Chapter 5 Cepstrum Editing Based Data Augmentation for Vibration Signals 59 5.1 Brief Review of Data Augmentation for Deep Learning 59 5.1.1 Image Augmentation to Enlarge Training Dataset 59 5.1.2 Data Augmentation for Vibration Signal 61 5.2 Cepstrum Editing based Data Augmentation 62 5.2.1 Cepstrum Editing as a Signal Preprocessing 62 5.2.2 Cepstrum Editing based Data Augmentation 64 5.3 Results and Discussion 65 5.3.1 Performance validation to rolling element bearing diagnosis 65 Chapter 6 Selective Parameter Freezing for Parameter Transfer with Small Dataset 71 6.1 Overall Procedure of Selective Parameter Freezing 72 6.2 Determination Sensitivity of Source Network Parameters 75 6.3 Case Study 1: Transfer to Different Fault Size 76 6.3.1 Performance by hyperparameter ฮฑ 77 6.3.2 Effect of the number of training samples and network size 79 6.4 Case Study 2: Transfer from Artificial to Natural Fault 81 6.4.1 Diagnostic performance for proposed method 82 6.4.2 Visualization of frozen parameters by hyperparameter ฮฑ 83 6.4.3 Visual inspection of feature space 85 6.5 Conclusion 87 Chapter 7 91 7.1 Contributions and Significance 91Docto

    Integrating Semantic Knowledge to Tackle Zero-shot Text Classification

    Insufficient or even unavailable training data for emerging classes is a big challenge in many classification tasks, including text classification. Recognizing text documents of classes that have never been seen in the learning stage, so-called zero-shot text classification, is therefore difficult, and only a few previous works have tackled this problem. In this paper, we propose a two-phase framework together with data augmentation and feature augmentation to solve this problem. Four kinds of semantic knowledge (word embeddings, class descriptions, class hierarchy, and a general knowledge graph) are incorporated into the proposed framework to deal with instances of unseen classes effectively. Experimental results show that each of the two phases, and their combination, achieves the best overall accuracy compared with baselines and recent approaches in classifying real-world texts under the zero-shot scenario. Comment: Accepted to NAACL-HLT 2019
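
    A toy sketch of the word-embedding portion of that semantic knowledge (the two-phase framework itself is not reproduced here): unseen classes can be scored by embedding similarity between a document and each class description.

        import numpy as np

        def zero_shot_scores(doc_tokens, class_descriptions, emb):
            # emb maps token -> vector; class_descriptions maps label -> tokens.
            # Score each unseen class by cosine similarity of mean vectors.
            def mean_vec(tokens):
                return np.mean([emb[t] for t in tokens if t in emb], axis=0)
            d = mean_vec(doc_tokens)
            d = d / np.linalg.norm(d)
            scores = {}
            for label, desc in class_descriptions.items():
                c = mean_vec(desc)
                scores[label] = float(d @ c / np.linalg.norm(c))
            return scores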

    Unsupervised Domain Adaptation for Face Recognition in Unlabeled Videos

    Despite rapid advances in face recognition, there remains a clear gap between the performance of still-image-based face recognition and video-based face recognition, due to the vast difference in visual quality between the domains and the difficulty of curating diverse large-scale video datasets. This paper addresses both of these challenges through an image-to-video feature-level domain adaptation approach that learns discriminative video frame representations. The framework utilizes large-scale unlabeled video data to reduce the gap between the domains while transferring discriminative knowledge from large-scale labeled still images. Given a face recognition network pretrained in the image domain, the adaptation is achieved by (i) distilling knowledge from the network to a video adaptation network through feature matching, (ii) performing feature restoration through synthetic data augmentation, and (iii) learning a domain-invariant feature through a domain adversarial discriminator. We further improve performance through a discriminator-guided feature fusion that boosts high-quality frames while eliminating those degraded by video domain-specific factors. Experiments on the YouTube Faces and IJB-A datasets demonstrate that each module contributes to our feature-level domain adaptation framework and substantially improves video face recognition performance, achieving state-of-the-art accuracy. We demonstrate qualitatively that the network learns to suppress diverse artifacts in videos, such as pose, illumination, or occlusion, without being explicitly trained for them. Comment: Accepted for publication at the International Conference on Computer Vision (ICCV) 2017
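
    Step (i), feature-matching distillation, reduces to a simple loss between the frozen image network's features and the adapted video network's features; a hedged PyTorch sketch, with the network handles and the plain MSE objective assumed for illustration:

        import torch
        import torch.nn.functional as F

        def feature_matching_loss(image_net, video_net, frames):
            # Match the adapted video network's frame features to those of the
            # frozen network pretrained on still images (knowledge distillation).
            with torch.no_grad():
                target = image_net(frames)
            return F.mse_loss(video_net(frames), target)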

    Ranking Medical Terms to Support Expansion of Lay Language Resources for Patient Comprehension of Electronic Health Record Notes: Adapted Distant Supervision Approach

    BACKGROUND: Medical terms are a major obstacle for patients to comprehend their electronic health record (EHR) notes. Clinical natural language processing (NLP) systems that link EHR terms to lay terms or definitions allow patients to easily access helpful information when reading through their EHR notes, and have been shown to improve patient EHR comprehension. However, high-quality lay language resources for EHR terms are very limited in the public domain. Because expanding and curating such a resource is a costly process, it is beneficial and even necessary to first identify the terms that are important for patient EHR comprehension. OBJECTIVE: We aimed to develop an NLP system, called adapted distant supervision (ADS), to rank candidate terms mined from EHR corpora. We give EHR terms ranked highly by ADS a higher priority for lay language annotation, that is, creating lay definitions for these terms. METHODS: Adapted distant supervision uses distant supervision from consumer health vocabulary and transfer learning to adapt itself to the problem of ranking EHR terms in the target domain. We investigated 2 state-of-the-art transfer learning algorithms (ie, feature space augmentation and supervised distant supervision) and designed 5 types of learning features, including distributed word representations learned from large EHR data, for ADS. To evaluate ADS, we asked domain experts to annotate 6038 candidate terms as important or nonimportant for EHR comprehension. We then randomly divided these data into target-domain training data (1000 examples) and evaluation data (5038 examples). We compared ADS with 2 strong baselines, including standard supervised learning, on the evaluation data. RESULTS: The ADS system using feature space augmentation achieved the best average precision, 0.850, on the evaluation set when using 1000 target-domain training examples. The ADS system using supervised distant supervision achieved the best average precision, 0.819, on the evaluation set when using only 100 target-domain training examples. Both ADS systems performed significantly better than the baseline systems (P < .001 for all measures and all conditions). Using a rich set of learning features contributed substantially to ADS's performance. CONCLUSIONS: ADS can effectively rank terms mined from EHRs. Transfer learning improved ADS's performance even with a small number of target-domain training examples. EHR terms prioritized by ADS were used to expand a lay language resource that supports patient EHR comprehension. The top 10,000 EHR terms ranked by ADS are available upon request.
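
    The feature space augmentation named above is, in its common "frustratingly easy domain adaptation" form, a feature-copying trick; a brief sketch under that assumption (the paper's exact variant may differ):

        import numpy as np

        def augment_feature_space(X, domain):
            # Copy features into [shared | source-only | target-only] blocks so a
            # single linear ranker can learn shared and domain-specific weights.
            zeros = np.zeros_like(X)
            if domain == "source":
                return np.hstack([X, X, zeros])
            return np.hstack([X, zeros, X])  # domain == "target"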