44 research outputs found

    ์ž์—ฐ์–ด ์ฒ˜๋ฆฌ๋ฅผ ์œ„ํ•œ ๋ฌธ๋งฅ ์ •๋ณด ๋ฐ ๋ฉ”๋ชจ๋ฆฌ ์–ดํ…์…˜์„ ํ™œ์šฉํ•˜๋Š” ๊ณ„์ธต์  ๋ฌธ๋งฅ ์ธ์ฝ”๋”

    Thesis (Ph.D.) -- Graduate School of Seoul National University: College of Engineering, Department of Electrical and Computer Engineering, August 2022. Advisor: Kyomin Jung.
Recently, the standard architecture for Natural Language Processing (NLP) has evolved from the Recurrent Neural Network to the Transformer. The Transformer architecture consists of attention layers, which show their strength at finding correlations between tokens and incorporating the correlation information to generate proper output. While many studies leveraging the Transformer architecture report new state-of-the-art performance on various NLP tasks, these recent improvements pose a new challenge to the deep learning community: exploiting additional context information. Because human intelligence perceives everyday signals together with rich contextual information (e.g. additional memory, visual information, and common sense), exploiting such context is a step toward the ultimate goal of Artificial Intelligence. In this dissertation, I propose novel methodologies and analyses to improve the context-awareness of the Transformer architecture, focusing on the attention mechanism, for various natural language processing tasks. The proposed methods utilize additionally given context information, which is not limited to the modality of natural language, alongside the given input. First, I propose the Hierarchical Memory Context Encoder (HMCE), which efficiently embeds the contextual information of preceding sentences via a hierarchical Transformer architecture and fuses the embedded context representation into the input representation via a memory attention mechanism. The proposed HMCE outperforms the original Transformer, which does not leverage the additional context information, on various context-aware machine translation tasks. It also achieves the best BLEU score among the baselines that use the additional context.
Then, to improve the attention mechanism between the context representation and the input representation, I analyze in depth the representational similarity between the two. Based on these analyses of representational similarity inside the Transformer architecture, I propose a method for optimizing Centered Kernel Alignment (CKA) between internal representations of the Transformer. The proposed CKA optimization method increases the performance of the Transformer on various machine translation and language modelling tasks. Lastly, I extend the CKA optimization method to a Modality Alignment method for multi-modal scenarios where the context information takes the modality of visual information. My Modality Alignment method enhances the cross-modal attention mechanism by maximizing the representational similarity between the visual representation and the natural language representation, yielding accuracy improvements larger than 3.5% on video question answering tasks.

Contents:
1 Introduction
2 Backgrounds
3 Context-aware Hierarchical Transformer Architecture
  3.1 Related Works
    3.1.1 Using Multiple Sentences for Context-awareness in Machine Translation
    3.1.2 Structured Neural Machine Translation Models for Context-awareness
    3.1.3 Evaluating Context-awareness with Generated Translation
  3.2 Proposed Approach: Context-aware Hierarchical Text Encoder with Memory Networks
    3.2.1 Context-aware NMT Encoders
    3.2.2 Hierarchical Memory Context Encoder
  3.3 Experiments
    3.3.1 Data
    3.3.2 Hyperparameters and Training Details
    3.3.3 Overall BLEU Evaluation
    3.3.4 Model Complexity Analysis
    3.3.5 BLEU Evaluation on Helpful/Unhelpful Context
    3.3.6 Qualitative Analysis
    3.3.7 Limitations and Future Directions
  3.4 Conclusion
4 Optimizing Representational Diversity of Transformer Architecture
  4.1 Related Works
    4.1.1 Analyses of Diversity in Multi-Head Attention
    4.1.2 Similarities between Deep Neural Representations
  4.2 Similarity Measures for Multi-Head Attention
    4.2.1 Multi-Head Attention
    4.2.2 Singular Vector Canonical Correlation Analysis (SVCCA)
    4.2.3 Centered Kernel Alignment (CKA)
  4.3 Proposed Approach: Controlling Inter-Head Diversity
    4.3.1 HSIC Regularizer
    4.3.2 Orthogonality Regularizer
    4.3.3 Drophead
  4.4 Inter-Head Similarity Analyses
    4.4.1 Experimental Details for Similarity Analysis
    4.4.2 Applying SVCCA and CKA
    4.4.3 Analyses on Inter-Model Similarity
    4.4.4 Does Multi-Head Strategy Diversify a Model's Representation Subspaces?
  4.5 Experiments on Controlling Inter-Head Similarity Methods
    4.5.1 Experimental Details
    4.5.2 Analysis on Controlling Inter-Head Diversity
    4.5.3 Quantitative Evaluation
    4.5.4 Limitations and Future Directions
  4.6 Conclusions
5 Modality Alignment for Cross-modal Attention
  5.1 Related Works
    5.1.1 Representation Similarity between Modalities
    5.1.2 Video Question Answering
  5.2 Proposed Approach: Modality Align between Multi-modal Representations
    5.2.1 Centered Kernel Alignment Review
    5.2.2 Why CKA is Proper to Modality Alignment
    5.2.3 Proposed Method
  5.3 Experiments
    5.3.1 Cosine Similarity Learning with CKA
    5.3.2 Modality Align on Video Question Answering Task
  5.4 Conclusion
6 Conclusion
Abstract (In Korean)
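The CKA measure that the dissertation optimizes has a simple linear form. The sketch below is a generic illustration of that metric, not the author's own code; the array shapes and the NumPy framing are assumptions for the example.

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear Centered Kernel Alignment between representation matrices
    X (n x d1) and Y (n x d2) whose rows correspond to the same n examples."""
    # Column-center the features, which centers the linear Gram matrices.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # With linear kernels, HSIC reduces to squared Frobenius norms.
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    return cross / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

rng = np.random.default_rng(0)
X = rng.normal(size=(128, 64))               # e.g. one layer's hidden states
assert abs(linear_cka(X, X) - 1.0) < 1e-9    # identical representations
assert abs(linear_cka(X, 2.0 * X + 3.0) - 1.0) < 1e-9  # invariant to scale/shift
```

The scale and shift invariance shown by the final assertion is one reason CKA is attractive for comparing representations of different dimensionality, such as a visual stream against a text stream.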

    Performance Evaluation of Phase Optimized Spreading Codes in Non Linear DS-CDMA Receiver

    Spread spectrum (SS) is a modulation technique in which the signal occupies a bandwidth much larger than the minimum necessary to send the information. Synchronized reception with the code at the receiver is used to despread the information before data recovery. Band spreading is accomplished by means of a code that is independent of the data. Because the spreading code is pseudo-random, the spread signal resembles noise. The coded modulation characteristic of an SS system uniquely qualifies it for navigation applications. Any signal used in ranging is subject to time/distance relations, and an SS signal has the advantage that its phase is easily resolvable. The direct-sequence (DS) form of modulation is mostly preferred over Frequency Hopping (FH), as FH systems do not normally possess high-resolution properties. The higher the chip rate, the better the measurement capability; the basic resolution is one code chip. Initially, some existing code families, e.g. Gold, Kasami (large and smal..
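The spreading/despreading described above can be sketched in a few lines. This is a hedged, minimal illustration of direct-sequence spreading with a ±1 pseudo-random chip sequence, not the article's phase-optimized codes; the spreading factor of 31 and the noise level are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(42)
chips_per_bit = 31                               # assumed spreading factor
code = rng.choice([-1, 1], size=chips_per_bit)   # pseudo-random +/-1 chip code

def spread(bits: np.ndarray) -> np.ndarray:
    """Each +/-1 data bit is multiplied by the full chip sequence,
    expanding the occupied bandwidth by the spreading factor."""
    return np.concatenate([b * code for b in bits])

def despread(signal: np.ndarray) -> np.ndarray:
    """Correlate each chip-length block with the same synchronized code;
    the correlation gain lifts the data back out of the noise."""
    blocks = signal.reshape(-1, chips_per_bit)
    return np.sign(blocks @ code).astype(int)

bits = np.array([1, -1, -1, 1])
tx = spread(bits)                                # 31x bandwidth expansion
noisy = tx + rng.normal(scale=1.0, size=tx.size)
recovered = despread(noisy)
```

The correlation in `despread` is exactly the synchronized reception mentioned in the abstract: multiplying by the code a second time collapses the wideband signal back to the data while leaving uncorrelated noise spread out.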

    Combining Shape and Learning for Medical Image Analysis

    Automatic methods with the ability to make accurate, fast and robust assessments of medical images are in high demand in medical research and clinical care. Excellent automatic algorithms are characterized by speed, allowing for scalability, and an accuracy comparable to that of an expert radiologist. They should produce morphologically and physiologically plausible results while generalizing well to unseen and rare anatomies. Still, there are few, if any, applications where today's automatic methods succeed in meeting these requirements. The focus of this thesis is two tasks essential for enabling automatic medical image assessment: medical image segmentation and medical image registration. Medical image registration, i.e. aligning two separate medical images, is used as an important sub-routine in many image analysis tools as well as in image fusion, disease progress tracking and population statistics. Medical image segmentation, i.e. delineating anatomically or physiologically meaningful boundaries, is used for both diagnostic and visualization purposes in a wide range of applications, e.g. in computer-aided diagnosis and surgery. The thesis comprises five papers addressing medical image registration and/or segmentation for a diverse set of applications and modalities, i.e. pericardium segmentation in cardiac CTA, brain region parcellation in MRI, multi-organ segmentation in CT, heart ventricle segmentation in cardiac ultrasound and tau PET registration. The five papers propose competitive registration and segmentation methods enabled by machine learning techniques, e.g. random decision forests and convolutional neural networks, as well as by shape modelling, e.g. multi-atlas segmentation and conditional random fields.

    Combinatorial Generalisation in Machine Vision

    The human capacity for generalisation, i.e. the fact that we are able to successfully perform a familiar task in novel contexts, is one of the hallmarks of our intelligent behaviour. But what mechanisms enable this capacity, which is so impressive yet comes so naturally to us? This is a question that has driven copious amounts of research in both Cognitive Science and Artificial Intelligence for almost a century, with some advocating the need for symbolic systems and others the benefits of distributed representations. In this thesis we will explore which principles help AI systems to generalise to novel combinations of previously observed elements (such as colour and shape) in the context of machine vision. We will show that while approaches such as disentangled representation learning showed initial promise, they are fundamentally unable to solve this generalisation problem. In doing so we will illustrate the need to perform severe tests of models in order to properly assess their limitations. We will also see how such failures are robust across different datasets, training modalities and the internal representations of the models. We then show that a different type of system, one that attempts to learn object-centric representations, is capable of solving the generalisation challenges that previous models could not. We conclude by discussing the implications of these results for long-standing questions regarding the kinds of cognitive systems that are required to solve generalisation problems.
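A standard way to probe this kind of combinatorial generalisation is to hold out specific attribute combinations at training time and test only on those. The sketch below is a generic illustration of such a split, not the thesis's exact protocol; the attribute values and held-out pairs are assumptions.

```python
from itertools import product

colors = ["red", "green", "blue"]
shapes = ["square", "circle", "triangle"]

all_combos = set(product(colors, shapes))           # 9 colour-shape pairings
held_out = {("red", "circle"), ("blue", "square")}  # never shown in training
train_combos = all_combos - held_out

# A model trained only on train_combos has seen every colour and every
# shape individually, just never these particular pairings; evaluating on
# held_out therefore measures recombination ability, not memorisation.
assert train_combos | held_out == all_combos
assert not (train_combos & held_out)
```

The severity of the test can be tuned by how many combinations are withheld: the more pairings are excluded, the less the training set constrains the model toward the compositional solution.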

    Deep Learning for Medication Recommendation: A Systematic Survey

    Making medication prescriptions in response to a patient's diagnosis is a challenging task. The number of pharmaceutical companies, their inventories of medicines, and the recommended dosages confront a doctor with the well-known problem of information and cognitive overload. To assist medical practitioners in making informed decisions regarding a prescription, researchers have exploited electronic health records (EHRs) to automatically recommend medication. In recent years, medication recommendation using EHRs has been a salient research direction, attracting researchers to apply various deep learning (DL) models to patients' EHRs in recommending prescriptions. Yet, in the absence of a holistic survey article, considerable effort and time are needed to study these publications in order to understand the current state of research and identify the best-performing models along with the trends and challenges. To fill this research gap, this survey reports on state-of-the-art DL-based medication recommendation methods. It reviews the classification of DL-based medication recommendation (MR) models, compares their performance, and discusses the unavoidable issues they face. It also reports on the most common datasets and metrics used in evaluating MR models. The findings of this study have implications for researchers interested in MR models.

    Doubly Orthogonal Wavelet Packets for Multi-Users Indoor Visible Light Communication Systems

    Visible Light Communication (VLC) is a data communication technology that modulates the intensity of light to transmit information, mostly by means of Light Emitting Diodes (LEDs). The data rate is mainly throttled by the limited bandwidth of the LEDs. To combat this, Multi-carrier Code Division Multiple Access (MC-CDMA) is a favorable technique for achieving higher data rates along with reduced Inter-Symbol Interference (ISI) and easy access for multiple users, at the cost of slightly compromised spectral efficiency and Multiple Access Interference (MAI). In this article, a multi-user VLC system is designed using a Discrete Wavelet Transform (DWT), which eliminates the need for a cyclic prefix owing to the good orthogonality and time-frequency localization properties of wavelets. Moreover, the design comprises suitable signature codes, which are generated by employing double orthogonality based on Walsh codes and wavelet packets. The proposed multi-user system is simulated in MATLAB and its overall performance is assessed using line-of-sight (LoS) and non-line-of-sight (NLoS) configurations. Furthermore, two sub-optimal multi-user detection schemes, zero forcing (ZF) and minimum mean square error (MMSE), are used at the receiver. The simulated results illustrate that the doubly orthogonal signature waveform-based DWT-MC-CDMA with the MMSE detection scheme outperforms the Walsh-code-based multi-user system.
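Walsh codes, one half of the doubly orthogonal signature design above, come from Sylvester's recursive Hadamard construction. A minimal sketch of that construction and its orthogonality property follows; it is illustrative only and independent of the article's MATLAB implementation.

```python
import numpy as np

def walsh_codes(n: int) -> np.ndarray:
    """Return an n x n matrix of +/-1 Walsh codes (rows), n a power of two,
    built by Sylvester's doubling H_2n = [[H, H], [H, -H]]."""
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

W = walsh_codes(8)
# Rows are mutually orthogonal: W @ W.T = 8 * I, so in a synchronous
# channel each user's correlator nulls every other user's signature.
assert np.array_equal(W @ W.T, 8 * np.eye(8, dtype=int))
```

This row-wise orthogonality is what the article pairs with the orthogonality of wavelet packets to obtain doubly orthogonal signature waveforms.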

    Deep Neural Networks and Data for Automated Driving

    This open access book brings together the latest developments from industry and research on automated driving and artificial intelligence. Environment perception for highly automated driving heavily employs deep neural networks, facing many challenges. How much data do we need for training and testing? How can synthetic data save labeling costs for training? How do we increase robustness and decrease memory usage? For inevitably poor conditions: how do we know that the network is uncertain about its decisions? Can we understand a bit more about what actually happens inside neural networks? This leads to a very practical problem, particularly for DNNs employed in automated driving: what are useful validation techniques, and what about safety? This book unites the views of both academia and industry, where computer vision and machine learning meet environment perception for highly automated driving. Naturally, aspects of data, robustness, uncertainty quantification, and, last but not least, safety are at its core. This book is unique: in its first part, an extended survey of all the relevant aspects is provided; the second part contains the detailed technical elaboration of the various questions mentioned above.