
    CASTNet: Community-Attentive Spatio-Temporal Networks for Opioid Overdose Forecasting

    Opioid overdose is a growing public health crisis in the United States. This crisis, recognized as the "opioid epidemic," has widespread societal consequences, including the degradation of health and increases in crime rates and family problems. To improve overdose surveillance and to identify areas in need of prevention efforts, in this work we focus on forecasting opioid overdose using real-time crime dynamics. Previous work has identified various types of links between opioid use and criminal activities, such as financial motives and common causes. Motivated by these observations, we propose a novel spatio-temporal predictive model for opioid overdose forecasting that leverages the spatio-temporal patterns of crime incidents. Our proposed model incorporates multi-head attentional networks to learn different representation subspaces of features. This deep learning architecture, called a "community-attentive" network, allows the prediction for a given location to be optimized by a mixture of groups (i.e., communities) of regions. In addition, our proposed model allows for interpreting which features, from which communities, contribute most to predicting local incidents, as well as how these communities are captured through forecasting. Our results on two real-world overdose datasets indicate that our model achieves superior forecasting performance and provides meaningful interpretations of the spatio-temporal relationships between the dynamics of crime and those of opioid overdose. Comment: Accepted as a conference paper at ECML-PKDD 201
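
    The "community-attentive" idea above can be read as multi-head attention applied over region-level crime features, with each head acting as one latent community. The snippet below is a minimal sketch of that reading in PyTorch; the tensor shapes, layer sizes, and class name are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): multi-head attention that lets a
# target region attend to "community" summaries of all regions' crime features.
import torch
import torch.nn as nn

class CommunityAttention(nn.Module):
    def __init__(self, feat_dim: int = 32, num_heads: int = 4):
        super().__init__()
        # Each head can be read as one latent "community" grouping of regions.
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.out = nn.Linear(feat_dim, 1)  # per-region overdose forecast

    def forward(self, region_feats: torch.Tensor) -> torch.Tensor:
        # region_feats: (batch, num_regions, feat_dim) crime-dynamics features
        mixed, weights = self.attn(region_feats, region_feats, region_feats)
        # weights (batch, num_regions, num_regions) are inspectable: they show
        # which regions each target location drew information from.
        return self.out(mixed).squeeze(-1)

x = torch.randn(2, 10, 32)            # 2 samples, 10 regions, 32 features each
print(CommunityAttention()(x).shape)  # torch.Size([2, 10])
```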

    ์ž์—ฐ์–ด ์ฒ˜๋ฆฌ๋ฅผ ์œ„ํ•œ ๋ฌธ๋งฅ ์ •๋ณด ๋ฐ ๋ฉ”๋ชจ๋ฆฌ ์–ดํ…์…˜์„ ํ™œ์šฉํ•˜๋Š” ๊ณ„์ธต์  ๋ฌธ๋งฅ ์ธ์ฝ”๋”

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, August 2022. Advisor: Kyomin Jung. Recently, the standard architecture for Natural Language Processing (NLP) has evolved from recurrent neural networks to the Transformer architecture. The Transformer consists of attention layers, which excel at extracting correlations between tokens and at incorporating that information to generate appropriate output. While much research leveraging the Transformer architecture reports new state-of-the-art performance on various NLP tasks, these advances also pose a new challenge to the deep learning community: exploiting additional context information. Because human intelligence perceives everyday signals together with rich contextual information (e.g., additional memory, visual information, and common sense), exploiting such context is a step toward the ultimate goal of artificial intelligence. In this dissertation, I propose novel methodologies and analyses that improve the context-awareness of the Transformer architecture, focusing on the attention mechanism, for various natural language processing tasks. The proposed methods utilize additionally given context information, which is not limited to the modality of natural language, alongside the given input. First, I propose the Hierarchical Memory Context Encoder (HMCE), which efficiently embeds contextual information from preceding sentences via a hierarchical Transformer architecture and fuses the embedded context representation into the input representation via a memory attention mechanism.
The proposed HMCE outperforms the original Transformer, which does not leverage the additional context, on various context-aware machine translation tasks, and it achieves the best BLEU scores among the baselines that do use the additional context. Then, to improve the attention mechanism between the context representation and the input representation, I analyze the representational similarity between the two in depth. Based on these analyses of representational similarity inside the Transformer architecture, I propose a method for optimizing Centered Kernel Alignment (CKA) between internal Transformer representations. The proposed CKA optimization method improves the performance of the Transformer on various machine translation and language modelling tasks. Lastly, I extend the CKA optimization method to a Modality Alignment method for multi-modal scenarios where the context information takes the modality of visual information. The Modality Alignment method enhances the cross-modal attention mechanism by maximizing the representational similarity between visual and natural language representations, resulting in accuracy improvements larger than 3.5% on video question answering tasks.
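
    Since both the representational-similarity analysis and the Modality Alignment method revolve around Centered Kernel Alignment, a worked example may help. The snippet below is a minimal sketch of linear CKA between two activation matrices; the function name and the random test data are illustrative assumptions, not code from the thesis.

```python
# Minimal sketch of linear Centered Kernel Alignment (CKA) between two
# representation matrices; names and test data are illustrative only.
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """X, Y: (n_examples, dim) activation matrices (dims may differ)."""
    X = X - X.mean(axis=0, keepdims=True)   # center each feature
    Y = Y - Y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2   # unnormalized HSIC term
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return float(hsic / (norm_x * norm_y))

rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 64))
print(linear_cka(A, A))                            # 1.0: identical representations
print(linear_cka(A, rng.normal(size=(1000, 32))))  # close to 0: unrelated features
```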

    Latent Multi-task Architecture Learning

    Multi-task learning (MTL) allows deep neural networks to learn from related tasks by sharing parameters with other networks. In practice, however, MTL involves searching an enormous space of possible parameter-sharing architectures to find (a) the layers or subspaces that benefit from sharing, (b) the appropriate amount of sharing, and (c) the appropriate relative weights of the different task losses. Recent work has addressed each of the above problems in isolation. In this work we present an approach that learns a latent multi-task architecture that jointly addresses (a)-(c). We present experiments on synthetic data and data from OntoNotes 5.0, including four different tasks and seven different domains. Our extension consistently outperforms previous approaches to learning latent architectures for multi-task problems and achieves up to 15% average error reductions over common approaches to MTL. Comment: To appear in Proceedings of AAAI 201
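
    As a rough illustration of what "learning a latent multi-task architecture" can mean in practice, the sketch below shows two task-specific layers whose degree of sharing is governed by trainable mixing coefficients. This parameterization is a simplified assumption for exposition, not the architecture proposed in the paper.

```python
# Hedged sketch: learned sharing between two task-specific layers, so the amount
# of cross-task sharing is itself a trainable quantity (illustrative only).
import torch
import torch.nn as nn

class LatentSharingLayer(nn.Module):
    def __init__(self, dim: int = 16):
        super().__init__()
        self.task_a = nn.Linear(dim, dim)
        self.task_b = nn.Linear(dim, dim)
        # alpha[i, j] = how much task i's output draws from task j's hidden state
        self.alpha = nn.Parameter(torch.eye(2))

    def forward(self, x_a: torch.Tensor, x_b: torch.Tensor):
        h_a = torch.relu(self.task_a(x_a))
        h_b = torch.relu(self.task_b(x_b))
        out_a = self.alpha[0, 0] * h_a + self.alpha[0, 1] * h_b
        out_b = self.alpha[1, 0] * h_a + self.alpha[1, 1] * h_b
        return out_a, out_b

layer = LatentSharingLayer()
a, b = layer(torch.randn(4, 16), torch.randn(4, 16))
print(a.shape, b.shape)  # torch.Size([4, 16]) torch.Size([4, 16])
```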

    Deep Divergence-Based Approach to Clustering

    A promising direction in deep learning research is to learn representations while simultaneously discovering cluster structure in unlabeled data by optimizing a discriminative loss function. As opposed to supervised deep learning, this line of research is in its infancy, and how to design and optimize suitable loss functions to train deep neural networks for clustering is still an open question. Our contribution to this emerging field is a new deep clustering network that leverages the discriminative power of information-theoretic divergence measures, which have been shown to be effective in traditional clustering. We propose a novel loss function that incorporates geometric regularization constraints, thus avoiding degenerate structures in the resulting clustering partition. Experiments on synthetic benchmarks and real datasets show that the proposed network achieves competitive performance with respect to other state-of-the-art methods, scales well to large datasets, and does not require pre-training steps.
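
    To make the divergence-based loss idea concrete, the sketch below computes a Cauchy-Schwarz-style separation term between soft cluster assignments using a Gaussian kernel on the embeddings. The kernel choice, bandwidth, and function name are assumptions meant to illustrate this family of losses, not the paper's exact objective.

```python
# Illustrative sketch of a Cauchy-Schwarz-style cluster-separation term of the
# kind divergence-based clustering losses build on (not the paper's exact loss).
import torch

def cs_separation_loss(assign: torch.Tensor, feats: torch.Tensor, sigma: float = 1.0):
    """assign: (n, k) softmax cluster assignments; feats: (n, d) embeddings.
    Returns the mean pairwise Cauchy-Schwarz similarity between clusters
    (lower = better separated), computed with a Gaussian kernel on feats."""
    d2 = torch.cdist(feats, feats) ** 2
    K = torch.exp(-d2 / (2 * sigma ** 2))          # (n, n) Gaussian kernel
    num = assign.T @ K @ assign                    # (k, k) inter-cluster kernel mass
    diag = torch.sqrt(torch.diagonal(num))
    sim = num / (diag[:, None] * diag[None, :] + 1e-9)
    k = assign.shape[1]
    off_diag = sim - torch.eye(k)                  # ignore self-similarity
    return off_diag.sum() / (k * (k - 1))

a = torch.softmax(torch.randn(32, 3), dim=1)       # soft assignments for 32 points
z = torch.randn(32, 8)                             # their embeddings
print(cs_separation_loss(a, z))                    # scalar, differentiable loss
```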

    Robust Representation Learning for Unified Online Top-K Recommendation

    In large-scale industrial e-commerce, the efficiency of an online recommendation system is crucial for delivering highly relevant item/content advertising that caters to diverse business scenarios. However, most existing studies focus solely on item advertising, neglecting the significance of content advertising. This oversight results in inconsistencies within the multi-entity structure and unfair retrieval. Furthermore, the challenge of retrieving the top-k advertisements from multi-entity advertisements across different domains adds to the complexity. Recent research shows that user-entity behaviors within different domains exhibit characteristics of both differentiation and homogeneity. Therefore, multi-domain matching models typically rely on a hybrid-experts framework with domain-invariant and domain-specific representations. Unfortunately, most approaches focus primarily on optimizing the combination mode of different experts, failing to address the inherent difficulty of optimizing the expert modules themselves. The existence of redundant information across different domains introduces interference and competition among experts, while the distinct learning objectives of each domain lead to varying optimization challenges among experts. To tackle these issues, we propose robust representation learning for unified online top-k recommendation. Our approach constructs unified modeling in entity space to ensure data fairness. The robust representation learning employs domain adversarial learning and multi-view Wasserstein distribution learning to learn robust representations. Moreover, the proposed method balances conflicting objectives through homoscedastic uncertainty weights and orthogonality constraints. Various experiments validate the effectiveness and rationality of our proposed method, which has been successfully deployed online to serve real business scenarios. Comment: 14 pages, 6 figures, submitted to ICD
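
    One of the balancing mechanisms mentioned, homoscedastic uncertainty weighting, has a well-known generic form in which a log-variance is learned per task loss. The sketch below shows that generic form; the module name and placeholder losses are assumptions, and this is not the paper's implementation.

```python
# Hedged sketch of homoscedastic-uncertainty loss weighting (generic form), shown
# only to illustrate how conflicting objectives can be balanced.
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    def __init__(self, num_tasks: int):
        super().__init__()
        # log_var[i] = learned log-variance (uncertainty) for task i's loss
        self.log_var = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses: torch.Tensor) -> torch.Tensor:
        # Each loss is down-weighted by its learned uncertainty, with a
        # regularizer that keeps the uncertainties from growing without bound.
        return (torch.exp(-self.log_var) * task_losses + self.log_var).sum()

weigher = UncertaintyWeightedLoss(num_tasks=2)
losses = torch.stack([torch.tensor(0.8), torch.tensor(2.3)])  # placeholder losses
print(weigher(losses))  # single scalar combining both objectives
```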