241 research outputs found

SAIN: Self-Attentive Integration Network for Recommendation

Author: Kang Jaewoo
Kim Raehyun
Ko Miyoung
Yun Seoungjun
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 06/11/2019

With the growing importance of personalized recommendation, numerous recommendation models have been proposed recently. Among them, Matrix Factorization (MF) based models are the most widely used in the recommendation field due to their high performance. However, MF-based models suffer from cold start problems where user-item interactions are sparse. To deal with this problem, content-based recommendation models that use the auxiliary attributes of users and items have been proposed. Because these models use auxiliary attributes, they are effective in cold start settings. However, most of the proposed models are either unable to capture complex feature interactions or not properly designed to combine user-item feedback information with content information. In this paper, we propose the Self-Attentive Integration Network (SAIN), a model that effectively combines user-item feedback information and auxiliary information for the recommendation task. In SAIN, a self-attention mechanism is used in the feature-level interaction layer to effectively consider interactions between multiple features, while the information integration layer adaptively combines content and feedback information. The experimental results on two public datasets show that our model outperforms the state-of-the-art models by 2.13%. Comment: SIGIR 201
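The feature-level interaction idea in the abstract above can be illustrated with generic scaled dot-product self-attention over a set of feature embeddings. This is a minimal sketch, not SAIN's actual layer: it assumes identity query/key/value projections and a single head, and the function names `softmax` and `self_attention` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention with identity projections (assumption).

    features: list of n embedding vectors, each a list of d floats.
    Returns n vectors of length d; each is a weighted mix of all inputs.
    """
    d = len(features[0])
    out = []
    for q in features:
        # Similarity of this feature to every feature, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in features]
        weights = softmax(scores)
        # Weighted sum of the feature vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, features)) for j in range(d)])
    return out


# Three hypothetical feature embeddings (e.g. genre, age group, occupation).
mixed = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(mixed)
```

Each output row is a convex combination of the input embeddings, so every feature's representation reflects its interactions with all the others. A trained model would learn separate projection matrices instead of using the raw embeddings directly.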

arXiv.org e-Print Archive

Crossref

Korean Vocational Secondary School Students’ Metacognition and Lifelong Learning

Author: Jaewoo Choi
Woonsun Kang
Publication venue: The Authors. Published by Elsevier Ltd.
Publication date: 21/02/2014

The aim of this research is to analyze Korean vocational secondary school students' metacognition, attitude toward lifelong learning, and motivation for lifelong learning, and to investigate whether these had an effect on their lifelong learning. The research examined after-school learning in depth as one lifelong learning activity. The methodology comprised frequency analysis, latent class analysis, and multiple regression analysis. The results were as follows: a) 75% of respondents had experienced after-school learning; b) only 23% of those surveyed considered after-school learning necessary; c) the main obstacles to after-school learning were lack of time (32.5%) and financial constraints (28.2%); d) four underlying types of motivation for after-school learning were identified, namely Class I (job search), Class II (leisure-centered job skill), Class III (civic competency), and Class IV (lack of motivation); e) in the multiple regression analysis, variables including experience of after-school learning were significant predictors of lifelong learning.

Elsevier - Publisher Connector

Look at the First Sentence: Position Bias in Question Answering

Author: Kang Jaewoo
Kim Gangwoo
Kim Hyunjae
Ko Miyoung
Lee Jinhyuk
Publication date: 01/01/2020

Many extractive question answering models are trained to predict start and end positions of answers. The choice of predicting answers as positions is mainly due to its simplicity and effectiveness. In this study, we hypothesize that when the distribution of the answer positions is highly skewed in the training set (e.g., answers lie only in the k-th sentence of each passage), QA models predicting answers as positions can learn spurious positional cues and fail to give answers in different positions. We first illustrate this position bias in popular extractive QA models such as BiDAF and BERT and thoroughly examine how position bias propagates through each layer of BERT. To safely deliver position information without position bias, we train models with various de-biasing methods including entropy regularization and bias ensembling. Among them, we found that using the prior distribution of answer positions as a bias model is very effective at reducing position bias, recovering the performance of BERT from 37.48% to 81.64% when trained on a biased SQuAD dataset. Comment: 13 pages, EMNLP 202
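One common way to use an answer-position prior as a bias model is a product-of-experts style ensemble: the prior's log-probabilities are added to the model's log-probabilities during training only, so the model gains little from re-learning the positional shortcut. The sketch below assumes that formulation; the abstract does not give the exact loss, and the names `position_prior` and `bias_product_loss` and the add-one smoothing are hypothetical.

```python
import math


def log_softmax(xs):
    """Numerically stable log-softmax over a list of floats."""
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]


def position_prior(gold_positions, num_positions, smoothing=1.0):
    """Log prior over answer positions, estimated from training counts.

    smoothing is an assumed add-k constant so unseen positions keep
    nonzero probability.
    """
    counts = [smoothing] * num_positions
    for p in gold_positions:
        counts[p] += 1
    total = sum(counts)
    return [math.log(c / total) for c in counts]


def bias_product_loss(model_logits, log_prior, gold):
    """Cross-entropy of the ensemble softmax(log p_model + log p_prior).

    Used at training time only; at test time the model logits are used alone.
    """
    combined = [l + b for l, b in zip(log_softmax(model_logits), log_prior)]
    return -log_softmax(combined)[gold]


# Skewed toy training set: every gold answer starts at position 0.
prior = position_prior([0, 0, 0, 0], num_positions=3)
print(bias_product_loss([0.0, 0.0, 0.0], prior, gold=0))
```

When the gold position is one the prior already favors, the ensemble loss is small even for an uninformative model, so little gradient pushes the model itself toward the positional cue.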

arXiv.org e-Print Archive

Crossref

Robust Likelihood-Based Survival Modeling with Microarray Data

Author: Ami Yu
HyungJun Cho
Jaewoo Kang
Seung-Mo Hong
Sukwoo Kim

Gene expression data can be associated with various clinical outcomes. In particular, these data can be of importance in discovering survival-associated genes for medical applications. As alternatives to traditional statistical methods, sophisticated methods and software programs have been developed to overcome the difficulty posed by the high dimensionality of microarray data. Nevertheless, new algorithms and software programs are needed that include practical functions, such as the discovery of multiple sets of survival-associated genes and the incorporation of risk factors, and that run in the R environment, with which many statisticians are familiar. For survival modeling with microarray data, we have developed a software program (called rbsurv) which can be used conveniently and interactively in the R environment. This program selects survival-associated genes based on the partial likelihood of the Cox model and separates training and validation sets of samples for robustness. It can discover multiple sets of genes by iterative forward selection rather than one large set of genes. It also allows adjustment for risk factors in microarray survival modeling. The rbsurv package can thus be used to discover survival-associated genes with microarray data conveniently.
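The selection criterion mentioned above, the partial likelihood of the Cox model, can be written down compactly. The following is an illustrative Python sketch, not code from the rbsurv R package: it assumes no tied event times, and the function name `cox_partial_loglik` and its argument layout are hypothetical.

```python
import math


def cox_partial_loglik(beta, X, time, event):
    """Cox partial log-likelihood, assuming no tied event times.

    beta:  list of coefficients, one per covariate (e.g. per gene).
    X:     list of covariate rows, one per sample.
    time:  observed follow-up time per sample.
    event: 1 if the event was observed, 0 if the sample is censored.
    """
    # Linear predictor for each sample.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    ll = 0.0
    for i in range(len(time)):
        if event[i]:
            # Risk set: samples still under observation at time[i].
            risk = [eta[j] for j in range(len(time)) if time[j] >= time[i]]
            m = max(risk)
            ll += eta[i] - (m + math.log(sum(math.exp(r - m) for r in risk)))
    return ll


# Toy data: one covariate, three samples, the second one censored.
print(cox_partial_loglik([0.5], [[1.0], [2.0], [3.0]], [1, 2, 3], [1, 0, 1]))
```

A forward-selection loop in this spirit would fit the model with each candidate gene added in turn and keep the gene that most improves this quantity, ideally evaluated on held-out validation samples as the abstract describes.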

Research Papers in Economics

Automatic Creation of Named Entity Recognition Datasets by Querying Phrase Representations

Author: Kang Jaewoo
Kim Hyunjae
Yoo Jaehyo
Yoon Seunghyun
Publication date: 01/06/2023

Most weakly supervised named entity recognition (NER) models rely on domain-specific dictionaries provided by experts. This approach is infeasible in many domains where dictionaries do not exist. While a phrase retrieval model was used to construct pseudo-dictionaries with entities retrieved from Wikipedia automatically in a recent study, these dictionaries often have limited coverage because the retriever is likely to retrieve popular entities rather than rare ones. In this study, we present a novel framework, HighGEN, that generates NER datasets with high-coverage pseudo-dictionaries. Specifically, we create entity-rich dictionaries with a novel search method, called phrase embedding search, which encourages the retriever to search a space densely populated with various entities. In addition, we use a new verification process based on the embedding distance between candidate entity mentions and entity types to reduce the false-positive noise in weak labels generated by high-coverage dictionaries. We demonstrate that HighGEN outperforms the previous best model by an average F1 score of 4.7 across five NER benchmark datasets. Comment: ACL 202

arXiv.org e-Print Archive